Monday, July 18, 2011

XPath Functions in Talend Open Studio

XPath has standard functions, such as fn:name(), which returns the name of an XML element. You can use these functions in Talend Open Studio when parts of your XML structure aren't known in advance.

In this example, the XML document contains elements named after INI files that aren't known in advance: APIUSER.INI, CISConsis.ini, and CLAIM.INI.

<inifiles>
  <APIUSER.INI>
    <Section Name="FILES">
      <ROW FieldName="FILEPATH" Value="c:\API\BENCH\CWDATA.DAT"/>
    </Section>
  </APIUSER.INI>
  <CISConsis.ini>
    <Section Name="ConsisSettings">
      <ROW FieldName="ServerIP" Value="192.168.0.1"/>
      <ROW FieldName="ServerPort" Value="999999"/>
    </Section>
  </CISConsis.ini>
  <CLAIM.INI>
    <Section Name="Claim">
      <ROW FieldName="Drive Letter" Value="a:"/>
    </Section>
  </CLAIM.INI>
</inifiles>


Define the Loop

The first step in processing this file with Talend Open Studio is to define the loop. Here, the innermost element, ROW, establishes the granularity: I expect each of the four ROW elements to generate one record in the main flow of the job.

The ROW element's attributes, FieldName and Value, are each mapped with an attribute selector (@FieldName and @Value).

Job Processing File with Unknown XML Elements
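
The loop and attribute mappings described above can be tried outside Talend with a minimal sketch using the JDK's javax.xml.xpath API. The loop expression /inifiles/*/Section/ROW is an assumption about what the Loop XPath query could contain (the '*' step matches the filename elements without naming them), and the class name RowLoop is hypothetical; this is not code generated by Talend.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RowLoop {
    // The sample document from this post, without whitespace.
    static final String XML = "<inifiles>"
        + "<APIUSER.INI><Section Name=\"FILES\">"
        + "<ROW FieldName=\"FILEPATH\" Value=\"c:\\API\\BENCH\\CWDATA.DAT\"/>"
        + "</Section></APIUSER.INI>"
        + "<CISConsis.ini><Section Name=\"ConsisSettings\">"
        + "<ROW FieldName=\"ServerIP\" Value=\"192.168.0.1\"/>"
        + "<ROW FieldName=\"ServerPort\" Value=\"999999\"/>"
        + "</Section></CISConsis.ini>"
        + "<CLAIM.INI><Section Name=\"Claim\">"
        + "<ROW FieldName=\"Drive Letter\" Value=\"a:\"/>"
        + "</Section></CLAIM.INI>"
        + "</inifiles>";

    // Emulates the loop: one "FieldName=Value" record per ROW element.
    static List<String> records() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Assumed loop query: '*' stands in for the unknown filename elements.
            NodeList rows = (NodeList) xpath.evaluate(
                "/inifiles/*/Section/ROW", doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < rows.getLength(); i++) {
                // Column queries are evaluated relative to the current ROW node.
                String field = xpath.evaluate("@FieldName", rows.item(i));
                String value = xpath.evaluate("@Value", rows.item(i));
                out.add(field + "=" + value);
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        records().forEach(System.out::println);
    }
}
```

Running main prints one FieldName=Value line per ROW, four lines in total.
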
Add in Context

The parent elements (Section, APIUSER.INI, and so on) give each ROW element context and distinguish one file's ROWs from another's. Section Name is mapped with an attribute selector that uses a relative parent reference, ../@Name, so Section Name appears in every record. Since there are only three Section elements for four ROW elements, one Section Name (ConsisSettings) repeats.
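
A minimal sketch of the parent reference, using the JDK's javax.xml.xpath API rather than Talend itself. It assumes a loop expression of /inifiles/*/Section/ROW and evaluates ../@Name relative to each ROW; SectionContext and column() are illustrative names only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SectionContext {
    // The sample document from this post, without whitespace.
    static final String XML = "<inifiles>"
        + "<APIUSER.INI><Section Name=\"FILES\">"
        + "<ROW FieldName=\"FILEPATH\" Value=\"c:\\API\\BENCH\\CWDATA.DAT\"/>"
        + "</Section></APIUSER.INI>"
        + "<CISConsis.ini><Section Name=\"ConsisSettings\">"
        + "<ROW FieldName=\"ServerIP\" Value=\"192.168.0.1\"/>"
        + "<ROW FieldName=\"ServerPort\" Value=\"999999\"/>"
        + "</Section></CISConsis.ini>"
        + "<CLAIM.INI><Section Name=\"Claim\">"
        + "<ROW FieldName=\"Drive Letter\" Value=\"a:\"/>"
        + "</Section></CLAIM.INI>"
        + "</inifiles>";

    // Evaluates a column expression relative to each ROW matched by the (assumed) loop query.
    static List<String> column(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList rows = (NodeList) xpath.evaluate(
                "/inifiles/*/Section/ROW", doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < rows.getLength(); i++) {
                values.add(xpath.evaluate(expr, rows.item(i)));
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "../" steps from ROW up to its Section, whose Name attribute is read.
        System.out.println(column("../@Name"));
    }
}
```

The printed list contains ConsisSettings twice, once for each ROW under the CISConsis.ini section.
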

Mapping the Unknown XML

The filename elements APIUSER.INI, CISConsis.ini, and CLAIM.INI are not known in advance, but they can be reached with the same relative technique used for Section, here two levels up from ROW. However, because these elements carry no identifying attribute the way Section carries Name, the XPath function name() is needed to turn the element's own name into data, for example name(../..).
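
The sketch below combines all four columns, with name(../..) supplying the filename. It is a hypothetical stand-alone Java class (FileContext) built on javax.xml.xpath with an assumed loop expression of /inifiles/*/Section/ROW, not the job's actual configuration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FileContext {
    // The sample document from this post, without whitespace.
    static final String XML = "<inifiles>"
        + "<APIUSER.INI><Section Name=\"FILES\">"
        + "<ROW FieldName=\"FILEPATH\" Value=\"c:\\API\\BENCH\\CWDATA.DAT\"/>"
        + "</Section></APIUSER.INI>"
        + "<CISConsis.ini><Section Name=\"ConsisSettings\">"
        + "<ROW FieldName=\"ServerIP\" Value=\"192.168.0.1\"/>"
        + "<ROW FieldName=\"ServerPort\" Value=\"999999\"/>"
        + "</Section></CISConsis.ini>"
        + "<CLAIM.INI><Section Name=\"Claim\">"
        + "<ROW FieldName=\"Drive Letter\" Value=\"a:\"/>"
        + "</Section></CLAIM.INI>"
        + "</inifiles>";

    // One record per ROW: filename|section|field|value.
    static List<String> records() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList rows = (NodeList) xpath.evaluate(
                "/inifiles/*/Section/ROW", doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < rows.getLength(); i++) {
                Node row = rows.item(i);
                // name(../..) returns the tag name of ROW's grandparent: the filename element.
                String file = xpath.evaluate("name(../..)", row);
                String section = xpath.evaluate("../@Name", row);
                String field = xpath.evaluate("@FieldName", row);
                String value = xpath.evaluate("@Value", row);
                out.add(String.join("|", file, section, field, value));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        records().forEach(System.out::println);
    }
}
```
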

Results

Running the job produces the following four-record result set.

Output with Filename and Section Repeating Groups

XPath functions are a powerful way to process an XML document with loosely defined or variable elements. XPath also offers a range of string, numeric, boolean, and date functions, for example count(), true(), matches(), and dateTime(). Note that matches() and dateTime() were introduced in XPath 2.0, so they may not be available if the underlying engine supports only XPath 1.0. Even so, avoid making your XPath logic overly complex: business rules are usually handled more cleanly downstream by Talend's data-oriented components.
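
Because the JDK's built-in javax.xml.xpath engine implements XPath 1.0, it is a convenient place to try the 1.0 functions against the sample document before putting them into a Talend mapping. This sketch runs outside Talend and does not assume that Talend's XML components use the same engine; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathFunctionDemo {
    // The sample document from this post, without whitespace.
    static final String XML = "<inifiles>"
        + "<APIUSER.INI><Section Name=\"FILES\">"
        + "<ROW FieldName=\"FILEPATH\" Value=\"c:\\API\\BENCH\\CWDATA.DAT\"/>"
        + "</Section></APIUSER.INI>"
        + "<CISConsis.ini><Section Name=\"ConsisSettings\">"
        + "<ROW FieldName=\"ServerIP\" Value=\"192.168.0.1\"/>"
        + "<ROW FieldName=\"ServerPort\" Value=\"999999\"/>"
        + "</Section></CISConsis.ini>"
        + "<CLAIM.INI><Section Name=\"Claim\">"
        + "<ROW FieldName=\"Drive Letter\" Value=\"a:\"/>"
        + "</Section></CLAIM.INI>"
        + "</inifiles>";

    // Evaluates an XPath 1.0 expression against the sample document as a number.
    static double number(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Double) xpath.evaluate(expr, doc, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(number("count(/inifiles/*)")); // 3.0 (filename elements)
        System.out.println(number("count(//ROW)"));       // 4.0 (ROW elements)
        // A string function inside a predicate: ROWs whose FieldName starts with "Server".
        System.out.println(number("count(//ROW[starts-with(@FieldName, 'Server')])")); // 2.0
    }
}
```

Keeping each expression this small leaves the business logic for the downstream components.
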

2 comments:

  1. Is it possible to do this looping in tFileInputJson?

    1. Hi,

      I added a screenshot showing a job with tFileInputJSON here: http://bekwam.blogspot.com/2012/03/screenshot-of-tfileinputjson-component.html.

      Good luck,
      Carl