Monday, May 30, 2011

Using XPaths for XML Input in Talend Open Studio

If you have an XML document based on a schema that requires transformation, consider using XPaths in Talend Open Studio to flatten the hierarchical file for loading.
Consider the following XML fragment.
<hredielement>
  <hrediattributes>
    <hrediattribute name="patientId" value="111111" />
    <hrediattribute name="firstName" value="Carl" />
  </hrediattributes>
</hredielement>


The fragment has no dedicated "patientId" or "firstName" elements.  Both values are stored as name/value pairs on generic "hrediattribute" elements.

XPaths can flatten this by mapping the "value" attribute of each "hrediattribute" element to a different column, selected by that element's "name" attribute.

For patientId, the XPath would be

hrediattributes/hrediattribute[@name='patientId']/@value

And for firstName, the XPath would be

hrediattributes/hrediattribute[@name='firstName']/@value

If the hrediattribute elements have their values in the element body (<hrediattribute name="patientId">11111</hrediattribute>), then leave off the trailing attribute selector "/@value".
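The two XPaths above can be tried outside Talend before entering them in the wizard.  What follows is a minimal sketch in Python, not what Talend runs internally.  It assumes the standard library's ElementTree, which supports only a subset of XPath: the [@name='...'] predicate works, but a trailing "/@value" does not, so the attribute is read with get() instead.  The function name flatten and the two sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Attribute form: each value lives in the "value" attribute.
ATTR_XML = """
<hredielement>
  <hrediattributes>
    <hrediattribute name="patientId" value="111111" />
    <hrediattribute name="firstName" value="Carl" />
  </hrediattributes>
</hredielement>
"""

# Body form: each value lives in the element text.
BODY_XML = """
<hredielement>
  <hrediattributes>
    <hrediattribute name="patientId">111111</hrediattribute>
    <hrediattribute name="firstName">Carl</hrediattribute>
  </hrediattributes>
</hredielement>
"""


def flatten(xml_text, names=("patientId", "firstName")):
    """Return one flat row with a column per requested name (hypothetical helper)."""
    loop_element = ET.fromstring(xml_text)  # stands in for the loop element
    row = []
    for name in names:
        # Same relative path and predicate as in the post, minus "/@value".
        node = loop_element.find(
            f"hrediattributes/hrediattribute[@name='{name}']"
        )
        if node is None:
            row.append("")  # no matching hrediattribute
            continue
        # Prefer the "value" attribute; fall back to the element body.
        value = node.get("value")
        row.append(value if value is not None else (node.text or ""))
    return row


attr_row = flatten(ATTR_XML)
body_row = flatten(BODY_XML)
print("|".join(attr_row))
print("|".join(body_row))
```

Both sample documents flatten to the same two columns, which is the point of dropping "/@value" when the value sits in the element body.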

File XML in Talend Open Studio

The following screenshot is from Talend Open Studio's File XML Wizard.  The File XML is used as an Input XML and is available here.

To test this, I dragged the File XML onto the canvas as an input and hooked up a tLogRow.  Here are the results from the run.

Starting job HREDIXMLFile at 08:47 30/05/2011.
[statistics] connecting to socket on port 3547
[statistics] connected
111111|Carl
222222|Joe
[statistics] disconnected
Job HREDIXMLFile ended at 08:47 30/05/2011. [exit code=0]


If you need to transform your XML document, consider using XPaths in Talend Open Studio rather than a separate XSL.  Although you can call the XSL transformation from TOS, that won't take advantage of TOS's browsing and dependency checking.

Specifying an XSD

Although the File XML wizard is labeled "File Settings / XML" (TOS 4.2.1), an XSD can be entered.  The XSD must be a local file.  However, make sure that any references within the XSD are web resources and not local files.  If the XSD imports another XSD namespace, the schemaLocation should point to something accessible on the web and not to another local file.

A Second Example

If some of the enclosing parent elements have data that needs to be mapped, additional XPaths are required.  Because these XPaths reference elements outside of the loop, the relative XPaths in the first example won't work without backing up (../..) or using absolute paths.

This example also needs transId mapped.  An absolute path selecting all transIds is used (//@transId).

<hreditrans transId="101">
  <hredielements>
    <hredielement name="patient">
      <hrediattributes>
        <hrediattribute name="patientId">111111</hrediattribute>
        <hrediattribute name="firstName">Carl</hrediattribute>
      </hrediattributes>
    </hredielement>


The full XML file is here.


In order to dig into specific instances of the hredielements, additional attribute selectors are used:

hredielement[@name='patient']/hrediattributes/hrediattribute[@name='patientId']

This is a screenshot of the mappings entered into File XML metadata.
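The mappings for this second example can also be checked with a short script.  The following is a sketch only, again using Python's ElementTree as a stand-in for Talend.  Several things here are assumptions: the sample document is the fragment above with closing tags added so that it parses, the loop element is taken to be "hredielements", and because ElementTree cannot evaluate "//@transId" or back up past the starting node, transId is read straight from the root element.

```python
import xml.etree.ElementTree as ET

# The fragment from the post, closed so that it parses (closing tags assumed).
TRANS_XML = """
<hreditrans transId="101">
  <hredielements>
    <hredielement name="patient">
      <hrediattributes>
        <hrediattribute name="patientId">111111</hrediattribute>
        <hrediattribute name="firstName">Carl</hrediattribute>
      </hrediattributes>
    </hredielement>
  </hredielements>
</hreditrans>
"""

# Relative XPaths with the extra [@name='patient'] selector, as in the post.
PATIENT_ID = (
    "hredielement[@name='patient']"
    "/hrediattributes/hrediattribute[@name='patientId']"
)
FIRST_NAME = (
    "hredielement[@name='patient']"
    "/hrediattributes/hrediattribute[@name='firstName']"
)

root = ET.fromstring(TRANS_XML)

# transId sits on the outermost element, outside the loop.
trans_id = root.get("transId")

rows = []
for loop_element in root.findall("hredielements"):  # assumed loop element
    patient_id = loop_element.findtext(PATIENT_ID)
    first_name = loop_element.findtext(FIRST_NAME)
    rows.append((trans_id, patient_id, first_name))

for row in rows:
    print("|".join(row))
```

Each row carries the transId from the enclosing element alongside the values found inside the loop.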

Additional XPaths Example
Loop Element

The position of the loop element will determine the repeating rows.  Using a loop element on "contact" on the following XML

<companies>
<company companyName="HUXLEY INDUSTRIES, INC.">
<contacts>
<contact firstName="David" lastName="King"/>
<contact firstName="Sybil" lastName="Bedford"/>
</contacts>
</company>

</companies> 

 
produces

HUXLEY INDUSTRIES, INC.|David|King
HUXLEY INDUSTRIES, INC.|Sybil|Bedford


with the companyName repeating.  If the loop element is "company" instead, and companyName, firstName, and lastName are mapped, then only the first row (with "David King") would be displayed.
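The effect of the loop element can be imitated in a few lines of Python.  This is a rough sketch with ElementTree that mimics the row counts described above; it does not show how Talend is implemented.  The two function names are invented for the illustration.  Looping on "contact" yields one row per contact, while looping on "company" and taking a single value per column yields one row per company.

```python
import xml.etree.ElementTree as ET

COMPANIES_XML = """
<companies>
  <company companyName="HUXLEY INDUSTRIES, INC.">
    <contacts>
      <contact firstName="David" lastName="King"/>
      <contact firstName="Sybil" lastName="Bedford"/>
    </contacts>
  </company>
</companies>
"""


def rows_looping_on_contact(root):
    """One row per contact; companyName comes from the enclosing company."""
    rows = []
    for company in root.findall("company"):
        for contact in company.findall("contacts/contact"):
            rows.append("|".join([
                company.get("companyName"),
                contact.get("firstName"),
                contact.get("lastName"),
            ]))
    return rows


def rows_looping_on_company(root):
    """One row per company; find() returns only the first matching contact."""
    rows = []
    for company in root.findall("company"):
        contact = company.find("contacts/contact")
        if contact is None:
            continue  # company without contacts: nothing to map
        rows.append("|".join([
            company.get("companyName"),
            contact.get("firstName"),
            contact.get("lastName"),
        ]))
    return rows


root = ET.fromstring(COMPANIES_XML)
contact_rows = rows_looping_on_contact(root)
company_rows = rows_looping_on_company(root)
print("\n".join(contact_rows))
print("\n".join(company_rows))
```

With the sample document, the contact loop gives two rows and the company loop gives only the "David King" row.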

4 comments:

  1. How come the 2 screenshots for "Additional XPaths Example" look different? There's a transId in one but not the other.

    1. Hi,

      These are two different examples featuring two different XML documents. The emphasis on the second example is on mapping the outermost root container with the transId attribute.

    2. Interesting. Then which screen shot is related to the XML full file that can be downloaded? If neither is, can I get the XML files that are related to the screen shots?
      Thanks for your time btw in responding. :-)

    3. There are links within the post to the XML test files.
