Talend Open Studio uses the tFileInputXML component to read XML documents into a job. tFileInputXML uses a Loop XPath query to define a repeating structure in the document, against which a series of mapping XPath queries are run. There is one mapping XPath query for each schema data field to be set during processing.
Bottom-up Processing
When working with a hierarchical structure like a filesystem, one starts at the top and drills down to lower-level elements. However, in XPath processing with Talend Open Studio, it's important to start with the lowest-level grain that will define a record. For example, the following XML document has ID elements in an IDs element contained within a Locations element.
<Locations>
<IDs>
<ID sequenceName="Name">ABCDE</ID>
<ID sequenceName="Site"/>
<ID sequenceName="Bin">XYZ</ID>
</IDs>
</Locations>
The first step in processing this XML document is to determine whether each ID is a record (in which case there will be three rows produced by tFileInputXML) or if the IDs element defines a record (only one row).
Starting with the lowest level possible, this Talend job produces three name / value pairs, one for each ID element. The loop is set to Locations/IDs/ID. @sequenceName returns the attribute value of sequenceName. The period (".") returns the text in the ID element. The period stands for the current element, which is the ID defined in the loop.
Each ID Defines a Record
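Outside of Talend, the same loop-and-mapping idea can be sketched with Python's standard xml.etree.ElementTree module. This is only an illustration of the XPath logic, not how tFileInputXML is implemented; the function name rows_per_id and the field names "name" and "value" are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

XML_DOC = """<Locations>
<IDs>
<ID sequenceName="Name">ABCDE</ID>
<ID sequenceName="Site"/>
<ID sequenceName="Bin">XYZ</ID>
</IDs>
</Locations>"""


def rows_per_id(xml_text):
    """Emulate a loop on Locations/IDs/ID: one row per ID element."""
    root = ET.fromstring(xml_text)  # root is the Locations element
    rows = []
    # ElementTree paths are relative to the root element,
    # so the loop Locations/IDs/ID is written here as IDs/ID.
    for id_el in root.findall("IDs/ID"):
        rows.append({
            "name": id_el.get("sequenceName"),  # mapping query: @sequenceName
            "value": id_el.text,                # mapping query: "." (None when the element is empty)
        })
    return rows


for row in rows_per_id(XML_DOC):
    print(row)
```

The sketch yields three rows, one per ID element. In ElementTree the empty Site element has no text, so its value comes back as None.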
An alternative way of processing the Locations document is to specify the loop element as Locations/IDs. In this example, a single record will be produced. There are attribute selectors ([@sequenceName=""]) that map each ID element to a different field.
Containing IDs Element Defines a Record
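The single-record variant can be sketched the same way in Python with xml.etree.ElementTree. Again, this only illustrates the XPath logic under assumed names: row_per_ids and the output fields "name", "site", and "bin" are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_DOC = """<Locations>
<IDs>
<ID sequenceName="Name">ABCDE</ID>
<ID sequenceName="Site"/>
<ID sequenceName="Bin">XYZ</ID>
</IDs>
</Locations>"""


def row_per_ids(xml_text):
    """Emulate a loop on Locations/IDs: one row per IDs element."""
    root = ET.fromstring(xml_text)  # root is the Locations element
    rows = []
    for ids_el in root.findall("IDs"):
        rows.append({
            # Each mapping uses an attribute selector to pick one ID child.
            # findtext returns "" when the matched element has no text.
            "name": ids_el.findtext("ID[@sequenceName='Name']"),
            "site": ids_el.findtext("ID[@sequenceName='Site']"),
            "bin": ids_el.findtext("ID[@sequenceName='Bin']"),
        })
    return rows


print(row_per_ids(XML_DOC))
```

Moving the loop up one level turns the three ID elements into three columns of a single row, which is the trade-off between the two loop choices.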
It's natural to think top down when looking at a hierarchy. However, for XML processors it may help to think bottom-up to identify the correct looping structure. Parents -- and other ancestors -- aren't ignored in the bottom-up processing. Access parent elements and attributes using relative (../) paths.
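As a rough illustration of reaching back up from the loop element, the Python sketch below attaches an ancestor attribute to each ID row. The region attribute on Locations is invented for this example and is not in the original document. ElementTree cannot evaluate ".." from the loop element itself, so the sketch emulates the relative path with a child-to-parent lookup.

```python
import xml.etree.ElementTree as ET

# Hypothetical variant of the document: the "region" attribute is invented.
XML_DOC = """<Locations region="East">
<IDs>
<ID sequenceName="Name">ABCDE</ID>
<ID sequenceName="Bin">XYZ</ID>
</IDs>
</Locations>"""


def rows_with_ancestor(xml_text):
    """Loop on each ID and pull one field from the Locations ancestor."""
    root = ET.fromstring(xml_text)
    # ElementTree elements hold no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    rows = []
    for id_el in root.findall("IDs/ID"):
        ids_el = parent_of[id_el]          # plays the role of ".."
        locations_el = parent_of[ids_el]   # plays the role of "../.."
        rows.append({
            "region": locations_el.get("region"),  # like the mapping ../../@region
            "name": id_el.get("sequenceName"),
            "value": id_el.text,
        })
    return rows


for row in rows_with_ancestor(XML_DOC):
    print(row)
```

Each row still corresponds to one ID, and the ancestor value is simply repeated on every row.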
Namespaces Update
If your input XML uses namespaces and they can be ignored, then set the "Ignore namespaces" option on the tFileInputXML's Advanced settings tab. This will produce a temp file of the XML data with all namespace definitions and prefixes stripped out.
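To show what stripping namespaces amounts to, here is a minimal Python sketch using xml.etree.ElementTree. It is not Talend's implementation. The loc prefix, the example.com URI, and the helper name strip_namespaces are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaced input; the prefix and URI are invented.
XML_DOC = """<loc:Locations xmlns:loc="http://example.com/locations">
<loc:IDs>
<loc:ID sequenceName="Name">ABCDE</loc:ID>
<loc:ID sequenceName="Bin">XYZ</loc:ID>
</loc:IDs>
</loc:Locations>"""


def strip_namespaces(root):
    """Remove the {uri} qualifier that ElementTree puts on tag and attribute names."""
    for el in root.iter():
        if "}" in el.tag:
            el.tag = el.tag.split("}", 1)[1]
        for key in list(el.attrib):
            if "}" in key:
                el.attrib[key.split("}", 1)[1]] = el.attrib.pop(key)
    return root


root = strip_namespaces(ET.fromstring(XML_DOC))
# After stripping, the unqualified loop path works again.
print([(e.get("sequenceName"), e.text) for e in root.findall("IDs/ID")])
```

Once the qualifiers are gone, the loop and mapping queries can be written without prefixes, which is the convenience the option provides when namespaces carry no meaning for the job.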
Do you know the basics of creating an XML file from an XLS file? Thanks in advance for your response.
At a minimum, you need a tFileInputExcel, a tMap, and an XML output component like tAdvancedFileOutputXML. For examples working with the various XML components in Talend, follow the "Bekwam Wiki / Talend" link and read the posts with XML in the title.
How will you get the elements if you have a structure like this?
ABCDE
XYZ
and you want to get the attribute "num"?
I'm not sure where the attribute num is in the input. The XML tags didn't come through in Blogger and when I look in the HTML, I only see "sequenceNumber". To get sequenceNumber, you'll add an @sequenceNumber to the mapping.
I see this is a very old blog. Are you still active?
What if the parent does not have any children and the loop is on the child? Will the parent be skipped?
I haven't worked with Talend for a few years now. I don't think that's an error condition, but it should be easy to test.
Good luck