An xsd:sequence is an ordered list of XML elements. The bounds of the list can be specified in an XSD with minOccurs and maxOccurs. The list can be unbounded (maxOccurs="unbounded"), which means that XML processors must be built to handle any number of elements. If the sequence is bounded, you can alternatively simplify and flatten the data structure. For example, the elements street_address_1 and street_address_2 may be easier to handle than multiple address/street elements.
In this XML document, each 'person' contains a sequence of one or more 'tel' elements.
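The flattening described above can be sketched in a few lines of Python. This is only an illustration, not part of the Talend job: the `address` document, the `MAX_STREETS` bound, and the `flatten_streets` helper are all hypothetical names chosen for the example, and the bound of two is an assumption standing in for whatever maxOccurs the real XSD declares.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: an address with a bounded list of street lines.
ADDRESS_XML = """
<address>
  <street>1 High Street</street>
  <street>Flat 2</street>
</address>
"""

MAX_STREETS = 2  # assumed bound, e.g. maxOccurs="2" in the XSD


def flatten_streets(address, max_streets=MAX_STREETS):
    """Return a flat dict with one key per allowed street line."""
    streets = [s.text for s in address.findall("street")]
    if len(streets) > max_streets:
        # Fail loudly rather than silently dropping data.
        raise ValueError(
            f"expected at most {max_streets} street elements, got {len(streets)}"
        )
    # Pad with None so every record has the same columns.
    streets += [None] * (max_streets - len(streets))
    return {f"street_address_{i}": v for i, v in enumerate(streets, start=1)}


flat = flatten_streets(ET.fromstring(ADDRESS_XML))
print(flat)
```

Raising an error when the bound is exceeded is a design choice for the sketch; a real job might prefer to log and truncate instead.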
<?xml version="1.0" encoding="utf-8"?>
<document>
  <person>
    <name>Alan</name>
    <tel>02087654321</tel>
    <tel>07654321098</tel>
  </person>
  <person>
    <name>Bill</name>
    <tel>02078901234</tel>
    <tel>07890123456</tel>
  </person>
  <person>
    <name>Chas</name>
    <tel>02066666666</tel>
  </person>
</document>
Array Syntax
A possible mapping for the person elements is to define a loop on 'person', map a 'name' XPath, and map each possible 'tel' element using an array-index-like syntax: tel[1] or tel[position()=1]. This implies that your schema sets a bound that may or may not adhere to the specification. For example, if I define only 'tel1' and 'tel2' and a system sends home, office, and mobile numbers, one will be omitted.
The following Talend Open Studio job maps each 'tel' element individually.
[Figure: XML Source to Log Target Job]
[Figure: Individually Mapping Elements]
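Outside of Talend, the same index-based mapping can be approximated with Python's standard library, which supports positional predicates such as tel[1]. The sketch below is an assumption-laden illustration: the third number for Alan is hypothetical and added only to show the omission, and `map_person`, `tel1`, and `tel2` are names invented for the example.

```python
import xml.etree.ElementTree as ET

# Sample data; Alan's third <tel> is hypothetical, added to show the omission.
DOCUMENT_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<document>
  <person>
    <name>Alan</name>
    <tel>02087654321</tel>
    <tel>07654321098</tel>
    <tel>01234567890</tel>
  </person>
  <person>
    <name>Chas</name>
    <tel>02066666666</tel>
  </person>
</document>
"""


def map_person(person):
    """Map one <person> to a fixed set of columns using positional XPath."""
    return {
        "name": person.findtext("name"),
        "tel1": person.findtext("tel[1]"),  # first <tel>, or None
        "tel2": person.findtext("tel[2]"),  # second <tel>, or None
    }


root = ET.fromstring(DOCUMENT_XML)
rows = [map_person(p) for p in root.findall("person")]  # loop on 'person'
for row in rows:
    print(row)
```

Each person yields exactly one row. Alan's third number never appears in the output because no column was defined for it, and Chas gets an empty `tel2`.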
A more versatile approach pushes the loop definition onto 'tel' and maps the other XML elements ('name') with relative XPaths, for example ../name. This seems a bit counter-intuitive, as I wrote in "XPath Loops in Talend Open Studio". After all, 'tel' is just one field in the more important 'person' data structure.
[Figure: Building a Loop Around a Sequence]
[Figure: Running XPath-based Job with Loop on Sequence]
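The loop-on-'tel' idea can also be sketched in plain Python. Note one assumption that differs from Talend: the standard library's ElementTree cannot navigate from a child up to its parent, so the sketch builds a parent lookup (`parent_of`, a name made up for this example) to stand in for a relative path like ../name.

```python
import xml.etree.ElementTree as ET

# The sample document from this article.
DOCUMENT_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<document>
  <person>
    <name>Alan</name>
    <tel>02087654321</tel>
    <tel>07654321098</tel>
  </person>
  <person>
    <name>Bill</name>
    <tel>02078901234</tel>
    <tel>07890123456</tel>
  </person>
  <person>
    <name>Chas</name>
    <tel>02066666666</tel>
  </person>
</document>
"""

root = ET.fromstring(DOCUMENT_XML)

# ElementTree elements have no parent pointer, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

rows = []
for tel in root.iterfind("person/tel"):  # loop on 'tel', not on 'person'
    person = parent_of[tel]              # plays the role of '..'
    rows.append({"name": person.findtext("name"), "tel": tel.text})

for row in rows:
    print(row)
```

This produces one row per telephone number, five in total for the sample document, with the name repeated on each row. No number is dropped regardless of how many 'tel' elements a person has.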
Very nice article. When I was creating a job to get data from XML files I had the problem of setting the correct Loop XPath Query. It's in fact vital for the job to have the loop query well defined.