When you're building an XML document in Talend Open Studio and the XML has multiple loops, use the tFileOutputMSXML component. tFileOutputMSXML lets you map several input data flows, each to its own copy of the root element; this results in multiple loops, one per data flow. This is different from tAdvancedFileOutputXML, which relies on a single input to define a single loop.
Consider this graphical representation of an XSD, 'cruise-ports.xsd'. This blog post walks through a Talend Open Studio job that outputs an XML document adhering to the target schema.
The XSD is here.
Basic Job Structure
The basic job structure for working with tFileOutputMSXML is to connect each input source to the tFileOutputMSXML component. Unlike tAdvancedFileOutputXML, tFileOutputMSXML can take more than one main input flow.
|MSXML Output Job|
Holland America;Mexico,Puerto Rico
Name;Hours of Operations
Joe's Coffee Stand;S,M,T,W,Th,F,S 6am-12pm
The following procedure configures the tFileOutputMSXML component:
- Rename both copies of the default top-level element 'rootTag' to 'cruise-ports'
- On each copy, right-click and import an XML tree based on cruise-ports.xsd
- For the copy associated with 'row1', remove the snack-shop elements
- For the copy associated with 'row2', remove the cruise-line elements
- Map the fields for row1 (Name, Destinations)
- Map the fields for row2 (Name, Hours of Operations)
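Outside of Talend, the idea behind the procedure above (two independent flows, each looping under the same root) can be sketched in a few lines of Python. This is only an illustration of the document shape, not what the component generates internally. The top-level names 'cruise-ports', 'cruise-line' and 'snack-shop' come from the steps above; the child element names ('name', 'destinations', 'hours-of-operation') are assumptions, since the XSD is not reproduced here.

```python
import xml.etree.ElementTree as ET

# One record per input flow, taken from the two sample files.
cruise_lines = [{"Name": "Holland America", "Destinations": "Mexico,Puerto Rico"}]
snack_shops = [{"Name": "Joe's Coffee Stand", "Hours": "S,M,T,W,Th,F,S 6am-12pm"}]


def build_cruise_ports(cruise_lines, snack_shops):
    """Build one root element holding two unrelated loops."""
    root = ET.Element("cruise-ports")

    # Loop 1: one cruise-line element per row1 record.
    for row in cruise_lines:
        el = ET.SubElement(root, "cruise-line")
        ET.SubElement(el, "name").text = row["Name"]                  # assumed child name
        ET.SubElement(el, "destinations").text = row["Destinations"]  # assumed child name

    # Loop 2: one snack-shop element per row2 record.
    for row in snack_shops:
        el = ET.SubElement(root, "snack-shop")
        ET.SubElement(el, "name").text = row["Name"]                  # assumed child name
        ET.SubElement(el, "hours-of-operation").text = row["Hours"]   # assumed child name

    return root


doc = build_cruise_ports(cruise_lines, snack_shops)
print(ET.tostring(doc, encoding="unicode"))
```

The two loops never reference each other, which is the situation tFileOutputMSXML is meant for.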
The result of the run is the following XML
I received a number of errors when working with namespaces, and the Talend issue navigator lists 29 unresolved DI issues (11-SEP-25) regarding namespaces and tFileOutputMSXML. If namespaces are important for your particular requirement -- and namespaces are crucial to any composable XML modeling -- this example won't work for you without some type of post-processing that inserts a namespace prefix and top-level attribute.
You can slip in a default namespace using the 'add namespace' feature if all the elements are under the same namespace.
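If you do go the post-processing route, one possible approach is to qualify the elements after the job has written the file. The sketch below uses Python's standard xml.etree.ElementTree module and places every element in a single default namespace. The namespace URI is a made-up placeholder and the helper name add_default_namespace is hypothetical; this is one way to do it under those assumptions, not a Talend feature.

```python
import xml.etree.ElementTree as ET

NS = "http://example.com/cruise-ports"  # hypothetical namespace URI


def add_default_namespace(root, uri):
    """Move every unqualified element under `uri` and return the root."""
    for el in root.iter():
        if not el.tag.startswith("{"):
            el.tag = "{%s}%s" % (uri, el.tag)
    return root


# Register the URI with an empty prefix so it is serialized as xmlns="...".
ET.register_namespace("", NS)

# Stand-in for the file written by the job.
root = ET.fromstring("<cruise-ports><cruise-line/><snack-shop/></cruise-ports>")
add_default_namespace(root, NS)

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

For a real file you would load it with ET.parse and write it back out with ElementTree.write. A document that mixes several namespaces needs a per-element mapping rather than one URI.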
Multiple Loops (Thanks "Rock")
If your XML document contains multiple looping elements, you can use several tAdvancedFileOutputXML components to build up the output in sections. For each input component, create a tAdvancedFileOutputXML starting with the topmost element. Each child element's tAdvancedFileOutputXML will use the "Append the source xml file" option.
These three data files are joined under the 'dept_no' identifier. In this data model, a Department (depts.txt) contains Employees (emps.txt) and Printers (printers.txt). There is no correlation between Employees and Printers, except for their parent Department.
In processing terms, there will be 3 loops: one building Employees, one building Printers, and one building the containing Departments. These loops are implemented using three distinct tAdvancedFileOutputXML components.
|Job With Multiple Loops|
The expected output of the job is a top-level set of Departments containing the Department's related Employees and Printers.
<dept dept_no="100" dept_name="Accounting">
<emp emp_no="2000" emp_first_name="joe"/>
<emp emp_no="2001" emp_first_name="carl"/>
<dept dept_no="101" dept_name="IT">
<emp emp_no="2002" emp_first_name="steve"/>
All tAdvancedFileOutputXMLs in this job write to the same XML file. tAdvancedFileOutputXML_3 and _5 have the "Append the source xml file" option set.
|XML Component for Departments|
|XML Component for Employees|
|XML Component for Printers|
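To make the three-loop idea concrete outside of Talend, here is a rough Python sketch. The first pass builds the Departments, and the next two passes append Employees and Printers to the matching Department, much like the appending components do against the shared file. The department and employee values follow the expected output above. The printer rows, the 'printer' element with its 'printer_no' attribute, and the 'depts' root name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

depts = [("100", "Accounting"), ("101", "IT")]
emps = [("2000", "joe", "100"), ("2001", "carl", "100"), ("2002", "steve", "101")]
printers = [("9000", "100"), ("9001", "101")]  # hypothetical rows


def build_departments(depts, emps, printers):
    root = ET.Element("depts")  # assumed root element name
    by_dept_no = {}

    # Loop 1: the containing Departments.
    for dept_no, dept_name in depts:
        by_dept_no[dept_no] = ET.SubElement(
            root, "dept", dept_no=dept_no, dept_name=dept_name)

    # Loop 2: Employees, appended to their parent Department.
    for emp_no, first_name, dept_no in emps:
        ET.SubElement(by_dept_no[dept_no], "emp",
                      emp_no=emp_no, emp_first_name=first_name)

    # Loop 3: Printers, unrelated to Employees except through dept_no.
    for printer_no, dept_no in printers:
        ET.SubElement(by_dept_no[dept_no], "printer", printer_no=printer_no)

    return root


doc = build_departments(depts, emps, printers)
print(ET.tostring(doc, encoding="unicode"))
```

Each child loop only needs dept_no to find its parent, which is why the Employees and Printers passes can run independently of each other.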
Special Field Processing on Single Data Source
Another application of this technique is when normalization is needed on more than one field. Take the following input file as an example. The input file has two multi-valued attribute columns: CITY and COLOR.
The result of processing this file will be an XML document containing repeating groups of CITY and COLOR values. The key to this processing is to define 2 loops using 2 tFileInputDelimited components on the same input. One loop expands CITY, the other, COLOR. A component like tReplicate won't work in this case because it doesn't render more than one loop.
|2 Input Paths Using tNormalize|
This is the mapping for tAdvancedFileOutputXML_1.
|XML Output Component with CITY Mapped|
Here is the mapping for the tAdvancedFileOutputXML_3 component. Note that CITY is not mapped.
|XML Component with COLOR Mapped|
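The same two-pass idea can be sketched in Python: read the same rows twice, expand one multi-valued column per pass, and append the results under the same parent record. The sample row, the 'NAME' key column, the comma separator, and the element names ('records', 'record', 'city', 'color') are all assumptions; only the CITY and COLOR column names come from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical input row with two multi-valued columns.
rows = [{"NAME": "sample", "CITY": "Boston,Chicago", "COLOR": "red,green,blue"}]


def normalize(rows, column, sep=","):
    """Yield one (row, value) pair per separated value in `column`."""
    for row in rows:
        for value in row[column].split(sep):
            yield row, value.strip()


def build(rows):
    root = ET.Element("records")  # assumed root element name
    by_name = {}
    for row in rows:
        by_name[row["NAME"]] = ET.SubElement(root, "record", name=row["NAME"])

    # Pass 1 over the input: expand CITY only.
    for row, city in normalize(rows, "CITY"):
        ET.SubElement(by_name[row["NAME"]], "city").text = city

    # Pass 2 over the same input: expand COLOR only.
    for row, color in normalize(rows, "COLOR"):
        ET.SubElement(by_name[row["NAME"]], "color").text = color

    return root


doc = build(rows)
print(ET.tostring(doc, encoding="unicode"))
```

Expanding both columns in a single pass would produce the cross product of cities and colors, which is why each column gets its own loop.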
For an XML document based on a single input, use tAdvancedFileOutputXML, which also supports grouping. If you need more than one loop -- say there are lists of unrelated child elements -- use more than one tAdvancedFileOutputXML component. For disjoint data sets, try tFileOutputMSXML. If namespace support is required, you will need additional processing or another technique to add namespaces to your document.