Talend Open Studio's tAdvancedFileOutput component builds an XML tree structure and binds the input schema to the structure's XML elements and attributes. For complex XML documents, tAdvancedFileOutput also defines loops and groups to form nested elements.
This post describes an example of an XML document with three-levels of nesting.
Denormalized Data Set
In the following E-R diagram, an ITEM may contain one or more ITEM_EANS. ITEM_EANS may contain one or more SUPPLIERS.
E-R Showing ITEMS, EAN, and SUPPLIERS |
This SQL is such a query.
SELECT
ITEM.ITEMID, ITEM.ITEMNAME,
ITEM_EAN.ITEM_EAN_ID,
SUPPLIER.SUPPLIERID, SUPPLIER.SUPPLIER_NAME
FROM (ITEM INNER JOIN ITEM_EAN ON ITEM.ITEMID = ITEM_EAN.ITEMID)
INNER JOIN SUPPLIER ON ITEM_EAN.ITEM_EAN_ID = SUPPLIER.ITEM_EAN_ID
ORDER BY ITEM.ITEMID, ITEM_EAN.ITEM_EAN_ID, SUPPLIER.SUPPLIERID;
And the result may look like what is captured in this screenshot.
A Denormalized Data Set |
This screenshot shows a simple 2 component Talend Open Studio job that runs the query listed above against an Access database. The main flow of the tAccessInput component is directed to a tAdvancedFileOutputXML component.
Simple XML-Producing Job |
The configuration of the tAdvancedFileOutput begins with identifying the lowest level grain in the query's result set. This may seem backwards, as XML modeling tends to start with the toplevel element and work downward. In this example, the lowest level grain is the SUPPLIER which is "owned" by the ITEM_EAN which in turn is owned by the ITEM.
After the loop element is identified, any groups are flagged. This will avoid taking the denormalized result set and producing a similarly denormalized XML document. If you only define a loop element in this example, then there would be as many ITEMS as there are SUPPLIERS rather than the nested structure we desire.
See the following snippet for XML produced without groups.
<item item_name="ITEM A" item_id="1">
<item_ean item_ean_id="101">
<supplier supplier_name="ACME SUPPLIES" supplier_id="990"/>
</item_ean>
</item>
<item item_name="ITEM A" item_id="1">
<item_ean item_ean_id="102">
<supplier supplier_name="RIVER SUPPLIER, INC" supplier_id="991"/>
</item_ean>
</item>
<item item_name="ITEM A" item_id="1">
<item_ean item_ean_id="103">
<supplier supplier_name="MY SUPPLIES LLC" supplier_id="993"/>
</item_ean>
</item>
This screenshot shows the correct configuration of tAdvancedFileOutputXML. Note the Loop element definition on SUPPLIER and the group annotations on the ITEM_EAN and ITEM elements.
tAdvancedFileOutputXML Config |
The result is a nested document with 3 toplevel ITEM elements.
<?xml version="1.0" encoding="ISO-8859-15"?>
<rootTag>
<item item_name="ITEM A" item_id="1">
<item_ean item_ean_id="101">
<supplier supplier_name="ACME SUPPLIES" supplier_id="990"/>
</item_ean>
<item_ean item_ean_id="102">
<supplier supplier_name="RIVER SUPPLIER, INC" supplier_id="991"/>
</item_ean>
<item_ean item_ean_id="103">
<supplier supplier_name="MY SUPPLIES LLC" supplier_id="993"/>
</item_ean>
</item>
<item item_name="ITEM B" item_id="2">
<item_ean item_ean_id="201">
<supplier supplier_name="SUPPLIER 2" supplier_id="994"/>
</item_ean>
</item>
<item item_name="ITEM C" item_id="3">
<item_ean item_ean_id="301">
<supplier supplier_name="SUPPLIER ABC" supplier_id="995"/>
<supplier supplier_name="SUPPLIER DEF" supplier_id="996"/>
</item_ean>
</item>
</rootTag>
XML Note
The XML document is built with a particular transmission in mind. In this instance, the document assumes that a list of ITEMS has (through EAN) a list of SUPPLIERS. The condition could be inverted with additional relationships. Suppose that, in this case, this is a preferred vendor list where the document receiver is looking for SUPPLIERS that provide items. In a different context or with different relationships, this could be inverted. Take a catalog of ITEMS offered by SUPPLIERS.
Another Post
For another example of using tAdvancedFileOutputXML, visit Nested XML From an RDBMS with Talend Open Studio
Can you explain how is multiple branching at root level possible ..
ReplyDeleteFor example, I have 3 tables
1) Products
2) Orders
3) Vendors
Each product will have multiple Orders and can be supplied by multiple vendors. I need to get a xml
...
Hi Prashanverma,
ReplyDeleteTo adapt the XML example in this post, write a query that joins Products, Orders, and Vendors in an input component. Your results should look like the image "Denormalized Data Set".
Connect the input database component to a tAdvancedFileOutputXML. Using the tAdvancedFileOutputXML, perform the following operations.
1. Add a subelement to rootTag for Products.
2. To the Products element, add a sublement for Orders.
3. To the Orders element, add a subelement for Vendors.
4. Map the attributes for each of the subelements. For example, map ProductName to the Products element as an attribute.
5. Right-click on the Vendors element in the Link Target and Set as Loop Element.
6. Right-click on the Orders element in the Link Target and specify as a Group.
7. Right-click on the Products element in the Link Target and specify as a Group.
Good luck
How to read XML file having multiple occurring tag as same level , help me with this xml file reading
ReplyDeleteHi Carl, Thanks for posting this solution. I am wondering if you have any experience on how to map multiple nested xml to database tables. Thanks!
ReplyDeleteThis comment has been removed by the author.
ReplyDeleteNevertheless, panels should be attached using screws and corner blocks. An important aspect of taller furniture is that it has been tested for stability. You don't want a tall chest tipping over when you pull out a filled top drawer. nesting tables
ReplyDelete