Featured Post

Applying Email Validation to a JavaFX TextField Using Binding

This example uses the same controller as in a previous post but adds a use case to support email validation.  A Commons Validator object is ...

Monday, August 15, 2011

Three Levels XML Nesting in Talend Open Studio

To output a complex XML document with multiple nested structures, first write a denormalized query then route the main flow to Talend Open Studio's tAdvancedFileOutput component.

Talend Open Studio's tAdvancedFileOutput component builds an XML tree structure and binds the input schema to the structure's XML elements and attributes.  For complex XML documents, tAdvancedFileOutput also defines loops and groups to form nested elements.

This post describes an example of an XML document with three-levels of nesting.

Denormalized Data Set

In the following E-R diagram, an ITEM may contain one or more ITEM_EANS.  ITEM_EANS may contain one or more SUPPLIERS.

E-R Showing ITEMS, EAN, and SUPPLIERS
The first step in producing an XML document in Talend Open Studio is to write a denormalizing query that will retrieve all of the data for the XML document.  A denormalizing query is one that will produce repeating groups in a result set.

This SQL is such a query.

SELECT
ITEM.ITEMID, ITEM.ITEMNAME,
ITEM_EAN.ITEM_EAN_ID,
SUPPLIER.SUPPLIERID, SUPPLIER.SUPPLIER_NAME
FROM (ITEM INNER JOIN ITEM_EAN ON ITEM.ITEMID = ITEM_EAN.ITEMID)
INNER JOIN SUPPLIER ON ITEM_EAN.ITEM_EAN_ID = SUPPLIER.ITEM_EAN_ID
ORDER BY ITEM.ITEMID, ITEM_EAN.ITEM_EAN_ID, SUPPLIER.SUPPLIERID;

And the result may look like what is captured in this screenshot.

A Denormalized Data Set
Loops and Groups

This screenshot shows a simple 2 component Talend Open Studio job that runs the query listed above against an Access database.  The main flow of the tAccessInput component is directed to a tAdvancedFileOutputXML component.

Simple XML-Producing Job

The configuration of the tAdvancedFileOutput begins with identifying the lowest level grain in the query's result set.  This may seem backwards, as XML modeling tends to start with the toplevel element and work downward.  In this example, the lowest level grain is the SUPPLIER which is "owned" by the ITEM_EAN which in turn is owned by the ITEM.

After the loop element is identified, any groups are flagged.  This will avoid taking the denormalized result set and producing a similarly denormalized XML document.  If you only define a loop element in this example, then there would be as many ITEMS as there are SUPPLIERS rather than the nested structure we desire.

See the following snippet for XML produced without groups.


  <item item_name="ITEM A" item_id="1">
    <item_ean item_ean_id="101">
      <supplier supplier_name="ACME SUPPLIES" supplier_id="990"/>
    </item_ean>
  </item>
  <item item_name="ITEM A" item_id="1">
    <item_ean item_ean_id="102">
      <supplier supplier_name="RIVER SUPPLIER, INC" supplier_id="991"/>
    </item_ean>
  </item>
  <item item_name="ITEM A" item_id="1">
    <item_ean item_ean_id="103">
      <supplier supplier_name="MY SUPPLIES LLC" supplier_id="993"/>
    </item_ean>
  </item>

This screenshot shows the correct configuration of tAdvancedFileOutputXML.  Note the Loop element definition on SUPPLIER and the group annotations on the ITEM_EAN and ITEM elements.

tAdvancedFileOutputXML Config
Result

The result is a nested document with 3 toplevel ITEM elements.

<?xml version="1.0" encoding="ISO-8859-15"?>

<rootTag>
    <item item_name="ITEM A" item_id="1">
      <item_ean item_ean_id="101">
        <supplier supplier_name="ACME SUPPLIES" supplier_id="990"/>
      </item_ean>
      <item_ean item_ean_id="102">
        <supplier supplier_name="RIVER SUPPLIER, INC" supplier_id="991"/>
      </item_ean>
      <item_ean item_ean_id="103">
        <supplier supplier_name="MY SUPPLIES LLC" supplier_id="993"/>
      </item_ean>
    </item>
    <item item_name="ITEM B" item_id="2">
      <item_ean item_ean_id="201">
        <supplier supplier_name="SUPPLIER 2" supplier_id="994"/>
      </item_ean>
    </item>
    <item item_name="ITEM C" item_id="3">
      <item_ean item_ean_id="301">
        <supplier supplier_name="SUPPLIER ABC" supplier_id="995"/>
        <supplier supplier_name="SUPPLIER DEF" supplier_id="996"/>
      </item_ean>
    </item>
</rootTag> 

XML Note

The XML document is built with a particular transmission in mind.  In this instance, the document assumes that a list of ITEMS has (through EAN) a list of SUPPLIERS.  The condition could be inverted with additional relationships.  Suppose that, in this case, this is a preferred vendor list where the document receiver is looking for SUPPLIERS that provide items.  In a different context or with different relationships, this could be inverted.  Take a catalog of ITEMS offered by SUPPLIERS.

Another Post

For another example of using tAdvancedFileOutputXML, visit Nested XML From an RDBMS with Talend Open Studio 

5 comments:

  1. Can you explain how is multiple branching at root level possible ..
    For example, I have 3 tables
    1) Products
    2) Orders
    3) Vendors

    Each product will have multiple Orders and can be supplied by multiple vendors. I need to get a xml












    ...

    ReplyDelete
  2. Hi Prashanverma,

    To adapt the XML example in this post, write a query that joins Products, Orders, and Vendors in an input component. Your results should look like the image "Denormalized Data Set".

    Connect the input database component to a tAdvancedFileOutputXML. Using the tAdvancedFileOutputXML, perform the following operations.

    1. Add a subelement to rootTag for Products.
    2. To the Products element, add a sublement for Orders.
    3. To the Orders element, add a subelement for Vendors.
    4. Map the attributes for each of the subelements. For example, map ProductName to the Products element as an attribute.
    5. Right-click on the Vendors element in the Link Target and Set as Loop Element.
    6. Right-click on the Orders element in the Link Target and specify as a Group.
    7. Right-click on the Products element in the Link Target and specify as a Group.

    Good luck

    ReplyDelete
  3. How to read XML file having multiple occurring tag as same level , help me with this xml file reading

    ReplyDelete
  4. Hi Carl, Thanks for posting this solution. I am wondering if you have any experience on how to map multiple nested xml to database tables. Thanks!

    ReplyDelete
  5. This comment has been removed by the author.

    ReplyDelete