Featured Post

Applying Email Validation to a JavaFX TextField Using Binding

This example uses the same controller as in a previous post but adds a use case to support email validation.  A Commons Validator object is ...

Thursday, April 28, 2011

Nested XML From an RDBMS with Talend Open Studio

If you have a simple XML document -- say one that closely mimics a database schema -- the example in the Talend Open Studio Users' Guide is sufficient.  However, if your XML document contains repeating groups of nested elements, you'll need a few extra steps.

The following example is based on a simple relational structure where a Company table has a one-to-many relationship with a Contact table.  A Contact must refer to just one other Company.  This is realized in a Company Id kept on the Contact table.

Company and Contact Tables
The target XML representation is



<companies>
  <company companyName="HUXLEY INDUSTRIES, INC.">
    <contacts>
      <contact firstName="David" lastName="King"/>
      <contact firstName="Sybil" lastName="Bedford"/>
    </contacts>
  </company>
  <company companyName="SWAN CONSTRUCTION CO.">
    <contacts>
      <contact firstName="Jocelyn" lastName="Brooke"/>
    </contacts>
  </company>
  <company companyName="CLASSICS PRODUCTION, LLC.">
    <contacts>
      <contact firstName="Stephen" lastName="Spender"/>
    </contacts>
  </company>
  <company companyName="NOVEL ENTERTAINMENT">
    <contacts>
      <contact firstName="Lee" lastName="Bartlet"/>
    </contacts>
  </company>
</companies>

The companies element is repeated, as is the contacts element.

To process the XML, start with an XML schema (XSD).  This link points to an XSD created using the Venitian Blind technique.  This produces a single top-level element (companies) with reusable types.  All other elements (company, contacts, contact) are nested within type definitions.

Start with a job that joins the source tables and writes the output to a tAdvancedFileOutputXML

Talend RDBMS to XML Job
Base the tAdvancedFileOutputXML on metadata created by importing the XSD.

Metadata from companies.xsd and Added Items
The Linker Target was formed by selecting the companies.xsd file in the previous step.  Several schema items in the left-hand panel were created.  The items CompanyName, FirstName, Suffix, etc. were created by using the Schema Management button.  The items were then mapped to Linker Targets.

Right mouse-click over the contact item and "Set as Loop Element".  On the company and contacts item, select "Set as Group Element".  This defines the repeating groups within the XML document and turns the normalized join into a nested structure.

Next, use a tMap to map the source RDBMS fields to the fields created from the XML metadata.  Note the FirstName field will render a firstName XML attribute.

RDBMS Mapping to Target XML
With a good XSD and a few components, you can use Talend Open Studio to render XML documents.  the XSD can be reused for other mappings, say to produce lighter-weight documents.  Also, other Talend components -- tFilterRow -- can be enlisted to restrict the data for the document.

2 comments:

  1. Good day Carl,

    do you have this article available as a Talend project?

    I am new to Talend and trying to build a similar XML structure (Multiple Accounts with multiple transaction of a financial statement). Everything seems to work up to the point of my tMap (getting 0 merged out with 9 input accounts and 190 transaction inputs)

    thank-you

    ReplyDelete
    Replies
    1. Hi,

      I don't have this particular job anymore. It sounds like you have a problem linking a Transaction to an Account. Break this down by making sure that you can produce a Transaction flow and that the Account IDs look good (Transaction input -> tLogRow). Do the same with the Account (Account input -> tLogRow). Then, link the two together with the tMap (Transaction and Account inputs -> tMap -> tLogRow). Hopefully you can find the problem and get the 190 records out.

      Good luck

      Delete