Sunday, May 29, 2011

A Talend Open Studio Walkthrough: Producing XML

XML is used for data transfer because it carries with it metadata and structure.  Talend Open Studio is well-suited to forming XML from sources like relational databases.

If data is modeled using an XML Schema (XSD), Talend Open Studio can read the schema and produce structures for mapping a data source like an RDBMS.  Much of this process is automated in TOS, but some manual steps may be needed to "flatten" a hierarchical XML tree.

Dimensional Model Source

The following dimensional model is a source of budget data.  The diagram was created in Sparx Systems Enterprise Architect 8.

Dimensional Model Source for XML Output
For convenience, I built a view on this model that joins all of the dimensions to the fact table.  I did this because I'm also building reports using Jaspersoft iReport Designer, and I don't want to code the same join repeatedly.  The view is called "BUDGET_ITEM_VW".
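To illustrate the idea of writing the join once in a view, here is a minimal sketch using Python's built-in sqlite3 module.  The original model lives in a different database and its diagram is not reproduced here, so the table and column names below (department_dim, period_dim, budget_item_fact) are hypothetical stand-ins; only the view name BUDGET_ITEM_VW comes from the post.

```python
import sqlite3

# Hypothetical star schema: table and column names are assumptions,
# not the actual model from the post. Only the view name is original.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department_dim (department_id INTEGER PRIMARY KEY, department_name TEXT);
    CREATE TABLE period_dim (period_id INTEGER PRIMARY KEY, fiscal_year INTEGER);
    CREATE TABLE budget_item_fact (
        department_id INTEGER REFERENCES department_dim,
        period_id INTEGER REFERENCES period_dim,
        amount REAL
    );
    -- The join is written once, inside the view.
    CREATE VIEW BUDGET_ITEM_VW AS
        SELECT d.department_name, p.fiscal_year, f.amount
        FROM budget_item_fact f
        JOIN department_dim d ON d.department_id = f.department_id
        JOIN period_dim p ON p.period_id = f.period_id;
    INSERT INTO department_dim VALUES (1, 'Engineering');
    INSERT INTO period_dim VALUES (1, 2011);
    INSERT INTO budget_item_fact VALUES (1, 1, 1500.0);
""")

# Any consumer (a report, an ETL input) selects from the view
# instead of repeating the join.
rows = conn.execute("SELECT * FROM BUDGET_ITEM_VW").fetchall()
print(rows)
```

Both the report tool and the ETL job can then read the same flat row set, which is the shape a single RDBMS input component expects.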

Target XSD

The view is used as the source of the RDBMS input (tAccessInput) for a Talend Open Studio job that creates an XML file.  The output XML is based on the following XML model.  The diagram was also created in Sparx Systems Enterprise Architect 8.
XML Model Transferring Budget Items

Enterprise Architect also generated a schema (XSD) from this diagram.  The schema can be found here.

Walkthrough

The following video walks through a Talend job that will produce this XML file from an RDBMS source.

Manual Steps for Schema

Talend lets you load an XSD for use in mappings.  The elements appear as part of the schema and are used to set up loops within the document.  However, not all of the data elements -- particularly attributes -- will be present.  In the video, there is a spot highlighting several fields that were added through the XML File Wizard.  Every data element produced by the RDBMS should be present in the XML File's schema.
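One way to spot fields that may need to be added by hand is to compare the source columns against the names declared in the XSD.  The sketch below is not part of Talend; it is a standalone check using Python's standard library, and it assumes that column names match XML element or attribute names apart from case.  The sample schema and the helper names (declared_names, missing_columns) are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

def declared_names(xsd_text):
    """Collect the name of every xs:element and xs:attribute declaration."""
    root = ET.fromstring(xsd_text)
    names = set()
    for tag in ("element", "attribute"):
        for node in root.iter(XS + tag):
            name = node.get("name")
            if name:
                names.add(name)
    return names

def missing_columns(columns, xsd_text):
    """Return source columns with no same-named declaration (case-insensitive)."""
    declared = {n.lower() for n in declared_names(xsd_text)}
    return [c for c in columns if c.lower() not in declared]

# Hypothetical schema fragment; the real XSD from the post is not reproduced here.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="budgetItem">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="amount" type="xs:decimal"/>
      </xs:sequence>
      <xs:attribute name="fiscalYear" type="xs:int"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

print(missing_columns(["amount", "fiscalYear", "departmentName"], SAMPLE_XSD))
```

With the sample data, only departmentName is reported, since amount is declared as an element and fiscalYear as an attribute.  A column flagged this way is a candidate for review, not proof that the Talend schema is incomplete.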

2 comments:

  1. What about when there are nested elements that require iteration on data from another table?

    1. Hi Mike,

      I'd start by seeing if you can render the input data set -- the table plus "another table" -- in one denormalized query. Grouping on the repeating values from that query can then produce the nested elements.

      For example, suppose you are creating a nested players element under a root team element:

      SELECT t.*, p.* FROM Team t JOIN Player p on (t.team_id=p.team_id)
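Outside of Talend, the same grouping idea can be sketched in a few lines of Python.  This is only an illustration of how repeating team values in a denormalized row set turn into nested elements; the sample rows, the column order, and the element and attribute names are assumptions.

```python
from itertools import groupby
from operator import itemgetter
import xml.etree.ElementTree as ET

# Hypothetical denormalized rows, as the join above might return them:
# (team_id, team_name, player_id, player_name)
rows = [
    (1, "Hawks", 10, "Ann"),
    (1, "Hawks", 11, "Bob"),
    (2, "Owls", 20, "Cy"),
]

def teams_to_xml(rows):
    """Group rows on the repeating team_id and nest players under each team."""
    root = ET.Element("teams")
    # groupby only merges adjacent rows, so sort on the grouping key first.
    ordered = sorted(rows, key=itemgetter(0))
    for team_id, group in groupby(ordered, key=itemgetter(0)):
        group = list(group)
        team = ET.SubElement(root, "team", id=str(team_id), name=group[0][1])
        players = ET.SubElement(team, "players")
        for _, _, player_id, player_name in group:
            ET.SubElement(players, "player", id=str(player_id)).text = player_name
    return root

xml_text = ET.tostring(teams_to_xml(rows), encoding="unicode")
print(xml_text)
```

Each distinct team_id yields one team element, and every row in that group contributes one player element beneath it.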

      If such a query can't be rendered -- say two child tables would produce a Cartesian product -- then you should use several inputs directed at a tAdvancedFileOutputXML. This post gives guidance: http://bekwam.blogspot.com/2011/09/xml-output-from-multiple-data-sources.html.