Thursday, April 12, 2012

Rendering XML from a Multi-Schema Text File with Talend Open Studio

When you're working with a multi-schema text file in Talend Open Studio, use a set of tAdvancedFileOutputXML components to render an XML document.  Make sure you use the proper connections: OnSubJobOk triggers rather than Main connections.

A multi-schema text file has a varying structure: each line may represent a different record type, identified here by the first field.  This is an example from the Talend documentation.

01;SOFT MUSIC DANCE ALBUM;RICHARDSON;15/12/2005
02;We Danced
02;She's Everything
02;Once in a Lifetime Love
03;National Library
01;COUNTRY MUSIC ALBUM;WHITE;02/01/2006
02;Fall Into Me
02;Another Try
02;Something About Her

Songs ("We Danced", etc.) are grouped under compact discs ("SOFT MUSIC DANCE ALBUM").  There is also a third record type, "Library".  This could be expressed in an XML document like this.


XML Document from a Multi-Schema Text File
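The grouping described above can be sketched outside Talend in a few lines of Python. This is only an illustration of the target structure, using an abbreviated copy of the sample data. The element names Author, Date, Songs, Libraries and DISCName appear in this post; the names Discs, Disc, Song and Library are assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample data; field 0 is the record type.
SAMPLE = """\
01;SOFT MUSIC DANCE ALBUM;RICHARDSON;15/12/2005
02;We Danced
02;Once in a Lifetime Love
03;National Library
01;COUNTRY MUSIC ALBUM;WHITE;02/01/2006
02;Fall Into Me
"""


def build_xml(text):
    """Group song (02) and library (03) lines under the preceding disc (01)."""
    root = ET.Element("Discs")  # hypothetical root element name
    songs = libraries = None
    for line in text.splitlines():
        if not line.strip():
            continue
        record_type, *fields = line.split(";")
        if record_type == "01":
            disc = ET.SubElement(root, "Disc")  # hypothetical element name
            ET.SubElement(disc, "DISCName").text = fields[0]
            ET.SubElement(disc, "Author").text = fields[1]
            ET.SubElement(disc, "Date").text = fields[2]
            # Create both containers up front so <Songs> always precedes <Libraries>.
            songs = ET.SubElement(disc, "Songs")
            libraries = ET.SubElement(disc, "Libraries")
        elif record_type == "02" and songs is not None:
            ET.SubElement(songs, "Song").text = fields[0]
        elif record_type == "03" and libraries is not None:
            ET.SubElement(libraries, "Library").text = fields[0]
    return root


root = build_xml(SAMPLE)
print(ET.tostring(root, encoding="unicode"))
```

A disc with no "03" records ends up with an empty Libraries element in this sketch; whether that is acceptable depends on the schema the document is validated against.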

Using Main Connectors

It might seem possible to connect each schema of a multi-schema component like tFileInputMSDelimited to its own tAdvancedFileOutputXML set to append mode.  The job looks like it would work, and the record counts all check out.

WRONG: Appended Elements Won't Show Up

However, the XML output doesn't render correctly.  The subelements "Songs" and "Libraries" are missing.

XML Document Missing Key Subelements

OnSubJobOk

Instead of connecting each schema out with a Main connection, chain the tAdvancedFileOutputXML subjobs together with OnSubJobOk triggers.

With OnSubJobOk

This produces the correct document.

Component Configuration

The configuration of the tFileInputMSDelimited component is covered in the Talend help files.  In the OnSubJobOk version of the job, the tFileInputMSDelimited is duplicated, with one copy for each output schema.

The tAdvancedFileOutputXML components are in append mode (except for the first one) and all write to the same XML file.
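As a rough analogy in plain Python (not how Talend implements append mode), the restructured job behaves like a series of passes that run one after another. Each pass re-reads the same input, keeps one record type, and appends to the document the previous pass left behind. The function names and the element names Discs, Disc, Song and Library are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample data; field 0 is the record type.
LINES = [
    "01;SOFT MUSIC DANCE ALBUM;RICHARDSON;15/12/2005",
    "02;We Danced",
    "03;National Library",
    "01;COUNTRY MUSIC ALBUM;WHITE;02/01/2006",
    "02;Fall Into Me",
]


def find_disc(root, name):
    """Return the <Disc> whose <DISCName> matches, or None."""
    for disc in root.findall("Disc"):
        if disc.findtext("DISCName") == name:
            return disc
    return None


def pass_discs(lines, root):
    """First pass: create one <Disc> per 01 record."""
    for line in lines:
        kind, *fields = line.split(";")
        if kind == "01":
            disc = ET.SubElement(root, "Disc")
            for tag, value in zip(("DISCName", "Author", "Date"), fields):
                ET.SubElement(disc, tag).text = value


def pass_children(lines, root, wanted, container_tag, item_tag):
    """Later pass: re-read the input and append one record type to existing discs."""
    current = None  # name of the most recent 01 record
    for line in lines:
        kind, *fields = line.split(";")
        if kind == "01":
            current = fields[0]
        elif kind == wanted:
            disc = find_disc(root, current)
            if disc is None:
                continue  # child record with no parent disc; skipped in this sketch
            container = disc.find(container_tag)
            if container is None:
                container = ET.SubElement(disc, container_tag)
            ET.SubElement(container, item_tag).text = fields[0]


# The passes run strictly in sequence, like subjobs chained with OnSubJobOk.
root = ET.Element("Discs")
pass_discs(LINES, root)
pass_children(LINES, root, "02", "Songs", "Song")
pass_children(LINES, root, "03", "Libraries", "Library")
print(ET.tostring(root, encoding="unicode"))
```

Because the songs pass finishes before the libraries pass starts, Songs always lands ahead of Libraries inside each disc, which is the ordering an xs:sequence would expect.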

Mapping that Produces Top-level Disc Container

The elements in the top-level container (Author, Date) should appear in all schemas.  This ensures the correct ordering of the XML elements, which might be validated against an xs:sequence element in an XSD.

Mapping that Produces Song Element

This third mapping repeats <Songs> so that <Libraries> will follow it.

Mapping that Produces the Libraries Element

Connecting several tAdvancedFileOutputXML components with Main connections didn't work, but restructuring the job around OnSubJobOk triggers produces an XML document from the multi-schema text file.

UPDATE

This is a screenshot of the schema that is repeated in each of the three tFileInputMSDelimited components.


Schema Used in the "With OnSubJobOk" Job

3 comments:

  1. Hi,

    I can't find a way to do the same thing with tXmlMap.

    I miss this feature for use in DataServices exposed through the Talend ESB container.

  2. Carl,

    What is the mapping schema of the MSInputDelimited?

    It appears from the mapping as though the MSInputDelimited schema would contain the previous record's ID as a field (i.e. records "02" would contain "01" as a field, and records "03" would contain fields of "01" and "02"): see the DISCName mapping within Songs and the DISCName mapping within Library.

    1. Hi. I added a screenshot under the section heading "UPDATE".