While using Talend Open Studio's tFileOutputMSXML recently, I was unable to add namespaces to my XML document. As of September 2011, I counted about 29 unresolved issues related to this component on the bug tracker. So I brought in a third-party XML binding library as a workaround.
This is an example of tFileOutputMSXML that doesn't require namespaces.
An XML binding library marshals and unmarshals XML to and from Java objects. Some libraries generate the Java classes from an XML Schema (XSD). XmlBeans, Castor, and Liquid XML Data Binder are XML binding libraries. XmlBeans and Castor work on the command line, and there may be some Eclipse plug-ins available. Liquid Technologies provides a GUI.
When your programming language is Java, you can write a Java program that manipulates these XML binding objects, building up different parts of the tree-like XML structure in your algorithms. You can loop over a result set and create objects, call a function that creates some objects, etc. Then, at the end of the program, you can serialize the resulting XML to a String or other output source.
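A minimal sketch of that flow, using hand-written stand-ins for what a binding library would generate. The `Port` and `Ports` classes, their fields, and `toXml()` are hypothetical; real generated classes and their serialization calls differ from library to library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a generated binding class (one <port> element).
// No XML escaping is done here; a real binding library handles that.
class Port {
    final String name;
    final String country;

    Port(String name, String country) {
        this.name = name;
        this.country = country;
    }

    String toXml() {
        return "  <port country=\"" + country + "\">" + name + "</port>\n";
    }
}

// Hypothetical stand-in for the generated root element class.
class Ports {
    final List<Port> items = new ArrayList<>();

    String toXml() {
        StringBuilder sb = new StringBuilder("<ports>\n");
        for (Port p : items) {
            sb.append(p.toXml());
        }
        return sb.append("</ports>").toString();
    }
}

class BindingSketch {
    // Loop over rows (standing in for a result set), create one object
    // per row, and serialize the whole tree at the end.
    static String build(String[][] rows) {
        Ports root = new Ports();
        for (String[] row : rows) {
            root.items.add(new Port(row[0], row[1]));
        }
        return root.toXml();
    }

    public static void main(String[] args) {
        String[][] rows = { {"Nassau", "BS"}, {"Cozumel", "MX"} };
        System.out.println(build(rows));
    }
}
```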
In most of my XML binding work, I've brought in a design pattern called the Builder pattern. The Builder pattern reduces the hassle of working with many objects by exposing facade-like functions that create and wire the objects for you. For example, rather than
Address a = new Address();
You might do something like
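A sketch of the contrast, under stated assumptions: the `Address` setters, the `AddressBookBuilder` class, and its `addAddress()` method are hypothetical names chosen for illustration and are not taken from any particular binding library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical binding-style class with plain setters.
class Address {
    private String street;
    private String city;

    void setStreet(String street) { this.street = street; }
    void setCity(String city) { this.city = city; }
    String getStreet() { return street; }
    String getCity() { return city; }
}

// Hypothetical builder: one facade-like method hides object creation,
// field assignment, and attaching the object to its parent collection.
class AddressBookBuilder {
    private final List<Address> addresses = new ArrayList<>();

    AddressBookBuilder addAddress(String street, String city) {
        Address a = new Address();
        a.setStreet(street);
        a.setCity(city);
        addresses.add(a);
        return this; // returning this allows chained calls
    }

    List<Address> getAddresses() {
        return addresses;
    }
}

class BuilderPatternSketch {
    public static void main(String[] args) {
        // The caller never touches Address directly.
        AddressBookBuilder builder = new AddressBookBuilder()
            .addAddress("1 Main St", "Miami")
            .addAddress("2 Harbor Rd", "Nassau");
        System.out.println(builder.getAddresses().size());
    }
}
```

The caller supplies only the data. The builder owns the sequence of `new`, setter calls, and attachment, so a change to the object model touches one class.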
This screenshot shows an XSD 'cruise-ports.xsd'
|Builder Class with Generated Classes|
The implementation of the Builder class 'CruisePortsBuilder' consists of a method for each main loop: addCruise_line() and addSnack_shop(). There is also an init() method that clears out the document structure, sets namespaces, and sets top-level attributes.
For the source of the Java builder class, follow CruisePortsBuilder.java. Here is a Main.java that uses CruisePortsBuilder without Talend.
If you had many loops or more complicated logic, the builder would absorb the complexity rather than the Talend job.
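To make the shape of such a builder concrete, here is a simplified sketch. It is not the linked CruisePortsBuilder.java: it builds the XML text directly instead of using generated binding classes, and the class name `CruisePortsBuilderSketch`, the `setNamespacePrefix()` method, the method parameters, the element names, and the namespace URI are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplification of a builder with one method per loop.
class CruisePortsBuilderSketch {
    private String prefix = "cp"; // assumed default prefix
    private final String namespaceUri = "http://example.com/cruise-ports"; // assumed URI
    private final List<String> children = new ArrayList<>();

    void setNamespacePrefix(String prefix) {
        this.prefix = prefix;
    }

    // Clears the document structure so the builder can be reused.
    void init() {
        children.clear();
    }

    // One method per repeating element ("loop") in the document.
    void addCruise_line(String name) {
        children.add(element("cruise_line", name));
    }

    void addSnack_shop(String name) {
        children.add(element("snack_shop", name));
    }

    private String element(String tag, String text) {
        return "  <" + prefix + ":" + tag + ">" + escape(text)
             + "</" + prefix + ":" + tag + ">\n";
    }

    // Escapes the characters that would break element text.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Serializes the accumulated elements under a namespaced root.
    String toXmlString() {
        StringBuilder sb = new StringBuilder();
        sb.append("<").append(prefix).append(":cruise_ports xmlns:")
          .append(prefix).append("=\"").append(namespaceUri).append("\">\n");
        for (String child : children) {
            sb.append(child);
        }
        return sb.append("</").append(prefix).append(":cruise_ports>").toString();
    }

    public static void main(String[] args) {
        CruisePortsBuilderSketch builder = new CruisePortsBuilderSketch();
        builder.init();
        builder.addCruise_line("Example Cruise Line");
        builder.addSnack_shop("Dockside Snacks");
        System.out.println(builder.toXmlString());
    }
}
```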
Integration with Talend Open Studio
Talend Open Studio interacts with the Builder class in three stages.
- Initialize the Builder. Load libraries, new(), put on globalMap
- For each data flow, call a Builder method with a row
- At the end, form an output XML string to be written to a file
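The three stages can be sketched outside Talend as follows. This assumes globalMap behaves like a `Map<String, Object>`, which is how it is used from tJava code; the `"builder"` key and the `StubBuilder` class are hypothetical stand-ins so the sketch is self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal hypothetical stand-in for CruisePortsBuilder.
class StubBuilder {
    private final StringBuilder body = new StringBuilder();

    void addCruise_line(String name) {
        body.append("<cruise_line>").append(name).append("</cruise_line>");
    }

    String toXmlString() {
        return "<cruise_ports>" + body + "</cruise_ports>";
    }
}

class ThreeStageSketch {
    static String run(String[] cruiseLineRows) {
        // Stands in for Talend's job-wide globalMap.
        Map<String, Object> globalMap = new HashMap<>();

        // Stage 1: create the builder and share it with later components.
        globalMap.put("builder", new StubBuilder());

        // Stage 2: per-row code looks the builder up and hands it the row's data.
        for (String row : cruiseLineRows) {
            StubBuilder builder = (StubBuilder) globalMap.get("builder");
            builder.addCruise_line(row);
        }

        // Stage 3: serialize once, after all data flows have finished.
        return ((StubBuilder) globalMap.get("builder")).toXmlString();
    }

    public static void main(String[] args) {
        System.out.println(run(new String[] { "Alpha", "Beta" }));
    }
}
```

Because the map stores plain `Object` values, each component that retrieves the builder has to cast it back to its concrete type.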
|TOS Job Interacting with Custom XML Builder|
Stage 1 - Initialize the Builder
To initialize the builder, two tLibraryLoad components are used. The first loads the jar containing the CruisePortsBuilder class plus the generated Java classes: CruisePorts, CruisePort, etc. The second loads Liquid Technologies' proprietary libraries. This two-part loading would also apply to the other binding libraries mentioned in this post.
The tJava at the start of the job creates the builder object, sets a namespace prefix on it, and then puts it on globalMap so that later components can retrieve it.
CruisePortsBuilder contains a method for each loop in the XML document. In this respect, it works like tFileOutputMSXML. The job takes advantage of metadata defined as delimited files and uses standard Talend data flow connections to invoke the builder method.
|Processing a Row with a Builder Method|
Output is also handled using the standard Talend connections. A tJava calls the builder method 'toXmlString()' and stores the result in a variable that is written to the output row.
The output file is a single-field "delimited" file.
|Single-field Target Schema|
It's best to use the standard Talend components where possible. The Talend Exchange can also provide some functionality. However, if you need features not available off-the-shelf, then XML binding libraries provide a workaround. A future post will demonstrate work with the xs:group element as an example.