JavaFX Tutorials

Thursday, October 20, 2011

tWriteXMLField Component in Talend Open Studio

The tWriteXMLField component is used to load XML into a database column.  This post presents a sample job on the topic.


This article is available as a PDF here.

Sometimes, it's helpful to store raw XML in the database.  A few scenarios come to mind.  One is using a column of raw XML alongside other fields for auditing or system troubleshooting purposes.  Another is caching.  An XML document may be built up within a database for fast retrieval later.

Talend Open Studio's tWriteXMLField works with this structure by mapping the fields of an input source to a field in the output source.  Most likely, the output source will be an RDBMS, but could also be a web service.

This example turns a record from a line in a CSV file into an XML document stored in a MySQL column.  As the XML is added, an AUTO_INCREMENT id is applied by MySQL.

The source schema is CONTACT_CSV(firstName, city, state).  The target schema is CONTACT_XML(contact_xml_id, contact_xml) where contact_xml is a Clob (TEXT) containing an XML document.

Job Using tWriteXMLField
The job in the screenshot is using three components: tFileInputDelimited, tMySQLOutput, and tWriteXMLField.  The screenshot shows the configuration of the tWriteField component which directs the resulting XML document to the contact_xml field in MySQL.  The following screenshot is the mapping of input to form the XML ("Configure XML Tree" was pressed).

Configuring tWriteXMLField
The tMySQLOutput component is a two-field table.  Since the id field (contact_xml_id) is handled by MySQL, I use the Advanced settings tab to turn off the insert and update.

Talend Ignoring ID Column on Insert and Update
The result of running the job on a two-line CSV file is two records in the database, each containing an XML document in the contact_xml field and a unique auto-generated id.

TEXT Column in MySQL Table
It can be useful to store an XML document in a database for caching or traceability.  I've worked with one system that maintains a cached XML document for easy web service retrieval.  In the operational environment, this document is updated with Oracle's UPDATEXML function.  The column type is Oracle's XMLTYPE.  Talend Open Studio can be used to integrate additional data in the XML column or maintain what's there is tWriteXMLField.

No comments:

Post a Comment