Sunday, June 17, 2012

Working with CDATA in Talend Open Studio

CDATA is an XML construct that tells the parser to treat a block of text as literal character data rather than as markup.  When working with the Talend Open Studio XML component tFileInputXML, you only need to provide an XPath referring to an element's contents to return the embedded value.

This XML document defines a block of text under RootElement.  The block of text contains two CDATA sections, which tell the processor not to interpret the enclosed text as markup.  In this example, one CDATA section protects a less-than expression.  The other embeds an entire XML document.
XML Document with 2 CDATA Directives
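As a rough illustration outside Talend, the sketch below parses a document of the same shape with Python's standard xml.etree.ElementTree module.  The element content is a made-up stand-in for the document in the screenshot, not the original file.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the post's document: one RootElement whose text
# holds two CDATA sections (a less-than expression and an embedded document).
XML_DOC = """<RootElement>
The expression <![CDATA[a < b]]> is protected, and so is the embedded document
<![CDATA[<order><item>widget</item></order>]]>
</RootElement>"""

root = ET.fromstring(XML_DOC)

# The parser hands back the CDATA contents as ordinary text of RootElement.
text = root.text
child_count = len(root)  # the embedded <order> document is text, not child nodes

print(text)
print("child elements:", child_count)
```

The embedded order document comes back as part of the text and RootElement has no child elements, which is the behavior the post relies on.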
This job reads the data from a file and passes it to a tMap and then a tLogRow for output.

XML Processing Job
The tFileInputXML is configured to loop on RootElement (the only element in the document).  The XPath listed on the following screen is a relative reference to the current element (which is the only element, RootElement).
An XPath Referencing a Whole Element Minus the Nodes
Get Nodes is unchecked, so only the block of text inside <RootElement> is passed to the tMap.  If Get Nodes were checked, the whole element, tags included, would be passed.
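The difference between the two settings is loosely analogous to reading an element's text versus serializing the element itself.  The sketch below shows that analogy with Python's xml.etree.ElementTree.  It is not how Talend implements Get Nodes, and the sample document is hypothetical.  ElementTree also rewrites CDATA as escaped entities when it serializes, which may differ from what Talend emits.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-line document with a CDATA-protected expression.
doc = "<RootElement>if <![CDATA[a < b]]> then stop</RootElement>"
root = ET.fromstring(doc)

# Analogous to Get Nodes unchecked: just the text inside the element.
text_only = root.text

# Analogous to Get Nodes checked: the element with its tags.
with_tags = ET.tostring(root, encoding="unicode")

print(text_only)
print(with_tags)
```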

These are the results of running the program.

The Contents of RootElement
When dealing with XML, callers often just want text.  Sometimes that text conflicts with XML syntax.  Operators like '<' can be mistaken for the start of a tag, so they must be escaped (&lt;) or wrapped in a CDATA section.  A good XML tool hides the distinction between escaped text and CDATA, because the calling program wants to treat the text as text and isn't concerned with how the XML carried it.
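A minimal sketch of that point, again using Python's xml.etree.ElementTree rather than Talend: the escaped form and the CDATA form produce the same text for the caller, while a raw '<' is rejected by the parser.  The sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The same expression, once escaped and once wrapped in CDATA.
escaped = ET.fromstring("<RootElement>a &lt; b</RootElement>")
cdata = ET.fromstring("<RootElement><![CDATA[a < b]]></RootElement>")

# The caller sees identical text either way.
same_text = escaped.text == cdata.text
print(escaped.text, "|", cdata.text, "| same:", same_text)

# Left unprotected, the '<' is read as the start of a tag and parsing fails.
try:
    ET.fromstring("<RootElement>a < b</RootElement>")
    raw_parsed = True
except ET.ParseError as err:
    raw_parsed = False
    print("raw '<' rejected:", err)
```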
