Friday, June 10, 2011

Validating XML with Talend Open Studio

Use the tXSDValidator component to check a local XML file against an XSD.  Or, specify "Flow Mode" to validate a flow-driven Column containing XML.

XML is widely used for data transfer.  An XSD describes the structure, format, and types of an XML document.  An XSD serves as a specification, often supporting integration with the development effort through tools like Sparx Systems' Enterprise Architect.  The XSD can also help with system quality by verifying that the XML produced in a data transfer -- say the result of a database query -- adheres to the agreed-upon format.

tXSDValidator

In Talend Open Studio, a component called tXSDValidator can be used to compare XML against an XSD.  If performance is sufficient, this can be a powerful quality check run in the production environment.  When a problem arises, you can confirm that the XML you're producing matches what is expected.

tXSDValidator operates in two modes: Flow and File.  This blog post presents an example of File mode.  A source XML file stored on the local system is validated against an XSD.  The XSD is located on the web and is composed of a set of data structures; it imports a second XSD.
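For readers who want to see what this kind of check looks like outside of Talend, here is a minimal sketch using the standard javax.xml.validation API.  This is not the component's source; the class name XsdCheck, the method isValid(), and the tiny inline schema are assumptions made for illustration.  Inline strings are used so that the example runs without any files or network access.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of Talend.
public class XsdCheck {

    // Returns true if the XML text conforms to the XSD text.
    // Returns false if the XML is invalid or malformed, or if the XSD itself cannot be compiled.
    public static boolean isValid(String xsd, String xml) throws IOException {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler registered, validate() throws on the first error.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up schema: a <sale> element containing one decimal <amount>.
        String xsd = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
                + "<xs:element name=\"sale\"><xs:complexType><xs:sequence>"
                + "<xs:element name=\"amount\" type=\"xs:decimal\"/>"
                + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

        System.out.println(isValid(xsd, "<sale><amount>19.99</amount></sale>")); // prints true
        System.out.println(isValid(xsd, "<sale><amount>lots</amount></sale>"));  // prints false
    }
}
```

To mirror File mode more closely, the XML could be wrapped as a StreamSource over a java.io.File, and the XSD could be loaded with SchemaFactory.newSchema(URL) when it lives on the web.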

File Mode

This job invokes a tXSDValidator which initiates a flow to a tFilterRow.  Two arguments were entered into the Component View: Xsd file and Xml file.  Xml file is a local file.  Xsd file is a web resource.

Job Using tXSDValidator

If you want to simply print out the result of the validation, then check the "Print to console" checkbox.  This example uses programming logic to query a variable passed out of the tXSDValidator.  The logic is implemented as a tFilterRow which routes the result of the "validate" variable to one of two tJavaRows.

Here is the configuration of the tFilterRow.

Checking 'validate' Variable

Flow Mode

Alternatively, tXSDValidator can be used in Flow Mode, which uses a Column rather than a local file as the XML source.  To use Flow Mode:

  1. Define an input source for the XML like a tAccessInput.
  2. Run the source's main flow into the tXSDValidator component.
  3. In the tXSDValidator, set the component to Flow Mode, then specify the Input Column and the XSD file.
  4. The XSD file can be a web resource (http://).
  5. Route the main flow of the tXSDValidator.
  6. Optionally, route the rejects flow of the tXSDValidator.
tXSDValidator in Flow Mode
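
The Flow Mode steps above can be approximated in plain Java: each row carries a column of XML text, and each value is routed to either a main or a rejects list.  This is only a sketch of the idea, not how the component is implemented.  FlowModeSketch, route(), mainFlow, and rejectsFlow are hypothetical names, and the inline schema is made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical illustration of main/rejects routing; not part of Talend.
public class FlowModeSketch {

    public final List<String> mainFlow = new ArrayList<>();
    public final List<String> rejectsFlow = new ArrayList<>();

    // Compiles the XSD once, then validates every value of the XML column against it.
    public static FlowModeSketch route(String xsd, List<String> xmlColumn)
            throws SAXException, IOException {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        FlowModeSketch out = new FlowModeSketch();
        for (String xml : xmlColumn) {
            try {
                // A fresh Validator per row; the compiled Schema is reused.
                schema.newValidator().validate(new StreamSource(new StringReader(xml)));
                out.mainFlow.add(xml);
            } catch (SAXException e) {
                // Invalid or malformed XML goes to the rejects list.
                out.rejectsFlow.add(xml);
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xsd = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
                + "<xs:element name=\"sale\"><xs:complexType><xs:sequence>"
                + "<xs:element name=\"amount\" type=\"xs:decimal\"/>"
                + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
        FlowModeSketch result = route(xsd, Arrays.asList(
                "<sale><amount>19.99</amount></sale>",
                "<sale><amount>lots</amount></sale>"));
        System.out.println("main: " + result.mainFlow.size());       // prints main: 1
        System.out.println("rejects: " + result.rejectsFlow.size()); // prints rejects: 1
    }
}
```

Compiling the schema once and reusing it for every row keeps the per-row cost down, which matters if the XSD is fetched from a web resource.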


No XSD?

If you don't have an XSD but want to make sure that you're working with valid XML, consider using a Talend Routine called isXML() that I posted to the Talend Exchange under "BRules".  Here is a link to the page on Exchange.

http://www.talendforge.org/exchange/tos/extension_view.php?eid=354

This page has example syntax for isXML().
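
If you'd like a feel for what a well-formedness check involves, here is a minimal sketch using the JDK's SAX parser.  It is not the isXML() routine from the Exchange; the class WellFormedCheck and the method isWellFormed() are hypothetical names, and the real routine may behave differently.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; not the isXML() routine posted to the Talend Exchange.
public class WellFormedCheck {

    // Returns true if the text parses as well-formed XML; no XSD is involved.
    public static boolean isWellFormed(String text) {
        if (text == null) {
            return false;
        }
        try {
            // DefaultHandler ignores content events and rethrows fatal parse errors.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(text)), new DefaultHandler());
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b/></a>"));  // prints true
        System.out.println(isWellFormed("<a><b></a>"));   // prints false
    }
}
```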

Demo Resources

For a copy of the input file used in the File Mode example, go here.  The XSD can be retrieved at http://www.bekwam.net/xsd/BekwamSales.xsd.  The input file is also used in the Flow Mode example as the CLOB contents of a database column.

The XSD refers to another file hosted on the same server, BekwamCommon.xsd.  You don't need to pull this second schema into Talend; it's resolved automatically through the schemaLocation attribute set in BekwamSales.xsd.  For best results, make sure that everything in your XSD references true URLs that are accessible over the Internet.  Some tools will also allow local file paths.

tXSDValidator is a useful component that can help profile your data, verifying that what's produced in your transfers adheres to the agreed-upon spec.

Note to TOS component developers: tXSDValidator is an example of varying outgoing connections.  "Rejects" outgoing connections will only be displayed if the component is set to Flow Mode.  See tXSDValidator_java.xml for the example.

3 comments:

  1. Hi Bekwam,
    I am trying to extract CDATA content from XML using Talend but couldn't find right way to do that. Can you please suggest me some ideas.

    1. Hi,

      Take a look at http://bekwam.blogspot.com/2012/06/working-with-cdata-in-talend-open.html. If you have any additional questions about XML, CDATA, and Talend, please follow-up on the new post.

      Good luck,
      Carl

  2. Hi Bekwam,

    How to stop (or to break) the validation on first error detected by tXSDValidator ?

    Can you please suggest me some ideas?
