XML is widely used for data transfer. An XSD (XML Schema Definition) describes the structure, format, and types of an XML document. An XSD serves as a specification, and tools like Sparx Systems' Enterprise Architect can bring it into the development effort. The XSD can also help with system quality by verifying that the XML produced in a data transfer, say the result of a database query, adheres to the agreed-upon format.
tXSDValidator
In Talend Open Studio, a component called tXSDValidator can be used to validate XML against an XSD. If performance is sufficient, this can be a powerful quality check run in the production environment. When a problem arises, you can be assured that what you're producing matches the expected format.
tXSDValidator operates in two modes: Flow and File. This blog post presents an example of File mode. A source XML file stored on the local system is validated against an XSD. The XSD is located on the web and is composed of a set of data structures; it imports a second XSD.
File Mode
This job invokes a tXSDValidator which initiates a flow to a tFilterRow. Two arguments were entered into the Component View: Xsd file and Xml file. Xml file is a local file. Xsd file is a web resource.
Job Using tXSDValidator
Here is the configuration of the tFilterRow.
Checking 'validate' Variable
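For readers who want to see what this kind of check amounts to outside of Talend, here is a minimal sketch using the standard Java validation API (javax.xml.validation). It is not the component's actual implementation; the class name XsdCheck and its method names are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of Talend: validates one XML source against one XSD.
public class XsdCheck {

    // Returns true when the XML validates against the XSD.
    // Returns false when validation fails, or when the XSD or XML
    // cannot be read or parsed at all.
    public static boolean validate(Source xsd, Source xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(xsd);
            Validator validator = schema.newValidator();
            // With no custom ErrorHandler, the first validation error is thrown.
            validator.validate(xml);
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    // Convenience overload for XSD and XML held in memory as text.
    public static boolean validateText(String xsdText, String xmlText) {
        return validate(new StreamSource(new StringReader(xsdText)),
                        new StreamSource(new StringReader(xmlText)));
    }
}
```

To mirror the File mode setup, the XSD could be passed as new StreamSource("http://www.bekwam.net/xsd/BekwamSales.xsd") and the XML as a StreamSource wrapping a local java.io.File. Passing the XSD by URL gives the parser a base location for resolving relative schemaLocation references.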
Flow Mode
Alternatively, tXSDValidator can be used in Flow Mode, which uses a column rather than a local file as the XML source. To use Flow Mode:
- Define an input source for the XML like a tAccessInput.
- Run the source's main flow into the tXSDValidator component.
- In the tXSDValidator, set the component to Flow Mode and specify the Input Column and the XSD file.
- The XSD file can be a web resource (http://).
- Route the main flow of the tXSDValidator.
- Optionally, route the rejects flow of the tXSDValidator.
tXSDValidator in Flow Mode
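The Flow Mode idea, validating the XML held in a column row by row and sending failures down a separate path, can be sketched in plain Java. This is an illustration only and assumes that valid rows continue on the main flow while invalid rows go to rejects; the class FlowModeSketch and its fields are hypothetical names, not Talend code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical illustration of a main/rejects split over an XML column.
public class FlowModeSketch {
    public final List<String> main = new ArrayList<>();
    public final List<String> rejects = new ArrayList<>();

    // Compiles the XSD once, then validates each column value against it.
    public static FlowModeSketch route(String xsdText, List<String> xmlColumn) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsdText)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("XSD could not be compiled", e);
        }
        FlowModeSketch out = new FlowModeSketch();
        for (String xml : xmlColumn) {
            if (xml == null) {
                out.rejects.add(null); // nothing to validate
                continue;
            }
            try {
                // A Validator is not thread-safe, so create one per row.
                schema.newValidator()
                      .validate(new StreamSource(new StringReader(xml)));
                out.main.add(xml);
            } catch (SAXException | IOException e) {
                out.rejects.add(xml);
            }
        }
        return out;
    }
}
```

Compiling the Schema once and reusing it for every row keeps the per-row cost down, which matters if the check runs in production.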
No XSD?
If you don't have an XSD but want to make sure that you're working with valid XML, consider using a Talend Routine called isXML() that I posted to the Talend Exchange under "BRules". Here is a link to the page on Exchange.
http://www.talendforge.org/exchange/tos/extension_view.php?eid=354
This page has example syntax for isXML().
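If you just want the idea behind such a check, a well-formedness test can be written with the standard Java XML parser. The sketch below is not the BRules isXML() routine and may differ from it in signature and behavior; WellFormedCheck and isWellFormed are names chosen for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a "is this text XML?" routine.
public class WellFormedCheck {

    // Returns true when the text parses as well-formed XML.
    // No schema is involved; only the XML syntax is checked.
    public static boolean isWellFormed(String text) {
        if (text == null) {
            return false;
        }
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(text)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```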
Demo Resources
For a copy of the input file used in the File Mode example, go here. The XSD can be retrieved at http://www.bekwam.net/xsd/BekwamSales.xsd. The input file is also used in the Flow Mode example as the CLOB contents of a database column.
The XSD refers to another file hosted on the same server, BekwamCommon.xsd. You don't need to pull this second schema into Talend; it's loaded automatically through the schemaLocation attribute set in BekwamSales.xsd. For best results, make sure that every reference in your XSD is a real URL that is accessible over the Internet. Some tools will also allow local file paths.
tXSDValidator is a useful component that can help profile your data, verifying that what's produced in your transfers adheres to the agreed-upon spec.
Note to TOS component developers: tXSDValidator is an example of varying outgoing connections. "Rejects" outgoing connections will only be displayed if the component is set to Flow Mode. See tXSDValidator_java.xml for the example.
Hi Bekwam,
I am trying to extract CDATA content from XML using Talend but couldn't find right way to do that. Can you please suggest me some ideas.
Hi,
Take a look at http://bekwam.blogspot.com/2012/06/working-with-cdata-in-talend-open.html. If you have any additional questions about XML, CDATA, and Talend, please follow up on the new post.
Good luck,
Carl
Hi Bekwam,
How to stop (or to break) the validation on first error detected by tXSDValidator ?
Can you please suggest me some ideas?