JavaFX Tutorials

Friday, May 4, 2012

Validate XML Data with Talend Open Studio

With an XSD, you can validate the structure of an XML document.  To validate the contents of the document in Talend Open Studio, use a component like tFilterRow.

tXSDValidator is a Talend Open Studio component that verifies the structure of an XML document against an XSD.  However, you may also want to validate the contents of the document.  To do this, read the document with a tFileInputXML component and apply a tFilterRow with a set of rules.

The Source Data

Make sure that your XML is in a format that is supported by Talend Open Studio.  This means that each value must be addressable through an element or an attribute, rather than being encoded inside the text of an element (for example, as a NAME=value pair).  If the XML is not readily processable, convert it using a stylesheet.

For example,

<Data>    
    <CRField>TABNAM=EDI_DC40</CRField>    
    <CRField>MANDT=100</CRField>    
    <CRField>DOCNUM=0000000001234567</CRField>    
    <CRField>DOCREL=123</CRField>    
    <CRField>STATUS=30</CRField>    
    <CRField>VVV=15</CRField>    
    <CRField>SNDPRN=EXP1100</CRField>    
    <CRField>RCVPOR=A000000001</CRField>    
    <CRField>RCVPRT=LS</CRField>    
    <CRField>RCVPRN=NRPP041V3</CRField>    
    <CRField>CREDAT=20080401</CRField>    
    <CRField>CRETIM=094655</CRField>    
    <CRField>SERIAL=20080401094655</CRField>
</Data>

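This document encodes NAME=value pairs in the text of each CRField element.  The following stylesheet splits that text at the "=" sign: the part before it becomes an attribute "name", and the part after it remains as the text of the element.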

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
exclude-result-prefixes="#all">
   
    <xsl:template match="/Data">
        <Data>
            <xsl:apply-templates />
        </Data>       
    </xsl:template>

    <xsl:template match="CRField">
        <CRField>
            <xsl:attribute name="name">
                <xsl:value-of select="fn:substring-before(text(), '=')" />
            </xsl:attribute>
            <xsl:value-of select="fn:substring-after(text(), '=')" />
        </CRField>   
    </xsl:template>
   
</xsl:stylesheet>

Applying the stylesheet produces this variation of the XML:

<?xml version="1.0" encoding="UTF-8"?><Data>    
    <CRField name="TABNAM">EDI_DC40</CRField>    
    <CRField name="MANDT">100</CRField>    
    <CRField name="DOCNUM">0000000001234567</CRField>    
    <CRField name="DOCREL">123</CRField>    
    <CRField name="STATUS">30</CRField>    
    <CRField name="VVV">15</CRField>    
    <CRField name="SNDPRN">EXP1100</CRField>    
    <CRField name="RCVPOR">A000000001</CRField>    
    <CRField name="RCVPRT">LS</CRField>    
    <CRField name="RCVPRN">NRPP041V3</CRField>    
    <CRField name="CREDAT">20080401</CRField>    
    <CRField name="CRETIM">094655</CRField>    
    <CRField name="SERIAL">20080401094655</CRField>
</Data>

Job

The job starts by converting the XML with the stylesheet, using the tXSLT component.

Job Validating tFileInputXML

The configuration of the tXSLT component will apply a stylesheet to an XML file (Data.xml) and output a transformed XML file (DataWithAttributes-TALEND.xml).

Configuring a tXSLT Component
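
Outside Talend, the step that tXSLT performs can be approximated in plain Java with the JDK's javax.xml.transform API.  This is a minimal sketch, not the component's implementation.  It assumes the JDK's built-in XSLT 1.0 processor, so it uses an XSLT 1.0 rewrite of the stylesheet (no fn: prefix and no exclude-result-prefixes="#all").  The class name CrFieldTransform and the inline sample input are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class; not part of Talend.
class CrFieldTransform {

    // Assumption: an XSLT 1.0 equivalent of the article's XSLT 2.0 stylesheet,
    // so that the JDK's built-in processor can run it.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/Data'>"
      + "<Data><xsl:apply-templates/></Data>"
      + "</xsl:template>"
      + "<xsl:template match='CRField'>"
      // The text before '=' becomes the name attribute ...
      + "<CRField name=\"{substring-before(., '=')}\">"
      // ... and the text after '=' stays as the element text.
      + "<xsl:value-of select=\"substring-after(., '=')\"/>"
      + "</CRField>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the result as a string.
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Data><CRField>MANDT=100</CRField>"
                   + "<CRField>STATUS=30</CRField></Data>";
        System.out.println(transform(xml));
    }
}
```

To work with files as the job does, the StringReader and StringWriter could be swapped for StreamSource and StreamResult objects built from java.io.File instances pointing at Data.xml and DataWithAttributes-TALEND.xml.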

The tFileInputXML component reads the transformed XML and maps each CRField to a different field of the schema.  The mapping is based on the new attribute "name".

Mapping the Transformed Document to a Schema
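
The idea behind the mapping can be sketched in plain Java with the JDK's XPath API.  The actual tFileInputXML settings appear only in the screenshot, so the XPath expressions here are an assumption: one expression of the form /Data/CRField[@name='...'] per column.  The class name CrFieldMapping and the method toRow are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of Talend.
class CrFieldMapping {

    // Builds one row (column name -> value) from the transformed document.
    // A column with no matching CRField maps to the empty string.
    static Map<String, String> toRow(String xml, String... columns) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Map<String, String> row = new LinkedHashMap<>();
            for (String column : columns) {
                // Assumed expression: select the CRField whose "name"
                // attribute equals the column name.
                String expr = "/Data/CRField[@name='" + column + "']";
                row.put(column, xpath.evaluate(expr, doc));
            }
            return row;
        } catch (Exception e) {
            throw new IllegalStateException("Could not map XML to a row", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Data>"
                   + "<CRField name=\"DOCNUM\">0000000001234567</CRField>"
                   + "<CRField name=\"CREDAT\">20080401</CRField>"
                   + "<CRField name=\"VVV\">15</CRField>"
                   + "</Data>";
        System.out.println(toRow(xml, "DOCNUM", "CREDAT", "VVV"));
    }
}
```

Without the "name" attribute, every CRField would look identical to an XPath expression, which is why the stylesheet step comes first.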

Validation

Now that the job parses the XML, a component like tFilterRow can apply a set of rules that validate the data.  Other components, like tSchemaCompliance or even tJavaRow, could also be used.  This example does a length check (DOCNUM) and two regular expression matches (CREDAT, VVV).
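
The same kind of checks could be written as ordinary Java, for instance as a starting point for tJavaRow logic.  This is only a sketch: the exact rules used in the job appear in the screenshot, not in the text, so the thresholds here are assumptions (DOCNUM is exactly 16 characters, CREDAT is 8 digits, VVV is one or more digits).  The class name CrFieldRules and the method validate are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of Talend.
class CrFieldRules {

    // Assumed rules; adjust them to match the real data contract.
    static final int DOCNUM_LENGTH = 16;
    static final Pattern CREDAT_FORMAT = Pattern.compile("\\d{8}");
    static final Pattern VVV_FORMAT = Pattern.compile("\\d+");

    // Returns a list of rule violations; an empty list means the row passed.
    static List<String> validate(Map<String, String> row) {
        List<String> errors = new ArrayList<>();

        // Length check on DOCNUM.
        String docnum = row.getOrDefault("DOCNUM", "");
        if (docnum.length() != DOCNUM_LENGTH) {
            errors.add("DOCNUM must be " + DOCNUM_LENGTH + " characters");
        }

        // Regular expression match on CREDAT (whole value must match).
        String credat = row.getOrDefault("CREDAT", "");
        if (!CREDAT_FORMAT.matcher(credat).matches()) {
            errors.add("CREDAT must be 8 digits");
        }

        // Regular expression match on VVV (whole value must match).
        String vvv = row.getOrDefault("VVV", "");
        if (!VVV_FORMAT.matcher(vvv).matches()) {
            errors.add("VVV must be numeric");
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, String> row = Map.of(
            "DOCNUM", "0000000001234567",
            "CREDAT", "20080401",
            "VVV", "15");
        System.out.println(validate(row));
    }
}
```

Collecting every violation, rather than stopping at the first, resembles sending failed rows to a reject flow along with a reason for the rejection.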

Some tFilterRow Rules

To take advantage of Talend Open Studio's XML components, make sure that your XML data is in a supported format.  Any values encoded in element text should first be moved into elements or attributes, so that a schema can be produced from the XML document.  Several components can then be used with the schema to do the validation; this example shows tFilterRow.
