tXSDValidator is a Talend Open Studio component that verifies the structure of an XML document. However, you may also want to validate the contents of the document. To do this, read the document with a tFileInputXML component and apply a tFilterRow with a set of rules.
The Source Data
Make sure that your XML is in a format that is supported by Talend Open Studio. This means that the XML processing will be based on attributes and elements rather than on values encoded inside an element's text (for example, KEY=VALUE strings). If the XML is not readily processable, convert it using a stylesheet.
For example,
<Data>
<CRField>TABNAM=EDI_DC40</CRField>
<CRField>MANDT=100</CRField>
<CRField>DOCNUM=0000000001234567</CRField>
<CRField>DOCREL=123</CRField>
<CRField>STATUS=30</CRField>
<CRField>VVV=15</CRField>
<CRField>SNDPRN=EXP1100</CRField>
<CRField>RCVPOR=A000000001</CRField>
<CRField>RCVPRT=LS</CRField>
<CRField>RCVPRN=NRPP041V3</CRField>
<CRField>CREDAT=20080401</CRField>
<CRField>CRETIM=094655</CRField>
<CRField>SERIAL=20080401094655</CRField>
</Data>
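has a key and a value encoded together in the text of each CRField element. The following stylesheet moves the key (the part before the "=") into an attribute "name" and keeps the value (the part after the "=") as the element's text.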
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fn="http://www.w3.org/2005/xpath-functions"
    exclude-result-prefixes="#all">
  <xsl:template match="/Data">
    <Data>
      <xsl:apply-templates />
    </Data>
  </xsl:template>
  <xsl:template match="CRField">
    <CRField>
      <xsl:attribute name="name">
        <xsl:value-of select="fn:substring-before(text(), '=')" />
      </xsl:attribute>
      <xsl:value-of select="fn:substring-after(text(), '=')" />
    </CRField>
  </xsl:template>
</xsl:stylesheet>
Applying the stylesheet produces this variation of the XML:
<?xml version="1.0" encoding="UTF-8"?><Data>
<CRField name="TABNAM">EDI_DC40</CRField>
<CRField name="MANDT">100</CRField>
<CRField name="DOCNUM">0000000001234567</CRField>
<CRField name="DOCREL">123</CRField>
<CRField name="STATUS">30</CRField>
<CRField name="VVV">15</CRField>
<CRField name="SNDPRN">EXP1100</CRField>
<CRField name="RCVPOR">A000000001</CRField>
<CRField name="RCVPRT">LS</CRField>
<CRField name="RCVPRN">NRPP041V3</CRField>
<CRField name="CREDAT">20080401</CRField>
<CRField name="CRETIM">094655</CRField>
<CRField name="SERIAL">20080401094655</CRField>
</Data>
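To check the expected result of the conversion outside Talend, the same key/value split can be sketched in Python. This is only an illustration using the standard library's ElementTree, not what the tXSLT component runs; the function name move_keys_to_attributes and the shortened sample document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the source document (three of the CRField entries).
SOURCE = """<Data>
<CRField>TABNAM=EDI_DC40</CRField>
<CRField>DOCNUM=0000000001234567</CRField>
<CRField>CREDAT=20080401</CRField>
</Data>"""


def move_keys_to_attributes(xml_text):
    """Build a new <Data> tree where KEY=VALUE text becomes name="KEY" plus VALUE text.

    Hypothetical helper that mirrors the stylesheet's two templates.
    """
    source_root = ET.fromstring(xml_text)
    result_root = ET.Element("Data")
    for field in source_root.findall("CRField"):
        text = field.text or ""
        if "=" in text:
            # Split on the first '=' only, like substring-before / substring-after.
            key, value = text.split("=", 1)
        else:
            # Both XPath functions return an empty string when '=' is absent.
            key, value = "", ""
        new_field = ET.SubElement(result_root, "CRField", {"name": key})
        new_field.text = value
    return result_root


transformed = move_keys_to_attributes(SOURCE)
print(ET.tostring(transformed, encoding="unicode"))
```

Comparing this output with the file written by the job is a quick way to confirm that the stylesheet behaves as intended on a small sample.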
Job
The job starts by converting the XML with a stylesheet, using the tXSLT component.
Job Validating tFileInputXML
The tXSLT component is configured to apply a stylesheet to an XML file (Data.xml) and to write the transformed XML to an output file (DataWithAttributes-TALEND.xml).
Configuring a tXSLT Component
Mapping the Transformed Document to a Schema
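As a rough picture of what the mapping yields, the sketch below reads the transformed document into rows in plain Python. It assumes a loop over each CRField element with two columns, name and value; the actual loop expression and column names in the job's schema may differ, and read_rows is a hypothetical helper, not a Talend API.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the transformed document.
TRANSFORMED = """<Data>
<CRField name="DOCNUM">0000000001234567</CRField>
<CRField name="VVV">15</CRField>
<CRField name="CREDAT">20080401</CRField>
</Data>"""


def read_rows(xml_text):
    """Return one row (a dict) per CRField element under the Data root."""
    root = ET.fromstring(xml_text)
    return [
        # Assumed columns: the "name" attribute and the element text.
        {"name": field.get("name"), "value": field.text or ""}
        for field in root.findall("CRField")
    ]


rows = read_rows(TRANSFORMED)
for row in rows:
    print(row["name"], row["value"])
```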
Now that the job parses the XML, a component like tFilterRow can be used to define a set of rules that validate the data. Other components, like tSchemaCompliance or even tJavaRow, could also be used. This example does a length check (DOCNUM) and two regular expression matches (CREDAT, VVV).
Some tFilterRow Rules
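The three kinds of checks can be sketched outside Talend as follows. The exact length and patterns used in the job are not stated in the text, so the values here are assumptions chosen to fit the sample data: a 16-character DOCNUM, an eight-digit CREDAT, and an all-digit VVV. The validate function is a hypothetical helper and is not how tFilterRow is implemented.

```python
import re

# Assumed rule settings; adjust them to match the rules configured in the job.
DOCNUM_LENGTH = 16                       # assumption: length of the sample DOCNUM
CREDAT_PATTERN = re.compile(r"\d{8}")    # assumption: eight digits (YYYYMMDD)
VVV_PATTERN = re.compile(r"\d+")         # assumption: one or more digits


def validate(fields):
    """Return a list of error messages for a dict of field name -> value."""
    errors = []
    if len(fields.get("DOCNUM", "")) != DOCNUM_LENGTH:
        errors.append(f"DOCNUM must be {DOCNUM_LENGTH} characters long")
    # fullmatch requires the whole value to match, not just a prefix.
    if not CREDAT_PATTERN.fullmatch(fields.get("CREDAT", "")):
        errors.append("CREDAT must be eight digits")
    if not VVV_PATTERN.fullmatch(fields.get("VVV", "")):
        errors.append("VVV must contain only digits")
    return errors


sample = {"DOCNUM": "0000000001234567", "CREDAT": "20080401", "VVV": "15"}
print(validate(sample))
```

A row that passes every rule yields an empty list; in the job, the equivalent rows would flow to the filter output and the others to the reject output.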
To take advantage of Talend Open Studio's XML components, make sure that your XML data is in a form they can process: values should be exposed as elements and attributes, not encoded inside element text. If they are encoded, convert the document first, then produce a schema from the converted XML. Several components can be used with the schema to do the validation; this example shows tFilterRow.