tExtractXMLField is a Talend Open Studio (TOS) component that parses a field containing XML and breaks the structure out into individual fields. Once the data is out of XML, the resulting schema can be processed using other TOS components like tMap.
If an update is required, use a tXMLMap to rebuild the structure and direct the flow to an output component like tMySqlOutput. A tMap will need to be added in order to convert the Document output of tXMLMap to an output schema; implicit cast-style conversions or tConvertType won't work.
XML Document
Suppose this XML document was created by front-end Javascript and saved to the database. The document records user preferences which include a theme (modern) and a color palette (reds).
<?xml version="1.0" encoding="utf-8"?>
<preferences>
<theme name="modern">
<palette>reds</palette>
</theme>
</preferences>
RDBMS Schema
The XML document can be saved in a Text column of a MySQL database. The Text column is like an Oracle Clob which removes the maxlength limit of the typical VARCHAR data type. This database record stores an id (preferences_id) and a unique business key (username) along with the XML for easy retrieval by a RESTful web service.
DB Schema Showing a Text Column with XML |
Use Case
The use case described in this post is to migrate the data stored in the XML document to a new value. Specifically, the theme named "modern" is to be renamed "modern1". One way to handle this is to use a string replacement, but a more flexible way to handle this is to retain the data's definition by dealing with the perferences_xml field as XML rather than a simple string. To apply the new theme name, the XML document will be read, converted internally into something usable by TOS, modified with the results of a lookup value, and written back out the the database.
Job Parsing and Re-forming XML |
tExtractXMLField
The tExtractXMLField parses and queries the XML document with XPaths. From the XML document, it creates preference_theme and preference_color columns. Although preference_color is not modified, it needs to be extracted so that it can be re-formed later.
tExtractXMLField Configuration |
Join Tables to Update Preferences |
The tXMLMap takes the individual fields -- now converted by the preferences_update lookup -- and re-forms the XML document in a field "preferences_xml". To use tXMLMap, make sure that the target type is Document. This will make the basic subelement/attribute right-click menu functions available. Build the structure and map the fields as you would with any XML component.
tXMLMap Configuration |
A second tMap is used to convert the Document coming from the tXMLMap into the String used by the target tMySqlOutput. It seems like this should happen automatically, but you'll get an error even if you add an interceding tConvertType. Rather, call the toString() method of the Document as in this configuration.
Document to String Conversion |
Finally, an update is executed using a tMySqlOutput component. This update uses the Repository schema for preferences. But it uses the Advanced settings' Field options as specified below to prevent username from being overridden with a NULL value. Update is the action used.
Update Only preferences_xml |
A Note about tWriteXMLField:
tWriteXMLField has similar capabilities to tXMLMap. However, tWriteXMLField will only output a single field of XML. That's good for inserting, however it may not work for updating where a unique key or id needs to be carried through. See TDI-8392 on Talend's bug tracker for more info.
This comment has been removed by the author.
ReplyDeleteHi Carl:
ReplyDeleteI need help to parse following xml
703
iPhone 6-16 GB-Space Grey-KD 57 Pack
0
57
0
177
Bundle_2500_30GB_UNLIMITED
Primary
Service
2500|Local Mins.
Unlimited|SMS and MMS
30 GB|4G LTE Internet
226
Gold
Primary
SIM
G
560
iPhone 6 - 16 GB - Space Grey
Primary
Device
Space Gray|#808080
16 GB
561
4G LTE Hotspot - 0 GB - Black
Secondary Optional
Device
Black|#000000
16 GB
Could you please help.
Have you tried tFileInputXML?
Delete