Bekwam Blog: XML Output from Multiple Data Sources with Talend Open Studio

Sunday, September 25, 2011

For XML documents with strong hierarchical structure, use tAdvancedFileOutputXML in Talend Open Studio to map each source field to a target element or attribute. If the XML document is less cohesive -- there are several data sets related only by parent -- use the tFileOutputMSXML component.

When you're forming an XML document using Talend Open Studio and the XML has multiple loops, use the tFileOutputMSXML component. tFileOutputMSXML lets you map several input data flows to their own copy of the root element; this results in multiple loops, one per data flow. This differs from tAdvancedFileOutputXML, which relies on a single input to define a single loop.

Target Schema

Consider this graphical representation of an XSD, 'cruise-ports.xsd'. This blog post will walk through a Talend Open Studio job that outputs an XML document adhering to the target schema.

cruise-ports.xsd

A top-level element 'cruise-ports' contains one or more cruise-port elements. Within each cruise-port, there are two subelements, cruise-line and snack-shop, that may have different cardinalities. For example, a cruise-port may have 4 cruise-lines, but only 2 snack-shops.

The XSD is here.

Basic Job Structure

The basic job structure for working with tFileOutputMSXML is to connect each input source to the tFileOutputMSXML component. Unlike tAdvancedFileOutputXML, tFileOutputMSXML can take more than one main flow.

MSXML Output Job

The input sources are two text files. 'baltimore-cruise-lines.txt' is a 5-line text file whose first line contains the headers. 'baltimore-snack-shops.txt' is a 3-line text file whose first line contains the headers.

# baltimore-cruise-lines.txt
Name;Destinations
Carnival;Bahamas,Mexico,Bermuda
Holland America;Mexico,Puerto Rico
Royal Caribbean;Mexico,Trinidad,Togabo
Norwegian;Norway,Sweden,Finland,Russia,Netherlands

# baltimore-snack-shops.txt
Name;Hours of Operations
Joe's Coffee Stand;S,M,T,W,Th,F,S 6am-12pm
McDonald's;S,M,T,W,Th,F,S 6am-9pm

tFileOutputMSXML Config

The following procedure configures the tFileOutputMSXML component:

1. Rename both copies of the default top-level element 'rootTag' to 'cruise-ports'.
2. On each copy, right-click and import an XML tree based on cruise-ports.xsd.
3. For the copy associated with 'row1', remove the snack-shop elements.
4. For the copy associated with 'row2', remove the cruise-line elements.
5. Map the fields for row1 (Name, Destinations).
6. Map the fields for row2 (Name, Hours of Operation).

The following screenshot shows the completed configuration.

tFileOutputMSXML Config

Result

The result of the run is the following XML:

Resulting XML
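
For readers who want to see the shape of the result without the screenshot, here is a minimal Python sketch of what the job does conceptually: two independent flows, each looping under the same parent. This is not Talend code and not the job's actual output. The child element names ('name', 'destinations', 'hours') are assumptions made for illustration; the real names come from cruise-ports.xsd.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Contents of the two input files shown earlier in the post.
CRUISE_LINES = """Name;Destinations
Carnival;Bahamas,Mexico,Bermuda
Holland America;Mexico,Puerto Rico
Royal Caribbean;Mexico,Trinidad,Togabo
Norwegian;Norway,Sweden,Finland,Russia,Netherlands
"""

SNACK_SHOPS = """Name;Hours of Operations
Joe's Coffee Stand;S,M,T,W,Th,F,S 6am-12pm
McDonald's;S,M,T,W,Th,F,S 6am-9pm
"""


def read_rows(text):
    """Parse a semicolon-delimited text with a header line into dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


def build_cruise_ports(cruise_lines, snack_shops):
    """Build one cruise-port holding two unrelated loops of children."""
    root = ET.Element("cruise-ports")
    port = ET.SubElement(root, "cruise-port")
    # Loop 1: one cruise-line per row of the first flow.
    for row in cruise_lines:
        line = ET.SubElement(port, "cruise-line")
        ET.SubElement(line, "name").text = row["Name"]  # assumed element name
        ET.SubElement(line, "destinations").text = row["Destinations"]  # assumed
    # Loop 2: one snack-shop per row of the second flow.
    for row in snack_shops:
        shop = ET.SubElement(port, "snack-shop")
        ET.SubElement(shop, "name").text = row["Name"]  # assumed element name
        ET.SubElement(shop, "hours").text = row["Hours of Operations"]  # assumed
    return root


root = build_cruise_ports(read_rows(CRUISE_LINES), read_rows(SNACK_SHOPS))
print(ET.tostring(root, encoding="unicode"))
```

The point of the sketch is that the two loops never reference each other; they share only the cruise-port parent, which is the situation tFileOutputMSXML is suited to.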

Namespaces Warning

I received a number of errors when working with namespaces, and the Talend issue navigator lists 29 unresolved DI issues (as of September 25, 2011) regarding namespaces and tFileOutputMSXML. If namespaces are important for your particular requirement -- and namespaces are crucial to any composable XML modeling -- this example won't work for you without some type of post-processing that inserts a namespace prefix and top-level attribute.

You can slip in a default namespace using the 'add namespace' feature if all the elements are under the same namespace.
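
As one possible form of that post-processing, the following Python sketch rewrites a namespace-free document so that every element carries a prefix and the root declares the namespace. It runs outside Talend, after the job has written its file. The namespace URI and the 'cp' prefix are hypothetical placeholders; substitute the values your schema requires.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://example.com/cruise-ports"  # hypothetical namespace URI
NS_PREFIX = "cp"  # hypothetical prefix


def add_namespace(xml_text, uri=NS_URI, prefix=NS_PREFIX):
    """Return xml_text with every un-namespaced element moved into `uri`."""
    # Tell the serializer which prefix to emit for this URI.
    ET.register_namespace(prefix, uri)
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # ElementTree stores qualified names as "{uri}local-name".
        if not elem.tag.startswith("{"):
            elem.tag = f"{{{uri}}}{elem.tag}"
    return ET.tostring(root, encoding="unicode")


plain = "<cruise-ports><cruise-port><cruise-line/></cruise-port></cruise-ports>"
print(add_namespace(plain))
```

Attributes are left unqualified here, which is the usual convention; qualify them as well only if your schema demands it.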

Multiple Loops (Thanks "Rock")

If your XML document contains multiple looping elements, you can use several tAdvancedFileOutputXML components to build up the output in sections. For each input component, create a tAdvancedFileOutputXML starting with the topmost element. Each child element's tAdvancedFileOutputXML will use the "Append the source xml file" option.

The following three data files are joined on the 'dept_no' identifier. In this data model, a Department (depts.txt) contains Employees (emps.txt) and Printers (printers.txt). There is no correlation between Employees and Printers, except for their parent Department.

depts.txt
------------------------------
dept_no,dept_name
100,Accounting
101,IT

emps.txt
------------------------------
dept_no,emp_no,emp_first_name
100,2000,joe
100,2001,carl
101,2002,steve

printers.txt
------------------------------
dept_no,printer_name
100,hp-acct-bw
100,hp-acct-color
101,hp-it-bw
101,hp-it-color
101,epson-plotter

In processing terms, there will be 3 loops: a loop building Employees, a loop building Printers, and a loop building the containing Departments. These loops will be implemented using three distinct tAdvancedFileOutputXML components.

Job With Multiple Loops

The expected output of the job is a top-level set of Departments containing the Department's related Employees and Printers.

<company>
  <depts>
    <dept dept_no="100" dept_name="Accounting">
      <printers>
        <printer printer_name="hp-acct-bw"/>
        <printer printer_name="hp-acct-color"/>
      </printers>
      <emps>
        <emp emp_no="2000" emp_first_name="joe"/>
        <emp emp_no="2001" emp_first_name="carl"/>
      </emps>
    </dept>
    <dept dept_no="101" dept_name="IT">
      <printers>
        <printer printer_name="hp-it-bw"/>
        <printer printer_name="hp-it-color"/>
        <printer printer_name="epson-plotter"/>
      </printers>
      <emps>
        <emp emp_no="2002" emp_first_name="steve"/>
      </emps>
    </dept>
  </depts>
</company>

All tAdvancedFileOutputXML components in this job write to the same XML file. tAdvancedFileOutputXML_3 and _5 have the "Append the source xml file" option set.

XML Component for Departments

XML Component for Employees

XML Component for Printers

In the input, each data file contains a dept_no and that field is mapped to /depts/dept/@dept_no in each tAdvancedFileOutputXML. This associates children (Employees and Printers) with the parent Department.
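
The append-by-key idea can be sketched in Python as three passes over one document: the first pass writes the Departments, and each later pass finds the matching parent by dept_no and appends its children. This is only an illustration of the technique, not what Talend generates. The helper names are hypothetical, and the order of the two append passes is an assumption chosen to match the expected output shown earlier.

```python
import csv
import io
import xml.etree.ElementTree as ET

DEPTS = """dept_no,dept_name
100,Accounting
101,IT
"""

EMPS = """dept_no,emp_no,emp_first_name
100,2000,joe
100,2001,carl
101,2002,steve
"""

PRINTERS = """dept_no,printer_name
100,hp-acct-bw
100,hp-acct-color
101,hp-it-bw
101,hp-it-color
101,epson-plotter
"""


def rows(text):
    """Parse comma-delimited text with a header line into dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def write_depts(dept_rows):
    """Pass 1: create the document with one dept element per row."""
    company = ET.Element("company")
    depts = ET.SubElement(company, "depts")
    for r in dept_rows:
        ET.SubElement(depts, "dept", dept_no=r["dept_no"], dept_name=r["dept_name"])
    return company


def append_children(company, child_rows, group_tag, item_tag):
    """Later passes: attach each row to the dept with the same dept_no."""
    by_no = {d.get("dept_no"): d for d in company.iter("dept")}
    for r in child_rows:
        dept = by_no[r["dept_no"]]
        group = dept.find(group_tag)
        if group is None:  # first child of this kind for this dept
            group = ET.SubElement(dept, group_tag)
        # Every column except the join key becomes an attribute.
        attrs = {k: v for k, v in r.items() if k != "dept_no"}
        ET.SubElement(group, item_tag, attrs)


company = write_depts(rows(DEPTS))
append_children(company, rows(PRINTERS), "printers", "printer")
append_children(company, rows(EMPS), "emps", "emp")
print(ET.tostring(company, encoding="unicode"))
```

Note that dept_no is used only to locate the parent; it is not repeated on the child elements, just as in the expected output.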

Special Field Processing on Single Data Source

Another application of this technique is when normalization is needed on more than one field. Take the following input file as an example. The input file has two multi-valued attribute columns: CITY and COLOR.

NAME;CITY;COLOR
TEST1;PARIS,LONDON;RED,GREEN,BLUE
TEST2;PARIS,BOSTON;YELLOW,GREEN,BLUE

The result of processing this file will be an XML document containing repeating groups of CITY and COLOR values. The key to this processing is to define 2 loops using 2 tFileInputDelimited components on the same input. One loop expands CITY, the other, COLOR. A component like tReplicate won't work in this case because it doesn't render more than one loop.
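
To make the two-loop idea concrete, here is a small Python sketch that reads the same input twice, normalizing CITY in the first pass and COLOR in the second. It illustrates the technique only and is not the Talend job. The 'rows', 'row', and 'CITIES' element names and the NAME attribute are assumptions; only CITY, COLOR, and the COLORS placeholder are named in this post.

```python
import csv
import io
import xml.etree.ElementTree as ET

INPUT = """NAME;CITY;COLOR
TEST1;PARIS,LONDON;RED,GREEN,BLUE
TEST2;PARIS,BOSTON;YELLOW,GREEN,BLUE
"""


def normalize(records, column):
    """Yield one (NAME, value) pair per comma-separated value in `column`."""
    for rec in records:
        for value in rec[column].split(","):
            yield rec["NAME"], value


def build(text):
    records = list(csv.DictReader(io.StringIO(text), delimiter=";"))
    root = ET.Element("rows")  # assumed root element name
    holders = {}
    # Pass 1: the CITY loop. An empty COLORS element is created as a
    # placeholder for the second pass to fill.
    for name, city in normalize(records, "CITY"):
        if name not in holders:
            row = ET.SubElement(root, "row", NAME=name)  # assumed names
            cities = ET.SubElement(row, "CITIES")  # assumed name
            colors = ET.SubElement(row, "COLORS")
            holders[name] = (cities, colors)
        ET.SubElement(holders[name][0], "CITY").text = city
    # Pass 2: the COLOR loop, appended into the existing document.
    for name, color in normalize(records, "COLOR"):
        ET.SubElement(holders[name][1], "COLOR").text = color
    return root


doc = build(INPUT)
print(ET.tostring(doc, encoding="unicode"))
```

Each pass expands exactly one multi-valued column, so the two lists stay independent instead of producing a cross product of cities and colors.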

2 Input Paths Using tNormalize

tAdvancedFileOutputXML_3 uses the "Append the source xml file" option to continue processing from _1.

This is the mapping for tAdvancedFileOutputXML_1.

XML Output Component with CITY Mapped

Note that COLOR is not mapped. A COLORS element is added to hold the place of the COLOR sub-element created in the _3 component.

Here is the mapping for the tAdvancedFileOutputXML_3 component. Note that CITY is not mapped.

XML Component with COLOR Mapped

For an XML document based on a single input, use tAdvancedFileOutputXML, which also supports grouping. If you need more than one loop -- say there are lists of unrelated child elements -- use more than one tAdvancedFileOutputXML component. For disjoint data sets, try tFileOutputMSXML. If namespace support is required, you will need additional processing or another technique to add the namespaces to your document.

13 comments:

Renat Zubairov, October 12, 2011 at 4:30 AM
Great post. You might also try the new tXMLMap component, which is the main processing component for the new 'Document' types. Now you can pass XML structures in the Talend flows and map them with tXMLMap.

Carl, October 12, 2011 at 7:18 AM
Thanks Renat. Look for this in a future post.

rock, October 13, 2011 at 12:29 PM
Excellent! The Multiple Loops part is exactly what I was looking for when I created this topic -> http://www.talendforge.org/forum/viewtopic.php?pid=66515#p66515. Very good explanation!

Arc, July 4, 2013 at 10:13 AM
Thanks a lot! I had a requirement to build an XML extract using Talend, and I found this article really helpful. I used the tAdvancedFileOutputXML component.

Anonymous, April 23, 2015 at 1:44 AM
Fantastic! The Multiple Loops was also exactly what I needed and I wouldn't have figured it out for myself in a million years!

Unknown, May 19, 2016 at 8:53 AM
Thanks a lot!

Girish, August 25, 2016 at 6:59 AM
Thanks Carl. Do we have a similar option for JSON documents with multiple loops?

Girish, August 25, 2016 at 1:49 PM
I will check. Thanks Carl.

Anonymous, January 19, 2018 at 7:08 PM
Thanks for the informative post. I tried to use your multiple loop approach and it works great. However, I have a scenario where, if there are no child records for a parent, the job should still create an empty node/element for that child under the parent. For example, suppose there is a new department, dept_no 103, with a couple of employee entries but no printer entries; I need to put empty tags for the printers in the output XML.

The current approach omits the printer node altogether for dept_no 103. I tried checking the "Create attribute even if its value is NULL" and "Create attribute even if it is unmapped" options in the advanced tab of the tAdvancedFileOutputXML, but I didn't get the desired output.

Could you please suggest some ideas?