When you need to combine several Excel workbooks or sheets inside a workbook, use Talend Open Studio's tUnite component.
The tUnite component takes one or more inputs and outputs a data flow in the style of a SQL UNION.. If using Excel sources, and configured accordingly, the tUnite component will also output each sheet in the workbook in the data flow.
These Excel workbooks contain business contacts spread across two regions:
BusinessContacts_West.xlsx and
BusinessContacts_East.xlsx. Additionally, there are sheets for each fiscal year in the workbook.
|
An Excel Workbook with Several Sheets |
Configuring a component like tFileInputExcel to process each sheet will send all of a workbook's records in each sheet to the output data flow. For example, the following component configuration is for a tFileInputExcel that references metadata from the repository.
|
Configuration for tFileInputExcel |
This configuration specifies the file to process, BusinessContacts_West.xlsx,. The configuration references previously-loaded metadata from the repository, marks all sheets for processing ("All sheets"), sets up a header ("Header 1"), and applies that header to each sheet ("Affect each sheet(header&footer)". The Excel 2007 is also set. (If you get an OLE error, be sure to verify the version.)
"Header 1" is important so that the headers repeated on each sheet aren't also repeated in the output data flow and mixed with the data.
The two workbooks are configured similarly. Each workbook's main data flow is sent to a tUnite component which sends its output to a tLogRow (for demonstration purposes).
|
Sample tUnite Job |
The tUnite configuration is straightforward. It's using the same business contacts schema as the tFileInputExcels which is taken from the repository.
|
tUnite Configuration |
And a run produces
|
Running the tUnite Example |
tUnite provides a nice alternative to writing code that loops through workbooks and sheets. You may need to write a loop if you need access to the iteration controls like the workbook name or a value encoded in the sheet name. If this isn't the case, tUnite is preferred because configuration over code means less testing.
How do I set up the metadata repository that you mention in this tutorial?
ReplyDeleteIn TOS for Data Integration, expand the Metadata item in the Repository view. Right-click on "File excel" and select "Create file Excel". Follow the wizard to select the input file and to parse out a schema.
DeleteHow to break the Talend job if wrong input excel file taken
ReplyDeleteI have excel file, in first row of the file, it contains heading like "Department-Food" or "Department-Clothes", I need to read file only if it is "Department-Food" in tMap.
if it is not "Department-Food" than the flow should break.
tflowList >> tFileInputExcel >> tMap >> tFlowMeter >> tpostgres
My requirement is, in tMap if the file is wrong file than nothing should be recorded in tFlowMeter not even "0", the flow should break at tMap itself.
Hi Carl, Thanks for sharing this guide on Combining Excel Files. I have a Suggestion we can Also use Synkronizer Excel Add-in to Combine and Merge Multiple Excel Files into one file.
ReplyDelete