Bekwam Blog: Dynamically Filtering Columns for XML with Talend Open Studio

To filter columns when producing an XML document with Talend Open Studio, use the tFileOutputMSXML component with the "Create empty element if needed" checkbox unset.

Talend Open Studio's tFileOutputMSXML component creates an XML document from an input schema. This component uses a map dialog to link the input fields to XML elements and attributes. On the Advanced settings, there is an option that prevents XML elements from being produced if there is no value. This setting can be used to filter columns.

Remove Empty Elements from Null Input

Using an Excel input and an XML output, define a schema that will include all the columns. For example, this Excel file has four columns: brand, type, price, and shelf. Note that "shelf" does not have a value for "sunkist".

Excel Spreadsheet

Create a job that uses a tFileInputExcel component run into a tJavaRow which is run into a tFileOutputMSXML. Base the tFileInputExcel component's schema on all available columns. Synchronize all schemas using the "Sync columns" button on the tJavaRow.

Job with Context Variable for Allowed Columns

Create a context variable "OUTPUT_COLUMNS" that is a comma-separated list of column names used in the schema. These names must match the column names used throughout the job.

Create the mappings for the XML document. Define a root container tag (sodas) and loop tags (soda). Map each available column as its own subelement. Each subelement must retain its input source column name.

Mapping for tFileOutputMSXML

Finally, add in the Java code for the tJavaRow. This code loops over the list of column names provided by the context variable. Java Reflection is used to dynamically map the input and output fields. This avoids having to hardcoded the names using "input_row.brand". On Advanced settings, java.lang.reflect.Field must be imported.

Class<?> c_in = input_row.getClass();
Class<?> c_out = output_row.getClass();

if( context.OUTPUT_COLUMNS != null ) {

String[] fieldNames =
context.OUTPUT_COLUMNS.split(",");

for( String fn : fieldNames ) {

// input / output schemas must match

Field f_in = c_in.getDeclaredField(fn);
Field f_out = c_out.getDeclaredField(fn);

if( f_in.get(input_row) == null ) {
f_out.set(output_row, "");
} else {
f_out.set(output_row, f_in.get(input_row));
}
}
}

The check for null is handling the case where you want an empty element to show up in the XML. "shelf" is not filled in for "sunkist", but it should show up if "shelf" is an included column.

With Java Reflection, you can build a filtering mechanism that will produce an XML document tailored to your particular requirement. Build comprehensive source and target schemas and use a tJavaRow and a setting on the tFileOutputMSXML component to apply the filtering logic.

4 comments:

AnonymousJune 17, 2015 at 4:40 PM
Hi Carl, I am trying to dynamically filter out some columns and save as JSON file. I think I can use the same way as you mentioned here. But the problem is tFileOutputJSON without options "Create empty element if needed" checkbox. So all the columns I would like to filter out still output into the json file with null value. Do you have any idea on how to do this? Thanks!

Bekwam Blog

Featured Post

Applying Email Validation to a JavaFX TextField Using Binding

Tuesday, November 22, 2011

Dynamically Filtering Columns for XML with Talend Open Studio

4 comments: