Talend Open Studio's tFileOutputMSXML component creates an XML document from an input schema. This component uses a map dialog to link the input fields to XML elements and attributes. On the Advanced settings, there is an option that prevents XML elements from being produced if there is no value. This setting can be used to filter columns.
Remove Empty Elements from Null Input |
Excel Spreadsheet |
Job with Context Variable for Allowed Columns |
Create the mappings for the XML document. Define a root container tag (sodas) and loop tags (soda). Map each available column as its own subelement. Each subelement must retain its input source column name.
Mapping for tFileOutputMSXML |
Class<?> c_in = input_row.getClass();
Class<?> c_out = output_row.getClass();
if( context.OUTPUT_COLUMNS != null ) {
String[] fieldNames =
context.OUTPUT_COLUMNS.split(",");
for( String fn : fieldNames ) {
// input / output schemas must match
Field f_in = c_in.getDeclaredField(fn);
Field f_out = c_out.getDeclaredField(fn);
if( f_in.get(input_row) == null ) {
f_out.set(output_row, "");
} else {
f_out.set(output_row, f_in.get(input_row));
}
}
}
The check for null is handling the case where you want an empty element to show up in the XML. "shelf" is not filled in for "sunkist", but it should show up if "shelf" is an included column.
With Java Reflection, you can build a filtering mechanism that will produce an XML document tailored to your particular requirement. Build comprehensive source and target schemas and use a tJavaRow and a setting on the tFileOutputMSXML component to apply the filtering logic.
Hi Carl, I am trying to dynamically filter out some columns and save as JSON file. I think I can use the same way as you mentioned here. But the problem is tFileOutputJSON without options "Create empty element if needed" checkbox. So all the columns I would like to filter out still output into the json file with null value. Do you have any idea on how to do this? Thanks!
ReplyDeleteHave you tried adding a tMap that will replace the null with a ''? A tMap can also exclude a column with nulls, but that might filter out fields with data too.
DeleteI don't see tMap could exclude a column with nulls. May I know where it is? Thanks!
DeleteTwo other things you could try are a post-processing step or to build up multiple documents if the number of nullable columns is small.
DeleteIn the post-processing step, run a pair of regular expression replacements that remove strings of the form "_____" : null, and "____": null (no comma).
For the multiple documents approach, produce multiple JSON files. Say there are fields firstName, lastName, and companyName. Either firstName and lastName will be set or companyName will be set. Create a JSON file for the firstName / lastName case the omits companyName in the tFileOuputJSON. Next, create a JSON file for the companyName case that omits the firstName and lastName. At the end, merge the two files.
Good luck