Thursday, October 20, 2011

Java Code Snippet for Working with Filenames in Talend Open Studio

When I'm processing a set of input files, I sometimes generate one output file per input file.  This code snippet can be used to form a unique output file name based on the input file name.

The code example uses four context variables.  It's a good idea to parameterize everything in your Talend Open Studio job so that it can be re-configured later.  (This is especially true on a consulting project where the person or company taking receipt of the TOS job may deploy the job after your work is finished.)

Both the input and output files have distinguishing prefixes.  The suffix is also parameterized in case the content types vary or a naming convention changes (for example, ".txt" to ".dat").

These are the context variables with some sample values:

context.INPUT_FILE_PREFIX (Default="infile_")
context.INPUT_FILE_SUFFIX (Default=".txt")
context.OUTPUT_FILE_PREFIX (Default="outfile_")
context.OUTPUT_FILE_SUFFIX (Default=".txt")

This code is taken from a tJava which is preceded by a tFileList component and followed by a component that will retrieve the output filename from globalMap.


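// Name of the file in the current tFileList iteration
String inputFileName = ((String)globalMap.get("tFileList_1_CURRENT_FILE"));

// Quote the prefix and suffix so characters like "." are matched literally
Pattern p = Pattern.compile(Pattern.quote(context.INPUT_FILE_PREFIX) + "(.*)" +
                            Pattern.quote(context.INPUT_FILE_SUFFIX));

Matcher m = p.matcher(inputFileName);

// group(1) is the part of the name between the prefix and the suffix
if( m.find() ) {
      globalMap.put("outputFileName", context.OUTPUT_FILE_PREFIX +
      m.group(1) + context.OUTPUT_FILE_SUFFIX);
}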

An import is needed in the tJava component's Advanced settings tab: "import java.util.regex.*;".

The code works by building a regular expression based on the input file's prefix and suffix (the compile() line).  The "(.*)" capturing group grabs everything in between.  Once m.find() succeeds, m.group(1) returns that captured text.

So a file like "infile_ERP20111020.txt" will produce an output file of "outfile_ERP20111020.txt".
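
To try the same idea outside of Talend, here is a minimal standalone sketch.  The class and method names (OutputFileNames, toOutputName) are made up for illustration and are not part of Talend; the prefixes and suffixes are passed in as plain parameters instead of context variables.  It differs from the tJava snippet in two ways that are my own assumptions about what is wanted: it uses Pattern.quote() so the "." in ".txt" is matched literally, and it uses matches() so the whole file name has to fit the pattern.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Talend class) that mirrors the tJava logic.
class OutputFileNames {

    // Returns the output file name, or an empty Optional when the input
    // name does not have the expected prefix and suffix.
    static Optional<String> toOutputName(String inputFileName,
                                         String inPrefix, String inSuffix,
                                         String outPrefix, String outSuffix) {
        // Pattern.quote() turns the prefix and suffix into literals, so
        // regex metacharacters such as "." lose their special meaning.
        Pattern p = Pattern.compile(
                Pattern.quote(inPrefix) + "(.*)" + Pattern.quote(inSuffix));
        Matcher m = p.matcher(inputFileName);

        // matches() requires the entire name to fit the pattern.
        if (m.matches()) {
            // group(1) is the text between the prefix and the suffix.
            return Optional.of(outPrefix + m.group(1) + outSuffix);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // prints Optional[outfile_ERP20111020.txt]
        System.out.println(toOutputName(
                "infile_ERP20111020.txt", "infile_", ".txt", "outfile_", ".txt"));

        // prints Optional.empty (no "infile_" prefix, no ".txt" suffix)
        System.out.println(toOutputName(
                "readme.md", "infile_", ".txt", "outfile_", ".txt"));
    }
}
```

Returning an Optional makes the "no match" case explicit.  In the tJava version, a file that doesn't match simply skips the globalMap.put() call, so a downstream component would see whatever value "outputFileName" held from an earlier iteration; it may be worth handling that case in your own job.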
