However, there's a case for preferring code when the number of components needed for a simple task becomes unwieldy and hurts readability. Verifying a single file with a tFileExist is clear in its intent and functionality. Verifying several files with multiple tFileExists presents readability problems: significant Talend canvas real estate becomes cluttered with components that, taken together, perform one simple task.
In this case, I like to use code to reduce the clutter. This blog post shows how to do that with tGroovy, a component that runs Groovy, a Java-like scripting language that is more forgiving for beginners (and experts).
Using Components
This job expects 3 files to be present before processing. Otherwise, the job dies immediately. I want to do this check up front rather than start processing and find out in the middle of the program that I'm missing a critical file. For each required file, I've added a tFileExist / tDie pair. If the file exists, a Run If allows the program to proceed vertically, ending with the "Output a Message" tJava. If a file is not found, the Run If for the missing file evaluates to "true" and the corresponding tDie is called. The tDie aborts the program.
One tFileExist Per Required File
Loop to Reduce Number of Components
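For the component-based version, the Run If conditions can be sketched in plain Java as below. This assumes the component is named tFileExist_1 and that the conditions read the EXISTS flag that tFileExist stores in globalMap. The class and method names are hypothetical and exist only to make the expressions runnable outside Talend.

```java
import java.util.Map;

// Hypothetical helper class; in a real job only the expressions inside
// the methods would be typed into the Run If condition fields.
class FileExistConditions {

    // Run If leading to tDie: true when tFileExist_1 reported the file as absent.
    // Assumes tFileExist_1 has already run, so the key is present in globalMap.
    static boolean fileMissing(Map<String, Object> globalMap) {
        return !((Boolean) globalMap.get("tFileExist_1_EXISTS"));
    }

    // Run If leading to the next check: true when the file was found.
    static boolean filePresent(Map<String, Object> globalMap) {
        return ((Boolean) globalMap.get("tFileExist_1_EXISTS"));
    }
}
```

With three required files, this pair of conditions is repeated three times (tFileExist_1, tFileExist_2, tFileExist_3), which is the clutter the rest of the post tries to avoid.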
Groovy
If you're a Java developer, you can write a Talend Routine that checks an input list of file names and returns a "missingFile" value. If the return value is empty, then all the files exist. Otherwise, the non-empty return value holds the name of the missing file. The Talend Routine is ideal because it promotes reuse across jobs.
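A minimal sketch of such a routine might look like the following. The class name FileChecks and the method name firstMissingFile are hypothetical, and the sketch omits the package declaration that Talend Studio generates for routines.

```java
import java.io.File;
import java.util.List;

// Hypothetical routine; Talend Studio would generate it as a public class
// in the "routines" package.
class FileChecks {

    /**
     * Returns the first file name in the list that does not exist on disk,
     * or an empty string when every file is present.
     */
    public static String firstMissingFile(List<String> fileNames) {
        for (String fileName : fileNames) {
            if (!new File(fileName).exists()) {
                return fileName; // stop at the first missing file
            }
        }
        return ""; // empty means all files exist
    }
}
```

A calling job could pass the file names with java.util.Arrays.asList(...) and store the result in globalMap under "missingFile", so that a Run If and tDie can react to it.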
But if you're not a Java developer, you may be looking for an easier-to-program solution. Groovy is much more forgiving in terms of its syntax. This job uses a tGroovy script to reduce the number of components to 3 (down from 4 in the Loop version).
A Script Reduces the Number of Components
Script Configuration
The configuration of the tGroovy component starts with the passing of parameters. From the calling job, I pass in the globalMap object and map it to a Groovy variable, "gm". globalMap provides the mechanism by which I determine whether or not a file is missing. The other parameters pass in the file names.
The Groovy Program and its Parameters
def files = [ file1, file2, file3 ]
for( fn in files ) {
    java.io.File f = new java.io.File(fn);
    if( !f.exists() ) {
        gm.put( "missingFile", fn );
        break;
    }
}
The Run If leading into the tDie checks whether the "missingFile" item in the globalMap is non-null, meaning the script found a missing file. If so, the tDie is called. The other Run If checks for null and lets the job proceed.
"missingFile" is in globalMap
"missingFile" not in globalMap
The tDie component uses the "missingFile" item, not as a flag, but as a way to pull out the error message for the user from the tGroovy component.
tDie Prints Results from tGroovy
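The two Run If conditions and the tDie message can be sketched in plain Java as follows. In the job itself, only the expressions inside the methods would go into the Run If and tDie fields. The class name, method names, and message wording are hypothetical.

```java
import java.util.Map;

// Hypothetical helper class that makes the Talend expressions runnable
// outside a job.
class MissingFileChecks {

    // Run If leading to tDie: the script recorded a missing file.
    static boolean fileIsMissing(Map<String, Object> globalMap) {
        return globalMap.get("missingFile") != null;
    }

    // Run If leading to normal processing: nothing was recorded.
    static boolean allFilesPresent(Map<String, Object> globalMap) {
        return globalMap.get("missingFile") == null;
    }

    // tDie message: the stored value doubles as the text shown to the user.
    // The message prefix is an assumption, not the wording used in the job.
    static String dieMessage(Map<String, Object> globalMap) {
        return "Required file not found: " + (String) globalMap.get("missingFile");
    }
}
```

Because the script only writes "missingFile" when a check fails, a null lookup is enough to tell the two branches apart, and no separate boolean flag is needed.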
Summary
Consider writing small scripts to replace Talend Components for readability. Don't abuse this; Talend components are still preferred. However, if the job becomes so heavy with components that aren't essential to the understanding of the job, it may be worth eliminating them.
While loops and logic are important for your development and testing, I consider the really important parts of the job to be the data flows. That is, jobs can become 75% about framing the input or handling errors instead of about a key RDBMS-to-Web Service flow. The next guy maintaining your Talend job is more likely to need to find and fix a tMysqlInput query than file handling code.