If you're working with boolean data in Talend Open Studio, you can define schemas using the "boolean | Boolean" data type. This can then be used to write out a boolean value to the database. Sometimes, usually for presentation purposes, boolean values are encoded as string values like "Yes" or "N" or "False". The BooleanUtils class in the Commons Lang library provides some handy functions to keep Talend expressions uncluttered.
The following job uses tRowGenerator to create a record with 3 string values and a boolean: "Yes", "N", "true", false.
BooleanUtils Test Job |
import org.apache.commons.lang.BooleanUtils;
The tRowGenerator is configured to write a single record out using a String/String/String/Boolean schema. The single record contains the following values: "Yes", "N", "true", false.
tRowGenerator Schema |
Other String Used as Booleans
"Y" and "N" are often used, but they're not included in toBoolean(), so the method call is expanded to use toBooleanObject(). toBooleanObject() takes the strings used as true, false, and null in the method signature. So, this could work for languages other than English ("Oui", "Non")
Null Value Handling
BooleanUtils.toBooleanObject() will also handle a null value. For example, if you're data can be "Y", "N", or "N/A", make a call like this to return true, false, or null.
BooleanUtils.toBooleanObject(ynColumn, "Y", "N", "N/A")
The input schema for the tJavaRow is the same as the tRowGenerator schema (3 string fields and a boolean field). The output schema is all boolean.
tJavaRow Schema Converting Strings to Booleans |
BooleanUtils Code in tJavaRow Component |
Starting job BooleanTest at 17:23 16/03/2011. [statistics] connecting to socket on port 3526 [statistics] connected true|false|true|false [statistics] disconnected Job BooleanTest ended at 17:23 16/03/2011. [exit code=0]
Even Less Code
You can use a shorthand syntax for the toBoolean() and toBooleanObject() methods that leaves off the class-qualifying "BooleanUtils". This is done using Java's static import and will let you use the following in your Talend expressions (the BooleanUtils is left off).
output_row.col1 = toBoolean(yesNoColumn);
output_row.col2 = toBooleanObject(ynColumn, "Y", "N", "N/A")
The static import is applied in the tLibraryLoad component. Rather than import the class BooleanUtils, the Advanced settings tab of the component will have
import static org.apache.commons.lang.BooleanUtils.toBoolean;
import static org.apache.commons.lang.BooleanUtils.toBooleanObject;
A single "import static org.apache.commons.lang.BooleanUtils.*" could be used instead, but I usually import each function individually. The static import should be used sparingly because it can cause conflicts with the global namespace. That is if you import similarly named functions from different classes or packages using the wildcard notation, you could have a conflict.
Consistent Handling Throughout Job
While you can certainly form your own expressions using the tertiary operator of if statements, BooleanUtils can consistently apply the same logic and null handling throughout the job.
No comments:
Post a Comment