
Thursday, January 6, 2011

Java Libraries in Talend Open Studio

Although Talend Open Studio has a rich set of StringHandling functions, I prefer those in the StringUtils class of Apache Commons Lang. One of my favorite functions is "isBlank", which checks for null, the empty String, and a String composed only of whitespace. Fortunately, Talend provides an easy way to integrate this library call.

The Java library I'll be working with is Apache Commons Lang; its Javadoc documents the full StringUtils API.
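To make the difference between an empty check and a blank check concrete, here is a minimal, self-contained sketch. The class name BlankCheck is hypothetical: it re-implements the documented behavior of StringUtils.isEmpty and StringUtils.isBlank (using Character.isWhitespace) so the example compiles without the Commons Lang JAR. In a real job you would call org.apache.commons.lang.StringUtils directly.

```java
public class BlankCheck {
    // Mirrors StringUtils.isEmpty: true for null or a zero-length String.
    static boolean isEmpty(String s) {
        return s == null || s.length() == 0;
    }

    // Mirrors StringUtils.isBlank: true for null, "", or a String
    // made up only of whitespace characters.
    static boolean isBlank(String s) {
        if (isEmpty(s)) {
            return true;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return false; // found a real character
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {null, "", "   ", "bob", " bob "};
        for (String s : samples) {
            String shown = (s == null) ? "null" : "\"" + s + "\"";
            System.out.println(shown + " isEmpty=" + isEmpty(s) + " isBlank=" + isBlank(s));
        }
    }
}
```

For "   " the first check returns false and the second returns true; that whitespace-only case is what isBlank adds over a plain empty check.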

An Example
 
The following Open Studio job maps an input Excel file directly to an output Excel file. However, because some of the fields in the source are empty, null/empty-string checks are required to keep the output spreadsheet's columns aligned.

Talend Open Studio Job with tLibraryLoad Component
Select the tLibraryLoad component. In the Component panel, use the "Basic settings" tab to locate the JAR file. The "Advanced settings" tab has a text box for import statements; add the following import statement there.

import org.apache.commons.lang.StringUtils;

Then, in the tMap component, add a Java expression that makes the StringUtils.isBlank call.

tMap with a Commons Lang StringUtils Call
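The exact tMap expression from the screenshot isn't reproduced here. A typical pattern, assuming an input flow named row1 with a String column called name (both hypothetical), is a ternary on the output column such as StringUtils.isBlank(row1.name) ? "" : row1.name, which writes an empty string instead of a null or whitespace-only value; that is one way to make sure every output cell receives a value. The sketch below wraps that expression in a plain Java class so it runs outside Talend. The nested StringUtils is only a stand-in for org.apache.commons.lang.StringUtils, and Row is a hypothetical stand-in for the row structure Talend generates.

```java
public class TMapExpressionSketch {
    // Stand-in for org.apache.commons.lang.StringUtils so the sketch compiles
    // without the JAR; in Talend, tLibraryLoad and its import supply the real class.
    static final class StringUtils {
        static boolean isBlank(String s) {
            return s == null || s.chars().allMatch(Character::isWhitespace);
        }
    }

    // Hypothetical input row with a single String column.
    static final class Row {
        final String name;

        Row(String name) {
            this.name = name;
        }
    }

    // The expression you would type into the tMap output column,
    // wrapped in a method so it can be exercised directly.
    static String outputName(Row row1) {
        return StringUtils.isBlank(row1.name) ? "" : row1.name;
    }

    public static void main(String[] args) {
        Row[] rows = {new Row("Alice"), new Row(null), new Row("   ")};
        for (Row r : rows) {
            System.out.println("[" + outputName(r) + "]");
        }
    }
}
```

Choosing "" as the replacement keeps the output cell present but empty; a placeholder such as "N/A" works the same way if downstream consumers prefer an explicit marker.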
Deployment Note

When you use tLibraryLoad, the JAR file is copied into Talend's internals, which makes it eligible to be exported along with a job. Don't try to adjust the IDE's classpath, or any other classpath, to find your JAR.

Flexibility

This kind of flexibility opens up many integration possibilities. This example focused on some useful string-handling routines from a popular Java library, but there is so much Java code available that larger libraries like Hibernate or JUnit could also find their way into integration scenarios.

12 comments:

  1. Talend comes with a lot of JARs, including several versions of Commons Lang. Rather than browsing the file system for your downloaded JAR in the tLibraryLoad configuration panel, try scanning the popup menu for "Commons Lang 2.5".

  2. Hi Carl,

    I need to bring over Chinese characters and am using tLibraryLoad to load charsets.jar.

    My question for you: what should the corresponding import statement be?

    Thanks in advance.

  3. Hi,

    Consider upgrading to Java 6, which contains charsets.jar for the Chinese encodings Big5, GB18030, GB2312, and GBK. You can also try placing charsets.jar in the jre/lib/ext folder of the JRE used by Talend and of the JRE on the target platform.

  4. Thanks Carl. It works for one row (if only one row is in the source).

    When there is more than one row in the source, I get ???

  5. When I use tLibraryLoad to load charsets.jar, what should the corresponding import statement be (like import org.apache.commons.lang.StringUtils;)?

    Replies
    1. I don't think you'll need an import statement. These classes are sun.* implementations that will be instantiated by the more familiar java.io.* classes, such as the InputStreamReader below.

      import java.io.FileInputStream;
      import java.io.InputStreamReader;
      FileInputStream fis = new FileInputStream("test.txt");
      InputStreamReader isr = new InputStreamReader(fis, "GBK");

    2. Thanks Carl. Do we know why it works fine when there is only one source row, while TOS brings over ???? marks when there are multiple rows in the source?

      Is the charset being applied to only one row?

    3. Can you isolate the first row (the one that's working) and repeat it? That way you can make sure that the problem is not with the data. For example, the first row may be a header that uses standard characters and subsequent rows contain local variations that are causing the error.

    4. It brings over only one row with Chinese characters, and not just the first: any single row you select with a where clause comes through correctly. If the where clause returns more than one row, it displays ???

      I also tested the following case.

      I took one row at a time from the source by changing the filter. Running the same map multiple times, with a different where clause each time, transfers the entire table to the target correctly.

    5. Which components are you using?

    6. Oracle to Netezza: just source to target.
      Netezza to Netezza: just source to target.

    7. I haven't worked with Netezza yet. Can you produce a properly encoded text file using only a tOracleInput and a tFileOutputDelimited? On the "Advanced settings" tab of the text file component, set the character encoding to "CUSTOM" and enter the appropriate Chinese encoding: "Big5", "GBK", or "GB18030".

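The encoding advice in this thread can be checked in plain Java, outside Talend. The program below is a minimal sketch, not Talend-generated code; the class name GbkRoundTrip and its methods are hypothetical. It writes Chinese text with an OutputStreamWriter using GBK (the same kind of choice as a CUSTOM encoding on tFileOutputDelimited), reads it back with an InputStreamReader as in the reply above, and shows one common way "?" characters appear: encoding text with a charset that cannot represent it, such as ISO-8859-1, replaces each unmappable character with '?'. That is one possible cause, not a diagnosis of the problem described here.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GbkRoundTrip {
    // Encode text with the given charset, then decode it with the same charset.
    static String roundTrip(String text, Charset charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(buffer, charset)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(buffer.toByteArray()), charset))) {
            return in.readLine(); // single line, no terminator needed
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Encode with a charset that has no Chinese characters; each
    // unmappable character is replaced by the charset's '?' byte.
    static String lossyEncode(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String sample = "\u4e2d\u6587"; // the two characters for "Chinese"
        Charset gbk = Charset.forName("GBK"); // throws if the JRE lacks GBK
        System.out.println("GBK round trip intact: " + roundTrip(sample, gbk).equals(sample));
        System.out.println("ISO-8859-1 result: " + lossyEncode(sample));
    }
}
```

Comparing the two results helps separate a charset that is missing from the JRE (Charset.forName fails) from text that is being encoded with the wrong charset somewhere along the flow.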