
Thursday, August 9, 2012

Iterating Over a Java Collection with Talend Open Studio

When working with a third-party Java library in Talend Open Studio, you may need to work with Java Collections.  This post shows how to pass a java.util.List around a Talend job, loop over its contents, and print them.

You can integrate systems with Talend Open Studio via third-party libraries ("jar files").  For instance, a method call like this

     List<LoanProviders> providers = loanAdapter.getAllProviders();

might be used to return a java.util.List of objects that you can format, transport, or manipulate with Talend Open Studio.  While the underlying implementation of loanAdapter might call a database or a web service (things you can do conveniently with Talend Open Studio), there may be additional business logic, parameter handling, or other complexity that makes it hard to break apart the composed functionality of such a class.  In this case, it may be easier to call the Java API directly.
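To make the idea concrete, here is a minimal standalone sketch of consuming such an API.  The LoanAdapter interface, the LoanProviders class, and its getName() method are all hypothetical stand-ins for whatever the real jar provides; the stub adapter replaces the database or web service call a real implementation might make.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LoanAdapterDemo {

    // Hypothetical domain class standing in for what the third-party jar returns.
    static class LoanProviders {
        private final String name;
        LoanProviders(String name) { this.name = name; }
        String getName() { return name; }
    }

    // Hypothetical adapter; a real one might hide database or web service calls.
    interface LoanAdapter {
        List<LoanProviders> getAllProviders();
    }

    // Walks whatever List the adapter hands back and collects the provider names.
    static List<String> providerNames(LoanAdapter loanAdapter) {
        List<String> names = new ArrayList<>();
        for (LoanProviders provider : loanAdapter.getAllProviders()) {
            names.add(provider.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Stub adapter returning a fixed List in place of the real library call
        LoanAdapter stub = () -> Arrays.asList(new LoanProviders("Bank A"), new LoanProviders("Bank B"));
        System.out.println(providerNames(stub)); // prints [Bank A, Bank B]
    }
}
```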

This job creates a java.util.List, loops over the List, and outputs the results.  It's a demo job.  A real job might involve a database logging call or a web service call to request a loan application review.

Job Iterating Over a Collection and Printing A, B, and C
tSetGlobalVar


The job starts with a tSetGlobalVar component.  You may have seen this as a way to set up a simple structure like a file name (String), but you can put any Java object in the globalMap, including a List.


Creating a List Object and Saving it under "mylist"
I'm using a fully-qualified class name here.  Some components, like tJava, let you add an "import java.util.*;" statement to shorten the syntax.
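Because the screenshot only shows the component settings, here is a rough plain-Java sketch of what the tSetGlobalVar step amounts to.  It assumes the globalMap behaves like a java.util.Map<String, Object> (which is how generated job code treats it) and that the value expression is something like new java.util.ArrayList<String>(); the class and method names are illustrative only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GlobalMapDemo {

    // Stand-in for Talend's globalMap: String keys, Object values.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Roughly what the tSetGlobalVar row does: key "mylist", value a new ArrayList.
    static void setUp() {
        globalMap.put("mylist", new java.util.ArrayList<String>());
    }

    // Values come back as Object, so reading the List requires a cast.
    @SuppressWarnings("unchecked")
    static List<String> myList() {
        return (List<String>) globalMap.get("mylist");
    }

    public static void main(String[] args) {
        setUp();
        System.out.println(myList().isEmpty()); // prints true
    }
}
```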

tJava_1

tJava_1 fills the list with the values A, B, and C.  For a more interesting example of filling a List, visit this post.
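The tJava_1 code might look something like the body of the tJava1() method below (a sketch; the original screenshot is not reproduced here, and the surrounding class only simulates the globalMap).  One detail worth noting: since the globalMap stores a reference to the List object, no put() is needed after adding items.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FillListDemo {

    // Simulated globalMap, as in a generated Talend job.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Sketch of a tJava_1 body: fetch the List saved under "mylist" and add A, B, C.
    @SuppressWarnings("unchecked")
    static void tJava1() {
        List<String> mylist = (List<String>) globalMap.get("mylist");
        mylist.add("A");
        mylist.add("B");
        mylist.add("C");
        // No globalMap.put() needed: the map already refers to this same List object.
    }

    public static void main(String[] args) {
        globalMap.put("mylist", new java.util.ArrayList<String>()); // the tSetGlobalVar step
        tJava1();
        System.out.println(globalMap.get("mylist")); // prints [A, B, C]
    }
}
```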


Adding Items to a List
tLoop

There are several ways to iterate over a List in Java, such as an Iterator or an enhanced for loop.  I'm using a counter that is incremented with each iteration.  Be sure to use size() - 1 as the end of the loop, because List indexes are zero-based.
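Here are the three approaches side by side in plain Java.  The counter version assumes, as the advice above implies, that the loop's end value is inclusive, so it runs from 0 to size() - 1; the method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IterateDemo {

    // 1. Explicit Iterator
    static List<String> withIterator(List<String> list) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = list.iterator();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    // 2. Enhanced for loop
    static List<String> withForEach(List<String> list) {
        List<String> out = new ArrayList<>();
        for (String item : list) {
            out.add(item);
        }
        return out;
    }

    // 3. Counter with an inclusive end value, like a loop from 0 to size() - 1
    static List<String> withCounter(List<String> list) {
        List<String> out = new ArrayList<>();
        int from = 0;
        int to = list.size() - 1; // last valid index, since List indexes start at 0
        for (int i = from; i <= to; i++) {
            out.add(list.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> mylist = Arrays.asList("A", "B", "C");
        System.out.println(withIterator(mylist)); // prints [A, B, C]
        System.out.println(withForEach(mylist));  // prints [A, B, C]
        System.out.println(withCounter(mylist));  // prints [A, B, C]
    }
}
```

With an empty List, the end value is -1 and the counter loop simply never runs; using size() instead of size() - 1 with an inclusive end would throw an IndexOutOfBoundsException on the final pass.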


Defining a Loop in Talend Open Studio
tJava_2

This is the part of the job that actually does something.  In a real job, this would be the starting point for processing, which could include loading a file, calling a web service, adding a message to a queue, or writing a record to a database.  My example simply calls println.  Note the assignment at the top of the screenshot: it retrieves the List from the globalMap and casts it once, which saves typing in the rest of the code.
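A sketch of what the tJava_2 code could look like follows.  Here the current index is passed in as a parameter; in the job it would come from the tLoop counter, and the exact expression used to read that counter is not shown here, so treat the index handling as an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PrintCurrentDemo {

    // Simulated globalMap holding the List under "mylist".
    static final Map<String, Object> globalMap = new HashMap<>();

    // Sketch of a tJava_2 body; "index" stands in for the tLoop counter.
    @SuppressWarnings("unchecked")
    static String tJava2(int index) {
        // Cast once at the top so the rest of the code can use a typed local variable.
        List<String> mylist = (List<String>) globalMap.get("mylist");
        String item = mylist.get(index);
        System.out.println(item);
        return item;
    }

    public static void main(String[] args) {
        globalMap.put("mylist", new java.util.ArrayList<String>(java.util.Arrays.asList("A", "B", "C")));
        int last = ((List<?>) globalMap.get("mylist")).size() - 1;
        for (int i = 0; i <= last; i++) { // stands in for the tLoop from 0 to size() - 1
            tJava2(i); // prints A, then B, then C
        }
    }
}
```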


This simple job would be better written with off-the-shelf Talend components, running a tForEach into a tLogRow.  However, this example's purpose is to show you how to manipulate a Java Collection.  While printing A, B, and C isn't very useful, I've integrated third-party libraries with Talend, and if you have one that returns a Java Collection like a List, you may need to do something like this.


For another look at data structures, be sure to check out these posts


7 comments:

  1. Is there a screenshot missing for tJava_2 settings?

    Otherwise - great information as always, Carl!

    1. Hi,

      I uploaded the missing image. It's important because of the casting used, especially if you're not familiar with Java.

  2. I was wondering how to loop through a List when the list is part of a Talend schema.  I am using tSolrInput, and it has a multivalued field.  I would like to loop through the list of elements in the Talend schema.

    1. Hi Martin,

      Try using a tMap to convert the java.util.List into a comma-separated java.lang.String.  In the tMap output, define a schema identical to the source schema plus an additional String field for the converted list.  Use this Java code in the target cell mapping for the new field, where "row1.mylist" is the java.util.List:

      (row1.mylist!=null)?row1.mylist.toString().substring(1, row1.mylist.toString().length()-1).replace(" ", ""):null
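One caveat with the expression above: replace(" ", "") strips every space, including spaces inside the values themselves, so "New York" would become "NewYork".  If the values can contain spaces, a small helper like the hypothetical toCsv() below joins the elements without touching their contents (it could live in a user routine and be called from the tMap cell).  Values that themselves contain commas would still be ambiguous, so a separator that cannot appear in the data is the safer choice.

```java
import java.util.Arrays;
import java.util.List;

public class JoinListDemo {

    // Joins list elements with commas, leaving spaces inside elements intact.
    // Returns null for a null list, mirroring the tMap expression above.
    static String toCsv(List<?> list) {
        if (list == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < list.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(list.get(i)); // append(Object) uses String.valueOf, so null elements become "null"
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCsv(Arrays.asList("New York", "Los Angeles"))); // prints New York,Los Angeles
    }
}
```

On Java 8 and later, String.join(",", list) does the same for a List of Strings; the loop above also accepts a List<Object>, which is what a multivalued schema field may deliver.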

      Then continue processing the record using the techniques in this blog post:

      http://bekwam.blogspot.com/2013/03/creating-multi-valued-attribute-in.html

  3. Thanks Carl,

    I solved my problem by using a tJavaRow; here is the code:

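    // tJavaRow code: input_row.authfull is the multivalued (java.util.List) field
    for (Object tempAuthfull : input_row.authfull) {
        output_row.eid = input_row.eid;
        output_row.hubeid = input_row.hubeid;
        output_row.country = input_row.country;
        output_row.lang = input_row.lang;
        output_row.countrylang = input_row.countrylang;
        // Each pass overwrites the same output_row, so only the last element survives
        output_row.authfull = tempAuthfull;
    }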


    http://screencast.com/t/dVB5gTCk8rB
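The loop above runs, but tJavaRow produces exactly one output row per input row, so every pass overwrites the same output_row.  Producing one row per list element means creating a new output record for each element, as in this plain-Java sketch (InputRow and OutputRow are hypothetical classes, not Talend types).  Inside a job, one common route is the one suggested earlier in this thread: turn the List into a delimited String, then split it into rows with a component such as tNormalize.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FlattenDemo {

    // Hypothetical input record: an id plus a multivalued author field.
    static class InputRow {
        final String eid;
        final List<?> authfull;
        InputRow(String eid, List<?> authfull) { this.eid = eid; this.authfull = authfull; }
    }

    // Hypothetical flat output record: one author per row.
    static class OutputRow {
        final String eid;
        final String author;
        OutputRow(String eid, String author) { this.eid = eid; this.author = author; }
        @Override public String toString() { return eid + ":" + author; }
    }

    // Creates a NEW output record per element instead of reusing a single one.
    static List<OutputRow> flatten(InputRow in) {
        List<OutputRow> rows = new ArrayList<>();
        for (Object author : in.authfull) {
            rows.add(new OutputRow(in.eid, String.valueOf(author)));
        }
        return rows;
    }

    public static void main(String[] args) {
        InputRow row = new InputRow("e1", Arrays.asList("Hon", "Chow"));
        System.out.println(flatten(row)); // prints [e1:Hon, e1:Chow]
    }
}
```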

    1. Actually, the code above did not work out... It only kept the last element in the List...

      StringBuilder sb = new StringBuilder();
      String hubeid = "";
      String country = "";
      String lang = "";
      String countryLang = "";
      String contentType = "";
      // Loop through each author in the list
      for (Object author : input_row.authfull) {
          // Make sure the author name is valid before generating a row for this author
          if (Normalize.isValidAuthorName(author.toString())) {

              // The next for loops convert a list of one item to a string
              // and clean up the string, because tSolrInput adds [[ and ]] to the element.
              for (Object tempCountry : input_row.country) {
                  country = Normalize.cleanCountryAndLang(tempCountry.toString());
              }

              for (Object tempLang : input_row.lang) {
                  lang = Normalize.cleanCountryAndLang(tempLang.toString());
              }

              for (Object tempCountryLang : input_row.countrylang) {
                  countryLang = Normalize.cleanCountryAndLang(tempCountryLang.toString());
              }

              contentType = Normalize.removeSquareBraquets(input_row.contenttype);

              hubeid = input_row.hubeid;

              // The following code takes all the input columns (author, hubeid, country, lang, ...)
              // and merges them into one long column named denormalizedRows.
              // "@@@" is used later in the job to extract columns.
              // "###" is used later in the job to generate rows.
              // The new column looks something like:
              // HONJIMMYKF@@@Hon, Jimmy K.F.@@@1-s2.0-S0299221308X00070@@@FR@@@eng@@@FR_eng@@@JL###CHOWANDRE@@@Chow, Andre@@@1-s2.0-S0299221308X00070@@@FR@@@eng@@@FR_eng@@@JL###
              sb.append(Normalize.normalizeAuthor(author.toString()));
              sb.append("@@@");
              sb.append(author.toString());
              sb.append("@@@");
              sb.append(hubeid);
              sb.append("@@@");
              sb.append(country);
              sb.append("@@@");
              sb.append(lang);
              sb.append("@@@");
              sb.append(countryLang);
              sb.append("@@@");
              sb.append(contentType);
              sb.append("###");
          }
      }
      // Presumed final step (not in the snippet as posted): write the result to the denormalizedRows column
      output_row.denormalizedRows = sb.toString();
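To illustrate the later step the comments describe ("###" separates rows, "@@@" separates columns), here is a plain-Java sketch that splits a denormalizedRows value back apart.  The split() helper is hypothetical; in the job itself this is presumably done with components (for example tNormalize for rows and tExtractDelimitedFields for columns), but the screenshots that would confirm which ones are not available.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitDenormalizedDemo {

    // "###" separates rows and "@@@" separates columns within a row.
    static List<String[]> split(String denormalizedRows) {
        List<String[]> rows = new ArrayList<>();
        for (String row : denormalizedRows.split("###")) {
            if (!row.isEmpty()) {                // skip empty pieces, e.g. from an empty input
                rows.add(row.split("@@@", -1));  // limit -1 keeps trailing empty columns
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String sample = "HONJIMMYKF@@@Hon, Jimmy K.F.@@@1-s2.0-S0299221308X00070@@@FR@@@eng@@@FR_eng@@@JL###"
                + "CHOWANDRE@@@Chow, Andre@@@1-s2.0-S0299221308X00070@@@FR@@@eng@@@FR_eng@@@JL###";
        for (String[] columns : split(sample)) {
            System.out.println(columns[1]); // prints Hon, Jimmy K.F. then Chow, Andre
        }
    }
}
```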



      Here are screenshots of the job:




