Featured Post

Applying Email Validation to a JavaFX TextField Using Binding

This example uses the same controller as in a previous post but adds a use case to support email validation.  A Commons Validator object is ...

Tuesday, May 3, 2011

Running Count in Talend Open Studio

Most Talend components keep a count of the records processed using variables like NB_LINE or NB_LINE_OK.  But these are only available after all processing is completed.  Define your own counter variable to keep a running count for use in a tMap.

Variables like tFilterRow.NB_LINE or tAccessOutput.NB_LINE_INSERTED can be used to report the number of affected lines after a subjob's processing.  However, it may be of use to get the current line index for use in a tMap.  The index variables used to form NB_LINE aren't available during processing; they're only written out the globalMap at the end of processing.

In this example, staging records are loaded from Excel to Access.  The order in which the Excel records are read is preserved in a database column called DISPLAY_SEQ_NB.  Note that there is an auto-increment column used for record ID in the Access table.  This could be used to infer a loading order, but this job uses a separate column to keep the ID as a meaningless surrogate key to help with maintenance later.  (I can swap in a record at the same DISPLAY_SEQ_NB without having to work against the auto-incrementing mechanism.)

Talend Staging Job Using a Counter
Step 1: Define the Counter Variable

To define the counter variable, use a tSetGlobalVar.  Define a global with an initial value.  In this case, the job uses an unquoted 0 to set it as an Integer which will support an increment later.

tSetGlobalVar

Step2: Use the Variable

Use the variable in a tMap.  Retrieve the value using the globalMap and cast to the Integer type.

tMap Using budgetFileCounter Variable

Step 3: Increment the Counter

Use a tJavaRow to increment the counter.  First, use the "Generate Code" feature to pass the input fields directly to the output.  Next, add a line of Java code that unpacks the variable stored in the globalMap into a Java primitive type that can be manipulated.

Incrementing budgetFileCounter Variable
 Most component report the outcome of their processing using CID-named global variables like 'tFilterRow.NB_LINES_OK'.  However, these variables are only available after the processing has been completed.  If you want to keep a running count, set your own variable.

14 comments:

  1. An alternative is to use 'sequence' in the tMap expression, e.g. Numeric.sequence("linenumber",1,1)
    This sets up the sequence to start at 1, and increment by 1.

    ReplyDelete
    Replies
    1. Thanks. Use this alternative (Numeric.sequence()) since it's standard. I'll keep the post up as an example of using Java classes in Talend.

      Delete
  2. Hi Guys,
    First thanks for your post.
    I tried to apply this method to my talend job (parsing CSV files to XML) and unfortunately, it doesn't work.

    Something is wrong and I can't identify what. My global variable is always at 0.

    My first step is a SetGlobalVar : "Returns" at 0
    In the TMAP output I put this : (Integer)globalMap.get("NbReturns")

    This is the code of my TJavaRow :
    After the generic code :
    Integer i = (Integer)globalMap.get("NbReturns");
    globalMap.put("NbReturns", new Integer(i.intValue() + 1));

    And maybe important thing, all my links are Main

    Thanks in advance for you help.

    BR

    ReplyDelete
    Replies
    1. One more stuff. If I treat 2 files, the first has the value at 0 and the second at more than 0 but it doesn't correspond to the result than I would

      Delete
    2. Can you verify that the tJavaRow is called for each line of input? Maybe there is a component in the middle that is holding back some part of the main flow? Try a "System.out.println("here");" in the tJavaRow and make sure that there are 2 "here's", one for each file.

      Delete
    3. Hi Carl,

      Thanks for your answer.

      I've a here each time I've a line. This is right. But my variable is still at 0

      Unfortunately I can't post a pic of my flow, I'll try to explain.

      1) tSetGlobalVar (with the var I'm using to count the lines)
      2) tFileList
      3) tFileInputDelimited
      4) tMap
      5) tJavaRow
      6) tAdvancedFileOutputXML

      Between the 3 and the 4 I've a maik link
      Between the 4 and the 5 I've a main link
      Between the 5 and the 6 I've a main link

      I hope it's clear.

      Dario

      Delete
    4. One more information.
      I see that the first file is at 0 and the second contains the number of lines of the first file and the third the total of the 1 and 2 and go on

      Delete
    5. Hi,

      If you're still working on this, can you try flipping the order of 2 and 1? You'll want to reset the number of lines with each new file in the file list.

      Delete
  3. I have to create a new talend job that add a new column in target table. I have to implement auto-incremental logic in the job ...

    So that I have to get the maximum id plus one every time I insert a value in the table for each time running the job ...

    For example , If i run the job first time i have 100 records loaded to target table with max id (key is 100) - (so 100 records inserted)

    Next time if i run the job for next set of data's for inserting 10 new or updating 10 old data, my key should be start as 101 to 110 for insert and same key shud be updated again in case of update job... Not again from 1 to 100 ... How to achieve in talend ... can u you suggest me ?

    in tmap component can be achieved or which component to be used ? and also whether to use context / variable / expression ?? give me steps to achieve

    ReplyDelete
    Replies
    1. I always rely on the database for providing unique ids, whether it's the identity columns in SQL Server or an external sequence in Oracle. These vendor-specific features are the only way to fend off concurrency problems. However, I'll provide a suggestion to get you up-and-running.

      This recommendation assumes that your Talend job will be the only process updating the table while it's running. If that's an ok assumption, then get the MAX value + 1 at the start of the job, increment the value as you process, and use the value in the insert statement.

      1. At the start of the job, use a tSetGlobalVar component to define a global variable "idCounter" and set it to "new Integer(1)".
      2. Add a DB input component based on the query "SELECT MAX(val)+1 FROM tbl".
      3. Add a tJavaRow and connect it to the input component from Step 2.
      4. In the tJavaRow, assign the incoming value to idCounter. "globalMap.put("idCounter", input_row.val);"
      5. In the main data flow, tMap the value from globalMap.get("idCounter") to your output schema (which is headed to a DB output component).
      6. After the tMap, use a tJavaRow to increment idCounter.

      If there are no records in the table initially, Step 2 won't run, and idCounter will be set to 1 as defined in Step 1.

      I reserve my context variables for processing context items like configuration that define the execution environment.

      Delete
  4. how to reads the records(row count,deleted,updted,job name,project name) from multiple components in a job dynamically by using global varibles

    ReplyDelete
  5. Thanks to this post and the mention of "Numeric.sequence("linenumber",1,1)" in the comments I was able to solve my problem, thanks a lot!

    ReplyDelete
  6. hi guys! I am new to talend and I would like to ask for your help on my problem. please see https://www.talendforge.org/forum/viewtopic.php?pid=189405#p189405

    Appreciate any help. thank you!

    ReplyDelete
  7. Tks for your solution, worked fine for me.

    ReplyDelete