Tuesday, May 24, 2011

Passing Parameters and Variables to Child Jobs in Talend Open Studio

Jobs written in Talend Open Studio communicate with child jobs using Context Variables.  To return data from a child job using a Context Variable, define an Object variable that is a java.util.Map.

Out of the box, Talend Open Studio's Context Variable facility supports passing parameters from a parent job into a child job.  However, because of Java's language semantics, the reverse (returning values out of the child job) needs extra code.

Something like 'context.MYVAR="newvalue"' works within a job, but if done in a child job, the setting isn't retained.  globalMap doesn't help.  Every job has its own globalMap, including child jobs.

To overcome this, store the values in a Map, which gets around the Java language limitation.  A child job can't reassign a context variable in a way the parent will see, but it can modify an Object that both jobs reference.
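The distinction between reassigning a variable and modifying a shared object can be seen outside Talend in plain Java.  The sketch below is only an illustration: the Context class, copyForChild, and runChild are hypothetical stand-ins, and it assumes the child job works on its own copy of the parent's context fields.

```java
import java.util.HashMap;
import java.util.Map;

public class ContextPassingSketch {

    // Hypothetical stand-in for a job's context; not Talend's generated class.
    static class Context {
        String MYVAR;
        Object SUBJOB_CONTEXT;
    }

    // Assumption: the child gets its own Context whose fields are copied from
    // the parent's. For the Object field, only the reference is copied.
    static Context copyForChild(Context parent) {
        Context child = new Context();
        child.MYVAR = parent.MYVAR;
        child.SUBJOB_CONTEXT = parent.SUBJOB_CONTEXT;
        return child;
    }

    @SuppressWarnings("unchecked")
    static void runChild(Context context) {
        // Reassignment: changes the child's field only.
        context.MYVAR = "newvalue";
        // Mutation: changes the one Map object that parent and child both reference.
        ((Map<String, String>) context.SUBJOB_CONTEXT).put("retVal", "Ok");
    }

    static Context runParent() {
        Context parent = new Context();
        parent.MYVAR = "original";
        parent.SUBJOB_CONTEXT = new HashMap<String, String>();
        runChild(copyForChild(parent));
        return parent;
    }

    public static void main(String[] args) {
        Context parent = runParent();
        System.out.println("MYVAR=" + parent.MYVAR);   // prints MYVAR=original
        System.out.println("retVal=" + ((Map<?, ?>) parent.SUBJOB_CONTEXT).get("retVal"));   // prints retVal=Ok
    }
}
```

The String assignment is lost because it only rebinds the child's field, while the put() call lands on the Map instance the parent still holds.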

Consider the following job.
Job Creating Subjob Context Used in Child Job
The Job uses three components: 2 tJavas and a tRunJob.  The first tJava creates a java.util.Map object and sets it to a context variable defined as an Object.

Context Variable Defined as Object
Set Child Job Context

In the first tJava, "Set Subjob Param", add the following Java code to the Basic settings.  You'll also need to import java.util.Map and java.util.HashMap on the Advanced settings tab.

context.SUBJOB_CONTEXT = new HashMap<String, String>();

((Map<String, String>)context.SUBJOB_CONTEXT).put("param", "value1");


This will create a Map object that can be accessed throughout the job and its child jobs.  This code block sets a single entry "param" which will be an input parameter for the child job.  Note that only "param" was set here; other components and child jobs can add to this.

tRunJob

The tRunJob component is configured to "Transmit whole context".  It must NOT use an independent process for this technique to work.  I believe that all separate process params must be strings.  (Look for a future post on this.)

tRunJob Config


The Child Job

The Child Job is defined as a single tJava.  It is configured with an identical Context View as the parent.  See the "Context Variable Defined as Object" screenshot.

Child Job

The tJava component prints the input parameter, unpacks the context.SUBJOB_CONTEXT map, and puts its own return value on it.  Remember to also import java.util.Map on the Advanced tab.

// print input param

System.out.println("param=" + ((Map<String, String>)context.SUBJOB_CONTEXT).get("param"));

// set ret val
((Map<String, String>)context.SUBJOB_CONTEXT).put("retVal", "Ok");
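Once the tRunJob finishes, the parent can read "retVal" from the same Map.  The following is a minimal standalone sketch of that lookup, not code taken from the job above.  The helper name readFromContext is hypothetical, and the instanceof check is an added precaution for the case where the variable does not arrive as a Map.

```java
import java.util.HashMap;
import java.util.Map;

public class ReadReturnValueSketch {

    // Hypothetical helper: looks up a key in a context variable declared as Object.
    // Returns null when the variable is not a Map (for example, if it arrived as a String).
    static String readFromContext(Object contextVar, String key) {
        if (!(contextVar instanceof Map)) {
            return null;
        }
        Object value = ((Map<?, ?>) contextVar).get(key);
        return value == null ? null : value.toString();
    }

    public static void main(String[] args) {
        // Stand-in for context.SUBJOB_CONTEXT after the child job has run.
        Map<String, String> subjobContext = new HashMap<String, String>();
        subjobContext.put("param", "value1");   // set by the parent
        subjobContext.put("retVal", "Ok");      // set by the child

        System.out.println("retVal=" + readFromContext(subjobContext, "retVal"));
        System.out.println("from a String: " + readFromContext("{param=value1}", "retVal"));
    }
}
```

In a tJava placed after the tRunJob, the equivalent one-liner would be ((Map<String, String>)context.SUBJOB_CONTEXT).get("retVal"), using the same cast as the code above.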

Using an indirect approach, you can create a context for your child jobs.  This introduces a global scope for all your jobs so watch for side effects.


 

22 comments:

  1. Hi, I found this post very useful. However, how do you do this backwards, I mean, transfer a context variable from a child job to a parent job, or between jobs at the same level?

  2. Hi Aiurea,

    To organize your jobs' parameters, I recommend a ContextGroup paired with a properties file for a group of related jobs (child or same level). Parameterize everything in all your jobs, and add these to the ContextGroup. You may need to develop a naming scheme where job-specific parameters are prefixed with a qualifying string.

    This central registration of parameters will help to enforce consistency. For example, there's no reason to re-define LOG_DIR if there's only one /logs directory used.

    For any of the parameters that will change during deployment, also include a properties file with overridden values. Search "applying contexts" in this blog for a post on the components needed for the properties file. Such a properties file could be large, but it's easily managed and manipulated in a text editor.

    You can also apply several ContextGroups to a class of jobs. To do this, drag each ContextGroup to the job's Context tab. I haven't needed this in practice.

  3. Hi, Useful article.

    There are a couple of things not quite right that trip up the cut-and-paste novice for a while.

    ((Map)context.SUBJOB_CONTEXT).put("retVal", "Ok");

    this should read ((HashMap...

    In the first tMap, "Set Subjob Param"

    this should read tJava

    Helpful site. Thanks.

    Replies
    1. Hi,

      Map is an interface and HashMap is a Map, so the assignment statement is correct. Be sure to import java.util.Map in the tJava. The benefit of this is that the Map can be swapped out in the context screen without affecting the subjob code.

    2. Yes. Thanks!
      I missed the all important import.

  4. Hi, related to the above concept, I have a question: I have a single job that reads files from a folder, processes them, and stores them in a DB. There are 10 folders. Can I make the single job run simultaneously 10 times for 10 different folders?

  5. hi how do we import the java.util.Map?

    Replies
    1. There are two ways to use the Map class

      1. Fully qualified class name. When referring to Map or HashMap, use the fully qualified names java.util.Map and java.util.HashMap.
      2. Add the imports to the Import box on the Advanced Settings tab in certain components

      Imports can make an explicit reference to a class like

      import java.util.Map;
      import java.util.HashMap;

      Or wildcarded

      import java.util.*; // imports everything in util incl. Map and HashMap

  6. Hi,
    I have a problem - when I pass the Map, toString is done on it while calling the subjob and I don't get it back as Map in the called job but as String - and I get ClassCastException.
    A quick help would be mighty appreciated please.
    -BK

  7. Hi, thank you for post, I find it very useful.
    I am working a lot with CSV files. I am interested in setting the context variable to some value found in CSV file fields. How can I approach this connection between tJava and CSV files ?
    Matko

    Replies
    1. It sounds like you may want to set a global variable rather than a context variable. I use context variables to define the execution environment and configuration of a job (database connect info, root web service url, log file location, etc) and global variables to record information gathered during processing.

      Take a look at this post to pull a value from your input flow (a CSV) to be set using a tSetGlobalVar. Once your values are global variables, they can be retrieved in any tJava. In the post, substitute the tFixedFlowInput for your input component.

      http://bekwam.blogspot.com/2014/05/use-tsetglobalvar-to-record-count.html

  8. Hi Carl,

    I am new to Talend and am hoping there is a solution to my problem.

    I have a strange problem that I think is related to this issue. I use tContextLoad to update the context variables at the beginning of a job, and at the end of the job I update a couple of context variables using tJavaRow before using tContextDump to output the context variables back into a file. To set the context variables in tJavaRow I use the following code.

    context.Tdate = input_row.Tdate;
    context.Name = input_row.Name;

    In order to confirm it has changed I have the following code

    System.out.println(context.Tdate);
    System.out.println(context.Name);

    When I run the code the log shows both context variables changed to the new values. But the output of tContextDump (which is set to execute only after tJavaRow component is ok) still shows the old values.

    What am I doing wrong. Thanks in advance for your help.

    Regards,

    Shivaram

    Replies
    1. Hi Shivaram,

      I looked at the source of tContextDump this morning. I don't know why this won't work from a Java programming perspective. I suspect the problem may be in slipping the tContextDump (a row producer) in with the tJavaRows and that the code generation is not producing the loops that you're expecting.

      I'd recommend not mixing the job execution context and parameterization (tContextLoad) with your data flows (Name, Tdate), and instead using global variables. Use the tContextDump as-is and append the global variables to the same output component.

      Take a look at this post too. It's a general warning for global variable use. We expect our flows to be loops run in sequence. However, if there's a component in the middle that uses Java Threads, this can break.

      http://bekwam.blogspot.com/2015/01/replacing-twritejsonfield-with-routine.html

      Good luck

  9. When I'm using "Use an independent process to run subjob", how can I pass JVM parameters like -Xmx to the new process? In Talend Studio I can add this parameter under "preferences->Talend->Run/Debug" as an argument, but when I deploy the job on a jobserver these settings are not used. Anyway, I don't want to use the same settings for all subjob processes. I would like to define them individually for each subjob. Is this possible?

    Replies
    1. I don't think so.

      I took a look at some generated .javajet code from a Talend job using a tRunJob and the source on GitHub. I didn't see where you could add anything in TOS that would find its way into a subjob java.exe process' JVM settings.

      You can affect the memory settings used in a tRunJob subjob through the Window > Preferences > Talend > Run / Debug tab. These values will be pulled into the generated code. Verify the settings by checking "Print Parameters" on the Advanced tab of the tRunJob component view when you run the job.

      So, increase the overall RAM available to the job and tune the size of the subjobs with Preferences.

      Good luck.

  10. Hi,

    I would like to run a subjob independently for testing purposes. This job takes an Object as a context parameter. How can I pass the Object? In the job's Context tab I set the type to Object and typed something like "(MyObject) new MyObject()", but Talend interpreted that as a String and not as an Object.

    How am I supposed to handle that?

    Thanks for the answer.

    Replies
    1. Try normal context_param passing of simple types rather than the Map. The premise of this post is to get a result out of the subjob and back to the parent. If there's a number of the subjob parameters that are read-only input parameters, factor those out so they can be handled consistently between the tRunJob and the standalone modes.

      For the result that you want to communicate with the caller...Provide a default empty map in the Contexts tab that can be overridden by the caller. If you need to initialize the empty map with some test values, use Java code at the start of the job.

      Good luck

  11. hi
    I used tJavaRow to build an array of objects and made that array global. When I read that array in a tJava, I was not able to get all the rows; I only get the last row.

    The code that I used is given below..
    org.json.JSONArray a=new org.json.JSONArray();

    JSONObject Line_attribute_object = new JSONObject();

    JSONObject pod_object = new JSONObject();

    pod_object.put("RecordNumber",pod.RecordNumber);

    pod_object.put("Company",pod.Company);

    pod_object.put("InboundOrder",pod.InboundOrder);

    pod_object.put("RecordType",pod.RecordType);

    pod_object.put("LineNumber",pod.LineNumber);

    pod_object.put("SubLine",pod.SubLine);

    pod_object.put("UPC", pod.UPC);

    pod_object.put("ExpectedQty",pod.ExpectedQty);

    pod_object.put("ETADate",pod.ETADate);

    pod_object.put("CountryofOrigin",pod.CountryofOrigin);

    pod_object.put("StyleDivision",pod.StyleDivision);

    pod_object.put("Lot", pod.Lot);

    a.put(pod_object);

    //System.out.println(a);

    globalMap.put("val",a);

    In the tJava I read it by using..

    org.json.JSONArray js = (org.json.JSONArray)globalMap.get("val");

    System.out.println(js);

    The output that i want is 54 objects inside a single array...
    can u help me.

    Thank you.

  12. what if one of the contexts is of type Date

  13. Thank you! It really helped me.

  14. Hi, I'm new to Talend. Can you please help me with how to overwrite a context variable in a child job if I'm using the "Transmit whole context" option?
