JavaFX Tutorials

Sunday, January 1, 2012

Top Ten Timesaving Talend Tips of Two Thousand Eleven: Part 2

This is the second part of my ten time-saving recommendations for Talend Open Studio for Data Integration from 2011.

#5 Re-use connections

Each RDBMS has a connection component, such as tOracleConnection or tMSSqlConnection.  Add a connection component to your job and reference it with the "Use an existing connection" option in other DB components like tMSSqlOutput or tOracleInput.  This centralizes the connection configuration, which includes items like username/password, auto-commit settings, and JDBC properties.

A Job with a tMSSqlConnection Component

#4 Properties files

When you're managing different environments, particularly a production environment, a text-based properties file is a convenient way to configure your jobs.  The properties file can be versioned, is easily readable, and can be compared across environments with Linux commands like "diff".


This is a video on using properties files in Talend Open Studio.
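As a rough illustration of why a properties file is easy to work with, here is a minimal plain-Java sketch that parses one with java.util.Properties.  The keys (db.host, db.port, db.user) and their values are hypothetical, and inside a job the file would normally be read by Talend components rather than hand-written code like this.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class JobPropertiesSketch {

    // Parses key=value text into a Properties object.
    // In a real job the text would come from a file on disk.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical settings for a dev environment
        String devFile = "db.host=devdb01\n"
                       + "db.port=1433\n"
                       + "db.user=etl_dev\n";

        Properties props = parse(devFile);
        String host = props.getProperty("db.host");
        // The second argument is a default used when the key is missing
        int port = Integer.parseInt(props.getProperty("db.port", "1433"));

        System.out.println(host + ":" + port); // prints devdb01:1433
    }
}
```

Because each setting sits on its own line, a "diff" between the dev and prod files shows exactly which values differ.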

#3 Context groups

The standard way to parameterize a set of Talend Open Studio jobs is through Context Groups.  These are sets of global variables grouped by environment (dev, test, prod); the active environment can be selected when exporting a job or in the Run View.


Run View Referencing Several Contexts

This post describes using Contexts and exporting them in more detail: Applying Contexts to Talend Open Studio Jobs.
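The idea behind a Context Group can be pictured in plain Java as one set of variable names with a different set of values per environment.  This is only an analogy, not the code Talend generates; the class name, variable names (dbHost, dbName), and values are all made up for illustration.

```java
import java.util.Map;

public class ContextGroupSketch {

    // One entry per environment; every environment defines the same variables.
    // All names and values here are hypothetical.
    static final Map<String, Map<String, String>> CONTEXTS = Map.of(
        "dev",  Map.of("dbHost", "devdb01",  "dbName", "sales_dev"),
        "test", Map.of("dbHost", "testdb01", "dbName", "sales_test"),
        "prod", Map.of("dbHost", "proddb01", "dbName", "sales")
    );

    // Returns the value of a variable in the chosen environment.
    static String lookup(String environment, String variable) {
        Map<String, String> context = CONTEXTS.get(environment);
        if (context == null) {
            throw new IllegalArgumentException("Unknown context: " + environment);
        }
        String value = context.get(variable);
        if (value == null) {
            throw new IllegalArgumentException("Unknown variable: " + variable);
        }
        return value;
    }

    public static void main(String[] args) {
        // Switching environments changes the values, not the job logic
        String environment = args.length > 0 ? args[0] : "dev";
        System.out.println(lookup(environment, "dbHost"));
    }
}
```

The job refers only to the variable name; which value it gets depends on the context chosen at run or export time.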

#2 Use queries

While Talend Open Studio will generate queries based on a table for input components like tOracleInput, you can save your own queries and reference them throughout your jobs.  This has two advantages.  The first is to allow for queries that span multiple tables and that exceed the query-generation capability of Talend Open Studio (think Oracle set-based operations).  The second is to produce a more robust job by leaving out irrelevant columns that may be removed later.


For example, if a lookup involves only a name and an id field, there's no need to add other fields that may be dropped before the job goes to production.  If a column is dropped and it's not relevant to the query, it shouldn't break the job.
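A small Java sketch of the id/name lookup case follows.  The table name CUSTOMER, the column names, and the helper method are hypothetical; the point is only that the query lists exactly the columns the lookup needs, so dropping any other column from the table leaves the query text valid.

```java
import java.util.List;

public class LookupQuerySketch {

    // Builds a SELECT that names its columns explicitly instead of
    // selecting every column in the table.
    static String selectColumns(String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("At least one column is required");
        }
        return "SELECT " + String.join(", ", columns) + " FROM " + table;
    }

    public static void main(String[] args) {
        // Hypothetical lookup that needs only an id and a name
        String query = selectColumns("CUSTOMER", List.of("ID", "NAME"));
        System.out.println(query); // prints SELECT ID, NAME FROM CUSTOMER
    }
}
```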

A Talend Open Studio Query

#1 Durable schemas

Schemas should be based on the Repository rather than Built-in wherever possible.  In some cases, components like tMSSqlOutput can be adjusted to ignore columns for a write operation using the Advanced tab.  That way, a complete set of columns can still be referenced in a Repository schema, but there won't be any contention over auto-generated fields.


This tip also works with #2 to support more robust jobs.  If a subset of fields is used repeatedly -- say an id/name pair -- define it as a Generic Schema or other Schema and store it in the Repository.  That way, the field list never gets out of sync with the database (as long as the lookup fields are still valid).

Best of luck to all the Talend coders in 2012.

3 comments:

  1. Hi, I have a piece of Java code in a tJava node that generates a dynamic date variable such as '201311', like this:

    int thisyearmonth = Integer.parseInt(TalendDate.formatDate("yyyyMM", TalendDate.getCurrentDate()));
    context.thisyearmonth = Integer.toString(thisyearmonth);

    Now I would like to apply this dynamic variable to all my jobs. Following the Talend manual, I have created a Context group for this variable 'thisyearmonth'. However, where should I put the Java code that sets its value? Is there a way to create this variable once and not have to set its value every time? Thank you!

    1. If you are using Talend Enterprise edition, create a joblet with the above code and use that joblet in all your jobs.

      Thanks,
      suresh
      www.msureshreddy.blogspot.com

  2. A joblet is useless for that; use the tSetGlobalVar component instead. Regards
