Thursday, May 17, 2012

Manipulating a tHashOutput in Talend Open Studio

The Talend Open Studio tHashOutput and tHashInput components let you save your input in RAM, offering potential performance gains.  The basic usage defines a single tHashOutput that gathers input and a tHashInput that feeds the stored input into a data flow.  This post describes two expanded configurations.

tHashOutput and tHashInput work with input stored in memory, and they do so in a way consistent with other Talend components.  The Hash components let you define flows that retrieve data from a map that has been stored by another part of the job.  In a simple scenario, this is done with a single input/output pair.
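
To make the idea concrete, here is a minimal Java sketch of a single output/input pair. It is a simplified model, not Talend's generated code or API: the class HashStoreSketch, its static STORE map, and the component name "tHashOutput_1" are hypothetical, and rows are plain strings rather than Talend row structures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the tHashOutput/tHashInput idea. This is NOT Talend's
// generated code or API; HashStoreSketch and "tHashOutput_1" are illustrative.
class HashStoreSketch {
    // Job-wide RAM store: each tHashOutput name maps to the rows it has gathered.
    static final Map<String, List<String>> STORE = new HashMap<>();

    // Plays the role of a tHashOutput: collect incoming rows in memory.
    static void hashOutput(String component, List<String> rows) {
        STORE.computeIfAbsent(component, k -> new ArrayList<>()).addAll(rows);
    }

    // Plays the role of a tHashInput: replay the stored rows into a new flow.
    static List<String> hashInput(String component) {
        return new ArrayList<>(STORE.getOrDefault(component, List.of()));
    }

    public static void main(String[] args) {
        // Subjob 1: an input flow is written to RAM.
        hashOutput("tHashOutput_1", List.of("alice;30", "bob;25"));
        // Subjob 2, later in the job: the stored rows feed another flow.
        for (String row : hashInput("tHashOutput_1")) {
            System.out.println(row);
        }
    }
}
```

Running main prints the two stored rows, in order, from the second "subjob". Returning a copy from hashInput keeps a downstream flow from accidentally modifying the stored data.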

Multiple Sources

This screenshot shows a job that merges two data sources (a tRowGenerator and a tFileInputDelimited) into a single Hash data structure using two tHashOutputs.  The first tHashOutput is referenced by each subsequent tHashOutput through the "Link with a tHashOutput" control.


Configuration of Linked tHashOutput
This tHashOutput refers to the first component.
tHashOutput Referring to Prior Component
The combined data sets are available through the tHashInput.  Because the two tHashOutputs are linked, it doesn't matter which of them is selected in the Component List drop-down.

tHashInput Configuration
None of the Data Write Model, Keys Management, or Append settings has any effect in this job.  Data Write Model has only one value in its drop-down.  I think Keys Management has a bug in version 5 (see TDI-21180).  Append takes effect only in an iteration.
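
The linking can be modeled in a few lines of Java. In this hypothetical sketch (again not Talend's API), a LINKS map plays the role of the "Link with a tHashOutput" control: a linked output resolves to the first output's store, so a lookup by either name returns the combined rows. The component names and row values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only, not Talend's API: a "linked" output writes into the
// store owned by the first output instead of creating its own.
class LinkedHashOutputSketch {
    static final Map<String, List<String>> STORE = new HashMap<>();
    // Maps a linked output's name to the output it is linked with.
    static final Map<String, String> LINKS = new HashMap<>();

    // Mirrors the "Link with a tHashOutput" setting: find the store to use.
    static String resolve(String component) {
        return LINKS.getOrDefault(component, component);
    }

    static void hashOutput(String component, List<String> rows) {
        STORE.computeIfAbsent(resolve(component), k -> new ArrayList<>()).addAll(rows);
    }

    static List<String> hashInput(String component) {
        return new ArrayList<>(STORE.getOrDefault(resolve(component), List.of()));
    }

    public static void main(String[] args) {
        LINKS.put("tHashOutput_2", "tHashOutput_1");            // second output links to the first
        hashOutput("tHashOutput_1", List.of("gen-1", "gen-2")); // e.g. rows from tRowGenerator
        hashOutput("tHashOutput_2", List.of("file-1"));         // e.g. rows from tFileInputDelimited
        // Either name yields the same combined data set.
        System.out.println(hashInput("tHashOutput_1")); // [gen-1, gen-2, file-1]
        System.out.println(hashInput("tHashOutput_2")); // [gen-1, gen-2, file-1]
    }
}
```

This is why the tHashInput's Component List selection does not matter here: both names resolve to one shared structure.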

Clearing When Iterating

This job iterates over a data set, clearing the backing RAM structure defined in the tHashOutput on each iteration.  This is done by unchecking Append.  If Append were left checked, each iteration would produce more and more output, because the tHashOutput keeps the values gathered in every preceding iteration.

Clearing After Each Iteration
The results of clearing the tHashOutput follow.

Results with Append Unchecked
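
A simplified Java simulation of the iterating job with Append unchecked follows. It is not Talend-generated code: the loop stands in for the tForEach, the added row stands in for the tFixedFlowInput, and clearing the list at the top of each pass models the tHashOutput starting empty on each iteration. The class name and row values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of an iterating job with Append unchecked.
class ClearPerIterationSketch {
    // Returns what the tHashInput emits on each iteration.
    static List<List<String>> run(List<String> forEachValues) {
        List<String> hashStore = new ArrayList<>();      // the tHashOutput's RAM structure
        List<List<String>> outputs = new ArrayList<>();
        for (String value : forEachValues) {
            hashStore.clear();                           // Append unchecked: start empty
            hashStore.add("row for " + value);           // stands in for tFixedFlowInput
            outputs.add(new ArrayList<>(hashStore));     // tHashInput replays the store
        }
        return outputs;
    }

    public static void main(String[] args) {
        for (List<String> out : run(List.of("A", "B", "C"))) {
            System.out.println(out);
        }
        // [row for A]
        // [row for B]
        // [row for C]
    }
}
```

Each iteration emits only its own row, matching the results above.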

If Append is checked, the tHashOutput is not cleared, so each iteration's output repeats the rows accrued in earlier iterations.

Iterating in Append Mode
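
A simplified Java simulation of the iterating job with Append checked is sketched here; it is not Talend-generated code, and the class name and row values are hypothetical. The loop stands in for the tForEach, each pass adds one row (standing in for the tFixedFlowInput), and the list is never cleared. With one row per iteration, n iterations emit 1 + 2 + ... + n = n(n+1)/2 rows in total rather than n.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of Append mode, not Talend-generated code.
class AppendModeSketch {
    // Returns what the tHashInput emits on each iteration.
    static List<List<String>> run(List<String> forEachValues) {
        List<String> hashStore = new ArrayList<>();      // the tHashOutput's RAM structure
        List<List<String>> outputs = new ArrayList<>();
        for (String value : forEachValues) {
            // Append checked: the store is never cleared, so rows accrue.
            hashStore.add("row for " + value);           // stands in for tFixedFlowInput
            outputs.add(new ArrayList<>(hashStore));     // tHashInput replays everything so far
        }
        return outputs;
    }

    // Total number of rows emitted across all iterations.
    static int totalRows(List<List<String>> outputs) {
        return outputs.stream().mapToInt(List::size).sum();
    }

    public static void main(String[] args) {
        List<List<String>> outputs = run(List.of("A", "B", "C"));
        outputs.forEach(System.out::println);
        // [row for A]
        // [row for A, row for B]
        // [row for A, row for B, row for C]
        System.out.println("total rows: " + totalRows(outputs)); // total rows: 6
    }
}
```

The quadratic growth in total output is the main reason to leave Append unchecked unless accumulation across iterations is actually wanted.
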

The tHashOutput and tHashInput components can give your Talend Open Studio job a performance gain by saving input in RAM.  tHashOutput can gather input from different sources using the "Link with a tHashOutput" feature.  Append mode works only in iterations and controls when a tHashOutput is cleared.

3 comments:

  1. Why you have provided the tfixedflowinput during the iteration in foreach component??

    1. This is for demonstration purposes to consolidate the display. The tFixedFlowInput is meant to simulate an input component like a tOracleInput. The tForEach shows repeated invocations of an input component subjob. The same demonstration could be rewritten without the tForEach using multiple input (tFixedFlowInput) components in multiple subjobs.

  2. Does tHBASE components be only used for Big Data Batch jobs?
