Wednesday, February 2, 2011

A Quick Report with Talend Open Studio

If you don't have Jasper around, you can create a report or an extract with extra headers or text using a tForeach and a tIterateToFlow.


There's a Jasper report component available, but if you don't have Jasper installed, you can create a report using a basic input/tMap/output subjob preceded by a header subjob or followed by a footer subjob.  The header and footer subjobs can use a set of values in a tForeach to produce a line of text for insertion into the report.

Talend Open Studio - A Report
The tForeach uses a set of values defined in the component to write lines to a text file through tFileOutputDelimited_2.  A tIterateToFlow component is needed to "convert" the tForeach iteration into a data flow.  The tIterateToFlow component outputs a single field, which holds the current value from the tForeach.
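Talend generates Java for a job, but the sketch below is not that generated code. It is a hypothetical plain-Java model of the difference between an iterate link (the connected component fires once per value) and a row flow (a list of one-field rows an output component can consume). The class names and the header text are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical plain-Java model of tForeach -> tIterateToFlow; not Talend-generated code.
public class IterateToFlowSketch {

    // Stand-in for a one-field output schema (e.g. a single column named "line").
    static final class HeaderRow {
        final String line;
        HeaderRow(String line) { this.line = line; }
    }

    // Models tForeach: fires one iteration per configured value.
    static void forEachValue(List<String> values, Consumer<String> onIteration) {
        for (String value : values) {
            onIteration.accept(value);
        }
    }

    // Models tIterateToFlow: turns each iteration into one row of a data flow.
    static List<HeaderRow> iterateToFlow(List<String> values) {
        List<HeaderRow> flow = new ArrayList<>();
        forEachValue(values, current -> flow.add(new HeaderRow(current)));
        return flow;
    }

    public static void main(String[] args) {
        // Illustrative header text; in the job these come from the tForeach "Values" table.
        List<HeaderRow> rows = iterateToFlow(List.of("Quarterly Report", "Prepared by: Finance"));
        for (HeaderRow row : rows) {
            System.out.println(row.line);  // one output line per tForeach value
        }
    }
}
```

The point of the adapter is that the output component needs rows, not iterations; once each value is wrapped as a one-field row, any flow-based output can write it.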

The key in this job is to have each subjob write to the same output report file in sequence.  The header subjob uses tFileOutputDelimited_2 and a simple one-field schema to write to a file called 'report.csv'.  The main subjob has "Append" checked and writes to the same report.csv file.
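The ordering and the "Append" setting are what make this work. Here is a minimal plain-Java sketch of the same idea, assuming a temporary file in place of report.csv and made-up header and data lines: the first write truncates the file, and the second adds to its end.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Plain-Java sketch of two subjobs sharing one output file; not Talend-generated code.
public class AppendSketch {

    // append=false truncates the file first (like "Append" unchecked);
    // append=true adds lines to the end (like "Append" checked).
    static void writeLines(Path file, List<String> lines, boolean append) throws IOException {
        try (PrintWriter out = new PrintWriter(new FileWriter(file.toFile(), append))) {
            for (String line : lines) {
                out.println(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path report = Files.createTempFile("report", ".csv");  // stands in for report.csv

        writeLines(report, List.of("Sales Report"), false);    // header subjob runs first
        writeLines(report, List.of("1;Widget;9.99"), true);    // main subjob, "Append" checked

        // Prints the header line followed by the data line.
        Files.readAllLines(report).forEach(System.out::println);
        Files.delete(report);
    }
}
```

If the second call passed false instead, the file would contain only the data line: the header would be silently overwritten.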

Set the text values in the tForeach component.

Talend Open Studio - tForeach Config

Create a single-field schema for a "File delimited".  In the tIterateToFlow component, map the value of a global variable to the single field.  The global variable is the current value from the tForeach component.
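In the generated job, component variables live in a job-wide map called `globalMap`, and the tIterateToFlow mapping cell holds a Java expression that reads from it. The sketch below is a hypothetical stand-alone imitation of that lookup. The key `tForeach_1_CURRENT_VALUE` is an assumption based on Talend's usual `<component>_CURRENT_VALUE` naming for a component named tForeach_1; use the exact variable the Studio offers in your own job.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical imitation of the globalMap lookup behind the tIterateToFlow mapping.
public class GlobalMapSketch {

    // Stand-in for Talend's job-wide globalMap.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Assumed key, following the "<component>_CURRENT_VALUE" naming pattern.
    static final String CURRENT_VALUE_KEY = "tForeach_1_CURRENT_VALUE";

    // The cast-and-get expression that would sit in the mapping cell.
    static String mappedField() {
        return ((String) globalMap.get(CURRENT_VALUE_KEY));
    }

    public static void main(String[] args) {
        for (String value : List.of("Report Title", "Prepared by: Finance")) {
            globalMap.put(CURRENT_VALUE_KEY, value);  // tForeach publishes the current value
            System.out.println(mappedField());        // tIterateToFlow copies it into the single field
        }
    }
}
```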
Talend Open Studio - tIterateToFlow Config

The tFileOutputDelimited_2 component uses a single-field schema to receive the global variable.

This is a screenshot of the schema for tFileOutputDelimited_2.  It was created by pressing the "Edit schema" button on the tFileOutputDelimited_2 component tab.

Talend Open Studio - A One-Field Schema for Report Headers

The remainder of the job reads an Excel file and uses a tMap to write its records to tFileOutputDelimited_1.  tFileOutputDelimited_1 does not use the single-field schema, though it points to the same physical file as tFileOutputDelimited_2; instead, it uses a schema that resembles the Excel file.  Remember to check Append for tFileOutputDelimited_1, or the header will be overwritten.

This is the schema for tFileOutputDelimited_1.  It was created by linking up a previously-created metadata item using the "Repository" setting.

Talend Open Studio - Schema for Report Data

The tForeach component lets you define a list of items that can then be written out to a text file.  A tForeach can be used before and after the main subjob to produce a header or a footer.  For more advanced reporting, especially when writing HTML, use a reporting engine like Jasper.
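Putting the pieces together, the finished file is a header block, the data rows, and an optional footer block, each written in turn to the same file. The plain-Java sketch below assembles those lines in memory to show the resulting layout. The sample values and the ";" field separator are assumptions; use whatever separator your tFileOutputDelimited components are configured with.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the header / data / footer sequence; not Talend-generated code.
public class ReportSketch {

    // Header and footer lines are single-field rows written as-is; data rows use
    // the multi-column schema, joined with the field separator.
    static List<String> buildReport(List<String> header, List<List<String>> data,
                                    List<String> footer, String separator) {
        List<String> lines = new ArrayList<>(header);  // header subjob (tForeach)
        for (List<String> row : data) {
            lines.add(String.join(separator, row));    // main subjob, appended
        }
        lines.addAll(footer);                          // footer subjob, appended
        return lines;
    }

    public static void main(String[] args) {
        List<String> report = buildReport(
                List.of("Sales Report", "Region: East"),
                List.of(List.of("1", "Widget", "9.99"), List.of("2", "Gadget", "19.99")),
                List.of("End of report"),
                ";");
        report.forEach(System.out::println);
    }
}
```

Because the header and footer rows carry a single field, they never contain the separator, so they read as plain text lines above and below the delimited data.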

4 comments:

  1. A user on the Talend forum reported a problem with this approach. The column headers weren't showing up. Here's a link to the posts: http://www.talendforge.org/forum/viewtopic.php?id=14137.

  2. This comment has been removed by the author.

  3. tRunJob_1 (you indicate the first job to execute (job with only 1 flow))
    |
    OnSubjobOk
    |
    tRunJob_2
    |
    OnSubjobOk
    |
    tRunJob_3


    Is this the only way?
    Can the OnSubjobOk step be avoided somehow?

  4. Hi Jugal,

    Talend supports multiple transformations by linking together multiple tMap components in a processing chain. Each tMap component maps source fields to target fields, adds in expressions, and joins other data sources (lookup tables).

    Whether or not you use OnSubjobOk depends on the processing loops in the Talend job. Roughly speaking, there will be one subjob for each loop. In this job, there is a loop iterating over files and a loop processing Excel records.

    The exception to this one-subjob-per-loop rule is when the tIterateToFlow or tFlowToIterate adapters are used. I could rewrite the job to add these adapters, but there would be additional components and the job's algorithm wouldn't be any clearer.
