Featured Post

Applying Email Validation to a JavaFX TextField Using Binding

This example uses the same controller as in a previous post but adds a use case to support email validation.  A Commons Validator object is ...

Monday, January 10, 2011

[TUTORIAL] Turning a Column into Rows with Talend Open Studio

Use the tNormalize column to break a multi-valued attribute stored in an RDMBS column into individual rows.  This is great for a second pass on an input file to fill up a join.
This tutorial shows how to take a record containing a multi-valued attribute (like "SWI,CON,TEN") and write out a record for each value (a record for "SWI", a record for "CON", a record for "TEN").

Talend Open Studio has a great component for this called tNormalize.



6 comments:

  1. This is brilliant, thank you for sharing.

    ReplyDelete
  2. That's great, thanks.
    How would you do this in reverse? i.e. Taking multiple rows in a flat-file, merging to one row with a list of values seperated by a comma in the cell of the final column?

    ReplyDelete
    Replies
    1. Hi,

      Take a look at this post: http://bekwam.blogspot.com/2013/03/creating-multi-valued-attribute-in.html.

      Delete
  3. Thanks for sharing this tutorial !.

    ReplyDelete
  4. Interesting that I landed here, but my purpose is to apply this to different columns. E.g. I have an input file, it has 12 columns (one for each month) and it contains data for those months. I would like to normalize that to have instead 1 row with a "month" column & the associated data next to it.... Any similar module to tNormalize that acts upon multiple columns? Or do I need to build it up with tMap?

    ReplyDelete
    Replies
    1. You could just make one super-column out of your twelve and feed it through the components described in this post.

      Delete