Featured Post

Applying Email Validation to a JavaFX TextField Using Binding

This example uses the same controller as in a previous post but adds a use case to support email validation.  A Commons Validator object is ...

Tuesday, February 22, 2011

Cleaning Data with tReplace in Talend Open Studio

If you need to clean data -- like mapping a source system-specific value to a standardized one -- use the tReplace component in Talend Open Studio.

The tReplace component is used to perform a string substitution on a column.  Take the following spreadsheet with a column 'SYSTEM_FUND_SOURCE_NM'.

Excel Spreadsheet with Source Name Column

The source system-specific value 'Frederick County Government' will be mapped to a standard value 'Local Government'.

Start with the following job that contains a tExcelInput, tReplace, and tLogRow.

A Talend Job with a tReplace

To configure the tReplace, first edit the schema so that the input is routed directly to the output.  Select the component and press the Edit Schema button.

Schema Mapping Input to Output
Simple Replace

The Simple Replace will take a string and swap in a replacement value.

Basic Replace in Talend Open Studio
This will swap "Frederick County Government" for "Local Government" for all records with the value in the SYSTEM_FUND_SOURCE_NM column.

Advanced Replace

The Simple Replace is fine for hardcoded strings as in the previous example.  However, the strings may need to be generalized to handle additional cases.  Instead of simply replacing 'Frederick County Government' with 'Local Government', the Advanced Replace will allow a '*County Government' to match the string.

Advanced Configuration of Talend tReplace
The result of running both configuration is the following from tLogRow.

Result of Running a Replacement

tReplace is used to swap out one value for another based on text patterns.  The Simple Replace works for literal strings.  There's supposedly a shorthand glob feature ('*' wildcard), but that doesn't seem to be working on my version of Open Studio.  For more flexibility used the Advanced option.

2 comments:

  1. Hi,

    I have the following table

    ITEM,PRICE
    pen,20
    calculater,50
    cup,10

    no i wanna update price of cup from 10 to 30 using tReplace.

    please help

    ReplyDelete
    Replies
    1. Hi,

      Have you tried a SQL update like "UPDATE item SET price = 30 WHERE item_name = 'cup'?

      Delete