Tuesday, July 10, 2012

Parsing a URL with Talend Open Studio

If you need to process URLs with Talend Open Studio, a few well-placed components can break apart the URL parameters to be stored, converted, or filtered.

Given the following URL


You can break the string apart with three Talend Open Studio components that produce a stream of name/value pairs.  This screenshot shows such a job running.

Name / Value Pairs Extracted from a URL
Depending on requirements, additional columns can be carried through the processing to provide a business key (such as host or path).  This screenshot shows the job.

Job Parsing a URL - Two tExtractDelimitedFields and a tNormalize
Three components hack off various pieces of the URL.  First, tExtractDelimitedFields_1 separates the host/path from the QUERY_STRING using the "?" delimiter.  Next, a tNormalize takes each name/value pair, forming a distinct row based on the "&" delimiter.  Finally, tExtractDelimitedFields_2 separates the name from the value, based on "=".
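
To make the data flow concrete, the same three splits can be sketched in plain Java. This is only an illustration of what each component does to the string, not how the job is configured. The class name, method name, and sample URL are made up for the example, and a three-element array stands in for the schema columns (path, name, value).

```java
import java.util.ArrayList;
import java.util.List;

public class UrlParamSplitter {

    /**
     * Returns one row per name/value pair.
     * Each row is {path, name, value}; "path" is the carried-through business key.
     */
    public static List<String[]> split(String url) {
        List<String[]> rows = new ArrayList<>();

        // Step 1 (like tExtractDelimitedFields_1): host/path vs. query string on "?"
        String[] parts = url.split("\\?", 2);
        String path = parts[0];
        if (parts.length < 2 || parts[1].isEmpty()) {
            return rows; // no query string, so no name/value pairs
        }

        // Step 2 (like tNormalize): one row per pair, split on "&"
        for (String pair : parts[1].split("&")) {
            if (pair.isEmpty()) {
                continue; // skip empty segments such as a trailing "&"
            }
            // Step 3 (like tExtractDelimitedFields_2): name vs. value on "="
            String[] nameValue = pair.split("=", 2);
            String value = nameValue.length > 1 ? nameValue[1] : "";
            rows.add(new String[] { path, nameValue[0], value });
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical URL, used only for this sketch
        String url = "http://www.example.com/app/search?q=talend&page=2";
        for (String[] row : split(url)) {
            System.out.println(row[1] + " = " + row[2]);
        }
    }
}
```

The sketch does not URL-decode the values; it only mirrors the delimiter-based splitting described above.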

The tFilterColumns component is used for presentation purposes; it removes the pre-processed "path" variable.

Here are the component configurations, starting with the tFixedFlowInput component providing the test data.

A tFixedFlowInput with a URL

While some custom Java can be thrown into a tJavaRow, this blog post presents a cleaner alternative.  It's cleaner because it's based on the schema, rather than some Java code that could suffer a syntax error.


  1. FYI, Google's Guava library has functions you can use for URL processing, such as domain extraction.


    1. Thanks Yash. To work with a third-party library like Guava, take a look at this post: http://bekwam.blogspot.com/2012/04/right-padding-string-with-talend-open.html.

  2. This comment has been removed by the author.

  3. Hi, I want to pass a variable in the URL. I am using context.message as the variable, and its value is captured there. I am using the tHttpRequest component,

    but I am getting context.message printed, not the value of context.message.
    Any help?