If you need to process URLs with Talend Open Studio, a few well-placed components can break apart the URL parameters to be stored, converted, or filtered.
Given the following URL:
/google.se/url?sa=t&rct=j&q=insights%20konsult&source=web&cd=11&ved=0CC4QFjAAOAo&url=http%3A%2F%2Fwww.inuseinsights.se%2Fom-inuse-insights%2Fpartners
You can break the string apart with three Talend Open Studio components, resulting in a stream of name/value pairs. This screenshot shows such a job running.
[Screenshot: Name / Value Pairs Extracted from a URL]
Depending on requirements, additional columns can be carried through the processing to provide a business key (such as host or path). This screenshot shows the job.
[Screenshot: Job Parsing a URL - Two Extra Delimited Fields and a tNormalize]
Three components hack off various pieces of the URL. First, tExtractDelimitedFields_1 separates the host/path from the query string using the "?" delimiter. Next, a tNormalize splits the query string on the "&" delimiter so that each name/value pair becomes its own row. Finally, tExtractDelimitedFields_2 separates the name from the value, based on "=".
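For readers who think in code, the three splits described above can be sketched in plain Java. This is only an illustration of the logic, not how the Talend components are implemented; the class name `UrlParamSplitter` and the method `split` are hypothetical, and the sample URL is a shortened version of the one in this post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that mirrors the three splitting steps of the job.
public class UrlParamSplitter {

    // Returns one {path, name, value} row per query parameter.
    public static List<String[]> split(String url) {
        List<String[]> rows = new ArrayList<>();

        // Step 1 (like tExtractDelimitedFields_1): path vs. query string on "?"
        int q = url.indexOf('?');
        if (q < 0) {
            return rows; // no query string, so there is nothing to split
        }
        String path = url.substring(0, q);
        String queryString = url.substring(q + 1);

        // Step 2 (like tNormalize): one row per "&"-separated pair
        for (String pair : queryString.split("&")) {
            if (pair.isEmpty()) {
                continue; // skip empty segments such as a trailing "&"
            }
            // Step 3 (like tExtractDelimitedFields_2): name vs. value on "="
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            rows.add(new String[] { path, name, value });
        }
        return rows;
    }

    public static void main(String[] args) {
        String url = "/google.se/url?sa=t&rct=j&q=insights%20konsult";
        for (String[] row : split(url)) {
            System.out.println(row[0] + " | " + row[1] + " | " + row[2]);
        }
    }
}
```

Note that the sketch leaves values percent-encoded (for example `insights%20konsult`), since it only splits on delimiters and does no decoding.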
The tFilterColumns component is used for presentation purposes only; it removes the pre-processed "path" column.
Here are the component configurations, starting with the tFixedFlowInput component that provides the test data.
[Screenshot: A tFixedFlowInput with a URL]
[Screenshot: tExtractDelimitedFields_1]
[Screenshot: tNormalize]
[Screenshot: tExtractDelimitedFields_2]
While some custom Java can be thrown into a tJavaRow, this blog post presents a cleaner alternative. It's cleaner because it's based on the schema, rather than some Java code that could suffer a syntax error.
Thanks man. A nice post :)
FYI, Google's Guava library has functions you can use for URL processing, such as domain extraction.
http://code.google.com/p/guava-libraries/wiki/GuavaExplained
Thanks Yash. To work with a third-party library like Guava, take a look at this post: http://bekwam.blogspot.com/2012/04/right-padding-string-with-talend-open.html.
Hi, I want to pass a variable in the URL. I am using context.message as the variable, and the values are captured in it. I am using the tHttpRequest component.
"https://api.telegram.org/bot322480:AETfC4RyKIGcDTrsKua0daUKORg/sendmessage?chat_id=323109827&text=+context.message+"
But I am getting context.message printed, not the value of context.message.
Any help?
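A likely cause, assuming the URL field is evaluated as a Java expression the way Talend component fields generally are: `+context.message+` sits inside the quotes, so it is sent as literal text. Closing the string before the variable and concatenating should substitute the value. The sketch below shows the idea in plain Java; `TelegramUrlExample`, `buildUrl`, and the token and chat id arguments are placeholders, and the `message` parameter stands in for `context.message`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical illustration of concatenating a variable into a URL string.
public class TelegramUrlExample {

    // message plays the role of context.message in the Talend job.
    public static String buildUrl(String botToken, String chatId, String message) {
        try {
            // The variable is outside the quotes, so its value is appended.
            // URL-encoding keeps spaces and symbols from breaking the query string.
            return "https://api.telegram.org/bot" + botToken
                    + "/sendmessage?chat_id=" + chatId
                    + "&text=" + URLEncoder.encode(message, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a standard JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("TOKEN", "12345", "job finished"));
    }
}
```

In the component itself, the equivalent expression would end with `...&text=" + context.message` rather than keeping the variable name inside the closing quote.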