Featured Post

Applying Email Validation to a JavaFX TextField Using Binding

This example uses the same controller as in a previous post but adds a use case to support email validation.  A Commons Validator object is ...

Friday, January 14, 2011

Text File to JSON with Talend Open Studio

If you have a text file, Excel file, or RDBMS table, it's easy to generate a JSON file with Talend Open Studio.  This JSON file can be put under a web server and accessed by JavaScript.
JSON is a formatted string used to exchange data.  Since it's text-based, it can be used by most languages and libraries like Dojo, jQuery, and PHP.  In many applications, server-side code like a PHP page will run a query and return a JSON string (rather than HTML) for use by a JavaScript library.  For example, a query of a list of services containing codes and descriptions will be used in a select-type UI control.

The format of JSON is simple. Like XML, data is mixed with additional information describing the data.  In a simple case, this JSON returns a two-item list.

{Code: SWI, Description: "Swimming Pool"},
{Code: PET, Description: "Pets"}

The various brackets and braces are easy to form, but if you have an input source like a text file or spreadsheet containing thousands of entries, it can be time consuming to covert this.

This is where Talend Open Studio can help.

Using a simple two-stage job, you can feed a text file into a special Talend component, tFileOutputJSON.  The text file contains two columns of data: Service_Code and Service_Description.

Service Code,Service Description
PET,Pets Allowed
SWI,Swimming Pool
TEN,Tennis Court
DRY,Dry Cleaning
INT,Internet Access
WIF,WIFI Internet Access
FIT,Fitness Room

Talend Excel to JSON with Excel File Properties Showing
Here's the same job with showing the Component details of the tFileOutputJSON component.

Talend Excel to JSON with JSON File Properties Showing

And here's the JSON output

{"Service_Description":"Pets Allowed","Service_Code":"PET"},
{"Service_Description":"Swimming Pool","Service_Code":"SWI"},
{"Service_Description":"Tennis Court","Service_Code":"TEN"},
{"Service_Description":"Dry Cleaning","Service_Code":"DRY"},
{"Service_Description":"Internet Access","Service_Code":"INT"},
{"Service_Description":"WIFI Internet Access","Service_Code":"WIF"},
{"Service_Description":"Fitness Room","Service_Code":"FIT"},

Although most apps can produce JSON using server-side code, consider publishing data sets that don't change frequently using Talend.  HTML files servered up by a web server like Apache is extremely fast.  The files can be replicated all over the Internet too.  If you're dealing with a lot of reference data, infrequently refreshed data sets, or other lists, a few well-placed Talend jobs can get your Javascript developers coding.


  1. Is there a way to create a metadata schema for the repository using a json formatted file for input?

  2. A "File JSON" item in the Repository > Metadata list sounds very useful. I haven't worked on extending the Repository, but will add some blogs posts in the future on the Repository.

    One strategy could be to define a "Generic schema" and build a JSON custom component around that. The schema will provide the list of fields and a custom component can interpret that list, possibly handling the nesting using a special field/path notation (for example, "contact/email"). This pushes the Mapping of today's tFileInputJSON to the metadata

    If you select an approach for your particular problem, please post an update. I'd like to hear about it.