Tuesday, June 28, 2011

Handling an Empty JSON Object in Talend Open Studio

To input JSON into a flow using Talend Open Studio, use the tFileInputJSON component.  If the JSON input may be empty, use a guard condition that examines the structure beforehand.

The tFileInputJSON component takes a JSON structure as input and builds a schema based on JSON paths.  For example,

{ "attribs": [
    {
      "name": "req",
      "value": "1"
    } ]
}

This defines an array "attribs" whose records contain the fields "name" and "value".  It is mapped in a tFileInputJSON using the following schema and JSON paths (from the tFileInputJSON Component View):

Column        JSONPath query
---------------------------
name       "$.attribs[*].name"
value       "$.attribs[*].value"

"$" refers to the root of the JSON object.  ".", immediately following the $, is the child operator that steps from an object into one of its fields.  "attribs" is a field holding an array.  The wildcard index "*" means select all elements of attribs.  "name" and "value" are fields of each element.

This structure relies on the convention that there is a balanced number of names and values.  Although the JSON paths are similar, they aren't correlated.  Pull out a "value" in the middle of the list and the remaining values will shuffle up to different names.
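The shuffle can be illustrated without Talend.  The sketch below is a plain-Java simulation, not the tFileInputJSON component: the class UncorrelatedPaths and its helper methods are hypothetical names, and the record list stands in for an "attribs" array whose second record has no "value" field.  Each field is collected independently, the way the two wildcard paths would be.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; it only simulates two independent path queries.
class UncorrelatedPaths {

    // Collects every occurrence of one field, skipping records that lack it,
    // the way a wildcard path such as $.attribs[*].value would.
    static List<String> select(List<Map<String, String>> records, String field) {
        List<String> out = new ArrayList<>();
        for (Map<String, String> record : records) {
            if (record.containsKey(field)) {
                out.add(record.get(field));
            }
        }
        return out;
    }

    // Builds one record; a null value means the "value" field is absent.
    static Map<String, String> record(String name, String value) {
        Map<String, String> r = new LinkedHashMap<>();
        r.put("name", name);
        if (value != null) {
            r.put("value", value);
        }
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, String>> attribs = Arrays.asList(
            record("req", "1"),
            record("opt", null),   // "value" is missing here
            record("max", "3"));

        System.out.println(select(attribs, "name"));   // [req, opt, max]
        System.out.println(select(attribs, "value"));  // [1, 3]
    }
}
```

Paired by position, "opt" would pick up the value 3 and "max" would be left without one, which is the misalignment described above.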

Empty Input

Another case one might encounter is empty input.  For example,

{
"attribs": [ ]
}

In Talend Open Studio, the current tFileInputJSON component will throw an error as it attempts to map the non-existent name and value fields.  In this case, use some programming logic to filter the empty input from the tFileInputJSON component.  This can be done with a second tFileInputJSON component.

Guard Condition

This job checks the input before mapping the name and value fields.  See "UPDATE: Reduced Number of Components" at the end of the post for an alternate implementation.

Job Checking JSON Input Prior to Processing

Both tFileInputJSON components operate on the same file.  However, the first JSON component maps the JSON array "attribs" as a whole rather than the individual fields "name" and "value".  From tFileInputJSON_1's Component View:

Column   JSONPath query
------------------------------
attribs      "$.attribs"

Determining Empty Object

A tJavaRow applies the logic that determines whether or not the input is empty.  This code uses a regular expression to look for an empty array (brackets enclosing only whitespace).

// Treat a missing array or "[ ]" (brackets around optional whitespace) as empty.
boolean isEmpty = input_row.attribs == null ||
    input_row.attribs.matches("^\\[\\s*\\]$");

// Publish the result for the downstream subjob.
globalMap.put("EMPTY_FILE_FLAG", Boolean.valueOf(isEmpty));
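To see which inputs the guard treats as empty, the same pattern can be exercised outside of Talend.  This is a minimal sketch: the class EmptyArrayCheck and the method isEmpty are hypothetical names, and the sample strings are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are not from the original job.
class EmptyArrayCheck {

    // Same pattern as the tJavaRow code: "[" and "]" with only whitespace between.
    private static final Pattern EMPTY_ARRAY = Pattern.compile("^\\[\\s*\\]$");

    static boolean isEmpty(String attribs) {
        return attribs == null || EMPTY_ARRAY.matcher(attribs).matches();
    }

    public static void main(String[] args) {
        System.out.println(isEmpty(null));       // true
        System.out.println(isEmpty("[]"));       // true
        System.out.println(isEmpty("[ \n ]"));   // true
        System.out.println(isEmpty("[{\"name\":\"req\",\"value\":\"1\"}]")); // false
        System.out.println(isEmpty(" []"));      // false: whitespace outside the brackets
    }
}
```

The last case shows that the check is strict about whitespace outside the brackets; if the mapped string might carry padding, trimming it before the match would be one option.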

Applying the Filter

The tFixedFlowInput, tFilterRow, and tFlowToIterate components will invoke the tFileInputJSON_3 component that actually maps the name and value fields and would continue with additional processing.  The tFileInputJSON will only be invoked if the EMPTY_FILE_FLAG is set to false.  The construct is verbose because of the need for iterate and flow adapter components.

A tFixedFlowInput converts the global variable set in the guard condition subjob to a flow.  This enables the tFilterRow component to be used.

Retrieving a Flag in tFixedFlowInput

The tFilterRow component applies a simple boolean check on the input field.

tFilterRow Checking a Flag

If the check passes, processing continues with a second pass taken on the input JSON file.

In the current version, tFileInputJSON requires input whose structure matches the mapped paths.  Additional components are needed if the input doesn't conform.  This post scanned an input file and used a regular expression to determine whether or not the input was empty.  An improvement would be to wrap the logic and filtering into a routine or custom component.

UPDATE: Reduced Number of Components

You can skip the tFilterRow and tFlowToIterate components by using the Run If trigger from tFileInputJSON_1.  The Run If trigger supports an expression that will continue processing if true.  The tFixedFlowInput component is still needed as an adapter between the pair of tFileInputJSON components.

Here is a screenshot of the reworked job.  The Component View is showing the expression of the filter condition.

JSON Job Re-worked to Use Run If
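The exact expression appears only in the screenshot, so the following is an assumption about what such a Run If condition might look like rather than the one used in the job.  A Run If condition is a Java boolean expression, and a null-safe form that reads the EMPTY_FILE_FLAG set by the guard would be Boolean.FALSE.equals(globalMap.get("EMPTY_FILE_FLAG")).  The sketch below simulates it with an ordinary HashMap standing in for Talend's globalMap; the class RunIfSketch and the method shouldRun are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone simulation; in a real job Talend supplies globalMap.
class RunIfSketch {

    // Candidate Run If condition: proceed only when the flag is present and false.
    static boolean shouldRun(Map<String, Object> globalMap) {
        return Boolean.FALSE.equals(globalMap.get("EMPTY_FILE_FLAG"));
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        System.out.println(shouldRun(globalMap)); // false: flag was never set

        globalMap.put("EMPTY_FILE_FLAG", Boolean.TRUE);
        System.out.println(shouldRun(globalMap)); // false: input was empty

        globalMap.put("EMPTY_FILE_FLAG", Boolean.FALSE);
        System.out.println(shouldRun(globalMap)); // true: input has records
    }
}
```

Comparing against Boolean.FALSE avoids a NullPointerException from unboxing when the guard subjob did not run and the flag is absent.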

5 comments:

  1. Cool. Another way could be to add the 'empty document' output to the tJSON component.

  2. Carl, can you provide insight on how you checked to see if the input conforms?

    1. Hi,

      There is a check that is using Java Regular Expressions. See the section "Determining the Empty Object" which will set a flag if the payload of the attribs field is JSON. The check is not very rigorous, only verifying that the attribs field is not null, empty, and enclosed in square brackets.

  3. Hi Bekwam, i like your blog that u posted very useful things in Talend. i Would like to ask you to provide an example for how to perform json data from rest api please.
    And how to put some of fields together i mean have to perfoem any aggregate operation.

  4. Hello Bekwam,

    I am getting JSON file from MONGODB where if the value is null the key or array itself will not appear in json file.
    But tfileinputjson will reject the record.
    how to replace the array name also with empty strcuture.
    for example

    {"data":
    [{
    "id":"1",
    "name":"Arch",
    "category":"normal",
    "type":[{"id":"1",
    "desc":"emp"
    }]
    },
    {
    "id":"2",
    "name":"siv",
    "category":"abnormal",
    "type":[{"id":"2",
    "desc":"lead"
    }]
    },
    {
    "id":"3",
    "name":"kar",
    "category":"soabnormal",
    "type":[]
    },
    {
    "id":"4",
    "name":"prasad",
    "category":"soveryabnormal"
    },
    {
    "id":"5"
    }
    ]
    }

    for ids 3,4,5 I need to have replicate the main JSON structure as is.
    how can do that?


    Thanks in advance,
    Archana
