Off the shelf, Talend Open Studio provides the tMap and tFilterRow components, which can apply a business rule or quality check to an incoming data flow. I wrote tScriptRules last year to be a little more flexible. tScriptRules is based on JEXL, a JavaScript-like expression language, which lets you apply more complex conditions than the standard tFilterRow. Also, unlike tFilterRow or tMap, tScriptRules stores additional documentation with each expression.
Basic Usage
This example shows an Excel source sending input records to tScriptRules. The output of tScriptRules goes to one of two tLogRows: one handles the filter flow and the other handles the reject flow. Rows for which the script expressions resolve to true (such as "!empty(customerId)") are routed to the filter tLogRow. Rows that fail a script expression are sent to the reject tLogRow.
tScriptRules with 5 Rows that Met the Conditions
Some Rules Checking for Missing Data
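To make the filter/reject routing concrete, here is a minimal sketch in plain Java. It is not the tScriptRules source: the class name, the method names, and the use of a `Predicate` in place of a JEXL expression are all assumptions made for illustration, and the stand-in for "!empty(customerId)" only approximates JEXL's notion of empty (null or zero-length).

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of filter/reject routing; not tScriptRules source code.
class FilterRejectSketch {

    // Plain-Java stand-in for the JEXL expression !empty(customerId):
    // true when the field is present and not a zero-length string.
    static final Predicate<Map<String, String>> HAS_CUSTOMER_ID = row -> {
        String id = row.get("customerId");
        return id != null && !id.isEmpty();
    };

    // Rows that satisfy the rule go to 'filter'; all other rows go to 'reject'.
    static void route(List<Map<String, String>> input,
                      Predicate<Map<String, String>> rule,
                      List<Map<String, String>> filter,
                      List<Map<String, String>> reject) {
        for (Map<String, String> row : input) {
            (rule.test(row) ? filter : reject).add(row);
        }
    }
}
```

In the job itself, the two lists correspond to the two tLogRow outputs.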
Expanded Check
tScriptRules supports a Run All mode, which runs every rule against each input row. In normal operation (Run All = false), processing of a row stops at the first failed rule. In both cases, a row failure does not kill the job; processing carries on with the next row.
Run All Option
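The difference between the two modes can be sketched in a few lines of plain Java. This is an illustration under assumptions, not the component's code: `RunAllSketch`, `notEmpty`, and `failedRules` are hypothetical names, predicates stand in for JEXL expressions, and returning the names of the failed rules is just one way to make the behavior visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of the Run All option; not tScriptRules source code.
class RunAllSketch {

    // Builds a rule that passes when the named field is present and non-empty.
    static Predicate<Map<String, String>> notEmpty(String field) {
        return row -> {
            String value = row.get(field);
            return value != null && !value.isEmpty();
        };
    }

    // Returns the names of the rules that failed for a single row.
    // With runAll = false, evaluation stops at the first failure.
    static List<String> failedRules(Map<String, String> row,
                                    Map<String, Predicate<Map<String, String>>> rules,
                                    boolean runAll) {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, Predicate<Map<String, String>>> rule : rules.entrySet()) {
            if (!rule.getValue().test(row)) {
                failures.add(rule.getKey());
                if (!runAll) {
                    break; // normal mode: skip the remaining rules for this row
                }
            }
        }
        return failures;
    }
}
```

Pass the rules in an ordered map (for example a `LinkedHashMap`) so that "first failure" is well defined. Either way, the caller moves on to the next row after this method returns.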
More Complicated Rule
The JEXL script syntax resembles JavaScript and supports regular expressions. This example uses a rule that checks a state value ("VA") and an email domain ("edu") to infer something about the contact.
A More Complicated Example
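For readers who cannot see the screenshot, here is a rough plain-Java equivalent of such a rule. The field names `state` and `email`, the class name, and the exact pattern are assumptions, not the expression used in the job. In JEXL versions that provide the `=~` regex operator, a comparable rule might read something like `state == 'VA' && email =~ '.*\.edu'`, but treat that as a guess at the syntax rather than a tested expression.

```java
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of a state + email-domain rule; not the rule from the job.
class StateEmailRuleSketch {

    // Matches a simple address whose domain ends in ".edu", ignoring case.
    static final Pattern EDU_EMAIL =
        Pattern.compile("^[^@\\s]+@[^@\\s]+\\.edu$", Pattern.CASE_INSENSITIVE);

    // True when the contact is in Virginia and has an .edu email address.
    static boolean isVirginiaEduContact(Map<String, String> row) {
        String state = row.get("state");
        String email = row.get("email");
        return "VA".equals(state)
            && email != null
            && EDU_EMAIL.matcher(email).matches();
    }
}
```

A missing email makes the rule fail rather than throw, so the row goes to the reject flow like any other failure.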