Background
tScriptRules was written as a more flexible filtering mechanism than the off-the-shelf components tFilterRow and tMap. With tScriptRules, you can define filtering rules using a JavaScript-like syntax. Additionally, tScriptRules adds error-reporting columns to a reject flow, which can be used to produce a data quality report or to troubleshoot an error.
The rules in a tScriptRules can be loaded from table elements set up in the component's Component View. However, when this technique is used, the rules become embedded in the job. This makes maintenance more difficult (you'd have to change Talend code for every rule update), and the rules cannot be reused, either elsewhere in the job or in other jobs.
tScriptRulesLoad
tScriptRulesLoad separates the rules from the Talend job and the component. This means that the rules can be maintained independently, say as a text file in a source code control system. The same rules can be applied in multiple jobs, or to multiple tScriptRules components in the same job. In an advanced use case, you can even use the rules loaded from tScriptRulesLoad independently of any tScriptRules components.
Sample Job
In this job, rules are loaded from a text file using a tFileInputDelimited. The source is a text file, 'rules_success.txt'. tScriptRulesLoad stores the loaded rules in an internal data structure. The tScriptRules component refers to this data structure through the "Rule list from" setting in its Component View.
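As a rough illustration of what such a rules file might hold, the sketch below parses delimited lines into the three columns tScriptRulesLoad works with (jexlExpression, reasonCode, reasonMessage). The semicolon delimiter, the sample rules, the field names state and zip, and the RuleFileSketch class are all assumptions made up for this example. In the real job, tFileInputDelimited does the parsing and tScriptRulesLoad holds the result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RuleFileSketch {

    /** One rule row, named after the three tScriptRulesLoad columns. */
    static final class Rule {
        final String jexlExpression;
        final String reasonCode;
        final String reasonMessage;

        Rule(String jexlExpression, String reasonCode, String reasonMessage) {
            this.jexlExpression = jexlExpression;
            this.reasonCode = reasonCode;
            this.reasonMessage = reasonMessage;
        }
    }

    /** Parses lines of the assumed form: expression;code;message */
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.split(";", 3);
            if (parts.length != 3) {
                throw new IllegalArgumentException("Expected 3 fields: " + line);
            }
            rules.add(new Rule(parts[0].trim(), parts[1].trim(), parts[2].trim()));
        }
        return rules;
    }

    public static void main(String[] args) {
        // Hypothetical file contents; the real rules_success.txt may differ.
        List<String> lines = Arrays.asList(
            "input_row.state != null;R001;state is required",
            "input_row.zip.length() == 5;R002;zip must have five characters");
        for (Rule rule : parse(lines)) {
            System.out.println(rule.reasonCode + " -> " + rule.jexlExpression);
        }
    }
}
```

Keeping one rule per line is what makes the file easy to diff and review in source control.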
Basic Job Using tScriptRules with tScriptRulesLoad
Usage
To use a tScriptRulesLoad component, drag an input source and a tScriptRulesLoad component onto the canvas. In this case, a tFixedFlowInput is the input source.
Step 1: Add Components to Canvas
On the tScriptRulesLoad Component View, click the Edit Schema button. The following dialog is displayed.
Edit Schema Dialog
Columns Copied to Input Source
A Rule Defined in a tFixedFlowInput
Rules Reused in Two tScriptRules
Note the use of 'input_row' in the rule definition. The tScriptRules components will accept this alias for the row name. You can use the actual row name (row2, row3), but this limits the rule's usage if the actual row name changes or isn't applicable for a particular flow.
A record is routed to the filter connection only if the rules evaluate to true. If a rule is misapplied (say, a rule "row2 == 'ok'" in tScriptRules_2), the rule will fail for each input record, and tScriptRules_2 will reject every record.
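The routing described above can be sketched in plain Java. This is not the component's implementation: a java.util.function.Predicate stands in for the JEXL expression, the status field is hypothetical, and I'm assuming the reject flow carries columns named reasonCode and reasonMessage and that evaluation stops at the first failing rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class RoutingSketch {

    /** A rule: a predicate standing in for the JEXL expression, plus reject details. */
    static final class Rule {
        final Predicate<Map<String, Object>> expression;
        final String reasonCode;
        final String reasonMessage;

        Rule(Predicate<Map<String, Object>> expression, String reasonCode, String reasonMessage) {
            this.expression = expression;
            this.reasonCode = reasonCode;
            this.reasonMessage = reasonMessage;
        }
    }

    /** The two output flows. */
    static final class Result {
        final List<Map<String, Object>> filter = new ArrayList<>();
        final List<Map<String, Object>> reject = new ArrayList<>();
    }

    /** Sends a row to filter if every rule is true, otherwise to reject. */
    static Result route(List<Map<String, Object>> rows, List<Rule> rules) {
        Result result = new Result();
        for (Map<String, Object> inputRow : rows) {
            Rule failed = null;
            for (Rule rule : rules) {
                if (!rule.expression.test(inputRow)) {
                    failed = rule; // assumption: stop at the first failing rule
                    break;
                }
            }
            if (failed == null) {
                result.filter.add(inputRow);
            } else {
                // Copy the row and append the assumed error-reporting columns.
                Map<String, Object> rejected = new HashMap<>(inputRow);
                rejected.put("reasonCode", failed.reasonCode);
                rejected.put("reasonMessage", failed.reasonMessage);
                result.reject.add(rejected);
            }
        }
        return result;
    }

    /** Builds a one-column row for the demo. */
    static Map<String, Object> row(String status) {
        Map<String, Object> r = new HashMap<>();
        r.put("status", status);
        return r;
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.asList(
            new Rule(inputRow -> "ok".equals(inputRow.get("status")), "R001", "status must be ok"));
        Result result = route(Arrays.asList(row("ok"), row("bad")), rules);
        System.out.println("filter: " + result.filter);
        System.out.println("reject: " + result.reject);
    }
}
```

The predicate takes a parameter named inputRow rather than a specific flow name, which mirrors why the 'input_row' alias keeps a rule portable across flows.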
Adaptations
This application of tScriptRulesLoad used a tFixedFlowInput. Any Talend input can serve as an input source for tScriptRulesLoad. This includes a text file, database, XML document, or even a web service.
If the schema set in the input source is different from what tScriptRulesLoad expects (take a text file with columns "rule,code,msg" rather than "jexlExpression,reasonCode,reasonMessage"), use a tMap. Follow the procedure from the tFixedFlowInput example, with the tMap placed between the input source and tScriptRulesLoad. Then, in the tMap, map the input source fields (rule, code, msg) to the tScriptRulesLoad fields (jexlExpression, reasonCode, reasonMessage).
If you'd like to number the rules automatically, use a tMap where the reasonCode is set from a sequence: the Numeric.sequence function.
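To make the mapping and numbering concrete, here is a plain-Java sketch of what the tMap would be doing. In an actual tMap the reasonCode expression would be a call to Talend's Numeric.sequence routine, for example Numeric.sequence("ruleSeq", 1, 1), where the sequence name "ruleSeq" is made up. The sequence method below is only a stand-in for that routine, and storing reasonCode as a String is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RuleMappingSketch {

    // Stand-in for Talend's named sequences: last value issued per sequence name.
    private static final Map<String, Integer> SEQUENCES = new HashMap<>();

    /** Returns start on the first call for a name, then adds step on each later call. */
    static int sequence(String name, int start, int step) {
        Integer current = SEQUENCES.get(name);
        int next = (current == null) ? start : current + step;
        SEQUENCES.put(name, next);
        return next;
    }

    /** Renames rule/msg to the tScriptRulesLoad columns and numbers the rules. */
    static List<Map<String, String>> mapRules(List<Map<String, String>> source, String seqName) {
        List<Map<String, String>> mapped = new ArrayList<>();
        for (Map<String, String> row : source) {
            Map<String, String> out = new HashMap<>();
            out.put("jexlExpression", row.get("rule"));
            // The source "code" column is ignored; the sequence supplies the code.
            out.put("reasonCode", String.valueOf(sequence(seqName, 1, 1)));
            out.put("reasonMessage", row.get("msg"));
            mapped.add(out);
        }
        return mapped;
    }

    public static void main(String[] args) {
        Map<String, String> first = new HashMap<>();
        first.put("rule", "input_row.state != null"); // hypothetical rule
        first.put("msg", "state is required");
        Map<String, String> second = new HashMap<>();
        second.put("rule", "input_row.zip != null"); // hypothetical rule
        second.put("msg", "zip is required");
        for (Map<String, String> rule : mapRules(Arrays.asList(first, second), "ruleSeq")) {
            System.out.println(rule.get("reasonCode") + ": " + rule.get("jexlExpression"));
        }
    }
}
```

If the source file already carries meaningful codes, map code to reasonCode directly instead of using the sequence.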
Agile Rule Development
One use case I've been experimenting with is defining the rules using a text file input. The text file can be stored easily in configuration management (CM), where standard diff commands highlight the differences between versions. When developing Talend Open Studio jobs, I can bring up a text editor (Textpad) alongside Talend Open Studio. I can then run and re-run the job, making quick text edits to the input source file between runs.
This setup seems to work much like experimenting with SQL: the rules act as a "where" clause on an input source such as a web server log file.
In my consulting jobs, I deal with a lot of rules. They start off simply enough ("field x is required"), but can rapidly grow into complex business logic relating multiple fields from multiple sources. Hopefully, you find these components useful. If you have any product suggestions or bugs, please send them to dev@bekwam.com.