Featured Post

Applying Email Validation to a JavaFX TextField Using Binding

This example uses the same controller as in a previous post but adds a use case to support email validation.  A Commons Validator object is ...

Monday, May 7, 2012

Handling Database Warnings with Talend Open Studio

To handle nonfatal database warnings with Talend Open Studio, use a Run If connector routed to a tWarn component.  Use a global exception handler pattern -- tLogCatcher -- to save off the error message and email it.

This job loads records into a SQL Server table from a spreadsheet.  The database has a unique constraint on 'businessName'.  If you run the job once, 5 records are inserted.  If you run the job a second time on the same input, an error is generated.

Job Featuring tWarn and a Run If to Report DB Errors
 The generated error is coming from the database.  If the "Die on error" box were checked on the tMSSqlOutput , processing would stop.  In this case, processing continues as shown by the "after loading subjob" messages which comes from the tJava_1.

Error is Logged, but Processing Continues
Run If


The job uses a global exception handler subjob (tLogCatcher, etc) that can be reused among the flows in the job.  The mechanism by which the tMSSqlOutput component communicates with the exception handler is tWarn.  tWarn is invoked by logic in the Run If connector.

Logical in the Run If Connector
Walkthrough

For a detailed walkthrough of the job, watch this video.


Getting More Info

If you're willing to turn off batching, you can set up a rejects flow from the tMSSqlOutput, spooling rejected records off to a file.  The following revision produces the file attachment as part of the data flow (Excel -> MS Sql).  Batching MUST be turned off in order for the Rejects connector to be an option in the job.  Uncheck the "Use Batch Size" checkbox on the Advanced Settings tab.

Spool Rejected Records to a File; Run If Triggers a Warn
If there are no rejected records, the tWarn is not called an no email is sent.  If there are rejected records -- in this case coming straight from a database constraint error -- they accrue in a file.  The _REJECTED variable is consulted on a Run If which will trigger the tLogCatcher (and the tSendMail component).  The tJava print out ("after loading subjob") is still invoked since processing continues.

 The Run If connector provides a way to check for a result.  This example looked at the _INSERTED variable of the tMSSqlOutput component. The logic could be expanded to AND or OR other conditions such correlating a row count with the number of inserts.  tWarn can send the result of the check to a globally-defined exception handler.  That way, multiple tests can be done in the Talend job all using the same exception code.

4 comments:

  1. This is real helpful. Thanks!! I've written a blog post on my experience with mysqloutputbulkexec and why its clear table feature sucks.

    http://technologyenablingbusiness.blogspot.com/2012/06/beware-talends-tmysqloutputbulkexec.html

    I don't see a truncate table option on the component. Just wanted to make sure that is correct.

    ReplyDelete
    Replies
    1. Hi Yash,

      There is an option "Truncate" for the Action on Data control on 5.1.0.

      If that doesn't work, you can provide your own SQL Loader control file to this component. Go to Advanced Settings and check "Use existing control file". This will allow you to specify REPLACE or TRUNCATE rather than LOAD.

      The syntax for Oracle's control file is here: http://docs.oracle.com/cd/B10500_01/server.920/a96652/ch05.htm. Probably the best place to start is to look at what's generated from Talend using the Basic Settings and . By default, there will be a file $TALEND_HOME/workspace/out.csv.ctl.

      Good luck

      Delete
  2. Ah...I'm using 4.2.3 No wonder...Thanks!

    ReplyDelete
  3. Thank you Carl! The whole writing was very valuable, but especially the part on how to get the rejects connector to be an option in the tMSSqlOutput.

    ReplyDelete