Featured Post

Applying Email Validation to a JavaFX TextField Using Binding

This example uses the same controller as in a previous post but adds a use case to support email validation.  A Commons Validator object is ...

Sunday, February 5, 2012

Calling a Stored Procedure from Talend Open Studio

Writing to a set of related tables may present a timing challenge to Talend Open Studio developers.  This post shows how to use a stored procedure to write to two related MS SQL Server tables.

Databases like MS SQL Server and MySQL use special identity columns to provide a unique auto-incremented value for a row.  This value is often used as a primary key.  Working with these columns presents a special challenge to Talend Open Studio developers because there may not be a business key that supports a lookup on the generated id.

Related Tables

The tables Business and Contact are releated by a foreign key 'businessId' which is the primary key of the Business table.  A Contact record can't be inserted without a valid foreign key.  'businessName' is not unique, so a lookup cannot be established after-the-fact.


Two Tables Related by an Auto Increment Primary Key
Get Last ID

Talend provides components for retrieving the last auto-generated id: tMSSqlLastInsertId, tMysqlLastInsertId.  These components are used in the middle of a sequence of output components.  For example, the following job shows a tMSSqlLastInsertId in between tMSSqlOutput components "Business" and "Contact".  Business writes to the Business table.  Contact writes to the Contact table.  tMSSqlLastInsertId adds a special 'last_insert_id' field in the processing flow which is used to carry the generated businessId into the INSERT statement for Contact.


Job Using last_insert_id
Pass-Through Schemas
  
When using tMSSqlOutput components, you'll generally use a schema from a repository which Talend Open Studio can keep in sync with the database.  This improves maintenance.  In the above job however, the schema used for Business needs to be expanded in order to carry the Contact fields over to the Contact component.


A Pass-Through Schema Carries Fields for Later Processing
Because the fields like Primary_Contact_FirstName aren't part of the Business table insert, they are excluded using the Advanced settings tab / Field options.  Note the unchecked "Use batch" which is required for the last_insert_id to be set.


Ignore Pass-Through Fields
With a Stored Procedure


The above example works, but the approach can become more difficult to maintain if there is another layer of dependency, say a load that will write to a dependent of Contact the requires the auto-generated contactId.  Stored procedures provide a way to encapsulate the call.  A single Talend component is used -- tMSSqlSP -- that is based on a composite schema of all the fields for all the tables, as was used in the Pass-through schema for Business in the previous example.


Such a job looks like the following with a single tMSSqlSP replacing the Business/last_insert_id/Contact processing chain.


Job Using a Stored Procedure for a Multitable Write
A composite schema containing all the fields for all the table inserts is used.  This set of fields matches the number of parameters for the stored procedure.


Schema Used in tMSSqlSP Procedure
And the corresponding configuration uses the following parameters.  Note the green double-plus button will take the schema and create a full set of parameters, one for each field in the schema.


Configuration of a tMSSqlSP Component


Stored Procedure Source

Like the original job, the stored procedure writes to both the Business and the Contact tables.  For the Contact write, a SELECT @@IDENTITY is used to set a variable.  This is the same SELECT executed by a tMSSqlLastInsertId component.


CREATE PROC dbo.InsertBusiness
  @businessName AS VARCHAR(50),
  @streetAddr1 AS VARCHAR(50),
  @streetAddr2 AS VARCHAR(50),
  @city AS VARCHAR(50),
  @usState AS VARCHAR(2),
  @zip AS VARCHAR(10),
  @phoneNumber AS VARCHAR(25),
  @faxNumber AS VARCHAR(25),
  @websiteURL AS VARCHAR(255),
  @firstName AS VARCHAR(50),
  @lastName AS VARCHAR(50),
  @contactPhoneNumber AS VARCHAR(25),
  @email AS VARCHAR(100)
AS
    DECLARE @businessId INT
   
    INSERT INTO Business (businessName, streetAddr1, streetAddr2, city,
    state, zip, phoneNumber, faxNumber, websiteURL)
    VALUES (
    @businessName, @streetAddr1, @streetAddr2, @city, @usState, @zip,
    @phoneNumber, @faxNumber, @websiteURL
    )
   
    SELECT @businessId = @@IDENTITY

    INSERT INTO Contact (businessId, firstName, lastName, phoneNumber, email)
    VALUES (
    @businessId, @firstName, @lastName, @contactPhoneNumber, @email
    )



Talend Open Studio's management of schemas is a big productivity boost.  Both examples presented in this post would benefit with a composite schema -- one that combines Business and Contact -- saved in the Repository.  The tMSSqlLastInsertId technique can be used reliably, but if there are more related writes, consider encapsulating the data with a stored procedure call.  Also, if performance is critical, you can move certain lookups into the database in a stored procedure where the retrieval is much faster.  Note that in both cases, the inserts are made row-by-row with the input.  This is because batching is turned off in the tMSSqlLastInsertId example, though the commit may be deferred.


More

For more on working with Talend and MS SQL Server, read

7 comments:

  1. Check this helpful links too it also explained very well about auto increment id by using stored procedure in sql server...
    http://social.msdn.microsoft.com/Forums/en/transactsql/thread/8f67388b-5e34-46fc-b3db-686da32431d8
    http://www.mindstick.com/Blog/133/Auto%20increment%20id%20by%20using%20stored%20procedure
    http://www.dotnetfunda.com/articles/article1547-how-to-retrieve-identity-auto-incremented-value-using-output-paramter-thr.aspx

    ReplyDelete
  2. Replies
    1. Hi. I'm sorry, but I don't have this job anymore. It must have gotten lost in one of the upgrades.

      Delete
    2. thank for replying,but I've some problem in TMSSqlSP. I want to know how to call stored procedure with parameters.Can you help me ??

      Delete
    3. Does your job have Parameter entries in the Basic Settings of the tMSSqlSP component?

      Delete
    4. nope, let me know which tools is used to be for parameter. I see "tSchemacomplianceCheck" in your sharing but i don't know how to use :(

      Carl, thank for sharing your time :)

      Delete
    5. Connecting a component like tSchemaComplianceCheck or tMap -- as in the screenshot "Schema Used in tMSSqlSP" above -- to the tMSSqlSP will transfer a schema to the tMSSqlSP component.

      The fields of the schema can then be applied to the Parameters table in the tMSSqlSP.

      Delete