Featured Post

Applying Email Validation to a JavaFX TextField Using Binding

This example uses the same controller as in a previous post but adds a use case to support email validation.  A Commons Validator object is ...

Wednesday, December 28, 2011

tMSSqlLastInsertId Returns 0

If you've built a Talend Open Studio job that inserts records into a multiple MS SQL Server tables, you can use the tMSSqlLastInsertId to retrieve the value of an auto-incremented IDENTITY column set in a prior insert.

SQL Server uses the special variable @@IDENTITY to retrieve the auto-incremented column value of the last insert for a table with such a column defined.

For example, for a table

CREATE TABLE Contact (
 contactId int identity(1,1) NOT NULL,
 firstName varchar(50) NOT NULL,
)

when an insert statement is executed

INSERT INTO contact (firstName) VALUES ('carl')

which is followed with a select statement

SELECT @@IDENTITY

will return the last inserted id.  If this was the first insert, then a '1' is returned.

Talend Open Studio runs this "SELECT @@IDENTITY" query in the tMSSqlLastInsertId component, returning the value in the special field "last_insert_id".

The documentation for tMSSqlLastInsertId (4.2.3) points to an equivalent example with MySQL.  In the example, an output component (tMySqlOutput) is run into a tMysqlLastInsertId.  In the example, the result is directed to a tLogRow, but that could have been a second tMySqlOutput which performs a second insert.

Batching

One problem I found with the example is that the batching is used for the MS SQL components.  This causes the tMSSqlLastInsertId component to not work.  (The tMSSqlLastInsertId component returns 0 in all cases rather than the identifier.)

To turn off batching, uncheck the box on the Advanced tab of the preceding tMySqlOutput.  See the following screenshot.

Using tMSSqlLastInsertId
Connections

Another cause of a 0 last insert id is using the wrong connection.  In your jobs, you should use a shared database connection.  However, with certain drag-and-drop operations, a connection may be created as "built-in".  Although the generated built-in may use the same credentials and database instance as the shared connection, the last_insert_id mechanism is connection based.  To fix this, make sure that "Use existing connection" is checked in the output component's configuration.

Databases like Oracle create their unique identifiers upfront using sequences.  These sequences can be carried into Talend Open Studio components allowing for multiple related inserts.  MS SQL server uses auto-incremented columns for its unique identifiers and this requires finding those identifiers after-the-fact.  To retrieve the last inserted MS SQL identifier, use the tMSSqllastInsertId component, but be sure to disable any batching.

5 comments:

  1. Thank you very much! I had the same issue with this component returning 0 and solved after disabling the batch processing! ;-)

    ReplyDelete
  2. Thank you for resolving the returning 0 ID issue for me!

    ReplyDelete
  3. Hi, thanks for this info but performance are very low with tMysqlLastinsertId and tMSSQLLastInsertId.Have you got some 'tricks' to improve job like this ? (approx 100row/s max)

    ReplyDelete
    Replies
    1. Hi,

      Unfortunately, this is bound to be slow. It's a tradeoff. The implementation is clean in terms of its algorithm, but with batching turned-off, it won't be performant in all cases.

      For a fast data load, you'll want to load each table individually using multiple passes on the input data. You'll rely on a lookup which considers the previously loaded data to correlate the independent flows. This may force you to add an additional column to one of the target tables, say a batch number, which is used throughout the job.

      As an example, loading company and employee data transmitted in the same input would first load the company table, storing a batch number. A second pass would load the employees, using the combination of the batch number plus company name to acquire the id for the employee table.

      If the business key is strong enough -- say there will only every be one and only one matching company -- you won't need the batch number column to correlate.

      Good luck

      Delete
  4. This comment has been removed by the author.

    ReplyDelete