Bekwam Blog: Updating a Table with the Talend Open Studio ELT Components

Saturday, October 15, 2011

Updating a Table with the Talend Open Studio ELT Components

ELT stands for Extract-Load-Transform and is an inversion of the data processing pattern "ETL". In ELT, data is manipulated within a database which is a huge performance improvement. Talend Open Studio comes with components like tELTMySqlMap to make the writing of such manipulations easier.

When updating data in a database, sometimes its easier to load a file in a staging table and then manipulate the data within the database. It may even be the only option based on performance requirements. Using a staging table means loading relatively unclean data from a source like a spreadsheet or CSV file into a schema based heavily on the input. Subsequent transformations and loads within the database will address the quality and move the data into a more usable format.

The Talend Open Studio (TOS) ELT components (for example, tELTMySqlMap and tELTOracle) will generate SQL INSERT OR UPDATE statements based on Talend metadata. While these statements can be formed by hand, forming the SQL with Talend will cut down on typing and errors. It's especially useful on a consulting job where large sets of unfamiliar columns need to be mapped.

An ELT Job

This TOS job will update a table CONTACT. CONTACT has a column BUSINESS_ID which will be updated based on a mapping established in the BUSINESS_CONTACT table. The BUSINESS_CONTACT table uses a business key "BUSINESS_NAME" to retrieve the value BUSINESS_ID.

ELT Job

BUSINESS and BUSINESS CONTACT are input tables to the tELTMySqlMap component. The output table is CONTACT. The schema defined for the CONTACT table in the tMySqlOutput component is a subset of the fields because this job will only be updating a single field. For inserts, the whole set of fields would be used.

ELT Map

In the map component, the input tables are joined on a business key "BUSINESS_NAME". From the BUSINESS table, BUSINESS_ID is mapped to the output. If this were an insert operation, the job is complete. Because this is an update, there is an additional step.

In an update, the input needs to be correlated with the output. This is accomplished using an alias 'C' on the target table. The C alias is imposed as a condition on the join table using an expression entered by hand. Notice that the target table is not added as a third input. That's because the UPDATE statement generated will add input tables to the FROM cause, not what we want with an "UPDATE CONTACT" statement where the table name follows the UPDATE.

To set up the C alias, change the table name to include an alias in the tELTMySqlOutput component.

Adding an Alias to tELTMySqlOutput

The result of a run generates the following SQL.

An Update Statement Considering the Output Table

Troubleshooting

If there is an error in the SQL generated by the Talend component, check the list of columns in the ETL Map. Although this example only updates a single column, I have used an ELT component -- tMSSqlMap specifically -- to update more than one column. What you're looking for in the list of columns are syntactic differences between a plain column (like 'dateCol') and a function or vendor-specific construct (like 'GETDATE()'). Try removing out anything that isn't a plain column when debugging.

Sometimes the only way to process a large amount of data is to first stage the data in a loosely-defined and unconstrained table and to manipulate it once the data is in the database. This is ELT (Extract-Load-Transform), an inversion of the more widely known "ETL". Using Talend Open Studio's ELT helps generate the SQL for this type of transformation. These SQL statements can certainly be created by hand, but if you're a consultant or dealing with an unfamiliar schema, the productivity boost of mapping unfamiliar tools with a GUI can be substantial.

8 comments:

dqvn2002March 25, 2012 at 6:49 PM
Your post is helpful with me, thanks for your guide.
ReplyDelete
Replies
sriniAugust 16, 2012 at 10:00 AM
Hi,

For above example do all the tables in the same schema?
ReplyDelete
Replies
AnselmoAugust 3, 2015 at 6:14 PM
This comment has been removed by the author.
ReplyDelete
Replies
AnselmoAugust 3, 2015 at 6:19 PM
Thank you, Carl. This post is very useful, nice work you're doing with this blog!

Unfortunately I wasn't able to follow these steps with SQL Server, because the ELT components are generating a command like this:

UPDATE mytable alias_different_table_name
SET attr1 = (SELECT attr1 FROM source_table ... INNER JOIN ...)
attr2 = (SELECT attr2 FROM source_table ... INNER JOIN ...)
WHERE ...

And the expected syntax would be:

UPDATE mytable
SET attr1 = A.attr1
attr2 = A.attr2
FROM source_table A
INNER JOIN ...
WHERE ...

I don't know if the syntax above is the only one correct, but I tried both methods manually and only the second option had success (I think that is more optimized, also). The first option only worked without the alias defined in the output component and before the 'SET' word.

Now I'm inserting all the data in a temporary table and then I use a tSQLTemplateMerge component to update/insert the target table.

Best regards,
Anselmo
ReplyDelete
Replies
UnknownJune 16, 2020 at 10:41 AM
Nice information, this is will helpfull a lot, Thank for sharing, Keep do posting i like to follow this informatica online training
informatica online course
informatica bdm training
informatica developer training
informatica training
informatica course
ReplyDelete
Replies

Add comment

Bekwam Blog

Featured Post

Applying Email Validation to a JavaFX TextField Using Binding

Saturday, October 15, 2011

Updating a Table with the Talend Open Studio ELT Components

8 comments: