Featured Post

Applying Email Validation to a JavaFX TextField Using Binding

This example uses the same controller as in a previous post but adds a use case to support email validation.  A Commons Validator object is ...

Wednesday, March 2, 2011

MySQL Fact Tables with Talend Open Studio

Once the dimensions are loaded in your data mart, load the fact tables.  In Talend Open Studio, this is implemented using several tMysqlInput components, a tMap, and a tMysqlOutput component for a physical model built on MySQL.

A fact table holds transaction data, either raw or summarized (Simsion 488).  In processing terms, this means that each input record -- say from a staging table -- will produce an output record in the fact table.  This is different than a dimension which normalizes a record from an input set.

A star schema is an arrangement of dimensions around a single fact table (Simsion 499).  Fact tables may share dimensions.

This post shows examples of the fact tables SR_FACT_REVENUE from the following data model.  The fact table is defined in terms of a collection of foreign keys from a dimension ("Design Tip #81").

School Financial Mart

There is an attribute, SR_ENROLL_NB that appears both on a dimension (SR_DIM_SYSTEM) and the fact table.  Enrollment (ENROLL_NB) describes a school system, but may be needed in computations, say to computer per-student spending.

In Talend Open Studio, a fact loading job for MySQL is created using several tMysqlInput tables.  One tMysqlInput is for the main input, such as the input from a staging table.  Dimensions are queried using additional tMysqlInput components to provide surrogate keys.  A tMap component joins the source schemas, the main input joined with the lookup values to the output.

This is a similar example loading the other fact table.  SR_FACT_REVENUE is loaded from SR_ETL_REVENUE, SR_DIM_YEAR, SR_DIM_SYSTEM, and SR_DIM_FUND.

A Talend Job Loading Fact Revenue Table

A typical tMap configuration will join the sources together on their business keys and use the data from the main input combined with the surrogate keys from the lookup to provide the basis for the fact load.

The configuration of the tMysqlInput components is straightforward.  Their schemas are based on the definitions found in the Repository.  The query is generated using the "Guess Query" function in Talend Open Studio.

For the tMysqlOutput component, set the "Action on data" to insert.  The fact table load will unconditionally insert records unlike the dimensional loads which will do an upsert or apply end dating logic after an insert.

In Talend Open Studio, loading a fact table requires few components.  Talend's ability to work with schemas -- in contrast to hand-coding -- makes developers more productive and less error prone.  Also, Talend flags schemas that become out-of-sync with their queries which is a nice feature if you're developing dimensions shared among different stars.

Simsion, Graeme C. and Graham C. Witt, . Data Modeling Essentials. 3rd ed. USA, Morgan Kaufman Publishers. 2005.

"Design Tip #81 Fact Table Surrogate Key." www.kimballgroup.com. Kimball Group. July 6, 2006. http://www.kimballgroup.com/html/designtipsPDF/DesignTips2006/KU81FactTableSurrogateKeys.pdf.

1 comment: