If adjusting the data isn't possible, then push the complex join to the input component by expanding the SQL statement of the input component.
Basic tMap Lookup
tMap joins lookups to the main data flow using equality. This "Load Fact Enroll" job joins a main MySQL data flow with two lookup tables.
A Job with Two Lookups |
Input Joined to Two Lookups |
The result of the map is a SQL join based on the following conditions (an equijoin is specified).
row1.SYSTEM_NM = row2.SYSTEM_NM AND
row1.YEAR_NB = row2.YEAR_NB
Date Range
These two tables are of different grain. TransactionTable has records recorded at the day level (more realistically, they would be timestamped) and TransactionWeekTable has records recorded at the week level.
Daily Transactions |
Weekly Reference Data |
row1.transaction_date >= row1.[Start Date] AND
row1.transaction_Date <= row2.[End Date]
Which isn't available using the simple drag-and-drop of the tMap interface.
Fix the Data
Once way to handle the complex join is to fix the data. Kimball recommends building a complete, business-driven date to handle each day. Each day can be allocated to a quarter or week, labeled as a holiday, or assigned a pay period. In TransactionWeekTable, the fix would turn each range-based record into a set of records, one for each day. For example,
Start Date End Date Week
--------------------------------
1/1/2011 1/8/2011 1
Would be replaced with a TransactionDayTable (or simply "CalendarTable")
Day Day of Year Week Holiday Pay Day Weekend
--------------------------------------------------------
1/1/2011 1 1 Y N Y
1/2/2011 2 1 N N Y
1/3/2011 3 1 N N N
etc.
Then, tMap's standard lookup mechanism can be used because although the range join isn't supported, a simple date function to retrieve a day is available in the Routines (TalendDate.getPartOfDate). Map the main flow column "transaction_date" to the new CalendarTable column "Day of Year" and edit the expression to use TalendDate.getPartOfDate("DAY_OF_YEAR) on the main flow column.
Here is an example spreadsheet from Kimball's web site to an example CalendarTable.
Move the Join
If you would like to fix this in the TOS code, then move the join to an input component like tAccessInput. This job removes the lookup table from the canvas; it's now embedded in the tAccessInput query.
Lookup Pushed to Input Component |
Here is the simple tMap supporting the job.
tMap with Extended Schema |
As with most integration cases, it's best to solve a data problem or deficiency in the data. In this example, breaking a week-oriented table out to a day-oriented table not only simplified the TOS design but also created a conduit for additional business requirements (quarter, pay period, etc.). If you're not able to fix this problem in the data, then overcome Talend's equality-based tMap by adding joins in the SQL supporting input components.
No comments:
Post a Comment