Featured Post

Applying Email Validation to a JavaFX TextField Using Binding

This example uses the same controller as in a previous post but adds a use case to support email validation.  A Commons Validator object is ...

Wednesday, February 9, 2011

Scripting a Date Dimension Using Talend Open Studio

For a rigorous date dimension, you'll probably need to work with a spreadsheet.  However, if you don't have to integrate business-specific data like holidays, you can use Java's Calendar class in a Talend Open Studio Job to build the dimension.

Java's Calendar class can produce a rich date dimension.  In addition to defining the calendar years (months, days, etc.) there are a variety of codes and descriptions that are available.

This post is the starting point for such a Talend job.

The following screenshot shows a job that contains two loops: one to drive the year, one for the month.  tForeach contains a list of fiscal years, 2000-2020, that are entered as Values in the component configuration.  tLoop is a loop of months, from 0 to 11.   tForeach is an outer loop and tLoop is repeated for each iteration of the outer loop.

Talend Open Studio: Starting Point for Date Dimension Create Job
 Here is the configuration for the tForeach.
Talend Open Studio: tForeach Config

Here is the component configuration for the tLoop component.

Talend Open Studio: tLoop Config

A tIterateToFlow component is used as an adapter to convert the loops' iterations to a data flow, for use in a tLogRow component.  (This will be replaced with a database output.)  The tIterateToFlow will define a two-column schema containing the current iteratation of the outer loop (year) and the inner loop (month number).

Here is a screenshot of the definitions.  Notice the difference in Java types.  CURRENT_ITERATION is an Integer.

Talend Open Studio: tIterateToFlow Config
 Lastly, there is the component that interacts with the Java Calendar class: tJavaRow.  tJavaRow does two things: map the input components directly to the output, use the input components to lookup codes and descriptions.  Here is the code behind the tJavaRow component.

row2.Year = row1.Year;
row2.Month = row1.Month;

Integer yearNb = Integer.parseInt(row1.Year);
Integer monthNb = row1.Month;

java.util.Calendar cal = new GregorianCalendar(yearNb,monthNb-1,1);

row2.MonthName = cal.getDisplayName(Calendar.MONTH, Calendar.LONG, Locale.getDefault());

row2.MonthShortName = cal.getDisplayName(Calendar.MONTH, Calendar.LONG, Locale.getDefault());

The month from the tLoop is one-based; it needs to be converted (-1) to zero-based.  The MonthName and MonthShortName are two of many lookups available from the Calendar class.

The code works in conjunction with the tJavaRow's expanded schema which adds new output fields to the input.

Talend Open Studio: A Schema in tJavaRow

There is also a tLoadLibrary component to import the various Java libraries (Calendar, Locale) so that I don't have to fully qualify each class name.  When creating this for yourself, you'll have to specify a JAR file despite the fact that you're using standard Java libraries.  Pick "Commons-Lang 2.5" or another one, otherwise your tLoadLibrary component won't validate.

Your requirements may warrant working with a spreadsheet of several thousand rows to integrate business-specific data.  However, there are many cases where a simple Talend job that uses the Java Calendar class can create a date dimension quickly.  This post provides the basic approach.  Look for a future one that will add a third loop, day.

No comments:

Post a Comment