Featured Post

Applying Email Validation to a JavaFX TextField Using Binding

This example uses the same controller as in a previous post but adds a use case to support email validation.  A Commons Validator object is ...

Friday, February 11, 2011

The Talend Open Studio Date Dimension Job

Use Java 1.6's Calendar class to build a rich date dimension using Talend Open Studio.

Java 1.6 has a new method getDisplayName() that will return text names and abbreviations for the different parts of a date. With some basic looping, this can produce a useful denormalized structure for a target like a table or spreadsheet when used in Talend Open Studio.

Here is a Job showing a set of components that routes a list of date records to a tLogRow.

Talend Open Studio: A Date Dimension Job
The Job contains 3 loops: YEAR_LOOP, MONTH_LOOP, and DAY_LOOP.  YEAR_LOOP loops over a numeric range, in this case 2010 to 2013.  MONTH_LOOP also loops over a numeric range, 0-11, for each month.  DAY_LOOP loops over the number of the days in the month, but to determine this, it needs some assistance.

Create Calendar is a tJava component that is called for each iteration of MONTH_LOOP, which in turn is called for each iteration of YEAR_LOOP.  For example, for the year '2011', Create Calendar will be called 12 times.  Create Calendar creates a Calendar object from the current year, current month, and the first day of the month (1).  Create Calendar then retrieves a special value computed by standard Java called actualMaximum.  This is the number of days in a particular month (28, 30, 31).  Create calendar then sets a global variable called MAXDAYS.

MAXDAYS is used as the upper bound of DAY_LOOP, which starts with 1, the first day of the month.  DAY_LOOP drives a tIterateToFlow which defines a 3-field schema: year, month, day.  The schema is filled with values taken from the CURRENT_VALUE of each of the three loops.  (Because MONTH_LOOP is zero-based, a 1 is added to it.)

the tFlowToIterate is directed to a tJavaRow component which creates another Calendar object based on the current year, month, and day.  "Lookup Cal Fields" routes the year, month, and day to the output AND adds several new fields.  These fields are month name, month abbreviation, day name, day abbreviation, and day in year ("Julian" date).

Here is the configuration for the components:

YEAR_LOOP (tLoop)

Loop Type: For
From: 2010
To: 2013
Step: 1
Values are increasing: checked

MONTH_LOOP (tLoop)

Loop Type: For
From: 0
To: 11
Step: 1
Values are increasing: checked

Create Calendar (tJava)
Import:

import java.util.Calendar;
import java.util.GregorianCalendar;


Code:

   Calendar cal = new GregorianCalendar(((Integer)globalMap.get("tLoop_1_CURRENT_VALUE")),     ((Integer)globalMap.get("tLoop_2_CURRENT_VALUE")), 1);
   globalMap.put("MAXDAYS", new Integer(cal.getActualMaximum(Calendar.DAY_OF_MONTH)));


DAY_LOOP (tLoop)

Loop Type: For
From: 1
Step: 1
Values are increasing: checked
To:

     ((Integer)globalMap.get("MAXDAYS")).intValue()



tIterateToFlow

Mapping (all Integer type):
Column / Value
year / ((Integer)globalMap.get("tLoop_1_CURRENT_VALUE"))
month / ((Integer)globalMap.get("tLoop_2_CURRENT_VALUE"))+1
day / ((Integer)globalMap.get("tLoop_3_CURRENT_VALUE"))

Here is the schema used in tIterateToFlow

Schema Used in tIterateToFlow

Lookup Cal Fields (tJavaRow)

Import:


import java.util.Locale;


Code:

output_row.year = input_row.year;
output_row.month = input_row.month;
output_row.day = input_row.day;

Calendar cal = new GregorianCalendar(input_row.year, input_row.month-1, input_row.day);

output_row.monthName = cal.getDisplayName(Calendar.MONTH, Calendar.LONG, Locale.getDefault());
output_row.monthAbbrv = cal.getDisplayName(Calendar.MONTH, Calendar.SHORT, Locale.getDefault());

output_row.dayName = cal.getDisplayName(Calendar.DAY_OF_WEEK, Calendar.LONG, Locale.getDefault());
output_row.dayAbbrv = cal.getDisplayName(Calendar.DAY_OF_WEEK, Calendar.SHORT, Locale.getDefault());

output_row.dayOfYear = new Integer(cal.get(Calendar.DAY_OF_YEAR));


Here is the schema used in the tJavaRow component (Lookup Calendar).  Note that several fields are added that are computed using Calendar off of the list indexes.

Schema Used in tJavaRow
With a few components, Talend Open Studio can create a job for loading a dimension.  In addition to the typical year/month/day values, there can be descriptive additions to the date dimension like month name or abbreviation.  Even if your date dimension needs to be supplemented with business-specific date like holidays, this script can serve as the starting point for an Excel spreadsheet.

5 comments:

  1. i've followed the steps but when i run the job, i hit an error as below:

    Exception in thread "main" java.lang.Error: java.lang.Error: Unresolved compilation problems:

    monthName cannot be resolved or is not a field
    monthAbbrv cannot be resolved or is not a field
    dayName cannot be resolved or is not a field
    dayAbbrv cannot be resolved or is not a field
    dayOfYear cannot be resolved or is not a field

    at talenddemosjava.dateloop_0_1.dateloop.tLoop_1Process(dateloop.java:797)
    at talenddemosjava.dateloop_0_1.dateloop.runJobInTOS(dateloop.java:1542)
    at talenddemosjava.dateloop_0_1.dateloop.main(dateloop.java:1410)

    could someone help me plz?

    ReplyDelete
    Replies
    1. Hi,

      It sounds like those fields aren't in the schema used for the tJavaRow component. Take a look at the figure "Schema Used in tJavaRow" and make sure that you have the same fields defined.

      Delete
  2. Hi Carl,

    Yep, I've fixed it and it's ok now. I missed out the schema in tJavaRow. It's fine now. Thank you.

    ReplyDelete
  3. Hi Carl, I also tried to follow the instructions above and also got the following errors:

    DAY_OF_WEEK cannot be resolved or is not a field
    LONG cannot be resolved or is not a field
    DAY_OF_WEEK cannot be resolved or is not a field
    SHORT cannot be resolved or is not a field
    DAY_OF_YEAR cannot be resolved or is not a field
    Type mismatch, cannot convert from Gregorian Calendar to Calendar
    MONTH cannot be resolved or is not a field
    LONG cannot be resolved or is not a field
    MONTH cannot be resolved or is not a field
    SHORT cannot be resolved or is not a field
    The import java.util.Calendar conflicts with a type defined in the same file
    Type mismatch, cannot convert from Gregorian Calendar to Calendar
    DAY_OF_MONTH cannot be resolved or is not a field

    I want to have your idea on this on why Im causing this error

    Im using the latest version of Talend for DI 6.0

    Anyways, hope to hear from you soon...

    Thank you and God Bless

    ReplyDelete
    Replies
    1. Ah I have already resolved it...It seems you have to import the whole java.util.* and also put java.util in every Calendar within the codes of tJava and the codes of tJavaRow....so for example the code for the tJava above would become:

      java.util.Calendar cal = new java.util.GregorianCalendar(((Integer)globalMap.get("tLoop_1_CURRENT_VALUE")), ((Integer)globalMap.get("tLoop_2_CURRENT_VALUE")), 1);
      globalMap.put("MAXDAYS", new Integer(cal.getActualMaximum(java.util.Calendar.DAY_OF_MONTH)));

      and the code for tJavaRow would become:

      output_row.year = input_row.year;
      output_row.month = input_row.month;
      output_row.day = input_row.day;

      java.util.Calendar cal = new java.util.GregorianCalendar(input_row.year, input_row.month-1, input_row.day);

      output_row.monthName = cal.getDisplayName(java.util.Calendar.MONTH, java.util.Calendar.LONG, Locale.getDefault());
      output_row.monthAbbrv = cal.getDisplayName(java.util.Calendar.MONTH, java.util.Calendar.SHORT, Locale.getDefault());
      output_row.dayName = cal.getDisplayName(java.util.Calendar.DAY_OF_WEEK, java.util.Calendar.LONG, Locale.getDefault());
      output_row.dayAbbrv = cal.getDisplayName(java.util.Calendar.DAY_OF_WEEK, java.util.Calendar.SHORT, Locale.getDefault());
      output_row.dayOfYear = new Integer(cal.get(java.util.Calendar.DAY_OF_YEAR));


      so anyway, thanks for making this wonderful exercise

      Delete