It's best to standardize on one date format throughout your application. The date format for the RDBMS is a good candidate, like "dd-MM-yy" for Oracle. But dates can come from other sources, so you'll occasionally need to format a date.
Schemas in Talend Open Studio can automatically turn text strings into date objects. Once Talend recognizes a field as a date, that date can be made available to other components without having to worry about format.
The following screenshot is a text file with fields of various date formats.
|
Input File with Different Date Formats |
Load the file using a tFileInputDelimited. When creating the schema, define each field as a date (rather than a string). Specify a pattern to handle the different formats. Talend Open Studio provides a handy code complete that will help select the correct format. The following date pattern is for "December 2, 2010" which isn't in the code complete list.
"MMMM dd, yyyy"
MMMM uses the full month, like "December". MMM is for an abbreviated month, "Dec". MM is used for numeric values ("12").
This is a simple tFileInputDelimited to tLogRow Job.
|
Job Showing Date Conversions |
The following schema identifies each field as a date and provides a date pattern. It's for the tFileInputDelimited.
|
Schema with Different Date Format | | | | | | | | |
This schema converts the input strings to a date format. Once the fields are in date format, they can be reformatted. The tLogRow schema applies a uniform date format to all of the date fields.
|
Schema Applying a Common Date Format |
Running the job produces the following.
|
Running a Job with a Common Date Format |
When dealing with dates, it's best to standardize on a format. However, dates can come from a lot of other sources -- a web application, legacy data, external interface -- so, become familiar with handling dates in a schema or a dedicated component like tConvertType.
Does the implicit date conversion work only with text files? In my tests, date conversion from a database String/Varchar seems to fail.
ReplyDeleteHi,
DeleteI've always stored dates as their proper format in the database, so I'm not sure about the implicit conversion. Are you using type=Date and dbtype=VARCHAR? I would try a tConvertType as a next step.
I've noticed with Excel that the date handling isn't straightforward. The spreadsheet stores the date as a number rather than a String. So, once I wasn't able to read in the value then convert it using a date format (it was already a date).
I was using Varchar for date in the database, so, I suppose it just doesn't do implicit conversion from an RMDB. I ended up just converting my dates to proper date fields in the DB.
ReplyDeleteThanks.
Nice blog Carl thanks for sharing.
ReplyDelete