Featured Post

Applying Email Validation to a JavaFX TextField Using Binding

This example uses the same controller as in a previous post but adds a use case to support email validation.  A Commons Validator object is ...

Monday, January 31, 2011

A Type 2 Dimension for the School Revenue Data Mart

School System is one of the focal points of analysis for the School Revenue Data Mart. One of the attributes of a School System is Enrollment Number which is the number of students in a school system and will be widely used in reports and ratios.

Saturday, January 29, 2011

[AUDIO] The Integration Tools Podcast - Episode 3 (Jan. 29, 2011)

Episode 3 of the Integration Tools Podcast is available. The podcast is on integration tools' product releases, research, and events. Episode 3 covers Informatica and Oracle's placement in the Gartner MDM reports.

Spend one or two minutes a week to keep on top of the data integration tools market.

Integration Tools Podcast, Episode 3

For more podcasts, visit http://www.bekwam.com

Evaluating Integration Technology

Businesses that acquire an integration tool understand the tool's importance in productivity.  The rationale behind the acquisition will combine both technical criteria (for example, performance) and non-technical criteria (vendor financial health).  This post lists technical criteria for selecting an integration tool.

[TUTORIAL] Loading a Dimension with CloverETL

Using CloverETL, load an Excel spreadsheet of public school revenue data into a MySQL dimension table called SR_DIM_FUND. As the spreadsheet is loaded, consult a lookup CSV file to make the input data conform with a standard.

Thursday, January 27, 2011

Extending the School Revenue Data Mart Project

An earlier post showed a data mart of school revenues based on the Frederick County, Maryland (US) public school system.  This post extends the data mart to include a second school system, neighboring Washington County.

[TUTORIAL] Loading Dimensions with Talend Open Studio

This tutorial shows how to load dimensional structures with Talend Open Studio.

Tuesday, January 25, 2011

A School Revenue Data Mart Project

To try out Talend Open Studio in a Data Mart scenario, I put together a small project based on converting a spreadsheet of revenues for the Frederick Country (Maryland, US) school system into a dimensionally-modeled data mart.

Monday, January 24, 2011

[TUTORIAL] CloverETL Excel to MySQL Tutorial

Tutorial on loading a MySQL table with an Excel spreadsheet using CloverETL Community Edition

[AUDIO] The Integration Tools Podcast - Episode 2 (Jan. 22, 2011)

Episode 2 of the Integration Tools Podcast was uploaded today. The podcast is on integration tools' product releases, research, and events. Episode 2 covers Informatica's new social media support service and Pitney Bowes data service.

Episode 2 of the Integration Tools Podcast is available at http://www.bekwam.com/podcasts/integration_tools_podcast_ep2_20110122.mp3. Spend one or two minutes a week to keep on top of the data integration tools market.

For more podcasts, visit http://www.bekwam.com

Saturday, January 22, 2011

[TUTORIAL] Project Setup with Talend Open Studio

This tutorial is the first in the Splitting a Table data conversion project. It contains topics related to setting up the project such as Job Design Folders, ContextGroups, and Metadata.

Friday, January 21, 2011

[TUTORIAL] File Processing with Talend Open Studio

Splitting a table is a typical data conversion project. This tutorial uses two subjobs to load legacy data into a new CONTACT table, moving a column called 'URL' to a new CONTACT_URL table.

Thursday, January 20, 2011

A Talend Conversion Project

Splitting a table is a common conversion project.  As the data model grows, cardinality changes.  That is, a contact may have had a single email address ten years ago, but now there may be several.

Physical Model for Data Conversion

This image is a physical model similar to a data conversion project I worked on last week.

Tuesday, January 18, 2011

How Well Does Talend Perform?

I created a 138Mb text file with Python and ran it into a MySQL database using Talend Open Studio. The runtime was about 8 minutes.

[TUTORIAL] Deployment with Talend Open Studio

To provide environment-specific settings for Talend Open Studio jobs, use ContextGroups and Contexts.

Sunday, January 16, 2011

[TUTORIAL] Running Simple DML in Talend Open Studio

Video tutorial on running a simple DML statement ("DELETE FROM") using Talend Open Studio.


[AUDIO] Integration Tools Podcast: Episode 1

The first episode of the Integration Tools Podcast was uploaded today.  The podcast is on integration tools' product releases, research, and events.  Episode 1 is on Informatica Cloud Express, DataFlux, iWay, and D&B360.

This Business of DaaS

These days, it's important to get people to use your service at some basic level and then sell some of those users additional services.  Gmail, YouTube, and this blog you're reading (Blogger) are free services that have premium (for fee) versions like Google Apps for Business.  If you're selling Data as a Service (DaaS), give as much as you can away to attract users, but have a strategy to convert users to customers.

Friday, January 14, 2011

Text File to JSON with Talend Open Studio

If you have a text file, Excel file, or RDBMS table, it's easy to generate a JSON file with Talend Open Studio.  This JSON file can be put under a web server and accessed by JavaScript.

Thursday, January 13, 2011

A DFD Primer

Go to yourdon.com for a great Wiki source for Data Flow Diagrams (DFDs).

Wednesday, January 12, 2011

The Costs of Integration Tools

In my work as a consultant, I've noticed that direct software costs are the biggest barrier to working with an integration tool.  Talend Open Studio removes this barrier completely.  This post explains the difficulty in bringing integration tools in house lists the prices of the market leaders.

Monday, January 10, 2011

[TUTORIAL] Turning a Column into Rows with Talend Open Studio

Use the tNormalize column to break a multi-valued attribute stored in an RDMBS column into individual rows.  This is great for a second pass on an input file to fill up a join.

Saturday, January 8, 2011

Data Flow Diagram (DFD) Example

In today's integration projects, it's useful to sketch out a design prior to coding.  The Data Flow Diagram, or DFD, from the late 70's is helps get projects off to a great start.

Upsert or Replace with Talend

Some ETL tools mark a target for upsert or replace in the transformation action.  Talend uses the target itself.

Friday, January 7, 2011

ETL and the Logical Model

For some applications, the logical model represents a significant investment.  From Hibernate and JPA, to Microsoft's ADO.NET Entity Framework, to PHP ORM libraries, development is moving farther away from the database and into higher levels of abstraction.  This presents a challange for ETL developers who don't use these frameworks and have to operate their tools on the physical model.  What's lost are the business relationships that represent the heart of the application.

Thursday, January 6, 2011

Java Libraries in Talend Open Studio

Although Talend Open Studio has a rich set of StringHandling functions, I prefer those in the StringUtils class of Apache Commons Lang.  One of my favorite functions is "isBlank" which checks for null, the empty String, and a String composed only of whitespace.  Fortunately, Talend provides and easy way to integrate this library call.

Wednesday, January 5, 2011

Using Expressions and Variables in Talend Maps

The Map -- tMap in Talend Open Studio -- is the workhorse of data conversion and ETL processing.  Occasionally, you'll need to work with a variable or expression.  There are several places in Talend where you can put an expression.

Tuesday, January 4, 2011

Collapsing Columns in Excel: Handling Empty Values with a Talend tMap

If you're generating Talend code in Java, you can use the tertiary operator (the old C programming language 3-argument operator) to handle empty values in an Excel spreadsheet.

Monday, January 3, 2011

A PL/SQL String Tokenizer

I was surprised that I couldn't find a simple string function like a tokenizer in PL/SQL.  I used DBMS_UTILITY.NAME_TOKENIZE for a while, but found that it limits the tokens to the maximum length of an Oracle name which is 32 characters.  The following PL/SQL is a string tokenizer that doesn't have this limitation.

Sunday, January 2, 2011

[TUTORIAL] Text File to MySQL Table with Talend

This tutorial loads a MySQL table with contact data from a CSV file using Talend Open Studio.  Talend Open Studio is an Eclipse-based open source integration tool that handles many different formats and provides a rich library of objects for transformation and scripting.

Open Source DaaS with Talend

If you're looking for an open source integration tool, consider Talend Open Studio.  Talend let's you create ETL and other integration processes using a panel-based Eclipse IDE (Galileo).  Talend works with many sources and targets including Excel, flat files, XML, and relational databases.  When creating processes with Talend, there is a rich library of Components which are packaged objects for transformations and scripting.