Thursday, March 27, 2014

Collaborating on Talend Open Studio Routines: Part 5 - Metadata

Managing the Java code that makes up a Talend Open Studio Routine outside of Talend lets you use technologies like source code control and continuous integration to improve the maintenance of your Routine.  This blog post describes a publicly available Maven Plugin called TalendRoutine that bridges the gap between the standard artifacts produced by Maven and the deployable Routine needed by Talend Open Studio.
In a previous post, I mentioned that extracting the Java code of a Talend Open Studio Routine brings significant maintenance benefits, especially in a team setting.  However, a Talend Open Studio Routine is more than just Java code.  There is also metadata describing the Routine and the packaging of the Routine.

Talend Routine Packaging

A Talend Routine is packaged in a zip file for distribution.  This zip file needs to adhere to a strict format that is undocumented.  This post describes a packaging that has worked for me since Talend Open Studio version 4.  From some reverse-engineering of exported packages, I've determined the following structure.
  • Zip file.  The Routine is packaged in a zip file.
  • Top level project folder.  This is the name of a Talend Open Studio project.
  • talend.project file. A file in the top level project folder describing the project.
  • lib dir.  Under the top level folder, this folder contains any JAR files used by the Routine.
  • code dir. Folder and subfolders containing the Routine (stored as a .item rather than a .java file) and the Routine metadata.
  • .item file.  In a subfolder of code.  A compilable Java file making up the routine.
  • .properties file.  In a subfolder of code.  A descriptor for the Routine
I use Maven to produce this structure using the Assembly Plugin.  Maven is good at producing JARs and WARs without extra configuration, but to produce the Talend Open Studio Routine format, some customization is needed.  For more about the Assembly Plugin and TOS Routines, read this post.
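The structure above could be expressed as an Assembly Plugin descriptor along the following lines.  This is only a sketch under stated assumptions: the project folder name (MYPROJECT), the routine name and version, the code/routines subfolder, and the source locations under src/main are all hypothetical placeholders rather than the layout used by BRules.

```xml
<!-- Sketch only. MYPROJECT, MyRoutine, the version 1.0, and every source path are hypothetical. -->
<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2 http://maven.apache.org/xsd/assembly-1.1.2.xsd">
  <id>routine</id>
  <formats>
    <!-- The Routine is distributed as a zip file -->
    <format>zip</format>
  </formats>
  <!-- The top level folder is the Talend project name, so skip the default artifactId-version folder -->
  <includeBaseDirectory>false</includeBaseDirectory>
  <dependencySets>
    <dependencySet>
      <!-- All dependent JARs go into the lib folder -->
      <outputDirectory>MYPROJECT/lib</outputDirectory>
      <!-- Assumption: the Routine ships as an .item, so the project's own JAR is left out -->
      <useProjectArtifact>false</useProjectArtifact>
    </dependencySet>
  </dependencySets>
  <files>
    <file>
      <!-- Project descriptor in the top level project folder -->
      <source>src/main/metadata/talend.project</source>
      <outputDirectory>MYPROJECT</outputDirectory>
    </file>
    <file>
      <!-- The Java source is copied over under an .item name -->
      <source>src/main/java/routines/MyRoutine.java</source>
      <outputDirectory>MYPROJECT/code/routines</outputDirectory>
      <destName>MyRoutine_1.0.item</destName>
    </file>
    <file>
      <!-- Routine descriptor stored next to the .item file -->
      <source>src/main/metadata/MyRoutine_1.0.properties</source>
      <outputDirectory>MYPROJECT/code/routines</outputDirectory>
    </file>
  </files>
</assembly>
```

The dependencySet is the "general" part: it picks up whatever JARs Maven resolved, so the lib folder needs no per-JAR configuration.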


When I created the Talend Open Studio Routine "BRules", I packaged the zip file with a talend.project file and a BRules.properties file that was based on an export from Talend Open Studio.  I wrote the BRules Routine in TOS, exported the Routine, pulled out the source and metadata, and re-deployed the Routine using Maven (including the Assembly Plugin).  As I added functionality to BRules, I tweaked the metadata to update the import list and version.

This tweaking is a problem in the development process because the metadata can fall out of sync with the Routine being developed.  Take the case of an extra JAR.  If I use a new third-party library in BRules.java, the JAR file is automatically downloaded by Maven.  With the Assembly Plugin, I can -- using a general syntax -- specify that all my dependent JARs go into the TOS Routine's /lib folder.  However, when I build the Routine distribution (the zip file for import into TOS), I have to remember to also tweak the .properties file to include the new JAR in an import.

This scenario is more complicated if I use a third-party library that brings in its own dependency graph (dependencies of dependencies).  I would have to look at what Maven pulled down and build up the metadata with multiple <imports> entries, one per JAR.

Generating Metadata

To resolve this problem of out-of-sync metadata, I created a custom Maven Plugin called TalendRoutine.  This Plugin is configured through your pom.xml to generate the talend.project file and the Routine.properties file.  In the BRules case, Maven generates a talend.project file and a BRules_version.properties file based on the direct and indirect information contained in the BRules pom.xml file.  See the following UML diagram.

The TalendRoutine Maven Plugin Generates Metadata
The pom.xml loads the TalendRoutine Maven Plugin in its build/plugins section.  The pom.xml directly specifies the label, purpose, description, version, and path that will be used in the generated files: talend.project and BRules_version.properties.  The files are stored in the /target folder and zipped up into the distributable archive along with the other artifacts referenced by the Assembly Plugin (JAR files, .java code copied over as .item).

This is a fragment from the build/plugins section of the BRules pom.xml file.

    <purpose>Validation, conversion, and shorthand utilities</purpose>
    <description>A collection of functions for validating data, converting data, and shortening expressions</description>
    <version>${parsedVersion.majorVersion}.${parsedVersion.minorVersion}</version>

The TalendRoutine Maven Plugin also has access to the build process, meaning that it can use the dependencies identified by Maven (copied over by the Assembly Plugin) to form <imports> entries in the BRules_version.properties file.

Lastly, the BRules pom.xml file uses another plugin (different from TalendRoutine) called build-helper to extract the version of the overall project and apply it to the archive filename and the metadata.  The BuildHelper Plugin sets the version attribute for the TalendRoutine Plugin.

This is a fragment of the pom.xml file configuring BuildHelper.
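A minimal sketch of what such a configuration might look like, assuming the standard build-helper-maven-plugin from org.codehaus.mojo and its parse-version goal, which exposes the parsedVersion.majorVersion and parsedVersion.minorVersion properties used in the TalendRoutine fragment above.  The plugin version and the execution id are assumptions, not values taken from the BRules pom.xml.

```xml
<!-- Sketch only: the plugin version and the execution id are assumptions -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>build-helper-maven-plugin</artifactId>
  <version>1.8</version>
  <executions>
    <execution>
      <id>parse-version</id>
      <!-- Run early so the parsedVersion.* properties exist before later plugins read them -->
      <phase>validate</phase>
      <goals>
        <!-- Splits the project version into major, minor, and incremental parts -->
        <goal>parse-version</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

For a project version such as 1.6.0, the expression ${parsedVersion.majorVersion}.${parsedVersion.minorVersion} would resolve to 1.6, the form of version shown in the generated .properties file.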



The following is a .properties file generated for BRules. I've abbreviated the xmi:id and id attributes; they're consistent in the real version.  (I'm not sure why some of the attributes are fIREHOSE-cased.) 

<?xml version="1.0" encoding="UTF-8"?>
<xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI" xmlns:TalendProperties="http://www.talend.org/properties">
<TalendProperties:Property xmi:id="1" id="2" label="BRules" purpose="purpose"
  creationDate="2014-03-20T09:51:58.553-0400" modificationDate="2014-03-20T09:51:58.553-0400" version="1.6" statusCode="PROD" item="5">
  <author href="../../../talend.project#3"/>
</TalendProperties:Property>
<TalendProperties:ItemState xmi:id="4" path="bekwam"/>
<TalendProperties:RoutineItem xmi:id="5" property="4" state="2">
  <content href="BRules_1.6.item#/0"/>
  <imports xmi:id="4" mESSAGE="Required for using this component." mODULE="libphonenumber-3.8.jar" nAME="libphonenumber" rEQUIRED="true" />
  <imports xmi:id="6" mESSAGE="Required for using this component." mODULE="commons-lang3-3.0.1.jar" nAME="commons-lang3" rEQUIRED="true" />
  <imports xmi:id="7" mESSAGE="Required for using this component." mODULE="brules-json-1.6.0.jar" nAME="brules-json" rEQUIRED="true" />
  <imports xmi:id="8" mESSAGE="Required for using this component." mODULE="joda-time-2.3.jar" nAME="joda-time" rEQUIRED="true" />
</TalendProperties:RoutineItem>
</xmi:XMI>

The TalendRoutine Maven Plugin is hosted on the Maven Central Repo, so it can be used in your pom.xml files without explicitly downloading or installing anything.  The source code is also available in a sources distribution.

Additionally, you can go to GitHub to see the latest version: https://github.com/bekwam/plugins-repos-1.
