In Part 1, I made a case for using standard Java development technologies (Git, Maven, Jenkins) for collaborating on Talend Open Studio Routines. These technologies are not a requirement for creating Talend Open Studio Routines; simply run Create Routine from TOS. Rather, they define a quality process by which multiple developers can interleave their changes, resolve conflicts, and fend off regression bugs.
The Bottom Line
Compare the following two screenshots. The first screenshot, Java Code in TOS, shows the routine being edited in Talend Open Studio (DI). This is the Eclipse Java Code Editor, which provides code completion and real-time syntax checking. Once I produce a syntactically correct Java class, I can use the static methods in a Talend Job.
Java Code in TOS-Packaged Eclipse
Java Code in Standard Eclipse
The Process
The following Process Diagram shows a sequence of development activities and the products that are generated. This is just an overview, and future posts will break down each of the technologies.
Talend Routine Development Process
The result of the Talend Routine Development Process is a zip file that can be loaded into Talend Open Studio DI using the Import Items command. I'm distributing the zip file -- shown here as brules-1.0.0-bin.zip -- on both the Talend Exchange and the Maven Central Repository. I expect most Talend users will pull the one from the Exchange. The Maven Central Repository copy is of use to developers collaborating on BRules.
Packaging
The zip file (brules-1.0.0-bin.zip) is created using a Java build tool called Maven. Let's black-box Maven for now as there are some advanced concepts like Assemblies and custom Plugins. Assume that it takes in Java code, third-party libraries, and some metadata information for TOS and produces the zip file. The zip file is placed in something called the Local Maven Repository which resides on your hard disk.
Source Code Control
An input to Maven is Java source. The Talend Routine is presented as BRules.java in the diagram. For collaboration, the best place to store Java source is a configuration management system. I'm using Git hosted at GitHub. Git lets me take in changes from different sources -- other developers or different activities -- merge them, and produce tagged versions. The following screenshot of the Git client tool SmartGit shows Git highlighting a recent change that adds a function "len" (right). The left side shows the original len-less class file.
Git Tracks Changes to your Routine
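To make the "len" change concrete, here is a minimal sketch of what such a routine method might look like. The class name BRulesSketch and the behavior of len (a null-safe string length) are assumptions for illustration; the actual BRules.java may be written differently. Talend normally places user routines in a package named routines; the package line is left out here so the file compiles on its own.

```java
// Hypothetical sketch of a Talend routine; the real BRules.java may differ.
public class BRulesSketch {

    /**
     * Assumed behavior of "len": the number of characters in a string,
     * with null treated as length 0 so a job does not fail on missing values.
     */
    public static int len(String s) {
        return s == null ? 0 : s.length();
    }

    public static void main(String[] args) {
        System.out.println(BRulesSketch.len("Talend")); // prints 6
        System.out.println(BRulesSketch.len(null));     // prints 0
    }
}
```

Because the method is static, a Talend Job could call it from an expression in the usual routine style, for example BRulesSketch.len(row1.name), where row1.name is a hypothetical input column.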
In the middle of the diagram, there is a Test activity. There are actually two types of testing going on in the process. The first is the importing of the zip file (brules-1.0.0-bin.zip) into Talend Open Studio where the Routine is tested against Talend Open Studio jobs. The second is a unit testing activity that requires a unit test to accompany every change made to the Routine. This is a requirement that can be enforced with reports demonstrating the coverage with each commit by an individual developer (Cobertura).
Moreover, the unit tests can be run in an automated fashion using a tool called Jenkins with each change made by a developer. If anyone introduces a change that breaks a test, we know right away.
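The idea of a test accompanying every change can be sketched in plain Java. This is only a stand-in: a real project would more likely use a test framework such as JUnit run by the build, and the class BRulesLenCheck, its check helper, and the assumed null-safe behavior of len are all hypothetical names chosen for this example.

```java
// Self-contained check runner; a stand-in for a framework-based unit test,
// not the project's actual test code.
public class BRulesLenCheck {

    // Hypothetical copy of the routine method under test (assumed null-safe length).
    public static int len(String s) {
        return s == null ? 0 : s.length();
    }

    // Throws when a check fails, so the JVM exits with a non-zero status
    // and an automated build can flag the breaking change.
    public static void check(String name, int expected, int actual) {
        if (expected != actual) {
            throw new AssertionError(name + ": expected " + expected + " but got " + actual);
        }
    }

    // Runs every check in order and returns how many passed.
    public static int runAll() {
        int passed = 0;
        check("len of empty string", 0, len(""));
        passed++;
        check("len of null", 0, len(null));
        passed++;
        check("len of a word", 6, len("Talend"));
        passed++;
        return passed;
    }

    public static void main(String[] args) {
        System.out.println(runAll() + " checks passed");
    }
}
```

If a later change alters len so that any of these expectations no longer holds, the run stops with an AssertionError, which is the kind of failure an automated build is meant to surface right away.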
Bringing standard Java development technologies into your Routine development enhances the quality and makes collaboration feasible. There is overhead and a learning curve associated with these technologies and this overhead is not justified in all instances. However, for BRules, I'm interested in involving other developers and the automated support means that I can run the project more efficiently.
Thanks for sharing. The job export can be through command line for TOS?
This example is generating Talend artifacts and packaging them in a zip file for import into TOS. It's not extracting existing jobs from a project file.