Sunday, December 2, 2012

Dependency Injection and Talend Open Studio Custom Components

Dependency injection is well-established in enterprise Java programming.  It's present in popular frameworks like Spring and Java EE 6 and helps foster an object-oriented design that is easier to maintain.  Custom component developers can also use it in Talend Open Studio.

CDI

CDI, or Contexts and Dependency Injection, allows a container to initialize member variables in a class or to parameterize a method.  Unlike the initialization supported by the Java language ("int i=0"), these member variables can be initialized with complex objects (that are themselves initialized using CDI), or with values taken from an external resource.  CDI became a standard part of enterprise Java programming with Java EE 6.  The underlying technique, dependency injection, was adopted early on by Spring and is also available in lightweight containers like Google Guice (pronounced "juice") and PicoContainer.


Dependencies

The dependencies in CDI are different from the library dependencies of a build.  CDI dependencies are used to initialize member variables using a variety of techniques: constructor injection, setter injection, and field injection.  CDI can also be used to parameterize a method.  Consider the following examples.


// injected with 'new MyClassValidator()'
@Inject
public MyClass(Validator validator) { ... }

// injected with a value from a config file
public void doSomething(@ConfigParam String configParam1) { ... }

In the first example (MyClass / Validator), the caller gets a new instance of MyClass by asking a container for one or by having it injected in turn.  In the second example (doSomething), the method may be called by a framework, with @ConfigParam prompting the container to pass a config parameter as the argument.

Containers

A container is a factory that produces object instances.  It can be as simple as a library like the 470k no-AOP build of Google Guice or as complex as a clustered Java EE-compliant app server.  In both cases, the caller asks the container for an instance through some type of lookup method.  The caller does not use the "new" operator on injected classes.


// its validator will be set by container
MyClass obj = container.getInstance(MyClass.class); 

Although "new" can syntactically be used with MyClass, doing so would force the caller to also create a Validator object.  This leads to an anti-pattern called "Propagating Dependency".  See this posting in the PicoContainer documentation.
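
To make the idea of a container concrete, here is a minimal sketch in plain Java.  It is not how Guice or a Java EE server is implemented.  TinyContainer, its register() method, and the bodies of Validator, MyClassValidator, and MyClass are all hypothetical, made up for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-ins for the classes named in the text.
interface Validator {
    boolean isValid(String input);
}

class MyClassValidator implements Validator {
    @Override
    public boolean isValid(String input) {
        return input != null && !input.isEmpty();
    }
}

class MyClass {
    private final Validator validator;

    // Constructor injection: the container supplies the Validator.
    MyClass(Validator validator) {
        this.validator = validator;
    }

    String doSomething(String input) {
        return validator.isValid(input) ? "accepted" : "rejected";
    }
}

// A toy container (an assumption for illustration, not a real library):
// it maps a type to a recipe for building an instance of that type.
class TinyContainer {
    private final Map<Class<?>, Supplier<?>> recipes = new HashMap<>();

    <T> void register(Class<T> type, Supplier<? extends T> recipe) {
        recipes.put(type, recipe);
    }

    <T> T getInstance(Class<T> type) {
        Supplier<?> recipe = recipes.get(type);
        if (recipe == null) {
            throw new IllegalStateException("No recipe registered for " + type.getName());
        }
        return type.cast(recipe.get());
    }
}

class TinyContainerDemo {
    // The wiring lives in one place; callers never see MyClassValidator.
    static TinyContainer wire() {
        TinyContainer container = new TinyContainer();
        container.register(Validator.class, MyClassValidator::new);
        container.register(MyClass.class,
                () -> new MyClass(container.getInstance(Validator.class)));
        return container;
    }

    public static void main(String[] args) {
        MyClass obj = wire().getInstance(MyClass.class);
        System.out.println(obj.doSomething("hello")); // prints "accepted"
    }
}
```

Only wire() knows that a MyClass needs a Validator.  A real container such as Guice works out most of this wiring by inspecting constructors, so the explicit recipes above are a simplification.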

Propagating Dependency

Object-oriented programming (OOP) is more than just coding Java.  OOP is a practice that leads to a more maintainable and scalable program by creating cohesive objects that are building blocks for larger, more complex objects.  OOP counts on encapsulation to isolate portions of the program (classes) from side effects.  The more parts of a program rely on classes' implementation details, the more brittle the program becomes.


Propagating Dependency is a manifestation of bad encapsulation.  For example,

// caller
Validator validator = new MyClassValidator();
MyClass obj = new MyClass(validator);
obj.doSomething();

is bad because the caller has to know not only about MyClass, but also about Validator and MyClassValidator.  The following is preferred.

// caller
MyClass obj = container.getInstance(MyClass.class);
obj.doSomething();

'caller' doesn't know anything about what makes up MyClass.  MyClass may be modified one day to not include a Validator as a member variable. (It might switch to an interceptor for validation.)  'caller' can get right to doSomething() with a minimum of effort and knowledge.

Google Guice

Some part of the program needs to know how to build MyClass, and that is the CDI container.  Google Guice is a small library (~470k for the non-AOP version) that will chase down a class's dependencies, provided that the caller uses Guice's container -- called an Injector -- to get an instance.


In Google Guice, the specification of class dependencies (called "wiring") is done in a Java class called a Module.  Classes that follow the JavaBeans convention of having a default constructor don't need to be specified.  Dependencies that are coded to an interface (Validator / MyClassValidator) or that must resolve to a specific object instance need to be declared with a bind() call.

The following is an example Module.  "AbstractModule" is a Google class.

import com.google.inject.AbstractModule;
import com.google.inject.name.Names;

public final class ScriptRulesModule extends AbstractModule {

  // Instances supplied by the caller (the component's JET code)
  private final RuleList ruleList;
  private final Connection inputConn;
  private final Connection filterConn;
  private final Connection rejectConn;

  public ScriptRulesModule(RuleList ruleList,
      Connection inputConn,
      Connection filterConn,
      Connection rejectConn) {
    this.ruleList = ruleList;
    this.inputConn = inputConn;
    this.filterConn = filterConn;
    this.rejectConn = rejectConn;
  }

  @Override
  protected void configure() {
    // Bind each type (plus a name, where one is given) to the instance passed in above
    bind(RuleList.class).toInstance(ruleList);
    bind(Connection.class).annotatedWith(Names.named("Input")).toInstance(inputConn);
    bind(Connection.class).annotatedWith(Names.named("Filter")).toInstance(filterConn);
    bind(Connection.class).annotatedWith(Names.named("Reject")).toInstance(rejectConn);
  }

}


The member variables (ruleList, etc.) and constructor are coded as in any other class.  The configure() method overrides an AbstractModule method and tells the Injector which substitutions to make when getInstance() is called.  When a RuleList is requested at an @Inject point, the Injector supplies the specific ruleList instance that was passed to the Module's constructor.  When a Connection is requested with a @Named("Input") annotation at an @Inject point, the Injector supplies the inputConn member variable.  "Filter" and "Reject" behave the same way as "Input".
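
The three Connection bindings can coexist because a binding is looked up by more than its type: in Guice, a binding key combines a type with an optional binding annotation such as @Named.  The plain-Java sketch below imitates that lookup so it can run without the Guice JAR.  NamedBindings, bindNamed(), lookup(), and this stripped-down Connection are hypothetical; the component's real Connection class is not shown in this post, and this is not Guice's implementation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, stripped-down Connection; the component's real class is not shown in the post.
class Connection {
    private final String name;

    Connection(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// Toy registry (an assumption for illustration, not Guice code):
// instances are stored under a (type, name) pair rather than under the type alone.
class NamedBindings {
    private final Map<Class<?>, Map<String, Object>> bindings = new HashMap<>();

    // Plays roughly the role of bind(type).annotatedWith(Names.named(name)).toInstance(instance).
    <T> void bindNamed(Class<T> type, String name, T instance) {
        bindings.computeIfAbsent(type, t -> new HashMap<>()).put(name, instance);
    }

    // Plays roughly the role of resolving a parameter declared as "@Named(name) T".
    <T> T lookup(Class<T> type, String name) {
        Map<String, Object> byName = bindings.getOrDefault(type, Collections.emptyMap());
        Object instance = byName.get(name);
        if (instance == null) {
            throw new IllegalStateException(
                    "No " + type.getSimpleName() + " bound with name " + name);
        }
        return type.cast(instance);
    }
}

class NamedBindingsDemo {
    public static void main(String[] args) {
        NamedBindings bindings = new NamedBindings();
        bindings.bindNamed(Connection.class, "Input", new Connection("row1"));
        bindings.bindNamed(Connection.class, "Filter", new Connection("row2"));
        bindings.bindNamed(Connection.class, "Reject", new Connection("row3"));

        // Same type, three distinct instances, selected by name.
        System.out.println(bindings.lookup(Connection.class, "Filter").getName()); // prints "row2"
    }
}
```

Without the name in the key, the second and third bindings would collide with the first, which is why the Module qualifies each Connection with Names.named(...).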

Guice in Talend

To give a Talend Open Studio Custom Component the ability to use Google Guice, package the JAR file with the component and reference it in an <IMPORT> element in the component's XML config.  I also package the standard javax.inject JAR file so that my code adheres to the standard javax.inject (JSR-330) annotations, except the parts that interact directly with Guice.

<IMPORT MODULE="guice-with-deps.jar" NAME="Guice 3.0 No AOP REQUIRED" REQUIRED="true"></IMPORT>
<IMPORT MODULE="javax.inject.jar" NAME="Javax Inject" REQUIRED="true"></IMPORT>

The JET code creates the Module shown in the previous section.  In this snippet from tScriptRules_begin.javajet, ruleList, inputRowName, filterRowName, and rejectRowName are data structures created from widgets on the Component View panel and from the metadata of connections created in a job.

For readability, I left off the fully qualified names that JET requires.

Injector injector_<%= cid %> =
 Guice.createInjector(new ScriptRulesModule(
  ruleList_<%= cid %>, 
  new Connection("<%= inputRowName %>", <%= inputRowName %>), 
  new Connection("<%= filterRowName %>", <%= filterRowName %>), 
  new Connection("<%= rejectRowName %>", <%= rejectRowName %>)
   )
);


This creates the container (Injector).  Remember, 'container' in this case is a 470k JAR file and not a Java EE-compliant app server.

Getting an instance of the bean that will do the Talend Open Studio Custom Component's work is simple.  (Fully qualified names are left off again.)

ScriptRulesBean rulesBean_<%= cid %> = injector_<%= cid %>.getInstance(ScriptRulesBean.class);


The Benefit


To see the benefit in using CDI for Custom Component development, consider the following constructor.


@Inject
public ScriptRulesBean(ScriptRulesValidator validator, 
      JexlEngine jexl, 
      RuleList ruleList, 
      @Named("Input") Connection inputConn, 
      @Named("Filter") Connection filterConn, 
      @Named("Reject") Connection rejectConn) throws ScriptRulesValidationException { ...


The Module presented in the earlier section tells Guice to put inputConn into the constructor parameter with the @Named("Input") annotation.  The same goes for "Filter" and "Reject".  "validator" and "jexl" are created by Guice using their own default constructors.  ruleList is the instance held by the Module, which begin.javajet builds from a table value pulled from the Component View.

My caller JET (tScriptRules_begin.javajet) mentions only the parameters it must supply for construction of the ScriptRulesBean object: ruleList, inputConn, filterConn, rejectConn.  That's acceptable because it's JET's responsibility to pull values from the Component View and from the job's metadata.  JET does NOT need to know about JexlEngine and ScriptRulesValidator.  If it did, any later change to the ScriptRulesBean design would also require updating the JET code.  This is exactly the type of ripple effect I'd like to prevent.

Additionally, I've avoided the Propagating Dependency anti-pattern.  The calling JET does not have to first construct a ScriptRulesValidator with its dependencies (JexlEngine) and then carry that object over into constructing a ScriptRulesBean with its own dependencies.  This also prevents misuse of the API, such as passing in the wrong object.

The result is a cohesive set of classes that are small, focused, and easy to test.
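
The testing claim can be illustrated with a short sketch.  Because collaborators arrive through the constructor, a test can hand in a stub without any container at all.  The classes below are simplified, hypothetical stand-ins: the real ScriptRulesBean takes more parameters and the methods of the real ScriptRulesValidator are not shown in this post, so RuleChecker, RulesBean, isWellFormed(), and countWellFormed() are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical collaborator interface, invented for this sketch.
interface RuleChecker {
    boolean isWellFormed(String rule);
}

// Simplified stand-in for a bean like ScriptRulesBean: it receives
// everything it needs through its constructor.
class RulesBean {
    private final RuleChecker checker;
    private final List<String> rules;

    RulesBean(RuleChecker checker, List<String> rules) {
        this.checker = checker;
        this.rules = new ArrayList<>(rules);
    }

    // Counts the rules that the checker accepts.
    int countWellFormed() {
        int count = 0;
        for (String rule : rules) {
            if (checker.isWellFormed(rule)) {
                count++;
            }
        }
        return count;
    }
}

class RulesBeanTestSketch {
    public static void main(String[] args) {
        // A stub that records what it was asked, with no expression engine behind it.
        List<String> seen = new ArrayList<>();
        RuleChecker stub = rule -> {
            seen.add(rule);
            return rule.startsWith("ok");
        };

        RulesBean bean = new RulesBean(stub, Arrays.asList("ok: a > 1", "bad rule", "ok: b < 2"));
        int accepted = bean.countWellFormed();
        int checked = seen.size();
        System.out.println(accepted + " of " + checked + " rules accepted"); // prints "2 of 3 rules accepted"
    }
}
```

In production the container supplies the real checker; in a test the stub is passed in directly, so the bean's logic can be exercised in isolation.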

The downside?  Training.  Using Google Guice involves a learning curve, but the cost of adding another library and interface to your custom component will be more than offset by the reduced complexity of no longer managing dependencies yourself.

Contexts and Dependency Injection (CDI) is everywhere in enterprise Java programming.  From a clustered WebSphere app server to a 470k Google Guice JAR for Android, CDI can be used in many deployments.  In Talend Open Studio, a custom component developer can use CDI to wire up the object instances that make up a program by including a small library like Google Guice in the component and creating objects using an Injector.



