Alfresco is built on a rich set of web services that are available to developers as a RESTful API. Talend Open Studio can make RESTful web service calls using the tREST component, which saves the HTTP response as a chunk of text in a schema field. tExtractXMLField can then parse this text (an XML document) into usable fields.
An Alfresco Space is a hierarchical container, like a folder or a directory. RESTful web service calls can access an Alfresco Space by forming a URL from the Space's position in the hierarchy. For example, the following screenshot shows a Space CLIENT1 inside the Clients Space, which in turn is inside the top-level Company Home Space.
CLIENT1 contains a child Space, Input, and a content file TestHTML.
|Alfresco Space CLIENT1 Contents|
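To make the idea of forming a URL from a Space's position concrete, here is a minimal Java sketch. The base URL below is purely hypothetical (the real service path depends on your Alfresco installation and is not reproduced here), and whether Company Home itself appears in the path is also an assumption. The only point illustrated is joining the Space names into a percent-encoded path.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

class SpaceUrl {
    // Appends each Space name to the base URL as one percent-encoded path segment.
    static String build(String baseUrl, List<String> spaces) {
        StringBuilder url = new StringBuilder(baseUrl);
        for (String space : spaces) {
            // URLEncoder produces form encoding ("+" for a space), so convert "+" to "%20"
            // to get a valid path segment. A literal "+" is already encoded as "%2B".
            String segment = URLEncoder.encode(space, StandardCharsets.UTF_8).replace("+", "%20");
            url.append('/').append(segment);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Hypothetical host and service path: substitute the ones from your installation.
        String base = "http://localhost:8080/alfresco/HYPOTHETICAL-SERVICE-PATH";
        System.out.println(build(base, List.of("Company Home", "Clients", "CLIENT1")));
    }
}
```

The Space name "Company Home" contains a space, which is why each segment is encoded before it is appended.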
This job is based on a tREST component that retrieves the XML document returned by an Alfresco call. The tREST output is routed to a tExtractXMLField, which breaks the document up into individual fields. The fields are then directed to a tLogRow. A tLibraryLoad is used to introduce a routine that will Base64-encode the username and password.
|tREST Job Calling Alfresco|
The following screenshot shows the configuration of the tREST component. The path is built from the hierarchy of Spaces. This installation of Alfresco protects these Spaces using Basic Authentication, so an HTTP header is used to pass along the Authorization credentials. A Base64-encoded String of the form "username:password" (note the colon, with no spaces around it) is the argument. The String encoding is performed by Commons Codec.
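As a rough illustration of the header value, the sketch below builds a Basic Authentication value in plain Java. Note the swap: it uses the JDK's own java.util.Base64 rather than Commons Codec, only so that the example runs without an extra JAR. The class and method names are made up for illustration, and the credentials are the well-known sample pair from the HTTP authentication RFC, not real ones. In the standard Basic scheme the encoded text is preceded by the word "Basic" and a space.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class BasicAuthHeader {
    // Returns the value for an "Authorization" header using the Basic scheme.
    static String value(String username, String password) {
        // The credentials are joined with a single colon and no spaces.
        String credentials = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Sample credentials only; never hard-code real ones.
        System.out.println("Authorization: " + value("Aladdin", "open sesame"));
        // prints: Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
    }
}
```

A stray space after the colon would change the encoded text and cause the server to reject the credentials, which is why the exact "username:password" form matters.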
tREST saves its result in a schema field named "Body" (in memory). This Body field can be directed to a tExtractXMLField, where XPaths break the XML document in Body into individual fields. Alfresco uses namespaces, and handling them correctly is critical to the successful operation of the tExtractXMLField. To study the namespaces, I opened the tREST URL in a browser and loaded the response into LiquidXML Studio for analysis.
|Breaking Apart XML Response from Alfresco|
LiquidXML Studio has a tool for forming XPaths by selecting one of the desired elements to be returned.
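To show why namespaces matter for the XPaths, here is a small standalone Java sketch using the JDK's XPath API. The sample document is an assumption: it is a made-up Atom-style feed, not the actual Alfresco response, and the "atom" prefix is chosen arbitrarily. With a default namespace on the root element, an unprefixed XPath such as /feed/entry/title matches nothing; the elements have to be addressed through a prefix bound to the namespace URI.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespacedXPath {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    // Hypothetical sample; the real Alfresco response may be structured differently.
    static final String SAMPLE =
        "<feed xmlns=\"" + ATOM_NS + "\">"
        + "<entry><title>Input</title></entry>"
        + "<entry><title>TestHTML</title></entry>"
        + "</feed>";

    // Extracts the text of every entry title using a namespace-qualified XPath.
    static List<String> titles(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefixed XPaths to match
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "atom".equals(prefix) ? ATOM_NS : XMLConstants.NULL_NS_URI;
                }
                // Reverse lookups are not needed for evaluating this expression.
                @Override public String getPrefix(String uri) { return null; }
                @Override public Iterator<String> getPrefixes(String uri) { return null; }
            });

            NodeList nodes = (NodeList) xpath.evaluate(
                    "/atom:feed/atom:entry/atom:title", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titles(SAMPLE)); // prints: [Input, TestHTML]
    }
}
```

The prefix used in the XPath does not have to match whatever prefix the document itself uses; only the namespace URI has to agree.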
To form an Authorization HTTP header, the username and password need to be Base64-encoded. I'm doing this with the Commons Codec JAR file. These two screenshots show the reference to the JAR file. I'm also using a static import to reduce the amount of typing needed in my tREST HTTP header section.
|Loading a JAR File|
|Static Import in tLibraryLoad|
The following screenshot shows the results. The RESTful API returns the two children of CLIENT1: the folder "Input" and the HTML content document "TestHTML".
Talend Open Studio can access Alfresco functionality using the tREST component. Since the response is XML, you can use Talend's tExtractXMLField to break out the fields. Once a bug in tHttpRequest is fixed, another job design will be available that conveniently saves the results to a file, where they can be handled by Talend's file-based processing.