Featured Post

Applying Email Validation to a JavaFX TextField Using Binding

This example uses the same controller as in a previous post but adds a use case to support email validation.  A Commons Validator object is ...

Monday, February 7, 2011

Running a File of SQL Statements with Talend Open Studio

Read in and execute a file of SQL statements, one per line, into a MySQL database using Talend Open Studio

Some ETL tools have a facility for running a block of SQL statements stored in a text file.  To achieve this functionality in Talend Open Studio, combine File and Database components.

Take a text file like this (insert_statements.txt)


INSERT INTO SR_DIM_FUND (FUND_NM, FUND_SOURCE_NM) VALUES ('carl1', 'aaa');
INSERT INTO SR_DIM_FUND (FUND_NM, FUND_SOURCE_NM) VALUES ('carl2', 'bbb');
INSERT INTO SR_DIM_FUND (FUND_NM, FUND_SOURCE_NM) VALUES ('carl3', 'ccc');


I've also tested


ALTER TABLE SR_DIM_FUND MODIFY FUND_LOAD_DT DATETIME NOT NULL


Create a job with a tFileInputDelimited component driving a tMySQLRow. This will need to be mediated by a tFlowToIterate. Each line of SQL is part of a flow that is converted and run as a variable query in tMySQLRow.

The following job is an example

Talend Open Studio: Processing a File of SQL Statements 

tFileInputDelimited is a component that uses a single string field as a schema. I added this directly to the component, making it a built-in-property with a field "statement".

tFlowToIterate doesn't require any special configuration.

tMySQLRow uses the connection prepared in the opening subjob and uses the following for the query, rather than a hardcoded SQL string based on the schema added by Talend.


((String)globalMap.get("row1.statement"))


Use two subjobs to create and close the MySQL connection: tMySQLConnection and tMySQLClose.

tFileInputDelimited will drive the processing, setting a new "row1.statement" that provides the SQL run by tMySQLRow.

15 comments:

  1. Hi Carl,

    I've done followed your steps but I hit errors saying that

    Exception in thread "main" java.lang.Error: java.lang.Error: java.lang.Error: Unresolved compilation problems:
    Syntax error, insert ";" to complete LocalVariableDeclarationStatement
    Syntax error on token "statement", AssignmentOperator expected after this token
    Syntax error, insert ";" to complete BlockStatements
    Syntax error on token "statement", AssignmentOperator expected after this token

    at demo.autoconnect_0_1.AutoConnect.tFileInputExcel_1Process(AutoConnect.java:1131)
    at demo.autoconnect_0_1.AutoConnect.runJobInTOS(AutoConnect.java:2429)

    Could you plz help me :(

    ReplyDelete
  2. How do i get the results of the SQL being executed to display somewhere?

    I have some SQl that wants to return a 'count' of records, but i want to be able to see the number that the sql is returning. So, in the file, i have three statements that are like:
    select count(*) AS aRows from tableA
    select count(*) AS bRows from tableB
    select count(*) AS cRows from tableC

    i want to see: aRows = X, bRows = x, cRows = x ...

    Thanks.

    ReplyDelete
    Replies
    1. You can use an input dbms component and a sql query like

      SELECT 'Table A', COUNT(*) FROM table_a
      UNION
      SELECT 'Table B', COUNT(*) FROM table_b
      UNION
      SELECT 'Table C', COUNT(*) FROM table_c

      to build a flow that you can use elsewhere.

      Alternatively, run an input component (say tMssqlInput) into a tSetGlobalVar using a Main Flow. Define a key (say "field1") with a value that maps to something in the input (like "row1.field1"). You can access the value in the rest of the job like

      globalMap.get("field1")

      Good luck

      Delete
    2. Thanks for the reply.
      I'm using tMySqlInput with a query:

      "SELECT count(*) AS rowOne FROM `myTable`
      UNION
      SELECT count(*) AS rowTwo FROM `myTable`"

      hooked up to a tLogRow and the only result I get is:

      Starting job z_testing at 15:46 06/09/2012.


      [statistics] connecting to socket on port 3641
      [statistics] connected
      37|0|||||||||||||||||false|||||||||||||||||||||||||||||||||||||
      [statistics] disconnected
      Job z_testing ended at 15:46 06/09/2012. [exit code=0]

      '37' is the right number, but it's only shown once (should be twice at least), I dont know what the other empty piped results are and it doesnt give me the 'rowOne' or 'rowTwo' as well. I would like to know which result count is from what call.

      Any other additional feedback would be greatly appreciated
      Thanks.

      Delete
    3. Hi,

      Use a "UNION ALL" rather than a UNION. Also, in my comment, I suggested using a constant for the table name so that you can correlate the COUNT(*) statement with the table. With this query, you should have a 2-element schema (table_name, table_count).

      You can also include the table name as part of the key used in tSetGlobal. As the key, use the Java concatenation operator to form a unique variable name like

      "counter_" + row1.table_name

      which would be accessed like this later on

      globalMap.get("counter_tableA");

      Delete
    4. I'm a new TOS user. I need to check the count of available ID's before I start loading data. I have a query SELECT count(1) AS ACCTIDCNT FROM sysdba.SLX_ACCOUNT_IDS_TMP WHERE DATEUSED IS NULL if I connect the tMSSqlInput to tLogRow I can see the count returned by the query. If I try to connect it tSetGlobalVar as you recommended in the post on September 5, 2012 at 7:30 PM I only get the number 1. What do you mean by "Define a key (say "field1") with a value that maps to something in the input (like "row1.field1"). " When I try to access row1.ACCTIDCNT I get the error msg. ACCTIDCNT cannot be resolved or is not a field.
      thanks,
      john

      Delete
    5. Hi,

      My post from 2012 may be a bit misleading. It's a tSetGlobalVar injected in the middle of a data flow and doesn't have a direct relationship to the flow. In the post, the tSetGlobalVar is used to set a constant value that will be blended with the data from an Excel input flow. I don't make reference to "row1" in the configuration of the tSetGlobalVar.

      Fortunately, you can use a tSetGlobalVar to capture the counnt of valid IDs from a query. You'll need an extra component called tFlowToIterate. This will take all of the input fields and put them in the globalMap. With this usage, you can make a reference to "row1.ACCTIDCNT".

      So, the sequence in the job will be tMSSqlInput -> tFlowToIterate -> tSetGlobalMap. The tMSSqlInput will also have an On Subjob Ok routed to something that will use your count, say a tJava with a System.out rather than a tLogRow.

      I'll post a job on this later today with some screenshots.

      Delete
    6. No need for the screen shots I was able to map it out and get it to work.

      Thanks for your help and especially the quick response. Your site has been a Godsend. I use it daily.

      Thanks again,
      John

      Delete
    7. I'm glad to hear you got this working. I finally got around to adding the post on this topic with screenshots here: http://bekwam.blogspot.com/2014/05/use-tsetglobalvar-to-record-count.html.

      Delete
  3. Hi,
    If tFileInputDelimited is directly connected via 'row1(Main)' to tMySQLRow, is it possible to assign the input (row1.columnName1) to a variable within the tMySQLRow?

    Thank you.

    ReplyDelete
    Replies
    1. Have you tried a tMap between the two components?

      Delete
    2. Hi,
      the job states that it has 2 execs finished, but it has not populated the table. The csv contains the following code only:
      create table tbl1 (col1 int);
      insert into tbl1 (col1)values(1);

      Delete
    3. Hi,

      Try adding a third statement "commit;".

      Delete
    4. Hi,

      Thanks for the help - so obvious when you think about it!!

      Delete