VariableSpace and Database Connections within transformations.


Brandon Jackson

Mar 11, 2015, 8:31:09 PM
to kettle-developers
I am trying to trace a problem very different from those that I am used to and wanted to get some pointers to a few source files or perhaps even breakpoint suggestions to dive further.

Our team has been migrating to the PDI 5.3.0-EE DI Server a set of jobs and transformations that take advantage of variables to set specific JNDI connection names, allowing the set of downstream transformations to read from separate database machines.

What seems to be happening is that the parameterized JNDI name ${JNDI_NAME} within a connection (SG_LG in our case) does not appear to get set when the transformation is dispatched and the Table Input step initializes itself and tries to use the connection. Errors in the logs state that ${JNDI_NAME} is not bound in Context.  The connections work fine in other ETL.  Logging steps and UDJCs all verify that ${JNDI_NAME} does get initialized. If I change the connection to any static configuration, the DI server is happy with it. All variable expansion within SQL statements occurs normally.
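The failure mode described above can be reduced to a small sketch (this is illustrative Python, not Kettle source; the `substitute` helper mimics what `VariableSpace.environmentSubstitute()` does): if the literal string `${JNDI_NAME}` reaches the JNDI lookup before variable expansion happens, the context lookup fails exactly as in the logs.

```python
# Illustrative sketch: a JNDI lookup on the raw "${JNDI_NAME}" token fails
# unless ${VAR} expansion happens *before* the lookup. Names below
# (SG_LG_PROD, the registry dict) are hypothetical stand-ins.
import re

def substitute(text, variables):
    """Mimic VariableSpace.environmentSubstitute(): expand ${VAR} tokens,
    leaving unknown variables untouched."""
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: variables.get(m.group(1), m.group(0)),
                  text)

registry = {"SG_LG_PROD": "<datasource>"}     # stand-in for the JNDI context
variables = {"JNDI_NAME": "SG_LG_PROD"}       # set upstream in the job

raw_name = "${JNDI_NAME}"
# Lookup without substitution -> "not bound in Context", as in the logs:
assert raw_name not in registry
# Lookup after substitution succeeds:
assert substitute(raw_name, variables) in registry
```

If the step (or the connection metadata) resolves the JNDI name before the variable space has been copied down from the parent job, you get exactly this "not bound" symptom even though logging steps later in the flow see the variable as set.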

I've attached a debugger to the DI server, have the PDI 5.3.0-R code base loaded up, and have tried some breakpoints around init() in TableInput, but I suspect a better target may be BaseDatabaseMeta or maybe MSSqlServerMeta.  Sometimes IntelliJ acts like it wants to decompile classes instead of reading the source.  Maybe I have some old 5.2.x artifacts hanging around.

Am I on the right track, or are there other places that would make more sense when trying to fumble around in the code?
Are there any rules of initialization that PDI goes through with respect to jobs containing transformations and their connections that I should know about, but that may not be immediately obvious?

Thanks for any pointers.


Matt Casters

Mar 11, 2015, 8:40:09 PM
to Kettle Developers mailing list

I stopped reading at JNDI ☺

On 11 Mar 2015 at 20:31, "Brandon Jackson" <usbr...@gmail.com> wrote:

Brandon Jackson

Mar 11, 2015, 8:50:45 PM
to kettle-d...@googlegroups.com
Been happy with JDBC for years.  Throttling connection volume and query volume pushed us towards it.

Sent from my iPhone

Matt Casters

Mar 12, 2015, 10:23:57 AM
to Kettle Developers mailing list

People constantly tell me about the problems they have with JNDI but they rarely change to regular JDBC connections in Kettle.  Go figure ☺

On 11 Mar 2015 at 20:50, "Brandon Jackson" <usbr...@gmail.com> wrote:

Brandon Jackson

Mar 13, 2015, 1:24:15 PM
to kettle-developers
Our Electronic Medical Record vendor made a stink about how many connections we established per day (22,000) via JDBC.  I just wanted to use connection pooling to reduce that number and also the connection overhead.  Pentaho Support said only the DI server supported connection pooling via JNDI.  It might be possible to do something with simple-jndi, but I am on my own.
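For anyone else weighing the simple-jndi route: the standalone PDI client reads its JNDI definitions from a plain properties file (conventionally `simple-jndi/jdbc.properties` under the data-integration directory). A minimal entry looks like the sketch below; the driver class, host, database, and credentials here are hypothetical placeholders, not our actual setup.

```properties
# simple-jndi definition for a data source named SG_LG (values are examples)
SG_LG/type=javax.sql.DataSource
SG_LG/driver=com.microsoft.sqlserver.jdbc.SQLServerDriver
SG_LG/url=jdbc:sqlserver://dbhost:1433;databaseName=clinical
SG_LG/user=etl_user
SG_LG/password=secret
```

Note that simple-jndi itself does not pool connections; it only makes the name resolvable, which is consistent with Support's point that pooling via JNDI is a DI server feature.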

Five employees, full time, for 7 days: we imported our filesystem-based .ktr and .kjb files and consolidated all of our one-off connections to JNDI names.
We had to follow all the tips you gave at Pentaho World (don't use internal Kettle variables, be careful about steps that want files on disk, etc.).
We now have a network share, carefully tested so that launching Kettle as a client and launching DI both start looking for files in the same place, so the repo-based ETL can work both on a client computer and on the remote server.

We used mklink in MS Windows (Microsoft's pseudo-secret symbolic link tool) in conjunction with mounting an SSH share.
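For reference, the shape of that setup is a directory symbolic link pointing at the mounted share (paths below are hypothetical examples, and mklink typically needs an elevated command prompt):

```
:: Create a directory symlink so local tools see the SSH-mounted share
:: at a stable local path. /D = directory symbolic link.
mklink /D C:\pentaho\repo-files \\sshfs-mount\etl-share\repo-files
```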

Only one single set of ETL did not work correctly, which was using variables to make the back end database connection dynamic.

That's pretty good, really! Just hard as heck to debug via logs on a remote server.  It works on any OS using the PDI stand-alone client (Linux and Windows; Mac is hit or miss because sometimes your install will find the simple-jndi jar, sometimes it won't).

Now I'm just trying to get familiar with building and debugging Kettle in IntelliJ.



Matt Casters

Mar 13, 2015, 1:48:45 PM
to Kettle Developers mailing list

Sounds like you're having a lot of fun AND success.  Thanks for sharing your story, Brandon, I really appreciate it.
Connection pooling would also be possible at the job level, for the duration of the job. That's why it's beneficial to organise loops, queues and the like in a job (repeat option etc.) or, even better, in a transformation. JNDI then doesn't really matter in those cases. If you have 3rd-party software calling the shots it gets really hard to reduce the number of connections made.
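The arithmetic behind that point can be sketched with a toy pool (this is an illustration, not Kettle's pooling implementation): if queries borrow from a job-scoped pool instead of each opening a physical connection, a sequential run of 22,000 queries needs only one physical connect.

```python
# Toy job-scoped connection pool (illustrative only).  Physical opens are
# counted so the savings over connect-per-query are visible.
class Pool:
    def __init__(self, connect_fn, max_idle=5):
        self._connect = connect_fn   # opens a real (here: fake) connection
        self._idle = []              # returned connections, ready for reuse
        self._max_idle = max_idle
        self.physical_opens = 0

    def acquire(self):
        if self._idle:
            return self._idle.pop()  # reuse instead of reconnecting
        self.physical_opens += 1
        return self._connect()

    def release(self, conn):
        if len(self._idle) < self._max_idle:
            self._idle.append(conn)

pool = Pool(connect_fn=lambda: object())
for _ in range(22000):               # one borrow per query, sequentially
    c = pool.acquire()
    pool.release(c)
assert pool.physical_opens == 1      # the whole run reused one connection
```

Concurrent borrows would open more physical connections (up to the pool's limits), but the daily total still collapses from one-per-query to roughly one-per-worker.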
Good luck!
Matt

On 13 Mar 2015 at 13:24, "Brandon Jackson" <usbr...@gmail.com> wrote: