dynamic set of formulas / UDJC


Paul Stoellberger

Nov 13, 2012, 5:05:14 PM
to kettle-d...@googlegroups.com

I'm trying to create a nice little UDJC step that reads from 2 inputs:
a) a list of formulas / fields I want to compute
b) the actual input data

So basically there is an arbitrary number of formulas I want to compute on the given input data (formulas that I will get from, let's say, a CSV file).

Sometimes it works, sometimes it doesn't. From what I can tell, it's because it's undefined which input set reaches the UDJC first.
I'm taking the rowset with index 1, but that assumption may or may not hold...

I wanted to use an "info stream" instead, but I don't know how to do that in a UDJC. How can I do that?

If anybody thinks this step would be useful, I could consider turning it into a real step later!

Any input welcome!



P.S.: I don't know if KTRs get through, so here is a Dropbox link to the sample transformation:

P.P.S.: We wanted to use the metadata injection step for this, but I was told that it doesn't work for formulas, so I created this instead.


Jens Bleuel

Nov 13, 2012, 5:19:19 PM
to kettle-d...@googlegroups.com
Hi Paul,

try it this way:

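  // Look up each input rowset by the name of the step that feeds it
  RowSet rowSetMain = findInputRowSet("Row Data Stream");
  // Read one row from the main data stream only
  Object[] r = getRowFrom(rowSetMain);
  RowSet rowSetInfo = findInputRowSet("Info Data Stream");
  // Read one row from the info stream only (e.g. the list of formulas)
  Object[] rowInfo = getRowFrom(rowSetInfo);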

See also http://jira.pentaho.com/browse/PDI-8738
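The snippet above reads each stream by the name of the step that feeds it rather than by arrival order. To illustrate why that makes the result deterministic, here is a minimal plain-Java model. It is not Kettle code: `ModelRowSet`, its `getRow()` and the static `findInputRowSet()` are hypothetical stand-ins. The model assumes only that reading from one specific rowset blocks until a row arrives and returns null once the upstream step is finished. The data rows are queued first and the formulas arrive later on another thread, yet draining the info stream first still yields every formula before any data row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Plain-Java model of reading a UDJC's inputs by step name; NOT the Kettle API.
final class InfoStreamDemo {

    /** Hypothetical stand-in for a RowSet: a queue plus an end-of-stream marker. */
    static final class ModelRowSet {
        private static final Object[] END = new Object[0];
        private final BlockingQueue<Object[]> queue = new LinkedBlockingQueue<>();

        void putRow(Object[] row) { queue.add(row); }

        void setDone() { queue.add(END); }

        /** Blocks until a row is available; returns null once the producer is done. */
        Object[] getRow() {
            try {
                Object[] row = queue.take();
                if (row == END) {
                    queue.add(END); // keep the marker so later calls also see end-of-stream
                    return null;
                }
                return row;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
    }

    // Hypothetical input rowsets, keyed by the name of the upstream step.
    private static final Map<String, ModelRowSet> inputs = new HashMap<>();

    static ModelRowSet findInputRowSet(String stepName) { return inputs.get(stepName); }

    static List<String> run() {
        ModelRowSet main = new ModelRowSet();
        ModelRowSet info = new ModelRowSet();
        inputs.put("Row Data Stream", main);
        inputs.put("Info Data Stream", info);

        // The data rows are all queued up front...
        for (int i = 1; i <= 3; i++) main.putRow(new Object[] {"data" + i});
        main.setDone();

        // ...while the formulas only arrive later, from another thread.
        Thread infoProducer = new Thread(() -> {
            try {
                Thread.sleep(50); // simulate a slow upstream step
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            info.putRow(new Object[] {"formula1"});
            info.putRow(new Object[] {"formula2"});
            info.setDone();
        });
        infoProducer.start();

        List<String> log = new ArrayList<>();
        // Phase 1: deplete the info stream; this waits for the slow producer.
        ModelRowSet rowSetInfo = findInputRowSet("Info Data Stream");
        for (Object[] r = rowSetInfo.getRow(); r != null; r = rowSetInfo.getRow()) {
            log.add("info:" + r[0]);
        }
        // Phase 2: only now process the data rows.
        ModelRowSet rowSetMain = findInputRowSet("Row Data Stream");
        for (Object[] r = rowSetMain.getRow(); r != null; r = rowSetMain.getRow()) {
            log.add("main:" + r[0]);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The `Thread.sleep(50)` only simulates a slow upstream step. The order of the log does not depend on it, because phase 1 blocks until the info producer signals that it is done.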

Best regards
You received this message because you are subscribed to the Google Groups "kettle-developers" group.
To post to this group, send email to kettle-d...@googlegroups.com.
To unsubscribe from this group, send email to kettle-develop...@googlegroups.com.
Visit this group at http://groups.google.com/group/kettle-developers?hl=en-US.

Paul Stoellberger

Nov 13, 2012, 5:25:01 PM
to kettle-d...@googlegroups.com

Thanks for the quick response.

Is it also safe to process this one input rowset first and then assume its rows won't show up in getRow()?
From what I've gathered and tested, it seems to be OK.


Matt Casters

Nov 13, 2012, 5:26:50 PM
to kettle-d...@googlegroups.com
That's how most steps do it: first you deplete the specific rowset, then just read from all the others.
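This pattern can be sketched as a small state machine in the shape of a UDJC `processRow()`: on the first call, deplete the info rowset; after that, read from whatever inputs remain. This is again a plain-Java model, not Kettle code. The `Input` class, `findInputRowSet()`, `getRowFrom()` and `getRow()` are hypothetical, the rows are pre-loaded so no threading is modelled, and it assumes (as a simplification) that `getRow()` cycles through the remaining inputs and drops an input once it is exhausted, which is why the info rows cannot reappear there.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Plain-Java model of "deplete the info rowset first, then read from all others"; NOT the Kettle API.
final class DepleteFirstDemo {

    /** Hypothetical stand-in for an input rowset: a named queue of pre-loaded rows. */
    static final class Input {
        final String name;
        final Deque<String> rows;

        Input(String name, String... values) {
            this.name = name;
            this.rows = new ArrayDeque<>(Arrays.asList(values));
        }
    }

    private final List<Input> inputs;
    private int current = 0;      // round-robin position used by getRow()
    private boolean first = true; // mirrors the usual "first" flag in processRow()
    private final List<String> formulas = new ArrayList<>();
    private final List<String> output = new ArrayList<>();

    DepleteFirstDemo(Input... in) { inputs = new ArrayList<>(Arrays.asList(in)); }

    Input findInputRowSet(String name) {
        for (Input in : inputs) {
            if (in.name.equals(name)) return in;
        }
        throw new IllegalArgumentException("No input from step " + name);
    }

    /** Reads one row from a specific input; an exhausted input is dropped and null is returned. */
    String getRowFrom(Input in) {
        String row = in.rows.poll();
        if (row == null) inputs.remove(in);
        return row;
    }

    /** Reads from the remaining inputs in turn; null once every input is exhausted. */
    String getRow() {
        while (!inputs.isEmpty()) {
            current %= inputs.size();
            String row = inputs.get(current).rows.poll();
            if (row != null) {
                current++;
                return row;
            }
            inputs.remove(current); // exhausted: drop it and retry at the same position
        }
        return null;
    }

    /** Called repeatedly, like processRow(); returns false when there is nothing left. */
    boolean processRow() {
        if (first) {
            first = false;
            Input info = findInputRowSet("Info Data Stream");
            for (String f = getRowFrom(info); f != null; f = getRowFrom(info)) formulas.add(f);
        }
        String row = getRow(); // the info input is gone, so only data rows come back
        if (row == null) return false;
        output.add(row + " with " + formulas);
        return true;
    }

    static List<String> run() {
        // The data hop is listed first on purpose.
        DepleteFirstDemo step = new DepleteFirstDemo(
                new Input("Row Data Stream", "d1", "d2"),
                new Input("Info Data Stream", "f1", "f2"));
        while (step.processRow()) {
            // keep calling processRow(), as the step framework would
        }
        return step.output;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because the data hop is listed first, a naive `getRow()` in this model would return `d1` before any formula; with the info input depleted up front, every data row is processed with the full formula list.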

2012/11/13 Paul Stoellberger <p.stoel...@gmail.com>

Matt Casters <mcas...@pentaho.org>
Chief Data Integration, Kettle founder, Author of Pentaho Kettle Solutions (Wiley)
Fonteinstraat 70 - 9400 OKEGEM - Belgium - Cell : +32 486 97 29 37
Pentaho  -  Powerful Analytics Made Easy

Paul Stoellberger

Nov 13, 2012, 5:34:20 PM
to kettle-d...@googlegroups.com


Oct 7, 2014, 5:17:14 AM
to kettle-d...@googlegroups.com
Hi Jens,

Picking up a very old thread here, but back to Paul's original problem: is it a bug that Object[] r = getRow() seems to be nondeterministic?

I've just had the same problem. Lots of examples on the net and in the source call getRow(), but if the above is true, that call should really be avoided when you have multiple incoming streams.

For me, it seems that with a small number of rows it typically works as expected, but with more data it "reliably" goes wrong.

I've seen a couple of related JIRAs too. It seems to me that the issue from the JIRA you mention (8738) still exists in 5.2. (I suspect 11546 was a variant of this too, but there wasn't enough info to reproduce it.)


Jens Bleuel

Oct 7, 2014, 9:01:31 AM
to kettle-d...@googlegroups.com

Hi Dan,


PDI-8738 fixed the example: “Changed code snippet example of process row to read out first info rows then usual”


The description here has also been modified: http://wiki.pentaho.com/display/EAI/User+Defined+Java+Class


Are there any other action items you would like? :)



