Row Normaliser and scale

Dan

Jul 25, 2016, 8:04:56 AM
to pentaho-...@googlegroups.com
Hi,

I'm wondering whether or not I should raise this:

The row normaliser doesn't scale linearly.

Now, this step is all about blowing up the number of rows, so I had thought it would make sense that if it can produce 100k rows per second, it would ALWAYS produce that rate, regardless of the level of normalisation. However, it doesn't. (Assuming you have adequate memory etc. - this step does benefit from a lot of RAM.)

Example:

900 fields being normalised - 86k records per second
2500 fields being normalised (i.e. ~2.6x) - 26k records per second.

I feel it should still be able to produce 86k records per second even when normalising 2500 fields. Do you agree? (Or at least, it would be nicer if it got closer.)

In other words, double the number of attributes, double the time it takes, but no more.  At the moment if you use 2.6x the attributes it takes 7.7x as long.

(And of course, you can then scale it beyond that by partitioning etc.)
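To illustrate what I mean by linear scaling, here is a minimal Python sketch of row normalisation. It is not the Pentaho implementation, and the names `normalise_rows` and `count_output_rows` are made up for this example. Each input row with N value fields yields N output rows, and the work per output row is constant, so doubling the fields should only double the total work.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def normalise_rows(
    rows: Iterable[Dict[str, Any]],
    key_fields: List[str],
    value_fields: List[str],
) -> Iterator[Tuple[Any, ...]]:
    """Turn each wide row into one (keys..., field_name, value) row per value field."""
    for row in rows:
        # The key columns are extracted once per input row, not once per output row.
        keys = tuple(row[k] for k in key_fields)
        for name in value_fields:
            # Constant work per output row: one dict lookup and one small tuple.
            yield keys + (name, row[name])


def count_output_rows(n_rows: int, n_fields: int) -> int:
    """Build synthetic wide rows (hypothetical data) and count the normalised output."""
    fields = [f"f{j}" for j in range(n_fields)]
    rows = [{"id": i, **{name: j for j, name in enumerate(fields)}} for i in range(n_rows)]
    return sum(1 for _ in normalise_rows(rows, ["id"], fields))
```

With this shape, `count_output_rows(3, 900)` produces 2700 output rows and `count_output_rows(3, 2500)` produces 7500. The output grows in proportion to the field count, and nothing in the inner loop depends on how many fields there are, which is why I would expect a roughly constant output rate.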

Thanks,
Dan

Dan

Apr 26, 2017, 4:43:06 AM
to pentaho-...@googlegroups.com
Hello everyone!

I'm very pleased to say that after some quite robust defence of my case I managed to get through to a Pentaho developer who immediately agreed this was a bug.

The bug was fixed overnight, and now the test case goes from 12 minutes to 1.5 minutes. This fix will be in the next service pack release.

Normalisation is a vital tool for #IOT scenarios so I'm quite relieved about this. I just wish I'd raised the case back in July rather than last week!

Dan

Pedro Alves

Apr 26, 2017, 5:28:14 AM
to pentaho-...@googlegroups.com
Always your fault! 
