Thanks for the info, Vijay.
I have been reading the existing development documentation, and I also
had a look at the implementation of some Steps.
I wrote some conclusions, and I've also got some concrete questions,
that I paste below. I'd be happy if someone could confirm (or
correct) my conclusions, and answer my questions:
"Kettle offers two different ways for processing information:
Transformations and Jobs.
Transformations are formed by one or more Transformation Steps and
only apply to tabular data, which can be processed row by row. In
Transformations, several Steps can be processed at the same time
(concurrent execution), because each row is processed independently of
the others, and the result of each row flows from one step to the
following one.
Therefore, if we examine the state of a transformation at a given
moment, the first step may be processing the 3rd row, the second
step may be processing the 2nd row, the third step may be processing
the 1st row, and the fourth step may be waiting for a result to arrive
from the third step.
As each row is processed in an atomic way, rows may be processed in
different computers, which is known as clustered execution.
By contrast, Jobs are formed by one or more Job Entries, which can
perform any kind of task (they can apply to non-tabular data). Job
Entries are executed in a sequential order, so a Job Entry is only
executed when the previous one has totally finished. Therefore, Jobs
don’t allow clustered execution."
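To make the two models in the summary above concrete, here is a minimal Java sketch. It is not Kettle code and does not use Kettle's API: the class PipelineSketch, the methods step, runTransformation and runJob, and the END marker are all hypothetical names chosen for illustration. It assumes each step runs in its own thread and hands rows to the next step through a queue, while job entries simply run one after another.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Illustrative only: none of these names come from Kettle.
class PipelineSketch {
    // Unique marker object signalling "no more rows" (hypothetical convention).
    private static final String END = new String("<END>");

    // Starts one thread per step: take a row, transform it, pass it on.
    public static Thread step(BlockingQueue<String> in, BlockingQueue<String> out,
                              Function<String, String> fn) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    String row = in.take();
                    if (row == END) {       // identity check against the marker
                        out.put(END);       // forward end-of-stream downstream
                        return;
                    }
                    out.put(fn.apply(row));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // Transformation-like execution: both steps are alive at the same time,
    // so step 2 may work on row 1 while step 1 is already on row 2.
    public static List<String> runTransformation(List<String> rows) {
        BlockingQueue<String> q0 = new LinkedBlockingQueue<>();
        BlockingQueue<String> q1 = new LinkedBlockingQueue<>();
        BlockingQueue<String> q2 = new LinkedBlockingQueue<>();
        Thread s1 = step(q0, q1, String::trim);
        Thread s2 = step(q1, q2, String::toUpperCase);
        List<String> result = new ArrayList<>();
        try {
            for (String r : rows) {
                q0.put(r);
            }
            q0.put(END);
            while (true) {
                String row = q2.take();
                if (row == END) {
                    break;
                }
                result.add(row);
            }
            s1.join();
            s2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return result;
    }

    // Job-like execution: each entry runs to completion before the next starts.
    public static void runJob(List<Runnable> entries) {
        for (Runnable entry : entries) {
            entry.run();
        }
    }

    public static void main(String[] args) {
        System.out.println(runTransformation(List.of(" a ", "b")));
        List<Runnable> entries = new ArrayList<>();
        entries.add(() -> System.out.println("entry 1 finished"));
        entries.add(() -> System.out.println("entry 2 finished"));
        runJob(entries);
    }
}
```

In this sketch the only thing connecting two steps is the queue between them, which is why rows can be in flight in several steps at once; runJob has no such channel, which is the gap the second question below asks about.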
Now, my questions:
- If I need to transform some data which is not formed by rows (for
example, an image)... is the Transformation Step architecture flexible
enough to process this kind of data, or would I need to program my
transformation as Job Entries?
- I have seen that in Transformation Steps, rows flow from one step to
the following one. How does it work for Job Entries? Is there a
similar data flow between Job Entries?
Thank you very much,
César Martínez Izquierdo
2009/6/16 Vijay A <
avi...@dataalp.com>: