I'm coming from the world of Pig and wondering whether Cascading has anything like Pig's "multi-query execution" (see https://pig.apache.org/docs/r0.7.0/piglatin_ref1.html#Multi-Query+Execution). In particular, I'd like to create multiple output files from a single MapReduce job (note: I'm not asking about creating multiple output files from a single Flow or Cascade). I've also heard people refer to this as creating "side files": since Hadoop, by default, writes all of a job's output to one directory, one needs "side files" to store distinct output sets from a single job.
For those of you who are familiar with Pig, the page linked above uses the following Pig script to illustrate "multi-query execution":
A = LOAD ...
...
SPLIT A' INTO B IF ..., C IF ...
...
STORE B' ...
STORE C' ...
This example is followed by a few notes on the optimizations that Pig performs. Within this list, I'm particularly interested in the following features of "multi-query execution":
2. Makes the split non-blocking and allows processing to continue. ...
3. Allows multiple outputs from a job. This way some results can be stored as a side-effect of the main job. ...
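To make the behavior I'm after more concrete, here is a rough plain-Python sketch of it. This is not Pig or Cascading code. The function name `split_single_pass`, the sample records, and the two conditions are all made up for illustration, standing in for the elided "B IF ..., C IF ..." above. The point is only that the input is scanned once and each record is routed to every matching output during that one pass:

```python
def split_single_pass(records, predicates):
    """Route each record to every output whose predicate matches, in one pass.

    `predicates` maps a (hypothetical) output name to a filter function.
    """
    # One list per named output, standing in for one "side file" each.
    outputs = {name: [] for name in predicates}
    for record in records:  # the input is read only once
        for name, keep in predicates.items():
            if keep(record):
                outputs[name].append(record)
    return outputs


# Made-up data and conditions, purely for illustration.
records = [1, 2, 3, 4, 5, 6]
result = split_single_pass(records, {
    "B": lambda x: x % 2 == 0,
    "C": lambda x: x > 3,
})
```

In a real job each list would be a separate output location written by the same MapReduce job, which is what I'm hoping Cascading can do.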
I hope that makes sense. And thanks in advance for your responses.