Hi Anjani,
On 05 Feb 2014, at 18:49, Anjani Ragothaman <ar...@scarletmail.rutgers.edu> wrote:
> Is there a way to select the only required files in the Input Data Unit?
What exactly do you mean by “select”?
> When there is a pipeline of tasks, if an output data unit of CU-A becomes the input data unit for the CU-B, and there are multiple jobs submitted in loop by the CUs, is there a way to place only required files in the input data unit (like we do for the output data unit, specifying only required files to be transferred)?
I’m guessing a bit at what you want to do, so please correct me if I’m wrong.
Are you saying that the output of CU-A is a “mix” of different kinds of data, some of which CU-B needs and some of which it does not?
If so (and this holds in general), you want to group files that share the same dataflow together in one DU.
It follows that files with a different dataflow (e.g. not all of them are required by a subsequent CU) should be split over multiple DUs.
In your case, this would mean that you would put all files that are needed by CU-B in one DU, and “the rest” in another DU. You can then deal with these individually.
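As a rough illustration of that split (plain Python only, not the actual Pilot-Data API; the file names, the pattern, and the helper name split_by_dataflow are made up for this example), the file lists for the two DUs could be built like this:

```python
from fnmatch import fnmatch


def split_by_dataflow(files, needed_patterns):
    """Partition file names into (needed by the next CU, everything else)."""
    needed, rest = [], []
    for name in files:
        # A file belongs to the "needed" group if it matches any pattern.
        if any(fnmatch(name, pattern) for pattern in needed_patterns):
            needed.append(name)
        else:
            rest.append(name)
    return needed, rest


# Hypothetical output files of CU-A.
cu_a_outputs = ["result_0.dat", "result_1.dat", "stdout.txt", "debug.log"]

# Assumption: CU-B only needs the result_*.dat files.
du_for_cu_b, du_rest = split_by_dataflow(cu_a_outputs, ["result_*.dat"])
```

Each of the two lists would then go into its own DU description, and only the first one would be handed to CU-B as its input.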
Does that capture what you want to achieve?
Gr,
Mark