One option is to write a DecoratorTap subclass that manages the intermediate files. PartitionTap already does some of this filename management, by virtue of managing folder/directory names.
So writing files into partitions, and using a predicate to read only the partitions that matter, would be interesting. Unfortunately, in 3.0 I wasn't able to combine PartitionTap with DecoratorTap.
(A Parquet tap would allow for column projection, which might be more interesting.)
Hadoop provides little control over file names, and Cascading spends a lot of effort working around this so it can at least provide control over folder names.
Consequently, a HashedTap subclass of DecoratorTap could be written to do this. (As of 2.7, you can have the planner wrap intra-flow intermediate taps with a DecoratorTap; see DistCacheTap.)
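To make the HashedTap idea a little more concrete, here is a minimal plain-Java sketch of the naming scheme only, not a Cascading tap. The class `HashPartitionPaths`, its methods, and the `part-N` folder layout are hypothetical and not part of any Cascading API. A writer places each tuple under a folder derived from its key's hash, and a reader applies a predicate over partition indices so it opens only the folders that can hold the keys it needs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical sketch (not a Cascading API): maps each key to a
 * hash-partition folder so a reader can skip folders it does not need.
 */
public final class HashPartitionPaths {
  private final String root;
  private final int numPartitions;

  public HashPartitionPaths(String root, int numPartitions) {
    if (numPartitions <= 0) {
      throw new IllegalArgumentException("numPartitions must be > 0");
    }
    this.root = root;
    this.numPartitions = numPartitions;
  }

  /** Partition index for a key; floorMod keeps it non-negative for negative hash codes. */
  public int partitionFor(Object key) {
    return Math.floorMod(key.hashCode(), numPartitions);
  }

  /** Folder a writer would place a tuple with this key into. */
  public String folderFor(Object key) {
    return root + "/part-" + partitionFor(key);
  }

  /** All partition folders whose index satisfies the predicate (the "read only what matters" side). */
  public List<String> foldersMatching(Predicate<Integer> partitionPredicate) {
    List<String> result = new ArrayList<>();
    for (int i = 0; i < numPartitions; i++) {
      if (partitionPredicate.test(i)) {
        result.add(root + "/part-" + i);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    HashPartitionPaths paths = new HashPartitionPaths("/tmp/intermediate", 4);
    String key = "user-42";
    // Writer side: the folder this key's tuples would land in.
    System.out.println(paths.folderFor(key));
    // Reader side: open only the single folder that can contain this key.
    int wanted = paths.partitionFor(key);
    System.out.println(paths.foldersMatching(p -> p == wanted));
  }
}
```

A real HashedTap would presumably apply this mapping inside the wrapped tap's read and write paths; that wiring depends on the Cascading version and is left out here.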
That said, more control would come from leveraging the Cascading 3 planner to push predicates down to the intermediate taps, though intermediate taps only come into play when running on MapReduce.
But that seems moot compared to not writing intermediate data to HDFS at all between hash-partitioned pairs of work.
I would suggest giving Cascading 3 and its Tez platform support a run-through.
Otherwise, time can be spent whipping a dead horse for incremental improvements.
ckw