I've been wondering how people are using PartitionTap to read subsets of a partitioned dataset. As far as I can tell, this Tap can currently only read all partitions. I'd expect reading a subset of partitions to be the most common use case; segmenting data by a key is, after all, one of the main virtues of partitioning.
Our jobs always need only a small subset of partitions (last hour, last day, last month, etc.), so we pass a subclass of Hfs as the parent tap to PartitionTap. This subclass filters paths in an overridden getChildIdentifiers() method. Is there a way to achieve this without stepping outside the Cascading SDK, and if not, is it on the roadmap?
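For illustration, the kind of path filtering our override does could be sketched roughly as follows. This assumes a hypothetical hourly partition layout of `yyyy/MM/dd/HH`, with paths relative to the tap's root; the class and method names are made up, and the actual getChildIdentifiers() override is left out so the sketch compiles without Cascading on the classpath.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Cascading. An Hfs subclass could apply it
// to the identifiers returned by its parent's getChildIdentifiers().
public class PartitionPathFilter {
    // Assumed layout: one zero-padded directory level per field, e.g. "2014/01/15/10".
    private static final DateTimeFormatter LAYOUT = DateTimeFormatter.ofPattern("yyyy/MM/dd/HH");
    private static final Pattern HOURLY_PATH = Pattern.compile("\\d{4}/\\d{2}/\\d{2}/\\d{2}");

    /**
     * Keeps paths whose hour falls in [from, to). Because the layout is fixed-width
     * and zero-padded, lexicographic order matches chronological order, so plain
     * string comparison is enough. Paths not matching the layout (e.g. "_logs") are dropped.
     */
    public static List<String> filter(String[] partitionPaths, LocalDateTime from, LocalDateTime to) {
        String lower = LAYOUT.format(from);
        String upper = LAYOUT.format(to);
        List<String> kept = new ArrayList<>();
        for (String path : partitionPaths) {
            if (HOURLY_PATH.matcher(path).matches()
                    && path.compareTo(lower) >= 0
                    && path.compareTo(upper) < 0) {
                kept.add(path);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] paths = {"2014/01/15/09", "2014/01/15/10", "2014/01/15/23", "2014/01/16/00", "_logs"};
        // "Last day" window ending at midnight on 2014-01-16.
        LocalDateTime to = LocalDateTime.of(2014, 1, 16, 0, 0);
        System.out.println(filter(paths, to.minusDays(1), to));
        // prints [2014/01/15/09, 2014/01/15/10, 2014/01/15/23]
    }
}
```

If the parent tap returns fully qualified identifiers, the root prefix would need stripping before the comparison.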
Thanks - Elliot.