Subsets of data using the PartitionTap


Elliot West

Jul 1, 2015, 6:54:11 AM
to cascadi...@googlegroups.com
I've been wondering how people are currently using PartitionTap to read subsets of a partitioned dataset. It seems to me that one can only read all partitions with this Tap. I'd expect the most common use case to be reading a subset of partitions; segmenting data by a key is, after all, one of the main virtues of partitioning.

We always require a small subset of partitions in our jobs (last hour, last day, last month, etc.), so we provide a subclass of Hfs as the parent tap to the PartitionTap. This subclass filters paths in an overridden getChildIdentifiers() method. Is there a way to achieve this without stepping outside the Cascading SDK, and if not, is it on the roadmap?

Thanks - Elliot.
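The path filtering Elliot describes can be pictured with a minimal standalone sketch. It deliberately avoids the Cascading and Hadoop APIs and shows only the selection logic that an overridden getChildIdentifiers() might apply to the child paths before returning them. The class PartitionWindowFilter, the method filterByWindow, and the yyyy/MM/dd/HH partition layout are hypothetical assumptions made for illustration.

```java
import java.time.DateTimeException;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper, not part of Cascading: keeps only the child partition
 * paths whose partition hour lies in the window [now - hours, now].
 * Assumes partitions are laid out as ".../yyyy/MM/dd/HH".
 */
public class PartitionWindowFilter {
    // Matches the assumed "yyyy/MM/dd/HH" suffix, starting at a path boundary.
    private static final Pattern HOURLY =
        Pattern.compile("(?:^|/)(\\d{4})/(\\d{2})/(\\d{2})/(\\d{2})$");

    public static List<String> filterByWindow(List<String> childPaths, LocalDateTime now, int hours) {
        LocalDateTime start = now.minusHours(hours);
        List<String> kept = new ArrayList<>();
        for (String path : childPaths) {
            Matcher m = HOURLY.matcher(path);
            if (!m.find()) {
                continue; // not an hourly partition (e.g. a _SUCCESS marker); skip it
            }
            LocalDateTime partitionHour;
            try {
                partitionHour = LocalDateTime.of(
                    Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                    Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4)), 0);
            } catch (DateTimeException e) {
                continue; // out-of-range values such as month 13; skip malformed partitions
            }
            // Keep partitions with start <= partitionHour <= now.
            if (!partitionHour.isBefore(start) && !partitionHour.isAfter(now)) {
                kept.add(path);
            }
        }
        return kept;
    }
}
```

An Hfs subclass along these lines might apply something like this to the identifiers returned by the parent implementation; how the window is configured (for example from a job property) is left open.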

Chris K Wensel

Jul 1, 2015, 11:56:19 AM
to cascadi...@googlegroups.com
we want to provide a predicate interface that limits the returned partitions, but haven’t gotten to it.

feel free to spec something out, happy to work with you to see if we can get a contribution.

ckw
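One possible shape for the predicate interface Chris mentions is sketched below as plain Java. It is hypothetical: PartitionSelector and its methods are illustrative names, not Cascading API, and this is not the approach taken in the pull request later in this thread. It assumes each relative partition path holds delimiter-separated values (for example 2015/07/01) that line up with declared field names, and tests the resulting field-to-value map with java.util.function.Predicate.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Pattern;

/**
 * Hypothetical predicate-based partition selector (illustrative names only).
 * A relative path such as "2015/07/01" is split on the delimiter and zipped
 * with the declared field names before being tested against a predicate.
 */
public class PartitionSelector {
    private final List<String> fieldNames;
    private final String delimiter;

    public PartitionSelector(List<String> fieldNames, String delimiter) {
        this.fieldNames = fieldNames;
        this.delimiter = delimiter;
    }

    // Turns "2015/07/01" into {year=2015, month=07, day=01}; null if the arity differs.
    Map<String, String> parse(String relativePath) {
        String[] values = relativePath.split(Pattern.quote(delimiter), -1);
        if (values.length != fieldNames.size()) {
            return null;
        }
        Map<String, String> partition = new HashMap<>();
        for (int i = 0; i < values.length; i++) {
            partition.put(fieldNames.get(i), values[i]);
        }
        return partition;
    }

    // Returns only the paths whose parsed values satisfy the predicate.
    public List<String> select(List<String> relativePaths, Predicate<Map<String, String>> predicate) {
        List<String> selected = new ArrayList<>();
        for (String path : relativePaths) {
            Map<String, String> partition = parse(path);
            if (partition != null && predicate.test(partition)) {
                selected.add(path);
            }
        }
        return selected;
    }
}
```

Building on java.util.function.Predicate means selections compose with and(), or() and negate() without a new interface; a Cascading-native version might instead operate on the partition's Tuple values.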

--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at http://groups.google.com/group/cascading-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/CAC3gpCb8GkbjyXPiDqB%3DM%2BRTRnyGhVtX-BoQcyhtoUanfSofyA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Dave Maughan

Aug 5, 2015, 6:51:43 AM
to cascadi...@googlegroups.com
Hi Chris,

I've done some work on this feature, utilizing the existing cascading.operation.Filter interface to limit the returned partitions. It would be great to get some feedback on the PR here: https://github.com/Cascading/cascading/pull/31

Thanks - Dave.

Chris K Wensel

Aug 5, 2015, 12:08:19 PM
to cascadi...@googlegroups.com
awesome! I’ll take a look as soon as I can. 
