Custom Partitioner not working for GroupBy

PaulON

Oct 11, 2018, 2:54:01 PM
to cascading-user
Hey,

We have implemented a custom Partitioner that's working well for our CoGroups; however, we just tried to use it with a GroupBy and it's failing with a ClassCastException:

Caused by: java.lang.ClassCastException: cascading.tuple.io.TuplePair cannot be cast to cascading.tuple.io.IndexTuple

For both the CoGroup and the GroupBy we are grouping on the same field.

Our Partitioner is defined as:

public class ReferencePartitioner extends HasherPartitioner implements Partitioner<IndexTuple, Tuple>, Configurable {

  @Override
  public int getPartition(IndexTuple key, Tuple value, int numReduceTasks) {
    // Select the REFERENCE field from the grouping tuple wrapped by the CoGroup key
    String ref = key.getTuple().get(new Fields(REFERENCE), new Fields(REFERENCE)).toString().trim();
    // ... (the rest of the method was not included in the post)
  }
}

Can anyone shed any light on why we see a different key class for CoGroup than for GroupBy, and what, if any, our options are?

Is there a simpler, more generic way to define a custom Partitioner? I have seen mention of "just use the Hasher interface", but I'm not clear on how to do this.

Cheers!
Paul

Chris K Wensel

Oct 11, 2018, 4:09:08 PM
to cascadi...@googlegroups.com
Can you provide some context on how you are using it, and which Cascading version you are on?

Can you simply use the cascading.tuple.Hasher interface to designate the hashing algorithm for a given field?
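Here is a minimal sketch of the Hasher suggestion. It assumes that `cascading.tuple.Hasher<V>` declares a single `int hashCode(V value)` method and that a field's `Comparator` may also implement it. So that the sketch runs without Cascading on the classpath, it declares a local stand-in `Hasher` interface with that one method. `ReferenceHasher` and the up-front key list are hypothetical.

```java
import java.io.Serializable;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Local stand-in mirroring the single method assumed on cascading.tuple.Hasher<V>;
// in a real flow you would implement the Cascading interface instead.
interface Hasher<V> {
  int hashCode(V value);
}

// Hypothetical comparator for the grouping field that also supplies the hash.
// Each known key hashes to its position in the up-front key list.
class ReferenceHasher implements Comparator<String>, Hasher<String>, Serializable {
  private static final long serialVersionUID = 1L;
  private final Map<String, Integer> indexByKey = new HashMap<>();

  ReferenceHasher(List<String> knownKeys) {
    for (int i = 0; i < knownKeys.size(); i++) {
      indexByKey.put(knownKeys.get(i), i);
    }
  }

  @Override
  public int compare(String a, String b) {
    // Grouping order is unchanged: plain string comparison.
    return a.compareTo(b);
  }

  @Override
  public int hashCode(String value) {
    // Known keys hash to their index; unknown keys fall back to the normal String hash.
    Integer index = indexByKey.get(value);
    return index != null ? index : value.hashCode();
  }
}
```

With the real interface, an instance would be attached to the grouping field. One assumed way to do this is `Fields.setComparator(...)` on the fields passed to both the GroupBy and the CoGroup. Because the hashing happens per field, it applies whichever key wrapper the pipe produces. Unknown keys fall back to `String.hashCode()`, so they can still share a reducer.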


ckw

PaulON

Oct 13, 2018, 5:03:23 AM
to cascading-user
Basically, we have some join operations with a small set of keys (hundreds), and we want one key per reducer to avoid doubling (or worse) the run time.
We match the number of reducers to the number of keys (which we know up front) to get one input group per reducer.
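The following self-contained sketch (plain Java, no Cascading) shows why matching the reducer count to the key count is not enough on its own. It assumes the default assignment behaves like Hadoop's `HashPartitioner`, i.e. `(hash & Integer.MAX_VALUE) % numReduceTasks`. It compares that assignment against an explicit key-to-index assignment. The key values are made up, apart from "Aa" and "BB", which are chosen because their `String.hashCode()` values are identical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PartitionSpread {
  // Default-style assignment: non-negative hash modulo the reducer count.
  static int hashPartition(String key, int numReducers) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
  }

  // Number of distinct reducers that receive at least one key.
  // explicit == true assigns the i-th known key to reducer i.
  static int reducersUsed(List<String> keys, boolean explicit) {
    int numReducers = keys.size(); // one reducer per known key
    Set<Integer> used = new HashSet<>();
    for (int i = 0; i < keys.size(); i++) {
      used.add(explicit ? i : hashPartition(keys.get(i), numReducers));
    }
    return used.size();
  }

  public static void main(String[] args) {
    // "Aa" and "BB" share String.hashCode() == 2112, so hash % n must put them together.
    List<String> keys = Arrays.asList("Aa", "BB", "ref-1", "ref-2");
    System.out.println("hash % n: " + reducersUsed(keys, false) + " of " + keys.size() + " reducers used");
    System.out.println("explicit: " + reducersUsed(keys, true) + " of " + keys.size() + " reducers used");
  }
}
```

Hash modulo gives no guarantee that two keys won't land on one reducer while another sits idle. The explicit index does guarantee one key per reducer.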

We are on version 3.2.1.

I may be misunderstanding the usage, but I don't think the Hasher comparator will help with what we are trying to do above.
It's not that we want different keys to go to the same group; we want each key group to go to a different reducer.

I could just define another custom Partitioner typed on TuplePair rather than IndexTuple, but it seems like I should be able to use a more generic one?
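Here is a minimal sketch of the "more generic" idea: unwrap whatever key wrapper the pipe produced, then hash only the grouping values. It uses hypothetical stand-in classes (`IndexTupleStub`, `TuplePairStub`) in plain Java so it runs on its own. In real code the checks would be against Cascading's `IndexTuple` and `TuplePair`, and their accessors should be confirmed in the 3.2.1 Javadoc rather than taken from this sketch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for the CoGroup key (stream ordinal + grouping values).
class IndexTupleStub {
  final int index;             // which input stream the tuple came from
  final List<Object> grouping; // the grouping key values
  IndexTupleStub(int index, Object... grouping) {
    this.index = index;
    this.grouping = Arrays.asList(grouping);
  }
}

// Hypothetical, simplified stand-in for the GroupBy key seen in the exception
// (grouping values + secondary-sort values).
class TuplePairStub {
  final List<Object> grouping;
  final List<Object> sorting;
  TuplePairStub(List<Object> grouping, List<Object> sorting) {
    this.grouping = grouping;
    this.sorting = sorting;
  }
}

class GenericReferencePartitioner {
  // Pull out only the grouping values, whatever wrapper the key arrived in.
  static List<Object> groupingOf(Object key) {
    if (key instanceof IndexTupleStub) {
      return ((IndexTupleStub) key).grouping;
    }
    if (key instanceof TuplePairStub) {
      return ((TuplePairStub) key).grouping;
    }
    if (key instanceof List) {
      @SuppressWarnings("unchecked")
      List<Object> plain = (List<Object>) key; // a bare grouping key
      return plain;
    }
    throw new IllegalArgumentException("Unexpected key type: " + key.getClass());
  }

  // One partition rule for every pipe type: hash only the grouping values.
  static int getPartition(Object key, int numReduceTasks) {
    return (groupingOf(key).hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}
```

Neither the stream ordinal nor the sort values feed the hash. As a result, the same reference value goes to the same reducer whether it came through the CoGroup or the GroupBy.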

Cheers!

Chris K Wensel

Oct 15, 2018, 10:46:21 PM
to cascadi...@googlegroups.com

> I may be misunderstanding the usage, but I don't think the Hasher comparator will help with what we are trying to do above.
> It's not that we want different keys to go to the same group; we want each key group to go to a different reducer.

reducer_partition = hash modulo num_reducers

If your hash simply returned the index of the reducer the key should go to, you are done.
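Here is a small worked check of that rule in plain Java. It assumes partitioning of the form `(hash & Integer.MAX_VALUE) % numReducers` and covers two cases. In the first, the field hash is used directly. In the second, the hash is first folded into a `java.util.List`-style tuple hash (`31 * 1 + h` for a single field). Whether Cascading 3.2.1 applies such folding is an assumption here. The point is that a constant offset only rotates the reducer numbers, so distinct indices still land on distinct reducers.

```java
import java.util.HashSet;
import java.util.Set;

class HashToReducer {
  // reducer_partition = hash modulo num_reducers (kept non-negative).
  static int partition(int hash, int numReducers) {
    return (hash & Integer.MAX_VALUE) % numReducers;
  }

  // Assumed single-field, List.hashCode()-style tuple hash: 31 * 1 + fieldHash.
  static int listStyleTupleHash(int fieldHash) {
    return 31 + fieldHash;
  }

  // True if keys whose field hashes are 0..n-1 each land on a different reducer.
  static boolean oneKeyPerReducer(int numReducers, boolean foldIntoTupleHash) {
    Set<Integer> seen = new HashSet<>();
    for (int index = 0; index < numReducers; index++) {
      int hash = foldIntoTupleHash ? listStyleTupleHash(index) : index;
      if (!seen.add(partition(hash, numReducers))) {
        return false; // two keys collided on one reducer
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(partition(5, 300));            // prints 5: the hash is the reducer
    System.out.println(oneKeyPerReducer(300, false)); // prints true
    System.out.println(oneKeyPerReducer(300, true));  // prints true (reducers rotated by 31)
  }
}
```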


> I could just define another custom Partitioner typed on TuplePair rather than IndexTuple, but it seems like I should be able to use a more generic one?

GroupBy groups on the grouping key.

CoGroup groups on the ordinal/index of the tuple stream and the grouping key. This is so the rightmost sides arrive before the left side of the join, so that the leftmost side is iterated over only once.
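Here is a plain-Java sketch of the key layout described above, using a hypothetical `JoinKey` of (grouping value, stream ordinal). The partition uses only the grouping value, so every side of a join key meets on the same reducer. The sort orders by grouping value and then by ordinal descending, so the rightmost side arrives first and the leftmost side comes last. This models the described behavior; it is not Cascading's actual comparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class JoinKeyOrdering {
  // Hypothetical CoGroup-style key: grouping value plus the ordinal of its input stream.
  static final class JoinKey {
    final String group;
    final int ordinal; // 0 = leftmost pipe, higher = further right
    JoinKey(String group, int ordinal) {
      this.group = group;
      this.ordinal = ordinal;
    }
    @Override
    public String toString() {
      return group + "#" + ordinal;
    }
  }

  // Partition on the grouping value only, so all sides of one key meet on one reducer.
  static int partition(JoinKey key, int numReducers) {
    return (key.group.hashCode() & Integer.MAX_VALUE) % numReducers;
  }

  // Sort by grouping value, then by ordinal descending: rightmost sides arrive first.
  static final Comparator<JoinKey> SORT =
      Comparator.comparing((JoinKey k) -> k.group)
          .thenComparing(Comparator.comparingInt((JoinKey k) -> k.ordinal).reversed());

  public static void main(String[] args) {
    List<JoinKey> keys = new ArrayList<>();
    keys.add(new JoinKey("ref-1", 0));
    keys.add(new JoinKey("ref-1", 2));
    keys.add(new JoinKey("ref-1", 1));
    keys.sort(SORT);
    System.out.println(keys); // prints [ref-1#2, ref-1#1, ref-1#0]
  }
}
```

This is also why a partitioner for CoGroup keys must ignore the ordinal: if it hashed the ordinal too, the two sides of the same join key could end up on different reducers.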

I presume you are overriding the partitioner class directly on the JobConf before it's submitted?

ckw