Counters?


PaulON

21.04.2016, 06:45:03
to cascading-user
Hey,

Is it possible to get access to counter values from within an operation (a Buffer, etc.)?

I can't see any way to get the value from the flowProcess (though obviously I can increment it).


To explain why I want it (and I don't care whether the value is the local count or the global count):

I need to partition data into X buckets, ensuring that data with the same value for one field goes into the same bucket.

So I am doing a GroupBy on the important field, and the plan was to use a Buffer to iterate over the groups and take the counter modulo the number of buckets, assigning buckets in round-robin fashion.

However, if I can't get the counter value (either local or global), I can't do this.


So,
a) Is there a better way to do this?
It's kind of like partitioning, I guess, but it's not based on a field value.
b) Is it possible to access the counter from within the operation?

Cheers!

PaulON

21.04.2016, 07:31:09
to cascading-user
Or can I just use a static counter within the Buffer? I assume that this will survive between groups, and that each reducer will have its own JVM and thus its own "instance" of this static.

Ken Krugler

21.04.2016, 09:20:12
to cascadi...@googlegroups.com
On Apr 21, 2016, at 4:31am, PaulON <pone...@gmail.com> wrote:

Or can I just use a static counter within the Buffer? I assume that this will survive between groups, and that each reducer will have its own JVM and thus its own "instance" of this static.

If you're maintaining state across calls, it is best to use the context that you set up in the prepare() method.

That approach is thread-safe, though thread safety is currently not an issue for most (all?) of the planners.

— Ken

PS - an alternative approach is to add a field that you set to a random() % num_buckets value, and use that to partition as normal.




--------------------------
Ken Krugler
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
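The random-bucket idea from the PS can be sketched in plain Java, without any Cascading dependency. `BucketAssigner` and its method names are hypothetical and exist only for this illustration. Whether a random value keeps equal field values in the same bucket depends on where it is drawn: once per group it does, once per tuple it would not. For that reason the sketch also includes a hash-based variant, which is not something proposed in the thread but is deterministic per key.

```java
import java.util.Objects;
import java.util.Random;

/** Hypothetical helper, not part of Cascading: two ways to pick a bucket index. */
final class BucketAssigner {
    private final int numBuckets;
    private final Random random;

    BucketAssigner(int numBuckets, long seed) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        this.numBuckets = numBuckets;
        this.random = new Random(seed);
    }

    /** Random bucket in [0, numBuckets); nextInt(bound) never returns a negative value. */
    int randomBucket() {
        return random.nextInt(numBuckets);
    }

    /** Deterministic bucket in [0, numBuckets): equal keys always map to the same bucket. */
    int hashBucket(Object key) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(Objects.hashCode(key), numBuckets);
    }

    public static void main(String[] args) {
        BucketAssigner assigner = new BucketAssigner(4, 42L);
        System.out.println("random bucket: " + assigner.randomBucket());
        System.out.println("hash bucket for \"apple\": " + assigner.hashBucket("apple"));
    }
}
```

In a real flow the chosen value would be written into an extra field and used for partitioning as usual. The hash variant needs no state at all, but bucket sizes then depend on how the key hashes are distributed.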



PaulON

22.04.2016, 04:27:07
to cascading-user
Thanks Ken, interesting idea on the random, would save a GroupBy!

Regarding your first point, can you clarify the behaviour of a Buffer for me?
Is there one buffer per reducer or one per group?

Will the context persist across groups?
My (limited) understanding is that the context is per group and will not be available across groups. Is that right?

Cheers!

Ken Krugler

22.04.2016, 09:52:16
to cascadi...@googlegroups.com
On Apr 22, 2016, at 1:27am, PaulON <pone...@gmail.com> wrote:

Thanks Ken, interesting idea on the random, would save a GroupBy!

Regarding your first point, can you clarify the behaviour of a Buffer for me?
Is there one buffer per reducer or one per group?

One per reducer.

Will the context persist across groups?

Yes.

— Ken
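The behaviour described here (one Buffer instance per reducer, with a context that survives across groups) is what makes the round-robin plan workable. Below is a minimal plain-Java model of it, not Cascading code: `ReducerBucketSim`, `Context` and `runReducer` are hypothetical names, and the comments only indicate where `prepare()` and `operate()` would sit in a real Buffer.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model (no Cascading dependency) of one Buffer instance per reducer. */
final class ReducerBucketSim {
    /** State that prepare() would create once and that survives across groups. */
    static final class Context {
        long groupsSeen;
    }

    /** Models one reducer: a single context, reused for every group it processes. */
    static Map<String, Integer> runReducer(List<String> groupKeys, int numBuckets) {
        Context context = new Context();                 // prepare(): once per reducer
        Map<String, Integer> assignment = new LinkedHashMap<>();
        for (String key : groupKeys) {                   // operate(): once per group
            int bucket = (int) (context.groupsSeen % numBuckets);
            context.groupsSeen++;                        // counter persists into the next group
            assignment.put(key, bucket);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Two simulated reducers, each with its own independent counter.
        System.out.println(runReducer(Arrays.asList("a", "c", "e"), 2)); // {a=0, c=1, e=0}
        System.out.println(runReducer(Arrays.asList("b", "d"), 2));      // {b=0, d=1}
    }
}
```

Because the counter advances once per group, every tuple in a group receives the same bucket. Each reducer counts independently, so the round-robin balance holds per reducer and not globally.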

