On Jun 30, 2016, at 3:01 PM, kdel...@liveramp.com wrote:
A common challenge I run into is when I have the result of a GroupBy, and for each group I want to either pass all the values to the output or none of the values, but I don't know which it is until I've seen all the values. What's the best way to do this? I know I cannot reset the arguments iterator from bufferCall (to iterate twice) nor can I chain Buffers together, and caching them will usually result in running out of memory. What do you suggest?
-Kevin
--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at https://groups.google.com/group/cascading-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/34b61aad-e2e4-40d0-ae24-fb2e7e9f06a1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
On Jul 6, 2016, at 10:32am, Kevin Delgado <kdel...@liveramp.com> wrote:
But this only works if the flag can be applied independently of the other tuples in the group. I am asking about the situation where I cannot tell whether a group should be flagged for deletion until I have seen all the values in that group. For example, if I have a stream of records with two types of ids, id1 and id2, I need to group all the records by id1, and then delete EVERY record with the same id1 if ANY two records with the same id1 have duplicate id2. A simple upstream function will not work because I cannot determine whether an id2 is a duplicate within a group of id1 without seeing all the tuples for that id1. I know I can use two GroupBy and Buffer calls (first for flagging, second for filtering), but is there a way to do it with just one?
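To make the two-pass idea (first flag, then filter) concrete, here is a minimal sketch in plain Java. It does not use the Cascading API. The `Rec` class, the method names, and the in-memory `Set` of flagged keys are all hypothetical stand-ins chosen for illustration. It assumes the first pass receives records sorted by id1 and then id2 (as a GroupBy on id1 with a secondary sort on id2 would deliver them), so duplicate id2 values within a group are adjacent and no group has to be cached.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TwoPassGroupFilter {

    /** Hypothetical record type carrying the two ids from the example. */
    static final class Rec {
        final String id1;
        final String id2;

        Rec(String id1, String id2) {
            this.id1 = id1;
            this.id2 = id2;
        }

        @Override
        public String toString() {
            return id1 + ":" + id2;
        }
    }

    /** Sorts by id1, then id2: a stand-in for GroupBy on id1 with a secondary sort on id2. */
    static List<Rec> sortForGrouping(List<Rec> records) {
        List<Rec> sorted = new ArrayList<>(records);
        sorted.sort((a, b) -> {
            int c = a.id1.compareTo(b.id1);
            return c != 0 ? c : a.id2.compareTo(b.id2);
        });
        return sorted;
    }

    /** Pass 1 (flagging): duplicates are adjacent after sorting, so only the previous record is kept. */
    static Set<String> flagGroups(List<Rec> sorted) {
        Set<String> flagged = new HashSet<>();
        Rec prev = null;
        for (Rec r : sorted) {
            if (prev != null && prev.id1.equals(r.id1) && prev.id2.equals(r.id2)) {
                flagged.add(r.id1);
            }
            prev = r;
        }
        return flagged;
    }

    /** Pass 2 (filtering): drop every record whose id1 was flagged in pass 1. */
    static List<Rec> filter(List<Rec> records, Set<String> flagged) {
        List<Rec> kept = new ArrayList<>();
        for (Rec r : records) {
            if (!flagged.contains(r.id1)) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Rec> input = List.of(
            new Rec("a", "x"), new Rec("b", "p"), new Rec("a", "y"),
            new Rec("b", "p"), new Rec("c", "z"));
        Set<String> flagged = flagGroups(sortForGrouping(input));
        System.out.println(filter(input, flagged)); // prints [a:x, a:y, c:z]
    }
}
```

In a real flow the flagged keys would travel as tuples into the second grouping rather than sit in an in-memory set; the set is only here to keep the sketch short.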
On Thursday, June 30, 2016 at 3:41:41 PM UTC-7, Chris K Wensel wrote:
If I understand correctly, just put an upstream Function in play to flag the tuple as undesirable, then secondary sort on the flag. If it shows up at the top of the iterator in the Buffer, discard the iterator.

Also, look at how AggregateBy works. You could build a Function that, when it sees the undesirable grouping, sends a flag across the wire but also begins aggressively dropping any new tuples found in the grouping.

Do the first option first, then optimize with the second.

ckw
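The first option above can be sketched in plain Java as follows. This is not Cascading code: the `Flagged` class, `sortFlagFirst`, and `emitGroup` are hypothetical names, and the sort call merely stands in for a secondary sort on the flag field. It assumes the flag can be computed for each tuple on its own by an upstream step. Because flagged tuples sort to the front, the buffer-like method only has to look at the first tuple to decide whether to pass or drop the whole group, and it never iterates twice or caches values.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class FlagFirstBuffer {

    /** Hypothetical tuple: a value plus the flag set by an upstream step. */
    static final class Flagged {
        final String value;
        final boolean undesirable;

        Flagged(String value, boolean undesirable) {
            this.value = value;
            this.undesirable = undesirable;
        }
    }

    /** Stand-in for a secondary sort on the flag: flagged tuples come first (the sort is stable). */
    static List<Flagged> sortFlagFirst(List<Flagged> group) {
        List<Flagged> sorted = new ArrayList<>(group);
        sorted.sort((a, b) -> Boolean.compare(b.undesirable, a.undesirable));
        return sorted;
    }

    /** Stand-in for a Buffer body: one pass over a single group's iterator. */
    static void emitGroup(Iterator<Flagged> group, List<String> out) {
        if (!group.hasNext()) {
            return;
        }
        Flagged first = group.next();
        if (first.undesirable) {
            return; // flag at the top of the iterator: discard the whole group
        }
        out.add(first.value);
        while (group.hasNext()) {
            out.add(group.next().value);
        }
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        List<Flagged> dropped = List.of(
            new Flagged("v1", false), new Flagged("v2", true), new Flagged("v3", false));
        List<Flagged> kept = List.of(new Flagged("w1", false), new Flagged("w2", false));
        emitGroup(sortFlagFirst(dropped).iterator(), out);
        emitGroup(sortFlagFirst(kept).iterator(), out);
        System.out.println(out); // prints [w1, w2]
    }
}
```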