ordering in each group and then picking first row from each group.


Shivank Garg

Aug 4, 2017, 8:45:49 AM
to cascading-user
I have a schema like this:

 table(dim1,dim2,dim3,dim4......)

 I want to group by (dim1, dim2), order the rows within each group by dim3, and then extract the first row from each group, including the other fields in that row such as dim4 and dim5. How can I achieve this? I am new to Cascading! I am looking for something like what rank() does in Impala, not a plain GROUP BY in a SQL query.

Chris K Wensel

Aug 7, 2017, 10:59:12 PM
to cascadi...@googlegroups.com
you may have figured this out by now..

but GroupBy allows for secondary sorting. So, group on dim1, dim2 and secondary sort on dim3. Then use the aggregator First to get the first row seen.

ckw
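
To make the semantics of that suggestion concrete, here is a minimal sketch in plain Java. It is not Cascading code: the `Row` class, the `firstRowPerGroup` helper, and the sample values are all made up for illustration, and everything is held in memory. It only shows what "group on dim1, dim2, sort each group on dim3, keep the first row" produces, and that the whole row (dim4 included) is carried along.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: plain JDK collections, not the Cascading API.
final class FirstRowPerGroup {

    // Hypothetical row type mirroring table(dim1, dim2, dim3, dim4, ...).
    static final class Row {
        final String dim1;
        final String dim2;
        final int dim3;
        final String dim4;

        Row(String dim1, String dim2, int dim3, String dim4) {
            this.dim1 = dim1;
            this.dim2 = dim2;
            this.dim3 = dim3;
            this.dim4 = dim4;
        }

        @Override
        public String toString() {
            return dim1 + "," + dim2 + "," + dim3 + "," + dim4;
        }
    }

    // Group on (dim1, dim2), sort each group on dim3 ascending,
    // and keep the entire first row of every group.
    static List<Row> firstRowPerGroup(List<Row> rows) {
        Map<List<String>, List<Row>> groups = new LinkedHashMap<>();
        for (Row row : rows) {
            List<String> key = Arrays.asList(row.dim1, row.dim2);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        List<Row> result = new ArrayList<>();
        for (List<Row> group : groups.values()) {
            // This sort plays the role of the secondary sort on dim3.
            group.sort(Comparator.comparingInt((Row r) -> r.dim3));
            result.add(group.get(0));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("a", "x", 3, "p"),
                new Row("a", "x", 1, "q"),
                new Row("b", "y", 2, "r"));
        for (Row row : firstRowPerGroup(rows)) {
            System.out.println(row);
        }
    }
}
```

Reversing the comparator would return the row with the largest dim3 in each group instead of the smallest.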

--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at https://groups.google.com/group/cascading-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/12794623-c351-4c95-a797-f0822c309b20%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Shivank Garg

Aug 8, 2017, 1:48:19 AM
to cascading-user
Thanks Chris for the answer. I had somehow figured out that group by and first would work. But since I just need the one row with the max value of dim3, is there any way to avoid sorting by dim3 after grouping, as the sort will take more time? Is there a way to find the record with the max of dim3 while retaining the other fields like dim4 and dim5 as well? Also, can you confirm that your suggestion of group by and first would retain all the fields?

Chris K Wensel

Aug 8, 2017, 10:56:03 PM
to cascadi...@googlegroups.com
you are going to pay for a sort, either by having the grouping operation do it, or by keeping a sorted list of all the values yourself.

but with the latter option, if there are lots of values, you will likely hit an OOME (OutOfMemoryError). With the former, there is no chance of that.

note that FirstNBuffer will skip the remainder of the group once it is done.

ckw
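
As a rough illustration of why the first option stays memory-safe, here is a plain Java sketch (again not Cascading code; `FirstOfSortedGroups`, `firstPerGroup`, and the `String[]` row layout are assumptions made up for this example). It assumes the rows already arrive sorted by (dim1, dim2) and then by dim3 descending, which is the job the grouping operation would do. The consumer then keeps the first row of each group, which is the max-dim3 row with all of its fields, and walks past the rest of the group without storing it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustration only: plain JDK code, not the Cascading API.
final class FirstOfSortedGroups {

    // Each row is {dim1, dim2, dim3, dim4}. The input is assumed to be
    // sorted by (dim1, dim2) and, within a group, by dim3 descending.
    static List<String[]> firstPerGroup(Iterator<String[]> sortedRows) {
        List<String[]> firsts = new ArrayList<>();
        String prevDim1 = null;
        String prevDim2 = null;
        boolean seenAny = false;
        while (sortedRows.hasNext()) {
            String[] row = sortedRows.next();
            boolean newGroup = !seenAny
                    || !row[0].equals(prevDim1)
                    || !row[1].equals(prevDim2);
            if (newGroup) {
                // First row of the group: under the assumed ordering this
                // is the row with the largest dim3, kept with all fields.
                firsts.add(row);
                prevDim1 = row[0];
                prevDim2 = row[1];
                seenAny = true;
            }
            // Remaining rows of the group are read and dropped, never buffered.
        }
        return firsts;
    }
}
```

The working state here is only the current group key, so the size of any single group does not matter. Collecting every value of a group into a sorted list before picking one is what risks running out of memory on large groups.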


Shivank Garg

Aug 9, 2017, 1:22:07 AM
to cascading-user

Thanks Chris for the suggestion!!

On Friday, August 4, 2017 at 6:15:49 PM UTC+5:30, Shivank Garg wrote: