Grouping by a tuple


Vyacheslav Zholudev

Jun 11, 2014, 6:17:16 PM
to stratosp...@googlegroups.com
Hi,

I'm used to Hive-style grouping such as "GROUP BY userId, productId, year", and I'm wondering what the best way to do this in Stratosphere is. The KeySelector passed to groupBy implies that a Comparable object is returned; however, the obvious choice, TupleN, is not Comparable. For simple cases I would prefer not to introduce extra Comparable classes just to group tuples of "primitive" types. Would it make sense to introduce "ComparableTupleN<T1 extends Comparable<? extends T1>, ..., Tn extends Comparable<? extends Tn>>"?

Or am I missing the obvious way to do this in Stratosphere?

Thanks,
Vyacheslav

Robert Metzger

Jun 11, 2014, 6:27:41 PM
to d...@flink.incubator.apache.org, stratosp...@googlegroups.com
Hi Slava,

I'm forwarding your message to our new mailing list at Apache: d...@flink.incubator.apache.org
You can subscribe to the list by sending an (empty) email to: dev-su...@flink.incubator.apache.org.
We are planning to shut down the stratosphere-dev@googlegroups soon.

Regarding your question: when using Tuples, you don't need to specify a KeySelector. It is sufficient to pass the positions of the key fields: http://stratosphere-javadocs.github.io/eu/stratosphere/api/java/DataSet.html#groupBy(int...)
So you should be able to do something like ".groupBy(0,3,4)".
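
To make the idea of grouping by field positions concrete, here is a minimal plain-Java sketch. It does not use the Stratosphere API: the class GroupByPositions, its groupBy helper, and the sample rows are all hypothetical and only illustrate the semantics of a call like ".groupBy(0,3,4)", namely that the values at the chosen positions together form the grouping key. It says nothing about how Stratosphere implements this internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; this is NOT the Stratosphere API.
class GroupByPositions {

    // Groups rows by the values found at the given field positions.
    // The composite key is a List, which has value-based equals/hashCode,
    // so no extra Comparable wrapper type is needed in this sketch.
    static Map<List<Object>, List<Object[]>> groupBy(List<Object[]> rows, int... positions) {
        Map<List<Object>, List<Object[]>> groups = new LinkedHashMap<>();
        for (Object[] row : rows) {
            List<Object> key = new ArrayList<>(positions.length);
            for (int p : positions) {
                key.add(row[p]);
            }
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical rows laid out as (userId, amount, productId, year).
        List<Object[]> rows = Arrays.asList(
            new Object[] {"u1", 10.0, "p1", 2014},
            new Object[] {"u1", 5.0, "p1", 2014},
            new Object[] {"u2", 7.5, "p1", 2014},
            new Object[] {"u1", 2.0, "p2", 2013});

        // Analogous to GROUP BY userId, productId, year.
        Map<List<Object>, List<Object[]>> groups = groupBy(rows, 0, 2, 3);
        groups.forEach((key, members) ->
            System.out.println(key + " -> " + members.size() + " row(s)"));
    }
}
```

In Stratosphere itself the equivalent step would presumably be the single groupBy(int...) call on a DataSet of tuples, as described in the linked Javadoc.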

Robert
