Does Scalding/Cascading optimize multiple independent joins on the same dataset?

Nikita

May 19, 2015, 8:54:21 PM

to cascadi...@googlegroups.com

I have three typed pipes: A, B, and C. I'd like to join B with A and C with A. Will Scalding/Cascading automatically optimize B.join(A.group) and C.join(A.group) to avoid grouping A twice?

Thanks,
Nikita

Oscar Boykin

May 19, 2015, 9:04:15 PM

to cascadi...@googlegroups.com

If you do B.join(A) (no need to write A.group) and C.join(A), those will be two independent map-reduce jobs.

A.group does not do anything unless you do something with the result.

Moreover, A.join(B).join(C) is still optimized into just one map-reduce job, as long as you never go back to a TypedPipe and just keep calling methods on the CoGrouped.
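For anyone following along, here is a minimal sketch of the two shapes discussed above, assuming the Scalding typed API (TypedPipe, .group, join on grouped pipes, toTypedPipe). The names JoinShapes, separateJoins, and chainedJoin and the key/value types are hypothetical, chosen only for illustration. The explicit .group calls are a conservative choice, not something the reply requires.

```scala
import com.twitter.scalding.typed.TypedPipe

// Hypothetical helper object; names and element types are illustrative only.
object JoinShapes {

  // Shape 1: two separate joins against A.
  // Per the reply above, these are planned as two independent map-reduce jobs.
  def separateJoins(
      a: TypedPipe[(String, Int)],
      b: TypedPipe[(String, Long)],
      c: TypedPipe[(String, Double)]
  ): (TypedPipe[(String, (Long, Int))], TypedPipe[(String, (Double, Int))]) = {
    val ba = b.group.join(a.group).toTypedPipe
    val ca = c.group.join(a.group).toTypedPipe
    (ba, ca)
  }

  // Shape 2: one chained co-group on the shared key.
  // Per the reply above, this stays a single map-reduce job as long as the
  // chain is not converted back to a TypedPipe between the joins.
  def chainedJoin(
      a: TypedPipe[(String, Int)],
      b: TypedPipe[(String, Long)],
      c: TypedPipe[(String, Double)]
  ): TypedPipe[(String, ((Int, Long), Double))] =
    a.group.join(b.group).join(c.group).toTypedPipe
}
```

Note that the chained form is an inner join across all three pipes, so it only returns keys present in A, B, and C. It is not a drop-in replacement if B-with-A and C-with-A need to be separate outputs.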

--

Oscar Boykin :: @posco :: http://twitter.com/posco

Nikita Lytkin

May 19, 2015, 10:23:35 PM

to cascadi...@googlegroups.com

Thanks, Oscar.
