Classic Workers, Sorting, Shuffling Discussion


Tim Spurway

Nov 3, 2014, 1:01:10 PM
to disc...@googlegroups.com
Hey Folks,

I am using Disco 0.5.3 and am noticing a difference in reduce results compared with pre-0.5 versions.

In pre-0.5 versions, the total number of result 'files' was: num_partitions * num_nodes

In 0.5.3 it is: num_nodes

This is because of the reduce_shuffle phase. It combines all of the results on each node. This is good in the sense that it reduces the total number of files, but if you have sort=True, there is no way to iterate over the results 'in-order' (using a heap iterator, for example), because all of the partitions have been combined.
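To illustrate the kind of in-order iteration meant here, a minimal sketch follows. It assumes each partition's results are available as a separate stream of (key, value) pairs already sorted by key. The helper name `iter_in_order` and the sample data are hypothetical and are not part of Disco's API; only the standard-library `heapq.merge` is used.

```python
import heapq
from operator import itemgetter


def iter_in_order(partitions):
    """Lazily merge several key-sorted (key, value) streams into one sorted stream.

    `partitions` is a hypothetical stand-in for per-partition result iterators;
    each one must already be sorted by key.
    """
    return heapq.merge(*partitions, key=itemgetter(0))


# Hypothetical sample data: three partitions, each sorted by key.
partition_a = [("apple", 3), ("melon", 1)]
partition_b = [("banana", 2), ("zebra", 5)]
partition_c = [("cherry", 7)]

merged = list(iter_in_order([partition_a, partition_b, partition_c]))
```

The merge only yields a globally sorted stream when every input is itself a sorted stream, which is why combining partitions into one result per node gets in the way of this approach.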

Unless I am doing something incorrectly!

I also can see no way of disabling the reduce_shuffle phase. I peeked into the Erlang code, and it appears to be hard-coded for compatibility with the 'classic' mode.

Before I work on and submit a patch to address this, I was wondering whether others are working around it, or whether I am simply missing something obvious.

cheers,
tim

slowe...@gmail.com

Nov 4, 2014, 6:01:34 PM
to disc...@googlegroups.com
Hi Tim,
Look at this commit
https://github.com/discoproject/disco/commit/3fe2323190e62596b14f583b3ee61d07336223f5
which I believe is in 0.5.4. Hope that's helpful.

Tim Spurway

Nov 4, 2014, 6:51:35 PM
to disc...@googlegroups.com
Thank you, sloweater1!

Yeah, I realized that upgrading fixed exactly my problem. Works like a charm.

For Inferno users, I can confirm that 0.5.4 works as expected with Inferno jobs, and by default they will have three phases. Previous versions created a fourth phase (reduce_shuffle), which is not what you normally want.
