Tim Spurway
Nov 3, 2014, 1:01:10 PM
to disc...@googlegroups.com
Hey Folks,
I am using Disco 0.5.3 and am noticing a difference in reduce results compared with pre-0.5 versions.
In pre-0.5 versions, the total number of result 'files' was: num_partitions * num_nodes
In 0.5.3 it is: num_nodes
This is because of the reduce_shuffle phase, which combines all of the results on each node. That is good in the sense that it reduces the total number of files, but if you have sort=True, there is no longer any way to iterate over the results in order (using a heap iterator, for example), because all of the partitions have been combined.
Unless I am doing something incorrectly!
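To make the ordering issue concrete, here is a minimal sketch in plain Python (no Disco calls). The sample data and the helper name merge_sorted_results are made up for illustration, and it assumes the combined per-node output is simply the sorted partitions concatenated one after another, which I have not verified against the actual reduce_shuffle output:

```python
import heapq

def merge_sorted_results(partitions):
    """Lazily merge individually sorted (key, value) streams into one sorted stream."""
    return heapq.merge(*partitions)

# Hypothetical per-partition results, each already sorted by key (sort=True).
per_partition = [
    [("a", 1), ("d", 4)],
    [("b", 2), ("e", 5)],
    [("c", 3), ("f", 6)],
]

# Pre-0.5 style: one stream per partition, so a heap merge yields global order.
merged = list(merge_sorted_results(per_partition))

# Assumed 0.5.3 style: the partitions arrive as one combined stream. The run
# boundaries are gone, so a heap merge can no longer be applied to it.
combined = [kv for part in per_partition for kv in part]

print(merged)
print(combined)
```

With separate streams the merge only needs one item per partition in memory at a time; with the combined stream the only generic option left is a full re-sort.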
I can also see no way of disabling the reduce_shuffle phase. I peeked into the Erlang code, and it appears to be hard-coded for compatibility with the 'classic' mode.
Before I work on and submit a patch to address this, I was wondering whether others are working around it, or whether I am simply missing something obvious.
cheers,
tim