Thanks a lot Eliot. Please see my follow up questions below...
On Sep 2, 4:31 pm, Eliot Horowitz <
eliothorow...@gmail.com> wrote:
> Answers below:
>
> > - (if defined) how are sort and limit applied?
>
> Sort and limit are applied on the shards, so limit is applied per shard.
>
> > - is the reduce phase run on all shards?
>
> Yes. Each shard will reduce the results down as much as it can. In fact,
> the general mongo map/reduce system does iterative reduce so the
> intermediary stages are never so large.
>
> > - is there a shuffle phase before reduce is run?
>
> Not really because we do reduce iteratively.
There's still one part that I'm missing. Who is coordinating this
process? What happens if the coordinator goes down? Or is it possible
to end up with the same instance being responsible for many map/reduce
execution? Where is the last part of reduce running? (by running
reduce on each node, there needs to be a final reduce applied).
> > - what happens when keeptemp/out are defined?
>
> All the results and up on 1 shard right now, so keeptemp/out are applied on
> the final output shard just as they would in single master mode.
>
Interesting. How do you know next time you run the same map/reduce
that there's already a cached result that might need or not un update?
Thanks again,
:- alex
> -Eliot