MapReduce in Sharded Environment

Alex Popescu

unread,

Sep 2, 2010, 9:21:20 AM9/2/10

to mongodb-user

Hi guys,

I'd like to understand how MongoDB map/reduce works in a sharded
environment. Running map/reduce in MongoDB offers quite a few features
and I'm not sure I understand the complete behavior.

Specific questions:

- (if defined) how are sort and limit applied?
- is the reduce phase run on all shards?
- is there a shuffle phase before reduce is run?
- what happens when keeptemp/out are defined?

Many thanks in advance,

:- alex

Eliot Horowitz

unread,

Sep 2, 2010, 9:31:03 AM9/2/10

to mongod...@googlegroups.com

Answers below:

- (if defined) how are sort and limit applied?

Sort and limit are applied on the shards, so limit is applied per shard.

- is the reduce phase run on all shards?

Yes. Each shard will reduce the results down as much as it can. In fact, the general mongo map/reduce system does iterative reduce so the intermediary stages are never so large.

- is there a shuffle phase before reduce is run?

Not really because we do reduce iteratively.

- what happens when keeptemp/out are defined?

All the results and up on 1 shard right now, so keeptemp/out are applied on the final output shard just as they would in single master mode.

-Eliot

Alex Popescu

unread,

Sep 2, 2010, 12:22:10 PM9/2/10

to mongodb-user

Thanks a lot Eliot. Please see my follow up questions below...

On Sep 2, 4:31 pm, Eliot Horowitz <eliothorow...@gmail.com> wrote:
> Answers below:
>
> > - (if defined) how are sort and limit applied?
>
> Sort and limit are applied on the shards, so limit is applied per shard.
>
> > - is the reduce phase run on all shards?
>
> Yes. Each shard will reduce the results down as much as it can. In fact,
> the general mongo map/reduce system does iterative reduce so the
> intermediary stages are never so large.
>
> > - is there a shuffle phase before reduce is run?
>
> Not really because we do reduce iteratively.

There's still one part that I'm missing. Who is coordinating this
process? What happens if the coordinator goes down? Or is it possible
to end up with the same instance being responsible for many map/reduce
execution? Where is the last part of reduce running? (by running
reduce on each node, there needs to be a final reduce applied).

> > - what happens when keeptemp/out are defined?
>
> All the results and up on 1 shard right now, so keeptemp/out are applied on
> the final output shard just as they would in single master mode.
>

Interesting. How do you know next time you run the same map/reduce
that there's already a cached result that might need or not un update?

Thanks again,

:- alex

> -Eliot

Eliot Horowitz

unread,

Sep 2, 2010, 12:51:56 PM9/2/10

to mongod...@googlegroups.com

The mongos you talk to initially does the coordination. If it dies, you would have to restart.

1 mongos can be responsible for any number of map/reduces.

The last part gets run on the shard where the result will end up.

If using out, we only check right before we do the rename into the out collection.

--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.

Reply all

Reply to author

Forward