Temporary MapReduce collections are replicated but not visible or removable

Arkadiy

May 17, 2011, 5:20:08 PM5/17/11
to mongodb-user
We just added a new priority 0 machine to our main replica set (EC2
instance for EBS snapshots). To start it up, I shut down one of our
existing secondaries, copied the entire data directory and started
with --fastsync, as usual. However, I'm seeing a large number of
entries like this in the log on the new member:

Tue May 17 21:04:30 [replica set sync] info: creating collection
site_data.tmp.mr.foo_tmp_854_inc on add index
Tue May 17 21:04:30 [replica set sync] building new index on { 0: 1 }
for site_data.tmp.mr.foo_tmp_854_inc
Tue May 17 21:04:30 [replica set sync] done for 0 records 0secs
Tue May 17 21:04:30 [replica set sync] building new index on { _id:
1 } for site_data.tmp.mr.audioitems_foo_tmp_854
Tue May 17 21:04:30 [replica set sync] done for 0 records 0secs

The sync is taking rather long, as well. This stack overflow post:

http://stackoverflow.com/questions/4163157/mongodb-remove-mapreduce-collection

suggests iterating over the collections and dropping any matching
/tmp.mr/, but in my case they do not actually appear in the
db.getCollectionNames() output.
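
In shell terms, that suggestion boils down to something like this
(just a sketch; in my case it finds nothing, since the collections
never show up in getCollectionNames()):

// drop any visible tmp.mr.* leftovers
db.getCollectionNames().forEach(function (name) {
    if (/^tmp\.mr\./.test(name)) {
        db.getCollection(name).drop();
    }
});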

I haven't seen this behavior while adding other secondaries, though we
did not use as much MR before.

Is this normal behavior? How can I remove these collections? Am I at
risk of running out of namespaces if they pile up?

Arkadiy

May 17, 2011, 5:22:14 PM5/17/11
to mongodb-user
Looking at system.namespaces, these all show up as entries like
site_data.most_blogged_artists_tmp.$_id_, so there's no danger of
exhausting namespaces. However, I'd still like to remove these or
prevent their replication.
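
For reference, the check was just a direct query against
system.namespaces, roughly this (the regex is approximate):

// list the leftover temp namespaces on the new member
db.system.namespaces.find({ name: /_tmp/ }).forEach(printjson);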

Antoine Girbal

May 18, 2011, 12:24:55 AM5/18/11
to mongodb-user
Hi Arkadiy,
it is normal for the temporary MR collections to be replicated.
The actual result collection is obtained by a rename() of the tmp
collection, so the tmp collection needs to be created and populated,
then renamed, on the slaves too.

But there is also the "inc" collection, an "incremental" temporary
collection used during the MR run. That one does not need to be
replicated, since it does not represent any usable data.
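
As a rough illustration (collection and field names are made up), a
job like this:

db.foo.mapReduce(
    function () { emit(this.artist, 1); },                 // map
    function (key, values) { return Array.sum(values); },  // reduce
    { out: "foo_results" }                                 // final collection
);

spills emitted data into something like site_data.tmp.mr.foo_N_inc,
builds the result in site_data.tmp.mr.foo_N, and finally renames that
tmp collection to site_data.foo_results.
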
Opening a ticket to check on this:
https://jira.mongodb.org/browse/SERVER-3115

AG

Arkadiy

May 18, 2011, 3:13:14 PM5/18/11
to mongodb-user
Thanks, Antoine.

Are these "incremental" in the sense that they hold, e.g., intermediate
reducer output? Is there a docs page somewhere outlining the MR
implementation in mongo? I'm seeing well over a thousand of these
during sync; I can post logs if that's helpful.

Antoine Girbal

May 18, 2011, 7:44:22 PM5/18/11
to mongodb-user
inc holds data as it's being emitted/reduced, since mongod tries to
limit the amount of data that is held in RAM.
The data is not in final form and there may still be many duplicate
keys.
There should be only 1 inc collection per map/reduce, but that
collection may get pretty big.
If it gets replicated, as it sounds like it is here, it can potentially
bog down replication.
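
Roughly speaking (made-up names again), with a map like

var map = function () { emit(this.artist, { count: 1 }); };

the inc collection holds the emitted pairs before they are fully
reduced, so the same key can appear many times; the reduce passes fold
those duplicates into the final documents:

var reduce = function (key, values) {
    var total = 0;
    values.forEach(function (v) { total += v.count; });
    return { count: total };
};
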
You can follow the ticket to track progress.