Re: map reduce taking forever to reduce 26 records!

landon.silla

Oct 26, 2012, 5:22:45 PM
to mongod...@googlegroups.com
Update: I'm still waiting for the mapReduce to finish...

On Friday, October 26, 2012 2:04:06 PM UTC-7, landon.silla wrote:
Here's what I'm doing:

mongos> db.campaign_raw_data_459_imp.count()
21800002
mongos> db.campaign_raw_data_459_imp.find({ts:1350585328}).count()
26
mongos>  map = function () { emit(this.cookie, 1);}
function () {
    emit(this.cookie, 1);
}
mongos>  reduce = function (key, values) {return 1;}
function (key, values) {
    return 1;
}
mongos> 
mongos> db.campaign_raw_data_459_imp.mapReduce(map, reduce, {out: { replace : "garbage"}}, query={ts:1350585328})
 //This is hanging and taking forever, over 10 minutes now


There are about 21.8 million documents in this collection, and for ts=1350585328 there are 26 of them. The goal is to count how many unique cookies appear in the matching records. So it should find all 26 matching documents, drop them into buckets by cookie, and then count the buckets.

I would presume that it runs the find, based on the query, FIRST, and then does the map/reduce over only the returned documents. If that's the case, the size of the collection shouldn't matter at all. The count() on the second line came back in the blink of an eye.

I have this indexed on ts, and I have a three-shard setup with three replica sets.

Why is this taking so long? For prod, I'm going to open up ts to be a range, so there will be many more matching documents than just 26.
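
For reference, the computation described above (filter on ts, bucket by cookie, count the buckets) can be sketched in plain JavaScript, outside MongoDB. The sample documents and the function name countUniqueCookies are made up for illustration; this only mirrors the query, map, reduce order the post expects and is not how the server implements mapReduce.

```javascript
// Hypothetical sample documents standing in for the real collection.
const docs = [
  { ts: 1350585328, cookie: "a" },
  { ts: 1350585328, cookie: "b" },
  { ts: 1350585328, cookie: "a" },
  { ts: 1350585999, cookie: "c" },
];

// Mimics the expected order: query first, then map, then reduce.
function countUniqueCookies(documents, ts) {
  const buckets = new Map();
  for (const doc of documents) {
    if (doc.ts !== ts) continue;                  // query step: {ts: ...}
    const values = buckets.get(doc.cookie) || []; // map step: emit(cookie, 1)
    values.push(1);
    buckets.set(doc.cookie, values);
  }
  // reduce step collapses each bucket to a single value,
  // so the number of unique cookies is the number of buckets.
  return buckets.size;
}

console.log(countUniqueCookies(docs, 1350585328));
```

With the sample data, only the three documents at the requested ts are bucketed, and they share two distinct cookies.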

Jenna deBoisblanc

Nov 7, 2012, 10:37:42 AM
to mongod...@googlegroups.com
Did the command ever finish? Could you post the output of

db.campaign_raw_data_459_imp.find({ts: 1350585328}).explain()

Could you also post the output of db.currentOp() while the command is running?

Jenna deBoisblanc

Nov 7, 2012, 10:45:43 AM
to mongod...@googlegroups.com
OK, I believe the issue is the syntax of your mapReduce call:
> db.campaign_raw_data_459_imp.mapReduce(map, reduce, {out: { replace : "garbage"}}, query={ts:1350585328})
 
It should be:
> db.campaign_raw_data_459_imp.mapReduce(map, reduce, {out: { replace : "garbage"}, query: {ts:1350585328}})

To be clearer, the query belongs in the same options object as "out", i.e.:
{out: { replace : "garbage"}, query: {ts:1350585328}}

Your current mapReduce command probably doesn't use the query at all, so it must scan every document in the collection. The documentation admittedly doesn't do a great job of illustrating the correct syntax, and I will see if we can make the docs clearer.
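
To illustrate why the original call would drop the filter, here is a minimal sketch in plain JavaScript. mapReduceStub is a hypothetical stand-in, not the real shell helper; it assumes, as suggested above, that the helper only reads options from its third argument. In JavaScript, query={...} inside an argument list is an assignment expression, so its value is passed as an extra positional argument rather than as a named option.

```javascript
// Hypothetical stand-in for the shell helper: it reads options only from
// its third parameter and, like any JavaScript function, silently ignores
// extra positional arguments.
function mapReduceStub(map, reduce, options) {
  return { query: options.query, argCount: arguments.length };
}

const map = function () {};
const reduce = function (key, values) { return 1; };
var query; // declared so the assignment below also works in strict mode

// Form from the original post: the assignment's value becomes a fourth
// positional argument, which the stub never looks at.
const wrong = mapReduceStub(map, reduce, { out: { replace: "garbage" } },
                            query = { ts: 1350585328 });

// Corrected form: query is a field of the options object.
const right = mapReduceStub(map, reduce,
                            { out: { replace: "garbage" }, query: { ts: 1350585328 } });

console.log(wrong.query); // undefined
console.log(right.query);
```

In the first call the stub sees no query in its options, which matches the full-collection scan described above; in the second call the filter arrives where the options are read.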

Please let me know if this resolves the problem.