[mongodb-user] Performance difference between group() and mapReduce()


Marc Esher

May 19, 2010, 4:17:47 PM
to mongodb-user
Greetings all,
I'm getting ramped up with Mongo and having a blast. Thanks to all
for your hard work.

I have a question about the performance of group() relative to
mapReduce(). In a small collection with ~11k records, I need to do
your standard "average" on one of the fields in a document. I couldn't
find an easy way to do this, so I tried it with both MR and group. Both
yield the same answer; however, the MR version is usually
about twice as fast (~1 second compared with ~2 seconds for the
group() solution).

I can post code if that's helpful, though it's quite uninteresting.
I'm mostly wondering whether, given the same number of documents to loop
over and the same arithmetic, MR is expected to
outperform group() by such a margin. I thought perhaps MR was running
on multiple threads, but the docs indicate that's not the case.

I'm using 1.4.2 with the Java driver.

Thanks for any insight.

Marc
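
For readers following along, the arithmetic in question can be sketched without a database at all. The snippet below is a minimal illustration in plain Java (no MongoDB driver), assuming each rating is a plain number; the class and method names are hypothetical and not from the original post. One method mimics a group()-style reducer that keeps a count and a running total, the other mimics a map/reduce-style pass that emits (count, total) pairs and merges them. Both should arrive at the same average.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AverageSketch {
    /** What the reducer tracks: a document count and a running total. */
    static final class Agg {
        long count;
        double total;

        Agg(long count, double total) {
            this.count = count;
            this.total = total;
        }

        /** Returns 0.0 for an empty accumulator instead of dividing by zero. */
        double average() {
            return count == 0 ? 0.0 : total / count;
        }
    }

    /** group()-style: a single accumulator, updated once per document. */
    static double averageViaGroup(List<Double> ratings) {
        Agg agg = new Agg(0, 0.0);
        for (double r : ratings) {
            agg.count++;
            agg.total += r;
        }
        return agg.average();
    }

    /** map/reduce-style: map emits (1, rating) per document, reduce merges the partials. */
    static double averageViaMapReduce(List<Double> ratings) {
        List<Agg> emitted = new ArrayList<>();
        for (double r : ratings) {
            emitted.add(new Agg(1, r));
        }
        Agg merged = new Agg(0, 0.0);
        for (Agg partial : emitted) {
            merged.count += partial.count;
            merged.total += partial.total;
        }
        return merged.average();
    }

    public static void main(String[] args) {
        List<Double> ratings = Arrays.asList(4.0, 2.0, 5.0, 1.0);
        System.out.println(averageViaGroup(ratings));
        System.out.println(averageViaMapReduce(ratings));
    }
}
```

The work per document is the same in both styles, so any timing gap between the two would have to come from how the server executes them rather than from the arithmetic itself.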

--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.

Mathias Stearn

May 19, 2010, 9:20:52 PM
to mongod...@googlegroups.com
Actually, the opposite is expected: group is usually faster than MR.
I'd be curious to see your code.

Frank Sorro

May 20, 2010, 5:06:50 AM
to mongodb-user
On May 20, 03:20, Mathias Stearn <math...@10gen.com> wrote:
> Actually, the opposite is expected: group is usually faster than MR.
> I'd be curious to see your code.
>
Is that also true for sharded environments? I was thinking that group
is slower in that case because it has to transmit whole documents to
mongos instead of smaller datasets emitted by the map function. I was
planning to convert all group calls to map/reduce because of that. I'd
be interested to see your code, too, Marc.

Frank


Marc Esher

May 20, 2010, 7:12:52 AM
to mongodb-user
Mathias,
Thank you for responding. I dug a little deeper and this is what I
found:

I was running a single request that inserted 100 new documents into
the collection. As soon as it inserted them, it used group() to do
some basic counting. This took on average about 2000ms on a total
collection of ~12k documents.

Immediately after that, I used MapReduce to do the same thing.
MapReduce would typically complete in about 1100-1300ms.

So today, I did two things:

1) switched the order.... put the MR implementation first, then the
group() implementation. Lo and behold, MR started taking around
1400-1500ms, but group() was now taking only 300ms or so.

2) pulled out group() into a separate request -- i.e. ran the same
code but just by itself, not after inserting 100 documents -- and that
code ran about 300ms as well.

So when run independent of updates to the collection, group() does in
fact work a good deal faster than MapReduce.

Now I'm on to a new question... not "why is MR faster than group?",
which is in fact not the case, but "Why are group() and MR so much
slower when run immediately after inserting new documents into the
collection?".

I should also note that if I upped the number of documents I inserted
immediately before MR and group(), the time to MR and group increased
considerably.

Thanks again for Mongo, and thanks for helping me understand how all
of this works.

Best,

Marc
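
The order effect described above is a common benchmarking pitfall: whichever operation runs first can absorb costs that belong to earlier work. A minimal sketch of a harness that times one operation in isolation, with a few untimed warm-up runs, might look like the following. It is plain Java with hypothetical names (`TimingSketch`, `medianMillis`); the `Runnable` stands in for a group() or mapReduce() call, and nothing here touches MongoDB.

```java
import java.util.Arrays;

public class TimingSketch {
    /**
     * Runs the task a few times untimed, then returns the median
     * wall-clock time in milliseconds over the timed runs.
     */
    static double medianMillis(Runnable task, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        double[] samples = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        Arrays.sort(samples);
        int mid = runs / 2;
        return (runs % 2 == 1) ? samples[mid] : (samples[mid - 1] + samples[mid]) / 2.0;
    }

    public static void main(String[] args) {
        // Placeholder workload; substitute the query being measured.
        Runnable work = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                throw new IllegalStateException("unexpected overflow");
            }
        };
        System.out.println("median ms: " + medianMillis(work, 3, 5));
    }
}
```

Timing each operation in its own run, after any pending writes have settled, keeps one measurement from being charged for another's work.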

Eliot Horowitz

May 20, 2010, 7:15:21 AM
to mongod...@googlegroups.com
You should run db.getLastError() after inserting to block until the
inserts are really finished.

Marc Esher

May 20, 2010, 8:25:13 AM
to mongodb-user
Hi Eliot,
Thanks for responding. I added getLastError(), and while it did
increase save time by a few ms (no biggie), it did not decrease the
subsequent time it took to MR or group() over the collection. Should
it have? Just trying to understand.

Thanks a lot.

Marc

Eliot Horowitz

May 20, 2010, 10:23:20 AM
to mongod...@googlegroups.com
Hard to tell you much more without the code.
Could you send that?

Marc Esher

May 20, 2010, 2:10:42 PM
to mongodb-user
Eliot,
Sorry about that... I wasn't sure if the code would be helpful in
this case. I experimented some more and something interesting came up.
First off, I'm using CFML (ColdFusion), which uses the Java driver. In
CFML, adding the call to db.getLastError() added a few ms onto the
saves but didn't chop any time off the group(). However, when I
ported it all over to straight Java, getLastError() added a lot of
time onto the saves but got the group() down to the same amount of
time as when I run it without the saves in front of it, which basically
confirms what you indicated earlier. In the straight Java case, the
"total" time when doing save-then-group was about the same; it's
just that adding getLastError() moved the time up into the save block
and out of the group() block. So, no biggie.

At this point, I'd consider this just some weirdness with CF, and
nothing I'm concerned about. At least now I understand why the saves
appear to affect the subsequent group(): the group() has to wait for
those saves to complete. In CF, even with getLastError(), the saves
appear to be complete when in fact they must not be.
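
The shift in where the time shows up can be illustrated with a small simulation. This is only a sketch under stated assumptions: a single worker thread stands in for a server that applies operations in order, `Thread.sleep` stands in for the cost of a write, and waiting on a `Future` stands in for the getLastError() call. The names (`WriteAckSketch`, `run`, `Result`) are hypothetical, and none of this is the MongoDB driver's actual implementation.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class WriteAckSketch {
    /** Timings for one save-then-read run, plus writes still pending when the read was issued. */
    static final class Result {
        final long writeMillis;
        final long readMillis;
        final int pendingAtRead;

        Result(long writeMillis, long readMillis, int pendingAtRead) {
            this.writeMillis = writeMillis;
            this.readMillis = readMillis;
            this.pendingAtRead = pendingAtRead;
        }
    }

    /** Blocks until the given operation has been applied by the simulated server. */
    private static void await(Future<?> op) {
        try {
            op.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    static Result run(int writes, long writeCostMillis, boolean waitForAck) {
        // One worker thread: operations are applied strictly in submission order.
        ExecutorService server = Executors.newSingleThreadExecutor();
        AtomicInteger applied = new AtomicInteger();
        try {
            long writeStart = System.nanoTime();
            for (int i = 0; i < writes; i++) {
                Future<?> ack = server.submit(() -> {
                    try {
                        Thread.sleep(writeCostMillis); // simulated cost of applying one write
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    applied.incrementAndGet();
                });
                if (waitForAck) {
                    await(ack); // stand-in for blocking on an acknowledgment
                }
            }
            long writeMillis = (System.nanoTime() - writeStart) / 1_000_000;

            int pending = writes - applied.get();
            long readStart = System.nanoTime();
            // Simulated read: it is queued behind any writes not yet applied.
            await(server.submit(() -> { }));
            long readMillis = (System.nanoTime() - readStart) / 1_000_000;
            return new Result(writeMillis, readMillis, pending);
        } finally {
            server.shutdown();
        }
    }

    public static void main(String[] args) {
        Result unacked = run(5, 20, false);
        Result acked = run(5, 20, true);
        System.out.println("no ack:   write=" + unacked.writeMillis + "ms read=" + unacked.readMillis
                + "ms pending=" + unacked.pendingAtRead);
        System.out.println("with ack: write=" + acked.writeMillis + "ms read=" + acked.readMillis
                + "ms pending=" + acked.pendingAtRead);
    }
}
```

In this model the total time is roughly the same either way; waiting for each acknowledgment only moves the cost from the read phase into the write phase, which matches what the straight Java test showed.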


Here's the code, for posterity. Please note that this is just "toy"
code I'm using to teach myself how to use Mongo.

Thanks! As far as I'm concerned, case closed.

Best,

Marc

//Straight Java version:
//assumes `mongo` is an already-connected com.mongodb.Mongo instance
//insert some new ratings
String vid = "4bf3ec44992a000000002e8a";
DB vicesDB = mongo.getDB("vices");
DBCollection ratingsColl = vicesDB.getCollection("ratings");
long thisRunKey = System.currentTimeMillis();
long startInserts = System.currentTimeMillis();
for (int i = 1; i <= 100; i++) {
    String thisRunUserName = "marc_" + thisRunKey + "_" + i;
    BasicDBObject update = new BasicDBObject("U", thisRunUserName).append("R", 4).append("VID", vid);
    BasicDBObject criteria = new BasicDBObject("U", thisRunUserName).append("VID", vid);
    //upsert = true, multi = false
    ratingsColl.update(criteria, update, true, false);
    //block until the write has really finished
    vicesDB.getLastError();
}
System.out.println("Total insert time: " + (System.currentTimeMillis() - startInserts));

//perform the group()
//the string literal is kept on one line; Java strings cannot span lines
String jsFunction = "function(obj,agg){ agg.COUNT++; agg.RATINGTOTAL += obj.R; }";
BasicDBObject key = new BasicDBObject("VID", true);
BasicDBObject cond = new BasicDBObject("VID", vid);
BasicDBObject initial = new BasicDBObject("RATINGTOTAL", 0).append("COUNT", 0);
long start = System.currentTimeMillis();
DBObject groupResult = ratingsColl.group(key, cond, initial, jsFunction);
long total = System.currentTimeMillis() - start;
System.out.println("total group execution: " + total);
System.out.println(groupResult);



//CFML:

//loop and save ratings
for(i=1; i <= 100; i++){
    saveRating(vice, {name="marc_stogie_#getTickCount()#_#i#",value=randRange(1,5)});
}

//update the average rating for all ratings in the collection
viceDAO.updateAverageViceRating(vice);

public Vice function saveRating(Vice vice, Struct rating){
    var updateDBO = newDBObjectFromStruct({U=rating.name,R=javacast("double",rating.value),vid=vice.getViceID()});
    var criteriaDBO = newDBObjectFromStruct({U=rating.name,vid=vice.getViceID()});
    ratingsColl.update(criteriaDBO,updateDBO,true,false);
    vicesDB.getLastError();
    return vice;
}

public Vice function updateAverageViceRating(Vice vice){
    var criteria = newDBObject("VID",vice.getViceID());
    var keys = newDBObject("VID",true);
    var dbl = javacast("double",0);
    var initial = newDBObjectFromStruct({RATINGTOTAL=dbl,COUNT=dbl});
    var jsFunction = "function(obj,agg){
        agg.COUNT++;
        agg.RATINGTOTAL += obj.R;
    }";
    var groups = ratingsColl.group(keys,criteria,initial,jsFunction);
    vice.setAverageRating( groups[1]["RATINGTOTAL"] / groups[1]["COUNT"] );
    save(vice);
    return vice;
}