Aggregation Framework generally faster than MapReduce?


Mark Hansen

Oct 4, 2012, 12:24:00 PM
to mongod...@googlegroups.com
Has anybody compared the same aggregation function using the Aggregation Framework (AF) vs. MapReduce (MR)? If so, what (if any) performance improvements did you achieve? We have an MR library that we are thinking of converting to AF, and we are wondering whether it will be worth the effort in terms of improved performance.

Mark Hansen

Oct 4, 2012, 6:02:24 PM
to mongod...@googlegroups.com
Thanks, Sean. Did a quick test today on one of our long-running MR transformations. The equivalent AF ran 7 times faster. So, I think we'll be porting a lot of code this weekend ... ;-)


On Thursday, October 4, 2012 1:04:12 PM UTC-4, Sean Reilly wrote:
TL;DR: Oh heck yes! It will almost certainly be worth it.

More detailed answer: I am using the aggregation framework in anger on a large project (it hasn't been released yet, and I can't go into any specifics) that is handling lots of data. Interestingly, a previous prototype of the project (somewhat different functionality, written in a different way by a completely different team) did use map reduce for data transformation purposes.

Performance-wise, the aggregation framework won hands down. There are a lot of reasons why, but here are (IMO) the three biggest:

1. With map reduce, you generally output to a collection (often a temporary collection). The aggregation framework is much better suited to returning data directly to a calling library when that's what you actually want to do.

2. Map reduce is single-threaded per server. A mongod instance can only run one map reduce query at a time, whereas the aggregation framework can run multiple operations at once.

3. The aggregation framework can use indexes to reduce the cost of operations where you're only interested in a subset of the contents of a collection.
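As a rough illustration of points 1 and 3, here is a minimal Python sketch of what a simple "sum per key" map reduce job might look like as an aggregation pipeline. The collection name (`orders`) and the field names (`status`, `customer_id`, `amount`) are hypothetical and not taken from anyone's project in this thread. The pipeline is just a list of plain dicts, so the sketch itself does not need a running server.

```python
def build_totals_pipeline(status):
    """Return a pipeline that sums `amount` per `customer_id`.

    All field names here are hypothetical, chosen only for illustration.
    """
    return [
        # Filter first, so only the subset of interest is processed.
        # As the first stage, $match can use an index on `status` if one exists.
        {"$match": {"status": status}},
        # $group plays the role of the map (emit key/value) and
        # reduce (sum the values per key) functions.
        {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},
        # Largest totals first.
        {"$sort": {"total": -1}},
    ]


# Assumed usage with PyMongo against a hypothetical `orders` collection;
# the results come straight back to the caller, with no output collection:
#   for doc in db.orders.aggregate(build_totals_pipeline("shipped")):
#       print(doc["_id"], doc["total"])
```

Putting the `$match` stage first is the design choice that matters here: it is what lets the server narrow the input before any grouping work is done.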

For us, the combination of these three gave the aggregation framework a huge performance advantage over map reduce for similar (but not identical) problems.

In your case, things might be different, but I'd go so far as to say that the aggregation framework should be everybody's default choice for batch/data processing jobs on MongoDB (well, either AF or application code, I suppose). The list of things that map reduce is best at on this platform is quite small, and getting smaller all of the time.

Sean

Sam Martin

Oct 4, 2012, 6:15:07 PM
to mongod...@googlegroups.com
The only downside is that outputting the result to another collection requires the client to fetch the data and save it. With MR you can write directly to another collection on the server... but I think this feature is coming for AF.
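The client-side "fetch and save" step could be sketched roughly as below. This is only an illustration under assumptions: `source` and `target` are taken to be PyMongo-style collection objects, where `aggregate()` returns something iterable (as current PyMongo does) and `insert_many()` accepts a list of documents. The function name and `batch_size` parameter are made up for this example.

```python
def save_aggregation_results(source, target, pipeline, batch_size=1000):
    """Run `pipeline` on `source` and insert the results into `target`.

    Hypothetical helper: every result document travels to the client
    and back to the server. Returns the number of documents written.
    """
    written = 0
    batch = []
    for doc in source.aggregate(pipeline):
        batch.append(doc)
        # Write in batches to avoid holding the whole result set in memory.
        if len(batch) >= batch_size:
            target.insert_many(batch)
            written += len(batch)
            batch = []
    # Flush whatever is left over after the loop.
    if batch:
        target.insert_many(batch)
        written += len(batch)
    return written
```

The extra round trip through the client is exactly the cost being described: with MR the server writes the output collection itself.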

Mark Hansen

Oct 4, 2012, 9:30:13 PM
to mongod...@googlegroups.com
Yes, and some of our result sets are > 16MB, so we cannot use AF in those situations.