Date: Thu, 4 Oct 2012 10:04:12 -0700 (PDT)
From: Sean Reilly
To: mongodb-user@googlegroups.com
In-Reply-To: <51b02aa3-9c42-452a-8d39-61155fd2e883@googlegroups.com>
Subject: Re: Aggregation Framework generally faster than MapReduce?

*TL;DR:* Oh heck yes! It will almost certainly be worth it.

More detailed answer: I am using the aggregation framework in anger on a large project (hasn't released yet, and I can't go into any specifics) that is handling lots of data. Interestingly, a previous prototype of the project (somewhat different functionality, written in a different way by a completely different team) did use map reduce for data transformation purposes.

Performance-wise, the aggregation framework won hands down. There are a lot of reasons why, but here are (IMO) the three biggest:

1. With map reduce, you generally output to a collection (often a temporary collection). The aggregation framework is much better suited to returning data directly to a calling library when that's what you actually want to do.

2. Map reduce is single-threaded, per server. A mongod instance can only run one map reduce query at a time, whereas the aggregation framework can run multiple operations at once.

3. The aggregation framework can use indexes to reduce the cost of operations where you're only interested in a subset of the contents of a collection.

For us, the combination of these three gave the aggregation framework a huge performance advantage over map reduce for similar (but not identical) problems.

In your case, things might be different, but I'd go so far as to say that the aggregation framework should be everybody's default choice for batch/data processing jobs on MongoDB (well, either AF or application code, I suppose). The list of things that map reduce is best at on this platform is quite small, and getting smaller all of the time.

Sean

On Thursday, 4 October 2012 17:24:00 UTC+1, Mark Hansen wrote:
>
> Has anybody tried a comparison of the same aggregation function using the
> AF vs. MR? If so, what (if any) performance improvements did you achieve?
> We have a MR library that we are thinking of converting over to AF and
> wondering if it will be worth the effort in terms of improved performance.
>
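To make points 1 and 3 a bit more concrete, here is a minimal sketch of what such a pipeline might look like from Python. The field names (status, customer_id, amount) and both helper functions are made up for illustration and are not from the project described above. It also assumes a reasonably recent PyMongo, where Collection.aggregate returns a cursor; the pipeline itself is just plain Python data.

```python
# Hypothetical field names (status, customer_id, amount) used for illustration only.


def build_totals_pipeline(status):
    """Return an aggregation pipeline that totals `amount` per customer."""
    return [
        # Putting $match first lets the server use an index on `status`
        # (if one exists) and process only the matching subset.
        {"$match": {"status": status}},
        # $group sums the amounts for each customer.
        {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},
    ]


def totals_by_customer(collection, status):
    """Run the pipeline and hand the results straight back to the caller.

    `collection` is assumed to be a pymongo Collection, or anything with a
    compatible `aggregate` method. No output collection is written.
    """
    return list(collection.aggregate(build_totals_pipeline(status)))
```

With a real connection, the call would be something like totals_by_customer(db.orders, "shipped"), where db.orders is again a hypothetical collection. The results come back as a list of documents rather than landing in a temporary collection.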