HUGE performance drop of aggregation framework after migrating to 2.6

Matthieu Rigal

unread,
Apr 15, 2014, 8:54:46 AM4/15/14
to mongod...@googlegroups.com
Hi,

We have a simple aggregate query on a big collection to count the occurrence of the various kinds of documents. The kind is stored as an attribute on each document, and that attribute is indexed.

Eight days ago, I ran the query via PyMongo 2.6.3 against MongoDB 2.4.9 over 63,533,457 objects, and the aggregation result returned after 6 min 16 s.

Today, with PyMongo 2.7 and MongoDB 2.6.0 over 69,524,867 objects, the aggregation result returned after 26 min 33 s.

I don't think I need to give more details. The difference is too large to be configuration-specific; you should be able to see the same kind of results everywhere. We are seriously thinking of rolling back to 2.4.

Matthieu

Matthieu Rigal

unread,
Apr 15, 2014, 9:34:32 AM4/15/14
to mongod...@googlegroups.com
OK, we ran the full stats script, and it appears that not all aggregation operations got this much slower. The increase in processing time varies from +15% (roughly in line with the growth of the collection) up to +350%.

Based on another collection, similar to the one given for the index intersection problem, here are some of the aggregates:

The aggregates over the full collection, with no $match, are the ones that got very, very slow, like the following (6 min -> 28 min):


        result = je_pym_coll.aggregate([
                                        {"$group": {"_id": "$venue_id",
                                                      "count": {"$sum": 1}}},
                                        {"$sort": {"count": -1}}])

The ones that do some matching, and so require less fetching, slowed down much less, like the following (6 min -> 7 min):

        result = je_pym_coll.aggregate([{"$match": {"_cls": "j.i"}},
                                        {"$group": {"_id": "$total.currency",
                                                    "total": {"$sum": "$total.amount"},
                                                    "count": {"$sum": 1}}},
                                        {"$sort": {"count": -1}}])

Best,
Matt

Ben Becker

unread,
Apr 15, 2014, 2:37:19 PM4/15/14
to mongod...@googlegroups.com
Hi Matthieu,

Could you provide the following?:
 - Indexes on this collection (db.collection.getIndexes() from the shell)
 - Distinct counts for the '_cls', 'total.currency', 'total.amount' and 'count' fields
 - Are there concurrent operations running?  If so, a rough ratio of reads to writes would be helpful.
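The distinct counts Ben asks for can be gathered from Python with a small helper along these lines (a sketch; `cardinality_report` is a hypothetical name, and the field list is taken from the question above):

```python
def cardinality_report(coll, fields):
    """Number of distinct values per field, via the distinct command.

    For the index definitions, coll.index_information() is PyMongo's
    equivalent of db.collection.getIndexes() in the shell."""
    return {f: len(coll.distinct(f)) for f in fields}

# With a live connection, e.g.:
# cardinality_report(db.journals, ["_cls", "total.currency", "total.amount"])
```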

I tried to reproduce this with the following contrived dataset:
for (var i = 0; i < 200000; i++) {
  db.foo.insert({
    count:i * Math.random(),
    _cls:i,
    venue_id:String(i % 100),
    total: {amount:i % 100, currency: i % 2}
  });
}

However, the supplied aggregate commands don't seem to show a significant performance regression (the first command is actually much faster in v2.6.0). I ran the second command with a $match operator that matches nothing, and one that matches half of the documents.  These are the median values from each run:

V2.6.0:
---
1: 2014-04-15T11:12:13.922-0700 [conn1] command test.$cmd command: aggregate { aggregate: "foo", pipeline: [ { $match: { _cls: "NOMATCH" } }, { $group: { _id: "$total.currency", total: { $sum: "$total.amount" }, count: { $sum: 1.0 } } }, { $sort: { count: -1.0 } } ], cursor: {} } keyUpdates:0 numYields:0 locks(micros) r:157314 reslen:96 157ms

2: 2014-04-15T10:52:57.580-0700 [conn1] command test.$cmd command: aggregate { aggregate: "foo", pipeline: [ { $match: { _cls: { $gte: 100000.0 } } }, { $group: { _id: "$total.currency", total: { $sum: "$total.amount" }, count: { $sum: 1.0 } } }, { $sort: { count: -1.0 } } ], cursor: {} } keyUpdates:0 numYields:31 locks(micros) r:341785 reslen:198 433ms

3: 2014-04-15T10:47:04.585-0700 [conn1] command test.$cmd command: aggregate { aggregate: "foo", pipeline: [ { $group: { _id: "$venue_id", count: { $sum: 1.0 } } }, { $sort: { count: -1.0 } } ], cursor: {} } keyUpdates:0 numYields:21 locks(micros) r:195858 reslen:3676 276ms


V2.4.9:
---
1: Tue Apr 15 11:10:21.729 [conn1] command test.$cmd command: { aggregate: "foo", pipeline: [ { $match: { _cls: "NOMATCH" } }, { $group: { _id: "$total.currency", total: { $sum: "$total.amount" }, count: { $sum: 1.0 } } }, { $sort: { count: -1.0 } } ] } ntoreturn:1 keyUpdates:0 locks(micros) r:146400 reslen:50 146ms

2: Tue Apr 15 11:03:52.317 [conn1] command test.$cmd command: { aggregate: "foo", pipeline: [ { $match: { _cls: { $gte: 100000.0 } } }, { $group: { _id: "$total.currency", total: { $sum: "$total.amount" }, count: { $sum: 1.0 } } }, { $sort: { count: -1.0 } } ] } ntoreturn:1 keyUpdates:0 numYields: 2 locks(micros) r:755421 reslen:152 431ms

3: Tue Apr 15 10:44:07.664 [conn1] command test.$cmd command: { aggregate: "foo", pipeline: [ { $group: { _id: "$total.currency", total: { $sum: "$total.amount" }, count: { $sum: 1.0 } } }, { $sort: { count: -1.0 } } ] } ntoreturn:1 keyUpdates:0 locks(micros) r:441643 reslen:152 441ms

One thing to note is that these operations yield more frequently in v2.6 than v2.4, which means concurrent operations may be faster in v2.6.0 (just food for thought; 350% variance still seems a bit high).

Best,
Ben

Matthieu Rigal

unread,
Apr 15, 2014, 5:03:30 PM4/15/14
to mongod...@googlegroups.com
Hi Ben,

From the tests I could make, there seem to be big differences between stats on hundreds of thousands of documents and stats on dozens of millions (even with sufficient RAM!).

There are indexes on everything required, specifically "_cls_1"; nothing else is required or would help here. There is no index on currency or amount, but that is not the point.

Distinct counts:
 - _cls: 24
 - currency: 13
 - amount: not relevant; it is a decimal field with tens of thousands of values, but it is only used for building a sum
 - count: not relevant; it is a count produced within the aggregation framework

There are some concurrent operations running, but rather few, and involving no complex filtering or the like. The read/write ratio is close to 50-50.

Best,
Matthieu

Ben Becker

unread,
Apr 16, 2014, 5:15:00 PM4/16/14
to mongod...@googlegroups.com
Hi Matthieu,

Is this cluster in MMS, or do you have a monitoring system that will show if the concurrent operations are less impacted by aggregation commands in v2.6.0?

Best,
Ben

Matthieu Rigal

unread,
Apr 17, 2014, 2:11:11 PM4/17/14
to mongod...@googlegroups.com
Hi Ben,

No, sorry, it is not in MMS, but I have a very predictable operations volume throughout the day, and I always run these operations at the moment where that volume is low, definitely low. I can guarantee that the server is essentially idle at that time and that concurrent load has minimal impact on performance. A 10-15% difference wouldn't concern me.
Furthermore, the concurrent operations have a "heating up" cycle of a couple of hours, and it cannot be that, within the 40 (previously) to 90 minutes (2.6.0) required to run the 7 aggregate operations, the load changes so much that some operations get 3-4 times slower while others are hardly impacted.

Best,
Matthieu

Asya Kamsky

unread,
Apr 17, 2014, 4:50:21 PM4/17/14
to mongodb-user
Would it be possible to capture 'mongostat' output on both versions when the aggregations are running?

Asya




Matthieu Rigal

unread,
Apr 18, 2014, 10:55:17 AM4/18/14
to mongod...@googlegroups.com
I'll try to give you that next Tuesday; sorry, but I can't before... And I will only be able to do it on a less powerful system, since I won't be able to roll the servers back to 2.4.x so easily.

Matthieu

Benedetto Proietti

unread,
Apr 19, 2014, 12:32:44 PM4/19/14
to mongod...@googlegroups.com
Matthieu,
could you share the average size of those 60+ million objects?
Also, if it is not too much trouble, could you share a couple of such objects (after proper obfuscation, of course)?

Thanks,
  Benedetto

Dan Pasette

unread,
May 1, 2014, 7:49:11 AM5/1/14
to mongod...@googlegroups.com
Hi Matthieu,

Any update here?  Can you at least provide the log messages written in the server for the aggregation commands (similar to what Ben posted earlier in his reproduction attempt)?

Thanks
Dan

Matthieu Rigal

unread,
May 8, 2014, 3:09:59 PM5/8/14
to mongod...@googlegroups.com
Hi all,

Sorry for the absence of updates. We had some problems with the content, and we are still working on it, but we are close to having better environments for these performance tests. I hope to be able to replay a dump including "2.6-created" data on a 2.4.10 mongod, to give feedback on that.

We now have almost 90 million objects. The average size is 1300 bytes. The aggregation is a (faster) way for us to count the item frequencies. These are aggregations over the whole collection, which contains diverse objects sharing some common core data and references, so the structure of the objects/docs can vary significantly.

I have to add that the performance drop is really noticeable on the aggregations without a $match at the beginning of the pipeline, but that it is smaller (10-15%) when I aggregate, for example, only over the objects of the last month.

Here are the logs for the two operations which slowed down seriously after the upgrade to 2.6 : 

2014-05-08T19:38:45.110+0200 [conn10572] command myorderbird.$cmd command: aggregate { aggregate: "journals", pipeline: [ { $match: {} }, { $group: { count: { $sum: 1 }, _id: "$_cls", members: { $addToSet: { entry_type: "$entry_type" } } } }, { $sort: { count: -1 } } ] } keyUpdates:0 numYields:139486 locks(micros) r:1740074951 reslen:2146 2175814ms
2014-05-08T20:13:12.837+0200 [conn10572] command myorderbird.$cmd command: aggregate { aggregate: "journals", pipeline: [ { $match: {} }, { $group: { count: { $sum: 1 }, _id: "$venue_id" } }, { $sort: { count: -1 } } ] } keyUpdates:0 numYields:136771 locks(micros) r:1699203507 reslen:64939 2067724ms
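For readability, the two logged pipelines correspond to roughly the following PyMongo calls (a sketch reconstructed from the log lines above; the empty $match comes from a generic pipeline builder):

```python
# Pipeline from the first log line: count documents per _cls and collect
# the set of entry_type values seen for each class.
by_cls = [
    {"$match": {}},
    {"$group": {"_id": "$_cls",
                "count": {"$sum": 1},
                "members": {"$addToSet": {"entry_type": "$entry_type"}}}},
    {"$sort": {"count": -1}},
]

# Pipeline from the second log line: count documents per venue_id.
by_venue = [
    {"$match": {}},
    {"$group": {"_id": "$venue_id", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]

# With a live connection these would run as, e.g.:
# result = db.journals.aggregate(by_cls)
```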

Here is one sample doc:

{
  "_id": ObjectId("id"),
  "_cls": "p",
  "venue_uuid": BinData(3, "bla"),
  "_venue_uuid": "UUID",
  "venue_id": 1,
  "object_uuid": BinData(3, "bla"),
  "_object_uuid": "UUID",
  "entry_type": "type_1",
  "metadata": {
    "uuid": "UUID",
    "schema_version": "0.5",
    "message": "bla bla",
    "created": {
      "ms_utc": 1399562267192.0991211,
      "iso_local": new Date(1399569467192),
      "tz_name": "Europe/Berlin",
      "iso_utc": new Date(1399562267192),
      "ms_local": 1399569467192.0991211
    },
    "device": {
      "identifier": "UUID",
      "name": "iPod touch"
    },
    "user": {
      "uuid": "UUID",
      "name": "K"
    },
    "created_at": new Date(1399562267192)
  },
  "synced_at": 1399562444.7612071037,
  "_synced_at": "2014-05-08 17:20:44.761207",
  "shift_uuid": BinData(3, "bla"),
  "session_uuid": BinData(3, "bla"),
  "uuid": "UUID",
  "item": {
    "uuid": "UUID",
    "name": "name",
    "plu": "153",
    "price": {
      "amount": 290,
      "currency": "EUR",
      "exp": -2
    },
    "is_expense": false,
    "cat": {
      "uuid": "UUID",
      "name": "name",
      "supercategory": "name"
    }
  },
  "booked_price": {
    "amount": 290,
    "currency": "EUR",
    "exp": -2
  },
  "quantity": 1,
  "order_number": 0,
  "table": {
    "uuid": "UUID",
    "name": "26"
  },
  "tab": {
    "uuid": BinData(3, "bla"),
    "name": "26"
  },
  "party": {
    "uuid": "UUID",
    "number": 0
  }
}

Best,
Matthieu

Asya Kamsky

unread,
May 9, 2014, 3:59:56 AM5/9/14
to mongodb-user
Same for venue_id in the second aggregation.



Asya Kamsky

unread,
May 9, 2014, 3:59:26 AM5/9/14
to mongodb-user
Do you have an index on _cls? Instead of empty $match, you should try
{$sort:{_cls:1}} as the first stage. (That's if you have an index on
that field).
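In PyMongo terms, Asya's suggestion would look something like this (a sketch; the collection name is taken from the log lines earlier in the thread):

```python
# Replace the empty $match with a $sort on the indexed _cls field, so the
# server can walk the index; the same idea applies to venue_id in the
# other pipeline.
pipeline = [
    {"$sort": {"_cls": 1}},
    {"$group": {"_id": "$_cls", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]

# With a live connection: result = db.journals.aggregate(pipeline)
```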



Matthieu Rigal

unread,
May 12, 2014, 6:15:46 PM5/12/14
to mongod...@googlegroups.com
Hi Asya,

For sure I have an index on _cls! And on venue_id! Otherwise it would not take minutes but days! :-))
The empty match happens because I have a generic "aggregate builder". This pipeline constructor became particularly slow in the case where no initial match is done.

I tried the sort in first place; it was a disaster: after 5 hours, still no answer from the first calculation. But simply removing the empty match gave around a 25% performance boost.
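The "skip the empty match" fix Matthieu describes could look like this in a generic builder (a hypothetical sketch; the actual builder is not shown in the thread):

```python
def build_pipeline(match=None, group=None, sort=None):
    """Assemble an aggregation pipeline, omitting the $match stage
    entirely when the filter is empty instead of emitting {"$match": {}}."""
    pipeline = []
    if match:  # skips both None and {}
        pipeline.append({"$match": match})
    if group:
        pipeline.append({"$group": group})
    if sort:
        pipeline.append({"$sort": sort})
    return pipeline
```

With this, passing `match={}` produces the same pipeline as passing no match at all, so the no-op stage never reaches the server.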

Matthieu

Asya Kamsky

unread,
May 13, 2014, 5:17:02 AM5/13/14
to mongodb-user
Matthieu,

Sort on an indexed field should not have made it slower - or do you mean that $sort without the empty $match made it faster, but that with the empty match it was slower?

Can you run explain on these? In 2.6, when you run aggregate([{pipeline}, {stages}, ...], {explain: true}), the explain comes back immediately, as it does not actually run the aggregation, but it will show you all the index use/query steps that it would take.
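From Python, one way to issue the explain is through PyMongo's generic command helper, since extra keyword arguments are appended to the command document (a sketch; `explain_aggregate_command` is a hypothetical helper, and the collection name comes from the logs above):

```python
def explain_aggregate_command(collection, pipeline):
    """Build the command document for an explain-only aggregation,
    as introduced in MongoDB 2.6 (the pipeline is planned, not run)."""
    return {"aggregate": collection, "pipeline": pipeline, "explain": True}

# With a live PyMongo connection this would be sent as, e.g.:
# db.command("aggregate", "journals",
#            pipeline=[{"$sort": {"_cls": 1}},
#                      {"$group": {"_id": "$_cls", "count": {"$sum": 1}}}],
#            explain=True)
```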

Can you try that and make sure the index is being used? Just to clarify, my suggestion was that in the aggregation where you count by $_cls, you add {$sort: {_cls: 1}} at the beginning, and do the same with a venue_id sort in the other pipeline.

Asya


Ben Becker

unread,
Jun 12, 2014, 12:44:25 PM6/12/14
to mongod...@googlegroups.com
Hi Matthieu,

Are you still experiencing the same performance degradation? Have you had a chance to try the sort change that Asya suggested?  Please provide the requested explain results if you're still having any issues.

Best,
Ben

Matthieu Rigal

unread,
Jun 17, 2014, 7:21:36 AM6/17/14
to mongod...@googlegroups.com
Hi Ben and Asya,

I'm sorry, I still couldn't work much further on the topic. I still couldn't get a similar configuration with 2.4.10 and 2.6 installed in parallel, and we've now made some changes that influence performance anyway (use of secondaries, etc.).
I can confirm that we had this performance degradation, but in the meantime we have again seen the same (expected) degradation due to the simple growth of the datasets.

If there is no other way/hint to confirm something similar, just forget about it; it was rather about warning and pointing at something.

BTW, we may connect to MMS, after having confirmation that no data content, only metadata, would be transferred (we are not paranoid, but bound by German laws...).

Best,
Matthieu

Asya Kamsky

unread,
Jun 18, 2014, 6:55:12 PM6/18/14
to mongodb-user
It is absolutely 100% the case that, by default, no data related to anything other than aggregated performance is transferred. The closest thing to "data" that gets transferred are the names of clusters, replica sets, hosts, and databases (but *not* collections). This is so that performance numbers can be displayed in a meaningful manner.

There are optional features you can enable that would transfer information such as the queries being run - do not enable the logs or profile collectors (they are never enabled except by you).

Asya


