Hi Matthieu,
Could you provide the following?
- Indexes on this collection (db.collection.getIndexes() from the shell)
- Distinct counts for the '_cls', 'total.currency', 'total.amount' and 'count' fields
- Are there concurrent operations running? If so, a rough ratio of reads to writes would be helpful.
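For the first two items, a minimal sketch along these lines could be pasted into the shell. collectDiagnostics is a hypothetical helper name, not a built-in; it relies only on getIndexes() and distinct(), and it assumes each field's distinct values fit in a single result document.

```javascript
// Hypothetical helper (not part of the shell): gathers the index list and
// per-field distinct counts for one collection handle.
function collectDiagnostics(coll) {
    var fields = ['_cls', 'total.currency', 'total.amount', 'count'];
    var out = { indexes: coll.getIndexes(), distinctCounts: {} };
    fields.forEach(function (field) {
        // distinct() returns an array of values; its length is the distinct count.
        out.distinctCounts[field] = coll.distinct(field).length;
    });
    return out;
}
```

In the shell this would be run as printjson(collectDiagnostics(db.collection)), substituting the real collection name.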
I tried to reproduce this with the following contrived dataset:
for (var i = 0; i < 200000; i++) {
    db.foo.insert({
        count: i * Math.random(),
        _cls: i,
        venue_id: String(i % 100),
        total: {amount: i % 100, currency: i % 2}
    });
}
However, with this dataset the supplied aggregate commands don't show a significant performance regression (the first command, the one without a $match, is actually much faster in v2.6.0; see run 3). I ran the second command twice: once with a $match operator that matches nothing (run 1) and once with one that matches half of the documents (run 2). These are the median values from each run:
V2.6.0:
---
1: 2014-04-15T11:12:13.922-0700 [conn1] command test.$cmd command: aggregate { aggregate: "foo", pipeline: [ { $match: { _cls: "NOMATCH" } }, { $group: { _id: "$total.currency", total: { $sum: "$total.amount" }, count: { $sum: 1.0 } } }, { $sort: { count: -1.0 } } ], cursor: {} } keyUpdates:0 numYields:0 locks(micros) r:157314 reslen:96 157ms
2: 2014-04-15T10:52:57.580-0700 [conn1] command test.$cmd command: aggregate { aggregate: "foo", pipeline: [ { $match: { _cls: { $gte: 100000.0 } } }, { $group: { _id: "$total.currency", total: { $sum: "$total.amount" }, count: { $sum: 1.0 } } }, { $sort: { count: -1.0 } } ], cursor: {} } keyUpdates:0 numYields:31 locks(micros) r:341785 reslen:198 433ms
3: 2014-04-15T10:47:04.585-0700 [conn1] command test.$cmd command: aggregate { aggregate: "foo", pipeline: [ { $group: { _id: "$venue_id", count: { $sum: 1.0 } } }, { $sort: { count: -1.0 } } ], cursor: {} } keyUpdates:0 numYields:21 locks(micros) r:195858 reslen:3676 276ms
V2.4.9:
---
1: Tue Apr 15 11:10:21.729 [conn1] command test.$cmd command: { aggregate: "foo", pipeline: [ { $match: { _cls: "NOMATCH" } }, { $group: { _id: "$total.currency", total: { $sum: "$total.amount" }, count: { $sum: 1.0 } } }, { $sort: { count: -1.0 } } ] } ntoreturn:1 keyUpdates:0 locks(micros) r:146400 reslen:50 146ms
2: Tue Apr 15 11:03:52.317 [conn1] command test.$cmd command: { aggregate: "foo", pipeline: [ { $match: { _cls: { $gte: 100000.0 } } }, { $group: { _id: "$total.currency", total: { $sum: "$total.amount" }, count: { $sum: 1.0 } } }, { $sort: { count: -1.0 } } ] } ntoreturn:1 keyUpdates:0 numYields: 2 locks(micros) r:755421 reslen:152 431ms
3: Tue Apr 15 10:44:07.664 [conn1] command test.$cmd command: { aggregate: "foo", pipeline: [ { $group: { _id: "$total.currency", total: { $sum: "$total.amount" }, count: { $sum: 1.0 } } }, { $sort: { count: -1.0 } } ] } ntoreturn:1 keyUpdates:0 locks(micros) r:441643 reslen:152 441ms
One thing to note is that these operations yield more frequently in v2.6 than in v2.4 (numYields:31 vs. 2 in run 2, for example), which means concurrent operations may be faster in v2.6.0 (just food for thought; 350% variance still seems a bit high).
Best,
Ben