MongoDB Aggregation Performance


Yarin Podoler

Dec 18, 2014, 5:44:16 AM
to mongod...@googlegroups.com

We have a problem with aggregation queries that take a long time to run (a couple of minutes).

Collection:

We have a collection of 250 million documents with about 20 fields per document.

The total size of the collection is 110GB.

We have indexes on the "our_id" and "dtKey" fields.
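
For reference, the index definitions look roughly like this (a sketch, assuming two separate single-field indexes, which is how the setup above reads):

db.our_collection.ensureIndex({ "our_id": 1 })   // per-user lookups
db.our_collection.ensureIndex({ "dtKey": 1 })    // date-key range scans
// Note: a compound index such as { "our_id": 1, "dtKey": 1 } would let a single
// index serve both predicates of the $match below, but that is not what we have today.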

Hardware:

Memory: 24GB RAM (6 * 4GB DIMMs, 1333 MHz)

Disk: 11TB LVM volume built from four 3TB disks:

        600MB/s maximum instantaneous data transfer.

        7200 RPM spindles, average latency = 4.16ms.

        RAID 0.

CPU: 2 * E5-2420 0 @ 1.90GHz, 12 cores / 24 threads in total.

Server: Dell R420.

 

Problem:

We are running the following aggregation query:

db.our_collection.aggregate(
    [
        {
            "$match": {
                "$and": [
                    { "dtKey": { "$gte": 20140916 } },
                    { "dtKey": { "$lt": 20141217 } },
                    { "our_id": "111111111" }
                ]
            }
        },
        {
            "$project": {
                "field1": 1,
                "date": 1
            }
        },
        {
            "$group": {
                "_id": {
                    "day": { "$dayOfYear": "$date" },
                    "year": { "$year": "$date" }
                },
                "field1": { "$sum": "$field1" }
            }
        }
    ]
);

This query takes a couple of minutes to run. While it is running we can see the following:

1. The current operation in Mongo yields more than 300K times.

2. iostat shows ~100% disk utilization.

After the query finishes, the data seems to be cached and rerunning it completes in a split second.

After running it for 3-4 different users, it seems the first user's data has already been evicted from the cache and the query takes a long time again.
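
To check whether the $match actually uses one of these indexes while the query is slow, the plan can be inspected like this (a sketch, assuming MongoDB 2.6+ where aggregate() accepts an explain option; the collection name and values are taken from the query above, with the two dtKey conditions folded into a single, equivalent range):

db.our_collection.aggregate(
    [
        { "$match": { "our_id": "111111111",
                      "dtKey": { "$gte": 20140916, "$lt": 20141217 } } }
    ],
    { explain: true }   // return the plan chosen for the $match instead of the results
);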

 

We ran a count on the matching part and saw that some users have 50K matching documents while others have 500K.
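
The count looked roughly like this (same predicate as the $match stage; the user id is just an example value):

db.our_collection.find(
    { "our_id": "111111111",
      "dtKey": { "$gte": 20140916, "$lt": 20141217 } }
).count();
// some users match ~50K documents, others ~500K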

We also tried running only the matching part:

db.pub_stats.aggregate(
    [
        {
            "$match": {
                "$and": [
                    { "dtKey": { "$gte": 20140916 } },
                    { "dtKey": { "$lt": 20141217 } },
                    { "our_id": "112162107" }
                ]
            }
        }
    ]
);

 

These match-only queries seem to take approximately 300-500MB of memory, but after running the full pipeline, memory usage grows to about 3.5GB.
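
A sketch of one way to watch those memory numbers while the pipeline runs (this uses serverStatus(), which reports memory in MB; it is only one way to observe it, not necessarily where the figures above came from):

// Snapshot of mongod memory usage, in MB (resident = physical RAM currently in use).
var mem = db.serverStatus().mem;
printjson({ resident: mem.resident, virtual: mem.virtual, mapped: mem.mapped });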

 

Questions:

1. Why does the aggregation pipeline take so much memory?

2. How can we improve performance so that this runs in a reasonable time for an HTTP request?
