mongodb aggregation speed on huge dataset


Hafid Mermouri

Feb 16, 2017, 6:37:24 AM
to mongodb-user
Hi there,

I have a MongoDB database with more than 100 million documents. I want to run aggregations so I can produce statistics on the documents. My documents look like:



{
    "categ": "categ_4", 
    "code": 200, 
    "date": "01/01/2017", 
    "host": "www.myhost.com", 
    "hour": "19", 
    "http_ver": "HTTP/1.1", 
    "idate": 20170101, 
    "length": 21, 
    "protocol": "https", 
    "remote_ip": "111.22.333.44", 
    "resp_time": 0, 
    "time": "19:53:15", 
    "url": "my_url"
}

When aggregating, I run a query like this in the shell:

db.data.aggregate([{ "$group": { _id: "$code", total: { "$sum": 1 } } }, { "$sort": { _id: 1 } }])

The problem is that it takes a very long time to compute (several minutes). This is too slow. Is there any way to speed up this operation? I tried to create an index on the "code" field, but with no success:


db.data.createIndex({code:1})


What can I do to make this aggregation faster?

Thank you.


Rhys Campbell

Feb 16, 2017, 7:10:43 AM
to mongodb-user
If this is immutable log data, then why don't you pre-aggregate certain days/weeks and then perform the summing over those pre-aggregated documents? That would probably be your best bet application-wise.
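
Something along these lines (the "daily_stats" collection and field names here are just examples based on the document you posted, not anything you need to use verbatim):

```javascript
// At write time, alongside inserting the raw document, bump a
// per-day / per-code counter in a small summary collection.
db.daily_stats.update(
    { idate: 20170101, code: 200 },   // one summary doc per day + code
    { $inc: { total: 1 } },           // increment its counter
    { upsert: true }                  // create it on first hit
);

// Reporting then scans the small summary collection instead of
// the 100M+ raw documents:
db.daily_stats.aggregate([
    { "$group": { _id: "$code", total: { "$sum": "$total" } } },
    { "$sort": { _id: 1 } }
]);
```

The reporting query stays the same shape as yours; it just runs over a few thousand summary documents instead of the whole raw collection.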

How big is your data? Can it all fit in RAM? What version of MongoDB are you running? Are you using WildTiger?

Hafid Mermouri

Feb 16, 2017, 10:06:12 AM
to mongodb-user
Hi Rhys,
Thank you for your answer.
I'm just saving the documents and don't really want to perform pre-aggregation, because I thought MongoDB could do this for me... If I have no other choice, then yes, maybe I'll reconsider my approach.
For your questions:
My data is about 10 GB in size. It can't all fit in RAM, and I don't want to rely on RAM for this because it won't solve my problem, as my data grows every day. I'm using the latest version of MongoDB (3.0.2). I don't know what WildTiger is; I'll google it :)

Rhys Campbell

Feb 17, 2017, 2:44:36 AM
to mongodb-user
* Sorry WiredTiger... this became the default engine in 3.2 


You should upgrade to a newer version and migrate your data to the WiredTiger engine. Generally you want all of your data (or at least the indexes) to fit into RAM, so increase it if needed.

Pre-aggregation is probably the best way to go. 

Hafid Mermouri

Feb 18, 2017, 3:07:11 AM
to mongodb-user
I'm now using version 3.2.5.
I have the same problem.

Rhys Campbell

Feb 18, 2017, 12:05:49 PM
to mongodb-user
Did you migrate to WiredTiger? It is a storage engine that uses compression. If your workload is disk-bound, you will benefit from it over MMAPv1. Read a bit about it to see if it applies to you: http://objectrocket.com/blog/company/mongodb-wiredtiger
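
You can check which engine you are actually running with (works on 3.0+):

```javascript
// serverStatus() reports the active storage engine under "storageEngine"
db.serverStatus().storageEngine
// e.g. { "name" : "wiredTiger", ... } or { "name" : "mmapv1", ... }
```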

Do a...

db.stats();

...so we can see a bit of info about the type of data you have. How much RAM do you have?

If your data is continually growing you really should look at pre-aggregation. Even if you solve it another way you will eventually hit the same problem.
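
If you'd rather batch it than count at write time, a periodic rollup with $out is another option. Rough sketch only; the target collection name is just an example, and note that $out replaces that collection on each run:

```javascript
// Nightly job: roll up one day's raw documents into a summary collection.
// With an index on "idate", the $match stage only touches that day's docs.
db.data.aggregate([
    { "$match": { idate: 20170101 } },
    { "$group": { _id: "$code", total: { "$sum": 1 } } },
    { "$out": "code_counts_20170101" }
]);
```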