Best way to remove 10 million documents every day


inte...@googlemail.com

Oct 7, 2015, 10:14:01 AM
to mongodb-user
Hi,
we use MongoDB to store some cookie information for every request in a collection. Up to now we have created a single collection per customer per day. Every night a cronjob simply drops the collections that are older than the cookie lifetime.

For evaluation purposes this is becoming a nightmare. The best approach would be to store everything in one collection and remove the documents that are older than the cookie lifetime. We have approx. 10 million hits per day and keep those documents for about 30 days, so in total the collection would hold 300 million documents.
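Roughly, the cleanup I have in mind would look like the sketch below (a minimal pymongo example; the connection string, database/collection names and the "createdAt" field are just placeholders, not our real schema):

# Minimal sketch of the single-collection cleanup idea (placeholder names).
from datetime import datetime, timedelta, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # adjust to the cluster
hits = client["tracking"]["hits"]                   # placeholder db/collection

cutoff = datetime.now(timezone.utc) - timedelta(days=30)   # cookie lifetime
result = hits.delete_many({"createdAt": {"$lt": cutoff}})
print("removed", result.deleted_count, "expired documents")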

We tried this with a 2.x version of MongoDB and noticed that the index was rebuilt every time a single document was removed, and this breaks our cluster.

What would be the best way to remove those old documents? 

Kind regards
Marcus

Chris De Bruyne

Oct 7, 2015, 11:40:11 AM
to mongodb-user
Have you investigated whether a Time To Live (TTL) index is something that could help? Then you wouldn't need those pesky cronjobs. Also, which version are you using? Rebuilding the whole index for one removal sounds strange; how did you come to that conclusion?
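A minimal sketch of what I mean, assuming pymongo and a "createdAt" BSON date field on each document (the names are just placeholders, adjust to your schema):

# TTL index: MongoDB's background monitor removes documents once
# createdAt is older than expireAfterSeconds.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
hits = client["tracking"]["hits"]   # placeholder db/collection names

hits.create_index("createdAt", expireAfterSeconds=30 * 24 * 60 * 60)  # 30 days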

Kind regards
Chris

inte...@googlemail.com

Oct 8, 2015, 6:11:42 AM
to mongodb-user
Hi Chris,

A TTL index sounds interesting. My only concern is about
"The background task that removes expired documents runs every 60 seconds."
There are some peaks with heavy write load, and I'm not sure whether this background task influences the performance of the cluster. Do you have any experience with this?
Furthermore, I will check the docs and search for some use cases.

"Also, which version are you using because rebuilding the whole index for one removal sounds strange and how did you come to this conclusion ?"
We started using MongoDB about two years ago; I guess it was version 2.2 or 2.4, I'm not sure. We tried several strategies for storing the request information for the lifetime period and noticed that when we deleted a bunch of documents, CPU load rose and the delete operation itself took forever. I'm not sure whether there were other performance indicators within MMS.

Yesterday we switched from 2.4 to 3.0.6, which got me thinking about the huge number of collections and the awkward handling of creating and dropping day-based collections.

Kind regards
Marcus

Mike Templeman

Oct 8, 2015, 1:25:23 PM
to mongodb-user
Marcus,

TTL with WiredTiger is the way to go for you because of the document-level locking. In the 2.X world, large deletes would bring the db servers down. Even using TTL to spread out the deletes, we could see the locking occur every minute.

Even with 3.X and WT, you might try arranging your TTL times so that the heavy delete load is time-shifted to a lighter insertion period. For example, our heavy load is from ~5 AM to ~4 PM, so we add 12 hours to the TTL time. This is an even better tactic with 2.X.
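As a rough sketch (pymongo, placeholder names again), shifting the deletes just means adding the offset to expireAfterSeconds:

# Keep documents 30 days, plus a 12-hour offset so the TTL deletes for
# documents written during our peak window land in a quieter period.
from pymongo import MongoClient

hits = MongoClient("mongodb://localhost:27017")["tracking"]["hits"]

THIRTY_DAYS = 30 * 24 * 60 * 60
OFFSET = 12 * 60 * 60
hits.create_index("createdAt", expireAfterSeconds=THIRTY_DAYS + OFFSET)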

inte...@googlemail.com

Nov 26, 2015, 4:34:08 AM
to mongodb-user
Hi Chris, Mike

We now use a TTL index on every collection where we want to keep documents for just 30 days, and it works like a charm. Great feature, many thanks for your help ;)

Regards Marcus

inte...@googlemail.com

Nov 26, 2015, 4:36:40 AM
to mongodb-user
Just one question: how can I mark this post as solved? Just change the title?

Stephen Steneker

Nov 26, 2015, 8:28:20 PM
to mongodb-user
On Thursday, 26 November 2015 20:36:40 UTC+11, interose wrote:
Just one question: how can I mark this post as solved? Just change the title?

Hi Marcus,

There is no need to mark posts as solved in the current mongodb-user group configuration, but confirming you found an approach that worked (as per your earlier comment) might be helpful for others :).

Thanks,
Stephen 