Removing documents frequently

sumit

Feb 28, 2011, 11:43:39 AM
to mongodb-user
Our app stores documents in mongo with an expiry stamp 20 mins from
now. In the background we run a task every 10-15 mins that queries for
documents past their expiry time and deletes them. This works fine for
smaller collections, but in some collections we store 500 documents per
second (which means ~600K docs come up for expiry every 20 mins). The
remove task sometimes has 100K documents to delete, and when it does,
mongo becomes unresponsive, presumably due to the heavy locking and the
'reindexing' that has to be done.
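
In mongo shell terms the setup looks roughly like this (the collection
and field names are simplified/illustrative):

    // simplified sketch; "events" and "expiresAt" are placeholder names
    db.events.insert({
        payload: "...",
        expiresAt: new Date(new Date().getTime() + 20 * 60 * 1000)  // 20 mins from now
    });

    // background task, currently run every 10-15 minutes
    db.events.remove({ expiresAt: { $lt: new Date() } });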

Any recommendations/best practices to follow?
- Can we do a batch remove, so we remove in chunks of 10K?
- Does Mongo have an expire option like memcached?

Thanks
sumit


Scott Hernandez

Feb 28, 2011, 11:50:41 AM
to mongod...@googlegroups.com
One thing some people do is to create collections by time period and
just drop old ones. It is much more efficient to drop a collection and
all its indexes in one fell swoop than to delete documents.

Using a scheme like this requires changing your client code to encode
the timestamp/number in the collection name, and queries may need to
hit more than one collection depending on the time period they cover.
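
A rough shell sketch of the idea, assuming hourly buckets (the naming
scheme and the retention window are just examples):

    // hourly bucket names like events_2011022811
    function bucketName(d) {
        function pad(n) { return n < 10 ? "0" + n : "" + n; }
        return "events_" + d.getFullYear() + pad(d.getMonth() + 1) +
               pad(d.getDate()) + pad(d.getHours());
    }

    // writes always go to the current bucket
    db.getCollection(bucketName(new Date())).insert({ payload: "..." });

    // a query over the last 20 minutes may have to look at two buckets
    var now  = new Date();
    var prev = new Date(now.getTime() - 60 * 60 * 1000);
    var results = db.getCollection(bucketName(now)).find().toArray()
        .concat(db.getCollection(bucketName(prev)).find().toArray());

    // expiry becomes one cheap drop of a whole old bucket
    var old = new Date(now.getTime() - 2 * 60 * 60 * 1000);
    db.getCollection(bucketName(old)).drop();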

There is a capped collection feature on the boards based on time (TTL)
which might work for you, but it isn't implemented yet.
http://jira.mongodb.org/browse/SERVER-211

Keith Branton

Feb 28, 2011, 12:25:33 PM
to mongod...@googlegroups.com
Hi Sumit,

I presume you are using a version of mongo>1.3 or are not using $atomic:true in your remove call.

As Scott mentioned http://jira.mongodb.org/browse/SERVER-211 is probably your best solution.

In the interim, before restructuring your data, perhaps you might try simply running your delete job much more frequently, so it is not deleting so much at a time? That could ensure that no more than 10K rows are removed at a time if you get the frequency right (sounds like about every 20s).

Do you have an index on the expiry time? It would probably help the more frequent remove query run faster on bigger collections.
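
For example (assuming the expiry field is called expiresAt):

    // index the expiry field so the periodic remove is an indexed range scan
    db.events.ensureIndex({ expiresAt: 1 });
    db.events.remove({ expiresAt: { $lt: new Date() } });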

The only other way I can think of to limit the number of deletes (since it doesn't look like the remove function accepts a limit) is to do some client-side processing: query with a limit of 10K to fetch the keys, and use those in a remove with the $in operator. I think the more frequent remove job will probably work better than that, though.
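
For completeness, a rough sketch of that client-side approach; I
haven't tried it at your volumes, and the collection/field names are
illustrative:

    // fetch up to 10K expired _ids first, then remove exactly those
    var ids = db.events.find({ expiresAt: { $lt: new Date() } }, { _id: 1 })
                       .limit(10000).toArray()
                       .map(function (doc) { return doc._id; });
    if (ids.length > 0) {
        db.events.remove({ _id: { $in: ids } });
    }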

Regards,

Keith.

sumit

Feb 28, 2011, 1:53:28 PM
to mongodb-user
Thanks Scott and Keith.

I replied to Scott's post earlier but for some reason it didn't show up
(hope it doesn't show up as a duplicate).

Definitely looking forward to the capped collection feature. Till then:

- We are currently doing the frequent deletes to limit the volume, but
we suspect that the 're-indexing' plus frequent deletes will cause even
more locking, so we are looking into other options.
- I like the idea of multiple collections per time period. The only
open question is the cost of querying multiple collections. Since mongo
does db-level locking, is it better to have multiple dbs, or do they
come with an overhead of their own?

Valery

Mar 1, 2011, 3:17:45 PM
to mongodb-user
> That could ensure that no more than 10K rows are removed
> at a time if you get the frequency right (sounds like about 20s)

In my case, about 200,000 rows were deleted in 1 hour from a
collection containing about 2 million rows. The lock percentage caused
by the removal stayed near 100% the whole time.

I've filed a ticket:
http://jira.mongodb.org/browse/SERVER-2649

All are welcome to vote for it! :o)

regards
Valery

Hongli Lai

Apr 10, 2011, 4:40:27 AM
to mongodb-user
MongoDB doesn't have a timed expire option. What we do in production
is removing objects in batches of 150 with find() and remove({ _id:
{ $in: [list of object ids] } }). This find()-remove() is repeated
until find() returns the empty list. We've found that this starves
other readers less than a single big remove(). It also gives the
additional benefit of being able to print removal progress.
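
In shell terms it is roughly the following (our real code lives in the
driver, and the collection and expiry field names here are made up):

    // keep removing small batches until no expired documents are left
    var total = 0;
    while (true) {
        var ids = db.events.find({ expiresAt: { $lt: new Date() } }, { _id: 1 })
                           .limit(150).toArray()
                           .map(function (doc) { return doc._id; });
        if (ids.length == 0) break;
        db.events.remove({ _id: { $in: ids } });
        total += ids.length;
        print("removed " + total + " expired documents so far");
    }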

Hongli Lai

Apr 10, 2011, 4:46:03 AM
to mongodb-user
On Apr 10, 10:40 am, Hongli Lai <hon...@phusion.nl> wrote:
> MongoDB doesn't have a timed expire option. What we do in production
> is removing objects in batches of 150 with find() and remove({ _id:
> { $in: [list of object ids] } }). This find()-remove() is repeated
> until find() returns the empty list. We've found that this starves
> other readers less than a single big remove(). It also gives the
> additional benefit of being able to print removal progress.

A few more things I'd like to note. Our production system is running a
big removal right now but it's still very, very responsive. My guess
is that the find() operates outside of the write lock so it doesn't
starve other readers, with the additional benefit that it pages in all
the objects that are going to be removed, making the subsequent
remove() operation very fast and not holding the write lock for very
long.