remove large amount of data quickly with query


scott

Oct 21, 2010, 8:03:39 PM
to mongodb-user
Hey all,

We have a collection of around 3 million documents, and I need to trim
it down to around 750K by querying on a date: any documents older than
that date will be removed.

The collection has around 12 indexes, so deleting documents definitely
has a cost. We could schedule downtime, but I'd like to limit it to
about an hour.

Would the following work, and would it be fast? Or does anyone have other ideas?

1 - remove all indexes
2 - run the query for documents older than the date and remove them
3 - rebuild the indexes
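In mongo shell terms (the collection and field names here are just placeholders -- adjust to your schema), the plan above would look roughly like:

```javascript
// Hypothetical collection "events" with a "date" field -- names are made up.
var cutoff = new Date("2010-01-01");

db.events.dropIndexes();                  // 1 - drop all secondary indexes (keeps _id)
db.events.remove({date: {$lt: cutoff}});  // 2 - delete everything older than the cutoff
db.events.ensureIndex({date: 1});         // 3 - rebuild each index you had before
// ...repeat ensureIndex() for the other ~11 indexes
```

This has to run against a live mongod, so treat it as a sketch, not a tested script.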

Your thoughts would be much appreciated

Thanks
Scott

Scott Hernandez

Oct 21, 2010, 8:06:55 PM
to mongod...@googlegroups.com
What about copying the documents you want to keep into a new
collection, indexing that collection, dropping the old one, and
renaming the new collection to the old name? It will take more space,
but will probably be much faster. If you have time, you could then
repair the database to compact it and reclaim space.
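A rough sketch of that approach in the shell (collection and field names are placeholders, and the copy assumes the old collection isn't being written to while it runs):

```javascript
var cutoff = new Date("2010-01-01");

// Copy only the documents worth keeping into a new collection.
db.events.find({date: {$gte: cutoff}}).forEach(function (doc) {
    db.events_new.insert(doc);
});

// Build the indexes on the new collection before switching over.
db.events_new.ensureIndex({date: 1});
// ...plus the rest of your indexes

// Swap: drop the old collection, then rename the new one into its place.
db.events.drop();
db.events_new.renameCollection("events");
```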


Eliot Horowitz

Oct 21, 2010, 9:23:26 PM
to mongod...@googlegroups.com
Doing the delete in 1.6.3 shouldn't block everything, so if you just
do that during a low-volume period, it's likely okay. You can test by
deleting a very small portion first.

juanigna...@gmail.com

Oct 22, 2010, 7:46:26 AM
to mongod...@googlegroups.com
I like Scott Hernandez's approach, since it gives you the ability to build the new collection during regular uptime and switch collections during the downtime (the switch takes less than a minute).


Markus Gattol

Oct 22, 2010, 7:50:28 AM
to mongodb-user
You may not even need downtime, as renaming collections is atomic:
http://www.markus-gattol.name/ws/mongodb.html#rename_a_collection
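For reference, the rename itself is a single command (shown here with placeholder names; it goes through the admin database):

```javascript
// Atomically rename "events_new" to "events" within the same database.
db.adminCommand({
    renameCollection: "mydb.events_new",
    to: "mydb.events"
});
```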

scott

Oct 22, 2010, 10:13:42 AM
to mongodb-user
I guess I should have mentioned that the collection is continually
updated, so it really is a downtime situation; I'm just trying to
minimize the downtime.

So, would copying the records I want into a new collection and then
swapping collections be faster/better than removing indexes, deleting,
and then reindexing?

Scott

Scott Hernandez

Oct 22, 2010, 12:04:15 PM
to mongod...@googlegroups.com
As Eliot mentioned, you can do a trickle delete (deleting in batches,
slowly), but that only works if you don't need to delete everything at
once. Will your app work fine if some of the data is deleted but not
all of it for a while (hours, or a day or two -- it depends on the load
you can put on the server; the slower the better, so you don't affect
users)?
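A trickle delete could be sketched like this (the batch size and sleep interval are made-up numbers to tune against your own load):

```javascript
var cutoff = new Date("2010-01-01");

// Delete in small batches, pausing between them so normal traffic isn't starved.
while (true) {
    var ids = db.events.find({date: {$lt: cutoff}}, {_id: 1})
                       .limit(500)
                       .toArray()
                       .map(function (doc) { return doc._id; });
    if (ids.length === 0) break;   // nothing older than the cutoff remains
    db.events.remove({_id: {$in: ids}});
    sleep(5000);                   // mongo shell helper: pause 5s between batches
}
```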

As for which is faster -- deleting from the existing collection (with
or without indexes) versus copying the data you want into a new one
and then renaming -- it depends on how much data you have. I have found
that when I want to delete a large percentage of the data (60%+), the
latter works well.

If your data is time-based, you could even start copying the data you
want to keep, by change/insert time, right now; then you might only
need a few seconds to switch to the new collection if you go that
route. The trickle delete is probably the easiest, but it has the
potential to hose the server if you do it too quickly.
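One way to sketch that incremental copy, assuming documents carry an insert timestamp (the "created" field name is hypothetical): keep re-copying anything newer than the last document already copied, and do the final pass plus the rename during the short downtime.

```javascript
var cutoff = new Date("2010-01-01");
var lastCopied = cutoff;

// Catch-up loop: each pass copies only documents inserted since the last pass,
// so successive passes get shorter as you approach "caught up".
var copied;
do {
    copied = 0;
    db.events.find({created: {$gt: lastCopied}}).sort({created: 1}).forEach(function (doc) {
        db.events_new.insert(doc);
        lastCopied = doc.created;
        copied += 1;
    });
} while (copied > 0);

// Once a pass copies (almost) nothing, take the brief downtime, run one final
// pass, and rename events_new into place.
```

Note this loop only converges if the copy outpaces incoming writes; under heavy write load the final pass still belongs inside the downtime window.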

I think this is going to depend on what behavior you want and how long
you want it to take.
