Efficiently clearing a big collection

Yulias Stolin

Apr 30, 2012, 11:17:06 AM
to mongodb-user
I have a big collection with millions of entries. I want to efficiently
remove all of its data, but without dropping the collection.
I'm working with sharding, and if I drop the collection it is
automatically dropped from the sharding configuration.

What is the best way to do that?

db.some_coll.remove({})
takes a lot of time!!!

A. Jesse Jiryu Davis

Apr 30, 2012, 3:43:14 PM
to mongod...@googlegroups.com
Yulias, you should just drop the collection to remove all its data or, even better, keep short-lived data in separate MongoDB databases so you can drop the whole database when the data is no longer needed. (Dropping a database returns storage to the OS, unlike dropping a collection which only frees parts of the database files for Mongo to reuse.) You can write a script to recreate the indexes and sharding configuration on the next version of the collection.

Yulias Stolin

Apr 30, 2012, 4:19:28 PM
to mongodb-user

How can I write and store such a script, so that each time I call it, it
will drop the collection, create the indexes, and shard the collection again?

A. Jesse Jiryu Davis

May 1, 2012, 8:45:15 PM
to mongod...@googlegroups.com
If you're developing in Python, it'd go something like:

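import pymongo

# connect through a mongos router (MongoClient supersedes pymongo.Connection)
client = pymongo.MongoClient()

# substitute your own db and collection names
db = client.database_name
coll = db.collection_name
coll.drop()
# recreate whatever indexes you need ...
coll.create_index([('key', 1)])
# shardcollection is an admin command and needs the full "db.collection" name
client.admin.command("shardcollection", "database_name.collection_name",
                     key={'my_shard_key': 1})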
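
To get something you can call each time the data needs clearing, one option is to wrap those steps in a function. This is only a minimal sketch and has not been run against a real cluster: the name reset_sharded_collection and its parameters are made up for illustration, and it assumes client behaves like a pymongo.MongoClient connected to a mongos.

```python
def reset_sharded_collection(client, db_name, coll_name, shard_key, indexes=()):
    """Drop a collection, then recreate its indexes and sharding setup.

    Hypothetical helper: `client` is assumed to behave like a
    pymongo.MongoClient connected to a mongos router.
    """
    coll = client[db_name][coll_name]
    coll.drop()  # removes all documents and indexes in one step
    for keys in indexes:
        coll.create_index(keys)  # keys look like [('key', 1)]
    namespace = "%s.%s" % (db_name, coll_name)
    # shardCollection is run on the admin database with the full namespace
    return client.admin.command("shardCollection", namespace, key=shard_key)
```

Saved in a module of your own, it could then be called as reset_sharded_collection(pymongo.MongoClient(), 'database_name', 'collection_name', {'my_shard_key': 1}, indexes=[[('key', 1)]]).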