Compact databases (preferably on the fly) and properly delete data

Oleksandr D.

Oct 23, 2012, 4:24:56 AM
to mongod...@googlegroups.com
Dear Community!

I used db.foo.drop() to remove an unneeded collection.
But I noticed that the output of show dbs is unchanged, so the disk space is still in use.
So I have 2 questions.

1. How do I "properly remove" unneeded stuff (collections, databases) so that the related data is fully deleted from MongoDB?

2. How can I now compact my database to get rid of the unused space left by deleted collections? If it can be done on the fly, I will be happy :)

Thanks in advance for your replies!

Stephen Steneker

Oct 23, 2012, 8:21:33 AM
to mongod...@googlegroups.com

Hi Oleksandr,

When you delete collections from MongoDB, this does not reduce the physical space preallocated in the database files.  MongoDB will reuse the space for new documents or collections added in that database.

You can run db.repairDatabase() to reduce the size of the database files:
  http://www.mongodb.org/display/DOCS/Excessive+Disk+Space#ExcessiveDiskSpace-RecoveringDeletedSpace
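The behaviour described above can be observed from the shell. This is a minimal sketch, assuming an MMAPv1-era mongo shell where db.stats() reports dataSize and fileSize in bytes; the helper name dropAndCompare is hypothetical. After a drop, dataSize should fall while fileSize (what show dbs reflects) stays the same until a repair.

```javascript
// Hypothetical helper: drop a collection and report how the database
// statistics changed. Assumes db.stats() exposes dataSize and fileSize
// (MMAPv1-era output, values in bytes when no scale argument is given).
function dropAndCompare(db, collectionName) {
  var before = db.stats();
  var dropped = db.getCollection(collectionName).drop();
  var after = db.stats();
  return {
    dropped: dropped,
    // Bytes of document data that disappeared with the collection.
    dataSizeFreed: before.dataSize - after.dataSize,
    // Bytes by which the preallocated data files shrank (expected: 0).
    fileSizeFreed: before.fileSize - after.fileSize
  };
}

// Usage in the shell: printjson(dropAndCompare(db, "foo"));
```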

Cheers,
Stephen

Oleksandr D.

Oct 24, 2012, 7:24:32 AM
to mongod...@googlegroups.com
Hi Stephen,
Many thanks for your reply!

So my fears are confirmed: I should use either db.runCommand({compact:'collectionname'}) for a collection or db.repairDatabase() for a database.

If I understand correctly, the following procedure frees up unneeded space when deleting a collection:

1. Remove all collection items by issuing:
db.things.remove({});
2. Compact this collection
db.runCommand({compact:'things'})
3. Remove collection
db.things.drop()

Does this procedure accomplish my original goal of freeing up space and deleting a single collection completely? :)

On Tuesday, October 23, 2012 15:21:33 UTC+3, Stephen Steneker wrote:

Stephen Steneker

Oct 24, 2012, 8:13:42 AM
to mongod...@googlegroups.com

> So my fears are confirmed: I should use either db.runCommand({compact:'collectionname'}) for a collection or db.repairDatabase() for a database.

Hi Oleksandr,

If you want to reduce actual disk space usage, you need to repair the database, which will recreate the data files and indexes.

In general, this is not something you should need to run frequently.  You may want to run this if you have removed large amounts of data and are concerned about excessive disk space usage.

The compact command will rewrite and defragment a single collection, but does not free up any physical disk space:
   http://docs.mongodb.org/manual/reference/command/compact/

 
> If I understand correctly, the following procedure frees up unneeded space when deleting a collection:

You should just go straight to step 3 (drop the collection). This will be faster than remove(), because remove() has the overhead of updating indexes; db.things.drop() deletes the collection and its indexes without unnecessary updates.
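Following that advice (skip remove() and compact, and drop directly), a minimal sketch for dropping several unneeded collections at once. The helper name dropCollections and the collection names in the usage line are hypothetical; it relies on the mongo shell's drop() returning true when the collection existed and false when it did not.

```javascript
// Hypothetical helper: drop each named collection directly, with no
// remove({}) or compact beforehand, and report what happened.
function dropCollections(db, names) {
  var result = { dropped: [], missing: [] };
  names.forEach(function (name) {
    // drop() returns true if the collection existed, false otherwise.
    if (db.getCollection(name).drop()) {
      result.dropped.push(name);
    } else {
      result.missing.push(name);
    }
  });
  return result;
}

// Usage in the shell: printjson(dropCollections(db, ["things", "oldLogs"]));
```

To remove a whole database instead, db.dropDatabase() drops it along with its data files.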


> Does this procedure accomplish my original goal of freeing up space and deleting a single collection completely? :)

To hopefully be clearer on the "free space" expectations: 
 - dropping a collection (or removing documents) will create "free space" within the preallocated database files that can be re-used by MongoDB
 - you need to run repair (or drop a database) to reduce actual disk space usage
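The difference between "free space" inside the files and actual disk usage can be illustrated with a toy model. This is a sketch only, not MongoDB's real allocator: sizes are in MB, and the file sequence (64, 128, 256, ... capped at 2048) is an assumption loosely modelled on MMAPv1-style preallocation. Dropping leaves the files alone; repair rebuilds them to fit the remaining data.

```javascript
// Toy model only: not MongoDB's allocator. Sizes in MB; file sizes double
// from 64 up to an assumed cap of 2048.
var FIRST_FILE_MB = 64;
var MAX_FILE_MB = 2048;

function nextFileSize(files) {
  if (files.length === 0) return FIRST_FILE_MB;
  return Math.min(files[files.length - 1] * 2, MAX_FILE_MB);
}

function ToyDatabase() {
  this.files = [];       // sizes of preallocated data files (MB)
  this.collections = {}; // collection name -> MB of data
}

ToyDatabase.prototype.usedMB = function () {
  var total = 0;
  for (var name in this.collections) total += this.collections[name];
  return total;
};

ToyDatabase.prototype.onDiskMB = function () {
  return this.files.reduce(function (a, b) { return a + b; }, 0);
};

ToyDatabase.prototype.freeMB = function () {
  return this.onDiskMB() - this.usedMB();
};

// New data fills free space first; a new file is allocated only when needed.
ToyDatabase.prototype.insert = function (name, mb) {
  this.collections[name] = (this.collections[name] || 0) + mb;
  while (this.freeMB() < 0) this.files.push(nextFileSize(this.files));
};

// Dropping releases space inside the files; the files keep their size.
ToyDatabase.prototype.drop = function (name) {
  delete this.collections[name];
};

// Repair rewrites the remaining data into fresh files sized to fit it.
ToyDatabase.prototype.repair = function () {
  this.files = [];
  while (this.freeMB() < 0) this.files.push(nextFileSize(this.files));
};
```

For example, inserting 300 MB allocates files of 64, 128 and 256 MB (448 MB on disk). Dropping that collection leaves 448 MB on disk, but later inserts reuse the space; only repair() shrinks the files.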

Do you have a specific concern on re-using free space for your use case?  For example, are you dropping and recreating collections frequently?

MongoDB 2.2 includes a new usePowerOf2Sizes collection option which may be of interest. It reduces the potential for fragmentation but will use somewhat more space per document, since document storage is allocated in powers of 2: http://docs.mongodb.org/manual/reference/command/collMod/
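The space/fragmentation trade-off can be shown with a little arithmetic. This is a simplified sketch assuming each record is rounded up to the next power of two; real MongoDB also accounts for record headers and other details ignored here. The function names are hypothetical; the collMod call in the comment is the documented way to enable the option in 2.2.

```javascript
// Simplified model of usePowerOf2Sizes: a record is rounded up to the next
// power of two. Record headers and other real-world details are ignored.
function powerOf2RecordSize(docBytes) {
  var size = 1;
  while (size < docBytes) size *= 2;
  return size;
}

// Allocated bytes divided by document bytes (1 means no padding).
function overheadRatio(docBytes) {
  return powerOf2RecordSize(docBytes) / docBytes;
}

// Enabling the option on an existing collection in MongoDB 2.2 (shell):
//   db.runCommand({ collMod: "things", usePowerOf2Sizes: true })
```

Under this rounding a 600-byte document gets a 1024-byte record (about 1.7x its size), while a 1024-byte document fits exactly. Because freed records come in a small set of standard sizes, a new or grown document is more likely to fit a slot left by a deleted one, at the cost of the extra padding.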

Cheers,
Stephen

Oleksandr D.

Oct 24, 2012, 10:49:30 AM
to mongod...@googlegroups.com
Thanks, Stephen!

It is quite clear to me. 

The usePowerOf2Sizes option looks interesting; I think we should upgrade to MongoDB v2.2 and try it.

But now, after deleting several large collections, I still have ~120GB of used space. Any idea how long db.repairDatabase() may take to complete?

Env. details:
MongoDB v2.0.6
CPU - 16 cores
RAM - 16 GB
Filesystem - ext4 with noatime and readahead 32.

On Wednesday, October 24, 2012 15:13:42 UTC+3, Stephen Steneker wrote:

Stephen Steneker

Nov 3, 2012, 8:29:25 AM
to mongod...@googlegroups.com

Hi Oleksandr,

There is no specific guidance on how long a repair takes; it really depends on your data size and hardware configuration.

As the repair will be recreating your database files, you do need enough free space for a full copy of the data.
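A rough pre-flight check for that free-space requirement might look like the sketch below. It is an assumption-laden illustration: the helper name repairSpaceCheck is hypothetical, dataSize + indexSize is treated as a rough estimate of what the rewritten files must hold, and the current fileSize as a conservative upper bound. Free space on the dbpath volume has to be measured separately (for example with df), since the shell does not report it here.

```javascript
// Hypothetical pre-flight check before db.repairDatabase().
// stats: output of db.stats() (bytes); freeDiskBytes: measured separately.
function repairSpaceCheck(stats, freeDiskBytes) {
  // Rough estimate: live data plus indexes that the repair will rewrite.
  var estimate = stats.dataSize + stats.indexSize;
  // Conservative bound: the current size of the preallocated data files.
  var conservative = stats.fileSize;
  return {
    likelyEnough: freeDiskBytes >= estimate,
    safelyEnough: freeDiskBytes >= conservative
  };
}

// Usage in the shell, with free space taken from df on the dbpath:
//   printjson(repairSpaceCheck(db.stats(), 150 * 1024 * 1024 * 1024));
```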

You would have to work out a realistic estimate by testing in your own environment.  Best practice to maximize server availability for production repair operations is to use replica sets and do a rolling repair (repair one secondary at a time, and eventually step down and repair the primary).
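The rolling-repair order described above can be sketched as a small planner. This is hypothetical: the function name rollingRepairPlan is invented, and the input mimics the name and stateStr fields of rs.status().members. It only produces the sequence of steps (secondaries one at a time, then step down and repair the primary); arbiters are skipped since they hold no data.

```javascript
// Hypothetical planner for a rolling repair. `members` mimics the
// name/stateStr fields of rs.status().members; arbiters are ignored.
function rollingRepairPlan(members) {
  var steps = [];
  // Repair secondaries first, one at a time.
  members.forEach(function (m) {
    if (m.stateStr === "SECONDARY") steps.push("repair " + m.name);
  });
  // Finally step down the primary (e.g. rs.stepDown()) and repair it.
  members.forEach(function (m) {
    if (m.stateStr === "PRIMARY") {
      steps.push("step down " + m.name);
      steps.push("repair " + m.name);
    }
  });
  return steps;
}

// Usage in the shell: printjson(rollingRepairPlan(rs.status().members));
```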

Cheers,
Stephen 

Oleksandr D.

Nov 15, 2012, 8:55:34 AM
to mongod...@googlegroups.com
I tried the repair a few times and it failed every time.
Once it was interrupted by an SSH timeout.
Isn't it a strange design for the repair utility not to show anything in its output?!
Later I found that all activity is written to the logs, but leaving the actual command output empty, with no ETA, percent done, etc., is not a good idea. Unfortunately, a lot of utilities behave this way: MySQL, and now MongoDB too :(
OK, I just gave up on this, since without a replica set a DB repair requires a long period of unavailability.

>  Best practice to maximize server availability for production repair operations is to use replica sets and do a rolling repair (repair one secondary at a time, and eventually step down and repair the primary).
I absolutely agree on that point, but without a replica set the repair operation does not seem achievable.



On Saturday, November 3, 2012 14:29:25 UTC+2, Stephen Steneker wrote: