Reclaiming Free Space


A. Jalil @AJ

Oct 10, 2015, 12:01:58 PM10/10/15
to mongodb-user
Hello,

I am trying to figure out why the database still takes up the same amount of space after I dropped close to 2,000 temp collections totalling about 200GB. Is this pre-allocated space, and what would be the best approach to reclaim it without downtime?

Database size before the collections were dropped:
mongos> show dbs
admin                (empty)
config               0.063GB
salestotal         457.730GB          <=  this is the database I am cleaning up
test                 (empty)


> Some of the collections I dropped (there were close to 2,000 temp collections deleted in total):
2248: Dropping  salestemp_560a6fbbe4b06698775c67a9.s  , count: 25374
2249: Dropping  salestemp_560a6fbbe4b06698775c67a7  , count: 30049
2205: Dropping  salestemp_55d856b3e4b053ea1cc23690  , count: 10120
2206: Dropping  salestemp_55d856b3e4b053ea1cc2368b  , count: 13673
2207: Dropping  salestemp_55d9a836e4b0a1f5e3869597  , count: 9890

Please note: when I initially counted the collections there were close to 5,000, and now there are only 800, so my drop operation did what it was supposed to do.

> I checked the database after the collections had been dropped, but I still see the same size: 457.730GB
mongos> show dbs
admin                (empty)
config               0.063GB
salestotal         457.730GB              <= still the same size - I was expecting about half of this
test                 (empty)

I'd appreciate your input. I am deploying this database to another server, and would like to clean it up before I copy the replica sets over.

Thank you !
@AJ

Stephen Steneker

Oct 10, 2015, 8:47:20 PM10/10/15
to mongodb-user

On Sunday, 11 October 2015 03:01:58 UTC+11, A. Jalil @AJ wrote:

I am trying to figure out why the database still takes up the same amount of space after I dropped close to 2,000 temp collections totalling about 200GB. Is this pre-allocated space, and what would be the best approach to reclaim it without downtime?

Hi AJ,

With the MMAP storage engine, space from deleted documents and collections remains preallocated and available for reuse. Unless you drop an entire database the storage will effectively remain at peak usage. You need to repair or rebuild the database to reclaim this space: http://docs.mongodb.org/manual/faq/storage/#how-do-i-reclaim-disk-space.

Repair requires downtime and available disk space to make a copy of the data to rebuild your database.
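For readers who do go the repair route, it can be run either offline via mongod or from the shell. A rough sketch (the dbpath and port here are illustrative, not taken from AJ's deployment):

```shell
# Offline repair: stop the mongod first, then rebuild the data files.
# Needs free disk space roughly equal to the current data set plus ~2GB.
mongod --dbpath /data/db --repair

# Or against a running standalone/primary (blocks other operations
# on the server while it runs):
mongo --port 27017 --eval 'db.getSiblingDB("salestotal").repairDatabase()'
```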

For a replica set deployment a better option is usually to re-sync the node: http://docs.mongodb.org/manual/tutorial/resync-replica-set-member/.

Re-syncing doesn’t require additional disk space, and allows the node to automatically rejoin the replica set once the node has completed initial sync.
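For anyone following along, the resync procedure in that tutorial boils down to something like the following per member (hostnames, paths, and replica set options are placeholders):

```shell
# 1. Cleanly shut down the secondary you want to re-sync
mongo --host rs0-member2.example.com --eval 'db.getSiblingDB("admin").shutdownServer()'

# 2. Remove the data files, keeping the dbpath directory itself
rm -rf /data/db/*

# 3. Restart with the usual replica set options; the member performs an
#    initial sync from another member and rejoins automatically
mongod --dbpath /data/db --replSet rs0
```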

Note: Disk usage behaviour is determined by the storage engine. For the MMAP storage engine (the default in MongoDB 3.0 and older), all data for a given database shares a common set of files (eg. the salestotal.* files in your example). The WiredTiger storage engine creates separate files for each index and collection, so dropping indexes or collections in WiredTiger will free up disk space without admin intervention.

Regards,
Stephen

A. Jalil @AJ

Oct 11, 2015, 2:19:02 AM10/11/15
to mongodb-user
Hi Stephen,

I was wondering: if I do a mongodump on the source DB and then a mongorestore on the destination DB, will that reclaim the space?

I also noticed mongo provides mongoexport / mongoimport - I was wondering what the difference is between mongodump & mongoexport?

Lastly, my source DB resides in /data/configdb but my destination database resides in /data/db - can you please check steps 1, 2, 3 below and advise if that is the proper way to dump the whole database and restore it into the /data/db folder?

> Here is what I am planning to run on each member of my Replica Sets:

1. mongodump -d salesdb -o /dump          (source db)
2. transfer the dump file [ salesdb ] to AWS & save on this folder  /data/db
3. mongorestore -d salesdb /data/db         (destination db)

Please advise.

Thank you !
@AJ

Stephen Steneker

Oct 11, 2015, 2:35:55 AM10/11/15
to mongodb-user

On Sunday, 11 October 2015 17:19:02 UTC+11, A. Jalil @AJ wrote:
I was wondering: if I do a mongodump on the source DB and then a mongorestore on the destination DB, will that reclaim the space?

Hi AJ,

If you are restoring into an empty destination DB you won’t have any extra preallocated storage from the collections you dropped earlier.

I also noticed mongo provides mongoexport / mongoimport - I was wondering what the difference is between mongodump & mongoexport?

mongoexport and mongoimport are tools for importing/exporting text data (JSON, CSV, TSV) to/from single collections.

For backup & restore purposes you should instead use mongodump and mongorestore, which work with binary data (in BSON format), will preserve original data types and index options, and can be used to dump data from all databases or a specific namespace.
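To make the distinction concrete, a hedged example (the collection name `orders` is made up for illustration):

```shell
# Binary backup of a whole database: BSON data plus index definitions
mongodump -d salesdb -o /tmp/dump

# Text export of a single collection: JSON by default, CSV/TSV via --type;
# data types are not guaranteed to survive a round-trip
mongoexport -d salesdb -c orders -o /tmp/orders.json
```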

Lastly, my source DB resides in /data/configdb but my destination database resides in /data/db - can you please check steps 1, 2, 3 below and advise if that is the proper way to dump the whole database and restore it into the /data/db folder?

Here is what I am planning to run on each member of my Replica Sets:

  1. mongodump -d salesdb -o /dump (source db)
  2. transfer the dump file [ salesdb ] to AWS & save in this folder /data/db
  3. mongorestore -d salesdb /data/db (destination db)

The path for mongorestore in step 3 should be to your “dump” directory, so I’d expect something like /tmp/dump rather than /data/db (which is MongoDB’s default dbpath). The dump files cannot be directly copied into your dbpath; they need to be imported via mongorestore.
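Putting that together, a corrected version of the three steps might look like this (the destination hostname is a placeholder):

```shell
# 1. On the source host: dump the database to a scratch directory
mongodump -d salesdb -o /tmp/dump

# 2. Copy the dump directory (not into the dbpath!) to the destination
scp -r /tmp/dump aws-destination:/tmp/dump

# 3. On the destination host: restore from the dump directory
mongorestore -d salesdb /tmp/dump/salesdb
```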

Regards,
Stephen

A. Jalil @AJ

Oct 11, 2015, 1:32:21 PM10/11/15
to mongodb-user
Thanks Stephen!

I was reading the doc http://docs.mongodb.org/master/tutorial/backup-small-sharded-cluster-with-mongodump/ which says:
"If you use mongodump without specifying a database or collection, mongodump will capture collection data and the cluster meta-data from the config servers."

In my situation, I do have a sharded cluster. I am planning to copy the config servers <dbpath> to the new server as well, so when I do <mongodump> for the replica set members, I believe I do need to specify the <database name>, since the meta-data is already copied when I copied the config servers to the new location, right?

Lastly, I also saw in the docs that I can perform the <mongodump> from the mongos level, e.g.: mongodump -d salesdb --host rs0.server.com --port 27017

I was wondering if I can run <mongodump> at the replica set server level as well, e.g.:
rs2.secondary> mongodump -d salesdb -o /dump


Thanks again!
@AJ



Stephen Steneker

Oct 11, 2015, 7:30:27 PM10/11/15
to mongodb-user

On Monday, 12 October 2015 04:32:21 UTC+11, A. Jalil @AJ wrote:

“If you use mongodump without specifying a database or collection, mongodump will capture collection data and the cluster meta-data from the config servers.”

Hi AJ,

If the database to backup is sharded, you will want to mongodump via the mongos. Otherwise, you can backup via mongod. In general I would refer to the tutorials in the MongoDB manual Backup and Recovery section as they should cover the caveats and supported approaches for different deployment types.

Note: if you mongodump a sharded database, you are effectively re-sharding it when you restore. See the Restore Data information in the tutorial you referenced.

In my situation, I do have a sharded cluster. I am planning to copy the config servers <dbpath> to the new server as well, so when I do <mongodump> for the replica set members, I believe I do need to specify the <database name>, since the meta-data is already copied when I copied the config servers to the new location, right?

I’m not certain what you’re doing with the config servers in this case. From previous threads I gather you’ve migrated all your servers to AWS, so I thought the goal of this question was to reclaim the disk space from a large amount of data deleted within a replica set. As mentioned earlier, the best option is normally to Resync each member of the replica set. You can do this as rolling maintenance: re-sync one secondary at a time until finally you step down the current primary and re-sync it. For more information see: Your Ultimate Guide to Rolling Upgrades.

To be clear: are you planning to copy the data into a new sharded cluster, or just performing maintenance on your existing sharded cluster?

Regards,
Stephen

A. Jalil @AJ

Oct 11, 2015, 8:31:02 PM10/11/15
to mongodb-user
Hi Stephen,

Sorry for the confusion. I am deploying 2 environments, Test & Prod. My Test environment was done successfully by simply copying the <dbpath> files from the 3 source config nodes to the 3 destination config nodes, plus copying the <dbpath> of each member of RS0 + RS1 from source to destination. So Test was done just using a <dbpath> filesystem copy, and I didn't use <mongodump> for the Test deployment. However, in my Prod system I had to delete over 3,000 temp collections in hopes of reducing space before I could start the Prod deployment, but because of the preallocated space issue, I thought I could use <mongodump> & <mongorestore> instead, in hopes of reclaiming space during the process.

So I was wondering if I can do the <dbpath> copy on each config server, but for the replica sets RS0 & RS1 use <mongodump> & <mongorestore> instead in order to reclaim space. I am not sure if this is the right approach. At the end of the day, my goal is to deploy RS0 & RS1 to AWS and reclaim free space at the same time.

So I would do something similar to this from mongos:
mongos> mongodump -d dbname --host rs0.member1.server.com --port 27017
mongos> mongodump -d dbname --host rs0.member2.server.com --port 27017
mongos> mongodump -d dbname --host rs0.member3.server.com --port 27017

Thank you.
@AJ


Stephen Steneker

Oct 12, 2015, 12:43:20 AM10/12/15
to mongodb-user

On Monday, 12 October 2015 11:31:02 UTC+11, A. Jalil @AJ wrote:

Sorry for the confusion. I am deploying 2 environments, Test & Prod

At the end of the day, my goal is to deploy RS0 & RS1 to AWS and reclaim free space at the same time..

Hi AJ,

Thanks for clarifying the environments :).

So I would do something similar to this from mongos:
mongos> mongodump -d dbname --host rs0.member1.server.com --port 27017

mongodump has to be run from the command line (not within the mongo shell), so the invocation would be more like:

mongodump -d dbname --host mongos --port 27017

However, if you are dumping and restoring to an existing replica set through mongos, this isn’t going to free up any preallocated space unless you drop the destination database before restoring.

My suggested approach to reclaim space and minimize downtime is re-syncing each member via rolling maintenance.

As a first step before maintenance like this, I would disable the balancer.

Then, assuming rs0.member1 is the current primary for rs0:

• resync rs0.member2
• when complete, resync rs0.member3
• finally, step down and resync rs0.member1

You only want to re-sync one member at a time to ensure there is a healthy quorum available in your replica set. Re-sync doesn’t require any additional disk space on each member.

You would then complete similar maintenance for the members of rs1, and finally enable the balancer when maintenance is complete.
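In shell terms, the balancer bracketing and the final step-down might look roughly like this (hostnames are illustrative placeholders):

```shell
# Disable the balancer before starting maintenance (run against a mongos)
mongo --host mongos.example.com --eval 'sh.stopBalancer()'

# ...rolling re-sync of rs0 and rs1 members, one at a time...

# Ask the current primary to step down before re-syncing it
mongo --host rs0-member1.example.com --eval 'rs.stepDown()'

# Re-enable the balancer once all members are done
mongo --host mongos.example.com --eval 'sh.startBalancer()'
```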

Regards,
Stephen

MoroccoIT

Oct 12, 2015, 1:10:41 AM10/12/15
to mongod...@googlegroups.com
You are correct, mongodump has to be run from the command line (not within the mongo shell). I only used the [ mongos> ] prompt to show that I would be doing mongodump at the mongos level, but that was a bad example :)

In any case, I will try the [ resync ] approach, and if all goes well, that would be great. This way I will follow the same dry-run I did on Test without changing any of the steps I already documented, and hopefully I get Prod working with no issues.

Thank you so much Stephen !! Now I have something to work with this week..