On Sunday, 11 October 2015 03:01:58 UTC+11, A. Jalil @AJ wrote:
I am trying to figure out why, after deleting close to 2,000 temp collections totalling about 200GB, the database still takes up the same space. I was wondering if this is pre-allocated space, and what the best approach would be to reclaim it without downtime?
Hi AJ,
With the MMAP storage engine, space from deleted documents and collections remains preallocated and available for reuse. Unless you drop an entire database the storage will effectively remain at peak usage. You need to repair or rebuild the database to reclaim this space: http://docs.mongodb.org/manual/faq/storage/#how-do-i-reclaim-disk-space.
Repair requires downtime and available disk space to make a copy of the data to rebuild your database.
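As a sketch of what a repair looks like for a standalone mongod (the dbpath below is the MongoDB default and is an assumption; adjust it for your deployment, and stop the mongod process first):

```shell
# Stop the mongod first, then run a repair pass over its data files.
# Repair rewrites the data files, so it needs free disk space roughly
# equal to the size of the current data set (plus some overhead).
mongod --dbpath /data/db --repair
```

For a running server you can instead repair one database from the mongo shell with db.repairDatabase(), with the same disk-space and downtime caveats.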
For a replica set deployment a better option is usually to re-sync the node: http://docs.mongodb.org/manual/tutorial/resync-replica-set-member/.
Re-syncing doesn’t require additional disk space, and allows the node to automatically rejoin the replica set once the node has completed initial sync.
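A minimal sketch of re-syncing one member, assuming a dbpath of /data/db and a replica set named rs0 (both assumptions; substitute your own values, and manage the process via your init system if you use one):

```shell
# On the member being re-synced (only one member at a time):

# 1. Shut the member down cleanly.
mongo admin --eval "db.shutdownServer()"

# 2. Remove the old data files so the node starts empty.
#    (dbpath is an assumption -- use your configured path)
rm -rf /data/db/*

# 3. Restart with the same replica set configuration.
#    Initial sync from another member starts automatically.
mongod --replSet rs0 --dbpath /data/db
```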
Note: Disk usage behaviour is determined by the storage engine. For the MMAP storage engine (the default in MongoDB 3.0 and older), all data for a given database shares a common set of files (e.g. the salestotal.* files in your example). The WiredTiger storage engine creates separate files for each index and collection, so dropping indexes or collections in WiredTiger will free up disk space without admin intervention.
Regards,
Stephen
On Sunday, 11 October 2015 17:19:02 UTC+11, A. Jalil @AJ wrote:
I was wondering: if I do mongodump on the source DB and then mongorestore on the destination DB, will that reclaim the space?
Hi AJ,
If you are restoring into an empty destination DB you won’t have any extra preallocated storage from the collections you dropped earlier.
I also noticed MongoDB provides mongoexport / mongoimport - I was wondering what the difference is between mongodump & mongoexport ?
mongoexport and mongoimport are tools for importing/exporting text data (JSON, CSV, TSV) to/from single collections.
For backup & restore purposes you should instead use mongodump and mongorestore, which work with binary data (in BSON format), will preserve original data types and index options, and can be used to dump data from all databases or a specific namespace.
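To make the contrast concrete, here is a hedged example of both tool pairs (salesdb is the database from this thread; the "orders" collection name is a hypothetical placeholder):

```shell
# Binary backup/restore (BSON): preserves data types and index definitions.
mongodump -d salesdb -o /tmp/dump
mongorestore -d salesdb /tmp/dump/salesdb

# Text export/import (JSON, one collection at a time): no index metadata,
# and some BSON types do not round-trip exactly through JSON.
mongoexport -d salesdb -c orders -o orders.json
mongoimport -d salesdb -c orders --file orders.json
```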
Lastly, my source DB resides on /data/configdb but my destination database resides on /data/db - can you please check steps 1 2 3 below and advise if that is the proper way to dump the whole database and restore it on the /data/db folder.
Here is what I am planning to run on each member of my Replica Sets:
1. mongodump -d salesdb -o /dump (source db)
2. transfer the dump folder [ salesdb ] to AWS & save it under /data/db
3. mongorestore -d salesdb /data/db (destination db)
The path for mongorestore in step 3 should be to your “dump” directory, so I’d expect something like /tmp/dump rather than /data/db (which is MongoDB’s default dbpath). The dump files cannot be directly copied into your dbpath; they need to be imported via mongorestore.
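A corrected sketch of the three steps (the /tmp/dump path and the "destination" hostname are placeholders, not values from your deployment):

```shell
# 1. Dump the database on the source server:
mongodump -d salesdb -o /tmp/dump

# 2. Copy the dump directory to the destination server.
#    Note: copy it somewhere outside the dbpath, not into /data/db.
scp -r /tmp/dump destination:/tmp/dump

# 3. On the destination, point mongorestore at the dump directory:
mongorestore -d salesdb /tmp/dump/salesdb
```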
Regards,
Stephen
Lastly, I have also seen in the docs that I can perform the mongodump at the mongos level, e.g.: mongodump -d salesdb --host rs0.server.com --port 27017
rs2.secondary> mongodump -d salesdb -o /dump
On Monday, 12 October 2015 04:32:21 UTC+11, A. Jalil @AJ wrote:
I was reading the Doc http://docs.mongodb.org/master/tutorial/backup-small-sharded-cluster-with-mongodump/ which says:
“ If you use mongodump without specifying a database or collection, mongodump will capture collection data and the cluster meta-data from the config servers.”
Hi AJ,
If the database to backup is sharded, you will want to mongodump via the mongos. Otherwise, you can backup via mongod. In general I would refer to the tutorials in the MongoDB manual Backup and Recovery section as they should cover the caveats and supported approaches for different deployment types.
Note: if you mongodump a sharded database, you are effectively re-sharding it when you restore. See the Restore Data information in the tutorial you referenced.
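For illustration, the difference between a whole-cluster dump and a single-database dump via mongos (the mongos hostname below is a placeholder):

```shell
# No database or collection specified: mongodump captures collection data
# plus the cluster metadata from the config servers.
mongodump --host mongos.example.com --port 27017 -o /tmp/dump

# Database specified: only that database's collection data is dumped,
# without the cluster metadata.
mongodump --host mongos.example.com --port 27017 -d salesdb -o /tmp/dump
```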
In my situation, I do have a sharded cluster database. I am planning to copy the config servers to the new server as well, so when I do the dump for the replica set members, I believe I do need to specify the database, since the meta-data is already copied when I copied the config servers to the new location, right?
I’m not certain what you’re doing with the config servers in this case. From previous threads I gather you’ve migrated all your servers to AWS, so I thought the goal of this question was to reclaim the disk space from a large amount of data deleted within a replica set. As mentioned earlier, the best option is normally to re-sync each member of the replica set. You can do this as rolling maintenance: re-sync one secondary at a time until finally you step down the current primary and re-sync it. For more information see: Your Ultimate Guide to Rolling Upgrades.
To be clear: are you planning to copy the data into a new sharded cluster, or just performing maintenance on your existing sharded cluster?
Regards,
Stephen
On Monday, 12 October 2015 11:31:02 UTC+11, A. Jalil @AJ wrote:
Sorry for the confusion. I am deploying 2 environments, Test & Prod.
…
At the end of the day, my goal is to deploy RS0 & RS1 to AWS and reclaim free space at the same time.
Hi AJ,
Thanks for clarifying the environments :).
So, I would do something similar to this from mongos, like so:
mongos> mongodump -d dbname --host rs0.member1.server.com --port 27017
mongodump has to be run from the command line (not within the mongo shell), so the invocation would be more like:
mongodump -d dbname --host mongos --port 27017
However, if you are dumping and restoring to an existing replica set through mongos, this isn’t going to free up any preallocated space unless you drop the destination database before restoring.
My suggested approach to reclaim space and minimize downtime is re-syncing each member via rolling maintenance.
As a first step before maintenance like this, I would disable the balancer.
Then, assuming rs0.member1 is the current primary for rs0:
You only want to re-sync one member at a time to ensure there is a healthy quorum available in your replica set. Re-sync doesn’t require any additional disk space on each member.
You would then complete similar maintenance for the members of rs1, and finally re-enable the balancer when maintenance is complete.
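The overall flow could be sketched like this (the mongos hostname is a placeholder, and the per-member re-sync details depend on your dbpath and init system):

```shell
# Disable the balancer before starting maintenance (run against a mongos):
mongo mongos.example.com:27017 --eval "sh.setBalancerState(false)"

# For each secondary of rs0 in turn: shut it down, clear its data files,
# and restart it so it performs an initial sync. Wait until the member
# reports SECONDARY state before moving on to the next one.
# Finally, run rs.stepDown() on the primary and re-sync it last.
# Repeat the same rolling procedure for the members of rs1.

# Re-enable the balancer once all members of rs0 and rs1 are done:
mongo mongos.example.com:27017 --eval "sh.setBalancerState(true)"
```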
Regards,
Stephen
Understood that mongodump has to be run from the command line (not within the mongo shell). I only used the [ mongos> ] prompt to show you that I will be doing mongodump at the mongos level, but that was a bad example :)