Manually removing empty chunks?


Derek Chen-Becker

Oct 8, 2012, 1:03:57 PM
to mongod...@googlegroups.com
I've looked at https://jira.mongodb.org/browse/SERVER-2487 and we're in a similar situation. Part of our sharding key is a timestamp, which works fine for us because the rest of the key makes writes balance out across the cluster. However, we're at a point now where we need to archive off older data, and our shards are starting to get out of balance in terms of disk usage because while chunk counts are balanced, some of those chunks are empty and will never again hold data. I understand why it would be difficult to automate this, but I'm wondering if there's any harm in removing chunks manually.

Essentially, can I stop balancing on the cluster, delete some of the empty chunks from the config DB, then restart balancing? Would I need to restart mongos on the servers as well? Is this a fantastically dangerous thing to attempt?

Thanks,

Derek

Gianfranco

Oct 11, 2012, 6:04:58 AM
to mongod...@googlegroups.com
In general it's not recommended to delete chunks, even if they are empty.

As you mentioned, there is a SERVER ticket on Jira to implement this in a secure manner, but it's not being planned yet.
You can of course vote for it to increase the chances of it being developed sooner.

Though my suggestion would be to move the data you want to archive into another database or cluster.
When adding more space the empty chunks will be reused.

Gianfranco

Derek Chen-Becker

Oct 12, 2012, 12:47:36 PM
to mongod...@googlegroups.com
I'm not sure what you mean by moving the data into another DB or cluster. I'm flat out removing it (as in db.<collection>.remove(...)) and then compacting the replsets. I have 7 replsets in the cluster, and while printShardStatus() shows that they all have equal chunk counts (+/- a few), some replsets are using 85% of disk while others are using more like 55%. The reason is that while they all have equal chunks, some of the chunks are empty. Am I missing something here? At least if empty chunks could be removed, then the config servers would try and rebalance the chunks that *do* have data in them. Right now I'm looking at a situation where I may need to add more shards to the system to prevent running out of disk on specific replsets, even though my overall cluster disk utilization is 50% of capacity.
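
To illustrate the imbalance described above, here is a minimal sketch in plain JavaScript. The shard names and per-chunk document counts are made up for the example; on a real cluster the counts would have to be gathered by counting documents within each chunk's key range. It shows how two shards can hold the same number of chunks while holding very different amounts of data.

```javascript
// Hypothetical data: one entry per chunk, with the shard that owns it and
// the number of documents currently inside its key range.
const sampleChunks = [
  { shard: "rs1", docs: 0 },
  { shard: "rs1", docs: 0 },
  { shard: "rs1", docs: 900 },
  { shard: "rs2", docs: 800 },
  { shard: "rs2", docs: 700 },
  { shard: "rs2", docs: 900 },
];

// Summarize chunk count, empty-chunk count, and document count per shard.
function summarizeShards(chunks) {
  const summary = {};
  for (const { shard, docs } of chunks) {
    if (!summary[shard]) {
      summary[shard] = { chunks: 0, emptyChunks: 0, docs: 0 };
    }
    summary[shard].chunks += 1;
    if (docs === 0) summary[shard].emptyChunks += 1;
    summary[shard].docs += docs;
  }
  return summary;
}

// Both shards report 3 chunks, but rs1 holds far fewer documents than rs2.
console.log(summarizeShards(sampleChunks));
```

Because the balancer compares chunk counts, a summary like this is one way to see which shards are carrying the empty chunks.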

Thanks,

Derek

Otis Zein

Oct 12, 2012, 2:11:12 PM
to mongod...@googlegroups.com
Could you elaborate on why it's not recommended to remove empty chunks?

We are in almost exactly the same situation: our shard key is based partially on a timestamp. In our case, we delete old data and have no need to archive it. With our shard key and data, we will never see data hit an empty chunk (i.e., one which we deleted data from).


Otis Zein

Oct 18, 2012, 8:35:42 AM
to mongod...@googlegroups.com
Bump



Gianfranco

Oct 22, 2012, 5:56:15 AM
to mongod...@googlegroups.com, Otis Zein
Hi Otis,

If you want to keep certain documents for a certain amount of time depending on a Date attribute, you can use a TTL Collection.
This collection uses a special index and it automatically deletes "expired" documents.
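
As a rough sketch of what that looks like: a TTL index is an ordinary single-field index on a date field, created with the `expireAfterSeconds` option. The helper below is hypothetical (it is not part of MongoDB); it only builds the two arguments for the index call, and the field name `createdAt`, the collection name `events`, and the 30-day retention are assumptions for the example.

```javascript
// Hypothetical helper: build the key spec and options for a TTL index that
// expires documents `days` days after the date stored in `dateField`.
function ttlIndexArgs(dateField, days) {
  if (!Number.isInteger(days) || days <= 0) {
    throw new RangeError("days must be a positive integer");
  }
  const secondsPerDay = 24 * 60 * 60;
  return [{ [dateField]: 1 }, { expireAfterSeconds: days * secondsPerDay }];
}

const [ttlKeys, ttlOptions] = ttlIndexArgs("createdAt", 30);
console.log(ttlKeys, ttlOptions);

// In the mongo shell, against an assumed collection named "events":
//   db.events.createIndex(ttlKeys, ttlOptions)
// (older shells of the 2.2 era used ensureIndex with the same arguments)
```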

Regards,
Gianfranco

Derek Chen-Becker

Oct 25, 2012, 11:53:36 AM
to mongod...@googlegroups.com, Otis Zein
Just as a follow up, I manually merged/removed about 1400 chunks last night and the rebalance is complete now. Disk usage dropped by the expected amount across our shards and things are much happier :)

Ted

Nov 2, 2012, 11:18:32 AM
to mongod...@googlegroups.com
Derek - How did you accomplish the merge and remove of chunks?

To remove the chunk, are you just removing it from the config.chunks collection?

Derek Chen-Becker

Nov 3, 2012, 1:06:06 AM
to mongod...@googlegroups.com
I should have kept my mouth shut. Shard metadata started corrupting this morning (chunk min/max keys were reversed on a large number of chunks) after some chunk splits. After a long battle I ended up having to dump the data off of each shard, wipe the database from the cluster, recreate the DB and indices, and I'm in the process now of restoring data. Not something I'd recommend doing again.

Derek

Jim Reitz

Apr 26, 2017, 5:27:56 PM
to mongodb-user, dchen...@gmail.com
I realize this is a late response, but I'll post it anyway, just in case someone searches for this issue and finds this conversation.
The reliable way to do this is to:
1. Disable the balancer.
2. Find the contiguous ranges of empty chunks, and move them all to the same shard (sh.moveChunk()).
3. Merge them all together using the mergeChunks command (db.runCommand()).
After you've finished merging all the empty chunks into ones that have data, you're done. Turn the balancer back on. Never delete chunks from the config.chunks collection manually.

Note that the above steps are now documented in the MongoDB documentation; however, it's still a manual process.
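
The planning part of the procedure above can be sketched in plain JavaScript. This is only an illustration under stated assumptions: the function `planEmptyChunkMerges` and the `docs` field are hypothetical, the chunk list is assumed to be sorted by shard key and contiguous, and the per-chunk document counts are assumed to have been gathered beforehand. It also simplifies the post's procedure by grouping empty chunks only with each other; folding the result into a neighbouring chunk that holds data would be one further merge.

```javascript
// Group runs of two or more adjacent empty chunks into merge ranges.
// For each run, report the merged bounds, the shard to collect the chunks on,
// and the min keys of the chunks that would first have to be moved there.
function planEmptyChunkMerges(chunks) {
  const plans = [];
  let run = [];
  const flush = () => {
    if (run.length >= 2) {
      const targetShard = run[0].shard;
      plans.push({
        bounds: [run[0].min, run[run.length - 1].max],
        targetShard,
        moves: run.filter((c) => c.shard !== targetShard).map((c) => c.min),
      });
    }
    run = [];
  };
  for (const chunk of chunks) {
    if (chunk.docs === 0) run.push(chunk);
    else flush();
  }
  flush();
  return plans;
}

// Hypothetical chunk metadata with a shard key on "ts".
const exampleChunks = [
  { min: { ts: 0 }, max: { ts: 10 }, shard: "rs1", docs: 0 },
  { min: { ts: 10 }, max: { ts: 20 }, shard: "rs2", docs: 0 },
  { min: { ts: 20 }, max: { ts: 30 }, shard: "rs1", docs: 0 },
  { min: { ts: 30 }, max: { ts: 40 }, shard: "rs3", docs: 500 },
  { min: { ts: 40 }, max: { ts: 50 }, shard: "rs2", docs: 0 },
];

const mergePlans = planEmptyChunkMerges(exampleChunks);
console.log(JSON.stringify(mergePlans, null, 2));
```

With a plan like this in hand, the shell side would roughly be: `sh.stopBalancer()`, then `sh.moveChunk(ns, minKey, targetShard)` for each entry in `moves`, then `db.adminCommand({ mergeChunks: ns, bounds: plan.bounds })`, and finally `sh.startBalancer()`, where `ns` is the sharded collection's namespace. Check the mergeChunks documentation for your server version before running any of this.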

Jim