I've looked at https://jira.mongodb.org/browse/SERVER-2487 and we're in a similar situation. Part of our shard key is a timestamp, which works fine for us because the rest of the key makes writes balance out across the cluster. However, we're now at a point where we need to archive off older data, and our shards are starting to get out of balance in terms of disk usage: while chunk counts are balanced, some of those chunks are empty and will never hold data again. I understand why it would be difficult to automate this, but I'm wondering if there's any harm in removing chunks manually.
Essentially, can I stop balancing on the cluster, delete some of the empty chunks from the config DB, then restart balancing? Would I need to restart mongos on the servers as well? Is this a fantastically dangerous thing to attempt?
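To make "empty chunk" concrete, here is a minimal sketch in plain JavaScript that works on local data only and never touches a cluster. It assumes chunk objects shaped loosely like config.chunks documents (ns, min, max, shard), a hypothetical single numeric shard-key field `k`, a hypothetical namespace `app.events`, and -Infinity/Infinity standing in for MinKey/MaxKey. In a real cluster you would count documents per range on the server rather than pull every key to the client.

```javascript
// Illustration only: objects loosely mimic config.chunks documents
// (ns, min, max, shard). "k" is a hypothetical single-field shard key and
// -Infinity/Infinity stand in for MinKey/MaxKey.

// Chunk ranges are half-open: min is inclusive, max is exclusive.
function inChunk(chunk, k) {
  return k >= chunk.min.k && k < chunk.max.k;
}

// Return the chunks that contain none of the given shard-key values.
// Naive O(chunks * keys) scan, fine for a local example.
function findEmptyChunks(chunks, keys) {
  return chunks.filter((chunk) => !keys.some((k) => inChunk(chunk, k)));
}

const chunks = [
  { ns: "app.events", min: { k: -Infinity }, max: { k: 100 }, shard: "rs0" },
  { ns: "app.events", min: { k: 100 }, max: { k: 200 }, shard: "rs1" },
  { ns: "app.events", min: { k: 200 }, max: { k: Infinity }, shard: "rs0" },
];
// Everything with k < 100 has already been deleted.
const keys = [150, 250, 300];

console.log(findEmptyChunks(chunks, keys).map((c) => c.shard)); // [ 'rs0' ]
```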
In general it's not recommended to delete chunks, even if they are empty.
As you mentioned, there is a SERVER ticket on Jira to implement this safely, but it isn't planned yet. You can of course vote for it to increase the chances of it being developed sooner.
Though my suggestion would be to move the data you want to archive into another database or cluster. When adding more space, the empty chunks will be reused.
On Monday, October 8, 2012 6:03:57 PM UTC+1, Derek Chen-Becker wrote:
I'm not sure what you mean by moving the data into another DB or cluster. I'm flat out removing it (as in db.<collection>.remove(...)) and then compacting the replsets. I have 7 replsets in the cluster, and while printShardStatus() shows that they all have equal chunk counts (+/- a few), some replsets are using 85% of their disk while others are using more like 55%. The reason is that although they all have the same number of chunks, some of those chunks are empty. Am I missing something here? If empty chunks could at least be removed, the balancer would try to rebalance the chunks that *do* have data in them. Right now I'm looking at a situation where I may need to add more shards to keep specific replsets from running out of disk, even though my overall cluster disk utilization is 50% of capacity.
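A small illustration of the mismatch described here, with made-up numbers: in this era the balancer equalizes chunk counts, so two shards can hold the same number of chunks while storing very different amounts of data. The shard names and `sizeMB` values below are hypothetical.

```javascript
// Made-up numbers for illustration: shard names and sizeMB are hypothetical.
// Empty chunks still count toward a shard's chunk total but add no data.
function summarizeShards(chunks) {
  const summary = {};
  for (const { shard, sizeMB } of chunks) {
    if (!summary[shard]) summary[shard] = { chunks: 0, sizeMB: 0 };
    summary[shard].chunks += 1;
    summary[shard].sizeMB += sizeMB;
  }
  return summary;
}

const chunks = [
  { shard: "rsA", sizeMB: 64 }, { shard: "rsA", sizeMB: 64 },
  { shard: "rsA", sizeMB: 64 }, { shard: "rsA", sizeMB: 64 },
  { shard: "rsB", sizeMB: 64 }, { shard: "rsB", sizeMB: 64 },
  { shard: "rsB", sizeMB: 0 }, { shard: "rsB", sizeMB: 0 }, // emptied by deletes
];

const summary = summarizeShards(chunks);
for (const [shard, s] of Object.entries(summary)) {
  // Prints "rsA: 4 chunks, 256 MB" then "rsB: 4 chunks, 128 MB":
  // equal chunk counts, but rsA holds twice the data of rsB.
  console.log(`${shard}: ${s.chunks} chunks, ${s.sizeMB} MB`);
}
```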
On Thursday, October 11, 2012 4:04:58 AM UTC-6, Gianfranco wrote:
Could you elaborate on why it's not recommended to remove empty chunks?
We are in almost exactly the same situation: our shard key is partially based on a timestamp. In our case, we delete old data and have no need to archive it. With our shard key and data, new data will never land in an empty chunk (i.e., one we deleted data from).
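One way to reason about "will never see data hit an empty chunk" is sketched below. It assumes, purely for illustration, that the timestamp `ts` (epoch milliseconds) is the leading shard-key field and that new inserts always carry a current timestamp; if the timestamp is not the leading field, chunk bounds alone can't tell you this. The 90-day retention window and shard names are hypothetical.

```javascript
// Assumes (hypothetically) a shard key whose LEADING field is a timestamp
// "ts" in epoch milliseconds, and that new inserts always carry a current ts.
// Compound bounds compare field by field and max is exclusive, so every
// document in a chunk has ts <= max.ts. If max.ts is at or before the
// retention cutoff (and the cutoff is in the past), no new insert can land
// in that chunk. Infinity stands in for MaxKey, so the last chunk is never dead.
function isDeadChunk(chunk, cutoffMs) {
  return chunk.max.ts <= cutoffMs;
}

const DAY = 24 * 60 * 60 * 1000;
const now = Date.UTC(2012, 9, 12); // 12 Oct 2012 (months are 0-based)
const cutoff = now - 90 * DAY; // hypothetical 90-day retention

const chunks = [
  { min: { ts: -Infinity }, max: { ts: cutoff - DAY }, shard: "rs0" },
  { min: { ts: cutoff - DAY }, max: { ts: now - DAY }, shard: "rs1" },
  { min: { ts: now - DAY }, max: { ts: Infinity }, shard: "rs2" },
];

const dead = chunks.filter((c) => isDeadChunk(c, cutoff));
console.log(dead.map((c) => c.shard)); // [ 'rs0' ]
```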
________________________________ From: Gianfranco <gianfra...@10gen.com> To: mongodb-user@googlegroups.com Sent: Thursday, October 11, 2012 6:04 AM Subject: [mongodb-user] Re: Manually removing empty chunks?
-- You received this message because you are subscribed to the Google Groups "mongodb-user" group. To post to this group, send email to mongodb-user@googlegroups.com To unsubscribe from this group, send email to mongodb-user+unsubscribe@googlegroups.com See also the IRC channel -- freenode.net#mongodb
________________________________ From: Otis Zein <otisz...@yahoo.com> To: "mongodb-user@googlegroups.com" <mongodb-user@googlegroups.com> Sent: Friday, October 12, 2012 2:11 PM Subject: Re: [mongodb-user] Re: Manually removing empty chunks?
If you want to keep certain documents for a certain amount of time depending on a Date attribute, you can use a TTL Collection. This collection uses a special index and it automatically deletes "expired" documents. http://docs.mongodb.org/manual/tutorial/expire-data/
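As a rough illustration of how a TTL index decides what is "expired", here is a local model of the rule in plain JavaScript; it is not the server's implementation. The collection name `events` and field name `createdAt` in the comment are hypothetical.

```javascript
// Local model of the TTL expiry rule, for illustration; the actual deletes
// are done by a background task on mongod that runs about once a minute,
// so expired documents can linger briefly. In the 2.2 shell the index would
// be created with something like (createIndex in later versions):
//   db.events.ensureIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })
// where "events" and "createdAt" are hypothetical names.
function isExpired(doc, field, expireAfterSeconds, now) {
  const value = doc[field];
  // Only Date values count; a missing or non-date field never expires.
  const dates = (Array.isArray(value) ? value : [value]).filter(
    (v) => v instanceof Date
  );
  if (dates.length === 0) return false;
  // With an array of dates, the earliest one drives expiry.
  const earliest = Math.min(...dates.map((d) => d.getTime()));
  return earliest + expireAfterSeconds * 1000 < now.getTime();
}
```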
Just as a follow up, I manually merged/removed about 1400 chunks last night and the rebalance is complete now. Disk usage dropped by the expected amount across our shards and things are much happier :)
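For illustration only, the sketch below plans which runs of adjacent empty chunks on the same shard could be collapsed into one range. It reads local objects and never modifies config data; as this thread goes on to show, hand-editing config metadata went badly. Later MongoDB releases added a supported `mergeChunks` admin command for contiguous chunks on the same shard, which is the safer route if your version has it (check its documentation for restrictions). The key field `k`, shard names, and precomputed `empty` flags are assumptions.

```javascript
// Planning sketch only: reads local chunk objects and never modifies
// config data. Chunks must be sorted by min. "k" is a hypothetical numeric
// shard key, -Infinity/Infinity stand in for MinKey/MaxKey, and "empty"
// is assumed to have been determined beforehand.
function planEmptyMerges(chunks) {
  const runs = [];
  let current = null; // run of adjacent empty chunks being built
  for (const c of chunks) {
    const continuesRun =
      current !== null &&
      c.empty &&
      c.shard === current.shard &&
      c.min.k === current.max.k;
    if (continuesRun) {
      current.max = c.max;
      current.count += 1;
    } else {
      // Close the previous run; only runs of 2+ chunks are worth merging.
      if (current !== null && current.count > 1) runs.push(current);
      current = c.empty
        ? { shard: c.shard, min: c.min, max: c.max, count: 1 }
        : null;
    }
  }
  if (current !== null && current.count > 1) runs.push(current);
  return runs;
}

const chunks = [
  { min: { k: -Infinity }, max: { k: 10 }, shard: "rs0", empty: true },
  { min: { k: 10 }, max: { k: 20 }, shard: "rs0", empty: true },
  { min: { k: 20 }, max: { k: 30 }, shard: "rs0", empty: true },
  { min: { k: 30 }, max: { k: 40 }, shard: "rs1", empty: true },
  { min: { k: 40 }, max: { k: 50 }, shard: "rs1", empty: false },
  { min: { k: 50 }, max: { k: Infinity }, shard: "rs1", empty: true },
];

for (const r of planEmptyMerges(chunks)) {
  console.log(`${r.shard}: ${r.count} chunks [${r.min.k}, ${r.max.k})`);
} // rs0: 3 chunks [-Infinity, 30)
```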
On Monday, October 22, 2012 3:56:15 AM UTC-6, Gianfranco wrote:
> On Thursday, October 18, 2012 1:35:54 PM UTC+1, otis...@yahoo.com wrote:
I should have kept my mouth shut. Shard metadata started corrupting this
morning (chunk min/max keys were reversed on a large number of chunks)
after some chunk splits. After a long battle I ended up having to dump the
data off of each shard, wipe the database from the cluster, recreate the DB
and indices, and I'm in the process now of restoring data. Not something
I'd recommend doing again.
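A read-only check like the following sketch could at least flag the symptom reported here (chunk min/max reversed), along with gaps or overlaps between neighbouring chunks. It is illustrative only: `k` is a hypothetical numeric shard key and -Infinity/Infinity stand in for MinKey/MaxKey.

```javascript
// Read-only sanity check over chunk metadata for one collection.
// "k" is a hypothetical numeric shard key; -Infinity/Infinity stand in
// for MinKey/MaxKey. Flags reversed ranges, gaps/overlaps between
// neighbouring chunks, and missing coverage at either end.
function checkChunks(chunks) {
  if (chunks.length === 0) return ["no chunks"];
  const byMin = (a, b) => (a.min.k < b.min.k ? -1 : a.min.k > b.min.k ? 1 : 0);
  const sorted = [...chunks].sort(byMin);
  const problems = [];
  if (sorted[0].min.k !== -Infinity) {
    problems.push("first chunk does not start at MinKey");
  }
  if (sorted[sorted.length - 1].max.k !== Infinity) {
    problems.push("last chunk does not end at MaxKey");
  }
  sorted.forEach((c, i) => {
    if (!(c.min.k < c.max.k)) {
      problems.push(`chunk ${i}: min ${c.min.k} >= max ${c.max.k} (reversed or empty range)`);
    }
    const next = sorted[i + 1];
    if (next && c.max.k !== next.min.k) {
      problems.push(`chunk ${i} ends at ${c.max.k} but chunk ${i + 1} starts at ${next.min.k}`);
    }
  });
  return problems;
}

const good = [
  { min: { k: -Infinity }, max: { k: 10 } },
  { min: { k: 10 }, max: { k: Infinity } },
];
const bad = [
  { min: { k: -Infinity }, max: { k: 10 } },
  { min: { k: 20 }, max: { k: 10 } }, // reversed bounds
  { min: { k: 20 }, max: { k: Infinity } },
];
console.log(checkChunks(good)); // []
console.log(checkChunks(bad));
```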
On Fri, Nov 2, 2012 at 9:18 AM, Ted <underhi...@gmail.com> wrote:
> Derek - How did you accomplish the merge and remove of chunks?
> To remove the chunk, are you just removing it from the config.chunks
> collection?
> On Thu, Oct 25, 2012 at 11:53 AM, Derek Chen-Becker <dchenbec...@gmail.com> wrote: