Changing shard key value - Do we need to rebalance?

Kristinn Örn Sigurðsson

Mar 17, 2014, 5:23:58 AM
to mongod...@googlegroups.com
Hi all.

We are running a sharded environment. Unfortunately we selected a bad format for the shard key for one of our collections when we started out. Today this collection has 550 million documents. We know we should never shard on ObjectIds, but we thought it would be fine to shard on ObjectId|ObjectId (since we are storing relationships between two users, we sharded on their ObjectIds separated with a pipe).

This did not work out so well, so we had to change the shard key format, and now we hash it in our application code (since we can't add a hashed index to the shard key in question after the fact).
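
In rough terms, what our application does now looks something like this (the collection and field names here are simplified placeholders, not our real schema):

  // old key format: "<ObjectId>|<ObjectId>" for the two related users
  var userA = ObjectId();
  var userB = ObjectId();
  var rawKey = userA.str + "|" + userB.str;

  // new approach: hash that string in application code and store the digest
  // as the shard key value (hex_md5() is just a stand-in for our hash function)
  var hashedKey = hex_md5(rawKey);
  db.relationships.insert({ shardKey: hashedKey, from: userA, to: userB });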

Now we have changed all the documents with the bad shard key format (around 100 million documents), but sh.status(true) still shows that MongoDB is using the old shard key values as chunk ranges, so we still have the hot shard issue.

Do we need to rebalance to fix this? Do you have any suggestions for us?

Thanks in advance,
Kristinn.

Rod Adams

Mar 18, 2014, 2:55:23 PM
to mongod...@googlegroups.com
Heya Kristinn --

If I understand correctly, when you say that you've changed the sharding key, you mean that you have changed the format of the string you're writing to the field which is used as the shard key. However, the chunks were split according to the old format, and the system has not, as yet, seen fit to come up with a new set of splits, and you are thus still experiencing hotspots.


To understand what is (or in this case is not) happening, you should familiarize yourself with how MongoDB handles automatic chunk splitting [1]. Namely, it looks for chunks that are above a certain size, and the next time such a chunk mutates (receives an insert or update), it is divided in two. As a later step, the balancer will attempt to maintain a roughly equal number of chunks across all of its shards. [4]
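
If you want to see the splits that exist right now, you can look at them directly in the config database (the namespace below is just a placeholder for yours):

  use config
  // list the current chunk ranges for the collection and which shard owns each
  db.chunks.find({ ns: "mydb.relationships" }, { min: 1, max: 1, shard: 1 }).sort({ min: 1 })
  // count how many chunks each shard currently holds
  db.chunks.aggregate([
      { $match: { ns: "mydb.relationships" } },
      { $group: { _id: "$shard", chunks: { $sum: 1 } } }
  ])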

Thus, one possible course of action is to force MongoDB to split chunks more than it already has. There are two options for doing this:
- Telling MongoDB to split at a smaller chunk size. [2]
- Forcing some manual splits. [3]
The first option is far safer; the second will likely take effect sooner.
If you lower the chunk size, do so only temporarily, or you risk the caveats of small chunk sizes mentioned in [1]. Rough shell sketches of both options follow.
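
Very roughly, those two options look like this from the mongo shell (the namespace, field name, and split points are placeholders):

  // Option 1: temporarily lower the global chunk size (run via a mongos)
  use config
  db.settings.save({ _id: "chunksize", value: 32 })   // in MB; the default is 64

  // Option 2: force manual splits on specific hot chunks
  sh.splitFind("mydb.relationships", { shardKey: "a3f29c" })   // split the containing chunk at its median
  sh.splitAt("mydb.relationships", { shardKey: "80000000" })   // split at an explicit key value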

However, this general course of action has a hidden catch. You already have several chunks that have been created, and many of them will be empty, or close enough to empty. If you look at how the balancer chooses migrations [4], you'll see it attempts to balance according to the number of chunks on each shard. Thus, you're at risk of several of these empty chunks landing on the same shard, giving that shard a disproportionately low share of the load and making all the other shards hotter. How bad this effect will be is a function of how many chunks already exist, how many will be created later, and how the semi-random nature of balancing arranges chunks across shards.
You can use db.collection.getShardDistribution() from the mongo shell to get a better feel for how the balancer is functioning for you.
There is no easy remedy to this situation, as chunks, once split, last forever.
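
For example (the collection name is again a placeholder):

  // prints per-shard data size, document count, and chunk count for the collection,
  // which makes empty or nearly empty chunks easy to spot
  db.relationships.getShardDistribution()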


An alternative solution to your situation, which avoids this problem, is to create a new sharded collection with a more suitable shard key. Then set up a process that moves all of your documents from the old collection to the new one. This works best if you can take some downtime, since then your application won't need to understand how to perform reads and writes across both collections at once. This solution is superior in its end result, but likely requires much more effort to implement.
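
A very rough sketch of that process, with placeholder names, and ignoring the batching, error handling, and verification that a real migration of 550 million documents would need:

  // 1. create and shard the new collection on the better key
  //    (sh.enableSharding("mydb") first if the database isn't already sharded)
  sh.shardCollection("mydb.relationships_v2", { shardKey: 1 })

  // 2. copy documents across, computing the new key as you go
  db.relationships.find().forEach(function (doc) {
      doc.shardKey = hex_md5(doc.from.str + "|" + doc.to.str);   // however you derive the new key
      db.relationships_v2.insert(doc);
  });

  // 3. once the copy is verified, point the application at relationships_v2
  //    and drop the old collection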

If you choose to go with the "new collection" method, there are a couple of things to consider:
- Pre-splitting the chunks can make the initial load faster [5] (a rough sketch follows this list).
- You have a clean slate. If you supply me with a couple of sample documents, as well as a description of how the documents get created, updated, and queried, I can help you choose a more optimal key. There's more than just write hotspots to consider.
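
As an illustration of that pre-splitting, assuming the shard key is an md5-style hex string, you could create splits on the leading hex digit before loading any data:

  // after sh.shardCollection() but before the bulk load:
  // split the single initial chunk into 16 ranges, one per leading hex digit
  "123456789abcdef".split("").forEach(function (c) {
      sh.splitAt("mydb.relationships_v2", { shardKey: c });
  });
  // then give the balancer time to spread these empty chunks across the shards
  // before you start inserting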


Hope this helps!
-- Rod

References:

Kristinn Örn Sigurðsson

Mar 20, 2014, 5:15:48 AM
to mongod...@googlegroups.com
Hey Rod.

Thank you so much for your detailed answer!

For now we will use the alternative solution you mentioned.

Thank you so much again. I really appreciate it.

Best regards,
Kristinn.

Asya Kamsky

Mar 23, 2014, 1:34:46 AM
to mongodb-user
You say "we have changed all the documents with the bad shard key format", but the value of the shard key in a document is immutable (i.e. you cannot change it).
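
For instance, on a sharded collection an update that tries to modify the shard key field is rejected by the server (the collection and field names below are just examples, and the exact error text varies by version):

  db.relationships.update(
      { shardKey: "old value" },
      { $set: { shardKey: "new value" } }
  )
  // this returns an error instead of changing the shard key value in place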

So I'm wondering what exactly you mean by this?

Asya




s.molinari

Mar 23, 2014, 6:48:54 AM
to mongod...@googlegroups.com
This is the MongoDB version of Rod's answer; it also explains why Asya is questioning your actions, and why you still have the hot shard.


Scott