Unbalanced sharding

63 views

Skip to first unread message

Pablo Torres

unread,

May 7, 2012, 6:01:02 PM5/7/12

to mongod...@googlegroups.com

Hello all,

I have a blog collections and a post collections in 2 shards (each of which is a replica set consisting of 2 nodes and an arbiter); posts have a "foreign key" to a blog, meaning that they have a "blog_id" field with the ObjectID of an actual blog. Posts have a content of at least 1,000 characters and the chunkSize is 1 (for experimental purposes).

I have issued these commands:

db.runCommand( { shardcollection : "blogs.blogs_post" , key : { blog_id : 1 } })

db.runCommand( { shardcollection : "blogs.blogs_blog" , key : { _id : 1 } });

but with 100,000 blogs and 510,000 posts (with blogs having at most 12 posts), one of my shards has only 586 posts and the other has all of the rest. Why does mongo find no reason to split the posts in half or so and save one group in one shard and the other in the second shard? What would it take for the shards to have about the same amount of posts? Do blogs need more posts, say I keep only 500 blogs and give 1,000 posts or more each?

Thanks.

~ Pablo

Richard Kreuter

unread,

May 10, 2012, 3:07:40 PM5/10/12

to mongodb-user

Hi Pablo,

There are a few things going on here.

First, as you're using ObjectIds for the sharding key, at any
particular moment in time, all inserts will go to one shard. There's
always one shard that owns the maximum chunk for a sharded collection,
and so if the values for the shard key field are always going up, then
inserts will always get routed to the shard that holds that maximum
chunk. As it happens, MongoDB's ObjectIds are always going up (their
4 high-order bytes are just a timestamp, so they increment every
second). So all your inserts will go to one shard at any particular
point in time (though there are some mechanisms that should move the
inserts to other shards over time).

Second, migration of chunks is intentionally kept rather slow, in
order not to overload the system with balancing operation. So it can
take a while for a system to get into a balanced state, if it gets
unbalanced.

Third, at most one chunk can migrate at a time to or from any shard
(in other words, each shard participates in only one migration at a
time). In a cluster with lots of shards, this means you can have many
migrations at a time, but in a cluster with just two shards, at most
one migration can be going on... which, again, slows down the time to
get a system balanced.

Anyway, to get out of this situation, you might consider pre-splitting
in advance of future inserts:

http://www.mongodb.org/display/DOCS/Splitting+Shard+Chunks

and perhaps to watch the locks collection in the config database to
determine whether there are migrations happening, in general:

http://www.mongodb.org/display/DOCS/Sharding+Administration#ShardingAdministration-Balancing

Hope that helps,
Richard

Reply all

Reply to author

Forward

0 new messages