Hmmm... I'm a bit stumped on this one.
I've got two collections that were spread across 23 shards. I just added 20
new shards as I was running out of space. One of the collections has
rebalanced normally, but the other appears to have stalled. One shard
has 15 chunks, some have 14, and others have 8 or 9.
Sharding status:
http://pastebin.com/PusmJCuU
In cool graph form:
http://i.imgur.com/9ryVx.png
tc.twitter_cache is as expected; co.connections is clearly not. It's
been stuck like this for a good 24 hours, after about 20 hours of
normal rebalancing.
There are no stale locks in the config.locks collection (I had this
problem once before and cleaned out the locks to get the process started
again, but there are none in there this time). The only entry is the balancer:
Array
(
    [_id] => balancer
    [process] => mffront6:28349:1318391291:1804289383
    [state] => 0
    [ts] => MongoId Object
        (
            [$id] => 4e9597169feb93344edceaef
        )
    [when] => MongoDate Object
        (
            [sec] => 1318426390
            [usec] => 32000
        )
    [who] => mffront6:28349:1318391291:1804289383:Balancer:846930886
    [why] => doing balance round
)
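To make the timestamps in that lock document easier to read, here is a minimal Python sketch that decodes them. It relies on two things: the first four bytes of an ObjectId are epoch seconds, and (this part is my assumption) the third field of the `process` string is the mongos start time in epoch seconds. The helper names `epoch_to_utc` and `objectid_seconds` are just for illustration.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def objectid_seconds(hex_id):
    """Return the epoch seconds stored in the first 4 bytes of an ObjectId."""
    return int(hex_id[:8], 16)


# Values copied from the balancer lock document above.
lock_when_sec = 1318426390
lock_ts_hex = "4e9597169feb93344edceaef"
lock_process = "mffront6:28349:1318391291:1804289383"

# Assumption: the third colon-separated field is the mongos start time.
process_start_sec = int(lock_process.split(":")[2])

last_round = epoch_to_utc(lock_when_sec)
process_start = epoch_to_utc(process_start_sec)

print("lock last written:   ", last_round.isoformat())
print("ts ObjectId time:    ", epoch_to_utc(objectid_seconds(lock_ts_hex)).isoformat())
print("mongos started:      ", process_start.isoformat())
print("uptime at last round:", last_round - process_start)
```

The `when` field and the `ts` ObjectId both decode to 2011-10-12 13:33:10 UTC. Comparing that against the current time should show whether the balancer is still taking the lock each round or stopped touching it altogether.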
I can't find anything unusual in the log files anywhere, but with
70-odd distributed log files it's tough to figure out where to look. Is
there anything I can grep for?
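The kind of scan I have in mind looks roughly like the sketch below: count, per log file, the lines that mention the stalled namespace together with a keyword. The keywords `moveChunk` and `balancer` are only my guess at what is worth matching, and the function names are hypothetical, so better search terms would be welcome.

```python
import sys
from collections import Counter


def count_matches(lines, namespace, keywords):
    """Count lines that mention the namespace and each of the keywords."""
    hits = Counter()
    for line in lines:
        if namespace not in line:
            continue
        for keyword in keywords:
            if keyword in line:
                hits[keyword] += 1
    return hits


def scan_files(paths, namespace="co.connections",
               keywords=("moveChunk", "balancer")):
    """Return {path: Counter} for every log file given."""
    results = {}
    for path in paths:
        # errors="replace" keeps the scan going past stray non-UTF-8 bytes.
        with open(path, errors="replace") as handle:
            results[path] = count_matches(handle, namespace, keywords)
    return results


if __name__ == "__main__":
    # Usage: python scan_logs.py /path/to/log1 /path/to/log2 ...
    for path, hits in scan_files(sys.argv[1:]).items():
        print(path, dict(hits))
```

Files with zero hits for co.connections can be skipped, which should at least narrow down which of the logs are worth reading by hand.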
Cheers,
James