mongo rebalancer failing.


Saurabh Bhartia

Apr 5, 2016, 5:23:46 PM
to mongodb-user


I have an unbalanced sharded cluster.


I am getting:

Failed with error 'chunk too big to move', from rs3 to rs2


Has anyone faced this or does anyone have suggestions?

Any help is greatly appreciated.



mongos> sh.status()

--- Sharding Status --- 

  sharding version: {

"_id" : 1,

"minCompatibleVersion" : 5,

"currentVersion" : 6,

"clusterId" : ObjectId("554a73b2ba9a99ed6b0098f8")

}

  shards:

{  "_id" : "rs0",  "host" : "rs0/mdb-hellcat3:27017,mdb-hellcat7:27017" }

{  "_id" : "rs1",  "host" : "rs1/mdb-hellcat5:27017,mdb-hellcat9:27017" }

{  "_id" : "rs2",  "host" : "rs2/mdb-hellcat4:27017,mdb-hellcat8:27017" }

{  "_id" : "rs3",  "host" : "rs3/mdb-hellcat10:27017,mdb-hellcat6:27017",  "maxSize" : 25000 }

{  "_id" : "rs4",  "host" : "rs4/mdb-hellcat12:27017,mdb-hellcat2:27017" }

  balancer:

Currently enabled:  yes

Currently running:  yes

Balancer lock taken at Tue Apr 05 2016 12:22:52 GMT-0700 (PDT) by mdb-hellcat1:27017:1440526274:1804289383:Balancer:846930886

Balancer active window is set between 17:00 and 05:00 server local time

Collections with active migrations: 

sinkhole.sinkhole_log started at Tue Apr 05 2016 12:22:53 GMT-0700 (PDT)

Failed balancer rounds in last 5 attempts:  0

Migration Results for the last 24 hours: 

7 : Success

1 : Failed with error 'could not acquire collection lock for sinkhole.sinkhole_log to migrate chunk [{ : MinKey },{ : MaxKey }) :: caused by :: Lock for migrating chunk [{ : MinKey }, { : MaxKey }) in sinkhole.sinkhole_log is taken.', from rs3 to rs2

4148 : Failed with error 'data transfer error', from rs3 to rs2

3 : Failed with error 'chunk too big to move', from rs3 to rs1

4 : Failed with error 'chunk too big to move', from rs3 to rs2

1521 : Failed with error 'data transfer error', from rs3 to rs1

  databases:

{  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }

{  "_id" : "test",  "partitioned" : false,  "primary" : "rs3" }

{  "_id" : "dns",  "partitioned" : true,  "primary" : "rs3" }

dns.a

shard key: { "ip" : 1 }

chunks:

rs0 723

rs1 722

rs2 722

rs3 722

rs4 2918

too many chunks to print, use verbose if you want to force print

{  "_id" : "sinkhole",  "partitioned" : true,  "primary" : "rs3" }

sinkhole.sinkhole_log

shard key: { "src_ip" : 1 }

chunks:

rs0 10429

rs1 10412

rs2 10411

rs3 26947

rs4 10416

too many chunks to print, use verbose if you want to force print

{  "_id" : "cr",  "partitioned" : false,  "primary" : "rs4" }

{  "_id" : "vt",  "partitioned" : false,  "primary" : "rs0" }

{  "_id" : "information",  "partitioned" : false,  "primary" : "rs0" }

{  "_id" : "domain",  "partitioned" : false,  "primary" : "rs0" }

{  "_id" : "pdns",  "partitioned" : false,  "primary" : "rs0" }

{  "_id" : "hostname",  "partitioned" : false,  "primary" : "rs0" }

{  "_id" : "db",  "partitioned" : false,  "primary" : "rs1" }

{  "_id" : "pandb_release",  "partitioned" : false,  "primary" : "rs1" }

{  "_id" : "ip_information",  "partitioned" : false,  "primary" : "rs1" }

{  "_id" : "infomation",  "partitioned" : false,  "primary" : "rs2" }

{  "_id" : "informatioin",  "partitioned" : false,  "primary" : "rs2" }


Kevin Adistambha

Apr 20, 2016, 3:41:45 AM
to mongodb-user

Hi Saurabh.

I am getting Failed with error ‘chunk too big to move’, from rs3 to rs2

The error “chunk too big to move” means the balancer tried to migrate a chunk that is too large to move and could not split it first.

If MongoDB cannot split a chunk, the chunk is marked as “jumbo”. You can see these jumbo-flagged chunks in the output of sh.status({verbose: true}).
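
Those flags also live in the config database, so you can query them directly from a mongos. A minimal sketch, assuming sinkhole.sinkhole_log is the affected namespace (ns, min, max, shard and jumbo are the standard fields in config.chunks):

use config
// count the chunks the balancer has flagged as unsplittable
db.chunks.find({ ns: "sinkhole.sinkhole_log", jumbo: true }).count()
// list their ranges and which shard currently holds them
db.chunks.find({ ns: "sinkhole.sinkhole_log", jumbo: true }, { min: 1, max: 1, shard: 1 })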

The main reason for the “chunk too big to move” error is a sub-optimal choice of shard key, where the key either:

  1. has low cardinality, or
  2. has a very uneven distribution of values.

In your case, judging from the following two lines in your sh.status() output, I believe it is the second reason:

shard key: { “ip” : 1 }

shard key: { “src_ip” : 1 }

If your application logs IP data in MongoDB and some ip or src_ip values are much more common than others, you will end up with a large chunk that cannot be split, because documents with the same shard key value must all live in the same chunk. For example, if you record a client’s IP address every time that client connects to you, then after 250,000 connections the chunk containing that client’s IP address holds too many documents to be split.
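
One way to confirm the skew is to count how many documents share each shard key value. A rough sketch, run through a mongos, using the collection and field names from your output:

use sinkhole
// top 10 most common src_ip values; a heavily skewed result means
// those values sit in chunks that can never be split
db.sinkhole_log.aggregate([
    { $group: { _id: "$src_ip", docs: { $sum: 1 } } },
    { $sort: { docs: -1 } },
    { $limit: 10 }
], { allowDiskUse: true })

If a handful of src_ip values account for hundreds of thousands of documents each, those are the chunks the balancer cannot move.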

Unfortunately, once you select a shard key it is immutable. There is no easy fix for this issue other than dumping the collection and reloading it with a different shard key.
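
If you do go that route, the rough sequence is: dump the collection with mongodump, drop it, shard the empty collection under the new key, then reload it with mongorestore. The middle step would look something like the sketch below; the compound key is only an illustration, and ts is a hypothetical timestamp field that would have to exist in your documents:

use sinkhole
db.sinkhole_log.drop()
// pre-shard the empty collection under the new key before restoring;
// { src_ip: 1, ts: 1 } is only an example of a higher-cardinality compound key
sh.shardCollection("sinkhole.sinkhole_log", { src_ip: 1, ts: 1 })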

For more on shard key selection, you might find this link informative: Considerations for Selecting Shard Keys.

It is also worth mentioning that, in your case, a shard key like {"ip": "hashed"} does not solve the cardinality problem, since the same IP address always hashes to the same value and therefore does not increase your key space at all.

I would also like to point out the following two lines from your sh.status() output:

4148 : Failed with error ‘data transfer error’, from rs3 to rs2

1521 : Failed with error ‘data transfer error’, from rs3 to rs1

There may be connectivity issues between rs3 and the other shards in the cluster, since the balancer tried to move chunks out of shard rs3 and failed that many times with data transfer errors.
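
A quick way to check is to open a connection from one of the rs3 hosts (mdb-hellcat10 or mdb-hellcat6) to an rs1 or rs2 member and ping it. A minimal sketch in the mongo shell, with the target host taken from your sh.status() output:

// run this in a mongo shell on an rs3 host
var conn = new Mongo("mdb-hellcat4:27017")   // an rs2 member
conn.getDB("admin").runCommand({ ping: 1 })  // should return { "ok" : 1 } almost instantly

It would also be worth checking the mongod logs on the rs3 members around the failed migrations for more detail on the data transfer errors.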

Best regards,
Kevin
