Chunk won't move (too large) and won't manually split

Nathan Ehresman

Feb 15, 2011, 1:05:25 PM
to mongod...@googlegroups.com
I have a chunk that fails to move, throwing the following error on the mongod
currently hosting the chunk:

can't move chunk of size (aprox) 1322124983 because maximum size allowed to move is
209715200

The shard key is a unique identifier that we generate (and stick in _id). When I
examine the min and max for the problem chunk in config.chunks and then use those
values to count the documents, I get:

> db.raw_data_adt.find({_id : {$gte : '20100423.091851-17042.2933.hl7', $lte:
'20100610.135539-24674.19100.hl7'}}).count()
391278
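
For reference, the chunk bounds came from config.chunks; a query along these lines lists them per chunk (the projection here is just illustrative):

> use config
> db.chunks.find({ns: 'db_client_14881.raw_data_adt'}, {min: 1, max: 1, shard: 1})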

When I follow the instructions in the documentation
(http://www.mongodb.org/display/DOCS/Splitting+Chunks), I get this error in the shell:

> db.runCommand({split: 'db_client_14881.raw_data_adt', find: {_id:'20100601'}})

{ "cause" : { }, "ok" : 0, "errmsg" : "split failed" }

and this error in the mongos log:

want to split chunk, but can't find split point chunk ns:db_client_14881.raw_data_adt
at: shard0002:10.128.0.174:27018 lastmod: 7|1 min: { _id:
"20100423.091851-17042.2933.hl7" } max: { _id: "20100610.135539-24674.19100.hl7" }
got: <empty>

This is all that is in the mongod log:

request split points lookup for chunk db_client_14881.raw_data_adt { :
"20100423.091851-17042.2933.hl7" } -->> { : "20100610.135539-24674.19100.hl7" }
Finding the split vector for db_client_14881.raw_data_adt over { _id: 1.0 } keyCount:
457579 numSplits: 0 took 132 ms.
query admin.$cmd ntoreturn:1 command: { splitVector: "db_client_14881.raw_data_adt",
keyPattern: { _id: 1.0 }, min: { _id: "20100423.091851-17042.2933.hl7" }, max: { _id:
"20100610.135539-24674.19100.hl7" }, force: true } reslen:69 132ms

I am using 1.7.5 with 4 shards. Can anybody give me any advice about how to split
this large chunk? The data set has already been loaded and I will not be receiving
any more inserts.

Thanks!

Nathan

Eliot Horowitz

Feb 15, 2011, 3:45:06 PM
to mongod...@googlegroups.com
Are your documents potentially of varying size or are they consistent?
Also, are you sure all mongod and mongos are on 1.7.5?
There should be a hard cap on keyCount at 250k.


sridhar

Feb 15, 2011, 3:46:44 PM
to mongodb-user
Can you try the alternative syntax using middle instead of find, as
specified in the pre-splitting section of the docs? e.g.
db.runCommand( { split : "test.foo" , middle : { _id : 99 } } )
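
Applied to the namespace from the original post, that would be something like the line below (the _id value is only a hypothetical key; pick one somewhere between the chunk's min and max):

db.runCommand( { split : 'db_client_14881.raw_data_adt' , middle : { _id : '20100515.000000-00000.0.hl7' } } ) // hypothetical midpoint key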

Nathan Ehresman

Feb 15, 2011, 4:58:41 PM
to mongod...@googlegroups.com
Yes, the documents are of varying size but not widely varying. The average document
size is 3376 bytes. paddingFactor on the collection across all shards is 1 (we
aren't doing any updates).
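
(As a rough sanity check, 391278 documents at ~3376 bytes each works out to about 1.32 GB, which lines up with the ~1322124983 bytes reported in the original move error.)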

I just verified with lsof that all mongod daemons are using the 1.7.5 binary.

Nathan

Nathan Ehresman

Feb 15, 2011, 5:30:08 PM
to mongod...@googlegroups.com
Using middle did in fact work. I manually split the huge chunk into 8 pieces and the
chunks were successfully moved by the balancer.
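
For the archives, a rough sketch of what that manual split can look like with the middle syntax, run from the mongos shell (the split keys below are placeholders; in practice pick seven _id values spread between the chunk's recorded min and max):

// placeholder split keys between the chunk's min (20100423...) and max (20100610...)
var splitKeys = ['20100430.0', '20100507.0', '20100514.0', '20100521.0',
                 '20100528.0', '20100603.0', '20100607.0'];
splitKeys.forEach(function (key) {
    // one split command per desired split point: 7 splits -> 8 chunks
    printjson(db.runCommand({split: 'db_client_14881.raw_data_adt', middle: {_id: key}}));
});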

I wonder why the balancer was failing to split the chunk itself. I dug into the code a
little bit and it looks like Chunk::pickMedianKey was probably failing, which left the
balancer unable to find a splitPoint.

Nathan

sridhar

Feb 15, 2011, 7:19:24 PM
to mongodb-user
Created a JIRA ticket for this issue: http://jira.mongodb.org/browse/SERVER-2560

Eliot Horowitz

Feb 16, 2011, 12:42:53 AM
to mongod...@googlegroups.com
Can you try 1.7.6?
There is more logging in there.

Nathan Ehresman

Feb 16, 2011, 12:50:44 AM
to mongod...@googlegroups.com
Well, it is working now and I'm not sure how to reproduce it. It seemed to happen
during one of our data loads where we dump ~1 million documents into a sharded
collection. It has only happened once. We will upgrade though -- good idea.

Nathan
