Re: [mongodb-user] Pre-splitting and manual chunks balancing


Scott Hernandez

Sep 3, 2012, 10:40:55 AM
to mongod...@googlegroups.com
If you aren't using the balancer, splitting does not move any data
between shards. Splitting chunks is a purely logical (metadata-only)
operation; you then need to move the chunks yourself to distribute them.
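
A minimal sketch of that split-then-move sequence, assuming the mongo shell helpers sh.splitAt() and sh.moveChunk(). The function name, the shard names, and the round-robin placement are illustrative assumptions, not something from this thread; the `sh` helper is passed in as an argument only so the logic can be tried with a stub outside the shell.

```javascript
// Sketch only. Assumptions: the collection is already sharded on _id, every
// chunk currently lives on shards[0] (the primary shard), and splitPoints is
// sorted ascending. Shard names are hypothetical; sh.status() lists real ones.
function presplitAndDistribute(sh, ns, splitPoints, shards) {
  // Logical step: create the chunk boundaries. No data moves here.
  splitPoints.forEach(function (point) {
    sh.splitAt(ns, { _id: point });
  });

  // Physical step: migrate the chunk that starts at each split point,
  // round-robin across the shards. The chunk below the first split point
  // stays on shards[0], as does any chunk whose target is shards[0].
  splitPoints.forEach(function (point, i) {
    var target = shards[(i + 1) % shards.length];
    if (target !== shards[0]) {
      sh.moveChunk(ns, { _id: point }, target);
    }
  });
}

// In the shell, connected to a mongos, for the collection from the question:
// presplitAndDistribute(sh, "example.users", [5000], ["shard0000", "shard0001"]);
```

With the single split point from the question, this leaves the chunk below 5000 where it is and moves the chunk starting at 5000 to the second shard.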

You may want to disable the balancer while importing, and then enable
it later to evenly distribute the chunks after the import.
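
One way to sketch the disable/import/enable pattern, assuming the shell helper sh.setBalancerState(). The wrapper function and its `runImport` callback are hypothetical names used for illustration; the import itself is whatever you already run.

```javascript
// Sketch only: `sh` is the mongo shell sharding helper, passed in so the
// wrapper can also be exercised with a stub. `runImport` is a hypothetical
// callback that performs the bulk load.
function importWithBalancerOff(sh, runImport) {
  // Stops new balancing rounds from being scheduled. A migration that is
  // already in progress may still run to completion.
  sh.setBalancerState(false);
  try {
    return runImport();
  } finally {
    // Re-enable the balancer even if the import throws, so the cluster is
    // not left unbalanced indefinitely.
    sh.setBalancerState(true);
  }
}

// In the shell, connected to a mongos:
// importWithBalancerOff(sh, function () { /* run the import here */ });
```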

On Mon, Sep 3, 2012 at 9:20 AM, mthenw <maciej....@gmail.com> wrote:
> Hi,
>
> I need some clarification about sharding, pre-splitting and chunk
> balancing. I have 2 shards with the balancer turned off and a pre-split set with:
>
> db.runCommand( { split : "example.users" , middle : { _id : 5000 } } )
>
> I assume that every document with _id less than 5000 will go to shard 1 and
> every document with _id greater than 5000 will go to shard 2.
>
> My question is:
> if I want to add another shard, do I first need to move chunks manually and
> then change the split options, or will changing the split options cause
> automatic chunk migration?
>
> PS I don't want to use the balancer because it slows down imports of large
> data sets.
>

mthenw

Sep 4, 2012, 3:46:29 AM
to mongod...@googlegroups.com
Thanks for your answer. I have two more questions.

Is it good practice to turn on the balancer after a large data import and turn it off before the next import?

The documentation says that the balancer window must be long enough to complete the migration. What happens if the balancer does not finish migrating all the data?

Adam C

Sep 4, 2012, 3:59:35 AM
to mongod...@googlegroups.com
> Is it good practice to turn on the balancer after a large data import and turn it off before the next import?

That depends. If your cluster has the capacity to both balance and run the import, that is preferred: the writes are more likely to be evenly distributed, and the data is balanced in increments rather than in one large batch (which can take a long time, since each shard can take part in only a single migration at a time). If the import is causing extremely heavy load, turning off the balancer can help by freeing up the resources used for balancing. It's a judgement call based on your cluster, your needs, etc.

> The documentation says that the balancer window must be long enough to complete the migration. What happens if the balancer does not finish migrating all the data?

The balancer will run until all in-flight migrations are complete, then stop. Your data will remain in that state until the balancer runs again (at the next window, or when you turn it back on). If you do another import in the meantime, the data will become more unbalanced, and you will essentially repeat this pattern forever (i.e. your data will never be balanced). Hence the note in the docs: it won't break anything per se, but your data will remain unevenly distributed across the shards. You can see the chunk distribution with sh.status() from the shell.
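
If sh.status() is too verbose, a per-shard chunk count can be sketched as below. This assumes the chunk metadata documents in the config database's chunks collection carry `ns` and `shard` fields, as they do in releases of this era; the function name is hypothetical.

```javascript
// Sketch only: counts chunks per shard from an array of chunk metadata
// documents. Assumes each document has a `shard` field.
function chunkCountsByShard(chunkDocs) {
  var counts = {};
  chunkDocs.forEach(function (chunk) {
    counts[chunk.shard] = (counts[chunk.shard] || 0) + 1;
  });
  return counts;
}

// In the shell, connected to a mongos (assumes chunks are keyed by `ns`):
// var docs = db.getSiblingDB("config").chunks.find({ ns: "example.users" }).toArray();
// printjson(chunkCountsByShard(docs));
```

A large gap between the counts after the window closes suggests the window is too short for the amount of data being imported.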

Adam