Behaviour of the balancer in MongoDB sharding

ANKIT UPADHYAY

Jan 23, 2018, 1:08:51 AM
to mongodb-user
Hi All,

I was experimenting with MongoDB sharding. The collection's shard key is {policyId, startTime}.
policyId - Java UUID (limited number of values, say 50)
startTime - monotonically increasing time.

After inserting around 30M documents (32 GB) into the collection, the data distribution was as follows:
shard key: { "policyId" : 1, "startDate" : 1 }
unique: false
balancing: true
chunks:
sharda 63
shardb 138

During insertion, sh.isBalancerRunning() returned 'false'. When I stopped inserting documents, the balancer started moving chunks, and after that the data was evenly distributed.

Below are my concerns regarding the balancer:
1. The balancer only became active and started moving chunks once insertion stopped. If I insert data for a longer period, more chunks will be created and the data will become more skewed, and the chunk migration itself will take longer to balance the shards. So how does MongoDB decide when to migrate chunks?
2. I noticed spikes in write latency once more than 20M documents had been inserted. Does this mean the balancer is moving some chunks intermittently?
3. The count API gives inconsistent results during chunk migration, because the balancer copies a chunk from one shard to another and then deletes the old copy. Should we expect the find API to also return incorrect results (duplicate documents)?

If possible, could anyone share documentation or a blog post about the MongoDB balancer for better understanding?

Regards,
Ankit

Weishan Ang

Jan 23, 2018, 11:16:28 AM
to mongodb-user

ANKIT UPADHYAY

Jan 24, 2018, 12:33:32 AM
to mongodb-user
Hi Weishan,

Thank you for the reply.
I have already gone through the MongoDB documentation, but it does not describe the internal behaviour of the balancer or how the balancer actually works in different scenarios (high data volume, chunk migration, etc.).

Kevin Adistambha

Feb 13, 2018, 12:47:37 AM
to mongodb-user

Hi Ankit

In addition to the links Weishan provided, there is a wiki on MongoDB's GitHub repository: https://github.com/mongodb/mongo/wiki

You may be interested in particular in the Sharding Internals part of the document.

Regarding your questions:

The balancer only became active and started moving chunks once insertion stopped. If I insert data for a longer period, more chunks will be created and the data will become more skewed, and the chunk migration itself will take longer to balance the shards. So how does MongoDB decide when to migrate chunks?

This is answered in the Balancer section of the Wiki.
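
As a rough illustration of the kind of rule involved, here is a minimal sketch in plain JavaScript. It assumes the chunk-count migration thresholds listed in the MongoDB manual for releases of this era (2 for fewer than 20 chunks, 4 for 20 to 79 chunks, 8 for 80 or more); newer releases balance on data size instead, so treat the numbers as an assumption rather than a description of your exact version. The function names are made up for this example and are not part of any MongoDB API.

```javascript
// Migration threshold by total chunk count, as listed in the MongoDB manual
// for chunk-count-based balancing (assumption: applies to your version).
function migrationThreshold(totalChunks) {
  if (totalChunks < 20) return 2;
  if (totalChunks < 80) return 4;
  return 8;
}

// chunkCounts maps shard name -> number of chunks, e.g. { sharda: 63, shardb: 138 }.
// Returns the figures such a threshold check would compare.
function needsBalancing(chunkCounts) {
  const counts = Object.keys(chunkCounts).map(function (k) { return chunkCounts[k]; });
  const total = counts.reduce(function (a, b) { return a + b; }, 0);
  const diff = Math.max.apply(null, counts) - Math.min.apply(null, counts);
  const threshold = migrationThreshold(total);
  return { total: total, diff: diff, threshold: threshold, imbalanced: diff >= threshold };
}
```

With the distribution from the original post, `needsBalancing({ sharda: 63, shardb: 138 })` gives a total of 201 chunks, a difference of 75 and a threshold of 8, so under this rule the collection counts as imbalanced and migrations would be scheduled.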

I noticed spikes in write latency once more than 20M documents had been inserted. Does this mean the balancer is moving some chunks intermittently?

Possibly. It is also possible that you overloaded the hardware resources of your test deployment.

The count API gives inconsistent results during chunk migration, because the balancer copies a chunk from one shard to another and then deletes the old copy. Should we expect the find API to also return incorrect results (duplicate documents)?

No. Each shard only returns the part of the collection that it is responsible for, as recorded in the config servers. Note that this holds only if you perform the query via the mongos process. Results may be inconsistent if you connect directly to the shard servers. Connecting directly to the shards is strongly discouraged.
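
If you need a count that is not affected by an in-progress migration, the MongoDB manual suggests counting through an aggregation rather than count(). A minimal sketch follows, assuming `coll` behaves like a mongo shell collection obtained through mongos (so that aggregate() returns a cursor with a synchronous toArray()). The helper name `countViaAggregate` is made up for this example.

```javascript
// Count documents with a $group stage instead of count().
// Assumption: coll.aggregate(pipeline) returns a cursor exposing a
// synchronous toArray(), as in the mongo shell.
function countViaAggregate(coll, filter) {
  const pipeline = [
    { $match: filter || {} },                   // optional query predicate
    { $group: { _id: null, n: { $sum: 1 } } }   // one result row holding the count
  ];
  const rows = coll.aggregate(pipeline).toArray();
  // An empty result means no documents matched.
  return rows.length > 0 ? rows[0].n : 0;
}
```

In the shell this could be called as `countViaAggregate(db.getCollection("policies"))`, where "policies" is a placeholder collection name.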

Best regards
Kevin
