3.4.7 VERY SLOW chunks balancing to new shards


Nikolay Dmitriev

Nov 30, 2017, 10:59:34 AM
to mongodb-user
Hi, all.

We have a sharded cluster with 3 big shards (~100 GB of data per shard), each of which is a replica set.
There are ~10,000 chunks distributed evenly among these 3 shards.

We added 13 more shards, also replica sets, and started the balancer (it had been disabled).
All options are at their defaults (the balancer doesn't wait for replication or deletes), and the chunk size is 64 MB.
Still, chunk migration is VERY SLOW. I believe at most 3 migrations run in parallel since v3.4 (from the 3 old shards to the 13 new ones).
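
For reference, a rough sketch of where a cap of 3 would come from. It assumes the rule documented for 3.4 that a shard takes part in at most one migration at a time; the function name is made up for illustration.

```javascript
// Upper bound on simultaneous chunk migrations, assuming each shard can
// take part in at most one migration at a time (as donor or as recipient).
function maxParallelMigrations(donorShards, recipientShards) {
  const totalShards = donorShards + recipientShards;
  // Every migration needs one donor and one recipient, so the bound is
  // limited by the smaller side and can never exceed floor(n / 2).
  return Math.min(donorShards, recipientShards, Math.floor(totalShards / 2));
}

// 3 old shards donating, 13 new shards receiving.
console.log(maxParallelMigrations(3, 13));
```

With only 3 donors, adding more recipients would not raise this bound.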

In 4 hours only ~350 chunks were moved, which is at most 18-19 GB of data plus several indexes.
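
To put that rate in per-migration terms, here is a minimal sketch of the arithmetic. The 350 chunks and 4 hours are the figures above; the 3 parallel streams are my estimate rather than a measured value, and the function name is illustrative.

```javascript
// Convert an observed balancing run into per-hour and per-stream figures.
function migrationRate(chunksMoved, hours, parallelStreams) {
  const seconds = hours * 3600;
  return {
    chunksPerHour: chunksMoved / hours,
    // Wall-clock seconds each parallel stream spends on one chunk.
    secondsPerChunkPerStream: (seconds * parallelStreams) / chunksMoved,
  };
}

const rate = migrationRate(350, 4, 3);
console.log(rate.chunksPerHour.toFixed(1), "chunks/hour");
console.log(rate.secondsPerChunkPerStream.toFixed(0), "s per chunk per stream");
```

If the 3-stream assumption holds, each individual migration takes roughly two minutes.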

shard key: { "---" : "hashed" }
                        unique: false
                        balancing: true
                        chunks:
                                UE_RS_0          3543
                                UE_RS_1          3243
                                UE_RS_10        27
                                UE_RS_11        27
                                UE_RS_12        27
                                UE_RS_13        27
                                UE_RS_14        26
                                UE_RS_15        27
                                UE_RS_2          3268
                                UE_RS_3          28
                                UE_RS_4          27
                                UE_RS_5          28
                                UE_RS_6          27
                                UE_RS_7          27
                                UE_RS_8          27
                                UE_RS_9          27
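
As a rough estimate of how much work is left, the counts above can be compared with an even share per shard. This is only a sketch: it assumes the balancer aims for a perfectly even distribution (in practice it stops within a threshold) and that the observed rate of ~350 chunks per 4 hours stays constant.

```javascript
// Chunk counts copied from the sh.status() output above.
const chunkCounts = {
  UE_RS_0: 3543, UE_RS_1: 3243, UE_RS_2: 3268,
  UE_RS_3: 28, UE_RS_4: 27, UE_RS_5: 28, UE_RS_6: 27, UE_RS_7: 27,
  UE_RS_8: 27, UE_RS_9: 27, UE_RS_10: 27, UE_RS_11: 27, UE_RS_12: 27,
  UE_RS_13: 27, UE_RS_14: 26, UE_RS_15: 27,
};

function remainingToBalance(counts) {
  const values = Object.values(counts);
  const total = values.reduce((a, b) => a + b, 0);
  const target = total / values.length; // even share per shard
  // Chunks above the even share still have to leave their current shard.
  const excess = values.reduce((sum, n) => sum + Math.max(0, n - target), 0);
  return { total, target, excess };
}

const { total, target, excess } = remainingToBalance(chunkCounts);
const chunksPerHour = 350 / 4; // observed rate from this post

console.log("total chunks:", total);
console.log("even share per shard:", target.toFixed(1));
console.log("chunks still to move:", Math.round(excess));
console.log("hours left at observed rate:", (excess / chunksPerHour).toFixed(0));
```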

We removed all load from the cluster; nothing but balancing is happening right now.
I understand that MongoDB does some ACID work under the hood, but it's dramatically slow.

Questions:
1. What could be the reason for such slow balancing?
2. Is there any way to make it faster?

Nikolay Dmitriev

Nov 30, 2017, 11:05:30 AM
to mongodb-user
It's as if some kind of throttling is happening, which I don't need in my case at all )))


Rhys Campbell

Dec 1, 2017, 2:22:49 AM
to mongodb-user
Perhaps pre-splitting will work better for your situation...


Nikolay Dmitriev

Dec 1, 2017, 2:48:58 AM
to mongodb-user
You mean use the moveChunk command? Why would it be faster? I mean, the balancer calls the same command, I guess. Am I wrong?

I suppose it is very non-trivial in our case, because we have a hashed shard key. I'm afraid we could make a mistake with ranges based on hashed values.
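
If manual moves were attempted with a hashed key, one way to avoid computing ranges by hand would be to copy the exact min/max bounds of an existing chunk from config.chunks and pass them in the `bounds` field of the moveChunk command. The sketch below only builds the command document; the helper name is hypothetical, the namespace and key name are the redacted placeholders, the bound values are illustrative, and strings stand in for the NumberLong values the mongo shell would use.

```javascript
// Build a moveChunk command from a chunk's exact bounds, so that no hashed
// range has to be derived manually.
function buildMoveChunkCommand(namespace, chunk, toShard) {
  return {
    moveChunk: namespace,
    // For hashed shard keys the command takes "bounds" rather than "find".
    bounds: [chunk.min, chunk.max],
    to: toShard,
  };
}

// Illustrative chunk; in the shell these would be NumberLong("...") values
// read from config.chunks, not strings typed by hand.
const chunk = {
  min: { xxx: "-803150660732027739" },
  max: { xxx: "-802622385751591360" },
};

const cmd = buildMoveChunkCommand("xxx.xxx", chunk, "UE_RS_12");
console.log(JSON.stringify(cmd, null, 2));
// In the mongo shell, connected to a mongos, the document would be run
// with db.adminCommand(cmd).
```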


Weishan Ang

Dec 1, 2017, 10:07:26 AM
to mongodb-user
If you check the logs using show log, you should be able to see the speed of the chunk migration.

IIRC, in MongoDB 3.2 it takes around 3-4 s to move a chunk.

Nikolay Dmitriev

Dec 5, 2017, 6:20:25 AM
to mongodb-user
I checked the changelog collection in the config database.
It seems that step 4 of 7 is always very slow: about 60-120 seconds.

{
        "_id" : "xxx-2017-12-05T10:44:22.600+0000-5a26788604de808362a33850",
        "server" : "xxx",
        "clientAddr" : "10.8.15.158:56362",
        "time" : ISODate("2017-12-05T10:44:22.600Z"),
        "what" : "moveChunk.from",
        "ns" : "xxx.xxx",
        "details" : {
                "min" : {
                        "xxx" : NumberLong("-803150660732027739")
                },
                "max" : {
                        "xxx" : NumberLong("-802622385751591360")
                },
                "step 1 of 7" : 0,
                "step 2 of 7" : 23,
                "step 3 of 7" : 172,
                "step 4 of 7" : 69940,
                "step 5 of 7" : 40,
                "step 6 of 7" : 143,
                "step 7 of 7" : 2,
                "to" : "UE_RS_12",
                "from" : "UE_RS_1",
                "note" : "success"
        }
}
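
A small sketch of how the step timings in such an entry can be summarized. The `details` object is copied from the entry above, the times are taken to be milliseconds, and the helper name is made up for illustration.

```javascript
// "details" copied from the changelog entry above (step times in ms).
const details = {
  "step 1 of 7": 0,
  "step 2 of 7": 23,
  "step 3 of 7": 172,
  "step 4 of 7": 69940,
  "step 5 of 7": 40,
  "step 6 of 7": 143,
  "step 7 of 7": 2,
  to: "UE_RS_12",
  from: "UE_RS_1",
  note: "success",
};

function summarizeSteps(d) {
  // Keep only the "step N of M" fields; skip to/from/note.
  const steps = Object.entries(d).filter(([k]) => /^step \d+ of \d+$/.test(k));
  const totalMs = steps.reduce((sum, [, ms]) => sum + ms, 0);
  const [slowestName, slowestMs] = steps.reduce((a, b) => (b[1] > a[1] ? b : a));
  return { totalMs, slowestName, slowestMs, share: slowestMs / totalMs };
}

const s = summarizeSteps(details);
console.log("total:", (s.totalMs / 1000).toFixed(1), "s");
console.log("slowest:", s.slowestName, (s.share * 100).toFixed(1) + "% of total");
// Upper bound on throughput, assuming the chunk is a full 64 MB.
console.log("at most", (64 / (s.totalMs / 1000)).toFixed(2), "MB/s");
```

Running the same function over many changelog entries would show whether step 4 dominates for every destination or only for some.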

How can it be so slow for a 64 MB chunk?
I don't think this is normal behaviour.
Maybe that's because we have a hashed shard key?


Weishan Ang

Dec 5, 2017, 10:14:27 AM
to mongodb-user
Step 4 is supposed to be the slowest of all, as it covers the time taken by steps 1 to 7 (I believe) on the destination.

How is the IO/CPU/network load on the destination?

Nikolay Dmitriev

Dec 12, 2017, 7:51:05 AM
to mongodb-user
Thanks, everybody, for the answers.

We finally managed to speed up balancing from the problem shard by:

1. dropping all the secondaries from that shard's replica set
2. live-migrating the VM to another, comparatively unloaded physical host (the original host's VMs were competing with each other for resources)
3. temporarily giving the VM 4x the memory

Since we did that, migration has become ~5x faster.
