Need some advice on moving from replica set to sharded cluster


Travis Beauvais

Sep 4, 2015, 6:28:42 PM9/4/15
to mongodb-user
I am in the process of moving a replica set to a sharded cluster and have a couple questions.

Things to know:
- I waited way too long to do this. It should have been done months ago.
- The total data size is about 800 GB. The collection I am sharding is about 85% of that, around 600 million documents.

Yesterday I attempted to make this switch and the mongos nearly died. Here is what I did.

1. Stopped all clients connecting to the replica set.
2. Added the replica set as a shard through the mongos.
3. sh.enableSharding() for the database.
4. sh.shardCollection() for the collection.
5. Switched all clients to point to the mongos.
6. Turned on clients.
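For reference, steps 2-4 above correspond roughly to these mongo shell commands run against the mongos (the replica set name, hostnames, database, collection, and shard key below are illustrative placeholders, not the actual ones from this thread):

```javascript
// Step 2: add the existing replica set as a shard.
sh.addShard("rs0/rs0-member1.example.net:27017")

// Step 3: enable sharding on the database.
sh.enableSharding("mydb")

// Step 4: shard the collection on its shard key.
sh.shardCollection("mydb.mycoll", { a: 1, b: 1 })
```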

This all worked fine for about 20 minutes. Then suddenly all clients froze. The mongos was reporting 20k+ open connections. After a few minutes the mongos froze; I couldn't even SSH into the machine for a while. At that point I switched all clients back to the replica set.

How long after enabling sharding should it be safe to start connecting through the mongos? Do I need to wait for the initial "balancing" to be over?
After sh.shardCollection(), is it still safe for clients to connect directly to the replica set?


Wan.Bachtiar

Sep 8, 2015, 12:58:27 AM9/8/15
to mongodb-user


Hi Travis, 

For further investigation on this, would you be able to provide the following information:

1. What version of MongoDB and OS are you using?
2. How many mongos instances are there? How many connections does each client make to the mongos?
3. How many shards are there?
4. What are the ulimits set to on the mongos host?

 How long after enabling sharding should it be safe to start connecting through the mongos? Do I need to wait for the initial "balancing" to be over? 

It should be safe to connect through the mongos right away. However, with a collection of that size (>400 GB), you will want to look at increasing the chunk size so that the initial collection split can complete successfully.

Have a read of this blog post: sharding pitfalls. Point 7 discusses sharding a collection greater than 400 GB.
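If it helps, one way to raise the chunk size is to update the settings collection in the config database from a mongo shell connected to the mongos (the 1024 MB value below is just an example, not a recommendation for this workload):

```javascript
// Chunk size is stored in megabytes; the default is 64 MB.
// save() upserts the settings document by _id.
db.getSiblingDB("config").settings.save({ _id: "chunksize", value: 1024 })
```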

After sh.shardCollection(), is it still safe for clients to connect directly to the replica set?

In a sharded cluster, your application clients should only interact through the mongos. 

If you could provide the extra information requested above, it will help us recommend the next steps.

Regards, 
Wan

Travis Beauvais

Sep 8, 2015, 11:49:05 AM9/8/15
to mongodb-user
1. MongoDB 3.0.5 and Amazon Linux (essentially CentOS).
2. 1 mongos instance. The first time I attempted to switch to sharding, the number of connections shot up to over 20k after I turned the clients on. The second time the number of connections never went over 100.
3. Right now there is only 1 shard. I'm still attempting to move to a sharded cluster so I can add more shards.
4. Here are my ulimit settings:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 30002
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 30002
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited


Thanks for pointing out the chunk size issue. For some reason I was under the impression the chunk size couldn't be changed, but maybe I had it confused with a different setting. I'll give that a try and see how it goes.

Travis Beauvais

Sep 9, 2015, 2:13:45 AM9/9/15
to mongodb-user
So I tried again, this time with the chunk size much higher. When I run sh.shardCollection() it just hangs and doesn't do anything. There is nothing in the mongos log, the config server logs, or the mongod logs.

I also saw someone else mention something about the index, which reminded me that the index I have on the collection has 3 fields: {a: 1, b: 1, c: -1}. But since you can't have a field in a shard key in descending order, the shard key is only {a: 1, b: 1}. The docs say this is OK, but I wanted to mention it.
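For what it's worth, a shard key that is a prefix of an existing compound index should be usable without building a new index, e.g. (sketch; the namespace is a placeholder):

```javascript
// The existing compound index {a: 1, b: 1, c: -1} can back a shard key
// on its ascending prefix {a: 1, b: 1}.
sh.shardCollection("mydb.mycoll", { a: 1, b: 1 })
```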

Wan Bachtiar

Sep 14, 2015, 5:03:30 AM9/14/15
to mongodb-user
Hi Travis, 

Thanks for the extra information. The ulimit settings look fine to me. 

Did you change the chunk size through the mongos startup options or the config database?
Can you confirm the current chunk size value using:

db.getSiblingDB("config").settings.find({_id:"chunksize"})


Here are a few other things to check and/or try:

1) Check the locks collection in the config database.

db.getSiblingDB("config").locks.find()


Find out whether there is an existing lock for the database/collection. The lock ensures that only one mongos instance can perform administrative tasks on the cluster at a time. A lock left over from a previous failed or aborted shardCollection may be preventing the current one from proceeding.
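A sketch of narrowing that query down to locks that are actually contended or held (a state greater than 0 means the lock is not free):

```javascript
// List only the config.locks documents that are currently taken.
db.getSiblingDB("config").locks.find({ state: { $gt: 0 } })
```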

2) Check for the total collection index size using collStats

db.collection.stats()

Compare the size of the collection's indexes against the machine's RAM. Since the mongos splits the collection based on the shard key index, the machine may be running out of memory trying to load the compound index for a collection of that size (~800 GB of data).
What is the total RAM size on the mongod instance?
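A quick way to pull just the index sizes out of collStats (database and collection names are placeholders; the scale argument converts bytes to MB):

```javascript
// Fetch collection statistics scaled to megabytes.
var stats = db.getSiblingDB("mydb").mycoll.stats(1024 * 1024)
print("total index size (MB): " + stats.totalIndexSize)
printjson(stats.indexSizes)  // per-index breakdown, same scale
```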

3) Try spawning a dedicated mongos instance for the shardCollection().
In your scenario, with only 1 mongos instance and a large collection (>400 GB) to shard,
a separate mongos instance could improve responsiveness for clients of the existing mongos while the shardCollection() runs.
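Starting a second mongos for the administrative work might look like this (config server hostnames, port, and log path are placeholders; on MongoDB 3.0 the --configdb value is the comma-separated list of config servers):

```shell
# Dedicated mongos for running shardCollection(), on a separate port.
mongos --configdb cfg1.example.net:27019,cfg2.example.net:27019,cfg3.example.net:27019 \
       --port 27018 \
       --logpath /var/log/mongodb/mongos-admin.log \
       --fork
```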


Regards, 

Wan
