Connection String Format for Sharded Cluster (v3.2)


fmtriple

Nov 19, 2016, 6:20:05 PM11/19/16
to mongodb-user
I deployed a MongoDB (v3.2) cluster and I'm having issues connecting to the entire cluster via Mongo Shell. The architecture I deployed is what is detailed in "Figure 2" of this doc: http://docs.aws.amazon.com/quickstart/latest/mongodb/architecture.html. I'm able to connect to each shard (in a replica set) successfully, but I'm unable to connect to the entire cluster via the mongos router, which is running on each Primary in each shard. When I use the Mongo Shell to try a connect to each primary in a single URI, I get an error "Error: Cannot list multiple servers in URL without 'replicaSet' option". What would be the correct connection string format to connect to the entire MongoDB cluster?
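For concreteness, the two URI shapes involved look something like this (hostnames here are placeholders, with mongod on 27018 and mongos on 27017 per the quickstart):

```shell
# Hypothetical hostnames. Listing several mongod hosts in one URI is what
# triggers the error unless the replicaSet option is included:
RS_URI="mongodb://primary1.example.com:27018,secondary1.example.com:27018/test?replicaSet=rs0"

# A mongos router, by contrast, is addressed as a single endpoint:
MONGOS_URI="mongodb://router.example.com:27017/test"

# Print the two shell invocations:
echo "mongo \"$RS_URI\""
echo "mongo \"$MONGOS_URI\""
```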

Alex Penazzi

Nov 20, 2016, 10:27:52 AM11/20/16
to mongodb-user
Hello fmtriple,

Your question is very generic, and without details it is almost impossible to give any help.

I suppose your tests originate locally on the mongos VM.
So when you say "I'm able to connect to each shard", I guess you mean: I am able to run mongo on each node, and I am able to connect both locally and remotely to other nodes of the same replica set.

To help, it would be useful to understand how you deployed this configuration:
number of shards,
configured TCP ports for the mongod and mongos nodes,
whether you deployed in the same Availability Zone or in different ones,
whether you configured security groups to allow MongoDB traffic appropriately.
alex

fmtriple

Dec 2, 2016, 11:30:26 PM12/2/16
to mongodb-user
Thanks for your response, Alex. The cluster was deployed in AWS and matches the design in Figure 2 in the document I linked. It is a two way sharded cluster, and each shard is in a replica set. There are three Config servers, so the total instance count is nine. mongos is running on the primary replica member in each shard (port 27017), and mongod is running on all members (port 27018). I have all of the security groups set properly, and the cluster is functioning. I am able to connect to each replica set without an issue via the Mongo shell. I would like to connect to both shards (each in their own replica set) at the same time via the Mongo shell. If I need to clarify anything please let me know.
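For reference, this is roughly the form of command I use to connect to a single replica set successfully (hostnames are placeholders):

```shell
# Hypothetical hostnames; mongod runs on port 27018 as described above.
# The rs0/ prefix names the replica set so the shell accepts multiple hosts.
mongo --host rs0/primary1.example.com:27018,secondary1.example.com:27018,secondary2.example.com:27018
```

This command requires a live cluster, so it is shown here only as a CLI fragment.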

Aly Cabral

Dec 6, 2016, 5:42:58 PM12/6/16
to mongodb-user

Hi fmtriple,

It is best practice to keep your mongos as close to your application server as possible. Typically a mongos is run on the same system as the application server, not on any part of a shard. You will only need to connect to a single mongos to connect to the entire MongoDB cluster.

According to your description and the Amazon architecture diagram, it looks like you are running v3.2 with 3 config servers and two replica sets that act as shards. It should be noted that in the latest version of MongoDB, v3.4, the config servers are required to be a replica set (an option in v3.2).

As seen in the tutorial Deploy a Sharded Cluster, under Connect a mongos to the Sharded Cluster, you need to set up your mongos query routers to connect to the entire cluster. As the tutorial illustrates, if your config servers are 3 mirrored servers, you will need to point your mongos to them with the following command:

mongos --configdb cfg1:port,cfg2:port,cfg3:port

Or if your config servers are a replica set:

mongos --configdb configreplset/cfg1:port,cfg2:port,cfg3:port

Then you can connect to the mongos to access the entire cluster:

mongo --host <hostname> --port <port>

Each mongos is aware of the entire cluster. Your application, in this case the MongoDB Shell, will only need to connect to one mongos to see everything.
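Once connected through a mongos, one way to confirm you are seeing the whole cluster is to print the sharding status (hostname here is a placeholder):

```shell
# Run against the mongos on port 27017; sh.status() lists the shards,
# config servers, and sharded databases the router knows about.
mongo --host router.example.com --port 27017 --eval "sh.status()"
```

This requires a live cluster, so it is shown only as a CLI fragment.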

Hope that helps! Please reach out with any other questions.

Aly

fmtriple

Dec 7, 2016, 10:20:43 AM12/7/16
to mongodb-user
Aly,

Thank you for the clear explanation. I do have concerns about network bottlenecking, since the AWS MongoDB reference architecture uses small instance types for the Config Servers. Per the documentation, the reference deployment configured mongos on each Primary node: "This reference deployment runs one query router on primary nodes. Additional routers can be started manually, if needed." Hundreds of instances will connect to the cluster via mongos, so which option would be the preferred method?

Aly Cabral

Dec 10, 2016, 12:38:06 AM12/10/16
to mongodb-user

Hi there,

I noticed that in my last post my links didn’t come through correctly, so here they are: mongos, to connect to a single mongos, (an option in v3.2), Deploy a Sharded Cluster

For failover, you want to make sure your mongos does not live on any member of your shard. Let’s say you keep your current architecture and for some reason a primary goes down. You can elect a new primary with the remaining two nodes in the replica set; however, you will no longer have access to that mongos, and everything will need to be routed through your only remaining mongos. Now, let’s imagine both of your primary nodes go down. You will not have access to any mongos, which would effectively cut off access to the entire cluster, even though the cluster would still be operational due to the high availability of replica sets.

In addition, for increased performance, you would always want your mongos as close as possible to the application making the queries. That way, you can make a short trip to the mongos to check which shard your data lives on, then make the long trip to the right shard to retrieve the data. If your mongos lives on the shard itself, your queries would take a long trip to the mongos, could find that the shard it’s on does not contain the data needed to complete the query, and would have to make another long trip to a different shard. Keeping the mongos close to the application helps ensure efficient querying. I would recommend having a mongos on each application server making the queries.

Data in the config servers is relatively small compared to that of the shards, so the small instances may not be a problem. However, it is important that the config servers have good network performance, because every part of the cluster needs to communicate with the config servers, and every operation that modifies cluster metadata (for example, a chunk split or chunk migration) needs to be communicated to them. I would suggest monitoring this; you can always upgrade instances as required for the workload. See Capacity Planning and Hardware Provisioning for some guidance.

The example deployment from AWS should be used for reference and documentation purposes only. It is not intended to be a universal topology and you should expect to modify the architecture according to your use case. Sharding requires some careful planning. Check out Considerations before sharding. Before heading to production, it is very important to test with your use case and expected workload to determine the best topology.

Aly
