pymongo sharded cluster failover

79 views
Skip to first unread message

Umesh Deshpande

unread,
Mar 4, 2017, 3:15:54 AM3/4/17
to mongodb-user
Hi,

How can I use pymongo (or any other driver) to start a sharded cluster. Pymongo allows us to connect to mongos/es (query routers), which can then connect to multiple shards. But how can then pymongo detect a primary node (mongod) failure, since it's only provided with mongos IP:port and not provided that of mongods when starting the MongoClient.

E.g. client = MongoClient('mongodb://host1,host2,host3')

Thanks,
Umesh

Bernie Hackett

unread,
Mar 4, 2017, 1:15:48 PM3/4/17
to mongodb-user
PyMongo doesn't have to know about failovers in a sharded  cluster. mongos deals with failovers. PyMongo just talks to mongos.

Umesh Deshpande

unread,
Mar 4, 2017, 2:31:35 PM3/4/17
to mongodb-user
Thanks for the response.

pymongo, when connected directly (w/o mongos) to the replica set, monitors the shard servers. It can detect a failure and block IO operations until the failover is complete. Would we see the same behavior if we connect to mongos through pymongo, or would it throw an exception, which the application has to handle and retry? I wanted the pymongo's blocking/monitoring capability for the shard servers, so the apps don't have to retry.

- Umesh

Bernie Hackett

unread,
Mar 4, 2017, 2:43:44 PM3/4/17
to mongodb-user
mongos monitors the shard servers using the same algorithm as PyMongo. They should both behave the same way. That said, "so the apps don't have to retry" is incorrect. Regardless of connecting PyMongo to a mongos or directly to a replica set, your application still has to handle write errors and potentially retry write operations. PyMongo and mongos can't predict a replica set failover, only attempt to lessen the impact once a failover occurs.

Umesh Deshpande

unread,
Mar 4, 2017, 7:33:33 PM3/4/17
to mongodb-user
I misspoke, yes, the apps have to retry, but with pymongo, the apps have to retry only once (after receiving an exception), on the second try pymongo keeps the IO enqueued until the failover is complete [1]. With mongos, each time an exception is returned, so the apps have to keep retrying until the failover is complete.
Since posting here I found a code from Jesse Davis (from MongoDB), where in pymongo driver, using MongoClient, he connects to the mongos as well as all the shard servers (mongods) [2]. Is that the way to implement a sharded cluster?

- Umesh

Bernie Hackett

unread,
Mar 6, 2017, 10:41:53 AM3/6/17
to mongodb-user
If all you want is a python script to start up a sharded cluster on localhost Jesse's script will do it. mongo-orchestration is another option.

If instead you are trying to build an application in python that connects to a sharded cluster, PyMongo must connect to mongos, not directly to the shards. The reason is that PyMongo doesn't know how chunks of data are mapped to shards, so it doesn't know how to resolve queries or how to route write operations, or any of the other things that are handled by mongos and the config servers.

Umesh Deshpande

unread,
Mar 8, 2017, 1:09:54 PM3/8/17
to mongodb-user
I meant that, as in the AJ Davis's code, the pymongo can connect to all the shard servers only for monitoring, but perform the IO through the MongoClient connected to the mongos. I was wondering under such a setup, would the IO requests to the mongos be enqueued by pymongo if any shard server failed.

- Umesh

Bernie Hackett

unread,
Mar 20, 2017, 4:33:33 PM3/20/17
to mongodb-user
PyMongo does not monitor or even connect to the individual shards. When authentication is enabled, or the sharded cluster is running in a cloud services platform, PyMongo can't monitor or connect to the individual shards even if we wanted it to. The mongos instances monitor the shards, not the client driver.
Reply all
Reply to author
Forward
0 new messages