Replica Sharded Cluster D/R Inquiry

MongoUser

Jul 31, 2015, 9:47:56 PM

to mongodb-user

Greetings!

I have a question about the replicated sharded cluster setup (http://docs.mongodb.org/manual/tutorial/convert-replica-set-to-replicated-shard-cluster/) and how it handles failover at the shard level, where each shard is a replica set. Writes are happening continuously from an API instance through the shard router (mongos). When I fail over one of the shards manually by running rs.stepDown() in the mongo shell, the router can no longer perform inserts into the database, even after the replica set recovers and a primary is elected.

The following outlines the network topology:

API (1)
   |
Routers (2)
   |
Config Servers (3)
   |
Primary (Shard #1)
   /        \
Secondary    Arbiter

The end result is that the API instance can no longer write data until the router is restarted. After a manual restart of the router, the API instance reconnects and writes continue as expected. It seems as if the driver on the API side has no notion that its writes are failing. For reference, the API instance is node.js and uses mongoose.js to interact with the database.
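As an illustration of the kind of client-side handling in question, here is a minimal TypeScript sketch that retries a write with capped exponential backoff while a shard elects a new primary. `writeFn`, `retryWrite`, and `backoffDelay` are hypothetical names, not mongoose or MongoDB driver APIs, and the attempt counts and delays are arbitrary assumptions. A retry loop like this can only help if the router itself recovers; it does nothing for a router that stays stuck until restarted.

```typescript
// Hypothetical stand-in for whatever issues the insert (e.g. a mongoose save).
// This is NOT a mongoose or driver API.
type WriteFn<T> = () => Promise<T>;

// Resolve after `ms` milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Exponential backoff for a 0-based attempt number, capped at capMs.
function backoffDelay(attempt: number, baseMs: number, capMs: number): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Call writeFn up to maxAttempts times, sleeping between failures
// (e.g. while the shard has no primary). Rethrows the last error
// if every attempt fails.
async function retryWrite<T>(
  writeFn: WriteFn<T>,
  maxAttempts = 5,
  baseMs = 100,
  capMs = 2000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await writeFn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelay(attempt, baseMs, capMs));
      }
    }
  }
  throw lastError;
}
```

Capping the delay keeps a long election from pushing waits out indefinitely, while the bounded attempt count makes sure the API eventually surfaces the error instead of hanging.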

What is the standard operating procedure in production MongoDB deployments for handling this case?
Restarting routers in production doesn't seem right. Has anyone deployed with this configuration and tested failing over their replica set shards?