Greetings!
I have a question about the replicated sharded cluster setup
(http://docs.mongodb.org/manual/tutorial/convert-replica-set-to-replicated-shard-cluster/)
and how it handles failover at the shard level with replica sets.
Writes are continuously sent to the database through the shard router
(mongos) from an API instance. When one of the shards (i.e., a replica
set) fails over, which I trigger manually by running rs.stepDown() in
the mongo shell, I notice that even after the replica set recovers, the
router can no longer perform inserts into the database.
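One way to confirm that the shard's replica set has actually recovered (that is, elected a new primary) before looking at the router is to inspect rs.status() from a shell connected directly to a member of that replica set. The sketch below runs in the mongo shell, which is JavaScript. findPrimary is a hypothetical helper; only rs.stepDown(), rs.status(), its members array, and each member's name and stateStr fields come from the shell itself.

```javascript
// Hypothetical helper: given the document returned by rs.status(),
// return the host:port of the member currently reporting PRIMARY,
// or null if no primary has been elected yet (e.g. mid-election).
function findPrimary(status) {
  var members = status.members || [];
  for (var i = 0; i < members.length; i++) {
    if (members[i].stateStr === "PRIMARY") {
      return members[i].name;
    }
  }
  return null;
}

// Usage in the mongo shell, connected directly to a replica set member:
//   rs.stepDown(60);           // current primary won't seek re-election for 60 s
//   findPrimary(rs.status());  // null during the election, then the new primary
```

If findPrimary returns a host while inserts through the router still fail, that points at the router rather than at the replica set.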
The following outlines the network topology:
            API (1)
               |
          Routers (2)
               |
       Config Servers (3)
               |
       Primary (Shard #1)
              / \
             /   \
      Secondary   Arbiter
The end result is that the API instance can no longer write data until
the router is restarted. After a manual restart of the router, the API
instance reconnects and writes continue as expected. It seems as if the
API's driver has no notion that its queries are failing. For reference,
the API instance is node.js, using mongoose.js to interact with the
database.
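Separately from the router problem, a common application-side mitigation is to retry a failed write with exponential backoff, so that a short "no primary" window during an election is absorbed instead of dropped. This only helps if writes are acknowledged, so that the driver actually reports the failure, and it will not by itself recover a router that stays stuck. A minimal sketch in plain node.js, assuming the write is wrapped in a function that returns a Promise and rejects on error. withRetry and its retries, baseDelayMs and maxDelayMs options are hypothetical names, not part of the Mongoose API.

```javascript
// Hypothetical helper: resolve after `ms` milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper: run `operation` (a function returning a Promise),
// retrying on rejection with exponential backoff. Rethrows the last error
// once all attempts (1 initial + `retries` retries) have failed.
async function withRetry(operation, { retries = 5, baseDelayMs = 200, maxDelayMs = 5000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt === retries) break;
      // Delay doubles each attempt (200, 400, 800 ms, ...) up to maxDelayMs.
      const delay = Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
      console.warn(`write failed (attempt ${attempt + 1}/${retries + 1}): ${err.message}; retrying in ${delay} ms`);
      await sleep(delay);
    }
  }
  throw lastError;
}
```

In the API, a write such as a Mongoose document save would be passed in as a closure, e.g. withRetry(() => doc.save()), assuming the save returns a Promise in the Mongoose version in use.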
- What is the standard operating procedure in MongoDB production
deployments for handling this case?
- Restarting routers in production doesn't seem right. Has anyone
deployed with this configuration and tested failing over their replica
set shards?
Thanks in advance for any feedback and suggestions!