At the moment I believe Mongo's failover time is a problem that needs more work. In an active, demanding production system, 20+ seconds of failover, especially at a time of peak or highly visible usage, is a significant problem. In reading this and other posts on this topic, I see a couple of things:
- Concern that allowing administrative configuration of the timeout value would lead to flapping, with failovers falsely triggered by intermittent network issues or delays.
- A comparison of Mongo's failover speed versus the speed with which the OS/TCP detects a socket timeout.
For the second, the comparison may be factually accurate, but it is not the right one: the comparison should be to how fast competing database products respond and recover. SQL Server in a cluster environment, for example, fails over much, much faster than 20 seconds, as does MySQL under the right configuration.
For the first, two responses:
- I don't see why this concern should mean that no configuration of the timeout can be allowed. Connection speed and network issues are local to an installation. Ping times between our primary and secondary instances are in the 1-2 ms range, leaving lots of room for network issues to clear up even if I were able to set the heartbeat to 100 or 200 ms. I don't know the optimal setting right now, but whatever it is may very well be different for a different setup or installation.
- Perhaps how MongoDB elections respond to failover could be rethought in a way that would reduce the issue. For example, if a failover occurs and a new primary is elected, it would remain the primary until it is restarted or some new database command is issued telling it to trigger a new election under the original configuration. This would require an alteration in the voting/election mechanisms in Mongo, but - having reviewed no code ;-) - I don't think this would be a big change.
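To make the second suggestion concrete, here is a minimal toy model of a "sticky" primary. It is only a sketch of the idea and not MongoDB's election code: the names `Member`, `StickyReplicaSet`, `elect` and `restore_original_config` are all hypothetical, and a real election involves voting among members rather than a simple priority comparison.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    name: str
    priority: int          # higher priority is preferred as primary
    healthy: bool = True


class StickyReplicaSet:
    """Toy model: after a failover the new primary keeps its role until
    an operator explicitly asks for the original configuration back."""

    def __init__(self, members: List[Member]) -> None:
        self.members = members
        self.primary: Optional[Member] = None
        self.sticky = False  # set once a failover has happened

    def elect(self) -> Optional[str]:
        current = self.primary
        # A healthy post-failover primary is never displaced automatically.
        if current is not None and current.healthy and self.sticky:
            return current.name
        candidates = [m for m in self.members if m.healthy]
        winner = max(candidates, key=lambda m: m.priority, default=None)
        # Replacing a failed primary counts as a failover: become sticky.
        if current is not None and not current.healthy and winner is not None:
            self.sticky = True
        self.primary = winner
        return winner.name if winner else None

    def restore_original_config(self) -> Optional[str]:
        # Hypothetical admin command: drop stickiness and re-run the
        # election under the original priorities.
        self.sticky = False
        return self.elect()


a, b = Member("a", priority=2), Member("b", priority=1)
rs = StickyReplicaSet([a, b])
rs.elect()                    # "a" wins the initial election
a.healthy = False
rs.elect()                    # failover to "b"
a.healthy = True
rs.elect()                    # still "b": no automatic fail-back, so no flapping
rs.restore_original_config()  # operator restores "a" as primary
```

The point of the design is that a brief network blip costs at most one role change; the return to the original primary happens only when someone decides the network is stable again.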
Another way of putting my points: if there's a trade-off that can be made, where there is some reduction in election flexibility, some increase in administrative/DBA work in the event of a failover and recovery (i.e., new commands have to be entered in the mongo shell to restore the normal election/replication configuration), and some greater responsibility on me to tune heartbeat configuration in line with my local characteristics, and in exchange I get failover in the 500 ms to 1000 ms range (or even better) rather than the 15-20 second range, I would take that trade-off.
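As a rough back-of-the-envelope check on those numbers, the sketch below uses a simplified failure-detector model. This is an assumption for illustration, not how MongoDB actually computes anything: a member is declared down after `missed_beats` consecutive heartbeats go unanswered, and each heartbeat waits `reply_timeout_ms` for a reply. Both parameter names are hypothetical.

```python
def worst_case_detection_ms(interval_ms: float,
                            reply_timeout_ms: float,
                            missed_beats: int) -> float:
    """Worst case: the primary dies right after answering a heartbeat.
    The k-th following heartbeat is sent at k * interval and gives up
    reply_timeout_ms later, so detection completes after the last one."""
    return missed_beats * interval_ms + reply_timeout_ms


def headroom_over_rtt(reply_timeout_ms: float, rtt_ms: float) -> float:
    """How many round-trip times fit inside one reply timeout, as a
    crude measure of tolerance for network hiccups before a false alarm."""
    return reply_timeout_ms / rtt_ms


# 200 ms heartbeat, 200 ms reply timeout, 2 missed beats tolerated
print(worst_case_detection_ms(200, 200, 2))  # 600
# 100 ms reply timeout against a 2 ms ping
print(headroom_over_rtt(100, 2))             # 50.0
```

Under these assumptions, a 100-200 ms heartbeat on a link with 1-2 ms ping times lands detection in the sub-second range while still leaving a timeout of roughly 50 to 200 round trips. Election time itself would come on top of this and is not modeled here.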