Failover Internals and Timing

Cwolf

Oct 8, 2012, 3:01:09 AM

to mongod...@googlegroups.com

Hello - It seems that the replication heartbeat is not configurable. In 1.8 it was (did not check 2.0). Are there any details on how heartbeat and fail-over timings work? What is the timing on fail-overs now (it seems fast - but need to understand). We are looking to get sub second fail-over working in an environment where all machines are on a single switch.

Thanks

Cwolf

Oct 8, 2012, 10:59:43 AM

to mongod...@googlegroups.com

Some more information:

I just did some experiments and it is taking up to 15 seconds to fail-over on a primary step down. This seems ridiculous. Does anyone know if this is configurable?
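For anyone wanting to reproduce this kind of measurement, here is a minimal sketch in Python. It is not the script used for the experiment above. It polls a caller-supplied `probe` function (a hypothetical callable that returns True when a write to the replica set succeeds) and reports how long the outage lasted. The names `measure_outage`, `probe`, `clock` and `sleep` are all made up for illustration.

```python
import time
from typing import Callable, Optional


def measure_outage(probe: Callable[[], bool],
                   poll_interval: float = 0.05,
                   max_wait: float = 60.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> Optional[float]:
    """Return the seconds between the first failed probe and the next success.

    Returns None if no outage is seen, or if it does not end within max_wait.
    """
    start = clock()
    down_since = None
    while clock() - start < max_wait:
        ok = probe()
        now = clock()
        if not ok and down_since is None:
            # First failed probe: the primary has become unavailable.
            down_since = now
        elif ok and down_since is not None:
            # First successful probe after the outage: a primary is back.
            return now - down_since
        sleep(poll_interval)
    return None
```

The result is only as precise as `poll_interval`, so a 50 ms poll is plenty for failovers measured in seconds. Start the loop, trigger the step down from another shell, and read off the returned value.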

Stephen Steneker

Oct 12, 2012, 2:19:16 AM

to mongod...@googlegroups.com

Hi,

You are correct that the replication heartbeat is not configurable. The heartbeat request can either receive a response, an error, or a timeout. Failover should happen within ~20 seconds.

The current documentation is: http://www.mongodb.org/display/DOCS/Replica+Set+Internals

Failing too fast (sub-second, in particular) generally isn't a positive/desirable outcome as you can cause flapping in the event of transient network issues.
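To make the flapping concern concrete, here is a small sketch in Python. It is an illustration only and does not show how MongoDB decides that a member is down. It takes a trace of heartbeat round-trip times from a node that never actually failed and counts how often a detector with a given timeout would have declared it dead. The function name, the `misses_required` parameter and the sample latencies are all invented for the example.

```python
def false_failovers(rtts_ms, timeout_ms, misses_required=1):
    """Count how many times a healthy node would be declared failed.

    rtts_ms: heartbeat round-trip times in ms; every heartbeat was
             eventually answered, so any declared failure is a false one.
    timeout_ms: a heartbeat slower than this counts as a miss.
    misses_required: consecutive misses needed before declaring failure.
    """
    declared = 0
    streak = 0
    for rtt in rtts_ms:
        if rtt > timeout_ms:
            streak += 1
            if streak == misses_required:
                declared += 1  # count each run of misses only once
        else:
            streak = 0
    return declared


# Made-up trace: mostly 1-2 ms, with two transient network hiccups.
trace = [1, 2, 1, 350, 2, 1, 800, 900, 1, 2]

aggressive = false_failovers(trace, timeout_ms=200)      # 2 false failovers
conservative = false_failovers(trace, timeout_ms=1000)   # 0 false failovers
```

On this trace a 200 ms timeout triggers two unnecessary elections, while a 1000 ms timeout rides out both hiccups. Requiring two consecutive misses at 200 ms cuts the count to one, at the cost of slower detection.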

The replica set failover is generally still faster than the default TCP timeout setting (which, depending on your O/S, can be up to a few minutes).

Cheers,

Stephen

Sean Kidder

Jun 24, 2013, 10:48:41 AM

to mongod...@googlegroups.com

Stephen:

At the moment I believe Mongo's failover speed is a problem that needs more work. In an active, demanding production system, a failover of 20+ seconds, especially at a time of peak or highly visible usage, is a significant problem. In reading this and other posts on this topic, I see a couple of things:

1. Concern that allowing administrative configuration of the timeout value would lead to flapping, with failovers falsely triggered by intermittent network issues or delays.
2. A comparison of Mongo's failover speed with the speed at which the OS/TCP detects a socket timeout.

For the second, the comparison may be factually accurate, but it is not the right one - the comparison should be to how fast competing database products respond and recover. SQL Server in a cluster environment, for example, fails over much, much faster than 20 seconds, as does MySQL under the right configuration.

For the first - 2 responses.

1. I don't see why this situation should mean that no configuration of the timeout can be done. Connection speed and network issues are local to an installation. Ping times between our primary and secondary instances are in the 1-2 ms range - leaving lots of room for network issues to clear up even if I were able to set the heartbeat to 100 or 200 ms. I don't know the optimal setting right now, but whatever it is may very well be different for a different setup or installation.
2. Perhaps the way MongoDB elections respond to failover could be rethought to reduce the issue - perhaps if a failover occurs and a new primary is elected, it would remain the primary until it is restarted or some new database command is issued telling it to trigger a new election under the original configuration. This would require an alteration in the voting/election mechanisms in Mongo, but - having reviewed no code ;-) - I don't think this would be a big change to anything.

Another way of stating my points: if there's a trade-off that can be made, where there is some reduction in election flexibility, some increase in administrative/DBA work in the event of a failover and recovery (i.e. new commands have to be entered in the mongo shell to restore the normal election/replication configuration), and some greater responsibility on me to tune the heartbeat configuration in line with my local characteristics - and in exchange for that I get failover more in the 500 ms or 1000 ms range (or even better) and not the 15-20 second range - I would take that trade-off.

Sean Kidder

Nov 5, 2013, 10:31:04 AM

to mongod...@googlegroups.com

Has there been any update on this?

Asya Kamsky

Nov 5, 2013, 5:10:12 PM

to mongodb-user

Sean,

There are several tickets tracking improvements in decreasing occurrences of false failovers, while decreasing the interval it takes to get a real failover. 20 seconds is on the high end of normal - I think the general message was it *could* take as long as, but normally it would be some number of seconds less than that.

I'm not sure 500ms is a reasonable goal for a replica set failover - it is likely that failure to verify that a failover is necessary would lead to more problems due to "false" elections. In any case, it is not the heartbeat interval that's the long pole on the failover, otherwise the failover would never take more than two seconds :)
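The "long pole" point can be put into rough arithmetic. The sketch below is a back-of-the-envelope model in Python, not a description of the actual election code. It assumes a worst case in which the primary dies right after answering a heartbeat, so the failure is only noticed one heartbeat interval plus one heartbeat timeout later, and an election follows. The 2-second interval comes from the remark above; the 10-second timeout and 3-second election are assumed figures not stated in this thread, and the function name is invented.

```python
def worst_case_failover_s(heartbeat_interval_s, heartbeat_timeout_s, election_s):
    """Rough upper bound on failover time under the simple model above.

    Worst case: the primary dies just after replying to a heartbeat, the
    next heartbeat goes out one interval later, that heartbeat has to time
    out, and only then can an election start.
    """
    return heartbeat_interval_s + heartbeat_timeout_s + election_s


# Assumed figures: 2 s interval, 10 s timeout, 3 s election.
baseline = worst_case_failover_s(2.0, 10.0, 3.0)        # 15.0 seconds

# Shrinking only the heartbeat interval barely moves the total.
fast_heartbeat = worst_case_failover_s(0.5, 10.0, 3.0)  # 13.5 seconds
```

Under these assumptions, cutting the heartbeat interval from 2 s to 0.5 s saves only 1.5 s out of 15. The timeout and the election dominate the total, which is consistent with the point that the heartbeat interval is not the limiting factor.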

Asya
