You
must ensure that the election timeout is known and roughly the same across the board.
Otherwise, you'll have instability in the cluster.
My implementation rejects nodes whose election timeout is not within 10% of its own election timeout.
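Here is a minimal sketch of that check in Go; the `Node` and `JoinRequest` types are hypothetical stand-ins, not from any particular Raft library:

```go
package cluster

import (
	"fmt"
	"time"
)

// JoinRequest and Node are hypothetical types, standing in for whatever
// the membership-change message and local node state actually look like.
type JoinRequest struct {
	NodeID          string
	ElectionTimeout time.Duration
}

type Node struct {
	ElectionTimeout time.Duration
}

// ValidateJoin rejects a joining node whose configured election timeout
// deviates from ours by more than 10%.
func (n *Node) ValidateJoin(req JoinRequest) error {
	diff := req.ElectionTimeout - n.ElectionTimeout
	if diff < 0 {
		diff = -diff
	}
	if diff*10 > n.ElectionTimeout {
		return fmt.Errorf("node %s: election timeout %v is not within 10%% of ours (%v)",
			req.NodeID, req.ElectionTimeout, n.ElectionTimeout)
	}
	return nil
}
```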
The usual rule of thumb is that you want the election timeout to be roughly 5-10 times the ping time between instances.
I'm getting a 250 ms ping time to US West from my location, so I'll probably use 3 seconds as the election timeout.
For Sydney, on the other hand, I'm getting 475 ms, which means I'll put the election timeout at 5 seconds.
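To make that arithmetic explicit, here is a small Go sketch; `electionTimeoutFor` is a hypothetical helper that just takes the 10x end of the rule of thumb and rounds to whole seconds:

```go
package main

import (
	"fmt"
	"time"
)

// electionTimeoutFor applies the 10x end of the heuristic and rounds to
// the nearest whole second, which reproduces the choices above.
func electionTimeoutFor(ping time.Duration) time.Duration {
	return (10 * ping).Round(time.Second)
}

func main() {
	fmt.Println(electionTimeoutFor(250 * time.Millisecond)) // 3s (US West)
	fmt.Println(electionTimeoutFor(475 * time.Millisecond)) // 5s (Sydney)
}
```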
Note that as usual, we need to balance liveness with failure detection.
The idea of dynamically changing the election timeout is... suspect.
First, you'll need to run this change through the cluster as well, so that a majority of the nodes are up to date on the new value.
Second, you are assuming that the network conditions are stable. It is very common to get wildly different times.
Just in the time it took me to write this email, I checked the ping to
Beijing a few times and got: [1126 ms, 295 ms]. And to Ningxia: [388 ms, 3085 ms].
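With samples that noisy, any scheme that sizes the timeout off the latest ping is built on sand. A sketch, assuming you would at least sample repeatedly and size off the worst observation (`worstOf` is hypothetical):

```go
package main

import (
	"fmt"
	"time"
)

// worstOf returns the largest sample; sizing off anything more optimistic
// than the worst case invites spurious elections on the next spike.
func worstOf(samples []time.Duration) time.Duration {
	worst := samples[0]
	for _, s := range samples[1:] {
		if s > worst {
			worst = s
		}
	}
	return worst
}

func main() {
	beijing := []time.Duration{1126 * time.Millisecond, 295 * time.Millisecond}
	ningxia := []time.Duration{388 * time.Millisecond, 3085 * time.Millisecond}

	// 10x the worst sample: 11.26s for Beijing, 30.85s for Ningxia.
	// Using only the latest Beijing sample would give ~3s, a 4x underestimate.
	fmt.Println(10 * worstOf(beijing))
	fmt.Println(10 * worstOf(ningxia))
}
```

Note where that lands you: 11+ seconds for Beijing and 30+ seconds for Ningxia, which rather undermines the appeal of "adaptive" timeouts in the first place.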
You also have to account for failures. What happens when some of your nodes have different election timeouts? You may end up in a situation where:
* The election timeout started out as 5 seconds.
* Long period of stability, the election timeout drops to 500 ms.
* Network disruption / slowdown, average ping time is 400 ms now.
* The cluster is now set up in such a way that it cannot elect a leader.
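The arithmetic of that last step, as a sketch; `satisfiesHeuristic` is hypothetical and just checks the 5x lower bound of the rule of thumb:

```go
package main

import (
	"fmt"
	"time"
)

// satisfiesHeuristic checks the 5x lower bound of the rule of thumb.
func satisfiesHeuristic(timeout, ping time.Duration) bool {
	return timeout >= 5*ping
}

func main() {
	timeout := 500 * time.Millisecond // where the adaptive scheme settled
	ping := 400 * time.Millisecond    // after the disruption

	// false: a single vote round trip eats most of the 500 ms window,
	// so candidates keep timing out mid-election and no leader emerges.
	fmt.Println(satisfiesHeuristic(timeout, ping))
}
```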
Another thing to consider is that heartbeat measurements only cover _one_ set of paths in the network: leader to followers.
What about the latency between the other nodes?
Let's assume that the latency from Node 1 to Nodes 2 and 3 is 100 ms, but from Node 2 to Node 3 it is 400 ms.
When Node 1 is the leader, if you set things up based on its heartbeats, you'll tune the timeout for the 100 ms paths, and you'll fail to recover when Node 1 itself goes down: Nodes 2 and 3 now have to run the election over the 400 ms path.
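If you size the timeout off every path instead of just the ones the leader observes, the example works out like this; `requiredTimeout` is a hypothetical helper using the 5x lower bound:

```go
package main

import (
	"fmt"
	"time"
)

// requiredTimeout sizes the election timeout off the worst pairwise
// latency, using the 5x lower bound of the rule of thumb.
func requiredTimeout(latencies map[[2]int]time.Duration) time.Duration {
	var worst time.Duration
	for _, l := range latencies {
		if l > worst {
			worst = l
		}
	}
	return 5 * worst
}

func main() {
	latencies := map[[2]int]time.Duration{
		{1, 2}: 100 * time.Millisecond,
		{1, 3}: 100 * time.Millisecond,
		{2, 3}: 400 * time.Millisecond, // the path the leader never measures
	}

	// 2s. Sizing off Node 1's heartbeats alone would give 500 ms, far too
	// short for Nodes 2 and 3 to elect a replacement over the slow path.
	fmt.Println(requiredTimeout(latencies))
}
```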