On Wednesday, February 19, 2025 at 7:43:30 PM UTC je... wrote:
Fair points, Archie. I think the Raft paper and thesis don't mention the "stepdown if you haven't heard from a majority for a while" rule. It's not essential, but nice to have...
Actually, the dissertation (at least) does discuss this. See page 69 (as numbered;
page 86 in the PDF numbering) in chapter 6 (section 6.2) of the online
version. In other words, this file
where it says,
"Raft must also prevent stale leadership information from
delaying client requests indefinitely. Leadership information
can become stale all across the system, in leaders,
followers, and clients:
• Leaders: A server might be in the leader state,
but if it isn’t the current leader, it could be needlessly
delaying client requests. For example, suppose a leader
is partitioned from the rest of the cluster, but it can
still communicate with a particular client. Without
additional mechanism, it could delay a request from
that client forever, being unable to replicate a log
entry to any other servers. Meanwhile, there might
be another leader of a newer term that is able to
communicate with a majority of the cluster and
would be able to commit the client’s request. Thus,
a leader in Raft steps down if an election timeout
elapses without a successful round of heartbeats
to a majority of its cluster; this allows clients to retry
their requests with another server."
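
A minimal sketch of that step-down check, as one possible reading of the quoted rule and not code from the dissertation or from any particular Raft library. The class name LeaderContactTracker, its fields, and the choice to treat every peer as freshly contacted at the moment of election are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LeaderContactTracker:
    """Hypothetical helper: remembers when each peer last acked a heartbeat."""
    peers: List[str]            # ids of the other servers (leader excluded)
    election_timeout: float     # seconds
    became_leader_at: float     # time this server won its election
    last_ack: Dict[str, float] = field(default_factory=dict)

    def record_ack(self, peer: str, now: float) -> None:
        # Call this whenever a heartbeat to `peer` succeeds.
        self.last_ack[peer] = now

    def should_step_down(self, now: float) -> bool:
        cluster_size = len(self.peers) + 1
        majority = cluster_size // 2 + 1
        reachable = 1  # the leader counts itself
        for peer in self.peers:
            # Assumption: a peer never heard from is dated from the election.
            last = self.last_ack.get(peer, self.became_leader_at)
            if now - last < self.election_timeout:
                reachable += 1
        # Step down once fewer than a majority have acked within the timeout.
        return reachable < majority


tracker = LeaderContactTracker(peers=["b", "c"], election_timeout=1.0,
                               became_leader_at=0.0)
tracker.record_ack("b", now=0.9)
print(tracker.should_step_down(now=1.5))  # False: leader + "b" is a majority of 3
print(tracker.should_step_down(now=2.0))  # True: no peer acked in the last second
```

Passing the clock in as `now` rather than reading it inside the class keeps the check deterministic and easy to test.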
I note, however, that this can cause a lot of leader churn
when manually bootstrapping a new cluster if
you are not very fast about bringing the nodes up.
So I disable this rule for the first minute (or so -- it is configurable),
in the same way that, during that first minute, it is useful
to require communication with all nodes, not just a quorum, so as to detect
misconfiguration (e.g. a bad IP or a port conflict that prevents a node from ever starting)
that might otherwise be hidden by Raft's fault-tolerance
properties.
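
A rough sketch of such a startup grace period, under stated assumptions: the name BootstrapPolicy, its methods, and the 60-second default are hypothetical and are not taken from any actual implementation. It only shows the idea of turning the step-down rule off, and demanding contact with every node instead of a quorum, until the grace period has passed.

```python
from dataclasses import dataclass


@dataclass
class BootstrapPolicy:
    """Hypothetical policy object for the first minute (or so) after startup."""
    started_at: float            # time this node started, in seconds
    grace_period: float = 60.0   # assumed default; meant to be configurable

    def in_grace(self, now: float) -> bool:
        return now - self.started_at < self.grace_period

    def stepdown_enabled(self, now: float) -> bool:
        # The "step down without a majority" rule is off during bootstrap.
        return not self.in_grace(now)

    def required_contacts(self, cluster_size: int, now: float) -> int:
        # Nodes that must be reachable, counting this node itself:
        # every node during bootstrap, a plain majority afterwards.
        if self.in_grace(now):
            return cluster_size
        return cluster_size // 2 + 1


policy = BootstrapPolicy(started_at=100.0)
print(policy.required_contacts(5, now=130.0))  # 5: still inside the grace period
print(policy.required_contacts(5, now=161.0))  # 3: back to a normal quorum
```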