Is Raft's failure detection mechanism weak?

Pablo Pessolani

Jan 7, 2024, 9:25:10 PM
to raft-dev
In the original paper "CONSENSUS: BRIDGING THEORY AND PRACTICE", the timeout seems to be the failure detection mechanism.
On page 34:
  • "Raft uses a heartbeat mechanism to trigger leader election."
  • If a follower receives no communication over a period of time called the election timeout, then it assumes there is no viable leader and begins an election to choose a new leader.
  • A follower increments its current term and transitions to candidate state.
What if the failure detection mechanism (the timeout) misfires because of a benign timing failure, such as a delayed heartbeat while the leader is still healthy? The follower then becomes a candidate and increments its term.
Then, on pages 34-35:
  • While waiting for votes, a candidate may receive an AppendEntries RPC from another server claiming to be leader. ... If the term in the RPC is smaller than the candidate’s current term, then the candidate rejects the RPC.
In this case, the candidate doesn't recognize the current leader, because its term is now greater than the leader's term.
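
The situation described above can be sketched in a few lines. This is only an illustrative model, not code from any Raft implementation: the `Server` class and its method names are hypothetical, and the log-consistency checks that a real AppendEntries handler performs are left out. The "term is at least as large" branch follows the standard Raft rule that a candidate returns to follower state when it sees a legitimate leader.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical, heavily simplified Raft server (terms and roles only)."""
    current_term: int
    state: str = "follower"  # "follower", "candidate", or "leader"

    def on_election_timeout(self) -> None:
        # No communication within the election timeout (possibly a false
        # alarm): bump the term and become a candidate.
        self.current_term += 1
        self.state = "candidate"

    def on_append_entries(self, leader_term: int) -> bool:
        # The RPC carries a smaller term than ours: reject it.
        if leader_term < self.current_term:
            return False
        # Otherwise the sender is a legitimate leader: adopt its term
        # and return to follower state.
        self.current_term = leader_term
        self.state = "follower"
        return True


# A follower at term 3 times out spuriously while the leader is at term 3.
s = Server(current_term=3)
s.on_election_timeout()
accepted = s.on_append_entries(leader_term=3)  # heartbeat from the old leader
```

After the spurious timeout the server sits at term 4, so the heartbeat from the term-3 leader is rejected and the server remains a candidate, which is exactly the state the question asks about.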

It is not clear what to do.

Regards.
PAP
 



Archie Cobbs

Jan 8, 2024, 9:23:01 AM
to raft-dev
On Sunday, January 7, 2024 at 8:25:10 PM UTC-6 Pablo Pessolani wrote:
In this case, the candidate doesn't recognize the current leader, because its term is now greater than the leader's term.

It is not clear what to do.

In this scenario, the candidate is also trying to get elected - that's what it means to be a candidate. So the candidate is not just sitting there. There's an ongoing election process occurring.

Eventually, if enough nodes are reachable, the election will converge and a new leader & term will be established.
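
A minimal sketch of how that convergence can play out, assuming the standard Raft rule that any server seeing a higher term adopts it and reverts to follower. The `Node` class and `run_election` helper are hypothetical names for illustration, and the sketch deliberately omits the log up-to-date check that real vote handling requires, as well as timers and networking.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical, simplified Raft node (no log, no timers)."""
    name: str
    term: int
    state: str = "follower"
    voted_for: Optional[str] = None

    def on_request_vote(self, candidate: str, candidate_term: int) -> bool:
        if candidate_term < self.term:
            return False  # stale candidate
        if candidate_term > self.term:
            # Higher term seen: adopt it and step down, even if we were leader.
            self.term = candidate_term
            self.state = "follower"
            self.voted_for = None
        # At most one vote per term (log up-to-date check omitted here).
        if self.voted_for in (None, candidate):
            self.voted_for = candidate
            return True
        return False


def run_election(candidate: Node, peers: List[Node]) -> bool:
    """Run one election round for `candidate` against reachable `peers`."""
    candidate.term += 1
    candidate.state = "candidate"
    candidate.voted_for = candidate.name
    votes = 1  # the candidate votes for itself
    for peer in peers:
        if peer.on_request_vote(candidate.name, candidate.term):
            votes += 1
    if votes > (len(peers) + 1) // 2:  # strict majority of the cluster
        candidate.state = "leader"
    return candidate.state == "leader"


old_leader = Node("a", term=3, state="leader")
follower = Node("b", term=3)
timed_out = Node("c", term=3)
won = run_election(timed_out, [old_leader, follower])
```

In this run the old leader receives the vote request carrying term 4, steps down to follower, and the node that timed out wins the election, so the cluster ends up with a single leader in the new term.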

-Archie