For the Raft algorithm, the leader election process is as follows:
1. Node Roles
Nodes in the Raft algorithm have three roles:
(1) Leader: Responsible for handling client requests, managing log replication, and sending heartbeats.
(2) Follower: Passively accepts logs and heartbeats from the Leader and does not initiate requests.
(3) Candidate: During the election process, a Follower can transition to a Candidate and initiate an election.
2. Election Trigger Conditions
Leader elections are typically triggered under the following conditions:
(1) Leader failure: A Follower does not receive a heartbeat from the Leader within a certain period (election timeout).
(2) New node joining: A newly joined node may trigger an election.
(3) Partition recovery: An election may be triggered after network partition recovery.
3. Election Process
The election process consists of the following steps:
(1) Follower Transitions to Candidate
(2) Candidate Initiates Voting
(3) Other Nodes Vote
(4) Election Result
(5) Election Failure
4. Election Timeout
(1) To prevent multiple nodes from initiating elections simultaneously, Raft uses randomized election timeout periods (typically 150ms-300ms).
(2) Randomized timeout periods reduce the probability of election conflicts.
5. Log Consistency Guarantee
(1) During the election process, Raft ensures that only nodes with sufficiently up-to-date logs can become Leaders by comparing the Term and Index of logs.
(2) This guarantees that the Leader's log contains all committed log entries, ensuring consistency.
6. Election Optimization
(1) Pre-Vote Mechanism: Before formally initiating an election, a Candidate can first send a Pre-Vote request to confirm whether it has a chance to win the election, avoiding unnecessary Term increments.
(2) Leader Transfer: In certain situations, the Leader can proactively transfer leadership to another node to avoid frequent elections.
Modified Election Process:
The election process is divided into the following steps:
(1) Follower Transitions to Candidate (unchanged)
(2) Candidate Initiates Voting (modified)
(3) Other Nodes Vote
(4) Voting Result (Receiving Rejected Votes)
(5) Voting Result (Receiving an Accepted Vote)
(6) Election Failure
Leader Election Examples
Example 1: Raft Election Succeeds in One Round
Assume a 5-node cluster (A, B, C, D, E):
Example 2: Raft Election Fails in One Round
Example 3: Another Raft Scenario
Correctness of the Idea
At any moment, each node will only vote for one Candidate, so at any moment, there is at most one node with votes from a majority of nodes.
Efficiency
Under normal Raft operation, the new modifications have no impact. However, when the Raft algorithm cannot elect a Leader in one election round, the new modifications may facilitate the election of a Leader.
Thank you for your opinions, and welcome to talk about it.
(1)
For the Example 3, point 4. This
is a heuristic way, in other word, you could choose not to use this heuristic way,
and just let the candidate gives up its candidacy when it knows that it could
not become the leader (that not less than half of nodes vote for other candidates).
But it is a low probability case that node A would come back after a few
seconds. If in most cases it is true, then this heuristic way could speed up
the election process, and in a few cases, it could slow down the election
process, then it is feasible way. And we could do the heuristic way in the
first few election rounds, then we could the election process would work when
the normal raft election process works.
(2) In Example 3, we don’t need B and C could communicate with each other.
Because we judge the highest priority according to only the information from B
and C, and also D and E. You could refer to the following:
(4) Voting Result (Receiving Rejected Votes)
And in other way, if there are network partition,
and the cluster splits into many disconnected parts, then there are no way to get
a leader. And I don’t say that the election process is bound to be succeed in
one round using this proposal.
(3) For what happens if a server gives up is candidacy and tells servers that
voted for a server they can’t communicate with? First, it is a rare case,
because if they can’t communicate with another candidate, then they don’t know
the random priority, and then they would not think that candidate has high priority.
If it is the case that we get the information for the candidate, and we could
not connect to it when we want to do the re-voting process. Except that the candidate
who gives up the candidacy, there are no other changes compared to the normal
raft election. And the candidate who gives up the candidacy doesn’t need to
know whether the node who was asked to re-voting receives the re-voting request
or not, and whether the node revoting or not.
For example:
There are 7 nodes, and A is leader now, B, C, D, E, F, G are candidates. The node A stops now, and candidate B, C, D become candidates, and get the candidate from themselves.
And B get the vote from E, C get the vote from F, D get the vote from G.
B has the rand priority 300, C has the rand priority 200, D has the rand priority 100.
Case 1:
If B, C, D, E, F, G could communicate with each other, but when D and G want to
do the re-voting, D and G could not communicate with B. then D and G would keep
to do the re-voting to note B, but could not succeed. C and F would give their
votes to B, and B get 4 votes, and becomes the leader.
Case 2:
If B, C, D, E, F, G could communicate with
each other, but after D tells G to do the re-voting to the node B, then D disconnected
with G. Now, B, C, E, F, G are connected, then B would get 5 nodes, and becomes
the leader.
(4)I agree with you that an election still need to time out and may need to try again with a higher term number. The example I gives is just for one round, I don’t give the information about another round because it not related to the changes. You could refer to the following:
(5) Voting Result (Receiving an Accepted Vote)
(6) Election Failure
For “whether the extra complexity is worth it given that elections should be
rare events”, because there are some proposals that are used to speed up the
election process for example the random wait time, and the following:
6. Election Optimization
(1) Pre-Vote Mechanism: Before formally initiating an election, a Candidate can first send a Pre-Vote request to confirm whether it has a chance to win the election, avoiding unnecessary Term increments.
From the above, we could know that speeding up the election process is very important. And when the cluster is in the leader election process, the cluster could not do the real work. Any speed up would make sense. Of course, we could do more investigation and experiments to check it. And there is not much extra complexity for the software development, CPU cost and network cost.
And thank you for your response. According to your response, I could find that you have a lot of thinking about it. Thank you very much.