Minor change to make candidate less disruptive to election

coder zn

Jan 13, 2015, 2:47:08 PM

to raft...@googlegroups.com

The current behavior of a candidate is to increase its term whenever an election timeout passes without it winning. The problem is that if a candidate neither wins nor loses for an extended period of time, its term may climb higher than that of the rest of the cluster, and as soon as it gets back in touch with a majority of the cluster or with the leader, it will disrupt the cluster's otherwise stable leadership by forcing a new election. The solution in the paper is: "To prevent this problem, servers disregard RequestVote RPCs when they believe a current leader exists." But this solution has a drawback: there is no way for the candidate to rejoin the rest of the cluster without an election.

We can probably fix this problem with a minor change to the way a candidate's term is increased: a candidate increases its term only when at least one peer rejects its vote request. A timeout is not a rejection. When the election times out, the candidate simply resends its vote requests with the same term. Since no peer has rejected it, the vote is still valid. The invariant that each server casts at most one vote per term is still maintained.
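
To make the proposed rule concrete, here is a minimal Python sketch under a deliberately simplified model: no logs, no networking, and hypothetical names (`Voter`, `Candidate`, `record_response`, `on_election_timeout`) that do not come from any Raft implementation. The voter side follows the usual Raft rule of granting a vote only if it has not voted in this term or has already voted for the same candidate, which is what makes a same-term retry safe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Voter:
    """Hypothetical simplified voter: no log comparison, only term bookkeeping."""
    current_term: int = 0
    voted_for: Optional[str] = None

    def request_vote(self, term: int, candidate_id: str) -> bool:
        if term < self.current_term:
            return False  # stale term: reject
        if term > self.current_term:
            # Newer term: adopt it and forget any vote from the old term.
            self.current_term = term
            self.voted_for = None
        # At most one vote per term, but re-granting to the same candidate
        # makes a retry with an unchanged term harmless.
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


@dataclass
class Candidate:
    """Hypothetical candidate that bumps its term only after a rejection."""
    node_id: str
    current_term: int
    saw_rejection: bool = False

    def record_response(self, granted: bool) -> None:
        if not granted:
            self.saw_rejection = True

    def on_election_timeout(self) -> int:
        # A timeout alone is not a rejection: keep the term and resend.
        if self.saw_rejection:
            self.current_term += 1
        self.saw_rejection = False
        return self.current_term


voter = Voter()
cand = Candidate("A", current_term=2)
cand.record_response(voter.request_vote(cand.current_term, "A"))
print(cand.on_election_timeout())  # no rejection: term stays 2
print(voter.request_vote(2, "A"))  # same-term retry is granted again: True
print(voter.request_vote(2, "B"))  # a second candidate in term 2 is refused: False
cand.record_response(False)
print(cand.on_election_timeout())  # a rejection was seen: term becomes 3
```

With no responses at all, as under a network partition, repeated calls to `on_election_timeout()` leave the term unchanged, which is the bounded-term behavior argued for here. The sketch does not model what happens when such a candidate reconnects to a cluster with an active leader.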

If this candidate can reach a majority of the cluster, it will either win or lose the election. If it cannot reach a majority, it won't prevent the rest of the cluster from finishing the election. So this change won't make elections any worse.

If a candidate is partitioned from the network for an extended period of time, its term won't increase without bound, and it can rejoin the cluster as a follower once the partition heals, without disrupting the election.

Thanks,

====

I did not read the whole mailing list, so please forgive me if this has been proposed and discussed before.

Oren Eini (Ayende Rahien)

Jan 13, 2015, 2:56:40 PM

to coder zn, raft...@googlegroups.com

This was discussed a few times recently.

The current approach (which we implemented in our own implementation) is a two-stage election process.

1) The candidate runs a dry-run election, which asks: "If I were asking you for a vote for term N, would you vote for me?" Nodes are allowed to vote for multiple candidates in dry-run elections (but they still take into account whether they already voted in that term in a real election).

2) If the dry run was successful (a quorum said yes), we actually increment the term and hold the real election. If it wasn't successful, we wait for the next timeout and try another dry run.
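
The two-stage process can be sketched in Python as follows. This is a minimal sketch, not Oren's implementation: logs and networking are omitted, RPCs are plain method calls, and the names (`Node`, `pre_vote`, `run_election`, `leader_recently_seen`) are hypothetical. The `leader_recently_seen` flag is an assumption standing in for "this node still believes a leader exists", which is how a node can decline a dry run.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    current_term: int
    voted_for: Optional[str] = None
    leader_recently_seen: bool = False  # hypothetical liveness flag

    def pre_vote(self, proposed_term: int, candidate_id: str) -> bool:
        """Dry run: would I vote for candidate_id at proposed_term?

        Changes no state, so a node may say yes to several candidates.
        """
        if self.leader_recently_seen or proposed_term < self.current_term:
            return False
        # Respect a real vote already cast in that term.
        if (proposed_term == self.current_term
                and self.voted_for not in (None, candidate_id)):
            return False
        return True

    def request_vote(self, term: int, candidate_id: str) -> bool:
        """Real vote: updates the term and records at most one vote per term."""
        if term < self.current_term:
            return False
        if term > self.current_term:
            self.current_term, self.voted_for = term, None
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def run_election(candidate_id: str, my_term: int,
                 peers: List[Node]) -> Tuple[int, bool]:
    """Return (term after this attempt, whether the candidate won)."""
    quorum = (len(peers) + 1) // 2 + 1
    proposed = my_term + 1
    # Stage 1: dry run at term N = my_term + 1; the candidate counts itself.
    yes = 1 + sum(p.pre_vote(proposed, candidate_id) for p in peers)
    if yes < quorum:
        return my_term, False  # no term bump; dry-run again after the next timeout
    # Stage 2: only now increment the term and run the real election.
    votes = 1 + sum(p.request_vote(proposed, candidate_id) for p in peers)
    return proposed, votes >= quorum
```

For example, with four peers that all still hear from a leader at term 5, `run_election("C", 5, peers)` returns `(5, False)` and leaves every peer's term at 5; if none of them hears a leader, it returns `(6, True)`.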

Hibernating Rhinos Ltd

Oren Eini l CEO l Mobile: + 972-52-548-6969

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

--
You received this message because you are subscribed to the Google Groups "raft-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to raft-dev+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Kijana Woodard

Jan 13, 2015, 3:14:03 PM

to Oren Eini (Ayende Rahien), coder zn, raft...@googlegroups.com

+1 on PreVote [section 9.6]

"There is no way for the candidate to join the rest of the cluster without an election."

It can get an AppendEntries request from the leader.

coder zn

Jan 13, 2015, 4:59:50 PM

to raft...@googlegroups.com, znc...@gmail.com

Can anyone point out what's wrong with my proposal? It is definitely simpler and more efficient than the 2-phase protocol.

coder zn

Jan 13, 2015, 5:00:45 PM

to raft...@googlegroups.com, aye...@ayende.com, znc...@gmail.com

On Tuesday, January 13, 2015 at 12:14:03 PM UTC-8, Kijana Woodard wrote:

> +1 on PreVote [section 9.6]
>
> "There is no way for the candidate to join the rest of the cluster without an election."
>
> It can get an AppendEntries request from the leader.

Yes, it can get an AppendEntries from the leader, but the leader's term would be lower than the candidate's, and that would trigger a new election.
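
The term comparison behind this exchange can be sketched in Python. This is a minimal sketch assuming the standard Raft rules (a receiver rejects AppendEntries whose term is below its own, and any server that sees a higher term in an RPC or response adopts it and becomes a follower); the `Server` class and its method names are hypothetical, and log matching is omitted.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Server:
    current_term: int
    role: str = "follower"

    def handle_append_entries(self, leader_term: int) -> Tuple[int, bool]:
        """Receiver side of AppendEntries, term check only."""
        if leader_term < self.current_term:
            # Stale leader: reject and report our higher term back.
            return self.current_term, False
        # Legitimate leader: adopt its term and (re)join as a follower.
        self.current_term = leader_term
        self.role = "follower"
        return self.current_term, True

    def observe_term(self, term: int) -> None:
        """Any RPC or response carrying a higher term forces a step-down."""
        if term > self.current_term:
            self.current_term = term
            self.role = "follower"


# A candidate whose term ran ahead (9) hears from the leader (term 5).
leader = Server(current_term=5, role="leader")
candidate = Server(current_term=9, role="candidate")
term, ok = candidate.handle_append_entries(leader.current_term)
leader.observe_term(term)
print(ok, leader.role, leader.current_term)  # False follower 9
```

The leader steps down and the cluster has no leader until a new election completes. A candidate whose term is not ahead of the leader's, by contrast, simply accepts the AppendEntries and becomes a follower.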

Oren Eini (Ayende Rahien)

Jan 13, 2015, 5:04:18 PM

to coder zn, raft...@googlegroups.com

Hence the pre-vote.

coder zn

Jan 13, 2015, 5:08:52 PM

to raft...@googlegroups.com, znc...@gmail.com

I agree pre-vote can solve this problem, but it is more complex and less efficient (two round trips).

BTW, where is section 9.6? There is no section 9.6 in the paper. Thanks,

Kijana Woodard

Jan 13, 2015, 5:09:42 PM

to coder zn, raft...@googlegroups.com

It's in the full dissertation: https://ramcloud.stanford.edu/~ongaro/thesis.pdf

Kijana Woodard

Jan 13, 2015, 5:16:58 PM

to coder zn, raft...@googlegroups.com

"Can anyone point out what's wrong with my proposal? It is definitely simpler and more efficient than the 2-phase protocol."

*bump*

Oren Eini (Ayende Rahien)

Jan 13, 2015, 11:33:29 PM

to Kijana Woodard, coder zn, raft...@googlegroups.com

It doesn't solve the problem. In a common case, a candidate that can't see the leader (which the other nodes can see) will keep getting rejections, so its term will increase rapidly, forcing an election just to let it rejoin the system.

With the pre-vote system, we only see the term increase, and hence an election, if the other nodes also agree that an election is needed.
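
This scenario can be illustrated with a toy Python model. It is a sketch under stated assumptions, not a simulation of a real cluster: a five-node cluster, a candidate that cannot hear the leader, and a hypothetical parameter `peers_seeing_leader` for how many of its four peers still do. Peers that still hear the leader are assumed to say no to both real votes and dry runs, and the rest say yes. The function name `term_after` is also hypothetical.

```python
def term_after(rounds: int, start_term: int, use_prevote: bool,
               peers: int = 4, peers_seeing_leader: int = 4) -> int:
    """Candidate's term after `rounds` election timeouts in a toy model."""
    quorum = (peers + 1) // 2 + 1
    term = start_term
    for _ in range(rounds):
        yes = 1 + (peers - peers_seeing_leader)  # the candidate votes for itself
        no = peers_seeing_leader
        if yes >= quorum:
            # Enough support: the real election goes ahead and is assumed won.
            # With pre-vote, the term is bumped only at this point.
            return term + 1 if use_prevote else term
        if not use_prevote and no > 0:
            term += 1  # rejection-driven rule: any rejection bumps the term
        # With pre-vote, a failed dry run leaves the term unchanged.
    return term


print(term_after(10, 5, use_prevote=False))  # 15: one bump per rejected round
print(term_after(10, 5, use_prevote=True))   # 5: the dry runs never reach a quorum
```

Under the rejection-driven rule the term climbs by one per timeout, so once the candidate reaches the leader it forces an election; with pre-vote the term stays put until a quorum agrees that an election is needed.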
