Question regarding the RAFT problems

132 views
Skip to first unread message

Aadi Swadipto Mondal

unread,
Mar 3, 2023, 3:14:37 PM3/3/23
to raft-dev

Hi,

 

I am a graduate student at UW Madison studying distributed systems. I was going through your wonderful problem sets about Raft and Paxos, they are indeed helpful. Although, I do have one question about Problem 2 in Raft.

rubric.pdf (ongardie.net)


Here, I cannot figure out how exactly Server 2 can get term 3 in index 4. If I try to visualize how this log state could have been created, something like this comes up in my mind: 

  • Term 1, Server 2 is the leader, commits <1,1> (index, term) and <2,1> and succeeds. Tries to send accept RPCs for < 3,1>, fails.
  • Among S1, S3, and S4, one becomes the leader and commits < 3,2>, < 4,2> and <5,2> (since all have received majority successful accept responses in their own term, i.e. term 2); send accept RPC for <6,2> but crashes before receiving a majority, thus not committed.
  • Now looking at the figure, it seems S2 stands for the election with log entries up to index 3. S2 can never be the leader because in no case it can receive majority votes. The only candidates it can garner votes from are S4 (which has fewer log entries and term numbers), and itself. So, how can it insert <4,3> here?

When this election occurs, S1, S3, and S5 have already their maximum term as Term 2. So they can never vote for S2. So S2 can never get term 3, which apparently seems to be the case.

 

Your answer says that we cannot commit < 3,2>, <4,2>, and <5,2>. But if we try to think from a programmer's perspective, I guess the rules say that if the leader receives success from a majority of servers for accepting responses for a log entry in its own term, then it can consider that log committed at that index and update state machines. 

 

If we cannot do that, then there is a serious problem. Let’s say that after term 1, S2 and S4 crashed. The system still has a majority and assumes S1, S3, and S5 are always fine. Now, without loss of generality, assume S1 is the leader. S1 can never proceed because what are the criteria then for S1 to get a log committed? S1 should commit < 3,2>, <4,3>, ….etc for serving client requests and making progress. It cannot wait for itself to crash or for any other servers to crash.

 

On the other hand, I can say that even <2,1> is not safe. S4 can be the leader with a term number greater than all the current ones (e.g. 5 or 6) and replace the log entries of everyone starting from index 2. So, in that case, the only log to be committed should be <1,1> and nothing else.

 

Please let me know if my understanding is correct or if I am missing something crucial.

 

Thanks,

Aadi

image001[25].png

John Ousterhout

unread,
Mar 4, 2023, 4:54:45 PM3/4/23
to raft...@googlegroups.com
Here's how the scenario in the figure can arise (the key idea is that log entries for term 2 weren't replicated until term 4; because of this, they cannot be considered committed even if stored on a majority of the nodes):

1. S2 is leader for term 1; it replicates all of the green log entries before crashing.
2. S1 is leader for term 2; it accepts entries for indices 3-6 and adds them to its log, but crashes before it replicates any of them.
3. S2 is leader for term 3; it accepts an entry for index 4 but crashes before replicating it.
4. S1 becomes leader for term 4; it accepts a new entry for index 7 and then replicates some of its entries from term 2 to produce the scenario in the figure.

-John-

--
You received this message because you are subscribed to the Google Groups "raft-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to raft-dev+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/raft-dev/61e6ebc4-acce-45f3-b2e4-1457123b45abn%40googlegroups.com.

Aadi Swadipto Mondal

unread,
Mar 4, 2023, 6:47:59 PM3/4/23
to raft-dev
Thanks, I got your point.
Reply all
Reply to author
Forward
0 new messages