--
You received this message because you are subscribed to the Google Groups "raft-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to raft-dev+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/raft-dev/f9d87f46-e5b8-42bb-a7df-03c334b80758%40googlegroups.com.
Hi Changqing Li,
It is a little hard to understand exactly what you are trying to say, especially regarding "leader does commits log entries from previous terms even though these logs have been stored on a majority."
I agree that uncommitted log entries can be overwritten. The question is how a leader commits log entries from previous terms. The Raft paper specifically says, on page 9, "Raft never commits log entries from previous terms by counting replicas. Only log entries from the leader's current term are committed by counting replicas."
On the other hand, Figure 8 says "log entry from term 2 has been replicated on a majority of the servers, but it is not committed."
So, besides the current leader counting replicas on a majority and THEN deciding an entry is committed, how else can a new leader decide that an entry from a previous term is committed? As far as I know, Raft says a leader does not decide that an uncommitted entry from a previous term is committed.
David Kao
On Thu, Oct 31, 2019 at 2:15 AM Changqing Li <lich...@gmail.com> wrote:
Hi David,
Regarding "if the leader fails before any follower (or < majority followers) learns about this committed value, then it seems possible for that entry to be viewed as uncommitted, which means the next term leader could overwrite it; but that entry was already returned OK to client.", I think followers do not need to learn immediately that the value is committed. Raft guarantees that, once the old leader has replicated the latest entry to a majority and then crashes, only a node from that majority can be elected as the new leader, so the entry will never be overwritten.
A log entry may be overwritten if the old leader crashes before the latest log entry has been stored on a majority of nodes. To avoid consistency issues, leader does commits log entries from previous terms even though these logs have been stored on a majority.
On Thu, Oct 31, 2019 at 10:37 AM Keine Neco <neco...@gmail.com> wrote:
Hi David,
At your 5th step, suppose this is a 5-node Raft group. At least 3 nodes contain the committed entry (one of them is the failed leader), so only two nodes do not contain it. Two nodes cannot form a majority, so any new majority has to include at least one node with this committed entry (although that node may not yet know it is committed). Because this entry has the highest log term and log index, a node that holds it must be elected as the new leader. All of its uncommitted entries will be committed once its first no-op entry is committed.
Paying two RTTs would not be reasonable for Raft, which aims for low latency.
I think you can read the election part again to get more details. Thanks for your question.
Dong.
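The 5-node argument above can be checked with a small sketch. This is only an illustration under stated assumptions: the node names, the `(last_term, last_index)` tuples, and the helpers `log_is_up_to_date` and `can_win` are hypothetical, and the vote check follows the "at least as up-to-date" comparison from the election restriction in the Raft paper rather than any particular implementation.

```python
from itertools import combinations


def log_is_up_to_date(cand_last_term, cand_last_index,
                      voter_last_term, voter_last_index):
    """Voter-side check: grant a vote only if the candidate's log is
    at least as up-to-date as the voter's own log."""
    if cand_last_term != voter_last_term:
        return cand_last_term > voter_last_term
    return cand_last_index >= voter_last_index


def can_win(candidate, last_entry, cluster_size):
    """last_entry maps each live node to (last_term, last_index).
    The candidate always votes for itself."""
    cand_term, cand_index = last_entry[candidate]
    votes = sum(
        1
        for node, (term, index) in last_entry.items()
        if node == candidate
        or log_is_up_to_date(cand_term, cand_index, term, index)
    )
    return votes > cluster_size // 2


# Hypothetical scenario: A was the leader and crashed after replicating
# the entry at index 2 (term 1) to B and C. D and E stop at index 1.
nodes = ["A", "B", "C", "D", "E"]
holders = {"A", "B", "C"}
live = {"B": (1, 2), "C": (1, 2), "D": (1, 1), "E": (1, 1)}

# Every possible majority of 3 out of 5 overlaps the set of holders.
every_quorum_overlaps = all(
    holders & set(quorum) for quorum in combinations(nodes, 3)
)

print(every_quorum_overlaps)   # True
print(can_win("B", live, 5))   # True: B, C, D and E all grant the vote
print(can_win("D", live, 5))   # False: only D and E vote, B and C refuse
```

In this sketch a node without the entry (D) can collect at most two votes, so it cannot become leader, while a node with the entry (B) can.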
On Thu, Oct 31, 2019 at 8:02 AM David Kao <a.l...@gmail.com> wrote:
Has anyone else had a similar uneasy feeling about how an entry is considered committed in Raft?
Here is what I found in "In Search of an Understandable Consensus Algorithm (Extended Version)" after about 10 pages. Let's say:
- The leader appends an entry to a majority of followers upon receiving a write request from a client
- The leader considers the entry committed once a majority has the entry in its log, but only the leader knows it is committed.
- The leader returns OK to the client <--- this is how I read the paper.
- But only in the next heartbeat (AppendEntries) do followers start to learn that a log index has been committed <-- only the current term leader decides if an entry in its term is committed; subsequent term leader cannot decide on entries of the previous term.
- if the leader fails before any follower (or < majority followers) learns about this committed value, then it seems possible for that entry to be viewed as uncommitted, which means the next term leader could overwrite it; but that entry was already returned OK to client.
- Remember, having an entry in the Raft log doesn't mean it is committed; the highest committed index needs to advance
- To fix the above, the leader may have to broadcast the committed entry (by advancing the highest committed log index) to a majority before returning to the client. That way, whoever has the latest committed value can be elected the new leader (leader completeness).
- But this turns the algorithm into a naive Paxos that takes two rounds (appending the actual entry to the Raft log, then telling followers it was committed, before returning OK to the client)
Raft is so widely accepted that I doubt it has any bugs, so can someone point out where I misunderstood the algorithm? What do you think?
Alternatively, my understanding of the section "Committing entries from previous terms" might be wrong w.r.t. how leaders are elected ... but that's why I am writing this post.
Hi Ed,
Thanks for agreeing that the abridged paper is at least insufficient in explaining the details.
However, I do understand the leader completeness and log matching properties. I read through the proof by contradiction (section 5.4.3, Safety argument). The whole argument revolves around the idea of "committed" entries, which is largely similar to the idea of a learned value in Paxos.
The problem is, if you follow my original email (the first email of this thread), how is an entry determined to be committed? I'd appreciate it if anyone could point out, using my example, which step I misunderstood.
On Thursday, October 31, 2019 at 6:31:04 AM UTC-7, Ed M wrote:
An entry is committed in either of two ways:
1. The entry is accepted by a majority of the servers in the cluster *in the same term as the entry* (if the leader crashes before receiving the results of those AppendEntries calls, it's possible that no one in the cluster will know that the entry is committed, but it is indeed committed; any future leader is guaranteed to store that entry, and it will finish propagating it to the rest of the cluster, if needed).
2. An entry in the same log, but with higher index, is committed.
Your error is in your fourth step: "subsequent term leader cannot decide on entries of the previous term." This is not true. A subsequent leader can commit entries in earlier terms; the way it does this is by committing a new entry in the current term, after which rule 2 above applies.
-John-
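As a rough sketch of how a leader might apply these two rules when advancing its commit index: the function name `advance_commit_index` and its parameters are made up for illustration, log indexes are 1-based, and `match_index` is assumed to list, for every server including the leader, the highest log index known to be stored on that server. The scenario loosely follows Figure 8 of the Raft paper.

```python
def advance_commit_index(log_terms, match_index, current_term, commit_index):
    """Return the new commit index for a leader.

    log_terms[i - 1] is the term of the entry at index i in the leader's log.
    An index is only chosen by counting replicas if its entry is from the
    leader's current term (rule 1). Every earlier entry is then committed
    implicitly, because a later entry in the same log is committed (rule 2).
    """
    majority = len(match_index) // 2 + 1
    # Scan from the end of the log down to the first uncommitted index.
    for n in range(len(log_terms), commit_index, -1):
        replicated = sum(1 for m in match_index if m >= n)
        if replicated >= majority and log_terms[n - 1] == current_term:
            return n
    return commit_index


# Leader in term 4. The entry at index 2 is from term 2 and is stored on
# 3 of 5 servers, but it is not committed by counting replicas.
before = advance_commit_index(
    log_terms=[1, 2], match_index=[2, 2, 2, 1, 1],
    current_term=4, commit_index=1,
)

# The leader replicates a term-4 entry at index 3 to a majority. Index 3
# is committed by rule 1, which also commits index 2 by rule 2.
after = advance_commit_index(
    log_terms=[1, 2, 4], match_index=[3, 3, 3, 1, 1],
    current_term=4, commit_index=1,
)

print(before)  # 1
print(after)   # 3
```

The commit index moves in one step from 1 to 3, so the older entry never needs a separate commit round of its own.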