--
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/6c76fd7c-6355-48ae-9e0a-480ff8d4a797n%40googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/3efd54ac-0280-4355-972a-ab331caa4227n%40googlegroups.com.
Hi Jack,
I read the MongoRaftReconfig (https://arxiv.org/pdf/2102.11960.pdf), and I like this idea. I think this paper is about designing a new protocol and high-level abstraction is preferred and necessary to polish thought. I also agree with you that the asynchronous receive/send style is favorable from the engineering aspect,
The consistency of the spec and TLA+ has value in software engineering; all systems design must ultimately be implemented in source code.
In my view, the appropriate approach is to abstract from a higher level when designing a new protocol, which is more scalable and straightforward. However, when implementing an already well-designed protocol, it is better to implement it as an asynchronous send/receive approach to close it to implementation.
I have some unproven ideas about changing MongoRaftReconfig specification from atomic to asynchronous communication. In the atomic communication style, we assume the system has some global states that can be checked. However, real-world systems obtain these global states through unreliable communication, such as send/receive.
For example, the committed variable stores the committed log index and term. Updating this variable corresponds to the AdvancedCommitIndex in the original Raft. The commit index is implemented by receiving confirmation from followers. Some variables need to be added to transform it to asynchronous.
The atomic communication version of the config variable is divided into two variables:
configCommitted, the current configuration of the cluster application;
configNew, the new configuration to be changed.
Thus, ConfigQuorumCheck, and TermQuorumCheck in spec (corresponding to Q1 and Q2 in the paper), the leader must keep additional variables:
followerConfig, the configCommitted status of other nodes collected by the leader;
followerTerm, the currentTerm status of other nodes collected by the leader.
The committed variable will be removed, and the confirmation of each AppendLogEntries guarantees the OplogCommitment in spec (corresponding to P1 in Paper) property. The AppendLog message flow may be added with additional version and term values of the config. QuorumsOverlap also requires similar processing, such as checking the config version and term when invoking AppendLog and RequestVote RPC.
It is trivial.
I also want to share my thoughts about using level abstraction to deal with scale by staged model check. First, implement high-level abstractions, such as assuming atomic communication, and use them to generate a relatively small state space. Then, lower-level abstractions will be implemented, with send and receive to describe messages. It is best to make the lower-level abstractions a refinement of the high-level abstractions, although I have little experience ensuring perfect and precise refinement. The states generated by the high-level abstractions will become the initial states of the lower-level abstractions, and then model checking will be done in stages. For example, Raft is divided into AppendLog and LeaderElection. We first generate the initial state from the high-level abstraction, plus the AppendLog operator, and perform model checking. Then, we perform model checking in the next stage by using the initial state from the high-level, plus the LeaderElection operator. I do this stage approach only through intuition. IPA(https://arxiv.org/pdf/2202.11385.pdf) may provide the tools to get it more rigorous.
Guo Hua