advancing the commitIndex faster (minor) optimization


Jason Aten

May 1, 2025, 12:34:42 PM
to raft-dev
Some Raft newcomer questions:

The Raft dissertation and talks mention that commit-index
advancement is kept simple for understandability, and
that if we knew all servers had a log entry, then the Figure 3.7
problem could not occur. (As I understand Figure 3.7: an entry from
an earlier term can be replicated to a majority and still be
overwritten by a later leader that lacks it; but if every server has
the entry, no candidate can win an election without it, so it is
safe to commit.)

I've tried to implement that suggestion below.
(If all servers have an entry, commit it, even if its term is not the current term.)

This is probably inconsequential as
far as optimizations go, but I'm asking to make sure I
understand the commit-index-setting mechanism properly,
and also to see whether it might fail under certain other circumstances.

I looked at this code to get an idea of how to implement advanceCommitIndex(),
and then wrote the following in Go. My question / attempt at implementing
this (admittedly minor) optimization is inline below.

Questions: basically, have I got the right idea? And does joint consensus conflict with it?

// AdvanceCommitIndex:
//
// When the Raft leader hears back from the peer
// servers and updates their matchIndex, it knows
// the longest prefix common to all the reporting
// peers' logs, and can mark an index as "committed" when
//  a) quorum is reached; and
//  b) all future leaders will have that entry.
//
// Condition b) holds when either
//  1) the entry is from this leader's term (the standard Raft rule); or
//  2) all servers have the entry (not just a quorum;
//     not standard Raft; I'm proposing the implementation below
//     to see if I've understood the idea...)
//
func (s *RaftServer) AdvanceCommitIndex() {
    if s.role != LEADER {
        return
    }

    // Find the largest log index that is replicated to a majority.
    var matchIndexes []int64
    // Add the leader's own lastLogIndex, since the leader is not in s.peers.
    matchIndexes = append(matchIndexes, s.lastLogIndex())
    for _, peer := range s.peers {
        matchIndexes = append(matchIndexes, peer.MatchIndex)
    }
    // Sort in increasing order.
    sort.Slice(matchIndexes, func(i, j int) bool {
        return matchIndexes[i] < matchIndexes[j]
    })

    // For the ascending sorted matchIndexes, quorumMinSliceIndex
    // computes the position of the smallest matchIndex that is
    // still replicated on a bare-minimum quorum; the rest of
    // that quorum have matchIndexes >= matchIndexes[i].
    //  -- idea borrowed from RaftConsensus.cc:2185 and the
    //     Configuration::SimpleConfiguration::quorumMin()
    //     method at line 467.
    i := s.quorumMinSliceIndex()
    if i >= len(matchIndexes) {
        // Not enough servers for a quorum; matchIndexes[i] would be out of bounds.
        return
    }

    newCommitIndex := matchIndexes[i]

    // Is the "all-nodes-have-it" exception to the standard rule
    // available? If the smallest matchIndex equals the quorum
    // minimum, then every server in the cluster has the entry.
    newCommitIndexHasFullClusterAgreement :=
        len(matchIndexes) == s.clusterSize() &&
            matchIndexes[0] == newCommitIndex

    if s.state.CommitIndex >= newCommitIndex {
        return // RaftConsensus.cc:2186
    }
    if newCommitIndex > int64(len(s.wal.RaftLog)) {
        panic("should be impossible... out of bounds")
    }
    if s.wal.RaftLog[newCommitIndex-1].Term != s.state.CurrentTerm {
        // "At least one of these entries must also be from the current term to
        // guarantee that no server without them can be elected."
        //   -- RaftConsensus.cc:2192

        if newCommitIndexHasFullClusterAgreement {
            // Optimized version: fall through to commitWhatWeCan().
            countTimesOptimizationPathTaken() // accumulate statistics
        } else {
            // The safe version, the standard Raft rule; what logcabin does:
            return // RaftConsensus.cc:2194
        }
    }
    s.state.CommitIndex = newCommitIndex // RaftConsensus.cc:2195

    // THIS IS THE COMMIT ON THE LEADER.
    s.commitWhatWeCan(true)
    ...

// The other question/reason I'm not doing this at the
// moment is that I haven't implemented the joint-consensus
// membership transition yet, and I don't know how this
// optimization would interact with it. In particular, I
// don't know what the proper definition of cluster size
// would be in those situations: should C_old, C_new,
// C_old,new, or something else be used? Better to play it
// safe for now, but maybe the raft-dev list can offer
// some insight.
//
// With a joint-consensus cluster membership change, is the
// optimization above still safe?

Question 3) On page 74 of the Raft dissertation, in Figure 6.3, why is
the lease extension = start + (the election timeout divided by the clock drift bound)?
I would have thought you would subtract the drift bound, not divide by it. Is this just a typo?
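
(To make my confusion concrete, with made-up numbers: if the election
timeout is 100 ms and the clock drift bound is a ratio like 1.1, the
division version gives an extension of 100 / 1.1, about 91 ms; I had
been picturing subtracting some absolute drift amount instead.)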


Thanks in advance for your input.

Jason
