advancing the commitIndex faster (minor) optimization

54 views

Skip to first unread message

Jason Aten

unread,

May 1, 2025, 12:34:42 PMMay 1

to raft-dev

Some Raft newcomer questions:

The Raft dissertation and talks mention that the commit index

advancement is kept simple for understandability, and

that if we knew all servers had a log entry, then the Figure 3.7

problem cannot occur.

I've tried to implement that suggestion below.

(If all servers have an entry then commit it, even if not term=current.)

This is probably inconsequential as

far as optimizations goes, but I'm asking to make sure I have

understand the commit index setting mechanism properly...

and also to see if it might fail under certain other circumstances.

I looked at this code to get the idea of how to implement advanceCommitIndex()

https://github.com/logcabin/logcabin/blob/master/Server/RaftConsensus.cc#L2193

and then wrote the following in Go... my question / attempt at implementing

this (admittedly minor) optimization is inline below.

Questions: basically, have I got the right idea? Also does joint-consensus conflict with it?

// AdvanceCommitIndex:
//
// When the Raft leader server hears back from the
// peer servers and updates their matchIndex, it knows
// the longest common prefix of all the reporting peers' logs,
// and can mark that index as "committed" when
// a) quorum is reached; and
// b) all future leaders will have that entry.
//
// and b) means either
// 1) the entry is from this leader's term (standard Raft rule);
// 2) all peers have the entry (not just a quorum;
// not standard Raft; I'm proposing implementation below
// to see if I've understood the idea...)
//
func (s *RaftServer) AdvanceCommitIndex() {
if s.role != LEADER {
return
}

// Find the smallest match index that is replicated to a majority
var matchIndexes []int64
// add leader's own, since we are not in peers
matchIndexes = append(matchIndexes, s.lastLogIndex())
for _, peer := range s.peers {
matchIndexes = append(matchIndexes, peer.MatchIndex)
}
// sort in increasing order
sort.Slice(matchIndexes, func(i, j int) bool {
return matchIndexes[i] < matchIndexes[j]
})

// For an ascending sorted slice of MatchIndexes,
// quorumMin computes the index of the
// smallest match still in quorum, so the
// log index that is smallest among a bare
// minimum of quorum members.
// The rest of the members of the
// quorum have >= matchIndexes.
// -- idea stolen from the RaftConsensus.cc:2185 and line 467
// Configuration::SimpleConfiguration::quorumMin() method.
i := s.quorumMinSliceIndex()
if i >= len(matchIndexes) {
// not enough for quorum??? would be out of bounds...
return
}

newCommitIndex := matchIndexes[i]

// Is the "all-nodes-have-it" exception to
// the rule possible?
newCommitIndexHasFullClusterAgreement := false
if len(matchIndexes) == s.clusterSize() &&
matchIndexes[0] == newCommitIndex {

newCommitIndexHasFullClusterAgreement = true
}
if s.state.CommitIndex >= newCommitIndex {
return // RaftConsensus.cc:2186
}
if newCommitIndex > int64(len(s.wal.RaftLog)) {
panic("should be impossible...out of bounds hmmm")
return
}
if s.wal.RaftLog[newCommitIndex-1].Term != s.state.CurrentTerm {
// "At least one of these entries must also be from the current term to
// guarantee that no server without them can be elected."
// -- RaftConsensus.cc:2192

if newCommitIndexHasFullClusterAgreement {
// optimized version falls through to commitWhatWeCan()
countTimesOptimizationPathTaken() // accumulate statistics
} else {
return
}

// the safe version, the standard Raft rule; what logcabin does:
//return // RaftConsensus.cc:2194
}
s.state.CommitIndex = newCommitIndex // RaftConsensus.cc:2195

// THIS IS THE COMMIT ON THE LEADER.
s.commitWhatWeCan(true)
...

// The other question/reason I'm not doing this atm is because
// I haven't implemented the joint consensus
// transition stuff, and I don't know how this
// optimization would interact with that; I
// don't know what the proper definition of
// cluster size would be in those situations,
// should the C_new, C_old, C_old+new, or
// what be used? Better to play it safe for now,
// but maybe the raft-dev list can offer
// some insight?
//
// With the joint-consensus cluster membership
// change, is the optimization above still safe?

Question 3) On page 74 of the Raft dissertation, in figure 6.3, why is

the lease extension = start + (the election timeout divided by the clock drift bound)?

I would have thought you subtracted the drift bound, not divided by it? Is this just a typo?

Thanks in advance for your input.

Jason

Reply all

Reply to author

Forward

0 new messages