Re: [raft-dev] lastApplied and commitIndex

A. Jesse Jiryu Davis

Sep 1, 2025, 5:04:00 AM

to raft...@googlegroups.com

I agree. I think commitIndex and lastApplied are tracked separately only to allow more concurrency on a Raft node: after the node learns that a command is committed, a separate thread can do the possibly expensive work of executing it. Meanwhile, the thread that learned about the new commitIndex (from a peer's message) can reply to that message or do other work immediately. This is how Ongaro's LogCabin reference implementation works. MongoDB's Raft also tracks lastApplied and commitIndex separately, for concurrency's sake. If you don't care about performance, I think you could combine them.
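
To make the concurrency point concrete, here is a minimal Python sketch. It is not taken from LogCabin or MongoDB; the `Node` class, its method names, and the list standing in for a state machine are all assumptions made for illustration. One thread records a new commit index and returns at once, while a second thread catches `last_applied` up to it.

```python
import threading


class Node:
    """Toy node (hypothetical): one thread advances commit_index, another applies."""

    def __init__(self, log):
        self.log = log            # entry at index i is stored at log[i - 1]
        self.commit_index = 0     # highest index known to be committed
        self.last_applied = 0     # highest index executed on the state machine
        self.state = []           # stand-in state machine: list of applied commands
        self.cond = threading.Condition()
        self.stopped = False

    def on_commit_index(self, leader_commit):
        """Message-handling thread: record the new index, execute nothing."""
        with self.cond:
            if leader_commit > self.commit_index:
                self.commit_index = min(leader_commit, len(self.log))
                self.cond.notify_all()

    def stop(self):
        """Ask the applier to exit once it has caught up."""
        with self.cond:
            self.stopped = True
            self.cond.notify_all()

    def apply_loop(self):
        """Applier thread: does the possibly expensive execution."""
        while True:
            with self.cond:
                while self.last_applied >= self.commit_index and not self.stopped:
                    self.cond.wait()
                if self.last_applied >= self.commit_index:
                    return  # stopped, and nothing is left to apply
                index = self.last_applied + 1
                command = self.log[index - 1]
            # Execute outside the lock so the message thread is never blocked on it.
            self.state.append(command)
            with self.cond:
                self.last_applied = index


if __name__ == "__main__":
    node = Node(["a", "b", "c"])
    applier = threading.Thread(target=node.apply_loop)
    applier.start()
    node.on_commit_index(2)   # returns immediately; entries are applied in the background
    node.stop()
    applier.join()
    print(node.commit_index, node.last_applied, node.state)
```

Merging the two fields would amount to running the body of `apply_loop` inside `on_commit_index`, which is simpler but makes the message handler wait for execution.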

On Sat, Aug 30, 2025 at 4:19 PM Sny <popeye...@gmail.com> wrote:

I've spent a lot of time trying to figure out why it would be incorrect to keep `lastApplied` and `commitIndex` as a single field.

Call this field `safeIndex`.

I thought of an edge case, but even that self-corrects (at least as far as I understand).

Say, for example, we have three nodes A, B, and C. Each of them has a log with event IDs {1, 2, 3}. A is the leader, and it has just learned that all entries are replicated to a majority and thus, by the rules of commitment, are committed. But before it can share safeIndex with the followers in an AppendEntries request, it crashes.

Now C wins the election (by log completeness, both B and C are eligible). Locally, it only knows that safeIndex is 0, so its heartbeats carry that stale safeIndex. In my case, I gave the followers a branch in the apply(..) method so that the apply succeeds if localSafeIndex >= leaderSafeIndex. This means that logs don't suddenly roll back or auto-apply.

Now C appends a no-op and sends {4} (by the rules of nextIndex initialization). When this succeeds, the leader updates nextIndex and, once a majority has responded, updates safeIndex to 4.
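
A minimal sketch of the follower-side rule in this scenario, assuming a single merged field. The function name `follower_safe_index` and its parameters are hypothetical; the `min` against the follower's last log index mirrors the AppendEntries receiver rule in Figure 2 of the Raft paper, and the `max` is the "never move backwards" branch described above.

```python
def follower_safe_index(local_safe_index, leader_safe_index, last_log_index):
    """Return the follower's safeIndex after an AppendEntries or heartbeat.

    Hypothetical helper: the index never moves backwards, and it never
    moves past the end of the follower's own log.
    """
    candidate = min(leader_safe_index, last_log_index)
    return max(local_safe_index, candidate)


# Replay of the scenario from B's point of view, with a log of {1, 2, 3}.
b_safe = 0

# C becomes leader knowing only safeIndex 0; its heartbeat changes nothing.
b_safe = follower_safe_index(b_safe, leader_safe_index=0, last_log_index=3)
assert b_safe == 0

# Variation: had B already learned safeIndex 3 from A, a stale 0 from C
# would be ignored rather than rolling B back.
assert follower_safe_index(3, leader_safe_index=0, last_log_index=3) == 3

# C's no-op (entry 4) reaches a majority; the next message carries 4,
# which covers entries 1 through 3 as well.
b_safe = follower_safe_index(b_safe, leader_safe_index=4, last_log_index=4)
assert b_safe == 4
```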

Other than some performance optimizations with fine-grained locking (or other synchronization techniques), there seems to be no advantage to keeping them separate.

Since this has been raised a few times before, I'm likely misunderstanding something? Figure 2 is the only place that speaks of `lastApplied`.

--
You received this message because you are subscribed to the Google Groups "raft-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to raft-dev+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/raft-dev/e956ae5b-8227-447f-abc4-c6746cb1160en%40googlegroups.com.

Archie Cobbs

Sep 1, 2025, 11:45:01 AM

to raft...@googlegroups.com

Another reason to keep them separate is that at the moment commitIndex advances, it's not necessarily true that every other node has a copy of the entry at commitIndex yet.

A node already is, or could become at any time, the leader. So it might need to send that entry in an AppendEntries request. But if that entry has already been compacted, that's impossible, so it would have to send an InstallSnapshot instead, which is much less efficient.

One can imagine a scenario where everything is working perfectly except for an occasional dropped AppendEntries message. If the leader is compacting the state machine aggressively, then each time an AppendEntries is dropped, if the entry it carried happens to get committed before the leader retransmits it (certainly possible), the leader is forced to send an InstallSnapshot instead.

On the other end of the spectrum, you could imagine never compacting the state machine. Then you would never have to send an InstallSnapshot, but the trade-off is that this would obviously cause other inefficiencies. So there's a happy medium in there somewhere.
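
A rough sketch of that trade-off, under the assumption that compaction discards every entry up to some index. `CompactedLog`, `compact_through`, and `next_message` are hypothetical names, not part of any Raft implementation. The leader holds entries 1 through 3, the follower missed entry 3, and entry 3 has been committed through the other follower.

```python
from dataclasses import dataclass, field


@dataclass
class CompactedLog:
    """Log whose prefix up to snapshot_index has been folded into a snapshot."""
    snapshot_index: int = 0                       # last index covered by the snapshot
    entries: dict = field(default_factory=dict)   # index -> command, indexes > snapshot_index

    def append(self, index, command):
        self.entries[index] = command

    def compact_through(self, index):
        """Discard entries up to and including index."""
        for i in [i for i in self.entries if i <= index]:
            del self.entries[i]
        self.snapshot_index = max(self.snapshot_index, index)


def next_message(log, next_index):
    """Pick what the leader has to send to a follower that needs next_index."""
    if next_index <= log.snapshot_index:
        return ("InstallSnapshot", log.snapshot_index)
    return ("AppendEntries", next_index)


def leader_log():
    log = CompactedLog()
    for i, command in enumerate(["a", "b", "c"], start=1):
        log.append(i, command)
    return log


# Aggressive: compact everything that is committed, including entry 3.
eager = leader_log()
eager.compact_through(3)
assert next_message(eager, 3) == ("InstallSnapshot", 3)

# Lazier: leave recent committed entries in the log for retransmission.
lazy = leader_log()
lazy.compact_through(1)
assert next_message(lazy, 3) == ("AppendEntries", 3)
```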

-Archie

--

Archie L. Cobbs

A. Jesse Jiryu Davis

Sep 3, 2025, 8:17:43 AM

to raft...@googlegroups.com

Huh, I don't understand what you're saying here, Archie. I agree that a Raft node shouldn't compact log entries too soon. The paper and thesis propose various rules for how long to wait before compacting. E.g. in the thesis chapter 5, Ongaro suggests a node should compact when its log is 4x larger than its previous snapshot. But this seems unrelated to separating commitIndex from lastApplied. I think that a Raft leader should try to commit and then apply entries as soon as possible. Orthogonally, it should keep applied entries in its log until they're unlikely to be needed.

Archie Cobbs

Sep 3, 2025, 11:15:15 AM

to raft...@googlegroups.com

Hi Jesse,

I was thinking of the simplest data model, in which there is (a) a state machine, and (b) zero or more unapplied log entries. In this simple model, as soon as you apply the next log entry to the state machine it disappears, so it would no longer be available for AppendEntries requests. In that model you might want to delay applying entries. Of course the trade-off would be that up-to-date queries of the state machine would need to account for any unapplied entries that are ≤ the commit index.

A more real-world and practical model applies entries to the state machine as soon as they are committed, but also keeps them around for a while (in a separate side list) even after they have been applied in case they need to be resent. In this model there's no reason to track lastApplied and commitIndex separately, but instead you have this new list of "lingering entries" to keep track of. The advantage vs. the simple model is that up-to-date state machine queries don't need to account for any uncompacted entries.
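
As a rough illustration of this second model (an assumption-laden sketch, not code from any real implementation): `LingeringLog`, `linger_limit`, and `entry_for_resend` are made-up names, the retention policy is a simple count, and only committed entries are modeled.

```python
from collections import deque


class LingeringLog:
    """Apply on commit, but keep recent entries in a side list for resending."""

    def __init__(self, linger_limit):
        self.state = []               # stand-in state machine
        self.applied_index = 0        # serves as both commitIndex and lastApplied
        self.lingering = deque()      # (index, command) pairs kept after applying
        self.linger_limit = linger_limit  # hypothetical retention policy

    def commit(self, command):
        """Apply a newly committed command immediately, retaining a copy."""
        self.applied_index += 1
        self.state.append(command)
        self.lingering.append((self.applied_index, command))
        while len(self.lingering) > self.linger_limit:
            self.lingering.popleft()

    def entry_for_resend(self, index):
        """Return the command at index, or None if only a snapshot can cover it."""
        for i, command in self.lingering:
            if i == index:
                return command
        return None


log = LingeringLog(linger_limit=2)
for command in ["a", "b", "c"]:
    log.commit(command)

# Queries read the state machine directly; nothing unapplied has to be overlaid.
assert log.state == ["a", "b", "c"]
# Recent entries can still go out in an AppendEntries request.
assert log.entry_for_resend(3) == "c"
# Older ones have aged out and would need an InstallSnapshot.
assert log.entry_for_resend(1) is None
```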

Hope that makes sense.

-Archie