Leader backs up followers quickly with persistence


Saif Mohamed

Mar 17, 2026, 12:48:40 AM (3 days ago) Mar 17
to tlaplus

Currently I'm trying to build a layer of distributed state machine with Raft following this paper: http://nil.csail.mit.edu/6.5840/2025/papers/raft-extended.pdf
Course: https://pdos.csail.mit.edu/6.824/

The problem appeared when I introduced the persistence layer (disk) into my log replication logic. There are situations where agreement on a specific entry cannot be reached because a time limit is exceeded (the time frame within which the entry should be replicated).

Mainly, under heavy randomized leader failures, the followers don't get a chance to fully back up their logs. This problem didn't appear previously, without persistence in the picture.

My question is: can the TLA+ tools help me identify the true problem in case I'm currently diagnosing it wrong?

Any thoughts or recommendations will be helpful, thanks.

Andrew Helwer

Mar 18, 2026, 1:32:08 PM (2 days ago) Mar 18
to tla...@googlegroups.com
Hi Saif,

Trying to understand - you're saying that your replicas can die in between confirming/choosing a value and writing it to disk, and so this leads to chosen values being lost to the cluster since the non-persisted values are gone when they come back up?

Andrew


Saif Mohamed

7:25 AM (3 hours ago)
to tlaplus

Hi Andrew, thanks for your interest,

I don't think that is the problem, because no replica (follower) confirms (acks to the leader) that a log entry has been replicated in its local state before it persists that entry to its disk (syncs it). Based on that, there are no partial answers for replicating a log entry: as long as the leader receives the confirmation, the log entry is durably persisted.

It could be the case that (and this just popped into my mind as I'm writing this message) the RPC ack message got delayed in the network, so the leader had to retransmit this log entry message again. The follower is up to date (does nothing), sends an ack, which gets delayed again, and so on. That makes the follower retransmit the ack message for the same log entry, for example, 100 times, but ack message number 5 reaches the leader, so the leader moves forward assuming 100 entries were successfully transmitted. Then, at entry number 101, the leader receives a delayed message from the past (message number 6), which makes the leader revert to a past state for that follower.

This is my mental environment:

  • This problem is kind of a deadlock, because the test cases are very randomized and sometimes pass and sometimes not (no agreement reached).

  • It could be that the timer timeout is not big enough to give the follower the ability to fully back up.

Could TLA+ help narrow down this problem?

Thanks.
