Leader backs up followers quickly with persistence


Saif Mohamed

Mar 17, 2026, 12:48:40 AM
to tlaplus

Currently I'm trying to build a layer of distributed state machine with Raft following this paper: http://nil.csail.mit.edu/6.5840/2025/papers/raft-extended.pdf
Course: https://pdos.csail.mit.edu/6.824/

The problem appeared when I introduced the persistence layer (disk) to my log replication logic. There are situations where agreement on a specific entry cannot be reached before a time limit is exceeded (the time frame within which the entry should be replicated).

Mainly, under heavy, randomized leader failures, the followers don't get the chance to fully back up their logs. This problem didn't appear before persistence was in the picture.

My question is: can the TLA+ tools help me identify the true problem, in case I'm currently misdiagnosing it?

Any thoughts or recommendations will be helpful, thanks.

Andrew Helwer

Mar 18, 2026, 1:32:08 PM
to tla...@googlegroups.com
Hi Saif,

Trying to understand - you're saying that your replicas can die in between confirming/choosing a value and writing it to disk, and so this leads to chosen values being lost to the cluster since the non-persisted values are gone when they come back up?

Andrew


Saif Mohamed

Mar 20, 2026, 7:25:46 AM
to tlaplus

Hi Andrew, thanks for your interest,

I don't think that is the problem, because no replica (follower) confirms (acks to the leader) that a log entry has been replicated in its local state before it persists (syncs) that entry to disk. Based on that, there are no partial answers for replicating a log entry: as long as the leader receives the confirmation, the log entry has been persisted.

It could be the case that (and this just popped into my mind as I'm writing this message) the RPC ack message got delayed in the network, so the leader had to retransmit this log entry message again. The follower is up to date (does nothing), sends an ack, which gets delayed again, and so on. That makes the follower retransmit the ack message for the same log entry, for example, 100 times, but ack message number 5 reaches the leader, so the leader moves forward assuming 100 entries were successfully transmitted. Then, at entry number 101, the leader receives a delayed message from the past (message number 6), which makes the leader revert to a past state for that follower.
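For what it's worth, that delayed/duplicated-ack hazard is exactly the kind of thing the published Raft TLA+ spec guards against. Here is a hedged sketch (operator and variable names are patterned on Diego Ongaro's raft.tla, not taken from your code): the leader drops replies carrying a stale term and never moves matchIndex backwards, so a late ack from the past cannot revert its view of the follower.

```tla
Max(S) == CHOOSE x \in S : \A y \in S : x >= y

\* Sketch: leader i handles a successful AppendEntries reply m
\* from follower j. Two guards defuse the delayed-ack scenario:
\*   1. replies carrying an older term are simply ignored;
\*   2. matchIndex is monotonic -- a stale reply for entry 6
\*      arriving after entry 100 was acknowledged changes nothing.
HandleAppendEntriesResponse(i, j, m) ==
    /\ m.mterm = currentTerm[i]          \* drop stale-term replies
    /\ m.msuccess
    /\ matchIndex' = [matchIndex EXCEPT
                        ![i][j] = Max({@, m.mmatchIndex})]
    /\ nextIndex'  = [nextIndex EXCEPT
                        ![i][j] = Max({matchIndex[i][j], m.mmatchIndex}) + 1]
```

The same guard translates directly to implementation code: on a successful reply, set matchIndex[j] to the maximum of its current value and the reply's match index, rather than assigning unconditionally.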

This is my mental environment:

  • This problem is kind of a deadlock, because the test cases are heavily randomized and sometimes pass and sometimes fail (no agreement reached).

  • It could be that the timeout is not large enough to give the follower time to fully back up.

Could TLA+ help narrow down this problem?

Thanks.

Andrew Helwer

Mar 20, 2026, 2:15:21 PM
to tla...@googlegroups.com
I think TLA+ would help narrow the problem down as long as you can accurately model all the things that might be causing this problem. This can be done incrementally, by adding details and subsystems to your spec until behaviour that plausibly matches your failure is found.
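To make that concrete: the delayed/duplicated-message behaviour from earlier in the thread takes very little machinery to model. A minimal sketch (variable and action names are illustrative), using the common idiom of representing the network as a set of in-flight messages where delivery does not consume the message:

```tla
VARIABLE messages   \* the set of messages currently "in flight"

Send(m) == messages' = messages \cup {m}

\* Receiving leaves m in the set, so the model checker also explores
\* behaviours where the same message is delivered again later --
\* exactly the duplicated/delayed ack scenario. (In a full spec this
\* would be conjoined with the receiver's state update.)
Receive(m) == /\ m \in messages
              /\ UNCHANGED messages

\* Loss is a separate action, so dropped messages are explored too.
Drop(m) == messages' = messages \ {m}
```

Starting from a network like this and adding the persistence details incrementally, as Andrew suggests, lets TLC hunt for a trace that matches the observed failure.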

Andrew

Saif Mohamed

Mar 20, 2026, 3:00:29 PM
to tlaplus

Any good practical tutorial would be great if you've got one.
I appreciate your interest, thanks.

Andrew Helwer

Mar 20, 2026, 3:01:51 PM
to tla...@googlegroups.com