Leader backs up followers quickly with persistence

Saif Mohamed

Mar 17, 2026, 12:48:40 AM

to tlaplus

I'm currently building a replicated state machine layer on top of Raft, following this paper: http://nil.csail.mit.edu/6.5840/2025/papers/raft-extended.pdf
Course: https://pdos.csail.mit.edu/6.824/

The problem appeared when I introduced the persistence layer (disk) into my log replication logic. In some situations, agreement on a specific entry cannot be reached because a time limit is exceeded (the window within which the entry should be replicated).

This happens mainly under heavy randomized leader failures: the followers don't get the chance to fully back up their logs before the leader fails again. The problem didn't appear before persistence was in the picture.
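
For readers following along, here is a minimal sketch of the kind of fast log backup the Raft paper describes as an optimization: on a failed consistency check, the follower reports the term of the conflicting entry and the first index it stores for that term, so the leader can skip a whole term per round trip instead of one entry. This is not the code from the post (none was shared). The function names, the tuple layout `(term, command)`, and the use of Python are assumptions made purely for illustration.

```python
# Sketch of Raft-style fast log backup (illustrative, not the poster's code).
# Log entries are (term, command) tuples. Indices are 1-based as in the
# Raft paper, so log[i - 1] holds entry i.

def follower_append(log, prev_index, prev_term, entries):
    """Follower side of AppendEntries.

    Returns (success, conflict_term, retry_index). On failure, retry_index
    tells the leader where to retry instead of stepping back one entry.
    """
    if prev_index > len(log):
        # Log too short: ask the leader to retry right after our last entry.
        return False, None, len(log) + 1
    if prev_index > 0 and log[prev_index - 1][0] != prev_term:
        conflict_term = log[prev_index - 1][0]
        first = prev_index
        # Walk back to the first entry we store for the conflicting term.
        while first > 1 and log[first - 2][0] == conflict_term:
            first -= 1
        return False, conflict_term, first
    # Consistency check passed: drop any conflicting suffix, then append.
    for offset, entry in enumerate(entries):
        idx = prev_index + 1 + offset
        if idx <= len(log):
            if log[idx - 1][0] != entry[0]:
                del log[idx - 1:]
                log.append(entry)
        else:
            log.append(entry)
    return True, None, None


def replicate(leader_log, follower_log):
    """Drive one follower until its log matches; return the number of RPCs."""
    next_index = len(leader_log) + 1
    rpcs = 0
    while True:
        prev_index = next_index - 1
        prev_term = leader_log[prev_index - 1][0] if prev_index > 0 else 0
        ok, _conflict_term, retry_index = follower_append(
            follower_log, prev_index, prev_term, leader_log[prev_index:]
        )
        rpcs += 1
        if ok:
            return rpcs
        # Jump straight to the follower's hint rather than next_index - 1.
        next_index = retry_index
```

The sketch ignores terms on the RPC itself, commit indices, and persistence. In a real implementation each of these round trips would also include a disk write on the follower before it replies, which is one place where the added latency could interact with a replication deadline.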

My question is: can the TLA+ tools help me identify the real problem, in case I've misdiagnosed it?

Any thoughts or recommendations will be helpful, thanks.

Andrew Helwer

Mar 18, 2026, 1:32:08 PM

to tla...@googlegroups.com

Hi Saif,

Trying to understand: are you saying that your replicas can die between confirming/choosing a value and writing it to disk, so that chosen values are lost to the cluster because the non-persisted values are gone when the replicas come back up?

Andrew
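
To make the scenario in this reply concrete, here is a small sketch of the ordering it asks about. The Raft paper lists the log (along with `currentTerm` and `votedFor`) as state that is updated on stable storage before responding to RPCs. The `Follower` class, the JSON file layout, and the `persist_before_ack` flag below are hypothetical and exist only to contrast the two orderings; they are not taken from the original poster's implementation.

```python
import json
import os
import tempfile


class Follower:
    """Toy replica whose only durable state is its log (illustrative only)."""

    def __init__(self, path):
        self.path = path
        self.log = []
        self.restore()

    def restore(self):
        """Reload the log from disk, as a restarted replica would."""
        if os.path.exists(self.path):
            with open(self.path) as f:
                self.log = json.load(f)
        else:
            self.log = []

    def persist(self):
        """Write the log durably: temp file, fsync, then atomic rename."""
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.log, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def append(self, entry, persist_before_ack=True):
        """Accept an entry and return the acknowledgement sent to the leader."""
        self.log.append(entry)
        if persist_before_ack:
            self.persist()
        return True


def crash_and_restart(persist_before_ack):
    """Ack one entry, 'crash' by dropping the object, return the restarted log."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "raft_state.json")
        Follower(path).append([1, "x"], persist_before_ack)
        return Follower(path).log
```

Calling `crash_and_restart(False)` models a replica that acknowledged an entry it never wrote to disk, so the entry is gone after the restart. Calling `crash_and_restart(True)` models the ordering the paper requires, at the cost of a synchronous disk write on every acknowledged append.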

To view this discussion visit https://groups.google.com/d/msgid/tlaplus/1b9fd2b0-45c0-43e1-8ab7-cb1b220f0b08n%40googlegroups.com.
