Currently I'm trying to build a layer of distributed state machine with Raft following this paper: http://nil.csail.mit.edu/6.5840/2025/papers/raft-extended.pdf
Course: https://pdos.csail.mit.edu/6.824/
The problem appeared when I introduced the persistence layer (disk) to my log replication logic. There are situations where agreement on a specific entry cannot be reached because of time limit exceeded (a time-frame the entry should be replicated inside of).
Mainly, when there is a heavy randomized leader failure, the followers don't get the chance to fully back up their log. This problem didn't appear previously without persistence in the picture.
My question is: does the TLA+ tool help me identify the true problem in case I'm already identifying it wrong?
Any thoughts or recommendations will be helpful, thanks.
--
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/tlaplus/1b9fd2b0-45c0-43e1-8ab7-cb1b220f0b08n%40googlegroups.com.