Question about the state machine model

cheney

Jul 29, 2021, 2:29:30 AM

to raft-dev

Hi all,

We know that raft is built on the replicated state machine assumption: if the state machine is deterministic, all replicas reach the same final state when they are fed the same input sequence.

However, I found that our system may violate this assumption, and I'm trying to find a way to change our business logic to fit the state machine model.

The problem arises in our storage system. Each disk has its own storage engine, multiple raft groups run on the same storage engine, and each storage engine may host a different set of raft group members. For example, three nodes A, B, and C form one raft group, but the engines that serve A, B, and C may each contain different other groups.

Currently, a user request is first written to the raft log and replicated (apart from some parameter checking), and once the log entry is committed we try to apply it. However, since the states of the storage engines differ, the apply result may differ between raft members, which violates the state machine assumption. For example, a user sends a request with a wrong parameter that causes a duplicate allocation of some resource; some nodes already hold the duplicated resource and can detect the fault, while others can't.
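
To make the divergence concrete, here is a minimal sketch of the situation described above. The `Engine` class, the `("alloc", resource)` command shape, and the result strings are all hypothetical names chosen for illustration; they are not taken from any real raft library or from the actual storage system.

```python
from dataclasses import dataclass, field


@dataclass
class Engine:
    """Hypothetical per-disk storage engine shared by several raft groups."""
    allocated: set = field(default_factory=set)

    def apply(self, command):
        # The outcome depends on engine-wide state, which other groups also
        # mutate, so it is NOT a function of this group's log alone.
        op, resource = command
        if op == "alloc":
            if resource in self.allocated:
                return "error: duplicate"
            self.allocated.add(resource)
            return "ok"
        raise ValueError(f"unknown op {op!r}")


# The engine on node A already holds "r1" on behalf of some other group;
# the engine on node B does not.
engine_a = Engine(allocated={"r1"})
engine_b = Engine()

committed_log = [("alloc", "r1")]  # the same committed entry on both members

results_a = [engine_a.apply(c) for c in committed_log]
results_b = [engine_b.apply(c) for c in committed_log]

print(results_a)  # ['error: duplicate']
print(results_b)  # ['ok']
```

Both members applied an identical log, yet they disagree on the result, because `apply` reads state that is not derived from that log.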

One solution I proposed is for the leader to do all the checks and make sure the request will be applied successfully before it writes the request to the WAL. However, I still hit the problem that even if the leader thinks the request is okay, the followers may reach different results, since their state is not the same as the leader's.

Another solution is to bypass the raft mechanism for nondeterministic logic, so that only deterministic logic goes through raft.

What is the standard way to solve this problem?

Oren Eini (Ayende Rahien)

Jul 29, 2021, 3:36:02 AM

to raft...@googlegroups.com

The state machine's input and state should be identical across the system.

If you are checking the local state which may be different, that violates the assumptions and leads to Bad Things.

In general, each raft group should be isolated from the others. A common scenario where this happens is a single node running out of disk space or the like, but that should just mark the node as failed.

If you need to get additional details, get them before writing to the raft log, so they will be identical across the board.

Also note that time in the cluster is dictated by the leader, not by each node's clock.
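
As a rough sketch of the two points above (resolve extra details before the log entry is written, and let the leader dictate time), the leader can embed every such value in the entry itself. `Entry`, `Leader`, `propose`, and `apply` are hypothetical names for illustration only, and the injected `clock` is an assumption made so the demo is reproducible.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical log entry: everything apply() needs is inside it."""
    op: str
    resource_id: int
    leader_time: float  # chosen once by the leader, replicated verbatim


class Leader:
    def __init__(self, clock=time.time):
        self.clock = clock
        # Assumption for the sketch: in a real system this counter would
        # have to be derived from replicated state so that a newly elected
        # leader continues from the right value.
        self.next_id = 1

    def propose(self, op):
        # Resolve every non-deterministic input here, before replication.
        entry = Entry(op=op, resource_id=self.next_id, leader_time=self.clock())
        self.next_id += 1
        return entry


def apply(state, entry):
    # Replicas never consult their own clock or their own allocator.
    state[entry.resource_id] = (entry.op, entry.leader_time)
    return state


leader = Leader(clock=lambda: 1000.0)  # fixed clock for a reproducible demo
log = [leader.propose("create"), leader.propose("create")]

replica_1 = {}
replica_2 = {}
for e in log:
    apply(replica_1, e)
    apply(replica_2, e)

print(replica_1 == replica_2)  # True
```

Because `apply` depends only on the entry and the state built from earlier entries, any replica that applies the same log ends up with the same state.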

--

Oren Eini
CEO / Hibernating Rhinos LTD

Mobile: 972-52-548-6969

Sales: sa...@ravendb.net

Skype: ayenderahien

Support: sup...@ravendb.net

Konstantin Osipov

Jul 29, 2021, 6:54:11 AM

to raft...@googlegroups.com

* cheney <cheney...@gmail.com> [21/07/29 09:29]:

I believe raft is designed for replication of deterministic state
machines, so applying the same sequence of commands in the same
order should produce the same result.
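
A minimal sketch of what that looks like, assuming the state machine is scoped to one raft group and reads nothing outside its own log. `GroupStateMachine` and the command tuples are hypothetical names for illustration. Note that rejecting a bad request is fine as long as every replica rejects it the same way.

```python
class GroupStateMachine:
    """Hypothetical state machine scoped to a single raft group."""

    def __init__(self):
        self.allocated = set()

    def apply(self, command):
        op, resource = command
        if op == "alloc":
            if resource in self.allocated:
                # Rejection is itself a deterministic outcome: every replica
                # has applied the same prior entries, so every replica rejects.
                return "error: duplicate"
            self.allocated.add(resource)
            return "ok"
        if op == "free":
            self.allocated.discard(resource)
            return "ok"
        return "error: unknown op"


log = [("alloc", "r1"), ("alloc", "r1"), ("free", "r1"), ("alloc", "r1")]

replicas = [GroupStateMachine() for _ in range(3)]
results = [[sm.apply(c) for c in log] for sm in replicas]

print(results[0])  # ['ok', 'error: duplicate', 'ok', 'ok']
print(results[0] == results[1] == results[2])  # True
```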

--
Konstantin Osipov, Moscow, Russia
