After reading the paper and looking at several Raft implementations, I
wonder about the relationship between the state machine and the consensus
module. Initially, I thought that the implementations of the two could be
strictly decoupled:
    User ---> State Machine ---> Consensus Module ---> Log      (Diagram 1)
That is, the state machine uses the consensus module only as an
implementation detail. But then I see implementations where this is not
the case, where the state machine lives *inside* the consensus module.
Consul, for example [1]:
    Since all servers participate as part of the peer set, they all know
    the current leader. When an RPC request arrives at a non-leader
    server, the request is forwarded to the leader. If the RPC is a
    query type, meaning it is read-only, the leader generates the result
    based on the current state of the FSM. If the RPC is a transaction
    type, meaning it modifies state, the leader generates a new log
    entry and applies it using Raft. Once the log entry is committed and
    applied to the FSM, the transaction is complete.
That is, users send a command to the consensus module as opposed to the
state machine, and the command is dispatched based on its type (read vs.
write):
                                   /---> State Machine
    User ---> Consensus Module ---/                             (Diagram 2)
                                  \
                                   \---> Log
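To make Diagram 2 concrete, here is a minimal Python sketch of how I
understand that arrangement. All names (KVStateMachine, ConsensusModule,
handle) are made up for illustration and are not Consul's API, and
replication and commitment are reduced to a local list append:

```python
from dataclasses import dataclass, field


@dataclass
class KVStateMachine:
    """Hypothetical FSM: a plain key-value store."""
    data: dict = field(default_factory=dict)

    def apply(self, command):
        # Interpret a committed write of the form ("set", key, value).
        _, key, value = command
        self.data[key] = value
        return value

    def query(self, key):
        return self.data.get(key)


@dataclass
class ConsensusModule:
    """User-facing module that owns both the log and the FSM (Diagram 2)."""
    fsm: KVStateMachine
    log: list = field(default_factory=list)

    def handle(self, request):
        if request[0] == "get":
            # Read-only request: answered from the FSM, nothing is logged.
            return self.fsm.query(request[1])
        # Write request: append to the log (a real implementation would
        # replicate and commit here), then apply to the FSM.
        self.log.append(request)
        return self.fsm.apply(request)


node = ConsensusModule(KVStateMachine())
node.handle(("set", "x", 1))
print(node.handle(("get", "x")))  # prints 1
print(len(node.log))              # prints 1: only the write was logged
```

In this sketch, handle has to know which request types are read-only,
even though that knowledge arguably belongs to the state machine.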
Going back to the paper and looking at Figure 1, it seems that the
consensus module indeed controls the state machine, and more
importantly, is directly user-facing. I wonder whether that's a
necessity. In fact, the accompanying text in the paper reads more like
Diagram 1.
For example, would it be possible for Raft to only manage the log and
let the state machine drive snapshotting/compaction? Then, the state
machine would use the consensus module only as an implementation detail,
forwarding mutating operations (writes) to it while answering read-only
operations (reads) from its own state, which it derives from the local
log of the consensus node.
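A rough Python sketch of what I have in mind for this Diagram 1 layout
follows. Everything here is hypothetical (ReplicatedLog, KVStore,
propose, subscribe are my own names, not taken from any implementation),
and the "consensus module" is a single-node stand-in that commits every
entry immediately:

```python
class ReplicatedLog:
    """Hypothetical stand-in for a consensus module that only manages the log."""

    def __init__(self):
        self.entries = []
        self.subscribers = []

    def subscribe(self, callback):
        # The state machine registers to be told about committed entries.
        self.subscribers.append(callback)

    def propose(self, entry):
        # Real Raft would replicate to a majority before committing.
        self.entries.append(entry)
        index = len(self.entries)
        for callback in self.subscribers:
            callback(index, entry)
        return index


class KVStore:
    """User-facing state machine that uses the log as an implementation detail."""

    def __init__(self, log):
        self.log = log
        self.data = {}
        self.last_applied = 0
        log.subscribe(self._on_commit)

    def _on_commit(self, index, entry):
        key, value = entry
        self.data[key] = value
        self.last_applied = index

    def set(self, key, value):
        # Write: goes through the consensus module.
        return self.log.propose((key, value))

    def get(self, key):
        # Read: answered from local state, never touches the log.
        return self.data.get(key)

    def snapshot(self):
        # The state machine decides when and how to snapshot.
        return self.last_applied, dict(self.data)


store = KVStore(ReplicatedLog())
store.set("x", 1)
print(store.get("x"))    # prints 1
print(store.snapshot())  # prints (1, {'x': 1})
```

Reads in this sketch come straight from local state, so on a node that
lags behind they could be stale.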
In Diagram 2, the consensus module takes care of both committing and
applying, returning to the user as soon as the command has been
"applied" (even though "applied" may mean nothing more than a successful
read from the state machine in the case of an immutable command). In
Diagram 1, the state machine only dispatches the command to the
consensus module if it's a write. In this case, the consensus module
returns successfully to the user after persisting the command in the log
and subsequently applying it to the state machine. The latter approach
makes more sense to me from an architectural point of view: only the
state machine knows how to interpret its commands, and whether
dispatching to the consensus module makes sense.
I haven't used many other Raft implementations, but these concerns came
up during my own implementation of Raft within an actor model runtime.
Any comments that help me better understand the fundamental design
decisions would be much appreciated.
Matthias
[1] https://www.consul.io/docs/internals/consensus.html