If the end goal is linearizability, why do we need consensus?

496 views
Skip to first unread message

Colin Scott

unread,
Oct 19, 2015, 5:16:26 PM10/19/15
to raft-dev
In the Raft paper, Diego states (section 8, 3rd paragraph):

"Our goal for Raft is to implement linearizable semantics"'

It dawned on me that if the only goal is linearizability, consensus (and RSMs) should not be necessary. There are algorithms (see: slides, or paper) for implementing linearizable registers that (i) only require a quorum of processes to be correct (i.e. they have the same level of fault tolerance as Raft), and (ii) are strictly weaker than consensus (i.e. they assume a weaker failure detector than omega).

Linearizable registers do *not* however support stronger semantics, such as Atomic Compare-And-Swap.

So my question is this: are there Raft users who depend on stronger semantics than linearizability? i.e., is anyone using Raft for CAS (or similar) operations?

Thanks!
-Colin

Armon Dadgar

unread,
Oct 19, 2015, 5:36:00 PM10/19/15
to raft...@googlegroups.com
We certainly depend on the stronger semantics for CAS and coordination features like distributed locking for Consul. The serializability also makes it much easier to reason about the state of the system.

Best Regards,

Armon Dadgar

Sent from my iPhone
--
You received this message because you are subscribed to the Google Groups "raft-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to raft-dev+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

James Wilcox

unread,
Oct 19, 2015, 5:49:16 PM10/19/15
to raft...@googlegroups.com
This is a good question. 

Like you said, the key is what operations you support. "Linearizability" doesn't mean anything by itself, it only has meaning with respect to a sequential specification. In Raft, the sequential spec is just the state machine. So in the quote from the paper, "linearizable semantics" means "linearizable semantics with respect to the state machine", not "a linearizable register". 

You're right that if your only state machine operations are read and write (ie, it's a register), then you don't necessarily need Raft or consensus. In practice though, operations like CAS are convenient.

A rephrasing of your question is then "does anyone use Raft with a state machine that only supports read and write operations?" My guess would be that very few people do that. 

James

--

Colin Scott

unread,
Oct 19, 2015, 8:34:11 PM10/19/15
to raft-dev
Mostly makes sense, though I'm still fuzzy about the details. An attempt to make the discussion more concrete:

The state machine is a (sequential, i.e. non-concurrent) piece of code, written by the application developer.

The current state of the state machine resides in memory on a given replica, and consists of the values of all variables within the state machine code.

Each entry of the Raft log codifies a state machine transition. When Raft commits a log entry, it invokes the application developer's state machine with the value of the log entry. The state machine then performs some computation based on that value, and ends up in a new state.

Now, here's a confusing bit: since the state machine is a sequential piece of code, it doesn't actually need to contain any CAS instructions. The RSM abstraction allow the application developer to not have to worry about any concurrency whatsoever; the state machine code is single-threaded, and Raft always waits for the state machine to halt its computation before it feeds in a new state transition.

Now, the one part I'm confused about: what exactly you mean by "if your only state machine operations are read and write (ie, it's a register)".

I think what you mean is: if you tried to implement the state machine code *without* the assumption that each state transition is atomic, would you need to invoke a CAS instruction in order to achieve correctness? 

Is that right? If so, what's a concrete example of such a state machine? Armon mentioned that Consul's state machine supports a "CAS" state transition. I assume that the state machine code itself doesn't invoke an (x86) CAS instruction, but it would need to without the RSM abstraction?

Thanks!
-Colin 

Colin Scott

unread,
Oct 19, 2015, 8:44:56 PM10/19/15
to raft-dev
I haven't read this yet, but it looks like it might contain the answer to my question?: http://delivery.acm.org/10.1145/110000/102808/p124-herlihy.pdf

Colin Scott

unread,
Oct 19, 2015, 8:59:00 PM10/19/15
to raft...@googlegroups.com
You received this message because you are subscribed to a topic in the Google Groups "raft-dev" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/raft-dev/KCMqVfuhSNw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to raft-dev+u...@googlegroups.com.

James Wilcox

unread,
Oct 20, 2015, 12:33:15 PM10/20/15
to raft...@googlegroups.com
The state machine is a (sequential, i.e. non-concurrent) piece of code, written by the application developer.

This is a good way of thinking about it. An alternate way that might be useful for this discussion is to not think of the state machine as code, but as a more abstract, mathematical object, like an automaton you might see in a theory of computation class. 
 
The current state of the state machine resides in memory on a given replica, and consists of the values of all variables within the state machine code.

In the alternate perspective, the state of the state machine is completely opaque and does not necessarily consist of "variables", but a copy resides on each machine (either on disk or in memory or both depending on exactly what flavor of Raft/RSM you're using).
 
Each entry of the Raft log codifies a state machine transition.

Yes. In the automaton view, each entry contains a single input 
 
When Raft commits a log entry, it invokes the application developer's state machine with the value of the log entry. The state machine then performs some computation based on that value, and ends up in a new state.

Now, here's a confusing bit: since the state machine is a sequential piece of code, it doesn't actually need to contain any CAS instructions. The RSM abstraction allow the application developer to not have to worry about any concurrency whatsoever; the state machine code is single-threaded, and Raft always waits for the state machine to halt its computation before it feeds in a new state transition.

This is where the automaton viewpoint can help. The automaton makes a *single*, atomic transition in response to each input.  So if it didn't have a CAS transition, and only read-write transitions, then it would have to use at least two log entries to simulate the CAS. But this simulation does not reflect the atomic nature of CAS, since the two log entries could get split up and have other things executed between them. So while that state machine abstraction saves you from some of the hard parts of concurrency, you still have to think about how much work should be done by each atomic transition. 

Now we can backport this explanation to the view of state machine as code. There, each transition consists of *all* the code that is run in response to the input. So the restriction to read/write operations would be saying that on each input, the code could either write a single variable or read a single variable and nothing else.
 
Now, the one part I'm confused about: what exactly you mean by "if your only state machine operations are read and write (ie, it's a register)".

I think what you mean is: if you tried to implement the state machine code *without* the assumption that each state transition is atomic, would you need to invoke a CAS instruction in order to achieve correctness? 

Hopefully the above explanation makes sense now. The state machine's transitions are atomic, but the restriction to being a register means that each transition can only consist of a single read or write operation. 
 
James

 

Colin Scott

unread,
Oct 20, 2015, 2:56:06 PM10/20/15
to raft...@googlegroups.com

Makes sense, thanks!


--
You received this message because you are subscribed to a topic in the Google Groups "raft-dev" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/raft-dev/KCMqVfuhSNw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to raft-dev+u...@googlegroups.com.

jordan.h...@gmail.com

unread,
Oct 20, 2015, 11:19:53 PM10/20/15
to raft...@googlegroups.com
Excellent explanation


--

Martin Furmanski

unread,
Oct 22, 2015, 4:42:48 PM10/22/15
to raft-dev
It's not that the ultimate goal is to implement linearizable semantics, because as we all know that is to develop and describe an understandable consensus algorithm. The linearizability is an additional goal and we thank Diego for it. : )

The sessions as described give you the possibility to wrap any command and give it the exactly-once property, which would otherwise be problematic in the face of retries, for arbitrary commands ( i.e. important for such that don't naturally have zero effect on subsequent applications ).

Archie Cobbs

unread,
Oct 22, 2015, 5:33:30 PM10/22/15
to raft-dev
On Monday, October 19, 2015 at 4:16:26 PM UTC-5, Colin Scott wrote:
Linearizable registers do *not* however support stronger semantics, such as Atomic Compare-And-Swap.

So my question is this: are there Raft users who depend on stronger semantics than linearizability? i.e., is anyone using Raft for CAS (or similar) operations?

Raft provides "just" a consensus log, but that's a great building block and I'd guess many uses build on that. For example in my case I implemented a fully transactional key/value database using Raft (see this post). It provides linearizable ACID semantics over the whole key space by having the leader detect conflicts.

-Archie
Reply all
Reply to author
Forward
0 new messages