[Question] request processing parallelization in raft.

41 views
Skip to first unread message

zhijun

unread,
Aug 10, 2017, 10:35:44 PM8/10/17
to raft-dev
Hi,

I recently got a bit confused on this. In raft the leader replicates the log to other servers, and only apply the change locally after getting confirmation from more than half of the servers. And as I understand, to ensure changes are applied in the same order as they arrive, these changes should be applied sequentially, then effectively the request processing becomes somewhat single thread - would this hurt performance significantly?  Or is there anything that I'm missing?

Thanks a lot!
Zhijun

Henrik Ingo

unread,
Aug 11, 2017, 3:16:18 AM8/11/17
to raft...@googlegroups.com
That's correct.

The simple answer is that Raft is first and foremost a simple and easy to understand algorithm, and such optimization questions are out of scope.

In practice many replicated databases struggle with throughput of replication. Often the case is such that the primary is highly tuned to process requests in parallel, but the replication stream forces transactions into a sequence, and then the first implementations of applying transactions onto the secondaries are more or less single threaded and you experience secondary lag.

What you often can do is that you have more than one applier thread (on the secondary, but in the case of Raft, also on the primary) that can work on applying transactions in parallel, but coordinate between themselves so that they are committed in the correct order. This seems to give significant performance benefit.

Note that in many systems this approach opens the door for transactions to conflict (e.g. trying to update a record that also another not-yet-committed transaction has updated) that need to be resolved somehow. Several systems depend on some heuristic or proof which says that a short subset of the transaction sequence is known not to conflict. Alternatively you could simply rollback and retry the later transaction.

henrik

--
You received this message because you are subscribed to the Google Groups "raft-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to raft-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
henri...@avoinelama.fi
+358-40-5697354        skype: henrik.ingo            irc: hingo
www.openlife.cc

My LinkedIn profile: http://fi.linkedin.com/pub/henrik-ingo/3/232/8a7

jordan.h...@gmail.com

unread,
Aug 11, 2017, 4:44:08 PM8/11/17
to raft...@googlegroups.com
Raft does indeed at some level depend on a single thread to enforce order when applying changes. And even if it could deal with that limitation, it's still limited by the throughput of a single node (the leader). But there are a couple obvious solutions we've used to deal with these problems...

What we do first is split the state machine into multiple independent state machines where guarantees do not hold across state machines. Any given operation is associated with a specific state machine, and two-phase commit is used for cross state machine operations. This allows each state machine to read and apply operations independently and without regard for the order of operations in other state machines. Snapshots of state machines are taken independently of all other state machines. Similarly, we create a separate logical session for each state machine, and all sessions associated with a client share a single periodic keep-alive request that contains indexes relevant to the associated state machine for each session (see linearizable semantics).

Secondly, you can scale horizontally. We partition the state among multiple Raft clusters and again use a two-phase commit protocol with a fault tolerant coordinator for cross-partition transactions.

In this architecture, the cluster effectively manages multiple state machines, and each state machine may or may not be partitioned among several physical Raft clusters. Ordering guarantees do not hold across state machines and the cost of coordinating state machines and partitions is high, but it's very rare for our use case.

Of course, none of this is particularly novel or unexpected. But if a single Raft instance is viewed as a component of a larger system, the ordering limitations are not hard to overcome.
--
You received this message because you are subscribed to the Google Groups "raft-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to raft-dev+u...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages