Related to Vilho's question:
out of curiosity, what kind of clients do you have in mind? This may be due to my limited imagination but I don't see how quorum check would become an issue CPU-wise. But sure, an incoming response must be handled, and a thread, memory+cache, and CPU is involved. With high throughput the accumulated load might be substantial.
Suppose the client is an open-loop client and its CPU is not beefy, i.e., it has only 1 thread for quorum checks.
Now suppose the client wants to submit at a high rate, say 100K req/sec.
Previously, for each request to commit, the client only needed to receive one reply from the leader, and that was enough.
Now, for each request to commit, it receives replies from all 2f+1 replicas (every replica sends a reply to it) and has to check for a quorum of f+1 replies.
Say you have 2f+1=9 replicas: every request now leads to 9 replies for the client to handle.
If you are submitting 100K req/sec, you are receiving 900K replies/sec. With only 1 thread at the client, that puts the client under heavy CPU load.
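To make the arithmetic and the per-reply work concrete, here is a minimal sketch of what the single client thread would be doing. The names `reply_rate` and `QuorumTracker` are hypothetical and only for illustration; this is not taken from any real implementation, and it assumes replies carry a request id and a replica id.

```python
from collections import defaultdict


def reply_rate(request_rate, f):
    """Replies per second the client handles if all 2f+1 replicas reply."""
    return request_rate * (2 * f + 1)


class QuorumTracker:
    """Hypothetical client-side bookkeeping for an f+1 reply quorum."""

    def __init__(self, f):
        self.f = f
        self.pending = defaultdict(set)  # request id -> replica ids seen so far
        self.committed = set()           # request ids that already reached quorum

    def on_reply(self, request_id, replica_id):
        """Return True exactly once, when the (f+1)-th distinct reply arrives."""
        if request_id in self.committed:
            # Late reply: no longer needed, but it still cost CPU to receive.
            return False
        voters = self.pending[request_id]
        voters.add(replica_id)  # a duplicate from the same replica is ignored
        if len(voters) >= self.f + 1:
            self.committed.add(request_id)
            del self.pending[request_id]
            return True
        return False


print(reply_rate(100_000, 4))  # 2f+1 = 9 replicas
```

Note that even after the quorum is reached on the 5th reply, the remaining 4 replies per request still have to be received and discarded by the same thread, which is where the extra load comes from.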
The story is like this: previously, Raft/Multi-Paxos caused a leader bottleneck because the leader's burden was much heavier. Now that burden is removed from the leader, but it does not disappear; it just migrates to the client.
2. Having a database background, I'd expect potential large memory consumption caused by large results as a potential concern, assuming that the client stores them longer than what it would need by itself. Especially in cases where due to a failure the quorum checks between client and the leader are not in sync.
Clients do not need to keep this state for long. If the client finds that a request has been outstanding for a long time, it just needs to retry or abandon it.
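A rough sketch of the retry-or-abandon idea, so the memory held per request stays bounded. The class `OutstandingRequests` and its parameters `timeout_s` and `max_retries` are my own assumptions for illustration; the actual policy would depend on the protocol.

```python
import time


class OutstandingRequests:
    """Hypothetical table of in-flight requests with a deadline per entry."""

    def __init__(self, timeout_s, max_retries, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.max_retries = max_retries
        self.clock = clock
        self.entries = {}  # request id -> (deadline, retries so far)

    def submit(self, request_id):
        self.entries[request_id] = (self.clock() + self.timeout_s, 0)

    def complete(self, request_id):
        # Quorum reached: drop the state immediately.
        self.entries.pop(request_id, None)

    def sweep(self):
        """Return (ids to retry, ids abandoned) among the expired requests."""
        now = self.clock()
        to_retry, abandoned = [], []
        for request_id, (deadline, retries) in list(self.entries.items()):
            if now < deadline:
                continue
            if retries < self.max_retries:
                # Give the request another timeout window.
                self.entries[request_id] = (now + self.timeout_s, retries + 1)
                to_retry.append(request_id)
            else:
                # Give up and free the memory held for this request.
                del self.entries[request_id]
                abandoned.append(request_id)
        return to_retry, abandoned
```

The client would call `sweep()` periodically, resend whatever comes back in the first list, and report the second list as failed.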
3. The other issue you refer to is something I need to read. In my first response, I had a gut feeling that configuration changes might cause issues but I couldn't quickly find any problematic cases. Maybe reading the nopaxos paper helps. The answer to "what could possibly go wrong" in distributed processing is often interesting.
True, totally agree. Actually, I posted this discussion because we are designing a similar mechanism. One of my collaborators is worried about it, but he cannot find an error trace proving that it violates correctness, so I put it here to see whether anybody can come up with an error trace showing that quorum offloading is problematic.