Are writes handled by leader only?

Marek Denis

Feb 2, 2016, 11:22:45 AM

to raft...@googlegroups.com

Hi,

As far as I understand the basics of the Raft protocol, all writes must be handled by the elected leader. So, in some implementations, even if a client sends a write to a follower, it will be redirected to the leader.
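
The redirect behavior described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Node`, `handle_write`, and `client_write` are hypothetical names, not taken from any real Raft implementation, and replication to followers is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical Raft node; only the write-routing decision is modeled."""
    node_id: str
    leader_id: Optional[str] = None  # who this node currently believes is leader
    log: List[str] = field(default_factory=list)

    def handle_write(self, command: str) -> dict:
        if self.leader_id is None:
            # No leader known (e.g. an election is in progress).
            return {"ok": False, "error": "no leader known"}
        if self.leader_id != self.node_id:
            # Follower: never append locally, point the client at the leader.
            return {"ok": False, "redirect": self.leader_id}
        # Leader: append to its own log (replication to followers omitted).
        self.log.append(command)
        return {"ok": True, "index": len(self.log)}


def client_write(cluster: Dict[str, Node], first_contact: str,
                 command: str, max_hops: int = 3) -> dict:
    """Hypothetical client: follow redirects until a leader accepts the write."""
    target = first_contact
    for _ in range(max_hops):
        reply = cluster[target].handle_write(command)
        if "redirect" not in reply:
            return reply
        target = reply["redirect"]
    return {"ok": False, "error": "too many redirects"}
```

In this sketch a write sent to any follower ends up in the leader's log only, which is exactly the concentration of writes the question is about.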

This would basically mean that even for a distributed system spanning multiple datacenters, all writes will always end up in one node/rack/datacenter (depending on the size of the deployment). Is there any better way to distribute writes more evenly?

I watched a quite interesting talk titled "Transactions Across Datacenters" where some techniques are described, and when it comes to true multihoming (multiple nodes accepting writes), Paxos is mentioned as a protocol allowing for this. Moreover, the author says that many nodes/datacenters will allow for writes (link to the talk: https://youtu.be/srOgpXECblk?t=2681)

To summarize the question: are Raft and Paxos any different in how they handle writes sent to nodes other than the leader?

Thanks,

--

Marek Denis

Henrik Ingo

Feb 7, 2016, 11:58:07 AM

to raft...@googlegroups.com

Hi Marek

Raft is a leader-based replication protocol, so yes, sending all
writes to a single leader is a fundamental property of the algorithm.
There are other replication protocols that are, for example,
multi-master, and those work quite differently.

Note that Raft could still be used in an implementation that spans
multiple data centers, or even continents, and accepts writes in more
than one location. For example, in MongoDB a cluster is composed of
multiple shards. Each shard can be in a different location, or
alternatively each shard can be configured (with priorities, which are
outside of the pure Raft protocol) to elect a leader in a specific data
center (as much as possible). Within each shard, high availability is
provided by a leader-based replication protocol similar to Raft. But
taking the cluster as a whole, it is possible to configure individual
shards so that an application can always write to some shard whose
primary is close to the client (e.g. on the same continent).
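
As a rough sketch of that idea (not MongoDB's actual API; `Shard`,
`ShardedCluster`, and the region names are hypothetical, and one shard
per region is assumed), each shard is its own leader-based group and the
cluster only decides which shard a write goes to:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Shard:
    """Hypothetical shard: an independent leader-based replication group."""
    name: str
    leader_region: str  # region where this shard's leader is preferred
    log: List[str] = field(default_factory=list)

    def write(self, command: str) -> int:
        # Within a shard, every write still goes through its single leader
        # (replication to the shard's followers is omitted here).
        self.log.append(command)
        return len(self.log)


class ShardedCluster:
    """Hypothetical router: picks the shard whose leader is in the client's region."""

    def __init__(self, shards: List[Shard]) -> None:
        # Assumption: exactly one shard per region.
        self.by_region: Dict[str, Shard] = {s.leader_region: s for s in shards}

    def write(self, client_region: str, command: str) -> str:
        shard = self.by_region[client_region]
        shard.write(command)
        return shard.name
```

Each shard keeps the single-leader property, but different shards have
leaders in different places, so writes are spread across locations at
the cluster level.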

So in summary: writing to the nearest node may be a property of your
replication algorithm, but may also be provided by other means.

henrik

Diego Ongaro

Feb 10, 2016, 2:54:14 PM

to raft...@googlegroups.com

Thanks, Henrik.

Marek, if you're looking for more pointers, check out sections 11.7 "Performance" and 11.7.1 "Reducing leader bottleneck" in the Related Work chapter of my PhD dissertation ( https://github.com/ongardie/dissertation ).

-Diego