Election vs consensus

430 views
Skip to first unread message

Roberto Ostinelli

unread,
Jan 7, 2022, 5:15:09 PM1/7/22
to erlang-q...@erlang.org
Dear list,
This is not directly related to erlang but for obvious reasons it’s pretty tightly related.

I would like to understand why election algorithms such as bully are afaik deemed not enough to build consistent systems, hence the existence of consensus algorithms such as paxos/raft. More specifically in the creation of simple key/value stores.

I would have imagined that with leader election you can pipe all the read/write operations through the leader, hence implement consistency. However i.e. raft came in to fill this kind of scenario, and I would like to understand why it is needed.

Does someone have pointers on things to read?

Thanks in advance,
r.

Anthony Howe

unread,
Jan 7, 2022, 5:24:34 PM1/7/22
to Roberto Ostinelli, erlang-q...@erlang.org
Some of the Leader Election / RAFT related material I collected when I was
looking into the subject.

https://raft.github.io/raft.pdf

https://www.elastic.co/blog/found-leader-election-in-general

https://www.tigera.io/blog/using-etcd-for-elections/

--
Anthony C Howe SnertSoft
ach...@snert.com Twitter: SirWumpus BarricadeMX & Milters
http://snert.com/ http://nanozen.snert.com/ http://snertsoft.com/

Roberto Ostinelli

unread,
Jan 7, 2022, 5:57:31 PM1/7/22
to Anthony Howe, erlang-q...@erlang.org
Thank you Anthony. I need to spend more time on the raft.pdf (I’ve only skimmed through it before).

Raft, afaik, once a leader is elected all it does is to send a new log (i.e. a write in a kv store) to all the followers and wait for their ack. What I’m trying to understand is what raft (a consensus algorithm) offers on top of a simple bully (a leader algorithm) with a similar ack mechanism, for example, in the context of a kv store. Some explanations probably are in the raft.pdf so that could be a good start. Thank you.

Any other pointers welcome!

Thanks,
r.

> On 7 Jan 2022, at 23:24, Anthony Howe <ach...@snert.com> wrote:

Aliaksei Artamonau

unread,
Jan 8, 2022, 1:09:48 AM1/8/22
to Roberto Ostinelli, erlang-q...@erlang.org
When you talk about implementing consensus on top of leader election, how exactly do you envision it? Getting a bit more details could help answer your question.

You could use the bully algorithm to implement leader election for a consensus protocol like (Multi-)Paxos. Raft is slightly different because it uses leader election to avoid the need for the new leader to synchronize with the other nodes to get all updates that may have been committed by previous leaders. At the same time, the leader election algorithm that Raft uses is not that different from the bully algorithm: substitute process IDs for log positions and require at least a majority of responses.

Some reasons for why majority voting schemes are typically used to implement consensus protocols (to my understanding).

1. In order for the consensus protocol to make progress a majority of nodes is required anyway. So it does not achieve much to choose a leader when there's no majority.
2. In Raft specifically, leaders use integer term/epoch numbers. And no term/epoch should have more than one leader, which requires for at least a majority of nodes to agree on a leader.
3. Competing leaders impede progress. So checking with a majority of nodes lowers the probability of having concurrent leaders.

Roughly speaking what a consensus protocol like Raft gives you:

1. A total order on leader terms.
2. A way to ensure that a stale leader can't commit anything once a new term has commenced.
3. A way for a new leader to be sure that it has got all updates that may have previously been committed.
4. A total order on all updates.
5. A way to change the set of nodes in your cluster while preserving safety.
5. Availability as long as a majority of nodes are alive.



Aliaksei

Dominic Morneau

unread,
Jan 8, 2022, 1:34:45 AM1/8/22
to Aliaksei Artamonau, erlang-questions
I think Aliaksei put it well. The simpler election algorithms like bully and ring election are not partition-tolerant, they can elect multiple leaders at the same time if the network is bad. Raft uses a fancier algorithm based on majority vote which solves that weakness (and more).

Dominic

2022年1月8日(土) 15:09 Aliaksei Artamonau <aliaksiej...@gmail.com>:

Roberto Ostinelli

unread,
Jan 9, 2022, 8:43:28 AM1/9/22
to Aliaksei Artamonau, erlang-q...@erlang.org
Thank you Aliaksei, this is helpful.
Reply all
Reply to author
Forward
0 new messages