Configuration changes

Ezra Hoch

Feb 26, 2014, 11:55:53 AM
to raft...@googlegroups.com
There is something I don't understand about configuration changes.

Suppose the old set contains 9 nodes, and the new set contains 3 nodes (these sets are disjoint).
At first, we try to commit old+new.
Let's assume it is committed (i.e., all 12 nodes are aware that they are now operating with the old+new configuration).

Now, suppose the leader is from the new group. We try to add an entry for the new configuration.
As soon as it is added to the server's log, the server operates according to the new configuration's rules.
That is, the server sends the entry only to the 3 new nodes, and waits for 2 to reply.
Suppose all 3 reply, and now the 3 new nodes are operating by themselves correctly.

Now, consider the 9 old nodes: they think that the configuration is old+new, and since there are 9 old nodes, it is possible (with network partitioning) that they will elect their own leader, and that leader will continue to add items to the log, as it has a majority of 7 (7 out of the 12 it thinks are in the cluster).

How is the configuration transition supposed to work?

Thanks,
Ezra

Diego Ongaro

Feb 28, 2014, 1:58:41 AM
to Ezra Hoch, raft...@googlegroups.com
Hi Ezra,

Fortunately, the situation you describe is not possible. For a server
using the old+new configuration to be elected leader, it needs votes
from a majority of the old configuration (5 of the 9) *and* a majority
of the new configuration (2 of the 3). It's not 7 of 12, it's 5 of 9
*and* 2 of 3.

Once the Cnew entry is committed on a majority of the new
configuration, that majority will no longer vote for servers without
the Cnew entry; remember, a server cannot become leader without having
all committed entries in its log. So the 9 old servers can't elect
their own leader because they need votes from the new configuration,
and once the new configuration has moved on, they won't grant votes to
servers in the old configuration.

Best,
Diego
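
To make the "5 of 9 *and* 2 of 3" rule above concrete, here is a minimal Python sketch of a joint-quorum check. The function names and the server labels (`o1`..`o9`, `n1`..`n3`) are hypothetical and chosen only for illustration; this is not code from any Raft implementation.

```python
def has_majority(acks, members):
    """True if more than half of `members` appear in `acks`."""
    members = set(members)
    return len(members & set(acks)) > len(members) // 2


def joint_quorum(acks, old, new):
    """Under old+new, agreement needs a majority of old AND a majority of new."""
    return has_majority(acks, old) and has_majority(acks, new)


old = {f"o{i}" for i in range(1, 10)}   # 9 old servers (hypothetical names)
new = {"n1", "n2", "n3"}                # 3 new servers, disjoint from old

# All 9 old servers agree: that is 9 of 12, but 0 of the 3 new servers.
print(joint_quorum(old, old, new))      # False

# 5 old servers plus 2 new servers: a majority of each set.
print(joint_quorum({"o1", "o2", "o3", "o4", "o5", "n1", "n2"}, old, new))  # True
```

Counting the two sets separately is what rules out the partition scenario: no number of old servers alone can make up for a missing majority of the new set.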

Ezra Hoch

Feb 28, 2014, 2:13:24 AM
to raft...@googlegroups.com, Ezra Hoch
Thanks Diego.

I now see where I misread the paper.
It states:
"Agreement (for elections and entry commitment) requires majorities from both the old and new configurations."

I read it as *majority* (single, not plural) from both the old and new configurations.
I would suggest emphasising this point (i.e., *majorities*), as it is crucial for the correctness of the configuration transition.

Best,
Ezra

Ezra Hoch

Feb 28, 2014, 2:50:16 AM
to raft...@googlegroups.com, Ezra Hoch
I have another question about reconfiguration:
Suppose we have old-config as 3 members, and new-config as 3 different members.
The leader manages to commit the old+new config change on all 6 nodes.

Now, the leader manages to replicate the new config only to the old members, and then it crashes.

Let's consider who can be elected as leader:
- an old member can't be elected because they all have the new config entry, and thus they don't consider themselves part of the system anymore ("...Cnew and replicate it to the cluster. Again, this configuration will take effect on each server as soon as it is seen.")
- a new member can't be elected, because they need votes from the old members, but the old members will never vote for them, because: a) the old members are out of the game, and b) the old members are more up to date (they have another entry in the log) than the new members.

I think the only way for it to work is to allow members which have already received the new config to participate in voting for members that still have old+new as their working config, even though they aren't part of the new config.
Is that true? Is there any other way to ensure progress?

Thanks again,
Ezra

Diego Ongaro

Feb 28, 2014, 2:59:27 AM
to Ezra Hoch, raft...@googlegroups.com
Well, once a leader has the Cnew entry in its log, it stops
replicating to the old configuration, since those servers no longer
contribute to commitment. So you wouldn't end up in a situation where
the old servers have Cnew to begin with.
-Diego

Ezra Hoch

Feb 28, 2014, 3:17:00 AM
to raft...@googlegroups.com, Ezra Hoch
What if the following happens?

- All members (all 6) have old+new
- Old member1 becomes leader and receives the new config from the client, then it is too slow and is unelected
- Old member2 becomes leader and receives the new config from the client, then it is too slow and is unelected (it can be elected in the first place, because it can get 2 votes from the old set, and 3 votes from the new set)
- Now no one can be elected: two of the old members are out of the game, and the other 4 members (1 old, 3 new) won't have their needed majorities.

Thanks,
Ezra

Diego Ongaro

Feb 28, 2014, 2:45:38 PM
to Ezra Hoch, raft...@googlegroups.com

Wow, I think that's a bug. Thanks for catching it. I'm going to refrain from throwing out possible fixes until I've thought about the issue some more. Hopefully we can find a nice solution.
Best,
Diego

Diego Ongaro

Mar 4, 2014, 7:11:11 PM
to Ezra Hoch, raft...@googlegroups.com
Hi Ezra,

Thanks again for bringing this issue up; it's something that hadn't
occurred to us. Let's see if you can break this one :)

I think a key property we need to maintain is that, even if up to a
bare majority of the old cluster servers are down, and up to a bare
majority of the new cluster servers are down, there must always be at
least one server that is eligible to become leader.

The problem you've pointed out occurs when the only servers with
eligible logs (those that are most up-to-date out of a quorum) stop
themselves from becoming candidates because they're not members of the
latest configurations in their logs.

I think there are two approaches to solving this:
1. You can disallow removed servers (those in the old configuration
that are not also in the new configuration) from having the Cnew log
entry, or
2. You can allow removed servers to become candidates and leaders,
even when they're not part of their latest configuration.
I think both of these can work, but my proposal uses the second approach.

The first change is that, unlike what we wrote in earlier emails,
servers would continue to campaign to become leaders even if they're
not part of the latest configuration in their logs. So now a removed
server that has the Cnew entry can be elected leader. It just doesn't
count its own vote towards anything, as it's seeking only votes from
the majority of the new configuration at that point.

If a removed server is leader when the Cold,new entry is committed,
then as in the paper, it creates the Cnew log entry and steps down
once that entry is committed. The second change is that now such a
leader would also stop accepting new client requests after appending
the Cnew entry in its log. The reason is that in this situation, you
want the system to transition to a leader in the new configuration
quickly, and if the removed server as leader continues to grow its
log, this biases its log over others (it could then be the only
eligible server again).

If that was clear and precise enough, are you able to poke any holes
in it? I'm also open to any alternative ideas that might be easier to
understand.

Best,
Diego
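
The stuck scenario from Ezra's earlier message, and the effect of the first proposed change, can be sketched as a toy eligibility check in Python. Everything here is an assumption made for illustration: the `Server` fields, the term and index values, and the `removed_may_campaign` flag are hypothetical. The model also ignores real election details such as current terms and the one-vote-per-term rule, and simply asks, for each server in isolation, whether it could gather the majorities it needs.

```python
from dataclasses import dataclass

OLD = frozenset("ABC")   # old configuration
NEW = frozenset("XYZ")   # new configuration


@dataclass(frozen=True)
class Server:
    name: str
    last_term: int    # term of the last log entry
    last_index: int   # index of the last log entry
    has_cnew: bool    # True if Cnew is the latest config entry in the log

    def quorums(self):
        """Sets from which this server, as a candidate, needs a majority."""
        return [NEW] if self.has_cnew else [OLD, NEW]


def up_to_date(candidate, voter):
    """Raft's log comparison: last term first, then last index."""
    return (candidate.last_term, candidate.last_index) >= (
        voter.last_term, voter.last_index)


def can_win(candidate, live, removed_may_campaign):
    is_member = any(candidate.name in q for q in candidate.quorums())
    if not is_member and not removed_may_campaign:
        return False  # original rule: a removed server does not campaign
    # Every live server (the candidate included) votes if the log check passes.
    votes = {s.name for s in live if up_to_date(candidate, s)}
    # Intersecting with each quorum set drops a removed candidate's own vote.
    return all(len(votes & q) > len(q) // 2 for q in candidate.quorums())


def electable(live, removed_may_campaign):
    return sorted(s.name for s in live if can_win(s, live, removed_may_campaign))


# A and B each appended Cnew while briefly leader; the rest stop at old+new.
cluster = [
    Server("A", 2, 2, True), Server("B", 3, 2, True),
    Server("C", 1, 1, False), Server("X", 1, 1, False),
    Server("Y", 1, 1, False), Server("Z", 1, 1, False),
]

print(electable(cluster, removed_may_campaign=False))  # []
print(electable(cluster, removed_may_campaign=True))   # ['A', 'B']
```

With the original rule nobody is electable: A and B sit out, and C, X, Y, and Z can each collect only one vote from the old set. Once removed servers may campaign, A and B can win on votes from X, Y, and Z alone.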

Pablo Medina

Mar 5, 2014, 9:01:21 PM
to Diego Ongaro, Ezra Hoch, raft...@googlegroups.com
Hi guys,

What if Cnew takes effect only when it's committed instead of when it's received? In that case, servers that went down with Cnew in their logs can (when restarted) become candidates and be elected, since Cnew is not yet in effect. They can then replicate Cnew to other servers, commit it once it reaches the required majorities, and then apply Cnew (stopping if they are no longer part of the cluster). Does that make sense?

Pablo.

Ezra Hoch

Mar 8, 2014, 2:11:46 PM
to raft...@googlegroups.com, Diego Ongaro, Ezra Hoch
Hi Pablo,

I'm not sure I understand what you propose. Only the leader knows when an entry gets committed (when it receives enough acks). How do you propose to notify the followers that an entry was committed? If it is by sending a message, can't you reproduce the setting previously mentioned? (For example, A-B-C are the old ones and X-Y-Z are the new ones. They all have old+new, and they all have Cnew as well. A is the leader, so it knows Cnew was committed. It tells B and then they both crash. So we have C and X-Y-Z all alive, and all working according to old+new, so no new leader will be elected.)

Diego, I've thought about your suggested solution, and there is something I'm missing about how the process of a configuration change is initiated. Consider the following setting:
- A-B-C are the old ones, X-Y-Z are the new ones
- Z is sleepy, so it doesn't get any messages for the time being
- A is leader, and sends old+new to everyone (and everyone except Z acks it)
- now A sends Cnew to everyone (and everyone except Z acks it)
- now X becomes leader
- A+B+C leave the scene
- now X crashes and Z wakes up
- we have Y that is alive and has Cnew committed, and Z which is alive and has no idea what's going on

What would happen next? How would Z know to vote for Y, when Y asks for its vote?

Diego Ongaro

Mar 8, 2014, 5:55:29 PM
to Ezra Hoch, raft...@googlegroups.com
On Sat, Mar 8, 2014 at 11:11 AM, Ezra Hoch <all.ez...@gmail.com> wrote:
> Hi Pablo,
>
> I'm not sure I understand what you propose. Only the leader knows when an
> entry gets committed (when it receives enough acks). How do you propose to
> notify the followers that an entry was committed? If it is by sending a
> message, can't you reproduce the setting previously mentioned? (for example,
> A-B-C are the old ones, X-Y-Z are the new ones. they all have old+new, and
> they all have Cnew as well. A is the leader so it knows Cnew was committed.
> It tells B and then they both crash. So we have C and X-Y-Z all alive, and
> all working according to old+new, so no new leader will be elected).

That does seem to be a problem (in Pablo's proposal) -- the old nodes
don't know when they can shut down. If they do so right after marking
Cnew committed (locally), that's too early.


> Diego, I've thought about your suggested solution, and there is something
> I'm missing about how the process of a configuration change is initiated.
> Consider the following setting:
> - A-B-C are the old ones, X-Y-Z are the new ones
> - Z is sleepy, so it doesn't get any messages for the time being
> - A is leader, and sends old+new to everyone (and everyone except Z acks it)
> - now A sends Cnew to everyone (and everyone except Z acks it)
> - now X becomes leader
> - A+B+C leave the scene
> - now X crashes and Z wakes up
> - we have Y that is alive and has Cnew committed, and Z which is alive and
> has no idea what's going on
>
> What would happen next? How would Z know to vote for Y, when Y asks its
> vote?

Back to discussing my (Diego's) proposal now. So the state of the cluster is:
- A, B, and C are gone forever
- X is crashed
- Y has the Cnew entry (X,Y,Z) in its log
- Z has the Cold entry in its log (A,B,C)

From here, Y could get Z's vote, since Y's log is more up-to-date than Z's.

Servers will happily grant votes to candidates that aren't in their
current configurations, as long as the candidate has a log that's no
less up-to-date than theirs. This allows servers to be added to the
cluster, for example, or allows servers that have been away for a long
time but are still part of the current configuration to grant votes.

So Y would become leader (under the rules of Cnew), ship over Cold,new
and Cnew and any other entries it has to Z, and the cluster would
continue on.

Does that make sense? I can't help but wonder whether I'm backing
myself into a corner again...

Best,
Diego
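
The vote-granting rule described above (grant the vote based on the log comparison, regardless of whether the candidate appears in the voter's current configuration) might look roughly like the following Python sketch. The `Voter` class, the `grant_vote` signature, and the term and index numbers in the example are hypothetical; the term and `voted_for` handling follows the usual RequestVote rules but is heavily simplified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Voter:
    current_term: int
    last_log_term: int
    last_log_index: int
    config: frozenset                 # latest configuration in the voter's log
    voted_for: Optional[str] = None   # candidate voted for in current_term


def grant_vote(voter, candidate_id, term, last_log_term, last_log_index):
    """Decide a vote request.

    Deliberately absent: any check that candidate_id is in voter.config.
    """
    if term < voter.current_term:
        return False                  # stale request
    if term > voter.current_term:
        voter.current_term = term     # newer term: forget any earlier vote
        voter.voted_for = None
    if voter.voted_for not in (None, candidate_id):
        return False                  # already voted for someone else
    # Candidate's log must be at least as up-to-date: last term, then index.
    if (last_log_term, last_log_index) < (voter.last_log_term,
                                          voter.last_log_index):
        return False
    voter.voted_for = candidate_id
    return True


# Z slept through the change: its log still ends at Cold = {A, B, C}.
z = Voter(current_term=1, last_log_term=1, last_log_index=1,
          config=frozenset("ABC"))

# Y is not in Z's configuration, but Y's log is more up-to-date.
print(grant_vote(z, "Y", term=4, last_log_term=3, last_log_index=3))  # True
```

Because `config` never enters the decision, a server that is far behind can still help elect a leader it has never heard of, and that leader then brings it up to date.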

Ezra Hoch

Mar 9, 2014, 1:34:42 AM
to raft...@googlegroups.com, Ezra Hoch
Hi Diego,

I think your solution works.
The logic for casting votes you presented is very simple, very clear, and solves all the end cases I can think of :)

Cheers,
Ezra

Taral

Mar 9, 2014, 12:45:26 PM
to Diego Ongaro, raft...@googlegroups.com, Ezra Hoch

Are you planning to update the TLA+ files with these updates, Diego?

- Taral

Diego Ongaro

Mar 9, 2014, 4:00:08 PM
to Taral, raft...@googlegroups.com, Ezra Hoch
Thanks, Ezra.

Taral,
I wasn't planning on updating the TLA+ files since they don't
currently cover membership changes and they also only focus on safety
(the most subtlety seems to be in liveness/availability for membership
changes). It'd be good to see some formal methods applied to
membership changes, but I have no current plans to do that (anyone
else?). In light of this issue, I do hope to include a
liveness/availability argument for membership changes in my thesis,
but it'll be an informal sketch.
-Diego