Membership replacement order

Alex Goltman

Nov 24, 2015, 9:34:55 AM
to raft-dev
Hi,

In our system we wish to provide n+2 redundancy, so we chose to have 5 members. In some cases we wish to replace a raft member with another, either because the current member is dead or for rebalancing.
Our raft implementation supports membership changes of one member (a removal or an addition) at a time, so to replace a member we need to both remove the old one and add the new one.
We considered the two possible orders with the following cases in mind:
  • 5 healthy members, and we wish to replace one of them for rebalancing purposes: if we first remove a member, at that moment we are left with only 4 members and lose the n+2 redundancy. So we would like to add first.
  • 5 members total, and 2 of them die: we pick a new server to replace one of the dead ones and add it as a member, but then for some reason it dies too, leaving us with 3/6 members dead and no quorum. True, at that moment 3 nodes are already dead, which exceeds our n+2 redundancy guarantee. But if we had first removed the dead member and then added the new one, there would be no problem, as we would keep a quorum throughout the whole process: 3/5 at the start, 3/4 after removing, and 3/5 after adding the new one. So in that case we would like to remove first.
So the compromise we thought of is to check whether we're replacing a dead member: if so, remove first; otherwise, add first.
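
In code, that compromise would look roughly like this (just a sketch; Member, isDead, addMember, and removeMember stand in for our own types and helpers):

  // Replace oldM with newM. Remove first only when the old member
  // is already dead; otherwise add first so we never drop below 5
  // healthy voters. (Sketch: Member, isDead, addMember, and
  // removeMember are hypothetical.)
  func replace(oldM, newM Member) {
      if isDead(oldM) {
          removeMember(oldM) // dead member: shrink the quorum denominator first
          addMember(newM)
      } else {
          addMember(newM) // healthy member: keep full redundancy throughout
          removeMember(oldM)
      }
  }
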
Was this discussed somewhere before? Would appreciate your thoughts.

Thanks,
Alex

Kijana Woodard

Nov 24, 2015, 10:02:58 AM
to raft...@googlegroups.com
Fwiw, I'm not sure what "n+2 redundancy" means.

For the first case, raft already handles "node down". Don't worry about it. I'd probably take one down first, then add the other, but it doesn't really matter.

For the second case, you're on an edge. I've seen people here talk about having admin controls override standard behavior. Effectively, a "running with scissors mode" for handling such extraordinary situations.

Alex Goltman

Nov 24, 2015, 11:32:36 AM
to raft-dev
By n+2 redundancy I meant that the system must be able to lose 2 members and still function.
If we first remove a member from the 5, we'll be left with 4, which means that in that state we cannot tolerate losing 2 of the 4 members (as our n+2 redundancy guarantee dictates), because we won't be able to form a 3/4 quorum.
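
In numbers, with the usual quorum rule (a generic sketch, not our implementation):

  // majority returns the quorum size of an n-member cluster.
  func majority(n int) int { return n/2 + 1 }

  // majority(5) == 3: a 5-member cluster tolerates 2 failures (n+2).
  // majority(4) == 3: a 4-member cluster tolerates only 1 failure,
  // so removing a healthy member first temporarily gives up n+2.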

Юрий Соколов

Nov 25, 2015, 3:03:40 AM
to raft-dev
Question to Diego: is it possible to use the same "one at a time" cluster management protocol to atomically "swap one old member with a new one"?

Probably not: there is a chance of split brain if an alternative majority decides to combine with the old member.

Alex Goltman

Nov 25, 2015, 6:06:44 AM
to raft-dev
No need for Diego for that :)
Like you wrote, there would be a split brain. E.g. starting with [A, B, C] and replacing C -> D:
* B hears of the new conf and forms a quorum with D.
* A never received the new conf and forms a quorum with C.
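
Spelling that out (an illustrative self-check, not from any implementation): {A, C} is a majority of the old config [A, B, C], {B, D} is a majority of the new config [A, B, D], and the two majorities don't intersect, so each side can elect its own leader:

  // intersects reports whether two member sets share a server.
  // The atomic swap is unsafe precisely because the two majorities
  // below do not intersect.
  func intersects(a, b []string) bool {
      seen := map[string]bool{}
      for _, m := range a {
          seen[m] = true
      }
      for _, m := range b {
          if seen[m] {
              return true
          }
      }
      return false
  }

  // intersects([]string{"A", "C"}, []string{"B", "D"}) == false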

Still waiting for opinions on the "remove then add" vs. "add then remove" replacement.

Diego Ongaro

Dec 15, 2015, 2:24:36 PM
to raft...@googlegroups.com
Hi Alex,

Sorry I'm pretty late to this thread. The original question is pretty interesting and not something I'd really thought about before.

If you have an odd-sized cluster and know with high confidence that a server is dead, you can remove it with no real change in behavior. For example, say I have {S1, S2, S3, S4, S5} but I know S5 is dead. Then any decision requires 3 of {S1, S2, S3, S4}. But that's the exact same property I get if I evict S5 from the cluster.

The same doesn't hold when you're starting with an even-sized cluster. For example, say I have {S1, S2, S3, S4} but I know S4 is dead. Then any decision requires all of {S1, S2, S3}. If I evict S4 from the cluster, now any decision requires only two of {S1, S2, S3}. That may be preferable, but it also may be fewer copies than the administrator had in mind. Taken to the extreme, would you be ok with a 1-server cluster? Probably not.
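
In numbers (quorum(n) = n/2 + 1 with integer division; just restating the two examples):

  5 members, S5 dead:  quorum(5) = 3 of {S1, S2, S3, S4}
  evict S5:            quorum(4) = 3 of {S1, S2, S3, S4}  -- unchanged

  4 members, S4 dead:  quorum(4) = 3, i.e. all of {S1, S2, S3}
  evict S4:            quorum(3) = 2 of {S1, S2, S3}      -- fewer copies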

So maybe you want the administrator to set not only a target cluster size but also a minimum cluster size, such as target 5 and minimum of 3 or 4. Would this policy for an automated replacement task do the right thing?

  while true {
    while curr > min && exists dead server {
      remove dead server;
    }
    while curr < target && exists spare healthy server {
      add spare healthy server;
    }
  }

For manual replacement, where the server being replaced is healthy, I think you'd still want to add then remove.

Of course, determining whether a server is in fact dead is difficult to do exactly, but you might be happy with a heuristic in practice.
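
One common heuristic of that sort is to treat a server as dead once it has been silent for several election timeouts. A rough Go sketch of the loop above with that heuristic plugged in (electionTimeout, lastContact, clusterSize, members, spares, addMember, and removeMember are all hypothetical helpers, not any real library's API):

  // One automated pass of the policy above, with a liveness
  // heuristic. Every helper here is hypothetical.
  var deadAfter = 10 * electionTimeout // silence threshold

  func maintain(min, target int) {
      for {
          // Remove servers presumed dead, but never shrink below min.
          for _, s := range members() {
              if clusterSize() <= min {
                  break
              }
              if time.Since(lastContact(s)) > deadAfter {
                  removeMember(s)
              }
          }
          // Grow back toward the target using healthy spares.
          for clusterSize() < target && len(spares()) > 0 {
              addMember(spares()[0])
          }
          time.Sleep(time.Second) // poll, don't spin
      }
  }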

Another option is to implement the joint consensus approach to membership changes. It takes a moderate amount of additional work to implement, but it lets you replace any number of members atomically. If you're worried about this kind of thing, that may be worth it to you.

-Diego
