AddServer / RemoveServer Linearizability (single membership changes)

Jason Teplitz

Mar 23, 2017, 11:22:59 PM

to raft-dev

Hey raft-dev!

I was wondering if membership changes are typically considered to occur inside a client's session? If they are not then I was wondering how the following (admittedly contrived) situation should be handled:

Let's say we have Admin A and Admin B. Admin A wants to add server S into a cluster and sends an AddServer(S) RPC to the leader. The leader successfully replicates the new configuration but crashes before it can tell admin A. During this time, admin B decides they want to remove server S from the cluster for some reason and they issue a RemoveServer(S) RPC. They send this RPC to the new leader and get back a success. Meanwhile, admin A's script has been waiting in a backoff period and now tries again and issues another AddServer(S) RPC. Without any session information the new leader will go ahead and add S again and the end result will be that S is in the cluster.

This situation makes me think that we would want to filter out duplicate configurations when picking the latest configuration from the log, but I wasn't sure when I read through the paper so I wanted to double check what the people here thought.

Thanks!

Oren Eini (Ayende Rahien)

Mar 24, 2017, 3:27:14 PM

to raft...@googlegroups.com

Jason,

First, remember that only a single AddServer / RemoveServer can be pending at any given point.

Second, in such cases, I would typically suggest using optimistic concurrency.

What I typically do is have the topology include the log index at which it was last changed; then you can issue a command that looks something like:

AddServer( S, lastTopologyChange: 944 );

This way, if you have a mismatch, the server will accept (and then ignore) this command, and the leader will send an error.
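
The optimistic-concurrency check described above can be sketched as follows. This is a minimal illustration, not the RavenDB/Rachis code: the `Topology` class, `StaleTopologyError`, and the function signatures are assumptions made for the example, and the stale command is rejected directly instead of being appended to the log and ignored on apply.

```python
from dataclasses import dataclass, field


class StaleTopologyError(Exception):
    """Raised when a command was issued against an outdated topology."""


@dataclass
class Topology:
    members: set = field(default_factory=set)
    last_change_index: int = 0  # log index of the most recent topology change


def _check(topology, last_topology_change):
    # The caller must name the change index it last observed.
    if last_topology_change != topology.last_change_index:
        raise StaleTopologyError(
            f"expected {topology.last_change_index}, got {last_topology_change}"
        )


def add_server(topology, server, last_topology_change, log_index):
    _check(topology, last_topology_change)
    topology.members.add(server)
    topology.last_change_index = log_index


def remove_server(topology, server, last_topology_change, log_index):
    _check(topology, last_topology_change)
    topology.members.discard(server)
    topology.last_change_index = log_index


# Replaying the scenario from the first post (log indexes are made up).
topology = Topology(members={"S1", "S2", "S3"}, last_change_index=944)
add_server(topology, "S", last_topology_change=944, log_index=945)     # admin A, reply lost
remove_server(topology, "S", last_topology_change=945, log_index=946)  # admin B
try:
    add_server(topology, "S", last_topology_change=944, log_index=947)  # admin A retries
except StaleTopologyError as err:
    print("retry rejected:", err)
print(sorted(topology.members))
```

Admin A's retry still carries the index it saw before its first attempt, so it no longer matches and S stays out of the cluster. To add S again, A would have to re-read the topology first.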


Terry Tan

Apr 10, 2017, 5:44:02 AM

to raft-dev

Hi Rahien,

Why can we only add one server at a time? Why can't we add several servers at once? Could you tell me what will happen if we add several servers at once?


Oren Eini (Ayende Rahien)

Apr 12, 2017, 3:21:13 AM

to raft...@googlegroups.com

You _can_ add multiple servers, but the logic there is much more complex.

See the discussion of that in the Raft paper: https://raft.github.io/raft.pdf

Section 6 in particular, on joint consensus.

On the other hand, if you limit yourself to just one server at a time, you pretty much don't have to do anything.
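
One way to see why one-at-a-time changes need so little extra machinery is to check majority overlap by brute force. The sketch below is an illustration only (the helper names `majorities` and `always_overlap` are made up for it): when the old and new configurations differ by a single server, every majority of one shares at least one server with every majority of the other, so two disjoint quorums cannot form. Adding two servers at once breaks that property.

```python
from itertools import combinations


def majorities(servers):
    """Yield every subset of `servers` that forms a strict majority."""
    servers = sorted(servers)
    quorum = len(servers) // 2 + 1
    for size in range(quorum, len(servers) + 1):
        for subset in combinations(servers, size):
            yield set(subset)


def always_overlap(old, new):
    """True if every majority of `old` intersects every majority of `new`."""
    return all(
        m_old & m_new
        for m_old in majorities(old)
        for m_new in majorities(new)
    )


three = {"a", "b", "c"}
print(always_overlap(three, three | {"d"}))       # one server added
print(always_overlap(three, three | {"d", "e"}))  # two servers added at once
```

In the second case {a, b} is a majority of the old configuration and {c, d, e} is a majority of the new one, and they have nothing in common. That gap is what joint consensus is designed to close.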

Terry Tan

Apr 13, 2017, 7:39:23 AM

to raft-dev

Hi,

I read that paper before, but I am still confused about it.

1. Who initiates the joining of the new servers? Is it the two new servers themselves, and if so, how do they know about each other?

2. It says the two new servers will wait until they catch up with the leader. Suppose they finish catching up and the leader crashes before it sends the config request to the remaining servers. The two old servers will not know about the new servers, so what is the solution? The two old servers will elect a new leader between themselves, but the two new servers don't have voting rights, so will they never join the old cluster?

3. Let's say the leader finishes committing the new C(old,new) entry and then crashes, and the other two old servers crash as well. Will the two new servers never elect a leader?


Oren Eini (Ayende Rahien)

Apr 14, 2017, 2:57:15 PM

to raft...@googlegroups.com

The way I implemented it is that the topology is made from "Members, Promotables, Watchers"

Members are full class members, they can vote and become leaders

Watchers are just listening, and can never become leaders

Promotables will only be listening, but once they are caught up with the leader, they will automatically be moved to members (as another topology change).

That gives us the ability to start a server (start it in passive mode, and then add it to a cluster). We typically add it in promotable mode, which means that it will auto-promote when it is caught up.

And that is pretty much why you want to only do a single server transition at a time, because you don't have to worry about different majorities and updates.

The way I handle that is to split the process so that adding the server is quick: just add it as promotable, the change is spread to the entire cluster, and the leader brings it up to date.

Then I can add _another_ server as promotable, and then it will also catch up, and then update itself on the fly
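
The Members / Promotables / Watchers split can be sketched roughly as below. This is not the Rachis implementation: the class layout, the function names, and the "caught up" test (the promotable's replicated index has reached the leader's last index) are assumptions for illustration. In a real cluster each of the two steps would be a topology change replicated through the log.

```python
from dataclasses import dataclass, field


@dataclass
class Topology:
    members: set = field(default_factory=set)      # vote and may become leader
    promotables: set = field(default_factory=set)  # replicate only, promoted later
    watchers: set = field(default_factory=set)     # replicate only, never promoted

    def quorum(self):
        # Only full members count toward the majority.
        return len(self.members) // 2 + 1


def add_promotable(topology, server):
    """First topology change: the new server starts receiving the log."""
    topology.promotables.add(server)


def promote_next_caught_up(topology, leader_last_index, match_index):
    """Leader-side check, one promotion per call (a second topology change).

    match_index maps server -> highest log index known to be replicated there.
    """
    for server in sorted(topology.promotables):
        if match_index.get(server, 0) >= leader_last_index:
            topology.promotables.remove(server)
            topology.members.add(server)
            return server
    return None


topology = Topology(members={"A", "B", "C"})
add_promotable(topology, "D")
print(topology.quorum())                                        # D does not vote yet
print(promote_next_caught_up(topology, 10, {"D": 7}))           # still behind
print(promote_next_caught_up(topology, 10, {"D": 10}))          # caught up
print(topology.quorum())
```

Because a promotable does not count toward the quorum, adding it cannot stall the cluster while it is still catching up. The voting set only changes at the promotion step, one server at a time.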

Terry Tan

Apr 17, 2017, 3:22:55 AM

to raft-dev

Hi Ayende,

Do you have any code (or an open source project) I could refer to? I also have several questions.

First, when a server joins the cluster, it issues a request to the leader. How does the leader process this request? When does it update its configuration, and when does the new configuration take effect? How do the promotables know that syncing has finished? Do they simply compare their log with the leader's log and, if the log index is the same, get promoted to member?


Oren Eini (Ayende Rahien)

Apr 18, 2017, 3:45:09 AM

to raft...@googlegroups.com

The current version of this is here:

https://github.com/ravendb/ravendb/tree/v3.5/Rachis

We have a newer version in the works (uses stable TCP connections instead of HTTP), but it isn't quite ready yet.

Oren Eini (Ayende Rahien)

Apr 18, 2017, 3:45:27 AM

to raft...@googlegroups.com

The promotables don't know anything; it is the leader that handles this.

Terry Tan

Apr 18, 2017, 6:25:01 AM

to raft-dev

Hi Ayende,

Let's say the leader receives the join request, adds the server to its member list, and sends this config request to the followers. Suppose one follower receives the request and applies the config, and meanwhile the leader crashes. This follower will then take the new quorum (3) as the majority, while another follower that did not receive the request will still take 2 as the majority, so two leaders could be elected. How do we handle this situation?


Oren Eini (Ayende Rahien)

Apr 18, 2017, 7:19:06 AM

to raft...@googlegroups.com

Not how it works.

The new server is first added as promotable, but let us ignore it for now and say that we have the following topology:

A,B,C <- members

D is added by leader A, which notifies B and then crashes immediately afterward.

A topology (down): [a,b,c,d]

B topology: [a,b,c,d]

C topology: [a,b,c]

D topology: [a,b,c,d]

If C attempts to become leader, it will be rejected, because its log isn't as long as the one on B, and then B will become the leader (or D, if C accepts a RequestVote from it).
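
The outcome above can be checked with a small simulation. It is a sketch under stated assumptions: the log positions are invented (B and D hold the topology entry at index 5, C stops at index 4), each candidate counts votes against the latest configuration in its own log, and a voter grants its vote only if the candidate's last (term, index) is at least as up to date as its own. Terms of the election itself and the one-vote-per-term rule are left out.

```python
def quorum(config):
    """Strict majority of a configuration."""
    return len(config) // 2 + 1


def grants_vote(voter_last, candidate_last):
    """Raft's up-to-date rule; tuples compare last term first, then last index."""
    return candidate_last >= voter_last


def wins_election(candidate, servers, alive):
    """Count votes among live servers in the candidate's own latest config."""
    config = servers[candidate]["config"]
    last = servers[candidate]["last"]
    votes = {
        voter
        for voter in config
        if voter in alive
        and (voter == candidate or grants_vote(servers[voter]["last"], last))
    }
    return len(votes) >= quorum(config)


# "last" is (last_term, last_index); the values are hypothetical.
servers = {
    "a": {"config": ["a", "b", "c", "d"], "last": (1, 5)},
    "b": {"config": ["a", "b", "c", "d"], "last": (1, 5)},
    "c": {"config": ["a", "b", "c"], "last": (1, 4)},
    "d": {"config": ["a", "b", "c", "d"], "last": (1, 5)},
}
alive = {"b", "c", "d"}  # leader a has crashed

for name in ("b", "c", "d"):
    print(name, wins_election(name, servers, alive))
```

Under these assumptions C gets only its own vote, one short of the two it needs, while B or D can collect three of four. So the two different views of the topology do not produce two leaders.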

Terry Tan

Apr 18, 2017, 10:40:49 PM

to raft-dev

Hi Ayende,

Thank you for your help; I now understand the whole process better. But there is another question. As the Raft paper says, a config change is not processed like a normal entry: it does not append to the log first and then commit. Instead, it applies the config first, then appends to the log, then commits, and after the entry is committed the apply step actually does nothing. If what I am saying is correct, then take the case you gave:

Let's say the leader receives the join request and forwards it to one follower. This follower applies the config but somehow fails to append to its log, and meanwhile the leader crashes. The log of this follower might then be the same length as that of another follower (which did not receive the config request). If the one that did not receive the config request is then elected leader, this config will be lost, right?


Oren Eini (Ayende Rahien)

Apr 19, 2017, 2:34:59 AM

to raft...@googlegroups.com

Yes, that is correct, and this is fine, since this effectively rolls the change back.

Terry Tan

Apr 19, 2017, 6:42:08 AM

to raft-dev

Hi Ayende,

Thank you so much. By the way, how does the part you mentioned before, about promotable servers, work? As I understand it, it is pretty much like joint consensus, which needs two entries committed. First, the leader receives the join request, applies the config locally, marks the server as promotable, and sends the config request to the followers. After the followers apply the config (and then append the entry) and the entry is committed by the majority, the leader starts sending sync append requests to the promotable until it catches up, then changes the status locally and appends another entry; the rest of the process is the same as before. If any failure happens, a follower that has the promotable server's address will become the leader and then sync the promotable. I don't know whether what I am saying is correct.


Oren Eini (Ayende Rahien)

Apr 19, 2017, 7:02:48 AM

to raft...@googlegroups.com

Yes, pretty much.

Terry Tan

Apr 20, 2017, 1:07:35 AM

to raft-dev

But I have another question. Suppose the number of members is four, as below:

A topology (down): [a,b,c,d,e]

B topology: [a,b,c,d,e]

C topology: [a,b,c,d]

D topology: [a,b,c,d]

1. Suppose server E joins the cluster: the leader receives the request, applies the config locally, and sends the config request to server B. B applies the config and appends to its log, and then the leader crashes. Given that the majority of the cluster is still 3, servers C and D still have a chance to be elected leader (e.g. C votes for itself and D votes for C = 2). Of course, server B has this chance as well. If C wins the election, the joining info will be lost as well, right?

2. For joint consensus, what confuses me is why the leader needs to send the old+new config to all the followers first. All the followers already have the old config, so why not just send the new config? Is it in case the followers do not have the latest old config?
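
A quick check of the vote counting in scenario 1, as a sketch that assumes standard Raft majority voting: each candidate counts votes against the latest configuration in its own log, and A and B refuse to vote for a candidate whose log lacks the new entry. The helper names `quorum` and `can_win` are illustrative. With four voting members a majority is 3, so votes from C and D alone add up to 2 and would fall one short under these assumptions.

```python
def quorum(config):
    """Strict majority of a configuration."""
    return len(config) // 2 + 1


def can_win(candidate_config, granted_by):
    """Only votes from servers in the candidate's own config are counted."""
    counted = set(candidate_config) & set(granted_by)
    return len(counted) >= quorum(candidate_config)


old = ["a", "b", "c", "d"]
new = ["a", "b", "c", "d", "e"]

print(quorum(old), quorum(new))
print(can_win(old, {"c", "d"}))       # C with votes from itself and D
print(can_win(new, {"b", "c", "d"}))  # B with votes from itself, C and D
```

Both configurations need three votes here. Under these assumptions C cannot reach three without a vote from A or B, whereas B can be elected with the votes of C and D.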


Oren Eini (Ayende Rahien)

Apr 20, 2017, 1:59:43 AM

to raft...@googlegroups.com

Terry,

Yes, that is correct, and the new leader will then reset the log to match its own, and the new topology would be lost.
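
The "reset the log to match its own" step can be sketched as the follower side of an AppendEntries call. This is a simplified illustration of the standard Raft rule, not code from any implementation mentioned in the thread: entries are hypothetical `(term, payload)` tuples, indexes are 1-based, and the function name and signature are made up for the example.

```python
def append_entries(log, prev_index, prev_term, entries):
    """Follower-side log repair. Returns False if the consistency check fails.

    `log` is a list of (term, payload) tuples and is modified in place.
    """
    # The follower must already hold the entry just before the new ones.
    if prev_index > len(log):
        return False
    if prev_index > 0 and log[prev_index - 1][0] != prev_term:
        return False

    for offset, entry in enumerate(entries):
        position = prev_index + offset  # 0-based slot for this entry
        if position < len(log):
            if log[position][0] != entry[0]:
                # Conflict: drop this entry and everything after it.
                del log[position:]
                log.append(entry)
        else:
            log.append(entry)
    return True


# A follower that holds an uncommitted topology entry from the old leader.
follower_log = [(1, "set x"), (1, "topology change")]
# A new leader elected in term 2 without that entry replicates its own log.
ok = append_entries(follower_log, prev_index=1, prev_term=1, entries=[(2, "no-op")])
print(ok, follower_log)
```

The uncommitted topology entry is overwritten by the leader's entry at the same index, which is how an unfinished membership change ends up rolled back when a leader without it takes over.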

Terry Tan

Apr 20, 2017, 5:26:07 AM

to raft-dev

Hi Ayende ,

Thank you so much for your reply. What about joint consensus: who sends the join request to the leader? Let's say we have the topology below:

A topology: [a,b,c]

B topology: [a,b,c]

C topology: [a,b,c]

Then we decide to scale out to 5 at once by adding two new servers (D and E). Who will send the join request, D or E?

Simply put, what is the difference between single-server joining and joint consensus (which is described as a two-phase process in which two entries are committed)?


Oren Eini (Ayende Rahien)

Apr 20, 2017, 5:40:49 AM

to raft...@googlegroups.com

The admin will first join D and then E.

They will first be added as promotables, then the leader will move them to voters once they are fully caught up.

Terry Tan

Apr 21, 2017, 6:48:43 AM

to raft-dev

Hi Ayende,

By admin, do you mean the leader? If so, then as you said, the process is one by one, so it is still single-server joining. I have read some source code on GitHub in which joint consensus works like this: the joining server builds a config as below.

If originally we have

{s1,s2,s3}

then the joining server builds a config like {s1,s2,s3,s4,s5} and sends it to the leader. The leader keeps two collections, one holding the old config and one holding the new config; the new config will be {s1,s2,s3,s4,s5}. It then marks its local state as C(old,new). After that, the leader sends append requests (including the union of the old and new collections) to the followers. Each follower appends the entry and puts the new config into effect immediately, without waiting for the entry to be committed. When the leader receives the responses, it commits the entry, changes the state to new, resets the old peers to the new peers, and empties the new-peers collection. But I don't know why the author does it like this; could you tell me why? For voting, keeping the old peers lets it handle votes from old peers, while for new peers it uses the new rule (the new quorum size). Why? Let's say s1 is the leader and two new servers s4 and s5 join the cluster, and then s1 is partitioned for a while. s1, s4 and s5 still form a cluster, because under the new rule the quorum is 3. Meanwhile s2 and s3 restart the election to get a new leader, because they don't know about s4 and s5.

It seems this approach cannot prevent the problem from happening.


jordan.h...@gmail.com

Apr 21, 2017, 7:21:37 AM

to raft...@googlegroups.com

This issue is covered in the original paper that proposed joint consensus. See figure 10 and the related sections. The gist is, during the first phase of the configuration change while joint consensus applies (from the paper):

Agreement (for elections and entry commitment) requires separate majorities from both the old and new configurations.
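
The quoted rule can be written down in a few lines. The sketch below is illustrative only (the function names are made up); it applies the rule to the partition from the earlier post, where s1 ends up alone with the new servers s4 and s5.

```python
def has_majority(config, acks):
    """True if `acks` contains a strict majority of `config`."""
    return len(set(config) & set(acks)) >= len(config) // 2 + 1


def joint_agreement(c_old, c_new, acks):
    """While C(old,new) is in effect, both configurations must agree separately."""
    return has_majority(c_old, acks) and has_majority(c_new, acks)


c_old = ["s1", "s2", "s3"]
c_new = ["s1", "s2", "s3", "s4", "s5"]

# s1 is partitioned together with the two new servers.
print(joint_agreement(c_old, c_new, {"s1", "s4", "s5"}))
# s1 can still reach s2 plus one new server.
print(joint_agreement(c_old, c_new, {"s1", "s2", "s4"}))
```

In the partitioned case s1, s4 and s5 are a majority of the new configuration but only one of three in the old one, so under the joint rule s1 can neither commit entries nor win an election on that side. Counting a single quorum of 3 over the union is what lets the problem through.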

Terry Tan

Apr 22, 2017, 10:56:46 AM

to raft-dev

Hi Jordan,

Thank you for your reply. I have read that paper several times, but I still cannot understand the gist, which is why I raised the question here. Thinking through the process again: if the joining server does not get a reply until the new config has been appended and applied by a majority of the followers, maybe this issue will not happen.

I don't know whether this is what Section 6 of the Raft paper means. If not, could you share some details about it?

Oren Eini (Ayende Rahien)

Apr 23, 2017, 5:28:29 AM

to raft...@googlegroups.com

No, admin is literally the cluster administrator, who initiates the join command to the cluster.

Terry Tan

Apr 23, 2017, 9:04:39 AM

to raft-dev

Hi Ayende,

What is in the C(old,new) entry replicated to the followers? The Raft paper says it is a combination.

1. Is it an entry including an old collection like (s1,s2,s3) and a new collection like (s1,s3,s4)?

2. What does this mean: "Once a given server adds the new configuration entry to its log, it uses that configuration for all future decisions (a server always uses the latest configuration in its log, regardless of whether the entry is committed)."

By "new configuration" in this sentence, does it mean the C(old,new) configuration, the one that includes both the old and new configs?

3. What does this mean: "Once C(old,new) has been committed, neither C(old) nor C(new) can make decisions without approval of the other."

Does it mean that any entry must be approved by both the old cluster and the new cluster before it can be committed?


Oren Eini (Ayende Rahien)

Apr 24, 2017, 1:09:59 AM

to raft...@googlegroups.com

1) You are talking about the more complex system, where any change is allowed.

If you limit it to a single concurrent change (add / remove one server at a time), this is _much_ easier to work with.

2) Yes

But the new configuration isn't committed, so it uses both, until it is committed.

3) That is only applicable until both C(old) and C(new) have committed the entry, at which point just C(new) is applicable.
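
The rule quoted in question 2 (use the latest configuration in the log, committed or not) can be sketched as a backward scan of the log. The entry layout below is an assumption for illustration: a joint C(old,new) entry carries both server lists, and the final C(new) entry carries only `new`.

```python
def latest_config(log, bootstrap_config):
    """Return (old, new) from the last config entry in the log.

    The commit index is deliberately not consulted. `old` is None unless the
    latest config entry is a joint C(old,new) entry.
    """
    for entry in reversed(log):
        if entry["kind"] == "config":
            return entry["old"], entry["new"]
    return None, bootstrap_config


bootstrap = ["s1", "s2", "s3"]
log = [
    {"kind": "command"},
    {"kind": "config", "old": ["s1", "s2", "s3"], "new": ["s1", "s3", "s4"]},
]
commit_index = 1  # the joint entry at index 2 is not committed yet

print(latest_config(log, bootstrap))  # joint rules already apply on this server

log.append({"kind": "config", "old": None, "new": ["s1", "s3", "s4"]})
print(latest_config(log, bootstrap))  # only the new configuration applies
```

While the scan returns both lists, decisions on that server need separate majorities of each. Once the latest config entry is the plain new configuration, only that list is consulted.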
