Hi,
I am new to this group, so please let me know if this question has been asked before, and pardon me if it is a stupid question. I am quite confused by the following scenario.
0. Initial State
C = {S1, S2, S3}, master = S1.
S1 [C]
S2 [C]
S3 [C]
1. Adding one member
D = {S1, S2, S3, S4}, master = S1
S1 [C, D]
S2 [C]
S3 [C]
S4 [C, D]
As described in the thesis, S1 adopts D as soon as it is appended to its log (without waiting for it to be committed) and starts replicating D to S4.
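As I understand the thesis, single-server changes are safe because any majority of C and any majority of D share at least one server, so two leaders cannot be elected in the same term. A small brute-force check in Python (the helper names `majorities` and `always_overlap` are my own, not from the thesis):

```python
from itertools import combinations

def majorities(config):
    """Return every subset of `config` that forms a strict majority."""
    members = sorted(config)
    need = len(members) // 2 + 1
    return [set(q) for k in range(need, len(members) + 1)
            for q in combinations(members, k)]

def always_overlap(old, new):
    """True if every majority of `old` shares a server with every majority of `new`."""
    return all(a & b for a in majorities(old) for b in majorities(new))

C = {"S1", "S2", "S3"}
D = {"S1", "S2", "S3", "S4"}
print(always_overlap(C, D))  # True
```

Running the same check against a five-server configuration that adds two servers at once returns False, which is exactly what step 2 runs into.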
2. Adding another member before D (from step 1) has been committed
E = {S1, S2, S3, S4, S5}, master = S1. S1 is still waiting to replicate D to S2 and S3, but the network is partitioned.
S1 [C, D, E]
S2 [C]
S3 [C]
S4 [C, D, E]
S5 [C, D, E]
Again, S1 adopts E immediately after it is appended, and S2 and S3 remain partitioned.
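With E appended on top of an uncommitted D, S1 has effectively jumped from C to E, which is a two-server change. A quick Python check of the two groups from step 3 (the helper `is_majority` is hypothetical) shows that a majority of C and a majority of E can be completely disjoint, whereas the same partition would not give the S1 side a majority of D:

```python
def is_majority(voters, config):
    """True if `voters` contains a strict majority of the servers in `config`."""
    return len(set(voters) & set(config)) > len(config) // 2

C = {"S1", "S2", "S3"}
D = {"S1", "S2", "S3", "S4"}
E = {"S1", "S2", "S3", "S4", "S5"}

group_a = {"S2", "S3"}        # partitioned side, still using C
group_b = {"S1", "S4", "S5"}  # side that has E in its logs

print(is_majority(group_a, C), is_majority(group_b, E), group_a & group_b)  # True True set()
print(is_majority({"S1", "S4"}, D))  # False: under D alone, the S1 side has no quorum
```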
3. Master election for a new term
S1 goes down and quickly restarts. S5 starts an election for a new term and gets votes from {S1, S4, S5}, which is a majority of E, the configuration S5 uses. At the same time, S2 detects the network partition, starts an election, and gets votes from {S2, S3}, which is a majority of C, the configuration S2 uses. Both S5 and S2 become master in the same term and continue serving requests.
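Here is a toy Python model of step 3, assuming each server decides with the latest configuration in its own log, grants at most one vote per term, and skips the log up-to-date check (the `Server` class, `run_election`, and the term number are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class Server:
    name: str
    log_configs: list                              # config entries in log order, committed or not
    voted_for: dict = field(default_factory=dict)  # term -> candidate this server voted for

    @property
    def config(self):
        # Each server uses the latest configuration in its own log
        return self.log_configs[-1]

    def request_vote(self, term, candidate):
        # At most one vote per term; the log up-to-date check is omitted here
        if self.voted_for.get(term) in (None, candidate):
            self.voted_for[term] = candidate
            return True
        return False

def run_election(candidate, term, reachable, servers):
    """Collect votes from reachable servers; win on a majority of the candidate's own config."""
    votes = {n for n in reachable if servers[n].request_vote(term, candidate.name)}
    return len(votes & candidate.config) > len(candidate.config) // 2

C = frozenset({"S1", "S2", "S3"})
D = C | {"S4"}
E = D | {"S5"}
servers = {
    "S1": Server("S1", [C, D, E]),  # restarted, but its log (and thus E) persisted
    "S2": Server("S2", [C]),
    "S3": Server("S3", [C]),
    "S4": Server("S4", [C, D, E]),
    "S5": Server("S5", [C, D, E]),
}
term = 2  # hypothetical new term used by both candidates
s5_wins = run_election(servers["S5"], term, {"S1", "S4", "S5"}, servers)
s2_wins = run_election(servers["S2"], term, {"S2", "S3"}, servers)
print(s5_wins, s2_wins)  # True True
```

Each candidate only counts servers in its own configuration, and no server votes twice, so both candidates win without either being able to tell.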
Did I miss anything? The thesis says "further configuration change can be started" (after a configuration is committed), so I assume concurrent configuration changes are not allowed, which would rule out the above scenario. On the other hand, the discussion in
https://groups.google.com/g/raft-dev/c/t4xj6dJTP6E/m/d2D9LrWRza8J
assumes concurrent configuration changes in all of its examples, so I assumed they are allowed. The fix proposed there seems unrelated to the above scenario, though.
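If the rule in the thesis is that a leader must not append a new configuration until the previous one is committed, a leader-side guard might look like this rough sketch (the `Leader` class and its methods are hypothetical, not taken from the thesis or any library):

```python
class Leader:
    """Hypothetical leader that allows at most one uncommitted configuration change."""

    def __init__(self, config):
        self.log = [("config", frozenset(config))]  # entry 0, assumed committed
        self.commit_index = 0
        self.pending_config_index = None

    def current_config(self):
        # The latest config entry in the log is in effect, committed or not
        for kind, value in reversed(self.log):
            if kind == "config":
                return value

    def propose_add(self, server):
        # Reject a new change while the previous config entry is uncommitted
        if self.pending_config_index is not None and self.pending_config_index > self.commit_index:
            raise RuntimeError("previous configuration change not committed yet")
        new_config = self.current_config() | {server}
        self.log.append(("config", new_config))
        self.pending_config_index = len(self.log) - 1
        return new_config

    def advance_commit(self, index):
        self.commit_index = max(self.commit_index, index)

leader = Leader({"S1", "S2", "S3"})
D = leader.propose_add("S4")        # step 1: D appended
try:
    leader.propose_add("S5")        # step 2: rejected while D is uncommitted
    rejected = False
except RuntimeError:
    rejected = True
leader.advance_commit(1)            # D committed on a majority of D
E = leader.propose_add("S5")        # now allowed
print(rejected, sorted(E))          # True ['S1', 'S2', 'S3', 'S4', 'S5']
```

Under this guard, step 2 of my scenario would be refused: with S2 and S3 partitioned, only S1 and S4 hold D, which is not a majority of D, so D never commits and S1 never appends E.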
Thanks a lot.