[PATCH 2/7] test: raft: randomized_nemesis_test: put `variant` and `monostate` `ostream` `operator<<`s into `std` namespace


Kamil Braun

Sep 27, 2021, 11:31:12 AM
to scylla...@googlegroups.com, kos...@scylladb.com, gl...@scylladb.com, Kamil Braun
In preparation for the following commits.
Otherwise the definitions wouldn't be found during argument-dependent lookup
(I don't understand why it worked before but won't after the next commit).
---
test/raft/randomized_nemesis_test.cc | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/test/raft/randomized_nemesis_test.cc b/test/raft/randomized_nemesis_test.cc
index 4324e8f2d..fefc93c8f 100644
--- a/test/raft/randomized_nemesis_test.cc
+++ b/test/raft/randomized_nemesis_test.cc
@@ -1793,6 +1793,8 @@ struct reconfiguration {
}
};

+namespace std {
+
std::ostream& operator<<(std::ostream& os, const std::monostate&) {
return os << "";
}
@@ -1803,6 +1805,8 @@ std::ostream& operator<<(std::ostream& os, const std::variant<T, Ts...>& v) {
return os;
}

+} // namespace std
+
namespace operation {

std::ostream& operator<<(std::ostream& os, const thread_id& tid) {
--
2.31.1

Kamil Braun

Sep 27, 2021, 11:31:12 AM
to scylla...@googlegroups.com, kos...@scylladb.com, gl...@scylladb.com, Kamil Braun
We introduce a new operation to the framework: `reconfiguration`.
The operation sends a reconfiguration request to a Raft cluster. It
bounces a few times in case of `not_a_leader` results.

A side effect of the operation is that it modifies a `known` set of nodes
to which the operation's state holds a reference. This `known` set can then
be used by other operations (such as `raft_call`s) to find the current
leader.

For now we assume that reconfigurations are performed sequentially. If a
reconfiguration succeeds, we change `known` to the new configuration. If
it fails, we change `known` to the union of the previous configuration
and the attempted configuration (because we don't know which configuration
will eventually take effect - the old one or the attempted one - so any
member of the union may eventually become a leader).

We use a dedicated thread (similarly to the network partitioning thread)
to periodically perform random reconfigurations.

We also improve the liveness check at the end of the test. The
previous check would find a leader once and assume that it does not
change, and that the first attempt at sending a request to this leader
succeeds. In reality the leader may change at the end of the test (e.g.
it may be in the middle of stepping down when we find it) and in general
it may take some time for the cluster to stabilize; introducing
reconfigurations exposed this problem. The new check tries a few times
to find a leader and perform a request - until a time limit is reached.

Kamil Braun (7):
test: raft: randomized_nemesis_test: `reconfiguration` operation
test: raft: randomized_nemesis_test: put `variant` and `monostate`
`ostream` `operator<<`s into `std` namespace
test: raft: randomized_nemesis_test: handle more error types
raft: introduce `raft::server::accepts_requests()`
test: raft: randomized_nemesis_test: improve the bouncing algorithm
test: raft: randomized_nemesis_test: take time_point instead of
duration in wait_for_leader
test: raft: randomized_nemesis_test: perform reconfigurations in
basic_generator_test

raft/fsm.hh | 3 +
raft/server.cc | 5 +
raft/server.hh | 4 +
test/raft/randomized_nemesis_test.cc | 287 +++++++++++++++++++++------
4 files changed, 243 insertions(+), 56 deletions(-)

--
2.31.1

Kamil Braun

Sep 27, 2021, 11:31:13 AM
to scylla...@googlegroups.com, kos...@scylladb.com, gl...@scylladb.com, Kamil Braun
A server may be a leader yet still throw `not_a_leader` when `add_entry`
is called on it, if it's currently stepping down.
---
raft/fsm.hh | 3 +++
raft/server.cc | 5 +++++
raft/server.hh | 4 ++++
test/raft/randomized_nemesis_test.cc | 8 ++++++--
4 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/raft/fsm.hh b/raft/fsm.hh
index c090f177d..463372678 100644
--- a/raft/fsm.hh
+++ b/raft/fsm.hh
@@ -353,6 +353,9 @@ class fsm {
bool is_leader() const {
return std::holds_alternative<leader>(_state);
}
+ bool accepts_requests() const {
+ return is_leader() && !leader_state().stepdown;
+ }
bool is_follower() const {
return std::holds_alternative<follower>(_state);
}
diff --git a/raft/server.cc b/raft/server.cc
index 4ce74e97d..524ad9567 100644
--- a/raft/server.cc
+++ b/raft/server.cc
@@ -87,6 +87,7 @@ class server_impl : public rpc_server, public server {
std::pair<index_t, term_t> log_last_idx_term() override;
void elapse_election() override;
bool is_leader() override;
+ bool accepts_requests() const override;
void tick() override;
future<> stepdown(logical_clock::duration timeout) override;
private:
@@ -1031,6 +1032,10 @@ bool server_impl::is_leader() {
return _fsm->is_leader();
}

+bool server_impl::accepts_requests() const {
+ return _fsm->accepts_requests();
+}
+
void server_impl::elapse_election() {
while (_fsm->election_elapsed() < ELECTION_TIMEOUT) {
_fsm->tick();
diff --git a/raft/server.hh b/raft/server.hh
index f8f20c922..f6d8b21a3 100644
--- a/raft/server.hh
+++ b/raft/server.hh
@@ -117,6 +117,10 @@ class server {
virtual void elapse_election() = 0;
virtual bool is_leader() = 0;
virtual void tick() = 0;
+ // A server may be a leader but still refuse to accept new entries if it's
+ // currently stepping down. Use `accepts_requests()` to check if `add_entry`
+ // or `set_configuration` is currently possible.
+ virtual bool accepts_requests() const = 0;
};

std::unique_ptr<server> create_server(server_id uuid, std::unique_ptr<rpc> rpc,
diff --git a/test/raft/randomized_nemesis_test.cc b/test/raft/randomized_nemesis_test.cc
index f11fb4311..9bce049f6 100644
--- a/test/raft/randomized_nemesis_test.cc
+++ b/test/raft/randomized_nemesis_test.cc
@@ -1059,6 +1059,10 @@ class raft_server {
return _server->is_leader();
}

+ bool accepts_requests() const {
+ return _server->accepts_requests();
+ }
+
raft::server_id id() const {
return _id;
}
@@ -1341,7 +1345,7 @@ struct wait_for_leader {
co_return raft::server_id{};
}

- auto it = std::find_if(nodes.begin(), nodes.end(), [&env] (raft::server_id id) { return env->get_server(id).is_leader(); });
+ auto it = std::find_if(nodes.begin(), nodes.end(), [&env] (raft::server_id id) { return env->get_server(id).accepts_requests(); });
if (it != nodes.end()) {
co_return *it;
}
@@ -1351,7 +1355,7 @@ struct wait_for_leader {
}(env.weak_from_this(), std::move(nodes)));

assert(l != raft::server_id{});
- assert(env.get_server(l).is_leader());
+ assert(env.get_server(l).accepts_requests());

co_return l;
}
--
2.31.1

Kamil Braun

Sep 27, 2021, 11:31:13 AM
to scylla...@googlegroups.com, kos...@scylladb.com, gl...@scylladb.com, Kamil Braun
With reconfigurations the `commit_status_unknown` error may start
appearing.
---
test/raft/randomized_nemesis_test.cc | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/test/raft/randomized_nemesis_test.cc b/test/raft/randomized_nemesis_test.cc
index fefc93c8f..f11fb4311 100644
--- a/test/raft/randomized_nemesis_test.cc
+++ b/test/raft/randomized_nemesis_test.cc
@@ -210,7 +210,7 @@ raft::command make_command(const cmd_id_t& cmd_id, const Input& input) {

// TODO: handle other errors?
template <PureStateMachine M>
-using call_result_t = std::variant<typename M::output_t, timed_out_error, raft::not_a_leader, raft::dropped_entry>;
+using call_result_t = std::variant<typename M::output_t, timed_out_error, raft::not_a_leader, raft::dropped_entry, raft::commit_status_unknown>;

// Sends a given `input` as a command to `server`, waits until the command gets replicated
// and applied on that server and returns the produced output.
@@ -254,6 +254,8 @@ future<call_result_t<M>> call(
return make_ready_future<call_result_t<M>>(e);
} catch (raft::dropped_entry e) {
return make_ready_future<call_result_t<M>>(e);
+ } catch (raft::commit_status_unknown e) {
+ return make_ready_future<call_result_t<M>>(e);
} catch (logical_timer::timed_out<typename M::output_t> e) {
(void)e.get_future().discard_result()
.handle_exception([] (std::exception_ptr eptr) {
@@ -261,6 +263,8 @@ future<call_result_t<M>> call(
std::rethrow_exception(eptr);
} catch (const output_channel_dropped&) {
} catch (const raft::dropped_entry&) {
+ } catch (const raft::commit_status_unknown&) {
+ } catch (const raft::not_a_leader&) {
} catch (const raft::stopped_error&) {
}
});
@@ -919,6 +923,8 @@ future<reconfigure_result_t> reconfigure(
try {
std::rethrow_exception(eptr);
} catch (const raft::dropped_entry&) {
+ } catch (const raft::commit_status_unknown&) {
+ } catch (const raft::not_a_leader&) {
} catch (const raft::stopped_error&) {
}
});
--
2.31.1

Kamil Braun

Sep 27, 2021, 11:31:16 AM
to scylla...@googlegroups.com, kos...@scylladb.com, gl...@scylladb.com, Kamil Braun
We use a dedicated thread (similarly to the nemesis thread)
to periodically perform reconfigurations.

The commit also improves the liveness check at the end of the test. The
previous check would find a leader once and assume that it does not
change, and that the first attempt at sending a request to this leader
succeeds. In reality the leader may change at the end of the test (e.g.
it may be in the middle of stepping down when we find it) and in general
it may take some time for the cluster to stabilize; introducing
reconfigurations exposed this problem. The new check tries a few times
to find a leader and perform a request - until a time limit is reached.
---
test/raft/randomized_nemesis_test.cc | 162 +++++++++++++++++++++------
1 file changed, 130 insertions(+), 32 deletions(-)

diff --git a/test/raft/randomized_nemesis_test.cc b/test/raft/randomized_nemesis_test.cc
index a89a2de56..477c82c6b 100644
--- a/test/raft/randomized_nemesis_test.cc
+++ b/test/raft/randomized_nemesis_test.cc
@@ -1067,6 +1067,10 @@ class raft_server {
return _id;
}

+ raft::configuration get_configuration() const {
+ return _server->get_configuration();
+ }
+
void deliver(raft::server_id src, const typename rpc<typename M::state_t>::message_t& m) {
assert(_started);
_queue->push(src, m);
@@ -1355,7 +1359,9 @@ struct wait_for_leader {
}(env.weak_from_this(), std::move(nodes)));

assert(l != raft::server_id{});
- assert(env.get_server(l).accepts_requests());
+
+ // Note: `l` may no longer accept requests at this point if there was a yield at the `co_await` above
+ // and `l` decided to step down, was restarted, or just got removed from the configuration.

co_return l;
}
@@ -2038,10 +2044,17 @@ std::ostream& operator<<(std::ostream& os, const AppendReg::ret& r) {
return os << format("ret{{{}, {}}}", r.x, r.prev);
}

+namespace raft {
+std::ostream& operator<<(std::ostream& os, const raft::server_address& a) {
+ return os << a.id;
+}
+}
+
SEASTAR_TEST_CASE(basic_generator_test) {
using op_type = operation::invocable<operation::either_of<
raft_call<AppendReg>,
- network_majority_grudge<AppendReg>
+ network_majority_grudge<AppendReg>,
+ reconfiguration<AppendReg>
>>;
using history_t = utils::chunked_vector<std::variant<op_type, operation::completion<op_type>>>;

@@ -2068,39 +2081,77 @@ SEASTAR_TEST_CASE(basic_generator_test) {
// Wait for the server to elect itself as a leader.
assert(co_await wait_for_leader<AppendReg>{}(env, {leader_id}, timer, timer.now() + 1000_t) == leader_id);

+ size_t no_all_servers = 10;
+ std::vector<raft::server_id> all_servers{leader_id};
+ for (size_t i = 1; i < no_all_servers; ++i) {
+ all_servers.push_back(co_await env.new_server(false));
+ }

- size_t no_servers = 5;
- std::unordered_set<raft::server_id> servers{leader_id};
- for (size_t i = 1; i < no_servers; ++i) {
- servers.insert(co_await env.new_server(false));
+ size_t no_init_servers = 5;
+
+ // `known_config` represents the set of servers that may potentially be in the cluster configuration.
+ //
+ // It is not possible to determine in general what the 'true' current configuration is (if even such notion
+ // makes sense at all). Given a sequence of reconfiguration requests, assuming that all except possibly the last
+ // requests have finished, then:
+ // - if the last request has finished successfully, then the current configuration must be equal
+ // to the one chosen in the last request;
+ // - but if it hasn't finished yet, or it finished with a failure, the current configuration may contain servers
+ // from the one chosen in the last request or from the previously known set of servers.
+ //
+ // The situation is even worse considering that requests may never 'finish', i.e. we may never get a response
+ // to a reconfiguration request (in which case we eventually timeout). These requests may in theory execute
+ // at any point in the future. We take a practical approach when updating `known_config`: we assume
+ // that our timeouts for reconfiguration requests are large enough so that if a reconfiguration request
+ // has timed out, it has either already finished or it never will.
+ // TODO: this may not be true and we may end up with `known_config` that does not contain the current leader
+ // (not observed in practice yet though... I think) Come up with a better approach.
+ std::unordered_set<raft::server_id> known_config;
+
+ for (size_t i = 0; i < no_init_servers; ++i) {
+ known_config.insert(all_servers[i]);
}

assert(std::holds_alternative<std::monostate>(
co_await env.get_server(leader_id).reconfigure(
- std::vector<raft::server_id>{servers.begin(), servers.end()}, timer.now() + 100_t, timer)));
+ std::vector<raft::server_id>{known_config.begin(), known_config.end()}, timer.now() + 100_t, timer)));

- auto threads = operation::make_thread_set(servers.size() + 1);
+ auto threads = operation::make_thread_set(all_servers.size() + 2);
auto nemesis_thread = some(threads);

- auto seed = tests::random::get_int<int32_t>();
+ auto threads_without_nemesis = threads;
+ threads_without_nemesis.erase(nemesis_thread);
+
+ auto reconfig_thread = some(threads_without_nemesis);

- // TODO: make it dynamic based on the current configuration
- std::unordered_set<raft::server_id>& known = servers;
+ auto seed = tests::random::get_int<int32_t>();

raft_call<AppendReg>::state_type db_call_state {
.env = env,
- .known = known,
+ .known = known_config,
.timer = timer
};

network_majority_grudge<AppendReg>::state_type network_majority_grudge_state {
.env = env,
- .known = known,
+ .known = known_config,
.timer = timer,
.rnd = std::mt19937{seed}
};

- auto init_state = op_type::state_type{std::move(db_call_state), std::move(network_majority_grudge_state)};
+ reconfiguration<AppendReg>::state_type reconfiguration_state {
+ .all_servers = all_servers,
+ .env = env,
+ .known = known_config,
+ .timer = timer,
+ .rnd = std::mt19937{seed}
+ };
+
+ auto init_state = op_type::state_type{
+ std::move(db_call_state),
+ std::move(network_majority_grudge_state),
+ std::move(reconfiguration_state)
+ };

using namespace generator;

@@ -2123,11 +2174,16 @@ SEASTAR_TEST_CASE(basic_generator_test) {
return op_type{network_majority_grudge<AppendReg>{raft::logical_clock::duration{dist(engine)}}};
})
),
- stagger(seed, timer.now(), 0_t, 50_t,
- sequence(1, [] (int32_t i) {
- assert(i > 0);
- return op_type{raft_call<AppendReg>{AppendReg::append{i}, 200_t}};
- })
+ pin(reconfig_thread,
+ stagger(seed, timer.now() + 1000_t, 500_t, 500_t,
+ constant([] () { return op_type{reconfiguration<AppendReg>{500_t}}; })
+ ),
+ stagger(seed, timer.now(), 0_t, 50_t,
+ sequence(1, [] (int32_t i) {
+ assert(i > 0);
+ return op_type{raft_call<AppendReg>{AppendReg::append{i}, 200_t}};
+ })
+ )
)
)
);
@@ -2170,6 +2226,10 @@ SEASTAR_TEST_CASE(basic_generator_test) {
} else {
tlogger.debug("completion {}", c);
}
+
+ // TODO: check consistency of reconfiguration completions
+ // (there's not much to check, but for example: we should not get back `conf_change_in_progress`
+ // if our last reconfiguration was successful?).
}
};

@@ -2179,20 +2239,58 @@ SEASTAR_TEST_CASE(basic_generator_test) {
consistency_checker{}};
co_await interp.run();

- // All network partitions are healed, this should succeed:
- auto last_leader = co_await wait_for_leader<AppendReg>{}(env, std::vector<raft::server_id>{servers.begin(), servers.end()}, timer, timer.now() + 10000_t)
- .handle_exception_type([] (logical_timer::timed_out<raft::server_id>) -> raft::server_id {
- tlogger.error("Failed to find a leader after 10000 ticks at the end of test (network partitions are healed).");
- assert(false);
- });
+ tlogger.debug("Finished generator run, time: {}", timer.now());
+
+ // Liveness check: we must be able to obtain a final response after all the nemeses have stopped.
+ // Due to possible multiple leaders at this point and the cluster stabilizing after reconfigurations,
+ // we may need to try sending requests multiple times to different servers to obtain the last result.
+
+ auto limit = timer.now() + 10000_t;
+ size_t cnt = 0;
+ for (; timer.now() < limit; ++cnt) {
+ tlogger.debug("Trying to obtain last result: attempt number {}", cnt + 1);
+
+ auto now = timer.now();
+ auto leader = co_await wait_for_leader<AppendReg>{}(env,
+ std::vector<raft::server_id>{all_servers.begin(), all_servers.end()}, timer, limit)
+ .handle_exception_type([&timer, now] (logical_timer::timed_out<raft::server_id>) -> raft::server_id {
+ tlogger.error("Failed to find a leader after {} ticks at the end of test.", timer.now() - now);
+ assert(false);
+ });
+
+ if (env.get_server(leader).accepts_requests()) {
+ tlogger.debug("Leader {} found after {} ticks", leader, timer.now() - now);
+ } else {
+ tlogger.warn("Leader {} found after {} ticks, but suddenly lost leadership", leader, timer.now() - now);
+ continue;
+ }
+
+ auto config = env.get_server(leader).get_configuration();
+ tlogger.debug("Leader {} configuration: current {} previous {}", leader, config.current, config.previous);

- // Should also succeed
- auto last_res = co_await env.get_server(last_leader).call(AppendReg::append{-1}, timer.now() + 10000_t, timer);
- if (!std::holds_alternative<typename AppendReg::ret>(last_res)) {
- tlogger.error(
- "Expected success on the last call in the test (after electing a leader; network partitions are healed)."
- " Got: {}", last_res);
- assert(false);
+ for (auto& s: all_servers) {
+ auto& srv = env.get_server(s);
+ if (srv.is_leader() && s != leader) {
+ auto conf = srv.get_configuration();
+ tlogger.debug("There is another leader: {}, configuration: current {} previous {}", s, conf.current, conf.previous);
+ }
+ }
+
+ tlogger.debug("From the clients' point of view, the possible cluster members are: {}", known_config);
+
+ auto [res, last_attempted_server] = co_await bouncing{[&timer, &env] (raft::server_id id) {
+ return env.get_server(id).call(AppendReg::append{-1}, timer.now() + 200_t, timer);
+ }}(timer, known_config, leader, known_config.size() + 1, 10_t, 10_t);
+
+ if (std::holds_alternative<typename AppendReg::ret>(res)) {
+ tlogger.debug("Last result: {}", res);
+ co_return;
+ }
+
+ tlogger.warn("Failed to obtain last result at end of test: {} returned by {}", res, last_attempted_server);
}
+
+ tlogger.error("Failed to obtain a final successful response at the end of the test. Number of attempts: {}", cnt);
+ assert(false);
});
}
--
2.31.1

Kamil Braun

Sep 27, 2021, 11:31:18 AM
to scylla...@googlegroups.com, kos...@scylladb.com, gl...@scylladb.com, Kamil Braun
The bouncing algorithm tries to send a request to other servers in the
configuration after it receives a `not_a_leader` response.

Improve the algorithm so it doesn't try the same server twice.
---
test/raft/randomized_nemesis_test.cc | 36 ++++++++++++++++++----------
1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/test/raft/randomized_nemesis_test.cc b/test/raft/randomized_nemesis_test.cc
index 9bce049f6..3334e6907 100644
--- a/test/raft/randomized_nemesis_test.cc
+++ b/test/raft/randomized_nemesis_test.cc
@@ -1582,10 +1582,10 @@ SEASTAR_TEST_CASE(snapshotting_preserves_config_test) {
// The maximum number of calls until we give up is specified by `bounces`.
// The initial `raft::server_id` argument provided to `F` is specified as an argument
// to this function (`srv_id`). If the initial call returns `not_a_leader`, then:
-// - if the result contained a different leader ID, we will use it in the next call,
-// sleeping for `known_leader_delay` first,
+// - if the result contained a different leader ID and we didn't already try that ID,
+// we will use it in the next call, sleeping for `known_leader_delay` first,
// - otherwise we will take the next ID from the `known` set, sleeping for
-// `unknown_leader_delay` first.
+// `unknown_leader_delay` first; no ID will be tried twice.
// The returned result contains the result of the last call to `F` and the last
// server ID passed to `F`.
template <typename F>
@@ -1608,25 +1608,35 @@ struct bouncing {
raft::logical_clock::duration known_leader_delay,
raft::logical_clock::duration unknown_leader_delay
) {
- auto it = known.find(srv_id);
+ tlogger.trace("bouncing call: starting with {}", srv_id);
+ std::unordered_set<raft::server_id> tried;
while (true) {
auto res = co_await _f(srv_id);
+ tried.insert(srv_id);
+ known.erase(srv_id);

if (auto n_a_l = std::get_if<raft::not_a_leader>(&res); n_a_l && bounces) {
--bounces;
+
if (n_a_l->leader) {
assert(n_a_l->leader != srv_id);
- co_await timer.sleep(known_leader_delay);
- srv_id = n_a_l->leader;
- } else {
- co_await timer.sleep(unknown_leader_delay);
- assert(!known.empty());
- if (it == known.end() || ++it == known.end()) {
- it = known.begin();
+ if (!tried.contains(n_a_l->leader)) {
+ co_await timer.sleep(known_leader_delay);
+ srv_id = n_a_l->leader;
+ tlogger.trace("bouncing call: got `not_a_leader`, rerouted to {}", srv_id);
+ continue;
+ }
+ }
+
+ if (!known.empty()) {
+ srv_id = *known.begin();
+ if (n_a_l->leader) {
+ tlogger.trace("bouncing call: got `not_a_leader`, rerouted to {}, but already tried it; trying {}", n_a_l->leader, srv_id);
+ } else {
+ tlogger.trace("bouncing call: got `not_a_leader`, no reroute, trying {}", srv_id);
}
- srv_id = *it;
+ continue;
}
- continue;
}

co_return std::pair{res, srv_id};
--
2.31.1

Kamil Braun

Sep 27, 2021, 11:31:21 AM
to scylla...@googlegroups.com, kos...@scylladb.com, gl...@scylladb.com, Kamil Braun
---
test/raft/randomized_nemesis_test.cc | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/test/raft/randomized_nemesis_test.cc b/test/raft/randomized_nemesis_test.cc
index 3334e6907..a89a2de56 100644
--- a/test/raft/randomized_nemesis_test.cc
+++ b/test/raft/randomized_nemesis_test.cc
@@ -1329,7 +1329,7 @@ std::ostream& operator<<(std::ostream& os, const ExReg::exchange& e) {
return os << format("xng{{{}}}", e.x);
}

-// Wait until either one of `nodes` in `env` becomes a leader, or duration `d` expires according to `timer` (whichever happens first).
+// Wait until either one of `nodes` in `env` becomes a leader, or time point `timeout` is reached according to `timer` (whichever happens first).
// If the leader is found, returns it. Otherwise throws a `logical_timer::timed_out` exception.
template <PureStateMachine M>
struct wait_for_leader {
@@ -1338,8 +1338,8 @@ struct wait_for_leader {
environment<M>& env,
std::vector<raft::server_id> nodes,
logical_timer& timer,
- raft::logical_clock::duration d) {
- auto l = co_await timer.with_timeout(timer.now() + d, [] (weak_ptr<environment<M>> env, std::vector<raft::server_id> nodes) -> future<raft::server_id> {
+ raft::logical_clock::time_point timeout) {
+ auto l = co_await timer.with_timeout(timeout, [] (weak_ptr<environment<M>> env, std::vector<raft::server_id> nodes) -> future<raft::server_id> {
while (true) {
if (!env) {
co_return raft::server_id{};
@@ -1383,7 +1383,7 @@ SEASTAR_TEST_CASE(basic_test) {
auto leader_id = co_await env.new_server(true);

// Wait at most 1000 ticks for the server to elect itself as a leader.
- assert(co_await wait_for_leader<ExReg>{}(env, {leader_id}, timer, 1000_t) == leader_id);
+ assert(co_await wait_for_leader<ExReg>{}(env, {leader_id}, timer, timer.now() + 1000_t) == leader_id);

auto call = [&] (ExReg::input_t input, raft::logical_clock::duration timeout) {
return env.get_server(leader_id).call(std::move(input), timer.now() + timeout, timer);
@@ -1457,7 +1457,7 @@ SEASTAR_TEST_CASE(snapshot_uses_correct_term_test) {
// It's easier to catch the problem when we send entries one by one, not in batches.
.append_request_threshold = 1,
});
- assert(co_await wait_for_leader<ExReg>{}(env, {id1}, timer, 1000_t) == id1);
+ assert(co_await wait_for_leader<ExReg>{}(env, {id1}, timer, timer.now() + 1000_t) == id1);

auto id2 = co_await env.new_server(true,
raft::server::configuration{
@@ -1488,7 +1488,7 @@ SEASTAR_TEST_CASE(snapshot_uses_correct_term_test) {
env.get_network().remove_grudge(id2, id1);
env.get_network().remove_grudge(id1, id2);

- auto l = co_await wait_for_leader<ExReg>{}(env, {id1, id2}, timer, 1000_t);
+ auto l = co_await wait_for_leader<ExReg>{}(env, {id1, id2}, timer, timer.now() + 1000_t);
tlogger.trace("last leader: {}", l);

// Now the current term is greater than the term of the first couple of entries.
@@ -1538,7 +1538,7 @@ SEASTAR_TEST_CASE(snapshotting_preserves_config_test) {
.snapshot_threshold = 5,
.snapshot_trailing = 1,
});
- assert(co_await wait_for_leader<ExReg>{}(env, {id1}, timer, 1000_t) == id1);
+ assert(co_await wait_for_leader<ExReg>{}(env, {id1}, timer, timer.now() + 1000_t) == id1);

auto id2 = co_await env.new_server(false,
raft::server::configuration{
@@ -1571,7 +1571,7 @@ SEASTAR_TEST_CASE(snapshotting_preserves_config_test) {
env.get_network().remove_grudge(id1, id2);

// With the bug this would timeout, the cluster is unable to elect a leader without the configuration.
- auto l = co_await wait_for_leader<ExReg>{}(env, {id1, id2}, timer, 1000_t);
+ auto l = co_await wait_for_leader<ExReg>{}(env, {id1, id2}, timer, timer.now() + 1000_t);
tlogger.trace("last leader: {}", l);
});
}
@@ -2066,7 +2066,7 @@ SEASTAR_TEST_CASE(basic_generator_test) {
auto leader_id = co_await env.new_server(true);

// Wait for the server to elect itself as a leader.
- assert(co_await wait_for_leader<AppendReg>{}(env, {leader_id}, timer, 1000_t) == leader_id);
+ assert(co_await wait_for_leader<AppendReg>{}(env, {leader_id}, timer, timer.now() + 1000_t) == leader_id);


size_t no_servers = 5;
@@ -2180,7 +2180,7 @@ SEASTAR_TEST_CASE(basic_generator_test) {
co_await interp.run();

// All network partitions are healed, this should succeed:
- auto last_leader = co_await wait_for_leader<AppendReg>{}(env, std::vector<raft::server_id>{servers.begin(), servers.end()}, timer, 10000_t)
+ auto last_leader = co_await wait_for_leader<AppendReg>{}(env, std::vector<raft::server_id>{servers.begin(), servers.end()}, timer, timer.now() + 10000_t)
.handle_exception_type([] (logical_timer::timed_out<raft::server_id>) -> raft::server_id {
tlogger.error("Failed to find a leader after 10000 ticks at the end of test (network partitions are healed).");
assert(false);
--
2.31.1

Gleb Natapov

Sep 29, 2021, 3:34:41 AM
to Kamil Braun, scylla...@googlegroups.com, kos...@scylladb.com
On Mon, Sep 27, 2021 at 05:30:51PM +0200, Kamil Braun wrote:
> A server may be a leader but still throw `not_a_leader` when calling
> `add_entry` on it if it's currently stepping down.

The throw is there for debugging the fsm directly; otherwise it could have
been an assert. In reality a raft::server should never call fsm::add_entry()
when stepdown is true, because log_limiter_semaphore should have no
units available.
--
Gleb.

Kamil Braun

Sep 29, 2021, 6:18:18 AM
to Gleb Natapov, scylladb-dev, Konstantin Osipov
On Wed, Sep 29, 2021 at 9:34 AM Gleb Natapov <gl...@scylladb.com> wrote:
On Mon, Sep 27, 2021 at 05:30:51PM +0200, Kamil Braun wrote:
> A server may be a leader but still throw `not_a_leader` when calling
> `add_entry` on it if it's currently stepping down.

The throw is for debugging fsm directly otherwise it could have been
assert. In reality a raft::server should never call fsm::add_entry()
when stepdown is true because log_limiter_semaphore should have no
units available.
What does it mean for the client when it calls `server::add_entry` on a stepping down leader? Won't it hang indefinitely on the semaphore?

Kamil Braun

Sep 29, 2021, 6:19:55 AM
to Gleb Natapov, scylladb-dev, Konstantin Osipov
On Wed, Sep 29, 2021 at 12:18 PM Kamil Braun <kbr...@scylladb.com> wrote:


On Wed, Sep 29, 2021 at 9:34 AM Gleb Natapov <gl...@scylladb.com> wrote:
On Mon, Sep 27, 2021 at 05:30:51PM +0200, Kamil Braun wrote:
> A server may be a leader but still throw `not_a_leader` when calling
> `add_entry` on it if it's currently stepping down.

The throw is for debugging fsm directly otherwise it could have been
assert. In reality a raft::server should never call fsm::add_entry()
when stepdown is true because log_limiter_semaphore should have no
units available.
What does it mean for the client when it calls `server::add_entry` on a stepping down leader? Won't it hang indefinitely on the semaphore?
No, I just checked that the semaphore will become broken with `not_a_leader`.

So indeed, calling `server::add_entry` on a stepping down leader will cause `not_a_leader` to be thrown.
Meaning `accepts_requests` is still useful, right?

Kamil Braun

Sep 29, 2021, 6:21:02 AM
to Gleb Natapov, scylladb-dev, Konstantin Osipov
On Wed, Sep 29, 2021 at 12:19 PM Kamil Braun <kbr...@scylladb.com> wrote:


On Wed, Sep 29, 2021 at 12:18 PM Kamil Braun <kbr...@scylladb.com> wrote:


On Wed, Sep 29, 2021 at 9:34 AM Gleb Natapov <gl...@scylladb.com> wrote:
On Mon, Sep 27, 2021 at 05:30:51PM +0200, Kamil Braun wrote:
> A server may be a leader but still throw `not_a_leader` when calling
> `add_entry` on it if it's currently stepping down.

The throw is for debugging fsm directly otherwise it could have been
assert. In reality a raft::server should never call fsm::add_entry()
when stepdown is true because log_limiter_semaphore should have no
units available.
What does it mean for the client when it calls `server::add_entry` on a stepping down leader? Won't it hang indefinitely on the semaphore?
No, I just checked that the semaphore will become broken with `not_a_leader`.

So indeed, calling `server::add_entry` on a stepping down leader will cause `not_a_leader` to be thrown.
Meaning `accepts_requests` is still useful, right?
Perhaps we're talking about different throws: you're talking about the throw inside `fsm::add_entry`, I'm talking about the one coming from `server::add_entry`.

Gleb Natapov

Sep 29, 2021, 10:31:19 AM
to Kamil Braun, scylladb-dev, Konstantin Osipov
On Wed, Sep 29, 2021 at 12:20:50PM +0200, Kamil Braun wrote:
> On Wed, Sep 29, 2021 at 12:19 PM Kamil Braun <kbr...@scylladb.com> wrote:
>
> >
> >
> > On Wed, Sep 29, 2021 at 12:18 PM Kamil Braun <kbr...@scylladb.com> wrote:
> >
> >>
> >>
> >> On Wed, Sep 29, 2021 at 9:34 AM Gleb Natapov <gl...@scylladb.com> wrote:
> >>
> >>> On Mon, Sep 27, 2021 at 05:30:51PM +0200, Kamil Braun wrote:
> >>> > A server may be a leader but still throw `not_a_leader` when calling
> >>> > `add_entry` on it if it's currently stepping down.
> >>>
> >>> The throw is for debugging fsm directly otherwise it could have been
> >>> assert. In reality a raft::server should never call fsm::add_entry()
> >>> when stepdown is true because log_limiter_semaphore should have no
> >>> units available.
> >>>
> >> What does it mean for the client when it calls `server::add_entry` on a
> >> stepping down leader? Won't it hang indefinitely on the semaphore?
> >>
> > No, I just checked that the semaphore will become broken with
> > `not_a_leader`.
> >
> > So indeed, calling `server::add_entry` on a stepping down leader will
> > cause `not_a_leader` to be thrown.
> > Meaning `accepts_requests` is still useful, right?
> >
> Perhaps we're talking about different throws: you're talking about the
> throw inside `fsm::add_entry`, I'm talking about the one coming from
> `server::add_entry`.
>
I do not understand. Yes, `not_a_leader` will be thrown, but only when the
node is not a leader. If stepdown is canceled, `not_a_leader` will not
be thrown. `server::add_entry` is not guaranteed not to throw `not_a_leader`
even if called on a leader, and thus `accepts_requests()` is useless.
--
Gleb.

Kamil Braun

Sep 30, 2021, 10:46:21 AM
to scylla...@googlegroups.com, kos...@scylladb.com, gl...@scylladb.com, Kamil Braun
---
test/raft/randomized_nemesis_test.cc | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/test/raft/randomized_nemesis_test.cc b/test/raft/randomized_nemesis_test.cc
index 00f598806..eb4d6071a 100644
--- a/test/raft/randomized_nemesis_test.cc
+++ b/test/raft/randomized_nemesis_test.cc
@@ -1319,7 +1319,7 @@ std::ostream& operator<<(std::ostream& os, const ExReg::exchange& e) {
return os << format("xng{{{}}}", e.x);
}

-// Wait until either one of `nodes` in `env` becomes a leader, or duration `d` expires according to `timer` (whichever happens first).
+// Wait until either one of `nodes` in `env` becomes a leader, or time point `timeout` is reached according to `timer` (whichever happens first).
// If the leader is found, returns it. Otherwise throws a `logical_timer::timed_out` exception.
template <PureStateMachine M>
struct wait_for_leader {
@@ -1328,8 +1328,8 @@ struct wait_for_leader {
environment<M>& env,
std::vector<raft::server_id> nodes,
logical_timer& timer,
- raft::logical_clock::duration d) {
- auto l = co_await timer.with_timeout(timer.now() + d, [] (weak_ptr<environment<M>> env, std::vector<raft::server_id> nodes) -> future<raft::server_id> {
+ raft::logical_clock::time_point timeout) {
+ auto l = co_await timer.with_timeout(timeout, [] (weak_ptr<environment<M>> env, std::vector<raft::server_id> nodes) -> future<raft::server_id> {
while (true) {
if (!env) {
co_return raft::server_id{};
@@ -1373,7 +1373,7 @@ SEASTAR_TEST_CASE(basic_test) {
auto leader_id = co_await env.new_server(true);

// Wait at most 1000 ticks for the server to elect itself as a leader.
- assert(co_await wait_for_leader<ExReg>{}(env, {leader_id}, timer, 1000_t) == leader_id);
+ assert(co_await wait_for_leader<ExReg>{}(env, {leader_id}, timer, timer.now() + 1000_t) == leader_id);

auto call = [&] (ExReg::input_t input, raft::logical_clock::duration timeout) {
return env.get_server(leader_id).call(std::move(input), timer.now() + timeout, timer);
@@ -1447,7 +1447,7 @@ SEASTAR_TEST_CASE(snapshot_uses_correct_term_test) {
// It's easier to catch the problem when we send entries one by one, not in batches.
.append_request_threshold = 1,
});
- assert(co_await wait_for_leader<ExReg>{}(env, {id1}, timer, 1000_t) == id1);
+ assert(co_await wait_for_leader<ExReg>{}(env, {id1}, timer, timer.now() + 1000_t) == id1);

auto id2 = co_await env.new_server(true,
raft::server::configuration{
@@ -1478,7 +1478,7 @@ SEASTAR_TEST_CASE(snapshot_uses_correct_term_test) {
env.get_network().remove_grudge(id2, id1);
env.get_network().remove_grudge(id1, id2);

- auto l = co_await wait_for_leader<ExReg>{}(env, {id1, id2}, timer, 1000_t);
+ auto l = co_await wait_for_leader<ExReg>{}(env, {id1, id2}, timer, timer.now() + 1000_t);
tlogger.trace("last leader: {}", l);

// Now the current term is greater than the term of the first couple of entries.
@@ -1528,7 +1528,7 @@ SEASTAR_TEST_CASE(snapshotting_preserves_config_test) {
.snapshot_threshold = 5,
.snapshot_trailing = 1,
});
- assert(co_await wait_for_leader<ExReg>{}(env, {id1}, timer, 1000_t) == id1);
+ assert(co_await wait_for_leader<ExReg>{}(env, {id1}, timer, timer.now() + 1000_t) == id1);

auto id2 = co_await env.new_server(false,
raft::server::configuration{
@@ -1561,7 +1561,7 @@ SEASTAR_TEST_CASE(snapshotting_preserves_config_test) {
env.get_network().remove_grudge(id1, id2);

// With the bug this would timeout, the cluster is unable to elect a leader without the configuration.
- auto l = co_await wait_for_leader<ExReg>{}(env, {id1, id2}, timer, 1000_t);
+ auto l = co_await wait_for_leader<ExReg>{}(env, {id1, id2}, timer, timer.now() + 1000_t);
tlogger.trace("last leader: {}", l);
});
}
@@ -1989,7 +1989,7 @@ SEASTAR_TEST_CASE(basic_generator_test) {
auto leader_id = co_await env.new_server(true);

// Wait for the server to elect itself as a leader.
- assert(co_await wait_for_leader<AppendReg>{}(env, {leader_id}, timer, 1000_t) == leader_id);
+ assert(co_await wait_for_leader<AppendReg>{}(env, {leader_id}, timer, timer.now() + 1000_t) == leader_id);


size_t no_servers = 5;
@@ -2103,7 +2103,7 @@ SEASTAR_TEST_CASE(basic_generator_test) {

Konstantin Osipov

Sep 30, 2021, 1:31:29 PM
to Kamil Braun, scylla...@googlegroups.com, gl...@scylladb.com
* Kamil Braun <kbr...@scylladb.com> [21/09/30 17:50]:

What's the motivation? Revealed in the next patch?
--
Konstantin Osipov, Moscow, Russia

Kamil Braun

Oct 5, 2021, 6:20:10 AM
to scylla...@googlegroups.com, kos...@scylladb.com, gl...@scylladb.com, Kamil Braun
We introduce a new operation to the framework: `reconfiguration`.
The operation sends a reconfiguration request to a Raft cluster. It
bounces a few times in case of `not_a_leader` results.

A side effect of the operation is modifying a `known` set of nodes which
the operation's state has a reference to. This `known` set can then be
used by other operations (such as `raft_call`s) to find the current
leader.

For now we assume that reconfigurations are performed sequentially. If a
reconfiguration succeeds, we change `known` to the new configuration. If
it fails, we change `known` to the union of the previous configuration
and the attempted configuration (because we don't know what the
configuration will eventually be - the old or the attempted one - so
any member of the union may eventually become a leader).

We use a dedicated thread (similarly to the network partitioning thread)
to periodically perform random reconfigurations.

Kamil Braun (5):
test: raft: randomized_nemesis_test: `reconfiguration` operation
test: raft: randomized_nemesis_test put `variant` and `monostate`
`ostream` `operator<<`s into `std` namespace
test: raft: randomized_nemesis_test: handle more error types
test: raft: randomized_nemesis_test: improve the bouncing algorithm
test: raft: randomized_nemesis_test: perform reconfigurations in
basic_generator_test

test/raft/randomized_nemesis_test.cc | 209 ++++++++++++++++++++++-----
1 file changed, 173 insertions(+), 36 deletions(-)

---

GIT URL: https://github.com/kbr-/scylla/tree/reconfig-v2

v2:
- the improved liveness check that was introduced in the last commit of
v1 was separately merged into master. The last commit of v2 only
extends the check with some additional log messages regarding the
cluster configuration.
- remove `server::accepts_requests()`. The original intent of
introducing `accepts_requests()` was the idea that if a server is a
leader and is not stepping down, the next `add_entry` or
`set_configuration` call should succeed. That's not true in general,
however (many things can still happen concurrently). The improved
liveness check tries to obtain the last response multiple times
(there is no guarantee that the first time will succeed even though we
stopped all nemeses) and for that `is_leader()` is enough.

--
2.31.1

Kamil Braun

Oct 5, 2021, 6:20:11 AM
to scylla...@googlegroups.com, kos...@scylladb.com, gl...@scylladb.com, Kamil Braun
The operation sends a reconfiguration request to a Raft cluster. It
bounces a few times in case of `not_a_leader` results.

A side effect of the operation is modifying a `known` set of nodes which
the operation's state has a reference to. This `known` set can then be
used by other operations (such as `raft_call`s) to find the current
leader.

For now we assume that reconfigurations are performed sequentially. If a
reconfiguration succeeds, we change `known` to the new configuration. If
it fails, we change `known` to the union of the previous configuration
and the attempted configuration (because we don't know what the
configuration will eventually be - the old or the attempted one - so
any member of the union may eventually become a leader).
---
test/raft/randomized_nemesis_test.cc | 53 ++++++++++++++++++++++++++++
1 file changed, 53 insertions(+)

diff --git a/test/raft/randomized_nemesis_test.cc b/test/raft/randomized_nemesis_test.cc
index 61e455cb8..fe6b95b67 100644
--- a/test/raft/randomized_nemesis_test.cc
+++ b/test/raft/randomized_nemesis_test.cc
@@ -1752,6 +1752,59 @@ class network_majority_grudge {
}
};

+// Must be executed sequentially.
+template <PureStateMachine M>
+struct reconfiguration {
+ raft::logical_clock::duration timeout;
+
+ struct state_type {
+ const std::vector<raft::server_id> all_servers;
+ environment<M>& env;
+ // a subset of all_servers that we modify;
+ // the set of servers which may potentially be in the current configuration
+ std::unordered_set<raft::server_id>& known;
+ logical_timer& timer;
+ std::mt19937 rnd;
+ };
+
+ using result_type = reconfigure_result_t;
+
+ future<result_type> execute(state_type& s, const operation::context& ctx) {
+ assert(s.all_servers.size() > 1);
+ std::vector<raft::server_id> nodes{s.all_servers.begin(), s.all_servers.end()};
+
+ std::shuffle(nodes.begin(), nodes.end(), s.rnd);
+ nodes.resize(std::uniform_int_distribution<size_t>{1, nodes.size()}(s.rnd));
+
+ assert(s.known.size() > 0);
+ auto [res, last] = co_await bouncing{[&nodes, timeout = s.timer.now() + timeout, &timer = s.timer, &env = s.env] (raft::server_id id) {
+ return env.get_server(id).reconfigure(nodes, timeout, timer);
+ }}(s.timer, s.known, *s.known.begin(), 10, 10_t, 10_t);
+
+ std::visit(make_visitor(
+ [&, last = last] (std::monostate) {
+ tlogger.debug("reconfig successful from {} to {} by {}", s.known, nodes, last);
+ s.known = std::unordered_set<raft::server_id>{nodes.begin(), nodes.end()};
+ // TODO: include the old leader as well in case it's not part of the new config?
+ // it may remain a leader for some time...
+ },
+ [&, last = last] (raft::not_a_leader& e) {
+ tlogger.debug("reconfig failed, not a leader: {} tried {} by {}", e, nodes, last);
+ },
+ [&, last = last] (auto& e) {
+ s.known.merge(std::unordered_set<raft::server_id>{nodes.begin(), nodes.end()});
+ tlogger.debug("reconfig failed: {}, tried {} after merge {} by {}", e, nodes, s.known, last);
+ }
+ ), res);
+
+ co_return res;
+ }
+
+ friend std::ostream& operator<<(std::ostream& os, const reconfiguration& r) {
+ return os << format("reconfiguration{{timeout:{}}}", r.timeout);
+ }
+};
+
std::ostream& operator<<(std::ostream& os, const std::monostate&) {
return os << "";
}
--
2.31.1

Kamil Braun

Oct 5, 2021, 6:20:12 AM
to scylla...@googlegroups.com, kos...@scylladb.com, gl...@scylladb.com, Kamil Braun
In preparation for the following commits.
Otherwise the definitions wouldn't be found during argument-dependent lookup
(I don't understand why it worked before but won't after the next commit).
---
test/raft/randomized_nemesis_test.cc | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/test/raft/randomized_nemesis_test.cc b/test/raft/randomized_nemesis_test.cc
index fe6b95b67..78e07a79b 100644
--- a/test/raft/randomized_nemesis_test.cc
+++ b/test/raft/randomized_nemesis_test.cc
@@ -1805,6 +1805,8 @@ struct reconfiguration {
}
};

+namespace std {
+
std::ostream& operator<<(std::ostream& os, const std::monostate&) {
return os << "";
}
@@ -1815,6 +1817,8 @@ std::ostream& operator<<(std::ostream& os, const std::variant<T, Ts...>& v) {
return os;
}

+} // namespace std
+
namespace operation {

std::ostream& operator<<(std::ostream& os, const thread_id& tid) {
--
2.31.1

Kamil Braun

Oct 5, 2021, 6:20:13 AM
to scylla...@googlegroups.com, kos...@scylladb.com, gl...@scylladb.com, Kamil Braun
With reconfigurations the `commit_status_unknown` error may start
appearing.
---
test/raft/randomized_nemesis_test.cc | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/test/raft/randomized_nemesis_test.cc b/test/raft/randomized_nemesis_test.cc
index 78e07a79b..67e0fc93b 100644
--- a/test/raft/randomized_nemesis_test.cc
+++ b/test/raft/randomized_nemesis_test.cc

Kamil Braun

Oct 5, 2021, 6:20:14 AM
to scylla...@googlegroups.com, kos...@scylladb.com, gl...@scylladb.com, Kamil Braun
The bouncing algorithm tries to send a request to other servers in the
configuration after it receives a `not_a_leader` response.

Improve the algorithm so it doesn't try the same server twice.
---
test/raft/randomized_nemesis_test.cc | 36 ++++++++++++++++++----------
1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/test/raft/randomized_nemesis_test.cc b/test/raft/randomized_nemesis_test.cc
index 67e0fc93b..65a7bb28c 100644
--- a/test/raft/randomized_nemesis_test.cc
+++ b/test/raft/randomized_nemesis_test.cc
@@ -1590,10 +1590,10 @@ SEASTAR_TEST_CASE(snapshotting_preserves_config_test) {
// The maximum number of calls until we give up is specified by `bounces`.
// The initial `raft::server_id` argument provided to `F` is specified as an argument
// to this function (`srv_id`). If the initial call returns `not_a_leader`, then:
-// - if the result contained a different leader ID, we will use it in the next call,
-// sleeping for `known_leader_delay` first,
+// - if the result contained a different leader ID and we didn't already try that ID,
+// we will use it in the next call, sleeping for `known_leader_delay` first,
// - otherwise we will take the next ID from the `known` set, sleeping for
-// `unknown_leader_delay` first.
+// `unknown_leader_delay` first; no ID will be tried twice.
// The returned result contains the result of the last call to `F` and the last
// server ID passed to `F`.
template <typename F>
@@ -1616,25 +1616,35 @@ struct bouncing {
- srv_id = *it;
}
- continue;
+
+ if (!known.empty()) {
+ srv_id = *known.begin();
+ if (n_a_l->leader) {
+ tlogger.trace("bouncing call: got `not_a_leader`, rerouted to {}, but already tried it; trying {}", n_a_l->leader, srv_id);
+ } else {
+ tlogger.trace("bouncing call: got `not_a_leader`, no reroute, trying {}", srv_id);
+ }
+ continue;
+ }

Kamil Braun

Oct 5, 2021, 6:20:16 AM
to scylla...@googlegroups.com, kos...@scylladb.com, gl...@scylladb.com, Kamil Braun
We use a dedicated thread (similarly to the nemesis thread)
to periodically perform reconfigurations.
---
test/raft/randomized_nemesis_test.cc | 108 +++++++++++++++++++++------
1 file changed, 86 insertions(+), 22 deletions(-)

diff --git a/test/raft/randomized_nemesis_test.cc b/test/raft/randomized_nemesis_test.cc
index 65a7bb28c..a8538be33 100644
--- a/test/raft/randomized_nemesis_test.cc
+++ b/test/raft/randomized_nemesis_test.cc
@@ -1063,6 +1063,10 @@ class raft_server {
return _id;
}

+ raft::configuration get_configuration() const {
+ return _server->get_configuration();
+ }
+
void deliver(raft::server_id src, const typename rpc<typename M::state_t>::message_t& m) {
assert(_started);
if (!_gate.is_closed()) {
@@ -2046,10 +2050,17 @@ std::ostream& operator<<(std::ostream& os, const AppendReg::ret& r) {
return os << format("ret{{{}, {}}}", r.x, r.prev);
}

+namespace raft {
+std::ostream& operator<<(std::ostream& os, const raft::server_address& a) {
+ return os << a.id;
+}
+}
+
SEASTAR_TEST_CASE(basic_generator_test) {
using op_type = operation::invocable<operation::either_of<
raft_call<AppendReg>,
- network_majority_grudge<AppendReg>
+ network_majority_grudge<AppendReg>,
+ reconfiguration<AppendReg>
>>;
using history_t = utils::chunked_vector<std::variant<op_type, operation::completion<op_type>>>;

@@ -2076,39 +2087,77 @@ SEASTAR_TEST_CASE(basic_generator_test) {
// Wait for the server to elect itself as a leader.
assert(co_await wait_for_leader<AppendReg>{}(env, {leader_id}, timer, timer.now() + 1000_t) == leader_id);

- // TODO: make it dynamic based on the current configuration
- std::unordered_set<raft::server_id>& known = servers;
+ auto reconfig_thread = some(threads_without_nemesis);
+
@@ -2131,11 +2180,16 @@ SEASTAR_TEST_CASE(basic_generator_test) {
return op_type{network_majority_grudge<AppendReg>{raft::logical_clock::duration{dist(engine)}}};
})
),
- stagger(seed, timer.now(), 0_t, 50_t,
- sequence(1, [] (int32_t i) {
- assert(i > 0);
- return op_type{raft_call<AppendReg>{AppendReg::append{i}, 200_t}};
- })
+ pin(reconfig_thread,
+ stagger(seed, timer.now() + 1000_t, 500_t, 500_t,
+ constant([] () { return op_type{reconfiguration<AppendReg>{500_t}}; })
+ ),
+ stagger(seed, timer.now(), 0_t, 50_t,
+ sequence(1, [] (int32_t i) {
+ assert(i > 0);
+ return op_type{raft_call<AppendReg>{AppendReg::append{i}, 200_t}};
+ })
+ )
)
)
);
@@ -2178,6 +2232,10 @@ SEASTAR_TEST_CASE(basic_generator_test) {
} else {
tlogger.debug("completion {}", c);
}
+
+ // TODO: check consistency of reconfiguration completions
+ // (there's not much to check, but for example: we should not get back `conf_change_in_progress`
+ // if our last reconfiguration was successful?).
}
};

@@ -2201,7 +2259,7 @@ SEASTAR_TEST_CASE(basic_generator_test) {

auto now = timer.now();
auto leader = co_await wait_for_leader<AppendReg>{}(env,
- std::vector<raft::server_id>{servers.begin(), servers.end()}, timer, limit)
+ std::vector<raft::server_id>{all_servers.begin(), all_servers.end()}, timer, limit)
.handle_exception_type([&timer, now] (logical_timer::timed_out<raft::server_id>) -> raft::server_id {
tlogger.error("Failed to find a leader after {} ticks at the end of test.", timer.now() - now);
assert(false);
@@ -2214,16 +2272,22 @@ SEASTAR_TEST_CASE(basic_generator_test) {
continue;
}

- for (auto& s: servers) {
+ auto config = env.get_server(leader).get_configuration();
+ tlogger.debug("Leader {} configuration: current {} previous {}", leader, config.current, config.previous);
+
+ for (auto& s: all_servers) {
auto& srv = env.get_server(s);
if (srv.is_leader() && s != leader) {
- tlogger.debug("There is another leader: {}", s);
+ auto conf = srv.get_configuration();
+ tlogger.debug("There is another leader: {}, configuration: current {} previous {}", s, conf.current, conf.previous);
}
}

+ tlogger.debug("From the clients' point of view, the possible cluster members are: {}", known_config);
+
auto [res, last_attempted_server] = co_await bouncing{[&timer, &env] (raft::server_id id) {
return env.get_server(id).call(AppendReg::append{-1}, timer.now() + 200_t, timer);
- }}(timer, known, leader, known.size() + 1, 10_t, 10_t);
+ }}(timer, known_config, leader, known_config.size() + 1, 10_t, 10_t);

if (std::holds_alternative<typename AppendReg::ret>(res)) {
tlogger.debug("Last result: {}", res);
--
2.31.1

Kamil Braun

Oct 21, 2021, 5:19:48 AM
to scylladb-dev, Konstantin Osipov, Gleb Natapov
P I N G