We use a dedicated thread (similar to the nemesis thread)
to periodically perform reconfigurations.

This commit also improves the liveness check at the end of the test. The
previous check would find a leader once and assume that it does not
change, and that the first attempt at sending a request to this leader
succeeds. In reality the leader may change at the end of the test (e.g.
it may be in the middle of stepping down when we find it), and in general
it may take some time for the cluster to stabilize; introducing
reconfigurations exposed this problem. The new check repeatedly tries
to find a leader and perform a request, until a time limit is reached.
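
In outline, the new check boils down to a retry loop along the lines of
the sketch below (a simplified, standalone illustration only; `find_leader`
and `send_append` are hypothetical stand-ins for `wait_for_leader` and
`raft_server::call`, and the real test drives time with the logical timer
and coroutines rather than a wall clock):

    #include <chrono>
    #include <optional>

    struct ret { int x; };

    // Hypothetical stand-ins for wait_for_leader and raft_server::call.
    std::optional<int> find_leader() { return 0; }
    std::optional<ret> send_append(int) { return ret{-1}; }

    // Retry finding a leader and sending it a request until the deadline passes.
    std::optional<ret> obtain_last_result(std::chrono::steady_clock::time_point deadline) {
        while (std::chrono::steady_clock::now() < deadline) {
            auto leader = find_leader();
            if (!leader) {
                continue; // no stable leader yet; try again
            }
            if (auto r = send_append(*leader)) {
                return r; // the first successful response ends the liveness check
            }
            // The leader may have stepped down in the meantime; retry.
        }
        return std::nullopt; // no successful response within the time limit
    }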
---
test/raft/randomized_nemesis_test.cc | 162 +++++++++++++++++++++------
1 file changed, 130 insertions(+), 32 deletions(-)
diff --git a/test/raft/randomized_nemesis_test.cc b/test/raft/randomized_nemesis_test.cc
index a89a2de56..477c82c6b 100644
--- a/test/raft/randomized_nemesis_test.cc
+++ b/test/raft/randomized_nemesis_test.cc
@@ -1067,6 +1067,10 @@ class raft_server {
return _id;
}
+ raft::configuration get_configuration() const {
+ return _server->get_configuration();
+ }
+
void deliver(raft::server_id src, const typename rpc<typename M::state_t>::message_t& m) {
assert(_started);
_queue->push(src, m);
@@ -1355,7 +1359,9 @@ struct wait_for_leader {
}(env.weak_from_this(), std::move(nodes)));
assert(l != raft::server_id{});
- assert(env.get_server(l).accepts_requests());
+
+ // Note: `l` may no longer accept requests at this point if there was a yield at the `co_await` above
+ // and `l` decided to step down, was restarted, or just got removed from the configuration.
co_return l;
}
@@ -2038,10 +2044,17 @@ std::ostream& operator<<(std::ostream& os, const AppendReg::ret& r) {
return os << format("ret{{{}, {}}}", r.x, r.prev);
}
+namespace raft {
+std::ostream& operator<<(std::ostream& os, const raft::server_address& a) {
+ return os << a.id;
+}
+}
+
SEASTAR_TEST_CASE(basic_generator_test) {
using op_type = operation::invocable<operation::either_of<
raft_call<AppendReg>,
- network_majority_grudge<AppendReg>
+ network_majority_grudge<AppendReg>,
+ reconfiguration<AppendReg>
>>;
using history_t = utils::chunked_vector<std::variant<op_type, operation::completion<op_type>>>;
@@ -2068,39 +2081,77 @@ SEASTAR_TEST_CASE(basic_generator_test) {
// Wait for the server to elect itself as a leader.
assert(co_await wait_for_leader<AppendReg>{}(env, {leader_id}, timer, timer.now() + 1000_t) == leader_id);
+ size_t no_all_servers = 10;
+ std::vector<raft::server_id> all_servers{leader_id};
+ for (size_t i = 1; i < no_all_servers; ++i) {
+ all_servers.push_back(co_await env.new_server(false));
+ }
- size_t no_servers = 5;
- std::unordered_set<raft::server_id> servers{leader_id};
- for (size_t i = 1; i < no_servers; ++i) {
- servers.insert(co_await env.new_server(false));
+ size_t no_init_servers = 5;
+
+ // `known_config` represents the set of servers that may potentially be in the cluster configuration.
+ //
+ // It is not possible to determine in general what the 'true' current configuration is (if such a notion even
+ // makes sense at all). Given a sequence of reconfiguration requests, assuming that all requests except possibly
+ // the last one have finished, then:
+ // - if the last request has finished successfully, then the current configuration must be equal
+ // to the one chosen in the last request;
+ // - but if it hasn't finished yet, or it finished with a failure, the current configuration may contain servers
+ // from the one chosen in the last request or from the previously known set of servers.
+ //
+ // The situation is even worse considering that requests may never 'finish', i.e. we may never get a response
+ // to a reconfiguration request (in which case we eventually timeout). These requests may in theory execute
+ // to a reconfiguration request (in which case we eventually time out). These requests may in theory execute
+ // that our timeouts for reconfiguration requests are large enough so that if a reconfiguration request
+ // has timed out, it has either already finished or it never will.
+ // TODO: this may not be true and we may end up with `known_config` that does not contain the current leader
+ // (not observed in practice yet though... I think). Come up with a better approach.
+ std::unordered_set<raft::server_id> known_config;
+
+ for (size_t i = 0; i < no_init_servers; ++i) {
+ known_config.insert(all_servers[i]);
}
assert(std::holds_alternative<std::monostate>(
co_await env.get_server(leader_id).reconfigure(
- std::vector<raft::server_id>{servers.begin(), servers.end()}, timer.now() + 100_t, timer)));
+ std::vector<raft::server_id>{known_config.begin(), known_config.end()}, timer.now() + 100_t, timer)));
- auto threads = operation::make_thread_set(servers.size() + 1);
+ auto threads = operation::make_thread_set(all_servers.size() + 2);
auto nemesis_thread = some(threads);
- auto seed = tests::random::get_int<int32_t>();
+ auto threads_without_nemesis = threads;
+ threads_without_nemesis.erase(nemesis_thread);
+
+ auto reconfig_thread = some(threads_without_nemesis);
- // TODO: make it dynamic based on the current configuration
- std::unordered_set<raft::server_id>& known = servers;
+ auto seed = tests::random::get_int<int32_t>();
raft_call<AppendReg>::state_type db_call_state {
.env = env,
- .known = known,
+ .known = known_config,
.timer = timer
};
network_majority_grudge<AppendReg>::state_type network_majority_grudge_state {
.env = env,
- .known = known,
+ .known = known_config,
.timer = timer,
.rnd = std::mt19937{seed}
};
- auto init_state = op_type::state_type{std::move(db_call_state), std::move(network_majority_grudge_state)};
+ reconfiguration<AppendReg>::state_type reconfiguration_state {
+ .all_servers = all_servers,
+ .env = env,
+ .known = known_config,
+ .timer = timer,
+ .rnd = std::mt19937{seed}
+ };
+
+ auto init_state = op_type::state_type{
+ std::move(db_call_state),
+ std::move(network_majority_grudge_state),
+ std::move(reconfiguration_state)
+ };
using namespace generator;
@@ -2123,11 +2174,16 @@ SEASTAR_TEST_CASE(basic_generator_test) {
return op_type{network_majority_grudge<AppendReg>{raft::logical_clock::duration{dist(engine)}}};
})
),
- stagger(seed, timer.now(), 0_t, 50_t,
- sequence(1, [] (int32_t i) {
- assert(i > 0);
- return op_type{raft_call<AppendReg>{AppendReg::append{i}, 200_t}};
- })
+ pin(reconfig_thread,
+ stagger(seed, timer.now() + 1000_t, 500_t, 500_t,
+ constant([] () { return op_type{reconfiguration<AppendReg>{500_t}}; })
+ ),
+ stagger(seed, timer.now(), 0_t, 50_t,
+ sequence(1, [] (int32_t i) {
+ assert(i > 0);
+ return op_type{raft_call<AppendReg>{AppendReg::append{i}, 200_t}};
+ })
+ )
)
)
);
@@ -2170,6 +2226,10 @@ SEASTAR_TEST_CASE(basic_generator_test) {
} else {
tlogger.debug("completion {}", c);
}
+
+ // TODO: check consistency of reconfiguration completions
+ // (there's not much to check, but for example: we should not get back `conf_change_in_progress`
+ // if our last reconfiguration was successful?).
}
};
@@ -2179,20 +2239,58 @@ SEASTAR_TEST_CASE(basic_generator_test) {
consistency_checker{}};
co_await interp.run();
- // All network partitions are healed, this should succeed:
- auto last_leader = co_await wait_for_leader<AppendReg>{}(env, std::vector<raft::server_id>{servers.begin(), servers.end()}, timer, timer.now() + 10000_t)
- .handle_exception_type([] (logical_timer::timed_out<raft::server_id>) -> raft::server_id {
- tlogger.error("Failed to find a leader after 10000 ticks at the end of test (network partitions are healed).");
- assert(false);
- });
+ tlogger.debug("Finished generator run, time: {}", timer.now());
+
+ // Liveness check: we must be able to obtain a final response after all the nemeses have stopped.
+ // Due to possible multiple leaders at this point and the cluster stabilizing after reconfigurations,
+ // we may need to try sending requests multiple times to different servers to obtain the last result.
+
+ auto limit = timer.now() + 10000_t;
+ size_t cnt = 0;
+ for (; timer.now() < limit; ++cnt) {
+ tlogger.debug("Trying to obtain last result: attempt number {}", cnt + 1);
+
+ auto now = timer.now();
+ auto leader = co_await wait_for_leader<AppendReg>{}(env,
+ std::vector<raft::server_id>{all_servers.begin(), all_servers.end()}, timer, limit)
+ .handle_exception_type([&timer, now] (logical_timer::timed_out<raft::server_id>) -> raft::server_id {
+ tlogger.error("Failed to find a leader after {} ticks at the end of test.", timer.now() - now);
+ assert(false);
+ });
+
+ if (env.get_server(leader).accepts_requests()) {
+ tlogger.debug("Leader {} found after {} ticks", leader, timer.now() - now);
+ } else {
+ tlogger.warn("Leader {} found after {} ticks, but suddenly lost leadership", leader, timer.now() - now);
+ continue;
+ }
+
+ auto config = env.get_server(leader).get_configuration();
+ tlogger.debug("Leader {} configuration: current {} previous {}", leader, config.current, config.previous);
- // Should also succeed
- auto last_res = co_await env.get_server(last_leader).call(AppendReg::append{-1}, timer.now() + 10000_t, timer);
- if (!std::holds_alternative<typename AppendReg::ret>(last_res)) {
- tlogger.error(
- "Expected success on the last call in the test (after electing a leader; network partitions are healed)."
- " Got: {}", last_res);
- assert(false);
+ for (auto& s: all_servers) {
+ auto& srv = env.get_server(s);
+ if (srv.is_leader() && s != leader) {
+ auto conf = srv.get_configuration();
+ tlogger.debug("There is another leader: {}, configuration: current {} previous {}", s, conf.current, conf.previous);
+ }
+ }
+
+ tlogger.debug("From the clients' point of view, the possible cluster members are: {}", known_config);
+
+ auto [res, last_attempted_server] = co_await bouncing{[&timer, &env] (raft::server_id id) {
+ return env.get_server(id).call(AppendReg::append{-1}, timer.now() + 200_t, timer);
+ }}(timer, known_config, leader, known_config.size() + 1, 10_t, 10_t);
+
+ if (std::holds_alternative<typename AppendReg::ret>(res)) {
+ tlogger.debug("Last result: {}", res);
+ co_return;
+ }
+
+ tlogger.warn("Failed to obtain last result at end of test: {} returned by {}", res, last_attempted_server);
}
+
+ tlogger.error("Failed to obtain a final successful response at the end of the test. Number of attempts: {}", cnt);
+ assert(false);
});
}
--
2.31.1