[PATCH 1/7] test: raft: randomized_nemesis_test: `reconfiguration` operation

Kamil Braun <kbraun@scylladb.com>
Sep 27, 2021, 11:31:06 AM
to scylladb-dev@googlegroups.com, kostja@scylladb.com, gleb@scylladb.com, Kamil Braun
The operation sends a reconfiguration request to a Raft cluster. It
bounces a few times in case of `not_a_leader` results.

A side effect of the operation is modifying a `known` set of nodes which
the operation's state has a reference to. This `known` set can then be
used by other operations (such as `raft_call`s) to find the current
leader.
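
For illustration, the bouncing idea in isolation (a standalone sketch, not the test's actual `bouncing` helper; `server_id`, `not_a_leader::leader_hint`, and `max_tries` here are made-up stand-ins for the example):

    #include <cstdint>
    #include <optional>
    #include <unordered_set>
    #include <variant>

    // Stand-ins for the test's types (hypothetical, not the real raft:: API).
    using server_id = uint64_t;
    struct not_a_leader { std::optional<server_id> leader_hint; };
    using call_result = std::variant<std::monostate, not_a_leader>;

    // Send the request to `contact`; on a not_a_leader result, retry against
    // the hinted leader if one is given, otherwise against some `known` server.
    template <typename Call>
    call_result bounce(Call call, const std::unordered_set<server_id>& known,
                       server_id contact, unsigned max_tries) {
        call_result res;
        for (unsigned i = 0; i < max_tries; ++i) {
            res = call(contact);
            auto* nal = std::get_if<not_a_leader>(&res);
            if (!nal) {
                return res; // success (or a failure we don't bounce on)
            }
            contact = nal->leader_hint ? *nal->leader_hint : *known.begin();
        }
        return res; // still bouncing after max_tries; return the last result
    }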

For now we assume that reconfigurations are performed sequentially. If a
reconfiguration succeeds, we replace `known` with the new configuration.
If it fails, we set `known` to the union of the previous configuration
and the attempted configuration: we don't know which configuration will
eventually take effect - the old one or the attempted one - so any member
of that union may eventually become a leader.
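
A minimal sketch of that bookkeeping (standalone; `server_id` and
`update_known` are hypothetical stand-ins - the real operation updates
`s.known` inside the visitor in the diff below):

    #include <cstdint>
    #include <unordered_set>

    using server_id = uint64_t; // stand-in for raft::server_id

    // On success the attempted configuration replaces `known`; on failure we
    // only widen `known`, since either the old or the attempted configuration
    // may end up electing the next leader.
    void update_known(std::unordered_set<server_id>& known,
                      std::unordered_set<server_id> attempted, bool success) {
        if (success) {
            known = std::move(attempted);
        } else {
            known.merge(attempted); // set union; duplicates stay in `attempted`
        }
    }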
---
test/raft/randomized_nemesis_test.cc | 53 ++++++++++++++++++++++++++++
1 file changed, 53 insertions(+)

diff --git a/test/raft/randomized_nemesis_test.cc b/test/raft/randomized_nemesis_test.cc
index 00f598806..4324e8f2d 100644
--- a/test/raft/randomized_nemesis_test.cc
+++ b/test/raft/randomized_nemesis_test.cc
@@ -1740,6 +1740,59 @@ class network_majority_grudge {
     }
 };
 
+// Must be executed sequentially.
+template <PureStateMachine M>
+struct reconfiguration {
+    raft::logical_clock::duration timeout;
+
+    struct state_type {
+        const std::vector<raft::server_id> all_servers;
+        environment<M>& env;
+        // a subset of all_servers that we modify;
+        // the set of servers which may potentially be in the current configuration
+        std::unordered_set<raft::server_id>& known;
+        logical_timer& timer;
+        std::mt19937 rnd;
+    };
+
+    using result_type = reconfigure_result_t;
+
+    future<result_type> execute(state_type& s, const operation::context& ctx) {
+        assert(s.all_servers.size() > 1);
+        std::vector<raft::server_id> nodes{s.all_servers.begin(), s.all_servers.end()};
+
+        std::shuffle(nodes.begin(), nodes.end(), s.rnd);
+        nodes.resize(std::uniform_int_distribution<size_t>{1, nodes.size()}(s.rnd));
+
+        assert(s.known.size() > 0);
+        auto [res, last] = co_await bouncing{[&nodes, timeout = s.timer.now() + timeout, &timer = s.timer, &env = s.env] (raft::server_id id) {
+            return env.get_server(id).reconfigure(nodes, timeout, timer);
+        }}(s.timer, s.known, *s.known.begin(), 10, 10_t, 10_t);
+
+        std::visit(make_visitor(
+        [&, last = last] (std::monostate) {
+            tlogger.debug("reconfig successful from {} to {} by {}", s.known, nodes, last);
+            s.known = std::unordered_set<raft::server_id>{nodes.begin(), nodes.end()};
+            // TODO: include the old leader as well in case it's not part of the new config?
+            // it may remain a leader for some time...
+        },
+        [&, last = last] (raft::not_a_leader& e) {
+            tlogger.debug("reconfig failed, not a leader: {} tried {} by {}", e, nodes, last);
+        },
+        [&, last = last] (auto& e) {
+            s.known.merge(std::unordered_set<raft::server_id>{nodes.begin(), nodes.end()});
+            tlogger.debug("reconfig failed: {}, tried {} after merge {} by {}", e, nodes, s.known, last);
+        }
+        ), res);
+
+        co_return res;
+    }
+
+    friend std::ostream& operator<<(std::ostream& os, const reconfiguration& r) {
+        return os << format("reconfiguration{{timeout:{}}}", r.timeout);
+    }
+};
+
 std::ostream& operator<<(std::ostream& os, const std::monostate&) {
     return os << "";
 }
--
2.31.1
