[QUEUED scylla next] test: raft: randomized_nemesis_test: during bouncing call, allow a leader to reroute to itself

Commit Bot

<bot@cloudius-systems.com>
May 23, 2022, 7:30:17 PM
to scylladb-dev@googlegroups.com, Kamil Braun
From: Kamil Braun <kbr...@scylladb.com>
Committer: Kamil Braun <kbr...@scylladb.com>
Branch: next

test: raft: randomized_nemesis_test: during bouncing call, allow a leader to reroute to itself

A server executing a `modify_config` call, even if it initially was a
leader and accepted the request, may end up throwing a `not_a_leader`
error, rerouting the caller to a new leader - but this new leader may be
that same server. This happens because `execute_modify_config`
translates certain errors that it considers transient (such as
`conf_change_in_progress`) into `not_a_leader{last_known_leader}`,
in an attempt to notify the caller that they should retry the request; but
when this translation happens, the `last_known_leader` may be that same
server (it could even have lost leadership and then regained it
while the request was being handled).
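
To make the translation step concrete, here is a minimal sketch of the behavior described above. It is not the actual `execute_modify_config` code: `server_id`, `internal_result`, `call_result` and `translate` are hypothetical, simplified stand-ins introduced only for illustration.

```cpp
#include <optional>
#include <variant>

// Hypothetical, simplified stand-ins for the real Raft types.
using server_id = int;

struct ok {};
struct conf_change_in_progress {};   // transient: another config change is running
struct not_a_leader {                // tells the caller where to retry
    std::optional<server_id> leader;
};

using internal_result = std::variant<ok, conf_change_in_progress, not_a_leader>;
using call_result = std::variant<ok, not_a_leader>;

// Models the translation: a transient error becomes
// not_a_leader{last_known_leader}, whoever that leader happens to be,
// including the server that is handling the request.
call_result translate(const internal_result& r,
                      std::optional<server_id> last_known_leader) {
    if (std::holds_alternative<conf_change_in_progress>(r)) {
        return not_a_leader{last_known_leader};
    }
    if (const auto* nal = std::get_if<not_a_leader>(&r)) {
        return *nal;
    }
    return ok{};
}
```

If the server handling the request is itself the last known leader at translation time, the caller receives `not_a_leader` pointing back at that server.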

This is not strictly an error, and it should be safe for the client to
retry the request by sending it to the same server. The nemesis test
assumed that a server never returns `not_a_leader{itself}`; this commit
drops the assumption.
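
The relaxed client behavior can be sketched as a synchronous model of a bouncing call. This is an illustration under simplifying assumptions, not the test's coroutine-based code: `reply`, `bounce` and the `call` callback are hypothetical names, and the fallback of picking another untried server is omitted.

```cpp
#include <functional>
#include <optional>
#include <set>

using server_id = int;   // hypothetical stand-in for the real server ID type

struct reply {
    bool ok;
    std::optional<server_id> leader;  // set when the reply is not_a_leader{leader}
};

// Sends the request via `call`, following reroutes for at most `bounces` attempts.
// Returns the server that accepted the request, or nullopt if we gave up.
std::optional<server_id> bounce(server_id srv_id, int bounces,
                                const std::function<reply(server_id)>& call) {
    std::set<server_id> tried;
    while (bounces-- > 0) {
        reply r = call(srv_id);
        if (r.ok) {
            return srv_id;
        }
        tried.insert(srv_id);
        // A reroute to the same server is treated as a plain retry;
        // any other server is visited at most once.
        if (r.leader && (*r.leader == srv_id || !tried.count(*r.leader))) {
            srv_id = *r.leader;
            continue;
        }
        return std::nullopt;  // simplified: no fallback to other servers here
    }
    return std::nullopt;
}
```

Without the `*r.leader == srv_id` clause, a reroute to the current server would be rejected because that server is already in `tried`.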

An alternative solution would be to extend the error types that are now
translated to `not_a_leader` so they include information about the last
known leader. This way the client does not lose information about the
original error and still gets a potential contact point for retry.
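
One possible shape for that alternative, as a sketch only: the `last_known_leader` field and the `next_contact` helper are hypothetical and are not part of the existing error types.

```cpp
#include <optional>

using server_id = int;   // hypothetical stand-in for the real server ID type

// Hypothetical variant of a transient error that keeps its own identity
// and additionally carries a contact point for the retry.
struct conf_change_in_progress {
    std::optional<server_id> last_known_leader;
};

// Client-side choice of the next server to contact: prefer the hint,
// otherwise retry on the server that was just tried.
server_id next_contact(const conf_change_in_progress& e, server_id current) {
    return e.last_known_leader.value_or(current);
}
```

With this shape the client can still distinguish a configuration change in progress from a real loss of leadership.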

---
diff --git a/test/raft/randomized_nemesis_test.cc b/test/raft/randomized_nemesis_test.cc
--- a/test/raft/randomized_nemesis_test.cc
+++ b/test/raft/randomized_nemesis_test.cc
@@ -2129,8 +2129,7 @@ struct bouncing {
     --bounces;
 
     if (n_a_l->leader) {
-        assert(n_a_l->leader != srv_id);
-        if (!tried.contains(n_a_l->leader)) {
+        if (n_a_l->leader == srv_id || !tried.contains(n_a_l->leader)) {
             co_await timer.sleep(known_leader_delay);
             srv_id = n_a_l->leader;
             tlogger.trace("bouncing call: got `not_a_leader`, rerouted to {}", srv_id);

Commit Bot

<bot@cloudius-systems.com>
May 24, 2022, 10:48:01 AM
to scylladb-dev@googlegroups.com, Kamil Braun
@@ -2291,8 +2291,7 @@ struct bouncing {

Commit Bot

<bot@cloudius-systems.com>
May 25, 2022, 12:07:52 AM
to scylladb-dev@googlegroups.com, Kamil Braun
From: Kamil Braun <kbr...@scylladb.com>
Committer: Kamil Braun <kbr...@scylladb.com>
Branch: master