[PATCH] api: introduce raft stepdown REST API

Gleb Natapov

<gleb@scylladb.com>
Sep 18, 2024, 6:33:43 AM
to scylladb-dev@googlegroups.com
Also provide test.pu util function to trigger it. Can be useful for
testing.

---

CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/11707/

diff --git a/test/topology/util.py b/test/topology/util.py
index afa3c7acd2..65ca70bc22 100644
--- a/test/topology/util.py
+++ b/test/topology/util.py
@@ -398,6 +398,14 @@ async def trigger_snapshot(manager, server: ServerInfo) -> None:
host = cql.cluster.metadata.get_host(server.ip_addr)
await manager.api.client.post(f"/raft/trigger_snapshot/{group0_id}", host=server.ip_addr)

+async def trigger_stepdown(manager, server: ServerInfo) -> None:
+ cql = manager.get_cql()
+ group0_id = (await cql.run_async(
+ "select value from system.scylla_local where key = 'raft_group0_id'"))[0].value
+
+ host = cql.cluster.metadata.get_host(server.ip_addr)
+ await manager.api.client.post(f"/raft/trigger_stepdown/{group0_id}", host=server.ip_addr)
+


async def get_coordinator_host_ids(manager: ManagerClient) -> list[str]:
diff --git a/api/raft.cc b/api/raft.cc
index bdf83b2358..115ffcc6c7 100644
--- a/api/raft.cc
+++ b/api/raft.cc
@@ -123,6 +123,37 @@ void set_raft(http_context&, httpd::routes& r, sharded<service::raft_group_regis
throw bad_param_exception{fmt::format("Server for group ID {} not found", gid)};
}

+ co_return json_void{};
+ });
+ r::trigger_stepdown.set(r, [&raft_gr] (std::unique_ptr<http::request> req) -> future<json_return_type> {
+ raft::group_id gid{utils::UUID{req->get_path_param("group_id")}};
+ auto timeout_dur = std::invoke([timeout_str = req->get_query_param("timeout")] {
+ if (timeout_str.empty()) {
+ return std::chrono::seconds{60};
+ }
+ auto dur = std::stoll(timeout_str);
+ if (dur <= 0) {
+ throw std::runtime_error{"Timeout must be a positive number."};
+ }
+ return std::chrono::seconds{dur};
+ });
+
+ std::atomic<bool> found_srv{false};
+ co_await raft_gr.invoke_on_all([gid, timeout_dur, &found_srv] (service::raft_group_registry& raft_gr) -> future<> {
+ auto* srv = raft_gr.find_server(gid);
+ if (!srv) {
+ co_return;
+ }
+
+ found_srv = true;
+ apilog.info("Triggering stepdown for group {}", gid);
+ co_await srv->stepdown(timeout_dur);
+ });
+
+ if (!found_srv) {
+ throw std::runtime_error{fmt::format("Server for group ID {} not found", gid)};
+ }
+
co_return json_void{};
});
}
@@ -131,6 +162,7 @@ void unset_raft(http_context&, httpd::routes& r) {
r::trigger_snapshot.unset(r);
r::get_leader_host.unset(r);
r::read_barrier.unset(r);
+ r::trigger_stepdown.unset(r);
}

}
diff --git a/api/api-doc/raft.json b/api/api-doc/raft.json
index 971475eb9a..8a8d90b981 100644
--- a/api/api-doc/raft.json
+++ b/api/api-doc/raft.json
@@ -94,6 +94,38 @@
]
}
]
+ },
+ {
+ "path":"/raft/trigger_stepdown/{group_id}",
+ "operations":[
+ {
+ "method":"POST",
+ "summary":"Triggers stepdown of a leader for given Raft group (does nothing if the node is not a leader)",
+ "type":"string",
+ "nickname":"trigger_stepdown",
+ "produces":[
+ "application/json"
+ ],
+ "parameters":[
+ {
+ "name":"group_id",
+ "description":"The ID of the group which leader should stepdown",
+ "required":true,
+ "allowMultiple":false,
+ "type":"string",
+ "paramType":"path"
+ },
+ {
+ "name":"timeout",
+ "description":"Timeout in seconds after which the endpoint returns a failure. If not provided, 60s is used.",
+ "required":false,
+ "allowMultiple":false,
+ "type":"long",
+ "paramType":"query"
+ }
+ ]
+ }
+ ]
}
]
}
--
Gleb.
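
For readers following the v1 handler, its timeout handling can be sketched in Python as below. This is only an illustrative mirror of the C++ lambda in the patch: `parse_stepdown_timeout` is a hypothetical name, and `ValueError` stands in for the `std::runtime_error` thrown by the handler.

```python
from datetime import timedelta

# v1 default used when the "timeout" query parameter is absent.
DEFAULT_TIMEOUT = timedelta(seconds=60)


def parse_stepdown_timeout(timeout_str: str) -> timedelta:
    """Hypothetical helper mirroring the v1 timeout lambda in api/raft.cc."""
    if not timeout_str:
        # An empty query parameter means "use the default".
        return DEFAULT_TIMEOUT
    seconds = int(timeout_str)  # raises ValueError on non-numeric input
    if seconds <= 0:
        raise ValueError("Timeout must be a positive number.")
    return timedelta(seconds=seconds)
```

Note that the result is still a wall-clock duration; nothing in this sketch converts it to Raft's logical clock.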

Gleb Natapov

<gleb@scylladb.com>
Sep 18, 2024, 7:08:15 AM
to Laszlo Ersek, scylladb-dev@googlegroups.com
On Wed, Sep 18, 2024 at 01:06:33PM +0200, Laszlo Ersek wrote:
> Meta-question:
>
> On 9/18/24 12:33, 'Gleb Natapov' via ScyllaDB development wrote:
> > Also provide test.pu util function to trigger it. Can be useful for
> > testing.
> >
>
> When posting a patch to the list, are we supposed to link the github issue?
>
If there is a related issue, of course. Use Fixed: or Ref:

> Thanks
> Laszlo
--
Gleb.

Laszlo Ersek

<laszlo.ersek@scylladb.com>
Sep 18, 2024, 7:43:00 AM
to Gleb Natapov, scylladb-dev@googlegroups.com
Meta-question:

On 9/18/24 12:33, 'Gleb Natapov' via ScyllaDB development wrote:
> Also provide test.pu util function to trigger it. Can be useful for
> testing.
>

When posting a patch to the list, are we supposed to link the github issue?

Thanks
Laszlo

Gleb Natapov

<gleb@scylladb.com>
Sep 19, 2024, 3:45:36 AM
to scylladb-dev@googlegroups.com, kbraun@scylladb.com
CI completed.
--
Gleb.

Kamil Braun

<kbraun@scylladb.com>
Sep 20, 2024, 5:07:42 AM
to Gleb Natapov, scylladb-dev@googlegroups.com
Please use `get_request_timeout` helper function introduced in https://github.com/scylladb/scylladb/commit/7a69d9070fe + `get_server_with_timeouts` -- following the style of other api implementations in api/raft.cc
(requires extending raft_server_with_timeouts with `stepdown`, but that's easy)

and for convenience, you could make group_id optional (if not passed, use group 0) -- like in `get_leader_host` or `read_barrier`
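
A minimal sketch of how a caller might build the request once `group_id` is optional. `stepdown_path` is a hypothetical helper, not part of test.py, and it assumes that `group_id` and `timeout` are both passed as query parameters.

```python
from typing import Optional
from urllib.parse import urlencode


def stepdown_path(group_id: Optional[str] = None,
                  timeout: Optional[int] = None) -> str:
    """Hypothetical helper: build the REST path for a stepdown request."""
    params = {}
    if group_id is not None:
        # Omitting group_id is assumed to mean "use group 0" on the server.
        params["group_id"] = group_id
    if timeout is not None:
        params["timeout"] = str(timeout)
    path = "/raft/trigger_stepdown"
    return f"{path}?{urlencode(params)}" if params else path
```

Calling `stepdown_path()` with no arguments yields the bare path, which under the assumption above targets group 0.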





--
You received this message because you are subscribed to the Google Groups "ScyllaDB development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scylladb-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/scylladb-dev/Zuqsgm8QpaILvm4w%40scylladb.com.

Gleb Natapov

<gleb@scylladb.com>
Sep 20, 2024, 5:56:22 AM
to Kamil Braun, scylladb-dev@googlegroups.com
On Fri, Sep 20, 2024 at 11:07:28AM +0200, Kamil Braun wrote:
> Please use `get_request_timeout` helper function introduced in
> https://github.com/scylladb/scylladb/commit/7a69d9070fe +
> `get_server_with_timeouts` -- following the style of other api
> implementations in api/raft.cc
> (requires extending raft_server_with_timeouts with `stepdown`, but that's
> easy)

I fail to see the benefit. raft::server::stepdown supports a timeout
directly (and does not support an abort source), so nothing that
raft_server_with_timeouts provides is useful here.

Kamil Braun

<kbraun@scylladb.com>
Sep 20, 2024, 6:01:52 AM
to Gleb Natapov, scylladb-dev@googlegroups.com
Right... but how does `std::chrono::seconds{60}` translate to `logical_clock::duration`? What does it actually mean? I don't think it means 60 seconds

Gleb Natapov

<gleb@scylladb.com>
Sep 20, 2024, 6:26:33 AM
to Kamil Braun, scylladb-dev@googlegroups.com
On Fri, Sep 20, 2024 at 12:01:39PM +0200, Kamil Braun wrote:
> Right... but how does `std::chrono::seconds{60}` translate to
> `logical_clock::duration`? What does it actually mean? I don't think it
> means 60 seconds
>
Good point. We need to recalculate to the logical clock.
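
The recalculation amounts to dividing the wall-clock timeout by the Raft tick interval. A minimal sketch, assuming a tick interval of 100 ms (an assumption for illustration only; check `service::raft_tick_interval` for the real value). `to_logical_ticks` is a hypothetical name.

```python
from datetime import timedelta

# Assumed tick interval, for illustration only.
RAFT_TICK_INTERVAL = timedelta(milliseconds=100)


def to_logical_ticks(timeout: timedelta,
                     tick: timedelta = RAFT_TICK_INTERVAL) -> int:
    """Convert a wall-clock timeout into a whole number of logical-clock ticks."""
    if tick <= timedelta(0):
        raise ValueError("tick interval must be positive")
    # Floor division of two timedeltas yields an int (partial ticks are dropped).
    return timeout // tick
```

Under the assumed interval, a 60 s timeout corresponds to 600 ticks.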

Kamil Braun

<kbraun@scylladb.com>
Sep 20, 2024, 6:32:35 AM
to Gleb Natapov, scylladb-dev@googlegroups.com
Or remove the useless logical_clock based timeout API and make it sane abort_source based like other ones

Gleb Natapov

<gleb@scylladb.com>
Sep 22, 2024, 8:06:52 AM
to Kamil Braun, scylladb-dev@googlegroups.com
On Fri, Sep 20, 2024 at 12:32:21PM +0200, Kamil Braun wrote:
> Or remove the useless logical_clock based timeout API and make it sane
> abort_source based like other ones
>
Changing raft API is out of the scope of this patch. I would not call it
useless though, but less flexible.
--
Gleb.

Kamil Braun

<kbraun@scylladb.com>
Sep 23, 2024, 4:45:50 AM
to Gleb Natapov, scylladb-dev@googlegroups.com
The amount of effort to migrate it to abort_source is probably similar to the amount of effort needed in writing the translation code between wall clock and logical clock

Gleb Natapov

<gleb@scylladb.com>
Sep 23, 2024, 4:59:07 AM
to Kamil Braun, scylladb-dev@googlegroups.com
On Mon, Sep 23, 2024 at 10:45:36AM +0200, Kamil Braun wrote:
> The amount of effort to migrate it to abort_source is probably similar to
> the amount of effort needed in writing the translation code between wall
> clock and logical clock
>
Unfortunately not. We already do the latter in run_topology_coordinator
and it is a one-liner.
--
Gleb.

Gleb Natapov

<gleb@scylladb.com>
Sep 24, 2024, 4:03:26 AM
to scylladb-dev@googlegroups.com
Also provide test.pu util function to trigger it. Can be useful for
testing.

---

CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/11821/

v1->v2:
* use get_request_timeout helper
* make group_id parameter optional

diff --git a/test/topology/util.py b/test/topology/util.py
index afa3c7acd26..5cdac13ed8a 100644
--- a/test/topology/util.py
+++ b/test/topology/util.py
@@ -398,6 +398,11 @@ async def trigger_snapshot(manager, server: ServerInfo) -> None:
host = cql.cluster.metadata.get_host(server.ip_addr)
await manager.api.client.post(f"/raft/trigger_snapshot/{group0_id}", host=server.ip_addr)

+async def trigger_stepdown(manager, server: ServerInfo) -> None:
+ cql = manager.get_cql()
+ host = cql.cluster.metadata.get_host(server.ip_addr)
+ await manager.api.client.post("/raft/trigger_stepdown", host=server.ip_addr)
+


async def get_coordinator_host_ids(manager: ManagerClient) -> list[str]:
diff --git a/api/raft.cc b/api/raft.cc
index bdf83b23588..4de530125ad 100644
--- a/api/raft.cc
+++ b/api/raft.cc
@@ -11,6 +11,7 @@
#include "api/api-doc/raft.json.hh"

#include "service/raft/raft_group_registry.hh"
+#include "service/raft/raft_address_map.hh"
#include "log.hh"

using namespace seastar::httpd;
@@ -123,6 +124,39 @@ void set_raft(http_context&, httpd::routes& r, sharded<service::raft_group_regis
throw bad_param_exception{fmt::format("Server for group ID {} not found", gid)};
}

+ co_return json_void{};
+ });
+ r::trigger_stepdown.set(r, [&raft_gr] (std::unique_ptr<http::request> req) -> future<json_return_type> {
+ auto timeout = get_request_timeout(*req);
+ auto dur = timeout.value ? *timeout.value - lowres_clock::now() : std::chrono::seconds(60);
+ const auto stepdown_timeout_ticks = dur / service::raft_tick_interval;
+ auto timeout_dur = raft::logical_clock::duration(stepdown_timeout_ticks);
+
+ if (!req->query_parameters.contains("group_id")) {
+ // Read barrier on group 0 by default
+ co_await raft_gr.invoke_on(0, [timeout_dur] (service::raft_group_registry& raft_gr) {
+ return raft_gr.group0().stepdown(timeout_dur);
+ });
+ co_return json_void{};
+ }
+ raft::group_id gid{utils::UUID{req->get_path_param("group_id")}};
+
+ std::atomic<bool> found_srv{false};
+ co_await raft_gr.invoke_on_all([gid, timeout_dur, &found_srv] (service::raft_group_registry& raft_gr) -> future<> {
+ auto* srv = raft_gr.find_server(gid);
+ if (!srv) {
+ co_return;
+ }
+
+ found_srv = true;
+ apilog.info("Triggering stepdown for group {}", gid);
+ co_await srv->stepdown(timeout_dur);
+ });
+
+ if (!found_srv) {
+ throw std::runtime_error{fmt::format("Server for group ID {} not found", gid)};
+ }
+
co_return json_void{};
});
}
@@ -131,6 +165,7 @@ void unset_raft(http_context&, httpd::routes& r) {
r::trigger_snapshot.unset(r);
r::get_leader_host.unset(r);
r::read_barrier.unset(r);
+ r::trigger_stepdown.unset(r);
}

}
diff --git a/api/api-doc/raft.json b/api/api-doc/raft.json
index 971475eb9a4..a5bb582c08f 100644
--- a/api/api-doc/raft.json
+++ b/api/api-doc/raft.json
@@ -94,6 +94,38 @@
]
}
]
+ },
+ {
+ "path":"/raft/trigger_stepdown/",
+ "operations":[
+ {
+ "method":"POST",
+ "summary":"Triggers stepdown of a leader for given Raft group (does nothing if the node is not a leader)",
+ "type":"string",
+ "nickname":"trigger_stepdown",
+ "produces":[
+ "application/json"
+ ],
+ "parameters":[
+ {
+ "name":"group_id",
+ "description":"The ID of the group which leader should stepdown",
+ "required":false,
+ "allowMultiple":false,
+ "type":"string",

Kamil Braun

<kbraun@scylladb.com>
Sep 24, 2024, 5:11:25 AM
to Gleb Natapov, scylladb-dev@googlegroups.com
On Tue, Sep 24, 2024 at 10:03 AM 'Gleb Natapov' via ScyllaDB development <scylla...@googlegroups.com> wrote:
> Also provide test.pu util function to trigger it. Can be useful for

test.py

Read barrier?

> +            co_await raft_gr.invoke_on(0, [timeout_dur] (service::raft_group_registry& raft_gr) {
> +                return raft_gr.group0().stepdown(timeout_dur);
> +            });
> +            co_return json_void{};
> +        }
> +        raft::group_id gid{utils::UUID{req->get_path_param("group_id")}};
> +
> +        std::atomic<bool> found_srv{false};
> +        co_await raft_gr.invoke_on_all([gid, timeout_dur, &found_srv] (service::raft_group_registry& raft_gr) -> future<> {
> +            auto* srv = raft_gr.find_server(gid);
> +            if (!srv) {
> +                co_return;
> +            }
> +
> +            found_srv = true;
> +            apilog.info("Triggering stepdown for group {}", gid);

Maybe add similar log for group 0 case

If not provided, group 0 is used

> +                     "required":false,
> +                     "allowMultiple":false,
> +                     "type":"string",
> +                     "paramType":"query"
> +                  },
> +                  {
> +                     "name":"timeout",
> +                     "description":"Timeout in seconds after which the endpoint returns a failure. If not provided, 60s is used.",
> +                     "required":false,
> +                     "allowMultiple":false,
> +                     "type":"long",
> +                     "paramType":"query"
> +                  }
> +               ]
> +            }
> +         ]
>        }
>     ]
>  }
>
> --
>                         Gleb.


Gleb Natapov

<gleb@scylladb.com>
Sep 25, 2024, 7:04:29 AM
to scylladb-dev@googlegroups.com
Also provide test.pu util function to trigger it. Can be useful for
testing.

---

CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/11893/

v2->v3:
- fix a typo
- fix a comment
- fix a help string

v1->v2:
* use get_request_timeout helper
* make group_id parameter optional



diff --git a/test/topology/util.py b/test/topology/util.py
index afa3c7acd26..5cdac13ed8a 100644
--- a/test/topology/util.py
+++ b/test/topology/util.py
@@ -398,6 +398,11 @@ async def trigger_snapshot(manager, server: ServerInfo) -> None:
host = cql.cluster.metadata.get_host(server.ip_addr)
await manager.api.client.post(f"/raft/trigger_snapshot/{group0_id}", host=server.ip_addr)

+async def trigger_stepdown(manager, server: ServerInfo) -> None:
+ cql = manager.get_cql()
+ host = cql.cluster.metadata.get_host(server.ip_addr)
+ await manager.api.client.post("/raft/trigger_stepdown", host=server.ip_addr)
+


async def get_coordinator_host_ids(manager: ManagerClient) -> list[str]:
diff --git a/api/raft.cc b/api/raft.cc
index bdf83b23588..f117abd8a18 100644
--- a/api/raft.cc
+++ b/api/raft.cc
@@ -11,6 +11,7 @@
#include "api/api-doc/raft.json.hh"

#include "service/raft/raft_group_registry.hh"
+#include "service/raft/raft_address_map.hh"
#include "log.hh"

using namespace seastar::httpd;
@@ -123,6 +124,40 @@ void set_raft(http_context&, httpd::routes& r, sharded<service::raft_group_regis
throw bad_param_exception{fmt::format("Server for group ID {} not found", gid)};
}

+ co_return json_void{};
+ });
+ r::trigger_stepdown.set(r, [&raft_gr] (std::unique_ptr<http::request> req) -> future<json_return_type> {
+ auto timeout = get_request_timeout(*req);
+ auto dur = timeout.value ? *timeout.value - lowres_clock::now() : std::chrono::seconds(60);
+ const auto stepdown_timeout_ticks = dur / service::raft_tick_interval;
+ auto timeout_dur = raft::logical_clock::duration(stepdown_timeout_ticks);
+
+ if (!req->query_parameters.contains("group_id")) {
+ // Stepfown on group 0 by default
+ co_await raft_gr.invoke_on(0, [timeout_dur] (service::raft_group_registry& raft_gr) {
+ apilog.info("Triggering stepdown for group0");
+ return raft_gr.group0().stepdown(timeout_dur);
+ });
+ co_return json_void{};
+ }
+ raft::group_id gid{utils::UUID{req->get_path_param("group_id")}};
+
+ std::atomic<bool> found_srv{false};
+ co_await raft_gr.invoke_on_all([gid, timeout_dur, &found_srv] (service::raft_group_registry& raft_gr) -> future<> {
+ auto* srv = raft_gr.find_server(gid);
+ if (!srv) {
+ co_return;
+ }
+
+ found_srv = true;
+ apilog.info("Triggering stepdown for group {}", gid);
+ co_await srv->stepdown(timeout_dur);
+ });
+
+ if (!found_srv) {
+ throw std::runtime_error{fmt::format("Server for group ID {} not found", gid)};
+ }
+
co_return json_void{};
});
}
@@ -131,6 +166,7 @@ void unset_raft(http_context&, httpd::routes& r) {
r::trigger_snapshot.unset(r);
r::get_leader_host.unset(r);
r::read_barrier.unset(r);
+ r::trigger_stepdown.unset(r);
}

}
diff --git a/api/api-doc/raft.json b/api/api-doc/raft.json
index 971475eb9a4..77e8698b495 100644
--- a/api/api-doc/raft.json
+++ b/api/api-doc/raft.json
@@ -94,6 +94,38 @@
]
}
]
+ },
+ {
+ "path":"/raft/trigger_stepdown/",
+ "operations":[
+ {
+ "method":"POST",
+ "summary":"Triggers stepdown of a leader for given Raft group or group0 if not provided (returns an error if the node is not a leader)",
+ "type":"string",
+ "nickname":"trigger_stepdown",
+ "produces":[
+ "application/json"
+ ],
+ "parameters":[
+ {
+ "name":"group_id",
+ "description":"The ID of the group which leader should stepdown",

Kamil Braun

<kbraun@scylladb.com>
Sep 27, 2024, 7:13:36 AM
to Gleb Natapov, scylladb-dev@googlegroups.com
On Wed, Sep 25, 2024 at 1:04 PM 'Gleb Natapov' via ScyllaDB development <scylla...@googlegroups.com> wrote:
> Also provide test.pu util function to trigger it. Can be useful for

test.py

Stepdown

Gleb Natapov

<gleb@scylladb.com>
Sep 29, 2024, 4:21:43 AM
to scylladb-dev@googlegroups.com
Also provide test.py until function to trigger it. Can be useful for
testing.

---

(Same CI as v3 since new version has only a comment change)

v3->v4:
- fix new typos

v2->v3:
- fix a typo
- fix a comment
- fix a help string

v1->v2:
* use get_request_timeout helper
* make group_id parameter optional


diff --git a/test/topology/util.py b/test/topology/util.py
index afa3c7acd26..5cdac13ed8a 100644
--- a/test/topology/util.py
+++ b/test/topology/util.py
@@ -398,6 +398,11 @@ async def trigger_snapshot(manager, server: ServerInfo) -> None:
host = cql.cluster.metadata.get_host(server.ip_addr)
await manager.api.client.post(f"/raft/trigger_snapshot/{group0_id}", host=server.ip_addr)

+async def trigger_stepdown(manager, server: ServerInfo) -> None:
+ cql = manager.get_cql()
+ host = cql.cluster.metadata.get_host(server.ip_addr)
+ await manager.api.client.post("/raft/trigger_stepdown", host=server.ip_addr)
+


async def get_coordinator_host_ids(manager: ManagerClient) -> list[str]:
diff --git a/api/raft.cc b/api/raft.cc
index bdf83b23588..1dcd1254fca 100644
--- a/api/raft.cc
+++ b/api/raft.cc
@@ -11,6 +11,7 @@
#include "api/api-doc/raft.json.hh"

#include "service/raft/raft_group_registry.hh"
+#include "service/raft/raft_address_map.hh"
#include "log.hh"

using namespace seastar::httpd;
@@ -123,6 +124,40 @@ void set_raft(http_context&, httpd::routes& r, sharded<service::raft_group_regis
throw bad_param_exception{fmt::format("Server for group ID {} not found", gid)};
}

+ co_return json_void{};
+ });
+ r::trigger_stepdown.set(r, [&raft_gr] (std::unique_ptr<http::request> req) -> future<json_return_type> {
+ auto timeout = get_request_timeout(*req);
+ auto dur = timeout.value ? *timeout.value - lowres_clock::now() : std::chrono::seconds(60);
+ const auto stepdown_timeout_ticks = dur / service::raft_tick_interval;
+ auto timeout_dur = raft::logical_clock::duration(stepdown_timeout_ticks);
+
+ if (!req->query_parameters.contains("group_id")) {
+ // Stepdown on group 0 by default

Kamil Braun

<kbraun@scylladb.com>
Sep 30, 2024, 10:27:36 AM
to Gleb Natapov, scylladb-dev@googlegroups.com
On Sun, Sep 29, 2024 at 10:21 AM 'Gleb Natapov' via ScyllaDB development <scylla...@googlegroups.com> wrote:
> Also provide test.py until function to trigger it. Can be useful for

until -> util

Gleb Natapov

<gleb@scylladb.com>
Sep 30, 2024, 11:12:14 AM
to Kamil Braun, scylladb-dev@googlegroups.com
On Mon, Sep 30, 2024 at 04:27:23PM +0200, Kamil Braun wrote:
> On Sun, Sep 29, 2024 at 10:21 AM 'Gleb Natapov' via ScyllaDB development <
> scylla...@googlegroups.com> wrote:
>
> > Also provide test.py until function to trigger it. Can be useful for
> >
> until -> util
>
Can you just fix it while applying?