[PATCH v1 1/2] group0: Stop group0 if node initialization fails

Gleb Natapov

<gleb@scylladb.com>
Sep 30, 2024, 6:26:25 AM
to scylladb-dev@googlegroups.com
Commit af83c5e53eb465 moved the group0 abort into the storage service
drain function. But drain is not called if the node fails during
initialization (for instance, if it fails to join the cluster). So let's
abort on both paths (but only once).
---
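Not part of the commit: a minimal standalone sketch of the "abort on both
paths, but only once" idea, for readers who do not have the tree at hand.
It uses plain standard C++ rather than Seastar, and `group0_user`,
`deferred` and `simulate` are hypothetical names made up for this
illustration; the boolean flag merely plays the role that
`_group0_as.abort_requested()` plays in the patch.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for the storage service; this is not Scylla code.
class group0_user {
    bool _abort_requested = false; // plays the role of _group0_as.abort_requested()
    int _teardowns = 0;            // counts how often the real teardown ran
public:
    // Safe to call from both the drain path and the startup-failure path:
    // only the first call performs the teardown.
    void wait_for_group0_stop() {
        if (_abort_requested) {
            return;
        }
        _abort_requested = true;
        ++_teardowns; // stands in for aborting and waiting for the fibers
    }
    int teardowns() const { return _teardowns; }
};

// Minimal scope guard, standing in for a deferred shutdown action.
class deferred {
    std::function<void()> _f;
public:
    explicit deferred(std::function<void()> f) : _f(std::move(f)) {}
    deferred(const deferred&) = delete;
    deferred& operator=(const deferred&) = delete;
    ~deferred() { _f(); }
};

// Returns how many times the teardown actually ran.
int simulate(bool init_fails) {
    group0_user ss;
    {
        // Registered early, so it runs even if initialization fails.
        deferred stop_usage([&ss] { ss.wait_for_group0_stop(); });
        if (!init_fails) {
            // Normal path: drain stops group0 usage explicitly.
            ss.wait_for_group0_stop();
        }
    } // the guard fires here on both paths
    return ss.teardowns();
}
```

In this sketch `simulate(true)` and `simulate(false)` both report exactly
one teardown. The guard in the sketch only protects sequential callers,
which is the situation described above; a second caller does not wait for
a teardown that is still in progress.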
main.cc | 7 +++++++
service/storage_service.cc | 8 +++++---
2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/main.cc b/main.cc
index da95572d539..33199450cc8 100644
--- a/main.cc
+++ b/main.cc
@@ -1946,6 +1946,13 @@ To start the scylla server proper, simply invoke as: scylla server (or just scyl
ss.local().uninit_address_map().get();
});

+ // Need to make sure storage service does not use group0 before running group0_service.abort().
+ // Normally it is done in storage_service::do_drain(), but in case startup fails we need to do it
+ // here as well.
+ auto stop_group0_usage_in_storage_service = defer_verbose_shutdown("group 0 usage in local storage", [&ss] {
+ ss.local().wait_for_group0_stop().get();
+ });
+
// Setup group0 early in case the node is bootstrapped already and the group exists.
// Need to do it before allowing incoming messaging service connections since
// storage proxy's and migration manager's verbs may access group0.
diff --git a/service/storage_service.cc b/service/storage_service.cc
index 7435f41084b..dd5ae6db5ad 100644
--- a/service/storage_service.cc
+++ b/service/storage_service.cc
@@ -3231,9 +3231,11 @@ future<> storage_service::stop() {
}

future<> storage_service::wait_for_group0_stop() {
- _group0_as.request_abort();
- _topology_state_machine.event.broken(make_exception_ptr(abort_requested_exception()));
- co_await when_all(std::move(_raft_state_monitor), std::move(_sstable_cleanup_fiber), std::move(_upgrade_to_topology_coordinator_fiber));
+ if (!_group0_as.abort_requested()) {
+ _group0_as.request_abort();
+ _topology_state_machine.event.broken(make_exception_ptr(abort_requested_exception()));
+ co_await when_all(std::move(_raft_state_monitor), std::move(_sstable_cleanup_fiber), std::move(_upgrade_to_topology_coordinator_fiber));
+ }
}

future<> storage_service::check_for_endpoint_collision(std::unordered_set<gms::inet_address> initial_contact_nodes, const std::unordered_map<gms::inet_address, sstring>& loaded_peer_features) {
--
2.46.0
