
[PATCH] storage service: stop group0 before draining storage service during shutdown


Gleb Natapov

<gleb@scylladb.com>
Sep 5, 2024, 10:05:09 AM
to scylladb-dev@googlegroups.com
group0: stop group0 before draining storage service during shutdown

Currently the storage service is drained while group0 is still active. The
draining stops the commitlog, so after this point no more writes are
possible, but if group0 is still active it may try to apply commands that
do writes, and those writes will fail, causing group0 state machine
errors. This is benign since we are shutting down anyway, but it is better
to fix the shutdown order to keep the logs clean.

Fixes scylladb/scylladb#19665

---
CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/11365/
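[Editor's note: the ordering bug above can be illustrated with a minimal, self-contained sketch in plain C++. This is not Seastar or Scylla code; all names (`simulate`, `group0_tick`, the event strings) are hypothetical stand-ins that only model the two drain orders.]

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Model of the two drain orders. group0 applying a command needs a working
// commitlog; once the commitlog is stopped, any further group0 write fails.
std::vector<std::string> simulate(bool stop_group0_before_commitlog) {
    std::vector<std::string> events;
    bool commitlog_open = true;
    bool group0_active = true;

    // group0 state machine trying to apply one more command.
    auto group0_tick = [&] {
        if (!group0_active) return;  // stopped group0 applies nothing
        events.push_back(commitlog_open ? "group0 write ok"
                                        : "group0 write FAILED");
    };
    auto stop_group0 = [&] { group0_active = false; events.push_back("group0 stopped"); };
    auto stop_commitlog = [&] { commitlog_open = false; events.push_back("commitlog stopped"); };

    if (stop_group0_before_commitlog) {  // patched order
        stop_group0();
        group0_tick();                   // no-op: group0 already stopped
        stop_commitlog();
    } else {                             // old order
        stop_commitlog();
        group0_tick();                   // group0 still active: failed write
        stop_group0();
    }
    return events;
}

bool saw_failed_write(const std::vector<std::string>& ev) {
    return std::find(ev.begin(), ev.end(), "group0 write FAILED") != ev.end();
}
```

With the old order a failed write (the benign-but-noisy state machine error) can appear; with the patched order it cannot.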

diff --git a/main.cc b/main.cc
index 9aab704749..36ac65f07f 100644
--- a/main.cc
+++ b/main.cc
@@ -1915,11 +1915,6 @@ To start the scylla server proper, simply invoke as: scylla server (or just scyl
ss.local().uninit_address_map().get();
});

- // Need to make sure storage service does not use group0 before running group0_service.abort()
- auto stop_group0_usage_in_storage_service = defer_verbose_shutdown("group 0 usage in local storage", [&ss] {
- ss.local().wait_for_group0_stop().get();
- });
-
// Setup group0 early in case the node is bootstrapped already and the group exists.
// Need to do it before allowing incoming messaging service connections since
// storage proxy's and migration manager's verbs may access group0.
diff --git a/service/storage_service.cc b/service/storage_service.cc
index ffbc04d2e1..e676c6c5ff 100644
--- a/service/storage_service.cc
+++ b/service/storage_service.cc
@@ -1086,12 +1086,15 @@ future<> storage_service::sstable_cleanup_fiber(raft::server& server, sharded<se
break;
}
rtlogger.debug("cleanup flag cleared");
- } catch (const seastar::abort_requested_exception &) {
+ } catch (const seastar::abort_requested_exception&) {
rtlogger.info("cleanup fiber aborted");
break;
} catch (raft::request_aborted&) {
rtlogger.info("cleanup fiber aborted");
break;
+ } catch (const seastar::broken_condition_variable&) {
+ rtlogger.info("cleanup fiber aborted");
+ break;
} catch (...) {
rtlogger.error("cleanup fiber got an error: {}", std::current_exception());
err = true;
@@ -4582,6 +4585,8 @@ future<> storage_service::drain() {
future<> storage_service::do_drain() {
co_await stop_transport();

+ co_await wait_for_group0_stop();
+
co_await tracing::tracing::tracing_instance().invoke_on_all(&tracing::tracing::shutdown);

co_await get_batchlog_manager().invoke_on_all([] (auto& bm) {
--
Gleb.

Kamil Braun

<kbraun@scylladb.com>
Sep 6, 2024, 6:42:24 AM
to Gleb Natapov, scylladb-dev@googlegroups.com
This issue is causing flakiness in release branches too, I think, so we should backport it. Therefore a PR would be more suitable, I believe.

--
You received this message because you are subscribed to the Google Groups "ScyllaDB development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scylladb-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/scylladb-dev/Ztm6kBmMS1gHWAy9%40scylladb.com.

Gleb Natapov

<gleb@scylladb.com>
Sep 6, 2024, 10:44:17 AM
to Kamil Braun, scylladb-dev@googlegroups.com
On Fri, Sep 06, 2024 at 12:42:12PM +0200, Kamil Braun wrote:
> This issue is causing flakiness in release branches too I think, so we
> should backport it. Therefore a PR would be more suitable, I believe.
>

If it is, then I can do it as a PR. But let's make sure it does, because
this is not the kind of fix I would like to backport otherwise.

Benny Halevy

<bhalevy@scylladb.com>
Sep 7, 2024, 8:34:01 AM
to Gleb Natapov, scylladb-dev
nit: why don't we print the exception?

+             break;
         } catch (...) {
              rtlogger.error("cleanup fiber got an error: {}", std::current_exception());
              err = true;
@@ -4582,6 +4585,8 @@ future<> storage_service::drain() {
 future<> storage_service::do_drain() {
     co_await stop_transport();

+    co_await wait_for_group0_stop();

I presume you checked that stopping group0 after stop_transport is fine, but it seems counter-intuitive. How about adding a comment explaining the drain order logic?

+
     co_await tracing::tracing::tracing_instance().invoke_on_all(&tracing::tracing::shutdown);

     co_await get_batchlog_manager().invoke_on_all([] (auto& bm) {
--
                        Gleb.

Gleb Natapov

<gleb@scylladb.com>
Sep 8, 2024, 3:49:46 AM
to Benny Halevy, scylladb-dev
Why is it interesting? Those are expected reasons to stop the fiber.
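[Editor's note: the pattern under discussion can be sketched in plain C++, independent of Seastar. `abort_requested` and `broken_condition_variable` below are hypothetical stand-ins for the Seastar exception types in the patch; the point is only that expected shutdown exceptions end the fiber quietly at info level, while anything unexpected is logged with its details at error level.]

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Stand-ins for seastar::abort_requested_exception and
// seastar::broken_condition_variable (hypothetical, for illustration).
struct abort_requested : std::exception {};
struct broken_condition_variable : std::exception {};

// Returns the log line the cleanup loop would emit for this exception.
std::string classify(const std::exception_ptr& ep) {
    try {
        std::rethrow_exception(ep);
    } catch (const abort_requested&) {
        return "info: cleanup fiber aborted";   // expected: no details needed
    } catch (const broken_condition_variable&) {
        return "info: cleanup fiber aborted";   // expected: condvar broken on shutdown
    } catch (const std::exception& e) {
        // unexpected: this one IS interesting, so print the details
        return std::string("error: cleanup fiber got an error: ") + e.what();
    } catch (...) {
        return "error: cleanup fiber got an error: <unknown>";
    }
}
```

The expected cases all map to the same quiet info line; only the catch-all carries the exception's details.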

> > +             break;
> >          } catch (...) {
> >               rtlogger.error("cleanup fiber got an error: {}", std::current_exception());
> >               err = true;
> > @@ -4582,6 +4585,8 @@ future<> storage_service::drain() {
> >  future<> storage_service::do_drain() {
> >      co_await stop_transport();
> >
> > +    co_await wait_for_group0_stop();
> >
>
> I presume you checked that stopping group0 after stop_transport is
> fine, but it seems counter-intuitive. How about adding a comment explaining
> the drain order logic?

Before the patch we also stop group0 after stop_transport. Why is it
counter-intuitive?

>
> > +
> >      co_await tracing::tracing::tracing_instance().invoke_on_all(&tracing::tracing::shutdown);
> >
> >      co_await get_batchlog_manager().invoke_on_all([] (auto& bm) {
> > --
> > Gleb.

--
Gleb.

Benny Halevy

<bhalevy@scylladb.com>
Sep 15, 2024, 2:52:11 AM
to Gleb Natapov, scylladb-dev
If they are expected, then they aren't very interesting indeed.

> > > +             break;
> > >          } catch (...) {
> > >               rtlogger.error("cleanup fiber got an error: {}", std::current_exception());
> > >               err = true;
> > > @@ -4582,6 +4585,8 @@ future<> storage_service::drain() {
> > >  future<> storage_service::do_drain() {
> > >      co_await stop_transport();
> > >
> > > +    co_await wait_for_group0_stop();
> > >
> >
> > I presume you checked that stopping group0 after stop_transport is
> > fine, but it seems counter-intuitive. How about adding a comment explaining
> > the drain order logic?
>
> Before the patch we also stop group0 after stop_transport. Why is it
> counter-intuitive?

My intuition is that to stop group0 there could be some communication
over the network with other group0 members, e.g. to transfer leadership
to another node. A comment could help readers that aren't familiar with
the implementation details, like me.

Gleb Natapov

<gleb@scylladb.com>
Sep 15, 2024, 3:48:38 AM
to Benny Halevy, scylladb-dev
On Sun, Sep 15, 2024 at 09:52:04AM +0300, Benny Halevy wrote:
> > > > +    co_await wait_for_group0_stop();
> > > >
> > >
> > > I presume you checked that stopping group0 after stop_transport is
> > > fine, but it seems counter-intuitive. How about adding a comment explaining
> > > the drain order logic?
> >
> > Before the patch we also stop group0 after stop_transport. Why is it
> > counter-intuitive?
>
> My intuition is that to stop group0 there could be some communication
> over the network with other group0 members, e.g. to transfer leadership
> to another node. A comment could help readers that aren't familiar with
> the implementation details, like me.
>

We do not step down as leader during group0 stopping. It would be a good
idea to add that as a step in the shutdown procedure, probably as the
first thing in ss::do_drain().
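[Editor's note: a minimal sketch of the suggested ordering. The step names below are hypothetical labels, not real Scylla code; the only point encoded is that a leader stepdown needs networking, so it must run before transport is stopped.]

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical drain order with the suggested leader stepdown added first.
std::vector<std::string> drain_order_with_stepdown() {
    return {
        "step down as raft leader",  // needs messaging, so it goes first
        "stop transport",
        "wait for group0 stop",
        "stop commitlog",
    };
}

// Sanity check: any step that talks to other nodes precedes "stop transport".
bool stepdown_before_transport_stop(const std::vector<std::string>& order) {
    auto stepdown = std::find(order.begin(), order.end(), "step down as raft leader");
    auto transport = std::find(order.begin(), order.end(), "stop transport");
    return stepdown < transport;
}
```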

--
Gleb.

Konstantin Osipov

<kostja@scylladb.com>
Sep 16, 2024, 1:43:57 PM
to Gleb Natapov, Benny Halevy, scylladb-dev
* 'Gleb Natapov' via ScyllaDB development <scylla...@googlegroups.com> [24/09/15 19:01]:
> > My intuition is that to stop group0 there could be some communication
> > over the network with other group0 members, e.g. to transfer leadership
> > to another node. A comment could help readers that aren't familiar with
> > the implementation details, like me.
> >
>
> We do not stepdown as a leader during group0 stopping. It is a good idea
> to add it as a step in shutdown procedure. Probably as a first thing in
> ss:o_drain().

Could you open a ticket, please?


--
Konstantin Osipov, Moscow, Russia