[QUEUED scylla next] sstable: partition_index_cache: Fix abort on bad_alloc during page loading

3 views
Skip to first unread message

Commit Bot

<bot@cloudius-systems.com>
unread,
May 25, 2022, 2:27:45 AM5/25/22
to scylladb-dev@googlegroups.com, Tomasz Grabiec
From: Tomasz Grabiec <tgra...@scylladb.com>
Committer: Avi Kivity <a...@scylladb.com>
Branch: next

sstable: partition_index_cache: Fix abort on bad_alloc during page loading

When entry loading fails and there is another request blocked on the
same page, attempt to erase the failed entry will abort because that
would violate entry_ptr guarantees, which is supposed to keep the
entry alive.

The fix in 92727ac36c311e09adf8bcf0946637e35c7d0994 was incomplete. It
only helped for the case of a single loader. This patch makes a more
general approach by relaxing the assert.

The assert manifested like this:

scylla: ./sstables/partition_index_cache.hh:71: sstables::partition_index_cache::entry::~entry(): Assertion `!is_referenced()' failed.

Fixes #10617

Closes #10653

---
diff --git a/sstables/partition_index_cache.hh b/sstables/partition_index_cache.hh
--- a/sstables/partition_index_cache.hh
+++ b/sstables/partition_index_cache.hh
@@ -55,7 +55,12 @@ private:
entry(entry&&) noexcept = default;

~entry() {
- assert(!is_referenced());
+ if (is_referenced()) {
+ // Live entry_ptr should keep the entry alive, except when the entry failed on loading.
+ // In that case, entry_ptr holders are not supposed to use the pointer, so it's safe
+ // to nullify those entry_ptrs.
+ assert(!ready());
+ }
}

void on_evicted() noexcept override;
diff --git a/test/boost/sstable_partition_index_cache_test.cc b/test/boost/sstable_partition_index_cache_test.cc
--- a/test/boost/sstable_partition_index_cache_test.cc
+++ b/test/boost/sstable_partition_index_cache_test.cc
@@ -36,10 +36,18 @@ static void add_entry(logalloc::region& r,

static partition_index_page make_page0(logalloc::region& r, simple_schema& s) {
partition_index_page page;
+ auto destroy_page = defer([&] {
+ with_allocator(r.allocator(), [&] {
+ auto p = std::move(page);
+ });
+ });
+
add_entry(r, *s.schema(), page, s.make_pkey(0).key(), 0);
add_entry(r, *s.schema(), page, s.make_pkey(1).key(), 1);
add_entry(r, *s.schema(), page, s.make_pkey(2).key(), 2);
add_entry(r, *s.schema(), page, s.make_pkey(3).key(), 3);
+
+ destroy_page.cancel();
return page;
}

@@ -130,6 +138,47 @@ SEASTAR_THREAD_TEST_CASE(test_caching) {
}
}

+template <typename T>
+static future<> ignore_result(future<T>&& f) {
+ return f.then_wrapped([] (auto&& f) {
+ try {
+ f.get();
+ } catch (...) {
+ // expected, silence warnings about ignored failed futures
+ }
+ });
+}
+
+SEASTAR_THREAD_TEST_CASE(test_exception_while_loading) {
+ ::lru lru;
+ simple_schema s;
+ logalloc::region r;
+ partition_index_cache cache(lru, r);
+
+ auto clear_lru = defer([&] {
+ with_allocator(r.allocator(), [&] {
+ lru.evict_all();
+ });
+ });
+
+ auto page0_loader = [&] (partition_index_cache::key_type k) {
+ return yield().then([&] {
+ return make_page0(r, s);
+ });
+ };
+
+ memory::with_allocation_failures([&] {
+ cache.evict_gently().get();
+ auto f0 = ignore_result(cache.get_or_load(0, page0_loader));
+ auto f1 = ignore_result(cache.get_or_load(0, page0_loader));
+ f0.get();
+ f1.get();
+ });
+
+ auto ptr = cache.get_or_load(0, page0_loader).get0();
+ has_page0(ptr);
+}
+
SEASTAR_THREAD_TEST_CASE(test_auto_clear) {
::lru lru;
simple_schema s;

Commit Bot

<bot@cloudius-systems.com>
unread,
May 25, 2022, 6:44:15 AM5/25/22
to scylladb-dev@googlegroups.com, Tomasz Grabiec
From: Tomasz Grabiec <tgra...@scylladb.com>
Committer: Avi Kivity <a...@scylladb.com>
Branch: master

Commit Bot

<bot@cloudius-systems.com>
unread,
May 27, 2022, 2:50:41 AM5/27/22
to scylladb-dev@googlegroups.com, Tomasz Grabiec
From: Tomasz Grabiec <tgra...@scylladb.com>
Committer: Avi Kivity <a...@scylladb.com>
Branch: next-5.0

sstable: partition_index_cache: Fix abort on bad_alloc during page loading

When entry loading fails and there is another request blocked on the
same page, attempt to erase the failed entry will abort because that
would violate entry_ptr guarantees, which is supposed to keep the
entry alive.

The fix in 92727ac36c311e09adf8bcf0946637e35c7d0994 was incomplete. It
only helped for the case of a single loader. This patch makes a more
general approach by relaxing the assert.

The assert manifested like this:

scylla: ./sstables/partition_index_cache.hh:71: sstables::partition_index_cache::entry::~entry(): Assertion `!is_referenced()' failed.

Fixes #10617

Closes #10653

(cherry picked from commit f87274f66af07085c3fdc53d13e5c27a0fab0aec)

Commit Bot

<bot@cloudius-systems.com>
unread,
May 27, 2022, 2:51:07 AM5/27/22
to scylladb-dev@googlegroups.com, Tomasz Grabiec
From: Tomasz Grabiec <tgra...@scylladb.com>
Committer: Avi Kivity <a...@scylladb.com>
Branch: next-4.6

sstable: partition_index_cache: Fix abort on bad_alloc during page loading

When entry loading fails and there is another request blocked on the
same page, attempt to erase the failed entry will abort because that
would violate entry_ptr guarantees, which is supposed to keep the
entry alive.

The fix in 92727ac36c311e09adf8bcf0946637e35c7d0994 was incomplete. It
only helped for the case of a single loader. This patch makes a more
general approach by relaxing the assert.

The assert manifested like this:

scylla: ./sstables/partition_index_cache.hh:71: sstables::partition_index_cache::entry::~entry(): Assertion `!is_referenced()' failed.

Fixes #10617

Closes #10653

(cherry picked from commit f87274f66af07085c3fdc53d13e5c27a0fab0aec)

---
diff --git a/sstables/partition_index_cache.hh b/sstables/partition_index_cache.hh
--- a/sstables/partition_index_cache.hh
+++ b/sstables/partition_index_cache.hh
@@ -68,7 +68,12 @@ private:
entry(entry&&) noexcept = default;

~entry() {
- assert(!is_referenced());
+ if (is_referenced()) {
+ // Live entry_ptr should keep the entry alive, except when the entry failed on loading.
+ // In that case, entry_ptr holders are not supposed to use the pointer, so it's safe
+ // to nullify those entry_ptrs.
+ assert(!ready());
+ }
}

void on_evicted() noexcept override;
diff --git a/test/boost/sstable_partition_index_cache_test.cc b/test/boost/sstable_partition_index_cache_test.cc
--- a/test/boost/sstable_partition_index_cache_test.cc
+++ b/test/boost/sstable_partition_index_cache_test.cc
@@ -49,10 +49,18 @@ static void add_entry(logalloc::region& r,

static partition_index_page make_page0(logalloc::region& r, simple_schema& s) {
partition_index_page page;
+ auto destroy_page = defer([&] {
+ with_allocator(r.allocator(), [&] {
+ auto p = std::move(page);
+ });
+ });
+
add_entry(r, *s.schema(), page, s.make_pkey(0).key(), 0);
add_entry(r, *s.schema(), page, s.make_pkey(1).key(), 1);
add_entry(r, *s.schema(), page, s.make_pkey(2).key(), 2);
add_entry(r, *s.schema(), page, s.make_pkey(3).key(), 3);
+
+ destroy_page.cancel();
return page;
}

@@ -143,6 +151,47 @@ SEASTAR_THREAD_TEST_CASE(test_caching) {

Commit Bot

<bot@cloudius-systems.com>
unread,
May 27, 2022, 9:18:23 AM5/27/22
to scylladb-dev@googlegroups.com, Tomasz Grabiec
From: Tomasz Grabiec <tgra...@scylladb.com>
Committer: Avi Kivity <a...@scylladb.com>
Branch: branch-5.0

sstable: partition_index_cache: Fix abort on bad_alloc during page loading

When entry loading fails and there is another request blocked on the
same page, attempt to erase the failed entry will abort because that
would violate entry_ptr guarantees, which is supposed to keep the
entry alive.

The fix in 92727ac36c311e09adf8bcf0946637e35c7d0994 was incomplete. It
only helped for the case of a single loader. This patch makes a more
general approach by relaxing the assert.

The assert manifested like this:

scylla: ./sstables/partition_index_cache.hh:71: sstables::partition_index_cache::entry::~entry(): Assertion `!is_referenced()' failed.

Fixes #10617

Closes #10653

(cherry picked from commit f87274f66af07085c3fdc53d13e5c27a0fab0aec)

---
diff --git a/sstables/partition_index_cache.hh b/sstables/partition_index_cache.hh
--- a/sstables/partition_index_cache.hh
+++ b/sstables/partition_index_cache.hh
@@ -55,7 +55,12 @@ private:
entry(entry&&) noexcept = default;

~entry() {
- assert(!is_referenced());
+ if (is_referenced()) {
+ // Live entry_ptr should keep the entry alive, except when the entry failed on loading.
+ // In that case, entry_ptr holders are not supposed to use the pointer, so it's safe
+ // to nullify those entry_ptrs.
+ assert(!ready());
+ }
}

void on_evicted() noexcept override;
diff --git a/test/boost/sstable_partition_index_cache_test.cc b/test/boost/sstable_partition_index_cache_test.cc
--- a/test/boost/sstable_partition_index_cache_test.cc
+++ b/test/boost/sstable_partition_index_cache_test.cc
@@ -36,10 +36,18 @@ static void add_entry(logalloc::region& r,

static partition_index_page make_page0(logalloc::region& r, simple_schema& s) {
partition_index_page page;
+ auto destroy_page = defer([&] {
+ with_allocator(r.allocator(), [&] {
+ auto p = std::move(page);
+ });
+ });
+
add_entry(r, *s.schema(), page, s.make_pkey(0).key(), 0);
add_entry(r, *s.schema(), page, s.make_pkey(1).key(), 1);
add_entry(r, *s.schema(), page, s.make_pkey(2).key(), 2);
add_entry(r, *s.schema(), page, s.make_pkey(3).key(), 3);
+
+ destroy_page.cancel();
return page;
}

@@ -130,6 +138,47 @@ SEASTAR_THREAD_TEST_CASE(test_caching) {

Commit Bot

<bot@cloudius-systems.com>
unread,
May 30, 2022, 6:01:01 AM5/30/22
to scylladb-dev@googlegroups.com, Tomasz Grabiec
From: Tomasz Grabiec <tgra...@scylladb.com>
Committer: Avi Kivity <a...@scylladb.com>
Branch: next-4.6

sstable: partition_index_cache: Fix abort on bad_alloc during page loading

When entry loading fails and there is another request blocked on the
same page, attempt to erase the failed entry will abort because that
would violate entry_ptr guarantees, which is supposed to keep the
entry alive.

The fix in 92727ac36c311e09adf8bcf0946637e35c7d0994 was incomplete. It
only helped for the case of a single loader. This patch makes a more
general approach by relaxing the assert.

The assert manifested like this:

scylla: ./sstables/partition_index_cache.hh:71: sstables::partition_index_cache::entry::~entry(): Assertion `!is_referenced()' failed.

Fixes #10617

Closes #10653

(cherry picked from commit f87274f66af07085c3fdc53d13e5c27a0fab0aec)

---
diff --git a/sstables/partition_index_cache.hh b/sstables/partition_index_cache.hh
--- a/sstables/partition_index_cache.hh
+++ b/sstables/partition_index_cache.hh
@@ -68,7 +68,12 @@ private:
entry(entry&&) noexcept = default;

~entry() {
- assert(!is_referenced());
+ if (is_referenced()) {
+ // Live entry_ptr should keep the entry alive, except when the entry failed on loading.
+ // In that case, entry_ptr holders are not supposed to use the pointer, so it's safe
+ // to nullify those entry_ptrs.
+ assert(!ready());
+ }
}

void on_evicted() noexcept override;
diff --git a/test/boost/sstable_partition_index_cache_test.cc b/test/boost/sstable_partition_index_cache_test.cc
--- a/test/boost/sstable_partition_index_cache_test.cc
+++ b/test/boost/sstable_partition_index_cache_test.cc
@@ -49,10 +49,18 @@ static void add_entry(logalloc::region& r,

static partition_index_page make_page0(logalloc::region& r, simple_schema& s) {
partition_index_page page;
+ auto destroy_page = defer([&] {
+ with_allocator(r.allocator(), [&] {
+ auto p = std::move(page);
+ });
+ });
+
add_entry(r, *s.schema(), page, s.make_pkey(0).key(), 0);
add_entry(r, *s.schema(), page, s.make_pkey(1).key(), 1);
add_entry(r, *s.schema(), page, s.make_pkey(2).key(), 2);
add_entry(r, *s.schema(), page, s.make_pkey(3).key(), 3);
+
+ destroy_page.cancel();
return page;
}

@@ -143,6 +151,47 @@ SEASTAR_THREAD_TEST_CASE(test_caching) {
}
}

+template <typename T>
+static future<> ignore_result(future<T>&& f) {
+ return f.then_wrapped([] (auto&& f) {
+ try {
+ f.get();
+ } catch (...) {
+ // expected, silence warnings about ignored failed futures
+ }
+ });
+}
+
+SEASTAR_THREAD_TEST_CASE(test_exception_while_loading) {
+ ::lru lru;
+ simple_schema s;
+ logalloc::region r;
+ partition_index_cache cache(lru, r);
+
+ auto clear_lru = defer([&] {
+ with_allocator(r.allocator(), [&] {
+ lru.evict_all();
+ });
+ });
+
+ auto page0_loader = [&] (partition_index_cache::key_type k) {
+ return later().then([&] {

Commit Bot

<bot@cloudius-systems.com>
unread,
May 30, 2022, 9:08:10 AM5/30/22
to scylladb-dev@googlegroups.com, Tomasz Grabiec
From: Tomasz Grabiec <tgra...@scylladb.com>
Committer: Avi Kivity <a...@scylladb.com>
Branch: branch-4.6
Reply all
Reply to author
Forward
0 new messages