[scylladb-dev] [PATCH v2] loading_cache: make find(...)/begin() update the last_read timestamp of the item

1 view
Skip to first unread message

Vlad Zolotarov

unread,
Jul 29, 2021, 6:08:23 PMJul 29
to scylla...@googlegroups.com, Vlad Zolotarov
timestamped_val::touch() updates the last_read timestamp and moves the
item to the front of the LRU list.

Before this patch we were calling timestamped_val::touch() only in a context of
loading_cache::get_ptr().

However we should also be doing it when we find the cache element via
synchronous calls like find(...) and begin().

Fixes #8920

Signed-off-by: Vlad Zolotarov <vl...@scylladb.com>
---
New in v2:
- loading_cache.hh: Make begin() touch() too.
- loading_cache_test:
- test_loading_cache_loading_expiry_reset_on_sync_op:
- Decrease a number of iterations to 10 (was 100 before).
- Add a check for a begin() API.
- Add an explicit test (test_loading_cache_move_item_to_mru_list_front_on_sync_op)
that checks that touched items are moved to an LRU front.
---
utils/loading_cache.hh | 9 ++++--
test/boost/loading_cache_test.cc | 51 ++++++++++++++++++++++++++++++++
2 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/utils/loading_cache.hh b/utils/loading_cache.hh
index dd86c218d..1904699ec 100644
--- a/utils/loading_cache.hh
+++ b/utils/loading_cache.hh
@@ -109,13 +109,13 @@ class timestamped_val {
return _lru_entry_ptr;
}

-private:
void touch() noexcept {
assert(_lru_entry_ptr);
_last_read = loading_cache_clock_type::now();
_lru_entry_ptr->touch();
}

+private:
void set_anchor_back_reference(lru_entry* lru_entry_ptr) noexcept {
_lru_entry_ptr = lru_entry_ptr;
}
@@ -397,7 +397,11 @@ class loading_cache {
}

iterator begin() {
- return boost::make_transform_iterator(list_begin(), _value_extractor_fn);
+ list_iterator it = list_begin();
+ if (it != list_end()) {
+ it->timestamped_value().touch();
+ }
+ return boost::make_transform_iterator(it, _value_extractor_fn);
}

template <typename Pred>
@@ -443,6 +447,7 @@ class loading_cache {
/// For instance this always holds for iterators returned by set_find(...).
list_iterator to_list_iterator(set_iterator set_it) {
if (set_it != set_end()) {
+ set_it->touch();
return _lru_list.iterator_to(*set_it->lru_entry_ptr());
}
return list_end();
diff --git a/test/boost/loading_cache_test.cc b/test/boost/loading_cache_test.cc
index 1f6645094..4844bafb7 100644
--- a/test/boost/loading_cache_test.cc
+++ b/test/boost/loading_cache_test.cc
@@ -299,6 +299,57 @@ SEASTAR_TEST_CASE(test_loading_cache_loading_expiry_eviction) {
});
}

+SEASTAR_TEST_CASE(test_loading_cache_loading_expiry_reset_on_sync_op) {
+ return seastar::async([] {
+ using namespace std::chrono;
+ utils::loading_cache<int, sstring> loading_cache(num_loaders, 20ms, testlog);
+ auto stop_cache_reload = seastar::defer([&loading_cache] { loading_cache.stop().get(); });
+
+ prepare().get();
+
+ loading_cache.get_ptr(0, loader).discard_result().get();
+ auto it = loading_cache.find(0);
+
+ // Check that the expiration timer is reset every time we call a find()
+ for (int i = 0; i < 10; ++i) {
+ BOOST_REQUIRE(it != loading_cache.end());
+ it = loading_cache.find(0);
+ sleep(10ms).get();
+ }
+
+ // Check that the expiration timer is reset every time we call a begin()
+ for (int i = 0; i < 10; ++i) {
+ BOOST_REQUIRE(it != loading_cache.end());
+ it = loading_cache.begin();
+ sleep(10ms).get();
+ }
+
+ sleep(20ms).get();
+ REQUIRE_EVENTUALLY_EQUAL(loading_cache.memory_footprint(), 0);
+ });
+}
+
+SEASTAR_TEST_CASE(test_loading_cache_move_item_to_mru_list_front_on_sync_op) {
+ return seastar::async([] {
+ using namespace std::chrono;
+ utils::loading_cache<int, sstring> loading_cache(num_loaders, 1h, testlog);
+ auto stop_cache_reload = seastar::defer([&loading_cache] { loading_cache.stop().get(); });
+
+ prepare().get();
+
+ for (int i = 0; i < 5; ++i) {
+ loading_cache.get_ptr(i, loader).discard_result().get();
+ }
+
+ auto it0 = loading_cache.find(0);
+ auto it1 = loading_cache.find(1);
+
+ // "1" should be at the beginning of the list and "0" right after it.
+ BOOST_REQUIRE(it0 == std::next(it1, 1));
+ BOOST_REQUIRE(it1 == loading_cache.begin());
+ });
+}
+
SEASTAR_TEST_CASE(test_loading_cache_loading_reloading) {
return seastar::async([] {
using namespace std::chrono;
--
2.25.1

Tomasz Grabiec

unread,
Jul 29, 2021, 7:58:48 PMJul 29
to Vlad Zolotarov, scylla...@googlegroups.com
On Fri, Jul 30, 2021 at 12:08 AM Vlad Zolotarov <vl...@scylladb.com> wrote:
timestamped_val::touch() updates the last_read timestamp and moves the
item to the front of the LRU list.

Before this patch we were calling timestamped_val::touch() only in a context of
loading_cache::get_ptr().

However we should also be doing it when we find the cache element via
synchronous calls like find(...) and begin().

find() I can understand, but begin() seems controversial. Obtaining an iterator doesn't mean you access the item behind it. For example, you could obtain begin() and end() to execute lower_bound() on the range. 
IMO this is putting too much responsibility on to_list_iterator(), which is about converting iterators. LRU changes could be done outside it at the caller.
 
--
You received this message because you are subscribed to the Google Groups "ScyllaDB development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scylladb-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/scylladb-dev/20210729220815.2499074-1-vladz%40scylladb.com.

Vladislav Zolotarov

unread,
Jul 30, 2021, 12:55:01 PMJul 30
to Tomasz Grabiec, scylla...@googlegroups.com


On 7/29/21 7:58 PM, Tomasz Grabiec wrote:


On Fri, Jul 30, 2021 at 12:08 AM Vlad Zolotarov <vl...@scylladb.com> wrote:
timestamped_val::touch() updates the last_read timestamp and moves the
item to the front of the LRU list.

Before this patch we were calling timestamped_val::touch() only in a context of
loading_cache::get_ptr().

However we should also be doing it when we find the cache element via
synchronous calls like find(...) and begin().

find() I can understand, but begin() seems controversial. Obtaining an iterator doesn't mean you access the item behind it. For example, you could obtain begin() and end() to execute lower_bound() on the range.

Well, find() doesn't mean you are going to access the item either - maybe you only want to know if it exists.
Meaning it make or doesn't make sense exactly as begin().

If I touch() it when an iterator is dereferenced would it make more sense?

I thought about it too but then the item is going to be touched *every time* it's dereferenced.
It kinda makes sense too but I finally decided to implement the version in question.

Do you think this semantics makes more sense?
I can call touch() at the caller, yes - it's just going to be more places duplicating the same code.
Would it make more sense?

Tomasz Grabiec

unread,
Jul 30, 2021, 1:58:50 PMJul 30
to Vladislav Zolotarov, scylla...@googlegroups.com
On Fri, Jul 30, 2021 at 6:55 PM Vladislav Zolotarov <vl...@scylladb.com> wrote:


On 7/29/21 7:58 PM, Tomasz Grabiec wrote:


On Fri, Jul 30, 2021 at 12:08 AM Vlad Zolotarov <vl...@scylladb.com> wrote:
timestamped_val::touch() updates the last_read timestamp and moves the
item to the front of the LRU list.

Before this patch we were calling timestamped_val::touch() only in a context of
loading_cache::get_ptr().

However we should also be doing it when we find the cache element via
synchronous calls like find(...) and begin().

find() I can understand, but begin() seems controversial. Obtaining an iterator doesn't mean you access the item behind it. For example, you could obtain begin() and end() to execute lower_bound() on the range.

Well, find() doesn't mean you are going to access the item either - maybe you only want to know if it exists.
Meaning it make or doesn't make sense exactly as begin().

For the LRU purpose, checking if an item exists can be considered an access. That's the bug you are fixing, right?
 

If I touch() it when an iterator is dereferenced would it make more sense?

Seems too low-level. Not every iterator dereference should be a touch, e.g. in the lower_bound() use case.
 

I thought about it too but then the item is going to be touched *every time* it's dereferenced.
It kinda makes sense too but I finally decided to implement the version in question.

Do you think this semantics makes more sense?

I would leave begin()/end() not touching.
 

Vladislav Zolotarov

unread,
Jul 30, 2021, 3:01:45 PMJul 30
to Tomasz Grabiec, scylla...@googlegroups.com


On 7/30/21 1:58 PM, Tomasz Grabiec wrote:


On Fri, Jul 30, 2021 at 6:55 PM Vladislav Zolotarov <vl...@scylladb.com> wrote:


On 7/29/21 7:58 PM, Tomasz Grabiec wrote:


On Fri, Jul 30, 2021 at 12:08 AM Vlad Zolotarov <vl...@scylladb.com> wrote:
timestamped_val::touch() updates the last_read timestamp and moves the
item to the front of the LRU list.

Before this patch we were calling timestamped_val::touch() only in a context of
loading_cache::get_ptr().

However we should also be doing it when we find the cache element via
synchronous calls like find(...) and begin().

find() I can understand, but begin() seems controversial. Obtaining an iterator doesn't mean you access the item behind it. For example, you could obtain begin() and end() to execute lower_bound() on the range.

Well, find() doesn't mean you are going to access the item either - maybe you only want to know if it exists.
Meaning it make or doesn't make sense exactly as begin().

For the LRU purpose, checking if an item exists can be considered an access.

I don't see it any different than getting begin() which returns you a most recently used item.


That's the bug you are fixing, right?

There is a bug where we don't touch() an item when it's fetched via a sync API:

    statements::prepared_statement::checked_weak_ptr get_prepared(const prepared_cache_key_type& key) {
        auto it = _prepared_cache.find(key);
        if (it == _prepared_cache.end()) {
            return statements::prepared_statement::checked_weak_ptr();
        }
        return *it;
    }
But it doesn't mean that we need to create a weird API that would only make sense in the context of the specific usage.
If we think that getting a non-end iterator is considered an access then every operation of getting such an iterator should end up in a touch(): both find() and begin().
If we think that getting an iterator should not be considered as a touch and only when it's dereferenced is considered a touch (and I tend to agree to that) then no iterator returning operations should call a touch() and it should only be called when an iterator is rederefenced.

I vote for the later API version. Do you disagree?


 

If I touch() it when an iterator is dereferenced would it make more sense?

Seems too low-level. Not every iterator dereference should be a touch, e.g. in the lower_bound() use case.

But some will and some may use begin() to get the most recently used item.
What do you suggest to do then? Not touching in that case would make a very weird API.

BTW, you can't run a lower_bound() on the LRU iterator dereferenced values since items are not sorted by their values but rather only by timestamps.
BUT there is a way to get all the way to a "_last_read" timestamp and run the lower/upper_bound() or other algorithms if needed.
So, I don't see an issue if we call a touch when the cached value is read.

Vladislav Zolotarov

unread,
Jul 30, 2021, 4:42:28 PMJul 30
to Tomasz Grabiec, scylla...@googlegroups.com
Correction - there is no way to get to _last_read today. So we are totally "safe" - upper_bound() can't be run on loading_cache iterators today even with a custom "compare".
But the same issue would remain if somebody would run something line count_if. - These algorithms are going to read the cached value and hence update the _last_read timestamp and the location of the item in the LRU list.
Therefore these algorithms simply can't be used with this container and instead corresponding container methods should be used.
We already have remove(), remove_if(), find().
If needed we can add more, e.g. upper/lower_bound(timestamp), count_if, etc.
Reply all
Reply to author
Forward
0 new messages