[COMMIT seastar master] Merge "Handle overflow in token bucket replenisher" from Pavel E

Commit Bot

<bot@cloudius-systems.com>

unread,

Jun 27, 2022, 10:40:02 AM6/27/22

to seastar-dev@googlegroups.com, Avi Kivity

From: Avi Kivity <a...@scylladb.com>
Committer: Avi Kivity <a...@scylladb.com>
Branch: master

Merge "Handle overflow in token bucket replenisher" from Pavel E

"
When a priority class is idling for 1+ hour its bandwidth replenisher
may overflow its math and start generating zero tokens thus locking up
the replenishing itself.

branch: https://github.com/xemul/seastar/tree/br-shared-token-bucket-replenish-overflow
fixes: #1111
tests: unit(dev), io_tester(dev)
"

* 'br-shared-token-bucket-replenish-overflow' of https://github.com/xemul/seastar:
shared_token_bucket: Test replenish after long idling
shared_token_bucket: Handle rate * delta overflows

---
diff --git a/include/seastar/util/shared_token_bucket.hh b/include/seastar/util/shared_token_bucket.hh
--- a/include/seastar/util/shared_token_bucket.hh
+++ b/include/seastar/util/shared_token_bucket.hh
@@ -134,11 +134,22 @@ class shared_token_bucket {
public:
static constexpr T max_rate = std::numeric_limits<T>::max() / 2 / max_delta.count();

- shared_token_bucket(T rate, T limit, T threshold) noexcept
+private:
+ static constexpr T accumulated(T rate, rate_resolution delta) noexcept {
+ return std::round(rate * delta.count());
+ }
+#ifndef __clang__
+ // std::round() is constexpr only since C++23 (but g++ doesn't care)
+ static_assert(accumulated(max_rate, max_delta) <= std::numeric_limits<T>::max());
+#endif
+
+public:
+ shared_token_bucket(T rate, T limit, T threshold, bool add_replenish_iffset = true) noexcept
: _replenish_rate(std::min(rate, max_rate))
, _replenish_limit(limit)
, _replenish_threshold(std::clamp(threshold, (T)1, limit))
- , _replenished(Clock::now())
+ // pretend it was replenished yesterday to spot overflows early
+ , _replenished(Clock::now() - std::chrono::hours(add_replenish_iffset ? 24 : 0))
, _rovers(_replenish_limit)
{}

@@ -181,8 +192,8 @@ public:
// the number of tokens accumulated for the given time frame
template <typename Rep, typename Per>
T accumulated_in(const std::chrono::duration<Rep, Per> delta) const noexcept {
- auto delta_at_rate = rate_cast(delta);
- return std::round(_replenish_rate * delta_at_rate.count());
+ auto delta_at_rate = std::min(rate_cast(delta), max_delta);
+ return accumulated(_replenish_rate, delta_at_rate);
}

// Estimated time to process the given amount of tokens
diff --git a/tests/unit/shared_token_bucket_test.cc b/tests/unit/shared_token_bucket_test.cc
--- a/tests/unit/shared_token_bucket_test.cc
+++ b/tests/unit/shared_token_bucket_test.cc
@@ -29,7 +29,7 @@ using namespace seastar;
using namespace std::chrono_literals;

SEASTAR_TEST_CASE(test_basic_non_capped_loop) {
- internal::shared_token_bucket<uint64_t, std::ratio<1>, internal::capped_release::no, manual_clock> tb(1, 1, 0);
+ internal::shared_token_bucket<uint64_t, std::ratio<1>, internal::capped_release::no, manual_clock> tb(1, 1, 0, false);

// Grab one token and make sure it's only available in 1s
auto th = tb.grab(1);
@@ -49,7 +49,7 @@ SEASTAR_TEST_CASE(test_basic_non_capped_loop) {
}

SEASTAR_TEST_CASE(test_basic_capped_loop) {
- internal::shared_token_bucket<uint64_t, std::ratio<1>, internal::capped_release::yes, manual_clock> tb(1, 1, 0);
+ internal::shared_token_bucket<uint64_t, std::ratio<1>, internal::capped_release::yes, manual_clock> tb(1, 1, 0, false);

// Grab on token and make sure it's only available in 1s
auto th = tb.grab(1);

Asias He

<asias@scylladb.com>

unread,

Jun 27, 2022, 10:29:08 PM6/27/22

to Avi Kivity, seastar-dev, Pavel Emelyanov

On Mon, Jun 27, 2022 at 10:40 PM Commit Bot <b...@cloudius-systems.com> wrote:

From: Avi Kivity <a...@scylladb.com>
Committer: Avi Kivity <a...@scylladb.com>
Branch: master

Merge "Handle overflow in token bucket replenisher" from Pavel E

"
When a priority class is idling for 1+ hour its bandwidth replenisher
may overflow its math and start generating zero tokens thus locking up
the replenishing itself.

Is this the root cause for the recent hangs we saw in scylla?

E.g., https://github.com/scylladb/scylla/issues/10814

--
You received this message because you are subscribed to the Google Groups "seastar-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to seastar-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/seastar-dev/000000000000fbfe9b05e26ee4a0%40google.com.

--

Asias

Pavel Emelyanov

<xemul@scylladb.com>

unread,

Jun 28, 2022, 1:52:12 AM6/28/22

to Asias He, Avi Kivity, seastar-dev

On 28.06.2022 05:28, Asias He wrote:
>
>
> On Mon, Jun 27, 2022 at 10:40 PM Commit Bot <b...@cloudius-systems.com <mailto:b...@cloudius-systems.com>> wrote:

>
> From: Avi Kivity <a...@scylladb.com <mailto:a...@scylladb.com>>
> Committer: Avi Kivity <a...@scylladb.com <mailto:a...@scylladb.com>>
> Branch: master
>
> Merge "Handle overflow in token bucket replenisher" from Pavel E
>
> "
> When a priority class is idling for 1+ hour its bandwidth replenisher
> may overflow its math and start generating zero tokens thus locking up
> the replenishing itself.
>
>
>
> Is this the root cause for the recent hangs we saw in scylla?

> E.g., https://github.com/scylladb/scylla/issues/10814 <https://github.com/scylladb/scylla/issues/10814>

If an IO class get a pause in IO for 1+ hour throughout the test, and then
gets stuck, then most likely it is.

>
> branch: https://github.com/xemul/seastar/tree/br-shared-token-bucket-replenish-overflow <https://github.com/xemul/seastar/tree/br-shared-token-bucket-replenish-overflow>

> fixes: #1111
> tests: unit(dev), io_tester(dev)
> "
>

> * 'br-shared-token-bucket-replenish-overflow' of https://github.com/xemul/seastar <https://github.com/xemul/seastar>:

> To unsubscribe from this group and stop receiving emails from it, send an email to seastar-dev...@googlegroups.com <mailto:seastar-dev%2Bunsu...@googlegroups.com>.
> To view this discussion on the web visit https://groups.google.com/d/msgid/seastar-dev/000000000000fbfe9b05e26ee4a0%40google.com <https://groups.google.com/d/msgid/seastar-dev/000000000000fbfe9b05e26ee4a0%40google.com>.
>
>
>
> --
> Asias

Reply all

Reply to author

Forward