The reactor will soon gain an aio poll mode, which does not have a timer thread.
To avoid losing the stall detector, we migrate it to use a timer+signal directly.
In order to prevent excessive signals, the timer is set to the report threshold,
rather than the task quota. This means that almost every delivered signal
corresponds to a quota violation (the exceptions are when we've stopped running tasks
by the time the signal was delivered).
To avoid excessive calls to timer_settime(), we allow ourselves some slack in the
timing. For a threshold of 10ms we'll set the timer to 13ms and rearm it after 3ms
have gone by. So when we enter task execution, we have between 10ms and 13ms on the
clock.
---
 src/core/stall_detector.hh | 29 ++++++----
 src/core/reactor.cc    | 107 +++++++++++++++++++++++--------------
 2 files changed, 85 insertions(+), 51 deletions(-)
diff --git a/src/core/stall_detector.hh b/src/core/stall_detector.hh
index 194a9e0733..41d57ab5cd 100644
--- a/src/core/stall_detector.hh
+++ b/src/core/stall_detector.hh
@@ -23,50 +23,59 @@
 #pragma once
 #include <signal.h>
 #include <limits>
 #include <chrono>
+#include <seastar/core/posix.hh>
 namespace seastar {
 class reactor;
 namespace internal {
 struct cpu_stall_detector_config {
   std::chrono::duration<double> threshold = std::chrono::seconds(2);
   unsigned stall_detector_reports_per_minute = 1;
+Â Â float slack = 0.3;Â // fraction of threshold that we're allowed to overshoot
 };
--
You received this message because you are subscribed to the Google Groups "seastar-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to seastar-dev...@googlegroups.com.
To post to this group, send email to seast...@googlegroups.com.
Visit this group at https://groups.google.com/group/seastar-dev.
To view this discussion on the web visit https://groups.google.com/d/msgid/seastar-dev/20181122173009.18506-6-avi%40scylladb.com.
For more options, visit https://groups.google.com/d/optout.
Using the thread cputime clock prevents false positives due to an overcommitted
processor, and this improves the quality of the reports.
It could also generate false negatives if system calls block (since the cputime
clock is stopped when a thread switches contexts voluntarily), but these events
are rarer these days and are better investigated using the other latency detector.
---
 src/core/reactor.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/core/reactor.cc b/src/core/reactor.cc
index 1fcfe14ffd..2d12315fb1 100644
--- a/src/core/reactor.cc
+++ b/src/core/reactor.cc
@@ -627,11 +627,11 @@ cpu_stall_detector::cpu_stall_detector(reactor* r, cpu_stall_detector_config cfg
   update_config(cfg);
   struct sigevent sev = {};
   sev.sigev_notify = SIGEV_THREAD_ID;
   sev.sigev_signo = signal_number();
   sev._sigev_un._tid = syscall(SYS_gettid);
-Â Â int err = timer_create(CLOCK_MONOTONIC, &sev, &_timer);
+Â Â int err = timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, &_timer);
   if (err) {
     throw std::system_error(std::error_code(err, std::system_category()));
   }
   _timer_initialized = true;
--
2.19.1
--
You received this message because you are subscribed to the Google Groups "seastar-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to seastar-dev...@googlegroups.com.
To post to this group, send email to seast...@googlegroups.com.
Visit this group at https://groups.google.com/group/seastar-dev.
To view this discussion on the web visit https://groups.google.com/d/msgid/seastar-dev/20181122173009.18506-7-avi%40scylladb.com.
--
You received this message because you are subscribed to the Google Groups "seastar-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to seastar-dev...@googlegroups.com.
To post to this group, send email to seast...@googlegroups.com.
Visit this group at https://groups.google.com/group/seastar-dev.
To view this discussion on the web visit https://groups.google.com/d/msgid/seastar-dev/20181122173009.18506-10-avi%40scylladb.com.
On Thu, Nov 22, 2018 at 12:30 PM Avi Kivity <a...@scylladb.com> wrote:
The reactor will soon gain an aio poll mode, which does not have a timer thread.
To avoid losing the stall detector, we migrate it to use a timer+signal directly.
In order to prevent excessive signals, the timer is set to the report threshold,
rather than the task quota. This means that almost every delivered signal
corresponds to a quota violation (the exceptions are when we've stopped running tasks
by the time the signal was delivered).
I don't understand the above. If we set the timer to 10ms (or really any value above the task quota),and the timer triggered, how is that not a violation - regardless of whether or not we stopped eventually?
If the report threshold is 10ms and we set the timer to 10ms, we will have to re-arm the timer the timer every time we run tasks. Otherwise, some time will have been taken off the timer on the previous run.
Since we don't want a system call every time we run some tasks,
we add in some slack and set the timer to 13ms. This means 3ms of
runtime can pass until the timer has just 10ms and we need to
rearm it, so we rearm at most 300 times per second (per shard).
You can't fool me, you're in love with that thing.
On Thu, Nov 22, 2018 at 12:30 PM Avi Kivity <a...@scylladb.com> wrote:
Using the thread cputime clock prevents false positives due to an overcommitted
processor, and this improves the quality of the reports.
It could also generate false negatives if system calls block (since the cputime
clock is stopped when a thread switches contexts voluntarily), but these events
are rarer these days and are better investigated using the other latency detector.
This is a bit out of of scope for this series, but the other latency detector (I am assuming you are talking about the rusage one)is a bit opaque these days.
It's also a bit not-enabled-by-default. We should merge the two.
It's hard to understand how much time it takes and whether or not we are having problems coming from it.
For a while now we have been exporting the total time spent in violation as a prometheus metric but we don't do this for things that block in pollers - and they do happen, unfortunately.
We may also have CPU stalls in pollers, so we should extend the
detector to cover that too.
On 11/26/18 10:01 PM, Glauber Costa wrote:
On Thu, Nov 22, 2018 at 12:30 PM Avi Kivity <a...@scylladb.com> wrote:
Using the thread cputime clock prevents false positives due to an overcommitted
processor, and this improves the quality of the reports.
It could also generate false negatives if system calls block (since the cputime
clock is stopped when a thread switches contexts voluntarily), but these events
are rarer these days and are better investigated using the other latency detector.
This is a bit out of of scope for this series, but the other latency detector (I am assuming you are talking about the rusage one)is a bit opaque these days.
It's also a bit not-enabled-by-default. We should merge the two.
It's hard to understand how much time it takes and whether or not we are having problems coming from it.
For a while now we have been exporting the total time spent in violation as a prometheus metric but we don't do this for things that block in pollers - and they do happen, unfortunately.
We may also have CPU stalls in pollers, so we should extend the detector to cover that too.
Ping? I don't think Glauber's comments were actionable.
--
You received this message because you are subscribed to the Google Groups "seastar-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to seastar-dev...@googlegroups.com.
To post to this group, send email to seast...@googlegroups.com.
Visit this group at https://groups.google.com/group/seastar-dev.
To view this discussion on the web visit https://groups.google.com/d/msgid/seastar-dev/b1ac2793-544b-94e5-47fc-66d7aa16e556%40scylladb.com.