[PATCH] reduce kernel scheduler wakeup granularity

Glauber Costa

<glauber@scylladb.com>

Apr 27, 2017, 9:50:44 AM

to scylladb-dev@googlegroups.com, Glauber Costa

We set the scheduler wakeup granularity to 500usec because that is the
difference in runtime we want to see from a waking task before it
preempts the running task (which will usually be Scylla). Scheduling
other processes less often is usually good for Scylla, but in this case
one of the "other processes" is also a Scylla thread: the one we have
been using to mark ticks since we abandoned signals.

However, an artifact of the Linux scheduler causes those preemptions to
be missed if the wakeup granularity is exactly half of sched_latency.
Our sched_latency is set to 1ms, which represents the maximum time
period in which we will run all runnable tasks.

We want to keep sched_latency at 1ms, so we will reduce the wakeup
granularity to something slightly lower than 500usec, to make sure that
this artifact won't affect the scheduler calculations. According to my
tests 499.99usec will do, but we will reduce it to a round number:
450usec.

Signed-off-by: Glauber Costa <gla...@scylladb.com>
---
dist/common/sysctl.d/99-scylla-sched.conf | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
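
The constraint described in the commit message can be sketched as a
quick check on the config file. This is an illustrative Python sketch
only: the helper names parse_sysctl and wakeup_granularity_is_safe are
hypothetical and not part of Scylla, and the check covers only the
condition stated above (wakeup granularity strictly below half of
sched_latency), not the scheduler's actual behavior.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines of sysctl.d-style text into a dict.

    Blank lines and comment lines (starting with '#' or ';') are skipped.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def wakeup_granularity_is_safe(settings):
    """Return True if wakeup granularity is strictly below sched_latency / 2.

    Hypothetical helper: it encodes only the rule from the commit message.
    """
    latency_ns = int(settings["kernel.sched_latency_ns"])
    wakeup_ns = int(settings["kernel.sched_wakeup_granularity_ns"])
    # Compare 2 * wakeup against latency to stay in integer arithmetic.
    return 2 * wakeup_ns < latency_ns


# Values taken from the patched 99-scylla-sched.conf in this email.
PATCHED_CONF = """
kernel.sched_tunable_scaling = 0
kernel.sched_min_granularity_ns = 500000

# Don't delay unrelated workloads
kernel.sched_wakeup_granularity_ns = 450000

# Schedule all tasks in this period
kernel.sched_latency_ns = 1000000
"""

if __name__ == "__main__":
    print(wakeup_granularity_is_safe(parse_sysctl(PATCHED_CONF)))
```

With the previous value of 500000 the same check fails, because
2 * 500000 is equal to, not less than, the 1000000ns latency.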

diff --git a/dist/common/sysctl.d/99-scylla-sched.conf b/dist/common/sysctl.d/99-scylla-sched.conf
index fd1a0dc..632be7b 100644
--- a/dist/common/sysctl.d/99-scylla-sched.conf
+++ b/dist/common/sysctl.d/99-scylla-sched.conf
@@ -5,7 +5,7 @@ kernel.sched_tunable_scaling = 0
 kernel.sched_min_granularity_ns = 500000
 
 # Don't delay unrelated workloads
-kernel.sched_wakeup_granularity_ns = 500000
+kernel.sched_wakeup_granularity_ns = 450000
 
 # Schedule all tasks in this period
 kernel.sched_latency_ns = 1000000
--
2.9.3

Commit Bot

<bot@cloudius-systems.com>

Apr 27, 2017, 11:11:46 AM

to scylladb-dev@googlegroups.com, Glauber Costa

From: Glauber Costa <gla...@scylladb.com>
Committer: Avi Kivity <a...@scylladb.com>
Branch: master

reduce kernel scheduler wakeup granularity


Signed-off-by: Glauber Costa <gla...@scylladb.com>

Message-Id: <20170427135039...@scylladb.com>

---
diff --git a/dist/common/sysctl.d/99-scylla-sched.conf b/dist/common/sysctl.d/99-scylla-sched.conf

Commit Bot

<bot@cloudius-systems.com>

May 1, 2017, 4:13:59 AM

to scylladb-dev@googlegroups.com, Glauber Costa

From: Glauber Costa <gla...@scylladb.com>
Committer: Avi Kivity <a...@scylladb.com>

Branch: branch-1.7

reduce kernel scheduler wakeup granularity


Signed-off-by: Glauber Costa <gla...@scylladb.com>

Message-Id: <20170427135039...@scylladb.com>
(cherry picked from commit 14b9aa228537c4e4543f1195a1eb2475cdd73148)

---
diff --git a/dist/common/sysctl.d/99-scylla-sched.conf b/dist/common/sysctl.d/99-scylla-sched.conf
