[PATCH] sched/deadline: Always calculate end of period on sched_yield()

Steven Rostedt

Feb 12, 2016, 6:10:31 PM

to LKML, Juri Lelli, Peter Zijlstra, Ingo Molnar, Clark Williams, Daniel Bristot de Oliveira, John Kacur

I'm writing a test case for SCHED_DEADLINE, and noticed a strange
anomaly. Every so often, a deadline is missed, and when I looked into
it, it happened because the sched_yield() had no effect (it failed to
end the previous period and have the next runtime start at the end of
the old period).

deadline-2228 7...1 116.778420: sys_enter_sched_yield:
deadline-2228 7d..3 116.778421: hrtimer_cancel: hrtimer=0xffff88011ebd79a0
deadline-2228 7d..2 116.778422: rcu_utilization: Start context switch
deadline-2228 7d..2 116.778423: rcu_utilization: End context switch
deadline-2228 7d..4 116.778423: hrtimer_start: hrtimer=0xffff88011ebd79a0 function=hrtick/0x0 expires=116124420428 softexpires=116124420428
deadline-2228 7...1 116.778425: sys_exit_sched_yield: 0x0

Schedule was never called. I added some trace_printks() and discovered
that this happens when sched_yield() is called right after a tick that
updates its current bandwidth.

When the schedule tick happens that updates the current bandwidth,
update_curr_dl() is called, where it updates curr->se.exec_start to
rq_clock_task(rq).

The rq_clock_task(rq) value gets updated by update_rq_clock_task(), which
is called at various points in the scheduler.

Now, if the user task calls sched_yield() just after a bandwidth update
synced curr->se.exec_start to rq_clock_task(rq), when sched_yield()
calls into update_curr_dl() we have:

delta_exec = rq_clock_task(rq) - curr->se.exec_start;
if (unlikely((s64)delta_exec <= 0))
return;

Coming in here from a sched_yield() will have delta_exec == 0 if the
sched_yield() was called after a DL tick and before another
update_rq_clock_task() is called.

This means that the task will not release its remaining runtime, and
it will start off in the current period when it expected to be in the
next period.

The fix that appears to work for me is to add a test in
update_curr_dl() to not exit if delta_exec is zero and
dl_se->dl_yielded is true.

Signed-off-by: Steven Rostedt <ros...@goodmis.org>
---
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index cd64c979d0e1..1dd180cda574 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -735,7 +735,7 @@ static void update_curr_dl(struct rq *rq)
 	 * approach need further study.
 	 */
 	delta_exec = rq_clock_task(rq) - curr->se.exec_start;
-	if (unlikely((s64)delta_exec <= 0))
+	if (unlikely((s64)delta_exec <= 0 && !dl_se->dl_yielded))
 		return;

 	schedstat_set(curr->se.statistics.exec_max,

Juri Lelli

Feb 15, 2016, 5:18:38 AM

to Steven Rostedt, LKML, Juri Lelli, Peter Zijlstra, Ingo Molnar, Clark Williams, Daniel Bristot de Oliveira, John Kacur

Hi,

This looks good to me. Do you think we could also skip some of the
following updates/accounting in this case? Not sure we win anything by
doing that, though.

Thanks,

- Juri

Daniel Bristot de Oliveira

Feb 15, 2016, 7:37:36 AM

to Juri Lelli, Steven Rostedt, LKML, Juri Lelli, Peter Zijlstra, Ingo Molnar, Clark Williams, John Kacur

On 02/15/2016 08:18 AM, Juri Lelli wrote:
> Do you think we could also skip some of the
> following updates/accounting in this case? Not sure we win anything by
> doing that, though.

I reviewed rostedt's patch and the following updates/accounting
operations. I agree with rostedt's patch, and also agree that
if (delta_exec == 0) it is a good idea to skip some += 0 and
function calls of the next updates/accounting operations,
before the if (dl_runtime_exceeded...).

Steven Rostedt

Feb 15, 2016, 11:22:15 AM

to Juri Lelli, LKML, Juri Lelli, Peter Zijlstra, Ingo Molnar, Clark Williams, Daniel Bristot de Oliveira, John Kacur

On Mon, 15 Feb 2016 10:18:24 +0000
Juri Lelli <juri....@arm.com> wrote:

> > Signed-off-by: Steven Rostedt <ros...@goodmis.org>
> > ---
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index cd64c979d0e1..1dd180cda574 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -735,7 +735,7 @@ static void update_curr_dl(struct rq *rq)
> > * approach need further study.
> > */
> > delta_exec = rq_clock_task(rq) - curr->se.exec_start;
> > - if (unlikely((s64)delta_exec <= 0))
> > + if (unlikely((s64)delta_exec <= 0 && !dl_se->dl_yielded))
> > return;
> >
>
> This looks good to me. Do you think we could also skip some of the
> following updates/accounting in this case? Not sure we win anything by
> doing that, though.
>

Well, I would say we get this patch in first and think about other
updates second. This fixes one bug, might as well pull it in.

I'm now looking into a second bug. I'm getting:

RT throttling activated

and

DL replenish lagged to much

messages, back to back, when I'm only using 50% of the bandwidth.
Looks to be a leak of how much is being used. The big issue here is
that these messages kill the test due to the latency caused to perform
the printk(). After the messages are splatted out (they only print once
per boot), the tests run fine again. IOW, there seems to be no real
issue of something doing too much bandwidth.

I get this with or without this current patch.

-- Steve

Peter Zijlstra

Feb 23, 2016, 7:28:33 AM

to Steven Rostedt, LKML, Juri Lelli, Ingo Molnar, Clark Williams, Daniel Bristot de Oliveira, John Kacur

On Fri, Feb 12, 2016 at 06:10:20PM -0500, Steven Rostedt wrote:
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index cd64c979d0e1..1dd180cda574 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -735,7 +735,7 @@ static void update_curr_dl(struct rq *rq)
> * approach need further study.
> */
> delta_exec = rq_clock_task(rq) - curr->se.exec_start;
> - if (unlikely((s64)delta_exec <= 0))
> + if (unlikely((s64)delta_exec <= 0 && !dl_se->dl_yielded))
> return;
>
> schedstat_set(curr->se.statistics.exec_max,

Would something like this make sense instead?

It also retains the ->runtime while yielded, and would actually 'fix' a
case where, when we call yield, we would have had a negative runtime
after update_curr_dl().

The current code will 'gift' us extra runtime in that case.

---
kernel/sched/deadline.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 57b939c81bce..c2bca80d3388 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -399,6 +399,9 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se,
 		dl_se->runtime = pi_se->dl_runtime;
 	}

+	if (dl_se->dl_yielded && dl_se->runtime > 0)
+		dl_se->runtime = 0;
+
 	/*
 	 * We keep moving the deadline away until we get some
 	 * available runtime for the entity. This ensures correct
@@ -735,8 +738,11 @@ static void update_curr_dl(struct rq *rq)
 	 * approach need further study.
 	 */
 	delta_exec = rq_clock_task(rq) - curr->se.exec_start;
-	if (unlikely((s64)delta_exec <= 0))
+	if (unlikely((s64)delta_exec <= 0)) {
+		if (unlikely(dl_se->dl_yielded))
+			goto throttle;
 		return;
+	}

 	schedstat_set(curr->se.statistics.exec_max,
 		      max(curr->se.statistics.exec_max, delta_exec));
@@ -749,8 +755,10 @@ static void update_curr_dl(struct rq *rq)

 	sched_rt_avg_update(rq, delta_exec);

-	dl_se->runtime -= dl_se->dl_yielded ? 0 : delta_exec;
-	if (dl_runtime_exceeded(dl_se)) {
+	dl_se->runtime -= delta_exec;
+
+throttle:
+	if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
 		dl_se->dl_throttled = 1;
 		__dequeue_task_dl(rq, curr, 0);
 		if (unlikely(dl_se->dl_boosted || !start_dl_timer(curr)))
@@ -1002,10 +1010,8 @@ static void yield_task_dl(struct rq *rq)
 	 * it and the bandwidth timer will wake it up and will give it
 	 * new scheduling parameters (thanks to dl_yielded=1).
 	 */
-	if (p->dl.runtime > 0) {
-		rq->curr->dl.dl_yielded = 1;
-		p->dl.runtime = 0;
-	}
+	rq->curr->dl.dl_yielded = 1;
+
 	update_rq_clock(rq);
 	update_curr_dl(rq);
 	/*

Steven Rostedt

Feb 23, 2016, 8:13:07 AM

to Peter Zijlstra, LKML, Juri Lelli, Ingo Molnar, Clark Williams, Daniel Bristot de Oliveira, John Kacur

On Tue, 23 Feb 2016 13:28:22 +0100
Peter Zijlstra <pet...@infradead.org> wrote:

> On Fri, Feb 12, 2016 at 06:10:20PM -0500, Steven Rostedt wrote:
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index cd64c979d0e1..1dd180cda574 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -735,7 +735,7 @@ static void update_curr_dl(struct rq *rq)
> > * approach need further study.
> > */
> > delta_exec = rq_clock_task(rq) - curr->se.exec_start;
> > - if (unlikely((s64)delta_exec <= 0))
> > + if (unlikely((s64)delta_exec <= 0 && !dl_se->dl_yielded))
> > return;
> >
> > schedstat_set(curr->se.statistics.exec_max,
>
>
> Would something like this make sense instead?
>

I'll test it and see if it works.

-- Steve

Steven Rostedt

Feb 23, 2016, 10:04:36 AM

to Peter Zijlstra, LKML, Juri Lelli, Ingo Molnar, Clark Williams, Daniel Bristot de Oliveira, John Kacur

On Tue, 23 Feb 2016 13:28:22 +0100
Peter Zijlstra <pet...@infradead.org> wrote:

> Would something like this make sense instead?

It works perfectly.

Reported-by: Steven Rostedt <ros...@goodmis.org>
Tested-by: Steven Rostedt <ros...@goodmis.org>

Thanks!

-- Steve

tip-bot for Peter Zijlstra

Feb 29, 2016, 6:14:50 AM

to linux-ti...@vger.kernel.org, ros...@goodmis.org, linux-...@vger.kernel.org, will...@redhat.com, juri....@gmail.com, h...@zytor.com, torv...@linux-foundation.org, tg...@linutronix.de, jka...@redhat.com, mi...@kernel.org, pet...@infradead.org, bri...@redhat.com

Commit-ID: 48be3a67da7413d62e5efbcf2c73a9dddf61fb96
Gitweb: http://git.kernel.org/tip/48be3a67da7413d62e5efbcf2c73a9dddf61fb96
Author: Peter Zijlstra <pet...@infradead.org>
AuthorDate: Tue, 23 Feb 2016 13:28:22 +0100
Committer: Ingo Molnar <mi...@kernel.org>
CommitDate: Mon, 29 Feb 2016 09:41:51 +0100

sched/deadline: Always calculate end of period on sched_yield()

Steven noticed that occasionally a sched_yield() call would not result
in a wait for the next period edge as expected.

It turns out that when we call update_curr_dl() and end up with
delta_exec <= 0, we will bail early and fail to throttle.

Further inspection of the yield code revealed that yield_task_dl()
clearing dl.runtime is wrong too, it will not account the last bit of
runtime which could result in dl.runtime < 0, which in turn means that
replenish would gift us with too much runtime.

Fix both issues by not relying on the dl.runtime value for yield.

Reported-by: Steven Rostedt <ros...@goodmis.org>
Tested-by: Steven Rostedt <ros...@goodmis.org>

Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
Cc: Clark Williams <will...@redhat.com>
Cc: Daniel Bristot de Oliveira <bri...@redhat.com>
Cc: John Kacur <jka...@redhat.com>
Cc: Juri Lelli <juri....@gmail.com>
Cc: Linus Torvalds <torv...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <tg...@linutronix.de>
Link: http://lkml.kernel.org/r/2016022312...@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
kernel/sched/deadline.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 57b939c..04a569c 100644

@@ -994,18 +1002,14 @@ static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
  */
 static void yield_task_dl(struct rq *rq)
 {
-	struct task_struct *p = rq->curr;
-
 	/*
 	 * We make the task go to sleep until its current deadline by
 	 * forcing its runtime to zero. This way, update_curr_dl() stops
