[BUG] "perf top" results in "NOHZ: local_softirq

Heiko Carstens

unread,

Dec 7, 2010, 7:45:02 AM12/7/10

to Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, Paul Mackerras, Thomas Gleixner, linux-...@vger.kernel.org

While playing around with perf I realized that "perf top" immediatly results
in a "NOHZ: local_softirq_pending 100" message to the console.

0x100 means that a HRTIMER_SOFTIRQ is pending when the cpu tries to disable
the tick.
In perf_event.c we have a call to __hrtimer_start_range_ns() in
perf_swevent_start_hrtimer() where its wakeup parameter is zero.
__hrtimer_start_range_ns() in turn will call hrtimer_enqueue_reprogram()
which will call __raise_softirq_irqoff(HRTIMER_SOFTIRQ) (since wakeup is
zero).
That means that just the HRTIMER_SOFTIRQ bit gets set in the softirq
pending field, but wakeup_softirqd() doesn't get called.

As far as I could see this function gets called from process context with
a spinlock held and hence we don't have any guarantee that this pending
softirq get executed before the idle task gets scheduled and tries to
disable the tick.

The easiest fix would be to set wakeup to one (see patch below), but I guess
there is a reason why its zero. Anybody?

---
kernel/perf_event.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index eac7e33..958b3e0 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -4942,7 +4942,7 @@ static void perf_swevent_start_hrtimer(struct perf_event *event)
}
__hrtimer_start_range_ns(&hwc->hrtimer,
ns_to_ktime(period), 0,
- HRTIMER_MODE_REL_PINNED, 0);
+ HRTIMER_MODE_REL_PINNED, 1);
}
}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Heiko Carstens

unread,

Dec 7, 2010, 8:30:03 AM12/7/10

to Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, Paul Mackerras, Thomas Gleixner, linux-...@vger.kernel.org

Ah. With this patch "perf record" might deadlock.
That's the reason why wakeup is zero. Tough luck.

Peter Zijlstra

unread,

Dec 7, 2010, 8:51:34 AM12/7/10

to Heiko Carstens, Ingo Molnar, Frederic Weisbecker, Paul Mackerras, Thomas Gleixner, linux-...@vger.kernel.org

On Tue, 2010-12-07 at 14:29 +0100, Heiko Carstens wrote:
> > As far as I could see this function gets called from process context with
> > a spinlock held and hence we don't have any guarantee that this pending
> > softirq get executed before the idle task gets scheduled and tries to
> > disable the tick.
> >
> > The easiest fix would be to set wakeup to one (see patch below), but I guess
> > there is a reason why its zero. Anybody?

We can start that hrtimer from within the scheduler function while
holding the rq->lock, doing a wakeup from there is not sane.

The best solution would be to fix the hrtimer_start*() interface,
something Thomas and I have wanted to do for ages but because we've
procrastinated is now a much larger job than it was :/

The whole HRTIMER_SOFTIRQ thing should die.. but for that to happen its
only use-case today must first go.

The problem is trying to start a timer with already elapsed time.
Preferably hrtimer_start*() would simply return -ETIME and let the
caller sort it, sadly the current behaviour is to 'fix' it for the
caller by enqueueing the timer onto the softirq list and raising the
softirq.

I guess we could make hrtimer_start*(.wakeup=false) return the -ENOTIME
thing and audit those few use-cases.

Thomas Gleixner

unread,

Dec 7, 2010, 9:07:58 AM12/7/10

to Peter Zijlstra, Heiko Carstens, Ingo Molnar, Frederic Weisbecker, Paul Mackerras, linux-...@vger.kernel.org

On Tue, 7 Dec 2010, Peter Zijlstra wrote:
> On Tue, 2010-12-07 at 14:29 +0100, Heiko Carstens wrote:
> > > As far as I could see this function gets called from process context with
> > > a spinlock held and hence we don't have any guarantee that this pending
> > > softirq get executed before the idle task gets scheduled and tries to
> > > disable the tick.
> > >
> > > The easiest fix would be to set wakeup to one (see patch below), but I guess
> > > there is a reason why its zero. Anybody?
>
> We can start that hrtimer from within the scheduler function while
> holding the rq->lock, doing a wakeup from there is not sane.
>
> The best solution would be to fix the hrtimer_start*() interface,
> something Thomas and I have wanted to do for ages but because we've
> procrastinated is now a much larger job than it was :/
>
> The whole HRTIMER_SOFTIRQ thing should die.. but for that to happen its
> only use-case today must first go.
>
> The problem is trying to start a timer with already elapsed time.
> Preferably hrtimer_start*() would simply return -ETIME and let the
> caller sort it, sadly the current behaviour is to 'fix' it for the
> caller by enqueueing the timer onto the softirq list and raising the
> softirq.
>
> I guess we could make hrtimer_start*(.wakeup=false) return the -ENOTIME
> thing and audit those few use-cases.

That would be sensible anyway.

[BUG] "perf top" results in "NOHZ: local_softirq_pending 100"

Heiko Carstens

Heiko Carstens

Peter Zijlstra

Thomas Gleixner