add_timer_on: in-kernel users _all

Mathieu Desnoyers

unread,

Feb 18, 2010, 9:40:02 AM2/18/10

to

* Mathieu Desnoyers (com...@krystal.dyndns.org) wrote:
> * Thomas Gleixner (tg...@linutronix.de) wrote:
> > On Tue, 16 Feb 2010, Mathieu Desnoyers wrote:
> > > > The function is called from an IPI. That's a LTTNG problem, not a RT one.
> > >
> > > I use del_timer in IPI to delete lttng per-cpu timers on all CPUs. I
> > > have to do this because timers created with add_timer_on are documented
> > > to be incompatible with del_timer_sync():
> > >
> > > * Synchronization rules: Callers must prevent restarting of the timer,
> > > * otherwise this function is meaningless. It must not be called from
> > > * interrupt contexts. The caller must not hold locks which would prevent
> > > * completion of the timer's handler. The timer's handler must not call
> > > * add_timer_on(). Upon exit the timer is not queued and the handler is
> > > * not running on any CPU.
> >
> > Errm. The documentation says:
> >
> > "The timer's handler must not call add_timer_on()."
> >
> > It's not talking about a timer which was initialized with
> > add_timer_on().
> >
> > And your per cpu timer handlers have no requirement to call
> > add_timer_on() simply because add/mod_timer() is requeueing the timer
> > on the same cpu on which the handler runs.
> >
> > So the IPI is just a solution for a non existing problem.
>
> Oh, right. Thanks for the explanation. I'll look into moving LTTng to a
> saner del_timer_sync() scheme to delete the timers.

Double-checking this:

add_timer_on() needs to be paired with mod_timer_pinned(), otherwise
NO_HZ SMP config can rebalance the timer to a different CPU. I am fixing
this in lttng 0.194. These per-cpu timers, of course, should usually be
deferrable (they are in lttng).

(looking at kernel 2.6.32.4 here)
Looking at the kernel/time/clocksource.c watchdog, I wonder how
del_timer manages to synchronize the timer teardown. The handler,
clocksource_watchdog(), uses add_timer_on(), which prohibits using
del_timer_sync(). This seems rather odd. If we remove the watchdog and
re-add it, it may still be in use while we initialize the timer
structure.

Also, net/core/drop_monitor.c trace_drop_common usage of add_timer_on
seems odd:

Executing (AFAIK) with preempt on, data points to a per-cpu timer:

if (!timer_pending(&data->send_timer)) {
data->send_timer.expires = jiffies + dm_delay * HZ;
add_timer_on(&data->send_timer, smp_processor_id());
}

How is timer_pending synchronized with the target CPU timer wheel ?

Wait, there's more: arch/x86/kernel/cpu/mcheck/mce.c uses both
add_timer_on in its handler and del_timer_sync (which is incorrect).

arch/x86/kernel/apic/x2apic_uv_x.c almost has it right, but maybe it
should use del_timer_sync ?

arch/powerpc/platforms/chrp/setup.c should learn about
mod_timer_pinned().

Which leads to the following question: is there _any_ add_timer_on()
kernel user that's not currently buggy ? ;-) Maybe this calls for better
documentation of this interface. From what I've learn from digging into
cpufreq to debug its incorrect timer teardown last year, I fear there
are lots and lots of buggy per-cpu _and_ standard timer interface users
out there.

Maybe adding some debugging options, e.g. checking that a timer created
with add_timer_on is always modified by mod_timer_pinned, and is always
deferrable, and deleted by del_timer_sync could help discovering a
couple of outlawyer.

Thanks,

Mathieu

>
> Thanks,
>
> Mathieu
>
> >
> > Thanks,
> >
> > tglx
> >
>
> --
> Mathieu Desnoyers
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
>
> _______________________________________________
> ltt-dev mailing list
> ltt...@lists.casi.polymtl.ca
> http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Neil Horman

unread,

Feb 18, 2010, 11:20:02 AM2/18/10

to

Hm, I think I see your point here. You're suggesting that a call to one of the
tracepoint hooks in the drop monitor can race against a second call to the hook
from an interrupt context that pre-empted the first, leading to double add of
the timer? I agree, in fact I think its likely worse that that, the shared data
on the skb that I modify there can get corrupted in that case as well. I expect
a bit of refactoring paired with a local_irq_save/restore should fix that.

Thanks!
Neil

Mathieu Desnoyers

unread,

Feb 18, 2010, 11:50:01 AM2/18/10

to

Yes.

> I agree, in fact I think its likely worse that that, the shared data
> on the skb that I modify there can get corrupted in that case as well. I expect
> a bit of refactoring paired with a local_irq_save/restore should fix that.

For this current code, you don't seem to need to delete the timer. I think
you'll have to find a way to do the timer_pending check and the add_timer_on
operations atomically wrt timer wheel execution on the CPU (which can be a
remote cpu because preemption is enabled). Your case is a bit weird.. the
typical scenario here would be to use add_timer_on directly to re-arm the timer
for a future expiration time (removing the current pending timer
unconditionnally at the same time). But you leave the timer at its current
expiration time if present, and only if not present add it. I wonder if it's
really your intent ?

Thanks,

Mathieu

Neil Horman

unread,

Feb 18, 2010, 1:40:02 PM2/18/10

to

Yes, thats my intent. The idea is that when I note a packet is lost, I record
that in a data buffer that is part of an skb, and schedule a timer to send that
skb a second or so later. Other drops in that time frame are recorded within
the same skb. Since the drop monitor isn't built as a module currently, I don't
need to worry about deleting the timer at all, I can just let it expire.

Sounds like I need to wrap up the data extract and time modifications in a
prempt_disable/enable and local_irq_save/restore set
Neil

> Thanks,
>
> Mathieu

Mathieu Desnoyers

unread,

Feb 23, 2010, 6:20:02 PM2/23/10

to

I fear this will probably break the RT kernel, because the timer spinlocks
become sleepable (I had the same issue within LTTng). Another approach you could
try is to add a new member to the timer API, which hold the local timer base
lock while:

if the timer is already pending
- waits for already executing timers to complete their execution.
- add_timer_on local cpu with new expiration.
else
- do nothing

Thanks,

Mathieu

add_timer_on: in-kernel users _all_ buggy ?

Mathieu Desnoyers

Neil Horman

Mathieu Desnoyers

Neil Horman

Mathieu Desnoyers