tasklet_kill will always hang for recursive tasklets on a UP

Nagendra Singh Tomar

unread,

Aug 25, 2003, 8:10:11 AM8/25/03

to

Hi,
While going thru the code for tasklet_kill(), I cannot figure out
how recursive tasklets (tasklets that schedule themselves from within
their tasklet handler) can be killed by this function. To me it looks that
tasklet_kill will never complete for such tasklets.

void tasklet_kill(struct tasklet_struct *t)
{
...
...
while (test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
current->state = TASK_RUNNING;
do
sys_sched_yield();
while (test_bit(TASKLET_STATE_SCHED, &t->state));
}
...
...
}

The above while loop will only exit if TASKLET_STATE_SCHED is not set
(tasklet is not scheduled).
Now if we see tasklet_action

static void tasklet_action(struct softirq_action *a)
{
...
...
if (!atomic_read(&t->count)) {
--> TASKLET_STATE_SCHED is set here
if(!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
BUG();
t->func(t->data);
--> if we schedule the tasklet inside its handler,
--> TASKLET_STATE_SCHED will be set here also
tasklet_unlock(t);
continue;
}
...
...
}

The only small window when TASKLET_STATE_SCHED is not set is between the
time when test_and_clear_bit above clears it and by the time the tasklet
handler again calls tasklet_schedule(). But since tasklet_kill is called
from user context the while loop in tasklet_kill checking for
TASKLET_STATE_SCHED to be cleared cannot interleave between the above two
lines in tasklet_action and hence tasklet_kill will never come out of the
while loop.
This is true only for UP machines.

Pleae point me out if I am missing something.

Thanx
tomar

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Nagendra Singh Tomar

unread,

Aug 25, 2003, 10:10:14 AM8/25/03

to

Sorry, I forgot to mention my kernel version.
It is 2.14.18-14 (RH-8.0 stock kernel)

Thanx
tomar

Juergen Quade

unread,

Aug 25, 2003, 10:20:14 AM8/25/03

to

> Hi,
> While going thru the code for tasklet_kill(), I cannot figure out
> how recursive tasklets (tasklets that schedule themselves from within
> their tasklet handler) can be killed by this function. To me it looks that
> tasklet_kill will never complete for such tasklets.

It is realy a sophisticated piece of code! I think it is not
the only bug you found. Some weeks ago I pointed out another
problem with tasklet_kill but got no answer.

To work our questions out is not done in just 1 minute :-(
And I was not able to find the person, who is responsible for the code.

As far as I can see, you missed nothing.
The tasklet enters itself to the "task_vec" list, because the
SCHED-Bit is always resetted, when "tasklet_schedule" is called.
It will always succeed.

Maybe you have a look to another (my) problem:

The function "tasklet_schedule" schedules a tasklet only, if the SCHED-Bit
ist _not_ set. So the trick is, to _set_ the SCHED-Bit and
to _not_ enter the tasklet in the "task_vec" list (ok, you showed
that this trick can fail). But anyway, if you look at the
code, tasklet_kill resets the bit in any case!!! It would have to
set the bit, not to reset it. Any comments?

void tasklet_kill(struct tasklet_struct *t)
{
...

while (test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
do
yield();
while (test_bit(TASKLET_STATE_SCHED, &t->state));
}
tasklet_unlock_wait(t);
clear_bit(TASKLET_STATE_SCHED, &t->state);

Nagendra Singh Tomar

unread,

Aug 26, 2003, 1:20:08 AM8/26/03

to

Hi Juergen,
Thanx for ur inputs. I think that I am missing something in ur
explanation. Can u please elaborate. In the meantime, the approach that I
will like is to have another state TASKLET_STATE_KILLED so the code
changes that need to be done are

void tasklet_kill(struct tasklet_struct *t)
{

...
...
/*
* Mark the tasklet as killed, so the next time around
* tasklet_action does not call the handler for this tasklet
*/
set_bit(TASKLET_STATE_KILLED, &t->state); <-- ADDED

while (test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
current->state = TASK_RUNNING;
do
sys_sched_yield();
while (test_bit(TASKLET_STATE_SCHED, &t->state));
}
...
...
}

Now inside tasklet_action if the state is killed we will not call the
tasklet handler, thus not giving recursive tasklets to again schedule.

static void tasklet_action(struct softirq_action *a)
{
...
...
if (!atomic_read(&t->count)) {

if(!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
BUG();
/*
* If the tasklet_kill has been called for this tasklet,
* don't run it again, else we have a hang
*/
if(!test_bit(TASKLET_STATE_KILLED, &t->state)) <-- ADDED
t->func(t->data);
tasklet_unlock(t);
continue;
}
...
...
}

Thanx
tomar

Werner Almesberger

unread,

Aug 26, 2003, 1:50:15 AM8/26/03

to

Nagendra Singh Tomar wrote:
> While going thru the code for tasklet_kill(), I cannot figure out
> how recursive tasklets (tasklets that schedule themselves from within
> their tasklet handler) can be killed by this function. To me it looks that
> tasklet_kill will never complete for such tasklets.

That's also what I found when looking at it a while ago. This isn't
necessarily a bug of tasklet_kill, just some behaviour that needs
to be documented.

You can always introduce a flag that tells the tasklet if it should
reschedule itself, and clear that flag before calling tasklet_kill.

When I looked at it (I think this was in some 2.4 kernel), it also
seemed that tasklet_kill could loop forever if the tasklet is
scheduled but disabled.

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina w...@almesberger.net /
/_http://www.almesberger.net/____________________________________________/

Nagendra Singh Tomar

unread,

Aug 26, 2003, 3:00:08 AM8/26/03

to

Hi,

On Tue, 26 Aug 2003, Werner Almesberger wrote:

> Nagendra Singh Tomar wrote:
> > While going thru the code for tasklet_kill(), I cannot figure
> out
> > how recursive tasklets (tasklets that schedule themselves from within
> > their tasklet handler) can be killed by this function. To me it looks
> that
> > tasklet_kill will never complete for such tasklets.
>
> That's also what I found when looking at it a while ago. This isn't
> necessarily a bug of tasklet_kill, just some behaviour that needs
> to be documented.

I fail to understand this. How can we say that its not a bug. If we
support recursive tasklets, we should support killing them also. If we can
do it why not do it. Is there any reason for that.

>
> You can always introduce a flag that tells the tasklet if it should
> reschedule itself, and clear that flag before calling tasklet_kill.
>
> When I looked at it (I think this was in some 2.4 kernel), it also
> seemed that tasklet_kill could loop forever if the tasklet is
> scheduled but disabled.

You r right. Its a similar problem. TASKLET_STATE_SCHED will never get
reset for disabled tasklets.
I feel that these issues can be addresses easily by adding a couple of
checks.

>
> - Werner
>
>

Thanx
tomar

Werner Almesberger

unread,

Aug 26, 2003, 3:50:06 AM8/26/03

to

Nagendra Singh Tomar wrote:
> I fail to understand this. How can we say that its not a bug. If we
> support recursive tasklets, we should support killing them also. If we can
> do it why not do it. Is there any reason for that.

It's just a question of how you define "to kill" :-) But the
naming is ambiguous, because people may indeed expect
tasklet_kill to work like kill(2).

Obviously, tasklet_kill could set a flag that prevents a
tasklet from rescheduling itself. But then you'd change
the semantics of tasklet_schedule, and in many cases, you'd
still need some flag to tell you what has happened.

Example: if a tasklet allocates some resources, and frees
them when running the next time, you'd need a flag that
tells the caller(s) of tasklet_kill whether there are
still such resources that need freeing.

The current mechanism makes sure that the tasklet will
execute one last time, if scheduled before tasklet_kill.

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina w...@almesberger.net /
/_http://www.almesberger.net/____________________________________________/

Juergen Quade

unread,

Aug 26, 2003, 4:50:09 AM8/26/03

to

On Tue, Aug 26, 2003 at 04:38:02AM -0300, Werner Almesberger wrote:
> Nagendra Singh Tomar wrote:
> > I fail to understand this. How can we say that its not a bug. If we
> > support recursive tasklets, we should support killing them also. If we can
> > do it why not do it. Is there any reason for that.
>
> It's just a question of how you define "to kill" :-) But the
> naming is ambiguous, because people may indeed expect
> tasklet_kill to work like kill(2).

> ...

> Example: if a tasklet allocates some resources, and frees
> them when running the next time, you'd need a flag that
> tells the caller(s) of tasklet_kill whether there are
> still such resources that need freeing.

Is it really used in this way somewhere?

I always thought, a tasklet is a self-contained
(lets say stateless) function.

For more we have kernel-threads.

> The current mechanism makes sure that the tasklet will
> execute one last time, if scheduled before tasklet_kill.

If your tasklet has states, you can't know, in which
state it is, when you call "tasklet_kill".
Can this work reliable?

Juergen.

Werner Almesberger

unread,

Aug 26, 2003, 2:10:06 PM8/26/03

to

Juergen Quade wrote:
> Can this work reliable?

Hmm, actually, no. On UP, yes. But on SMP, you might tasklet_kill
while the tasklet is running, but before it has had a chance to
tasklet_schedule itself. tasklet_schedule will have no effect in
this case.

Alexey, if my observation is correct, the property

| * If tasklet_schedule() is called, then tasklet is guaranteed
| to be executed on some cpu at least once after this.

does not hold if using tasklet_kill on SMP.

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina w...@almesberger.net /
/_http://www.almesberger.net/____________________________________________/

kuz...@ms2.inr.ac.ru

unread,

Aug 26, 2003, 9:50:15 PM8/26/03

to

Hello!

> Hmm, actually, no. On UP, yes. But on SMP, you might tasklet_kill
> while the tasklet is running, but before it has had a chance to
> tasklet_schedule itself. tasklet_schedule will have no effect in
> this case.
>
> Alexey, if my observation is correct, the property
>
> | * If tasklet_schedule() is called, then tasklet is guaranteed
> | to be executed on some cpu at least once after this.
>
> does not hold if using tasklet_kill on SMP.

It still holds. tasklet_kill just waits for completion of scheduled
events. Well, it _assumes_ that cpu which calls tasklet_schedule
does not try to wake the tasklet after death. But it is from area
of pure scholastics already: waker and killer have to synchronize in some
way anyway.

Alexey

Nagendra Singh Tomar

unread,

Aug 27, 2003, 12:30:12 AM8/27/03

to

Hi,

On Wed, 27 Aug 2003 kuz...@ms2.inr.ac.ru wrote:

> Hello!
>
> > Hmm, actually, no. On UP, yes. But on SMP, you might tasklet_kill
> > while the tasklet is running, but before it has had a chance to
> > tasklet_schedule itself. tasklet_schedule will have no effect in
> > this case.
> >
> > Alexey, if my observation is correct, the property
> >
> > | * If tasklet_schedule() is called, then tasklet is guaranteed
> > | to be executed on some cpu at least once after this.
> >
> > does not hold if using tasklet_kill on SMP.
>
> It still holds. tasklet_kill just waits for completion of scheduled
> events. Well, it _assumes_ that cpu which calls tasklet_schedule
> does not try to wake the tasklet after death. But it is from area
> of pure scholastics already: waker and killer have to synchronize in
> some
> way anyway.

I didn't really understand this one. What Werner says seems correct
though. For recursive tasklets one of the two things are bound to happen.
Either the tasklet_kill hangs (which will happen on UP) or one (read last)
tasklet_schedule is not honoured (which will happen on SMP). On SMP the
user context tasklet_kill gets a chance to exit the
"while (test_bit(TASKLET_STATE_SCHED, &t->state))" loop as it gets a
chance to run parallely with tasklet_action in the small window (during
which TASKLET_STATE_SCHED is not set) that I mentioned in one of my
earlier mails.

So I believe that at least one (to be precise, the last one called before
tasklet dies) tasklet_schedule is not honoured.

Thanx,

tomar

Juergen Quade

unread,

Aug 27, 2003, 2:30:19 PM8/27/03

to

> Thanx for ur inputs. I think that I am missing something in ur
> explanation. Can u please elaborate. In the meantime, the approach that I

Maybe it is easier we make it the other way round.
If you look at the code of tasklet_kill:

void tasklet_kill(struct tasklet_struct *t)
{
if (in_interrupt())
printk("Attempt to kill tasklet from interrupt\n");

while (test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
do
yield();
while (test_bit(TASKLET_STATE_SCHED, &t->state));
}
tasklet_unlock_wait(t);
clear_bit(TASKLET_STATE_SCHED, &t->state);
}

Can you explain me, what the last statement
clear_bit(TASKLET_STATE_SCHED, &t->state);
is for?

> will like is to have another state TASKLET_STATE_KILLED so the code
> changes that need to be done are
>
> void tasklet_kill(struct tasklet_struct *t)
> {
>
> ...
> ...
> /*
> * Mark the tasklet as killed, so the next time around
> * tasklet_action does not call the handler for this tasklet
> */
> set_bit(TASKLET_STATE_KILLED, &t->state); <-- ADDED

> ...
>

What I don't like on this approach is, to add another flag (=state) to
the tasklet, which might make the world more complicated as necessary.
I will take some time to think about it, but can't do that today :-(

Beside this, if you can't use a function without looking
at the code and without experimenting with it, that
must lead to bugs! IMHO, here is a call for action.

Juergen.

Nagendra Singh Tomar

unread,

Aug 28, 2003, 2:00:21 AM8/28/03

to

Juergen,
The whole tasklet_kill function is a big confusion. It is a big
misnomer as Werner rightly said. For non-recursive tasklets this
function does not do anything. Its just an expensive "nop". If you simply
call tasklet_schedule after tasklet_kill, it will execute as nothing had
happened.
If we remove the line

clear_bit(TASKLET_STATE_SCHED, &t->state);

from tasklet_kill then tasklet_kill will have the desired effect of
"killing" the tasklet, tasklet_schedule() after tasklet_kill in that case,
will not call __tasklet_kill and hence it will not be queued to the CPU
queue and hence it will not run (desired effect).

tasklet_kill I believe was written to kill recursive tasklets only
(tasklets that schedule themseves from inside their handler), but as we
have seen for that particular case it hangs.
I feel either we remove tasklet_kill or fix it to do what it should.

Alexey, can u please comment on my observations.

Thanx,
tomar

kuz...@ms2.inr.ac.ru

unread,

Aug 28, 2003, 9:30:14 AM8/28/03

to

Hello!

> Either the tasklet_kill hangs (which will happen on UP)

Never can happen, unless you are trying to call tasklet_kill
from softirq context, which is illegal.

> So I believe that at least one (to be precise, the last one called before
> tasklet dies) tasklet_schedule is not honoured.

You cannot call tasklet_schedule while kill is called. As I said in previous
mail, preventing new schedules is responsibility of callers. tasklet struct
and control functions do not maintain any information about its state, it is
for client to handle this in his preferred way.

You are right when saying the name is misnomer. tasklet_kill does not kill,
it waits for the moment when tasklet becomes really dead after client
strangled it with his own hands.

Juergen Quade

unread,

Aug 28, 2003, 11:40:10 AM8/28/03

to

On Wed, Aug 27, 2003 at 11:16:52PM +0530, Nagendra Singh Tomar wrote:
> Juergen,
> The whole tasklet_kill function is a big confusion. It is a big
> misnomer as Werner rightly said. For non-recursive tasklets this
> function does not do anything. Its just an expensive "nop". If you simply
> call tasklet_schedule after tasklet_kill, it will execute as nothing had
> happened.
> If we remove the line
>
> clear_bit(TASKLET_STATE_SCHED, &t->state);
>
> from tasklet_kill then tasklet_kill will have the desired effect of
> "killing" the tasklet, tasklet_schedule() after tasklet_kill in that case,
> will not call __tasklet_kill and hence it will not be queued to the CPU
> queue and hence it will not run (desired effect).

Here we have it! In my opintion, the line

clear_bit(TASKLET_STATE_SCHED, &t->state);

is just a _BUG_. The programmer _wanted_ to write

set_bit(TASKLET_STATE_SCHED, &t->state);

In this case the function tasklet_kill _makes sense_ (beside
the problem of not working with recursive taskets)!
It will mostly be called in the cleanup function of a module
and - yes - it would be useful.

So in my opintion
1. we should fix the bug (very easy)
2. we should find some means to make it usable for recursive tasklets.

kuz...@ms2.inr.ac.ru

unread,

Aug 28, 2003, 12:00:21 PM8/28/03

to

Hello!

> Here we have it! In my opintion, the line
>
> clear_bit(TASKLET_STATE_SCHED, &t->state);
>
> is just a _BUG_.

No, really. The sense of tasklet_kill() is that tasklet is under complete
control of caller upon exit from it. This clear_bit just makes some (only
marginally useful) reinitialization for the case the user will want
to reuse the struct. Essentially, after tasklet_unlock_wait() you can do
everything with the struct, it is not an alive object anymore.

> 2. we should find some means to make it usable for recursive tasklets.

I would not say it is easy. When tasklet is enqueued on another cpu you
have no way to stop it unless you are in process context, where you can
sit and wait for completion.

Alexey

Juergen Quade

unread,

Aug 28, 2003, 12:20:16 PM8/28/03

to

On Thu, Aug 28, 2003 at 07:53:11PM +0400, kuz...@ms2.inr.ac.ru wrote:
> Hello!
>
> > Here we have it! In my opintion, the line
> >
> > clear_bit(TASKLET_STATE_SCHED, &t->state);
> >
> > is just a _BUG_.
>
> No, really. The sense of tasklet_kill() is that tasklet is under complete
> control of caller upon exit from it. This clear_bit just makes some (only
> marginally useful) reinitialization for the case the user will want
> to reuse the struct. Essentially, after tasklet_unlock_wait() you can do
> everything with the struct, it is not an alive object anymore.

Because the function as it is written is useless, but with
changing from "clear_bit" to "set_bit" it would be - at least partly -
useful, I still believe, it is a bug. Does anybody know, who is
responsible for the function?

> > 2. we should find some means to make it usable for recursive tasklets.
>
> I would not say it is easy. When tasklet is enqueued on another cpu you
> have no way to stop it unless you are in process context, where you can
> sit and wait for completion.

For sure, not easy.
But tasklet_kill will mostly be called in process context, won't it?

Juergen.

Werner Almesberger

unread,

Aug 29, 2003, 2:10:13 AM8/29/03

to

Juergen Quade wrote:
> useful, I still believe, it is a bug. Does anybody know, who is
> responsible for the function?

If it's not Alexey himself, I'm sure he knows who is :-)

> > > 2. we should find some means to make it usable for recursive tasklets.
> >
> > I would not say it is easy. When tasklet is enqueued on another cpu you
> > have no way to stop it unless you are in process context, where you can
> > sit and wait for completion.
>
> For sure, not easy.
> But tasklet_kill will mostly be called in process context, won't it?

Ah, a misunderstanding. You meant "can be used to kill 'recursive'
tasklets" (with "recursive" = re-schedules itself). Apparently,
Alexey understood "can be used from a tasklet".

The latter would basically mean to busy loop for the other tasklet
to be scheduled, run, and complete. Not nice.

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina w...@almesberger.net /
/_http://www.almesberger.net/____________________________________________/

Nagendra Singh Tomar

unread,

Aug 29, 2003, 2:10:15 AM8/29/03

to

On Thu, 28 Aug 2003 kuz...@ms2.inr.ac.ru wrote:

> Hello!
>
> > Either the tasklet_kill hangs (which will happen on UP)
>
> Never can happen, unless you are trying to call tasklet_kill
> from softirq context, which is illegal.

If I was not explicit, I meant that tasklet_kill called from process
context, for recursive tasklets will *always* hang on a UP machine at
least till 2.4 when the kernel was not premptible. And yes, "always". The
logic says that and experimentation also shows that.

>
>
> > So I believe that at least one (to be precise, the last one called
> before
> > tasklet dies) tasklet_schedule is not honoured.
>
> You cannot call tasklet_schedule while kill is called. As I said in
> previous
> mail, preventing new schedules is responsibility of callers. tasklet
> struct
> and control functions do not maintain any information about its state,
> it is
> for client to handle this in his preferred way.

So a better name would be wait_for_tasklet_completion. I think now I
understand the
intent. If somebody is unloading a module which has scheduled a tasklet,
the module cleanup function wants to be sure that the tasklet is not
sitting on any CPU queue waiting to be executed (if that happens the
tasklet might try to access the module address space and if that happens
after the module unload we will get an OOPS). Once tasklet_kill completes
the caller of tsaklet_kill has the responsibility to make sure that it is
not scheduled again (if it is scheduled it will again start running
happily as if nothing has happened)
All is fine, but the recursive tasklet problem is still there. We need
to add another state to tasklet TASKLET_STATE_KILLED which is set once
tasklet_kill is called. Once this is set tasklet_schedule just does not
schedule the tasklet.

>
> You are right when saying the name is misnomer. tasklet_kill does not
> kill,
> it waits for the moment when tasklet becomes really dead after client
> strangled it with his own hands.
>
> Alexey
>

Thanx for making things clear.
tomar

Werner Almesberger

unread,

Aug 29, 2003, 2:10:22 AM8/29/03

to

kuz...@ms2.inr.ac.ru wrote:
> It still holds. tasklet_kill just waits for completion of scheduled
> events. Well, it _assumes_ that cpu which calls tasklet_schedule
> does not try to wake the tasklet after death.

Well, the tasklet isn't dead yet - it's still running.

> But it is from area of pure scholastics already: waker and killer
> have to synchronize in some way anyway.

Yes, all I'm saying is that one can't rely on tasklet_kill to
make a self-rescheduling tasklet go away, which, given the name,
would seem a reasonably assumption.

Also, in this case, tasklet_schedule behaves differently on SMP.
So I'd suggest to resolve all this by clarifying that
tasklet_schedule must not be called while tasklet_kill is
executing.

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina w...@almesberger.net /
/_http://www.almesberger.net/____________________________________________/

kuz...@ms2.inr.ac.ru

unread,

Sep 4, 2003, 9:30:20 AM9/4/03

to

Hello!

> All is fine, but the recursive tasklet problem is still there. We need
> to add another state to tasklet TASKLET_STATE_KILLED which is set once
> tasklet_kill is called. Once this is set tasklet_schedule just does not
> schedule the tasklet.

Maybe. But all my past experience says me that it is some thing,
next to useless. Look, if you try to schedule some event not even caring
that the event handler is going to die, you do something really wrong.
State of death is connected not to tasklet but to source of events
which wake up the tasklet and need handling inside tasklet.
So, you just cannot tasklet_kill() before the source is shutdown and,
therefore, there are no good reasons to hold the bit inside the struct.

Alexey