
[PATCH, RFC] kthread: (possibly) a missing memory barrier in kthread_stop()


Dmitry Adamushko

Feb 18, 2008, 6:04:15 PM
to linux-...@vger.kernel.org, Ingo Molnar, Andrew Morton, Peter Zijlstra, Rusty Russell, dmitry.a...@gmail.com

Hi,


[ description ]

Subject: kthread: add a memory barrier to kthread_stop()

'kthread' threads do a check in the following order:
- set_current_state(TASK_INTERRUPTIBLE);
- kthread_should_stop();

and set_current_state() implies an smp_mb().

on the other side (kthread_stop()), wake_up_process() does not seem to
guarantee a full mb.

And 'kthread_stop_info.k' must be visible before wake_up_process()
checks for/modifies a state of the 'kthread' task.

(the patch is at the end of the message)


[ more detailed description ]

the current code might well be safe in case a to-be-stopped 'kthread'
task is _not_ running on another CPU at the moment when kthread_stop()
is called (in this case, 'rq->lock' will act as a kind of synch.
point/barrier).

Another case is as follows:

CPU#0:

..
while (!kthread_should_stop()) {

if (condition)
schedule();

/* ... do something useful ... */ <--- EIP

set_current_state(TASK_INTERRUPTIBLE);
}

so a 'kthread' task is about to call
set_current_state(TASK_INTERRUPTIBLE) ...


(in the mean time)

CPU#1:

kthread_stop()

-> kthread_stop_info.k = k (*)
-> wake_up_process()

wake_up_process() looks like:

(try_to_wake_up)

IRQ_OFF
LOCK

old_state = p->state;
if (!(old_state & state)) (**)
goto out;

..

UNLOCK
IRQ_ON


let's suppose (*) and (**) are reordered
(according to Documentation/memory-barriers.txt, neither IRQ_OFF nor
LOCK may prevent it from happening).

- the state is TASK_RUNNING, so we are about to return.

- CPU#1 is about to execute (*) (it's guaranteed to be done before
spin_unlock(&rq->lock) at the end of try_to_wake_up())


(in the mean time)

CPU#0:

- set_current_state(TASK_INTERRUPTIBLE);
- kthread_should_stop();

here, kthread_stop_info.k is not yet visible

- schedule()

..

we missed a 'kthread_stop' event.

hum?


TIA,

---

From: Dmitry Adamushko <dmitry.a...@gmail.com>
Subject: kthread: add a memory barrier to kthread_stop()

'kthread' threads do a check in the following order:
- set_current_state(TASK_INTERRUPTIBLE);
- kthread_should_stop();

and set_current_state() implies an smp_mb().

on the other side (kthread_stop()), wake_up_process() is not guaranteed to
act as a full mb.

'kthread_stop_info.k' must be visible before wake_up_process() checks
for/modifies a state of the 'kthread' task.


Signed-off-by: Dmitry Adamushko <dmitry.a...@gmail.com>


diff --git a/kernel/kthread.c b/kernel/kthread.c
index 0ac8878..5167110 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -211,6 +211,10 @@ int kthread_stop(struct task_struct *k)

/* Now set kthread_should_stop() to true, and wake it up. */
kthread_stop_info.k = k;
+
+ /* The previous store operation must not get ahead of the wakeup. */
+ smp_mb();
+
wake_up_process(k);
put_task_struct(k);


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Nick Piggin

Feb 19, 2008, 1:46:13 AM
to Dmitry Adamushko, linux-...@vger.kernel.org, Ingo Molnar, Andrew Morton, Peter Zijlstra, Rusty Russell
On Tuesday 19 February 2008 10:03, Dmitry Adamushko wrote:
> Hi,
>
>
> [ description ]
>
> Subject: kthread: add a memory barrier to kthread_stop()
>
> 'kthread' threads do a check in the following order:
> - set_current_state(TASK_INTERRUPTIBLE);
> - kthread_should_stop();
>
> and set_current_state() implies an smp_mb().
>
> on another side (kthread_stop), wake_up_process() does not seem to
> guarantee a full mb.
>
> And 'kthread_stop_info.k' must be visible before wake_up_process()
> checks for/modifies a state of the 'kthread' task.
>
> (the patch is at the end of the message)
>
>
> [ more detailed description ]
>
> the current code might well be safe in case a to-be-stopped 'kthread'
> task is _not_ running on another CPU at the moment when kthread_stop()
> is called (in this case, 'rq->lock' will act as a kind of synch.
> point/barrier).
>
> Another case is as follows:
>
> CPU#0:
>
> ...

> while (!kthread_should_stop()) {
>
> if (condition)
> schedule();
>
> /* ... do something useful ... */ <--- EIP
>
> set_current_state(TASK_INTERRUPTIBLE);
> }
>
> so a 'kthread' task is about to call
> set_current_state(TASK_INTERRUPTIBLE) ...
>
>
> (in the mean time)
>
> CPU#1:
>
> kthread_stop()
>
> -> kthread_stop_info.k = k (*)
> -> wake_up_process()
>
> wake_up_process() looks like:
>
> (try_to_wake_up)
>
> IRQ_OFF
> LOCK
>
> old_state = p->state;
> if (!(old_state & state)) (**)
> goto out;
>
> ...

>
> UNLOCK
> IRQ_ON
>
>
> let's suppose (*) and (**) are reordered
> (according to Documentation/memory-barriers.txt, neither IRQ_OFF nor
> LOCK may prevent it from happening).
>
> - the state is TASK_RUNNING, so we are about to return.
>
> - CPU#1 is about to execute (*) (it's guaranteed to be done before
> spin_unlock(&rq->lock) at the end of try_to_wake_up())
>
>
> (in the mean time)
>
> CPU#0:
>
> - set_current_state(TASK_INTERRUPTIBLE);
> - kthread_should_stop();
>
> here, kthread_stop_info.k is not yet visible
>
> - schedule()
>
> ...

>
> we missed a 'kthread_stop' event.
>
> hum?

Looks like you are correct to me.

Peter Zijlstra

Feb 19, 2008, 4:25:35 AM
to Nick Piggin, Dmitry Adamushko, linux-...@vger.kernel.org, Ingo Molnar, Andrew Morton, Rusty Russell, Paul E. McKenney

Does this not also imply you need a matching barrier in
kthread_should_stop() ?

Dmitry Adamushko

Feb 19, 2008, 4:53:54 AM
to Peter Zijlstra, Nick Piggin, linux-...@vger.kernel.org, Ingo Molnar, Andrew Morton, Rusty Russell, Paul E. McKenney
On 19/02/2008, Peter Zijlstra <a.p.zi...@chello.nl> wrote:
> [ ... ]

> > >
> > > From: Dmitry Adamushko <dmitry.a...@gmail.com>
> > > Subject: kthread: add a memory barrier to kthread_stop()
> > >
> > > 'kthread' threads do a check in the following order:
> > > - set_current_state(TASK_INTERRUPTIBLE);
> > > - kthread_should_stop();
> > >
> > > and set_current_state() implies an smp_mb().
> > >
> > > on another side (kthread_stop), wake_up_process() is not guaranteed to
> > > act as a full mb.
> > >
> > > 'kthread_stop_info.k' must be visible before wake_up_process() checks
> > > for/modifies a state of the 'kthread' task.
> > >
> > >
> > > Signed-off-by: Dmitry Adamushko <dmitry.a...@gmail.com>
> > >
> > >
> > > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > > index 0ac8878..5167110 100644
> > > --- a/kernel/kthread.c
> > > +++ b/kernel/kthread.c
> > > @@ -211,6 +211,10 @@ int kthread_stop(struct task_struct *k)
> > >
> > > /* Now set kthread_should_stop() to true, and wake it up. */
> > > kthread_stop_info.k = k;
> > > +
> > > + /* The previous store operation must not get ahead of the wakeup. */
> > > + smp_mb();
>
> Does this not also imply you need a matching barrier in
> kthread_should_stop() ?

Yes, but only when it's used in combination with something that alters
the state of the task.

So it's rather a question of interface design.

We currently impose a requirement on how the main loop of 'kthread'
threads (ok, so it seems to dictate a policy :-) has to be organized.
Namely, the following sequence must be kept in order:

(1) set_current_state(TASK_INTERRUPTIBLE);
(2) kthread_should_stop()
..
- schedule()

and (1) already provides a mb which becomes a "matching barrier" on
the kthread_should_stop() side.


--
Best regards,
Dmitry Adamushko

Andy Whitcroft

Feb 19, 2008, 8:01:03 AM
to Dmitry Adamushko, linux-...@vger.kernel.org, Ingo Molnar, Andrew Morton, Peter Zijlstra, Rusty Russell
On Tue, Feb 19, 2008 at 12:03:37AM +0100, Dmitry Adamushko wrote:
>
> Hi,
>
>
> [ description ]
>
> Subject: kthread: add a memory barrier to kthread_stop()
>
> 'kthread' threads do a check in the following order:
> - set_current_state(TASK_INTERRUPTIBLE);
> - kthread_should_stop();
>
> and set_current_state() implies an smp_mb().
>
> on another side (kthread_stop), wake_up_process() does not seem to
> guarantee a full mb.
>
> And 'kthread_stop_info.k' must be visible before wake_up_process()
> checks for/modifies a state of the 'kthread' task.
>
> (the patch is at the end of the message)
>
>
> [ more detailed description ]
>
> the current code might well be safe in case a to-be-stopped 'kthread'
> task is _not_ running on another CPU at the moment when kthread_stop()
> is called (in this case, 'rq->lock' will act as a kind of synch.
> point/barrier).
>
> Another case is as follows:
>
> CPU#0:
>
> ...

> while (!kthread_should_stop()) {
>
> if (condition)
> schedule();
>
> /* ... do something useful ... */ <--- EIP
>
> set_current_state(TASK_INTERRUPTIBLE);
> }
>
> so a 'kthread' task is about to call
> set_current_state(TASK_INTERRUPTIBLE) ...
>
>
> (in the mean time)
>
> CPU#1:
>
> kthread_stop()
>
> -> kthread_stop_info.k = k (*)
> -> wake_up_process()
>
> wake_up_process() looks like:
>
> (try_to_wake_up)
>
> IRQ_OFF
> LOCK
>
> old_state = p->state;
> if (!(old_state & state)) (**)
> goto out;
>
> ...

>
> UNLOCK
> IRQ_ON
>
>
> let's suppose (*) and (**) are reordered
> (according to Documentation/memory-barriers.txt, neither IRQ_OFF nor
> LOCK may prevent it from happening).
>
> - the state is TASK_RUNNING, so we are about to return.
>
> - CPU#1 is about to execute (*) (it's guaranteed to be done before
> spin_unlock(&rq->lock) at the end of try_to_wake_up())
>
>
> (in the mean time)
>
> CPU#0:
>
> - set_current_state(TASK_INTERRUPTIBLE);
> - kthread_should_stop();
>
> here, kthread_stop_info.k is not yet visible
>
> - schedule()
>
> ...

The rules as written do seem to support your theory. The CPU has every
right to delay the .k = k as late as the UNLOCK operation.

On the read-side there is a full barrier implied by the
set_current_state(TASK_INTERRUPTIBLE), however this synchronises us with
the current global state, which may well not have the updated version
of .k.

That seems to imply that a write memory barrier would be sufficient to
cover this.

So three comments. First, should this not be an smp_wmb()? Second, this
memory barrier is paired with the barrier in
set_current_state(TASK_INTERRUPTIBLE), and that should probably be
documented as part of this patch. Finally, I think the comment as-is is
hard to understand; I got the sense of it backwards on first reading.
Perhaps something like this:

/*
* Ensure kthread_stop_info.k is visible before wakeup, paired
* with barrier in set_current_state().
*/

-apw

Dmitry Adamushko

Feb 19, 2008, 8:12:04 AM
to Andy Whitcroft, linux-...@vger.kernel.org, Ingo Molnar, Andrew Morton, Peter Zijlstra, Rusty Russell
On 19/02/2008, Andy Whitcroft <a...@shadowen.org> wrote:
> [ ... ]

>
> > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > index 0ac8878..5167110 100644
> > --- a/kernel/kthread.c
> > +++ b/kernel/kthread.c
> > @@ -211,6 +211,10 @@ int kthread_stop(struct task_struct *k)
> >
> > /* Now set kthread_should_stop() to true, and wake it up. */
> > kthread_stop_info.k = k;
> > +
> > + /* The previous store operation must not get ahead of the wakeup. */
> > + smp_mb();
> > +
> > wake_up_process(k);
> > put_task_struct(k);
>
> The rules as written do seem to support your theory. The CPU has every
> right to delay the .k = k as late as the UNLOCK operation.
>
> On the read-side there is a full barrier implied by the
> set_current_state(TASK_INTERRUPTIBLE), however this synchronises us with
> the current global state, which may well not have the updated version
> of .k.

yes.

>
> That seems to imply that a write memory barrier would be sufficient to
> cover this.
>
> So three comments. First, should this not be an smp_wmb.

No. We also need to be sure that ".k = k" is visible by the moment we
check the state of the task in try_to_wake_up(), so that's a write vs. a
read op.

The point is that a 'kthread' loop does :

(1) set TASK_INTERRUPTIBLE
(2) check for .k == k

and kthread_stop() must do it in the _reverse_ order:

(1) .k = k
(2) check for a task state and wakeup if necessary.

Only this way do we ensure that a wakeup is not lost.

> Second, this
> memory barrier is paired with the barrier in
> set_current_state(TASK_INTERRUPTIBLE) and that probabally should be
> documented as part of this patch. Finally, I think the comment as is is
> hard to understand I got the sense of it backwards on first reading;
> perhaps something like this:
>
> /*
> * Ensure kthread_stop_info.k is visible before wakeup, paired
> * with barrier in set_current_state().
> */

Yes, I'll try to come up with a better description.


>
> -apw
>

--
Best regards,
Dmitry Adamushko

Dmitry Adamushko

Feb 19, 2008, 8:42:26 AM
to linux-...@vger.kernel.org, Nick Piggin, Ingo Molnar, Andrew Morton, Rusty Russell, Paul E. McKenney, Andy Whitcroft, Peter Zijlstra
btw.,

(a bit more of 'nit-picking' :-)

to work properly, kthread_stop() should also impose one of the
following requirements on a 'kthread' loop:

- a loop can be interrupted _only_ as a result of
'kthread_should_stop() == true'
and no concurrent kthread_stop() calls are possible;

or

- if it can exit due to another reason, a user has to use additional
synchronization means to make sure that kthread_stop() is never
called/running after the main loop has finished (makes sense, as the
'struct task_struct *' can be 'obsolete')

otherwise,

note the comment in kthread() that says "It might have exited on its
own, w/o kthread_stop. Check."

so let's suppose a 'kthread' is really "exiting on its own" and at the
same time, kthread_stop() is running on another CPU... as a result, we
may end up with kthread_stop() being blocked on
wait_for_completion(&kthread_stop_info.done) without anybody to call
complete().

Although, the requirements above don't seem to be so insane in this case.


static int kthread(void *_create)
{
..
if (!kthread_should_stop())
ret = threadfn(data); /* <---- our main loop is inside this function */

/* It might have exited on its own, w/o kthread_stop. Check. */
if (kthread_should_stop()) {
kthread_stop_info.err = ret;
complete(&kthread_stop_info.done);
}
return 0;
}

--
Best regards,
Dmitry Adamushko

Dmitry Adamushko

Feb 19, 2008, 5:52:51 PM
to linux-...@vger.kernel.org, Nick Piggin, Ingo Molnar, Andrew Morton, Rusty Russell, Paul E. McKenney, Andy Whitcroft, Peter Zijlstra, Linus Torvalds
humm... following the same logic, there is also a problem in kthread.c.

(1) the main loop of kthreadd() :

set_current_state(TASK_INTERRUPTIBLE);
if (list_empty(&kthread_create_list))
schedule();

and

(2) kthread_create() does:

spin_lock(&kthread_create_lock);
list_add_tail(&create.list, &kthread_create_list);
wake_up_process(kthreadd_task);
spin_unlock(&kthread_create_lock);

provided,

- (1) doesn't want to take the 'kthread_create_lock' in order to do a
check for list_empty(&kthread_create_list)
[ which is possible because list_empty() results in a single word-sized,
aligned read op -- guaranteed to be atomic on any arch, iirc ]

and

- (1) and (2) can run in parallel.

then it's crucial that the modification of the list (i.e.
list_add_tail()) is completed by the moment the state of the task
(kthreadd_task->state) is checked in try_to_wake_up(), i.e. they must
not be re-ordered.

which makes me think that try_to_wake_up() could be better off just
acting as a full mb.

otherwise, a possible fix would be:

this way we get a pair of UNLOCK/LOCK which is guaranteed to be a full mb
(the LOCK is in try_to_wake_up())

[ moreover, there seems to be no need to call wake_up_process() with
'kthread_create_lock' being held ]

--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -145,8 +145,8 @@ struct task_struct *kthread_create(int
(*threadfn)(void *data),

spin_lock(&kthread_create_lock);
list_add_tail(&create.list, &kthread_create_list);
- wake_up_process(kthreadd_task);
spin_unlock(&kthread_create_lock);
+ wake_up_process(kthreadd_task);

wait_for_completion(&create.done);
