[ description ]
Subject: kthread: add a memory barrier to kthread_stop()
'kthread' threads do a check in the following order:
- set_current_state(TASK_INTERRUPTIBLE);
- kthread_should_stop();
and set_current_state() implies an smp_mb().
On the other side (kthread_stop()), wake_up_process() does not seem to
guarantee a full mb.
So 'kthread_stop_info.k' must be visible before wake_up_process()
checks/modifies the state of the 'kthread' task.
(the patch is at the end of the message)
[ more detailed description ]
The current code might well be safe in the case where a to-be-stopped
'kthread' task is _not_ running on another CPU at the moment
kthread_stop() is called (in this case, 'rq->lock' acts as a kind of
synchronization point/barrier).
Another case is as follows:
CPU#0:

	...
	while (!kthread_should_stop()) {
		if (condition)
			schedule();

		/* ... do something useful ... */	<--- EIP

		set_current_state(TASK_INTERRUPTIBLE);
	}
so the 'kthread' task is about to call
set_current_state(TASK_INTERRUPTIBLE) ...
(in the meantime)
CPU#1:

	kthread_stop()
		-> kthread_stop_info.k = k	(*)
		-> wake_up_process()

wake_up_process() essentially does (in try_to_wake_up()):

	IRQ_OFF
	LOCK(rq->lock)
	old_state = p->state;
	if (!(old_state & state))	(**)
		goto out;
	...
	UNLOCK(rq->lock)
	IRQ_ON
Let's suppose (*) and (**) get reordered
(according to Documentation/memory-barriers.txt, neither IRQ_OFF nor
LOCK prevents this from happening).
- the state is TASK_RUNNING, so we are about to return without a wakeup;
- the store (*) is not yet visible to CPU#0 (it is only guaranteed to
become visible before the spin_unlock(&rq->lock) at the end of
try_to_wake_up()).
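In other words, CPU#1 is allowed to execute the sequence as if it had
been written like this (a sketch; a LOCK only guarantees that accesses
issued after it stay after it, so a store issued before the LOCK may
become visible as late as just before the UNLOCK):

	/* CPU#1, as actually executed */
	IRQ_OFF
	LOCK(rq->lock)
	old_state = p->state;		/* (**) sees TASK_RUNNING -> goto out */
	...
	kthread_stop_info.k = k;	/* (*) only now becomes visible */
	UNLOCK(rq->lock)
	IRQ_ON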
(in the meantime)
CPU#0:
	- set_current_state(TASK_INTERRUPTIBLE);
	- kthread_should_stop();
	  kthread_stop_info.k is not yet visible here, so it returns false
	- schedule()
	...
we missed a 'kthread_stop' event.
hum?
TIA,
---
From: Dmitry Adamushko <dmitry.a...@gmail.com>
Subject: kthread: add a memory barrier to kthread_stop()
'kthread' threads do a check in the following order:
- set_current_state(TASK_INTERRUPTIBLE);
- kthread_should_stop();
and set_current_state() implies an smp_mb().
On the other side (kthread_stop()), wake_up_process() is not guaranteed
to act as a full mb.
'kthread_stop_info.k' must be visible before wake_up_process() checks/
modifies the state of the 'kthread' task.
Signed-off-by: Dmitry Adamushko <dmitry.a...@gmail.com>
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 0ac8878..5167110 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -211,6 +211,10 @@ int kthread_stop(struct task_struct *k)
 	/* Now set kthread_should_stop() to true, and wake it up. */
 	kthread_stop_info.k = k;
+
+	/* The previous store operation must not get ahead of the wakeup. */
+	smp_mb();
+
 	wake_up_process(k);
 	put_task_struct(k);
--
Looks like you are correct to me.
Does this not also imply you need a matching barrier in
kthread_should_stop()?
Yes, but only when it's used in combination with something that alters
the state of the task.
So it's rather a question of interface design.
We currently impose a requirement on how the main loop of 'kthread'
threads has to be organized (ok, so it seems to dictate a policy :-).
Namely, the following sequence must be kept in order:
(1) set_current_state(TASK_INTERRUPTIBLE);
(2) kthread_should_stop()
...
(3) schedule()
and (1) already provides an mb, which becomes the "matching barrier" on
the kthread_should_stop() side.
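i.e., with the proposed smp_mb() in place, the pairing looks roughly
like this (a sketch, not the actual source):

	CPU#0 ('kthread'):
		set_current_state(TASK_INTERRUPTIBLE);	/* store + implied smp_mb() */
		if (kthread_should_stop())		/* load of kthread_stop_info.k */
			break;
		schedule();

	CPU#1 (kthread_stop()):
		kthread_stop_info.k = k;		/* store */
		smp_mb();				/* added by the patch */
		wake_up_process(k);			/* loads k->state */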
--
Best regards,
Dmitry Adamushko
The rules as written do seem to support your theory. The CPU has every
right to delay the '.k = k' store until as late as the UNLOCK operation.
On the read side there is a full barrier implied by
set_current_state(TASK_INTERRUPTIBLE); however, this only synchronises
us with the current global state, which may well not yet contain the
updated value of .k.
That seems to imply that a write memory barrier would be sufficient to
cover this.
So three comments. First, should this not be an smp_wmb()? Second, this
memory barrier is paired with the barrier in
set_current_state(TASK_INTERRUPTIBLE), and that should probably be
documented as part of this patch. Finally, I think the comment as-is is
hard to understand; I got the sense of it backwards on first reading.
Perhaps something like this:
/*
* Ensure kthread_stop_info.k is visible before wakeup, paired
* with barrier in set_current_state().
*/
-apw
yes.
>
> That seems to imply that a write memory barrier would be sufficient to
> cover this.
>
> So three comments. First, should this not be an smp_wmb()?
No. We also need to be sure that '.k = k' is visible by the moment we
check the state of the task in try_to_wake_up(), so what we need to
order is a write vs. a subsequent read, not a write vs. a write.
The point is that a 'kthread' loop does:
(1) set TASK_INTERRUPTIBLE
(2) check for .k == k
and kthread_stop() must do it in the _reverse_ order:
(1) .k = k
(2) check the task's state and wake it up if necessary.
Only this way do we ensure that a wakeup is not lost.
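i.e. on the kthread_stop() side the ordering we need is store-then-load,
which is exactly what smp_wmb() does not give us (a sketch; smp_wmb()
only orders stores against later stores):

	kthread_stop_info.k = k;	/* store */
	smp_wmb();			/* orders this store only against later stores */
	...
	/* in try_to_wake_up(): */
	old_state = p->state;		/* load -- may still be satisfied before the
					 * store above is visible; only a full smp_mb()
					 * orders a store against a subsequent load */
	if (!(old_state & state))
		goto out;		/* no wakeup */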
> Second, this
> memory barrier is paired with the barrier in
> set_current_state(TASK_INTERRUPTIBLE), and that should probably be
> documented as part of this patch. Finally, I think the comment as-is is
> hard to understand; I got the sense of it backwards on first reading.
> Perhaps something like this:
>
> /*
> * Ensure kthread_stop_info.k is visible before wakeup, paired
> * with barrier in set_current_state().
> */
Yes, I'll try to come up with a better description.
>
> -apw
>
--
Best regards,
Dmitry Adamushko
(a bit more 'nit-picking' :-)
To work properly, kthread_stop() should also impose one of the
following requirements on a 'kthread' loop:
- the loop can be interrupted _only_ as a result of
'kthread_should_stop() == true',
and no concurrent kthread_stop() calls are possible;
or
- if it can exit for some other reason, the user has to use additional
synchronization to make sure that kthread_stop() is never
called/running after the main loop has finished (which makes sense
anyway, as the 'struct task_struct *' may already be stale).
Otherwise, note the comment in kthread() that says "It might have
exited on its own, w/o kthread_stop. Check."
So let's suppose a 'kthread' really is "exiting on its own" and, at the
same time, kthread_stop() is running on another CPU... As a result, we
may end up with kthread_stop() being blocked on
wait_for_completion(&kthread_stop_info.done) with nobody left to call
complete().
That said, the requirements above don't seem to be so insane in this case.
static int kthread(void *_create)
{
	...
	if (!kthread_should_stop())
		ret = threadfn(data);	/* <--- our main loop is inside this function */

	/* It might have exited on its own, w/o kthread_stop. Check. */
	if (kthread_should_stop()) {
		kthread_stop_info.err = ret;
		complete(&kthread_stop_info.done);
	}

	return 0;
}
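For illustration, a threadfn that satisfies the first requirement (it
leaves its loop only because kthread_should_stop() returned true) would
look roughly like this; 'my_threadfn' and its work are made up, the
helpers are the real ones:

static int my_threadfn(void *data)
{
	set_current_state(TASK_INTERRUPTIBLE);
	while (!kthread_should_stop()) {
		schedule();

		/* ... do the actual work ... */

		set_current_state(TASK_INTERRUPTIBLE);
	}
	__set_current_state(TASK_RUNNING);

	/*
	 * The only way out of the loop is kthread_should_stop() == true,
	 * so kthread() is guaranteed to find it set and call complete().
	 */
	return 0;
}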
--
Best regards,
Dmitry Adamushko
(1) the main loop of kthreadd() does:

	set_current_state(TASK_INTERRUPTIBLE);
	if (list_empty(&kthread_create_list))
		schedule();

and

(2) kthread_create() does:

	spin_lock(&kthread_create_lock);
	list_add_tail(&create.list, &kthread_create_list);
	wake_up_process(kthreadd_task);
	spin_unlock(&kthread_create_lock);
Provided that:
- (1) doesn't take 'kthread_create_lock' for the
list_empty(&kthread_create_list) check
[ which is possible if list_empty() boils down to a single word-sized,
aligned read -- which is guaranteed to be atomic on any arch, iirc ]
and
- (1) and (2) can run in parallel,
then it is crucial that the modification of the list (i.e.
list_add_tail()) is complete by the moment the task's state
(kthreadd_task->state) is checked in try_to_wake_up(), i.e. the two
must not be reordered.
Which makes me think that try_to_wake_up() could be better off just
acting as a full mb.
Otherwise, a possible fix would be the patch below: this way we get an
UNLOCK followed by a LOCK, which is guaranteed to act as a full mb
(the LOCK is the rq->lock taken in try_to_wake_up()).
[ Moreover, there seems to be no need to call wake_up_process() with
'kthread_create_lock' held. ]
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -145,8 +145,8 @@ struct task_struct *kthread_create(int (*threadfn)(void *data),
 	spin_lock(&kthread_create_lock);
 	list_add_tail(&create.list, &kthread_create_list);
-	wake_up_process(kthreadd_task);
 	spin_unlock(&kthread_create_lock);
+	wake_up_process(kthreadd_task);
 	wait_for_completion(&create.done);
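With that change, the kthread_create() side does (a sketch of the
resulting ordering; the comments are mine):

	spin_lock(&kthread_create_lock);
	list_add_tail(&create.list, &kthread_create_list);
	spin_unlock(&kthread_create_lock);	/* UNLOCK ... */
	wake_up_process(kthreadd_task);		/* ... followed by the LOCK of rq->lock in
						 * try_to_wake_up(); an UNLOCK followed by a
						 * LOCK acts as a full mb, so list_add_tail()
						 * is visible before kthreadd_task->state is
						 * loaded. */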