Hi Paul,
I've got the following race report (you don't need to read it in
detail; the explanation is below):
==================================================================
ThreadSanitizer: data-race in __change_pid
Write at 0xffff880481892a48 of size 8 by thread 2962 on CPU 8:
[< inline >] __hlist_del include/linux/list.h:640
[< inline >] hlist_del_rcu include/linux/rculist.h:345
[<ffffffff810b81e5>] __change_pid+0x85/0x120 kernel/pid.c:410
[<ffffffff810b8bc3>] detach_pid+0x23/0x30 kernel/pid.c:422
[< inline >] __unhash_process kernel/exit.c:67
[< inline >] __exit_signal kernel/exit.c:140
[<ffffffff8108ccc2>] release_task+0x5f2/0xaa0 kernel/exit.c:184
[< inline >] wait_task_zombie kernel/exit.c:1111
[<ffffffff8108ef41>] wait_consider_task+0x9f1/0x18a0 kernel/exit.c:1354
[< inline >] do_wait_thread kernel/exit.c:1417
[<ffffffff8108ff9f>] do_wait+0x1af/0x3b0 kernel/exit.c:1488
[< inline >] SYSC_wait4 kernel/exit.c:1615
[<ffffffff810906d5>] SyS_wait4+0xe5/0x160 kernel/exit.c:1584
[<ffffffff81ee3e11>] entry_SYSCALL_64_fastpath+0x31/0x95
arch/x86/entry/entry_64.S:188
Previous atomic read at 0xffff880481892a48 of size 8 by thread 2961 on CPU 5:
[<ffffffff810b7b19>] pid_task+0x29/0x60 kernel/pid.c:445
[<ffffffff810a072d>] kill_pid_info+0x2d/0x90 kernel/signal.c:1341
[< inline >] kill_something_info kernel/signal.c:1426
[<ffffffff810a0903>] SYSC_kill+0x123/0x2f0 kernel/signal.c:2906
[<ffffffff810a3520>] SyS_kill+0x20/0x30 kernel/signal.c:2896
[<ffffffff81ee3e11>] entry_SYSCALL_64_fastpath+0x31/0x95
arch/x86/entry/entry_64.S:188
Mutexes locked by thread 2962:
Mutex 195 is locked here:
[< inline >] __raw_write_lock_irq include/linux/rwlock_api_smp.h:217
[<ffffffff81ee38d4>] _raw_write_lock_irq+0x64/0x70
kernel/locking/spinlock.c:311
[<ffffffff8108c745>] release_task+0x75/0xaa0 kernel/exit.c:182
[< inline >] wait_task_zombie kernel/exit.c:1111
[<ffffffff8108ef41>] wait_consider_task+0x9f1/0x18a0 kernel/exit.c:1354
[< inline >] do_wait_thread kernel/exit.c:1417
[<ffffffff8108ff9f>] do_wait+0x1af/0x3b0 kernel/exit.c:1488
[< inline >] SYSC_wait4 kernel/exit.c:1615
[<ffffffff810906d5>] SyS_wait4+0xe5/0x160 kernel/exit.c:1584
[<ffffffff81ee3e11>] entry_SYSCALL_64_fastpath+0x31/0x95
arch/x86/entry/entry_64.S:188
Mutex 517812 is locked here:
[< inline >] spin_lock include/linux/spinlock.h:312
[< inline >] __exit_signal kernel/exit.c:93
[<ffffffff8108c7e4>] release_task+0x114/0xaa0 kernel/exit.c:184
[< inline >] wait_task_zombie kernel/exit.c:1111
[<ffffffff8108ef41>] wait_consider_task+0x9f1/0x18a0 kernel/exit.c:1354
[< inline >] do_wait_thread kernel/exit.c:1417
[<ffffffff8108ff9f>] do_wait+0x1af/0x3b0 kernel/exit.c:1488
[< inline >] SYSC_wait4 kernel/exit.c:1615
[<ffffffff810906d5>] SyS_wait4+0xe5/0x160 kernel/exit.c:1584
[<ffffffff81ee3e11>] entry_SYSCALL_64_fastpath+0x31/0x95
arch/x86/entry/entry_64.S:188
Mutex 521594 is locked here:
[< inline >] __raw_spin_lock include/linux/spinlock_api_smp.h:158
[<ffffffff81ee37d0>] _raw_spin_lock+0x50/0x70 kernel/locking/spinlock.c:151
[< inline >] spin_lock include/linux/spinlock.h:312
[< inline >] write_seqlock include/linux/seqlock.h:470
[< inline >] __exit_signal kernel/exit.c:127
[<ffffffff8108c871>] release_task+0x1a1/0xaa0 kernel/exit.c:184
[< inline >] wait_task_zombie kernel/exit.c:1111
[<ffffffff8108ef41>] wait_consider_task+0x9f1/0x18a0 kernel/exit.c:1354
[< inline >] do_wait_thread kernel/exit.c:1417
[<ffffffff8108ff9f>] do_wait+0x1af/0x3b0 kernel/exit.c:1488
[< inline >] SYSC_wait4 kernel/exit.c:1615
[<ffffffff810906d5>] SyS_wait4+0xe5/0x160 kernel/exit.c:1584
[<ffffffff81ee3e11>] entry_SYSCALL_64_fastpath+0x31/0x95
arch/x86/entry/entry_64.S:188
==================================================================
The race is on an rcu list ("rculist"). One thread does an rcu_dereference of the list head:
struct task_struct *pid_task(struct pid *pid, enum pid_type type)
{
	struct task_struct *result = NULL;
	if (pid) {
		struct hlist_node *first;
		first = rcu_dereference_check(hlist_first_rcu(&pid->tasks[type]),
					      lockdep_tasklist_lock_is_held());
		if (first)
			result = hlist_entry(first, struct task_struct,
					     pids[(type)].node);
	}
	return result;
}
EXPORT_SYMBOL(pid_task);
While another thread deletes the first element of the list:
static inline void __hlist_del(struct hlist_node *n)
{
	struct hlist_node *next = n->next;
	struct hlist_node **pprev = n->pprev;

	*pprev = next;
	if (next)
		next->pprev = pprev;
}

static inline void hlist_del_rcu(struct hlist_node *n)
{
	__hlist_del(n);
	n->pprev = LIST_POISON2;
}
KTSAN does not like the plain assignment "*pprev = next;" in __hlist_del.
As far as I understand, the intention is that removing an element does
not make any new elements visible to lock-free readers, so the store
does not need a release operation (rcu_assign_pointer). Is that
correct?
The situation is essentially a transitive pointer passage:

// global data
T *g1, *g2;

// thread 1 (the thread that allocated the node)
T *p = alloc_t();
rcu_assign_pointer(g1, p); // release

// thread 2 (the thread that does hlist_del_rcu, moving the node
// from head->next to head)
T *p = g1;
g2 = p; // plain store, no release

// thread 3
T *p = rcu_dereference(g2); // acquire
use(p);
The C/C++ memory model requires the intermediate thread to use
acquire/release to pass the pointer along, not just the first and last
threads in the chain. However, I am not sure whether this rule reflects
real hardware behavior or is merely abstraction overhead. Can you think
of any architectures where such a transitive pointer passage without
barriers could cause problems?
Thank you
--
Dmitry Vyukov, Software Engineer,
dvy...@google.com
Google Germany GmbH, Dienerstraße 12, 80331, München