[syzbot] [mm?] KCSAN: data-race in mtree_range_walk / rcu_segcblist_enqueue (2)

syzbot

Jun 21, 2024, 9:29:26 AM
to Liam.H...@oracle.com, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, lsto...@gmail.com, syzkall...@googlegroups.com, vba...@suse.cz
Hello,

syzbot found the following issue on:

HEAD commit: 50736169ecc8 Merge tag 'for-6.10-rc4-tag' of git://git.ker..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=164ec02a980000
kernel config: https://syzkaller.appspot.com/x/.config?x=704451bc2941bcb0
dashboard link: https://syzkaller.appspot.com/bug?extid=9bb7d0f2fdb4229b9d67
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/e4cbed12fec1/disk-50736169.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/d50b5dcae4cd/vmlinux-50736169.xz
kernel image: https://storage.googleapis.com/syzbot-assets/f2c14c5fcce2/bzImage-50736169.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+9bb7d0...@syzkaller.appspotmail.com

==================================================================
BUG: KCSAN: data-race in mtree_range_walk / rcu_segcblist_enqueue

write to 0xffff888104077308 of 8 bytes by task 12265 on cpu 1:
rcu_segcblist_enqueue+0x67/0xb0 kernel/rcu/rcu_segcblist.c:345
rcutree_enqueue kernel/rcu/tree.c:2940 [inline]
call_rcu_core kernel/rcu/tree.c:2957 [inline]
__call_rcu_common kernel/rcu/tree.c:3093 [inline]
call_rcu+0x1bd/0x430 kernel/rcu/tree.c:3176
ma_free_rcu lib/maple_tree.c:197 [inline]
mas_free lib/maple_tree.c:1304 [inline]
mas_replace_node+0x2f8/0x440 lib/maple_tree.c:1741
mas_wr_node_store lib/maple_tree.c:3956 [inline]
mas_wr_modify+0x2bc3/0x3c90 lib/maple_tree.c:4189
mas_wr_store_entry+0x250/0x390 lib/maple_tree.c:4229
mas_store_prealloc+0x151/0x2b0 lib/maple_tree.c:5485
vma_iter_store mm/internal.h:1398 [inline]
vma_complete+0x3a7/0x760 mm/mmap.c:535
__split_vma+0x623/0x690 mm/mmap.c:2440
split_vma mm/mmap.c:2466 [inline]
vma_modify+0x198/0x1f0 mm/mmap.c:2507
vma_modify_flags include/linux/mm.h:3347 [inline]
mprotect_fixup+0x335/0x610 mm/mprotect.c:637
do_mprotect_pkey+0x673/0x9a0 mm/mprotect.c:820
__do_sys_mprotect mm/mprotect.c:841 [inline]
__se_sys_mprotect mm/mprotect.c:838 [inline]
__x64_sys_mprotect+0x48/0x60 mm/mprotect.c:838
x64_sys_call+0x26f5/0x2d70 arch/x86/include/generated/asm/syscalls_64.h:11
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f

read to 0xffff888104077308 of 8 bytes by task 12266 on cpu 0:
mtree_range_walk+0x140/0x460 lib/maple_tree.c:2774
mas_state_walk lib/maple_tree.c:3678 [inline]
mas_walk+0x16e/0x320 lib/maple_tree.c:4909
lock_vma_under_rcu+0x84/0x260 mm/memory.c:5840
do_user_addr_fault arch/x86/mm/fault.c:1329 [inline]
handle_page_fault arch/x86/mm/fault.c:1481 [inline]
exc_page_fault+0x150/0x650 arch/x86/mm/fault.c:1539
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 12266 Comm: syz-executor.3 Not tainted 6.10.0-rc4-syzkaller-00148-g50736169ecc8 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
==================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Marco Elver

Jun 21, 2024, 11:28:59 AM
to liam.h...@oracle.com, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, lsto...@gmail.com, syzkall...@googlegroups.com, vba...@suse.cz, RCU, Paul E. McKenney, Joel Fernandes
[+Cc rcu folks]
This is not an ordinary data race. I suspect this to be an incorrect
use of RCU, resulting in some kind of use-after-free / type-confusion.

The access within rcu_segcblist_enqueue() is to maple_node::rcu (at
offset 8 into maple_node). The racing access in mtree_range_walk() is
to either maple_node::mr64::pivot[0] or maple_node::ma64::pivot[0]
(both also offset 8 into maple_node).
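The overlap described above can be illustrated with a simplified, hypothetical layout. This is only a sketch: the `sketch_*` names, the slot count, and the reduced set of union members are assumptions for illustration and are not the kernel's actual `struct maple_node` definition. It shows how an RCU callback head and the first pivot can share the same offset when they are different views of one node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct rcu_head (two pointers). */
struct sketch_rcu_head {
	struct sketch_rcu_head *next;
	void (*func)(struct sketch_rcu_head *head);
};

/* Assumed slot count, chosen only for illustration. */
#define SKETCH_SLOTS 16

/* Range-node view: parent pointer first, then the pivots. */
struct sketch_range_64 {
	void *parent;
	unsigned long pivot[SKETCH_SLOTS - 1];
	void *slot[SKETCH_SLOTS];
};

/* One node, several overlapping views of the same memory. */
union sketch_node {
	struct {
		void *pad;                  /* overlays the parent pointer */
		struct sketch_rcu_head rcu; /* used once the node is being freed */
	};
	struct sketch_range_64 mr64;
};

/* Both members start right after the first pointer-sized field. */
_Static_assert(offsetof(union sketch_node, rcu) ==
	       offsetof(struct sketch_range_64, pivot),
	       "rcu head and pivot[0] share an offset");
```

On a typical 64-bit build both offsets are 8, which matches the offsets named in the report.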

Liam R. Howlett

Jun 21, 2024, 1:31:25 PM
to Marco Elver, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, lsto...@gmail.com, syzkall...@googlegroups.com, vba...@suse.cz, RCU, Paul E. McKenney, Joel Fernandes
* Marco Elver <el...@google.com> [240621 11:29]:
Since it's not freed and the reader holds the RCU read lock, there is no
use-after-free risk here.

Both are at offset 8 of the node, but there is no type confusion.

This is a false positive, which I can explain.

The reader in mtree_range_walk(), at line 2774, reads piv[0] at offset 8,
but validates that information by checking the parent pointer at offset 0
before using the value. In this case the check is on line 2793: if
(unlikely(ma_dead_node(node)))...

In the case of the reader having stale data, the data is thrown away and
the walk is started again. This node is already taken out of the tree
and will not be encountered again.

Note that all types have the same parent pointer (of undefined type
struct maple_pnode *, to catch type confusion at compile time) at offset
0.

On the writer side, lib/maple_tree.c:mte_set_node_dead() marks a node
dead by setting the struct maple_pnode *parent pointer to the address of
the node itself and then issuing smp_wmb(). This pairs with
ma_dead_node() or mte_dead_node(), which issue smp_rmb() prior to
reading the parent pointer.
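A minimal userspace sketch of this read-then-revalidate pattern follows. It is not the maple tree code: the `sketch_*` names are hypothetical, the node is reduced to two fields, and C11 fences are used as rough stand-ins for smp_wmb() and smp_rmb(), which is an approximation rather than an exact equivalent. The pivot is also a relaxed atomic here so that the sketch stays well-defined in C11.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-field node: parent at offset 0, first pivot after it. */
struct sketch_node {
	_Atomic(uintptr_t) parent;     /* equals the node's own address once dead */
	_Atomic(unsigned long) pivot0; /* may be overwritten after the node is retired */
};

/* Writer side: point parent at the node itself, then order that store. */
static void sketch_set_node_dead(struct sketch_node *node)
{
	atomic_store_explicit(&node->parent, (uintptr_t)node,
			      memory_order_relaxed);
	atomic_thread_fence(memory_order_release); /* stand-in for smp_wmb() */
}

/* Reader side: order earlier loads before re-reading the parent pointer. */
static bool sketch_node_is_dead(struct sketch_node *node)
{
	atomic_thread_fence(memory_order_acquire); /* stand-in for smp_rmb() */
	return atomic_load_explicit(&node->parent, memory_order_relaxed) ==
	       (uintptr_t)node;
}

/*
 * Read the pivot first, validate afterwards. A false return means the
 * value may be stale, so it is discarded and the caller restarts its walk.
 */
static bool sketch_read_pivot(struct sketch_node *node, unsigned long *out)
{
	unsigned long piv = atomic_load_explicit(&node->pivot0,
						 memory_order_relaxed);

	if (sketch_node_is_dead(node))
		return false;
	*out = piv;
	return true;
}
```

A caller would loop on sketch_read_pivot() and restart from the root whenever it returns false, which mirrors the "thrown away and the walk is started again" behaviour described above.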

I ran through this all with Paul (embarrassingly, a while back), and I
believe (if my notes are correct..) the fix I need here is to use
rcu_assign_pointer() in mte_set_node_dead() to make the checks here
happy.

Thanks,
Liam

Marco Elver

Jun 24, 2024, 5:39:16 AM
to Liam R. Howlett, Marco Elver, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, lsto...@gmail.com, syzkall...@googlegroups.com, vba...@suse.cz, RCU, Paul E. McKenney, Joel Fernandes
Thanks for the explanation.

> I ran through this all with Paul (embarrassingly, a while back), and I
> believe (if my notes are correct..) the fix I need here is to use
> rcu_assign_pointer() in mte_set_node_dead() to make the checks here
> happy.

I see - though rcu_assign_pointer() isn't directly affecting the data
race reported here. The read of pivot[0] at lib/maple_tree.c:2774 will
always remain data-racy against the write inside
rcu_segcblist_enqueue() after a reuse. Assuming the
read-then-revalidate pattern makes the data race benign, the only
thing that may be helpful is to explicitly mark the data-racy access
(more documentation about it at [1]):

/*
 * ... explanation ...
 */
if (data_race(pivots[0] >= mas->index)) {

The only benefit would be to clearly document what is happening (helps
tooling like KCSAN to shut up about it, but also humans trying to grok
what's going on because it's not obvious). I wouldn't mind sending a
patch, but would just end up copying your explanation, so I'll leave
it to you what to do with it.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/access-marking.txt

Thanks,
-- Marco