[syzbot] [mm?] INFO: rcu detected stall in validate_mm (3)


syzbot

2024/05/12 5:19:30
To: Liam.H...@oracle.com, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, lsto...@gmail.com, syzkall...@googlegroups.com, vba...@suse.cz
Hello,

syzbot found the following issue on:

HEAD commit: dccb07f2914c Merge tag 'for-6.9-rc7-tag' of git://git.kern..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=13f6734c980000
kernel config: https://syzkaller.appspot.com/x/.config?x=7144b4fe7fbf5900
dashboard link: https://syzkaller.appspot.com/bug?extid=a941018a091f1a1f9546
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10306760980000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=138c8970980000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/e1fea5a49470/disk-dccb07f2.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/5f7d53577fef/vmlinux-dccb07f2.xz
kernel image: https://storage.googleapis.com/syzbot-assets/430b18473a18/bzImage-dccb07f2.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+a94101...@syzkaller.appspotmail.com

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P17678/1:b..l
rcu: (detected by 1, t=10502 jiffies, g=36541, q=38 ncpus=2)
task:syz-executor952 state:R running task stack:28968 pid:17678 tgid:17678 ppid:5114 flags:0x00000002
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5409 [inline]
__schedule+0xf15/0x5d00 kernel/sched/core.c:6746
preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7068
irqentry_exit+0x36/0x90 kernel/entry/common.c:354
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:88 [inline]
RIP: 0010:memory_is_nonzero mm/kasan/generic.c:122 [inline]
RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:129 [inline]
RIP: 0010:memory_is_poisoned mm/kasan/generic.c:161 [inline]
RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline]
RIP: 0010:kasan_check_range+0xc7/0x1a0 mm/kasan/generic.c:189
Code: 83 c0 08 48 39 d0 0f 84 be 00 00 00 48 83 38 00 74 ed 48 8d 50 08 eb 0d 48 83 c0 01 48 39 c2 0f 84 8d 00 00 00 80 38 00 74 ee <48> 89 c2 b8 01 00 00 00 48 85 d2 74 1e 41 83 e2 07 49 39 d1 75 0a
RSP: 0018:ffffc900031ef850 EFLAGS: 00000202
RAX: fffffbfff2949b78 RBX: fffffbfff2949b79 RCX: ffffffff8ac92249
RDX: fffffbfff2949b79 RSI: 0000000000000004 RDI: ffffffff94a4dbc0
RBP: fffffbfff2949b78 R08: 0000000000000001 R09: fffffbfff2949b78
R10: ffffffff94a4dbc3 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000001 R14: 0000000000000300 R15: 0000000000000000
instrument_atomic_read_write include/linux/instrumented.h:96 [inline]
atomic_inc include/linux/atomic/atomic-instrumented.h:435 [inline]
mt_validate_nulls+0x5e9/0x9e0 lib/maple_tree.c:7550
mt_validate+0x3148/0x4390 lib/maple_tree.c:7599
validate_mm+0x9c/0x4b0 mm/mmap.c:288
mmap_region+0x1478/0x2760 mm/mmap.c:2934
do_mmap+0x8ae/0xf10 mm/mmap.c:1385
vm_mmap_pgoff+0x1ab/0x3c0 mm/util.c:573
ksys_mmap_pgoff+0x7d/0x5b0 mm/mmap.c:1431
__do_sys_mmap arch/x86/kernel/sys_x86_64.c:86 [inline]
__se_sys_mmap arch/x86/kernel/sys_x86_64.c:79 [inline]
__x64_sys_mmap+0x125/0x190 arch/x86/kernel/sys_x86_64.c:79
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcf/0x260 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f305228c143
RSP: 002b:00007ffdd7b4fc18 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
RAX: ffffffffffffffda RBX: fffffffffffff000 RCX: 00007f305228c143
RDX: 0000000000000000 RSI: 0000000000021000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 00000000ffffffff R09: 0000000000000000
R10: 0000000000020022 R11: 0000000000000246 R12: 00007ffdd7b4fe70
R13: ffffffffffffffc0 R14: 0000000000001000 R15: 0000000000000000
</TASK>
rcu: rcu_preempt kthread starved for 10533 jiffies! g36541 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:R running task stack:28736 pid:16 tgid:16 ppid:2 flags:0x00004000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5409 [inline]
__schedule+0xf15/0x5d00 kernel/sched/core.c:6746
__schedule_loop kernel/sched/core.c:6823 [inline]
schedule+0xe7/0x350 kernel/sched/core.c:6838
schedule_timeout+0x136/0x2a0 kernel/time/timer.c:2582
rcu_gp_fqs_loop+0x1eb/0xb00 kernel/rcu/tree.c:1663
rcu_gp_kthread+0x271/0x380 kernel/rcu/tree.c:1862
kthread+0x2c1/0x3a0 kernel/kthread.c:388
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
rcu: Stack dump where RCU GP kthread last ran:
CPU: 1 PID: 17676 Comm: syz-executor952 Not tainted 6.9.0-rc7-syzkaller-00012-gdccb07f2914c #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
RIP: 0010:__raw_spin_unlock_irq include/linux/spinlock_api_smp.h:160 [inline]
RIP: 0010:_raw_spin_unlock_irq+0x29/0x50 kernel/locking/spinlock.c:202
Code: 90 f3 0f 1e fa 53 48 8b 74 24 08 48 89 fb 48 83 c7 18 e8 6a 98 8c f6 48 89 df e8 c2 14 8d f6 e8 ed 98 b5 f6 fb bf 01 00 00 00 <e8> b2 4f 7e f6 65 8b 05 b3 88 24 75 85 c0 74 06 5b c3 cc cc cc cc
RSP: 0018:ffffc9000321fcf0 EFLAGS: 00000202
RAX: 0000000003959e61 RBX: ffff88801c3d0940 RCX: 1ffffffff1f3e279
RDX: 0000000000000000 RSI: ffffffff8b0cae00 RDI: 0000000000000001
RBP: ffff88801c3d0d40 R08: 0000000000000001 R09: 0000000000000001
R10: ffffffff8f9f5657 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000021 R14: ffff88801c3d0940 R15: ffff88801c3d0940
FS: 00007f305221e6c0(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f305221de40 CR3: 000000002dcec000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
</IRQ>
<TASK>
spin_unlock_irq include/linux/spinlock.h:401 [inline]
get_signal+0x1e3e/0x2710 kernel/signal.c:2914
arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:310
exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
syscall_exit_to_user_mode+0x14a/0x2a0 kernel/entry/common.c:218
do_syscall_64+0xdc/0x260 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f305228c107
Code: 14 25 28 00 00 00 75 05 48 83 c4 28 c3 e8 31 1b 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 <0f> 05 48 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89
RSP: 002b:00007f305221e238 EFLAGS: 00000246
RAX: 00000000000000ca RBX: 00007f305230f318 RCX: 00007f305228c109
RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f305230f318
RBP: 00007f305230f310 R08: 00007f305221e6c0 R09: 00007f305221e6c0
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f30522dc278
R13: 000000000000006e R14: 00007ffdd7b4fb90 R15: 00007ffdd7b4fc78
</TASK>
sched: RT throttling activated


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Liam R. Howlett

2024/05/12 13:28:41
To: syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, lsto...@gmail.com, syzkall...@googlegroups.com, vba...@suse.cz
* syzbot <syzbot+a94101...@syzkaller.appspotmail.com> [240512 05:19]:
> Hello,
>
> syzbot found the following issue on:

First, excellent timing of this report - Sunday on an -rc7 release the
day before LSF/MM/BPF.
...

I was concerned that we had somehow constructed a broken tree, but I
believe the information below rules that situation out. It appears that
the verification of a task's maple tree has exceeded the timeout allotted
to do so. This call stack indicates it is all happening while holding
the mmap lock, so there is no locking or RCU issue there.

This trace seems to indicate we are stuck checking the tree for
sequential NULLs, but not in a tree operation itself. That would
suggest the issue isn't here at all - or that we have a broken tree
which causes the iteration to never advance.

Adjusting the timeouts does seem to be sufficient, and I am not yet
hitting the hang on my VM running the C reproducer. I am not yet using
the bot's config, either.

I also noticed that the git bisect is very odd and inconsistent, often
ending in "crashed: INFO: rcu detected stall in corrupted". I also
noticed that KASAN is disabled in this report?
"disabling configs for [UBSAN BUG KASAN LOCKDEP ATOMIC_SLEEP LEAK], they
are not needed"

It seems like it would be wise to enable KASAN, since there appear to be
corrupted stack traces, at least. I noticed that the .config DOES have
KASAN enabled, so I guess it was dropped because it didn't pick up an
issue on the initial run?

There is only one report (the initial report) that detects the hung
state in the validate_mm() test function. This is actually the least
concerning of all of the places - because this validate function
is generally disabled on production systems.
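The "disabled on production systems" point refers to the validation
being gated behind a debug config option; roughly, the pattern is (a
sketch of the gating, not the exact upstream code):

```c
/* Sketch of the gating pattern (not the exact upstream code): when
 * CONFIG_DEBUG_VM_MAPLE_TREE is off, validate_mm() compiles away
 * entirely, so production kernels never run the full tree walk that
 * is stalling here. */
#ifdef CONFIG_DEBUG_VM_MAPLE_TREE
extern void validate_mm(struct mm_struct *mm);	/* full mt_validate() walk */
#else
#define validate_mm(mm) do { } while (0)	/* compiled out */
#endif
```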

The last change to lib/maple_tree.c went in through
mm-stable-2024-03-13-20-04.

I cannot say that this isn't the maple tree in an infinite loop, but
given the information above I don't think it is. Considering that the
infinite-loop scenario would produce the same crash on every
reproduction, but that is not what syzbot sees during the git bisect, I
think it is not an issue in the tree but an issue somewhere else -
probably a corruption issue that wasn't detected by KASAN (is this
possible?).

Thanks,
Liam

Liam R. Howlett

2024/05/12 16:41:20
To: syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, lsto...@gmail.com, syzkall...@googlegroups.com, vba...@suse.cz
* Liam R. Howlett <Liam.H...@oracle.com> [240512 13:28]:
> * syzbot <syzbot+a94101...@syzkaller.appspotmail.com> [240512 05:19]:
> > Hello,
> >
> > syzbot found the following issue on:
>
> First, excellent timing of this report - Sunday on an -rc7 release the
> day before LSF/MM/BPF.
>
> >
> > HEAD commit: dccb07f2914c Merge tag 'for-6.9-rc7-tag' of git://git.kern..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=13f6734c980000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=7144b4fe7fbf5900
> > dashboard link: https://syzkaller.appspot.com/bug?extid=a941018a091f1a1f9546
> > compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10306760980000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=138c8970980000
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/e1fea5a49470/disk-dccb07f2.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/5f7d53577fef/vmlinux-dccb07f2.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/430b18473a18/bzImage-dccb07f2.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+a94101...@syzkaller.appspotmail.com
> >
> > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P17678/1:b..l
> > rcu: (detected by 1, t=10502 jiffies, g=36541, q=38 ncpus=2)
> > task:syz-executor952 state:R running task stack:28968 pid:17678 tgid:17678 ppid:5114 flags:0x00000002
...

>
> I cannot say that this isn't the maple tree in an infinite loop, but
> given the information above I don't think it is. Considering that the
> infinite-loop scenario would produce the same crash on every
> reproduction, but that is not what syzbot sees during the git bisect, I
> think it is not an issue in the tree but an issue somewhere else -
> probably a corruption issue that wasn't detected by KASAN (is this
> possible?).

I was able to recreate this with the provided config and reproducer (but
not with my own config). My trace has no maple tree calls at all:

[ 866.380945][ C1] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 866.381464][ C1] rcu: (detected by 1, t=10502 jiffies, g=161409, q=149 ncpus=2)
[ 866.382152][ C1] rcu: All QSes seen, last rcu_preempt kthread activity 10500 (4295023801-4295013301), jiffies_till_next_fqs=1, root ->qsmask 0x0
[ 866.383324][ C1] rcu: rcu_preempt kthread starved for 10500 jiffies! g161409 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[ 866.384952][ C1] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 866.385972][ C1] rcu: RCU grace-period kthread stack dump:
[ 866.386582][ C1] task:rcu_preempt state:R running task stack:27648 pid:16 tgid:16 ppid:2 flags:0x00004000
[ 866.387811][ C1] Call Trace:
[ 866.388164][ C1] <TASK>
[ 866.388475][ C1] __schedule+0xf06/0x5cb0
[ 866.388961][ C1] ? __pfx___lock_acquire+0x10/0x10
[ 866.389528][ C1] ? __pfx___schedule+0x10/0x10
[ 866.390065][ C1] ? schedule+0x298/0x350
[ 866.390541][ C1] ? __pfx_lock_release+0x10/0x10
[ 866.391090][ C1] ? __pfx___mod_timer+0x10/0x10
[ 866.391633][ C1] ? lock_acquire+0x1b1/0x560
[ 866.392133][ C1] ? lockdep_init_map_type+0x16d/0x7e0
[ 866.392709][ C1] schedule+0xe7/0x350
[ 866.393139][ C1] schedule_timeout+0x136/0x2a0
[ 866.393654][ C1] ? __pfx_schedule_timeout+0x10/0x10
[ 866.394142][ C1] ? __pfx_process_timeout+0x10/0x10
[ 866.394596][ C1] ? _raw_spin_unlock_irqrestore+0x3b/0x80
[ 866.395137][ C1] ? prepare_to_swait_event+0xf0/0x470
[ 866.395714][ C1] rcu_gp_fqs_loop+0x1ab/0xbd0
[ 866.396246][ C1] ? __pfx_rcu_gp_fqs_loop+0x10/0x10
[ 866.396852][ C1] ? rcu_gp_init+0xbdb/0x1480
[ 866.397393][ C1] ? __pfx_rcu_gp_cleanup+0x10/0x10
[ 866.397988][ C1] rcu_gp_kthread+0x271/0x380
[ 866.398493][ C1] ? __pfx_rcu_gp_kthread+0x10/0x10
[ 866.399063][ C1] ? lockdep_hardirqs_on+0x7c/0x110
[ 866.399570][ C1] ? __kthread_parkme+0x143/0x220
[ 866.400045][ C1] ? __pfx_rcu_gp_kthread+0x10/0x10
[ 866.400535][ C1] kthread+0x2c1/0x3a0
[ 866.400916][ C1] ? _raw_spin_unlock_irq+0x23/0x50
[ 866.401409][ C1] ? __pfx_kthread+0x10/0x10
[ 866.401854][ C1] ret_from_fork+0x45/0x80
[ 866.402284][ C1] ? __pfx_kthread+0x10/0x10
[ 866.402718][ C1] ret_from_fork_asm+0x1a/0x30
[ 866.403167][ C1] </TASK>

I'm going to see if I can hit the corrupted stack version with kasan enabled.

Thanks,
Liam