[syzbot] [gfs2?] kernel panic: hung_task: blocked tasks (2)

syzbot

Jul 21, 2023, 4:48:13 PM
to cluste...@redhat.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: fdf0eaf11452 Linux 6.5-rc2
git tree: upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=1797783aa80000
kernel config: https://syzkaller.appspot.com/x/.config?x=27e33fd2346a54b
dashboard link: https://syzkaller.appspot.com/bug?extid=607aa822c60b2e75b269
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11322fb6a80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=17687f1aa80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/0ac950f24d26/disk-fdf0eaf1.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/666fcbcfa05d/vmlinux-fdf0eaf1.xz
kernel image: https://storage.googleapis.com/syzbot-assets/5bbe73baa630/bzImage-fdf0eaf1.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/85821d156573/mount_0.gz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+607aa8...@syzkaller.appspotmail.com

Kernel panic - not syncing: hung_task: blocked tasks
CPU: 0 PID: 27 Comm: khungtaskd Not tainted 6.5.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
panic+0x6a4/0x750 kernel/panic.c:340
check_hung_uninterruptible_tasks kernel/hung_task.c:226 [inline]
watchdog+0xcf2/0x11b0 kernel/hung_task.c:379
kthread+0x33a/0x430 kernel/kthread.c:389
ret_from_fork+0x2c/0x70 arch/x86/kernel/process.c:145
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:296
RIP: 0000:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the bug is already fixed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to change the bug's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the bug is a duplicate of another bug, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

syzbot

Jul 27, 2023, 7:14:30 PM
to agru...@redhat.com, ar...@arndb.de, cluste...@redhat.com, dhow...@redhat.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, rpet...@redhat.com, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
syzbot has bisected this issue to:

commit 9c8ad7a2ff0bfe58f019ec0abc1fb965114dde7d
Author: David Howells <dhow...@redhat.com>
Date: Thu May 16 11:52:27 2019 +0000

uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2]

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=169b475ea80000
start commit: fdf0eaf11452 Linux 6.5-rc2
git tree: upstream
final oops: https://syzkaller.appspot.com/x/report.txt?x=159b475ea80000
console output: https://syzkaller.appspot.com/x/log.txt?x=119b475ea80000
Reported-by: syzbot+607aa8...@syzkaller.appspotmail.com
Fixes: 9c8ad7a2ff0b ("uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2]")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

David Howells

Jul 28, 2023, 4:20:15 AM
to syzbot, dhow...@redhat.com, agru...@redhat.com, ar...@arndb.de, cluste...@redhat.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, rpet...@redhat.com, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
syzbot <syzbot+607aa8...@syzkaller.appspotmail.com> wrote:

> Fixes: 9c8ad7a2ff0b ("uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2]")

This would seem unlikely to be the culprit. It just changes the numbering on
the fsconfig-related syscalls.

Running the test program on v6.5-rc3, however, I end up with the test process
stuck in the D state:

INFO: task repro-17687f1aa:5551 blocked for more than 120 seconds.
Not tainted 6.5.0-rc3-build3+ #1448
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:repro-17687f1aa state:D stack:0 pid:5551 ppid:5516 flags:0x00004002
Call Trace:
<TASK>
__schedule+0x4a7/0x4f1
schedule+0x66/0xa1
schedule_timeout+0x9d/0xd7
? __next_timer_interrupt+0xf6/0xf6
gfs2_gl_hash_clear+0xa0/0xdc
? sugov_irq_work+0x15/0x15
gfs2_put_super+0x19f/0x1d3
generic_shutdown_super+0x78/0x187
kill_block_super+0x1c/0x32
deactivate_locked_super+0x2f/0x61
cleanup_mnt+0xab/0xcc
task_work_run+0x6b/0x80
exit_to_user_mode_prepare+0x76/0xfd
syscall_exit_to_user_mode+0x14/0x31
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f89aac31dab
RSP: 002b:00007fff43d9b878 EFLAGS: 00000206 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 00007fff43d9cad8 RCX: 00007f89aac31dab
RDX: 0000000000000000 RSI: 000000000000000a RDI: 00007fff43d9b920
RBP: 00007fff43d9c960 R08: 0000000000000000 R09: 0000000000000073
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
R13: 00007fff43d9cae8 R14: 0000000000417e18 R15: 00007f89aad51000
</TASK>

David
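
For anyone reproducing this, a minimal sketch of how one might spot processes stuck in the D (uninterruptible sleep) state from userspace, without waiting for the hung-task watchdog. It reads the per-process `stat` files under `/proc`; the helper names `parse_stat` and `uninterruptible_tasks` are made up for this illustration, and the scan only covers thread-group leaders, not individual threads.

```python
import os


def parse_stat(stat_line):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so use the first '(' and the last ')' as delimiters.
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    pid = int(stat_line[:lparen].strip())
    comm = stat_line[lparen + 1:rparen]
    state = stat_line[rparen + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """Return (pid, comm) for every process whose state field is 'D'."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited mid-scan or file was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return blocked
```

Calling `uninterruptible_tasks()` on the test machine while the reproducer is hung should include the stuck process; its kernel stack can then usually be read from `/proc/<pid>/stack` as root, assuming the kernel was built with stack trace support.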

Bob Peterson

Jul 28, 2023, 7:48:53 AM
to David Howells, syzbot, agru...@redhat.com, ar...@arndb.de, cluste...@redhat.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
Hi David,

This indicates gfs2 is having trouble resolving and freeing all its
glocks, which usually means a reference counting problem or ail (active
items list) problem during unmount.

If gfs2_gl_hash_clear gets stuck for a long period of time it is
supposed to dump the remaining list of glocks that still have not been
resolved. I think it takes 10 minutes or so. Can you post the console
messages that follow? That will help us figure out what's happening. Thanks.

Regards,

Bob Peterson
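
To illustrate the pattern described above (wait for all outstanding objects to be released, and after a timeout dump whatever is left), here is a small userspace model. It is not the gfs2 code: the class `PendingObjects` and its methods are hypothetical, and the timeout is a parameter rather than the roughly ten minutes mentioned in the message.

```python
import threading


class PendingObjects:
    """Toy stand-in for a table of reference-counted objects (hypothetical)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._refs = {}  # object name -> outstanding reference count

    def hold(self, name):
        """Take one reference on the named object."""
        with self._cond:
            self._refs[name] = self._refs.get(name, 0) + 1

    def release(self, name):
        """Drop one reference; wake any waiter when an object goes away."""
        with self._cond:
            self._refs[name] -= 1
            if self._refs[name] == 0:
                del self._refs[name]
                self._cond.notify_all()

    def clear(self, timeout):
        """Wait up to `timeout` seconds for the table to drain.

        Returns (drained, leftovers); leftovers is what a diagnostic dump
        would list if the wait timed out.
        """
        with self._cond:
            drained = self._cond.wait_for(lambda: not self._refs, timeout=timeout)
            leftovers = dict(self._refs)
        return drained, leftovers
```

If one reference is leaked, say by holding `"b"` twice and releasing it once, `clear()` times out and returns `(False, {"b": 1})`. That leftover list is the analogue of the glock dump requested above: it names the objects whose references were never dropped.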
