[syzbot] [xfs?] INFO: task hung in __fdget_pos (4)


syzbot

Sep 3, 2023, 12:11:36 AM
to bra...@kernel.org, djw...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, vi...@zeniv.linux.org.uk
Hello,

syzbot found the following issue on:

HEAD commit: b97d64c72259 Merge tag '6.6-rc-smb3-client-fixes-part1' of..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=14136d8fa80000
kernel config: https://syzkaller.appspot.com/x/.config?x=958c1fdc38118172
dashboard link: https://syzkaller.appspot.com/bug?extid=e245f0516ee625aaa412
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/99875b49c50b/disk-b97d64c7.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/5bcacc1a3f5b/vmlinux-b97d64c7.xz
kernel image: https://storage.googleapis.com/syzbot-assets/e2fe9c8de38a/bzImage-b97d64c7.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+e245f0...@syzkaller.appspotmail.com

INFO: task syz-executor.0:19830 blocked for more than 143 seconds.
Not tainted 6.5.0-syzkaller-08894-gb97d64c72259 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.0 state:D stack:26480 pid:19830 ppid:5057 flags:0x00004006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5382 [inline]
__schedule+0x1873/0x48f0 kernel/sched/core.c:6695
schedule+0xc3/0x180 kernel/sched/core.c:6771
schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:6830
__mutex_lock_common+0xe33/0x2530 kernel/locking/mutex.c:679
__mutex_lock kernel/locking/mutex.c:747 [inline]
mutex_lock_nested+0x1b/0x20 kernel/locking/mutex.c:799
__fdget_pos+0x2b0/0x340 fs/file.c:1064
fdget_pos include/linux/file.h:74 [inline]
ksys_write+0x82/0x2c0 fs/read_write.c:628
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f61ad07cae9
RSP: 002b:00007f61abbfe0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007f61ad19c050 RCX: 00007f61ad07cae9
RDX: 0000000000000090 RSI: 0000000020000200 RDI: 0000000000000005
RBP: 00007f61ad0c847a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000006e R14: 00007f61ad19c050 R15: 00007ffc19e49218
</TASK>
INFO: task syz-executor.0:19835 blocked for more than 143 seconds.
Not tainted 6.5.0-syzkaller-08894-gb97d64c72259 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.0 state:D stack:29552 pid:19835 ppid:5057 flags:0x00004006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5382 [inline]
__schedule+0x1873/0x48f0 kernel/sched/core.c:6695
schedule+0xc3/0x180 kernel/sched/core.c:6771
schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:6830
rwsem_down_read_slowpath+0x5f4/0x950 kernel/locking/rwsem.c:1086
__down_read_common kernel/locking/rwsem.c:1250 [inline]
__down_read kernel/locking/rwsem.c:1263 [inline]
down_read+0x9c/0x2f0 kernel/locking/rwsem.c:1522
inode_lock_shared include/linux/fs.h:812 [inline]
lookup_slow+0x45/0x70 fs/namei.c:1709
walk_component+0x2d0/0x400 fs/namei.c:2001
lookup_last fs/namei.c:2458 [inline]
path_lookupat+0x16f/0x450 fs/namei.c:2482
filename_lookup+0x255/0x610 fs/namei.c:2511
user_path_at_empty+0x44/0x180 fs/namei.c:2910
user_path_at include/linux/namei.h:57 [inline]
__do_sys_chdir fs/open.c:551 [inline]
__se_sys_chdir+0xbf/0x220 fs/open.c:545
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f61ad07cae9
RSP: 002b:00007f61a37fd0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000050
RAX: ffffffffffffffda RBX: 00007f61ad19c120 RCX: 00007f61ad07cae9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000040
RBP: 00007f61ad0c847a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000006e R14: 00007f61ad19c120 R15: 00007ffc19e49218
</TASK>
INFO: task syz-executor.0:19836 blocked for more than 144 seconds.
Not tainted 6.5.0-syzkaller-08894-gb97d64c72259 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.0 state:D stack:29104 pid:19836 ppid:5057 flags:0x00004006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5382 [inline]
__schedule+0x1873/0x48f0 kernel/sched/core.c:6695
schedule+0xc3/0x180 kernel/sched/core.c:6771
schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:6830
rwsem_down_read_slowpath+0x5f4/0x950 kernel/locking/rwsem.c:1086
__down_read_common kernel/locking/rwsem.c:1250 [inline]
__down_read kernel/locking/rwsem.c:1263 [inline]
down_read+0x9c/0x2f0 kernel/locking/rwsem.c:1522
inode_lock_shared include/linux/fs.h:812 [inline]
open_last_lookups fs/namei.c:3562 [inline]
path_openat+0x7b3/0x3180 fs/namei.c:3793
do_filp_open+0x234/0x490 fs/namei.c:3823
do_sys_openat2+0x13e/0x1d0 fs/open.c:1422
do_sys_open fs/open.c:1437 [inline]
__do_sys_open fs/open.c:1445 [inline]
__se_sys_open fs/open.c:1441 [inline]
__x64_sys_open+0x225/0x270 fs/open.c:1441
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f61ad07cae9
RSP: 002b:00007f61a35dc0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
RAX: ffffffffffffffda RBX: 00007f61ad19c1f0 RCX: 00007f61ad07cae9
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00000000200001c0
RBP: 00007f61ad0c847a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000006e R14: 00007f61ad19c1f0 R15: 00007ffc19e49218
</TASK>
INFO: lockdep is turned off.
NMI backtrace for cpu 0
CPU: 0 PID: 28 Comm: khungtaskd Not tainted 6.5.0-syzkaller-08894-gb97d64c72259 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
nmi_cpu_backtrace+0x498/0x4d0 lib/nmi_backtrace.c:113
nmi_trigger_cpumask_backtrace+0x198/0x310 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:160 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:222 [inline]
watchdog+0xdf5/0xe40 kernel/hung_task.c:379
kthread+0x2b8/0x350 kernel/kthread.c:388
ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:145
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
</TASK>
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 21301 Comm: syz-executor.1 Not tainted 6.5.0-syzkaller-08894-gb97d64c72259 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
RIP: 0010:clear_page_erms+0xb/0x10 arch/x86/lib/clear_page_64.S:50
Code: 48 89 47 20 48 89 47 28 48 89 47 30 48 89 47 38 48 8d 7f 40 75 d9 90 c3 0f 1f 80 00 00 00 00 f3 0f 1e fa b9 00 10 00 00 31 c0 <f3> aa c3 66 90 f3 0f 1e fa 48 83 f9 40 73 36 83 f9 08 73 0f 85 c9
RSP: 0018:ffffc900031ef8b8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000004c0
RDX: ffff888042e98000 RSI: 0000000000000001 RDI: ffff888042e98b40
RBP: ffffc900031efb70 R08: ffffea00010ba637 R09: 0000000000000000
R10: ffffed10085d3000 R11: fffff940002174c7 R12: 0000000000000001
R13: 0000000000000001 R14: ffffea00010ba600 R15: 0000000000000000
FS: 00005555556b5480(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f561ff9c018 CR3: 00000000289f1000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<NMI>
</NMI>
<TASK>
clear_page arch/x86/include/asm/page_64.h:53 [inline]
clear_highpage_kasan_tagged include/linux/highmem.h:248 [inline]
kernel_init_pages mm/page_alloc.c:1071 [inline]
post_alloc_hook+0xf8/0x210 mm/page_alloc.c:1534
prep_new_page mm/page_alloc.c:1543 [inline]
get_page_from_freelist+0x31ec/0x3370 mm/page_alloc.c:3183
__alloc_pages+0x255/0x670 mm/page_alloc.c:4439
vm_area_alloc_pages mm/vmalloc.c:3063 [inline]
__vmalloc_area_node mm/vmalloc.c:3139 [inline]
__vmalloc_node_range+0x9a3/0x1490 mm/vmalloc.c:3320
vmalloc_user+0x74/0x80 mm/vmalloc.c:3474
kcov_ioctl+0x59/0x630 kernel/kcov.c:704
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:871 [inline]
__se_sys_ioctl+0xf8/0x170 fs/ioctl.c:857
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7ff00647c84b
Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
RSP: 002b:00007ffe6712be40 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007ff00647c84b
RDX: 0000000000040000 RSI: ffffffff80086301 RDI: 00000000000000d9
RBP: 00007ff00659c0e8 R08: 00000000000000d8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe6712c578
R13: 0000000000000003 R14: 00007ff00659c0e8 R15: 0000000000000001
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the bug is already fixed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite bug's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the bug is a duplicate of another bug, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Dave Chinner

Sep 3, 2023, 1:25:34 AM
to syzbot, bra...@kernel.org, djw...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, vi...@zeniv.linux.org.uk
On Sat, Sep 02, 2023 at 09:11:34PM -0700, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: b97d64c72259 Merge tag '6.6-rc-smb3-client-fixes-part1' of..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=14136d8fa80000
> kernel config: https://syzkaller.appspot.com/x/.config?x=958c1fdc38118172
> dashboard link: https://syzkaller.appspot.com/bug?extid=e245f0516ee625aaa412
> compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>
> Unfortunately, I don't have any reproducer for this issue yet.

Been happening for months, apparently, yet for some reason it now
thinks a locking hang in __fdget_pos() is an XFS issue?

#syz set subsystems: fs

--
Dave Chinner
da...@fromorbit.com

Mateusz Guzik

Sep 3, 2023, 4:34:03 AM
to Dave Chinner, syzbot, bra...@kernel.org, djw...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, vi...@zeniv.linux.org.uk
The report does not have info necessary to figure this out -- no
backtrace for whichever thread holds f_pos_lock. I clicked on a
bunch of other reports and it is the same story.

Can the kernel be configured to dump backtraces from *all* threads?

If there is no feature like that I can hack it up.

Al Viro

Sep 3, 2023, 2:01:38 PM
to Mateusz Guzik, Dave Chinner, syzbot, bra...@kernel.org, djw...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com
<break>t

over serial console, or echo t >/proc/sysrq-trigger would do it...
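
For reference, sysrq 't' (and the related 'w' that comes up later in
the thread) boils down to a single existing helper; roughly, as a
simplified sketch from memory (the real handlers live in
drivers/tty/sysrq.c, and the wrapper names below are made up):

#include <linux/sched.h>	/* TASK_UNINTERRUPTIBLE */
#include <linux/sched/debug.h>	/* show_state_filter() */

/* what sysrq 't' amounts to: dump every task's stack, whatever its state */
static void dump_all_tasks(void)
{
	show_state_filter(0);		/* 0 == no state filter */
}

/* what sysrq 'w' amounts to: only tasks blocked in D state */
static void dump_blocked_tasks(void)
{
	show_state_filter(TASK_UNINTERRUPTIBLE);
}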

"Locking hang in __fdget_pos()" would mean either something stuck inside
->read/->write/->read_iter/->write_iter/->llseek instance, making
any further syscalls in that bunch on the same open file to block,
or a lost fdput_pos() somewhere; AFAICS, we don't have anything of
the latter variety, but the former is too wide to tell anything
useful.
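
To make the lock scope concrete, the write() side looks roughly like
this -- a simplified sketch modelled on fs/read_write.c:ksys_write(),
with error handling and the FMODE_STREAM case elided (the _sketch name
is made up):

#include <linux/file.h>		/* fdget_pos(), fdput_pos() */
#include <linux/fs.h>		/* vfs_write() */

ssize_t ksys_write_sketch(unsigned int fd, const char __user *buf,
			  size_t count)
{
	struct fd f = fdget_pos(fd);	/* may take f.file->f_pos_lock */
	ssize_t ret = -EBADF;

	if (f.file) {
		loff_t pos = f.file->f_pos;

		/*
		 * Anything that stalls in here (->write_iter and friends)
		 * keeps f_pos_lock held, so every other read()/write()/
		 * lseek() on the same struct file piles up in __fdget_pos().
		 */
		ret = vfs_write(f.file, buf, count, &pos);
		if (ret >= 0)
			f.file->f_pos = pos;
		fdput_pos(f);		/* drops f_pos_lock if it was taken */
	}
	return ret;
}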

Mateusz Guzik

Sep 3, 2023, 2:57:24 PM
to Al Viro, Aleksandr Nogikh, Dave Chinner, syzbot, bra...@kernel.org, djw...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com
This does not dump backtraces, just a list of tasks + some stats.

The closest to useful here I found are 'w' ("Dumps tasks that are in
uninterruptable (blocked) state.") and 'l' ("Shows a stack backtrace
for all active CPUs."), both of which can miss the task which matters
(e.g., stuck in a very much *interruptible* state with f_pos_lock
held).

Unless someone can point at a way to get all these stacks, I'm going
to hack something up in the upcoming week, if only for immediate
syzbot usage.

--
Mateusz Guzik <mjguzik gmail.com>

Al Viro

Sep 3, 2023, 3:52:06 PM
to Mateusz Guzik, Aleksandr Nogikh, Dave Chinner, syzbot, bra...@kernel.org, djw...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com
On Sun, Sep 03, 2023 at 08:57:23PM +0200, Mateusz Guzik wrote:

> This does not dump backtraces, just a list of tasks + some stats.
>
> The closest to useful here I found are 'w' ("Dumps tasks that are in
> uninterruptable (blocked) state.") and 'l' ("Shows a stack backtrace
> for all active CPUs."), both of which can miss the task which matters
> (e.g., stuck in a very much *interruptible* state with f_pos_lock
> held).
>
> Unless someone can point at a way to get all these stacks, I'm going
> to hack something up in the upcoming week, if only for immediate
> syzbot usage.

Huh? Sample of output here:
2023-09-03T15:34:36.271833-04:00 duke kernel: [87367.574459] task:ssh-agent state:S stack:0 pid:3949 ppid:3947 flags:0x00000002
2023-09-03T15:34:36.284796-04:00 duke kernel: [87367.582848] Call Trace:
2023-09-03T15:34:36.284797-04:00 duke kernel: [87367.585306] <TASK>
2023-09-03T15:34:36.284797-04:00 duke kernel: [87367.587423] __schedule+0x222/0x630
2023-09-03T15:34:36.291459-04:00 duke kernel: [87367.590932] schedule+0x4b/0x90
2023-09-03T15:34:36.291460-04:00 duke kernel: [87367.594086] schedule_hrtimeout_range_clock+0xb1/0x110
2023-09-03T15:34:36.300477-04:00 duke kernel: [87367.599245] ? __hrtimer_init+0xf0/0xf0
2023-09-03T15:34:36.300477-04:00 duke kernel: [87367.603103] do_sys_poll+0x489/0x580
2023-09-03T15:34:36.308971-04:00 duke kernel: [87367.606702] ? _raw_spin_unlock_irqrestore+0x9/0x20
2023-09-03T15:34:36.308972-04:00 duke kernel: [87367.611598] ? __alloc_pages+0x111/0x1a0
2023-09-03T15:34:36.317380-04:00 duke kernel: [87367.615544] ? select_task_rq_fair+0x1c8/0xf70
2023-09-03T15:34:36.317381-04:00 duke kernel: [87367.620006] ? _raw_spin_unlock+0x5/0x10
2023-09-03T15:34:36.325273-04:00 duke kernel: [87367.623953] ? sched_clock_cpu+0x1c/0xd0
2023-09-03T15:34:36.325274-04:00 duke kernel: [87367.627899] ? default_send_IPI_single_phys+0x21/0x30
2023-09-03T15:34:36.334812-04:00 duke kernel: [87367.632977] ? ttwu_queue_wakelist+0x109/0x110
2023-09-03T15:34:36.334813-04:00 duke kernel: [87367.637439] ? _raw_spin_unlock_irqrestore+0x9/0x20
2023-09-03T15:34:36.343753-04:00 duke kernel: [87367.642344] ? try_to_wake_up+0x1eb/0x300
2023-09-03T15:34:36.343754-04:00 duke kernel: [87367.646380] ? __pollwait+0x110/0x110
2023-09-03T15:34:36.351376-04:00 duke kernel: [87367.650063] ? _raw_spin_unlock+0x5/0x10
2023-09-03T15:34:36.351377-04:00 duke kernel: [87367.654001] ? unix_stream_read_generic+0x528/0xa90
2023-09-03T15:34:36.361179-04:00 duke kernel: [87367.658906] ? _raw_spin_unlock_irqrestore+0x9/0x20
2023-09-03T15:34:36.361180-04:00 duke kernel: [87367.663805] ? _raw_spin_unlock_irqrestore+0x9/0x20
2023-09-03T15:34:36.370988-04:00 duke kernel: [87367.668708] ? __inode_wait_for_writeback+0x68/0xc0
2023-09-03T15:34:36.370989-04:00 duke kernel: [87367.673614] ? fsnotify_grab_connector+0x49/0x90
2023-09-03T15:34:36.380274-04:00 duke kernel: [87367.678258] ? fsnotify_destroy_marks+0x11/0x140
2023-09-03T15:34:36.380275-04:00 duke kernel: [87367.682901] ? enqueue_task_fair+0x211/0x5f0
2023-09-03T15:34:36.389726-04:00 duke kernel: [87367.687196] ? __rseq_handle_notify_resume+0x2b4/0x3a0
2023-09-03T15:34:36.389728-04:00 duke kernel: [87367.692353] ? recalibrate_cpu_khz+0x10/0x10
2023-09-03T15:34:36.397884-04:00 duke kernel: [87367.696651] ? ktime_get_ts64+0x47/0xe0
2023-09-03T15:34:36.397885-04:00 duke kernel: [87367.700509] __x64_sys_poll+0x93/0x120
2023-09-03T15:34:36.405254-04:00 duke kernel: [87367.704282] do_syscall_64+0x42/0x90
2023-09-03T15:34:36.405255-04:00 duke kernel: [87367.707880] entry_SYSCALL_64_after_hwframe+0x4b/0xb5
2023-09-03T15:34:36.413922-04:00 duke kernel: [87367.712959] RIP: 0033:0x7f451858f000
2023-09-03T15:34:36.413923-04:00 duke kernel: [87367.716548] RSP: 002b:00007ffd799cece8 EFLAGS: 00000202 ORIG_RAX: 0000000000000007
2023-09-03T15:34:36.428692-04:00 duke kernel: [87367.724154] RAX: ffffffffffffffda RBX: 00000000000001b0 RCX: 00007f451858f000
2023-09-03T15:34:36.428692-04:00 duke kernel: [87367.731317] RDX: 0000000000002710 RSI: 0000000000000001 RDI: 00005596fc603190
2023-09-03T15:34:36.443022-04:00 duke kernel: [87367.738485] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
2023-09-03T15:34:36.443023-04:00 duke kernel: [87367.745649] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
2023-09-03T15:34:36.457354-04:00 duke kernel: [87367.752818] R13: 0000000000000001 R14: 000000000000000a R15: 00005596fc603190
2023-09-03T15:34:36.457355-04:00 duke kernel: [87367.759981] </TASK>

Looks like a stack trace to me; seeing one of the callers of fdget_pos()
in that would tell you who's currently holding *some* ->f_pos_lock.

That - on 6.1.42, with fairly bland .config (minimal debugging;
I need that box for fast builds, among other things). Enable
lockdep and you'll get who's holding which locks in addition
to those stack traces...

Mateusz Guzik

Sep 3, 2023, 4:04:08 PM
to Al Viro, Aleksandr Nogikh, Dave Chinner, syzbot, bra...@kernel.org, djw...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com
On 9/3/23, Al Viro <vi...@zeniv.linux.org.uk> wrote:
That's my bad, the only content which survived for me in terminal
buffer was a spew for the scheduler and a ps-like task list.
Stacktraces are indeed there. Sorry for the noise on this one. I
verified stack dumps are not gated by any defines either btw, so it's
all good on this front.

All these brainos aside, I added Aleksandr Nogikh to To: in my previous
e-mail. From short poking around I found he was responsive to some
queries concerning Linux vs syzbot and I figured he would be a good
person to ask.

So, Aleksandr, tl;dr would it be a problem to add the 't' sysrq to
syzbot in order to dump backtraces from all threads? Either break t
over a serial port or t written to /proc/sysrq-trigger. Lack of this
info blocks progress on the issue reported here
(https://syzkaller.appspot.com/bug?extid=e245f0516ee625aaa412)

Dave Chinner

Sep 3, 2023, 6:27:21 PM
to Mateusz Guzik, syzbot, bra...@kernel.org, djw...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, vi...@zeniv.linux.org.uk
On Sun, Sep 03, 2023 at 10:33:57AM +0200, Mateusz Guzik wrote:
> On Sun, Sep 03, 2023 at 03:25:28PM +1000, Dave Chinner wrote:
> > On Sat, Sep 02, 2023 at 09:11:34PM -0700, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit: b97d64c72259 Merge tag '6.6-rc-smb3-client-fixes-part1' of..
> > > git tree: upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=14136d8fa80000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=958c1fdc38118172
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=e245f0516ee625aaa412
> > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > Been happening for months, apparently, yet for some reason it now
> > thinks a locking hang in __fdget_pos() is an XFS issue?
> >
> > #syz set subsystems: fs
> >
>
> The report does not have info necessary to figure this out -- no
> backtrace for whichever thread holds f_pos_lock. I clicked on a
> bunch of other reports and it is the same story.

That's true, but there's nothing that points at XFS in *any* of the
bug reports. Indeed, the log from the most recent report doesn't have
any of the output from the time stuff hung; i.e., the log starts
at kernel time 669.487771 seconds, and the hung task report is at:

684.588608][ T28] INFO: task syz-executor.0:19830 blocked for more than 143 seconds

About 25 seconds later. So the hung tasks were running at about
540s, and that's just not in the logs.

Every report has a different combination of filesystems being
exercised, and a couple of them didn't even have XFS in them.

So at this point, there is no single filesystem that the reports
actually indicate is the cause, the reports don't contain the actual
operations that hung, and there's basically nothing to go on so far.
Hence putting it in the "fs" bucket (which encompasses all things
filesystems!) is the right thing to do.

The only commonality I kinda see is that secondary processes that
are hung seem mostly to be in directory operations waiting on inode
locks - either lookup or readdir, so it's entirely possible that a
filesystem has screwed up ->iterate_shared locking in some way...

> Can the kernel be configured to dump backtraces from *all* threads?

It already is (sysrq-t), but I'm not sure that will help - if it is
a leaked unlock then nothing will show up at all.

-Dave.
--
Dave Chinner
da...@fromorbit.com

Mateusz Guzik

Sep 3, 2023, 6:47:55 PM
to Dave Chinner, syzbot, bra...@kernel.org, djw...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, vi...@zeniv.linux.org.uk
I did not mean to imply this is an xfs problem.

You wrote that reports have been coming in for months, so it is pretty clear
nobody is investigating. I figured I'm going to change that bit.

>> Can the kernel be configured to dump backtraces from *all* threads?
>
> It already is (sysrq-t), but I'm not sure that will help - if it is
> a leaked unlock then nothing will show up at all.
>

See the other part of the thread, I roped in someone from syzkaller
along with the question if they can use sysrq t.

Dave Chinner

Sep 3, 2023, 7:09:53 PM
to Mateusz Guzik, syzbot, bra...@kernel.org, djw...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, vi...@zeniv.linux.org.uk
Which is pretty much the case for all filesystem bug reports from
syzbot except for those reported against XFS. Almost nobody else is
doing immediate triage of syzbot reports, so they just sit there
gathering dust.

This reflects the reality that syzbot is doing stuff that just
doesn't happen to filesystems in production systems. Users will
almost never see these issues in real life because they aren't
corrupting the crap out of their filesystems and running randomly
generated syscalls on them.

> I figured I'm going to change that bit.

I wish you the best of luck.

> >> Can the kernel be configured to dump backtraces from *all* threads?
> >
> > It already is (sysrq-t), but I'm not sure that will help - if it is
> > a leaked unlock then nothing will show up at all.
>
> See the other part of the thread, I roped in someone from syzkaller
> along with the question if they can use sysrq t.

Yes, I saw that. I'm just saying that there's a real good chance it
won't actually help and will just generate a heap more report noise.
Stack traces from all the idle tasks don't really tell us anything
about what went wrong....

Lockdep reports and logs are difficult enough to parse at the best
of times, adding more noise won't help anyone. We already have the
output of sysrq-l (from the nmi sent to all cpus before panic), so
the only remaining really useful debugging output missing is sysrq-w
(blocked tasks).

If something is blocked holding the f_pos_lock long enough to
trigger the hung task timer, then whatever task is holding it must
be blocked itself on something else. The sysrq-w output will dump
that without adding all the idle task noise....

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com

Al Viro

Sep 3, 2023, 7:13:50 PM
to Dave Chinner, Mateusz Guzik, syzbot, bra...@kernel.org, djw...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com
On Mon, Sep 04, 2023 at 08:27:15AM +1000, Dave Chinner wrote:

> It already is (sysrq-t), but I'm not sure that will help - if it is
> a leaked unlock then nothing will show up at all.

Unlikely; grep and you'll see - very few callers, and for all of them
there's an fdput_pos() downstream of any fdget_pos() that had picked
non-NULL file reference.

In theory, it's not impossible that something had stripped FDPUT_POS_UNLOCK
from the flags, but that's basically "something might've corrupted the
local variables" scenario. There are 12 functions total where we might
be calling fdget_pos() and all of them are pretty small (1 in alpha
osf_sys.c, 6 in read_write.c and 5 in readdir.c); none of those takes
an address of struct fd, none of them has assignments to it after fdget_pos()
and the only accesses to its members are those to fd.file - all fetches.
Control flow is also easy to check - they are all short.
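
For anyone not looking at the source, the pairing in question is
roughly this -- a heavily simplified sketch of __fdget_pos() and
fdput_pos() (the real code is in fs/file.c and include/linux/file.h;
the exact conditions for taking the lock are elided and the _sketch
names are made up):

#include <linux/file.h>		/* struct fd, fdget(), fdput(), FDPUT_POS_UNLOCK */
#include <linux/fs.h>		/* FMODE_ATOMIC_POS, f_pos_lock */
#include <linux/mutex.h>

struct fd fdget_pos_sketch(unsigned int fd)
{
	struct fd f = fdget(fd);

	if (f.file && (f.file->f_mode & FMODE_ATOMIC_POS)) {
		f.flags |= FDPUT_POS_UNLOCK;	/* remember to unlock on put */
		mutex_lock(&f.file->f_pos_lock);
	}
	return f;
}

void fdput_pos_sketch(struct fd f)
{
	if (f.flags & FDPUT_POS_UNLOCK)	/* lose this bit and the lock leaks */
		mutex_unlock(&f.file->f_pos_lock);
	fdput(f);
}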

IMO it's much more likely that we'll find something like

thread A:
        grabs some fs lock
        gets stuck on something
thread B: write()
        finds file
        grabs ->f_pos_lock
        calls into filesystem
        blocks on fs lock held by A
thread C: read()/write()/lseek() on the same file
        blocks on ->f_pos_lock
Dave Chinner

Sep 3, 2023, 9:45:08 PM
to Al Viro, Mateusz Guzik, syzbot, bra...@kernel.org, djw...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com
On Mon, Sep 04, 2023 at 12:13:38AM +0100, Al Viro wrote:
> On Mon, Sep 04, 2023 at 08:27:15AM +1000, Dave Chinner wrote:
>
> > It already is (sysrq-t), but I'm not sure that will help - if it is
> > a leaked unlock then nothing will show up at all.
>
> Unlikely; grep and you'll see - very few callers, and for all of them
> there's an fdput_pos() downstream of any fdget_pos() that had picked
> non-NULL file reference.
>
> In theory, it's not impossible that something had stripped FDPUT_POS_UNLOCK
> from the flags, but that's basically "something might've corrupted the
> local variables" scenario.

Entirely possible - this is syzbot we are talking about here.
Especially if reiser or ntfs has been tested back before the logs we
have start, as both are known to corrupt memory and/or leak locks
when trying to parse corrupt filesystem images that syzbot feeds
them. That's why we largely ignore syzbot reports that involve
those filesystems...

Unfortunately, the logs from what was being done around when the
tasks actually hung are long gone (seems like only the last 20-30s
of log activity is reported) so when the hung task timer goes off
at 143s, there is nothing left to tell us what might have caused it.

IOWs, it's entirely possible that it is a memory corruption that
has resulted in a leaked lock somewhere...

> There are 12 functions total where we might
> be calling fdget_pos() and all of them are pretty small (1 in alpha
> osf_sys.c, 6 in read_write.c and 5 in readdir.c); none of those takes
> an address of struct fd, none of them has assignments to it after fdget_pos()
> and the only accesses to its members are those to fd.file - all fetches.
> Control flow is also easy to check - they are all short.
>
> IMO it's much more likely that we'll find something like
>
> thread A:
> grabs some fs lock
> gets stuck on something
> thread B: write()
> finds file
> grabs ->f_pos_lock
> calls into filesystem
> blocks on fs lock held by A
> thread C: read()/write()/lseek() on the same file
> blocks on ->f_pos_lock

Yes, that's exactly what I said in a followup email - we need to
know what happened to thread A, because that might be where we are
stuck on a leaked lock.

I saw quite a few reports where lookup/readdir are also stuck trying
to get an inode lock - those at the "thread B"s in the above example
- but there's no indication left of what happened with thread A.

If thread A was blocked all that time on something, then the hung
task timer should fire on it, too. If it is running in a tight
loop, the NMI would have dumped a stack trace from it.

But neither of those things happened, so it's either leaked
something or it's in a loop with a short term sleep so doesn't
trigger the hung task timer. sysrq-w output will capture that
without all the noise of sysrq-t....

Al Viro

Sep 3, 2023, 11:02:49 PM
to Dave Chinner, Mateusz Guzik, syzbot, bra...@kernel.org, djw...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com
On Mon, Sep 04, 2023 at 11:45:03AM +1000, Dave Chinner wrote:

> > thread B: write()
> > finds file
> > grabs ->f_pos_lock
> > calls into filesystem
> > blocks on fs lock held by A
> > thread C: read()/write()/lseek() on the same file
> > blocks on ->f_pos_lock
>
> Yes, that's exactly what I said in a followup email - we need to
> know what happened to thread A, because that might be where we are
> stuck on a leaked lock.
>
> I saw quite a few reports where lookup/readdir are also stuck trying
> to get an inode lock - those at the "thread B"s in the above example
> - but there's no indication left of what happened with thread A.
>
> If thread A was blocked all that time on something, then the hung
> task timer should fire on it, too. If it is running in a tight
> loop, the NMI would have dumped a stack trace from it.
>
> But neither of those things happened, so it's either leaked
> something or it's in a loop with a short term sleep so doesn't
> trigger the hung task timer. sysrq-w output will capture that
> without all the noise of sysrq-t....

Here's what brought sysrq-t:

| > The report does not have info necessary to figure this out -- no
| > backtrace for whichever thread holds f_pos_lock. I clicked on a
| > bunch of other reports and it is the same story.
| >
| > Can the kernel be configured to dump backtraces from *all* threads?
| >
| > If there is no feature like that I can hack it up.
|
| <break>t
|
| over serial console, or echo t >/proc/sysrq-trigger would do it...

A question specifically about getting the stack traces...

Theodore Ts'o

Sep 3, 2023, 11:27:07 PM
to Dave Chinner, Al Viro, Mateusz Guzik, syzbot, bra...@kernel.org, djw...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com
On Mon, Sep 04, 2023 at 11:45:03AM +1000, Dave Chinner wrote:
> Entirely possible - this is syzbot we are talking about here.
> Especially if reiser or ntfs has been tested back before the logs we
> have start, as both are known to corrupt memory and/or leak locks
> when trying to parse corrupt filesystem images that syzbot feeds
> them. That's why we largely ignore syzbot reports that involve
> those filesystems...
>
> Unfortunately, the logs from what was being done around when the
> tasks actually hung are long gone (seems like only the last 20-30s
> of log activity is reported) so when the hung task timer goes off
> at 143s, there is nothing left to tell us what might have caused it.
>
> IOWs, it's entirely possible that it is a memory corruption that
> has resulted in a leaked lock somewhere...

... and this is why I ignore any syzbot report that doesn't have a C
reproducer. Life is too short to waste time on what is very likely
syzbot noise.... And I'd much rather opt out of the gamification of
syzbot dashboards designed to use dark patterns to guilt developers into
working on "issues" that very likely have no real impact on actual
users, if it might cause developers and maintainers to burn out and
quit.

Basically, if syzbot won't prioritize things for us, it's incumbent on
us to prioritize things for our own mental health. And so syzbot
issues without a real reproducer are very low on my priority list; I
have things I can work on that are much more likely to make a real-world
impact. Even among those that have a real reproducer, there are certain
classes of bugs (e.g., "locking bugs" that require a badly corrupted
file system, or things that are just denial of service attacks if
you're stupid enough to insert a USB thumb drive found in a parking lot
--- made worse by GNOME, which has decided to auto-mount any random
USB thumb drive inserted into a system, even a server system that has
GNOME installed, thanks to some idiotic decision made by some random
Red Hat product manager) that I just ignore because I don't have
infinite amounts of time to coddle stupid Red Hat distro tricks.

- Ted

Mateusz Guzik

Sep 4, 2023, 2:09:06 AM
to Theodore Ts'o, Dave Chinner, Al Viro, syzbot, bra...@kernel.org, djw...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com
Hello everyone.

When I first stumbled upon this report I was almost completely
oblivious to the syzbot vs fsdevel situation -- I only knew you guys
are not fond of ntfs reports, which made sense but did not indicate
there were much bigger issues.

Given this and other responses I poked around the history, and that
made it apparent there are long-standing non-technical problems going
on here, none of which I intend to deal with.

That is to say I'm bailing from this thread.

Cheers and sorry for poking the nest,

Christian Brauner

Sep 4, 2023, 4:11:54 AM
to Dave Chinner, Mateusz Guzik, syzbot, djw...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, vi...@zeniv.linux.org.uk
I'll try to address syzkaller reports that we do get for VFS issues
asap but often they're just not VFS issues or have no reproducer.

Christian Brauner

Sep 4, 2023, 4:23:43 AM
to Dave Chinner, Mateusz Guzik, syzbot, djw...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, vi...@zeniv.linux.org.uk
> > Which is pretty much the case for all filesystem bug reports from

I think we should at least consider the option of reducing the number of
filesystems syzbot tests. Filesystems that are orphaned or that have no
active maintainers for a long period of time just produce noise.
Slapping tags such as [ntfs3?] or [reiserfs?] really doesn't help to
reduce the noise.

Dave Chinner

Sep 4, 2023, 4:55:53 AM
to Christian Brauner, Mateusz Guzik, syzbot, djw...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, vi...@zeniv.linux.org.uk
On Mon, Sep 04, 2023 at 10:23:33AM +0200, Christian Brauner wrote:
> > > Which is pretty much the case for all filesystem bug reports from
>
> I think we should at least consider the option of reducing the number of
> filesystems syzbot tests. Filesystems that are orphaned or that have no
> active maintainers for a long period of time just produce noise.

I wish you good luck in convincing the syzbot maintainers that
testing obsolete, unmaintained and/or deprecated code should be
outside the scope of what syzbot exercises...

> Slapping tags such as [ntfs3?] or [reiserfs?] really doesn't help to
> reduce the noise.

Yup - they can't be trusted to be correct, so they do nothing to
reduce the noise. There have been two false XFS categorisations
in just the last half a day....

Aleksandr Nogikh

Sep 6, 2023, 1:53:36 PM
to Mateusz Guzik, Al Viro, Dave Chinner, syzbot, bra...@kernel.org, djw...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com
On Sun, Sep 3, 2023 at 10:04 PM Mateusz Guzik <mjg...@gmail.com> wrote:
>
> On 9/3/23, Al Viro <vi...@zeniv.linux.org.uk> wrote:
> > On Sun, Sep 03, 2023 at 08:57:23PM +0200, Mateusz Guzik wrote:
> >
> >> This does not dump backtraces, just a list of tasks + some stats.
> >>
> >> The closest to useful here I found are 'w' ("Dumps tasks that are in
> >> uninterruptable (blocked) state.") and 'l' ("Shows a stack backtrace
> >> for all active CPUs."), both of which can miss the task which matters
> >> (e.g., stuck in a very much *interruptible* state with f_pos_lock
> >> held).
> >>
> >> Unless someone can point at a way to get all these stacks, I'm going
> >> to hack something up in the upcoming week, if only for immediate
> >> syzbot usage.
> >
> > Huh? Sample of output here:
> > 2023-09-03T15:34:36.271833-04:00 duke kernel: [87367.574459] task:ssh-agent
> > state:S stack:0 pid:3949 ppid:3947 flags:0x
< .. >
> > 2023-09-03T15:34:36.457355-04:00 duke kernel: [87367.759981] </TASK>
> >
> > Looks like a stack trace to me; seeing one of the callers of fdget_pos()
> > in that would tell you who's currently holding *some* ->f_pos_lock.
> >
> > That - on 6.1.42, with fairly bland .config (minimal debugging;
> > I need that box for fast builds, among other things). Enable
> > lockdep and you'll get who's holding which locks in addition
> > to those stack traces...
> >
>
> That's my bad, the only content which survived for me in terminal
> buffer was a spew for the scheduler and a ps-like task list.
> Stacktraces are indeed there. Sorry for the noise on this one. I
> verified stack dumps are not gated by any defines either btw, so it's
> all good on this front.
>
> All these brainos aside, I added Aleksandr Nogikh to To: in my previous
> e-mail. From short poking around I found he was responsive to some
> queries concerning Linux vs syzbot and I figured he would be a good
> person to ask.
>
> So, Aleksandr, tl;dr would it be a problem to add the 't' sysrq to
> syzbot in order to dump backtraces from all threads? Either break t
> over a serial port or t written to /proc/sysrq-trigger. Lack of this
> info blocks progress on the issue reported here
> (https://syzkaller.appspot.com/bug?extid=e245f0516ee625aaa412)

That's a good suggestion, thanks!
I think it should be doable and would indeed simplify debugging of our
reports (even if not for this particular one, judging by the
conversation below :) ).

I've filed https://github.com/google/syzkaller/issues/4200

--
Aleksandr

syzbot

Nov 30, 2023, 11:58:24 AM
to bra...@kernel.org, da...@fromorbit.com, djw...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, ll...@lists.linux.dev, mjg...@gmail.com, nat...@kernel.org, ndesau...@google.com, nog...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, ty...@mit.edu, vi...@zeniv.linux.org.uk
syzbot has found a reproducer for the following issue on:

HEAD commit: 3b47bc037bd4 Merge tag 'pinctrl-v6.7-2' of git://git.kerne..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=14e6bdaae80000
kernel config: https://syzkaller.appspot.com/x/.config?x=c2c74446ab4f0028
dashboard link: https://syzkaller.appspot.com/bug?extid=e245f0516ee625aaa412
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11026e54e80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/43de6cde1436/disk-3b47bc03.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/24a2ba210bcf/vmlinux-3b47bc03.xz
kernel image: https://storage.googleapis.com/syzbot-assets/a5d414ced6d5/bzImage-3b47bc03.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/c28d17ae0132/mount_4.gz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+e245f0...@syzkaller.appspotmail.com

INFO: task syz-executor.0:24270 blocked for more than 143 seconds.
Not tainted 6.7.0-rc3-syzkaller-00033-g3b47bc037bd4 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.0 state:D stack:28784 pid:24270 tgid:24259 ppid:5149 flags:0x00004006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5376 [inline]
__schedule+0x1961/0x4ab0 kernel/sched/core.c:6688
__schedule_loop kernel/sched/core.c:6763 [inline]
schedule+0x149/0x260 kernel/sched/core.c:6778
schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:6835
__mutex_lock_common kernel/locking/mutex.c:679 [inline]
__mutex_lock+0x6a3/0xd60 kernel/locking/mutex.c:747
__fdget_pos+0x2b0/0x340 fs/file.c:1177
fdget_pos include/linux/file.h:74 [inline]
__do_sys_getdents64 fs/readdir.c:401 [inline]
__se_sys_getdents64+0x1dc/0x4f0 fs/readdir.c:390
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x45/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7fd0cec7cae9
RSP: 002b:00007fd0cfa610c8 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
RAX: ffffffffffffffda RBX: 00007fd0ced9c050 RCX: 00007fd0cec7cae9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
RBP: 00007fd0cecc847a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000006e R14: 00007fd0ced9c050 R15: 00007ffeeda72018
</TASK>

Showing all locks held in the system:
1 lock held by khungtaskd/29:
#0: ffffffff8d92cf60 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:301 [inline]
#0: ffffffff8d92cf60 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:747 [inline]
#0: ffffffff8d92cf60 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x55/0x2a0 kernel/locking/lockdep.c:6614
2 locks held by getty/4818:
#0: ffff88814afe20a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:243
#1: ffffc900031332f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x6b4/0x1e10 drivers/tty/n_tty.c:2201
1 lock held by syz-executor.5/5143:
3 locks held by syz-executor.0/24262:
1 lock held by syz-executor.0/24270:
#0: ffff88807c5574c8 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x2b0/0x340 fs/file.c:1177
1 lock held by syz-executor.0/24319:
2 locks held by syz-executor.1/32017:
#0: ffff8880b983c358 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:558
#1: ffff8880b9828808 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_switch+0x441/0x770 kernel/sched/psi.c:988
1 lock held by syz-executor.1/32030:
#0: ffff8880b983c358 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:558
2 locks held by syz-executor.4/32028:
#0: ffff8880b983c358 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:558
#1: ffff8880b9828808 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_switch+0x441/0x770 kernel/sched/psi.c:988
2 locks held by syz-executor.3/32042:
#0: ffff88801cfea360 (&lo->lo_mutex){+.+.}-{3:3}, at: loop_global_lock_killable drivers/block/loop.c:120 [inline]
#0: ffff88801cfea360 (&lo->lo_mutex){+.+.}-{3:3}, at: loop_configure+0x1f9/0x1250 drivers/block/loop.c:1023
#1: ffffe8ffffd6a108 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_switch+0x3a7/0x770 kernel/sched/psi.c:976
4 locks held by syz-executor.0/32045:

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 29 Comm: khungtaskd Not tainted 6.7.0-rc3-syzkaller-00033-g3b47bc037bd4 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
nmi_cpu_backtrace+0x498/0x4d0 lib/nmi_backtrace.c:113
nmi_trigger_cpumask_backtrace+0x198/0x310 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:160 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:222 [inline]
watchdog+0xfaf/0xff0 kernel/hung_task.c:379
kthread+0x2d3/0x370 kernel/kthread.c:388
ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
</TASK>
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 5138 Comm: syz-executor.1 Not tainted 6.7.0-rc3-syzkaller-00033-g3b47bc037bd4 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
RIP: 0010:orc_find arch/x86/kernel/unwind_orc.c:217 [inline]
RIP: 0010:unwind_next_frame+0x271/0x29e0 arch/x86/kernel/unwind_orc.c:494
Code: 50 00 39 eb 0f 86 f7 1d 00 00 89 e8 48 8d 1c 85 08 26 20 90 48 89 d8 48 c1 e8 03 49 be 00 00 00 00 00 fc ff df 42 0f b6 04 30 <84> c0 0f 85 89 20 00 00 44 8b 3b 89 e8 ff c0 48 8d 1c 85 08 26 20
RSP: 0018:ffffc90003f3f700 EFLAGS: 00000a06
RAX: 0000000000000000 RBX: ffffffff90211d88 RCX: 00000000000a6001
RDX: ffff88801eaa3b80 RSI: 0000000000003de0 RDI: 00000000000a6000
RBP: 0000000000003de0 R08: ffffffff813db069 R09: 0000000000000000
R10: ffffc90003f3f840 R11: fffff520007e7f14 R12: ffffc90003f3f840
R13: 00000000000a6001 R14: dffffc0000000000 R15: ffffffff813de03c
FS: 0000555556748480(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f84b6775198 CR3: 00000000200ef000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<NMI>
</NMI>
<TASK>
__unwind_start+0x641/0x7a0 arch/x86/kernel/unwind_orc.c:760
unwind_start arch/x86/include/asm/unwind.h:64 [inline]
arch_stack_walk+0xfd/0x1a0 arch/x86/kernel/stacktrace.c:24
stack_trace_save+0x117/0x1c0 kernel/stacktrace.c:122
kasan_save_stack mm/kasan/common.c:45 [inline]
kasan_set_track+0x4f/0x70 mm/kasan/common.c:52
__kasan_slab_alloc+0x66/0x70 mm/kasan/common.c:328
kasan_slab_alloc include/linux/kasan.h:188 [inline]
slab_post_alloc_hook+0x6c/0x3c0 mm/slab.h:763
slab_alloc_node mm/slub.c:3478 [inline]
slab_alloc mm/slub.c:3486 [inline]
__kmem_cache_alloc_lru mm/slub.c:3493 [inline]
kmem_cache_alloc+0x19e/0x2b0 mm/slub.c:3502
sk_prot_alloc+0x58/0x210 net/core/sock.c:2069
sk_alloc+0x38/0x370 net/core/sock.c:2128
inet_create+0x652/0xe20 net/ipv4/af_inet.c:325
__sock_create+0x48c/0x910 net/socket.c:1569
sock_create net/socket.c:1620 [inline]
__sys_socket_create net/socket.c:1657 [inline]
__sys_socket+0x14f/0x3b0 net/socket.c:1704
__do_sys_socket net/socket.c:1718 [inline]
__se_sys_socket net/socket.c:1716 [inline]
__x64_sys_socket+0x7a/0x90 net/socket.c:1716
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x45/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7f84b667e867
Code: f0 ff ff 77 06 c3 0f 1f 44 00 00 48 c7 c2 b0 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 b8 29 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffed1a3ed28 EFLAGS: 00000206 ORIG_RAX: 0000000000000029
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f84b667e867
RDX: 0000000000000006 RSI: 0000000000000001 RDI: 0000000000000002
RBP: 00007ffed1a3f43c R08: 00007ffed1a3ed1c R09: 00007ffed1a3f127
R10: 00007ffed1a3eda0 R11: 0000000000000206 R12: 0000000000000032
R13: 0000000000165078 R14: 0000000000165035 R15: 0000000000000008
</TASK>


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.