[syzbot] INFO: task hung in io_sq_thread

syzbot

unread,

Jul 14, 2021, 11:19:19 PM7/14/21

to asml.s...@gmail.com, ax...@kernel.dk, io-u...@vger.kernel.org, linux-...@vger.kernel.org, mi...@kernel.org, mi...@redhat.com, pet...@infradead.org, ros...@goodmis.org, syzkall...@googlegroups.com, wi...@kernel.org

Hello,

syzbot found the following issue on:

HEAD commit: 3dbdb38e2869 Merge branch 'for-5.14' of git://git.kernel.o..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=12ed1402300000
kernel config: https://syzkaller.appspot.com/x/.config?x=a1fcf15a09815757
dashboard link: https://syzkaller.appspot.com/bug?extid=ac957324022b7132accf
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1088c06c300000

The issue was bisected to:

commit 4d004099a668c41522242aa146a38cc4eb59cb1e
Author: Peter Zijlstra <pet...@infradead.org>
Date: Fri Oct 2 09:04:21 2020 +0000

lockdep: Fix lockdep recursion

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=13e4f19c300000
final oops: https://syzkaller.appspot.com/x/report.txt?x=1014f19c300000
console output: https://syzkaller.appspot.com/x/log.txt?x=17e4f19c300000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+ac9573...@syzkaller.appspotmail.com
Fixes: 4d004099a668 ("lockdep: Fix lockdep recursion")

INFO: task kworker/u4:4:9930 blocked for more than 143 seconds.
Not tainted 5.13.0-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u4:4 state:D stack:25664 pid: 9930 ppid: 2 flags:0x00004000
Workqueue: events_unbound io_ring_exit_work
Call Trace:
context_switch kernel/sched/core.c:4683 [inline]
__schedule+0x934/0x2710 kernel/sched/core.c:5940
schedule+0xd3/0x270 kernel/sched/core.c:6019
schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6078
__mutex_lock_common kernel/locking/mutex.c:1036 [inline]
__mutex_lock+0x7b6/0x10a0 kernel/locking/mutex.c:1104
io_sq_thread_park+0x79/0xd0 fs/io_uring.c:7361
io_ring_exit_work+0x15a/0x15d0 fs/io_uring.c:8823
process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
kthread+0x3e5/0x4d0 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
INFO: task iou-sqp-2939:2946 blocked for more than 143 seconds.
Not tainted 5.13.0-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:iou-sqp-2939 state:D stack:28696 pid: 2946 ppid: 8489 flags:0x00004004
Call Trace:
context_switch kernel/sched/core.c:4683 [inline]
__schedule+0x934/0x2710 kernel/sched/core.c:5940
schedule+0xd3/0x270 kernel/sched/core.c:6019
io_uring_cancel_generic+0x54d/0x890 fs/io_uring.c:9203
io_sq_thread+0xa99/0x1250 fs/io_uring.c:6963
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

Showing all locks held in the system:
1 lock held by khungtaskd/1643:
#0: ffffffff8c17bb80 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446
1 lock held by in:imklog/8150:
#0: ffff888034297270 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:974
3 locks held by kworker/u4:4/9930:
#0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:620 [inline]
#0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
#0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:617 [inline]
#0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
#0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x871/0x1630 kernel/workqueue.c:2247
#1: ffffc9000b31fdb0 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x1630 kernel/workqueue.c:2251
#2: ffff888045068870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park+0x79/0xd0 fs/io_uring.c:7361
1 lock held by iou-sqp-2939/2946:
#0: ffff888045068870 (&sqd->lock){+.+.}-{3:3}, at: io_sqd_handle_event+0x2d6/0x350 fs/io_uring.c:6883

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 1643 Comm: khungtaskd Not tainted 5.13.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:96
nmi_cpu_backtrace.cold+0x44/0xd7 lib/nmi_backtrace.c:105
nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:209 [inline]
watchdog+0xd4b/0xfb0 kernel/hung_task.c:294
kthread+0x3e5/0x4d0 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 9784 Comm: kworker/0:7 Not tainted 5.13.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events_power_efficient gc_worker
RIP: 0010:check_region_inline mm/kasan/generic.c:175 [inline]
RIP: 0010:kasan_check_range+0x2a/0x180 mm/kasan/generic.c:189
Code: 48 85 f6 0f 84 70 01 00 00 49 89 f9 41 54 44 0f b6 c2 49 01 f1 55 53 0f 82 18 01 00 00 48 b8 ff ff ff ff ff 7f ff ff 48 39 c7 <0f> 86 05 01 00 00 49 83 e9 01 48 89 fd 48 b8 00 00 00 00 00 fc ff
RSP: 0018:ffffc9000aedfad8 EFLAGS: 00000016
RAX: ffff7fffffffffff RBX: 00000000000004c2 RCX: ffffffff815ac59f
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffff9049d938
RBP: ffff88802d62267a R08: 0000000000000000 R09: ffffffff9049d940
R10: fffffbfff2093b27 R11: 0000000000000000 R12: ffff88802d622658
R13: ffff88802d621c40 R14: 0000000000000000 R15: 23147e44cee2677e
FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fb48abd5000 CR3: 000000000be8e000 CR4: 0000000000350ef0
Call Trace:
instrument_atomic_read include/linux/instrumented.h:71 [inline]
test_bit include/asm-generic/bitops/instrumented-non-atomic.h:134 [inline]
hlock_class kernel/locking/lockdep.c:199 [inline]
lookup_chain_cache_add kernel/locking/lockdep.c:3701 [inline]
validate_chain kernel/locking/lockdep.c:3757 [inline]
__lock_acquire+0x162f/0x54a0 kernel/locking/lockdep.c:5015
lock_acquire kernel/locking/lockdep.c:5625 [inline]
lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5590
process_one_work+0x8fc/0x1630 kernel/workqueue.c:2252
worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
kthread+0x3e5/0x4d0 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

Pavel Begunkov

unread,

Jul 15, 2021, 4:44:32 AM7/15/21

to syzbot, ax...@kernel.dk, io-u...@vger.kernel.org, linux-...@vger.kernel.org, mi...@kernel.org, mi...@redhat.com, pet...@infradead.org, ros...@goodmis.org, syzkall...@googlegroups.com, wi...@kernel.org

On 7/15/21 4:19 AM, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:

#syz test: git://git.kernel.dk/linux-block io_uring-5.14

--
Pavel Begunkov

syzbot

unread,

Jul 15, 2021, 5:15:11 AM7/15/21

to asml.s...@gmail.com, ax...@kernel.dk, io-u...@vger.kernel.org, linux-...@vger.kernel.org, mi...@kernel.org, mi...@redhat.com, pet...@infradead.org, ros...@goodmis.org, syzkall...@googlegroups.com, wi...@kernel.org

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: task hung in io_sq_thread_park

INFO: task kworker/u4:1:10 blocked for more than 143 seconds.
Not tainted 5.14.0-rc1-syzkaller #0

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

task:kworker/u4:1 state:D stack:26320 pid: 10 ppid: 2 flags:0x00004000

Workqueue: events_unbound io_ring_exit_work
Call Trace:
context_switch kernel/sched/core.c:4683 [inline]

__schedule+0x93a/0x26f0 kernel/sched/core.c:5940

schedule+0xd3/0x270 kernel/sched/core.c:6019
schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6078
__mutex_lock_common kernel/locking/mutex.c:1036 [inline]
__mutex_lock+0x7b6/0x10a0 kernel/locking/mutex.c:1104

io_sq_thread_park+0x79/0xd0 fs/io_uring.c:7314
io_ring_exit_work+0x15a/0x1600 fs/io_uring.c:8771

process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
kthread+0x3e5/0x4d0 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

INFO: task iou-sqp-23145:23155 blocked for more than 143 seconds.
Not tainted 5.14.0-rc1-syzkaller #0

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

task:iou-sqp-23145 state:D stack:28720 pid:23155 ppid: 8844 flags:0x00004004

Call Trace:
context_switch kernel/sched/core.c:4683 [inline]

__schedule+0x93a/0x26f0 kernel/sched/core.c:5940
schedule+0xd3/0x270 kernel/sched/core.c:6019
io_uring_cancel_generic+0x54d/0x890 fs/io_uring.c:9150
io_sq_thread+0xaac/0x1250 fs/io_uring.c:6916

ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

Showing all locks held in the system:

3 locks held by kworker/u4:1/10:
#0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:620 [inline]
#0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
#0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:617 [inline]
#0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
#0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x871/0x1630 kernel/workqueue.c:2247
#1: ffffc90000cf7db0 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x1630 kernel/workqueue.c:2251
#2: ffff88803d708470 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park+0x79/0xd0 fs/io_uring.c:7314
1 lock held by khungtaskd/1620:
#0: ffffffff8b97b900 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446
1 lock held by in:imklog/8316:
#0: ffff888022f82ff0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:974
1 lock held by iou-sqp-23145/23155:
#0: ffff88803d708470 (&sqd->lock){+.+.}-{3:3}, at: io_sqd_handle_event+0x2d6/0x350 fs/io_uring.c:6836

=============================================

NMI backtrace for cpu 1

CPU: 1 PID: 1620 Comm: khungtaskd Not tainted 5.14.0-rc1-syzkaller #0

Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:

__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105

nmi_cpu_backtrace.cold+0x44/0xd7 lib/nmi_backtrace.c:105
nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]

check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline]
watchdog+0xd0a/0xfc0 kernel/hung_task.c:295

kthread+0x3e5/0x4d0 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0

CPU: 0 PID: 10540 Comm: kworker/u4:10 Not tainted 5.14.0-rc1-syzkaller #0

Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Workqueue: phy8 ieee80211_iface_work
RIP: 0010:write_comp_data kernel/kcov.c:218 [inline]
RIP: 0010:__sanitizer_cov_trace_switch+0x65/0xf0 kernel/kcov.c:320
Code: 10 31 c9 65 4c 8b 24 25 00 f0 01 00 4d 85 d2 74 6b 4c 89 e6 bf 03 00 00 00 4c 8b 4c 24 20 49 8b 6c c8 10 e8 2d ff ff ff 84 c0 <74> 47 49 8b 84 24 18 15 00 00 41 8b bc 24 14 15 00 00 48 8b 10 48
RSP: 0018:ffffc9000bfa7030 EFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000016
RDX: 0000000000000000 RSI: ffff88801c5d8000 RDI: 0000000000000003
RBP: 0000000000000084 R08: ffffffff8aaa7a20 R09: ffffffff88878073
R10: 0000000000000020 R11: 0000000000000003 R12: ffff88801c5d8000
R13: dffffc0000000000 R14: ffff888053061087 R15: ffff888053061086

FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033

CR2: 00007f87cdfe4090 CR3: 000000002a61f000 CR4: 0000000000350ef0
Call Trace:
_ieee802_11_parse_elems_crc+0x1e3/0x1f90 net/mac80211/util.c:1018
ieee802_11_parse_elems_crc+0x89e/0xfe0 net/mac80211/util.c:1478
ieee802_11_parse_elems net/mac80211/ieee80211_i.h:2186 [inline]
ieee80211_bss_info_update+0x463/0xb70 net/mac80211/scan.c:212
ieee80211_rx_bss_info net/mac80211/ibss.c:1126 [inline]
ieee80211_rx_mgmt_probe_beacon+0xcc6/0x17c0 net/mac80211/ibss.c:1615
ieee80211_ibss_rx_queued_mgmt+0xd30/0x1610 net/mac80211/ibss.c:1642
ieee80211_iface_process_skb net/mac80211/iface.c:1426 [inline]
ieee80211_iface_work+0x7f7/0xa40 net/mac80211/iface.c:1462

process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
kthread+0x3e5/0x4d0 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

Tested on:

commit: 1b48773f io_uring: fix io_drain_req()
git tree: git://git.kernel.dk/linux-block io_uring-5.14
console output: https://syzkaller.appspot.com/x/log.txt?x=12040824300000
kernel config: https://syzkaller.appspot.com/x/.config?x=cfe2c0e42bc9993d
dashboard link: https://syzkaller.appspot.com/bug?extid=ac957324022b7132accf
compiler:

Pavel Begunkov

unread,

Jul 15, 2021, 2:16:43 PM7/15/21

to syzbot, ax...@kernel.dk, io-u...@vger.kernel.org, linux-...@vger.kernel.org, mi...@kernel.org, mi...@redhat.com, pet...@infradead.org, ros...@goodmis.org, syzkall...@googlegroups.com, wi...@kernel.org

On 7/15/21 10:15 AM, syzbot wrote:
> Hello,
>
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> INFO: task hung in io_sq_thread_park

The interesting part here is this:

[ 282.599913][ T1620] task:iou-sqp-23145 state:D stack:28720 pid:23155 ppid: 8844 flags:0x00004004
[ 282.609927][ T1620] Call Trace:
[ 282.613711][ T1620] __schedule+0x93a/0x26f0
[ 282.618131][ T1620] ? io_schedule_timeout+0x140/0x140
[ 282.623446][ T1620] ? lockdep_hardirqs_on+0x79/0x100
[ 282.628789][ T1620] ? _raw_spin_unlock_irqrestore+0x3d/0x70
[ 282.634647][ T1620] schedule+0xd3/0x270
[ 282.638874][ T1620] io_uring_cancel_generic+0x54d/0x890
[ 282.644428][ T1620] ? __io_uring_free+0x170/0x170
[ 282.649361][ T1620] ? lockdep_hardirqs_on+0x79/0x100
[ 282.655639][ T1620] ? finish_wait+0x270/0x270
[ 282.660346][ T1620] io_sq_thread+0xaac/0x1250
[ 282.665096][ T1620] ? io_uring_cancel_generic+0x890/0x890
[ 282.670788][ T1620] ? ret_from_fork+0x8/0x30
[ 282.675295][ T1620] ? finish_wait+0x270/0x270
[ 282.679983][ T1620] ? rwlock_bug.part.0+0x90/0x90
[ 282.684958][ T1620] ? _raw_spin_unlock_irq+0x1f/0x40
[ 282.690155][ T1620] ? io_uring_cancel_generic+0x890/0x890
[ 282.696394][ T1620] ret_from_fork+0x1f/0x30

So we have a dying SQPOLL task, which is stuck in cancellation.
Let's add some debug output and see if we have a request lost.

#syz test: https://github.com/isilence/linux.git syztest_sqpoll_hang

--
Pavel Begunkov

syzbot

unread,

Jul 15, 2021, 8:14:08 PM7/15/21

to asml.s...@gmail.com, ax...@kernel.dk, io-u...@vger.kernel.org, linux-...@vger.kernel.org, mi...@kernel.org, mi...@redhat.com, pet...@infradead.org, ros...@goodmis.org, syzkall...@googlegroups.com, wi...@kernel.org

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:

WARNING in io_uring_cancel_generic

------------[ cut here ]------------
WARNING: CPU: 1 PID: 25279 at fs/io_uring.c:9122 io_dump fs/io_uring.c:9117 [inline]
WARNING: CPU: 1 PID: 25279 at fs/io_uring.c:9122 io_uring_cancel_generic+0x5fb/0xe50 fs/io_uring.c:9195
Modules linked in:
CPU: 1 PID: 25279 Comm: iou-sqp-25271 Tainted: G W 5.14.0-rc1-syzkaller #0

Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

RIP: 0010:io_dump fs/io_uring.c:9122 [inline]
RIP: 0010:io_uring_cancel_generic+0x5fb/0xe50 fs/io_uring.c:9195
Code: 08 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 9e 06 00 00 48 8b 85 10 08 00 00 48 89 44 24 40 <0f> 0b 48 8b 44 24 10 48 05 98 00 00 00 48 89 c2 48 89 44 24 08 48
RSP: 0018:ffffc9000ab87c50 EFLAGS: 00010246
RAX: ffff88803ddb7800 RBX: ffff888049f73880 RCX: 0000000000000000
RDX: 1ffff110093ee812 RSI: ffffffff81e2ebed RDI: ffff888049f74090
RBP: ffff888049f73880 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff81e2ebc5 R11: 0000000000000001 R12: ffffc9000ab87d40
R13: ffff88803ddb7858 R14: 0000000000000001 R15: 0000000000000000
FS: 00007fb893f40700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000

CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033

CR2: 00007f68dd0c6080 CR3: 000000002b457000 CR4: 0000000000350ef0
Call Trace:
io_sq_thread+0xaac/0x1250 fs/io_uring.c:6930
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 25279 Comm: iou-sqp-25271 Tainted: G W 5.14.0-rc1-syzkaller #0

Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105

panic+0x306/0x73d kernel/panic.c:232
__warn.cold+0x35/0x44 kernel/panic.c:606
report_bug+0x1bd/0x210 lib/bug.c:199
handle_bug+0x3c/0x60 arch/x86/kernel/traps.c:239
exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:259
asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:566
RIP: 0010:io_dump fs/io_uring.c:9122 [inline]
RIP: 0010:io_uring_cancel_generic+0x5fb/0xe50 fs/io_uring.c:9195
Code: 08 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 9e 06 00 00 48 8b 85 10 08 00 00 48 89 44 24 40 <0f> 0b 48 8b 44 24 10 48 05 98 00 00 00 48 89 c2 48 89 44 24 08 48
RSP: 0018:ffffc9000ab87c50 EFLAGS: 00010246
RAX: ffff88803ddb7800 RBX: ffff888049f73880 RCX: 0000000000000000
RDX: 1ffff110093ee812 RSI: ffffffff81e2ebed RDI: ffff888049f74090
RBP: ffff888049f73880 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff81e2ebc5 R11: 0000000000000001 R12: ffffc9000ab87d40
R13: ffff88803ddb7858 R14: 0000000000000001 R15: 0000000000000000
io_sq_thread+0xaac/0x1250 fs/io_uring.c:6930
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Kernel Offset: disabled
Rebooting in 86400 seconds..

Tested on:

commit: d55b8f3c io_uring: add syz debug info
git tree: https://github.com/isilence/linux.git syztest_sqpoll_hang
console output: https://syzkaller.appspot.com/x/log.txt?x=16f6cb80300000

Pavel Begunkov

unread,

Jul 16, 2021, 6:38:56 AM7/16/21

to syzbot, ax...@kernel.dk, io-u...@vger.kernel.org, linux-...@vger.kernel.org, mi...@kernel.org, mi...@redhat.com, pet...@infradead.org, ros...@goodmis.org, syzkall...@googlegroups.com, wi...@kernel.org

On 7/16/21 1:14 AM, syzbot wrote:
> Hello,
>
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> WARNING in io_uring_cancel_generic

attempt to get stats #2

#syz test: https://github.com/isilence/linux.git syztest_sqpoll_hang

>

--
Pavel Begunkov

syzbot

unread,

Jul 16, 2021, 6:57:08 AM7/16/21

to asml.s...@gmail.com, ax...@kernel.dk, io-u...@vger.kernel.org, linux-...@vger.kernel.org, mi...@kernel.org, mi...@redhat.com, pet...@infradead.org, ros...@goodmis.org, syzkall...@googlegroups.com, wi...@kernel.org

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
WARNING in io_uring_cancel_generic

ctx ffff8880467ee000 submitted 1, dismantled 0, crefs 0, inflight 1, fallbacks 0, poll1 0, poll2 0, tw 0, hash 0, cs 0, issued 1
------------[ cut here ]------------
WARNING: CPU: 0 PID: 18216 at fs/io_uring.c:9142 io_dump fs/io_uring.c:9142 [inline]
WARNING: CPU: 0 PID: 18216 at fs/io_uring.c:9142 io_uring_cancel_generic+0x608/0xea0 fs/io_uring.c:9198
Modules linked in:
CPU: 0 PID: 18216 Comm: iou-sqp-18211 Not tainted 5.14.0-rc1-syzkaller #0

Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

RIP: 0010:io_dump fs/io_uring.c:9142 [inline]
RIP: 0010:io_uring_cancel_generic+0x608/0xea0 fs/io_uring.c:9198
Code: 48 c1 ea 03 80 3c 02 00 0f 85 d5 02 00 00 48 8b 44 24 10 48 8b a8 98 00 00 00 48 39 6c 24 08 0f 85 02 03 00 00 e8 f8 97 92 ff <0f> 0b e9 af fe ff ff e8 ec 97 92 ff 48 8b 74 24 20 b9 08 00 00 00
RSP: 0018:ffffc9000afefc40 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88802a09d4c0 RCX: 0000000000000000
RDX: ffff88802a09d4c0 RSI: ffffffff81e2f0b8 RDI: ffff8880467ee510
RBP: ffff8880462fb788 R08: 0000000000000081 R09: 0000000000000000
R10: ffffffff815d066e R11: 0000000000000000 R12: ffff8880462fa458
R13: ffffc9000afefd40 R14: 0000000000000001 R15: 0000000000000000
FS: 00007f9295c2d700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000

CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033

CR2: 00007f1c3f1a3000 CR3: 0000000018755000 CR4: 0000000000350ee0
Call Trace:
io_sq_thread+0xaac/0x1250 fs/io_uring.c:6932

ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 18216 Comm: iou-sqp-18211 Not tainted 5.14.0-rc1-syzkaller #0

Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
panic+0x306/0x73d kernel/panic.c:232
__warn.cold+0x35/0x44 kernel/panic.c:606
report_bug+0x1bd/0x210 lib/bug.c:199
handle_bug+0x3c/0x60 arch/x86/kernel/traps.c:239
exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:259
asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:566

RIP: 0010:io_dump fs/io_uring.c:9142 [inline]
RIP: 0010:io_uring_cancel_generic+0x608/0xea0 fs/io_uring.c:9198
Code: 48 c1 ea 03 80 3c 02 00 0f 85 d5 02 00 00 48 8b 44 24 10 48 8b a8 98 00 00 00 48 39 6c 24 08 0f 85 02 03 00 00 e8 f8 97 92 ff <0f> 0b e9 af fe ff ff e8 ec 97 92 ff 48 8b 74 24 20 b9 08 00 00 00
RSP: 0018:ffffc9000afefc40 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88802a09d4c0 RCX: 0000000000000000
RDX: ffff88802a09d4c0 RSI: ffffffff81e2f0b8 RDI: ffff8880467ee510
RBP: ffff8880462fb788 R08: 0000000000000081 R09: 0000000000000000
R10: ffffffff815d066e R11: 0000000000000000 R12: ffff8880462fa458
R13: ffffc9000afefd40 R14: 0000000000000001 R15: 0000000000000000
io_sq_thread+0xaac/0x1250 fs/io_uring.c:6932

ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Kernel Offset: disabled
Rebooting in 86400 seconds..

Tested on:

commit: 81fee56e io_uring: add syz debug info

git tree: https://github.com/isilence/linux.git syztest_sqpoll_hang

console output: https://syzkaller.appspot.com/x/log.txt?x=12bb485c300000

Pavel Begunkov

unread,

Jul 19, 2021, 9:15:56 AM7/19/21

to syzbot, ax...@kernel.dk, io-u...@vger.kernel.org, syzkall...@googlegroups.com

On 7/16/21 11:57 AM, syzbot wrote:
> Hello,
>
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> WARNING in io_uring_cancel_generic

Not clear yet, let's add more stats and rerun

#syz test: https://github.com/isilence/linux.git syztest_sqpoll_hang

>

--
Pavel Begunkov

syzbot

unread,

Jul 19, 2021, 9:46:13 AM7/19/21

to asml.s...@gmail.com, ax...@kernel.dk, io-u...@vger.kernel.org, syzkall...@googlegroups.com

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
WARNING in io_uring_cancel_generic

3rd poll ffff88801857e000
ctx ffff88801857e000 submitted 1, dismantled 0, crefs 0, inflight 1, fallbacks 0, poll1 0, poll2 0, tw 0, hash 0, cs 0, issued 1, cqes 1, poll tw 0, dpoll 1
------------[ cut here ]------------
WARNING: CPU: 0 PID: 19055 at fs/io_uring.c:9166 io_dump fs/io_uring.c:9166 [inline]
WARNING: CPU: 0 PID: 19055 at fs/io_uring.c:9166 io_uring_cancel_generic+0x609/0x1050 fs/io_uring.c:9222
Modules linked in:
CPU: 0 PID: 19055 Comm: iou-sqp-19052 Tainted: G W 5.14.0-rc1-syzkaller #0

Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

RIP: 0010:io_dump fs/io_uring.c:9166 [inline]
RIP: 0010:io_uring_cancel_generic+0x609/0x1050 fs/io_uring.c:9222
Code: 48 c1 ea 03 80 3c 02 00 0f 85 d5 02 00 00 48 8b 44 24 10 48 8b a8 98 00 00 00 48 39 6c 24 08 0f 85 02 03 00 00 e8 e7 97 92 ff <0f> 0b e9 b1 fe ff ff e8 db 97 92 ff 48 8b 74 24 20 b9 08 00 00 00
RSP: 0018:ffffc90002ff7c20 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8880474b54c0 RCX: 0000000000000000
RDX: ffff8880474b54c0 RSI: ffffffff81e2f0c9 RDI: ffff88801857e518
RBP: ffff8880276e9f80 R08: 000000000000009d R09: 0000000000000000
R10: ffffffff815d066e R11: 0000000000000000 R12: ffffc90002ff7d40
R13: ffff888017015058 R14: 0000000000000000 R15: 0000000000000001
FS: 00007f3ab01c2700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000

CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033

CR2: 0000000002aad848 CR3: 000000004a024000 CR4: 0000000000350ef0
Call Trace:
io_sq_thread+0xaac/0x1250 fs/io_uring.c:6944

ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 19055 Comm: iou-sqp-19052 Tainted: G W 5.14.0-rc1-syzkaller #0

Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
panic+0x306/0x73d kernel/panic.c:232
__warn.cold+0x35/0x44 kernel/panic.c:606
report_bug+0x1bd/0x210 lib/bug.c:199
handle_bug+0x3c/0x60 arch/x86/kernel/traps.c:239
exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:259
asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:566

RIP: 0010:io_dump fs/io_uring.c:9166 [inline]
RIP: 0010:io_uring_cancel_generic+0x609/0x1050 fs/io_uring.c:9222
Code: 48 c1 ea 03 80 3c 02 00 0f 85 d5 02 00 00 48 8b 44 24 10 48 8b a8 98 00 00 00 48 39 6c 24 08 0f 85 02 03 00 00 e8 e7 97 92 ff <0f> 0b e9 b1 fe ff ff e8 db 97 92 ff 48 8b 74 24 20 b9 08 00 00 00
RSP: 0018:ffffc90002ff7c20 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8880474b54c0 RCX: 0000000000000000
RDX: ffff8880474b54c0 RSI: ffffffff81e2f0c9 RDI: ffff88801857e518
RBP: ffff8880276e9f80 R08: 000000000000009d R09: 0000000000000000
R10: ffffffff815d066e R11: 0000000000000000 R12: ffffc90002ff7d40
R13: ffff888017015058 R14: 0000000000000000 R15: 0000000000000001
io_sq_thread+0xaac/0x1250 fs/io_uring.c:6944

ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Kernel Offset: disabled
Rebooting in 86400 seconds..

Tested on:

commit: 6828fc5e io_uring: add syz debug info

git tree: https://github.com/isilence/linux.git syztest_sqpoll_hang

console output: https://syzkaller.appspot.com/x/log.txt?x=10ed934c300000

Pavel Begunkov

unread,

Jul 19, 2021, 12:57:50 PM7/19/21

to syzbot, ax...@kernel.dk, io-u...@vger.kernel.org, linux-...@vger.kernel.org, mi...@kernel.org, mi...@redhat.com, pet...@infradead.org, ros...@goodmis.org, syzkall...@googlegroups.com, wi...@kernel.org

On 7/16/21 11:57 AM, syzbot wrote:

> Hello,
>
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> WARNING in io_uring_cancel_generic

__arm_poll doesn't remove a second poll entry in case of failed
__io_queue_proc(), it's most likely the cause here.

#syz test: https://github.com/isilence/linux.git syztest_sqpoll_hang

>

--
Pavel Begunkov

Jens Axboe

unread,

Jul 19, 2021, 1:13:55 PM7/19/21

to Pavel Begunkov, syzbot, io-u...@vger.kernel.org, linux-...@vger.kernel.org, mi...@kernel.org, mi...@redhat.com, pet...@infradead.org, ros...@goodmis.org, syzkall...@googlegroups.com, wi...@kernel.org

On 7/19/21 10:57 AM, Pavel Begunkov wrote:
> On 7/16/21 11:57 AM, syzbot wrote:
>> Hello,
>>
>> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
>> WARNING in io_uring_cancel_generic
>
> __arm_poll doesn't remove a second poll entry in case of failed
> __io_queue_proc(), it's most likely the cause here.
>
> #syz test: https://github.com/isilence/linux.git syztest_sqpoll_hang

Was my thought on seeing the last debug run too. Haven't written a test
case, but my initial thought was catching this at the time that double
poll is armed, in __io_queue_proc(). Totally untested, just tossing
it out there.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 0cac361bf6b8..ed33de5fffd2 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5002,6 +5002,9 @@ static void __io_queue_proc(struct io_poll_iocb *poll, struct io_poll_table *pt,
if (unlikely(poll->head)) {
struct io_poll_iocb *poll_one = poll;

+ /* first poll failed, don't arm double poll */
+ if (pt->error)
+ return;
/* already have a 2nd entry, fail a third attempt */
if (*poll_ptr) {
pt->error = -EINVAL;

--
Jens Axboe

syzbot

unread,

Jul 19, 2021, 1:15:10 PM7/19/21

to asml.s...@gmail.com, ax...@kernel.dk, io-u...@vger.kernel.org, linux-...@vger.kernel.org, mi...@kernel.org, mi...@redhat.com, pet...@infradead.org, ros...@goodmis.org, syzkall...@googlegroups.com, wi...@kernel.org

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+ac9573...@syzkaller.appspotmail.com

Tested on:

commit: 61d27d88 io_uring: double remove on poll arm failure

git tree: https://github.com/isilence/linux.git syztest_sqpoll_hang

kernel config: https://syzkaller.appspot.com/x/.config?x=cfe2c0e42bc9993d
dashboard link: https://syzkaller.appspot.com/bug?extid=ac957324022b7132accf
compiler:

Note: testing is done by a robot and is best-effort only.

Pavel Begunkov

unread,

Jul 19, 2021, 1:29:02 PM7/19/21

to Jens Axboe, syzbot, io-u...@vger.kernel.org, linux-...@vger.kernel.org, mi...@kernel.org, mi...@redhat.com, pet...@infradead.org, ros...@goodmis.org, syzkall...@googlegroups.com, wi...@kernel.org

On 7/19/21 6:13 PM, Jens Axboe wrote:
> On 7/19/21 10:57 AM, Pavel Begunkov wrote:
>> On 7/16/21 11:57 AM, syzbot wrote:
>>> Hello,
>>>
>>> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
>>> WARNING in io_uring_cancel_generic
>>
>> __arm_poll doesn't remove a second poll entry in case of failed
>> __io_queue_proc(), it's most likely the cause here.
>>
>> #syz test: https://github.com/isilence/linux.git syztest_sqpoll_hang
>
> Was my thought on seeing the last debug run too. Haven't written a test
> case, but my initial thought was catching this at the time that double
> poll is armed, in __io_queue_proc(). Totally untested, just tossing
> it out there.

Wouldn't help, unfortunately, the way syz triggers it is making a
request to go through __io_queue_proc() three times.

Either it's 3 waitqueues or we need to extend the check below to
the double poll entry.

if (poll_one->head == head)
return;

>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 0cac361bf6b8..ed33de5fffd2 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -5002,6 +5002,9 @@ static void __io_queue_proc(struct io_poll_iocb *poll, struct io_poll_table *pt,
> if (unlikely(poll->head)) {
> struct io_poll_iocb *poll_one = poll;
>
> + /* first poll failed, don't arm double poll */
> + if (pt->error)
> + return;
> /* already have a 2nd entry, fail a third attempt */
> if (*poll_ptr) {
> pt->error = -EINVAL;
>

--

Pavel Begunkov

Jens Axboe

unread,

Jul 19, 2021, 2:38:40 PM7/19/21

to Pavel Begunkov, syzbot, io-u...@vger.kernel.org, linux-...@vger.kernel.org, mi...@kernel.org, mi...@redhat.com, pet...@infradead.org, ros...@goodmis.org, syzkall...@googlegroups.com, wi...@kernel.org

On 7/19/21 11:28 AM, Pavel Begunkov wrote:
> On 7/19/21 6:13 PM, Jens Axboe wrote:
>> On 7/19/21 10:57 AM, Pavel Begunkov wrote:
>>> On 7/16/21 11:57 AM, syzbot wrote:
>>>> Hello,
>>>>
>>>> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
>>>> WARNING in io_uring_cancel_generic
>>>
>>> __arm_poll doesn't remove a second poll entry in case of failed
>>> __io_queue_proc(), it's most likely the cause here.
>>>
>>> #syz test: https://github.com/isilence/linux.git syztest_sqpoll_hang
>>
>> Was my thought on seeing the last debug run too. Haven't written a test
>> case, but my initial thought was catching this at the time that double
>> poll is armed, in __io_queue_proc(). Totally untested, just tossing
>> it out there.
>
> Wouldn't help, unfortunately, the way syz triggers it is making a
> request to go through __io_queue_proc() three times.
>
> Either it's 3 waitqueues or we need to extend the check below to
> the double poll entry.
>
> if (poll_one->head == head)
> return;

Yes good point, that'd depend on single poll erroring first. Given
the variety of cases for it, catching it after the fact like in your
patch is likely the simplest/cleanest way.

--
Jens Axboe

Reply all

Reply to author

Forward

[syzbot] INFO: task hung in io_sq_thread_park (2)

syzbot

Pavel Begunkov

syzbot

Pavel Begunkov

syzbot

Pavel Begunkov

syzbot

Pavel Begunkov

syzbot

Pavel Begunkov

Jens Axboe

syzbot

Pavel Begunkov

Jens Axboe