[syzbot] possible deadlock in io_sq_thread_finish

syzbot

Mar 7, 2021, 4:49:22 AM
to asml.s...@gmail.com, ax...@kernel.dk, io-u...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: a38fd874 Linux 5.12-rc2
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=143ee02ad00000
kernel config: https://syzkaller.appspot.com/x/.config?x=db9c6adb4986f2f2
dashboard link: https://syzkaller.appspot.com/bug?extid=ac39856cb1b332dbbdda

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+ac3985...@syzkaller.appspotmail.com

============================================
WARNING: possible recursive locking detected
5.12.0-rc2-syzkaller #0 Not tainted
--------------------------------------------
kworker/u4:7/7615 is trying to acquire lock:
ffff888144a02870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_stop fs/io_uring.c:7099 [inline]
ffff888144a02870 (&sqd->lock){+.+.}-{3:3}, at: io_put_sq_data fs/io_uring.c:7115 [inline]
ffff888144a02870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_finish+0x408/0x650 fs/io_uring.c:7139

but task is already holding lock:
ffff888144a02870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park fs/io_uring.c:7088 [inline]
ffff888144a02870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park+0x63/0xc0 fs/io_uring.c:7082

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&sqd->lock);
lock(&sqd->lock);

*** DEADLOCK ***

May be due to missing lock nesting notation

3 locks held by kworker/u4:7/7615:
#0: ffff888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline]
#0: ffff888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
#0: ffff888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:616 [inline]
#0: ffff888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:643 [inline]
#0: ffff888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x871/0x1600 kernel/workqueue.c:2246
#1: ffffc900023a7da8 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x1600 kernel/workqueue.c:2250
#2: ffff888144a02870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park fs/io_uring.c:7088 [inline]
#2: ffff888144a02870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park+0x63/0xc0 fs/io_uring.c:7082

stack backtrace:
CPU: 1 PID: 7615 Comm: kworker/u4:7 Not tainted 5.12.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events_unbound io_ring_exit_work
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
print_deadlock_bug kernel/locking/lockdep.c:2829 [inline]
check_deadlock kernel/locking/lockdep.c:2872 [inline]
validate_chain kernel/locking/lockdep.c:3661 [inline]
__lock_acquire.cold+0x14c/0x3b4 kernel/locking/lockdep.c:4900
lock_acquire kernel/locking/lockdep.c:5510 [inline]
lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5475
__mutex_lock_common kernel/locking/mutex.c:946 [inline]
__mutex_lock+0x139/0x1120 kernel/locking/mutex.c:1093
io_sq_thread_stop fs/io_uring.c:7099 [inline]
io_put_sq_data fs/io_uring.c:7115 [inline]
io_sq_thread_finish+0x408/0x650 fs/io_uring.c:7139
io_ring_ctx_free fs/io_uring.c:8408 [inline]
io_ring_exit_work+0x82/0x9a0 fs/io_uring.c:8539
process_one_work+0x98d/0x1600 kernel/workqueue.c:2275
worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
kthread+0x3b1/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

Pavel Begunkov

Mar 7, 2021, 7:43:25 AM
to syzbot, ax...@kernel.dk, io-u...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 07/03/2021 09:49, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: a38fd874 Linux 5.12-rc2
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=143ee02ad00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=db9c6adb4986f2f2
> dashboard link: https://syzkaller.appspot.com/bug?extid=ac39856cb1b332dbbdda
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+ac3985...@syzkaller.appspotmail.com

Legit error, park() might take an sqd lock, and then we take it again.
I'll patch it up
--
Pavel Begunkov

Pavel Begunkov

Mar 7, 2021, 8:24:51 AM
to syzbot, ax...@kernel.dk, io-u...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 07/03/2021 12:39, Pavel Begunkov wrote:
> On 07/03/2021 09:49, syzbot wrote:
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit: a38fd874 Linux 5.12-rc2
>> git tree: upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=143ee02ad00000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=db9c6adb4986f2f2
>> dashboard link: https://syzkaller.appspot.com/bug?extid=ac39856cb1b332dbbdda
>>
>> Unfortunately, I don't have any reproducer for this issue yet.
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+ac3985...@syzkaller.appspotmail.com
>
> Legit error, park() might take an sqd lock, and then we take it again.
> I'll patch it up

I was wrong, it looks fine, io_put_sq_data() and io_sq_thread_park()
don't nest. I wonder if that's a false positive due to conditional
locking as below

	if (sqd->thread == current)
		return;
	mutex_lock(&sqd->lock);

syzbot

Mar 9, 2021, 9:04:18 AM
to asml.s...@gmail.com, ax...@kernel.dk, io-u...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
syzbot has found a reproducer for the following issue on:

HEAD commit: 144c79ef Merge tag 'perf-tools-fixes-for-v5.12-2020-03-07'..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=129addbcd00000
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=167574dad00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=12c8f566d00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+ac3985...@syzkaller.appspotmail.com

============================================
WARNING: possible recursive locking detected
5.12.0-rc2-syzkaller #0 Not tainted
--------------------------------------------
kworker/u4:7/8696 is trying to acquire lock:
ffff888015395870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_stop fs/io_uring.c:7099 [inline]
ffff888015395870 (&sqd->lock){+.+.}-{3:3}, at: io_put_sq_data fs/io_uring.c:7115 [inline]
ffff888015395870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_finish+0x408/0x650 fs/io_uring.c:7139

but task is already holding lock:
ffff888015395870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park fs/io_uring.c:7088 [inline]
ffff888015395870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park+0x63/0xc0 fs/io_uring.c:7082

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&sqd->lock);
lock(&sqd->lock);

*** DEADLOCK ***

May be due to missing lock nesting notation

3 locks held by kworker/u4:7/8696:
#0: ffff888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline]
#0: ffff888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
#0: ffff888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:616 [inline]
#0: ffff888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:643 [inline]
#0: ffff888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x871/0x1600 kernel/workqueue.c:2246
#1: ffffc9000253fda8 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x1600 kernel/workqueue.c:2250
#2: ffff888015395870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park fs/io_uring.c:7088 [inline]
#2: ffff888015395870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park+0x63/0xc0 fs/io_uring.c:7082

stack backtrace:
CPU: 0 PID: 8696 Comm: kworker/u4:7 Not tainted 5.12.0-rc2-syzkaller #0

Jens Axboe

Mar 9, 2021, 9:57:46 AM
to syzbot, asml.s...@gmail.com, io-u...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 3/9/21 7:04 AM, syzbot wrote:
> syzbot has found a reproducer for the following issue on:
>
> HEAD commit: 144c79ef Merge tag 'perf-tools-fixes-for-v5.12-2020-03-07'..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=129addbcd00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=db9c6adb4986f2f2
> dashboard link: https://syzkaller.appspot.com/bug?extid=ac39856cb1b332dbbdda
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=167574dad00000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=12c8f566d00000

#syz test: git://git.kernel.dk/linux-block io_uring-5.12

--
Jens Axboe

syzbot

Mar 9, 2021, 5:29:08 PM
to asml.s...@gmail.com, ax...@kernel.dk, io-u...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
KASAN: use-after-free Read in io_sq_thread

==================================================================
BUG: KASAN: use-after-free in __lock_acquire+0x3e6f/0x54c0 kernel/locking/lockdep.c:4770
Read of size 8 at addr ffff888034cbfc78 by task iou-sqp-10518/10523

CPU: 0 PID: 10523 Comm: iou-sqp-10518 Not tainted 5.12.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232
__kasan_report mm/kasan/report.c:399 [inline]
kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
__lock_acquire+0x3e6f/0x54c0 kernel/locking/lockdep.c:4770
lock_acquire kernel/locking/lockdep.c:5510 [inline]
lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5475
down_write+0x92/0x150 kernel/locking/rwsem.c:1406
io_sq_thread+0x1220/0x1b10 fs/io_uring.c:6754
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

Allocated by task 10518:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:46 [inline]
set_alloc_info mm/kasan/common.c:427 [inline]
____kasan_kmalloc mm/kasan/common.c:506 [inline]
____kasan_kmalloc mm/kasan/common.c:465 [inline]
__kasan_kmalloc+0x99/0xc0 mm/kasan/common.c:515
kmalloc include/linux/slab.h:554 [inline]
kzalloc include/linux/slab.h:684 [inline]
io_get_sq_data fs/io_uring.c:7156 [inline]
io_sq_offload_create fs/io_uring.c:7830 [inline]
io_uring_create fs/io_uring.c:9443 [inline]
io_uring_setup+0x1552/0x2860 fs/io_uring.c:9523
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 396:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:357
____kasan_slab_free mm/kasan/common.c:360 [inline]
____kasan_slab_free mm/kasan/common.c:325 [inline]
__kasan_slab_free+0xf5/0x130 mm/kasan/common.c:367
kasan_slab_free include/linux/kasan.h:199 [inline]
slab_free_hook mm/slub.c:1562 [inline]
slab_free_freelist_hook+0x92/0x210 mm/slub.c:1600
slab_free mm/slub.c:3161 [inline]
kfree+0xe5/0x7f0 mm/slub.c:4213
io_put_sq_data fs/io_uring.c:7098 [inline]
io_sq_thread_finish+0x4b0/0x5f0 fs/io_uring.c:7116
io_ring_ctx_free fs/io_uring.c:8355 [inline]
io_ring_exit_work+0x333/0xcf0 fs/io_uring.c:8525
process_one_work+0x98d/0x1600 kernel/workqueue.c:2275
worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
kthread+0x3b1/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

The buggy address belongs to the object at ffff888034cbfc00
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 120 bytes inside of
512-byte region [ffff888034cbfc00, ffff888034cbfe00)
The buggy address belongs to the page:
page:000000004a1f04c4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x34cbc
head:000000004a1f04c4 order:2 compound_mapcount:0 compound_pincount:0
flags: 0xfff00000010200(slab|head)
raw: 00fff00000010200 dead000000000100 dead000000000122 ffff88800fc41c80
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888034cbfb00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888034cbfb80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888034cbfc00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888034cbfc80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888034cbfd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


Tested on:

commit: 8bf06ba6 io_uring: remove unneeded variable 'ret'
git tree: git://git.kernel.dk/linux-block io_uring-5.12
console output: https://syzkaller.appspot.com/x/log.txt?x=13fcd952d00000
kernel config: https://syzkaller.appspot.com/x/.config?x=b3c6cab008c50864
dashboard link: https://syzkaller.appspot.com/bug?extid=ac39856cb1b332dbbdda
compiler:

Jens Axboe

Mar 9, 2021, 6:34:36 PM
to syzbot, asml.s...@gmail.com, io-u...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com

Jens Axboe

Mar 9, 2021, 6:45:07 PM
to syzbot, asml.s...@gmail.com, io-u...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com

syzbot

Mar 9, 2021, 10:46:09 PM
to asml.s...@gmail.com, ax...@kernel.dk, io-u...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
KASAN: use-after-free Read in io_sq_thread

==================================================================
BUG: KASAN: use-after-free in __lock_acquire+0x3e6f/0x54c0 kernel/locking/lockdep.c:4770
Read of size 8 at addr ffff888023e47c78 by task iou-sqp-10156/10158

CPU: 0 PID: 10158 Comm: iou-sqp-10156 Not tainted 5.12.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232
__kasan_report mm/kasan/report.c:399 [inline]
kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
__lock_acquire+0x3e6f/0x54c0 kernel/locking/lockdep.c:4770
lock_acquire kernel/locking/lockdep.c:5510 [inline]
lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5475
down_write+0x92/0x150 kernel/locking/rwsem.c:1406
io_sq_thread+0x1220/0x1b10 fs/io_uring.c:6754
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

Allocated by task 10156:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:46 [inline]
set_alloc_info mm/kasan/common.c:427 [inline]
____kasan_kmalloc mm/kasan/common.c:506 [inline]
____kasan_kmalloc mm/kasan/common.c:465 [inline]
__kasan_kmalloc+0x99/0xc0 mm/kasan/common.c:515
kmalloc include/linux/slab.h:554 [inline]
kzalloc include/linux/slab.h:684 [inline]
io_get_sq_data fs/io_uring.c:7153 [inline]
io_sq_offload_create fs/io_uring.c:7827 [inline]
io_uring_create fs/io_uring.c:9443 [inline]
io_uring_setup+0x154b/0x2940 fs/io_uring.c:9523
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 3392:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:357
____kasan_slab_free mm/kasan/common.c:360 [inline]
____kasan_slab_free mm/kasan/common.c:325 [inline]
__kasan_slab_free+0xf5/0x130 mm/kasan/common.c:367
kasan_slab_free include/linux/kasan.h:199 [inline]
slab_free_hook mm/slub.c:1562 [inline]
slab_free_freelist_hook+0x92/0x210 mm/slub.c:1600
slab_free mm/slub.c:3161 [inline]
kfree+0xe5/0x7f0 mm/slub.c:4213
io_put_sq_data fs/io_uring.c:7095 [inline]
io_sq_thread_finish+0x48e/0x5b0 fs/io_uring.c:7113
io_ring_ctx_free fs/io_uring.c:8355 [inline]
io_ring_exit_work+0x333/0xcf0 fs/io_uring.c:8525
process_one_work+0x98d/0x1600 kernel/workqueue.c:2275
worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
kthread+0x3b1/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

The buggy address belongs to the object at ffff888023e47c00
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 120 bytes inside of
512-byte region [ffff888023e47c00, ffff888023e47e00)
The buggy address belongs to the page:
page:00000000200f7571 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888023e47400 pfn:0x23e44
head:00000000200f7571 order:2 compound_mapcount:0 compound_pincount:0
flags: 0xfff00000010200(slab|head)
raw: 00fff00000010200 ffffea00005f6908 ffffea0000527508 ffff88800fc41c80
raw: ffff888023e47400 000000000010000f 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888023e47b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888023e47b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888023e47c00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888023e47c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888023e47d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


Tested on:

commit: dc5c40fb io_uring: always wait for sqd exited when stoppin..
git tree: git://git.kernel.dk/linux-block io_uring-5.12
console output: https://syzkaller.appspot.com/x/log.txt?x=16cd022cd00000

Hillf Danton

Mar 9, 2021, 11:10:39 PM
to syzbot, asml.s...@gmail.com, ax...@kernel.dk, io-u...@vger.kernel.org, Hillf Danton, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Tue, 09 Mar 2021 14:29:07 -0800
Fix 05ff6c4a0e07 ("io_uring: SQPOLL parking fixes") in the current tree
by removing the extra set of IO_SQ_THREAD_SHOULD_STOP in response to
the arrival of urgent signal because it misleads io_sq_thread_stop(),
though a followup cleanup should go there.

--- x/fs/io_uring.c
+++ y/fs/io_uring.c
@@ -6689,10 +6689,8 @@ static int io_sq_thread(void *data)
 			io_sqd_init_new(sqd);
 			timeout = jiffies + sqd->sq_thread_idle;
 		}
-		if (fatal_signal_pending(current)) {
-			set_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state);
+		if (fatal_signal_pending(current))
 			break;
-		}
 		sqt_spin = false;
 		cap_entries = !list_is_singular(&sqd->ctx_list);
 		list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {

syzbot

Mar 9, 2021, 11:12:05 PM
to asml.s...@gmail.com, ax...@kernel.dk, io-u...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
KASAN: use-after-free Read in io_sq_thread

==================================================================
BUG: KASAN: use-after-free in __lock_acquire+0x3e6f/0x54c0 kernel/locking/lockdep.c:4770
Read of size 8 at addr ffff88801d418c78 by task iou-sqp-10269/10271

CPU: 1 PID: 10271 Comm: iou-sqp-10269 Not tainted 5.12.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232
__kasan_report mm/kasan/report.c:399 [inline]
kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
__lock_acquire+0x3e6f/0x54c0 kernel/locking/lockdep.c:4770
lock_acquire kernel/locking/lockdep.c:5510 [inline]
lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5475
down_write+0x92/0x150 kernel/locking/rwsem.c:1406
io_sq_thread+0x1220/0x1b10 fs/io_uring.c:6754
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

Allocated by task 10269:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:46 [inline]
set_alloc_info mm/kasan/common.c:427 [inline]
____kasan_kmalloc mm/kasan/common.c:506 [inline]
____kasan_kmalloc mm/kasan/common.c:465 [inline]
__kasan_kmalloc+0x99/0xc0 mm/kasan/common.c:515
kmalloc include/linux/slab.h:554 [inline]
kzalloc include/linux/slab.h:684 [inline]
io_get_sq_data fs/io_uring.c:7153 [inline]
io_sq_offload_create fs/io_uring.c:7827 [inline]
io_uring_create fs/io_uring.c:9443 [inline]
io_uring_setup+0x154b/0x2940 fs/io_uring.c:9523
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 9:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:357
____kasan_slab_free mm/kasan/common.c:360 [inline]
____kasan_slab_free mm/kasan/common.c:325 [inline]
__kasan_slab_free+0xf5/0x130 mm/kasan/common.c:367
kasan_slab_free include/linux/kasan.h:199 [inline]
slab_free_hook mm/slub.c:1562 [inline]
slab_free_freelist_hook+0x92/0x210 mm/slub.c:1600
slab_free mm/slub.c:3161 [inline]
kfree+0xe5/0x7f0 mm/slub.c:4213
io_put_sq_data fs/io_uring.c:7095 [inline]
io_sq_thread_finish+0x48e/0x5b0 fs/io_uring.c:7113
io_ring_ctx_free fs/io_uring.c:8355 [inline]
io_ring_exit_work+0x333/0xcf0 fs/io_uring.c:8525
process_one_work+0x98d/0x1600 kernel/workqueue.c:2275
worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
kthread+0x3b1/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

The buggy address belongs to the object at ffff88801d418c00
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 120 bytes inside of
512-byte region [ffff88801d418c00, ffff88801d418e00)
The buggy address belongs to the page:
page:00000000311e6f59 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1d418
head:00000000311e6f59 order:2 compound_mapcount:0 compound_pincount:0
flags: 0xfff00000010200(slab|head)
raw: 00fff00000010200 dead000000000100 dead000000000122 ffff88800fc41c80
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88801d418b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88801d418b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88801d418c00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88801d418c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88801d418d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


Tested on:

commit: dc5c40fb io_uring: always wait for sqd exited when stoppin..
git tree: git://git.kernel.dk/linux-block io_uring-5.12
console output: https://syzkaller.appspot.com/x/log.txt?x=111d175cd00000

Pavel Begunkov

Mar 10, 2021, 8:44:40 AM
to Hillf Danton, syzbot, ax...@kernel.dk, io-u...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 10/03/2021 04:10, Hillf Danton wrote:
> Fix 05ff6c4a0e07 ("io_uring: SQPOLL parking fixes") in the current tree
> by removing the extra set of IO_SQ_THREAD_SHOULD_STOP in response to
> the arrival of urgent signal because it misleads io_sq_thread_stop(),
> though a followup cleanup should go there.

That's actually reasonable, just like
8bff1bf8abeda ("io_uring: fix io_sq_offload_create error handling")

Are you going to send a patch?

>
> --- x/fs/io_uring.c
> +++ y/fs/io_uring.c
> @@ -6689,10 +6689,8 @@ static int io_sq_thread(void *data)
>  			io_sqd_init_new(sqd);
>  			timeout = jiffies + sqd->sq_thread_idle;
>  		}
> -		if (fatal_signal_pending(current)) {
> -			set_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state);
> +		if (fatal_signal_pending(current))
>  			break;
> -		}
>  		sqt_spin = false;
>  		cap_entries = !list_is_singular(&sqd->ctx_list);
>  		list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
>

--
Pavel Begunkov

Jens Axboe

Mar 10, 2021, 9:26:59 AM
to Pavel Begunkov, Hillf Danton, syzbot, io-u...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 3/10/21 6:40 AM, Pavel Begunkov wrote:
> On 10/03/2021 04:10, Hillf Danton wrote:>
>> Fix 05ff6c4a0e07 ("io_uring: SQPOLL parking fixes") in the current tree
>> by removing the extra set of IO_SQ_THREAD_SHOULD_STOP in response to
>> the arrival of urgent signal because it misleads io_sq_thread_stop(),
>> though a followup cleanup should go there.
>
> That's actually reasonable, just like
> 8bff1bf8abeda ("io_uring: fix io_sq_offload_create error handling")
>
> Are you going to send a patch?

Agree - Hillf, do you mind if I just fold this one in?

--
Jens Axboe

Jens Axboe

Mar 10, 2021, 9:37:33 AM
to syzbot, asml.s...@gmail.com, io-u...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com

syzbot

Mar 10, 2021, 10:29:09 AM
to asml.s...@gmail.com, ax...@kernel.dk, io-u...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+ac3985...@syzkaller.appspotmail.com

Tested on:

commit: 7d41e854 io_uring: remove indirect ctx into sqo injection
git tree: git://git.kernel.dk/linux-block io_uring-5.12
Note: testing is done by a robot and is best-effort only.

Hillf Danton

Mar 10, 2021, 9:00:36 PM
to Jens Axboe, Pavel Begunkov, Hillf Danton, syzbot, io-u...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Wed, 10 Mar 2021 07:26:58 -0700 Jens Axboe wrote:
> On 3/10/21 6:40 AM, Pavel Begunkov wrote:
> > On 10/03/2021 04:10, Hillf Danton wrote:>
> >> Fix 05ff6c4a0e07 ("io_uring: SQPOLL parking fixes") in the current tree
> >> by removing the extra set of IO_SQ_THREAD_SHOULD_STOP in response to
> >> the arrival of urgent signal because it misleads io_sq_thread_stop(),
> >> though a followup cleanup should go there.
> >
> > That's actually reasonable, just like
> > 8bff1bf8abeda ("io_uring: fix io_sq_offload_create error handling")

Thanks for taking a look at it :)
> >
> > Are you going to send a patch?
>
> Agree - Hillf, do you mind if I just fold this one in?

Feel free to put it in your tree.

Hillf

syzbot

Apr 8, 2021, 3:43:10 PM
to asml.s...@gmail.com, ax...@kernel.dk, b...@alien8.de, hda...@sina.com, h...@zytor.com, io-u...@vger.kernel.org, jmat...@google.com, jo...@8bytes.org, k...@vger.kernel.org, linux-...@vger.kernel.org, mi...@redhat.com, pbon...@redhat.com, sea...@google.com, syzkall...@googlegroups.com, tg...@linutronix.de, vkuz...@redhat.com, wanp...@tencent.com, x...@kernel.org
syzbot suspects this issue was fixed by commit:

commit f4e61f0c9add3b00bd5f2df3c814d688849b8707
Author: Wanpeng Li <wanp...@tencent.com>
Date: Mon Mar 15 06:55:28 2021 +0000

x86/kvm: Fix broken irq restoration in kvm_wait

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1022d7aad00000
start commit: 144c79ef Merge tag 'perf-tools-fixes-for-v5.12-2020-03-07'..
git tree: upstream
If the result looks correct, please mark the issue as fixed by replying with:

#syz fix: x86/kvm: Fix broken irq restoration in kvm_wait

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

syzbot

Apr 8, 2021, 4:54:53 PM
to Paolo Bonzini, pbon...@redhat.com, syzkall...@googlegroups.com
> #syz fix: x86/kvm: Fix broken irq restoration in kvm_wait

Your 'fix:' command is accepted, but please keep syzkall...@googlegroups.com mailing list in CC next time. It serves as a history of what happened with each bug report. Thank you.

>