[syzbot] [io-uring?] KCSAN: data-race in __flush_work / __flush_work (2)


syzbot

Aug 5, 2024, 2:53:18 AM
to asml.s...@gmail.com, ax...@kernel.dk, io-u...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: a5dbd76a8942 Merge tag 'x86-urgent-2024-08-04' of git://gi..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=13d5a373980000
kernel config: https://syzkaller.appspot.com/x/.config?x=d16924117a4f7e9
dashboard link: https://syzkaller.appspot.com/bug?extid=b3e4f2f51ed645fd5df2
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/ba663ad5dbf5/disk-a5dbd76a.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/226a427d6581/vmlinux-a5dbd76a.xz
kernel image: https://storage.googleapis.com/syzbot-assets/9f982777516a/bzImage-a5dbd76a.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+b3e4f2...@syzkaller.appspotmail.com

==================================================================
BUG: KCSAN: data-race in __flush_work / __flush_work

write to 0xffff8881223aa3e8 of 8 bytes by task 3998 on cpu 0:
instrument_write include/linux/instrumented.h:41 [inline]
___set_bit include/asm-generic/bitops/instrumented-non-atomic.h:28 [inline]
insert_wq_barrier kernel/workqueue.c:3790 [inline]
start_flush_work kernel/workqueue.c:4142 [inline]
__flush_work+0x30b/0x570 kernel/workqueue.c:4178
flush_work kernel/workqueue.c:4229 [inline]
flush_delayed_work+0x66/0x70 kernel/workqueue.c:4251
io_fallback_tw+0x24b/0x320 io_uring/io_uring.c:1087
tctx_task_work_run+0xd1/0x1b0 io_uring/io_uring.c:1099
tctx_task_work+0x40/0x80 io_uring/io_uring.c:1124
task_work_run+0x13a/0x1a0 kernel/task_work.c:228
exit_task_work include/linux/task_work.h:40 [inline]
do_exit+0x5dd/0x1720 kernel/exit.c:882
do_group_exit+0x102/0x150 kernel/exit.c:1031
get_signal+0xf2f/0x1080 kernel/signal.c:2917
arch_do_signal_or_restart+0x95/0x4b0 arch/x86/kernel/signal.c:310
exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
syscall_exit_to_user_mode+0x59/0x130 kernel/entry/common.c:218
do_syscall_64+0xd6/0x1c0 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x77/0x7f

read to 0xffff8881223aa3e8 of 8 bytes by task 50 on cpu 1:
__flush_work+0x42a/0x570 kernel/workqueue.c:4188
flush_work kernel/workqueue.c:4229 [inline]
flush_delayed_work+0x66/0x70 kernel/workqueue.c:4251
io_uring_try_cancel_requests+0x35b/0x370 io_uring/io_uring.c:3000
io_ring_exit_work+0x148/0x500 io_uring/io_uring.c:2779
process_one_work kernel/workqueue.c:3231 [inline]
process_scheduled_works+0x483/0x9a0 kernel/workqueue.c:3312
worker_thread+0x526/0x700 kernel/workqueue.c:3390
kthread+0x1d1/0x210 kernel/kthread.c:389
ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

value changed: 0x0000000000400000 -> 0xffff88810006c00d

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 50 Comm: kworker/u8:3 Not tainted 6.11.0-rc1-syzkaller-00334-ga5dbd76a8942 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024
Workqueue: iou_exit io_ring_exit_work
==================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite the report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Jens Axboe

Aug 5, 2024, 10:23:33 AM
to syzbot, asml.s...@gmail.com, io-u...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, Tejun Heo, Lai Jiangshan
This looks like a workqueue issue, so CC'ing the maintainers. Unsure whether
this is benign or not.

#syz set subsystems: kernel

--
Jens Axboe

Tejun Heo

Aug 5, 2024, 2:55:00 PM
to Jens Axboe, syzbot, asml.s...@gmail.com, io-u...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, Lai Jiangshan
Hello,

On Mon, Aug 05, 2024 at 08:23:28AM -0600, Jens Axboe wrote:
> > read to 0xffff8881223aa3e8 of 8 bytes by task 50 on cpu 1:
> > __flush_work+0x42a/0x570 kernel/workqueue.c:4188
> > flush_work kernel/workqueue.c:4229 [inline]
> > flush_delayed_work+0x66/0x70 kernel/workqueue.c:4251
> > io_uring_try_cancel_requests+0x35b/0x370 io_uring/io_uring.c:3000
> > io_ring_exit_work+0x148/0x500 io_uring/io_uring.c:2779
> > process_one_work kernel/workqueue.c:3231 [inline]
> > process_scheduled_works+0x483/0x9a0 kernel/workqueue.c:3312
> > worker_thread+0x526/0x700 kernel/workqueue.c:3390
> > kthread+0x1d1/0x210 kernel/kthread.c:389
> > ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:147
> > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

The offending line is:

	/*
	 * start_flush_work() returned %true. If @from_cancel is set, we know
	 * that @work must have been executing during start_flush_work() and
	 * can't currently be queued. Its data must contain OFFQ bits. If @work
	 * was queued on a BH workqueue, we also know that it was running in the
	 * BH context and thus can be busy-waited.
	 */
->	data = *work_data_bits(work);
	if (from_cancel &&
	    !WARN_ON_ONCE(data & WORK_STRUCT_PWQ) && (data & WORK_OFFQ_BH)) {

It is benign, but the code is also wrong. When @from_cancel is set, we know
that we own the work item through its pending bit and thus that its data bits
cannot change. Also, the read data value is only used when @from_cancel is
set. So, the code is not necessarily broken, but the compiler can easily emit
the read before the @from_cancel test, which is what the code says anyway
and, judging from the disassembly of the vmlinux in the report, is how the
compiler actually generated it.

So, it's benign in that the read value is never used when !@from_cancel, and
the data race only exists when !@from_cancel. The code is wrong in that it
makes it easy to generate this spurious data-race read. I'll fix it.
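
To make the difference concrete, here is a minimal standalone sketch of the
two shapes (illustrative only: fake_work, FAKE_OFFQ_BH, flush_hoisted() and
flush_guarded() are made-up stand-ins, not the kernel code):

	#include <stdbool.h>
	#include <stdio.h>

	/* Stand-in for a work item whose data word another CPU may rewrite. */
	struct fake_work {
		unsigned long data;
	};

	#define FAKE_OFFQ_BH	(1UL << 4)	/* illustrative bit, not WORK_OFFQ_BH */

	/*
	 * Old shape: the load of work->data happens before the from_cancel
	 * test, so it executes (and can race with a concurrent writer) even
	 * though the value is never used when from_cancel is false.
	 */
	static bool flush_hoisted(struct fake_work *work, bool from_cancel)
	{
		unsigned long data = work->data;	/* unconditional read */

		if (from_cancel && (data & FAKE_OFFQ_BH))
			return true;	/* busy-wait path in the real code */
		return false;		/* sleeping-wait path in the real code */
	}

	/*
	 * New shape: the data word is read only once from_cancel guarantees
	 * that this context owns the work item, so no concurrent writer can
	 * be touching it.
	 */
	static bool flush_guarded(struct fake_work *work, bool from_cancel)
	{
		if (from_cancel) {
			unsigned long data = work->data;

			if (data & FAKE_OFFQ_BH)
				return true;
		}
		return false;
	}

	int main(void)
	{
		struct fake_work w = { .data = FAKE_OFFQ_BH };

		printf("hoisted: %d guarded: %d\n",
		       flush_hoisted(&w, true), flush_guarded(&w, true));
		return 0;
	}

In the first shape the load may be issued unconditionally; in the second it
is reachable only while the caller owns the work item, so there is no racing
access for KCSAN to flag.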

Thanks.

--
tejun

Tejun Heo

Aug 5, 2024, 3:48:04 PM
to Jens Axboe, syzbot, asml.s...@gmail.com, io-u...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, Lai Jiangshan
From c5eac4d10384f1ddd728e143ede35eaa6081c61e Mon Sep 17 00:00:00 2001
From: Tejun Heo <t...@kernel.org>
Date: Mon, 5 Aug 2024 09:37:25 -1000

When flushing a work item for cancellation, __flush_work() knows that it
exclusively owns the work item through its PENDING bit. 134874e2eee9
("workqueue: Allow cancel_work_sync() and disable_work() from atomic
contexts on BH work items") added a read of @work->data to determine whether
to use busy wait for BH work items that are being canceled. While the read
is safe when @from_cancel, @work->data was read before testing @from_cancel
to simplify code structure:

	data = *work_data_bits(work);
	if (from_cancel &&
	    !WARN_ON_ONCE(data & WORK_STRUCT_PWQ) && (data & WORK_OFFQ_BH)) {

While the read data was never used if !@from_cancel, this could trigger
KCSAN data race detection spuriously:

==================================================================
BUG: KCSAN: data-race in __flush_work / __flush_work

write to 0xffff8881223aa3e8 of 8 bytes by task 3998 on cpu 0:
instrument_write include/linux/instrumented.h:41 [inline]
___set_bit include/asm-generic/bitops/instrumented-non-atomic.h:28 [inline]
insert_wq_barrier kernel/workqueue.c:3790 [inline]
start_flush_work kernel/workqueue.c:4142 [inline]
__flush_work+0x30b/0x570 kernel/workqueue.c:4178
flush_work kernel/workqueue.c:4229 [inline]
...

read to 0xffff8881223aa3e8 of 8 bytes by task 50 on cpu 1:
__flush_work+0x42a/0x570 kernel/workqueue.c:4188
flush_work kernel/workqueue.c:4229 [inline]
flush_delayed_work+0x66/0x70 kernel/workqueue.c:4251
...

value changed: 0x0000000000400000 -> 0xffff88810006c00d

Reorganize the code so that @from_cancel is tested before @work->data is
accessed. As the only problem is the spurious KCSAN detection, this shouldn't
need READ_ONCE() or other access qualifiers.

No functional changes.

Signed-off-by: Tejun Heo <t...@kernel.org>
Reported-by: syzbot+b3e4f2...@syzkaller.appspotmail.com
Fixes: 134874e2eee9 ("workqueue: Allow cancel_work_sync() and disable_work() from atomic contexts on BH work items")
Link: http://lkml.kernel.org/r/000000000000ae...@google.com
Cc: Jens Axboe <ax...@kernel.dk>
---
Applied to wq/for-6.11-fixes.

Thanks.

kernel/workqueue.c | 45 +++++++++++++++++++++++++--------------------
1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index d56bd2277e58..ef174d8c1f63 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4166,7 +4166,6 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
 static bool __flush_work(struct work_struct *work, bool from_cancel)
 {
 	struct wq_barrier barr;
-	unsigned long data;
 
 	if (WARN_ON(!wq_online))
 		return false;
@@ -4184,29 +4183,35 @@ static bool __flush_work(struct work_struct *work, bool from_cancel)
 	 * was queued on a BH workqueue, we also know that it was running in the
 	 * BH context and thus can be busy-waited.
 	 */
-	data = *work_data_bits(work);
-	if (from_cancel &&
-	    !WARN_ON_ONCE(data & WORK_STRUCT_PWQ) && (data & WORK_OFFQ_BH)) {
-		/*
-		 * On RT, prevent a live lock when %current preempted soft
-		 * interrupt processing or prevents ksoftirqd from running by
-		 * keeping flipping BH. If the BH work item runs on a different
-		 * CPU then this has no effect other than doing the BH
-		 * disable/enable dance for nothing. This is copied from
-		 * kernel/softirq.c::tasklet_unlock_spin_wait().
-		 */
-		while (!try_wait_for_completion(&barr.done)) {
-			if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
-				local_bh_disable();
-				local_bh_enable();
-			} else {
-				cpu_relax();
+	if (from_cancel) {
+		unsigned long data = *work_data_bits(work);
+
+		if (!WARN_ON_ONCE(data & WORK_STRUCT_PWQ) &&
+		    (data & WORK_OFFQ_BH)) {
+			/*
+			 * On RT, prevent a live lock when %current preempted
+			 * soft interrupt processing or prevents ksoftirqd from
+			 * running by keeping flipping BH. If the BH work item
+			 * runs on a different CPU then this has no effect other
+			 * than doing the BH disable/enable dance for nothing.
+			 * This is copied from
+			 * kernel/softirq.c::tasklet_unlock_spin_wait().
+			 */
+			while (!try_wait_for_completion(&barr.done)) {
+				if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
+					local_bh_disable();
+					local_bh_enable();
+				} else {
+					cpu_relax();
+				}
 			}
+			goto out_destroy;
 		}
-	} else {
-		wait_for_completion(&barr.done);
 	}
 
+	wait_for_completion(&barr.done);
+
+out_destroy:
 	destroy_work_on_stack(&barr.work);
 	return true;
 }
--
2.46.0
