[syzbot] possible deadlock in worker_thread


syzbot

Feb 10, 2022, 2:27:21 PM
to bvana...@acm.org, j...@ziepe.ca, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: d8ad2ce873ab Merge tag 'ext4_for_linus_stable' of git://gi..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=17823662700000
kernel config: https://syzkaller.appspot.com/x/.config?x=6f043113811433a5
dashboard link: https://syzkaller.appspot.com/bug?extid=831661966588c802aae9
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+831661...@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
5.17.0-rc2-syzkaller-00398-gd8ad2ce873ab #0 Not tainted
------------------------------------------------------
kworker/0:7/17139 is trying to acquire lock:
ffff888077a89938 ((wq_completion)loop1){+.+.}-{0:0}, at: flush_workqueue+0xe1/0x13a0 kernel/workqueue.c:2824

but task is already holding lock:
ffffc9000fa07db8 ((work_completion)(&lo->rundown_work)){+.+.}-{0:0}, at: process_one_work+0x8c4/0x1650 kernel/workqueue.c:2282

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #7 ((work_completion)(&lo->rundown_work)){+.+.}-{0:0}:
process_one_work+0x91b/0x1650 kernel/workqueue.c:2283
worker_thread+0x657/0x1110 kernel/workqueue.c:2454
kthread+0x2e9/0x3a0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

-> #6 ((wq_completion)events_long){+.+.}-{0:0}:
flush_workqueue+0x110/0x13a0 kernel/workqueue.c:2827
srp_remove_one+0x1cf/0x320 drivers/infiniband/ulp/srp/ib_srp.c:4052
remove_client_context+0xbe/0x110 drivers/infiniband/core/device.c:775
disable_device+0x13b/0x270 drivers/infiniband/core/device.c:1281
__ib_unregister_device+0x98/0x1a0 drivers/infiniband/core/device.c:1474
ib_unregister_device_and_put+0x57/0x80 drivers/infiniband/core/device.c:1538
nldev_dellink+0x1fb/0x300 drivers/infiniband/core/nldev.c:1747
rdma_nl_rcv_msg+0x36d/0x690 drivers/infiniband/core/netlink.c:195
rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
rdma_nl_rcv+0x2ee/0x430 drivers/infiniband/core/netlink.c:259
netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:725
____sys_sendmsg+0x6e8/0x810 net/socket.c:2413
___sys_sendmsg+0xf3/0x170 net/socket.c:2467
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #5 (&device->unregistration_lock){+.+.}-{3:3}:
__mutex_lock_common kernel/locking/mutex.c:600 [inline]
__mutex_lock+0x12f/0x12f0 kernel/locking/mutex.c:733
__ib_unregister_device+0x25/0x1a0 drivers/infiniband/core/device.c:1470
ib_unregister_device_and_put+0x57/0x80 drivers/infiniband/core/device.c:1538
nldev_dellink+0x1fb/0x300 drivers/infiniband/core/nldev.c:1747
rdma_nl_rcv_msg+0x36d/0x690 drivers/infiniband/core/netlink.c:195
rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
rdma_nl_rcv+0x2ee/0x430 drivers/infiniband/core/netlink.c:259
netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:725
____sys_sendmsg+0x6e8/0x810 net/socket.c:2413
___sys_sendmsg+0xf3/0x170 net/socket.c:2467
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #4 (&rdma_nl_types[idx].sem){.+.+}-{3:3}:
down_read+0x98/0x440 kernel/locking/rwsem.c:1461
rdma_nl_rcv_msg+0x161/0x690 drivers/infiniband/core/netlink.c:164
rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
rdma_nl_rcv+0x2ee/0x430 drivers/infiniband/core/netlink.c:259
netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:725
sock_no_sendpage+0xf6/0x140 net/core/sock.c:3091
kernel_sendpage.part.0+0x1a0/0x340 net/socket.c:3492
kernel_sendpage net/socket.c:3489 [inline]
sock_sendpage+0xe5/0x140 net/socket.c:1007
pipe_to_sendpage+0x2ad/0x380 fs/splice.c:364
splice_from_pipe_feed fs/splice.c:418 [inline]
__splice_from_pipe+0x43e/0x8a0 fs/splice.c:562
splice_from_pipe fs/splice.c:597 [inline]
generic_splice_sendpage+0xd4/0x140 fs/splice.c:746
do_splice_from fs/splice.c:767 [inline]
do_splice+0xb7e/0x1960 fs/splice.c:1079
__do_splice+0x134/0x250 fs/splice.c:1144
__do_sys_splice fs/splice.c:1350 [inline]
__se_sys_splice fs/splice.c:1332 [inline]
__x64_sys_splice+0x198/0x250 fs/splice.c:1332
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #3 (&pipe->mutex/1){+.+.}-{3:3}:
__mutex_lock_common kernel/locking/mutex.c:600 [inline]
__mutex_lock+0x12f/0x12f0 kernel/locking/mutex.c:733
pipe_lock_nested fs/pipe.c:82 [inline]
pipe_lock+0x5a/0x70 fs/pipe.c:90
iter_file_splice_write+0x15a/0xc10 fs/splice.c:635
do_splice_from fs/splice.c:767 [inline]
do_splice+0xb7e/0x1960 fs/splice.c:1079
__do_splice+0x134/0x250 fs/splice.c:1144
__do_sys_splice fs/splice.c:1350 [inline]
__se_sys_splice fs/splice.c:1332 [inline]
__x64_sys_splice+0x198/0x250 fs/splice.c:1332
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #2 (sb_writers#3){.+.+}-{0:0}:
percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
__sb_start_write include/linux/fs.h:1722 [inline]
sb_start_write include/linux/fs.h:1792 [inline]
file_start_write include/linux/fs.h:2937 [inline]
lo_write_bvec drivers/block/loop.c:242 [inline]
lo_write_simple drivers/block/loop.c:265 [inline]
do_req_filebacked drivers/block/loop.c:494 [inline]
loop_handle_cmd drivers/block/loop.c:1853 [inline]
loop_process_work+0x1499/0x1db0 drivers/block/loop.c:1893
process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307
worker_thread+0x657/0x1110 kernel/workqueue.c:2454
kthread+0x2e9/0x3a0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

-> #1 ((work_completion)(&worker->work)){+.+.}-{0:0}:
process_one_work+0x91b/0x1650 kernel/workqueue.c:2283
worker_thread+0x657/0x1110 kernel/workqueue.c:2454
kthread+0x2e9/0x3a0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

-> #0 ((wq_completion)loop1){+.+.}-{0:0}:
check_prev_add kernel/locking/lockdep.c:3063 [inline]
check_prevs_add kernel/locking/lockdep.c:3186 [inline]
validate_chain kernel/locking/lockdep.c:3801 [inline]
__lock_acquire+0x2a2c/0x5470 kernel/locking/lockdep.c:5027
lock_acquire kernel/locking/lockdep.c:5639 [inline]
lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5604
flush_workqueue+0x110/0x13a0 kernel/workqueue.c:2827
drain_workqueue+0x1a5/0x3c0 kernel/workqueue.c:2992
destroy_workqueue+0x71/0x800 kernel/workqueue.c:4429
__loop_clr_fd+0x19b/0xd80 drivers/block/loop.c:1118
loop_rundown_workfn+0x6f/0x150 drivers/block/loop.c:1185
process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307
worker_thread+0x657/0x1110 kernel/workqueue.c:2454
kthread+0x2e9/0x3a0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

other info that might help us debug this:

Chain exists of:
(wq_completion)loop1 --> (wq_completion)events_long --> (work_completion)(&lo->rundown_work)

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((work_completion)(&lo->rundown_work));
                               lock((wq_completion)events_long);
                               lock((work_completion)(&lo->rundown_work));
  lock((wq_completion)loop1);

*** DEADLOCK ***

2 locks held by kworker/0:7/17139:
#0: ffff888010c73538 ((wq_completion)events_long){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff888010c73538 ((wq_completion)events_long){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
#0: ffff888010c73538 ((wq_completion)events_long){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1280 [inline]
#0: ffff888010c73538 ((wq_completion)events_long){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:631 [inline]
#0: ffff888010c73538 ((wq_completion)events_long){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:658 [inline]
#0: ffff888010c73538 ((wq_completion)events_long){+.+.}-{0:0}, at: process_one_work+0x890/0x1650 kernel/workqueue.c:2278
#1: ffffc9000fa07db8 ((work_completion)(&lo->rundown_work)){+.+.}-{0:0}, at: process_one_work+0x8c4/0x1650 kernel/workqueue.c:2282

stack backtrace:
CPU: 0 PID: 17139 Comm: kworker/0:7 Not tainted 5.17.0-rc2-syzkaller-00398-gd8ad2ce873ab #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events_long loop_rundown_workfn
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2143
check_prev_add kernel/locking/lockdep.c:3063 [inline]
check_prevs_add kernel/locking/lockdep.c:3186 [inline]
validate_chain kernel/locking/lockdep.c:3801 [inline]
__lock_acquire+0x2a2c/0x5470 kernel/locking/lockdep.c:5027
lock_acquire kernel/locking/lockdep.c:5639 [inline]
lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5604
flush_workqueue+0x110/0x13a0 kernel/workqueue.c:2827
drain_workqueue+0x1a5/0x3c0 kernel/workqueue.c:2992
destroy_workqueue+0x71/0x800 kernel/workqueue.c:4429
__loop_clr_fd+0x19b/0xd80 drivers/block/loop.c:1118
loop_rundown_workfn+0x6f/0x150 drivers/block/loop.c:1185
process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307
worker_thread+0x657/0x1110 kernel/workqueue.c:2454
kthread+0x2e9/0x3a0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>
usb 4-1: new high-speed USB device number 63 using dummy_hcd
usb 4-1: New USB device found, idVendor=0af0, idProduct=d058, bcdDevice= 0.00
usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 4-1: Product: syz
usb 4-1: Manufacturer: syz
usb 4-1: SerialNumber: syz
usb 4-1: config 0 descriptor??
usb-storage 4-1:0.0: USB Mass Storage device detected
usb 1-1: new high-speed USB device number 103 using dummy_hcd
usb 1-1: New USB device found, idVendor=0af0, idProduct=d058, bcdDevice= 0.00
usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 1-1: Product: syz
usb 1-1: Manufacturer: syz
usb 1-1: SerialNumber: syz
usb 1-1: config 0 descriptor??
usb-storage 1-1:0.0: USB Mass Storage device detected
usb 4-1: USB disconnect, device number 65


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

Bart Van Assche

Feb 11, 2022, 1:59:58 PM
to syzbot, j...@ziepe.ca, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com
On 2/10/22 11:27, syzbot wrote:
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.17.0-rc2-syzkaller-00398-gd8ad2ce873ab #0 Not tainted
> ------------------------------------------------------

Since the SRP initiator driver is involved, I will take a look. However,
I'm not sure yet when I will have the time to post a fix.

Thanks,

Bart.

Tetsuo Handa

Feb 12, 2022, 12:32:07 AM
to Bart Van Assche, syzbot, j...@ziepe.ca, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com, Tejun Heo, Lai Jiangshan
This problem was already handled by commit bf23747ee0532090 ("loop:
revert "make autoclear operation asynchronous"").

But this report might be suggesting to us that we should consider
deprecating (and eventually getting rid of) the system-wide workqueues
(declared in include/linux/workqueue.h): since flush_workqueue()
synchronously waits for completion, sharing system-wide workqueues
among multiple modules can generate unexpected locking dependency
chains (like the one in this report).

If a module needs flush_workqueue() or flush_*_work(), shouldn't that
module create and use its own workqueue?
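
For illustration, the module-local pattern I have in mind would look roughly
like this (a minimal sketch; "mymod_wq" and the surrounding module are made
up, not taken from any existing driver):

#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *mymod_wq;       /* made-up module-local queue */

static int __init mymod_init(void)
{
        mymod_wq = alloc_workqueue("mymod_wq", 0, 0);
        return mymod_wq ? 0 : -ENOMEM;
}

static void __exit mymod_exit(void)
{
        /* Flushing our own queue only waits for work we queued ourselves. */
        flush_workqueue(mymod_wq);
        destroy_workqueue(mymod_wq);
}

module_init(mymod_init);
module_exit(mymod_exit);
MODULE_LICENSE("GPL");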

Tejun, what do you think?

Bart Van Assche

Feb 12, 2022, 11:37:59 AM
to Tetsuo Handa, syzbot, j...@ziepe.ca, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com, Tejun Heo, Lai Jiangshan
On 2/11/22 21:31, Tetsuo Handa wrote:
> But this report might be suggesting to us that we should consider
> deprecating (and eventually getting rid of) the system-wide workqueues
> (declared in include/linux/workqueue.h): since flush_workqueue()
> synchronously waits for completion, sharing system-wide workqueues
> among multiple modules can generate unexpected locking dependency
> chains (like the one in this report).

I do not agree with deprecating system-wide workqueues. I think that all
flush_workqueue(system_long_wq) calls should be reviewed since these are
deadlock-prone.

Thanks,

Bart.

Tetsuo Handa

Feb 12, 2022, 12:14:20 PM
to Bart Van Assche, syzbot, j...@ziepe.ca, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com, Tejun Heo, Lai Jiangshan
The loop module does not call flush_workqueue(system_long_wq); it only
schedules a work item via system_long_wq, and that work function calls
flush_workqueue() (on a local workqueue) via drain_workqueue() from
destroy_workqueue().

How can reviewing all flush_workqueue(system_long_wq) calls help?

Leon Romanovsky

Feb 13, 2022, 10:33:53 AM
to Tetsuo Handa, Bart Van Assche, syzbot, j...@ziepe.ca, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com, Tejun Heo, Lai Jiangshan
On Sun, Feb 13, 2022 at 02:14:09AM +0900, Tetsuo Handa wrote:
> On 2022/02/13 1:37, Bart Van Assche wrote:
> > On 2/11/22 21:31, Tetsuo Handa wrote:
> >> But this report might be suggesting to us that we should consider
> >> deprecating (and eventually getting rid of) the system-wide workqueues
> >> (declared in include/linux/workqueue.h): since flush_workqueue()
> >> synchronously waits for completion, sharing system-wide workqueues
> >> among multiple modules can generate unexpected locking dependency
> >> chains (like the one in this report).
> >
> > I do not agree with deprecating system-wide workqueues. I think that
> > all flush_workqueue(system_long_wq) calls should be reviewed since
> > these are deadlock-prone.
> >
> > Thanks,
> >
> > Bart.
>

I second Bart's request not to deprecate system-wide workqueues.

Thanks

>

Bart Van Assche

Feb 13, 2022, 6:06:42 PM
to Tetsuo Handa, syzbot, j...@ziepe.ca, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com, Tejun Heo, Lai Jiangshan
On 2/12/22 09:14, Tetsuo Handa wrote:
> How can reviewing all flush_workqueue(system_long_wq) calls help?

It is allowed to queue blocking actions on system_long_wq.
flush_workqueue(system_long_wq) can make a lower layer (e.g. ib_srp)
wait on a blocking action from a higher layer (e.g. the loop driver) and
thereby cause a deadlock. Hence my proposal to review all
flush_workqueue(system_long_wq) calls.
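
As a rough illustration, only the two calls that matter (simplified from
the trace rather than quoted verbatim from the drivers):

/* Higher layer (loop) parks a potentially long-blocking action on the
 * shared queue: */
queue_work(system_long_wq, &lo->rundown_work);

/* Lower layer (ib_srp) flushes the same shared queue during device
 * removal and therefore also waits for lo->rundown_work, which it
 * knows nothing about: */
flush_workqueue(system_long_wq);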

Thanks,

Bart.

Tetsuo Handa

Feb 13, 2022, 8:08:10 PM
to Bart Van Assche, syzbot, j...@ziepe.ca, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com, Tejun Heo, Lai Jiangshan
On 2022/02/14 8:06, Bart Van Assche wrote:
> On 2/12/22 09:14, Tetsuo Handa wrote:
>> How can reviewing all flush_workqueue(system_long_wq) calls help?
>
> It is allowed to queue blocking actions on system_long_wq.

Correct.

> flush_workqueue(system_long_wq) can make a lower layer (e.g. ib_srp)
> wait on a blocking action from a higher layer (e.g. the loop driver)
> and thereby cause a deadlock.

Correct.

> Hence my proposal to review all flush_workqueue(system_long_wq) calls.

Maybe I'm misunderstanding what "review" means here.

My proposal is to "rewrite" any module that needs to call flush_workqueue()
on a system-wide workqueue, or to call flush_work()/flush_*_work() on work
items that were queued on system-wide workqueues.

That is, for example, "rewrite" the ib_srp module so that it does not call
flush_workqueue(system_long_wq):

+ srp_tl_err_wq = alloc_workqueue("srp_tl_err_wq", 0, 0);

- queue_work(system_long_wq, &target->tl_err_work);
+ queue_work(srp_tl_err_wq, &target->tl_err_work);

- flush_workqueue(system_long_wq);
+ flush_workqueue(srp_tl_err_wq);

+ destroy_workqueue(srp_tl_err_wq);

Then, we can call WARN_ON() if e.g. flush_workqueue() is called on system-wide workqueues.

Tejun Heo

Feb 13, 2022, 10:44:28 PM
to Tetsuo Handa, Bart Van Assche, syzbot, j...@ziepe.ca, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com, Lai Jiangshan
Hello,

On Mon, Feb 14, 2022 at 10:08:00AM +0900, Tetsuo Handa wrote:
> + destroy_workqueue(srp_tl_err_wq);
>
> Then, we can call WARN_ON() if e.g. flush_workqueue() is called on system-wide workqueues.

Yeah, this is the right thing to do. It makes no sense at all to call
flush_workqueue() on the shared workqueues as the caller has no idea what
it's gonna end up waiting for. It was on my todo list a long while ago but
slipped through the crack. If anyone wanna take a stab at it (including
scrubbing the existing users, of course), please be my guest.

Thanks.

--
tejun

Tetsuo Handa

Feb 14, 2022, 8:37:07 AM
to Tejun Heo, Bart Van Assche, syzbot, j...@ziepe.ca, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com, Lai Jiangshan
OK. Then I propose the patch below. If you are OK with this approach, I can
carry it in my tree as a linux-next-only experimental patch for one or two
weeks, to see if anyone complains.

From 95a3aa8d46c8479c95672305645247ba70312113 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
Date: Mon, 14 Feb 2022 22:28:21 +0900
Subject: [PATCH] workqueue: Warn on flushing system-wide workqueues

syzbot found a circular locking dependency which is caused by flushing
system_long_wq WQ [1]. Tejun Heo commented that it makes no sense at all
to call flush_workqueue() on the shared workqueues as the caller has no
idea what it's gonna end up waiting for.

Although there is flush_scheduled_work(), which flushes the system_wq WQ
and carries the warning "Think twice before calling this function! It's
very easy to get into trouble if you don't take great care.", it is too
difficult to guarantee that all users flush system-wide WQs safely.

Therefore, let's change direction: developers should use their own WQs if
flushing is unavoidable. To give developers time to update their modules,
for now just emit a warning message when flush_workqueue() is called on a
system-wide WQ. We will eventually convert this warning message into
WARN_ON() and kill flush_scheduled_work().

Link: https://syzkaller.appspot.com/bug?extid=831661966588c802aae9 [1]
Signed-off-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
---
 kernel/workqueue.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 33f1106b4f99..5ef40b9a1842 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2805,6 +2805,37 @@ static bool flush_workqueue_prep_pwqs(struct workqueue_struct *wq,
         return wait;
 }

+static void warn_if_flushing_global_workqueue(struct workqueue_struct *wq)
+{
+#ifdef CONFIG_PROVE_LOCKING
+        static DEFINE_RATELIMIT_STATE(flush_warn_rs, 600 * HZ, 1);
+        const char *name;
+
+        if (wq == system_wq)
+                name = "system_wq";
+        else if (wq == system_highpri_wq)
+                name = "system_highpri_wq";
+        else if (wq == system_long_wq)
+                name = "system_long_wq";
+        else if (wq == system_unbound_wq)
+                name = "system_unbound_wq";
+        else if (wq == system_freezable_wq)
+                name = "system_freezable_wq";
+        else if (wq == system_power_efficient_wq)
+                name = "system_power_efficient_wq";
+        else if (wq == system_freezable_power_efficient_wq)
+                name = "system_freezable_power_efficient_wq";
+        else
+                return;
+        ratelimit_set_flags(&flush_warn_rs, RATELIMIT_MSG_ON_RELEASE);
+        if (!__ratelimit(&flush_warn_rs))
+                return;
+        pr_warn("Since system-wide WQ is shared, flushing system-wide WQ can introduce unexpected locking dependency. Please replace %s usage in your code with your local WQ.\n",
+                name);
+        dump_stack();
+#endif
+}
+
 /**
  * flush_workqueue - ensure that any scheduled work has run to completion.
  * @wq: workqueue to flush
@@ -2824,6 +2855,8 @@ void flush_workqueue(struct workqueue_struct *wq)
         if (WARN_ON(!wq_online))
                 return;

+        warn_if_flushing_global_workqueue(wq);
+
         lock_map_acquire(&wq->lockdep_map);
         lock_map_release(&wq->lockdep_map);

--
2.32.0

Tejun Heo

Feb 14, 2022, 12:34:45 PM
to Tetsuo Handa, Bart Van Assche, syzbot, j...@ziepe.ca, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com, Lai Jiangshan
Hello,

On Mon, Feb 14, 2022 at 10:36:57PM +0900, Tetsuo Handa wrote:
> OK. Then I propose the patch below. If you are OK with this approach, I can
> carry it in my tree as a linux-next-only experimental patch for one or two
> weeks, to see if anyone complains.

I don't mind you testing that way, but I would much prefer this and
related changes to go through the wq tree.
Instead of doing the above, please add a wq flag to mark system wqs and
trigger the warning that way and I'd leave it regardless of PROVE_LOCKING.

Thanks.

--
tejun

Tetsuo Handa

Feb 15, 2022, 5:26:29 AM
to Tejun Heo, Bart Van Assche, syzbot, j...@ziepe.ca, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com, Lai Jiangshan
On 2022/02/15 2:34, Tejun Heo wrote:
>
> Instead of doing the above, please add a wq flag to mark system wqs and
> trigger the warning that way and I'd leave it regardless of PROVE_LOCKING.
>

Do you mean something like the patch below? I think the approach above is
easier for developers to understand, because it prints the variable's name
(one of 'system_wq', 'system_highpri_wq', 'system_long_wq',
'system_unbound_wq', 'system_freezable_wq', 'system_power_efficient_wq' or
'system_freezable_power_efficient_wq') together with a backtrace (which
will be translated into filename:line format), while the patch below prints
the queue's name (one of 'events', 'events_highpri', 'events_long',
'events_unbound', 'events_freezable', 'events_power_efficient' or
'events_freezable_power_efficient'). If CONFIG_KALLSYMS_ALL=y, we could
print the variable's name using "%ps", but CONFIG_KALLSYMS_ALL is not
always enabled.

The flag must not be used for user-defined WQs, because destroy_workqueue()
involves flush_workqueue(). If this flag were used on a user-defined WQ by
mistake, a pointless warning would be printed. I didn't pass this flag when
creating the system-wide WQs because some developer might copy&paste the
        system_wq = alloc_workqueue("events", 0, 0);
line when converting.

---
 include/linux/workqueue.h |  1 +
 kernel/workqueue.c        | 24 ++++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 7fee9b6cfede..9e33dcd417d3 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -339,6 +339,7 @@ enum {
         __WQ_ORDERED            = 1 << 17, /* internal: workqueue is ordered */
         __WQ_LEGACY             = 1 << 18, /* internal: create*_workqueue() */
         __WQ_ORDERED_EXPLICIT   = 1 << 19, /* internal: alloc_ordered_workqueue() */
+        __WQ_SYSTEM_WIDE        = 1 << 20, /* interbal: don't flush this workqueue */

         WQ_MAX_ACTIVE           = 512,    /* I like 512, better ideas? */
         WQ_MAX_UNBOUND_PER_CPU  = 4,      /* 4 * #cpus for unbound wq */
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 33f1106b4f99..dbb9c6bb54a7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2805,6 +2805,21 @@ static bool flush_workqueue_prep_pwqs(struct workqueue_struct *wq,
         return wait;
 }

+static void warn_if_flushing_global_workqueue(struct workqueue_struct *wq)
+{
+        static DEFINE_RATELIMIT_STATE(flush_warn_rs, 600 * HZ, 1);
+
+        if (likely(!(wq->flags & __WQ_SYSTEM_WIDE)))
+                return;
+
+        ratelimit_set_flags(&flush_warn_rs, RATELIMIT_MSG_ON_RELEASE);
+        if (!__ratelimit(&flush_warn_rs))
+                return;
+        pr_warn("Since system-wide WQ is shared, flushing system-wide WQ can introduce unexpected locking dependency. Please replace %s usage in your code with your local WQ.\n",
+                wq->name);
+        dump_stack();
+}
+
 /**
  * flush_workqueue - ensure that any scheduled work has run to completion.
  * @wq: workqueue to flush
@@ -2824,6 +2839,8 @@ void flush_workqueue(struct workqueue_struct *wq)
         if (WARN_ON(!wq_online))
                 return;

+        warn_if_flushing_global_workqueue(wq);
+
         lock_map_acquire(&wq->lockdep_map);
         lock_map_release(&wq->lockdep_map);

@@ -6070,6 +6087,13 @@ void __init workqueue_init_early(void)
                !system_unbound_wq || !system_freezable_wq ||
                !system_power_efficient_wq ||
                !system_freezable_power_efficient_wq);
+        system_wq->flags |= __WQ_SYSTEM_WIDE;
+        system_highpri_wq->flags |= __WQ_SYSTEM_WIDE;
+        system_long_wq->flags |= __WQ_SYSTEM_WIDE;
+        system_unbound_wq->flags |= __WQ_SYSTEM_WIDE;
+        system_freezable_wq->flags |= __WQ_SYSTEM_WIDE;
+        system_power_efficient_wq->flags |= __WQ_SYSTEM_WIDE;
+        system_freezable_power_efficient_wq->flags |= __WQ_SYSTEM_WIDE;
 }

 /**
--
2.32.0


Haakon Bugge

Feb 15, 2022, 5:43:57 AM
to Tetsuo Handa, Tejun Heo, Bart Van Assche, syzbot, Jason Gunthorpe, LKML, OFED mailing list, syzkall...@googlegroups.com, Lai Jiangshan
s/interbal/internal/
Better to OR this in, in the alloc_workqueue() call? Preserve the notion of an opaque object?


Thxs, Håkon

> }
>
> /**
> --
> 2.32.0
>
>

Tetsuo Handa

Feb 15, 2022, 7:49:08 AM
to Haakon Bugge, Tejun Heo, Bart Van Assche, syzbot, Jason Gunthorpe, LKML, OFED mailing list, syzkall...@googlegroups.com, Lai Jiangshan
On 2022/02/15 19:43, Haakon Bugge wrote:
>> @@ -6070,6 +6087,13 @@ void __init workqueue_init_early(void)
>> !system_unbound_wq || !system_freezable_wq ||
>> !system_power_efficient_wq ||
>> !system_freezable_power_efficient_wq);
>> + system_wq->flags |= __WQ_SYSTEM_WIDE;
>> + system_highpri_wq->flags |= __WQ_SYSTEM_WIDE;
>> + system_long_wq->flags |= __WQ_SYSTEM_WIDE;
>> + system_unbound_wq->flags |= __WQ_SYSTEM_WIDE;
>> + system_freezable_wq->flags |= __WQ_SYSTEM_WIDE;
>> + system_power_efficient_wq->flags |= __WQ_SYSTEM_WIDE;
>> + system_freezable_power_efficient_wq->flags |= __WQ_SYSTEM_WIDE;
>
> Better to OR this in, in the alloc_workqueue() call? Preserve the notion of an opaque object?
>

I do not want to do something like

- system_wq = alloc_workqueue("events", 0, 0);
+ system_wq = alloc_workqueue("events", __WQ_SYSTEM_WIDE, 0);

because the intent of this change is to ask developers to create their own WQs.
If I pass __WQ_SYSTEM_WIDE to alloc_workqueue(), developers might mistakenly create something like

srp_tl_err_wq = alloc_workqueue("srp_tl_err_wq", __WQ_SYSTEM_WIDE, 0);

because of

system_wq = alloc_workqueue("events", __WQ_SYSTEM_WIDE, 0);

line. __WQ_SYSTEM_WIDE is meant to be applied only to the 'system_wq',
'system_highpri_wq', 'system_long_wq', 'system_unbound_wq',
'system_freezable_wq', 'system_power_efficient_wq' and
'system_freezable_power_efficient_wq' WQs, in order to discourage calling
flush_workqueue() on these system-wide WQs.

I wish I could define __WQ_SYSTEM_WIDE inside kernel/workqueue_internal.h, but
it seems that kernel/workqueue_internal.h does not define internal flags.

Bart Van Assche

Feb 15, 2022, 12:05:42 PM
to Tetsuo Handa, Haakon Bugge, Tejun Heo, syzbot, Jason Gunthorpe, LKML, OFED mailing list, syzkall...@googlegroups.com, Lai Jiangshan
On 2/15/22 04:48, Tetsuo Handa wrote:
> I do not want to do something like
>
> - system_wq = alloc_workqueue("events", 0, 0);
> + system_wq = alloc_workqueue("events", __WQ_SYSTEM_WIDE, 0);
>
> because the intent of this change is to ask developers to create their own WQs.

I want more developers to use the system-wide workqueues since that
reduces memory usage. That matters for embedded devices running Linux.

Thanks,

Bart.

Tetsuo Handa

Feb 15, 2022, 5:05:15 PM
to Bart Van Assche, Haakon Bugge, Tejun Heo, syzbot, Jason Gunthorpe, LKML, OFED mailing list, syzkall...@googlegroups.com, Lai Jiangshan
Reserving a kernel thread for a WQ_MEM_RECLAIM WQ might consume some memory,
but I don't think creating a !WQ_MEM_RECLAIM WQ consumes much memory.

Tejun Heo

Feb 22, 2022, 1:26:58 PM
to Bart Van Assche, Tetsuo Handa, Haakon Bugge, syzbot, Jason Gunthorpe, LKML, OFED mailing list, syzkall...@googlegroups.com, Lai Jiangshan
Each wq is just a frontend interface to the backend shared worker pools and
doesn't consume a lot of memory. The only consumption that would matter is
when WQ_MEM_RECLAIM is specified, which creates a dedicated rescuer thread
to guarantee forward progress under memory contention, but we aren't talking
about those here.
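
For example (illustrative only; the workqueue names below are made up):

/* Cheap: just another frontend to the shared worker pools. */
wq = alloc_workqueue("my_driver_wq", 0, 0);

/* Only this variant creates a dedicated rescuer kthread. */
wq = alloc_workqueue("my_reclaim_wq", WQ_MEM_RECLAIM, 0);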

Thanks.

--
tejun

Tejun Heo

Feb 22, 2022, 1:30:25 PM
to Fabio M. De Francesco, Tetsuo Handa, syzkall...@googlegroups.com, Bart Van Assche, syzbot, j...@ziepe.ca, linux-...@vger.kernel.org, linux...@vger.kernel.org, Lai Jiangshan
Hello,

On Thu, Feb 17, 2022 at 01:27:08PM +0100, Fabio M. De Francesco wrote:
> Just to think and understand... what if the system-wide WQ were allocated as unbound
> ordered (i.e., as in alloc_ordered_workqueue()) with "max_active" of one?
>
> 1) Would it solve the locks dependency problem?

It'll actually make deadlocks a lot more prevalent. Some work items need
more than one work item to complete (e.g. flushing another work item
directly, or waiting for something that must be completed by something else
which may involve a system work item), and the system wq's max_active must
be high enough that all such chains taking place at the same time require
fewer work items than max_active.
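
A hypothetical sketch of the simplest such chain ("ordered_wq" stands in
for the imagined single shared ordered queue; both work items run on it):

static void work_b_fn(struct work_struct *w) { /* ... */ }
static DECLARE_WORK(work_b, work_b_fn);

/* work_a_fn itself executes on ordered_wq. */
static void work_a_fn(struct work_struct *w)
{
        queue_work(ordered_wq, &work_b);
        /* With max_active == 1, work_b cannot start until work_a leaves
         * the only execution slot, so this wait never completes: */
        flush_work(&work_b);
}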

> 2) Would it introduce performance penalties (bottlenecks)?

I'd be surprised if it didn't cause at least noticeable latency increases
for some workloads.

Thanks.

--
tejun