Tejun Heo
Dec 4, 2023, 1:07:13 PM
to Naohiro Aota, Lai Jiangshan, linux-...@vger.kernel.org, linux...@vger.kernel.org, ceph-...@vger.kernel.org, cgr...@vger.kernel.org, core...@netfilter.org, dm-d...@lists.linux.dev, dri-...@lists.freedesktop.org, gf...@lists.linux.dev, inte...@lists.freedesktop.org, io...@lists.linux.dev, linux-ar...@lists.infradead.org, linux-b...@vger.kernel.org, linux...@vger.kernel.org, linux-...@redhat.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux...@lists.ozlabs.org, linux-f2...@lists.sourceforge.net, linux-...@vger.kernel.org, linux...@vger.kernel.org, linux-m...@lists.infradead.org, linu...@kvack.org, linu...@vger.kernel.org, linu...@vger.kernel.org, linux...@lists.infradead.org, linux...@vger.kernel.org, linux...@vger.kernel.org, linux-re...@vger.kernel.org, linux...@vger.kernel.org, linux-tra...@vger.kernel.org, linu...@vger.kernel.org, linux-w...@vger.kernel.org, linu...@vger.kernel.org, n...@other.debian.org, net...@vger.kernel.org, n...@lists.linux.dev, open-...@googlegroups.com, oss-d...@corigine.com, platform-...@vger.kernel.org, samba-t...@lists.samba.org, target...@vger.kernel.org, virtual...@lists.linux.dev, wire...@lists.zx2c4.com
Hello,
On Mon, Dec 04, 2023 at 04:03:47PM +0000, Naohiro Aota wrote:
> Recently, commit 636b927eba5b ("workqueue: Make unbound workqueues to use
> per-cpu pool_workqueues") changed WQ_UNBOUND workqueue's behavior. It
> changed the meaning of alloc_workqueue()'s max_active from an upper limit
> imposed per NUMA node to a limit per CPU. As a result, a massive number of
> workers can be running at the same time, especially if the workqueue user
> thinks max_active is a global limit.
>
> Actually, the documentation already described it as a per-CPU limit
> before the commit. However, several callers seem to misuse max_active,
> maybe thinking it is a global limit. It is an unexpected behavior change
> for them.
Right. The behavior had been like that for a very long time and there was
no other way to achieve a reasonable level of concurrency, so the current
situation is expected.
> For example, these callers set max_active = num_online_cpus(), which is a
> suspicious value for a per-CPU limit. This config means we can have nr_cpu
> * nr_cpu active tasks working at the same time.
Yeah, that sounds like a good indicator.
> fs/f2fs/data.c: sbi->post_read_wq = alloc_workqueue("f2fs_post_read_wq",
> fs/f2fs/data.c- WQ_UNBOUND | WQ_HIGHPRI,
> fs/f2fs/data.c- num_online_cpus());
>
> fs/crypto/crypto.c: fscrypt_read_workqueue = alloc_workqueue("fscrypt_read_queue",
> fs/crypto/crypto.c- WQ_UNBOUND | WQ_HIGHPRI,
> fs/crypto/crypto.c- num_online_cpus());
>
> fs/verity/verify.c: fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
> fs/verity/verify.c- WQ_HIGHPRI,
> fs/verity/verify.c- num_online_cpus());
>
> drivers/crypto/hisilicon/qm.c: qm->wq = alloc_workqueue("%s", WQ_HIGHPRI | WQ_MEM_RECLAIM |
> drivers/crypto/hisilicon/qm.c- WQ_UNBOUND, num_online_cpus(),
> drivers/crypto/hisilicon/qm.c- pci_name(qm->pdev));
>
> block/blk-crypto-fallback.c: blk_crypto_wq = alloc_workqueue("blk_crypto_wq",
> block/blk-crypto-fallback.c- WQ_UNBOUND | WQ_HIGHPRI |
> block/blk-crypto-fallback.c- WQ_MEM_RECLAIM, num_online_cpus());
>
> drivers/md/dm-crypt.c: cc->crypt_queue = alloc_workqueue("kcryptd/%s",
> drivers/md/dm-crypt.c- WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM | WQ_UNBOUND,
> drivers/md/dm-crypt.c- num_online_cpus(), devname);
Most of these work items are CPU bound but not completely so. e.g.
kcryptd_crypt_write_continue() does wait_for_completion(), so setting
max_active to 1 likely isn't what they want either. They mostly want some
reasonable system-wide concurrency limit w.r.t. the CPU count while keeping
some level of flexibility in terms of task placement.
The previous max_active wasn't great for this because its meaning changed
depending on the number of nodes. Now, the meaning doesn't change but it's
not really useful for the above purpose. It's only useful for avoiding
melting the system completely.
One way to go about it is to declare that concurrency-level management for
unbound workqueues is on the users, but that seems not ideal given that many
use cases would want it anyway.
Let me think it over but I think the right way to go about it is going the
other direction - i.e. making max_active apply to the whole system regardless
of the number of nodes / ccx's / whatever.
Thanks a lot for the report and analysis. I think it's a lot more reasonable
to assume that max_active is global for unbound workqueues; the current
behavior is not very intuitive or useful, so I'll try to find something more
reasonable. Much appreciated.
Thanks.
--
tejun