RCU stalls running KUnit on mainline


Mark Brown

Feb 17, 2026, 9:10:30 AM
to Brendan Higgins, David Gow, Rae Moar, linux-k...@vger.kernel.org, kuni...@googlegroups.com
Hi,

When running KUnit via qemu on current mainline I'm seeing random
lockups, frequently but not always reporting an RCU stall.
Unfortunately these don't seem to happen in a consistent place, which
makes it hard to figure out exactly what's going on. They started in
-next at some point shortly before or early in the merge window, but I've
never managed to drill down and investigate them. I don't imagine
they're due to KUnit specifically, though it seems likely some test is
triggering them. Has anyone else seen this, or do you have any leads?

Thanks,
Mark

David Gow

Feb 18, 2026, 3:53:27 AM
to Mark Brown, Brendan Higgins, Rae Moar, Frederic Weisbecker, linux-k...@vger.kernel.org, kuni...@googlegroups.com
Hmm… I haven't seen this yet on x86_64, but looking at arm64 and 32-bit
i386, I do see a sporadic panic, often with rcu in the stacktrace. Seems
to happen more often when the KUnit test kthread is starting/stopping
(particularly, at least on i386, if it's due to a trapped fault).

I've not been able to reproduce it after reverting the kthread affinity
series (git revert -m1 d16738a4e79e55b2c3c9ff4fb7b74a4a24723515), but
that could just be due to luck. It's flaky enough that my attempt at
bisection kept pointing at documentation patches.

Frederic, any idea if the 7.0 kthread updates could be causing these? My
most reliable repro command thus far is:
./tools/testing/kunit/kunit.py run --arch arm64 --make_options LLVM=1

— David

Mark Brown

Feb 18, 2026, 6:33:03 AM
to David Gow, Brendan Higgins, Rae Moar, Frederic Weisbecker, linux-k...@vger.kernel.org, kuni...@googlegroups.com
On Wed, Feb 18, 2026 at 04:53:16PM +0800, David Gow wrote:
> On 17/02/2026 at 10:10 PM, 'Mark Brown' via KUnit Development wrote:

> > When running KUnit via qemu on current mainline I'm seeing random
> > lockups, frequently but not always reporting an RCU stall.
> > Unfortunately these don't seem to happen in a consistent place which
> > makes it hard to figure out exactly what's going on, they started in

> Hmm… I haven't seen this yet on x86_64, but looking at arm64 and 32-bit
> i386, I do see a sporadic panic, often with rcu in the stacktrace. Seems to
> happen more often when the KUnit test kthread is starting/stopping
> (particularly, at least on i386, if it's due to a trapped fault).

I did see this on x86_64 FWIW.

> I've not been able to reproduce it after reverting the kthread affinity
> series (git revert -m1 d16738a4e79e55b2c3c9ff4fb7b74a4a24723515), but that
> could just be due to luck. It's flaky enough that my attempt at bisection
> kept pointing at documentation patches.

Yeah, it's not reliably reproducible, though it happens pretty often. I
did try things like retrying N times to see if it fails, but I have a
horrible feeling there's some dependency on the specific build somehow.
Hopefully not.
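In case anyone wants to try the same, this is roughly the shape of the
retry loop I mean; a minimal sketch only, with count_failures a made-up
helper name and the reproducer command left as a placeholder (nothing
here is KUnit-specific):

```shell
# Run a flaky reproducer a fixed number of times and count failures,
# to get a rough feel for how often the lockup bites on a given build.
# count_failures is a hypothetical helper; substitute the real command,
# e.g. ./tools/testing/kunit/kunit.py run --arch arm64 --make_options LLVM=1
count_failures() {
    runs="$1"; shift              # $1 = number of runs, rest = command
    fails=0
    i=1
    while [ "$i" -le "$runs" ]; do
        "$@" || fails=$((fails + 1))
        i=$((i + 1))
    done
    echo "$fails/$runs runs failed"
}
```

The obvious caveat being that if the failure rate depends on the build,
a clean 0/N on one kernel doesn't prove much about the next one.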

Mark Brown

Feb 18, 2026, 2:31:52 PM
to David Gow, Brendan Higgins, Rae Moar, Frederic Weisbecker, linux-k...@vger.kernel.org, kuni...@googlegroups.com
On Wed, Feb 18, 2026 at 04:53:16PM +0800, David Gow wrote:
> On 17/02/2026 at 10:10 PM, 'Mark Brown' via KUnit Development wrote:
> > When running KUnit via qemu on current mainline I'm seeing random
> > lockups, frequently but not always reporting an RCU stall.
> > Unfortunately these don't seem to happen in a consistent place which
> > makes it hard to figure out exactly what's going on, they started in
> > -next at some point shortly before or early in the merge window but I've
> > never managed to drill down and investigate them. I don't imagine
> > they're due to KUnit specifically, though it seems likely some test is
> > triggering them. Has anyone else seen this, or do you have any leads?

> I've not been able to reproduce it after reverting the kthread affinity
> series (git revert -m1 d16738a4e79e55b2c3c9ff4fb7b74a4a24723515), but that
> could just be due to luck. It's flaky enough that my attempt at bisection
> kept pointing at documentation patches.

One other data point is that there's some range of commits which
generates an actual failure in the runtime PM tests:

[19:26:28] [PASSED] pm_runtime_disabled_test
[19:26:28] Unable to handle kernel execute from non-executable memory at virtual address fff000000145f358
...
[19:26:28] Call trace:
[19:26:28] 0xfff000000145f358 (P)
[19:26:28] rpm_callback+0x74/0x80
[19:26:28] rpm_resume+0x3cc/0x6a0
[19:26:28] __pm_runtime_resume+0x50/0x9c
[19:26:28] device_release_driver_internal+0xd0/0x224
[19:26:28] device_release_driver+0x18/0x24
[19:26:28] bus_remove_device+0xd0/0x114
[19:26:28] device_del+0x14c/0x408
[19:26:28] device_unregister+0x18/0x38
[19:26:28] device_unregister_wrapper+0x10/0x20
[19:26:28] __kunit_action_free+0x14/0x20
...
[19:26:28] [FAILED] pm_runtime_error_test

which might upset bisections.

Guillaume Tucker

Feb 20, 2026, 10:17:27 AM
to Mark Brown, David Gow, Brendan Higgins, Rae Moar, Frederic Weisbecker, linux-k...@vger.kernel.org, kuni...@googlegroups.com
Hello,

Yes, although I've run some automated VIXI bisections and found one
reliable panic in rcu.  I had to do it in two steps as the first
bisection landed on a merge commit.  After a bit more investigation I
reported what I found here:

    https://lore.kernel.org/all/0150e237-41d2-40ae...@gtucker.io/

I can bisect the other KUnit issues separately too if that helps now
that I have a quick workaround to avoid this panic (see email).
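For the flaky ones, the idea would be to point "git bisect run" at a
wrapper along these lines; a sketch only, with bisect_step a made-up
name, and it assumes the reproducer exits non-zero when it hits the
panic:

```shell
# Wrapper step for "git bisect run" with an intermittent failure:
# repeat the reproducer several times, report "bad" (exit 1) on the
# first failure, and "good" (exit 0) only if every run passes, to
# reduce the odds of mislabelling a commit as good.
bisect_step() {
    n="$1"; shift                 # $1 = attempts, rest = reproducer command
    i=1
    while [ "$i" -le "$n" ]; do
        "$@" || return 1          # any failure => commit is bad
        i=$((i + 1))
    done
    return 0                      # all runs passed => probably good
}
```

Save this in an executable script that ends by calling bisect_step with
the real reproducer, and hand that script to "git bisect run"; exit 0
marks the commit good, exit 1 marks it bad.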

Cheers,
Guillaume

Guillaume Tucker

Feb 20, 2026, 10:30:23 AM
to Mark Brown, David Gow, Brendan Higgins, Rae Moar, Frederic Weisbecker, linux-k...@vger.kernel.org, kuni...@googlegroups.com
(sorry, resending but not with gmail)

Hello,

Mark Brown

Feb 21, 2026, 9:10:31 AM
to Guillaume Tucker, David Gow, Brendan Higgins, Rae Moar, Frederic Weisbecker, linux-k...@vger.kernel.org, kuni...@googlegroups.com
On Fri, Feb 20, 2026 at 04:30:19PM +0100, Guillaume Tucker wrote:

> Yes, although I've run some automated VIXI bisections and found one
> reliable panic in rcu. I had to do it in two steps as the first
> bisection landed on a merge commit. After a bit more investigation I
> reported what I found here:

> https://lore.kernel.org/all/0150e237-41d2-40ae...@gtucker.io/

> I can bisect the other KUnit issues separately too if that helps now
> that I have a quick workaround to avoid this panic (see email).

Thanks for tracking that down and reporting it! Hopefully it gets fixed
soon and we can turn KUnit testing back on for -next. One flaw with
relying on that check for intermittent bugs is that it's not so easy to
hold back the tree that introduced the problem.