[Question] Detecting Sleep-in-Atomic Context in PREEMPT_RT via RV (Runtime Verification) monitor rtapp:sleep

Yunseong Kim

Oct 27, 2025, 2:54:29 AM
to Nam Cao, Sebastian Andrzej Siewior, Tomas Glozar, Shung-Hsi Yu, Byungchul Park, syzk...@googlegroups.com, linux-r...@lists.linux.dev, LKML
Hi Nam,

I've been very interested in RV (Runtime Verification) to proactively detect
"sleep in atomic" scenarios on PREEMPT_RT kernels. Specifically, I'm looking
for ways to find cases where sleeping spinlocks or memory allocations are used
within preemption-disabled or irq-disabled contexts. While searching for
solutions, I discovered the RV subsystem.

I've tested it as follows, and I have a few questions.

# cat /sys/kernel/tracing/rv/available_monitors
wwnr
rtapp
rtapp:sleep

# cat /sys/kernel/tracing/rv/available_reactors
nop
printk
panic

# echo printk > /sys/kernel/tracing/rv/monitors/rtapp/sleep/reactors

# cat /sys/kernel/tracing/rv/monitors/rtapp/sleep/enable
1

# echo rtapp:sleep > /sys/kernel/tracing/rv/enabled_monitors

> [192735.309072] [ T6957] rv: sleep: multipathd[6957]: violation detected

# echo panic > /sys/kernel/tracing/rv/monitors/rtapp/sleep/reactors

> [ T6957] Kernel panic - not syncing: rv: sleep: multipathd[6957]: violation detected
> [193521.768666][ T6957] CPU: 4 UID: 0 PID: 6957 Comm: multipathd Not tainted 6.17.0-rc3-g39f90c196721 #1 PREEMPT_{RT,(full)}
> [193521.771727][ T6957] Hardware name: QEMU KVM Virtual Machine, BIOS 2025.02-8ubuntu3 10/08/2025
> [193521.774126][ T6957] Call trace:
> [193521.774998][ T6957] show_stack+0x2c/0x3c (C)
> [193521.776281][ T6957] __dump_stack+0x30/0x40
> [193521.777523][ T6957] dump_stack_lvl+0x34/0x2bc
> [193521.778797][ T6957] dump_stack+0x1c/0x48
> [193521.779984][ T6957] vpanic+0x220/0x618
> [193521.781211][ T6957] oom_killer_enable+0x0/0x30
> [193521.782512][ T6957] ltl_validate+0x7ac/0xb1c
> [193521.783870][ T6957] ltl_atom_update+0xd0/0x32c
> [193521.785198][ T6957] handle_sched_set_state+0xb8/0x12c
> [193521.786773][ T6957] __trace_set_current_state+0x128/0x174
> [193521.788450][ T6957] do_nanosleep+0x128/0x2a4
> [193521.789731][ T6957] hrtimer_nanosleep+0xb4/0x160
> [193521.791167][ T6957] common_nsleep+0x6c/0x84
> [193521.792404][ T6957] __arm64_sys_clock_nanosleep+0x1a8/0x1f0
> [193521.794031][ T6957] invoke_syscall+0x64/0x168
> [193521.795353][ T6957] el0_svc_common+0x134/0x164
> [193521.796707][ T6957] do_el0_svc+0x2c/0x3c
> [193521.797897][ T6957] el0_svc+0x58/0x184
> [193521.799048][ T6957] el0t_64_sync_handler+0x84/0x12c
> [193521.800514][ T6957] el0t_64_sync+0x1b8/0x1bc
> [193521.801818][ T6957] SMP: stopping secondary CPUs
> [193521.803320][ T6957] Dumping ftrace buffer:
> [193521.804510][ T6957] (ftrace buffer empty)
> [193521.805848][ T6957] Kernel Offset: disabled
> [193521.807084][ T6957] CPU features: 0xc0000,00007800,149a3161,357ff667
> [193521.808941][ T6957] Memory Limit: none
> [193522.655297][ T6957] Rebooting in 86400 seconds..

Here are my questions:

1. Does the rtapp:sleep monitor proactively detect scenarios that
could lead to sleeping in atomic context, perhaps before
CONFIG_DEBUG_ATOMIC_SLEEP (enabled) would trigger at the actual point of
sleeping?

2. Is there a way to enable this monitor (e.g., rtapp:sleep)
immediately as soon as the RV subsystem is loaded during boot time?
(How to make this "default turn on"?)

3. When a "violation detected" message occurs at runtime, is it
possible to get a call stack of the location that triggered the
violation? The panic reactor provides a full stack, but I'm
wondering if this is also possible with the printk reactor.


Here is some background on why I'm so interested in this topic:

Recently, I was fuzzing the PREEMPT_RT kernel with syzkaller but ran into
issues where fuzzing wouldn't proceed smoothly. It turned out to be a problem
in the kcov USB API, which was fixed by Sebastian’s patch after I reported it.

[PATCH] kcov, usb: Don't disable interrupts in kcov_remote_start_usb_softirq()
- https://lore.kernel.org/all/20250811082...@linutronix.de/

After this fix, syzkaller fuzzing ran well and was able to detect several
runtime "sleep in atomic context" bugs:

[PATCH] USB: gadget: dummy-hcd: Fix locking bug in RT-enabled kernels
- https://lore.kernel.org/all/bb192ae2-4eee-48ee...@rowland.harvard.edu/

[BUG] usbip: vhci: Sleeping function called from invalid context in
vhci_urb_enqueue on PREEMPT_RT
- https://lore.kernel.org/all/c6c17f0d-b71d-4a44...@kzalloc.com/

This led me to research ways to find these issues proactively at a
static analysis level, and I created some regex and coccinelle scripts
to detect them.

[BUG] gfs2: sleeping lock in gfs2_quota_init() with preempt disabled
on PREEMPT_RT
- https://lore.kernel.org/all/20250812103...@linutronix.de/t/#u

[PATCH] md/raid5-ppl: Fix invalid context sleep in
ppl_io_unit_finished() on PREEMPT_RT
- https://lore.kernel.org/all/f2dbf110-e2a7-4101...@kernel.org/t/#u

Tomas, the author of the rtlockscope project, also gave me some deep
insights into this static analysis approach.

Re: [WIP] coccinelle: rt: Add coccicheck on sleep in atomic context on
PREEMPT_RT
- https://lore.kernel.org/all/CAP4=nvTOE9W+6UtVZ5-5gAoYeEQ...@mail.gmail.com/


Thank you!

Best regards,
Yunseong Kim

Gabriele Monaco

Oct 27, 2025, 8:21:41 AM
to Yunseong Kim, Nam Cao, Sebastian Andrzej Siewior, Tomas Glozar, Shung-Hsi Yu, Byungchul Park, syzk...@googlegroups.com, linux-r...@lists.linux.dev, LKML
On Mon, 2025-10-27 at 15:54 +0900, Yunseong Kim wrote:
> Hi Nam,
>
> I've been very interested in RV (Runtime Verification) to proactively detect
> "sleep in atomic" scenarios on PREEMPT_RT kernels. Specifically, I'm looking
> for ways to find cases where sleeping spinlocks or memory allocations are used
> within preemption-disabled or irq-disabled contexts. While searching for
> solutions, I discovered the RV subsystem.
>

Hi Yunseong,

I'm sure Nam can be more specific on this, but let me add my 2 cents here.

The sleep monitor doesn't really do what you want. Its violations are real-time
tasks (typically userspace tasks with RR/FIFO policies) sleeping in a way that
might incur latencies, for instance using non-PI locks or imprecise sleeps.

What you need here is to validate kernel code. RV was actually designed for
that, but there's currently no monitor that does what you want.

The closest thing I can think of is monitors like scpd and snep in the sched
collection [1]. Those, however, won't catch what you need, because they focus on
the preemption tracepoints and schedule(), and that combination works fine in
your scenario too.

We could add similar monitors to catch what you want though:

                     |
                     |
                     v
                   +-----------------+
                   |   cant_sleep    | <+
                   +-----------------+  |
                     |                  |
                     | preempt_enable   | preempt_disable
                     v                  |
    kmalloc                             |
    lock_acquire                        |
  +---------------      can_sleep       |
  |                                     |
  +-------------->                     -+

which would become slightly more complicated if considering irq enable/disable
too. This is a deterministic automaton representation (see [1] for examples);
you could also use an LTL specification like sleep, I assume (that needs a
per-CPU LTL monitor, which is not merged yet).

This is simplified but you can of course put conditions on what kind of
allocations and locks you're interested in.
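The diagram above can be sketched as a small user-space C model, assuming one automaton instance per CPU and treating any event the current state does not define as a violation. The states and events come from the diagram; the identifiers (`sleep_da_event()`, `NR_CPUS_SKETCH`, and so on) are hypothetical, each instance starts in cant_sleep because that is where the entry arrow points, and preempt_disable nesting is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* States and events taken from the diagram; the C names are illustrative. */
enum sleep_state { CANT_SLEEP, CAN_SLEEP, NR_SLEEP_STATES };
enum sleep_event { PREEMPT_DISABLE, PREEMPT_ENABLE, KMALLOC, LOCK_ACQUIRE,
		   NR_SLEEP_EVENTS };

#define NR_CPUS_SKETCH	4	/* hypothetical CPU count for the model */
#define INVALID_STATE	(-1)	/* event not allowed in the current state */

/* Transition function: sleep_next[state][event]. */
static const int sleep_next[NR_SLEEP_STATES][NR_SLEEP_EVENTS] = {
	[CANT_SLEEP] = { INVALID_STATE, CAN_SLEEP, INVALID_STATE, INVALID_STATE },
	[CAN_SLEEP]  = { CANT_SLEEP, INVALID_STATE, CAN_SLEEP, CAN_SLEEP },
};

/* One automaton per CPU; zero-initialised, i.e. all start in cant_sleep. */
static enum sleep_state cpu_state[NR_CPUS_SKETCH];

/* Feed one event observed on `cpu`; returns false on a violation. */
static bool sleep_da_event(int cpu, enum sleep_event ev)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	int n = sleep_next[cpu_state[cpu]][ev];

	if (n == INVALID_STATE)
		return false;	/* a real monitor would call its reactor here */
	cpu_state[cpu] = n;
	return true;
}
```

In a real monitor the events would be delivered by tracepoint handlers rather than direct calls, and adding irq enable/disable would mean extra states.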

Note that this specific case would require lockdep, since the lock_acquire
tracepoints are only defined with it, so I'm not sure how useful this monitor
would be: lockdep is going to complain too. You could instead use the contention
tracepoints to catch exactly when a sleep is going to occur, rather than
/potential/ failures.

I only gave this a quick thought; there may be better models/events fitting
your use case, but I hope you get the idea.

[1] - https://docs.kernel.org/trace/rv/monitor_sched.html#monitor-scpd

> Here are my questions:
>
> 1. Does the rtapp:sleep monitor proactively detect scenarios that
>    could lead to sleeping in atomic context, perhaps before
>    CONFIG_DEBUG_ATOMIC_SLEEP (enabled) would trigger at the actual point of
>    sleeping?

I guess I answered this already, but TL;DR no, you'd need a dedicated monitor.

> 2. Is there a way to enable this monitor (e.g., rtapp:sleep)
>    immediately as soon as the RV subsystem is loaded during boot time?
>    (How to make this "default turn on"?)

Currently not, but you could probably use any sort of startup script to turn it
on soon enough.
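A startup helper could simply replay the echo commands from the original report. Below is a minimal C sketch of that, assuming tracefs is mounted at /sys/kernel/tracing and that a nested monitor such as rtapp:sleep lives under rv/monitors/rtapp/sleep, as in those commands; all function names are hypothetical. It appends to enabled_monitors (like `>>` in the shell) since, as I understand the RV documentation, truncating that file disables every enabled monitor.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TRACEFS_ROOT "/sys/kernel/tracing"	/* assumed mount point */

/* Like `echo val > path` (append == 0) or `echo val >> path` (append != 0). */
static int tracefs_write(const char *path, const char *val, int append)
{
	FILE *f = fopen(path, append ? "a" : "w");
	int err;

	if (!f)
		return -1;
	err = fprintf(f, "%s\n", val) < 0;
	/* stdio buffers the data: the write(), and any tracefs error, happen in fclose() */
	err |= fclose(f) != 0;
	return err ? -1 : 0;
}

/* Map "rtapp:sleep" to "rtapp/sleep", the layout under rv/monitors/. */
static void rv_monitor_dir(const char *monitor, char *buf, size_t len)
{
	char *colon;

	assert(len > 0);
	snprintf(buf, len, "%s", monitor);
	colon = strchr(buf, ':');
	if (colon)
		*colon = '/';
}

/* Hypothetical init-time helper: optionally select a reactor, then enable. */
static int rv_enable_monitor(const char *root, const char *monitor,
			     const char *reactor)
{
	char dir[128], path[256];

	if (reactor) {
		rv_monitor_dir(monitor, dir, sizeof(dir));
		snprintf(path, sizeof(path), "%s/rv/monitors/%s/reactors",
			 root, dir);
		if (tracefs_write(path, reactor, 0))
			return -1;
	}
	snprintf(path, sizeof(path), "%s/rv/enabled_monitors", root);
	return tracefs_write(path, monitor, 1);
}
```

For example, `rv_enable_monitor(TRACEFS_ROOT, "rtapp:sleep", "printk")` from an early init program. This still only runs once userspace is up, so it is "soon enough" rather than truly at boot.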

> 3. When a "violation detected" message occurs at runtime, is it
>    possible to get a call stack of the location that triggered the
>    violation? The panic reactor provides a full stack, but I'm
>    wondering if this is also possible with the printk reactor.

You can use ftrace and rely on error tracepoints instead of reactors. Each RV
violation triggers a tracepoint (e.g. error_sleep) and you can print a call
stack there. E.g.:

echo stacktrace > /sys/kernel/tracing/events/rv/error_sleep/trigger

Here I use sleep as an example, but all monitors have their own error events
(e.g. error_wwnr, error_snep, etc.).
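The same trigger can be armed programmatically. A minimal sketch, assuming tracefs at /sys/kernel/tracing and that a monitor's error event is error_ followed by the monitor name without its container prefix (rtapp:sleep gives error_sleep, matching the example above); the helper names are hypothetical. The resulting stack traces land in the trace buffer (e.g. /sys/kernel/tracing/trace), not on the console.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* "rtapp:sleep" -> "error_sleep", "wwnr" -> "error_wwnr" (assumed naming). */
static void rv_error_event(const char *monitor, char *buf, size_t len)
{
	const char *colon = strrchr(monitor, ':');

	assert(len > 0);
	snprintf(buf, len, "error_%s", colon ? colon + 1 : monitor);
}

/* Equivalent of: echo stacktrace > <root>/events/rv/error_<mon>/trigger */
static int rv_arm_stacktrace(const char *root, const char *monitor)
{
	char ev[64], path[256];
	FILE *f;
	int err;

	rv_error_event(monitor, ev, sizeof(ev));
	snprintf(path, sizeof(path), "%s/events/rv/%s/trigger", root, ev);
	f = fopen(path, "w");
	if (!f)
		return -1;
	err = fputs("stacktrace\n", f) == EOF;
	err |= fclose(f) != 0;	/* the buffered write() to tracefs happens here */
	return err ? -1 : 0;
}
```

Usage would be `rv_arm_stacktrace("/sys/kernel/tracing", "rtapp:sleep")`, keeping the printk reactor (or nop) for the console side.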

Does this all look useful in your scenario?

Gabriele

Yunseong Kim

Oct 28, 2025, 6:53:27 PM
to Gabriele Monaco, Nam Cao, Sebastian Andrzej Siewior, Tomas Glozar, Shung-Hsi Yu, Byungchul Park, syzk...@googlegroups.com, linux-r...@lists.linux.dev, LKML
Hi Gabriele,

On 10/27/25 9:20 PM, Gabriele Monaco wrote:
> On Mon, 2025-10-27 at 15:54 +0900, Yunseong Kim wrote:
>> Hi Nam,
>>
>> I've been very interested in RV (Runtime Verification) to proactively detect
>> "sleep in atomic" scenarios on PREEMPT_RT kernels. Specifically, I'm looking
>> for ways to find cases where sleeping spinlocks or memory allocations are used
>> within preemption-disabled or irq-disabled contexts. While searching for
>> solutions, I discovered the RV subsystem.
>>
>
> Hi Yunseong,
>
> I'm sure Nam can be more specific on this, but let me add my 2 cents here.

Thank you so much for your detailed response! It cleared up many of the
questions I had.

> The sleep monitor doesn't really do what you want. Its violations are real-time
> tasks (typically userspace tasks with RR/FIFO policies) sleeping in a way that
> might incur latencies, for instance using non-PI locks or imprecise sleeps.

So that’s the role of rtapp:sleep. Thank you again for clarifying it.

> What you need here is to validate kernel code. RV was actually designed for
> that, but there's currently no monitor that does what you want.

It’s a valuable chance to make a contribution to RV!

> The closest thing I can think of is monitors like scpd and snep in the sched
> collection [1]. Those, however, won't catch what you need, because they focus on
> the preemption tracepoints and schedule(), and that combination works fine in
> your scenario too.
>
> We could add similar monitors to catch what you want though:
>
>                      |
>                      |
>                      v
>                    +-----------------+
>                    |   cant_sleep    | <+
>                    +-----------------+  |
>                      |                  |
>                      | preempt_enable   | preempt_disable
>                      v                  |
>     kmalloc                             |
>     lock_acquire                        |
>   +---------------      can_sleep       |
>   |                                     |
>   +-------------->                     -+
>
> which would become slightly more complicated if considering irq enable/disable
> too. This is a deterministic automaton representation (see [1] for examples);
> you could also use an LTL specification like sleep, I assume (that needs a
> per-CPU LTL monitor, which is not merged yet).
>
> This is simplified but you can of course put conditions on what kind of
> allocations and locks you're interested in.

If the goal is to detect this state before the output from __might_resched()
under CONFIG_DEBUG_ATOMIC_SLEEP (i.e., before an actual context switch occurs),
I am considering whether Deterministic Automata (.dot/DA) or Linear Temporal
Logic (.ltl/LTL) would be more appropriate for modeling this check. I'm also
thinking about whether I need to create a comprehensive table of all sleepable
functions for this purpose on the PREEMPT_RT kernel.

If this check is necessary, I’m planning to try the following verification:

RULE = always ((IN_ATOMIC or IRQS_DISABLED) imply not CALLS_RT_SLEEPER)

I’m also planning to add a table of sleepable functions to the RV monitor kernel
module, including the spinlocks and memory allocations that can sleep on
PREEMPT_RT and may be called with preemption or IRQs disabled.

I’m considering adding the following functions as a result:

static const char *const rt_sleepers[] = {	/* name is illustrative */
	// Mutex & Semaphore (or Lockdep's 'lock_acquire' for lock cases)
	"mutex_lock",
	"mutex_lock_interruptible",
	"mutex_lock_killable",
	"down_interruptible",
	"down_killable",
	"rwsem_down_read_failed",
	"rwsem_down_write_failed",
	"ww_mutex_lock",
	"rt_spin_lock",
	"rt_read_lock",
	"rt_write_lock",
	// or just "lock_acquire" for a LOCKDEP-enabled kernel.

	// sleep & schedule
	"msleep",
	"ssleep",
	"usleep_range",
	"wait_for_completion",
	"schedule",
	"cond_resched",

	// User-space memory access
	"copy_from_user",
	"copy_to_user",
	"__get_user_asm",
	"__put_user_asm",

	// memory allocation
	"__vmalloc",
	"__kmalloc",
};

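As an illustration of how such a table might feed the CALLS_RT_SLEEPER atom, here is a user-space C sketch using a sorted subset of the list above and bsearch(). `is_rt_sleeper()` and the table name are hypothetical, and how a monitor would learn the called function's name (tracepoint or kprobe) is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Subset of the list above, kept in strcmp() order for bsearch(). */
static const char *const rt_sleepers_sorted[] = {
	"__kmalloc", "__vmalloc", "copy_from_user", "copy_to_user",
	"mutex_lock", "rt_read_lock", "rt_spin_lock", "rt_write_lock",
	"schedule",
};

static int cmp_name(const void *key, const void *elem)
{
	return strcmp(key, *(const char *const *)elem);
}

/* Debug check: bsearch() silently misbehaves on an unsorted table. */
static void check_table_sorted(void)
{
	for (size_t i = 1; i < ARRAY_SIZE(rt_sleepers_sorted); i++)
		assert(strcmp(rt_sleepers_sorted[i - 1], rt_sleepers_sorted[i]) < 0);
}

/* Would a call to `fn` set the (hypothetical) CALLS_RT_SLEEPER atom? */
static bool is_rt_sleeper(const char *fn)
{
	return bsearch(fn, rt_sleepers_sorted, ARRAY_SIZE(rt_sleepers_sorted),
		       sizeof(rt_sleepers_sorted[0]), cmp_name) != NULL;
}
```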
> Note that this specific case would require lockdep, since the lock_acquire
> tracepoints are only defined with it, so I'm not sure how useful this monitor
> would be: lockdep is going to complain too. You could instead use the contention
> tracepoints to catch exactly when a sleep is going to occur, rather than
> /potential/ failures.

I’ll look into this lockdep-related part further as well.

> I only gave this a quick thought; there may be better models/events fitting
> your use case, but I hope you get the idea.
>
> [1] - https://docs.kernel.org/trace/rv/monitor_sched.html#monitor-scpd

Thank you for providing a diagram and references that make it easier to
understand!

>> Here are my questions:
>>
>> 1. Does the rtapp:sleep monitor proactively detect scenarios that
>>    could lead to sleeping in atomic context, perhaps before
>>    CONFIG_DEBUG_ATOMIC_SLEEP (enabled) would trigger at the actual point of
>>    sleeping?
>
> I guess I answered this already, but TL;DR no, you'd need a dedicated monitor.
>
>> 2. Is there a way to enable this monitor (e.g., rtapp:sleep)
>>    immediately as soon as the RV subsystem is loaded during boot time?
>>    (How to make this "default turn on"?)
>
> Currently not, but you could probably use any sort of startup script to turn it
> on soon enough.
>
>> 3. When a "violation detected" message occurs at runtime, is it
>>    possible to get a call stack of the location that triggered the
>>    violation? The panic reactor provides a full stack, but I'm
>>    wondering if this is also possible with the printk reactor.
>
> You can use ftrace and rely on error tracepoints instead of reactors. Each RV
> violation triggers a tracepoint (e.g. error_sleep) and you can print a call
> stack there. E.g.:
>
> echo stacktrace > /sys/kernel/tracing/events/rv/error_sleep/trigger
>
> Here I use sleep as an example, but all monitors have their own error events
> (e.g. error_wwnr, error_snep, etc.).
>
> Does this all look useful in your scenario?

Thank you once again for your thorough explanation. Many of the questions
I initially had have now been resolved!

> Gabriele

Best regards,
Yunseong Kim

Gabriele Monaco

Oct 30, 2025, 6:13:46 AM
to Yunseong Kim, Nam Cao, Sebastian Andrzej Siewior, Tomas Glozar, Shung-Hsi Yu, Byungchul Park, syzk...@googlegroups.com, linux-r...@lists.linux.dev, LKML
On Wed, 2025-10-29 at 07:53 +0900, Yunseong Kim wrote:
> > What you need here is to validate kernel code. RV was actually designed for
> > that, but there's currently no monitor that does what you want.
>
> It’s a valuable chance to make a contribution to RV!

And could be quite a useful model!

> If the goal is to detect this state before the output from __might_resched()
> under CONFIG_DEBUG_ATOMIC_SLEEP (i.e., before an actual context switch
> occurs),
> I am considering whether Deterministic Automata (.dot/DA) or Linear Temporal
> Logic (.ltl/LTL) would be more appropriate for modeling this check. I'm also
> thinking about whether I need to create a comprehensive table of all sleepable
> functions for this purpose on the PREEMPT_RT kernel.
>
> If this check is necessary, I’m planning to try the following verification:
>
> RULE = always ((IN_ATOMIC or IRQS_DISABLED) imply not CALLS_RT_SLEEPER)

Yes, in this case DA or LTL is mostly down to preference. One thing to keep in
mind is that this is going to be a per-CPU monitor (i.e. the rule holds for
each CPU, as the IRQ/preemption state is per-CPU).
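To make the per-CPU point concrete, here is a user-space sketch that evaluates RULE = always ((IN_ATOMIC or IRQS_DISABLED) imply not CALLS_RT_SLEEPER) at every event, keeping a separate set of atoms per CPU. The event names, the preemption-depth counter and `rule_step()` are assumptions for illustration; this is a plain invariant check, not the code RV generates from an .ltl file.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count */

/* Hypothetical events a monitor could receive on a given CPU. */
enum rule_ev { EV_PREEMPT_OFF, EV_PREEMPT_ON, EV_IRQ_OFF, EV_IRQ_ON,
	       EV_RT_SLEEPER };

/* Atoms tracked per CPU, because preemption/IRQ state is per CPU. */
struct cpu_atoms {
	int preempt_depth;	/* IN_ATOMIC == (preempt_depth > 0) */
	bool irqs_disabled;	/* IRQS_DISABLED */
	unsigned int violations;
};

static struct cpu_atoms atoms[NR_CPUS_SKETCH];

/* Update the atoms for one event, then check the rule at this step. */
static bool rule_step(int cpu, enum rule_ev e)
{
	struct cpu_atoms *a;
	bool calls_rt_sleeper = false;

	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	a = &atoms[cpu];
	switch (e) {
	case EV_PREEMPT_OFF:
		a->preempt_depth++;
		break;
	case EV_PREEMPT_ON:
		assert(a->preempt_depth > 0);
		a->preempt_depth--;
		break;
	case EV_IRQ_OFF:
		a->irqs_disabled = true;
		break;
	case EV_IRQ_ON:
		a->irqs_disabled = false;
		break;
	case EV_RT_SLEEPER:
		calls_rt_sleeper = true;
		break;
	}
	/* (IN_ATOMIC or IRQS_DISABLED) imply not CALLS_RT_SLEEPER */
	bool ok = !((a->preempt_depth > 0 || a->irqs_disabled) && calls_rt_sleeper);
	if (!ok)
		a->violations++;
	return ok;
}
```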

Per-CPU support for LTL monitors is added in [1] (not merged), so you will need
to pull that in if you want to play with LTL.

[1] - https://lore.kernel.org/lkml/e7fb580ca898c707573fe1dcf6312f0...@linutronix.de

Here you're talking about kernel functions directly, while RV currently relies on
tracepoints (that's why I mentioned those earlier). You have two routes:

1. use existing tracepoints and/or add new ones at strategic points
2. use kprobes and attach wherever you want

Route 1 is very easy in RV, and you may use tracepoint arguments to narrow down
the search (e.g. only transition state on certain locks or certain allocations).
You may need to discuss adding new tracepoints with various maintainers, but
that's usually alright; have a look at V2 of the linked thread for an example [2].

Route 2 is a bit more involved: you'd (usually) be able to hook precisely the
functions you want, but I'm not sure about the overhead of plugging in 15 kprobes.
Also, RV doesn't support kprobes, although extending it to do so is rather trivial.
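For route 2, the probes can be created through the kprobe_events file using the `p:GRP/EVENT SYMBOL` form from the kernel's kprobetrace documentation. A minimal C sketch that defines one probe per function name; the group name rv_sleep and the helpers are hypothetical, and connecting these events to an RV monitor is the part RV does not support yet.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one kprobe_events line, "p:<group>/<sym> <sym>" (event named after sym). */
static int kprobe_def(char *buf, size_t len, const char *group, const char *sym)
{
	assert(buf && len > 0);
	return snprintf(buf, len, "p:%s/%s %s", group, sym, sym);
}

/* Append one probe per symbol to <root>/kprobe_events. Append mode matters:
 * truncating the file removes every probe already defined. */
static int add_sleeper_probes(const char *root, const char *group,
			      const char *const *syms, size_t n)
{
	char path[256], line[256];
	FILE *f;
	int err = 0;

	snprintf(path, sizeof(path), "%s/kprobe_events", root);
	f = fopen(path, "a");
	if (!f)
		return -1;
	for (size_t i = 0; i < n && !err; i++) {
		kprobe_def(line, sizeof(line), group, syms[i]);
		err = fprintf(f, "%s\n", line) < 0;
		/* flush per probe so a rejected definition fails at its own write() */
		err |= fflush(f) != 0;
	}
	err |= fclose(f) != 0;
	return err ? -1 : 0;
}
```

Definitions may be rejected for names that are macros or inlined rather than real functions, so a few entries in such a list might have to be replaced with the functions they expand to.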

You can mix both, of course. But yes, you'd need to identify all the "events"
you care about. I'd start simple with some of those (e.g. malloc and lock
contention tracepoints) and see if it satisfies your needs.

You may also be counting things twice (doesn't malloc take locks, which may end
up calling schedule?). Just an idea, but you may find common paths in the above
list.
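One way to avoid that double counting, sketched under assumptions: if the monitor sees both entry and exit of every watched function (for instance via kprobe/kretprobe pairs), it can check only the outermost call, so kmalloc() ending up in rt_spin_lock() and then schedule() is reported once. `struct sleeper_ctx` and its helpers are hypothetical, and per-task state is reduced to one struct.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state: nesting depth inside watched sleepers. */
struct sleeper_ctx {
	int depth;
	unsigned int reports;
};

/* Entry into a watched function; `atomic` describes the caller's context. */
static void sleeper_enter(struct sleeper_ctx *c, bool atomic)
{
	/* Only the outermost call is checked: nested sleepers
	 * (kmalloc -> rt_spin_lock -> schedule) are the same violation. */
	if (c->depth++ == 0 && atomic)
		c->reports++;
}

/* Exit from a watched function. */
static void sleeper_exit(struct sleeper_ctx *c)
{
	assert(c->depth > 0);
	c->depth--;
}
```

The trade-off: if the outer call was made from a preemptible context but something inside it disables preemption before a nested sleeper, that nested call is not checked.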

Gabriele

[2] - https://lore.kernel.org/lkml/f87ce0cb979daa3e8221c496de16883...@linutronix.de

Nam Cao

Nov 5, 2025, 4:10:45 AM
to Yunseong Kim, Nam Cao, Sebastian Andrzej Siewior, Tomas Glozar, Shung-Hsi Yu, Byungchul Park, syzk...@googlegroups.com, linux-r...@lists.linux.dev, LKML
Yunseong Kim <y...@kzalloc.com> writes:
> Hi Nam,
>
> I've been very interested in RV (Runtime Verification)

Cool, happy to learn you find it interesting.

> to proactively detect
> "sleep in atomic" scenarios on PREEMPT_RT kernels. Specifically, I'm looking
> for ways to find cases where sleeping spinlocks or memory allocations are used
> within preemption-disabled or irq-disabled contexts. While searching for
> solutions, I discovered the RV subsystem.
...
> Here are my questions:
>
> 1. Does the rtapp:sleep monitor proactively detect scenarios that
> could lead to sleeping in atomic context, perhaps before
> CONFIG_DEBUG_ATOMIC_SLEEP (enabled) would trigger at the actual point of
> sleeping?

No it does not, as explained by Gabriele.

I am a bit confused, because CONFIG_DEBUG_ATOMIC_SLEEP seems to already
do what you need. CONFIG_DEBUG_ATOMIC_SLEEP does warn before the actual
point of sleeping. Sleeping locks and memory allocations are marked with
might_sleep(). When they are called in atomic context, we get a warning
regardless of whether an actual sleep happens. See the comment above
might_sleep():

"This is a useful debugging help to be able to catch problems early and
not be bitten later when the calling function happens to sleep when it
is not supposed to"
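A toy model of the behaviour described above: the check runs on every call, at the call site, so an uncontended sleeping lock taken in atomic context is flagged even though it never blocks. `struct fake_ctx`, `model_might_sleep()` and `model_rt_spin_lock()` are hypothetical stand-ins, not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU context the real check inspects. */
struct fake_ctx {
	int preempt_count;
	bool irqs_disabled;
	unsigned int warnings;	/* "sleeping function called from invalid context" */
	unsigned int sleeps;	/* times the lock actually blocked */
};

/* Model of might_sleep(): warn if atomic, whether or not we will sleep. */
static void model_might_sleep(struct fake_ctx *c)
{
	if (c->preempt_count > 0 || c->irqs_disabled)
		c->warnings++;
}

/* Model of a PREEMPT_RT sleeping spinlock: check first, block only if contended. */
static void model_rt_spin_lock(struct fake_ctx *c, bool contended)
{
	assert(c);
	model_might_sleep(c);
	if (contended)
		c->sleeps++;
}
```

This is also why contention tracepoints, which only fire when a lock is actually contended, see fewer of these cases than the might_sleep() annotations.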

You can certainly implement this functionality in RV, but I don't think RV can
do more. An advantage of doing it in RV is the ability to toggle it at run time,
but that's a different discussion.

> 2. Is there a way to enable this monitor (e.g., rtapp:sleep)
> immediately as soon as the RV subsystem is loaded during boot time?
> (How to make this "default turn on"?)

At the moment, no. But if you need this, we could look into implementing it.

> 3. When a "violation detected" message occurs at runtime, is it
> possible to get a call stack of the location that triggered the
> violation? The panic reactor provides a full stack, but I'm
> wondering if this is also possible with the printk reactor.

You can use the monitor's tracepoint to get the stack trace, as mentioned by Gabriele.

> This led me to research ways to find these issues proactively at a
> static analysis level, and I created some regex and coccinelle scripts
> to detect them.
...
> Tomas, the author of the rtlockscope project, also gave me some deep
> insights into this static analysis approach.

RV is not a static checker, it is a run-time checker.

Just in case you are not aware yet, there is also Smatch:
https://github.com/error27/smatch. But I can't offer much help there.

Nam