[PATCH v2] bpf: fix stackmap overflow check in __bpf_get_stackid()

Arnaud Lecomte

Jul 29, 2025, 12:56:42 PM
to so...@kernel.org, jo...@kernel.org, a...@kernel.org, dan...@iogearbox.net, and...@kernel.org, marti...@linux.dev, edd...@gmail.com, yongho...@linux.dev, john.fa...@gmail.com, kps...@kernel.org, s...@fomichev.me, hao...@google.com, b...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, syzbot+c9b724...@syzkaller.appspotmail.com, Arnaud Lecomte
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.
For build_id mode, we use sizeof(struct bpf_stack_build_id)
to determine capacity, and for normal mode we use sizeof(u64).
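
(For reference only, not part of the patch: the capacity arithmetic described
above works out roughly as follows. bucket_capacity() is a made-up name used
purely for illustration.)

/* Illustrative sketch: how many entries one stack map bucket can hold.
 * The per-entry size depends on whether the map stores build IDs.
 */
static u32 bucket_capacity(u32 value_size, bool build_id_mode)
{
	u32 elem_size = build_id_mode ? sizeof(struct bpf_stack_build_id)
				      : sizeof(u64);

	return value_size / elem_size;	/* trace_nr must not exceed this */
}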

Reported-by: syzbot+c9b724...@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
Tested-by: syzbot+c9b724...@syzkaller.appspotmail.com
Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
---
Changes in v2:
- Use the stack_map_data_size() utility to compute the stack map element size
---
kernel/bpf/stackmap.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..6f225d477f07 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -230,7 +230,7 @@ static long __bpf_get_stackid(struct bpf_map *map,
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
- u32 hash, id, trace_nr, trace_len, i;
+ u32 hash, id, trace_nr, trace_len, i, max_depth;
bool user = flags & BPF_F_USER_STACK;
u64 *ips;
bool hash_matches;
@@ -241,6 +241,12 @@ static long __bpf_get_stackid(struct bpf_map *map,

trace_nr = trace->nr - skip;
trace_len = trace_nr * sizeof(u64);
+
+ /* Clamp the trace to max allowed depth */
+ max_depth = smap->map.value_size / stack_map_data_size(map);
+ if (trace_nr > max_depth)
+ trace_nr = max_depth;
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
--
2.43.0

Yonghong Song

Jul 29, 2025, 6:45:15 PM
to Arnaud Lecomte, so...@kernel.org, jo...@kernel.org, a...@kernel.org, dan...@iogearbox.net, and...@kernel.org, marti...@linux.dev, edd...@gmail.com, john.fa...@gmail.com, kps...@kernel.org, s...@fomichev.me, hao...@google.com, b...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, syzbot+c9b724...@syzkaller.appspotmail.com


On 7/29/25 9:56 AM, Arnaud Lecomte wrote:
> Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
> when copying stack trace data. The issue occurs when the perf trace
> contains more stack entries than the stack map bucket can hold,
> leading to an out-of-bounds write in the bucket's data array.
> For build_id mode, we use sizeof(struct bpf_stack_build_id)
> to determine capacity, and for normal mode we use sizeof(u64).
>
> Reported-by: syzbot+c9b724...@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
> Tested-by: syzbot+c9b724...@syzkaller.appspotmail.com
> Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>

Could you add a selftest? This way folks can easily see what the problem is
and why this fix solves the issue correctly.

Arnaud Lecomte

Jul 30, 2025, 3:11:03 AM
to Yonghong Song, so...@kernel.org, jo...@kernel.org, a...@kernel.org, dan...@iogearbox.net, and...@kernel.org, marti...@linux.dev, edd...@gmail.com, john.fa...@gmail.com, kps...@kernel.org, s...@fomichev.me, hao...@google.com, b...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, syzbot+c9b724...@syzkaller.appspotmail.com
On 29/07/2025 23:45, Yonghong Song wrote:
>
>
> On 7/29/25 9:56 AM, Arnaud Lecomte wrote:
>> Syzkaller reported a KASAN slab-out-of-bounds write in
>> __bpf_get_stackid()
>> when copying stack trace data. The issue occurs when the perf trace
>>   contains more stack entries than the stack map bucket can hold,
>>   leading to an out-of-bounds write in the bucket's data array.
>> For build_id mode, we use sizeof(struct bpf_stack_build_id)
>>   to determine capacity, and for normal mode we use sizeof(u64).
>>
>> Reported-by: syzbot+c9b724...@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
>> Tested-by: syzbot+c9b724...@syzkaller.appspotmail.com
>> Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
>
> Could you add a selftest? This way folks can easily find out what is
> the problem and why this fix solves the issue correctly.
>
Sure, will be done after work
Thanks,
Arnaud

Lecomte, Arnaud

Aug 1, 2025, 2:16:56 PM
to Yonghong Song, so...@kernel.org, jo...@kernel.org, a...@kernel.org, dan...@iogearbox.net, and...@kernel.org, marti...@linux.dev, edd...@gmail.com, john.fa...@gmail.com, kps...@kernel.org, s...@fomichev.me, hao...@google.com, b...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, syzbot+c9b724...@syzkaller.appspotmail.com
Well, it turns out it is less straightforward than it looked to detect
the memory corruption without KASAN. I am currently on holiday for the
next 3 days, so I have limited access to a computer. I should be able to
sort this out on Monday.

Thanks,
Arnaud

Arnaud Lecomte

Aug 5, 2025, 4:49:54 PM
to Yonghong Song, so...@kernel.org, jo...@kernel.org, a...@kernel.org, dan...@iogearbox.net, and...@kernel.org, marti...@linux.dev, edd...@gmail.com, john.fa...@gmail.com, kps...@kernel.org, s...@fomichev.me, hao...@google.com, b...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, syzbot+c9b724...@syzkaller.appspotmail.com
Hi,
I gave it several tries and I can't find a nice way to do this properly.
The main challenge is finding a way to detect the memory corruption. I wanted
to place a canary value by tweaking the map size, but from a BPF program's
perspective we have no way to access the size of a stack_map_bucket. If we
do this computation manually instead, we end up with maintainability issues:
#include "vmlinux.h"
#include "bpf/bpf_helpers.h"

#define MAX_STACK_DEPTH 32
#define CANARY_VALUE 0xBADCAFE

/* Calculate size based on known layout:
 * - fnode: sizeof(void*)
 * - hash: 4 bytes
 * - nr: 4 bytes
 * - data: MAX_STACK_DEPTH * 8 bytes
 * - canary: 8 bytes
 */
#define VALUE_SIZE (sizeof(void*) + 4 + 4 + (MAX_STACK_DEPTH * 8) + 8)

struct {
    __uint(type, BPF_MAP_TYPE_STACK_TRACE);
    __uint(max_entries, 1);
    __uint(value_size, VALUE_SIZE);
    __uint(key_size, sizeof(u32));
} stackmap SEC(".maps");

static __attribute__((noinline)) void recursive_helper(int depth) {
    if (depth <= 0) return;
    asm volatile("" ::: "memory");
    recursive_helper(depth - 1);
}

SEC("kprobe/do_sys_open")
int test_stack_overflow(void *ctx) {
    u32 key = 0;
    u64 *stack = bpf_map_lookup_elem(&stackmap, &key);
    if (!stack) return 0;

    stack[MAX_STACK_DEPTH] = CANARY_VALUE;

    /* Force minimum stack depth */
    recursive_helper(MAX_STACK_DEPTH + 10);

    (void)bpf_get_stackid(ctx, &stackmap, 0);
    return 0;
}

char _license[] SEC("license") = "GPL";

Yonghong Song

Aug 5, 2025, 9:52:23 PM
to Arnaud Lecomte, so...@kernel.org, jo...@kernel.org, a...@kernel.org, dan...@iogearbox.net, and...@kernel.org, marti...@linux.dev, edd...@gmail.com, john.fa...@gmail.com, kps...@kernel.org, s...@fomichev.me, hao...@google.com, b...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, syzbot+c9b724...@syzkaller.appspotmail.com
It looks like it is hard to trigger the memory corruption inside the kernel.
Maybe KASAN can detect it for your specific example.

If going without a selftest, you can do the following:
__bpf_get_stack() already solved the problem you are trying to fix.
I suggest you refactor the relevant portion of the code in __bpf_get_stack()
to set trace_nr properly, and then use that refactored function
in __bpf_get_stackid(). So two patches:
1. refactor the portion of code (related to elem_size/trace_nr) in __bpf_get_stack().
2. fix the issue in __bpf_get_stackid() with the newly created function.
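
Roughly, the shared helper could be shaped like this (a sketch only; the
naming and details are settled by the actual patches later in the thread):

/* Sketch: compute the clamped stack depth once, for use by both
 * __bpf_get_stack() and __bpf_get_stackid().
 */
static u32 stack_map_calculate_max_depth(u32 size, u32 elem_size, u64 flags)
{
	u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
	u32 max_depth = size / elem_size + skip;

	return min_t(u32, max_depth, sysctl_perf_event_max_stack);
}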

Arnaud Lecomte

Aug 7, 2025, 1:50:53 PM
to yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, con...@arnaud-lcm.com, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
Add a new helper function stack_map_calculate_max_depth() that
computes the max depth for a stackmap.

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 38 ++++++++++++++++++++++++++++++--------
1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..14e034045310 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -42,6 +42,31 @@ static inline int stack_map_data_size(struct bpf_map *map)
sizeof(struct bpf_stack_build_id) : sizeof(u64);
}

+/**
+ * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
+ * @map_size: Size of the buffer/map value in bytes
+ * @elem_size: Size of each stack trace element
+ * @map_flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)
+ *
+ * Return: Maximum number of stack trace entries that can be safely stored,
+ * or -EINVAL if size is not a multiple of elem_size
+ */
+static u32 stack_map_calculate_max_depth(u32 map_size, u32 map_elem_size, u64 map_flags)
+{
+ u32 max_depth;
+ u32 skip = map_flags & BPF_F_SKIP_FIELD_MASK;
+
+ if (unlikely(map_size%map_elem_size))
+ return -EINVAL;
+
+ max_depth = map_size / map_elem_size;
+ max_depth += skip;
+ if (max_depth > sysctl_perf_event_max_stack)
+ return sysctl_perf_event_max_stack;
+
+ return max_depth;
+}
+
static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
{
u64 elem_size = sizeof(struct stack_map_bucket) +
@@ -406,7 +431,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
struct perf_callchain_entry *trace_in,
void *buf, u32 size, u64 flags, bool may_fault)
{
- u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
+ u32 trace_nr, copy_len, elem_size, max_depth;
bool user_build_id = flags & BPF_F_USER_BUILD_ID;
bool crosstask = task && task != current;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
@@ -423,8 +448,6 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;

elem_size = user_build_id ? sizeof(struct bpf_stack_build_id) : sizeof(u64);
- if (unlikely(size % elem_size))
- goto clear;

/* cannot get valid user stack for task without user_mode regs */
if (task && user && !user_mode(regs))
@@ -438,10 +461,9 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;
}

- num_elem = size / elem_size;
- max_depth = num_elem + skip;
- if (sysctl_perf_event_max_stack < max_depth)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(size, elem_size, flags);
+ if (max_depth < 0)
+ goto err_fault;

if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */
@@ -461,7 +483,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
}

trace_nr = trace->nr - skip;
- trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
+ trace_nr = min(trace_nr, max_depth - skip);
copy_len = trace_nr * elem_size;

ips = trace->ip + skip;
--
2.43.0

Arnaud Lecomte

Aug 7, 2025, 1:53:09 PM
to yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 26 +++++++++++++++-----------
1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 14e034045310..d7ef840971f0 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -250,7 +250,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 max_depth)
}

static long __bpf_get_stackid(struct bpf_map *map,
- struct perf_callchain_entry *trace, u64 flags)
+ struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
{
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
@@ -266,6 +266,8 @@ static long __bpf_get_stackid(struct bpf_map *map,

trace_nr = trace->nr - skip;
trace_len = trace_nr * sizeof(u64);
+ trace_nr = min(trace_nr, max_depth - skip);
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
@@ -325,19 +327,19 @@ static long __bpf_get_stackid(struct bpf_map *map,
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
- u32 max_depth = map->value_size / stack_map_data_size(map);
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;

if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;

- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ if (max_depth < 0)
+ return -EFAULT;

trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);
@@ -346,7 +348,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
/* couldn't fetch the stack trace */
return -EFAULT;

- return __bpf_get_stackid(map, trace, flags);
+ return __bpf_get_stackid(map, trace, flags, max_depth);
}

const struct bpf_func_proto bpf_get_stackid_proto = {
@@ -378,6 +380,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
bool kernel, user;
__u64 nr_kernel;
int ret;
+ u32 elem_size, pe_max_depth;

/* perf_sample_data doesn't have callchain, use bpf_get_stackid */
if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
@@ -396,24 +399,25 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

nr_kernel = count_kernel_ip(trace);
-
+ elem_size = stack_map_data_size(map);
if (kernel) {
__u64 nr = trace->nr;

trace->nr = nr_kernel;
- ret = __bpf_get_stackid(map, trace, flags);
+ pe_max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, pe_max_depth);

/* restore nr */
trace->nr = nr;
} else { /* user */
u64 skip = flags & BPF_F_SKIP_FIELD_MASK;
-
skip += nr_kernel;
if (skip > BPF_F_SKIP_FIELD_MASK)
return -EFAULT;

flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
- ret = __bpf_get_stackid(map, trace, flags);
+ pe_max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, pe_max_depth);
}
return ret;
}
--
2.43.0

Yonghong Song

Aug 7, 2025, 3:02:21 PM
to Arnaud Lecomte, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com


On 8/7/25 10:50 AM, Arnaud Lecomte wrote:
> A new helper function stack_map_calculate_max_depth() that
> computes the max depth for a stackmap.
>
> Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
> ---
> kernel/bpf/stackmap.c | 38 ++++++++++++++++++++++++++++++--------
> 1 file changed, 30 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 3615c06b7dfa..14e034045310 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -42,6 +42,31 @@ static inline int stack_map_data_size(struct bpf_map *map)
> sizeof(struct bpf_stack_build_id) : sizeof(u64);
> }
>
> +/**
> + * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
> + * @map_size: Size of the buffer/map value in bytes
> + * @elem_size: Size of each stack trace element
> + * @map_flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)
> + *
> + * Return: Maximum number of stack trace entries that can be safely stored,
> + * or -EINVAL if size is not a multiple of elem_size

-EINVAL is not needed here. See below.

> + */
> +static u32 stack_map_calculate_max_depth(u32 map_size, u32 map_elem_size, u64 map_flags)

map_elem_size -> elem_size

> +{
> + u32 max_depth;
> + u32 skip = map_flags & BPF_F_SKIP_FIELD_MASK;

reverse Christmas tree?

> +
> + if (unlikely(map_size%map_elem_size))
> + return -EINVAL;

The above should not be here. The 'map_size % map_elem_size' check is only needed
for bpf_get_stack(); it is not applicable to bpf_get_stackid().

> +
> + max_depth = map_size / map_elem_size;
> + max_depth += skip;
> + if (max_depth > sysctl_perf_event_max_stack)
> + return sysctl_perf_event_max_stack;
> +
> + return max_depth;
> +}
> +
> static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
> {
> u64 elem_size = sizeof(struct stack_map_bucket) +
> @@ -406,7 +431,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> struct perf_callchain_entry *trace_in,
> void *buf, u32 size, u64 flags, bool may_fault)
> {
> - u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
> + u32 trace_nr, copy_len, elem_size, max_depth;
> bool user_build_id = flags & BPF_F_USER_BUILD_ID;
> bool crosstask = task && task != current;
> u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
> @@ -423,8 +448,6 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> goto clear;
>
> elem_size = user_build_id ? sizeof(struct bpf_stack_build_id) : sizeof(u64);
> - if (unlikely(size % elem_size))
> - goto clear;

Please keep this one.

>
> /* cannot get valid user stack for task without user_mode regs */
> if (task && user && !user_mode(regs))
> @@ -438,10 +461,9 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> goto clear;
> }
>
> - num_elem = size / elem_size;
> - max_depth = num_elem + skip;
> - if (sysctl_perf_event_max_stack < max_depth)
> - max_depth = sysctl_perf_event_max_stack;
> + max_depth = stack_map_calculate_max_depth(size, elem_size, flags);
> + if (max_depth < 0)
> + goto err_fault;

max_depth is a u32, so it is never less than 0; this check has no effect.

Yonghong Song

Aug 7, 2025, 3:05:33 PM
to Arnaud Lecomte, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
the above condition is not needed.

>
> trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
> false, false);
> @@ -346,7 +348,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
> /* couldn't fetch the stack trace */
> return -EFAULT;
>
> - return __bpf_get_stackid(map, trace, flags);
> + return __bpf_get_stackid(map, trace, flags, max_depth);
> }
>
> const struct bpf_func_proto bpf_get_stackid_proto = {
> @@ -378,6 +380,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
> bool kernel, user;
> __u64 nr_kernel;
> int ret;
> + u32 elem_size, pe_max_depth;

pe_max_depth -> max_depth.

>
> /* perf_sample_data doesn't have callchain, use bpf_get_stackid */
> if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
> @@ -396,24 +399,25 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
> return -EFAULT;
>
> nr_kernel = count_kernel_ip(trace);
> -
> + elem_size = stack_map_data_size(map);
> if (kernel) {
> __u64 nr = trace->nr;
>
> trace->nr = nr_kernel;
> - ret = __bpf_get_stackid(map, trace, flags);
> + pe_max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
> + ret = __bpf_get_stackid(map, trace, flags, pe_max_depth);
>
> /* restore nr */
> trace->nr = nr;
> } else { /* user */
> u64 skip = flags & BPF_F_SKIP_FIELD_MASK;
> -

please keep an empty line here.

Yonghong Song

Aug 7, 2025, 3:07:42 PM
to Arnaud Lecomte, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com


On 8/7/25 12:01 PM, Yonghong Song wrote:
>
>
> On 8/7/25 10:50 AM, Arnaud Lecomte wrote:
>> A new helper function stack_map_calculate_max_depth() that
>> computes the max depth for a stackmap.
>>
>> Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
>> ---
>>   kernel/bpf/stackmap.c | 38 ++++++++++++++++++++++++++++++--------
>>   1 file changed, 30 insertions(+), 8 deletions(-)
>>
>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
>> index 3615c06b7dfa..14e034045310 100644
>> --- a/kernel/bpf/stackmap.c
>> +++ b/kernel/bpf/stackmap.c
>> @@ -42,6 +42,31 @@ static inline int stack_map_data_size(struct
>> bpf_map *map)
>>           sizeof(struct bpf_stack_build_id) : sizeof(u64);
>>   }
>>   +/**
>> + * stack_map_calculate_max_depth - Calculate maximum allowed stack
>> trace depth
>> + * @map_size:        Size of the buffer/map value in bytes
>> + * @elem_size:       Size of each stack trace element
>> + * @map_flags:       BPF stack trace flags (BPF_F_USER_STACK,
>> BPF_F_USER_BUILD_ID, ...)

One more thing: map_flags -> flags, as 'flags' is used in bpf_get_stackid()/bpf_get_stack(), etc.

>> + *
>> + * Return: Maximum number of stack trace entries that can be safely
>> stored,
>> + * or -EINVAL if size is not a multiple of elem_size
>
> -EINVAL is not needed here. See below.

[...]

syzbot ci

Aug 8, 2025, 3:30:16 AM
to and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, con...@arnaud-lcm.com, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syz...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, yongho...@linux.dev, syz...@lists.linux.dev, syzkall...@googlegroups.com
syzbot ci has tested the following series

[v1] bpf: refactor max_depth computation in bpf_get_stack()
https://lore.kernel.org/all/20250807175032...@arnaud-lcm.com
* [PATCH 1/2] bpf: refactor max_depth computation in bpf_get_stack()
* [PATCH 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid()

and found the following issues:
* KASAN: stack-out-of-bounds Write in __bpf_get_stack
* PANIC: double fault in its_return_thunk

Full report is available here:
https://ci.syzbot.org/series/2af1b227-99e3-4e64-ac23-827848a4b8a5

***

KASAN: stack-out-of-bounds Write in __bpf_get_stack

tree: bpf-next
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/bpf/bpf-next.git
base: f3af62b6cee8af9f07012051874af2d2a451f0e5
arch: amd64
compiler: Debian clang version 20.1.7 (++20250616065708+6146a88f6049-1~exp1~20250616065826.132), Debian LLD 20.1.7
config: https://ci.syzbot.org/builds/5e5c6698-7b84-4bf2-a1ee-1b6223c8d4c3/config
C repro: https://ci.syzbot.org/findings/1355d710-d133-43fd-9061-18b2de6844a4/c_repro
syz repro: https://ci.syzbot.org/findings/1355d710-d133-43fd-9061-18b2de6844a4/syz_repro

netdevsim netdevsim1 netdevsim0: renamed from eth0
netdevsim netdevsim1 netdevsim1: renamed from eth1
==================================================================
BUG: KASAN: stack-out-of-bounds in __bpf_get_stack+0x54a/0xa70 kernel/bpf/stackmap.c:501
Write of size 208 at addr ffffc90003655ee8 by task syz-executor/5952

CPU: 1 UID: 0 PID: 5952 Comm: syz-executor Not tainted 6.16.0-syzkaller-11113-gf3af62b6cee8-dirty #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xca/0x240 mm/kasan/report.c:482
kasan_report+0x118/0x150 mm/kasan/report.c:595
check_region_inline mm/kasan/generic.c:-1 [inline]
kasan_check_range+0x2b0/0x2c0 mm/kasan/generic.c:189
__asan_memcpy+0x40/0x70 mm/kasan/shadow.c:106
__bpf_get_stack+0x54a/0xa70 kernel/bpf/stackmap.c:501
____bpf_get_stack kernel/bpf/stackmap.c:525 [inline]
bpf_get_stack+0x33/0x50 kernel/bpf/stackmap.c:522
____bpf_get_stack_raw_tp kernel/trace/bpf_trace.c:1835 [inline]
bpf_get_stack_raw_tp+0x1a9/0x220 kernel/trace/bpf_trace.c:1825
bpf_prog_4e330ebee64cb698+0x43/0x4b
bpf_dispatcher_nop_func include/linux/bpf.h:1332 [inline]
__bpf_prog_run include/linux/filter.h:718 [inline]
bpf_prog_run include/linux/filter.h:725 [inline]
__bpf_trace_run kernel/trace/bpf_trace.c:2257 [inline]
bpf_trace_run10+0x2e4/0x500 kernel/trace/bpf_trace.c:2306
__bpf_trace_percpu_alloc_percpu+0x364/0x400 include/trace/events/percpu.h:11
__do_trace_percpu_alloc_percpu include/trace/events/percpu.h:11 [inline]
trace_percpu_alloc_percpu include/trace/events/percpu.h:11 [inline]
pcpu_alloc_noprof+0x1534/0x16b0 mm/percpu.c:1892
fib_nh_common_init+0x9c/0x3b0 net/ipv4/fib_semantics.c:620
fib6_nh_init+0x1608/0x1ff0 net/ipv6/route.c:3671
ip6_route_info_create_nh+0x16a/0xab0 net/ipv6/route.c:3892
ip6_route_add+0x6e/0x1b0 net/ipv6/route.c:3944
addrconf_add_mroute net/ipv6/addrconf.c:2552 [inline]
addrconf_add_dev+0x24f/0x340 net/ipv6/addrconf.c:2570
addrconf_dev_config net/ipv6/addrconf.c:3479 [inline]
addrconf_init_auto_addrs+0x57c/0xa30 net/ipv6/addrconf.c:3567
addrconf_notify+0xacc/0x1010 net/ipv6/addrconf.c:3740
notifier_call_chain+0x1b6/0x3e0 kernel/notifier.c:85
call_netdevice_notifiers_extack net/core/dev.c:2267 [inline]
call_netdevice_notifiers net/core/dev.c:2281 [inline]
__dev_notify_flags+0x18d/0x2e0 net/core/dev.c:-1
netif_change_flags+0xe8/0x1a0 net/core/dev.c:9608
do_setlink+0xc55/0x41c0 net/core/rtnetlink.c:3143
rtnl_changelink net/core/rtnetlink.c:3761 [inline]
__rtnl_newlink net/core/rtnetlink.c:3920 [inline]
rtnl_newlink+0x160b/0x1c70 net/core/rtnetlink.c:4057
rtnetlink_rcv_msg+0x7cf/0xb70 net/core/rtnetlink.c:6946
netlink_rcv_skb+0x208/0x470 net/netlink/af_netlink.c:2552
netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline]
netlink_unicast+0x82f/0x9e0 net/netlink/af_netlink.c:1346
netlink_sendmsg+0x805/0xb30 net/netlink/af_netlink.c:1896
sock_sendmsg_nosec net/socket.c:714 [inline]
__sock_sendmsg+0x21c/0x270 net/socket.c:729
__sys_sendto+0x3bd/0x520 net/socket.c:2228
__do_sys_sendto net/socket.c:2235 [inline]
__se_sys_sendto net/socket.c:2231 [inline]
__x64_sys_sendto+0xde/0x100 net/socket.c:2231
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fec5c790a7c
Code: 2a 5f 02 00 44 8b 4c 24 2c 4c 8b 44 24 20 89 c5 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 34 89 ef 48 89 44 24 08 e8 70 5f 02 00 48 8b
RSP: 002b:00007fff7b55f7b0 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007fec5d4e35c0 RCX: 00007fec5c790a7c
RDX: 0000000000000030 RSI: 00007fec5d4e3610 RDI: 0000000000000006
RBP: 0000000000000000 R08: 00007fff7b55f804 R09: 000000000000000c
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000006
R13: 0000000000000000 R14: 00007fec5d4e3610 R15: 0000000000000000
</TASK>

The buggy address belongs to stack of task syz-executor/5952
and is located at offset 296 in frame:
__bpf_get_stack+0x0/0xa70 include/linux/mmap_lock.h:-1

This frame has 1 object:
[32, 36) 'rctx.i'

The buggy address belongs to a 8-page vmalloc region starting at 0xffffc90003650000 allocated at copy_process+0x54b/0x3c00 kernel/fork.c:2002
The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888024c63200 pfn:0x24c62
flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000000000 0000000000000000 dead000000000122 0000000000000000
raw: ffff888024c63200 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x2dc2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO|__GFP_NOWARN), pid 5845, tgid 5845 (syz-executor), ts 59049058263, free_ts 59031992240
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x240/0x2a0 mm/page_alloc.c:1851
prep_new_page mm/page_alloc.c:1859 [inline]
get_page_from_freelist+0x21e4/0x22c0 mm/page_alloc.c:3858
__alloc_frozen_pages_noprof+0x181/0x370 mm/page_alloc.c:5148
alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2416
alloc_frozen_pages_noprof mm/mempolicy.c:2487 [inline]
alloc_pages_noprof+0xa9/0x190 mm/mempolicy.c:2507
vm_area_alloc_pages mm/vmalloc.c:3642 [inline]
__vmalloc_area_node mm/vmalloc.c:3720 [inline]
__vmalloc_node_range_noprof+0x97d/0x12f0 mm/vmalloc.c:3893
__vmalloc_node_noprof+0xc2/0x110 mm/vmalloc.c:3956
alloc_thread_stack_node kernel/fork.c:318 [inline]
dup_task_struct+0x3e7/0x860 kernel/fork.c:879
copy_process+0x54b/0x3c00 kernel/fork.c:2002
kernel_clone+0x21e/0x840 kernel/fork.c:2603
__do_sys_clone3 kernel/fork.c:2907 [inline]
__se_sys_clone3+0x256/0x2d0 kernel/fork.c:2886
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
page last free pid 5907 tgid 5907 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
free_pages_prepare mm/page_alloc.c:1395 [inline]
__free_frozen_pages+0xbc4/0xd30 mm/page_alloc.c:2895
vfree+0x25a/0x400 mm/vmalloc.c:3434
kcov_put kernel/kcov.c:439 [inline]
kcov_close+0x28/0x50 kernel/kcov.c:535
__fput+0x44c/0xa70 fs/file_table.c:468
task_work_run+0x1d4/0x260 kernel/task_work.c:227
exit_task_work include/linux/task_work.h:40 [inline]
do_exit+0x6b5/0x2300 kernel/exit.c:966
do_group_exit+0x21c/0x2d0 kernel/exit.c:1107
get_signal+0x1286/0x1340 kernel/signal.c:3034
arch_do_signal_or_restart+0x9a/0x750 arch/x86/kernel/signal.c:337
exit_to_user_mode_loop+0x75/0x110 kernel/entry/common.c:40
exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline]
syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline]
syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline]
do_syscall_64+0x2bd/0x3b0 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f

Memory state around the buggy address:
ffffc90003655e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffc90003655e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffffc90003655f00: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 f2 f2
^
ffffc90003655f80: 00 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3
ffffc90003656000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================


***

PANIC: double fault in its_return_thunk

tree: bpf-next
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/bpf/bpf-next.git
base: f3af62b6cee8af9f07012051874af2d2a451f0e5
arch: amd64
compiler: Debian clang version 20.1.7 (++20250616065708+6146a88f6049-1~exp1~20250616065826.132), Debian LLD 20.1.7
config: https://ci.syzbot.org/builds/5e5c6698-7b84-4bf2-a1ee-1b6223c8d4c3/config
C repro: https://ci.syzbot.org/findings/1bf5dce6-467f-4bcd-9357-2726101d2ad1/c_repro
syz repro: https://ci.syzbot.org/findings/1bf5dce6-467f-4bcd-9357-2726101d2ad1/syz_repro

traps: PANIC: double fault, error_code: 0x0
Oops: double fault: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 5789 Comm: syz-executor930 Not tainted 6.16.0-syzkaller-11113-gf3af62b6cee8-dirty #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:its_return_thunk+0x0/0x10 arch/x86/lib/retpoline.S:412
Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc <c3> cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 e9 6b 2b b9 f5 cc
RSP: 0018:ffffffffa0000877 EFLAGS: 00010246
RAX: 2161df6de464b300 RBX: 4800be48c0315641 RCX: 2161df6de464b300
RDX: 0000000000000000 RSI: ffffffff8dba01ee RDI: ffff888105cc9cc0
RBP: eb7a3aa9e9c95e41 R08: ffffffff81000130 R09: ffffffff81000130
R10: ffffffff81d017ac R11: ffffffff8b7707da R12: 3145ffff888028c3
R13: ee8948f875894cf6 R14: 000002baf8c68348 R15: e1cb3861e8c93100
FS: 0000555557cbc380(0000) GS:ffff8880b862a000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa0000868 CR3: 0000000028468000 CR4: 00000000000006f0
Call Trace:
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:its_return_thunk+0x0/0x10 arch/x86/lib/retpoline.S:412
Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc <c3> cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 e9 6b 2b b9 f5 cc
RSP: 0018:ffffffffa0000877 EFLAGS: 00010246
RAX: 2161df6de464b300 RBX: 4800be48c0315641 RCX: 2161df6de464b300
RDX: 0000000000000000 RSI: ffffffff8dba01ee RDI: ffff888105cc9cc0
RBP: eb7a3aa9e9c95e41 R08: ffffffff81000130 R09: ffffffff81000130
R10: ffffffff81d017ac R11: ffffffff8b7707da R12: 3145ffff888028c3
R13: ee8948f875894cf6 R14: 000002baf8c68348 R15: e1cb3861e8c93100
FS: 0000555557cbc380(0000) GS:ffff8880b862a000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa0000868 CR3: 0000000028468000 CR4: 00000000000006f0
----------------
Code disassembly (best guess):
0: cc int3
1: cc int3
2: cc int3
3: cc int3
4: cc int3
5: cc int3
6: cc int3
7: cc int3
8: cc int3
9: cc int3
a: cc int3
b: cc int3
c: cc int3
d: cc int3
e: cc int3
f: cc int3
10: cc int3
11: cc int3
12: cc int3
13: cc int3
14: cc int3
15: cc int3
16: cc int3
17: cc int3
18: cc int3
19: cc int3
1a: cc int3
1b: cc int3
1c: cc int3
1d: cc int3
1e: cc int3
1f: cc int3
20: cc int3
21: cc int3
22: cc int3
23: cc int3
24: cc int3
25: cc int3
26: cc int3
27: cc int3
28: cc int3
29: cc int3
* 2a: c3 ret <-- trapping instruction
2b: cc int3
2c: 90 nop
2d: 90 nop
2e: 90 nop
2f: 90 nop
30: 90 nop
31: 90 nop
32: 90 nop
33: 90 nop
34: 90 nop
35: 90 nop
36: 90 nop
37: 90 nop
38: 90 nop
39: 90 nop
3a: e9 6b 2b b9 f5 jmp 0xf5b92baa
3f: cc int3


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syz...@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzk...@googlegroups.com.

Arnaud Lecomte

Aug 9, 2025, 7:56:49 AM
to yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, con...@arnaud-lcm.com, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
Add a new helper function stack_map_calculate_max_depth() that
computes the max depth for a stackmap.

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..532447606532 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -42,6 +42,27 @@ static inline int stack_map_data_size(struct bpf_map *map)
sizeof(struct bpf_stack_build_id) : sizeof(u64);
}

+/**
+ * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
+ * @map_size: Size of the buffer/map value in bytes
+ * @elem_size: Size of each stack trace element
+ * @flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)
+ *
+ * Return: Maximum number of stack trace entries that can be safely stored
+ */
+static u32 stack_map_calculate_max_depth(u32 map_size, u32 elem_size, u64 flags)
+{
+ u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 max_depth;
+
+ max_depth = map_size / elem_size;
+ max_depth += skip;
+ if (max_depth > sysctl_perf_event_max_stack)
+ return sysctl_perf_event_max_stack;
+
+ return max_depth;
+}
+
static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
{
u64 elem_size = sizeof(struct stack_map_bucket) +
@@ -406,7 +427,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
struct perf_callchain_entry *trace_in,
void *buf, u32 size, u64 flags, bool may_fault)
{
- u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
+ u32 trace_nr, copy_len, elem_size, max_depth;
bool user_build_id = flags & BPF_F_USER_BUILD_ID;
bool crosstask = task && task != current;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
@@ -438,10 +459,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;
}

- num_elem = size / elem_size;
- max_depth = num_elem + skip;
- if (sysctl_perf_event_max_stack < max_depth)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(size, elem_size, flags);

if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */
@@ -461,7 +479,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
}

trace_nr = trace->nr - skip;
- trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
+ trace_nr = min(trace_nr, max_depth - skip);
copy_len = trace_nr * elem_size;

ips = trace->ip + skip;
--
2.43.0

Arnaud Lecomte

Aug 9, 2025, 7:58:44 AM
to yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.

Reported-by: syzbot+c9b724...@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 26 +++++++++++++++-----------
1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 532447606532..30c4f7f2ccd1 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -246,7 +246,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 max_depth)
}

static long __bpf_get_stackid(struct bpf_map *map,
- struct perf_callchain_entry *trace, u64 flags)
+ struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
{
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
@@ -262,6 +262,8 @@ static long __bpf_get_stackid(struct bpf_map *map,

trace_nr = trace->nr - skip;
trace_len = trace_nr * sizeof(u64);
+ trace_nr = min(trace_nr, max_depth - skip);
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
@@ -321,19 +323,19 @@ static long __bpf_get_stackid(struct bpf_map *map,
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
- u32 max_depth = map->value_size / stack_map_data_size(map);
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;

if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;

- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ if (max_depth < 0)
+ return -EFAULT;

trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);
@@ -342,7 +344,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
/* couldn't fetch the stack trace */
return -EFAULT;

- return __bpf_get_stackid(map, trace, flags);
+ return __bpf_get_stackid(map, trace, flags, max_depth);
}

const struct bpf_func_proto bpf_get_stackid_proto = {
@@ -374,6 +376,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
bool kernel, user;
__u64 nr_kernel;
int ret;
+ u32 elem_size, pe_max_depth;

/* perf_sample_data doesn't have callchain, use bpf_get_stackid */
if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
@@ -392,24 +395,25 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

nr_kernel = count_kernel_ip(trace);
-
+ elem_size = stack_map_data_size(map);
if (kernel) {
__u64 nr = trace->nr;

trace->nr = nr_kernel;
- ret = __bpf_get_stackid(map, trace, flags);
+ pe_max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, pe_max_depth);

/* restore nr */
trace->nr = nr;
} else { /* user */
u64 skip = flags & BPF_F_SKIP_FIELD_MASK;
-
skip += nr_kernel;
if (skip > BPF_F_SKIP_FIELD_MASK)
return -EFAULT;

flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
- ret = __bpf_get_stackid(map, trace, flags);
+ pe_max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, pe_max_depth);
}
return ret;
}
--
2.43.0

Arnaud Lecomte

Aug 9, 2025, 8:09:34 AM
to yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
Add a new helper function stack_map_calculate_max_depth() that
computes the max depth for a stackmap.

Changes in v2:
- Removed the checking 'map_size % map_elem_size' from stack_map_calculate_max_depth
- Changed stack_map_calculate_max_depth params name to be more generic

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..532447606532 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
@@ -438,10 +459,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;
}

- num_elem = size / elem_size;
- max_depth = num_elem + skip;
- if (sysctl_perf_event_max_stack < max_depth)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(size, elem_size, flags);

if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */
@@ -461,7 +479,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
}

trace_nr = trace->nr - skip;
- trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
+ trace_nr = min(trace_nr, max_depth - skip);
copy_len = trace_nr * elem_size;

ips = trace->ip + skip;
--
2.43.0

Arnaud Lecomte

Aug 9, 2025, 8:14:20 AM
to yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.

Changes in v2:
- Fixed max_depth variable names across the bpf_get_stackid() paths

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 532447606532..b3995724776c 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -246,7 +246,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 max_depth)
}

static long __bpf_get_stackid(struct bpf_map *map,
- struct perf_callchain_entry *trace, u64 flags)
+ struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
{
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
@@ -262,6 +262,8 @@ static long __bpf_get_stackid(struct bpf_map *map,

trace_nr = trace->nr - skip;
trace_len = trace_nr * sizeof(u64);
+ trace_nr = min(trace_nr, max_depth - skip);
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
@@ -321,19 +323,17 @@ static long __bpf_get_stackid(struct bpf_map *map,
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
- u32 max_depth = map->value_size / stack_map_data_size(map);
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;

if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;

- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);

trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);
@@ -342,7 +342,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
/* couldn't fetch the stack trace */
return -EFAULT;

- return __bpf_get_stackid(map, trace, flags);
+ return __bpf_get_stackid(map, trace, flags, max_depth);
}

const struct bpf_func_proto bpf_get_stackid_proto = {
@@ -374,6 +374,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
bool kernel, user;
__u64 nr_kernel;
int ret;
+ u32 elem_size, max_depth;

/* perf_sample_data doesn't have callchain, use bpf_get_stackid */
if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
@@ -392,16 +393,18 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

nr_kernel = count_kernel_ip(trace);
-
+ elem_size = stack_map_data_size(map);
if (kernel) {
__u64 nr = trace->nr;

trace->nr = nr_kernel;
- ret = __bpf_get_stackid(map, trace, flags);
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, max_depth);

/* restore nr */
trace->nr = nr;
} else { /* user */
+
u64 skip = flags & BPF_F_SKIP_FIELD_MASK;

skip += nr_kernel;
@@ -409,7 +412,8 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
- ret = __bpf_get_stackid(map, trace, flags);
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, max_depth);
}
return ret;
}
--
2.43.0

Yonghong Song

Aug 12, 2025, 12:40:05 AM
to Arnaud Lecomte, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com


On 8/9/25 5:09 AM, Arnaud Lecomte wrote:
> A new helper function stack_map_calculate_max_depth() that
> computes the max depth for a stackmap.

Please add 'bpf-next' in the subject like [PATCH bpf-next v2 1/2]
so CI can properly test the patch set.

>
> Changes in v2:
> - Removed the checking 'map_size % map_elem_size' from stack_map_calculate_max_depth
> - Changed stack_map_calculate_max_depth params name to be more generic
>
> Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
> ---
> kernel/bpf/stackmap.c | 30 ++++++++++++++++++++++++------
> 1 file changed, 24 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 3615c06b7dfa..532447606532 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -42,6 +42,27 @@ static inline int stack_map_data_size(struct bpf_map *map)
> sizeof(struct bpf_stack_build_id) : sizeof(u64);
> }
>
> +/**
> + * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
> + * @map_size: Size of the buffer/map value in bytes

let us rename 'map_size' to 'size' since the size represents size of
buffer or map, not just for map.

> + * @elem_size: Size of each stack trace element
> + * @flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)
> + *
> + * Return: Maximum number of stack trace entries that can be safely stored
> + */
> +static u32 stack_map_calculate_max_depth(u32 map_size, u32 elem_size, u64 flags)

map_size -> size
Also, you can replace 'flags' with 'skip', so the 'u32 skip = flags & BPF_F_SKIP_FIELD_MASK'
below is not necessary.
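
For example (sketch only, not the final patch), the signature would become:

static u32 stack_map_calculate_max_depth(u32 size, u32 elem_size, u32 skip)
{
	u32 max_depth = size / elem_size + skip;

	return min_t(u32, max_depth, sysctl_perf_event_max_stack);
}

/* and a caller would pass the already-masked value, e.g.:
 *	max_depth = stack_map_calculate_max_depth(map->value_size, elem_size,
 *						   flags & BPF_F_SKIP_FIELD_MASK);
 */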

Arnaud Lecomte

Aug 12, 2025, 3:30:49 PM
to yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, con...@arnaud-lcm.com, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
Add a new helper function stack_map_calculate_max_depth() that
computes the max depth for a stackmap.

Changes in v2:
- Removed the checking 'map_size % map_elem_size' from
stack_map_calculate_max_depth
- Changed stack_map_calculate_max_depth params name to be more generic

Changes in v3:
- Changed map size param to size in max depth helper

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..a267567e36dd 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -42,6 +42,27 @@ static inline int stack_map_data_size(struct bpf_map *map)
sizeof(struct bpf_stack_build_id) : sizeof(u64);
}

+/**
+ * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
+ * @size: Size of the buffer/map value in bytes
+ * @elem_size: Size of each stack trace element
+ * @flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)
+ *
+ * Return: Maximum number of stack trace entries that can be safely stored
+ */
+static u32 stack_map_calculate_max_depth(u32 size, u32 elem_size, u64 flags)
+{
+ u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 max_depth;
+
+ max_depth = size / elem_size;
--
2.43.0

Arnaud Lecomte

Aug 12, 2025, 3:32:18 PM
to Yonghong Song, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
Thanks, Yonghong, for your feedback and your patience!

Arnaud Lecomte

Aug 12, 2025, 3:33:09 PM
to yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.

Changes in v2:
- Fixed max_depth variable names across the bpf_get_stackid() paths

Reported-by: syzbot+c9b724...@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index a267567e36dd..e1ee18cbbbb2 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -246,7 +246,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 max_depth)
}

static long __bpf_get_stackid(struct bpf_map *map,
- struct perf_callchain_entry *trace, u64 flags)
+ struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
{
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
@@ -262,6 +262,8 @@ static long __bpf_get_stackid(struct bpf_map *map,

trace_nr = trace->nr - skip;
trace_len = trace_nr * sizeof(u64);
+ trace_nr = min(trace_nr, max_depth - skip);
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
@@ -321,19 +323,17 @@ static long __bpf_get_stackid(struct bpf_map *map,
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
- u32 max_depth = map->value_size / stack_map_data_size(map);
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;

if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;

- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;

Yonghong Song

Aug 13, 2025, 1:54:21 AM
to Arnaud Lecomte, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com


On 8/12/25 12:30 PM, Arnaud Lecomte wrote:
> A new helper function stack_map_calculate_max_depth() that
> computes the max depth for a stackmap.
>
> Changes in v2:
> - Removed the checking 'map_size % map_elem_size' from
> stack_map_calculate_max_depth
> - Changed stack_map_calculate_max_depth params name to be more generic
>
> Changes in v3:
> - Changed map size param to size in max depth helper
>
> Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>

LGTM with a small nit below.

Acked-by: Yonghong Song <yongho...@linux.dev>

> ---
> kernel/bpf/stackmap.c | 30 ++++++++++++++++++++++++------
> 1 file changed, 24 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 3615c06b7dfa..a267567e36dd 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -42,6 +42,27 @@ static inline int stack_map_data_size(struct bpf_map *map)
> sizeof(struct bpf_stack_build_id) : sizeof(u64);
> }
>
> +/**
> + * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
> + * @size: Size of the buffer/map value in bytes
> + * @elem_size: Size of each stack trace element
> + * @flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)

Let us have consistent format, e.g.
 * @size:  Size of ...
 * @elem_size:  Size of ...
 * @flags:  BPF stack trace ...

Yonghong Song

unread,
Aug 13, 2025, 2:00:02 AMAug 13
to Arnaud Lecomte, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com


On 8/12/25 12:32 PM, Arnaud Lecomte wrote:
> Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
> when copying stack trace data. The issue occurs when the perf trace
> contains more stack entries than the stack map bucket can hold,
> leading to an out-of-bounds write in the bucket's data array.
>
> Changes in v2:
> - Fixed max_depth names across get stack id
>
> Reported-by: syzbot+c9b724...@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
> Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>

LGTM with a few nits below.

Acked-by: Yonghong Song <yongho...@linux.dev>
Remove the above empty line.

Arnaud Lecomte

unread,
Aug 13, 2025, 4:46:18 PMAug 13
to yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, con...@arnaud-lcm.com, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
A new helper function stack_map_calculate_max_depth() that
computes the max depth for a stackmap.

Changes in v2:
- Removed the checking 'map_size % map_elem_size' from
stack_map_calculate_max_depth
- Changed stack_map_calculate_max_depth params name to be more generic

Changes in v3:
- Changed map size param to size in max depth helper

Changes in v4:
- Fixed indentation in max depth helper for args

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..b9cc6c72a2a5 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -42,6 +42,27 @@ static inline int stack_map_data_size(struct bpf_map *map)
sizeof(struct bpf_stack_build_id) : sizeof(u64);
}

+/**
+ * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
+ * @size: Size of the buffer/map value in bytes
+ * @elem_size: Size of each stack trace element
+ * @flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
@@ -438,10 +459,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;
}

- num_elem = size / elem_size;
- max_depth = num_elem + skip;
- if (sysctl_perf_event_max_stack < max_depth)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(size, elem_size, flags);

if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */
@@ -461,7 +479,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
}

trace_nr = trace->nr - skip;
- trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
+ trace_nr = min(trace_nr, max_depth - skip);
copy_len = trace_nr * elem_size;

ips = trace->ip + skip;
--
2.43.0

Arnaud Lecomte

unread,
Aug 13, 2025, 4:55:19 PMAug 13
to yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, con...@arnaud-lcm.com, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.

Changes in v2:
- Fixed max_depth names across get stack id

Changes in v4:
- Removed unnecessary empty line in __bpf_get_stackid
Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index b9cc6c72a2a5..318f150460bb 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -246,7 +246,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 max_depth)
}

static long __bpf_get_stackid(struct bpf_map *map,
- struct perf_callchain_entry *trace, u64 flags)
+ struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
{
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
@@ -262,6 +262,8 @@ static long __bpf_get_stackid(struct bpf_map *map,

trace_nr = trace->nr - skip;
trace_len = trace_nr * sizeof(u64);
+ trace_nr = min(trace_nr, max_depth - skip);
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
@@ -321,19 +323,17 @@ static long __bpf_get_stackid(struct bpf_map *map,
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
- u32 max_depth = map->value_size / stack_map_data_size(map);
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;

if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;

- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);

trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);
@@ -342,7 +342,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
/* couldn't fetch the stack trace */
return -EFAULT;

- return __bpf_get_stackid(map, trace, flags);
+ return __bpf_get_stackid(map, trace, flags, max_depth);
}

const struct bpf_func_proto bpf_get_stackid_proto = {
@@ -374,6 +374,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
bool kernel, user;
__u64 nr_kernel;
int ret;
+ u32 elem_size, max_depth;

/* perf_sample_data doesn't have callchain, use bpf_get_stackid */
if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
@@ -392,12 +393,13 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

nr_kernel = count_kernel_ip(trace);
-
+ elem_size = stack_map_data_size(map);
if (kernel) {
__u64 nr = trace->nr;

trace->nr = nr_kernel;
- ret = __bpf_get_stackid(map, trace, flags);
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, max_depth);

/* restore nr */
trace->nr = nr;
@@ -409,7 +411,8 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
- ret = __bpf_get_stackid(map, trace, flags);
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, max_depth);
}
return ret;
}
--
2.43.0

Lecomte, Arnaud

unread,
Aug 18, 2025, 9:49:40 AMAug 18
to so...@kernel.org, jo...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, yongho...@linux.dev
Hey,
Just forwarding the patch to the associated maintainers with `stackmap.c`.
Have a great day,
Cheers

Yonghong Song

unread,
Aug 18, 2025, 12:58:13 PMAug 18
to Lecomte, Arnaud, so...@kernel.org, jo...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com


On 8/18/25 6:49 AM, Lecomte, Arnaud wrote:
> Hey,
> Just forwarding the patch to the associated maintainers with
> `stackmap.c`.

Arnaud, please add Ack (provided in comments for v3) to make things easier
for maintainers.

Also, it looks like all your patch sets (v1 to v4) are in the same thread.
It would be good to have each version in a separate thread.
Please look at some examples on the bpf mailing list.

> Have a great day,
> Cheers
>
> On 13/08/2025 21:55, Arnaud Lecomte wrote:
>> Syzkaller reported a KASAN slab-out-of-bounds write in
>> __bpf_get_stackid()
>> when copying stack trace data. The issue occurs when the perf trace
>>   contains more stack entries than the stack map bucket can hold,
>>   leading to an out-of-bounds write in the bucket's data array.
>>
>> Changes in v2:
>>   - Fixed max_depth names across get stack id
>>
>> Changes in v4:
>>   - Removed unnecessary empty line in __bpf_get_stackid
>>
>> Reported-by: syzbot+c9b724...@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
>> Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
>> ---
>>   kernel/bpf/stackmap.c | 23 +++++++++++++----------
>>   1 file changed, 13 insertions(+), 10 deletions(-)
>>
[...]

Yonghong Song

unread,
Aug 18, 2025, 1:03:01 PMAug 18
to Lecomte, Arnaud, so...@kernel.org, jo...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com


On 8/18/25 9:57 AM, Yonghong Song wrote:
>
>
> On 8/18/25 6:49 AM, Lecomte, Arnaud wrote:
>> Hey,
>> Just forwarding the patch to the associated maintainers with
>> `stackmap.c`.
>
> Arnaud, please add Ack (provided in comments for v3) to make things
> easier
> for maintainers.
>
> Also, looks like all your patch sets (v1 to v4) in the same thread.

sorry, it should be v3 and v4 in the same thread.

Arnaud Lecomte

unread,
Aug 19, 2025, 12:21:03 PMAug 19
to Yonghong Song, so...@kernel.org, jo...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
On 18/08/2025 18:02, Yonghong Song wrote:
>
>
> On 8/18/25 9:57 AM, Yonghong Song wrote:
>>
>>
>> On 8/18/25 6:49 AM, Lecomte, Arnaud wrote:
>>> Hey,
>>> Just forwarding the patch to the associated maintainers with
>>> `stackmap.c`.
>>
>> Arnaud, please add Ack (provided in comments for v3) to make things
>> easier
>> for maintainers.
>>
>> Also, looks like all your patch sets (v1 to v4) in the same thread.
>
> sorry, it should be v3 and v4 in the same thread.
>
Hey, thanks for the feedback!
I am going to provide the link to v3 in the v4 commit and resend the
v4 with the Acked-by.

Arnaud Lecomte

unread,
Aug 19, 2025, 12:27:10 PMAug 19
to so...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, yongho...@linux.dev, Arnaud Lecomte
A new helper function stack_map_calculate_max_depth() that
computes the max depth for a stackmap.

Changes in v2:
- Removed the checking 'map_size % map_elem_size' from
stack_map_calculate_max_depth
- Changed stack_map_calculate_max_depth params name to be more generic

Changes in v3:
- Changed map size param to size in max depth helper

Changes in v4:
- Fixed indentation in max depth helper for args

Link to v3: https://lore.kernel.org/all/09dc40eb-a84e-472a...@linux.dev/

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
Acked-by: Yonghong Song <yongho...@linux.dev>
---
kernel/bpf/stackmap.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..b9cc6c72a2a5 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
@@ -438,10 +459,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;
}

- num_elem = size / elem_size;
- max_depth = num_elem + skip;
- if (sysctl_perf_event_max_stack < max_depth)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(size, elem_size, flags);

if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */
@@ -461,7 +479,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
}

trace_nr = trace->nr - skip;
- trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
+ trace_nr = min(trace_nr, max_depth - skip);
copy_len = trace_nr * elem_size;

ips = trace->ip + skip;
--
2.43.0

Arnaud Lecomte

unread,
Aug 19, 2025, 12:29:39 PMAug 19
to so...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, yongho...@linux.dev, Arnaud Lecomte
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.

Changes in v2:
- Fixed max_depth names across get stack id

Changes in v4:
- Removed unnecessary empty line in __bpf_get_stackid

Link to v3: https://lore.kernel.org/all/997d3b8a-4b3a-4720...@linux.dev/
Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
Acked-by: Yonghong Song <yongho...@linux.dev>
---
kernel/bpf/stackmap.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index b9cc6c72a2a5..318f150460bb 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -246,7 +246,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 max_depth)
}

static long __bpf_get_stackid(struct bpf_map *map,
- struct perf_callchain_entry *trace, u64 flags)
+ struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
{
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
@@ -262,6 +262,8 @@ static long __bpf_get_stackid(struct bpf_map *map,

trace_nr = trace->nr - skip;
trace_len = trace_nr * sizeof(u64);
+ trace_nr = min(trace_nr, max_depth - skip);
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
@@ -321,19 +323,17 @@ static long __bpf_get_stackid(struct bpf_map *map,
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
- u32 max_depth = map->value_size / stack_map_data_size(map);
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;

if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;

- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;
--
2.43.0

Martin KaFai Lau

unread,
Aug 19, 2025, 5:15:45 PMAug 19
to Arnaud Lecomte, yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, so...@kernel.org
hmm... this looks a bit suspicious. Is it possible that
sysctl_perf_event_max_stack is being changed to a larger value in parallel?
I suspect it was fine because trace_nr was still bounded by num_elem.

> + trace_nr = min(trace_nr, max_depth - skip);

but now the min() is also based on max_depth which could be
sysctl_perf_event_max_stack.

Besides, if I read it correctly, in "max_depth - skip" the max_depth could also
be less than skip. I assume trace->nr is bounded by max_depth, so it should be less
of a problem, but it is still a bit unintuitive to read.
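
To make the wrap-around part concrete, here is a throwaway userspace sketch
(plain C, made-up numbers, not kernel code) of what happens if max_depth can
ever end up below skip:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint32_t trace_nr = 40, skip = 10, max_depth = 8; /* max_depth < skip */
	uint32_t bound = max_depth - skip;                /* u32 wraps to 4294967294 */
	uint32_t capped = trace_nr < bound ? trace_nr : bound;

	printf("bound  = %u\n", bound);   /* 4294967294 */
	printf("capped = %u\n", capped);  /* still 40: min() no longer clamps anything */
	return 0;
}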

Lecomte, Arnaud

unread,
Aug 25, 2025, 12:39:34 PM (13 days ago) Aug 25
to Martin KaFai Lau, yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, so...@kernel.org
Hi Martin, this is a valid concern, as sysctl_perf_event_max_stack can be
modified at runtime through /proc/sys/kernel/perf_event_max_stack.
What we could maybe do instead is to take a snapshot: u32 current_max =
READ_ONCE(sysctl_perf_event_max_stack);
Any thoughts on this?
We should also bring back the num_elem bound as an additional safety net.
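
Roughly what I have in mind for the helper (just a sketch of the proposal, not
the final patch; the helper name and parameters are placeholders, and the
num_elem clamp in __bpf_get_stack would stay as-is):

static u32 stack_map_calculate_max_depth(u32 size, u32 elem_size, u64 flags)
{
	u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
	/* one snapshot of the sysctl, used consistently below */
	u32 curr_max = READ_ONCE(sysctl_perf_event_max_stack);
	u32 max_depth = size / elem_size + skip;

	return min(max_depth, curr_max);
}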

Yonghong Song

unread,
Aug 25, 2025, 2:28:09 PM (13 days ago) Aug 25
to Lecomte, Arnaud, Martin KaFai Lau, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, so...@kernel.org
There is no need to have READ_ONCE. Just do
int curr_sysctl_max_stack = sysctl_perf_event_max_stack;
if (max_depth > curr_sysctl_max_stack)
return curr_sysctl_max_stack;

Because of the above change, the patch is not a refactoring change any more.

Lecomte, Arnaud

unread,
Aug 25, 2025, 4:07:17 PM (13 days ago) Aug 25
to Yonghong Song, Martin KaFai Lau, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, so...@kernel.org
Why would you not consider it a refactoring change anymore?

Yonghong Song

unread,
Aug 25, 2025, 5:15:19 PM (13 days ago) Aug 25
to Lecomte, Arnaud, Martin KaFai Lau, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, so...@kernel.org
Sorry, I think I made a couple of mistakes in the above.

First, yes, we do want READ_ONCE; otherwise the compiler may potentially optimize
the above back to the original code with two references to sysctl_perf_event_max_stack.

Second, yes, it is indeed a refactoring.
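
To illustrate the concern (sketch only, function names are made up, not the
actual patch):

/* With a plain read the compiler is allowed to drop the local copy and
 * reference the global twice, so the comparison and the returned value may
 * observe different sysctl values if it is changed concurrently.
 */
static u32 clamp_plain(u32 max_depth)
{
	u32 curr = sysctl_perf_event_max_stack;	/* plain read: may be re-done */

	if (max_depth > curr)
		return curr;
	return max_depth;
}

/* READ_ONCE forces a single load, so one consistent snapshot is used. */
static u32 clamp_once(u32 max_depth)
{
	u32 curr = READ_ONCE(sysctl_perf_event_max_stack);

	if (max_depth > curr)
		return curr;
	return max_depth;
}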

Arnaud Lecomte

unread,
Aug 26, 2025, 5:22:52 PM (12 days ago) Aug 26
to so...@kernel.org, yongho...@linux.dev, marti...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
A new helper function stack_map_calculate_max_depth() that
computes the max depth for a stackmap.

Changes in v2:
- Removed the checking 'map_size % map_elem_size' from
stack_map_calculate_max_depth
- Changed stack_map_calculate_max_depth params name to be more generic

Changes in v3:
- Changed map size param to size in max depth helper

Changes in v4:
- Fixed indentation in max depth helper for args

Changes in v5:
- Bound back trace_nr to num_elem in __bpf_get_stack
- Make a copy of sysctl_perf_event_max_stack
in stack_map_calculate_max_depth

Link to v4: https://lore.kernel.org/all/20250819162652...@arnaud-lcm.com/

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
Acked-by: Yonghong Song <yongho...@linux.dev>
---
kernel/bpf/stackmap.c | 28 ++++++++++++++++++++++++----
1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..796cc105eacb 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -42,6 +42,28 @@ static inline int stack_map_data_size(struct bpf_map *map)
sizeof(struct bpf_stack_build_id) : sizeof(u64);
}

+/**
+ * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
+ * @size: Size of the buffer/map value in bytes
+ * @elem_size: Size of each stack trace element
+ * @flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)
+ *
+ * Return: Maximum number of stack trace entries that can be safely stored
+ */
+static u32 stack_map_calculate_max_depth(u32 size, u32 elem_size, u64 flags)
+{
+ u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 max_depth;
+ u32 curr_sysctl_max_stack = READ_ONCE(sysctl_perf_event_max_stack);
+
+ max_depth = size / elem_size;
+ max_depth += skip;
+ if (max_depth > curr_sysctl_max_stack)
+ return curr_sysctl_max_stack;
+
+ return max_depth;
+}
+
static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
{
u64 elem_size = sizeof(struct stack_map_bucket) +
@@ -438,10 +460,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;
}

- num_elem = size / elem_size;
- max_depth = num_elem + skip;
- if (sysctl_perf_event_max_stack < max_depth)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(size, elem_size, flags);

if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */
@@ -460,6 +479,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto err_fault;
}

+ num_elem = size / elem_size;
trace_nr = trace->nr - skip;
trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
copy_len = trace_nr * elem_size;
--
2.43.0

Arnaud Lecomte

unread,
Aug 26, 2025, 5:24:05 PM (12 days ago) Aug 26
to so...@kernel.org, yongho...@linux.dev, marti...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.

Changes in v2:
- Fixed max_depth names across get stack id

Changes in v4:
- Removed unnecessary empty line in __bpf_get_stackid

Link to v4: https://lore.kernel.org/all/20250813205506....@arnaud-lcm.com/
Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
Acked-by: Yonghong Song <yongho...@linux.dev>
---
kernel/bpf/stackmap.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 796cc105eacb..ef8269ab8d6f 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -247,7 +247,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 max_depth)
}

static long __bpf_get_stackid(struct bpf_map *map,
- struct perf_callchain_entry *trace, u64 flags)
+ struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
{
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
@@ -263,6 +263,8 @@ static long __bpf_get_stackid(struct bpf_map *map,

trace_nr = trace->nr - skip;
trace_len = trace_nr * sizeof(u64);
+ trace_nr = min(trace_nr, max_depth - skip);
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
@@ -322,19 +324,17 @@ static long __bpf_get_stackid(struct bpf_map *map,
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
- u32 max_depth = map->value_size / stack_map_data_size(map);
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;

if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;

- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);

trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);
@@ -343,7 +343,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
/* couldn't fetch the stack trace */
return -EFAULT;

- return __bpf_get_stackid(map, trace, flags);
+ return __bpf_get_stackid(map, trace, flags, max_depth);
}

const struct bpf_func_proto bpf_get_stackid_proto = {
@@ -375,6 +375,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
bool kernel, user;
__u64 nr_kernel;
int ret;
+ u32 elem_size, max_depth;

/* perf_sample_data doesn't have callchain, use bpf_get_stackid */
if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
@@ -393,12 +394,13 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

nr_kernel = count_kernel_ip(trace);
-
+ elem_size = stack_map_data_size(map);
if (kernel) {
__u64 nr = trace->nr;

trace->nr = nr_kernel;
- ret = __bpf_get_stackid(map, trace, flags);
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, max_depth);

/* restore nr */
trace->nr = nr;
@@ -410,7 +412,8 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,

Alexei Starovoitov

unread,
Aug 29, 2025, 1:29:38 PM (9 days ago) Aug 29
to Arnaud Lecomte, Song Liu, Yonghong Song, Martin KaFai Lau, Andrii Nakryiko, Alexei Starovoitov, bpf, Daniel Borkmann, Eduard, Hao Luo, John Fastabend, Jiri Olsa, KP Singh, LKML, Stanislav Fomichev, syzbot+c9b724...@syzkaller.appspotmail.com, syzkaller-bugs
The patch might have fixed this particular syzbot repro
with the OOB in the stackmap-with-buildid case,
but the above two lines look wrong.
trace_len is computed before trace_nr is capped by max_depth.
So the non-buildid case below is using
memcpy(new_bucket->data, ips, trace_len);

so the OOB is still there?
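
For reference, the ordering that would keep the later
memcpy(new_bucket->data, ips, trace_len) within the bucket (illustrative
sketch only, not the final patch):

	trace_nr = trace->nr - skip;
	trace_nr = min(trace_nr, max_depth - skip);	/* cap the count first */
	trace_len = trace_nr * sizeof(u64);		/* then derive the length */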

Alexei Starovoitov

unread,
Aug 29, 2025, 8:28:17 PM (9 days ago) Aug 29
to Song Liu, Arnaud Lecomte, Yonghong Song, Martin KaFai Lau, Andrii Nakryiko, Alexei Starovoitov, bpf, Daniel Borkmann, Eduard, Hao Luo, John Fastabend, Jiri Olsa, KP Singh, LKML, Stanislav Fomichev, syzbot+c9b724...@syzkaller.appspotmail.com, syzkaller-bugs
On Fri, Aug 29, 2025 at 11:50 AM Song Liu <so...@kernel.org> wrote:
>
> On Fri, Aug 29, 2025 at 10:29 AM Alexei Starovoitov
> <alexei.st...@gmail.com> wrote:
> [...]
> > >
> > > static long __bpf_get_stackid(struct bpf_map *map,
> > > - struct perf_callchain_entry *trace, u64 flags)
> > > + struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
> > > {
> > > struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
> > > struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
> > > @@ -263,6 +263,8 @@ static long __bpf_get_stackid(struct bpf_map *map,
> > >
> > > trace_nr = trace->nr - skip;
> > > trace_len = trace_nr * sizeof(u64);
> > > + trace_nr = min(trace_nr, max_depth - skip);
> > > +
> >
> > The patch might have fixed this particular syzbot repro
> > with the OOB in the stackmap-with-buildid case,
> > but the above two lines look wrong.
> > trace_len is computed before trace_nr is capped by max_depth.
> > So the non-buildid case below is using
> > memcpy(new_bucket->data, ips, trace_len);
> >
> > so the OOB is still there?
>
> +1 for this observation.
>
> We are calling __bpf_get_stackid() from two functions: bpf_get_stackid
> and bpf_get_stackid_pe. The check against max_depth is only needed
> from bpf_get_stackid_pe, so it is better to just check here.

Good point.

> I have got the following on top of patch 1/2. This makes more sense to
> me.
>
> PS: The following also includes some clean up in __bpf_get_stack.
> I include those because it also uses stack_map_calculate_max_depth.
>
> Does this look better?

yeah. It's certainly cleaner to avoid adding extra arg to
__bpf_get_stackid()

Song Liu

unread,
Aug 30, 2025, 9:35:07 AM (8 days ago) Aug 30
to Alexei Starovoitov, Arnaud Lecomte, Yonghong Song, Martin KaFai Lau, Andrii Nakryiko, Alexei Starovoitov, bpf, Daniel Borkmann, Eduard, Hao Luo, John Fastabend, Jiri Olsa, KP Singh, LKML, Stanislav Fomichev, syzbot+c9b724...@syzkaller.appspotmail.com, syzkaller-bugs
On Fri, Aug 29, 2025 at 10:29 AM Alexei Starovoitov
<alexei.st...@gmail.com> wrote:
[...]
> >
> > static long __bpf_get_stackid(struct bpf_map *map,
> > - struct perf_callchain_entry *trace, u64 flags)
> > + struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
> > {
> > struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
> > struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
> > @@ -263,6 +263,8 @@ static long __bpf_get_stackid(struct bpf_map *map,
> >
> > trace_nr = trace->nr - skip;
> > trace_len = trace_nr * sizeof(u64);
> > + trace_nr = min(trace_nr, max_depth - skip);
> > +
>
> > The patch might have fixed this particular syzbot repro
> > with the OOB in the stackmap-with-buildid case,
> > but the above two lines look wrong.
> > trace_len is computed before trace_nr is capped by max_depth.
> > So the non-buildid case below is using
> > memcpy(new_bucket->data, ips, trace_len);
> >
> > so the OOB is still there?

+1 for this observation.

We are calling __bpf_get_stackid() from two functions: bpf_get_stackid
and bpf_get_stackid_pe. The check against max_depth is only needed
from bpf_get_stackid_pe, so it is better to just check here.

I have got the following on top of patch 1/2. This makes more sense to
me.

PS: The following also includes some clean up in __bpf_get_stack.
I include those because it also uses stack_map_calculate_max_depth.

Does this look better?

Thanks,
Song


diff --git c/kernel/bpf/stackmap.c w/kernel/bpf/stackmap.c
index 796cc105eacb..08554fb146e1 100644
--- c/kernel/bpf/stackmap.c
+++ w/kernel/bpf/stackmap.c
@@ -262,7 +262,7 @@ static long __bpf_get_stackid(struct bpf_map *map,
return -EFAULT;

trace_nr = trace->nr - skip;
- trace_len = trace_nr * sizeof(u64);
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
@@ -297,6 +297,7 @@ static long __bpf_get_stackid(struct bpf_map *map,
return -EEXIST;
}
} else {
+ trace_len = trace_nr * sizeof(u64);
if (hash_matches && bucket->nr == trace_nr &&
memcmp(bucket->data, ips, trace_len) == 0)
return id;
@@ -322,19 +323,17 @@ static long __bpf_get_stackid(struct bpf_map *map,
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
- u32 max_depth = map->value_size / stack_map_data_size(map);
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;

if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;

- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);

trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);
@@ -375,6 +374,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
bool kernel, user;
__u64 nr_kernel;
int ret;
+ u32 elem_size, max_depth;

/* perf_sample_data doesn't have callchain, use bpf_get_stackid */
if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
@@ -393,11 +393,12 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

nr_kernel = count_kernel_ip(trace);
-
+ elem_size = stack_map_data_size(map);
if (kernel) {
__u64 nr = trace->nr;

- trace->nr = nr_kernel;
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ trace->nr = min_t(u32, nr_kernel, max_depth);
ret = __bpf_get_stackid(map, trace, flags);

/* restore nr */
@@ -410,6 +411,8 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ trace->nr = min_t(u32, trace->nr, max_depth);
ret = __bpf_get_stackid(map, trace, flags);
}
return ret;
@@ -428,7 +431,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
struct perf_callchain_entry *trace_in,
void *buf, u32 size, u64 flags, bool may_fault)
{
- u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
+ u32 trace_nr, copy_len, elem_size, max_depth;
bool user_build_id = flags & BPF_F_USER_BUILD_ID;
bool crosstask = task && task != current;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
@@ -465,13 +468,15 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */

- if (trace_in)
+ if (trace_in) {
trace = trace_in;
- else if (kernel && task)
+ trace->nr = min_t(u32, trace->nr, max_depth);
+ } else if (kernel && task) {
trace = get_callchain_entry_for_task(task, max_depth);
- else
+ } else {
trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
crosstask, false);
+ }

if (unlikely(!trace) || trace->nr < skip) {
if (may_fault)
@@ -479,9 +484,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto err_fault;
}

- num_elem = size / elem_size;
trace_nr = trace->nr - skip;
- trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
copy_len = trace_nr * elem_size;

Lecomte, Arnaud

unread,
Aug 30, 2025, 1:14:00 PM (8 days ago) Aug 30
to Alexei Starovoitov, Song Liu, Yonghong Song, Martin KaFai Lau, Andrii Nakryiko, Alexei Starovoitov, bpf, Daniel Borkmann, Eduard, Hao Luo, John Fastabend, Jiri Olsa, KP Singh, LKML, Stanislav Fomichev, syzbot+c9b724...@syzkaller.appspotmail.com, syzkaller-bugs
Nice catch, thanks!
>
>> I have got the following on top of patch 1/2. This makes more sense to
>> me.
>>
>> PS: The following also includes some clean up in __bpf_get_stack.
>> I include those because it also uses stack_map_calculate_max_depth.
>>
>> Does this look better?
> yeah. It's certainly cleaner to avoid adding extra arg to
> __bpf_get_stackid()
>
Are Song's patches going to be applied then? Or should I send a new revision
of the patch with Song's modifications and a Co-developed-by tag?
Thanks for your guidance in advance,
Arnaud

Alexei Starovoitov

unread,
Aug 31, 2025, 9:10:47 PM (7 days ago) Aug 31
to Lecomte, Arnaud, Song Liu, Yonghong Song, Martin KaFai Lau, Andrii Nakryiko, Alexei Starovoitov, bpf, Daniel Borkmann, Eduard, Hao Luo, John Fastabend, Jiri Olsa, KP Singh, LKML, Stanislav Fomichev, syzbot+c9b724...@syzkaller.appspotmail.com, syzkaller-bugs
Pls resubmit and retest with a tag.

Arnaud Lecomte

unread,
Sep 3, 2025, 9:52:30 AM (4 days ago) Sep 3
to alexei.st...@gmail.com, yongho...@linux.dev, so...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
A new helper function stack_map_calculate_max_depth() that
computes the max depth for a stackmap.

Changes in v2:
- Removed the checking 'map_size % map_elem_size' from
stack_map_calculate_max_depth
- Changed stack_map_calculate_max_depth params name to be more generic

Changes in v3:
- Changed map size param to size in max depth helper

Changes in v4:
- Fixed indentation in max depth helper for args

Changes in v5:
- Bound back trace_nr to num_elem in __bpf_get_stack
- Make a copy of sysctl_perf_event_max_stack
in stack_map_calculate_max_depth

Changes in v6:
- Restrained max_depth computation only when required
- Additional cleanup from Song in __bpf_get_stack

Link to v5: https://lore.kernel.org/all/20250826212229....@arnaud-lcm.com/

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
Acked-by: Yonghong Song <yongho...@linux.dev>
Cc: Song Lui <so...@kernel.org>
---
kernel/bpf/stackmap.c | 58 ++++++++++++++++++++++++++++++-------------
1 file changed, 41 insertions(+), 17 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..1ebc525b7c2f 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -300,20 +322,17 @@ static long __bpf_get_stackid(struct bpf_map *map,
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
- u32 max_depth = map->value_size / stack_map_data_size(map);
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;

if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;

- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;
-
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);

@@ -350,6 +369,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
{
struct perf_event *event = ctx->event;
struct perf_callchain_entry *trace;
+ u32 elem_size, max_depth;
bool kernel, user;
__u64 nr_kernel;
int ret;
@@ -371,11 +391,14 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

nr_kernel = count_kernel_ip(trace);
+ elem_size = stack_map_data_size(map);

if (kernel) {
__u64 nr = trace->nr;

- trace->nr = nr_kernel;
+ max_depth =
+ stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ trace->nr = min_t(u32, nr_kernel, max_depth);
ret = __bpf_get_stackid(map, trace, flags);

/* restore nr */
@@ -388,6 +411,9 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
+ max_depth =
+ stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ trace->nr = min_t(u32, trace->nr, max_depth);
ret = __bpf_get_stackid(map, trace, flags);
}
return ret;
@@ -406,8 +432,8 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
struct perf_callchain_entry *trace_in,
void *buf, u32 size, u64 flags, bool may_fault)
{
- u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
bool user_build_id = flags & BPF_F_USER_BUILD_ID;
+ u32 trace_nr, copy_len, elem_size, max_depth;
bool crosstask = task && task != current;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
bool user = flags & BPF_F_USER_STACK;
@@ -438,21 +464,20 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;
}

- num_elem = size / elem_size;
- max_depth = num_elem + skip;
- if (sysctl_perf_event_max_stack < max_depth)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(size, elem_size, flags);

if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */

- if (trace_in)
+ if (trace_in) {
trace = trace_in;
- else if (kernel && task)
+ trace->nr = min_t(u32, trace->nr, max_depth);
+ } else if (kernel && task) {
trace = get_callchain_entry_for_task(task, max_depth);
- else
+ } else {
trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
- crosstask, false);
+ crosstask, false);
+ }

if (unlikely(!trace) || trace->nr < skip) {
if (may_fault)
@@ -461,7 +486,6 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
}

trace_nr = trace->nr - skip;
- trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
copy_len = trace_nr * elem_size;

ips = trace->ip + skip;
--
2.47.3

Arnaud Lecomte

unread,
Sep 3, 2025, 9:53:30 AM (4 days ago) Sep 3
to alexei.st...@gmail.com, yongho...@linux.dev, so...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte

Arnaud Lecomte

unread,
Sep 3, 2025, 9:53:53 AM (4 days ago) Sep 3
to alexei.st...@gmail.com, yongho...@linux.dev, so...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.

Changes in v2:
- Fixed max_depth names across get stack id

Changes in v4:
- Removed unnecessary empty line in __bpf_get_stackid

Changes in v6:
- Added back trace_len computation in __bpf_get_stackid
Fixes: ee2a098851bf ("bpf: Adjust BPF stack helper functions to accommodate skip > 0")
Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
Acked-by: Yonghong Song <yongho...@linux.dev>
---
kernel/bpf/stackmap.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 1ebc525b7c2f..8b2dcb8a6dc3 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -251,8 +251,9 @@ static long __bpf_get_stackid(struct bpf_map *map,
{
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
+ u32 hash, id, trace_nr, trace_len, i, max_depth;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
- u32 hash, id, trace_nr, trace_len, i;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
u64 *ips;
bool hash_matches;
@@ -261,8 +262,12 @@ static long __bpf_get_stackid(struct bpf_map *map,
/* skipping more than usable stack trace */
return -EFAULT;

+ max_depth =
+ stack_map_calculate_max_depth(map->value_size, elem_size, flags);
trace_nr = trace->nr - skip;
+ trace_nr = min_t(u32, trace_nr, max_depth - skip);
trace_len = trace_nr * sizeof(u64);
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
--
2.47.3

Alexei Starovoitov

unread,
Sep 3, 2025, 12:13:14 PM (4 days ago) Sep 3
to Arnaud Lecomte, Yonghong Song, Song Liu, Andrii Nakryiko, Alexei Starovoitov, bpf, Daniel Borkmann, Eduard, Hao Luo, John Fastabend, Jiri Olsa, KP Singh, LKML, Martin KaFai Lau, Stanislav Fomichev, syzbot+c9b724...@syzkaller.appspotmail.com, syzkaller-bugs
On Wed, Sep 3, 2025 at 6:52 AM Arnaud Lecomte <con...@arnaud-lcm.com> wrote:
>
> A new helper function stack_map_calculate_max_depth() that
> computes the max depth for a stackmap.
>
> Changes in v2:
> - Removed the checking 'map_size % map_elem_size' from
> stack_map_calculate_max_depth
> - Changed stack_map_calculate_max_depth params name to be more generic
>
> Changes in v3:
> - Changed map size param to size in max depth helper
>
> Changes in v4:
> - Fixed indentation in max depth helper for args
>
> Changes in v5:
> - Bound back trace_nr to num_elem in __bpf_get_stack
> - Make a copy of sysctl_perf_event_max_stack
> in stack_map_calculate_max_depth
>
> Changes in v6:
> - Restrained max_depth computation only when required
> - Additional cleanup from Song in __bpf_get_stack

This is not a refactor anymore.
Pls don't squash different things into one patch.
Keep refactor as patch 1, and another cleanup as patch 2.

pw-bot: cr

Lecomte, Arnaud

unread,
Sep 3, 2025, 12:20:52 PM (4 days ago) Sep 3
to Alexei Starovoitov, Yonghong Song, Song Liu, Andrii Nakryiko, Alexei Starovoitov, bpf, Daniel Borkmann, Eduard, Hao Luo, John Fastabend, Jiri Olsa, KP Singh, LKML, Martin KaFai Lau, Stanislav Fomichev, syzbot+c9b724...@syzkaller.appspotmail.com, syzkaller-bugs
The main problem is that patch 2 is not a cleanup either. It is a bug fix,
so it doesn't really fit.
We could maybe split this patch into 2 new patches, but I don't really
like this idea.
If we decide to stick to the 2-patch format, I don't have any preference
as to which patch's scope should be extended.

>
> pw-bot: cr
>

Alexei Starovoitov

unread,
Sep 3, 2025, 12:22:29 PM (4 days ago) Sep 3
to Lecomte, Arnaud, Yonghong Song, Song Liu, Andrii Nakryiko, Alexei Starovoitov, bpf, Daniel Borkmann, Eduard, Hao Luo, John Fastabend, Jiri Olsa, KP Singh, LKML, Martin KaFai Lau, Stanislav Fomichev, syzbot+c9b724...@syzkaller.appspotmail.com, syzkaller-bugs
I wasn't proposing to squash cleanup into patch 2.
Make 3 patches where each one is doing one thing.

Arnaud Lecomte

unread,
Sep 3, 2025, 7:39:50 PM (4 days ago) Sep 3
to alexei.st...@gmail.com, yongho...@linux.dev, so...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
A new helper function stack_map_calculate_max_depth() that
computes the max depth for a stackmap.

Changes in v2:
- Removed the checking 'map_size % map_elem_size' from
stack_map_calculate_max_depth
- Changed stack_map_calculate_max_depth params name to be more generic

Changes in v3:
- Changed map size param to size in max depth helper

Changes in v4:
- Fixed indentation in max depth helper for args

Changes in v5:
- Bound back trace_nr to num_elem in __bpf_get_stack
- Make a copy of sysctl_perf_event_max_stack
in stack_map_calculate_max_depth

Changes in v6:
- Restrained max_depth computation only when required
- Additional cleanup from Song in __bpf_get_stack

Changes in v7:
- Removed additional cleanup from v6

Link to v6: https://lore.kernel.org/all/20250903135323....@arnaud-lcm.com/

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
Acked-by: Yonghong Song <yongho...@linux.dev>
---
kernel/bpf/stackmap.c | 38 +++++++++++++++++++++++++++-----------
1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..ed707bc07173 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;

if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;

- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;
-
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);

@@ -406,8 +425,8 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
struct perf_callchain_entry *trace_in,
void *buf, u32 size, u64 flags, bool may_fault)
{
- u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
bool user_build_id = flags & BPF_F_USER_BUILD_ID;
+ u32 trace_nr, copy_len, elem_size, max_depth;
bool crosstask = task && task != current;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
bool user = flags & BPF_F_USER_STACK;
@@ -438,10 +457,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;
}

- num_elem = size / elem_size;
- max_depth = num_elem + skip;
- if (sysctl_perf_event_max_stack < max_depth)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(size, elem_size, flags);

if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */
--
2.47.3

Arnaud Lecomte

unread,
Sep 3, 2025, 7:40:59 PM (4 days ago) Sep 3
to alexei.st...@gmail.com, yongho...@linux.dev, so...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
Clean-up bounds checking for trace->nr in
__bpf_get_stack by limiting it only to
max_depth.

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
Cc: Song Lui <so...@kernel.org>
---
kernel/bpf/stackmap.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index ed707bc07173..9f3ae426ddc3 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -462,13 +462,15 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */

- if (trace_in)
+ if (trace_in) {
trace = trace_in;
- else if (kernel && task)
+ trace->nr = min_t(u32, trace->nr, max_depth);
+ } else if (kernel && task) {
trace = get_callchain_entry_for_task(task, max_depth);
- else
+ } else {
trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
crosstask, false);
+ }

if (unlikely(!trace) || trace->nr < skip) {
if (may_fault)
@@ -477,7 +479,6 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
}

trace_nr = trace->nr - skip;
- trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
copy_len = trace_nr * elem_size;

ips = trace->ip + skip;
--
2.47.3

Arnaud Lecomte

unread,
Sep 3, 2025, 7:43:32 PM (4 days ago) Sep 3
to alexei.st...@gmail.com, yongho...@linux.dev, so...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.

Changes in v2:
- Fixed max_depth names across get stack id

Changes in v4:
- Removed unnecessary empty line in __bpf_get_stackid

Changes in v6:
- Added back trace_len computation in __bpf_get_stackid

Link to v6: https://lore.kernel.org/all/20250903135348....@arnaud-lcm.com/

Reported-by: syzbot+c9b724...@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
Fixes: ee2a098851bf ("bpf: Adjust BPF stack helper functions to accommodate skip > 0")
Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
Acked-by: Yonghong Song <yongho...@linux.dev>
---
kernel/bpf/stackmap.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 9f3ae426ddc3..29e05c9ff1bd 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -369,6 +369,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
{
struct perf_event *event = ctx->event;
struct perf_callchain_entry *trace;
+ u32 elem_size, max_depth;
bool kernel, user;
__u64 nr_kernel;
int ret;
@@ -390,11 +391,15 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

nr_kernel = count_kernel_ip(trace);
+ elem_size = stack_map_data_size(map);

if (kernel) {
__u64 nr = trace->nr;

trace->nr = nr_kernel;
+ max_depth =
+ stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ trace->nr = min_t(u32, nr_kernel, max_depth);
ret = __bpf_get_stackid(map, trace, flags);

/* restore nr */
@@ -407,6 +412,9 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
+ max_depth =
+ stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ trace->nr = min_t(u32, trace->nr, max_depth);
ret = __bpf_get_stackid(map, trace, flags);
}
return ret;
--
2.47.3

Lecomte, Arnaud

unread,
Sep 3, 2025, 7:46:41 PM (4 days ago) Sep 3
to Alexei Starovoitov, Yonghong Song, Song Liu, Andrii Nakryiko, Alexei Starovoitov, bpf, Daniel Borkmann, Eduard, Hao Luo, John Fastabend, Jiri Olsa, KP Singh, LKML, Martin KaFai Lau, Stanislav Fomichev, syzbot+c9b724...@syzkaller.appspotmail.com, syzkaller-bugs

Lecomte, Arnaud

unread,
Sep 4, 2025, 6:52:36 PM (3 days ago) Sep 4
to Song Liu, alexei.st...@gmail.com, yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com

On 05/09/2025 00:40, Song Liu wrote:
>
> On 9/3/25 4:40 PM, Arnaud Lecomte wrote:
>> Clean-up bounds checking for trace->nr in
>> __bpf_get_stack by limiting it only to
>> max_depth.
>>
>> Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
>> Cc: Song Lui <so...@kernel.org>
>
> Typo in my name, which is "Song Liu".
>
> This looks right.
>
> Acked-by: Song Liu <so...@kernel.org>
>
Oops, sorry!

Lecomte, Arnaud

unread,
Sep 4, 2025, 6:53:52 PM (3 days ago) Sep 4
to Song Liu, alexei.st...@gmail.com, yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com

On 05/09/2025 00:45, Song Liu wrote:
>
> On 9/3/25 4:43 PM, Arnaud Lecomte wrote:
>> Syzkaller reported a KASAN slab-out-of-bounds write in
>> __bpf_get_stackid()
>> when copying stack trace data. The issue occurs when the perf trace
>>   contains more stack entries than the stack map bucket can hold,
>>   leading to an out-of-bounds write in the bucket's data array.
>>
>> Changes in v2:
>>   - Fixed max_depth names across get stack id
>>
>> Changes in v4:
>>   - Removed unnecessary empty line in __bpf_get_stackid
>>
>> Changes in v6:
>>   - Added back trace_len computation in __bpf_get_stackid
>>
>> Link to v6:
>> https://lore.kernel.org/all/20250903135348....@arnaud-lcm.com/
>>
>> Reported-by: syzbot+c9b724...@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
>> Fixes: ee2a098851bf ("bpf: Adjust BPF stack helper functions to
>> accommodate skip > 0")
>> Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
>> Acked-by: Yonghong Song <yongho...@linux.dev>
>
> For future patches, please keep the "Changes in vX.." at the end of
Good to know, thanks!
>
> your commit log and after a "---". IOW, something like
>
>
> Acked-by: Yonghong Song <yongho...@linux.dev>
>
> ---
>
> changes in v2:
>
> ...
>
> ---
>
> kernel/bpf/stackmap.c | 8 ++++++++
>
>
> In this way, the "changes in vXX" part will be removed by git-am.
>
>> ---
>>   kernel/bpf/stackmap.c | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
>> index 9f3ae426ddc3..29e05c9ff1bd 100644
>> --- a/kernel/bpf/stackmap.c
>> +++ b/kernel/bpf/stackmap.c
>> @@ -369,6 +369,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct
>> bpf_perf_event_data_kern *, ctx,
>>   {
>>       struct perf_event *event = ctx->event;
>>       struct perf_callchain_entry *trace;
>> +    u32 elem_size, max_depth;
>>       bool kernel, user;
>>       __u64 nr_kernel;
>>       int ret;
>> @@ -390,11 +391,15 @@ BPF_CALL_3(bpf_get_stackid_pe, struct
>> bpf_perf_event_data_kern *, ctx,
>>           return -EFAULT;
>>         nr_kernel = count_kernel_ip(trace);
>> +    elem_size = stack_map_data_size(map);
>>         if (kernel) {
>>           __u64 nr = trace->nr;
>>             trace->nr = nr_kernel;
>
> this trace->nr = is useless.
>
>> +        max_depth =
>> +            stack_map_calculate_max_depth(map->value_size,
>> elem_size, flags);
>> +        trace->nr = min_t(u32, nr_kernel, max_depth);
>>           ret = __bpf_get_stackid(map, trace, flags);
>>             /* restore nr */
>> @@ -407,6 +412,9 @@ BPF_CALL_3(bpf_get_stackid_pe, struct
>> bpf_perf_event_data_kern *, ctx,
>>               return -EFAULT;
>>             flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
>> +        max_depth =
>> +            stack_map_calculate_max_depth(map->value_size,
>> elem_size, flags);
>> +        trace->nr = min_t(u32, trace->nr, max_depth);
>>           ret = __bpf_get_stackid(map, trace, flags);
>
> I missed this part earlier. Here we need to restore trace->nr, just
> like we did in the "if (kernel)" branch.
>
Makes sense, thanks!
> Thanks,
>
> Song
>
>>       }
>>       return ret;
>
Thanks,
Arnaud
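
Song's two points above amount to a small restructuring of bpf_get_stackid_pe(). As a minimal sketch (the v8 patch later in this thread implements it this way), the idea is to snapshot trace->nr once, clamp it in whichever branch runs, and restore the original value after __bpf_get_stackid() returns so the shared perf callchain entry is not left truncated:

	__u64 nr = trace->nr;	/* save the original depth */

	if (kernel) {
		trace->nr = min_t(u32, nr_kernel, max_depth);
		ret = __bpf_get_stackid(map, trace, flags);
	} else {
		/* user branch: skip-field flag adjustment elided */
		trace->nr = min_t(u32, trace->nr, max_depth);
		ret = __bpf_get_stackid(map, trace, flags);
	}

	/* restore nr for any later users of the callchain entry */
	trace->nr = nr;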

Arnaud Lecomte

unread,
Sep 5, 2025, 9:46:46 AM (2 days ago) Sep 5
to alexei.st...@gmail.com, yongho...@linux.dev, so...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
From: Arnaud Lecomte <con...@arnaud-lcm.com>

Add a new helper function, stack_map_calculate_max_depth(), that
computes the maximum stack depth for a stackmap.

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
Acked-by: Yonghong Song <yongho...@linux.dev>
Acked-by: Song Liu <so...@kernel.org>
---
Changes in v2:
- Removed the 'map_size % map_elem_size' check from
stack_map_calculate_max_depth
- Changed the stack_map_calculate_max_depth parameter names to be more generic

Changes in v3:
- Renamed the map size parameter to 'size' in the max depth helper

Changes in v4:
- Fixed argument indentation in the max depth helper

Changes in v5:
- Bound back trace_nr to num_elem in __bpf_get_stack
- Make a copy of sysctl_perf_event_max_stack
in stack_map_calculate_max_depth

Changes in v6:
- Restricted the max_depth computation to only when it is required
- Additional cleanup from Song in __bpf_get_stack

Changes in v7:
- Removed additional cleanup from v6

Link to v7: https://lore.kernel.org/all/20250903233910....@arnaud-lcm.com/
---
kernel/bpf/stackmap.c | 38 +++++++++++++++++++++++++++-----------
1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..ed707bc07173 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);

@@ -406,8 +425,8 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
struct perf_callchain_entry *trace_in,
void *buf, u32 size, u64 flags, bool may_fault)
{
- u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
bool user_build_id = flags & BPF_F_USER_BUILD_ID;
+ u32 trace_nr, copy_len, elem_size, max_depth;
bool crosstask = task && task != current;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
bool user = flags & BPF_F_USER_STACK;
@@ -438,10 +457,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;
}

- num_elem = size / elem_size;
- max_depth = num_elem + skip;
- if (sysctl_perf_event_max_stack < max_depth)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(size, elem_size, flags);

if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */
--
2.47.3
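
The hunk that adds the helper itself is not shown above, but its shape can be inferred from the open-coded computation it replaces and from the call sites in the other hunks (the signature is taken from those callers; the local copy of sysctl_perf_event_max_stack matches the v5 changelog note). A minimal sketch, not the literal patch:

	static u32 stack_map_calculate_max_depth(u32 size, u32 elem_size, u64 flags)
	{
		u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
		u32 curr_sysctl_max_stack = sysctl_perf_event_max_stack;
		u32 max_depth;

		/* how many elements fit in the buffer, plus the skipped frames */
		max_depth = size / elem_size + skip;
		if (max_depth > curr_sysctl_max_stack)
			return curr_sysctl_max_stack;

		return max_depth;
	}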

Arnaud Lecomte

unread,
Sep 5, 2025, 9:47:51 AM (2 days ago) Sep 5
to alexei.st...@gmail.com, yongho...@linux.dev, so...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
From: Arnaud Lecomte <con...@arnaud-lcm.com>

Clean up bounds checking for trace->nr in
__bpf_get_stack by limiting it only to
max_depth.

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
Acked-by: Song Liu <so...@kernel.org>
Cc: Song Liu <so...@kernel.org>
---
kernel/bpf/stackmap.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index ed707bc07173..9f3ae426ddc3 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -462,13 +462,15 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */

- if (trace_in)
+ if (trace_in) {
trace = trace_in;
- else if (kernel && task)
+ trace->nr = min_t(u32, trace->nr, max_depth);
+ } else if (kernel && task) {
trace = get_callchain_entry_for_task(task, max_depth);
- else
+ } else {
trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
crosstask, false);
+ }

if (unlikely(!trace) || trace->nr < skip) {
if (may_fault)
@@ -477,7 +479,6 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
}

trace_nr = trace->nr - skip;
- trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
copy_len = trace_nr * elem_size;

ips = trace->ip + skip;
--
2.47.3
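
Read as a whole rather than as a diff, the resulting code path in __bpf_get_stack() looks roughly like this (assembled from the hunks above, with surrounding context elided):

	if (trace_in) {
		trace = trace_in;
		/* callchain supplied by the caller: clamp it here */
		trace->nr = min_t(u32, trace->nr, max_depth);
	} else if (kernel && task) {
		trace = get_callchain_entry_for_task(task, max_depth);
	} else {
		trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
					   crosstask, false);
	}

	/* NULL / too-short trace error handling unchanged, elided */

	trace_nr = trace->nr - skip;	/* already bounded via max_depth */
	copy_len = trace_nr * elem_size;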

Arnaud Lecomte

unread,
Sep 5, 2025, 9:48:42 AM (2 days ago) Sep 5
to alexei.st...@gmail.com, yongho...@linux.dev, so...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
From: Arnaud Lecomte <con...@arnaud-lcm.com>

Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.

Reported-by: syzbot+c9b724...@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
Fixes: ee2a098851bf ("bpf: Adjust BPF stack helper functions to accommodate skip > 0")
Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
Acked-by: Yonghong Song <yongho...@linux.dev>
---
Changes in v2:
- Fixed max_depth names across get stack id

Changes in v4:
- Removed unnecessary empty line in __bpf_get_stackid

Changes in v6:
- Added back trace_len computation in __bpf_get_stackid

Changes in v7:
- Removed the useless trace->nr assignment in bpf_get_stackid_pe
- Added restoration of trace->nr for both kernel and user traces
in bpf_get_stackid_pe

Link to v7: https://lore.kernel.org/all/20250903234325....@arnaud-lcm.com/
---
kernel/bpf/stackmap.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 9f3ae426ddc3..9b57b8307565 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -369,6 +369,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
{
struct perf_event *event = ctx->event;
struct perf_callchain_entry *trace;
+ u32 elem_size, max_depth;
bool kernel, user;
__u64 nr_kernel;
int ret;
@@ -390,15 +391,16 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

nr_kernel = count_kernel_ip(trace);
+ elem_size = stack_map_data_size(map);
+ __u64 nr = trace->nr; /* save original */

if (kernel) {
- __u64 nr = trace->nr;
-
trace->nr = nr_kernel;
+ max_depth =
+ stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ trace->nr = min_t(u32, nr_kernel, max_depth);
ret = __bpf_get_stackid(map, trace, flags);

- /* restore nr */
- trace->nr = nr;
} else { /* user */
u64 skip = flags & BPF_F_SKIP_FIELD_MASK;

@@ -407,8 +409,15 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
+ max_depth =
+ stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ trace->nr = min_t(u32, trace->nr, max_depth);
ret = __bpf_get_stackid(map, trace, flags);
}
+
+ /* restore nr */
+ trace->nr = nr;
+
return ret;
}

--
2.47.3
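
To see why the clamp closes the report, some illustrative numbers (not taken from the syzkaller reproducer): with value_size = 40, elem_size = sizeof(u64) = 8 and skip = 0, stack_map_calculate_max_depth() returns 40 / 8 + 0 = 5. A perf callchain with trace->nr = 16 previously made __bpf_get_stackid() copy 16 * 8 = 128 bytes into a 40-byte bucket; with trace->nr clamped to 5 first, the copy is at most 40 bytes and stays inside the bucket.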
