[PATCH v2] bpf: fix stackmap overflow check in __bpf_get_stackid()

Arnaud Lecomte

Jul 29, 2025, 12:56:42 PM
to so...@kernel.org, jo...@kernel.org, a...@kernel.org, dan...@iogearbox.net, and...@kernel.org, marti...@linux.dev, edd...@gmail.com, yongho...@linux.dev, john.fa...@gmail.com, kps...@kernel.org, s...@fomichev.me, hao...@google.com, b...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, syzbot+c9b724...@syzkaller.appspotmail.com, Arnaud Lecomte
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.
For build_id mode, we use sizeof(struct bpf_stack_build_id)
to determine capacity, and for normal mode we use sizeof(u64).
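
(For reference only, not part of the patch: the capacity arithmetic described
above works out roughly as follows. bucket_capacity() is a made-up name used
purely for illustration.)

/* Illustrative sketch: how many entries one stack map bucket can hold.
 * The per-entry size depends on whether the map stores build IDs.
 */
static u32 bucket_capacity(u32 value_size, bool build_id_mode)
{
	u32 elem_size = build_id_mode ? sizeof(struct bpf_stack_build_id)
				      : sizeof(u64);

	return value_size / elem_size;	/* trace_nr must not exceed this */
}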

Reported-by: syzbot+c9b724...@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
Tested-by: syzbot+c9b724...@syzkaller.appspotmail.com
Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
---
Changes in v2:
- Use the stack_map_data_size() utility to compute the stack map element size
---
kernel/bpf/stackmap.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..6f225d477f07 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -230,7 +230,7 @@ static long __bpf_get_stackid(struct bpf_map *map,
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
- u32 hash, id, trace_nr, trace_len, i;
+ u32 hash, id, trace_nr, trace_len, i, max_depth;
bool user = flags & BPF_F_USER_STACK;
u64 *ips;
bool hash_matches;
@@ -241,6 +241,12 @@ static long __bpf_get_stackid(struct bpf_map *map,

trace_nr = trace->nr - skip;
trace_len = trace_nr * sizeof(u64);
+
+ /* Clamp the trace to max allowed depth */
+ max_depth = smap->map.value_size / stack_map_data_size(map);
+ if (trace_nr > max_depth)
+ trace_nr = max_depth;
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
--
2.43.0

Yonghong Song

Jul 29, 2025, 6:45:15 PM
to Arnaud Lecomte, so...@kernel.org, jo...@kernel.org, a...@kernel.org, dan...@iogearbox.net, and...@kernel.org, marti...@linux.dev, edd...@gmail.com, john.fa...@gmail.com, kps...@kernel.org, s...@fomichev.me, hao...@google.com, b...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, syzbot+c9b724...@syzkaller.appspotmail.com


On 7/29/25 9:56 AM, Arnaud Lecomte wrote:
> Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
> when copying stack trace data. The issue occurs when the perf trace
> contains more stack entries than the stack map bucket can hold,
> leading to an out-of-bounds write in the bucket's data array.
> For build_id mode, we use sizeof(struct bpf_stack_build_id)
> to determine capacity, and for normal mode we use sizeof(u64).
>
> Reported-by: syzbot+c9b724...@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
> Tested-by: syzbot+c9b724...@syzkaller.appspotmail.com
> Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>

Could you add a selftest? This way folks can easily see what the problem is
and why this fix solves the issue correctly.

Arnaud Lecomte

Jul 30, 2025, 3:11:03 AM
to Yonghong Song, so...@kernel.org, jo...@kernel.org, a...@kernel.org, dan...@iogearbox.net, and...@kernel.org, marti...@linux.dev, edd...@gmail.com, john.fa...@gmail.com, kps...@kernel.org, s...@fomichev.me, hao...@google.com, b...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, syzbot+c9b724...@syzkaller.appspotmail.com
On 29/07/2025 23:45, Yonghong Song wrote:
>
>
> On 7/29/25 9:56 AM, Arnaud Lecomte wrote:
>> Syzkaller reported a KASAN slab-out-of-bounds write in
>> __bpf_get_stackid()
>> when copying stack trace data. The issue occurs when the perf trace
>>   contains more stack entries than the stack map bucket can hold,
>>   leading to an out-of-bounds write in the bucket's data array.
>> For build_id mode, we use sizeof(struct bpf_stack_build_id)
>>   to determine capacity, and for normal mode we use sizeof(u64).
>>
>> Reported-by: syzbot+c9b724...@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
>> Tested-by: syzbot+c9b724...@syzkaller.appspotmail.com
>> Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
>
> Could you add a selftest? This way folks can easily find out what is
> the problem and why this fix solves the issue correctly.
>
Sure, will be done after work
Thanks,
Arnaud

Lecomte, Arnaud

Aug 1, 2025, 2:16:56 PM
to Yonghong Song, so...@kernel.org, jo...@kernel.org, a...@kernel.org, dan...@iogearbox.net, and...@kernel.org, marti...@linux.dev, edd...@gmail.com, john.fa...@gmail.com, kps...@kernel.org, s...@fomichev.me, hao...@google.com, b...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, syzbot+c9b724...@syzkaller.appspotmail.com
Well, it turns out it is less straightforward than it looked to detect
the memory corruption without KASAN. I am currently on holiday for the
next 3 days, so I have limited access to a computer. I should be able to
sort this out on Monday.

Thanks,
Arnaud

Arnaud Lecomte

Aug 5, 2025, 4:49:54 PM
to Yonghong Song, so...@kernel.org, jo...@kernel.org, a...@kernel.org, dan...@iogearbox.net, and...@kernel.org, marti...@linux.dev, edd...@gmail.com, john.fa...@gmail.com, kps...@kernel.org, s...@fomichev.me, hao...@google.com, b...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, syzbot+c9b724...@syzkaller.appspotmail.com
Hi,
I gave it several tries and I can't find a nice way to do this properly.
The main challenge is finding a way to detect the memory corruption. I wanted
to place a canary value by tweaking the map size, but from a BPF program's
perspective we have no way to access the size of a stack_map_bucket. If we
do this computation manually instead, we end up with maintainability issues:
#include "vmlinux.h"
#include "bpf/bpf_helpers.h"

#define MAX_STACK_DEPTH 32
#define CANARY_VALUE 0xBADCAFE

/* Calculate size based on known layout:
 * - fnode: sizeof(void*)
 * - hash: 4 bytes
 * - nr: 4 bytes
 * - data: MAX_STACK_DEPTH * 8 bytes
 * - canary: 8 bytes
 */
#define VALUE_SIZE (sizeof(void*) + 4 + 4 + (MAX_STACK_DEPTH * 8) + 8)

struct {
    __uint(type, BPF_MAP_TYPE_STACK_TRACE);
    __uint(max_entries, 1);
    __uint(value_size, VALUE_SIZE);
    __uint(key_size, sizeof(u32));
} stackmap SEC(".maps");

static __attribute__((noinline)) void recursive_helper(int depth) {
    if (depth <= 0) return;
    asm volatile("" ::: "memory");
    recursive_helper(depth - 1);
}

SEC("kprobe/do_sys_open")
int test_stack_overflow(void *ctx) {
    u32 key = 0;
    u64 *stack = bpf_map_lookup_elem(&stackmap, &key);
    if (!stack) return 0;

    stack[MAX_STACK_DEPTH] = CANARY_VALUE;

    /* Force minimum stack depth */
    recursive_helper(MAX_STACK_DEPTH + 10);

    (void)bpf_get_stackid(ctx, &stackmap, 0);
    return 0;
}

char _license[] SEC("license") = "GPL";

Yonghong Song

Aug 5, 2025, 9:52:23 PM
to Arnaud Lecomte, so...@kernel.org, jo...@kernel.org, a...@kernel.org, dan...@iogearbox.net, and...@kernel.org, marti...@linux.dev, edd...@gmail.com, john.fa...@gmail.com, kps...@kernel.org, s...@fomichev.me, hao...@google.com, b...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, syzbot+c9b724...@syzkaller.appspotmail.com
It looks like it is hard to trigger the memory corruption inside the kernel.
Maybe KASAN can detect it for your specific example.

If going without a selftest, you can do the following:
__bpf_get_stack() already solved the problem you are trying to fix.
I suggest you refactor the relevant portion of the code in __bpf_get_stack()
to set trace_nr properly, and then use that refactored function
in __bpf_get_stackid(). So two patches:
1. refactor the portion of code (related to elem_size/trace_nr) in __bpf_get_stack().
2. fix the issue in __bpf_get_stackid() with the newly created function.
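
Roughly, the shared helper could be shaped like this (a sketch only; the
naming and details are settled by the actual patches later in the thread):

/* Sketch: compute the clamped stack depth once, for use by both
 * __bpf_get_stack() and __bpf_get_stackid().
 */
static u32 stack_map_calculate_max_depth(u32 size, u32 elem_size, u64 flags)
{
	u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
	u32 max_depth = size / elem_size + skip;

	return min_t(u32, max_depth, sysctl_perf_event_max_stack);
}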

Arnaud Lecomte

Aug 7, 2025, 1:50:53 PM
to yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, con...@arnaud-lcm.com, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
Add a new helper function stack_map_calculate_max_depth() that
computes the max depth for a stackmap.

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 38 ++++++++++++++++++++++++++++++--------
1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..14e034045310 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -42,6 +42,31 @@ static inline int stack_map_data_size(struct bpf_map *map)
sizeof(struct bpf_stack_build_id) : sizeof(u64);
}

+/**
+ * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
+ * @map_size: Size of the buffer/map value in bytes
+ * @elem_size: Size of each stack trace element
+ * @map_flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)
+ *
+ * Return: Maximum number of stack trace entries that can be safely stored,
+ * or -EINVAL if size is not a multiple of elem_size
+ */
+static u32 stack_map_calculate_max_depth(u32 map_size, u32 map_elem_size, u64 map_flags)
+{
+ u32 max_depth;
+ u32 skip = map_flags & BPF_F_SKIP_FIELD_MASK;
+
+ if (unlikely(map_size%map_elem_size))
+ return -EINVAL;
+
+ max_depth = map_size / map_elem_size;
+ max_depth += skip;
+ if (max_depth > sysctl_perf_event_max_stack)
+ return sysctl_perf_event_max_stack;
+
+ return max_depth;
+}
+
static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
{
u64 elem_size = sizeof(struct stack_map_bucket) +
@@ -406,7 +431,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
struct perf_callchain_entry *trace_in,
void *buf, u32 size, u64 flags, bool may_fault)
{
- u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
+ u32 trace_nr, copy_len, elem_size, max_depth;
bool user_build_id = flags & BPF_F_USER_BUILD_ID;
bool crosstask = task && task != current;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
@@ -423,8 +448,6 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;

elem_size = user_build_id ? sizeof(struct bpf_stack_build_id) : sizeof(u64);
- if (unlikely(size % elem_size))
- goto clear;

/* cannot get valid user stack for task without user_mode regs */
if (task && user && !user_mode(regs))
@@ -438,10 +461,9 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;
}

- num_elem = size / elem_size;
- max_depth = num_elem + skip;
- if (sysctl_perf_event_max_stack < max_depth)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(size, elem_size, flags);
+ if (max_depth < 0)
+ goto err_fault;

if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */
@@ -461,7 +483,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
}

trace_nr = trace->nr - skip;
- trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
+ trace_nr = min(trace_nr, max_depth - skip);
copy_len = trace_nr * elem_size;

ips = trace->ip + skip;
--
2.43.0

Arnaud Lecomte

Aug 7, 2025, 1:53:09 PM
to yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 26 +++++++++++++++-----------
1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 14e034045310..d7ef840971f0 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -250,7 +250,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 max_depth)
}

static long __bpf_get_stackid(struct bpf_map *map,
- struct perf_callchain_entry *trace, u64 flags)
+ struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
{
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
@@ -266,6 +266,8 @@ static long __bpf_get_stackid(struct bpf_map *map,

trace_nr = trace->nr - skip;
trace_len = trace_nr * sizeof(u64);
+ trace_nr = min(trace_nr, max_depth - skip);
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
@@ -325,19 +327,19 @@ static long __bpf_get_stackid(struct bpf_map *map,
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
- u32 max_depth = map->value_size / stack_map_data_size(map);
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;

if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;

- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ if (max_depth < 0)
+ return -EFAULT;

trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);
@@ -346,7 +348,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
/* couldn't fetch the stack trace */
return -EFAULT;

- return __bpf_get_stackid(map, trace, flags);
+ return __bpf_get_stackid(map, trace, flags, max_depth);
}

const struct bpf_func_proto bpf_get_stackid_proto = {
@@ -378,6 +380,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
bool kernel, user;
__u64 nr_kernel;
int ret;
+ u32 elem_size, pe_max_depth;

/* perf_sample_data doesn't have callchain, use bpf_get_stackid */
if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
@@ -396,24 +399,25 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

nr_kernel = count_kernel_ip(trace);
-
+ elem_size = stack_map_data_size(map);
if (kernel) {
__u64 nr = trace->nr;

trace->nr = nr_kernel;
- ret = __bpf_get_stackid(map, trace, flags);
+ pe_max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, pe_max_depth);

/* restore nr */
trace->nr = nr;
} else { /* user */
u64 skip = flags & BPF_F_SKIP_FIELD_MASK;
-
skip += nr_kernel;
if (skip > BPF_F_SKIP_FIELD_MASK)
return -EFAULT;

flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
- ret = __bpf_get_stackid(map, trace, flags);
+ pe_max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, pe_max_depth);
}
return ret;
}
--
2.43.0

Yonghong Song

Aug 7, 2025, 3:02:21 PM
to Arnaud Lecomte, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com


On 8/7/25 10:50 AM, Arnaud Lecomte wrote:
> A new helper function stack_map_calculate_max_depth() that
> computes the max depth for a stackmap.
>
> Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
> ---
> kernel/bpf/stackmap.c | 38 ++++++++++++++++++++++++++++++--------
> 1 file changed, 30 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 3615c06b7dfa..14e034045310 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -42,6 +42,31 @@ static inline int stack_map_data_size(struct bpf_map *map)
> sizeof(struct bpf_stack_build_id) : sizeof(u64);
> }
>
> +/**
> + * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
> + * @map_size: Size of the buffer/map value in bytes
> + * @elem_size: Size of each stack trace element
> + * @map_flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)
> + *
> + * Return: Maximum number of stack trace entries that can be safely stored,
> + * or -EINVAL if size is not a multiple of elem_size

-EINVAL is not needed here. See below.

> + */
> +static u32 stack_map_calculate_max_depth(u32 map_size, u32 map_elem_size, u64 map_flags)

map_elem_size -> elem_size

> +{
> + u32 max_depth;
> + u32 skip = map_flags & BPF_F_SKIP_FIELD_MASK;

reverse Christmas tree?

> +
> + if (unlikely(map_size%map_elem_size))
> + return -EINVAL;

The above should not be here. The 'map_size % map_elem_size' check is only needed
for bpf_get_stack(); it is not applicable to bpf_get_stackid().

> +
> + max_depth = map_size / map_elem_size;
> + max_depth += skip;
> + if (max_depth > sysctl_perf_event_max_stack)
> + return sysctl_perf_event_max_stack;
> +
> + return max_depth;
> +}
> +
> static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
> {
> u64 elem_size = sizeof(struct stack_map_bucket) +
> @@ -406,7 +431,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> struct perf_callchain_entry *trace_in,
> void *buf, u32 size, u64 flags, bool may_fault)
> {
> - u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
> + u32 trace_nr, copy_len, elem_size, max_depth;
> bool user_build_id = flags & BPF_F_USER_BUILD_ID;
> bool crosstask = task && task != current;
> u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
> @@ -423,8 +448,6 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> goto clear;
>
> elem_size = user_build_id ? sizeof(struct bpf_stack_build_id) : sizeof(u64);
> - if (unlikely(size % elem_size))
> - goto clear;

Please keep this one.

>
> /* cannot get valid user stack for task without user_mode regs */
> if (task && user && !user_mode(regs))
> @@ -438,10 +461,9 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> goto clear;
> }
>
> - num_elem = size / elem_size;
> - max_depth = num_elem + skip;
> - if (sysctl_perf_event_max_stack < max_depth)
> - max_depth = sysctl_perf_event_max_stack;
> + max_depth = stack_map_calculate_max_depth(size, elem_size, flags);
> + if (max_depth < 0)
> + goto err_fault;

max_depth is a u32, so it is never less than 0; this check has no effect.

Yonghong Song

Aug 7, 2025, 3:05:33 PM
to Arnaud Lecomte, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
the above condition is not needed.

>
> trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
> false, false);
> @@ -346,7 +348,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
> /* couldn't fetch the stack trace */
> return -EFAULT;
>
> - return __bpf_get_stackid(map, trace, flags);
> + return __bpf_get_stackid(map, trace, flags, max_depth);
> }
>
> const struct bpf_func_proto bpf_get_stackid_proto = {
> @@ -378,6 +380,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
> bool kernel, user;
> __u64 nr_kernel;
> int ret;
> + u32 elem_size, pe_max_depth;

pe_max_depth -> max_depth.

>
> /* perf_sample_data doesn't have callchain, use bpf_get_stackid */
> if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
> @@ -396,24 +399,25 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
> return -EFAULT;
>
> nr_kernel = count_kernel_ip(trace);
> -
> + elem_size = stack_map_data_size(map);
> if (kernel) {
> __u64 nr = trace->nr;
>
> trace->nr = nr_kernel;
> - ret = __bpf_get_stackid(map, trace, flags);
> + pe_max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
> + ret = __bpf_get_stackid(map, trace, flags, pe_max_depth);
>
> /* restore nr */
> trace->nr = nr;
> } else { /* user */
> u64 skip = flags & BPF_F_SKIP_FIELD_MASK;
> -

please keep an empty line here.

Yonghong Song

Aug 7, 2025, 3:07:42 PM
to Arnaud Lecomte, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com


On 8/7/25 12:01 PM, Yonghong Song wrote:
>
>
> On 8/7/25 10:50 AM, Arnaud Lecomte wrote:
>> A new helper function stack_map_calculate_max_depth() that
>> computes the max depth for a stackmap.
>>
>> Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
>> ---
>>   kernel/bpf/stackmap.c | 38 ++++++++++++++++++++++++++++++--------
>>   1 file changed, 30 insertions(+), 8 deletions(-)
>>
>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
>> index 3615c06b7dfa..14e034045310 100644
>> --- a/kernel/bpf/stackmap.c
>> +++ b/kernel/bpf/stackmap.c
>> @@ -42,6 +42,31 @@ static inline int stack_map_data_size(struct
>> bpf_map *map)
>>           sizeof(struct bpf_stack_build_id) : sizeof(u64);
>>   }
>>   +/**
>> + * stack_map_calculate_max_depth - Calculate maximum allowed stack
>> trace depth
>> + * @map_size:        Size of the buffer/map value in bytes
>> + * @elem_size:       Size of each stack trace element
>> + * @map_flags:       BPF stack trace flags (BPF_F_USER_STACK,
>> BPF_F_USER_BUILD_ID, ...)

One more thing: map_flags -> flags, as 'flags' is used in bpf_get_stackid()/bpf_get_stack(), etc.

>> + *
>> + * Return: Maximum number of stack trace entries that can be safely
>> stored,
>> + * or -EINVAL if size is not a multiple of elem_size
>
> -EINVAL is not needed here. See below.

[...]

syzbot ci

Aug 8, 2025, 3:30:16 AM
to and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, con...@arnaud-lcm.com, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syz...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, yongho...@linux.dev, syz...@lists.linux.dev, syzkall...@googlegroups.com
syzbot ci has tested the following series

[v1] bpf: refactor max_depth computation in bpf_get_stack()
https://lore.kernel.org/all/20250807175032...@arnaud-lcm.com
* [PATCH 1/2] bpf: refactor max_depth computation in bpf_get_stack()
* [PATCH 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid()

and found the following issues:
* KASAN: stack-out-of-bounds Write in __bpf_get_stack
* PANIC: double fault in its_return_thunk

Full report is available here:
https://ci.syzbot.org/series/2af1b227-99e3-4e64-ac23-827848a4b8a5

***

KASAN: stack-out-of-bounds Write in __bpf_get_stack

tree: bpf-next
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/bpf/bpf-next.git
base: f3af62b6cee8af9f07012051874af2d2a451f0e5
arch: amd64
compiler: Debian clang version 20.1.7 (++20250616065708+6146a88f6049-1~exp1~20250616065826.132), Debian LLD 20.1.7
config: https://ci.syzbot.org/builds/5e5c6698-7b84-4bf2-a1ee-1b6223c8d4c3/config
C repro: https://ci.syzbot.org/findings/1355d710-d133-43fd-9061-18b2de6844a4/c_repro
syz repro: https://ci.syzbot.org/findings/1355d710-d133-43fd-9061-18b2de6844a4/syz_repro

netdevsim netdevsim1 netdevsim0: renamed from eth0
netdevsim netdevsim1 netdevsim1: renamed from eth1
==================================================================
BUG: KASAN: stack-out-of-bounds in __bpf_get_stack+0x54a/0xa70 kernel/bpf/stackmap.c:501
Write of size 208 at addr ffffc90003655ee8 by task syz-executor/5952

CPU: 1 UID: 0 PID: 5952 Comm: syz-executor Not tainted 6.16.0-syzkaller-11113-gf3af62b6cee8-dirty #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xca/0x240 mm/kasan/report.c:482
kasan_report+0x118/0x150 mm/kasan/report.c:595
check_region_inline mm/kasan/generic.c:-1 [inline]
kasan_check_range+0x2b0/0x2c0 mm/kasan/generic.c:189
__asan_memcpy+0x40/0x70 mm/kasan/shadow.c:106
__bpf_get_stack+0x54a/0xa70 kernel/bpf/stackmap.c:501
____bpf_get_stack kernel/bpf/stackmap.c:525 [inline]
bpf_get_stack+0x33/0x50 kernel/bpf/stackmap.c:522
____bpf_get_stack_raw_tp kernel/trace/bpf_trace.c:1835 [inline]
bpf_get_stack_raw_tp+0x1a9/0x220 kernel/trace/bpf_trace.c:1825
bpf_prog_4e330ebee64cb698+0x43/0x4b
bpf_dispatcher_nop_func include/linux/bpf.h:1332 [inline]
__bpf_prog_run include/linux/filter.h:718 [inline]
bpf_prog_run include/linux/filter.h:725 [inline]
__bpf_trace_run kernel/trace/bpf_trace.c:2257 [inline]
bpf_trace_run10+0x2e4/0x500 kernel/trace/bpf_trace.c:2306
__bpf_trace_percpu_alloc_percpu+0x364/0x400 include/trace/events/percpu.h:11
__do_trace_percpu_alloc_percpu include/trace/events/percpu.h:11 [inline]
trace_percpu_alloc_percpu include/trace/events/percpu.h:11 [inline]
pcpu_alloc_noprof+0x1534/0x16b0 mm/percpu.c:1892
fib_nh_common_init+0x9c/0x3b0 net/ipv4/fib_semantics.c:620
fib6_nh_init+0x1608/0x1ff0 net/ipv6/route.c:3671
ip6_route_info_create_nh+0x16a/0xab0 net/ipv6/route.c:3892
ip6_route_add+0x6e/0x1b0 net/ipv6/route.c:3944
addrconf_add_mroute net/ipv6/addrconf.c:2552 [inline]
addrconf_add_dev+0x24f/0x340 net/ipv6/addrconf.c:2570
addrconf_dev_config net/ipv6/addrconf.c:3479 [inline]
addrconf_init_auto_addrs+0x57c/0xa30 net/ipv6/addrconf.c:3567
addrconf_notify+0xacc/0x1010 net/ipv6/addrconf.c:3740
notifier_call_chain+0x1b6/0x3e0 kernel/notifier.c:85
call_netdevice_notifiers_extack net/core/dev.c:2267 [inline]
call_netdevice_notifiers net/core/dev.c:2281 [inline]
__dev_notify_flags+0x18d/0x2e0 net/core/dev.c:-1
netif_change_flags+0xe8/0x1a0 net/core/dev.c:9608
do_setlink+0xc55/0x41c0 net/core/rtnetlink.c:3143
rtnl_changelink net/core/rtnetlink.c:3761 [inline]
__rtnl_newlink net/core/rtnetlink.c:3920 [inline]
rtnl_newlink+0x160b/0x1c70 net/core/rtnetlink.c:4057
rtnetlink_rcv_msg+0x7cf/0xb70 net/core/rtnetlink.c:6946
netlink_rcv_skb+0x208/0x470 net/netlink/af_netlink.c:2552
netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline]
netlink_unicast+0x82f/0x9e0 net/netlink/af_netlink.c:1346
netlink_sendmsg+0x805/0xb30 net/netlink/af_netlink.c:1896
sock_sendmsg_nosec net/socket.c:714 [inline]
__sock_sendmsg+0x21c/0x270 net/socket.c:729
__sys_sendto+0x3bd/0x520 net/socket.c:2228
__do_sys_sendto net/socket.c:2235 [inline]
__se_sys_sendto net/socket.c:2231 [inline]
__x64_sys_sendto+0xde/0x100 net/socket.c:2231
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fec5c790a7c
Code: 2a 5f 02 00 44 8b 4c 24 2c 4c 8b 44 24 20 89 c5 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 34 89 ef 48 89 44 24 08 e8 70 5f 02 00 48 8b
RSP: 002b:00007fff7b55f7b0 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007fec5d4e35c0 RCX: 00007fec5c790a7c
RDX: 0000000000000030 RSI: 00007fec5d4e3610 RDI: 0000000000000006
RBP: 0000000000000000 R08: 00007fff7b55f804 R09: 000000000000000c
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000006
R13: 0000000000000000 R14: 00007fec5d4e3610 R15: 0000000000000000
</TASK>

The buggy address belongs to stack of task syz-executor/5952
and is located at offset 296 in frame:
__bpf_get_stack+0x0/0xa70 include/linux/mmap_lock.h:-1

This frame has 1 object:
[32, 36) 'rctx.i'

The buggy address belongs to a 8-page vmalloc region starting at 0xffffc90003650000 allocated at copy_process+0x54b/0x3c00 kernel/fork.c:2002
The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888024c63200 pfn:0x24c62
flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000000000 0000000000000000 dead000000000122 0000000000000000
raw: ffff888024c63200 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x2dc2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO|__GFP_NOWARN), pid 5845, tgid 5845 (syz-executor), ts 59049058263, free_ts 59031992240
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x240/0x2a0 mm/page_alloc.c:1851
prep_new_page mm/page_alloc.c:1859 [inline]
get_page_from_freelist+0x21e4/0x22c0 mm/page_alloc.c:3858
__alloc_frozen_pages_noprof+0x181/0x370 mm/page_alloc.c:5148
alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2416
alloc_frozen_pages_noprof mm/mempolicy.c:2487 [inline]
alloc_pages_noprof+0xa9/0x190 mm/mempolicy.c:2507
vm_area_alloc_pages mm/vmalloc.c:3642 [inline]
__vmalloc_area_node mm/vmalloc.c:3720 [inline]
__vmalloc_node_range_noprof+0x97d/0x12f0 mm/vmalloc.c:3893
__vmalloc_node_noprof+0xc2/0x110 mm/vmalloc.c:3956
alloc_thread_stack_node kernel/fork.c:318 [inline]
dup_task_struct+0x3e7/0x860 kernel/fork.c:879
copy_process+0x54b/0x3c00 kernel/fork.c:2002
kernel_clone+0x21e/0x840 kernel/fork.c:2603
__do_sys_clone3 kernel/fork.c:2907 [inline]
__se_sys_clone3+0x256/0x2d0 kernel/fork.c:2886
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
page last free pid 5907 tgid 5907 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
free_pages_prepare mm/page_alloc.c:1395 [inline]
__free_frozen_pages+0xbc4/0xd30 mm/page_alloc.c:2895
vfree+0x25a/0x400 mm/vmalloc.c:3434
kcov_put kernel/kcov.c:439 [inline]
kcov_close+0x28/0x50 kernel/kcov.c:535
__fput+0x44c/0xa70 fs/file_table.c:468
task_work_run+0x1d4/0x260 kernel/task_work.c:227
exit_task_work include/linux/task_work.h:40 [inline]
do_exit+0x6b5/0x2300 kernel/exit.c:966
do_group_exit+0x21c/0x2d0 kernel/exit.c:1107
get_signal+0x1286/0x1340 kernel/signal.c:3034
arch_do_signal_or_restart+0x9a/0x750 arch/x86/kernel/signal.c:337
exit_to_user_mode_loop+0x75/0x110 kernel/entry/common.c:40
exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline]
syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline]
syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline]
do_syscall_64+0x2bd/0x3b0 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f

Memory state around the buggy address:
ffffc90003655e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffc90003655e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffffc90003655f00: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 f2 f2
^
ffffc90003655f80: 00 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3
ffffc90003656000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================


***

PANIC: double fault in its_return_thunk

tree: bpf-next
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/bpf/bpf-next.git
base: f3af62b6cee8af9f07012051874af2d2a451f0e5
arch: amd64
compiler: Debian clang version 20.1.7 (++20250616065708+6146a88f6049-1~exp1~20250616065826.132), Debian LLD 20.1.7
config: https://ci.syzbot.org/builds/5e5c6698-7b84-4bf2-a1ee-1b6223c8d4c3/config
C repro: https://ci.syzbot.org/findings/1bf5dce6-467f-4bcd-9357-2726101d2ad1/c_repro
syz repro: https://ci.syzbot.org/findings/1bf5dce6-467f-4bcd-9357-2726101d2ad1/syz_repro

traps: PANIC: double fault, error_code: 0x0
Oops: double fault: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 5789 Comm: syz-executor930 Not tainted 6.16.0-syzkaller-11113-gf3af62b6cee8-dirty #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:its_return_thunk+0x0/0x10 arch/x86/lib/retpoline.S:412
Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc <c3> cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 e9 6b 2b b9 f5 cc
RSP: 0018:ffffffffa0000877 EFLAGS: 00010246
RAX: 2161df6de464b300 RBX: 4800be48c0315641 RCX: 2161df6de464b300
RDX: 0000000000000000 RSI: ffffffff8dba01ee RDI: ffff888105cc9cc0
RBP: eb7a3aa9e9c95e41 R08: ffffffff81000130 R09: ffffffff81000130
R10: ffffffff81d017ac R11: ffffffff8b7707da R12: 3145ffff888028c3
R13: ee8948f875894cf6 R14: 000002baf8c68348 R15: e1cb3861e8c93100
FS: 0000555557cbc380(0000) GS:ffff8880b862a000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa0000868 CR3: 0000000028468000 CR4: 00000000000006f0
Call Trace:
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:its_return_thunk+0x0/0x10 arch/x86/lib/retpoline.S:412
Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc <c3> cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 e9 6b 2b b9 f5 cc
RSP: 0018:ffffffffa0000877 EFLAGS: 00010246
RAX: 2161df6de464b300 RBX: 4800be48c0315641 RCX: 2161df6de464b300
RDX: 0000000000000000 RSI: ffffffff8dba01ee RDI: ffff888105cc9cc0
RBP: eb7a3aa9e9c95e41 R08: ffffffff81000130 R09: ffffffff81000130
R10: ffffffff81d017ac R11: ffffffff8b7707da R12: 3145ffff888028c3
R13: ee8948f875894cf6 R14: 000002baf8c68348 R15: e1cb3861e8c93100
FS: 0000555557cbc380(0000) GS:ffff8880b862a000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa0000868 CR3: 0000000028468000 CR4: 00000000000006f0
----------------
Code disassembly (best guess):
0: cc int3
1: cc int3
2: cc int3
3: cc int3
4: cc int3
5: cc int3
6: cc int3
7: cc int3
8: cc int3
9: cc int3
a: cc int3
b: cc int3
c: cc int3
d: cc int3
e: cc int3
f: cc int3
10: cc int3
11: cc int3
12: cc int3
13: cc int3
14: cc int3
15: cc int3
16: cc int3
17: cc int3
18: cc int3
19: cc int3
1a: cc int3
1b: cc int3
1c: cc int3
1d: cc int3
1e: cc int3
1f: cc int3
20: cc int3
21: cc int3
22: cc int3
23: cc int3
24: cc int3
25: cc int3
26: cc int3
27: cc int3
28: cc int3
29: cc int3
* 2a: c3 ret <-- trapping instruction
2b: cc int3
2c: 90 nop
2d: 90 nop
2e: 90 nop
2f: 90 nop
30: 90 nop
31: 90 nop
32: 90 nop
33: 90 nop
34: 90 nop
35: 90 nop
36: 90 nop
37: 90 nop
38: 90 nop
39: 90 nop
3a: e9 6b 2b b9 f5 jmp 0xf5b92baa
3f: cc int3


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syz...@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzk...@googlegroups.com.

Arnaud Lecomte

Aug 9, 2025, 7:56:49 AM
to yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, con...@arnaud-lcm.com, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
Add a new helper function stack_map_calculate_max_depth() that
computes the max depth for a stackmap.

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..532447606532 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -42,6 +42,27 @@ static inline int stack_map_data_size(struct bpf_map *map)
sizeof(struct bpf_stack_build_id) : sizeof(u64);
}

+/**
+ * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
+ * @map_size: Size of the buffer/map value in bytes
+ * @elem_size: Size of each stack trace element
+ * @flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)
+ *
+ * Return: Maximum number of stack trace entries that can be safely stored
+ */
+static u32 stack_map_calculate_max_depth(u32 map_size, u32 elem_size, u64 flags)
+{
+ u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 max_depth;
+
+ max_depth = map_size / elem_size;
+ max_depth += skip;
+ if (max_depth > sysctl_perf_event_max_stack)
+ return sysctl_perf_event_max_stack;
+
+ return max_depth;
+}
+
static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
{
u64 elem_size = sizeof(struct stack_map_bucket) +
@@ -406,7 +427,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
struct perf_callchain_entry *trace_in,
void *buf, u32 size, u64 flags, bool may_fault)
{
- u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
+ u32 trace_nr, copy_len, elem_size, max_depth;
bool user_build_id = flags & BPF_F_USER_BUILD_ID;
bool crosstask = task && task != current;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
@@ -438,10 +459,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;
}

- num_elem = size / elem_size;
- max_depth = num_elem + skip;
- if (sysctl_perf_event_max_stack < max_depth)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(size, elem_size, flags);

if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */
@@ -461,7 +479,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
}

trace_nr = trace->nr - skip;
- trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
+ trace_nr = min(trace_nr, max_depth - skip);
copy_len = trace_nr * elem_size;

ips = trace->ip + skip;
--
2.43.0

Arnaud Lecomte

Aug 9, 2025, 7:58:44 AM
to yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.

Reported-by: syzbot+c9b724...@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 26 +++++++++++++++-----------
1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 532447606532..30c4f7f2ccd1 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -246,7 +246,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 max_depth)
}

static long __bpf_get_stackid(struct bpf_map *map,
- struct perf_callchain_entry *trace, u64 flags)
+ struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
{
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
@@ -262,6 +262,8 @@ static long __bpf_get_stackid(struct bpf_map *map,

trace_nr = trace->nr - skip;
trace_len = trace_nr * sizeof(u64);
+ trace_nr = min(trace_nr, max_depth - skip);
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
@@ -321,19 +323,19 @@ static long __bpf_get_stackid(struct bpf_map *map,
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
- u32 max_depth = map->value_size / stack_map_data_size(map);
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;

if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;

- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ if (max_depth < 0)
+ return -EFAULT;

trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);
@@ -342,7 +344,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
/* couldn't fetch the stack trace */
return -EFAULT;

- return __bpf_get_stackid(map, trace, flags);
+ return __bpf_get_stackid(map, trace, flags, max_depth);
}

const struct bpf_func_proto bpf_get_stackid_proto = {
@@ -374,6 +376,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
bool kernel, user;
__u64 nr_kernel;
int ret;
+ u32 elem_size, pe_max_depth;

/* perf_sample_data doesn't have callchain, use bpf_get_stackid */
if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
@@ -392,24 +395,25 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

nr_kernel = count_kernel_ip(trace);
-
+ elem_size = stack_map_data_size(map);
if (kernel) {
__u64 nr = trace->nr;

trace->nr = nr_kernel;
- ret = __bpf_get_stackid(map, trace, flags);
+ pe_max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, pe_max_depth);

/* restore nr */
trace->nr = nr;
} else { /* user */
u64 skip = flags & BPF_F_SKIP_FIELD_MASK;
-
skip += nr_kernel;
if (skip > BPF_F_SKIP_FIELD_MASK)
return -EFAULT;

flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
- ret = __bpf_get_stackid(map, trace, flags);
+ pe_max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, pe_max_depth);
}
return ret;
}
--
2.43.0

Arnaud Lecomte

Aug 9, 2025, 8:09:34 AM
to yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
Add a new helper function stack_map_calculate_max_depth() that
computes the max depth for a stackmap.

Changes in v2:
- Removed the checking 'map_size % map_elem_size' from stack_map_calculate_max_depth
- Changed stack_map_calculate_max_depth params name to be more generic

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..532447606532 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
@@ -438,10 +459,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;
}

- num_elem = size / elem_size;
- max_depth = num_elem + skip;
- if (sysctl_perf_event_max_stack < max_depth)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(size, elem_size, flags);

if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */
@@ -461,7 +479,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
}

trace_nr = trace->nr - skip;
- trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
+ trace_nr = min(trace_nr, max_depth - skip);
copy_len = trace_nr * elem_size;

ips = trace->ip + skip;
--
2.43.0

Arnaud Lecomte

Aug 9, 2025, 8:14:20 AM
to yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.

Changes in v2:
- Fixed max_depth variable names across the bpf_get_stackid() paths

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 532447606532..b3995724776c 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -246,7 +246,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 max_depth)
}

static long __bpf_get_stackid(struct bpf_map *map,
- struct perf_callchain_entry *trace, u64 flags)
+ struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
{
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
@@ -262,6 +262,8 @@ static long __bpf_get_stackid(struct bpf_map *map,

trace_nr = trace->nr - skip;
trace_len = trace_nr * sizeof(u64);
+ trace_nr = min(trace_nr, max_depth - skip);
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
@@ -321,19 +323,17 @@ static long __bpf_get_stackid(struct bpf_map *map,
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
- u32 max_depth = map->value_size / stack_map_data_size(map);
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;

if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;

- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);

trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);
@@ -342,7 +342,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
/* couldn't fetch the stack trace */
return -EFAULT;

- return __bpf_get_stackid(map, trace, flags);
+ return __bpf_get_stackid(map, trace, flags, max_depth);
}

const struct bpf_func_proto bpf_get_stackid_proto = {
@@ -374,6 +374,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
bool kernel, user;
__u64 nr_kernel;
int ret;
+ u32 elem_size, max_depth;

/* perf_sample_data doesn't have callchain, use bpf_get_stackid */
if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
@@ -392,16 +393,18 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

nr_kernel = count_kernel_ip(trace);
-
+ elem_size = stack_map_data_size(map);
if (kernel) {
__u64 nr = trace->nr;

trace->nr = nr_kernel;
- ret = __bpf_get_stackid(map, trace, flags);
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, max_depth);

/* restore nr */
trace->nr = nr;
} else { /* user */
+
u64 skip = flags & BPF_F_SKIP_FIELD_MASK;

skip += nr_kernel;
@@ -409,7 +412,8 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
- ret = __bpf_get_stackid(map, trace, flags);
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, max_depth);
}
return ret;
}
--
2.43.0

Yonghong Song

Aug 12, 2025, 12:40:05 AM
to Arnaud Lecomte, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com


On 8/9/25 5:09 AM, Arnaud Lecomte wrote:
> A new helper function stack_map_calculate_max_depth() that
> computes the max depth for a stackmap.

Please add 'bpf-next' in the subject like [PATCH bpf-next v2 1/2]
so CI can properly test the patch set.

>
> Changes in v2:
> - Removed the checking 'map_size % map_elem_size' from stack_map_calculate_max_depth
> - Changed stack_map_calculate_max_depth params name to be more generic
>
> Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
> ---
> kernel/bpf/stackmap.c | 30 ++++++++++++++++++++++++------
> 1 file changed, 24 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 3615c06b7dfa..532447606532 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -42,6 +42,27 @@ static inline int stack_map_data_size(struct bpf_map *map)
> sizeof(struct bpf_stack_build_id) : sizeof(u64);
> }
>
> +/**
> + * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
> + * @map_size: Size of the buffer/map value in bytes

let us rename 'map_size' to 'size' since the size represents size of
buffer or map, not just for map.

> + * @elem_size: Size of each stack trace element
> + * @flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)
> + *
> + * Return: Maximum number of stack trace entries that can be safely stored
> + */
> +static u32 stack_map_calculate_max_depth(u32 map_size, u32 elem_size, u64 flags)

map_size -> size
Also, you can replace 'flags' with 'skip', so the 'u32 skip = flags & BPF_F_SKIP_FIELD_MASK'
below is not necessary.
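
For example (sketch only, not the final patch), the signature would become:

static u32 stack_map_calculate_max_depth(u32 size, u32 elem_size, u32 skip)
{
	u32 max_depth = size / elem_size + skip;

	return min_t(u32, max_depth, sysctl_perf_event_max_stack);
}

/* and a caller would pass the already-masked value, e.g.:
 *	max_depth = stack_map_calculate_max_depth(map->value_size, elem_size,
 *						   flags & BPF_F_SKIP_FIELD_MASK);
 */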

Arnaud Lecomte

Aug 12, 2025, 3:30:49 PM
to yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, con...@arnaud-lcm.com, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
Add a new helper function stack_map_calculate_max_depth() that
computes the max depth for a stackmap.

Changes in v2:
- Removed the checking 'map_size % map_elem_size' from
stack_map_calculate_max_depth
- Changed stack_map_calculate_max_depth params name to be more generic

Changes in v3:
- Changed map size param to size in max depth helper

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..a267567e36dd 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -42,6 +42,27 @@ static inline int stack_map_data_size(struct bpf_map *map)
sizeof(struct bpf_stack_build_id) : sizeof(u64);
}

+/**
+ * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
+ * @size: Size of the buffer/map value in bytes
+ * @elem_size: Size of each stack trace element
+ * @flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)
+ *
+ * Return: Maximum number of stack trace entries that can be safely stored
+ */
+static u32 stack_map_calculate_max_depth(u32 size, u32 elem_size, u64 flags)
+{
+ u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 max_depth;
+
+ max_depth = size / elem_size;
--
2.43.0

Arnaud Lecomte

Aug 12, 2025, 3:32:18 PM
to Yonghong Song, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
Thanks, Yonghong, for your feedback and your patience!

Arnaud Lecomte

Aug 12, 2025, 3:33:09 PM
to yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.

Changes in v2:
- Fixed max_depth variable names across the bpf_get_stackid() paths

Reported-by: syzbot+c9b724...@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index a267567e36dd..e1ee18cbbbb2 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -246,7 +246,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 max_depth)
}

static long __bpf_get_stackid(struct bpf_map *map,
- struct perf_callchain_entry *trace, u64 flags)
+ struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
{
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
@@ -262,6 +262,8 @@ static long __bpf_get_stackid(struct bpf_map *map,

trace_nr = trace->nr - skip;
trace_len = trace_nr * sizeof(u64);
+ trace_nr = min(trace_nr, max_depth - skip);
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
@@ -321,19 +323,17 @@ static long __bpf_get_stackid(struct bpf_map *map,
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
- u32 max_depth = map->value_size / stack_map_data_size(map);
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;

if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;

- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;

Yonghong Song

Aug 13, 2025, 1:54:21 AM
to Arnaud Lecomte, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com


On 8/12/25 12:30 PM, Arnaud Lecomte wrote:
> A new helper function stack_map_calculate_max_depth() that
> computes the max depth for a stackmap.
>
> Changes in v2:
> - Removed the checking 'map_size % map_elem_size' from
> stack_map_calculate_max_depth
> - Changed stack_map_calculate_max_depth params name to be more generic
>
> Changes in v3:
> - Changed map size param to size in max depth helper
>
> Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>

LGTM with a small nit below.

Acked-by: Yonghong Song <yongho...@linux.dev>

> ---
> kernel/bpf/stackmap.c | 30 ++++++++++++++++++++++++------
> 1 file changed, 24 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 3615c06b7dfa..a267567e36dd 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -42,6 +42,27 @@ static inline int stack_map_data_size(struct bpf_map *map)
> sizeof(struct bpf_stack_build_id) : sizeof(u64);
> }
>
> +/**
> + * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
> + * @size: Size of the buffer/map value in bytes
> + * @elem_size: Size of each stack trace element
> + * @flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)

Let us have consistent format, e.g.
 * @size:  Size of ...
 * @elem_size:  Size of ...
 * @flags:  BPF stack trace ...

Yonghong Song

unread,
Aug 13, 2025, 2:00:02 AMAug 13
to Arnaud Lecomte, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com


On 8/12/25 12:32 PM, Arnaud Lecomte wrote:
> Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
> when copying stack trace data. The issue occurs when the perf trace
> contains more stack entries than the stack map bucket can hold,
> leading to an out-of-bounds write in the bucket's data array.
>
> Changes in v2:
> - Fixed max_depth names across get stack id
>
> Reported-by: syzbot+c9b724...@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
> Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>

LGTM with a few nits below.

Acked-by: Yonghong Song <yongho...@linux.dev>
Remove the above empty line.

Arnaud Lecomte

unread,
Aug 13, 2025, 4:46:18 PMAug 13
to yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, con...@arnaud-lcm.com, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
A new helper function stack_map_calculate_max_depth() that
computes the max depth for a stackmap.

Changes in v2:
- Removed the checking 'map_size % map_elem_size' from
stack_map_calculate_max_depth
- Changed stack_map_calculate_max_depth params name to be more generic

Changes in v3:
- Changed map size param to size in max depth helper

Changes in v4:
- Fixed indentation in max depth helper for args

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..b9cc6c72a2a5 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -42,6 +42,27 @@ static inline int stack_map_data_size(struct bpf_map *map)
sizeof(struct bpf_stack_build_id) : sizeof(u64);
}

+/**
+ * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
+ * @size: Size of the buffer/map value in bytes
+ * @elem_size: Size of each stack trace element
+ * @flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
@@ -438,10 +459,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;
}

- num_elem = size / elem_size;
- max_depth = num_elem + skip;
- if (sysctl_perf_event_max_stack < max_depth)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(size, elem_size, flags);

if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */
@@ -461,7 +479,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
}

trace_nr = trace->nr - skip;
- trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
+ trace_nr = min(trace_nr, max_depth - skip);
copy_len = trace_nr * elem_size;

ips = trace->ip + skip;
--
2.43.0

Arnaud Lecomte

unread,
Aug 13, 2025, 4:55:19 PMAug 13
to yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, con...@arnaud-lcm.com, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.

Changes in v2:
- Fixed max_depth names across get stack id

Changes in v4:
- Removed unnecessary empty line in __bpf_get_stackid
Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index b9cc6c72a2a5..318f150460bb 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -246,7 +246,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 max_depth)
}

static long __bpf_get_stackid(struct bpf_map *map,
- struct perf_callchain_entry *trace, u64 flags)
+ struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
{
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
@@ -262,6 +262,8 @@ static long __bpf_get_stackid(struct bpf_map *map,

trace_nr = trace->nr - skip;
trace_len = trace_nr * sizeof(u64);
+ trace_nr = min(trace_nr, max_depth - skip);
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
@@ -321,19 +323,17 @@ static long __bpf_get_stackid(struct bpf_map *map,
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
- u32 max_depth = map->value_size / stack_map_data_size(map);
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;

if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;

- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);

trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);
@@ -342,7 +342,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
/* couldn't fetch the stack trace */
return -EFAULT;

- return __bpf_get_stackid(map, trace, flags);
+ return __bpf_get_stackid(map, trace, flags, max_depth);
}

const struct bpf_func_proto bpf_get_stackid_proto = {
@@ -374,6 +374,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
bool kernel, user;
__u64 nr_kernel;
int ret;
+ u32 elem_size, max_depth;

/* perf_sample_data doesn't have callchain, use bpf_get_stackid */
if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
@@ -392,12 +393,13 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

nr_kernel = count_kernel_ip(trace);
-
+ elem_size = stack_map_data_size(map);
if (kernel) {
__u64 nr = trace->nr;

trace->nr = nr_kernel;
- ret = __bpf_get_stackid(map, trace, flags);
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, max_depth);

/* restore nr */
trace->nr = nr;
@@ -409,7 +411,8 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
- ret = __bpf_get_stackid(map, trace, flags);
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, max_depth);
}
return ret;
}
--
2.43.0

Lecomte, Arnaud

unread,
Aug 18, 2025, 9:49:40 AMAug 18
to so...@kernel.org, jo...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, yongho...@linux.dev
Hey,
Just forwarding the patch to the associated maintainers with `stackmap.c`.
Have a great day,
Cheers

Yonghong Song

unread,
Aug 18, 2025, 12:58:13 PMAug 18
to Lecomte, Arnaud, so...@kernel.org, jo...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com


On 8/18/25 6:49 AM, Lecomte, Arnaud wrote:
> Hey,
> Just forwarding the patch to the associated maintainers with
> `stackmap.c`.

Arnaud, please add Ack (provided in comments for v3) to make things easier
for maintainers.

Also, it looks like all your patch sets (v1 to v4) are in the same thread.
It would be good to have each version in a separate thread.
Please look at some examples on the bpf mailing list.

> Have a great day,
> Cheers
>
> On 13/08/2025 21:55, Arnaud Lecomte wrote:
>> Syzkaller reported a KASAN slab-out-of-bounds write in
>> __bpf_get_stackid()
>> when copying stack trace data. The issue occurs when the perf trace
>>   contains more stack entries than the stack map bucket can hold,
>>   leading to an out-of-bounds write in the bucket's data array.
>>
>> Changes in v2:
>>   - Fixed max_depth names across get stack id
>>
>> Changes in v4:
>>   - Removed unnecessary empty line in __bpf_get_stackid
>>
>> Reported-by: syzbot+c9b724...@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
>> Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
>> ---
>>   kernel/bpf/stackmap.c | 23 +++++++++++++----------
>>   1 file changed, 13 insertions(+), 10 deletions(-)
>>
[...]

Yonghong Song

unread,
Aug 18, 2025, 1:03:01 PMAug 18
to Lecomte, Arnaud, so...@kernel.org, jo...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com


On 8/18/25 9:57 AM, Yonghong Song wrote:
>
>
> On 8/18/25 6:49 AM, Lecomte, Arnaud wrote:
>> Hey,
>> Just forwarding the patch to the associated maintainers with
>> `stackmap.c`.
>
> Arnaud, please add Ack (provided in comments for v3) to make things
> easier
> for maintainers.
>
> Also, looks like all your patch sets (v1 to v4) in the same thread.

sorry, it should be v3 and v4 in the same thread.

Arnaud Lecomte

unread,
Aug 19, 2025, 12:21:03 PMAug 19
to Yonghong Song, so...@kernel.org, jo...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
On 18/08/2025 18:02, Yonghong Song wrote:
>
>
> On 8/18/25 9:57 AM, Yonghong Song wrote:
>>
>>
>> On 8/18/25 6:49 AM, Lecomte, Arnaud wrote:
>>> Hey,
>>> Just forwarding the patch to the associated maintainers with
>>> `stackmap.c`.
>>
>> Arnaud, please add Ack (provided in comments for v3) to make things
>> easier
>> for maintainers.
>>
>> Also, looks like all your patch sets (v1 to v4) in the same thread.
>
> sorry, it should be v3 and v4 in the same thread.
>
Hey, thanks for the feedback!
I am going to provide the link to v3 in the v4 commit and resend the
v4 with the Acked-by.

Arnaud Lecomte

unread,
Aug 19, 2025, 12:27:10 PMAug 19
to so...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, yongho...@linux.dev, Arnaud Lecomte
A new helper function stack_map_calculate_max_depth() that
computes the max depth for a stackmap.

Changes in v2:
- Removed the checking 'map_size % map_elem_size' from
stack_map_calculate_max_depth
- Changed stack_map_calculate_max_depth params name to be more generic

Changes in v3:
- Changed map size param to size in max depth helper

Changes in v4:
- Fixed indentation in max depth helper for args

Link to v3: https://lore.kernel.org/all/09dc40eb-a84e-472a...@linux.dev/

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
Acked-by: Yonghong Song <yongho...@linux.dev>
---
kernel/bpf/stackmap.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..b9cc6c72a2a5 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
@@ -438,10 +459,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;
}

- num_elem = size / elem_size;
- max_depth = num_elem + skip;
- if (sysctl_perf_event_max_stack < max_depth)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(size, elem_size, flags);

if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */
@@ -461,7 +479,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
}

trace_nr = trace->nr - skip;
- trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
+ trace_nr = min(trace_nr, max_depth - skip);
copy_len = trace_nr * elem_size;

ips = trace->ip + skip;
--
2.43.0

Arnaud Lecomte

unread,
Aug 19, 2025, 12:29:39 PMAug 19
to so...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, yongho...@linux.dev, Arnaud Lecomte
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.

Changes in v2:
- Fixed max_depth names across get stack id

Changes in v4:
- Removed unnecessary empty line in __bpf_get_stackid

Link to v3: https://lore.kernel.org/all/997d3b8a-4b3a-4720...@linux.dev/
Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
Acked-by: Yonghong Song <yongho...@linux.dev>
---
kernel/bpf/stackmap.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index b9cc6c72a2a5..318f150460bb 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -246,7 +246,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 max_depth)
}

static long __bpf_get_stackid(struct bpf_map *map,
- struct perf_callchain_entry *trace, u64 flags)
+ struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
{
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
@@ -262,6 +262,8 @@ static long __bpf_get_stackid(struct bpf_map *map,

trace_nr = trace->nr - skip;
trace_len = trace_nr * sizeof(u64);
+ trace_nr = min(trace_nr, max_depth - skip);
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
@@ -321,19 +323,17 @@ static long __bpf_get_stackid(struct bpf_map *map,
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
- u32 max_depth = map->value_size / stack_map_data_size(map);
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;

if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;

- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;
--
2.43.0

Martin KaFai Lau

unread,
Aug 19, 2025, 5:15:45 PMAug 19
to Arnaud Lecomte, yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, so...@kernel.org
hmm... this looks a bit suspicious. Is it possible that
sysctl_perf_event_max_stack is being changed to a larger value in parallel?
I suspect it was fine because trace_nr was still bounded by num_elem.

> + trace_nr = min(trace_nr, max_depth - skip);

but now the min() is also based on max_depth which could be
sysctl_perf_event_max_stack.

Besides, if I read it correctly, in "max_depth - skip" the max_depth could also
be less than skip. I assume trace->nr is bounded by max_depth, so it should be less
of a problem, but it is still a bit unintuitive to read.
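
To make the wrap-around part concrete, here is a throwaway userspace sketch
(plain C, made-up numbers, not kernel code) of what happens if max_depth can
ever end up below skip:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint32_t trace_nr = 40, skip = 10, max_depth = 8; /* max_depth < skip */
	uint32_t bound = max_depth - skip;                /* u32 wraps to 4294967294 */
	uint32_t capped = trace_nr < bound ? trace_nr : bound;

	printf("bound  = %u\n", bound);   /* 4294967294 */
	printf("capped = %u\n", capped);  /* still 40: min() no longer clamps anything */
	return 0;
}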

Lecomte, Arnaud

unread,
Aug 25, 2025, 12:39:34 PM (13 days ago) Aug 25
to Martin KaFai Lau, yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, so...@kernel.org
Hi Martin, this is a valid concern, as sysctl_perf_event_max_stack can be
modified at runtime through /proc/sys/kernel/perf_event_max_stack.
What we could maybe do instead is to take a snapshot: u32 current_max =
READ_ONCE(sysctl_perf_event_max_stack);
Any thoughts on this?
We should also bring back the num_elem bound as an additional safety net.
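
Roughly what I have in mind for the helper (just a sketch of the proposal, not
the final patch; the helper name and parameters are placeholders, and the
num_elem clamp in __bpf_get_stack would stay as-is):

static u32 stack_map_calculate_max_depth(u32 size, u32 elem_size, u64 flags)
{
	u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
	/* one snapshot of the sysctl, used consistently below */
	u32 curr_max = READ_ONCE(sysctl_perf_event_max_stack);
	u32 max_depth = size / elem_size + skip;

	return min(max_depth, curr_max);
}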

Yonghong Song

unread,
Aug 25, 2025, 2:28:09 PM (13 days ago) Aug 25
to Lecomte, Arnaud, Martin KaFai Lau, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, so...@kernel.org
There is no need to have READ_ONCE. Just do
int curr_sysctl_max_stack = sysctl_perf_event_max_stack;
if (max_depth > curr_sysctl_max_stack)
return curr_sysctl_max_stack;

Because of the above change, the patch is not a refactoring change any more.

Lecomte, Arnaud

unread,
Aug 25, 2025, 4:07:17 PM (13 days ago) Aug 25
to Yonghong Song, Martin KaFai Lau, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, so...@kernel.org
Why would you not consider it a refactoring change anymore?

Yonghong Song

unread,
Aug 25, 2025, 5:15:19 PM (13 days ago) Aug 25
to Lecomte, Arnaud, Martin KaFai Lau, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, so...@kernel.org
Sorry, I think I made a couple of mistakes in the above.

First, yes, we do want READ_ONCE; otherwise the compiler may potentially optimize
the above back to the original code with two references to sysctl_perf_event_max_stack.

Second, yes, it is indeed a refactoring.
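
To illustrate the concern (sketch only, function names are made up, not the
actual patch):

/* With a plain read the compiler is allowed to drop the local copy and
 * reference the global twice, so the comparison and the returned value may
 * observe different sysctl values if it is changed concurrently.
 */
static u32 clamp_plain(u32 max_depth)
{
	u32 curr = sysctl_perf_event_max_stack;	/* plain read: may be re-done */

	if (max_depth > curr)
		return curr;
	return max_depth;
}

/* READ_ONCE forces a single load, so one consistent snapshot is used. */
static u32 clamp_once(u32 max_depth)
{
	u32 curr = READ_ONCE(sysctl_perf_event_max_stack);

	if (max_depth > curr)
		return curr;
	return max_depth;
}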

Arnaud Lecomte

unread,
Aug 26, 2025, 5:22:52 PM (12 days ago) Aug 26
to so...@kernel.org, yongho...@linux.dev, marti...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
A new helper function stack_map_calculate_max_depth() that
computes the max depth for a stackmap.

Changes in v2:
- Removed the checking 'map_size % map_elem_size' from
stack_map_calculate_max_depth
- Changed stack_map_calculate_max_depth params name to be more generic

Changes in v3:
- Changed map size param to size in max depth helper

Changes in v4:
- Fixed indentation in max depth helper for args

Changes in v5:
- Bound back trace_nr to num_elem in __bpf_get_stack
- Make a copy of sysctl_perf_event_max_stack
in stack_map_calculate_max_depth

Link to v4: https://lore.kernel.org/all/20250819162652...@arnaud-lcm.com/

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
Acked-by: Yonghong Song <yongho...@linux.dev>
---
kernel/bpf/stackmap.c | 28 ++++++++++++++++++++++++----
1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..796cc105eacb 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -42,6 +42,28 @@ static inline int stack_map_data_size(struct bpf_map *map)
sizeof(struct bpf_stack_build_id) : sizeof(u64);
}

+/**
+ * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
+ * @size: Size of the buffer/map value in bytes
+ * @elem_size: Size of each stack trace element
+ * @flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)
+ *
+ * Return: Maximum number of stack trace entries that can be safely stored
+ */
+static u32 stack_map_calculate_max_depth(u32 size, u32 elem_size, u64 flags)
+{
+ u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 max_depth;
+ u32 curr_sysctl_max_stack = READ_ONCE(sysctl_perf_event_max_stack);
+
+ max_depth = size / elem_size;
+ max_depth += skip;
+ if (max_depth > curr_sysctl_max_stack)
+ return curr_sysctl_max_stack;
+
+ return max_depth;
+}
+
static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
{
u64 elem_size = sizeof(struct stack_map_bucket) +
@@ -438,10 +460,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;
}

- num_elem = size / elem_size;
- max_depth = num_elem + skip;
- if (sysctl_perf_event_max_stack < max_depth)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(size, elem_size, flags);

if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */
@@ -460,6 +479,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto err_fault;
}

+ num_elem = size / elem_size;
trace_nr = trace->nr - skip;
trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
copy_len = trace_nr * elem_size;
--
2.43.0

Arnaud Lecomte

unread,
Aug 26, 2025, 5:24:05 PM (12 days ago) Aug 26
to so...@kernel.org, yongho...@linux.dev, marti...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.

Changes in v2:
- Fixed max_depth names across get stack id

Changes in v4:
- Removed unnecessary empty line in __bpf_get_stackid

Link to v4: https://lore.kernel.org/all/20250813205506....@arnaud-lcm.com/
Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
Acked-by: Yonghong Song <yongho...@linux.dev>
---
kernel/bpf/stackmap.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 796cc105eacb..ef8269ab8d6f 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -247,7 +247,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 max_depth)
}

static long __bpf_get_stackid(struct bpf_map *map,
- struct perf_callchain_entry *trace, u64 flags)
+ struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
{
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
@@ -263,6 +263,8 @@ static long __bpf_get_stackid(struct bpf_map *map,

trace_nr = trace->nr - skip;
trace_len = trace_nr * sizeof(u64);
+ trace_nr = min(trace_nr, max_depth - skip);
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
@@ -322,19 +324,17 @@ static long __bpf_get_stackid(struct bpf_map *map,
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
- u32 max_depth = map->value_size / stack_map_data_size(map);
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;

if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;

- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);

trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);
@@ -343,7 +343,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
/* couldn't fetch the stack trace */
return -EFAULT;

- return __bpf_get_stackid(map, trace, flags);
+ return __bpf_get_stackid(map, trace, flags, max_depth);
}

const struct bpf_func_proto bpf_get_stackid_proto = {
@@ -375,6 +375,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
bool kernel, user;
__u64 nr_kernel;
int ret;
+ u32 elem_size, max_depth;

/* perf_sample_data doesn't have callchain, use bpf_get_stackid */
if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
@@ -393,12 +394,13 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

nr_kernel = count_kernel_ip(trace);
-
+ elem_size = stack_map_data_size(map);
if (kernel) {
__u64 nr = trace->nr;

trace->nr = nr_kernel;
- ret = __bpf_get_stackid(map, trace, flags);
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, max_depth);

/* restore nr */
trace->nr = nr;
@@ -410,7 +412,8 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,

Alexei Starovoitov

unread,
Aug 29, 2025, 1:29:38 PM (9 days ago) Aug 29
to Arnaud Lecomte, Song Liu, Yonghong Song, Martin KaFai Lau, Andrii Nakryiko, Alexei Starovoitov, bpf, Daniel Borkmann, Eduard, Hao Luo, John Fastabend, Jiri Olsa, KP Singh, LKML, Stanislav Fomichev, syzbot+c9b724...@syzkaller.appspotmail.com, syzkaller-bugs
The patch might have fixed this particular syzbot repro
with the OOB in the stackmap-with-buildid case,
but the above two lines look wrong.
trace_len is computed before trace_nr is capped by max_depth.
So the non-buildid case below is using
memcpy(new_bucket->data, ips, trace_len);

so the OOB is still there?
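
For reference, the ordering that would keep the later
memcpy(new_bucket->data, ips, trace_len) within the bucket (illustrative
sketch only, not the final patch):

	trace_nr = trace->nr - skip;
	trace_nr = min(trace_nr, max_depth - skip);	/* cap the count first */
	trace_len = trace_nr * sizeof(u64);		/* then derive the length */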

Alexei Starovoitov

unread,
Aug 29, 2025, 8:28:17 PM (9 days ago) Aug 29
to Song Liu, Arnaud Lecomte, Yonghong Song, Martin KaFai Lau, Andrii Nakryiko, Alexei Starovoitov, bpf, Daniel Borkmann, Eduard, Hao Luo, John Fastabend, Jiri Olsa, KP Singh, LKML, Stanislav Fomichev, syzbot+c9b724...@syzkaller.appspotmail.com, syzkaller-bugs
On Fri, Aug 29, 2025 at 11:50 AM Song Liu <so...@kernel.org> wrote:
>
> On Fri, Aug 29, 2025 at 10:29 AM Alexei Starovoitov
> <alexei.st...@gmail.com> wrote:
> [...]
> > >
> > > static long __bpf_get_stackid(struct bpf_map *map,
> > > - struct perf_callchain_entry *trace, u64 flags)
> > > + struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
> > > {
> > > struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
> > > struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
> > > @@ -263,6 +263,8 @@ static long __bpf_get_stackid(struct bpf_map *map,
> > >
> > > trace_nr = trace->nr - skip;
> > > trace_len = trace_nr * sizeof(u64);
> > > + trace_nr = min(trace_nr, max_depth - skip);
> > > +
> >
> > The patch might have fixed this particular syzbot repro
> > with the OOB in the stackmap-with-buildid case,
> > but the above two lines look wrong.
> > trace_len is computed before trace_nr is capped by max_depth.
> > So the non-buildid case below is using
> > memcpy(new_bucket->data, ips, trace_len);
> >
> > so the OOB is still there?
>
> +1 for this observation.
>
> We are calling __bpf_get_stackid() from two functions: bpf_get_stackid
> and bpf_get_stackid_pe. The check against max_depth is only needed
> from bpf_get_stackid_pe, so it is better to just check here.

Good point.

> I have got the following on top of patch 1/2. This makes more sense to
> me.
>
> PS: The following also includes some clean up in __bpf_get_stack.
> I include those because it also uses stack_map_calculate_max_depth.
>
> Does this look better?

yeah. It's certainly cleaner to avoid adding extra arg to
__bpf_get_stackid()

Song Liu

unread,
Aug 30, 2025, 9:35:07 AM (8 days ago) Aug 30
to Alexei Starovoitov, Arnaud Lecomte, Yonghong Song, Martin KaFai Lau, Andrii Nakryiko, Alexei Starovoitov, bpf, Daniel Borkmann, Eduard, Hao Luo, John Fastabend, Jiri Olsa, KP Singh, LKML, Stanislav Fomichev, syzbot+c9b724...@syzkaller.appspotmail.com, syzkaller-bugs
On Fri, Aug 29, 2025 at 10:29 AM Alexei Starovoitov
<alexei.st...@gmail.com> wrote:
[...]
> >
> > static long __bpf_get_stackid(struct bpf_map *map,
> > - struct perf_callchain_entry *trace, u64 flags)
> > + struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
> > {
> > struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
> > struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
> > @@ -263,6 +263,8 @@ static long __bpf_get_stackid(struct bpf_map *map,
> >
> > trace_nr = trace->nr - skip;
> > trace_len = trace_nr * sizeof(u64);
> > + trace_nr = min(trace_nr, max_depth - skip);
> > +
>
> > The patch might have fixed this particular syzbot repro
> > with the OOB in the stackmap-with-buildid case,
> > but the above two lines look wrong.
> > trace_len is computed before trace_nr is capped by max_depth.
> > So the non-buildid case below is using
> > memcpy(new_bucket->data, ips, trace_len);
> >
> > so the OOB is still there?

+1 for this observation.

We are calling __bpf_get_stackid() from two functions: bpf_get_stackid
and bpf_get_stackid_pe. The check against max_depth is only needed
from bpf_get_stackid_pe, so it is better to just check here.

I have got the following on top of patch 1/2. This makes more sense to
me.

PS: The following also includes some clean up in __bpf_get_stack.
I include those because it also uses stack_map_calculate_max_depth.

Does this look better?

Thanks,
Song


diff --git c/kernel/bpf/stackmap.c w/kernel/bpf/stackmap.c
index 796cc105eacb..08554fb146e1 100644
--- c/kernel/bpf/stackmap.c
+++ w/kernel/bpf/stackmap.c
@@ -262,7 +262,7 @@ static long __bpf_get_stackid(struct bpf_map *map,
return -EFAULT;

trace_nr = trace->nr - skip;
- trace_len = trace_nr * sizeof(u64);
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
@@ -297,6 +297,7 @@ static long __bpf_get_stackid(struct bpf_map *map,
return -EEXIST;
}
} else {
+ trace_len = trace_nr * sizeof(u64);
if (hash_matches && bucket->nr == trace_nr &&
memcmp(bucket->data, ips, trace_len) == 0)
return id;
@@ -322,19 +323,17 @@ static long __bpf_get_stackid(struct bpf_map *map,
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
- u32 max_depth = map->value_size / stack_map_data_size(map);
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;

if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;

- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);

trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);
@@ -375,6 +374,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
bool kernel, user;
__u64 nr_kernel;
int ret;
+ u32 elem_size, max_depth;

/* perf_sample_data doesn't have callchain, use bpf_get_stackid */
if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
@@ -393,11 +393,12 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

nr_kernel = count_kernel_ip(trace);
-
+ elem_size = stack_map_data_size(map);
if (kernel) {
__u64 nr = trace->nr;

- trace->nr = nr_kernel;
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ trace->nr = min_t(u32, nr_kernel, max_depth);
ret = __bpf_get_stackid(map, trace, flags);

/* restore nr */
@@ -410,6 +411,8 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ trace->nr = min_t(u32, trace->nr, max_depth);
ret = __bpf_get_stackid(map, trace, flags);
}
return ret;
@@ -428,7 +431,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
struct perf_callchain_entry *trace_in,
void *buf, u32 size, u64 flags, bool may_fault)
{
- u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
+ u32 trace_nr, copy_len, elem_size, max_depth;
bool user_build_id = flags & BPF_F_USER_BUILD_ID;
bool crosstask = task && task != current;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
@@ -465,13 +468,15 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */

- if (trace_in)
+ if (trace_in) {
trace = trace_in;
- else if (kernel && task)
+ trace->nr = min_t(u32, trace->nr, max_depth);
+ } else if (kernel && task) {
trace = get_callchain_entry_for_task(task, max_depth);
- else
+ } else {
trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
crosstask, false);
+ }

if (unlikely(!trace) || trace->nr < skip) {
if (may_fault)
@@ -479,9 +484,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto err_fault;
}

- num_elem = size / elem_size;
trace_nr = trace->nr - skip;
- trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
copy_len = trace_nr * elem_size;

Lecomte, Arnaud

unread,
Aug 30, 2025, 1:14:00 PM (8 days ago) Aug 30
to Alexei Starovoitov, Song Liu, Yonghong Song, Martin KaFai Lau, Andrii Nakryiko, Alexei Starovoitov, bpf, Daniel Borkmann, Eduard, Hao Luo, John Fastabend, Jiri Olsa, KP Singh, LKML, Stanislav Fomichev, syzbot+c9b724...@syzkaller.appspotmail.com, syzkaller-bugs
Nice catch, thanks!
>
>> I have got the following on top of patch 1/2. This makes more sense to
>> me.
>>
>> PS: The following also includes some clean up in __bpf_get_stack.
>> I include those because it also uses stack_map_calculate_max_depth.
>>
>> Does this look better?
> yeah. It's certainly cleaner to avoid adding extra arg to
> __bpf_get_stackid()
>
Are Song's patches going to be applied then? Or should I send a new revision
of the patch with Song's modifications and a Co-developed-by tag?
Thanks for your guidance in advance,
Arnaud

Alexei Starovoitov

unread,
Aug 31, 2025, 9:10:47 PM (7 days ago) Aug 31
to Lecomte, Arnaud, Song Liu, Yonghong Song, Martin KaFai Lau, Andrii Nakryiko, Alexei Starovoitov, bpf, Daniel Borkmann, Eduard, Hao Luo, John Fastabend, Jiri Olsa, KP Singh, LKML, Stanislav Fomichev, syzbot+c9b724...@syzkaller.appspotmail.com, syzkaller-bugs
Pls resubmit and retest with a tag.

Arnaud Lecomte

unread,
Sep 3, 2025, 9:52:30 AM (4 days ago) Sep 3
to alexei.st...@gmail.com, yongho...@linux.dev, so...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
A new helper function stack_map_calculate_max_depth() that
computes the max depth for a stackmap.

Changes in v2:
- Removed the checking 'map_size % map_elem_size' from
stack_map_calculate_max_depth
- Changed stack_map_calculate_max_depth params name to be more generic

Changes in v3:
- Changed map size param to size in max depth helper

Changes in v4:
- Fixed indentation in max depth helper for args

Changes in v5:
- Bound back trace_nr to num_elem in __bpf_get_stack
- Make a copy of sysctl_perf_event_max_stack
in stack_map_calculate_max_depth

Changes in v6:
- Restrained max_depth computation only when required
- Additional cleanup from Song in __bpf_get_stack

Link to v5: https://lore.kernel.org/all/20250826212229....@arnaud-lcm.com/

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
Acked-by: Yonghong Song <yongho...@linux.dev>
Cc: Song Lui <so...@kernel.org>
---
kernel/bpf/stackmap.c | 58 ++++++++++++++++++++++++++++++-------------
1 file changed, 41 insertions(+), 17 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..1ebc525b7c2f 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -300,20 +322,17 @@ static long __bpf_get_stackid(struct bpf_map *map,
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
- u32 max_depth = map->value_size / stack_map_data_size(map);
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;

if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;

- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;
-
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);

@@ -350,6 +369,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
{
struct perf_event *event = ctx->event;
struct perf_callchain_entry *trace;
+ u32 elem_size, max_depth;
bool kernel, user;
__u64 nr_kernel;
int ret;
@@ -371,11 +391,14 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

nr_kernel = count_kernel_ip(trace);
+ elem_size = stack_map_data_size(map);

if (kernel) {
__u64 nr = trace->nr;

- trace->nr = nr_kernel;
+ max_depth =
+ stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ trace->nr = min_t(u32, nr_kernel, max_depth);
ret = __bpf_get_stackid(map, trace, flags);

/* restore nr */
@@ -388,6 +411,9 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
+ max_depth =
+ stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ trace->nr = min_t(u32, trace->nr, max_depth);
ret = __bpf_get_stackid(map, trace, flags);
}
return ret;
@@ -406,8 +432,8 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
struct perf_callchain_entry *trace_in,
void *buf, u32 size, u64 flags, bool may_fault)
{
- u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
bool user_build_id = flags & BPF_F_USER_BUILD_ID;
+ u32 trace_nr, copy_len, elem_size, max_depth;
bool crosstask = task && task != current;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
bool user = flags & BPF_F_USER_STACK;
@@ -438,21 +464,20 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;
}

- num_elem = size / elem_size;
- max_depth = num_elem + skip;
- if (sysctl_perf_event_max_stack < max_depth)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(size, elem_size, flags);

if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */

- if (trace_in)
+ if (trace_in) {
trace = trace_in;
- else if (kernel && task)
+ trace->nr = min_t(u32, trace->nr, max_depth);
+ } else if (kernel && task) {
trace = get_callchain_entry_for_task(task, max_depth);
- else
+ } else {
trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
- crosstask, false);
+ crosstask, false);
+ }

if (unlikely(!trace) || trace->nr < skip) {
if (may_fault)
@@ -461,7 +486,6 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
}

trace_nr = trace->nr - skip;
- trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
copy_len = trace_nr * elem_size;

ips = trace->ip + skip;
--
2.47.3

Arnaud Lecomte

unread,
Sep 3, 2025, 9:53:30 AM (4 days ago) Sep 3
to alexei.st...@gmail.com, yongho...@linux.dev, so...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte

Arnaud Lecomte

unread,
Sep 3, 2025, 9:53:53 AM (4 days ago) Sep 3
to alexei.st...@gmail.com, yongho...@linux.dev, so...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.

Changes in v2:
- Fixed max_depth names across get stack id

Changes in v4:
- Removed unnecessary empty line in __bpf_get_stackid

Changes in v6:
- Added back trace_len computation in __bpf_get_stackid
Fixes: ee2a098851bf ("bpf: Adjust BPF stack helper functions to accommodate skip > 0")
Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
Acked-by: Yonghong Song <yongho...@linux.dev>
---
kernel/bpf/stackmap.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 1ebc525b7c2f..8b2dcb8a6dc3 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -251,8 +251,9 @@ static long __bpf_get_stackid(struct bpf_map *map,
{
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
+ u32 hash, id, trace_nr, trace_len, i, max_depth;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
- u32 hash, id, trace_nr, trace_len, i;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
u64 *ips;
bool hash_matches;
@@ -261,8 +262,12 @@ static long __bpf_get_stackid(struct bpf_map *map,
/* skipping more than usable stack trace */
return -EFAULT;

+ max_depth =
+ stack_map_calculate_max_depth(map->value_size, elem_size, flags);
trace_nr = trace->nr - skip;
+ trace_nr = min_t(u32, trace_nr, max_depth - skip);
trace_len = trace_nr * sizeof(u64);
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
--
2.47.3

Alexei Starovoitov

unread,
Sep 3, 2025, 12:13:14 PM (4 days ago) Sep 3
to Arnaud Lecomte, Yonghong Song, Song Liu, Andrii Nakryiko, Alexei Starovoitov, bpf, Daniel Borkmann, Eduard, Hao Luo, John Fastabend, Jiri Olsa, KP Singh, LKML, Martin KaFai Lau, Stanislav Fomichev, syzbot+c9b724...@syzkaller.appspotmail.com, syzkaller-bugs
On Wed, Sep 3, 2025 at 6:52 AM Arnaud Lecomte <con...@arnaud-lcm.com> wrote:
>
> A new helper function stack_map_calculate_max_depth() that
> computes the max depth for a stackmap.
>
> Changes in v2:
> - Removed the checking 'map_size % map_elem_size' from
> stack_map_calculate_max_depth
> - Changed stack_map_calculate_max_depth params name to be more generic
>
> Changes in v3:
> - Changed map size param to size in max depth helper
>
> Changes in v4:
> - Fixed indentation in max depth helper for args
>
> Changes in v5:
> - Bound back trace_nr to num_elem in __bpf_get_stack
> - Make a copy of sysctl_perf_event_max_stack
> in stack_map_calculate_max_depth
>
> Changes in v6:
> - Restrained max_depth computation only when required
> - Additional cleanup from Song in __bpf_get_stack

This is not a refactor anymore.
Pls don't squash different things into one patch.
Keep refactor as patch 1, and another cleanup as patch 2.

pw-bot: cr

Lecomte, Arnaud

unread,
Sep 3, 2025, 12:20:52 PM (4 days ago) Sep 3
to Alexei Starovoitov, Yonghong Song, Song Liu, Andrii Nakryiko, Alexei Starovoitov, bpf, Daniel Borkmann, Eduard, Hao Luo, John Fastabend, Jiri Olsa, KP Singh, LKML, Martin KaFai Lau, Stanislav Fomichev, syzbot+c9b724...@syzkaller.appspotmail.com, syzkaller-bugs
The main problem is that patch 2 is not a cleanup either. It is a bug fix,
so it doesn't really fit.
We could maybe split this patch into 2 new patches, but I don't really
like this idea.
If we decide to stick to the 2-patch format, I don't have any preference
as to which patch's scope should be extended.

>
> pw-bot: cr
>

Alexei Starovoitov

unread,
Sep 3, 2025, 12:22:29 PM (4 days ago) Sep 3
to Lecomte, Arnaud, Yonghong Song, Song Liu, Andrii Nakryiko, Alexei Starovoitov, bpf, Daniel Borkmann, Eduard, Hao Luo, John Fastabend, Jiri Olsa, KP Singh, LKML, Martin KaFai Lau, Stanislav Fomichev, syzbot+c9b724...@syzkaller.appspotmail.com, syzkaller-bugs
I wasn't proposing to squash cleanup into patch 2.
Make 3 patches where each one is doing one thing.

Arnaud Lecomte

unread,
Sep 3, 2025, 7:39:50 PM (4 days ago) Sep 3
to alexei.st...@gmail.com, yongho...@linux.dev, so...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
A new helper function stack_map_calculate_max_depth() that
computes the max depth for a stackmap.

Changes in v2:
- Removed the checking 'map_size % map_elem_size' from
stack_map_calculate_max_depth
- Changed stack_map_calculate_max_depth params name to be more generic

Changes in v3:
- Changed map size param to size in max depth helper

Changes in v4:
- Fixed indentation in max depth helper for args

Changes in v5:
- Bound back trace_nr to num_elem in __bpf_get_stack
- Make a copy of sysctl_perf_event_max_stack
in stack_map_calculate_max_depth

Changes in v6:
- Restrained max_depth computation only when required
- Additional cleanup from Song in __bpf_get_stack

Changes in v7:
- Removed additional cleanup from v6

Link to v6: https://lore.kernel.org/all/20250903135323....@arnaud-lcm.com/

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
Acked-by: Yonghong Song <yongho...@linux.dev>
---
kernel/bpf/stackmap.c | 38 +++++++++++++++++++++++++++-----------
1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..ed707bc07173 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;

if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;

- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;
-
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);

@@ -406,8 +425,8 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
struct perf_callchain_entry *trace_in,
void *buf, u32 size, u64 flags, bool may_fault)
{
- u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
bool user_build_id = flags & BPF_F_USER_BUILD_ID;
+ u32 trace_nr, copy_len, elem_size, max_depth;
bool crosstask = task && task != current;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
bool user = flags & BPF_F_USER_STACK;
@@ -438,10 +457,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;
}

- num_elem = size / elem_size;
- max_depth = num_elem + skip;
- if (sysctl_perf_event_max_stack < max_depth)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(size, elem_size, flags);

if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */
--
2.47.3

Arnaud Lecomte

unread,
Sep 3, 2025, 7:40:59 PM (4 days ago) Sep 3
to alexei.st...@gmail.com, yongho...@linux.dev, so...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
Clean-up bounds checking for trace->nr in
__bpf_get_stack by limiting it only to
max_depth.

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
Cc: Song Lui <so...@kernel.org>
---
kernel/bpf/stackmap.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index ed707bc07173..9f3ae426ddc3 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -462,13 +462,15 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */

- if (trace_in)
+ if (trace_in) {
trace = trace_in;
- else if (kernel && task)
+ trace->nr = min_t(u32, trace->nr, max_depth);
+ } else if (kernel && task) {
trace = get_callchain_entry_for_task(task, max_depth);
- else
+ } else {
trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
crosstask, false);
+ }

if (unlikely(!trace) || trace->nr < skip) {
if (may_fault)
@@ -477,7 +479,6 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
}

trace_nr = trace->nr - skip;
- trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
copy_len = trace_nr * elem_size;

ips = trace->ip + skip;
--
2.47.3

Arnaud Lecomte

unread,
Sep 3, 2025, 7:43:32 PM (4 days ago) Sep 3
to alexei.st...@gmail.com, yongho...@linux.dev, so...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.

Changes in v2:
- Fixed max_depth names across get stack id

Changes in v4:
- Removed unnecessary empty line in __bpf_get_stackid

Changes in v6:
- Added back trace_len computation in __bpf_get_stackid

Link to v6: https://lore.kernel.org/all/20250903135348....@arnaud-lcm.com/

Reported-by: syzbot+c9b724...@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
Fixes: ee2a098851bf ("bpf: Adjust BPF stack helper functions to accommodate skip > 0")
Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
Acked-by: Yonghong Song <yongho...@linux.dev>
---
kernel/bpf/stackmap.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 9f3ae426ddc3..29e05c9ff1bd 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -369,6 +369,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
{
struct perf_event *event = ctx->event;
struct perf_callchain_entry *trace;
+ u32 elem_size, max_depth;
bool kernel, user;
__u64 nr_kernel;
int ret;
@@ -390,11 +391,15 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

nr_kernel = count_kernel_ip(trace);
+ elem_size = stack_map_data_size(map);

if (kernel) {
__u64 nr = trace->nr;

trace->nr = nr_kernel;
+ max_depth =
+ stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ trace->nr = min_t(u32, nr_kernel, max_depth);
ret = __bpf_get_stackid(map, trace, flags);

/* restore nr */
@@ -407,6 +412,9 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
+ max_depth =
+ stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ trace->nr = min_t(u32, trace->nr, max_depth);
ret = __bpf_get_stackid(map, trace, flags);
}
return ret;
--
2.47.3

Lecomte, Arnaud

unread,
Sep 3, 2025, 7:46:41 PM (4 days ago) Sep 3
to Alexei Starovoitov, Yonghong Song, Song Liu, Andrii Nakryiko, Alexei Starovoitov, bpf, Daniel Borkmann, Eduard, Hao Luo, John Fastabend, Jiri Olsa, KP Singh, LKML, Martin KaFai Lau, Stanislav Fomichev, syzbot+c9b724...@syzkaller.appspotmail.com, syzkaller-bugs

Lecomte, Arnaud

unread,
Sep 4, 2025, 6:52:36 PM (3 days ago) Sep 4
to Song Liu, alexei.st...@gmail.com, yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com

On 05/09/2025 00:40, Song Liu wrote:
>
> On 9/3/25 4:40 PM, Arnaud Lecomte wrote:
>> Clean-up bounds checking for trace->nr in
>> __bpf_get_stack by limiting it only to
>> max_depth.
>>
>> Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
>> Cc: Song Lui <so...@kernel.org>
>
> Typo in my name, which is "Song Liu".
>
> This looks right.
>
> Acked-by: Song Liu <so...@kernel.org>
>
Oops, sorry!

Lecomte, Arnaud

unread,
Sep 4, 2025, 6:53:52 PM (3 days ago) Sep 4
to Song Liu, alexei.st...@gmail.com, yongho...@linux.dev, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com

On 05/09/2025 00:45, Song Liu wrote:
>
> On 9/3/25 4:43 PM, Arnaud Lecomte wrote:
>> Syzkaller reported a KASAN slab-out-of-bounds write in
>> __bpf_get_stackid()
>> when copying stack trace data. The issue occurs when the perf trace
>>   contains more stack entries than the stack map bucket can hold,
>>   leading to an out-of-bounds write in the bucket's data array.
>>
>> Changes in v2:
>>   - Fixed max_depth names across get stack id
>>
>> Changes in v4:
>>   - Removed unnecessary empty line in __bpf_get_stackid
>>
>> Changes in v6:
>>   - Added back trace_len computation in __bpf_get_stackid
>>
>> Link to v6:
>> https://lore.kernel.org/all/20250903135348....@arnaud-lcm.com/
>>
>> Reported-by: syzbot+c9b724...@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
>> Fixes: ee2a098851bf ("bpf: Adjust BPF stack helper functions to
>> accommodate skip > 0")
>> Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
>> Acked-by: Yonghong Song <yongho...@linux.dev>
>
> For future patches, please keep the "Changes in vX.." at the end of
Good to know, thanks!
>
> your commit log and after a "---". IOW, something like
>
>
> Acked-by: Yonghong Song <yongho...@linux.dev>
>
> ---
>
> changes in v2:
>
> ...
>
> ---
>
> kernel/bpf/stackmap.c | 8 ++++++++
>
>
> In this way, the "changes in vXX" part will be removed by git-am.
>
>> ---
>>   kernel/bpf/stackmap.c | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
>> index 9f3ae426ddc3..29e05c9ff1bd 100644
>> --- a/kernel/bpf/stackmap.c
>> +++ b/kernel/bpf/stackmap.c
>> @@ -369,6 +369,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct
>> bpf_perf_event_data_kern *, ctx,
>>   {
>>       struct perf_event *event = ctx->event;
>>       struct perf_callchain_entry *trace;
>> +    u32 elem_size, max_depth;
>>       bool kernel, user;
>>       __u64 nr_kernel;
>>       int ret;
>> @@ -390,11 +391,15 @@ BPF_CALL_3(bpf_get_stackid_pe, struct
>> bpf_perf_event_data_kern *, ctx,
>>           return -EFAULT;
>>         nr_kernel = count_kernel_ip(trace);
>> +    elem_size = stack_map_data_size(map);
>>         if (kernel) {
>>           __u64 nr = trace->nr;
>>             trace->nr = nr_kernel;
>
> this trace->nr = is useless.
>
>> +        max_depth =
>> +            stack_map_calculate_max_depth(map->value_size,
>> elem_size, flags);
>> +        trace->nr = min_t(u32, nr_kernel, max_depth);
>>           ret = __bpf_get_stackid(map, trace, flags);
>>             /* restore nr */
>> @@ -407,6 +412,9 @@ BPF_CALL_3(bpf_get_stackid_pe, struct
>> bpf_perf_event_data_kern *, ctx,
>>               return -EFAULT;
>>             flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
>> +        max_depth =
>> +            stack_map_calculate_max_depth(map->value_size,
>> elem_size, flags);
>> +        trace->nr = min_t(u32, trace->nr, max_depth);
>>           ret = __bpf_get_stackid(map, trace, flags);
>
> I missed this part earlier. Here we need to restore trace->nr, just
> like we did in the "if (kernel)" branch.
>
Makes sense, thanks!
> Thanks,
>
> Song
>
>>       }
>>       return ret;
>
Thanks,
Arnaud
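
Song's two points above amount to a small restructuring of bpf_get_stackid_pe(). As a minimal sketch (the v8 patch later in this thread implements it this way), the idea is to snapshot trace->nr once, clamp it in whichever branch runs, and restore the original value after __bpf_get_stackid() returns so the shared perf callchain entry is not left truncated:

	__u64 nr = trace->nr;	/* save the original depth */

	if (kernel) {
		trace->nr = min_t(u32, nr_kernel, max_depth);
		ret = __bpf_get_stackid(map, trace, flags);
	} else {
		/* user branch: skip-field flag adjustment elided */
		trace->nr = min_t(u32, trace->nr, max_depth);
		ret = __bpf_get_stackid(map, trace, flags);
	}

	/* restore nr for any later users of the callchain entry */
	trace->nr = nr;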

Arnaud Lecomte

unread,
Sep 5, 2025, 9:46:46 AM (2 days ago) Sep 5
to alexei.st...@gmail.com, yongho...@linux.dev, so...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
From: Arnaud Lecomte <con...@arnaud-lcm.com>

Add a new helper function, stack_map_calculate_max_depth(), that
computes the maximum stack depth for a stackmap.

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
Acked-by: Yonghong Song <yongho...@linux.dev>
Acked-by: Song Liu <so...@kernel.org>
---
Changes in v2:
- Removed the 'map_size % map_elem_size' check from
stack_map_calculate_max_depth
- Changed the stack_map_calculate_max_depth parameter names to be more generic

Changes in v3:
- Renamed the map size parameter to 'size' in the max depth helper

Changes in v4:
- Fixed argument indentation in the max depth helper

Changes in v5:
- Bound back trace_nr to num_elem in __bpf_get_stack
- Make a copy of sysctl_perf_event_max_stack
in stack_map_calculate_max_depth

Changes in v6:
- Restricted the max_depth computation to only when it is required
- Additional cleanup from Song in __bpf_get_stack

Changes in v7:
- Removed additional cleanup from v6

Link to v7: https://lore.kernel.org/all/20250903233910....@arnaud-lcm.com/
---
kernel/bpf/stackmap.c | 38 +++++++++++++++++++++++++++-----------
1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..ed707bc07173 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);

@@ -406,8 +425,8 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
struct perf_callchain_entry *trace_in,
void *buf, u32 size, u64 flags, bool may_fault)
{
- u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
bool user_build_id = flags & BPF_F_USER_BUILD_ID;
+ u32 trace_nr, copy_len, elem_size, max_depth;
bool crosstask = task && task != current;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
bool user = flags & BPF_F_USER_STACK;
@@ -438,10 +457,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;
}

- num_elem = size / elem_size;
- max_depth = num_elem + skip;
- if (sysctl_perf_event_max_stack < max_depth)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(size, elem_size, flags);

if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */
--
2.47.3
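
The hunk that adds the helper itself is not shown above, but its shape can be inferred from the open-coded computation it replaces and from the call sites in the other hunks (the signature is taken from those callers; the local copy of sysctl_perf_event_max_stack matches the v5 changelog note). A minimal sketch, not the literal patch:

	static u32 stack_map_calculate_max_depth(u32 size, u32 elem_size, u64 flags)
	{
		u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
		u32 curr_sysctl_max_stack = sysctl_perf_event_max_stack;
		u32 max_depth;

		/* how many elements fit in the buffer, plus the skipped frames */
		max_depth = size / elem_size + skip;
		if (max_depth > curr_sysctl_max_stack)
			return curr_sysctl_max_stack;

		return max_depth;
	}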

Arnaud Lecomte

unread,
Sep 5, 2025, 9:47:51 AM (2 days ago) Sep 5
to alexei.st...@gmail.com, yongho...@linux.dev, so...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
From: Arnaud Lecomte <con...@arnaud-lcm.com>

Clean up bounds checking for trace->nr in
__bpf_get_stack by limiting it only to
max_depth.

Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
Acked-by: Song Liu <so...@kernel.org>
Cc: Song Liu <so...@kernel.org>
---
kernel/bpf/stackmap.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index ed707bc07173..9f3ae426ddc3 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -462,13 +462,15 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */

- if (trace_in)
+ if (trace_in) {
trace = trace_in;
- else if (kernel && task)
+ trace->nr = min_t(u32, trace->nr, max_depth);
+ } else if (kernel && task) {
trace = get_callchain_entry_for_task(task, max_depth);
- else
+ } else {
trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
crosstask, false);
+ }

if (unlikely(!trace) || trace->nr < skip) {
if (may_fault)
@@ -477,7 +479,6 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
}

trace_nr = trace->nr - skip;
- trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
copy_len = trace_nr * elem_size;

ips = trace->ip + skip;
--
2.47.3
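
Read as a whole rather than as a diff, the resulting code path in __bpf_get_stack() looks roughly like this (assembled from the hunks above, with surrounding context elided):

	if (trace_in) {
		trace = trace_in;
		/* callchain supplied by the caller: clamp it here */
		trace->nr = min_t(u32, trace->nr, max_depth);
	} else if (kernel && task) {
		trace = get_callchain_entry_for_task(task, max_depth);
	} else {
		trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
					   crosstask, false);
	}

	/* NULL / too-short trace error handling unchanged, elided */

	trace_nr = trace->nr - skip;	/* already bounded via max_depth */
	copy_len = trace_nr * elem_size;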

Arnaud Lecomte

unread,
Sep 5, 2025, 9:48:42 AM (2 days ago) Sep 5
to alexei.st...@gmail.com, yongho...@linux.dev, so...@kernel.org, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, syzbot+c9b724...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Arnaud Lecomte
From: Arnaud Lecomte <con...@arnaud-lcm.com>

Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.

Reported-by: syzbot+c9b724...@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
Fixes: ee2a098851bf ("bpf: Adjust BPF stack helper functions to accommodate skip > 0")
Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
Acked-by: Yonghong Song <yongho...@linux.dev>
---
Changes in v2:
- Fixed max_depth names across get stack id

Changes in v4:
- Removed unnecessary empty line in __bpf_get_stackid

Changes in v6:
- Added back trace_len computation in __bpf_get_stackid

Changes in v7:
- Removed the useless trace->nr assignment in bpf_get_stackid_pe
- Added restoration of trace->nr for both kernel and user traces
in bpf_get_stackid_pe

Link to v7: https://lore.kernel.org/all/20250903234325....@arnaud-lcm.com/
---
kernel/bpf/stackmap.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 9f3ae426ddc3..9b57b8307565 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -369,6 +369,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
{
struct perf_event *event = ctx->event;
struct perf_callchain_entry *trace;
+ u32 elem_size, max_depth;
bool kernel, user;
__u64 nr_kernel;
int ret;
@@ -390,15 +391,16 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

nr_kernel = count_kernel_ip(trace);
+ elem_size = stack_map_data_size(map);
+ __u64 nr = trace->nr; /* save original */

if (kernel) {
- __u64 nr = trace->nr;
-
trace->nr = nr_kernel;
+ max_depth =
+ stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ trace->nr = min_t(u32, nr_kernel, max_depth);
ret = __bpf_get_stackid(map, trace, flags);

- /* restore nr */
- trace->nr = nr;
} else { /* user */
u64 skip = flags & BPF_F_SKIP_FIELD_MASK;

@@ -407,8 +409,15 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;

flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
+ max_depth =
+ stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ trace->nr = min_t(u32, trace->nr, max_depth);
ret = __bpf_get_stackid(map, trace, flags);
}
+
+ /* restore nr */
+ trace->nr = nr;
+
return ret;
}

--
2.47.3
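
To see why the clamp closes the report, some illustrative numbers (not taken from the syzkaller reproducer): with value_size = 40, elem_size = sizeof(u64) = 8 and skip = 0, stack_map_calculate_max_depth() returns 40 / 8 + 0 = 5. A perf callchain with trace->nr = 16 previously made __bpf_get_stackid() copy 16 * 8 = 128 bytes into a 40-byte bucket; with trace->nr clamped to 5 first, the copy is at most 40 bytes and stays inside the bucket.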
