[PATCH] bpf-next: Prevent out-of-bounds buffer write in __bpf_get_stack


Arnaud Lecomte

Jan 4, 2026, 3:52:36 PM
to syzbot+d1b7fa...@syzkaller.appspotmail.com, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, con...@arnaud-lcm.com, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, net...@vger.kernel.org, s...@fomichev.me, so...@kernel.org, syzkall...@googlegroups.com, yongho...@linux.dev, Brahmajit Das
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stack()
during stack trace copying.

The issue occurs when the callchain entry (stored as a per-cpu variable)
grows between collection and buffer copy, causing it to exceed the buffer
size initially calculated from max_depth.

The callchain collection intentionally avoids locking for performance
reasons, but this creates a window where concurrent modifications can
occur during the copy operation.

To prevent this, we clamp the trace length to the max_depth initially
calculated from the buffer size and the size of a trace entry.
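
For illustration, here is a minimal standalone sketch of the clamping idea.
The type and helper names are simplified stand-ins, not the real kernel
structures, and max_depth is modeled loosely on what
stack_map_calculate_max_depth() derives from the destination buffer size and
the skip count:

#include <string.h>

/* Simplified stand-in for the per-cpu callchain entry: nr may keep
 * growing if the entry is reused concurrently after collection. */
struct callchain {
	unsigned int nr;
	unsigned long ip[];
};

static unsigned long copy_stack(unsigned long *buf, unsigned long buf_size,
				const struct callchain *trace,
				unsigned int skip)
{
	unsigned int elem_size = sizeof(buf[0]);
	/* Room in the destination buffer plus the frames we skip. */
	unsigned int max_depth = buf_size / elem_size + skip;
	unsigned int trace_nr;
	unsigned long copy_len;

	/* Clamp a snapshot of trace->nr so a callchain that grew after
	 * collection can never drive the copy past buf_size. */
	trace_nr = trace->nr < max_depth ? trace->nr : max_depth;
	if (trace_nr < skip)
		return 0;

	trace_nr -= skip;
	copy_len = (unsigned long)trace_nr * elem_size;
	memcpy(buf, trace->ip + skip, copy_len);
	return copy_len;
}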

Reported-by: syzbot+d1b7fa...@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/691231dc.a70a022...@google.com/T/
Fixes: e17d62fedd10 ("bpf: Refactor stack map trace depth calculation into helper function")
Tested-by: syzbot+d1b7fa...@syzkaller.appspotmail.com
Cc: Brahmajit Das <lis...@listout.xyz>
Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
---
Thanks to Brahmajit Das for the initial fix he proposed, which I tweaked
with what I believe is the correct justification and a better
implementation.
---
kernel/bpf/stackmap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index da3d328f5c15..e56752a9a891 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -465,7 +465,6 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,

if (trace_in) {
trace = trace_in;
- trace->nr = min_t(u32, trace->nr, max_depth);
} else if (kernel && task) {
trace = get_callchain_entry_for_task(task, max_depth);
} else {
@@ -479,7 +478,8 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto err_fault;
}

- trace_nr = trace->nr - skip;
+ trace_nr = min(trace->nr, max_depth);
+ trace_nr = trace_nr - skip;
copy_len = trace_nr * elem_size;

ips = trace->ip + skip;
--
2.43.0

Yonghong Song

Jan 5, 2026, 12:50:16 AM
to Arnaud Lecomte, syzbot+d1b7fa...@syzkaller.appspotmail.com, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, net...@vger.kernel.org, s...@fomichev.me, so...@kernel.org, syzkall...@googlegroups.com, Brahmajit Das


On 1/4/26 12:52 PM, Arnaud Lecomte wrote:
> Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stack()
> during stack trace copying.
>
> The issue occurs when the callchain entry (stored as a per-cpu variable)
> grows between collection and buffer copy, causing it to exceed the buffer
> size initially calculated from max_depth.
>
> The callchain collection intentionally avoids locking for performance
> reasons, but this creates a window where concurrent modifications can
> occur during the copy operation.
>
> To prevent this, we clamp the trace length to the max_depth initially
> calculated from the buffer size and the size of a trace entry.
>
> Reported-by: syzbot+d1b7fa...@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/691231dc.a70a022...@google.com/T/
> Fixes: e17d62fedd10 ("bpf: Refactor stack map trace depth calculation into helper function")
> Tested-by: syzbot+d1b7fa...@syzkaller.appspotmail.com
> Cc: Brahmajit Das <lis...@listout.xyz>
> Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>

LGTM.

Acked-by: Yonghong Song <yongho...@linux.dev>

Andrii Nakryiko

Jan 5, 2026, 7:52:02 PM
to Arnaud Lecomte, syzbot+d1b7fa...@syzkaller.appspotmail.com, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, net...@vger.kernel.org, s...@fomichev.me, so...@kernel.org, syzkall...@googlegroups.com, yongho...@linux.dev, Brahmajit Das
there is a `trace->nr < skip` check right above; should it be moved here
and done against the adjusted trace_nr (but before we subtract skip, of
course)?

Lecomte, Arnaud

Jan 7, 2026, 1:09:03 PM
to Andrii Nakryiko, syzbot+d1b7fa...@syzkaller.appspotmail.com, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, net...@vger.kernel.org, s...@fomichev.me, so...@kernel.org, syzkall...@googlegroups.com, yongho...@linux.dev, Brahmajit Das
We could indeed be more proactive with the clamping, even though I would
say it does not fundamentally change anything.
Happy to raise a new rev.
>> + trace_nr = trace_nr - skip;
>> copy_len = trace_nr * elem_size;
>>
>> ips = trace->ip + skip;
>> --
>> 2.43.0
>>
Thanks for the review!
Arnaud

Arnaud Lecomte

Jan 7, 2026, 1:12:57 PM
to syzbot+d1b7fa...@syzkaller.appspotmail.com, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, con...@arnaud-lcm.com, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, net...@vger.kernel.org, s...@fomichev.me, so...@kernel.org, syzkall...@googlegroups.com, yongho...@linux.dev, Brahmajit Das
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stack()
during stack trace copying.

The issue occurs when the callchain entry (stored as a per-cpu variable)
grows between collection and buffer copy, causing it to exceed the buffer
size initially calculated from max_depth.

The callchain collection intentionally avoids locking for performance
reasons, but this creates a window where concurrent modifications can
occur during the copy operation.

To prevent this, we clamp the trace length to the max_depth initially
calculated from the buffer size and the size of a trace entry.

Reported-by: syzbot+d1b7fa...@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/691231dc.a70a022...@google.com/T/
Fixes: e17d62fedd10 ("bpf: Refactor stack map trace depth calculation into helper function")
Tested-by: syzbot+d1b7fa...@syzkaller.appspotmail.com
Cc: Brahmajit Das <lis...@listout.xyz>
Signed-off-by: Arnaud Lecomte <con...@arnaud-lcm.com>
---
Changes in v2:
- Moved the trace_nr clamping to max_depth above the trace->nr skip check.
Link to v1: https://lore.kernel.org/all/20260104205220....@arnaud-lcm.com/

Thanks to Brahmajit Das for the initial fix he proposed, which I tweaked
with what I believe is the correct justification and a better
implementation.
---
kernel/bpf/stackmap.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index da3d328f5c15..c0a430f9eafb 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -465,7 +465,6 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,

if (trace_in) {
trace = trace_in;
- trace->nr = min_t(u32, trace->nr, max_depth);
} else if (kernel && task) {
trace = get_callchain_entry_for_task(task, max_depth);
} else {
@@ -473,13 +472,15 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
crosstask, false, 0);
}

- if (unlikely(!trace) || trace->nr < skip) {
+ trace_nr = min(trace->nr, max_depth);
+
+ if (unlikely(!trace) || trace_nr < skip) {
if (may_fault)
rcu_read_unlock();
goto err_fault;
}

- trace_nr = trace->nr - skip;

bot+b...@kernel.org

Jan 7, 2026, 1:31:40 PM
to con...@arnaud-lcm.com, syzbot+d1b7fa...@syzkaller.appspotmail.com, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, con...@arnaud-lcm.com, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, net...@vger.kernel.org, s...@fomichev.me, so...@kernel.org, syzkall...@googlegroups.com, yongho...@linux.dev, lis...@listout.xyz, marti...@kernel.org, c...@meta.com, ihor.s...@linux.dev
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index da3d328f5c15..c0a430f9eafb 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -465,7 +465,6 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
>
> if (trace_in) {
> trace = trace_in;
> - trace->nr = min_t(u32, trace->nr, max_depth);
> } else if (kernel && task) {
> trace = get_callchain_entry_for_task(task, max_depth);
> } else {
> @@ -473,13 +472,15 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> crosstask, false, 0);
> }
>
> + trace_nr = min(trace->nr, max_depth);
^^^^^^^^

Can this dereference trace when it is NULL? Both get_callchain_entry_for_task()
and get_perf_callchain() can return NULL, and several callers pass trace_in as
NULL (bpf_get_stack, bpf_get_stack_sleepable, __bpf_get_task_stack, and one
path in bpf_get_stack_pe).

The NULL check below comes after this dereference:

> +
> + if (unlikely(!trace) || trace_nr < skip) {
^^^^^^

This check is now ineffective because trace was already dereferenced above to read trace->nr.
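
To make the required ordering concrete, here is a small self-contained
sketch (hypothetical helper and type names; the RCU handling and error
paths of the real function are omitted):

/* Simplified stand-in for the callchain entry. */
struct callchain_entry {
	unsigned int nr;
	unsigned long ip[];
};

/* Returns the number of frames that can safely be copied, or -1 on
 * error. trace is checked for NULL before any dereference; only then
 * is trace->nr clamped to max_depth and validated against skip. */
static int usable_frames(const struct callchain_entry *trace,
			 unsigned int max_depth, unsigned int skip)
{
	unsigned int trace_nr;

	if (!trace)
		return -1;

	trace_nr = trace->nr < max_depth ? trace->nr : max_depth;
	if (trace_nr < skip)
		return -1;

	return (int)(trace_nr - skip);
}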

[ ... ]


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/20791981532

Lecomte, Arnaud

Jan 7, 2026, 1:35:38 PM
to Andrii Nakryiko, syzbot+d1b7fa...@syzkaller.appspotmail.com, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, net...@vger.kernel.org, s...@fomichev.me, so...@kernel.org, syzkall...@googlegroups.com, yongho...@linux.dev, Brahmajit Das
Nvm, this is not really possible, as we are checking that the trace is
not NULL.
Moving the clamp above that check could lead to a NULL dereference.

Lecomte, Arnaud

Jan 7, 2026, 1:44:59 PM
to syzbot+d1b7fa...@syzkaller.appspotmail.com, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, net...@vger.kernel.org, s...@fomichev.me, so...@kernel.org, syzkall...@googlegroups.com, yongho...@linux.dev, Brahmajit Das
Aborting in favor of my comment on the first rev.

Andrii Nakryiko

Jan 9, 2026, 2:05:35 PM
to Lecomte, Arnaud, syzbot+d1b7fa...@syzkaller.appspotmail.com, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, net...@vger.kernel.org, s...@fomichev.me, so...@kernel.org, syzkall...@googlegroups.com, yongho...@linux.dev, Brahmajit Das
ok, so what are we doing then?

if (unlikely(!trace)) { ... }

trace_nr = min(trace->nr, max_depth);

if (trace_nr < skip) { ... }

trace_nr = trace_nr - skip;


(which is what I proposed, or am I still missing why this shouldn't be
done like that?)
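
For concreteness, a rough rendering of that ordering in the shape of the
existing function; the may_fault/rcu_read_unlock() handling and the
err_fault label are assumed from the diffs quoted earlier in the thread:

	if (unlikely(!trace)) {
		if (may_fault)
			rcu_read_unlock();
		goto err_fault;
	}

	/* clamp before using trace->nr for any size computation */
	trace_nr = min(trace->nr, max_depth);

	if (trace_nr < skip) {
		if (may_fault)
			rcu_read_unlock();
		goto err_fault;
	}

	trace_nr -= skip;
	copy_len = trace_nr * elem_size;
	ips = trace->ip + skip;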

pw-bot: cr