[syzbot] [bpf?] possible deadlock in bpf_lru_push_free (2)

syzbot

Nov 12, 2025, 11:26:32 PM
to and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, net...@vger.kernel.org, s...@fomichev.me, so...@kernel.org, syzkall...@googlegroups.com, yongho...@linux.dev
Hello,

syzbot found the following issue on:

HEAD commit: e427054ae7bc Merge branch 'x86-fgraph-bpf-fix-orc-stack-un..
git tree: bpf
console output: https://syzkaller.appspot.com/x/log.txt?x=136b70b4580000
kernel config: https://syzkaller.appspot.com/x/.config?x=e46b8a1c645465a9
dashboard link: https://syzkaller.appspot.com/bug?extid=18b26edb69b2e19f3b33
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10013c12580000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16541c12580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/c1ac942fc5fb/disk-e427054a.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/be05ef12ba31/vmlinux-e427054a.xz
kernel image: https://storage.googleapis.com/syzbot-assets/c75604292a15/bzImage-e427054a.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+18b26e...@syzkaller.appspotmail.com

============================================
WARNING: possible recursive locking detected
syzkaller #0 Not tainted
--------------------------------------------
syz-executor149/10558 is trying to acquire lock:
ffffe8ffffc41588 (&loc_l->lock){....}-{2:2}, at: bpf_common_lru_push_free kernel/bpf/bpf_lru_list.c:514 [inline]
ffffe8ffffc41588 (&loc_l->lock){....}-{2:2}, at: bpf_lru_push_free+0x33b/0xbb0 kernel/bpf/bpf_lru_list.c:553

but task is already holding lock:
ffffe8ffffc41588 (&loc_l->lock){....}-{2:2}, at: bpf_common_lru_pop_free kernel/bpf/bpf_lru_list.c:440 [inline]
ffffe8ffffc41588 (&loc_l->lock){....}-{2:2}, at: bpf_lru_pop_free+0x1ab/0x19b0 kernel/bpf/bpf_lru_list.c:496

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&loc_l->lock);
lock(&loc_l->lock);

*** DEADLOCK ***

May be due to missing lock nesting notation

3 locks held by syz-executor149/10558:
#0: ffffffff8df3d620 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
#0: ffffffff8df3d620 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:867 [inline]
#0: ffffffff8df3d620 (rcu_read_lock){....}-{1:3}, at: bpf_percpu_hash_update+0x2b/0x200 kernel/bpf/hashtab.c:2409
#1: ffffe8ffffc41588 (&loc_l->lock){....}-{2:2}, at: bpf_common_lru_pop_free kernel/bpf/bpf_lru_list.c:440 [inline]
#1: ffffe8ffffc41588 (&loc_l->lock){....}-{2:2}, at: bpf_lru_pop_free+0x1ab/0x19b0 kernel/bpf/bpf_lru_list.c:496
#2: ffffffff8df3d620 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
#2: ffffffff8df3d620 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:867 [inline]
#2: ffffffff8df3d620 (rcu_read_lock){....}-{1:3}, at: __bpf_trace_run kernel/trace/bpf_trace.c:2074 [inline]
#2: ffffffff8df3d620 (rcu_read_lock){....}-{1:3}, at: bpf_trace_run2+0x186/0x4b0 kernel/trace/bpf_trace.c:2116
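
Read together, the held-locks list suggests re-entrancy on one CPU: bpf_lru_pop_free takes &loc_l->lock, a tracing program then runs via bpf_trace_run2, and that program ends up in bpf_lru_push_free, which wants the same lock. The following is a userspace sketch of that pattern only. All `demo_*` names are hypothetical, the lock is a plain flag that reports a second acquisition instead of spinning, and the tracepoint is modelled as a direct call; none of this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU loc_l->lock. It is non-recursive:
 * a second acquisition while held is reported instead of spinning. */
struct demo_lock {
	bool held;
};

static struct demo_lock loc_lock;
static int recursion_detected;

static bool demo_lock_acquire(struct demo_lock *l)
{
	if (l->held)
		return false;	/* a real spinlock would spin forever here */
	l->held = true;
	return true;
}

static void demo_lock_release(struct demo_lock *l)
{
	l->held = false;
}

/* Models bpf_lru_push_free(): needs the local lock. */
static void demo_push_free(void)
{
	if (!demo_lock_acquire(&loc_lock)) {
		recursion_detected++;
		return;
	}
	demo_lock_release(&loc_lock);
}

/* Models a tracing program that runs in the middle of the pop path
 * and reaches the push path on the same CPU. */
static void demo_tracing_prog(void)
{
	demo_push_free();
}

/* Models bpf_lru_pop_free(): holds the local lock while it works. */
static void demo_pop_free(bool tracing_prog_attached)
{
	if (!demo_lock_acquire(&loc_lock)) {
		recursion_detected++;
		return;
	}
	if (tracing_prog_attached)
		demo_tracing_prog();
	demo_lock_release(&loc_lock);
}
```

Calling `demo_pop_free(false)` leaves `recursion_detected` at zero, while `demo_pop_free(true)` increments it once, which corresponds to the second `lock(&loc_l->lock)` in the scenario lockdep prints above.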

stack backtrace:


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Menglong Dong

Nov 14, 2025, 4:40:21 AM
to and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, net...@vger.kernel.org, s...@fomichev.me, so...@kernel.org, syzkall...@googlegroups.com, yongho...@linux.dev, syzbot
I was working on this issue by using rqspinlock for the LRU map:
https://lore.kernel.org/bpf/20251030030010....@chinatelecom.cn/

However, the locking here is too complex. Take
htab_lru_map_update_elem as an example: it pops a free node,
updates the hash table, and pushes the old node back to the LRU.

The pop and the push both take a lock, which, with rqspinlock,
means that both can fail. If the pop fails, we can return the
errno directly. But what can we do if the push fails? For batch
updates, the situation becomes much worse.
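
To make the asymmetry concrete, here is a minimal userspace sketch of the pop / update / push sequence with a lock that can fail. It is an illustration under stated assumptions, not the htab code: every `demo_*` name is hypothetical, the hash table is reduced to a single slot, and a failed lock is modelled by returning -EDEADLK from a flag rather than by a real rqspinlock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_node {
	int key;
	struct demo_node *next;
};

struct demo_lru {
	struct demo_node *free_list;
	bool lock_fails;	/* assumption: models a lock acquisition that errors out */
	int leaked;		/* nodes that ended up in neither the table nor the free list */
};

static int demo_lock(struct demo_lru *lru)
{
	return lru->lock_fails ? -EDEADLK : 0;
}

static int demo_pop_free(struct demo_lru *lru, struct demo_node **out)
{
	int err = demo_lock(lru);

	if (err)
		return err;
	if (!lru->free_list)
		return -ENOMEM;
	*out = lru->free_list;
	lru->free_list = (*out)->next;
	(*out)->next = NULL;
	return 0;
}

static int demo_push_free(struct demo_lru *lru, struct demo_node *node)
{
	int err = demo_lock(lru);

	if (err)
		return err;
	node->next = lru->free_list;
	lru->free_list = node;
	return 0;
}

/* Pop a free node, install it in the slot, push the old node back. */
static int demo_update(struct demo_lru *lru, struct demo_node **slot,
		       int key, bool fail_lock_before_push)
{
	struct demo_node *new_node, *old;
	int err = demo_pop_free(lru, &new_node);

	if (err)
		return err;	/* easy case: nothing has changed yet */

	new_node->key = key;
	old = *slot;
	*slot = new_node;	/* the update is now visible */
	if (!old)
		return 0;

	lru->lock_fails = fail_lock_before_push;
	err = demo_push_free(lru, old);
	if (err)
		lru->leaked++;	/* hard case: update done, old node is stranded */
	return err;
}
```

When the push fails, the caller gets an error even though the new value is already installed, and the old node is accounted for nowhere. That is the state a failed push would have to undo or defer.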

Hmm... I have not figured out a good solution; maybe we can
use some kind of transactional process here. Is anyone else
working on this issue?

Thanks!
Menglong Dong

Alexei Starovoitov

Nov 14, 2025, 9:36:49 PM
to Menglong Dong, Andrii Nakryiko, Alexei Starovoitov, bpf, Daniel Borkmann, Eduard, Hao Luo, John Fastabend, Jiri Olsa, KP Singh, LKML, Martin KaFai Lau, Network Development, Stanislav Fomichev, Song Liu, syzkaller-bugs, Yonghong Song, syzbot
On Thu, Nov 13, 2025 at 11:08 PM Menglong Dong <menglo...@linux.dev> wrote:
>
>
> Hmm... I have not figured out a good solution; maybe we can
> use some kind of transactional process here. Is anyone else
> working on this issue?

Yeah, it's not easy. rqspinlock is not a drop-in replacement.
But before we go any further, can you actually reproduce it?
I tried repro.c with lockdep, KASAN, and all the other debug configs,
and it doesn't reproduce.
Maybe it was already fixed by nokprobe-ing the LRU code, but syzbot didn't notice.