[syzbot] [bpf?] WARNING: locking bug in trie_delete_elem

syzbot

Oct 31, 2024, 3:32:28 PM
to and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzkall...@googlegroups.com, yongho...@linux.dev
Hello,

syzbot found the following issue on:

HEAD commit: f9f24ca362a4 Add linux-next specific files for 20241031
git tree: linux-next
console+strace: https://syzkaller.appspot.com/x/log.txt?x=1387c6f7980000
kernel config: https://syzkaller.appspot.com/x/.config?x=328572ed4d152be9
dashboard link: https://syzkaller.appspot.com/bug?extid=b506de56cbbb63148c33
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1387655f980000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11ac5540580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/eb84549dd6b3/disk-f9f24ca3.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/beb29bdfa297/vmlinux-f9f24ca3.xz
kernel image: https://storage.googleapis.com/syzbot-assets/8881fe3245ad/bzImage-f9f24ca3.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+b506de...@syzkaller.appspotmail.com

=============================
[ BUG: Invalid wait context ]
6.12.0-rc5-next-20241031-syzkaller #0 Not tainted
-----------------------------
swapper/0/0 is trying to lock:
ffff8880261e7a00 (&trie->lock){....}-{3:3}, at: trie_delete_elem+0x96/0x6a0 kernel/bpf/lpm_trie.c:462
other info that might help us debug this:
context-{3:3}
5 locks held by swapper/0/0:
#0: ffff888020bb75c8 (&vp_dev->lock){-...}-{3:3}, at: vp_vring_interrupt drivers/virtio/virtio_pci_common.c:80 [inline]
#0: ffff888020bb75c8 (&vp_dev->lock){-...}-{3:3}, at: vp_interrupt+0x142/0x200 drivers/virtio/virtio_pci_common.c:113
#1: ffff88814174a120 (&vb->stop_update_lock){-...}-{3:3}, at: spin_lock include/linux/spinlock.h:351 [inline]
#1: ffff88814174a120 (&vb->stop_update_lock){-...}-{3:3}, at: stats_request+0x6f/0x230 drivers/virtio/virtio_balloon.c:438
#2: ffffffff8e939f20 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:337 [inline]
#2: ffffffff8e939f20 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:849 [inline]
#2: ffffffff8e939f20 (rcu_read_lock){....}-{1:3}, at: __queue_work+0x199/0xf50 kernel/workqueue.c:2259
#3: ffff8880b863dd18 (&pool->lock){-.-.}-{2:2}, at: __queue_work+0x759/0xf50
#4: ffffffff8e939f20 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:337 [inline]
#4: ffffffff8e939f20 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:849 [inline]
#4: ffffffff8e939f20 (rcu_read_lock){....}-{1:3}, at: __bpf_trace_run kernel/trace/bpf_trace.c:2339 [inline]
#4: ffffffff8e939f20 (rcu_read_lock){....}-{1:3}, at: bpf_trace_run1+0x1d6/0x520 kernel/trace/bpf_trace.c:2380
stack backtrace:
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-rc5-next-20241031-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
print_lock_invalid_wait_context kernel/locking/lockdep.c:4826 [inline]
check_wait_context kernel/locking/lockdep.c:4898 [inline]
__lock_acquire+0x15a8/0x2100 kernel/locking/lockdep.c:5176
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
trie_delete_elem+0x96/0x6a0 kernel/bpf/lpm_trie.c:462
bpf_prog_2c29ac5cdc6b1842+0x43/0x47
bpf_dispatcher_nop_func include/linux/bpf.h:1290 [inline]
__bpf_prog_run include/linux/filter.h:701 [inline]
bpf_prog_run include/linux/filter.h:708 [inline]
__bpf_trace_run kernel/trace/bpf_trace.c:2340 [inline]
bpf_trace_run1+0x2ca/0x520 kernel/trace/bpf_trace.c:2380
trace_workqueue_activate_work+0x186/0x1f0 include/trace/events/workqueue.h:59
__queue_work+0xc7b/0xf50 kernel/workqueue.c:2338
queue_work_on+0x1c2/0x380 kernel/workqueue.c:2390
queue_work include/linux/workqueue.h:662 [inline]
stats_request+0x1a3/0x230 drivers/virtio/virtio_balloon.c:441
vring_interrupt+0x21d/0x380 drivers/virtio/virtio_ring.c:2595
vp_vring_interrupt drivers/virtio/virtio_pci_common.c:82 [inline]
vp_interrupt+0x192/0x200 drivers/virtio/virtio_pci_common.c:113
__handle_irq_event_percpu+0x29a/0xa80 kernel/irq/handle.c:158
handle_irq_event_percpu kernel/irq/handle.c:193 [inline]
handle_irq_event+0x89/0x1f0 kernel/irq/handle.c:210
handle_fasteoi_irq+0x48a/0xae0 kernel/irq/chip.c:720
generic_handle_irq_desc include/linux/irqdesc.h:173 [inline]
handle_irq arch/x86/kernel/irq.c:247 [inline]
call_irq_handler arch/x86/kernel/irq.c:259 [inline]
__common_interrupt+0x136/0x230 arch/x86/kernel/irq.c:285
common_interrupt+0xb4/0xd0 arch/x86/kernel/irq.c:278
</IRQ>
<TASK>
asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:693
RIP: 0010:finish_task_switch+0x1ea/0x870 kernel/sched/core.c:5201
Code: c9 50 e8 29 05 0c 00 48 83 c4 08 4c 89 f7 e8 4d 39 00 00 0f 1f 44 00 00 4c 89 f7 e8 a0 45 69 0a e8 4b 9e 38 00 fb 48 8b 5d c0 <48> 8d bb f8 15 00 00 48 89 f8 48 c1 e8 03 49 be 00 00 00 00 00 fc
RSP: 0018:ffffffff8e607ae8 EFLAGS: 00000282
RAX: 467bb178e56b5700 RBX: ffffffff8e6945c0 RCX: ffffffff9a3d4903
RDX: dffffc0000000000 RSI: ffffffff8c0ad3a0 RDI: ffffffff8c604dc0
RBP: ffffffff8e607b30 R08: ffffffff901d03b7 R09: 1ffffffff203a076
R10: dffffc0000000000 R11: fffffbfff203a077 R12: 1ffff110170c7e74
R13: dffffc0000000000 R14: ffff8880b863e580 R15: ffff8880b863f3a0
context_switch kernel/sched/core.c:5330 [inline]
__schedule+0x1857/0x4c30 kernel/sched/core.c:6707
schedule_idle+0x56/0x90 kernel/sched/core.c:6825
do_idle+0x567/0x5c0 kernel/sched/idle.c:353
cpu_startup_entry+0x42/0x60 kernel/sched/idle.c:423
rest_init+0x2dc/0x300 init/main.c:747
start_kernel+0x47f/0x500 init/main.c:1102
x86_64_start_reservations+0x2a/0x30 arch/x86/kernel/head64.c:507
x86_64_start_kernel+0x9f/0xa0 arch/x86/kernel/head64.c:488
common_startup_64+0x13e/0x147
</TASK>
----------------
Code disassembly (best guess):
0: c9 leave
1: 50 push %rax
2: e8 29 05 0c 00 call 0xc0530
7: 48 83 c4 08 add $0x8,%rsp
b: 4c 89 f7 mov %r14,%rdi
e: e8 4d 39 00 00 call 0x3960
13: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
18: 4c 89 f7 mov %r14,%rdi
1b: e8 a0 45 69 0a call 0xa6945c0
20: e8 4b 9e 38 00 call 0x389e70
25: fb sti
26: 48 8b 5d c0 mov -0x40(%rbp),%rbx
* 2a: 48 8d bb f8 15 00 00 lea 0x15f8(%rbx),%rdi <-- trapping instruction
31: 48 89 f8 mov %rdi,%rax
34: 48 c1 e8 03 shr $0x3,%rax
38: 49 rex.WB
39: be 00 00 00 00 mov $0x0,%esi
3e: 00 fc add %bh,%ah


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Hou Tao

Nov 1, 2024, 5:44:19 AM
to syzbot, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, hao...@google.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, s...@fomichev.me, so...@kernel.org, syzkall...@googlegroups.com, yongho...@linux.dev
Hi,

On 11/1/2024 3:32 AM, syzbot wrote:
> =============================
> [ BUG: Invalid wait context ]
> 6.12.0-rc5-next-20241031-syzkaller #0 Not tainted
> -----------------------------
> swapper/0/0 is trying to lock:
> ffff8880261e7a00 (&trie->lock){....}-{3:3}, at: trie_delete_elem+0x96/0x6a0 kernel/bpf/lpm_trie.c:462

Sorry for the resend. The previous mail was rejected by the mailing list
because it contained HTML.

The warning occurs because the lock for lpm_trie is a spinlock_t, which
may sleep on a PREEMPT_RT kernel. However, the bpf program runs after a
raw_spinlock (&pool->lock) has already been taken in queue_work(), and it
is also running inside an interrupt handler, so lockdep warns about it.
The lock should be changed to raw_spinlock_t. I will fix it.
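To make the {outer:inner} annotations in the splat concrete, here is a simplified userspace model of the rule lockdep applies in check_wait_context() (kernel/locking/lockdep.c). It is a sketch under stated assumptions, not the kernel code: the numeric levels follow enum lockdep_wait_type with CONFIG_PROVE_RAW_LOCK_NESTING enabled (1 = wait-free such as RCU, 2 = raw_spinlock_t, 3 = spinlock_t, 4 = sleeping locks), and irq-context boundaries, trylocks and override annotations are ignored. The type and function names (lock_model, wait_context_ok, report_held) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Wait-type levels, mirroring enum lockdep_wait_type when
 * CONFIG_PROVE_RAW_LOCK_NESTING=y (simplified model, not kernel code). */
enum wait_type {
	WAIT_FREE = 1,   /* wait-free, e.g. rcu_read_lock        */
	WAIT_SPIN = 2,   /* raw_spinlock_t                       */
	WAIT_CONFIG = 3, /* spinlock_t: may sleep on PREEMPT_RT  */
	WAIT_SLEEP = 4,  /* mutexes and other sleeping locks     */
	WAIT_MAX = 5,    /* plain task context, no restriction   */
};

/* Hypothetical lock class; lockdep prints it as {outer:inner}. */
struct lock_model {
	const char *name;
	enum wait_type outer; /* level the lock requires of its caller */
	enum wait_type inner; /* level it allows while it is held      */
};

/* Returns true if acquiring `next` is valid in context `ctx` while
 * held[0..n) are held. The allowed level is the minimum of the context
 * level and every held lock's inner level; the new lock's outer level
 * must not exceed it. */
static bool wait_context_ok(enum wait_type ctx, const struct lock_model *held,
			    size_t n, const struct lock_model *next)
{
	enum wait_type curr = ctx;

	assert(next != NULL && (held != NULL || n == 0));
	for (size_t i = 0; i < n; i++)
		if (held[i].inner < curr)
			curr = held[i].inner;
	return next->outer <= curr;
}

/* Held-lock chain from the report, in a hardirq context of {3:3}. */
static const struct lock_model report_held[] = {
	{ "&vp_dev->lock",         WAIT_CONFIG, WAIT_CONFIG },
	{ "&vb->stop_update_lock", WAIT_CONFIG, WAIT_CONFIG },
	{ "rcu_read_lock",         WAIT_FREE,   WAIT_CONFIG },
	{ "&pool->lock",           WAIT_SPIN,   WAIT_SPIN   }, /* raw_spinlock_t */
	{ "rcu_read_lock",         WAIT_FREE,   WAIT_CONFIG },
};

static const struct lock_model trie_spinlock = { "&trie->lock", WAIT_CONFIG, WAIT_CONFIG };
static const struct lock_model trie_raw_lock = { "&trie->lock", WAIT_SPIN, WAIT_SPIN };
```

With these inputs, wait_context_ok(WAIT_CONFIG, report_held, 5, &trie_spinlock) is false because &pool->lock lowers the allowed level to 2, while the raw_spinlock_t variant is accepted. This is why converting trie->lock to raw_spinlock_t is expected to silence this particular splat.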

There have been multiple lpm trie related syzbot reports, including:

(1) possible deadlock in get_page_from_freelist [1]
The deadlock involves lock(&zone->lock) and lock(&trie->lock);
zone->lock is acquired from lpm_trie_node_alloc().

(2) possible deadlock in trie_delete_elem [2]
The deadlock is due to recursive locking of lock(&trie->lock). The
recursion comes from lpm_trie_node_alloc().

(3) possible deadlock in trie_update_elem [3]
(4) possible deadlock in stack_depot_save_flags [4]
(5) possible deadlock in get_partial_node [5]
(6) possible deadlock in deactivate_slab[6]
(7) possible deadlock in __put_partials [7]
(8) possible deadlock in debug_check_no_obj_freed [8]
Issues (3)-(8) are similar to the first issue.

[1] https://syzkaller.appspot.com/bug?extid=a7f061d2d16154538c58
[2] https://syzkaller.appspot.com/bug?extid=9d95beb2a3c260622518
[3] https://syzkaller.appspot.com/bug?extid=ea624e536fee669a05cf
[4] https://syzkaller.appspot.com/bug?extid=c065d8dfbb5ad8cbdceb
[5] https://syzkaller.appspot.com/bug?extid=9045c0a3d5a7f1b119f7
[6] https://syzkaller.appspot.com/bug?extid=a4acbb99845d381e5e2f
[7] https://syzkaller.appspot.com/bug?extid=5a878c984150fad34185
[8] https://syzkaller.appspot.com/bug?extid=b12149f7ab5a8751740f
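Report (1) and the similar ones are lockdep "possible deadlock" warnings about lock ordering: lpm_trie_node_alloc() takes zone->lock while trie->lock is held, so any path that takes the two locks in the opposite order (presumably a BPF program updating the trie while zone->lock is already held; this is an assumption, not something the reports above spell out) forms an AB-BA cycle. The toy model below shows the idea behind that check. It is hypothetical and much simpler than lockdep: lock classes are small integers, and only direct two-lock inversions are detected, whereas lockdep searches the whole dependency graph.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical lock classes for the model. */
enum { ZONE_LOCK, TRIE_LOCK, NR_LOCKS };

/* order[a][b] == true means "b has been taken while a was held". */
static bool order[NR_LOCKS][NR_LOCKS];

/* Record that `next` is acquired while `held` is held. Returns false
 * (and prints a warning) if the opposite order was recorded earlier,
 * i.e. an AB-BA inversion that could deadlock on two CPUs. */
static bool record_order(int held, int next)
{
	assert(held >= 0 && held < NR_LOCKS && next >= 0 && next < NR_LOCKS);
	if (order[next][held]) {
		fprintf(stderr, "possible deadlock: %d -> %d inverts %d -> %d\n",
			held, next, next, held);
		return false;
	}
	order[held][next] = true;
	return true;
}
```

For example, record_order(TRIE_LOCK, ZONE_LOCK) models a node allocation under trie->lock and succeeds; a later record_order(ZONE_LOCK, TRIE_LOCK) models a trie operation under zone->lock and is flagged. Allocating nodes without touching zone->lock while trie->lock is held removes the first edge, which is the intent of the allocator change discussed next.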

Using the bpf memory allocator to allocate both new nodes and
intermediate nodes will fix these reports. However, I was hesitant about
supporting recursive lock prevention on the same CPU for lpm trie.
About six months ago, Siddharth posted a patch set [9] to support
recursive lock prevention for queue/stack maps, so maybe I could continue
that work and also add support for lpm trie in the same patch set.

[9] https://lore.kernel.org/bpf/20240514124052.12402...@gmail.com/
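Same-CPU recursion, as in report (2), can be refused instead of deadlocking. The sketch below is a hypothetical userspace model of one way to do that, loosely following the per-CPU map_locked counters that the BPF hash map uses in kernel/bpf/hashtab.c; it does not claim to reflect the lpm_trie code or the patch set in [9]. All names (trie_busy, trie_delete_model, reenter_once, the simulated CPU ids) are made up for illustration, and a plain array stands in for a real per-CPU variable accessed with preemption and interrupts disabled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4

/* Hypothetical per-CPU "trie busy" counters (plain array in this model). */
static int trie_busy[NR_CPUS_MODEL];

/* Simulated hook: when set, a "tracing program" fires inside the locked
 * section and re-enters the trie on the same CPU exactly once. */
static bool reenter_once;
static int nested_ret;

static int trie_delete_model(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	if (++trie_busy[cpu] != 1) {
		/* Already inside a trie operation on this CPU: taking
		 * trie->lock again would self-deadlock, so bail out. */
		trie_busy[cpu]--;
		return -EBUSY;
	}

	/* ... trie->lock would be taken and the element removed here ... */
	if (reenter_once) {
		reenter_once = false;
		nested_ret = trie_delete_model(cpu); /* same-CPU recursion */
	}

	trie_busy[cpu]--;
	return 0;
}
```

In this model the outer call completes normally while the nested call on the same CPU gets -EBUSY, and the counter is back to zero afterwards. The trade-off is that a legitimate update from a nested context fails instead of waiting, which is presumably part of why such prevention needs separate discussion.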

syzbot

Nov 7, 2024, 4:00:05 PM
to and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, fred...@kernel.org, hao...@google.com, hou...@huaweicloud.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, pet...@infradead.org, s...@fomichev.me, so...@kernel.org, syzkall...@googlegroups.com, tg...@linutronix.de, yongho...@linux.dev
syzbot has bisected this issue to:

commit 4febce44cfebcb490b196d5d10ae9f403ca4c956
Author: Thomas Gleixner <tg...@linutronix.de>
Date: Tue Oct 1 08:42:03 2024 +0000

posix-timers: Cure si_sys_private race

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=129f2d87980000
start commit: f9f24ca362a4 Add linux-next specific files for 20241031
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=169f2d87980000
Reported-by: syzbot+b506de...@syzkaller.appspotmail.com
Fixes: 4febce44cfeb ("posix-timers: Cure si_sys_private race")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

Thomas Gleixner

Nov 11, 2024, 9:24:43 PM
to syzbot, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, fred...@kernel.org, hao...@google.com, hou...@huaweicloud.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, pet...@infradead.org, s...@fomichev.me, so...@kernel.org, syzkall...@googlegroups.com, yongho...@linux.dev
I seriously doubt that this bisection is even remotely correct.

This commit has absolutely nothing to do with the lockdep splat and
trie_delete_elem().

Thanks,

tglx

Aleksandr Nogikh

Nov 13, 2024, 6:04:39 AM
to Thomas Gleixner, syzbot, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, edd...@gmail.com, fred...@kernel.org, hao...@google.com, hou...@huaweicloud.com, john.fa...@gmail.com, jo...@kernel.org, kps...@kernel.org, linux-...@vger.kernel.org, marti...@linux.dev, pet...@infradead.org, s...@fomichev.me, so...@kernel.org, syzkall...@googlegroups.com, yongho...@linux.dev
Yes, the bisection is wrong, please ignore it.
I've added this case to the issue that tracks the underlying problem:
https://github.com/google/syzkaller/issues/5414

--
Aleksandr


syzbot

Feb 20, 2025, 6:52:13 PM
to syzkall...@googlegroups.com
Auto-closing this bug as obsolete.
No recent activity, existing reproducers are no longer triggering the issue.