[v6.1] possible deadlock in blkcg_deactivate_policy


syzbot

Mar 10, 2023, 04:38:42
to syzkaller...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: 8a923980a190 Linux 6.1.16
git tree: linux-6.1.y
console output: https://syzkaller.appspot.com/x/log.txt?x=152566eac80000
kernel config: https://syzkaller.appspot.com/x/.config?x=fc32d7322291d081
dashboard link: https://syzkaller.appspot.com/bug?extid=a61a7dbe3e052af19de8
compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: arm64

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/bf09a4a426d0/disk-8a923980.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/99e88c1c3e26/vmlinux-8a923980.xz
kernel image: https://storage.googleapis.com/syzbot-assets/d13a720e0836/Image-8a923980.gz.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+a61a7d...@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
6.1.16-syzkaller #0 Not tainted
------------------------------------------------------
kworker/u4:13/5402 is trying to acquire lock:
ffff0000cc5250a8 ((&sq->pending_timer)){+.-.}-{0:0}, at: del_timer_sync+0x74/0x210 kernel/time/timer.c:1417

but task is already holding lock:
ffff800019965b30 (&blkcg->lock){....}-{2:2}, at: spin_lock include/linux/spinlock.h:350 [inline]
ffff800019965b30 (&blkcg->lock){....}-{2:2}, at: blkcg_deactivate_policy+0x1b8/0x4bc block/blk-cgroup.c:1495

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&blkcg->lock){....}-{2:2}:
__raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
_raw_spin_lock+0x54/0x6c kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:350 [inline]
blkg_create+0x9f4/0x1158 block/blk-cgroup.c:336
blkcg_init_disk+0xe4/0x32c block/blk-cgroup.c:1260
__alloc_disk_node+0x26c/0x484 block/genhd.c:1385
__blk_alloc_disk+0x40/0xbc block/genhd.c:1424
brd_alloc+0x2ac/0x5c8 drivers/block/brd.c:397
brd_init+0x108/0x1c4 drivers/block/brd.c:484
do_one_initcall+0x310/0xda4 init/main.c:1303
do_initcall_level+0x154/0x214 init/main.c:1376
do_initcalls+0x58/0xac init/main.c:1392
do_basic_setup+0x8c/0xa0 init/main.c:1411
kernel_init_freeable+0x3a4/0x528 init/main.c:1631
kernel_init+0x24/0x29c init/main.c:1519
ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860

-> #1 (&q->queue_lock){..-.}-{2:2}:
__raw_spin_lock_irq include/linux/spinlock_api_smp.h:119 [inline]
_raw_spin_lock_irq+0x70/0x9c kernel/locking/spinlock.c:170
spin_lock_irq include/linux/spinlock.h:375 [inline]
throtl_pending_timer_fn+0x104/0xdcc block/blk-throttle.c:1193
call_timer_fn+0x270/0xcf4 kernel/time/timer.c:1474
expire_timers kernel/time/timer.c:1519 [inline]
__run_timers+0x554/0x718 kernel/time/timer.c:1790
run_timer_softirq+0x7c/0x114 kernel/time/timer.c:1803
__do_softirq+0x37c/0xff4 kernel/softirq.c:571
____do_softirq+0x14/0x20 arch/arm64/kernel/irq.c:79
call_on_irq_stack+0x2c/0x54 arch/arm64/kernel/entry.S:889
do_softirq_own_stack+0x20/0x2c arch/arm64/kernel/irq.c:84
invoke_softirq kernel/softirq.c:452 [inline]
__irq_exit_rcu+0x28c/0x534 kernel/softirq.c:650
irq_exit_rcu+0x14/0x84 kernel/softirq.c:662
__el1_irq arch/arm64/kernel/entry-common.c:472 [inline]
el1_interrupt+0x38/0x68 arch/arm64/kernel/entry-common.c:486
el1h_64_irq_handler+0x18/0x24 arch/arm64/kernel/entry-common.c:491
el1h_64_irq+0x64/0x68 arch/arm64/kernel/entry.S:577
arch_local_irq_enable arch/arm64/include/asm/irqflags.h:35 [inline]
__raw_spin_unlock_irq include/linux/spinlock_api_smp.h:159 [inline]
_raw_spin_unlock_irq+0x44/0x90 kernel/locking/spinlock.c:202
process_one_work+0x664/0x16f4 kernel/workqueue.c:2262
worker_thread+0x8e4/0xfec kernel/workqueue.c:2436
kthread+0x24c/0x2d4 kernel/kthread.c:376
ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860

-> #0 ((&sq->pending_timer)){+.-.}-{0:0}:
check_prev_add kernel/locking/lockdep.c:3098 [inline]
check_prevs_add kernel/locking/lockdep.c:3217 [inline]
validate_chain kernel/locking/lockdep.c:3832 [inline]
__lock_acquire+0x3338/0x764c kernel/locking/lockdep.c:5056
lock_acquire+0x300/0x8e4 kernel/locking/lockdep.c:5669
del_timer_sync+0x9c/0x210 kernel/time/timer.c:1417
throtl_pd_free+0x20/0x48 block/blk-throttle.c:493
blkcg_deactivate_policy+0x2d8/0x4bc block/blk-cgroup.c:1499
blk_throtl_exit+0x9c/0x13c block/blk-throttle.c:2406
blkcg_exit_disk+0x4c/0x5c block/blk-cgroup.c:1301
disk_release+0x170/0x2d8 block/genhd.c:1171
device_release+0x8c/0x1ac
kobject_cleanup lib/kobject.c:681 [inline]
kobject_release lib/kobject.c:712 [inline]
kref_put include/linux/kref.h:65 [inline]
kobject_put+0x2a8/0x41c lib/kobject.c:729
put_device+0x28/0x40 drivers/base/core.c:3772
put_disk+0x4c/0x64 block/genhd.c:1450
nbd_dev_remove drivers/block/nbd.c:253 [inline]
nbd_dev_remove_work+0x50/0xe8 drivers/block/nbd.c:269
process_one_work+0x868/0x16f4 kernel/workqueue.c:2289
worker_thread+0x8e4/0xfec kernel/workqueue.c:2436
kthread+0x24c/0x2d4 kernel/kthread.c:376
ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860

other info that might help us debug this:

Chain exists of:
(&sq->pending_timer) --> &q->queue_lock --> &blkcg->lock

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&blkcg->lock);
lock(&q->queue_lock);
lock(&blkcg->lock);
lock((&sq->pending_timer));

*** DEADLOCK ***

5 locks held by kworker/u4:13/5402:
#0: ffff0000cc477938 ((wq_completion)nbd-del){+.+.}-{0:0}, at: process_one_work+0x664/0x16f4 kernel/workqueue.c:2262
#1: ffff80002cf37c20 ((work_completion)(&nbd->remove_work)){+.+.}-{0:0}, at: process_one_work+0x6a8/0x16f4 kernel/workqueue.c:2264
#2: ffff0000cc5402e0 (&q->blkcg_mutex){+.+.}-{3:3}, at: blkcg_deactivate_policy+0xfc/0x4bc block/blk-cgroup.c:1487
#3: ffff0000cc5400d0 (&q->queue_lock){..-.}-{2:2}, at: spin_lock_irq include/linux/spinlock.h:375 [inline]
#3: ffff0000cc5400d0 (&q->queue_lock){..-.}-{2:2}, at: blkcg_deactivate_policy+0x108/0x4bc block/blk-cgroup.c:1488
#4: ffff800019965b30 (&blkcg->lock){....}-{2:2}, at: spin_lock include/linux/spinlock.h:350 [inline]
#4: ffff800019965b30 (&blkcg->lock){....}-{2:2}, at: blkcg_deactivate_policy+0x1b8/0x4bc block/blk-cgroup.c:1495

stack backtrace:
CPU: 0 PID: 5402 Comm: kworker/u4:13 Not tainted 6.1.16-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
Workqueue: nbd-del nbd_dev_remove_work
Call trace:
dump_backtrace+0x1c8/0x1f4 arch/arm64/kernel/stacktrace.c:158
show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:165
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x108/0x170 lib/dump_stack.c:106
dump_stack+0x1c/0x5c lib/dump_stack.c:113
print_circular_bug+0x150/0x1b8 kernel/locking/lockdep.c:2056
check_noncircular+0x2cc/0x378 kernel/locking/lockdep.c:2178
check_prev_add kernel/locking/lockdep.c:3098 [inline]
check_prevs_add kernel/locking/lockdep.c:3217 [inline]
validate_chain kernel/locking/lockdep.c:3832 [inline]
__lock_acquire+0x3338/0x764c kernel/locking/lockdep.c:5056
lock_acquire+0x300/0x8e4 kernel/locking/lockdep.c:5669
del_timer_sync+0x9c/0x210 kernel/time/timer.c:1417
throtl_pd_free+0x20/0x48 block/blk-throttle.c:493
blkcg_deactivate_policy+0x2d8/0x4bc block/blk-cgroup.c:1499
blk_throtl_exit+0x9c/0x13c block/blk-throttle.c:2406
blkcg_exit_disk+0x4c/0x5c block/blk-cgroup.c:1301
disk_release+0x170/0x2d8 block/genhd.c:1171
device_release+0x8c/0x1ac
kobject_cleanup lib/kobject.c:681 [inline]
kobject_release lib/kobject.c:712 [inline]
kref_put include/linux/kref.h:65 [inline]
kobject_put+0x2a8/0x41c lib/kobject.c:729
put_device+0x28/0x40 drivers/base/core.c:3772
put_disk+0x4c/0x64 block/genhd.c:1450
nbd_dev_remove drivers/block/nbd.c:253 [inline]
nbd_dev_remove_work+0x50/0xe8 drivers/block/nbd.c:269
process_one_work+0x868/0x16f4 kernel/workqueue.c:2289
worker_thread+0x8e4/0xfec kernel/workqueue.c:2436
kthread+0x24c/0x2d4 kernel/kthread.c:376
ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
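
The circular dependency in the report above can be illustrated with a toy lock-order graph. This is only a sketch of the idea behind the check, not lockdep's actual implementation: the `LockGraph` class and its method names are hypothetical, and the three edges are taken directly from the `#2`, `#1` and `#0` entries of the dependency chain. An edge `A -> B` means "B was acquired while A was held"; for the timer, "held" means the timer callback was running.

```python
class LockGraph:
    """Toy model (hypothetical, not lockdep itself): edge A -> B = B taken while A held."""

    def __init__(self):
        self.edges = {}

    def path(self, src, dst):
        """Return the list of locks leading from src to dst, or None if unreachable."""
        stack = [(src, [src])]
        seen = set()
        while stack:
            node, trail = stack.pop()
            if node == dst:
                return trail
            if node in seen:
                continue
            seen.add(node)
            for nxt in self.edges.get(node, ()):
                stack.append((nxt, trail + [nxt]))
        return None

    def acquire(self, held, new):
        """Record held -> new. If new already leads back to held, return that path instead."""
        cycle = self.path(new, held)
        if cycle is None:
            self.edges.setdefault(held, set()).add(new)
        return cycle


graph = LockGraph()
# 2: blkg_create() takes blkcg->lock under queue_lock
graph.acquire("&q->queue_lock", "&blkcg->lock")
# 1: throtl_pending_timer_fn() takes queue_lock from the timer callback
graph.acquire("(&sq->pending_timer)", "&q->queue_lock")
# 0: del_timer_sync() waits for the timer while blkcg->lock is held
cycle = graph.acquire("&blkcg->lock", "(&sq->pending_timer)")
print(" --> ".join(cycle))
```

The path returned for the last call is the same chain the report prints under "Chain exists of": waiting for the timer while holding `&blkcg->lock` closes a loop, because the timer callback may itself need a lock that is ordered before `&blkcg->lock`.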

syzbot

Mar 10, 2023, 13:44:50
to syzkaller...@googlegroups.com
syzbot has found a reproducer for the following issue on:

HEAD commit: 8a923980a190 Linux 6.1.16
git tree: linux-6.1.y
console output: https://syzkaller.appspot.com/x/log.txt?x=105d5a66c80000
kernel config: https://syzkaller.appspot.com/x/.config?x=fc32d7322291d081
dashboard link: https://syzkaller.appspot.com/bug?extid=a61a7dbe3e052af19de8
compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: arm64
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12146588c80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15dfbf92c80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/bf09a4a426d0/disk-8a923980.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/99e88c1c3e26/vmlinux-8a923980.xz
kernel image: https://storage.googleapis.com/syzbot-assets/d13a720e0836/Image-8a923980.gz.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+a61a7d...@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
6.1.16-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor181/4306 is trying to acquire lock:
ffff0000ccd080a8 ((&sq->pending_timer)){+.-.}-{0:0}, at: del_timer_sync+0x74/0x210 kernel/time/timer.c:1417
arch_local_irq_enable+0xc/0x18 arch/arm64/include/asm/irqflags.h:35
default_idle_call+0x68/0xdc kernel/sched/idle.c:109
cpuidle_idle_call kernel/sched/idle.c:191 [inline]
do_idle+0x1e0/0x514 kernel/sched/idle.c:303
cpu_startup_entry+0x24/0x28 kernel/sched/idle.c:400
secondary_start_kernel+0x19c/0x1c4 arch/arm64/kernel/smp.c:265
__secondary_switched+0xb0/0xb4 arch/arm64/kernel/head.S:618

-> #0 ((&sq->pending_timer)){+.-.}-{0:0}:
check_prev_add kernel/locking/lockdep.c:3098 [inline]
check_prevs_add kernel/locking/lockdep.c:3217 [inline]
validate_chain kernel/locking/lockdep.c:3832 [inline]
__lock_acquire+0x3338/0x764c kernel/locking/lockdep.c:5056
lock_acquire+0x300/0x8e4 kernel/locking/lockdep.c:5669
del_timer_sync+0x9c/0x210 kernel/time/timer.c:1417
throtl_pd_free+0x20/0x48 block/blk-throttle.c:493
blkcg_deactivate_policy+0x2d8/0x4bc block/blk-cgroup.c:1499
blk_throtl_exit+0x9c/0x13c block/blk-throttle.c:2406
blkcg_exit_disk+0x4c/0x5c block/blk-cgroup.c:1301
disk_release+0x170/0x2d8 block/genhd.c:1171
device_release+0x8c/0x1ac
kobject_cleanup lib/kobject.c:681 [inline]
kobject_release lib/kobject.c:712 [inline]
kref_put include/linux/kref.h:65 [inline]
kobject_put+0x2a8/0x41c lib/kobject.c:729
put_device+0x28/0x40 drivers/base/core.c:3772
put_disk+0x4c/0x64 block/genhd.c:1450
loop_remove drivers/block/loop.c:2080 [inline]
loop_control_remove drivers/block/loop.c:2128 [inline]
loop_control_ioctl+0x534/0x650 drivers/block/loop.c:2166
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__arm64_sys_ioctl+0x14c/0x1c8 fs/ioctl.c:856
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52
el0_svc_common+0x138/0x258 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x64/0x218 arch/arm64/kernel/syscall.c:206
el0_svc+0x58/0x168 arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x18c/0x190 arch/arm64/kernel/entry.S:581

other info that might help us debug this:

Chain exists of:
(&sq->pending_timer) --> &q->queue_lock --> &blkcg->lock

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&blkcg->lock);
lock(&q->queue_lock);
lock(&blkcg->lock);
lock((&sq->pending_timer));

*** DEADLOCK ***

3 locks held by syz-executor181/4306:
#0: ffff0000cccf0b28 (&q->blkcg_mutex){+.+.}-{3:3}, at: blkcg_deactivate_policy+0xfc/0x4bc block/blk-cgroup.c:1487
#1: ffff0000cccf0918 (&q->queue_lock){..-.}-{2:2}, at: spin_lock_irq include/linux/spinlock.h:375 [inline]
#1: ffff0000cccf0918 (&q->queue_lock){..-.}-{2:2}, at: blkcg_deactivate_policy+0x108/0x4bc block/blk-cgroup.c:1488
#2: ffff800019965b30 (&blkcg->lock){....}-{2:2}, at: spin_lock include/linux/spinlock.h:350 [inline]
#2: ffff800019965b30 (&blkcg->lock){....}-{2:2}, at: blkcg_deactivate_policy+0x1b8/0x4bc block/blk-cgroup.c:1495

stack backtrace:
CPU: 1 PID: 4306 Comm: syz-executor181 Not tainted 6.1.16-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
loop_remove drivers/block/loop.c:2080 [inline]
loop_control_remove drivers/block/loop.c:2128 [inline]
loop_control_ioctl+0x534/0x650 drivers/block/loop.c:2166
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__arm64_sys_ioctl+0x14c/0x1c8 fs/ioctl.c:856
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52
el0_svc_common+0x138/0x258 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x64/0x218 arch/arm64/kernel/syscall.c:206
el0_svc+0x58/0x168 arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x18c/0x190 arch/arm64/kernel/entry.S:581

syzbot

May 27, 2023, 03:17:37
to syzkaller...@googlegroups.com
syzbot suspects this issue was fixed by commit:

commit b5dae1cd0d8368b4338430ff93403df67f0b8bcc
Author: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Date: Sat Mar 11 09:34:32 2023 +0000

Revert "blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()"

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1730354d280000
start commit: 8a923980a190 Linux 6.1.16
git tree: linux-6.1.y
If the result looks correct, please mark the issue as fixed by replying with:

#syz fix: Revert "blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()"

For information about bisection process see: https://goo.gl/tpsmEJ#bisection