[syzbot] [kernel?] possible deadlock in stack_depot_put

syzbot

Nov 25, 2023, 4:07:23 PM

to fred...@kernel.org, linux-...@vger.kernel.org, mi...@kernel.org, syzkall...@googlegroups.com, tg...@linutronix.de

Hello,

syzbot found the following issue on:

HEAD commit: 8c9660f65153 Add linux-next specific files for 20231124
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=15c1a4d8e80000
kernel config: https://syzkaller.appspot.com/x/.config?x=ca1e8655505e280
dashboard link: https://syzkaller.appspot.com/bug?extid=186b55175d8360728234
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/345ed4af3a0d/disk-8c9660f6.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/191053c69d57/vmlinux-8c9660f6.xz
kernel image: https://storage.googleapis.com/syzbot-assets/aac7ee5e55e0/bzImage-8c9660f6.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+186b55...@syzkaller.appspotmail.com

------------[ cut here ]------------
======================================================
WARNING: possible circular locking dependency detected
6.7.0-rc2-next-20231124-syzkaller #0 Not tainted
------------------------------------------------------
jbd2/sda1-8/4474 is trying to acquire lock:
ffffffff8cf9a9b8 ((console_sem).lock){-...}-{2:2}, at: down_trylock+0x12/0x70 kernel/locking/semaphore.c:139

but task is already holding lock:
ffffffff8da73f98 (pool_rwlock){----}-{2:2}, at: stack_depot_put lib/stackdepot.c:621 [inline]
ffffffff8da73f98 (pool_rwlock){----}-{2:2}, at: stack_depot_put+0x24/0x110 lib/stackdepot.c:613

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (pool_rwlock){----}-{2:2}:
__raw_read_lock_irqsave include/linux/rwlock_api_smp.h:160 [inline]
_raw_read_lock_irqsave+0x46/0x90 kernel/locking/spinlock.c:236
stack_depot_save_flags+0x169/0x740 lib/stackdepot.c:508
kasan_save_stack+0x42/0x50 mm/kasan/common.c:47
__kasan_record_aux_stack+0xc2/0xd0 mm/kasan/generic.c:508
task_work_add+0x88/0x2a0 kernel/task_work.c:48
task_tick_numa kernel/sched/fair.c:3512 [inline]
task_tick_fair+0x5a5/0xc30 kernel/sched/fair.c:12599
scheduler_tick+0x210/0x650 kernel/sched/core.c:5674
update_process_times+0x19e/0x220 kernel/time/timer.c:2076
tick_sched_handle+0x8e/0x170 kernel/time/tick-sched.c:255
tick_nohz_highres_handler+0xe9/0x110 kernel/time/tick-sched.c:1516
__run_hrtimer kernel/time/hrtimer.c:1688 [inline]
__hrtimer_run_queues+0x654/0xc20 kernel/time/hrtimer.c:1752
hrtimer_interrupt+0x31b/0x800 kernel/time/hrtimer.c:1814
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1065 [inline]
__sysvec_apic_timer_interrupt+0x10c/0x410 arch/x86/kernel/apic/apic.c:1082
sysvec_apic_timer_interrupt+0x90/0xb0 arch/x86/kernel/apic/apic.c:1076
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645
sha256_transform_rorx+0xffc/0x1120 arch/x86/crypto/sha256-avx2-asm.S:655
lib_sha256_base_do_update.isra.0+0x12d/0x150 include/crypto/sha256_base.h:63
sha256_base_do_update include/crypto/sha256_base.h:81 [inline]
_sha256_update arch/x86/crypto/sha256_ssse3_glue.c:74 [inline]
_sha256_update+0xb6/0xe0 arch/x86/crypto/sha256_ssse3_glue.c:58
ima_calc_file_hash_tfm+0x2fe/0x3d0 security/integrity/ima/ima_crypto.c:496
ima_calc_file_shash security/integrity/ima/ima_crypto.c:516 [inline]
ima_calc_file_hash+0x1c6/0x4a0 security/integrity/ima/ima_crypto.c:573
ima_collect_measurement+0x85e/0xa20 security/integrity/ima/ima_api.c:290
process_measurement+0xe92/0x2260 security/integrity/ima/ima_main.c:359
ima_file_mmap+0x1ad/0x1d0 security/integrity/ima/ima_main.c:449
security_mmap_file+0x186/0x1d0 security/security.c:2788
vm_mmap_pgoff+0xdb/0x3c0 mm/util.c:552
vm_mmap+0x8e/0xc0 mm/util.c:575
elf_map fs/binfmt_elf.c:385 [inline]
elf_load+0x196/0x870 fs/binfmt_elf.c:408
load_elf_interp fs/binfmt_elf.c:675 [inline]
load_elf_binary+0x3436/0x4e10 fs/binfmt_elf.c:1200
search_binary_handler fs/exec.c:1736 [inline]
exec_binprm fs/exec.c:1778 [inline]
bprm_execve fs/exec.c:1853 [inline]
bprm_execve+0x7ef/0x1a80 fs/exec.c:1809
kernel_execve+0x3d7/0x4e0 fs/exec.c:2021
try_to_run_init_process init/main.c:1364 [inline]
kernel_init+0x137/0x2a0 init/main.c:1497
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242

-> #2 (&rq->__lock){-.-.}-{2:2}:
_raw_spin_lock_nested+0x31/0x40 kernel/locking/spinlock.c:378
raw_spin_rq_lock_nested+0x29/0x130 kernel/sched/core.c:558
raw_spin_rq_lock kernel/sched/sched.h:1374 [inline]
rq_lock kernel/sched/sched.h:1688 [inline]
task_fork_fair+0x70/0x240 kernel/sched/fair.c:12619
sched_cgroup_fork+0x3cf/0x510 kernel/sched/core.c:4835
copy_process+0x6a38/0x9770 kernel/fork.c:2601
kernel_clone+0xfd/0x940 kernel/fork.c:2899
user_mode_thread+0xb4/0xf0 kernel/fork.c:2977
rest_init+0x27/0x2b0 init/main.c:695
arch_call_rest_init+0x13/0x30 init/main.c:827
start_kernel+0x39e/0x480 init/main.c:1072
x86_64_start_reservations+0x18/0x30 arch/x86/kernel/head64.c:555
x86_64_start_kernel+0xb2/0xc0 arch/x86/kernel/head64.c:536
secondary_startup_64_no_verify+0x166/0x16b

-> #1 (&p->pi_lock){-.-.}-{2:2}:
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x3a/0x50 kernel/locking/spinlock.c:162
class_raw_spinlock_irqsave_constructor include/linux/spinlock.h:522 [inline]
try_to_wake_up+0x9a/0x13d0 kernel/sched/core.c:4252
up+0x79/0xb0 kernel/locking/semaphore.c:191
__up_console_sem kernel/printk/printk.c:340 [inline]
__console_unlock kernel/printk/printk.c:2706 [inline]
console_unlock+0x1cf/0x260 kernel/printk/printk.c:3038
vprintk_emit+0x17f/0x5f0 kernel/printk/printk.c:2303
dev_vprintk_emit drivers/base/core.c:4850 [inline]
dev_printk_emit+0xfb/0x140 drivers/base/core.c:4861
__dev_printk+0xf5/0x270 drivers/base/core.c:4873
_dev_printk+0xde/0x120 drivers/base/core.c:4890
sdev_prefix_printk+0x1a2/0x230 drivers/scsi/scsi_logging.c:78
ioctl_internal_command.constprop.0+0x57e/0x5f0 drivers/scsi/scsi_ioctl.c:93
scsi_send_start_stop drivers/scsi/scsi_ioctl.c:241 [inline]
scsi_ioctl+0x46b/0x1840 drivers/scsi/scsi_ioctl.c:907
sg_ioctl+0xb7b/0x2760 drivers/scsi/sg.c:1163
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:871 [inline]
__se_sys_ioctl fs/ioctl.c:857 [inline]
__x64_sys_ioctl+0x18f/0x210 fs/ioctl.c:857
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x62/0x6a

-> #0 ((console_sem).lock){-...}-{2:2}:
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain kernel/locking/lockdep.c:3868 [inline]
__lock_acquire+0x2466/0x3b10 kernel/locking/lockdep.c:5136
lock_acquire kernel/locking/lockdep.c:5753 [inline]
lock_acquire+0x1b1/0x530 kernel/locking/lockdep.c:5718
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x3a/0x50 kernel/locking/spinlock.c:162
down_trylock+0x12/0x70 kernel/locking/semaphore.c:139
__down_trylock_console_sem+0x40/0x140 kernel/printk/printk.c:323
console_trylock+0x73/0x130 kernel/printk/printk.c:2659
console_trylock_spinning kernel/printk/printk.c:1923 [inline]
vprintk_emit+0x162/0x5f0 kernel/printk/printk.c:2302
vprintk+0x7b/0x90 kernel/printk/printk_safe.c:45
_printk+0xc8/0x100 kernel/printk/printk.c:2328
__warn_printk+0x158/0x350 kernel/panic.c:721
refcount_warn_saturate+0x149/0x210 lib/refcount.c:28
__refcount_sub_and_test include/linux/refcount.h:283 [inline]
__refcount_dec_and_test include/linux/refcount.h:315 [inline]
refcount_dec_and_test include/linux/refcount.h:333 [inline]
stack_depot_put lib/stackdepot.c:627 [inline]
stack_depot_put+0xe4/0x110 lib/stackdepot.c:613
__kasan_record_aux_stack+0xb3/0xd0 mm/kasan/generic.c:506
insert_work+0x38/0x230 kernel/workqueue.c:1653
__queue_work+0x633/0x11d0 kernel/workqueue.c:1802
__queue_delayed_work+0x1bf/0x270 kernel/workqueue.c:1953
mod_delayed_work_on+0xcc/0x1a0 kernel/workqueue.c:2027
kblockd_mod_delayed_work_on+0x29/0x40 block/blk-core.c:1038
blk_kick_flush block/blk-flush.c:348 [inline]
blk_flush_complete_seq+0xa66/0x10c0 block/blk-flush.c:213
blk_insert_flush+0x349/0x6d0 block/blk-flush.c:456
blk_mq_submit_bio+0x16cd/0x2150 block/blk-mq.c:2992
__submit_bio+0xfd/0x310 block/blk-core.c:599
__submit_bio_noacct_mq block/blk-core.c:678 [inline]
submit_bio_noacct_nocheck+0x852/0xbb0 block/blk-core.c:707
submit_bio_noacct+0x87b/0x1b90 block/blk-core.c:801
journal_submit_commit_record+0x73b/0xab0 fs/jbd2/commit.c:156
jbd2_journal_commit_transaction+0x39fd/0x63c0 fs/jbd2/commit.c:880
kjournald2+0x1f8/0x8f0 fs/jbd2/journal.c:201
kthread+0x2c1/0x3a0 kernel/kthread.c:389
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242

other info that might help us debug this:

Chain exists of:
(console_sem).lock --> &rq->__lock --> pool_rwlock

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(pool_rwlock);
                               lock(&rq->__lock);
                               lock(pool_rwlock);
  lock((console_sem).lock);

*** DEADLOCK ***

4 locks held by jbd2/sda1-8/4474:
#0: ffff88801e366218 (&fq->mq_flush_lock){..-.}-{2:2}, at: spin_lock_irq include/linux/spinlock.h:376 [inline]
#0: ffff88801e366218 (&fq->mq_flush_lock){..-.}-{2:2}, at: blk_insert_flush+0x337/0x6d0 block/blk-flush.c:455
#1: ffffffff8cfacf60 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:301 [inline]
#1: ffffffff8cfacf60 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:747 [inline]
#1: ffffffff8cfacf60 (rcu_read_lock){....}-{1:2}, at: __queue_work+0xf2/0x11d0 kernel/workqueue.c:1730
#2: ffff8880b993c2f8 (&pool->lock){-.-.}-{2:2}, at: __queue_work+0x39e/0x11d0 kernel/workqueue.c:1766
#3: ffffffff8da73f98 (pool_rwlock){----}-{2:2}, at: stack_depot_put lib/stackdepot.c:621 [inline]
#3: ffffffff8da73f98 (pool_rwlock){----}-{2:2}, at: stack_depot_put+0x24/0x110 lib/stackdepot.c:613

stack backtrace:
CPU: 1 PID: 4474 Comm: jbd2/sda1-8 Not tainted 6.7.0-rc2-next-20231124-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
check_noncircular+0x316/0x400 kernel/locking/lockdep.c:2187
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain kernel/locking/lockdep.c:3868 [inline]
__lock_acquire+0x2466/0x3b10 kernel/locking/lockdep.c:5136
lock_acquire kernel/locking/lockdep.c:5753 [inline]
lock_acquire+0x1b1/0x530 kernel/locking/lockdep.c:5718
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x3a/0x50 kernel/locking/spinlock.c:162
down_trylock+0x12/0x70 kernel/locking/semaphore.c:139
__down_trylock_console_sem+0x40/0x140 kernel/printk/printk.c:323
console_trylock+0x73/0x130 kernel/printk/printk.c:2659
console_trylock_spinning kernel/printk/printk.c:1923 [inline]
vprintk_emit+0x162/0x5f0 kernel/printk/printk.c:2302
vprintk+0x7b/0x90 kernel/printk/printk_safe.c:45
_printk+0xc8/0x100 kernel/printk/printk.c:2328
__warn_printk+0x158/0x350 kernel/panic.c:721
refcount_warn_saturate+0x149/0x210 lib/refcount.c:28
__refcount_sub_and_test include/linux/refcount.h:283 [inline]
__refcount_dec_and_test include/linux/refcount.h:315 [inline]
refcount_dec_and_test include/linux/refcount.h:333 [inline]
stack_depot_put lib/stackdepot.c:627 [inline]
stack_depot_put+0xe4/0x110 lib/stackdepot.c:613
__kasan_record_aux_stack+0xb3/0xd0 mm/kasan/generic.c:506
insert_work+0x38/0x230 kernel/workqueue.c:1653
__queue_work+0x633/0x11d0 kernel/workqueue.c:1802
__queue_delayed_work+0x1bf/0x270 kernel/workqueue.c:1953
mod_delayed_work_on+0xcc/0x1a0 kernel/workqueue.c:2027
kblockd_mod_delayed_work_on+0x29/0x40 block/blk-core.c:1038
blk_kick_flush block/blk-flush.c:348 [inline]
blk_flush_complete_seq+0xa66/0x10c0 block/blk-flush.c:213
blk_insert_flush+0x349/0x6d0 block/blk-flush.c:456
blk_mq_submit_bio+0x16cd/0x2150 block/blk-mq.c:2992
__submit_bio+0xfd/0x310 block/blk-core.c:599
__submit_bio_noacct_mq block/blk-core.c:678 [inline]
submit_bio_noacct_nocheck+0x852/0xbb0 block/blk-core.c:707
submit_bio_noacct+0x87b/0x1b90 block/blk-core.c:801
journal_submit_commit_record+0x73b/0xab0 fs/jbd2/commit.c:156
jbd2_journal_commit_transaction+0x39fd/0x63c0 fs/jbd2/commit.c:880
kjournald2+0x1f8/0x8f0 fs/jbd2/journal.c:201
kthread+0x2c1/0x3a0 kernel/kthread.c:389
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
</TASK>
refcount_t: underflow; use-after-free.
WARNING: CPU: 1 PID: 4474 at lib/refcount.c:28 refcount_warn_saturate+0x14a/0x210 lib/refcount.c:28
Modules linked in:
CPU: 1 PID: 4474 Comm: jbd2/sda1-8 Not tainted 6.7.0-rc2-next-20231124-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
RIP: 0010:refcount_warn_saturate+0x14a/0x210 lib/refcount.c:28
Code: ff 89 de e8 78 4b 24 fd 84 db 0f 85 66 ff ff ff e8 3b 50 24 fd c6 05 7e 9b 9f 0a 01 90 48 c7 c7 e0 c1 2e 8b e8 57 42 ea fc 90 <0f> 0b 90 90 e9 43 ff ff ff e8 18 50 24 fd 0f b6 1d 59 9b 9f 0a 31
RSP: 0018:ffffc9000e3df300 EFLAGS: 00010082
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff814e05d9
RDX: ffff88801bc89dc0 RSI: ffffffff814e05e6 RDI: 0000000000000001
RBP: ffff88806d1c243c R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffff888017e80780
R13: ffff888018a92400 R14: ffff888018a9245c R15: ffff8880b993c340
FS: 0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020064000 CR3: 000000007cc12000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__refcount_sub_and_test include/linux/refcount.h:283 [inline]
__refcount_dec_and_test include/linux/refcount.h:315 [inline]
refcount_dec_and_test include/linux/refcount.h:333 [inline]
stack_depot_put lib/stackdepot.c:627 [inline]
stack_depot_put+0xe4/0x110 lib/stackdepot.c:613
__kasan_record_aux_stack+0xb3/0xd0 mm/kasan/generic.c:506
insert_work+0x38/0x230 kernel/workqueue.c:1653
__queue_work+0x633/0x11d0 kernel/workqueue.c:1802
__queue_delayed_work+0x1bf/0x270 kernel/workqueue.c:1953
mod_delayed_work_on+0xcc/0x1a0 kernel/workqueue.c:2027
kblockd_mod_delayed_work_on+0x29/0x40 block/blk-core.c:1038
blk_kick_flush block/blk-flush.c:348 [inline]
blk_flush_complete_seq+0xa66/0x10c0 block/blk-flush.c:213
blk_insert_flush+0x349/0x6d0 block/blk-flush.c:456
blk_mq_submit_bio+0x16cd/0x2150 block/blk-mq.c:2992
__submit_bio+0xfd/0x310 block/blk-core.c:599
__submit_bio_noacct_mq block/blk-core.c:678 [inline]
submit_bio_noacct_nocheck+0x852/0xbb0 block/blk-core.c:707
submit_bio_noacct+0x87b/0x1b90 block/blk-core.c:801
journal_submit_commit_record+0x73b/0xab0 fs/jbd2/commit.c:156
jbd2_journal_commit_transaction+0x39fd/0x63c0 fs/jbd2/commit.c:880
kjournald2+0x1f8/0x8f0 fs/jbd2/journal.c:201
kthread+0x2c1/0x3a0 kernel/kthread.c:389
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
</TASK>
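
The dependency chain in the report above ((console_sem).lock --> &p->pi_lock --> &rq->__lock --> pool_rwlock, closed by the new pool_rwlock --> (console_sem).lock edge) can be modelled as a small directed graph. The following is a minimal userspace sketch of that idea, not lockdep's actual algorithm; the enum values and the add_dependency() and report_chain_is_circular() helpers are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes, named after the ones in the report. */
enum lock_class { CONSOLE_SEM_LOCK, PI_LOCK, RQ_LOCK, POOL_RWLOCK, NR_LOCKS };

/* edge[a][b] is true when lock b has been acquired while holding lock a. */
static bool edge[NR_LOCKS][NR_LOCKS];

/* Depth-first search: can 'to' be reached from 'from' along recorded edges? */
static bool reachable(enum lock_class from, enum lock_class to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++) {
		if (edge[from][next] && !seen[next] &&
		    reachable((enum lock_class)next, to, seen))
			return true;
	}
	return false;
}

/*
 * Record "held -> wanted". Returns false (and records nothing) if the new
 * edge would close a cycle, i.e. if 'held' is already reachable from 'wanted'.
 */
static bool add_dependency(enum lock_class held, enum lock_class wanted)
{
	bool seen[NR_LOCKS] = { false };

	assert(held < NR_LOCKS && wanted < NR_LOCKS);
	if (reachable(wanted, held, seen))
		return false;
	edge[held][wanted] = true;
	return true;
}

/* Replays the chain from the report; true if the final edge is rejected. */
static bool report_chain_is_circular(void)
{
	bool ok;

	memset(edge, 0, sizeof(edge));
	/* #1: console_unlock() -> up() -> try_to_wake_up() takes p->pi_lock */
	ok = add_dependency(CONSOLE_SEM_LOCK, PI_LOCK);
	/* #2: rq->__lock nests inside p->pi_lock */
	ok = ok && add_dependency(PI_LOCK, RQ_LOCK);
	/* #3: scheduler tick -> task_work_add() -> KASAN -> stack depot */
	ok = ok && add_dependency(RQ_LOCK, POOL_RWLOCK);
	assert(ok);
	(void)ok;
	/* #0: stack_depot_put() holds pool_rwlock, printk() wants console_sem */
	return !add_dependency(POOL_RWLOCK, CONSOLE_SEM_LOCK);
}
```

In this model report_chain_is_circular() returns true: the first three edges are accepted, and the fourth is rejected because pool_rwlock is already reachable from (console_sem).lock.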

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

xingwei lee

Nov 26, 2023, 8:29:48 PM

to syzbot+186b55...@syzkaller.appspotmail.com, fred...@kernel.org, linux-...@vger.kernel.org, mi...@kernel.org, syzkall...@googlegroups.com, tg...@linutronix.de

Hi, I reproduced this bug with repro.c and repro.txt and confirmed the crash.

repro.txt
r0 = socket$alg(0x26, 0x5, 0x0)
bind$alg(r0, &(0x7f0000000440)={0x26, 'skcipher\x00', 0x0, 0x0,
'ecb-cipher_null\x00'}, 0x58)
r1 = accept$alg(r0, 0x0, 0x0)
r2 = dup(r1)
open(&(0x7f0000000140)='./file1\x00', 0x10f0c2, 0x0)
r3 = dup(r1)
mount$9p_fd(0x0, &(0x7f0000000000)='./file1\x00', &(0x7f0000000040),
0x0, &(0x7f0000000a40)=ANY=[@ANYBLOB='trans=fd,rfdno=', @ANYRESHEX=r3,
@ANYBLOB=',wfdno=', @ANYRESHEX=r2])

repro.c

#define _GNU_SOURCE

#include <dirent.h>
#include <endian.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static void sleep_ms(uint64_t ms)
{
usleep(ms * 1000);
}

static uint64_t current_time_ms(void)
{
struct timespec ts;
if (clock_gettime(CLOCK_MONOTONIC, &ts))
exit(1);
return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

static bool write_file(const char* file, const char* what, ...)
{
char buf[1024];
va_list args;
va_start(args, what);
vsnprintf(buf, sizeof(buf), what, args);
va_end(args);
buf[sizeof(buf) - 1] = 0;
int len = strlen(buf);
int fd = open(file, O_WRONLY | O_CLOEXEC);
if (fd == -1)
return false;
if (write(fd, buf, len) != len) {
int err = errno;
close(fd);
errno = err;
return false;
}
close(fd);
return true;
}

static void kill_and_wait(int pid, int* status)
{
kill(-pid, SIGKILL);
kill(pid, SIGKILL);
for (int i = 0; i < 100; i++) {
if (waitpid(-1, status, WNOHANG | __WALL) == pid)
return;
usleep(1000);
}
DIR* dir = opendir("/sys/fs/fuse/connections");
if (dir) {
for (;;) {
struct dirent* ent = readdir(dir);
if (!ent)
break;
if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
continue;
char abort[300];
snprintf(abort, sizeof(abort), "/sys/fs/fuse/connections/%s/abort",
ent->d_name);
int fd = open(abort, O_WRONLY);
if (fd == -1) {
continue;
}
if (write(fd, abort, 1) < 0) {
}
close(fd);
}
closedir(dir);
} else {
}
while (waitpid(-1, status, __WALL) != pid) {
}
}

static void setup_test()
{
prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
setpgrp();
write_file("/proc/self/oom_score_adj", "1000");
}

static void execute_one(void);

#define WAIT_FLAGS __WALL

static void loop(void)
{
int iter = 0;
for (;; iter++) {
int pid = fork();
if (pid < 0)
exit(1);
if (pid == 0) {
setup_test();
execute_one();
exit(0);
}
int status = 0;
uint64_t start = current_time_ms();
for (;;) {
if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
break;
sleep_ms(1);
if (current_time_ms() - start < 5000)
continue;
kill_and_wait(pid, &status);
break;
}
}
}

uint64_t r[4] = {0xffffffffffffffff, 0xffffffffffffffff,
0xffffffffffffffff, 0xffffffffffffffff};

void execute_one(void)
{
intptr_t res = 0;
res = syscall(__NR_socket, /*domain=*/0x26ul, /*type=*/5ul, /*proto=*/0);
if (res != -1)
r[0] = res;
*(uint16_t*)0x20000440 = 0x26;
memcpy((void*)0x20000442, "skcipher\000\000\000\000\000\000", 14);
*(uint32_t*)0x20000450 = 0;
*(uint32_t*)0x20000454 = 0;
memcpy((void*)0x20000458,
"ecb-cipher_null\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
64);
syscall(__NR_bind, /*fd=*/r[0], /*addr=*/0x20000440ul, /*addrlen=*/0x58ul);
res = syscall(__NR_accept, /*fd=*/r[0], /*peer=*/0ul, /*peerlen=*/0ul);
if (res != -1)
r[1] = res;
res = syscall(__NR_dup, /*oldfd=*/r[1]);
if (res != -1)
r[2] = res;
memcpy((void*)0x20000140, "./file1\000", 8);
syscall(__NR_open, /*file=*/0x20000140ul, /*flags=*/0x10f0c2ul, /*mode=*/0ul);
res = syscall(__NR_dup, /*oldfd=*/r[1]);
if (res != -1)
r[3] = res;
memcpy((void*)0x20000000, "./file1\000", 8);
memcpy((void*)0x20000040, "9p\000", 3);
memcpy((void*)0x20000a40, "trans=fd,rfdno=", 15);
sprintf((char*)0x20000a4f, "0x%016llx", (long long)r[3]);
memcpy((void*)0x20000a61, ",wfdno=", 7);
sprintf((char*)0x20000a68, "0x%016llx", (long long)r[2]);
syscall(__NR_mount, /*src=*/0ul, /*dst=*/0x20000000ul,
/*type=*/0x20000040ul, /*flags=*/0ul, /*opts=*/0x20000a40ul);

}
int main(void)
{
syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul,
/*prot=*/0ul, /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul,
/*prot=*/7ul, /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul,
/*prot=*/0ul, /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
loop();
return 0;
}

xingwei lee

Nov 26, 2023, 8:56:14 PM

to syzbot+186b55...@syzkaller.appspotmail.com, fred...@kernel.org, linux-...@vger.kernel.org, mi...@kernel.org, syzkall...@googlegroups.com, tg...@linutronix.de

Sorry for including an HTML subpart; I am resending this mail.
Hi, I reproduced this bug with repro.c and repro.txt and confirmed the crash.

Ingo Molnar

Nov 27, 2023, 4:42:33 AM

to xingwei lee, syzbot+186b55...@syzkaller.appspotmail.com, fred...@kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, tg...@linutronix.de

BTW, could you please run such autogenerated repro.c files through
"indent --linux-style"? See below, the code becomes much more readable.

Thanks,

Ingo

============================>

return (uint64_t) ts.tv_sec * 1000 + (uint64_t) ts.tv_nsec / 1000000;
}

static bool write_file(const char *file, const char *what, ...)
{
char buf[1024];
va_list args;

va_start(args, what);
vsnprintf(buf, sizeof(buf), what, args);
va_end(args);
buf[sizeof(buf) - 1] = 0;
int len = strlen(buf);
int fd = open(file, O_WRONLY | O_CLOEXEC);

if (fd == -1)
return false;
if (write(fd, buf, len) != len) {
int err = errno;

close(fd);
errno = err;
return false;
}
close(fd);
return true;
}

static void kill_and_wait(int pid, int *status)
{
kill(-pid, SIGKILL);
kill(pid, SIGKILL);
for (int i = 0; i < 100; i++) {
if (waitpid(-1, status, WNOHANG | __WALL) == pid)
return;
usleep(1000);
}

DIR *dir = opendir("/sys/fs/fuse/connections");

memcpy((void *)0x20000442, "skcipher\000\000\000\000\000\000", 14);

*(uint32_t *) 0x20000450 = 0;
*(uint32_t *) 0x20000454 = 0;
memcpy((void *)0x20000458,
"ecb-cipher_null\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
64);
syscall(__NR_bind, /*fd= */ r[0], /*addr= */ 0x20000440ul, /*addrlen= */
0x58ul);
res = syscall(__NR_accept, /*fd= */ r[0], /*peer= */ 0ul, /*peerlen= */
0ul);
if (res != -1)
r[1] = res;

res = syscall(__NR_dup, /*oldfd= */ r[1]);

if (res != -1)
r[2] = res;

memcpy((void *)0x20000140, "./file1\000", 8);

syscall(__NR_open, /*file= */ 0x20000140ul, /*flags= */ 0x10f0c2ul,
/*mode= */ 0ul);

res = syscall(__NR_dup, /*oldfd= */ r[1]);

if (res != -1)
r[3] = res;
memcpy((void *)0x20000000, "./file1\000", 8);
memcpy((void *)0x20000040, "9p\000", 3);
memcpy((void *)0x20000a40, "trans=fd,rfdno=", 15);

sprintf((char *)0x20000a4f, "0x%016llx", (long long)r[3]);

memcpy((void *)0x20000a61, ",wfdno=", 7);

sprintf((char *)0x20000a68, "0x%016llx", (long long)r[2]);

xingwei lee

Nov 27, 2023, 9:14:51 PM

to xrive...@gmail.com, fred...@kernel.org, linux-...@vger.kernel.org, mi...@kernel.org, syzbot+186b55...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, tg...@linutronix.de

Thanks for your advice. The TEXT/PLAIN mode may have unintentionally disrupted my code formatting.

repro.c

============================>

return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

static bool write_file(const char* file, const char* what, ...)
{
char buf[1024];
va_list args;
va_start(args, what);
vsnprintf(buf, sizeof(buf), what, args);
va_end(args);
buf[sizeof(buf) - 1] = 0;
int len = strlen(buf);
int fd = open(file, O_WRONLY | O_CLOEXEC);
if (fd == -1)
return false;
if (write(fd, buf, len) != len) {
int err = errno;
close(fd);
errno = err;
return false;
}
close(fd);
return true;
}

static void kill_and_wait(int pid, int* status)
{
kill(-pid, SIGKILL);
kill(pid, SIGKILL);
for (int i = 0; i < 100; i++) {
if (waitpid(-1, status, WNOHANG | __WALL) == pid)
return;
usleep(1000);
}
DIR* dir = opendir("/sys/fs/fuse/connections");

memcpy((void*)0x20000442, "skcipher\000\000\000\000\000\000", 14);
*(uint32_t*)0x20000450 = 0;
*(uint32_t*)0x20000454 = 0;
memcpy((void*)0x20000458, "ecb-cipher_null\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000", 64);
syscall(__NR_bind, /*fd=*/r[0], /*addr=*/0x20000440ul, /*addrlen=*/0x58ul);
res = syscall(__NR_accept, /*fd=*/r[0], /*peer=*/0ul, /*peerlen=*/0ul);
if (res != -1)
r[1] = res;
res = syscall(__NR_dup, /*oldfd=*/r[1]);
if (res != -1)
r[2] = res;
memcpy((void*)0x20000140, "./file1\000", 8);
syscall(__NR_open, /*file=*/0x20000140ul, /*flags=*/0x10f0c2ul, /*mode=*/0ul);
res = syscall(__NR_dup, /*oldfd=*/r[1]);
if (res != -1)
r[3] = res;
memcpy((void*)0x20000000, "./file1\000", 8);
memcpy((void*)0x20000040, "9p\000", 3);
memcpy((void*)0x20000a40, "trans=fd,rfdno=", 15);
sprintf((char*)0x20000a4f, "0x%016llx", (long long)r[3]);
memcpy((void*)0x20000a61, ",wfdno=", 7);
sprintf((char*)0x20000a68, "0x%016llx", (long long)r[2]);

Tetsuo Handa

Dec 3, 2023, 8:17:39 AM

to syzbot, Andrey Konovalov, syzkaller-bugs, kasan-dev@googlegroups.com >> kasan-dev

On 2023/11/26 6:07, syzbot wrote:
> refcount_t: underflow; use-after-free.

#syz set subsystems: kasan

By the way, shouldn't the pool_rwlock section be guarded by printk_deferred_enter()?

Hillf Danton

Dec 5, 2023, 6:31:26 AM

to syzbot, linux-...@vger.kernel.org, Matthew Wilcox, Petr Mladek, John Ogness, Tetsuo Handa, Waiman Long, Linus Torvalds, syzkall...@googlegroups.com

On Sat, 25 Nov 2023 13:07:22 -0800

Unlike down_trylock(), mutex_trylock() is unable to trigger any lockdep
warning, so why is a binary semaphore preferred over a mutex?

Tetsuo Handa

Dec 5, 2023, 7:01:42 AM

to Hillf Danton, syzbot, linux-...@vger.kernel.org, Matthew Wilcox, Petr Mladek, John Ogness, Waiman Long, Linus Torvalds, syzkall...@googlegroups.com

On 2023/12/05 20:31, Hillf Danton wrote:
> Unlike down_trylock(), mutex_trylock() is unable to trigger any lockdep
> warning, so why is a binary semaphore preferred over a mutex?

The mutex has limitations which make it impossible to use for the console lock.

https://elixir.bootlin.com/linux/v6.7-rc4/source/kernel/locking/mutex.c#L537

By the way, this is a KASAN bug saying "refcount_t: underflow; use-after-free.".
Possibly a candidate for printk_deferred_enter() user?
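
For readers unfamiliar with the warning text, the following is a simplified userspace model of the saturation check behind "refcount_t: underflow; use-after-free.". It is only a sketch under stated assumptions: the model_ names are hypothetical, C11 atomics stand in for the kernel's atomic helpers, and it does not attempt to reproduce the stack depot bug itself, whose cause is not stated in this thread.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Poison value written once an underflow has been detected (model only). */
#define MODEL_REFCOUNT_SATURATED (INT_MIN / 2)

struct model_refcount {
	atomic_int refs;
};

static int model_underflow_warnings;

/* Drops one reference; returns true only when the last reference is gone. */
static bool model_refcount_dec_and_test(struct model_refcount *r)
{
	int old = atomic_fetch_sub(&r->refs, 1);

	if (old == 1)
		return true;	/* 1 -> 0: caller may free the object */
	if (old <= 0) {
		/* A reference that does not exist was dropped. */
		atomic_store(&r->refs, MODEL_REFCOUNT_SATURATED);
		model_underflow_warnings++;
		fprintf(stderr, "refcount_t: underflow; use-after-free.\n");
	}
	return false;
}

/* Puts the same reference twice; returns the number of warnings raised. */
static int model_double_put(void)
{
	struct model_refcount r;
	bool freed;

	atomic_init(&r.refs, 1);
	model_underflow_warnings = 0;

	freed = model_refcount_dec_and_test(&r);	/* legitimate final put */
	assert(freed);
	(void)freed;
	model_refcount_dec_and_test(&r);		/* unbalanced second put */
	return model_underflow_warnings;
}
```

In this model the second put is the one that prints the message, which matches the shape of the report: the warning is emitted from inside stack_depot_put(), while pool_rwlock is held.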

Petr Mladek

Dec 6, 2023, 4:45:32 AM

to Tetsuo Handa, Hillf Danton, syzbot, linux-...@vger.kernel.org, Matthew Wilcox, John Ogness, Waiman Long, Linus Torvalds, syzkall...@googlegroups.com

On Tue 2023-12-05 21:00:46, Tetsuo Handa wrote:
> On 2023/12/05 20:31, Hillf Danton wrote:
> > Unlike down_trylock(), mutex_trylock() is unable to trigger any lockdep
> > warning, so why is a binary semaphore preferred over a mutex?
>
> The mutex has limitations which make it impossible to use for the console lock.
>
> https://elixir.bootlin.com/linux/v6.7-rc4/source/kernel/locking/mutex.c#L537

In particular, mutexes can't be acquired in interrupt context, not even
via mutex_trylock().

> By the way, this is a KASAN bug saying "refcount_t: underflow; use-after-free.".
> Possibly a candidate for printk_deferred_enter() user?

In practice, it would mean adding

printk_deferred_enter()
printk_deferred_exit()

around the KASAN/stackdepot code which might be called in any context
and might print a message. For example, see show_one_worker_pool().

It should be used only when really needed because it reduces the
chance to see the messages.

But honestly, I do not see a better solution. printk_deferred() is
used in many places inside the scheduler to avoid these deadlocks
between console_sem and rq->lock.

It should be solved by the printk rework introducing per-console
locks. It might eventually allow us to get rid of console_sem
completely. But it might be a long ride until all console
drivers get converted.

Best Regards,
Petr
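
The effect of wrapping a section in printk_deferred_enter() and printk_deferred_exit() can be illustrated with a small userspace model: while the section is open, a message is only stored, and the step that needs the console lock runs later from a safe context. This is a sketch under stated assumptions, not the kernel implementation; every model_ name below is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define MODEL_LOG_SLOTS 8
#define MODEL_LOG_LINE  64

/* Stands in for the printk ring buffer. */
static char model_log[MODEL_LOG_SLOTS][MODEL_LOG_LINE];
static int model_stored;		/* messages stored so far */
static int model_flushed;		/* messages already written out */
static int model_deferred_depth;	/* > 0 inside a deferred section */
static int model_console_locks;		/* times the console lock was taken */

/* Console flushing is the only step that needs the console lock. */
static void model_flush_console(void)
{
	model_console_locks++;
	while (model_flushed < model_stored)
		puts(model_log[model_flushed++]);
}

static void model_printk(const char *msg)
{
	assert(model_stored < MODEL_LOG_SLOTS);
	snprintf(model_log[model_stored++], MODEL_LOG_LINE, "%s", msg);
	if (model_deferred_depth == 0)
		model_flush_console();	/* direct printing */
}

static void model_deferred_enter(void)
{
	model_deferred_depth++;
}

static void model_deferred_exit(void)
{
	assert(model_deferred_depth > 0);
	model_deferred_depth--;
}

/* Stands in for the later, safe context that prints stored messages. */
static void model_run_deferred_flush(void)
{
	if (model_flushed < model_stored)
		model_flush_console();
}

/* Returns how often the console lock was taken inside the guarded section. */
static int model_guarded_warning(void)
{
	int before = model_console_locks;
	int inside;

	model_deferred_enter();
	/* a lock such as pool_rwlock would be held here */
	model_printk("refcount_t: underflow; use-after-free.");
	inside = model_console_locks - before;
	model_deferred_exit();

	model_run_deferred_flush();
	return inside;
}
```

In the model, model_guarded_warning() returns 0: the console lock is never touched while the guarded lock would be held. The cost is the one mentioned above: if nothing ever runs the deferred flush, the stored message is never seen.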

Hillf Danton

Dec 6, 2023, 6:22:41 AM

to Petr Mladek, Tetsuo Handa, syzbot, linux-...@vger.kernel.org, Matthew Wilcox, John Ogness, Waiman Long, Linus Torvalds, syzkall...@googlegroups.com

On Wed, 6 Dec 2023 10:42:31 +0100 Petr Mladek <pml...@suse.com>

> On Tue 2023-12-05 21:00:46, Tetsuo Handa wrote:
> > On 2023/12/05 20:31, Hillf Danton wrote:
> > > Unlike down_trylock(), mutex_trylock() is unable to trigger any lockdep
> > > warning, so why is a binary semaphore preferred over a mutex?
> >
> > The mutex has limitations which make it impossible to use for the console lock.
> >

Given the same pattern in both up() and __mutex_unlock_slowpath(), where
a raw spinlock is acquired to wake the waiter up, it is safe to unlock a
mutex in irq context.

> > https://elixir.bootlin.com/linux/v6.7-rc4/source/kernel/locking/mutex.c#L537
>
> In particular, mutexes can't be acquired in interrupt context, not even
> via mutex_trylock().
>

No mutex is taken in irq context without a hoofed skull. Given wakeup in
irq context, why is it unsafe to do atomic operations in irq?

Linus Torvalds

Dec 6, 2023, 6:40:32 AM

to Hillf Danton, Petr Mladek, Tetsuo Handa, syzbot, linux-...@vger.kernel.org, Matthew Wilcox, John Ogness, Waiman Long, syzkall...@googlegroups.com

On Wed, 6 Dec 2023 at 20:22, Hillf Danton <hda...@sina.com> wrote:
>
> Given the same pattern in both up() and __mutex_unlock_slowpath(), where
> a raw spinlock is acquired to wake the waiter up, it is safe to unlock a
> mutex in irq context.

What? No. That spinlock is exactly why it is NOT OK to unlock a mutex
in irq context.

If somebody else is trying to get or release the mutex at the same
time an interrupt happens, you now have an immediate deadlock.

No spinlocks - raw or not - are irq safe.

The only way you make them irq-safe is by disabling interrupts
entirely across the locked region, which the mutex code very much does
not do, and does not want to do.

So no. Mutexes are not usable from interrupts.

So repeat after me: MUTEXES CANNOT BE USED IN ANY FORM IN INTERRUPT
CONTEXT. End of story.

Other locks do work. Completions are designed to be done from
interrupts. And our legacy semaphores were irq-safe (for wakeups) from
day one, which is then why the spinlock in the legacy semaphore is
done with interrupts disabled, and why you can do "down_trylock()" and
"up()" in interrupt context.

But mutexes wanted to consciously avoid that, partly *exactly* because
they didn't want to have the more expensive irq-safe spinlocks
(particularly with the debugging versions).

Linus
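
The point about interrupts and spinlocks can be illustrated with a userspace analogy in which a signal plays the role of an interrupt and blocking the signal plays the role of disabling interrupts. This is only a sketch under those assumptions, not kernel code; the model_ names are hypothetical, and the handler merely records the case in which a real spinning lock would spin forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static atomic_flag model_lock = ATOMIC_FLAG_INIT;
static volatile sig_atomic_t model_handler_saw_lock_held;

/*
 * "Interrupt handler" that wants the same lock as the interrupted code.
 * A spinning lock would never return if the lock is already held by the
 * code it interrupted, so the model only records that situation.
 */
static void model_irq_handler(int sig)
{
	(void)sig;
	if (atomic_flag_test_and_set(&model_lock))
		model_handler_saw_lock_held = 1;
	else
		atomic_flag_clear(&model_lock);
}

/*
 * Takes the lock, lets an "interrupt" arrive, releases the lock.
 * Returns true if the handler ran while the lock was held.
 */
static bool model_interrupt_hits_locked_region(bool mask_interrupts)
{
	struct sigaction sa = { 0 };
	sigset_t irq_set;
	int rc;

	sa.sa_handler = model_irq_handler;
	sigemptyset(&sa.sa_mask);
	rc = sigaction(SIGUSR1, &sa, NULL);
	assert(rc == 0);
	(void)rc;

	sigemptyset(&irq_set);
	sigaddset(&irq_set, SIGUSR1);
	model_handler_saw_lock_held = 0;

	if (mask_interrupts)
		sigprocmask(SIG_BLOCK, &irq_set, NULL);	/* "irqs off" */
	while (atomic_flag_test_and_set(&model_lock))
		;					/* take the lock */
	raise(SIGUSR1);					/* "interrupt" arrives */
	atomic_flag_clear(&model_lock);			/* release the lock */
	if (mask_interrupts)
		sigprocmask(SIG_UNBLOCK, &irq_set, NULL); /* "irqs on" */

	return model_handler_saw_lock_held != 0;
}
```

Called from a single-threaded program, model_interrupt_hits_locked_region(false) returns true, which is the would-be deadlock, while model_interrupt_hits_locked_region(true) returns false because the handler only runs after the lock has been released.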

Hillf Danton

Dec 7, 2023, 6:04:21 AM

to Linus Torvalds, Petr Mladek, Tetsuo Handa, syzbot, linux-...@vger.kernel.org, Matthew Wilcox, John Ogness, Waiman Long, syzkall...@googlegroups.com

On Wed, 6 Dec 2023 20:40:10 +0900 Linus Torvalds wrote:
> On Wed, 6 Dec 2023 at 20:22, Hillf Danton <hda...@sina.com> wrote:
> >
> > Given the same pattern in both up() and __mutex_unlock_slowpath(), where
> > a raw spinlock is acquired to wake the waiter up, it is safe to unlock a
> > mutex in irq context.
>
> What? No. That spinlock is exactly why it is NOT OK to unlock a mutex
> in irq context.
>
> If somebody else is trying to get or release the mutex at the same
> time an interrupt happens, you now have an immediate deadlock.
>

Yes, you are right.

> No spinlocks - raw or not - are irq safe.
>
> The only way you make them irq-safe is by disabling interrupts
> entirely across the locked region, which the mutex code very much does
> not do, and does not want to do.
>
> So no. Mutexes are not usable from interrupts.
>
> So repeat after me: MUTEXES CANNOT BE USED IN ANY FORM IN INTERRUPT
> CONTEXT. End of story.
>
> Other locks do work. Completions are designed to be done from
> interrupts. And our legacy semaphores were irq-safe (for wakeups) from
> day one, which is then why the spinlock in the legacy semaphore is
> done with interrupts disabled, and why you can do "down_trylock()" and
> "up()" in interrupt context.
>
> But mutexes wanted to consciously avoid that, partly *exactly* because
> they didn't want to have the more expensive irq-safe spinlocks
> (particularly with the debugging versions).

Given the irq-safe spinlock in rwsem, making the spinlock in the mutex irq-safe
is not difficult, but that is another story.

Thanks for shedding light on the issue.
Hillf

Andrey Konovalov

Dec 12, 2023, 6:30:15 AM

to Tetsuo Handa, syzbot, Andrey Konovalov, syzkaller-bugs, kasan-dev@googlegroups.com >> kasan-dev

On Sun, Dec 3, 2023 at 11:33 PM Tetsuo Handa
<penguin...@i-love.sakura.ne.jp> wrote:
>
> On 2023/11/26 6:07, syzbot wrote:
> > refcount_t: underflow; use-after-free.
>
> #syz set subsystems: kasan

Thank you for pointing this out! I've debugged the issue and will send a fix soon.

> By the way, shouldn't the pool_rwlock section be guarded by printk_deferred_enter()?

And for this one as well.
