[v5.15] possible deadlock in ocfs2_finish_quota_recovery


syzbot

Jan 29, 2025, 9:34:21 PM
to syzkaller...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: 003148680b79 Linux 5.15.177
git tree: linux-5.15.y
console output: https://syzkaller.appspot.com/x/log.txt?x=126bd6b0580000
kernel config: https://syzkaller.appspot.com/x/.config?x=f2c956168ff5b89
dashboard link: https://syzkaller.appspot.com/bug?extid=9d13e0bd9eb62200af15
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/452e6089e1b8/disk-00314868.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/368807beca93/vmlinux-00314868.xz
kernel image: https://storage.googleapis.com/syzbot-assets/e98b52fba46c/bzImage-00314868.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+9d13e0...@syzkaller.appspotmail.com

ocfs2: Finishing quota recovery on device (7,2) for slot 0
======================================================
WARNING: possible circular locking dependency detected
5.15.177-syzkaller #0 Not tainted
------------------------------------------------------
kworker/u4:0/9 is trying to acquire lock:
ffff88807e7f20e0 (&type->s_umount_key#53){++++}-{3:3}, at: ocfs2_finish_quota_recovery+0x15a/0x2260 fs/ocfs2/quota_local.c:600

but task is already holding lock:
ffffc90000ce7d20 ((work_completion)(&journal->j_recovery_work)){+.+.}-{0:0}, at: process_one_work+0x7d0/0x10c0 kernel/workqueue.c:2285

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 ((work_completion)(&journal->j_recovery_work)){+.+.}-{0:0}:
lock_acquire+0x1db/0x4f0 kernel/locking/lockdep.c:5623
process_one_work+0x7f1/0x10c0 kernel/workqueue.c:2286
worker_thread+0xaca/0x1280 kernel/workqueue.c:2457
kthread+0x3f6/0x4f0 kernel/kthread.c:334
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:287

-> #1 ((null)){+.+.}-{0:0}:
lock_acquire+0x1db/0x4f0 kernel/locking/lockdep.c:5623
flush_workqueue+0x170/0x1610 kernel/workqueue.c:2830
ocfs2_shutdown_local_alloc+0x105/0xa90 fs/ocfs2/localalloc.c:379
ocfs2_dismount_volume+0x1db/0x8b0 fs/ocfs2/super.c:1882
generic_shutdown_super+0x130/0x310 fs/super.c:475
kill_block_super+0x7a/0xe0 fs/super.c:1427
deactivate_locked_super+0xa0/0x110 fs/super.c:335
cleanup_mnt+0x44e/0x500 fs/namespace.c:1143
task_work_run+0x129/0x1a0 kernel/task_work.c:188
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop+0x106/0x130 kernel/entry/common.c:181
exit_to_user_mode_prepare+0xb1/0x140 kernel/entry/common.c:214
__syscall_exit_to_user_mode_work kernel/entry/common.c:296 [inline]
syscall_exit_to_user_mode+0x5d/0x240 kernel/entry/common.c:307
do_syscall_64+0x47/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x66/0xd0

-> #0 (&type->s_umount_key#53){++++}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3053 [inline]
check_prevs_add kernel/locking/lockdep.c:3172 [inline]
validate_chain+0x1649/0x5930 kernel/locking/lockdep.c:3788
__lock_acquire+0x1295/0x1ff0 kernel/locking/lockdep.c:5012
lock_acquire+0x1db/0x4f0 kernel/locking/lockdep.c:5623
down_read+0x45/0x2e0 kernel/locking/rwsem.c:1498
ocfs2_finish_quota_recovery+0x15a/0x2260 fs/ocfs2/quota_local.c:600
ocfs2_complete_recovery+0x173c/0x24a0 fs/ocfs2/journal.c:1295
process_one_work+0x8a1/0x10c0 kernel/workqueue.c:2310
worker_thread+0xaca/0x1280 kernel/workqueue.c:2457
kthread+0x3f6/0x4f0 kernel/kthread.c:334
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:287

other info that might help us debug this:

Chain exists of:
&type->s_umount_key#53 --> (null) --> (work_completion)(&journal->j_recovery_work)

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((work_completion)(&journal->j_recovery_work));
                               lock((null));
                               lock((work_completion)(&journal->j_recovery_work));
  lock(&type->s_umount_key#53);

*** DEADLOCK ***
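The cycle lockdep reports can be modeled as a small "waits-for" graph: the umount path holds s_umount and flushes the ocfs2 workqueue, the flush waits on the pending journal recovery work, and that work in turn takes s_umount. Below is a minimal, illustrative sketch (not ocfs2 code; the node names merely mirror the locks in the report) showing that these three dependencies form a cycle, which is exactly the condition lockdep flags:

```python
# Minimal model of the dependency cycle lockdep reports above.
# Node names are illustrative; this is not ocfs2 code.
from collections import defaultdict

def has_cycle(edges):
    """Detect a cycle in a directed 'waits-for' graph via DFS."""
    graph = defaultdict(list)
    for a, b in edges:
        graph[a].append(b)

    def visit(node, path):
        if node in path:
            return True
        path.add(node)
        found = any(visit(n, path) for n in graph[node])
        path.discard(node)
        return found

    return any(visit(n, set()) for n in list(graph))

# Edges mirror the chain in the report:
edges = [
    ("s_umount", "ocfs2_wq"),          # ocfs2_dismount_volume -> flush_workqueue
    ("ocfs2_wq", "j_recovery_work"),   # flush waits for the queued work item
    ("j_recovery_work", "s_umount"),   # ocfs2_finish_quota_recovery -> down_read
]
print(has_cycle(edges))  # True: a cycle means a possible deadlock
```

Dropping any one of the three edges makes the graph acyclic, which is why the eventual fix only needs to break a single dependency.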

2 locks held by kworker/u4:0/9:
#0: ffff88807aad9138 ((wq_completion)ocfs2_wq){+.+.}-{0:0}, at: process_one_work+0x78a/0x10c0 kernel/workqueue.c:2283
#1: ffffc90000ce7d20 ((work_completion)(&journal->j_recovery_work)){+.+.}-{0:0}, at: process_one_work+0x7d0/0x10c0 kernel/workqueue.c:2285

stack backtrace:
CPU: 1 PID: 9 Comm: kworker/u4:0 Not tainted 5.15.177-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/27/2024
Workqueue: ocfs2_wq ocfs2_complete_recovery
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e3/0x2d0 lib/dump_stack.c:106
check_noncircular+0x2f8/0x3b0 kernel/locking/lockdep.c:2133
check_prev_add kernel/locking/lockdep.c:3053 [inline]
check_prevs_add kernel/locking/lockdep.c:3172 [inline]
validate_chain+0x1649/0x5930 kernel/locking/lockdep.c:3788
__lock_acquire+0x1295/0x1ff0 kernel/locking/lockdep.c:5012
lock_acquire+0x1db/0x4f0 kernel/locking/lockdep.c:5623
down_read+0x45/0x2e0 kernel/locking/rwsem.c:1498
ocfs2_finish_quota_recovery+0x15a/0x2260 fs/ocfs2/quota_local.c:600
ocfs2_complete_recovery+0x173c/0x24a0 fs/ocfs2/journal.c:1295
process_one_work+0x8a1/0x10c0 kernel/workqueue.c:2310
worker_thread+0xaca/0x1280 kernel/workqueue.c:2457
kthread+0x3f6/0x4f0 kernel/kthread.c:334
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:287
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite the report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

syzbot

Jan 31, 2025, 11:27:26 AM
to syzkaller...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: 75cefdf153f5 Linux 6.1.127
git tree: linux-6.1.y
console output: https://syzkaller.appspot.com/x/log.txt?x=139af6b0580000
kernel config: https://syzkaller.appspot.com/x/.config?x=3dc848f1f9c50685
dashboard link: https://syzkaller.appspot.com/bug?extid=3a53f7e871535e55d967
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/9fa70d9fd10a/disk-75cefdf1.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/d01cb9bb9789/vmlinux-75cefdf1.xz
kernel image: https://storage.googleapis.com/syzbot-assets/487423b91d50/bzImage-75cefdf1.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+3a53f7...@syzkaller.appspotmail.com

ocfs2: Finishing quota recovery on device (7,1) for slot 0
======================================================
WARNING: possible circular locking dependency detected
6.1.127-syzkaller #0 Not tainted
------------------------------------------------------
kworker/u4:4/56 is trying to acquire lock:
ffff88802982a0e0 (&type->s_umount_key#53){++++}-{3:3}, at: ocfs2_finish_quota_recovery+0x158/0x2300 fs/ocfs2/quota_local.c:600

but task is already holding lock:
ffffc90001577d20 ((work_completion)(&journal->j_recovery_work)){+.+.}-{0:0}, at: process_one_work+0x7a9/0x11d0 kernel/workqueue.c:2267

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 ((work_completion)(&journal->j_recovery_work)){+.+.}-{0:0}:
lock_acquire+0x1f8/0x5a0 kernel/locking/lockdep.c:5662
process_one_work+0x7dc/0x11d0 kernel/workqueue.c:2268
worker_thread+0xa47/0x1200 kernel/workqueue.c:2439
kthread+0x28d/0x320 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

-> #1 ((wq_completion)ocfs2_wq){+.+.}-{0:0}:
lock_acquire+0x1f8/0x5a0 kernel/locking/lockdep.c:5662
__flush_workqueue+0x170/0x1610 kernel/workqueue.c:2812
ocfs2_shutdown_local_alloc+0x105/0xa90 fs/ocfs2/localalloc.c:379
ocfs2_dismount_volume+0x1fb/0x960 fs/ocfs2/super.c:1879
generic_shutdown_super+0x130/0x340 fs/super.c:501
kill_block_super+0x7a/0xe0 fs/super.c:1470
deactivate_locked_super+0xa0/0x110 fs/super.c:332
cleanup_mnt+0x490/0x520 fs/namespace.c:1186
task_work_run+0x246/0x300 kernel/task_work.c:203
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop+0xde/0x100 kernel/entry/common.c:177
exit_to_user_mode_prepare+0xb1/0x140 kernel/entry/common.c:210
__syscall_exit_to_user_mode_work kernel/entry/common.c:292 [inline]
syscall_exit_to_user_mode+0x60/0x270 kernel/entry/common.c:303
do_syscall_64+0x47/0xb0 arch/x86/entry/common.c:87
entry_SYSCALL_64_after_hwframe+0x68/0xd2

-> #0 (&type->s_umount_key#53){++++}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3090 [inline]
check_prevs_add kernel/locking/lockdep.c:3209 [inline]
validate_chain+0x1661/0x5950 kernel/locking/lockdep.c:3825
__lock_acquire+0x125b/0x1f80 kernel/locking/lockdep.c:5049
lock_acquire+0x1f8/0x5a0 kernel/locking/lockdep.c:5662
down_read+0xad/0xa30 kernel/locking/rwsem.c:1520
ocfs2_finish_quota_recovery+0x158/0x2300 fs/ocfs2/quota_local.c:600
ocfs2_complete_recovery+0x18e2/0x2840 fs/ocfs2/journal.c:1324
process_one_work+0x8a9/0x11d0 kernel/workqueue.c:2292
worker_thread+0xa47/0x1200 kernel/workqueue.c:2439
kthread+0x28d/0x320 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

other info that might help us debug this:

Chain exists of:
&type->s_umount_key#53 --> (wq_completion)ocfs2_wq --> (work_completion)(&journal->j_recovery_work)

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((work_completion)(&journal->j_recovery_work));
                               lock((wq_completion)ocfs2_wq);
                               lock((work_completion)(&journal->j_recovery_work));
  lock(&type->s_umount_key#53);

*** DEADLOCK ***

2 locks held by kworker/u4:4/56:
#0: ffff8880547db138 ((wq_completion)ocfs2_wq#2){+.+.}-{0:0}, at: process_one_work+0x7a9/0x11d0 kernel/workqueue.c:2267
#1: ffffc90001577d20 ((work_completion)(&journal->j_recovery_work)){+.+.}-{0:0}, at: process_one_work+0x7a9/0x11d0 kernel/workqueue.c:2267

stack backtrace:
CPU: 1 PID: 56 Comm: kworker/u4:4 Not tainted 6.1.127-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/27/2024
Workqueue: ocfs2_wq ocfs2_complete_recovery
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e3/0x2cb lib/dump_stack.c:106
check_noncircular+0x2fa/0x3b0 kernel/locking/lockdep.c:2170
check_prev_add kernel/locking/lockdep.c:3090 [inline]
check_prevs_add kernel/locking/lockdep.c:3209 [inline]
validate_chain+0x1661/0x5950 kernel/locking/lockdep.c:3825
__lock_acquire+0x125b/0x1f80 kernel/locking/lockdep.c:5049
lock_acquire+0x1f8/0x5a0 kernel/locking/lockdep.c:5662
down_read+0xad/0xa30 kernel/locking/rwsem.c:1520
ocfs2_finish_quota_recovery+0x158/0x2300 fs/ocfs2/quota_local.c:600
ocfs2_complete_recovery+0x18e2/0x2840 fs/ocfs2/journal.c:1324
process_one_work+0x8a9/0x11d0 kernel/workqueue.c:2292
worker_thread+0xa47/0x1200 kernel/workqueue.c:2439
kthread+0x28d/0x320 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

syzbot

Feb 1, 2025, 1:16:23 AM
to syzkaller...@googlegroups.com
syzbot has found a reproducer for the following issue on:

HEAD commit: 75cefdf153f5 Linux 6.1.127
git tree: linux-6.1.y
console output: https://syzkaller.appspot.com/x/log.txt?x=177fc518580000
kernel config: https://syzkaller.appspot.com/x/.config?x=3dc848f1f9c50685
dashboard link: https://syzkaller.appspot.com/bug?extid=3a53f7e871535e55d967
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16dd0eb0580000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=116595f8580000
mounted in repro: https://storage.googleapis.com/syzbot-assets/02f47c8a3ae3/mount_0.gz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+3a53f7...@syzkaller.appspotmail.com

ocfs2: Finishing quota recovery on device (7,2) for slot 0
======================================================
WARNING: possible circular locking dependency detected
6.1.127-syzkaller #0 Not tainted
------------------------------------------------------
kworker/u4:1/11 is trying to acquire lock:
ffff88806292c0e0 (&type->s_umount_key#50){++++}-{3:3}, at: ocfs2_finish_quota_recovery+0x158/0x2300 fs/ocfs2/quota_local.c:600

but task is already holding lock:
ffffc90000107d20 ((work_completion)(&journal->j_recovery_work)){+.+.}-{0:0}, at: process_one_work+0x7a9/0x11d0 kernel/workqueue.c:2267

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 ((work_completion)(&journal->j_recovery_work)){+.+.}-{0:0}:
lock_acquire+0x1f8/0x5a0 kernel/locking/lockdep.c:5662
process_one_work+0x7dc/0x11d0 kernel/workqueue.c:2268
worker_thread+0xa47/0x1200 kernel/workqueue.c:2439
kthread+0x28d/0x320 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

-> #1 ((wq_completion)ocfs2_wq#2){+.+.}-{0:0}:
lock_acquire+0x1f8/0x5a0 kernel/locking/lockdep.c:5662
__flush_workqueue+0x170/0x1610 kernel/workqueue.c:2812
ocfs2_shutdown_local_alloc+0x105/0xa90 fs/ocfs2/localalloc.c:379
ocfs2_dismount_volume+0x1fb/0x960 fs/ocfs2/super.c:1879
generic_shutdown_super+0x130/0x340 fs/super.c:501
kill_block_super+0x7a/0xe0 fs/super.c:1470
deactivate_locked_super+0xa0/0x110 fs/super.c:332
cleanup_mnt+0x490/0x520 fs/namespace.c:1186
task_work_run+0x246/0x300 kernel/task_work.c:203
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop+0xde/0x100 kernel/entry/common.c:177
exit_to_user_mode_prepare+0xb1/0x140 kernel/entry/common.c:210
__syscall_exit_to_user_mode_work kernel/entry/common.c:292 [inline]
syscall_exit_to_user_mode+0x60/0x270 kernel/entry/common.c:303
do_syscall_64+0x47/0xb0 arch/x86/entry/common.c:87
entry_SYSCALL_64_after_hwframe+0x68/0xd2

-> #0 (&type->s_umount_key#50){++++}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3090 [inline]
check_prevs_add kernel/locking/lockdep.c:3209 [inline]
validate_chain+0x1661/0x5950 kernel/locking/lockdep.c:3825
__lock_acquire+0x125b/0x1f80 kernel/locking/lockdep.c:5049
lock_acquire+0x1f8/0x5a0 kernel/locking/lockdep.c:5662
down_read+0xad/0xa30 kernel/locking/rwsem.c:1520
ocfs2_finish_quota_recovery+0x158/0x2300 fs/ocfs2/quota_local.c:600
ocfs2_complete_recovery+0x18e2/0x2840 fs/ocfs2/journal.c:1324
process_one_work+0x8a9/0x11d0 kernel/workqueue.c:2292
worker_thread+0xa47/0x1200 kernel/workqueue.c:2439
kthread+0x28d/0x320 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

other info that might help us debug this:

Chain exists of:
&type->s_umount_key#50 --> (wq_completion)ocfs2_wq#2 --> (work_completion)(&journal->j_recovery_work)

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((work_completion)(&journal->j_recovery_work));
                               lock((wq_completion)ocfs2_wq#2);
                               lock((work_completion)(&journal->j_recovery_work));
  lock(&type->s_umount_key#50);

*** DEADLOCK ***

2 locks held by kworker/u4:1/11:
#0: ffff88807b3bf138 ((wq_completion)ocfs2_wq#2){+.+.}-{0:0}, at: process_one_work+0x7a9/0x11d0 kernel/workqueue.c:2267
#1: ffffc90000107d20 ((work_completion)(&journal->j_recovery_work)){+.+.}-{0:0}, at: process_one_work+0x7a9/0x11d0 kernel/workqueue.c:2267

stack backtrace:
CPU: 0 PID: 11 Comm: kworker/u4:1 Not tainted 6.1.127-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/27/2024
Workqueue: ocfs2_wq ocfs2_complete_recovery
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e3/0x2cb lib/dump_stack.c:106
check_noncircular+0x2fa/0x3b0 kernel/locking/lockdep.c:2170
check_prev_add kernel/locking/lockdep.c:3090 [inline]
check_prevs_add kernel/locking/lockdep.c:3209 [inline]
validate_chain+0x1661/0x5950 kernel/locking/lockdep.c:3825
__lock_acquire+0x125b/0x1f80 kernel/locking/lockdep.c:5049
lock_acquire+0x1f8/0x5a0 kernel/locking/lockdep.c:5662
down_read+0xad/0xa30 kernel/locking/rwsem.c:1520
ocfs2_finish_quota_recovery+0x158/0x2300 fs/ocfs2/quota_local.c:600
ocfs2_complete_recovery+0x18e2/0x2840 fs/ocfs2/journal.c:1324
process_one_work+0x8a9/0x11d0 kernel/workqueue.c:2292
worker_thread+0xa47/0x1200 kernel/workqueue.c:2439
kthread+0x28d/0x320 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

syzbot

Jun 5, 2025, 7:46:04 PM
to syzkaller...@googlegroups.com
syzbot suspects this issue was fixed by commit:

commit 4c3a0b0b23dd9639739732556830a1d2fe14dc60
Author: Jan Kara <ja...@suse.cz>
Date: Thu Apr 24 13:45:13 2025 +0000

ocfs2: stop quota recovery before disabling quotas

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=11998c0c580000
start commit: 75cefdf153f5 Linux 6.1.127
git tree: linux-6.1.y
If the result looks correct, please mark the issue as fixed by replying with:

#syz fix: ocfs2: stop quota recovery before disabling quotas

For information about bisection process see: https://goo.gl/tpsmEJ#bisection
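The suspected fix stops quota recovery before quotas are disabled on unmount, so by the time the umount path flushes the workqueue, the recovery work can no longer be blocked waiting for s_umount. In terms of the waits-for graph from the lockdep reports above, this removes one edge and leaves the graph acyclic. A small, purely illustrative model (not kernel code; see the actual commit for the real change):

```python
# Sketch of why the fix removes the deadlock: with quota recovery
# stopped before quotas are disabled, the "j_recovery_work -> s_umount"
# edge disappears from the waits-for graph, breaking the cycle.
# Illustrative model only; node names mirror the lockdep report.
def has_cycle(edges):
    """Detect a cycle in a directed graph via DFS."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])

    def visit(node, path):
        if node in path:
            return True
        return any(visit(n, path | {node}) for n in graph[node])

    return any(visit(n, frozenset()) for n in graph)

cycle = [
    ("s_umount", "ocfs2_wq"),           # umount flushes the workqueue
    ("ocfs2_wq", "j_recovery_work"),    # flush waits on pending work
    ("j_recovery_work", "s_umount"),    # recovery work takes s_umount
]
# After the fix, recovery is already stopped when umount flushes:
fixed = cycle[:2]
print(has_cycle(cycle), has_cycle(fixed))  # True False
```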

syzbot

Jul 26, 2025, 7:28:21 PM
to syzkaller...@googlegroups.com
Auto-closing this bug as obsolete.
Crashes did not happen for a while, no reproducer and no activity.