WARNING: ODEBUG bug in hfsplus_fill_super

17 views
Skip to first unread message

syzbot

unread,
Mar 31, 2018, 4:47:06 PM3/31/18
to ak...@linux-foundation.org, dhow...@redhat.com, ernesto.mn...@gmail.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, sl...@dubeyko.com, syzkall...@googlegroups.com
Hello,

syzbot hit the following crash on upstream commit
9dd2326890d89a5179967c947dab2bab34d7ddee (Fri Mar 30 17:29:47 2018 +0000)
Merge tag 'ceph-for-4.16-rc8' of git://github.com/ceph/ceph-client
syzbot dashboard link:
https://syzkaller.appspot.com/bug?extid=4f2e5f086147d543ab03

C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5787167037063168
syzkaller reproducer:
https://syzkaller.appspot.com/x/repro.syz?id=5103671176069120
Raw console output:
https://syzkaller.appspot.com/x/log.txt?id=6295681903886336
Kernel config:
https://syzkaller.appspot.com/x/.config?id=-2760467897697295172
compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+4f2e5f...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for
details.
If you forward the report, please keep this part and the footer.

hfsplus: found bad thread record in catalog
hfsplus: failed to load root directory
hfsplus: failed to load root directory
------------[ cut here ]------------
ODEBUG: free active (active state 0) object type: timer_list hint:
delayed_work_timer_fn+0x0/0x90 kernel/workqueue.c:1355
WARNING: CPU: 1 PID: 5536 at lib/debugobjects.c:291
debug_print_object+0x166/0x220 lib/debugobjects.c:288
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 5536 Comm: syzkaller048155 Not tainted 4.16.0-rc7+ #7
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x24d lib/dump_stack.c:53
panic+0x1e4/0x41c kernel/panic.c:183
__warn+0x1dc/0x200 kernel/panic.c:547
report_bug+0x1f4/0x2b0 lib/bug.c:186
fixup_bug.part.10+0x37/0x80 arch/x86/kernel/traps.c:178
fixup_bug arch/x86/kernel/traps.c:247 [inline]
do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
RIP: 0010:debug_print_object+0x166/0x220 lib/debugobjects.c:288
RSP: 0018:ffff8801acbf73d0 EFLAGS: 00010082
RAX: dffffc0000000008 RBX: 0000000000000003 RCX: ffffffff815b193e
RDX: 0000000000000000 RSI: 1ffff1003597ee2a RDI: 1ffff1003597edff
RBP: ffff8801acbf7410 R08: 0000000000000000 R09: 1ffff1003597edd1
R10: ffffed003597eea9 R11: ffffffff88583158 R12: 0000000000000001
R13: ffffffff885949e0 R14: ffffffff8775c040 R15: ffffffff81604110
__debug_check_no_obj_freed lib/debugobjects.c:745 [inline]
debug_check_no_obj_freed+0x662/0xf1f lib/debugobjects.c:774
kfree+0xc7/0x260 mm/slab.c:3800
hfsplus_fill_super+0x114b/0x2030 fs/hfsplus/super.c:609
mount_bdev+0x2b7/0x370 fs/super.c:1119
hfsplus_mount+0x34/0x40 fs/hfsplus/super.c:645
mount_fs+0x66/0x2d0 fs/super.c:1222
vfs_kern_mount.part.26+0xc6/0x4a0 fs/namespace.c:1037
vfs_kern_mount fs/namespace.c:2509 [inline]
do_new_mount fs/namespace.c:2512 [inline]
do_mount+0xea4/0x2bb0 fs/namespace.c:2842
SYSC_mount fs/namespace.c:3058 [inline]
SyS_mount+0xab/0x120 fs/namespace.c:3035
do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x44a51a
RSP: 002b:00007f40bbaadd38 EFLAGS: 00000297 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00000000200003a8 RCX: 000000000044a51a
RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007f40bbaadd50
RBP: 00000000006e49c4 R08: 0000000020017900 R09: 000000000000000a
R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000003
R13: 0000000000000004 R14: 00000000006e49c0 R15: 0073756c70736668

======================================================
WARNING: possible circular locking dependency detected
4.16.0-rc7+ #7 Not tainted
------------------------------------------------------
syzkaller048155/5536 is trying to acquire lock:
((console_sem).lock){....}, at: [<00000000d4a7cd50>]
down_trylock+0x13/0x70 kernel/locking/semaphore.c:136

but task is already holding lock:
(&obj_hash[i].lock){-.-.}, at: [<00000000507259e1>]
__debug_check_no_obj_freed lib/debugobjects.c:736 [inline]
(&obj_hash[i].lock){-.-.}, at: [<00000000507259e1>]
debug_check_no_obj_freed+0x1e9/0xf1f lib/debugobjects.c:774

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (&obj_hash[i].lock){-.-.}:
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
__debug_object_init+0x109/0x1040 lib/debugobjects.c:343
debug_object_init+0x17/0x20 lib/debugobjects.c:391
debug_hrtimer_init kernel/time/hrtimer.c:410 [inline]
debug_init kernel/time/hrtimer.c:458 [inline]
hrtimer_init+0x8c/0x410 kernel/time/hrtimer.c:1259
init_dl_task_timer+0x1b/0x50 kernel/sched/deadline.c:1060
__sched_fork+0x2bb/0xb60 kernel/sched/core.c:2189
init_idle+0x75/0x820 kernel/sched/core.c:5352
sched_init+0xb19/0xc43 kernel/sched/core.c:6049
start_kernel+0x452/0x819 init/main.c:586
x86_64_start_reservations+0x2a/0x2c arch/x86/kernel/head64.c:378
x86_64_start_kernel+0x77/0x7a arch/x86/kernel/head64.c:359
secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:239

-> #2 (&rq->lock){-.-.}:
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:144
rq_lock kernel/sched/sched.h:1760 [inline]
task_fork_fair+0x7a/0x690 kernel/sched/fair.c:9471
sched_fork+0x450/0xc10 kernel/sched/core.c:2405
copy_process.part.38+0x17c9/0x4bd0 kernel/fork.c:1763
copy_process kernel/fork.c:1606 [inline]
_do_fork+0x1f7/0xf70 kernel/fork.c:2087
kernel_thread+0x34/0x40 kernel/fork.c:2146
rest_init+0x22/0xf0 init/main.c:403
start_kernel+0x7f1/0x819 init/main.c:717
x86_64_start_reservations+0x2a/0x2c arch/x86/kernel/head64.c:378
x86_64_start_kernel+0x77/0x7a arch/x86/kernel/head64.c:359
secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:239

-> #1 (&p->pi_lock){-.-.}:
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
try_to_wake_up+0xbc/0x15f0 kernel/sched/core.c:1989
wake_up_process+0x10/0x20 kernel/sched/core.c:2152
__up.isra.0+0x1cc/0x2c0 kernel/locking/semaphore.c:262
up+0x13b/0x1d0 kernel/locking/semaphore.c:187
__up_console_sem+0xb2/0x1a0 kernel/printk/printk.c:242
console_unlock+0x5af/0xfb0 kernel/printk/printk.c:2417
do_con_write+0x106e/0x1f70 drivers/tty/vt/vt.c:2433
con_write+0x25/0xb0 drivers/tty/vt/vt.c:2782
process_output_block drivers/tty/n_tty.c:579 [inline]
n_tty_write+0x5ef/0xec0 drivers/tty/n_tty.c:2308
do_tty_write drivers/tty/tty_io.c:958 [inline]
tty_write+0x3fa/0x840 drivers/tty/tty_io.c:1042
__vfs_write+0xef/0x970 fs/read_write.c:480
vfs_write+0x189/0x510 fs/read_write.c:544
SYSC_write fs/read_write.c:589 [inline]
SyS_write+0xef/0x220 fs/read_write.c:581
do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7

-> #0 ((console_sem).lock){....}:
lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
__down_trylock_console_sem+0xa2/0x1e0 kernel/printk/printk.c:225
console_trylock+0x15/0x70 kernel/printk/printk.c:2229
console_trylock_spinning kernel/printk/printk.c:1643 [inline]
vprintk_emit+0x5b5/0xb90 kernel/printk/printk.c:1906
vprintk_default+0x28/0x30 kernel/printk/printk.c:1947
vprintk_func+0x57/0xc0 kernel/printk/printk_safe.c:379
printk+0xaa/0xca kernel/printk/printk.c:1980
__warn_printk+0x90/0xf0 kernel/panic.c:599
debug_print_object+0x166/0x220 lib/debugobjects.c:288
__debug_check_no_obj_freed lib/debugobjects.c:745 [inline]
debug_check_no_obj_freed+0x662/0xf1f lib/debugobjects.c:774
kfree+0xc7/0x260 mm/slab.c:3800
hfsplus_fill_super+0x114b/0x2030 fs/hfsplus/super.c:609
mount_bdev+0x2b7/0x370 fs/super.c:1119
hfsplus_mount+0x34/0x40 fs/hfsplus/super.c:645
mount_fs+0x66/0x2d0 fs/super.c:1222
vfs_kern_mount.part.26+0xc6/0x4a0 fs/namespace.c:1037
vfs_kern_mount fs/namespace.c:2509 [inline]
do_new_mount fs/namespace.c:2512 [inline]
do_mount+0xea4/0x2bb0 fs/namespace.c:2842
SYSC_mount fs/namespace.c:3058 [inline]
SyS_mount+0xab/0x120 fs/namespace.c:3035
do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7

other info that might help us debug this:

Chain exists of:
(console_sem).lock --> &rq->lock --> &obj_hash[i].lock

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&obj_hash[i].lock);
lock(&rq->lock);
lock(&obj_hash[i].lock);
lock((console_sem).lock);

*** DEADLOCK ***

2 locks held by syzkaller048155/5536:
#0: (&type->s_umount_key#35/1){+.+.}, at: [<00000000651258d3>]
alloc_super fs/super.c:211 [inline]
#0: (&type->s_umount_key#35/1){+.+.}, at: [<00000000651258d3>]
sget_userns+0x3a1/0xe40 fs/super.c:502
#1: (&obj_hash[i].lock){-.-.}, at: [<00000000507259e1>]
__debug_check_no_obj_freed lib/debugobjects.c:736 [inline]
#1: (&obj_hash[i].lock){-.-.}, at: [<00000000507259e1>]
debug_check_no_obj_freed+0x1e9/0xf1f lib/debugobjects.c:774

stack backtrace:
CPU: 1 PID: 5536 Comm: syzkaller048155 Not tainted 4.16.0-rc7+ #7
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x24d lib/dump_stack.c:53
print_circular_bug.isra.38+0x2cd/0x2dc kernel/locking/lockdep.c:1223
check_prev_add kernel/locking/lockdep.c:1863 [inline]
check_prevs_add kernel/locking/lockdep.c:1976 [inline]
validate_chain kernel/locking/lockdep.c:2417 [inline]
__lock_acquire+0x30a8/0x3e00 kernel/locking/lockdep.c:3431
lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
__down_trylock_console_sem+0xa2/0x1e0 kernel/printk/printk.c:225
console_trylock+0x15/0x70 kernel/printk/printk.c:2229
console_trylock_spinning kernel/printk/printk.c:1643 [inline]
vprintk_emit+0x5b5/0xb90 kernel/printk/printk.c:1906
vprintk_default+0x28/0x30 kernel/printk/printk.c:1947
vprintk_func+0x57/0xc0 kernel/printk/printk_safe.c:379
printk+0xaa/0xca kernel/printk/printk.c:1980
__warn_printk+0x90/0xf0 kernel/panic.c:599
debug_print_object+0x166/0x220 lib/debugobjects.c:288
__debug_check_no_obj_freed lib/debugobjects.c:745 [inline]
debug_check_no_obj_freed+0x662/0xf1f lib/debugobjects.c:774
kfree+0xc7/0x260 mm/slab.c:3800
hfsplus_fill_super+0x114b/0x2030 fs/hfsplus/super.c:609
mount_bdev+0x2b7/0x370 fs/super.c:1119
hfsplus_mount+0x34/0x40 fs/hfsplus/super.c:645
mount_fs+0x66/0x2d0 fs/super.c:1222
vfs_kern_mount.part.26+0xc6/0x4a0 fs/namespace.c:1037
vfs_kern_mount fs/namespace.c:2509 [inline]
do_new_mount fs/namespace.c:2512 [inline]
do_mount+0xea4/0x2bb0 fs/namespace.c:2842
SYSC_mount fs/namespace.c:3058 [inline]
SyS_mount+0xab/0x120 fs/namespace.c:3035
do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x44a51a
RSP: 002b:00007f40bbaadd38 EFLAGS: 00000297 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00000000200003a8 RCX: 000000000044a51a
RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007f40bbaadd50
RBP: 00000000006e49c4 R08: 0000000020017900 R09: 000000000000000a
R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000003
R13: 0000000000000004 R14: 00000000006e49c0 R15: 0073756c70736668
Dumping ftrace buffer:
(ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzk...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
If you want to test a patch for this bug, please reply with:
#syz test: git://repo/address.git branch
and provide the patch inline or as an attachment.
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug
report.
Note: all commands must start from beginning of the line in the email body.

syzbot

unread,
Mar 31, 2018, 4:47:07 PM3/31/18
to ak...@linux-foundation.org, dhow...@redhat.com, ernesto.mn...@gmail.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, sl...@dubeyko.com, syzkall...@googlegroups.com

Tetsuo Handa

unread,
May 15, 2018, 6:11:38 AM5/15/18
to syzbot, syzkall...@googlegroups.com, Al Viro, ak...@linux-foundation.org, dhow...@redhat.com, ernesto.mn...@gmail.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, sl...@dubeyko.com
From ffd64dcf946502e7bb1d23c021ee9a4fc92f9312 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
Date: Tue, 15 May 2018 12:23:03 +0900
Subject: [PATCH] hfsplus: stop workqueue when fill_super() failed

syzbot is reporting ODEBUG messages at hfsplus_fill_super() [1].
This is because hfsplus_fill_super() forgot to call
cancel_delayed_work_sync().

As far as I can see, it is hfsplus_mark_mdb_dirty() from
hfsplus_new_inode() in hfsplus_fill_super() that calls
queue_delayed_work(). Therefore, I assume that hfsplus_new_inode() does not
fail if queue_delayed_work() was called, and the out_put_hidden_dir label
is the appropriate location to call cancel_delayed_work_sync().

[1] https://syzkaller.appspot.com/bug?id=a66f45e96fdbeb76b796bf46eb25ea878c42a6c9

Signed-off-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+4f2e5f...@syzkaller.appspotmail.com>
Cc: Al Viro <vi...@zeniv.linux.org.uk>
---
fs/hfsplus/super.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/hfsplus/super.c b/fs/hfsplus/super.c
index 9e690ae..80abba5 100644
--- a/fs/hfsplus/super.c
+++ b/fs/hfsplus/super.c
@@ -590,6 +590,7 @@ static int hfsplus_fill_super(struct super_block *sb, void *data, int silent)
return 0;

out_put_hidden_dir:
+ cancel_delayed_work_sync(&sbi->sync_work);
iput(sbi->hidden_dir);
out_put_root:
dput(sb->s_root);
--
1.8.3.1


Ernesto A. Fernández

unread,
May 15, 2018, 12:26:27 PM5/15/18
to Tetsuo Handa, syzbot, syzkall...@googlegroups.com, Al Viro, ak...@linux-foundation.org, dhow...@redhat.com, ernesto.mn...@gmail.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, sl...@dubeyko.com
I sent this same patch a couple of weeks ago:

https://www.spinics.net/lists/linux-fsdevel/msg125240.html

I should probably have sent it in reply to syzbot to prevent this kind of
duplication of work. Sorry about that.

Andrew Morton

unread,
May 15, 2018, 5:34:09 PM5/15/18
to Tetsuo Handa, syzbot, syzkall...@googlegroups.com, Al Viro, dhow...@redhat.com, ernesto.mn...@gmail.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, sl...@dubeyko.com
On Tue, 15 May 2018 19:11:06 +0900 Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp> wrote:

> >From ffd64dcf946502e7bb1d23c021ee9a4fc92f9312 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
> Date: Tue, 15 May 2018 12:23:03 +0900
> Subject: [PATCH] hfsplus: stop workqueue when fill_super() failed
>
> syzbot is reporting ODEBUG messages at hfsplus_fill_super() [1].
> This is because hfsplus_fill_super() forgot to call
> cancel_delayed_work_sync().
>
> As far as I can see, it is hfsplus_mark_mdb_dirty() from
> hfsplus_new_inode() in hfsplus_fill_super() that calls
> queue_delayed_work(). Therefore, I assume that hfsplus_new_inode() does not
> fail if queue_delayed_work() was called, and the out_put_hidden_dir label
> is the appropriate location to call cancel_delayed_work_sync().

Yes, I was scratching my head over that - it is quite unobvious
whereabouts in hfsplus_fill_super() that the work actually starts
getting scheduled.

"somewhere after the last goto out_put_root" might be true, for now.
But it isn't at all obvious and it isn't very maintainable.

Perhaps it's simply wrong for hfsplus to be marking things dirty and
performing these complex operations partway through fill_super() before
everything is fully set up.

And I wouldn't be comfortable putting the cancel_work_sync() right at
the end of the hfsplus_fill_super() cleanup because the delayed work
handler might be using things which have already been torn down by that
stage.

So... ugh. A solid shotgun approach would be to put a
cancel_work_sync() immediately after each and every goto target in that
cleanup path, but that's just stupid :(

Nasty. I can't think of anything clever here. I guess we can go with
this patch for now, and if new problems crop up we can look at moving
the cancel_work_sync() down to a later part of the teardown sequence.

Reply all
Reply to author
Forward
0 new messages