INFO: task hung in io_queue_file_removal

9 views
Skip to first unread message

syzbot

unread,
Mar 21, 2020, 1:50:13ā€ÆAM3/21/20
to ax...@kernel.dk, io-u...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
Hello,

syzbot found the following crash on:

HEAD commit: cd607737 Merge tag '5.6-rc6-smb3-fixes' of git://git.samba..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1730c023e00000
kernel config: https://syzkaller.appspot.com/x/.config?x=9f894bd92023de02
dashboard link: https://syzkaller.appspot.com/bug?extid=538d1957ce178382a394
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=108ebbe3e00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=139fb973e00000

The bug was bisected to:

commit 05f3fb3c5397524feae2e73ee8e150a9090a7da2
Author: Jens Axboe <ax...@kernel.dk>
Date: Mon Dec 9 18:22:50 2019 +0000

io_uring: avoid ring quiesce for fixed file set unregister and update

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1237ad73e00000
final crash: https://syzkaller.appspot.com/x/report.txt?x=1137ad73e00000
console output: https://syzkaller.appspot.com/x/log.txt?x=1637ad73e00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+538d19...@syzkaller.appspotmail.com
Fixes: 05f3fb3c5397 ("io_uring: avoid ring quiesce for fixed file set unregister and update")

INFO: task syz-executor975:9880 blocked for more than 143 seconds.
Not tainted 5.6.0-rc6-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor975 D27576 9880 9878 0x80004000
Call Trace:
schedule+0xd0/0x2a0 kernel/sched/core.c:4154
schedule_timeout+0x6db/0xba0 kernel/time/timer.c:1871
do_wait_for_common kernel/sched/completion.c:83 [inline]
__wait_for_common kernel/sched/completion.c:104 [inline]
wait_for_common kernel/sched/completion.c:115 [inline]
wait_for_completion+0x26a/0x3c0 kernel/sched/completion.c:136
io_queue_file_removal+0x1af/0x1e0 fs/io_uring.c:5826
__io_sqe_files_update.isra.0+0x3a1/0xb00 fs/io_uring.c:5867
io_sqe_files_update fs/io_uring.c:5918 [inline]
__io_uring_register+0x377/0x2c00 fs/io_uring.c:7131
__do_sys_io_uring_register fs/io_uring.c:7202 [inline]
__se_sys_io_uring_register fs/io_uring.c:7184 [inline]
__x64_sys_io_uring_register+0x192/0x560 fs/io_uring.c:7184
do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x440659
Code: Bad RIP value.
RSP: 002b:00007ffc4689a358 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab
RAX: ffffffffffffffda RBX: 00007ffc4689a360 RCX: 0000000000440659
RDX: 0000000020000300 RSI: 0000000000000006 RDI: 0000000000000003
RBP: 0000000000000005 R08: 0000000000000001 R09: 00007ffc46890031
R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000401f40
R13: 0000000000401fd0 R14: 0000000000000000 R15: 0000000000000000

Showing all locks held in the system:
1 lock held by khungtaskd/1137:
#0: ffffffff897accc0 (rcu_read_lock){....}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:5331
1 lock held by rsyslogd/9761:
#0: ffff8880a8f3ada0 (&f->f_pos_lock){+.+.}, at: __fdget_pos+0xe3/0x100 fs/file.c:821
2 locks held by getty/9850:
#0: ffff88809fad3090 (&tty->ldisc_sem){++++}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:267
#1: ffffc900017bb2e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x21d/0x1b30 drivers/tty/n_tty.c:2156
2 locks held by getty/9851:
#0: ffff8880a7b96090 (&tty->ldisc_sem){++++}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:267
#1: ffffc900017cb2e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x21d/0x1b30 drivers/tty/n_tty.c:2156
2 locks held by getty/9852:
#0: ffff88809e41c090 (&tty->ldisc_sem){++++}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:267
#1: ffffc900017eb2e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x21d/0x1b30 drivers/tty/n_tty.c:2156
2 locks held by getty/9853:
#0: ffff888090392090 (&tty->ldisc_sem){++++}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:267
#1: ffffc900017ab2e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x21d/0x1b30 drivers/tty/n_tty.c:2156
2 locks held by getty/9854:
#0: ffff88809fb1b090 (&tty->ldisc_sem){++++}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:267
#1: ffffc900017db2e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x21d/0x1b30 drivers/tty/n_tty.c:2156
2 locks held by getty/9855:
#0: ffff88809a302090 (&tty->ldisc_sem){++++}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:267
#1: ffffc9000178b2e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x21d/0x1b30 drivers/tty/n_tty.c:2156
2 locks held by getty/9856:
#0: ffff88809d9dc090 (&tty->ldisc_sem){++++}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:267
#1: ffffc9000172b2e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x21d/0x1b30 drivers/tty/n_tty.c:2156
1 lock held by syz-executor975/9880:
#0: ffff88808f392320 (&ctx->uring_lock){+.+.}, at: __do_sys_io_uring_register fs/io_uring.c:7201 [inline]
#0: ffff88808f392320 (&ctx->uring_lock){+.+.}, at: __se_sys_io_uring_register fs/io_uring.c:7184 [inline]
#0: ffff88808f392320 (&ctx->uring_lock){+.+.}, at: __x64_sys_io_uring_register+0x181/0x560 fs/io_uring.c:7184

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 1137 Comm: khungtaskd Not tainted 5.6.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x188/0x20d lib/dump_stack.c:118
nmi_cpu_backtrace.cold+0x70/0xb1 lib/nmi_backtrace.c:101
nmi_trigger_cpumask_backtrace+0x231/0x27e lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:205 [inline]
watchdog+0xa8c/0x1010 kernel/hung_task.c:289
kthread+0x357/0x430 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:60


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

Hillf Danton

unread,
Mar 21, 2020, 8:38:44ā€ÆAM3/21/20
to syzbot, ax...@kernel.dk, io-u...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk

On Fri, 20 Mar 2020 22:50:12 -0700
Flush work before waiting for completion.

--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5823,8 +5823,8 @@ static bool io_queue_file_removal(struct

if (pfile == &pfile_stack) {
percpu_ref_switch_to_atomic(&data->refs, io_atomic_switch);
- wait_for_completion(&done);
flush_work(&data->ref_work);
+ wait_for_completion(&done);
return false;
}

--

And perhaps a tiny cleanup: no deed to wait for completion as
flushing work itself will wait until the work is done.

--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5604,9 +5604,7 @@ static void io_ring_file_ref_flush(struc
while ((node = llist_del_all(&data->put_llist)) != NULL) {
llist_for_each_entry_safe(pfile, tmp, node, llist) {
io_ring_file_put(data->ctx, pfile->file);
- if (pfile->done)
- complete(pfile->done);
- else
+ if (!pfile->done)
kfree(pfile);
}
}
@@ -5806,7 +5804,6 @@ static bool io_queue_file_removal(struct
struct file *file)
{
struct io_file_put *pfile, pfile_stack;
- DECLARE_COMPLETION_ONSTACK(done);

/*
* If we fail allocating the struct we need for doing async reomval
@@ -5815,7 +5812,7 @@ static bool io_queue_file_removal(struct
pfile = kzalloc(sizeof(*pfile), GFP_KERNEL);
if (!pfile) {
pfile = &pfile_stack;
- pfile->done = &done;
+ pfile->done = pfile;
}

pfile->file = file;
@@ -5824,7 +5821,6 @@ static bool io_queue_file_removal(struct
if (pfile == &pfile_stack) {
percpu_ref_switch_to_atomic(&data->refs, io_atomic_switch);
flush_work(&data->ref_work);
- wait_for_completion(&done);
return false;
}


Jens Axboe

unread,
Mar 21, 2020, 4:03:28ā€ÆPM3/21/20
to Hillf Danton, syzbot, io-u...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
Care to send this version as a real patch? Seems kind of pointless to
just do the above change with that in mind. And then at the same time
turn ->done into ->do_file_put or something, and make it a bool.

--
Jens Axboe

Hillf Danton

unread,
Mar 21, 2020, 10:06:22ā€ÆPM3/21/20
to Jens Axboe, Hillf Danton, syzbot, io-u...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk

On Sat, 21 Mar 2020 14:03:24 -0600 Jens Axboe wrote:
>
> On 3/21/20 6:38 AM, Hillf Danton wrote:
> >
> > Flush work before waiting for completion.
> >
> > --- a/fs/io_uring.c
> > +++ b/fs/io_uring.c
> > @@ -5823,8 +5823,8 @@ static bool io_queue_file_removal(struct
> >
> > if (pfile == &pfile_stack) {
> > percpu_ref_switch_to_atomic(&data->refs, io_atomic_switch);
> > - wait_for_completion(&done);
> > flush_work(&data->ref_work);
> > + wait_for_completion(&done);
> > return false;
> > }
> >
> > --
> >
> > And perhaps a tiny cleanup: no deed to wait for completion as
> > flushing work itself will wait until the work is done.
>
> Care to send this version as a real patch? Seems kind of pointless to
> just do the above change with that in mind. And then at the same time
> turn ->done into ->do_file_put or something, and make it a bool.

Have trouble making a patch with the ideas in your mind all folded in so
it may be better that you do it this time leaving me a chance to learn
a lesson.

Hillf

Jens Axboe

unread,
Mar 22, 2020, 4:17:39ā€ÆPM3/22/20
to Hillf Danton, syzbot, io-u...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
Maybe my explanation wasn't quite clear! What I meant was that since
we're no longer using pfile->done, turn that ->done into a ->needs_kfree
or something, and make that a bool. So basically the same patch as the
one you posted, just making that naming (and type) change as well.

Does that help? Would prefer if you sent a patch, you already did
99% of the hard work.

--
Jens Axboe

Reply all
Reply to author
Forward
0 new messages