WARNING in percpu_ref_kill_and_confirm

21 views
Skip to first unread message

syzbot

unread,
Apr 22, 2019, 12:06:06 PM4/22/19
to ar...@arndb.de, ax...@kernel.dk, b...@alien8.de, darric...@oracle.com, gre...@linuxfoundation.org, h...@zytor.com, linu...@vger.kernel.org, linux...@vger.kernel.org, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, lu...@kernel.org, mathieu....@efficios.com, mi...@redhat.com, m...@ellerman.id.au, syzkall...@googlegroups.com, tg...@linutronix.de, torv...@linux-foundation.org, vi...@zeniv.linux.org.uk, x...@kernel.org
Hello,

syzbot found the following crash on:

HEAD commit: 9e5de623 Merge tag 'nfs-for-5.1-5' of git://git.linux-nfs...
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15624257200000
kernel config: https://syzkaller.appspot.com/x/.config?x=856fc6d0fbbeede9
dashboard link: https://syzkaller.appspot.com/bug?extid=10d25e23199614b7721f
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
userspace arch: i386
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17ff39f3200000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15758647200000

The bug was bisected to:

commit 38e7571c07be01f9f19b355a9306a4e3d5cb0f5b
Author: Linus Torvalds <torv...@linux-foundation.org>
Date: Fri Mar 8 22:48:40 2019 +0000

Merge tag 'io_uring-2019-03-06' of git://git.kernel.dk/linux-block

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1736bc57200000
final crash: https://syzkaller.appspot.com/x/report.txt?x=14b6bc57200000
console output: https://syzkaller.appspot.com/x/log.txt?x=10b6bc57200000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+10d25e...@syzkaller.appspotmail.com
Fixes: 38e7571c07be ("Merge tag 'io_uring-2019-03-06' of
git://git.kernel.dk/linux-block")

------------[ cut here ]------------
percpu_ref_kill_and_confirm called more than once on io_ring_ctx_ref_free!
WARNING: CPU: 1 PID: 7815 at lib/percpu-refcount.c:335
percpu_ref_kill_and_confirm+0x341/0x3b0 lib/percpu-refcount.c:335
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 7815 Comm: syz-executor269 Not tainted 5.1.0-rc5+ #77
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
panic+0x2cb/0x65c kernel/panic.c:214
__warn.cold+0x20/0x45 kernel/panic.c:571
report_bug+0x263/0x2b0 lib/bug.c:186
fixup_bug arch/x86/kernel/traps.c:179 [inline]
fixup_bug arch/x86/kernel/traps.c:174 [inline]
do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:272
do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:291
invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
RIP: 0010:percpu_ref_kill_and_confirm+0x341/0x3b0 lib/percpu-refcount.c:335
Code: 42 e0 2a 06 01 48 89 fa 48 c1 ea 03 80 3c 02 00 75 76 49 8b 54 24 10
48 c7 c6 a0 71 a1 87 48 c7 c7 40 71 a1 87 e8 ad 92 13 fe <0f> 0b 48 b8 00
00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 80 3c 02
RSP: 0018:ffff8880a96cfce0 EFLAGS: 00010086
RAX: 0000000000000000 RBX: 0000607f5142e35b RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff815afcf6 RDI: ffffed10152d9f8e
RBP: ffff8880a96cfd10 R08: ffff8880a85c40c0 R09: fffffbfff1133639
R10: fffffbfff1133638 R11: ffffffff8899b1c3 R12: ffff88809ee571c0
R13: ffff88809ee571c8 R14: 0000000000000286 R15: 0000000000000000
percpu_ref_kill include/linux/percpu-refcount.h:128 [inline]
__io_uring_register+0xa7/0x1fe0 fs/io_uring.c:2937
__do_sys_io_uring_register fs/io_uring.c:2998 [inline]
__se_sys_io_uring_register fs/io_uring.c:2980 [inline]
__ia32_sys_io_uring_register+0x193/0x1f0 fs/io_uring.c:2980
do_syscall_32_irqs_on arch/x86/entry/common.c:326 [inline]
do_fast_syscall_32+0x281/0xc98 arch/x86/entry/common.c:397
entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7f16869
Code: 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 14 24 c3 8b 3c 24 c3 90 90
90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90
90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 002b:00000000f7ef11ec EFLAGS: 00000296 ORIG_RAX: 00000000000001ab
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

Jens Axboe

unread,
Apr 22, 2019, 12:23:05 PM4/22/19
to syzbot, ar...@arndb.de, b...@alien8.de, darric...@oracle.com, gre...@linuxfoundation.org, h...@zytor.com, linu...@vger.kernel.org, linux...@vger.kernel.org, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, lu...@kernel.org, mathieu....@efficios.com, mi...@redhat.com, m...@ellerman.id.au, syzkall...@googlegroups.com, tg...@linutronix.de, torv...@linux-foundation.org, vi...@zeniv.linux.org.uk, x...@kernel.org
I think the below should fix this. Very early versions of io_uring didn't
have this issue, since we did the percpu ref tryget for io_uring_register().
But I think we'll be just fine just checking if the ref is already dying
once inside the mutex. If it is, it's either going away, or someone else
is already doing io_uring_register() on it.


diff --git a/fs/io_uring.c b/fs/io_uring.c
index f65f85d89217..a2f39faed6a7 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2934,6 +2934,14 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
{
int ret;

+ /*
+ * We're inside the ring mutex, if the ref is already dying, then
+ * someone else killed the ctx or is already going through
+ * io_uring_register().
+ */
+ if (percpu_ref_is_dying(&ctx->refs))
+ return -ENXIO;
+
percpu_ref_kill(&ctx->refs);

/*

--
Jens Axboe

Linus Torvalds

unread,
Apr 22, 2019, 12:24:13 PM4/22/19
to syzbot, Arnd Bergmann, Jens Axboe, Borislav Petkov, Darrick J. Wong, Greg Kroah-Hartman, Peter Anvin, Linux API, linux-arch, linux-block, linux-fsdevel, Linux List Kernel Mailing, Andrew Lutomirski, Mathieu Desnoyers, Ingo Molnar, Michael Ellerman, syzkaller-bugs, Thomas Gleixner, Al Viro, the arch/x86 maintainers
On Mon, Apr 22, 2019 at 9:06 AM syzbot
<syzbot+10d25e...@syzkaller.appspotmail.com> wrote:
>
>
> The bug was bisected to:
>
> commit 38e7571c07be01f9f19b355a9306a4e3d5cb0f5b
> Author: Linus Torvalds <torv...@linux-foundation.org>
> Date: Fri Mar 8 22:48:40 2019 +0000
>
> Merge tag 'io_uring-2019-03-06' of git://git.kernel.dk/linux-block
>
> percpu_ref_kill_and_confirm called more than once on io_ring_ctx_ref_free!

So I don't see how that happens in the original code (because
__io_uring_register() is called with the uring_lock held), but let's
see.

HOWEVER.

I do see how it happens now as of the latest kernel as of commit
b19062a56726 ("io_uring: fix possible deadlock between
io_uring_{enter,register}") where the code explicitly drops the mutex
in order to wait for other uring users to finish.

So Jens, I think that commit was buggy. I suspect that
io_uring_register() should perhaps do something like

--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2934,7 +2934,10 @@ static int __io_uring_register(struct
io_ring_ctx *ctx, unsigned opcode,
{
int ret;

+ if (!percpu_ref_tryget(&ctx->refs))
+ return -EBUSY;
percpu_ref_kill(&ctx->refs);
+ percpu_ref_put(&ctx->refs);

/*
* Drop uring mutex before waiting for references to exit. If another

to guarantee that it's the *only* case of io_uring_register() doing that kill.

Hmm?

Linus

Linus Torvalds

unread,
Apr 22, 2019, 12:28:18 PM4/22/19
to Jens Axboe, syzbot, Arnd Bergmann, Borislav Petkov, Darrick J. Wong, Greg Kroah-Hartman, Peter Anvin, Linux API, linux-arch, linux-block, linux-fsdevel, Linux List Kernel Mailing, Andrew Lutomirski, Mathieu Desnoyers, Ingo Molnar, Michael Ellerman, syzkaller-bugs, Thomas Gleixner, Al Viro, the arch/x86 maintainers
[ Crossed emails ]

On Mon, Apr 22, 2019 at 9:23 AM Jens Axboe <ax...@kernel.dk> wrote:
>
> I think the below should fix this. Very early versions of io_uring didn't
> have this issue, since we did the percpu ref tryget for io_uring_register().

Ok, so I like your patch better than mine, but note how syzbot
bisected this to the initial merge of the io_uring code.

I agree that code shouldn't have had this particular issue, but it
looks like it does.

Is there some way to race with io_ring_ctx_wait_and_kill(), which
_also_ does that ref_kill() thing? I'm not seeing how that could
happen, but maybe if the file ref counts get screwed up you have
->release() called early..

Linus

Jens Axboe

unread,
Apr 22, 2019, 12:28:35 PM4/22/19
to Linus Torvalds, syzbot, Arnd Bergmann, Borislav Petkov, Darrick J. Wong, Greg Kroah-Hartman, Peter Anvin, Linux API, linux-arch, linux-block, linux-fsdevel, Linux List Kernel Mailing, Andrew Lutomirski, Mathieu Desnoyers, Ingo Molnar, Michael Ellerman, syzkaller-bugs, Thomas Gleixner, Al Viro, the arch/x86 maintainers
Just sent out something as well. I think we can get by with just
checking if it's dying, or we can go the route of what you did which is
actually very similar to what the earlier versions did. Both versions
should fix the issue.

I'll test just to be totally sure.

--
Jens Axboe

Jens Axboe

unread,
Apr 22, 2019, 12:32:35 PM4/22/19
to Linus Torvalds, syzbot, Arnd Bergmann, Borislav Petkov, Darrick J. Wong, Greg Kroah-Hartman, Peter Anvin, Linux API, linux-arch, linux-block, linux-fsdevel, Linux List Kernel Mailing, Andrew Lutomirski, Mathieu Desnoyers, Ingo Molnar, Michael Ellerman, syzkaller-bugs, Thomas Gleixner, Al Viro, the arch/x86 maintainers
On 4/22/19 10:27 AM, Linus Torvalds wrote:
> [ Crossed emails ]
>
> On Mon, Apr 22, 2019 at 9:23 AM Jens Axboe <ax...@kernel.dk> wrote:
>>
>> I think the below should fix this. Very early versions of io_uring didn't
>> have this issue, since we did the percpu ref tryget for io_uring_register().
>
> Ok, so I like your patch better than mine, but note how syzbot
> bisected this to the initial merge of the io_uring code.

Yes, I did think about that too...

> I agree that code shouldn't have had this particular issue, but it
> looks like it does.
>
> Is there some way to race with io_ring_ctx_wait_and_kill(), which
> _also_ does that ref_kill() thing? I'm not seeing how that could
> happen, but maybe if the file ref counts get screwed up you have
> ->release() called early..

I just tried on the current code and it triggers easily, but that's
with that mutex patch in there. I agree it should not trigger before
that, unless something is wonky. I'll try and play around with it a bit
and see what is going on (or if I can trigger it at all with the mutex
change reverted).

--
Jens Axboe

Jens Axboe

unread,
Apr 22, 2019, 12:38:35 PM4/22/19
to Linus Torvalds, syzbot, Arnd Bergmann, Borislav Petkov, Darrick J. Wong, Greg Kroah-Hartman, Peter Anvin, Linux API, linux-arch, linux-block, linux-fsdevel, Linux List Kernel Mailing, Andrew Lutomirski, Mathieu Desnoyers, Ingo Molnar, Michael Ellerman, syzkaller-bugs, Thomas Gleixner, Al Viro, the arch/x86 maintainers
With the mutex change in, I can trigger it in a second or so. Just ran
the reproducer with that change reverted, and I'm not seeing any badness.
So I do wonder if the bisect results are accurate?

I think the dying check should cover it, and then marked with fixing
that mutex commit.

--
Jens Axboe

Linus Torvalds

unread,
Apr 22, 2019, 12:48:28 PM4/22/19
to Jens Axboe, syzbot, Arnd Bergmann, Borislav Petkov, Darrick J. Wong, Greg Kroah-Hartman, Peter Anvin, Linux API, linux-arch, linux-block, linux-fsdevel, Linux List Kernel Mailing, Andrew Lutomirski, Mathieu Desnoyers, Ingo Molnar, Michael Ellerman, syzkaller-bugs, Thomas Gleixner, Al Viro, the arch/x86 maintainers
On Mon, Apr 22, 2019 at 9:38 AM Jens Axboe <ax...@kernel.dk> wrote:
>
> With the mutex change in, I can trigger it in a second or so. Just ran
> the reproducer with that change reverted, and I'm not seeing any badness.
> So I do wonder if the bisect results are accurate?

Looking at the syzbot report, it's syzbot being confused.

The actual WARNING in percpu_ref_kill_and_confirm() only happens with
recent kernels.

But then syzbot mixes it up with a completely different bug:

crash: BUG: MAX_STACK_TRACE_ENTRIES too low!
BUG: MAX_STACK_TRACE_ENTRIES too low!

and for some reason decides that *that* bug is the same thing entirely.

So yeah, I think the simple percpu_ref_is_dying() check is sufficient,
and that the syzbot bisection is completely bogus.

Linus

Jens Axboe

unread,
Apr 22, 2019, 12:50:59 PM4/22/19
to Linus Torvalds, syzbot, Arnd Bergmann, Borislav Petkov, Darrick J. Wong, Greg Kroah-Hartman, Peter Anvin, Linux API, linux-arch, linux-block, linux-fsdevel, Linux List Kernel Mailing, Andrew Lutomirski, Mathieu Desnoyers, Ingo Molnar, Michael Ellerman, syzkaller-bugs, Thomas Gleixner, Al Viro, the arch/x86 maintainers
Ah good, that makes me feel better. I'll queue the fix up, thanks.

--
Jens Axboe

Dmitry Vyukov

unread,
Apr 23, 2019, 10:41:52 AM4/23/19
to Linus Torvalds, Jens Axboe, syzbot, Arnd Bergmann, Borislav Petkov, Darrick J. Wong, Greg Kroah-Hartman, Peter Anvin, Linux API, linux-arch, linux-block, linux-fsdevel, Linux List Kernel Mailing, Andrew Lutomirski, Mathieu Desnoyers, Ingo Molnar, Michael Ellerman, syzkaller-bugs, Thomas Gleixner, Al Viro, the arch/x86 maintainers, syzkaller
Using crashed/not-crashed predicate gives better results overall. More
than half kernel bugs have different manifestations due to different
reasons. And even if we can say for sure that we see a different bug,
we still don't know if the original bug is also there or not. See the
following threads for details:
https://groups.google.com/d/msg/syzkaller-bugs/nFeC8-UG1gg/y6gUEsvAAgAJ
https://groups.google.com/d/msg/syzkaller/sR8aAXaWEF4/tTWYRgvmAwAJ

Unrelated crashes is the most common cause of incorrect bisection
results (66%). To enable better bisection we would need to integrate
some meaningful precommit testing into kernel development process
(would be tremendously useful for other reasons too). E.g. this "BUG:
MAX_STACK_TRACE_ENTRIES too low!" is this:
https://syzkaller.appspot.com/bug?id=dbd70f0407487a061d2d46fdc6bccc94b95ce3c0
and the reproducer is simply opening /dev/infiniband/rdma_cm or
/dev/vhci or something equally simple with LOCKDEP enabled. None of
this was done in a testing environment for several weeks. And then it
took another month to propagate the fix through all distributed kernel
trees. For all that time simple programs crash and bisection can't be
done and we are spending time here...
Reply all
Reply to author
Forward
0 new messages