[syzbot] WARNING in __percpu_ref_exit (2)


syzbot

Mar 15, 2021, 7:58:23 AM
to asml.s...@gmail.com, ax...@kernel.dk, io-u...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: 75013c6c Merge tag 'perf_urgent_for_v5.12-rc3' of git://gi..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=174df32ad00000
kernel config: https://syzkaller.appspot.com/x/.config?x=844457676c06b88c
dashboard link: https://syzkaller.appspot.com/bug?extid=d6218cb2fae0b2411e9d
userspace arch: i386

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d6218c...@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 1 PID: 53 at lib/percpu-refcount.c:113 __percpu_ref_exit+0x98/0x100 lib/percpu-refcount.c:113
Modules linked in:
CPU: 1 PID: 53 Comm: kworker/u4:2 Not tainted 5.12.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events_unbound io_ring_exit_work
RIP: 0010:__percpu_ref_exit+0x98/0x100 lib/percpu-refcount.c:113
Code: fd 49 8d 7c 24 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 61 49 83 7c 24 10 00 74 07 e8 28 42 ac fd <0f> 0b e8 21 42 ac fd 48 89 ef e8 e9 fa da fd 48 89 da 48 b8 00 00
RSP: 0018:ffffc90000f1fb78 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88805c976000 RCX: 0000000000000000
RDX: ffff888011839bc0 RSI: ffffffff83c76be8 RDI: ffff88802b2a9010
RBP: 0000607f46077778 R08: 0000000000000000 R09: ffffffff8fab0967
R10: ffffffff83c76b88 R11: 0000000000000009 R12: ffff88802b2a9000
R13: 0000000000000001 R14: ffff88802b2a9000 R15: dffffc0000000000
FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000085a0004 CR3: 000000001896a000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
percpu_ref_exit+0x3b/0x140 lib/percpu-refcount.c:134
io_ring_ctx_free fs/io_uring.c:8419 [inline]
io_ring_exit_work+0x599/0xcf0 fs/io_uring.c:8565
process_one_work+0x98d/0x1600 kernel/workqueue.c:2275
worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
kthread+0x3b1/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

Pavel Begunkov

Mar 15, 2021, 8:22:19 AM
to syzbot, ax...@kernel.dk, io-u...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 15/03/2021 11:58, syzbot wrote:
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 53 at lib/percpu-refcount.c:113 __percpu_ref_exit+0x98/0x100 lib/percpu-refcount.c:113

if (percpu_count) {
	/* non-NULL confirm_switch indicates switching in progress */
	WARN_ON_ONCE(ref->data && ref->data->confirm_switch);
	...
}

That points to this warning. Not sure, but the not-yet-merged
"io_uring: halt SQO submission on ctx exit" may fix it, or is at
least related.
--
Pavel Begunkov

Hillf Danton

Mar 18, 2021, 4:33:58 AM
to Pavel Begunkov, Ming Lei, syzbot, ax...@kernel.dk, io-u...@vger.kernel.org, Hillf Danton, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Mon, 15 Mar 2021 12:18:20 +0000 Pavel Begunkov wrote:
> On 15/03/2021 11:58, syzbot wrote:
> > ------------[ cut here ]------------
> > WARNING: CPU: 1 PID: 53 at lib/percpu-refcount.c:113 __percpu_ref_exit+0x98/0x100 lib/percpu-refcount.c:113
>
> if (percpu_count) {
> 	/* non-NULL confirm_switch indicates switching in progress */
> 	WARN_ON_ONCE(ref->data && ref->data->confirm_switch);
> 	...
> }
>
> That points to this warning. Not sure, but the not-yet-merged
> "io_uring: halt SQO submission on ctx exit" may fix it, or is at
> least related.

It seems it neither fixes this nor is related; see below.
Thoughts on an RCU sync would be appreciated, if there is a non-zero
chance of a race between RCU and the workqueue when the io ctx is killed.

CPU0
----
io_ring_ctx_wait_and_kill
  percpu_ref_kill(&ctx->refs);
    percpu_ref_kill_and_confirm(ref, NULL);
      spin_lock_irqsave(&percpu_ref_switch_lock, flags);
      __percpu_ref_switch_mode(ref, confirm_switch);
        __percpu_ref_switch_to_atomic
          ref->data->confirm_switch = confirm_switch ?:
                  percpu_ref_noop_confirm_switch;
          call_rcu(&ref->data->rcu, percpu_ref_switch_to_atomic_rcu);
      spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);

  INIT_WORK(&ctx->exit_work, io_ring_exit_work);
  queue_work(system_unbound_wq, &ctx->exit_work);

CPU1
----
io_ring_exit_work
  io_ring_ctx_free(ctx);
    percpu_ref_exit(&ctx->refs);
      __percpu_ref_exit(ref);
        WARN_ON_ONCE(ref->data &&
                     ref->data->confirm_switch);


percpu_ref_switch_to_atomic_rcu
  percpu_ref_call_confirm_rcu(rcu);
    data->confirm_switch(ref);
    data->confirm_switch = NULL;
    wake_up_all(&percpu_ref_switch_waitq);

Pavel Begunkov

Mar 18, 2021, 10:32:10 AM
to Hillf Danton, Ming Lei, syzbot, ax...@kernel.dk, io-u...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Could you elaborate? Your case below doesn't make much
sense to me.

1) io_ring_ctx_wait_and_kill() indeed kills ctx->refs
2) io_ring_exit_work() waits for a completion signaled by
ctx->refs hitting 0
3) and only then calls io_ring_ctx_free().

And 2) won't complete until the switch is over.
--
Pavel Begunkov

syzbot

Apr 18, 2021, 3:30:13 PM
to asml.s...@gmail.com, ax...@kernel.dk, hda...@sina.com, io-u...@vger.kernel.org, linux-...@vger.kernel.org, ming...@redhat.com, syzkall...@googlegroups.com
syzbot has found a reproducer for the following issue on:

HEAD commit: c98ff1d0 Merge tag 'scsi-fixes' of git://git.kernel.org/pu..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=163d7229d00000
kernel config: https://syzkaller.appspot.com/x/.config?x=1c70e618af4c2e92
dashboard link: https://syzkaller.appspot.com/bug?extid=d6218cb2fae0b2411e9d
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=145cb2b6d00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=157b72b1d00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d6218c...@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 1 PID: 169 at lib/percpu-refcount.c:113 __percpu_ref_exit+0x98/0x100 lib/percpu-refcount.c:113
Modules linked in:
CPU: 1 PID: 169 Comm: kworker/u4:3 Not tainted 5.12.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events_unbound io_ring_exit_work
RIP: 0010:__percpu_ref_exit+0x98/0x100 lib/percpu-refcount.c:113
Code: fd 49 8d 7c 24 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 61 49 83 7c 24 10 00 74 07 e8 a8 4a ab fd <0f> 0b e8 a1 4a ab fd 48 89 ef e8 69 f0 d9 fd 48 89 da 48 b8 00 00
RSP: 0018:ffffc90001077b48 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88802d5ca000 RCX: 0000000000000000
RDX: ffff88801217a1c0 RSI: ffffffff83c7db28 RDI: ffff88801d58f010
RBP: 0000607f4607bcb8 R08: 0000000000000000 R09: ffffffff8fa9f977
R10: ffffffff83c7dac8 R11: 0000000000000009 R12: ffff88801d58f000
R13: 000000010002865e R14: ffff88801d58f000 R15: ffff88802d5ca8b0
FS: 0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000044 CR3: 0000000015c02000 CR4: 00000000001506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
percpu_ref_exit+0x3b/0x140 lib/percpu-refcount.c:134
io_ring_ctx_free fs/io_uring.c:8483 [inline]
io_ring_exit_work+0xa64/0x12d0 fs/io_uring.c:8620

Pavel Begunkov

Apr 19, 2021, 8:07:30 AM
to syzbot, ax...@kernel.dk, hda...@sina.com, io-u...@vger.kernel.org, linux-...@vger.kernel.org, ming...@redhat.com, syzkall...@googlegroups.com
#syz test: git://git.kernel.dk/linux-block for-5.13/io_uring

--
Pavel Begunkov

syzbot

Apr 19, 2021, 11:02:06 AM
to asml.s...@gmail.com, ax...@kernel.dk, hda...@sina.com, io-u...@vger.kernel.org, linux-...@vger.kernel.org, ming...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+d6218c...@syzkaller.appspotmail.com

Tested on:

commit: 75c4021a io_uring: check register restriction afore quiesce
git tree: git://git.kernel.dk/linux-block for-5.13/io_uring
kernel config: https://syzkaller.appspot.com/x/.config?x=1dfd9a1e63100694
dashboard link: https://syzkaller.appspot.com/bug?extid=d6218cb2fae0b2411e9d
compiler:

Note: testing is done by a robot and is best-effort only.

syzbot

Sep 13, 2021, 5:22:19 AM
to asml.s...@gmail.com, ax...@kernel.dk, core...@netfilter.org, da...@davemloft.net, dsa...@kernel.org, f...@strlen.de, hda...@sina.com, io-u...@vger.kernel.org, kad...@netfilter.org, ku...@kernel.org, linux-...@vger.kernel.org, ming...@redhat.com, net...@vger.kernel.org, netfilt...@vger.kernel.org, pa...@netfilter.org, syzkall...@googlegroups.com, yosh...@linux-ipv6.org
syzbot suspects this issue was fixed by commit:

commit 43016d02cf6e46edfc4696452251d34bba0c0435
Author: Florian Westphal <f...@strlen.de>
Date: Mon May 3 11:51:15 2021 +0000

netfilter: arptables: use pernet ops struct during unregister

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=10acd273300000
start commit: c98ff1d013d2 Merge tag 'scsi-fixes' of git://git.kernel.or..
git tree: upstream
If the result looks correct, please mark the issue as fixed by replying with:

#syz fix: netfilter: arptables: use pernet ops struct during unregister

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

Dmitry Vyukov

Sep 16, 2021, 4:00:00 AM
to syzbot, asml.s...@gmail.com, ax...@kernel.dk, core...@netfilter.org, da...@davemloft.net, dsa...@kernel.org, f...@strlen.de, hda...@sina.com, io-u...@vger.kernel.org, kad...@netfilter.org, ku...@kernel.org, linux-...@vger.kernel.org, ming...@redhat.com, net...@vger.kernel.org, netfilt...@vger.kernel.org, pa...@netfilter.org, syzkall...@googlegroups.com, yosh...@linux-ipv6.org
On Mon, 13 Sept 2021 at 11:22, syzbot
I guess this is the wrong commit and it was fixed by something in io_uring.
Searching for refcount fixes, I see
a298232ee6b9a1d5d732aa497ff8be0d45b5bd82 "io_uring: fix link timeout
refs".
Pavel, does it look right to you?

Pavel Begunkov

Sep 16, 2021, 9:18:24 AM
to Dmitry Vyukov, syzbot, ax...@kernel.dk, core...@netfilter.org, da...@davemloft.net, dsa...@kernel.org, f...@strlen.de, hda...@sina.com, io-u...@vger.kernel.org, kad...@netfilter.org, ku...@kernel.org, linux-...@vger.kernel.org, ming...@redhat.com, net...@vger.kernel.org, netfilt...@vger.kernel.org, pa...@netfilter.org, syzkall...@googlegroups.com, yosh...@linux-ipv6.org
To be honest, I don't remember; if the dates fit, it may well be it.
Let's test one thing to be sure the report didn't go quiet just by coincidence.

#syz test: https://github.com/isilence/linux.git syz_test_quiesce_files


--
Pavel Begunkov

syzbot

Sep 16, 2021, 10:01:13 AM
to asml.s...@gmail.com, ax...@kernel.dk, core...@netfilter.org, da...@davemloft.net, dsa...@kernel.org, dvy...@google.com, f...@strlen.de, hda...@sina.com, io-u...@vger.kernel.org, kad...@netfilter.org, ku...@kernel.org, linux-...@vger.kernel.org, ming...@redhat.com, net...@vger.kernel.org, netfilt...@vger.kernel.org, pa...@netfilter.org, syzkall...@googlegroups.com, yosh...@linux-ipv6.org
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+d6218c...@syzkaller.appspotmail.com

Tested on:

commit: 5318e5b9 io_uring: quiesce files reg
git tree: https://github.com/isilence/linux.git syz_test_quiesce_files
kernel config: https://syzkaller.appspot.com/x/.config?x=f7d9f99709463d21
dashboard link: https://syzkaller.appspot.com/bug?extid=d6218cb2fae0b2411e9d
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Dmitry Vyukov

Sep 20, 2021, 4:15:24 AM
to syzbot, asml.s...@gmail.com, ax...@kernel.dk, core...@netfilter.org, da...@davemloft.net, dsa...@kernel.org, f...@strlen.de, hda...@sina.com, io-u...@vger.kernel.org, kad...@netfilter.org, ku...@kernel.org, linux-...@vger.kernel.org, ming...@redhat.com, net...@vger.kernel.org, netfilt...@vger.kernel.org, pa...@netfilter.org, syzkall...@googlegroups.com, yosh...@linux-ipv6.org
On Thu, 16 Sept 2021 at 16:01, syzbot
OK, since it's not failing, I assume we can say:

#syz fix: io_uring: fix link timeout refs

(and it's better to close it with a wrong fix than to keep it open
forever anyway)