[syzbot] WARNING: refcount bug in __linkwatch_run_queue

9 views
Skip to first unread message

syzbot

unread,
Nov 16, 2021, 4:23:20 AM11/16/21
to da...@davemloft.net, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com, w...@1wt.eu
Hello,

syzbot found the following issue on:

HEAD commit: fa55b7dcdc43 Linux 5.16-rc1
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=109339e1b00000
kernel config: https://syzkaller.appspot.com/x/.config?x=6c3ab72998e7f1a4
dashboard link: https://syzkaller.appspot.com/bug?extid=6f8ddb9f2ff4adf065cb
compiler: Debian clang version 11.0.1-2, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+6f8ddb...@syzkaller.appspotmail.com

------------[ cut here ]------------
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 1 PID: 26140 at lib/refcount.c:31 refcount_warn_saturate+0x17c/0x1a0 lib/refcount.c:31
Modules linked in:
CPU: 1 PID: 26140 Comm: kworker/1:26 Not tainted 5.16.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events linkwatch_event
RIP: 0010:refcount_warn_saturate+0x17c/0x1a0 lib/refcount.c:31
Code: f4 8a 31 c0 e8 f5 45 38 fd 0f 0b e9 64 ff ff ff e8 49 6a 6e fd c6 05 06 40 f5 09 01 48 c7 c7 80 fe f4 8a 31 c0 e8 d4 45 38 fd <0f> 0b e9 43 ff ff ff 89 d9 80 e1 07 80 c1 03 38 c1 0f 8c a2 fe ff
RSP: 0018:ffffc90003537b68 EFLAGS: 00010246
RAX: 26aa35cdf4075e00 RBX: 0000000000000004 RCX: ffff88803baaba00
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: 0000000000000004 R08: ffffffff8169fb82 R09: ffffed10173667b1
R10: ffffed10173667b1 R11: 0000000000000000 R12: ffff88807ccac000
R13: 1ffff1100f9958b7 R14: 1ffff1100f9958b8 R15: ffff88807ccac5b8
FS: 0000000000000000(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f217b0e6740 CR3: 0000000019e96000 CR4: 00000000003526e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 000000000000ff46
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__linkwatch_run_queue+0x4f5/0x800 net/core/link_watch.c:213
linkwatch_event+0x48/0x50 net/core/link_watch.c:252
process_one_work+0x853/0x1140 kernel/workqueue.c:2298
worker_thread+0xac1/0x1320 kernel/workqueue.c:2445
kthread+0x468/0x490 kernel/kthread.c:327
ret_from_fork+0x1f/0x30
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

Willy Tarreau

unread,
Nov 17, 2021, 3:19:14 AM11/17/21
to syzbot, da...@davemloft.net, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com
Thanks for the report. I'm seeing that linkwatch_do_dev() is also
called in linkwatch_forget_dev(), and am wondering if we're not
seeing a sequence like this one:

linkwatch_forget_dev()
list_del_init()
linkwatch_do_dev()
netdev_state_change()
... one of the notifiers
... linkwatch_add_event() => adds to watch list
dev_put()
...

__linkwatch_run_queue()
linkwatch_do_dev()
dev_put()
=> bang!

Well, in theory, no, since linkwatch_add_event() will call dev_hold()
when adding to the list, so we ought to leave the first call with a
refcount still covering the list's presence, and I don't see how it
can reach zero before reaching dev_put() in linkwatch_do_dev() as this
function is only called when the event was picked from the list.

The only difference I'm seeing is that before the patch, a call to
linkwatch_forget_dev() on a non-present device would call dev_put()
without going through dev_activate(), dev_deactivate(), nor
netdev_state_change(), but I'm not seeing how that could make a
difference. linkwatch_forget_dev() is called from netdev_wait_allrefs()
which will wait for the refcnt to be exactly 1, thus even if we queue
an extra event we cant leave that function until the event has been
processed.

Thus for now I'm puzzled :-/

Willy

Jakub Kicinski

unread,
Nov 17, 2021, 9:15:50 AM11/17/21
to Willy Tarreau, syzbot, da...@davemloft.net, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com
The ref leak could come from anywhere, tho. Like:

https://lore.kernel.org/all/87a6i3t...@nvidia.com/

Willy Tarreau

unread,
Nov 17, 2021, 9:20:17 AM11/17/21
to Jakub Kicinski, syzbot, da...@davemloft.net, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com
OK thanks for the link, so better wait for this part to clarify itself
and see if the issue magically disappears ?

Willy

Jakub Kicinski

unread,
Nov 17, 2021, 9:22:03 AM11/17/21
to Willy Tarreau, syzbot, da...@davemloft.net, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com
On Wed, 17 Nov 2021 15:20:12 +0100 Willy Tarreau wrote:
> > The ref leak could come from anywhere, tho. Like:
> >
> > https://lore.kernel.org/all/87a6i3t...@nvidia.com/
>
> OK thanks for the link, so better wait for this part to clarify itself
> and see if the issue magically disappears ?

Yeah, that's my thinking.

syzbot

unread,
Mar 21, 2022, 10:36:22 PM3/21/22
to syzkall...@googlegroups.com
Auto-closing this bug as obsolete.
Crashes did not happen for a while, no reproducer and no activity.
Reply all
Reply to author
Forward
0 new messages