WARNING in ib_umad_kill_port


syzbot

Apr 6, 2020, 2:37:16 AM
to gre...@linuxfoundation.org, linux-...@vger.kernel.org, net...@vger.kernel.org, raf...@kernel.org, syzkall...@googlegroups.com
Hello,

syzbot found the following crash on:

HEAD commit: 304e0242 net_sched: add a temporary refcnt for struct tcin..
git tree: net
console output: https://syzkaller.appspot.com/x/log.txt?x=119dd16de00000
kernel config: https://syzkaller.appspot.com/x/.config?x=8c1e98458335a7d1
dashboard link: https://syzkaller.appspot.com/bug?extid=9627a92b1f9262d5d30c
compiler: gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+9627a9...@syzkaller.appspotmail.com

------------[ cut here ]------------
sysfs group 'power' not found for kobject 'umad1'
WARNING: CPU: 1 PID: 31308 at fs/sysfs/group.c:279 sysfs_remove_group fs/sysfs/group.c:279 [inline]
WARNING: CPU: 1 PID: 31308 at fs/sysfs/group.c:279 sysfs_remove_group+0x155/0x1b0 fs/sysfs/group.c:270
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 31308 Comm: kworker/u4:10 Not tainted 5.6.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events_unbound ib_unregister_work
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x188/0x20d lib/dump_stack.c:118
panic+0x2e3/0x75c kernel/panic.c:221
__warn.cold+0x2f/0x35 kernel/panic.c:582
report_bug+0x27b/0x2f0 lib/bug.c:195
fixup_bug arch/x86/kernel/traps.c:175 [inline]
fixup_bug arch/x86/kernel/traps.c:170 [inline]
do_error_trap+0x12b/0x220 arch/x86/kernel/traps.c:267
do_invalid_op+0x32/0x40 arch/x86/kernel/traps.c:286
invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
RIP: 0010:sysfs_remove_group fs/sysfs/group.c:279 [inline]
RIP: 0010:sysfs_remove_group+0x155/0x1b0 fs/sysfs/group.c:270
Code: 48 89 d9 49 8b 14 24 48 b8 00 00 00 00 00 fc ff df 48 c1 e9 03 80 3c 01 00 75 41 48 8b 33 48 c7 c7 60 c3 39 88 e8 93 c3 5f ff <0f> 0b eb 95 e8 22 62 cb ff e9 d2 fe ff ff 48 89 df e8 15 62 cb ff
RSP: 0018:ffffc90001d97a60 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffffffff88915620 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff815ca861 RDI: fffff520003b2f3e
RBP: 0000000000000000 R08: ffff8880a78fc2c0 R09: ffffed1015ce66a1
R10: ffffed1015ce66a0 R11: ffff8880ae733507 R12: ffff88808e5ba070
R13: ffffffff88915bc0 R14: ffff88808e5ba008 R15: dffffc0000000000
dpm_sysfs_remove+0x97/0xb0 drivers/base/power/sysfs.c:794
device_del+0x18b/0xd30 drivers/base/core.c:2687
cdev_device_del+0x15/0x80 fs/char_dev.c:570
ib_umad_kill_port+0x45/0x250 drivers/infiniband/core/user_mad.c:1327
ib_umad_remove_one+0x18a/0x220 drivers/infiniband/core/user_mad.c:1409
remove_client_context+0xbe/0x110 drivers/infiniband/core/device.c:724
disable_device+0x13b/0x230 drivers/infiniband/core/device.c:1270
__ib_unregister_device+0x91/0x180 drivers/infiniband/core/device.c:1437
ib_unregister_work+0x15/0x30 drivers/infiniband/core/device.c:1547
process_one_work+0x965/0x16a0 kernel/workqueue.c:2266
worker_thread+0x96/0xe20 kernel/workqueue.c:2412
kthread+0x388/0x470 kernel/kthread.c:268
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

Leon Romanovsky

Apr 6, 2020, 1:21:57 PM
to syzbot, RDMA mailing list, gre...@linuxfoundation.org, linux-...@vger.kernel.org, net...@vger.kernel.org, raf...@kernel.org, syzkall...@googlegroups.com
+ RDMA

Jason Gunthorpe

Apr 6, 2020, 1:44:43 PM
to Leon Romanovsky, syzbot, RDMA mailing list, gre...@linuxfoundation.org, linux-...@vger.kernel.org, net...@vger.kernel.org, raf...@kernel.org, syzkall...@googlegroups.com
On Mon, Apr 06, 2020 at 08:21:51PM +0300, Leon Romanovsky wrote:
> + RDMA
>
> On Sun, Apr 05, 2020 at 11:37:15PM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit: 304e0242 net_sched: add a temporary refcnt for struct tcin..
> > git tree: net
> > console output: https://syzkaller.appspot.com/x/log.txt?x=119dd16de00000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=8c1e98458335a7d1
> > dashboard link: https://syzkaller.appspot.com/bug?extid=9627a92b1f9262d5d30c
> > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> >
> > Unfortunately, I don't have any reproducer for this crash yet.
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+9627a9...@syzkaller.appspotmail.com
> >
I'm not sure what could be done wrong here to elicit this:

sysfs group 'power' not found for kobject 'umad1'

??

I've seen another similar sysfs related trigger that we couldn't
figure out.

Hard to investigate without a reproducer.

Jason

Dmitry Vyukov

Apr 7, 2020, 5:56:44 AM
to Jason Gunthorpe, Leon Romanovsky, syzbot, RDMA mailing list, Greg Kroah-Hartman, LKML, netdev, Rafael Wysocki, syzkaller-bugs
Based on all of the sysfs-related bugs I've seen, my bet would be on
some races. E.g. one thread registers devices, while another
unregisters these.

Jason Gunthorpe

Apr 7, 2020, 7:55:50 AM
to Dmitry Vyukov, Leon Romanovsky, syzbot, RDMA mailing list, Greg Kroah-Hartman, LKML, netdev, Rafael Wysocki, syzkaller-bugs
On Tue, Apr 07, 2020 at 11:56:30AM +0200, Dmitry Vyukov wrote:
> > I'm not sure what could be done wrong here to elicit this:
> >
> > sysfs group 'power' not found for kobject 'umad1'
> >
> > ??
> >
> > I've seen another similar sysfs related trigger that we couldn't
> > figure out.
> >
> > Hard to investigate without a reproducer.
>
> Based on all of the sysfs-related bugs I've seen, my bet would be on
> some races. E.g. one thread registers devices, while another
> unregisters these.

I did check that the naming is ordered right, at least we won't be
concurrently creating and destroying umadX sysfs of the same names.

I'm also fairly sure we can't be destroying the parent at the same
time as this child.

Do you see the above commonly? Could it be some driver core thing? Or
is it more likely something wrong in umad?

Jason

Dmitry Vyukov

Apr 7, 2020, 8:40:00 AM
to Jason Gunthorpe, Leon Romanovsky, syzbot, RDMA mailing list, Greg Kroah-Hartman, LKML, netdev, Rafael Wysocki, syzkaller-bugs
Mmmm... I can't say, I am looking at some bugs very briefly. I've
noticed that sysfs comes up periodically (or was it some other similar
fs?). General observation is that code frequently assumes only the
happy scenario and only, say, a single administrator doing one thing
at a time, slowly and carefully, and it is not really hardened against
armies of monkeys.
But I did not look at code abstractions, bug patterns, contracts, etc.

Greg KH may know better. Greg, as far as I remember you commented on
some of these reports along the lines of, for example, "the warning is
in sysfs code, but the bug is in the callers".

Greg Kroah-Hartman

Apr 7, 2020, 10:33:09 AM
to Dmitry Vyukov, Jason Gunthorpe, Leon Romanovsky, syzbot, RDMA mailing list, LKML, netdev, Rafael Wysocki, syzkaller-bugs
Yes, that is correct.

Jason Gunthorpe

Apr 7, 2020, 10:35:32 AM
to Dmitry Vyukov, Leon Romanovsky, syzbot, RDMA mailing list, Greg Kroah-Hartman, LKML, netdev, Rafael Wysocki, syzkaller-bugs
On Tue, Apr 07, 2020 at 02:39:42PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 7, 2020 at 1:55 PM Jason Gunthorpe <j...@ziepe.ca> wrote:
> >
> > On Tue, Apr 07, 2020 at 11:56:30AM +0200, Dmitry Vyukov wrote:
> > > > I'm not sure what could be done wrong here to elicit this:
> > > >
> > > > sysfs group 'power' not found for kobject 'umad1'
> > > >
> > > > ??
> > > >
> > > > I've seen another similar sysfs related trigger that we couldn't
> > > > figure out.
> > > >
> > > > Hard to investigate without a reproducer.
> > >
> > > Based on all of the sysfs-related bugs I've seen, my bet would be on
> > > some races. E.g. one thread registers devices, while another
> > > unregisters these.
> >
> > I did check that the naming is ordered right, at least we won't be
> > concurrently creating and destroying umadX sysfs of the same names.
> >
> > I'm also fairly sure we can't be destroying the parent at the same
> > time as this child.
> >
> > Do you see the above commonly? Could it be some driver core thing? Or
> > is it more likely something wrong in umad?
>
> Mmmm... I can't say, I am looking at some bugs very briefly. I've
> noticed that sysfs comes up periodically (or was it some other similar
> fs?).

Hmm..

Looking at the git history I see several cases where there are
ordering problems. I wonder if the rdma parent device is being
destroyed before the rdma devices complete destruction?

I see the syzkaller is creating a bunch of virtual net devices, and I
assume it has created a software rdma device on one of these virtual
devices.

So I'm guessing that it is also destroying a parent? But I can't guess
which.. Some simple tests with veth suggest it is OK because the
parent is virtual. But maybe bond or bridge or something?

The issue in rdma is that unregistering a netdev triggers an async
destruction of the RDMA devices. This has to be async because the
netdev notification is delivered with RTNL held, and a rdma device
cannot be destroyed while holding RTNL.

So there is a race, I suppose, where the netdev can complete
destruction while rdma continues, and if someone deletes the sysfs
holding the netdev before rdma completes, I'm going to guess, that we
hit this warning?

Could it be? I would love to know what netdev the rdma device was
created on, but it doesn't seem to show in the trace :\

This theory could be made more likely by adding a sleep to
ib_unregister_work() to increase the race window - is there some way
to get syzkaller to search for a reproducer with that patch?

Jason
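
The ordering hypothesized above can be illustrated with a toy model. This is only a sketch of the theory, not kernel code: the `Kobject` class, the `netdev0` name, the list standing in for the workqueue, and the assumption that the umad entry sits below the netdev in sysfs are all hypothetical simplifications. It shows that if the parent directory is torn down before the deferred work runs, the later attempt to remove the child's 'power' group finds nothing and produces the message from the report.

```python
class Kobject:
    """Toy stand-in for a kobject with named sysfs groups and child kobjects."""

    def __init__(self, name, parent=None):
        self.name = name
        self.groups = {"power"}  # every device gets a 'power' group in this model
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def remove_recursive(self):
        # Removing a directory also removes everything below it.
        for child in self.children:
            child.remove_recursive()
        self.groups.clear()

    def remove_group(self, group):
        # Mirrors the warning text in the report when the group is already gone.
        if group not in self.groups:
            return f"sysfs group '{group}' not found for kobject '{self.name}'"
        self.groups.remove(group)
        return None


def simulate(parent_removed_first):
    """Return the warnings produced for one ordering of the two teardowns."""
    netdev = Kobject("netdev0")            # hypothetical parent device
    umad = Kobject("umad1", parent=netdev)

    # The notifier cannot destroy the RDMA device under RTNL, so the
    # teardown is deferred; a plain list models the workqueue here.
    workqueue = [lambda: umad.remove_group("power")]

    if parent_removed_first:
        netdev.remove_recursive()          # netdev teardown wins the race

    warnings = [w for w in (work() for work in workqueue) if w]

    if not parent_removed_first:
        netdev.remove_recursive()          # expected order: child first
    return warnings


if __name__ == "__main__":
    print(simulate(parent_removed_first=True))
    print(simulate(parent_removed_first=False))
```

In this model only the ordering matters, which is why widening the window in the deferred work (for example with a sleep) would be expected to make the first ordering more likely.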

Dmitry Vyukov

Apr 9, 2020, 9:35:15 AM
to Jason Gunthorpe, Leon Romanovsky, syzbot, RDMA mailing list, Greg Kroah-Hartman, LKML, netdev, Rafael Wysocki, syzkaller-bugs
Too bad it happened in kthread context. Otherwise it's usually possible to
pinpoint the test based on process name.

The syz-repro utility will run the reproduction process with any kernel you give it:
https://github.com/google/syzkaller/blob/master/docs/reproducing_crashes.md

Or it's possible to run individual programs, or whole log with
syz-execprog utility:
https://github.com/google/syzkaller/blob/master/docs/executing_syzkaller_programs.md

Or maybe you could pinpoint the guilty test program by hand in the log
(it's probably somewhere closer to the end):
https://syzkaller.appspot.com/x/log.txt?x=119dd16de00000

syzbot

Apr 14, 2021, 10:29:12 PM
to syzkall...@googlegroups.com
Auto-closing this bug as obsolete.
Crashes did not happen for a while, no reproducer and no activity.