[syzbot] WARNING: refcount bug in nldev_newlink


syzbot

Dec 7, 2022, 3:51:41 PM
to j...@ziepe.ca, le...@kernel.org, linux-...@vger.kernel.org, linux...@vger.kernel.org, mark...@nvidia.com, ohar...@nvidia.com, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: 591cd61541b9 Add linux-next specific files for 20221207
git tree: linux-next
console+strace: https://syzkaller.appspot.com/x/log.txt?x=11aeafad880000
kernel config: https://syzkaller.appspot.com/x/.config?x=8b2d3e63e054c24f
dashboard link: https://syzkaller.appspot.com/bug?extid=3fd8326d9a0812d19218
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=112536fb880000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16aa2e6d880000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/bc862c01ec56/disk-591cd615.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/8f9b93f8ed2f/vmlinux-591cd615.xz
kernel image: https://storage.googleapis.com/syzbot-assets/9d5cb636d548/bzImage-591cd615.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+3fd832...@syzkaller.appspotmail.com

WARNING: CPU: 0 PID: 5156 at lib/refcount.c:31 refcount_warn_saturate+0x1d7/0x1f0 lib/refcount.c:31
Modules linked in:
CPU: 0 PID: 5156 Comm: syz-executor773 Not tainted 6.1.0-rc8-next-20221207-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
RIP: 0010:refcount_warn_saturate+0x1d7/0x1f0 lib/refcount.c:31
Code: 05 5a 60 51 0a 01 e8 35 0a b5 05 0f 0b e9 d3 fe ff ff e8 6c 9b 75 fd 48 c7 c7 c0 6d a6 8a c6 05 37 60 51 0a 01 e8 16 0a b5 05 <0f> 0b e9 b4 fe ff ff 48 89 ef e8 5a b5 c3 fd e9 5c fe ff ff 0f 1f
RSP: 0018:ffffc90003ebf0d8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff88802bfcba80 RSI: ffffffff8166b1dc RDI: fffff520007d7e0d
RBP: ffff888070296600 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000080000000 R11: 0000000000000000 R12: 1ffff920007d7e20
R13: 0000000000000000 R14: ffff888070296600 R15: ffffc90003ebf608
FS: 000055555600f300(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffed185b004 CR3: 00000000265db000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__refcount_dec include/linux/refcount.h:344 [inline]
refcount_dec include/linux/refcount.h:359 [inline]
ref_tracker_free+0x539/0x6b0 lib/ref_tracker.c:118
netdev_tracker_free include/linux/netdevice.h:4039 [inline]
netdev_put include/linux/netdevice.h:4056 [inline]
dev_put include/linux/netdevice.h:4082 [inline]
nldev_newlink+0x360/0x5d0 drivers/infiniband/core/nldev.c:1733
rdma_nl_rcv_msg+0x371/0x6a0 drivers/infiniband/core/netlink.c:195
rdma_nl_rcv_skb.constprop.0.isra.0+0x2fc/0x440 drivers/infiniband/core/netlink.c:239
netlink_unicast_kernel net/netlink/af_netlink.c:1330 [inline]
netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1356
netlink_sendmsg+0x91b/0xe10 net/netlink/af_netlink.c:1932
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xd3/0x120 net/socket.c:734
____sys_sendmsg+0x712/0x8c0 net/socket.c:2476
___sys_sendmsg+0x110/0x1b0 net/socket.c:2530
__sys_sendmsg+0xf7/0x1c0 net/socket.c:2559
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fd5bc473699
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffed185aff8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fd5bc473699
RDX: 0000000000000000 RSI: 0000000020000340 RDI: 0000000000000003
RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000d
R10: 00007ffed185aa70 R11: 0000000000000246 R12: 00007ffed185b010
R13: 00000000000f4240 R14: 0000000000011fc1 R15: 00007ffed185b004
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

Leon Romanovsky

Dec 8, 2022, 4:14:45 AM
to syzbot, j...@ziepe.ca, linux-...@vger.kernel.org, linux...@vger.kernel.org, mark...@nvidia.com, ohar...@nvidia.com, syzkall...@googlegroups.com
Jason, what do you think?

diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index a981ac2f0975..982938c1dae3 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -1730,7 +1730,8 @@ static int nldev_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
#endif
err = ops ? ops->newlink(ibdev_name, ndev) : -EINVAL;
up_read(&link_ops_rwsem);
- dev_put(ndev);
+ if (err)
+ dev_put(ndev);

return err;
}

Guoqing Jiang

Dec 8, 2022, 6:42:26 AM
to Leon Romanovsky, syzbot, j...@ziepe.ca, linux-...@vger.kernel.org, linux...@vger.kernel.org, mark...@nvidia.com, ohar...@nvidia.com, syzkall...@googlegroups.com
Hi,
I guess the dev_put is paired with the dev_hold in dev_get_by_name.
Maybe it should be protected by link_ops_rwsem; otherwise
siw_exit_module could call ib_unregister_driver -> free_netdevs
after rdma_link_unregister (which needs to hold link_ops_rwsem),
so it seems possible that ndev is freed before nldev_newlink
calls dev_put.

diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 12dc97067ed2..f49bc8ee46da 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -1715,8 +1715,8 @@ static int nldev_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
        }
 #endif
        err = ops ? ops->newlink(ibdev_name, ndev) : -EINVAL;
-       up_read(&link_ops_rwsem);
        dev_put(ndev);
+       up_read(&link_ops_rwsem);

        return err;
 }

Thanks,
Guoqing

Leon Romanovsky

Dec 8, 2022, 6:54:21 AM
to Guoqing Jiang, syzbot, j...@ziepe.ca, linux-...@vger.kernel.org, linux...@vger.kernel.org, mark...@nvidia.com, ohar...@nvidia.com, syzkall...@googlegroups.com
Yes, my concern is that the ->delink() path calls dev_put()
unconditionally in free_netdevs(), and I didn't see anyone calling
dev_hold() other than dev_get_by_name.

Thanks

Guoqing Jiang

Dec 8, 2022, 7:48:37 PM
to Leon Romanovsky, syzbot, j...@ziepe.ca, linux-...@vger.kernel.org, linux...@vger.kernel.org, mark...@nvidia.com, ohar...@nvidia.com, syzkall...@googlegroups.com
My understanding is that ops->newlink (either rxe_newlink or siw_newlink)
eventually triggers a dev_hold, which is paired with the dev_put in free_netdevs.

Thanks,
Guoqing
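
The pairing discussed in the messages above can be sketched as a toy userspace model in C. It is only an illustration and assumes that reading is right: the lookup in nldev_newlink takes one reference, and ops->newlink takes a second, longer-lived one that free_netdevs later drops. All toy_* names are hypothetical, and none of this is kernel code.

```c
#include <assert.h>

/* Toy stand-in for struct net_device: only the reference count is modelled. */
struct toy_netdev {
    int refcnt;
};

static void toy_hold(struct toy_netdev *dev)
{
    dev->refcnt++;
}

static void toy_put(struct toy_netdev *dev)
{
    assert(dev->refcnt > 0); /* a put must match an earlier hold */
    dev->refcnt--;
}

/* Models dev_get_by_name(): a successful lookup returns with a reference held. */
static struct toy_netdev *toy_get_by_name(struct toy_netdev *dev)
{
    toy_hold(dev);
    return dev;
}

/* Models ops->newlink(): on success the driver side takes its own reference. */
static int toy_newlink(struct toy_netdev *dev)
{
    toy_hold(dev);
    return 0;
}

/* Models free_netdevs(): drops the reference taken during newlink. */
static void toy_free_netdevs(struct toy_netdev *dev)
{
    toy_put(dev);
}

/*
 * Runs lookup -> newlink -> (optional) put -> teardown and returns the
 * number of references left over. put_on_success = 1 models the existing
 * unconditional dev_put(); 0 models putting only when newlink fails.
 */
static int toy_leftover_refs(int put_on_success)
{
    struct toy_netdev dev = { .refcnt = 0 };
    struct toy_netdev *ndev = toy_get_by_name(&dev);
    int err = toy_newlink(ndev);

    if (err || put_on_success)
        toy_put(ndev);
    toy_free_netdevs(ndev);
    return dev.refcnt;
}
```

In this model toy_leftover_refs(1) returns 0, while toy_leftover_refs(0) returns 1. So if that reading holds, skipping the put on success would leave the lookup reference outstanding.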

Jason Gunthorpe

Dec 9, 2022, 8:01:17 AM
to Leon Romanovsky, syzbot, linux-...@vger.kernel.org, linux...@vger.kernel.org, mark...@nvidia.com, ohar...@nvidia.com, syzkall...@googlegroups.com
No, the key to this report is that the refcount dec is inside the tracker:

> > __refcount_dec include/linux/refcount.h:344 [inline]
> > refcount_dec include/linux/refcount.h:359 [inline]
> > ref_tracker_free+0x539/0x6b0 lib/ref_tracker.c:118
> > netdev_tracker_free include/linux/netdevice.h:4039 [inline]

This is not underflowing the refcount on the dev; it is actually
saying that the tracker has become unbalanced.

I.e., this put is not matched with a hold that specified the tracker.

Probably this:

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index ff35cebb25e265..115b77c5e9a146 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2192,6 +2192,7 @@ static void free_netdevs(struct ib_device *ib_dev)
if (ndev) {
spin_lock(&ndev_hash_lock);
hash_del_rcu(&pdata->ndev_hash_link);
+ netdev_tracker_free(ndev, &pdata->netdev_tracker);
spin_unlock(&ndev_hash_lock);

/*
@@ -2201,7 +2202,7 @@ static void free_netdevs(struct ib_device *ib_dev)
* comparisons after the put
*/
rcu_assign_pointer(pdata->netdev, NULL);
- dev_put(ndev);
+ __dev_put(ndev);
}
spin_unlock_irqrestore(&pdata->netdev_lock, flags);
}

Hillf Danton

Dec 9, 2022, 8:42:44 AM
to Jason Gunthorpe, Leon Romanovsky, syzbot, linux-...@vger.kernel.org, linux...@vger.kernel.org, mark...@nvidia.com, ohar...@nvidia.com, syzkall...@googlegroups.com
On 9 Dec 2022 09:01:14 -0400 Jason Gunthorpe <j...@ziepe.ca>
I wonder why this makes sense, given that rcu_assign_pointer(pdata->netdev, NULL)
is done under pdata->netdev_lock.

Jason Gunthorpe

Dec 9, 2022, 9:06:58 AM
to Hillf Danton, Leon Romanovsky, syzbot, linux-...@vger.kernel.org, linux...@vger.kernel.org, mark...@nvidia.com, ohar...@nvidia.com, syzkall...@googlegroups.com
Oh, yah, that is right, so we can just do the natural thing:

rcu_assign_pointer(pdata->netdev, NULL);
- dev_put(ndev);
+ netdev_put(ndev, &pdata->netdev_tracker);


Jason
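
The difference between an underflowing device refcount and an unbalanced tracker can be illustrated with a small userspace model in C. This is a rough sketch, not the real lib/ref_tracker.c bookkeeping: it assumes the tracker side keeps separate counts for tracker-less holds (dev_hold/dev_put style) and tracked holds (netdev_hold/netdev_put style), and it assumes one particular ordering of events in which the device is torn down before nldev_newlink drops its lookup reference. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: the device refcount plus simplified tracker bookkeeping. */
struct toy_dev {
    int refcnt;      /* references on the device itself */
    int untracked;   /* outstanding holds taken without a tracker */
    int tracked;     /* outstanding holds taken with a tracker */
    bool unbalanced; /* set when a put does not match the kind of hold */
};

/* has_tracker selects between the tracked and the tracker-less flavour. */
static void toy_hold(struct toy_dev *dev, bool has_tracker)
{
    dev->refcnt++;
    if (has_tracker)
        dev->tracked++;
    else
        dev->untracked++;
}

static void toy_put(struct toy_dev *dev, bool has_tracker)
{
    int *count = has_tracker ? &dev->tracked : &dev->untracked;

    if (*count == 0)
        dev->unbalanced = true; /* stands in for the refcount warning */
    else
        (*count)--;

    assert(dev->refcnt > 0);    /* the device refcount itself never underflows here */
    dev->refcnt--;
}

/*
 * Replays one assumed ordering of events and returns the final state:
 *   1. lookup in nldev_newlink()     -> tracker-less hold
 *   2. ib_device_set_netdev()        -> tracked hold
 *   3. free_netdevs()                -> put, flavour chosen by the caller
 *   4. dev_put() in nldev_newlink()  -> tracker-less put
 */
static struct toy_dev toy_replay(bool free_netdevs_uses_tracker)
{
    struct toy_dev dev = { 0 };

    toy_hold(&dev, false);
    toy_hold(&dev, true);
    toy_put(&dev, free_netdevs_uses_tracker);
    toy_put(&dev, false);
    return dev;
}
```

With toy_replay(false) the device refcount still ends at 0, but the final tracker-less put finds no matching tracker-less hold and flags the imbalance, which is the shape of the report. With toy_replay(true), modelling a tracked put in free_netdevs, both the refcount and the bookkeeping end balanced.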

Hillf Danton

Dec 10, 2022, 8:43:58 PM
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 07 Dec 2022 12:51:39 -0800
> syzbot found the following issue on:
>
> HEAD commit: 591cd61541b9 Add linux-next specific files for 20221207
> git tree: linux-next
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16aa2e6d880000

Collapse add_ndev_hash() into ib_device_set_netdev() to match the locking sequence
in free_netdevs(), in a bid to add debug info.

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 591cd61541b9

--- x/drivers/infiniband/core/device.c
+++ y/drivers/infiniband/core/device.c
@@ -2093,29 +2093,6 @@ int ib_query_port(struct ib_device *devi
}
EXPORT_SYMBOL(ib_query_port);

-static void add_ndev_hash(struct ib_port_data *pdata)
-{
- unsigned long flags;
-
- might_sleep();
-
- spin_lock_irqsave(&ndev_hash_lock, flags);
- if (hash_hashed(&pdata->ndev_hash_link)) {
- hash_del_rcu(&pdata->ndev_hash_link);
- spin_unlock_irqrestore(&ndev_hash_lock, flags);
- /*
- * We cannot do hash_add_rcu after a hash_del_rcu until the
- * grace period
- */
- synchronize_rcu();
- spin_lock_irqsave(&ndev_hash_lock, flags);
- }
- if (pdata->netdev)
- hash_add_rcu(ndev_hash, &pdata->ndev_hash_link,
- (uintptr_t)pdata->netdev);
- spin_unlock_irqrestore(&ndev_hash_lock, flags);
-}
-
/**
* ib_device_set_netdev - Associate the ib_dev with an underlying net_device
* @ib_dev: Device to modify
@@ -2139,6 +2116,10 @@ int ib_device_set_netdev(struct ib_devic
unsigned long flags;
int ret;

+ might_sleep();
+
+ if (!rdma_is_port_valid(ib_dev, port))
+ return -EINVAL;
/*
* Drivers wish to call this before ib_register_driver, so we have to
* setup the port data early.
@@ -2147,28 +2128,32 @@ int ib_device_set_netdev(struct ib_devic
if (ret)
return ret;

- if (!rdma_is_port_valid(ib_dev, port))
- return -EINVAL;
-
pdata = &ib_dev->port_data[port];
+again:
spin_lock_irqsave(&pdata->netdev_lock, flags);
- old_ndev = rcu_dereference_protected(
- pdata->netdev, lockdep_is_held(&pdata->netdev_lock));
+ old_ndev = rcu_dereference_protected(pdata->netdev, lockdep_is_held(&pdata->netdev_lock));
if (old_ndev == ndev) {
spin_unlock_irqrestore(&pdata->netdev_lock, flags);
return 0;
}

- if (old_ndev)
- netdev_tracker_free(ndev, &pdata->netdev_tracker);
- if (ndev)
- netdev_hold(ndev, &pdata->netdev_tracker, GFP_ATOMIC);
+ spin_lock(&ndev_hash_lock);
+ if (hash_hashed(&pdata->ndev_hash_link)) {
+ hash_del_rcu(&pdata->ndev_hash_link);
+ spin_unlock(&ndev_hash_lock);
+ spin_unlock_irqrestore(&pdata->netdev_lock, flags);
+ synchronize_rcu();
+ goto again;
+ }
rcu_assign_pointer(pdata->netdev, ndev);
+ if (ndev) {
+ dev_hold(ndev);
+ hash_add_rcu(ndev_hash, &pdata->ndev_hash_link, (uintptr_t)pdata->netdev);
+ }
+ spin_unlock(&ndev_hash_lock);
spin_unlock_irqrestore(&pdata->netdev_lock, flags);

- add_ndev_hash(pdata);
- if (old_ndev)
- __dev_put(old_ndev);
+ dev_put(old_ndev);

return 0;
}
--

syzbot

Dec 11, 2022, 5:42:18 AM
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: rcu detected stall in corrupted

rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P5545 } 2629 jiffies s: 2809 root: 0x0/T
rcu: blocking rcu_node structures (internal RCU debug):


Tested on:

commit: 591cd615 Add linux-next specific files for 20221207
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
console output: https://syzkaller.appspot.com/x/log.txt?x=13b5948f880000
kernel config: https://syzkaller.appspot.com/x/.config?x=8b2d3e63e054c24f
dashboard link: https://syzkaller.appspot.com/bug?extid=3fd8326d9a0812d19218
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=1119320b880000
