[syzbot] BUG: corrupted list in netif_napi_add


syzbot

Oct 13, 2021, 7:40:22 AM
to and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, da...@davemloft.net, ha...@kernel.org, john.fa...@gmail.com, ka...@fb.com, kps...@kernel.org, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, songliu...@fb.com, syzkall...@googlegroups.com, y...@fb.com
Hello,

syzbot found the following issue on:

HEAD commit: 683f29b781ae Add linux-next specific files for 20211008
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1525a614b00000
kernel config: https://syzkaller.appspot.com/x/.config?x=673b3589d970c
dashboard link: https://syzkaller.appspot.com/bug?extid=62e474dd92a35e3060d8
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17c98e98b00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+62e474...@syzkaller.appspotmail.com

IPv6: ADDRCONF(NETDEV_CHANGE): vcan0: link becomes ready
list_add double add: new=ffff888023417160, prev=ffff88807de3a050, next=ffff888023417160.
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:29!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 9490 Comm: syz-executor.1 Not tainted 5.15.0-rc4-next-20211008-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__list_add_valid.cold+0x26/0x3c lib/list_debug.c:29
Code: b1 24 c3 fa 4c 89 e1 48 c7 c7 60 56 04 8a e8 f2 8c f1 ff 0f 0b 48 89 f2 4c 89 e1 48 89 ee 48 c7 c7 a0 57 04 8a e8 db 8c f1 ff <0f> 0b 48 89 f1 48 c7 c7 20 57 04 8a 4c 89 e6 e8 c7 8c f1 ff 0f 0b
RSP: 0018:ffffc90002c26a48 EFLAGS: 00010286
RAX: 0000000000000058 RBX: 0000000000000040 RCX: 0000000000000000
RDX: ffff888023263a00 RSI: ffffffff815e0d78 RDI: fffff52000584d3b
RBP: ffff888023417160 R08: 0000000000000058 R09: 0000000000000000
R10: ffffffff815dab5e R11: 0000000000000000 R12: ffff888023417160
R13: ffff888023417000 R14: ffff888023417160 R15: ffff888023417160
FS: 00007f841e9e8700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000601bd000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__list_add_rcu include/linux/rculist.h:79 [inline]
list_add_rcu include/linux/rculist.h:106 [inline]
netif_napi_add+0x3fd/0x9c0 net/core/dev.c:6889
veth_enable_xdp_range+0x1b1/0x300 drivers/net/veth.c:1009
veth_enable_xdp+0x2a5/0x620 drivers/net/veth.c:1063
veth_xdp_set drivers/net/veth.c:1483 [inline]
veth_xdp+0x4d4/0x780 drivers/net/veth.c:1523
bond_xdp_set drivers/net/bonding/bond_main.c:5217 [inline]
bond_xdp+0x325/0x920 drivers/net/bonding/bond_main.c:5263
dev_xdp_install+0xd5/0x270 net/core/dev.c:9365
dev_xdp_attach+0x83d/0x1010 net/core/dev.c:9513
dev_change_xdp_fd+0x246/0x300 net/core/dev.c:9753
do_setlink+0x2fb4/0x3970 net/core/rtnetlink.c:2931
rtnl_group_changelink net/core/rtnetlink.c:3242 [inline]
__rtnl_newlink+0xc06/0x1750 net/core/rtnetlink.c:3396
rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3506
rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5572
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2491
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x86d/0xda0 net/netlink/af_netlink.c:1916
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:724
____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
___sys_sendmsg+0xf3/0x170 net/socket.c:2463
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f841f2718d9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f841e9e8188 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f841f375f60 RCX: 00007f841f2718d9
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
RBP: 00007f841f2cbcb4 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffc8978d37f R14: 00007f841e9e8300 R15: 0000000000022000
</TASK>
Modules linked in:
---[ end trace 7281cadbc8534f23 ]---
RIP: 0010:__list_add_valid.cold+0x26/0x3c lib/list_debug.c:29
Code: b1 24 c3 fa 4c 89 e1 48 c7 c7 60 56 04 8a e8 f2 8c f1 ff 0f 0b 48 89 f2 4c 89 e1 48 89 ee 48 c7 c7 a0 57 04 8a e8 db 8c f1 ff <0f> 0b 48 89 f1 48 c7 c7 20 57 04 8a 4c 89 e6 e8 c7 8c f1 ff 0f 0b
RSP: 0018:ffffc90002c26a48 EFLAGS: 00010286
RAX: 0000000000000058 RBX: 0000000000000040 RCX: 0000000000000000
RDX: ffff888023263a00 RSI: ffffffff815e0d78 RDI: fffff52000584d3b
RBP: ffff888023417160 R08: 0000000000000058 R09: 0000000000000000
R10: ffffffff815dab5e R11: 0000000000000000 R12: ffff888023417160
R13: ffff888023417000 R14: ffff888023417160 R15: ffff888023417160
FS: 00007f841e9e8700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000601bd000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
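Context on the message: with CONFIG_DEBUG_LIST enabled, list_add_rcu() first calls __list_add_valid() from lib/list_debug.c. That check refuses an insertion in two cases: the neighbours are inconsistent, or the new node is already one of its own neighbours.

Below is a minimal userspace sketch of that check. The names mirror the kernel's, but this is a standalone re-implementation, and checked_list_add() is a hypothetical helper. Since list_add_rcu(new, head) passes head as prev and head->next as next, "new == next" in the report means the node being added was already the first entry of the list it was being added to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Minimal doubly linked list node, modelled on the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/*
 * Userspace re-implementation of the CONFIG_DEBUG_LIST sanity check:
 * returns false (and prints a message) if linking @new between @prev
 * and @next would corrupt the list.
 */
static bool list_add_valid(struct list_head *new, struct list_head *prev,
			   struct list_head *next)
{
	if (next->prev != prev) {
		fprintf(stderr, "list_add corruption. next->prev should be prev (%p), but was %p\n",
			(void *)prev, (void *)next->prev);
		return false;
	}
	if (prev->next != next) {
		fprintf(stderr, "list_add corruption. prev->next should be next (%p), but was %p\n",
			(void *)next, (void *)prev->next);
		return false;
	}
	if (new == prev || new == next) {
		fprintf(stderr, "list_add double add: new=%p, prev=%p, next=%p\n",
			(void *)new, (void *)prev, (void *)next);
		return false;
	}
	return true;
}

/* Hypothetical helper: insert @new right after @head, like list_add(). */
static bool checked_list_add(struct list_head *new, struct list_head *head)
{
	assert(head != NULL && new != NULL);

	struct list_head *next = head->next;

	if (!list_add_valid(new, head, next))
		return false;	/* the kernel build in the report BUG()s here */
	next->prev = new;
	new->next = next;
	new->prev = head;
	head->next = new;
	return true;
}
```

Adding the same node twice without removing it in between reproduces the "double add" case. The first checked_list_add(&node, &head) succeeds, and the second is rejected because node is now head->next.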


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

Daniel Borkmann

Oct 13, 2021, 9:36:05 AM
to syzbot, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, da...@davemloft.net, ha...@kernel.org, john.fa...@gmail.com, ka...@fb.com, kps...@kernel.org, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, songliu...@fb.com, syzkall...@googlegroups.com, y...@fb.com, pab...@redhat.com, to...@toke.dk, joa...@gmail.com
On 10/13/21 1:40 PM, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:

[ +Paolo/Toke wrt veth/XDP, +Jussi wrt bond/XDP, please take a look, thanks! ]

Paolo Abeni

Oct 13, 2021, 10:38:15 AM
to syzbot, syzkall...@googlegroups.com
No clue, but this looks related to another recent veth splat. Dumping
some debug info before splatting, in the hope of shedding some light.

At the very best, this will lead to more debug patches later.

#syz test: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git

---
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 50eb43e5bf45..888ee44fa8ef 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -997,7 +997,7 @@ static bool veth_gro_requested(const struct net_device *dev)
 }
 
 static int veth_enable_xdp_range(struct net_device *dev, int start, int end,
-				 bool napi_already_on)
+				 bool napi_already_on, void *old_xdp)
 {
 	struct veth_priv *priv = netdev_priv(dev);
 	int err, i;
@@ -1005,8 +1005,17 @@ static int veth_enable_xdp_range(struct net_device *dev, int start, int end,
 	for (i = start; i < end; i++) {
 		struct veth_rq *rq = &priv->rq[i];
 
-		if (!napi_already_on)
+		if (!napi_already_on) {
+			if (rq->xdp_napi.napi_hash_node.pprev) {
+				int j;
+
+				pr_err("napi %d already in hash table old xdp %d nr rx queue %d",
+				       i, !!old_xdp, dev->real_num_rx_queues);
+				for (j = 0; j < min(2u, dev->real_num_rx_queues); ++j)
+					pr_err(" queue %d reg %d", j, xdp_rxq_info_is_reg(&priv->rq[j].xdp_rxq));
+			}
 			netif_napi_add(dev, &rq->xdp_napi, veth_poll, NAPI_POLL_WEIGHT);
+		}
 		err = xdp_rxq_info_reg(&rq->xdp_rxq, dev, i, rq->xdp_napi.napi_id);
 		if (err < 0)
 			goto err_rxq_reg;
@@ -1053,14 +1062,14 @@ static void veth_disable_xdp_range(struct net_device *dev, int start, int end,
 	}
 }
 
-static int veth_enable_xdp(struct net_device *dev)
+static int veth_enable_xdp(struct net_device *dev, void *old_xdp)
 {
 	bool napi_already_on = veth_gro_requested(dev) && (dev->flags & IFF_UP);
 	struct veth_priv *priv = netdev_priv(dev);
 	int err, i;
 
 	if (!xdp_rxq_info_is_reg(&priv->rq[0].xdp_rxq)) {
-		err = veth_enable_xdp_range(dev, 0, dev->real_num_rx_queues, napi_already_on);
+		err = veth_enable_xdp_range(dev, 0, dev->real_num_rx_queues, napi_already_on, old_xdp);
 		if (err)
 			return err;
 
@@ -1167,7 +1176,7 @@ static int veth_enable_range_safe(struct net_device *dev, int start, int end)
 		/* these channels are freshly initialized, napi is not on there even
 		 * when GRO is requeste
 		 */
-		err = veth_enable_xdp_range(dev, start, end, false);
+		err = veth_enable_xdp_range(dev, start, end, false, priv->_xdp_prog);
 		if (err)
 			return err;
 
@@ -1267,7 +1276,7 @@ static int veth_open(struct net_device *dev)
 		return -ENOTCONN;
 
 	if (priv->_xdp_prog) {
-		err = veth_enable_xdp(dev);
+		err = veth_enable_xdp(dev, priv->_xdp_prog);
 		if (err)
 			return err;
 	} else if (veth_gro_requested(dev)) {
@@ -1480,7 +1489,7 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 		}
 
 		if (dev->flags & IFF_UP) {
-			err = veth_enable_xdp(dev);
+			err = veth_enable_xdp(dev, old_prog);
 			if (err) {
 				NL_SET_ERR_MSG_MOD(extack, "Setup for XDP failed");
 				goto err;
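The debug patch above treats a non-NULL rq->xdp_napi.napi_hash_node.pprev as an "already added" test. This relies on the kernel's hlist convention: a node is on a hash chain exactly when its pprev is non-NULL, which is what hlist_unhashed() checks. Both INIT_HLIST_NODE() and hlist_del_init() leave pprev NULL.

Below is a small userspace model of that convention. It is a standalone sketch, not the kernel's include/linux/list.h, and it assumes the NAPI instance was hashed when it was first added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace model of the kernel's hlist (hash-list) nodes. */
struct hlist_node {
	struct hlist_node *next, **pprev;
};

struct hlist_head {
	struct hlist_node *first;
};

static void init_hlist_node(struct hlist_node *n)
{
	n->next = NULL;
	n->pprev = NULL;	/* NULL pprev means "not on any hash chain" */
}

/* A node is unhashed exactly when its pprev is NULL. */
static bool hlist_unhashed(const struct hlist_node *n)
{
	return n->pprev == NULL;
}

/* Push @n at the front of bucket @h; pprev then points at h->first. */
static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlink @n (if hashed) and reset it so hlist_unhashed() is true again. */
static void hlist_del_init(struct hlist_node *n)
{
	if (hlist_unhashed(n))
		return;
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	init_hlist_node(n);
}
```

In the later syzbot run with this patch applied, the check fired ("napi 0 already in hash table old xdp 0 nr rx queue 1", "queue 0 reg 0"). NAPI instance 0 still looked hashed while queue 0's xdp_rxq was reported as not registered.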

Paolo Abeni

Oct 13, 2021, 10:41:13 AM
to syzbot, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, da...@davemloft.net, ha...@kernel.org, john.fa...@gmail.com, ka...@fb.com, kps...@kernel.org, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, songliu...@fb.com, syzkall...@googlegroups.com, y...@fb.com
On Wed, 2021-10-13 at 04:40 -0700, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 683f29b781ae Add linux-next specific files for 20211008
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1525a614b00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=673b3589d970c
> dashboard link: https://syzkaller.appspot.com/bug?extid=62e474dd92a35e3060d8
> compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17c98e98b00000
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
For the record: my wild guess is that this is related to:

https://syzkaller.appspot.com/bug?extid=67f89551088ea1a6850e

(hopefully they share the same root cause)

I spent some time investigating the latter without finding any real clue.
This one has a reproducer, so I'll ask syzbot to provide more info with
debug patches.

Cheers,

Paolo

syzbot

Oct 13, 2021, 11:49:11 AM
to pab...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot tried to test the proposed patch but the build/boot failed:

failed to checkout kernel repo https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/---: failed to run ["git" "fetch" "--force" "ee231638377c5338d84cc90b6c821555f8e3813f" "---"]: exit status 129
error: unknown option `-'
usage: git fetch [<options>] [<repository> [<refspec>...]]
or: git fetch [<options>] <group>
or: git fetch --multiple [<options>] [(<repository> | <group>)...]
or: git fetch --all [<options>]

-v, --verbose be more verbose
-q, --quiet be more quiet
--all fetch from all remotes
--set-upstream set upstream for git pull/fetch
-a, --append append to .git/FETCH_HEAD instead of overwriting
--upload-pack <path> path to upload pack on remote end
-f, --force force overwrite of local reference
-m, --multiple fetch from multiple remotes
-t, --tags fetch all tags and associated objects
-n do not fetch all tags (--no-tags)
-j, --jobs <n> number of submodules fetched in parallel
-p, --prune prune remote-tracking branches no longer on remote
-P, --prune-tags prune local tags no longer on remote and clobber changed tags
--recurse-submodules[=<on-demand>]
control recursive fetching of submodules
--dry-run dry run
--write-fetch-head write fetched references to the FETCH_HEAD file
-k, --keep keep downloaded pack
-u, --update-head-ok allow updating of HEAD ref
--progress force progress reporting
--depth <depth> deepen history of shallow clone
--shallow-since <time>
deepen history of shallow repository based on time
--shallow-exclude <revision>
deepen history of shallow clone, excluding rev
--deepen <n> deepen history of shallow clone
--unshallow convert to a complete repository
--update-shallow accept refs that update .git/shallow
--refmap <refmap> specify fetch refmap
-o, --server-option <server-specific>
option to transmit
-4, --ipv4 use IPv4 addresses only
-6, --ipv6 use IPv6 addresses only
--negotiation-tip <revision>
report that we have only objects reachable from this object
--filter <args> object filtering
--auto-maintenance run 'maintenance --auto' after fetching
--auto-gc run 'maintenance --auto' after fetching
--show-forced-updates
check for forced-updates on all updated branches
--write-commit-graph write the commit-graph after fetching
--stdin accept refspecs from stdin




Tested on:

commit: [unknown
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git ---
patch: https://syzkaller.appspot.com/x/patch.diff?x=168e9bd4b00000

syzbot

Oct 13, 2021, 11:59:26 AM
to Paolo Abeni, pab...@redhat.com, syzkall...@googlegroups.com
> On Wed, 2021-10-13 at 04:40 -0700, syzbot wrote:
>> syzbot found the following issue on:
>>
>> HEAD commit: 683f29b781ae Add linux-next specific files for 20211008
>> git tree: linux-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=1525a614b00000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=673b3589d970c
>> dashboard link: https://syzkaller.appspot.com/bug?extid=62e474dd92a35e3060d8
> Darn, in the previous iteration I/the email client mangled the repo URL,
> let's try again...
>
> #syz test: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git

want 2 args (repo, branch), got 1

Paolo Abeni

Oct 13, 2021, 11:59:33 AM
to syzbot, syzkall...@googlegroups.com
On Wed, 2021-10-13 at 04:40 -0700, syzbot wrote:
diffs

Paolo Abeni

Oct 13, 2021, 12:02:43 PM
to syzbot, syzkall...@googlegroups.com
On Wed, 2021-10-13 at 04:40 -0700, syzbot wrote:
> syzbot found the following issue on:
>
> HEAD commit: 683f29b781ae Add linux-next specific files for 20211008
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1525a614b00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=673b3589d970c
> dashboard link: https://syzkaller.appspot.com/bug?extid=62e474dd92a35e3060d8
> compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17c98e98b00000
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+62e474...@syzkaller.appspotmail.com

#syz test: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git master

diffs

syzbot

Oct 13, 2021, 12:16:16 PM
to pab...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+62e474...@syzkaller.appspotmail.com

Tested on:

commit: 50515cac net: qed_debug: fix check of false (grc_param..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git --
kernel config: https://syzkaller.appspot.com/x/.config?x=253081a8bb5ed81
dashboard link: https://syzkaller.appspot.com/bug?extid=62e474dd92a35e3060d8
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=17d81f84b00000

Note: testing is done by a robot and is best-effort only.

syzbot

Oct 13, 2021, 12:34:08 PM
to pab...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+62e474...@syzkaller.appspotmail.com

Tested on:

commit: 5f3b8ace Merge branch 'add-functional-support-for-giga..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git master
kernel config: https://syzkaller.appspot.com/x/.config?x=253081a8bb5ed81
dashboard link: https://syzkaller.appspot.com/bug?extid=62e474dd92a35e3060d8
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=10fff294b00000

Paolo Abeni

Oct 13, 2021, 1:31:15 PM
to syzbot, syzkall...@googlegroups.com
diffs

syzbot

Oct 13, 2021, 1:48:10 PM
to pab...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+62e474...@syzkaller.appspotmail.com

Tested on:

commit: 8006b911 Add linux-next specific files for 20211013
git tree: linux-next
kernel config: https://syzkaller.appspot.com/x/.config?x=9f85caab63d032e8
dashboard link: https://syzkaller.appspot.com/bug?extid=62e474dd92a35e3060d8
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=102e9bd4b00000

Paolo Abeni

Oct 14, 2021, 5:17:25 AM
to syzbot, syzkall...@googlegroups.com
#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 683f29b781aeaab6bf302eeb2ef08a5e5f9d8a27


diffs

syzbot

Oct 14, 2021, 5:35:14 AM
to pab...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
BUG: corrupted list in netif_napi_add

IPv6: ADDRCONF(NETDEV_CHANGE): vxcan1: link becomes ready
napi 0 already in hash table old xdp 0 nr rx queue 1
queue 0 reg 0
list_add double add: new=ffff88807d2ac160, prev=ffff88807b82e050, next=ffff88807d2ac160.
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:29!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10964 Comm: syz-executor.2 Not tainted 5.15.0-rc4-next-20211008-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__list_add_valid.cold+0x26/0x3c lib/list_debug.c:29
Code: b1 24 c3 fa 4c 89 e1 48 c7 c7 60 56 04 8a e8 f2 8c f1 ff 0f 0b 48 89 f2 4c 89 e1 48 89 ee 48 c7 c7 a0 57 04 8a e8 db 8c f1 ff <0f> 0b 48 89 f1 48 c7 c7 20 57 04 8a 4c 89 e6 e8 c7 8c f1 ff 0f 0b
RSP: 0018:ffffc90002ddeb40 EFLAGS: 00010282
RAX: 0000000000000058 RBX: 0000000000000040 RCX: 0000000000000000
RDX: ffff888018450000 RSI: ffffffff815e0d78 RDI: fffff520005bbd5a
RBP: ffff88807d2ac160 R08: 0000000000000058 R09: 0000000000000000
R10: ffffffff815dab5e R11: 0000000000000000 R12: ffff88807d2ac160
R13: ffff88807d2ac000 R14: ffff88807d2ac160 R15: ffff88807d2ac160
FS: 00007f6aeb1db700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000004 CR3: 0000000068f0d000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__list_add_rcu include/linux/rculist.h:79 [inline]
list_add_rcu include/linux/rculist.h:106 [inline]
netif_napi_add+0x3fd/0x9c0 net/core/dev.c:6889
veth_enable_xdp_range+0x195/0x350 drivers/net/veth.c:1017
veth_enable_xdp+0x2ab/0x630 drivers/net/veth.c:1072
veth_xdp_set drivers/net/veth.c:1492 [inline]
veth_xdp+0x4d7/0x790 drivers/net/veth.c:1532
dev_xdp_install+0xd5/0x270 net/core/dev.c:9365
dev_xdp_attach+0x83d/0x1010 net/core/dev.c:9513
dev_change_xdp_fd+0x246/0x300 net/core/dev.c:9753
do_setlink+0x2fb4/0x3970 net/core/rtnetlink.c:2931
rtnl_group_changelink net/core/rtnetlink.c:3242 [inline]
__rtnl_newlink+0xc06/0x1750 net/core/rtnetlink.c:3396
rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3506
rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5572
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2491
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x86d/0xda0 net/netlink/af_netlink.c:1916
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:724
____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
___sys_sendmsg+0xf3/0x170 net/socket.c:2463
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f6aeba648d9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f6aeb1db188 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f6aebb68f60 RCX: 00007f6aeba648d9
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
RBP: 00007f6aebabecb4 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffeb8cec5ff R14: 00007f6aeb1db300 R15: 0000000000022000
</TASK>
Modules linked in:
---[ end trace 11e4d9f710e460c3 ]---
RIP: 0010:__list_add_valid.cold+0x26/0x3c lib/list_debug.c:29
Code: b1 24 c3 fa 4c 89 e1 48 c7 c7 60 56 04 8a e8 f2 8c f1 ff 0f 0b 48 89 f2 4c 89 e1 48 89 ee 48 c7 c7 a0 57 04 8a e8 db 8c f1 ff <0f> 0b 48 89 f1 48 c7 c7 20 57 04 8a 4c 89 e6 e8 c7 8c f1 ff 0f 0b
RSP: 0018:ffffc90002ddeb40 EFLAGS: 00010282
RAX: 0000000000000058 RBX: 0000000000000040 RCX: 0000000000000000
RDX: ffff888018450000 RSI: ffffffff815e0d78 RDI: fffff520005bbd5a
RBP: ffff88807d2ac160 R08: 0000000000000058 R09: 0000000000000000
R10: ffffffff815dab5e R11: 0000000000000000 R12: ffff88807d2ac160
R13: ffff88807d2ac000 R14: ffff88807d2ac160 R15: ffff88807d2ac160
FS: 00007f6aeb1db700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000004 CR3: 0000000068f0d000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


Tested on:

commit: 683f29b7 Add linux-next specific files for 20211008
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
console output: https://syzkaller.appspot.com/x/log.txt?x=11fdee68b00000
kernel config: https://syzkaller.appspot.com/x/.config?x=673b3589d970c
dashboard link: https://syzkaller.appspot.com/bug?extid=62e474dd92a35e3060d8
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=131fe370b00000

Paolo Abeni

Oct 14, 2021, 7:57:19 AM
to syzbot, syzkall...@googlegroups.com
One more time: double-checking whether the issue is reliably reproducible on
the reported commit, and possibly dumping more debug info.


diffs

syzbot

Oct 14, 2021, 8:14:09 AM
to pab...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+62e474...@syzkaller.appspotmail.com

Tested on:

commit: 683f29b7 Add linux-next specific files for 20211008
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
kernel config: https://syzkaller.appspot.com/x/.config?x=673b3589d970c
dashboard link: https://syzkaller.appspot.com/bug?extid=62e474dd92a35e3060d8
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=15907ad0b00000

Paolo Abeni

Oct 14, 2021, 9:50:05 AM
to Daniel Borkmann, syzbot, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, da...@davemloft.net, ha...@kernel.org, john.fa...@gmail.com, ka...@fb.com, kps...@kernel.org, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, songliu...@fb.com, syzkall...@googlegroups.com, y...@fb.com, to...@toke.dk, joa...@gmail.com
On Wed, 2021-10-13 at 15:35 +0200, Daniel Borkmann wrote:
> On 10/13/21 1:40 PM, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
>
> [ +Paolo/Toke wrt veth/XDP, +Jussi wrt bond/XDP, please take a look, thanks! ]

For the record: Toke and I are actively investigating this issue and
the other recent related one. So far we have not found anything
relevant.

The only note is that the reproducer is not extremely reliable: I
could not reproduce the issue locally, and multiple syzbot runs on the
same code give different results. Anyhow, so far the issue has only
been observable on a specific 'next' commit, which is currently "not
reachable" from any branch. I'm wondering if the issue was caused by
some inconsistent state of that tree.

Cheers,

Paolo

Paolo Abeni

Oct 14, 2021, 9:54:21 AM
to syzkall...@googlegroups.com, Dmitry Vyukov
It looks like the reproducer is not very stable: it reported opposite
results on 2 consecutive runs on almost the same code. The only
difference was some additional printk() output when the error
condition is detected.

Is there any way to tell syzbot to run the same code multiple times,
or to run the repro for a longer time? I *think* that would consume
fewer resources than multiple consecutive invocations.

Thanks!

Paolo

Vlad Buslov

Oct 18, 2021, 10:04:41 AM
to Paolo Abeni, Daniel Borkmann, syzbot, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, da...@davemloft.net, ha...@kernel.org, john.fa...@gmail.com, ka...@fb.com, kps...@kernel.org, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, songliu...@fb.com, syzkall...@googlegroups.com, y...@fb.com, to...@toke.dk, joa...@gmail.com, Saeed Mahameed, Maxim Mikityanskiy
Hi,

We hit a use-after-free with a very similar trace [0] during nightly
regression testing. The issue happens when the ip link up/down state is
flipped several times in a loop, and it doesn't reproduce for me
manually. The fact that it didn't reproduce after running the test ten
times suggests that it is either very hard to reproduce or the result
of some interaction between several tests in our suite.
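A rough way to quantify "didn't reproduce after ten runs" is a simple probability model. As an idealisation, assume each run independently hits the bug with probability p. The chance of n clean runs in a row, and the number of runs needed to hit the bug at least once with confidence c, are then:

```latex
P(\text{no hit in } n \text{ runs}) = (1 - p)^{n},
\qquad
n \ \ge\ \frac{\ln(1 - c)}{\ln(1 - p)}
```

For example, take p = 0.2 (one hit in five runs). Ten clean runs still happen with probability 0.8^10, about 0.107. Reaching c = 0.99 needs n >= ln(0.01)/ln(0.8), about 20.6, i.e. 21 runs. Real races are often not independent across runs, so treat these numbers only as a rough guide.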

[0]:

[ 3187.779569] mlx5_core 0000:08:00.0 enp8s0f0: Link up
[ 3187.890694] ==================================================================
[ 3187.892518] BUG: KASAN: use-after-free in __list_add_valid+0xc3/0xf0
[ 3187.894132] Read of size 8 at addr ffff8881150b3fb8 by task ip/119618
[ 3187.895683]
[ 3187.896209] CPU: 0 PID: 119618 Comm: ip Not tainted 5.15.0-rc5_for_upstream_debug_2021_10_17_12_06 #1
[ 3187.898445] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 3187.901075] Call Trace:
[ 3187.901858] dump_stack_lvl+0x57/0x7d
[ 3187.902899] print_address_description.constprop.0+0x1f/0x140
[ 3187.904346] ? __list_add_valid+0xc3/0xf0
[ 3187.905439] ? __list_add_valid+0xc3/0xf0
[ 3187.906565] kasan_report.cold+0x83/0xdf
[ 3187.907619] ? __list_add_valid+0xc3/0xf0
[ 3187.908693] __list_add_valid+0xc3/0xf0
[ 3187.909765] netif_napi_add+0x399/0x9a0
[ 3187.910794] ? kmalloc_order_trace+0x6a/0x120
[ 3187.911944] mlx5e_open_channels+0x91b/0x2e10 [mlx5_core]
[ 3187.913872] ? rwlock_bug.part.0+0x90/0x90
[ 3187.914959] ? mlx5e_close_cq+0x80/0x80 [mlx5_core]
[ 3187.916584] ? mutex_is_locked+0x13/0x50
[ 3187.917703] mlx5e_open_locked+0x6a/0x1f0 [mlx5_core]
[ 3187.919368] mlx5e_open+0x35/0xb0 [mlx5_core]
[ 3187.920863] __dev_open+0x22f/0x420
[ 3187.921852] ? dev_set_rx_mode+0x80/0x80
[ 3187.922920] ? __mlx5_eswitch_set_vport_vlan+0x290/0x290 [mlx5_core]
[ 3187.924866] ? __local_bh_enable_ip+0xa2/0x100
[ 3187.926148] ? trace_hardirqs_on+0x32/0x120
[ 3187.927270] __dev_change_flags+0x451/0x670
[ 3187.928387] ? dev_set_allmulti+0x10/0x10
[ 3187.929480] ? rtnl_fill_vfinfo+0x936/0xdb0
[ 3187.930592] dev_change_flags+0x8b/0x150
[ 3187.931651] do_setlink+0x820/0x2d60
[ 3187.932631] ? rtnetlink_put_metrics+0x490/0x490
[ 3187.933852] ? lock_release+0x460/0x750
[ 3187.934881] ? kvm_async_pf_task_wake+0x410/0x410
[ 3187.936122] ? lock_downgrade+0x6e0/0x6e0
[ 3187.937203] ? do_raw_spin_unlock+0x54/0x220
[ 3187.938351] ? memset+0x20/0x40
[ 3187.939246] ? __nla_validate_parse+0xb2/0x22c0
[ 3187.940426] ? do_raw_spin_lock+0x126/0x270
[ 3187.941568] ? push_cpu_stop+0x830/0x830
[ 3187.942638] ? rwlock_bug.part.0+0x90/0x90
[ 3187.943733] ? devlink_compat_switch_id_get+0xbb/0x100
[ 3187.945065] ? nla_get_range_signed+0x540/0x540
[ 3187.946272] ? memcpy+0x39/0x60
[ 3187.947162] ? memset+0x20/0x40
[ 3187.948058] ? memset+0x20/0x40
[ 3187.948943] __rtnl_newlink+0xac0/0x1370
[ 3187.950038] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3187.951380] ? rtnl_setlink+0x330/0x330
[ 3187.952417] ? deref_stack_reg+0x160/0x160
[ 3187.953534] ? deref_stack_reg+0xe6/0x160
[ 3187.954619] ? rcu_read_lock_sched_held+0x12/0x70
[ 3187.955848] ? lock_release+0x460/0x750
[ 3187.956886] ? is_bpf_text_address+0x54/0x110
[ 3187.958047] ? lock_downgrade+0x6e0/0x6e0
[ 3187.959133] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3187.960469] ? deref_stack_reg+0x160/0x160
[ 3187.961592] ? is_bpf_text_address+0x73/0x110
[ 3187.962759] ? kernel_text_address+0xda/0x100
[ 3187.963920] ? __kernel_text_address+0xe/0x30
[ 3187.965069] ? unwind_get_return_address+0x56/0xa0
[ 3187.966334] ? __thaw_task+0x70/0x70
[ 3187.967320] ? arch_stack_walk+0x98/0xf0
[ 3187.968405] ? lock_downgrade+0x6e0/0x6e0
[ 3187.969510] ? trace_hardirqs_on+0x32/0x120
[ 3187.970644] ? rcu_read_lock_sched_held+0x12/0x70
[ 3187.971883] rtnl_newlink+0x5f/0x90
[ 3187.972866] rtnetlink_rcv_msg+0x32b/0x950
[ 3187.973968] ? deref_stack_reg+0x160/0x160
[ 3187.975088] ? rtnl_fdb_dump+0x830/0x830
[ 3187.976160] ? rcu_read_lock_sched_held+0x12/0x70
[ 3187.977393] ? lock_acquire+0x38d/0x4c0
[ 3187.978443] ? rcu_read_lock_sched_held+0x12/0x70
[ 3187.979685] ? lock_acquire+0x38d/0x4c0
[ 3187.980733] netlink_rcv_skb+0x11d/0x340
[ 3187.981812] ? rtnl_fdb_dump+0x830/0x830
[ 3187.982862] ? rcu_read_lock_sched_held+0x12/0x70
[ 3187.984105] ? netlink_ack+0x930/0x930
[ 3187.985136] ? netlink_deliver_tap+0x140/0xb10
[ 3187.986316] ? netlink_deliver_tap+0x14c/0xb10
[ 3187.987495] ? _copy_from_iter+0x282/0xbe0
[ 3187.988597] netlink_unicast+0x433/0x700
[ 3187.989693] ? netlink_attachskb+0x740/0x740
[ 3187.990819] ? __alloc_skb+0x117/0x2c0
[ 3187.991855] netlink_sendmsg+0x707/0xbf0
[ 3187.992921] ? netlink_unicast+0x700/0x700
[ 3187.994024] ? netlink_unicast+0x700/0x700
[ 3187.995121] sock_sendmsg+0xb0/0xe0
[ 3187.996091] ____sys_sendmsg+0x4fa/0x6d0
[ 3187.997163] ? iovec_from_user+0x136/0x280
[ 3187.998276] ? kernel_sendmsg+0x30/0x30
[ 3188.012806] ? __import_iovec+0x51/0x610
[ 3188.013858] ___sys_sendmsg+0x12e/0x1b0
[ 3188.014875] ? do_recvmmsg+0x500/0x500
[ 3188.015877] ? get_max_files+0x10/0x10
[ 3188.016866] ? kasan_record_aux_stack+0xab/0xc0
[ 3188.018108] ? call_rcu+0x87/0xd40
[ 3188.019041] ? task_work_run+0xc5/0x160
[ 3188.020044] ? exit_to_user_mode_prepare+0x1d9/0x1e0
[ 3188.021271] ? syscall_exit_to_user_mode+0x19/0x50
[ 3188.022563] ? do_syscall_64+0x4a/0x90
[ 3188.023559] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3188.024858] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.026121] ? lock_release+0x460/0x750
[ 3188.027174] ? mntput_no_expire+0x113/0xb40
[ 3188.028302] ? lock_downgrade+0x6e0/0x6e0
[ 3188.029398] ? rwlock_bug.part.0+0x90/0x90
[ 3188.030555] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.031812] ? mntput_no_expire+0x132/0xb40
[ 3188.032940] ? __fget_light+0x51/0x220
[ 3188.033986] __sys_sendmsg+0xa4/0x120
[ 3188.034992] ? __sys_sendmsg_sock+0x20/0x20
[ 3188.036115] ? call_rcu+0x543/0xd40
[ 3188.037084] ? syscall_enter_from_user_mode+0x1d/0x50
[ 3188.038406] ? trace_hardirqs_on+0x32/0x120
[ 3188.039515] do_syscall_64+0x3d/0x90
[ 3188.040502] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3188.041896] RIP: 0033:0x7f904ec94c17
[ 3188.042891] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[ 3188.047412] RSP: 002b:00007ffc1a6c4a98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 3188.049361] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f904ec94c17
[ 3188.051121] RDX: 0000000000000000 RSI: 00007ffc1a6c4b00 RDI: 0000000000000003
[ 3188.052881] RBP: 00000000616c5eef R08: 0000000000000001 R09: 00007f904ed55a40
[ 3188.054645] R10: fffffffffffff3d6 R11: 0000000000000246 R12: 0000000000000001
[ 3188.056403] R13: 00007ffc1a6c51b0 R14: 00007ffc1a6c6c87 R15: 000000000048f520
[ 3188.058189]
[ 3188.058732] The buggy address belongs to the page:
[ 3188.059996] page:000000003ccb70fc refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1150b3
[ 3188.062378] flags: 0x8000000000000000(zone=2)
[ 3188.063551] raw: 8000000000000000 0000000000000000 dead000000000122 0000000000000000
[ 3188.065548] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 3188.067518] page dumped because: kasan: bad access detected
[ 3188.068930]
[ 3188.069481] Memory state around the buggy address:
[ 3188.070730] ffff8881150b3e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.072618] ffff8881150b3f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.074508] >ffff8881150b3f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.076378] ^
[ 3188.077711] ffff8881150b4000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.079584] ffff8881150b4080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.081470] ==================================================================
[ 3188.083406] ==================================================================
[ 3188.085280] BUG: KASAN: use-after-free in netif_napi_add+0x8b7/0x9a0
[ 3188.086952] Write of size 8 at addr ffff8881150b3fb8 by task ip/119618
[ 3188.089181]
[ 3188.089987] CPU: 0 PID: 119618 Comm: ip Tainted: G B 5.15.0-rc5_for_upstream_debug_2021_10_17_12_06 #1
[ 3188.092659] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 3188.095481] Call Trace:
[ 3188.096222] dump_stack_lvl+0x57/0x7d
[ 3188.097238] print_address_description.constprop.0+0x1f/0x140
[ 3188.098764] ? netif_napi_add+0x8b7/0x9a0
[ 3188.099862] ? netif_napi_add+0x8b7/0x9a0
[ 3188.100940] kasan_report.cold+0x83/0xdf
[ 3188.102041] ? netif_napi_add+0x8b7/0x9a0
[ 3188.103140] netif_napi_add+0x8b7/0x9a0
[ 3188.104180] ? kmalloc_order_trace+0x6a/0x120
[ 3188.105336] mlx5e_open_channels+0x91b/0x2e10 [mlx5_core]
[ 3188.107145] ? rwlock_bug.part.0+0x90/0x90
[ 3188.108238] ? mlx5e_close_cq+0x80/0x80 [mlx5_core]
[ 3188.109882] ? mutex_is_locked+0x13/0x50
[ 3188.110985] mlx5e_open_locked+0x6a/0x1f0 [mlx5_core]
[ 3188.112644] mlx5e_open+0x35/0xb0 [mlx5_core]
[ 3188.114215] __dev_open+0x22f/0x420
[ 3188.115186] ? dev_set_rx_mode+0x80/0x80
[ 3188.116247] ? __mlx5_eswitch_set_vport_vlan+0x290/0x290 [mlx5_core]
[ 3188.118252] ? __local_bh_enable_ip+0xa2/0x100
[ 3188.119438] ? trace_hardirqs_on+0x32/0x120
[ 3188.120554] __dev_change_flags+0x451/0x670
[ 3188.121705] ? dev_set_allmulti+0x10/0x10
[ 3188.122828] ? rtnl_fill_vfinfo+0x936/0xdb0
[ 3188.123943] dev_change_flags+0x8b/0x150
[ 3188.124995] do_setlink+0x820/0x2d60
[ 3188.126023] ? rtnetlink_put_metrics+0x490/0x490
[ 3188.127233] ? lock_release+0x460/0x750
[ 3188.128269] ? kvm_async_pf_task_wake+0x410/0x410
[ 3188.129502] ? lock_downgrade+0x6e0/0x6e0
[ 3188.130620] ? do_raw_spin_unlock+0x54/0x220
[ 3188.131781] ? memset+0x20/0x40
[ 3188.132663] ? __nla_validate_parse+0xb2/0x22c0
[ 3188.133894] ? do_raw_spin_lock+0x126/0x270
[ 3188.135066] ? push_cpu_stop+0x830/0x830
[ 3188.136136] ? rwlock_bug.part.0+0x90/0x90
[ 3188.137230] ? devlink_compat_switch_id_get+0xbb/0x100
[ 3188.138585] ? nla_get_range_signed+0x540/0x540
[ 3188.139780] ? memcpy+0x39/0x60
[ 3188.140683] ? memset+0x20/0x40
[ 3188.141580] ? memset+0x20/0x40
[ 3188.142517] __rtnl_newlink+0xac0/0x1370
[ 3188.143579] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3188.144914] ? rtnl_setlink+0x330/0x330
[ 3188.145974] ? deref_stack_reg+0x160/0x160
[ 3188.147078] ? deref_stack_reg+0xe6/0x160
[ 3188.148157] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.149378] ? lock_release+0x460/0x750
[ 3188.150490] ? is_bpf_text_address+0x54/0x110
[ 3188.151648] ? lock_downgrade+0x6e0/0x6e0
[ 3188.152725] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3188.154075] ? deref_stack_reg+0x160/0x160
[ 3188.155176] ? is_bpf_text_address+0x73/0x110
[ 3188.156353] ? kernel_text_address+0xda/0x100
[ 3188.157510] ? __kernel_text_address+0xe/0x30
[ 3188.158707] ? unwind_get_return_address+0x56/0xa0
[ 3188.159992] ? __thaw_task+0x70/0x70
[ 3188.160979] ? arch_stack_walk+0x98/0xf0
[ 3188.162072] ? lock_downgrade+0x6e0/0x6e0
[ 3188.163167] ? trace_hardirqs_on+0x32/0x120
[ 3188.164295] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.165546] rtnl_newlink+0x5f/0x90
[ 3188.166558] rtnetlink_rcv_msg+0x32b/0x950
[ 3188.167677] ? deref_stack_reg+0x160/0x160
[ 3188.168782] ? rtnl_fdb_dump+0x830/0x830
[ 3188.169857] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.171089] ? lock_acquire+0x38d/0x4c0
[ 3188.172131] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.173367] ? lock_acquire+0x38d/0x4c0
[ 3188.174472] netlink_rcv_skb+0x11d/0x340
[ 3188.175531] ? rtnl_fdb_dump+0x830/0x830
[ 3188.176592] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.177824] ? netlink_ack+0x930/0x930
[ 3188.178848] ? netlink_deliver_tap+0x140/0xb10
[ 3188.180013] ? netlink_deliver_tap+0x14c/0xb10
[ 3188.181188] ? _copy_from_iter+0x282/0xbe0
[ 3188.182351] netlink_unicast+0x433/0x700
[ 3188.183418] ? netlink_attachskb+0x740/0x740
[ 3188.184552] ? __alloc_skb+0x117/0x2c0
[ 3188.185606] netlink_sendmsg+0x707/0xbf0
[ 3188.186672] ? netlink_unicast+0x700/0x700
[ 3188.187783] ? netlink_unicast+0x700/0x700
[ 3188.188882] sock_sendmsg+0xb0/0xe0
[ 3188.189862] ____sys_sendmsg+0x4fa/0x6d0
[ 3188.190971] ? iovec_from_user+0x136/0x280
[ 3188.192074] ? kernel_sendmsg+0x30/0x30
[ 3188.193130] ? __import_iovec+0x51/0x610
[ 3188.194225] ___sys_sendmsg+0x12e/0x1b0
[ 3188.195267] ? do_recvmmsg+0x500/0x500
[ 3188.196301] ? get_max_files+0x10/0x10
[ 3188.197333] ? kasan_record_aux_stack+0xab/0xc0
[ 3188.198558] ? call_rcu+0x87/0xd40
[ 3188.199519] ? task_work_run+0xc5/0x160
[ 3188.200557] ? exit_to_user_mode_prepare+0x1d9/0x1e0
[ 3188.201872] ? syscall_exit_to_user_mode+0x19/0x50
[ 3188.203134] ? do_syscall_64+0x4a/0x90
[ 3188.204152] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3188.205511] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.206782] ? lock_release+0x460/0x750
[ 3188.207870] ? mntput_no_expire+0x113/0xb40
[ 3188.209025] ? lock_downgrade+0x6e0/0x6e0
[ 3188.210272] ? rwlock_bug.part.0+0x90/0x90
[ 3188.211864] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.213644] ? mntput_no_expire+0x132/0xb40
[ 3188.215253] ? __fget_light+0x51/0x220
[ 3188.216535] __sys_sendmsg+0xa4/0x120
[ 3188.217574] ? __sys_sendmsg_sock+0x20/0x20
[ 3188.218707] ? call_rcu+0x543/0xd40
[ 3188.219679] ? syscall_enter_from_user_mode+0x1d/0x50
[ 3188.221004] ? trace_hardirqs_on+0x32/0x120
[ 3188.235475] do_syscall_64+0x3d/0x90
[ 3188.236463] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3188.237744] RIP: 0033:0x7f904ec94c17
[ 3188.238693] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[ 3188.242968] RSP: 002b:00007ffc1a6c4a98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 3188.244834] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f904ec94c17
[ 3188.246604] RDX: 0000000000000000 RSI: 00007ffc1a6c4b00 RDI: 0000000000000003
[ 3188.248362] RBP: 00000000616c5eef R08: 0000000000000001 R09: 00007f904ed55a40
[ 3188.250140] R10: fffffffffffff3d6 R11: 0000000000000246 R12: 0000000000000001
[ 3188.251889] R13: 00007ffc1a6c51b0 R14: 00007ffc1a6c6c87 R15: 000000000048f520
[ 3188.253667]
[ 3188.254215] The buggy address belongs to the page:
[ 3188.255460] page:000000003ccb70fc refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1150b3
[ 3188.257812] flags: 0x8000000000000000(zone=2)
[ 3188.258985] raw: 8000000000000000 0000000000000000 dead000000000122 0000000000000000
[ 3188.260971] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 3188.262993] page dumped because: kasan: bad access detected
[ 3188.264413]
[ 3188.264943] Memory state around the buggy address:
[ 3188.266203] ffff8881150b3e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.268082] ffff8881150b3f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.269957] >ffff8881150b3f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.271818] ^
[ 3188.273122] ffff8881150b4000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.275000] ffff8881150b4080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.276862] ==================================================================
[ 3188.371511] mlx5_core 0000:08:00.0 enp8s0f0: Link up
[ 3188.376126] IPv6: ADDRCONF(NETDEV_CHANGE): enp8s0f0: link becomes ready
[ 3188.430532] ==================================================================
[ 3188.432378] BUG: KASAN: use-after-free in __list_del_entry_valid+0x14b/0x180
[ 3188.434254] Read of size 8 at addr ffff8881150b3fb8 by task ip/119619
[ 3188.435826]
[ 3188.436365] CPU: 3 PID: 119619 Comm: ip Tainted: G B 5.15.0-rc5_for_upstream_debug_2021_10_17_12_06 #1
[ 3188.439688] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 3188.442423] Call Trace:
[ 3188.443172] dump_stack_lvl+0x57/0x7d
[ 3188.444186] print_address_description.constprop.0+0x1f/0x140
[ 3188.445703] ? __list_del_entry_valid+0x14b/0x180
[ 3188.447004] ? __list_del_entry_valid+0x14b/0x180
[ 3188.448255] kasan_report.cold+0x83/0xdf
[ 3188.449323] ? __list_del_entry_valid+0x14b/0x180
[ 3188.450670] __list_del_entry_valid+0x14b/0x180
[ 3188.451887] ? _raw_spin_unlock+0x1f/0x30
[ 3188.452969] __netif_napi_del.part.0+0xec/0x4a0
[ 3188.454453] mlx5e_close_channel+0x7d/0xd0 [mlx5_core]
[ 3188.456988] mlx5e_close_channels+0xf9/0x200 [mlx5_core]
[ 3188.459599] mlx5e_close_locked+0x101/0x130 [mlx5_core]
[ 3188.462156] mlx5e_close+0xad/0x100 [mlx5_core]
[ 3188.463961] __dev_close_many+0x18e/0x2b0
[ 3188.465045] ? list_netdevice+0x3a0/0x3a0
[ 3188.466187] ? __mlx5_eswitch_set_vport_vlan+0x290/0x290 [mlx5_core]
[ 3188.468156] ? __local_bh_enable_ip+0xa2/0x100
[ 3188.469333] ? trace_hardirqs_on+0x32/0x120
[ 3188.470496] __dev_change_flags+0x254/0x670
[ 3188.471605] ? dev_set_allmulti+0x10/0x10
[ 3188.472692] ? rtnl_fill_vfinfo+0x936/0xdb0
[ 3188.473854] dev_change_flags+0x8b/0x150
[ 3188.474965] do_setlink+0x820/0x2d60
[ 3188.475950] ? rtnetlink_put_metrics+0x490/0x490
[ 3188.477165] ? lock_release+0x460/0x750
[ 3188.478306] ? kvm_async_pf_task_wake+0x410/0x410
[ 3188.479542] ? lock_downgrade+0x6e0/0x6e0
[ 3188.480615] ? do_raw_spin_unlock+0x54/0x220
[ 3188.481790] ? memset+0x20/0x40
[ 3188.482963] ? __nla_validate_parse+0xb2/0x22c0
[ 3188.484167] ? do_raw_spin_lock+0x126/0x270
[ 3188.485281] ? push_cpu_stop+0x830/0x830
[ 3188.486457] ? rwlock_bug.part.0+0x90/0x90
[ 3188.487557] ? devlink_compat_switch_id_get+0xbb/0x100
[ 3188.488894] ? nla_get_range_signed+0x540/0x540
[ 3188.490168] ? memcpy+0x39/0x60
[ 3188.491083] ? memset+0x20/0x40
[ 3188.491966] ? memset+0x20/0x40
[ 3188.492855] __rtnl_newlink+0xac0/0x1370
[ 3188.493987] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3188.495384] ? rtnl_setlink+0x330/0x330
[ 3188.496446] ? deref_stack_reg+0x160/0x160
[ 3188.497551] ? deref_stack_reg+0xe6/0x160
[ 3188.498713] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.499929] ? lock_release+0x460/0x750
[ 3188.501232] ? is_bpf_text_address+0x54/0x110
[ 3188.502735] ? lock_downgrade+0x6e0/0x6e0
[ 3188.503831] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3188.505157] ? deref_stack_reg+0x160/0x160
[ 3188.506298] ? is_bpf_text_address+0x73/0x110
[ 3188.507459] ? kernel_text_address+0xda/0x100
[ 3188.508615] ? __kernel_text_address+0xe/0x30
[ 3188.509776] ? unwind_get_return_address+0x56/0xa0
[ 3188.511047] ? __thaw_task+0x70/0x70
[ 3188.512033] ? arch_stack_walk+0x98/0xf0
[ 3188.513059] ? lock_downgrade+0x6e0/0x6e0
[ 3188.514191] ? trace_hardirqs_on+0x32/0x120
[ 3188.515303] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.516524] rtnl_newlink+0x5f/0x90
[ 3188.517513] rtnetlink_rcv_msg+0x32b/0x950
[ 3188.518652] ? deref_stack_reg+0x160/0x160
[ 3188.519761] ? rtnl_fdb_dump+0x830/0x830
[ 3188.520816] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.522119] ? lock_acquire+0x38d/0x4c0
[ 3188.523211] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.524435] ? lock_acquire+0x38d/0x4c0
[ 3188.525498] netlink_rcv_skb+0x11d/0x340
[ 3188.526649] ? rtnl_fdb_dump+0x830/0x830
[ 3188.527722] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.528949] ? netlink_ack+0x930/0x930
[ 3188.530055] ? netlink_deliver_tap+0x140/0xb10
[ 3188.531347] ? netlink_deliver_tap+0x14c/0xb10
[ 3188.532549] ? _copy_from_iter+0x282/0xbe0
[ 3188.533711] netlink_unicast+0x433/0x700
[ 3188.534845] ? netlink_attachskb+0x740/0x740
[ 3188.535987] ? __alloc_skb+0x117/0x2c0
[ 3188.537006] netlink_sendmsg+0x707/0xbf0
[ 3188.538150] ? netlink_unicast+0x700/0x700
[ 3188.539337] ? netlink_unicast+0x700/0x700
[ 3188.540448] sock_sendmsg+0xb0/0xe0
[ 3188.541424] ____sys_sendmsg+0x4fa/0x6d0
[ 3188.542743] ? iovec_from_user+0x136/0x280
[ 3188.543932] ? kernel_sendmsg+0x30/0x30
[ 3188.544963] ? __import_iovec+0x51/0x610
[ 3188.546063] ___sys_sendmsg+0x12e/0x1b0
[ 3188.547189] ? do_recvmmsg+0x500/0x500
[ 3188.548209] ? get_max_files+0x10/0x10
[ 3188.549226] ? kasan_record_aux_stack+0xab/0xc0
[ 3188.550547] ? call_rcu+0x87/0xd40
[ 3188.551509] ? task_work_run+0xc5/0x160
[ 3188.552546] ? exit_to_user_mode_prepare+0x1d9/0x1e0
[ 3188.553896] ? syscall_exit_to_user_mode+0x19/0x50
[ 3188.555195] ? do_syscall_64+0x4a/0x90
[ 3188.556206] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3188.557634] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.558903] ? lock_release+0x460/0x750
[ 3188.559948] ? mntput_no_expire+0x113/0xb40
[ 3188.561059] ? lock_downgrade+0x6e0/0x6e0
[ 3188.562231] ? rwlock_bug.part.0+0x90/0x90
[ 3188.563338] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.564583] ? mntput_no_expire+0x132/0xb40
[ 3188.565731] ? __fget_light+0x51/0x220
[ 3188.566858] __sys_sendmsg+0xa4/0x120
[ 3188.567878] ? __sys_sendmsg_sock+0x20/0x20
[ 3188.568995] ? call_rcu+0x543/0xd40
[ 3188.570047] ? syscall_enter_from_user_mode+0x1d/0x50
[ 3188.571387] ? trace_hardirqs_on+0x32/0x120
[ 3188.572502] do_syscall_64+0x3d/0x90
[ 3188.573491] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3188.574916] RIP: 0033:0x7fc68ffd4c17
[ 3188.575900] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[ 3188.580625] RSP: 002b:00007ffd26634f18 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 3188.582945] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc68ffd4c17
[ 3188.585684] RDX: 0000000000000000 RSI: 00007ffd26634f80 RDI: 0000000000000003
[ 3188.587965] RBP: 00000000616c5eef R08: 0000000000000001 R09: 00007fc690095a40
[ 3188.589788] R10: fffffffffffff3d6 R11: 0000000000000246 R12: 0000000000000001
[ 3188.591618] R13: 00007ffd26635630 R14: 00007ffd26635c85 R15: 000000000048f520
[ 3188.593365]
[ 3188.593953] The buggy address belongs to the page:
[ 3188.595288] page:000000003ccb70fc refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1150b3
[ 3188.597966] flags: 0x8000000000000000(zone=2)
[ 3188.599643] raw: 8000000000000000 0000000000000000 dead000000000122 0000000000000000
[ 3188.601766] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 3188.603786] page dumped because: kasan: bad access detected
[ 3188.622507]
[ 3188.623291] Memory state around the buggy address:
[ 3188.625031] ffff8881150b3e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.627617] ffff8881150b3f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.630275] >ffff8881150b3f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.632956] ^
[ 3188.634838] ffff8881150b4000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.637544] ffff8881150b4080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.640221] ==================================================================

[...]

Jakub Kicinski
Oct 18, 2021, 11:42:05 AM10/18/21
to Vlad Buslov, Paolo Abeni, Daniel Borkmann, syzbot, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, da...@davemloft.net, ha...@kernel.org, john.fa...@gmail.com, ka...@fb.com, kps...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, songliu...@fb.com, syzkall...@googlegroups.com, y...@fb.com, to...@toke.dk, joa...@gmail.com, Saeed Mahameed, Maxim Mikityanskiy
On Mon, 18 Oct 2021 17:04:19 +0300 Vlad Buslov wrote:
> We got a use-after-free with very similar trace [0] during nightly
> regression. The issue happens when ip link up/down state is flipped
> several times in a loop and doesn't reproduce for me manually. The fact
> that it didn't reproduce for me after running the test ten times suggests
> that it is either very hard to reproduce or the result of some
> interaction between several tests in our suite.
>
> [0]:
>
> [ 3187.779569] mlx5_core 0000:08:00.0 enp8s0f0: Link up
> [ 3187.890694] ==================================================================
> [ 3187.892518] BUG: KASAN: use-after-free in __list_add_valid+0xc3/0xf0
> [ 3187.894132] Read of size 8 at addr ffff8881150b3fb8 by task ip/119618

Hm, not sure how similar it is. This one looks like the channel was freed
without deleting its NAPI. Do you have list debugging enabled?

Vlad Buslov
Oct 18, 2021, 12:13:04 PM10/18/21
to Jakub Kicinski, Paolo Abeni, Daniel Borkmann, syzbot, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, da...@davemloft.net, ha...@kernel.org, john.fa...@gmail.com, ka...@fb.com, kps...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, songliu...@fb.com, syzkall...@googlegroups.com, y...@fb.com, to...@toke.dk, joa...@gmail.com, Saeed Mahameed, Maxim Mikityanskiy
Yes, CONFIG_DEBUG_LIST is enabled.
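With CONFIG_DEBUG_LIST enabled, list insertions and deletions first validate the neighbouring pointers; the "list_add double add" line in the syzbot report comes from that check (lib/list_debug.c). Below is a minimal userspace sketch of those checks, not the kernel code: the helper names are hypothetical, the messages are simplified, NULL stands in for the kernel's poison values, and the functions return false where the kernel would BUG().

```c
#include <stdbool.h>
#include <stdio.h>

/* Doubly linked list node modeled on the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Add-time checks before linking @new between @prev and @next. */
static bool list_add_valid(struct list_head *new, struct list_head *prev,
			   struct list_head *next)
{
	if (next->prev != prev || prev->next != next) {
		fprintf(stderr, "list_add corruption: prev=%p, next=%p\n",
			(void *)prev, (void *)next);
		return false;
	}
	if (new == prev || new == next) {
		/* The node is already linked right at the insertion point. */
		fprintf(stderr, "list_add double add: new=%p, prev=%p, next=%p.\n",
			(void *)new, (void *)prev, (void *)next);
		return false;
	}
	return true;
}

/* Insert @new right after @head, the way list_add()/list_add_rcu() do. */
static bool checked_list_add(struct list_head *new, struct list_head *head)
{
	struct list_head *prev = head, *next = head->next;

	if (!list_add_valid(new, prev, next))
		return false;
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
	return true;
}

/* Delete-time check: both neighbours must still point back at @entry. */
static bool list_del_entry_valid(const struct list_head *entry)
{
	if (!entry->next || !entry->prev)	/* already deleted (sketch poison) */
		return false;
	return entry->prev->next == entry && entry->next->prev == entry;
}

static bool checked_list_del(struct list_head *entry)
{
	if (!list_del_entry_valid(entry))
		return false;
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = NULL;
	entry->prev = NULL;
	return true;
}
```

In the syzbot log, new and next are the same address (ffff888023417160), which is the "already linked right after prev" case. Note that the delete-time check dereferences the neighbours, so if a neighbour's memory has been freed, KASAN flags that read as a use-after-free, which fits the __list_del_entry_valid report in Vlad's trace.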

Toke Høiland-Jørgensen
Oct 18, 2021, 1:40:43 PM10/18/21
to Jakub Kicinski, Vlad Buslov, Paolo Abeni, Daniel Borkmann, syzbot, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, da...@davemloft.net, ha...@kernel.org, john.fa...@gmail.com, ka...@fb.com, kps...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, songliu...@fb.com, syzkall...@googlegroups.com, y...@fb.com, joa...@gmail.com, Saeed Mahameed, Maxim Mikityanskiy
Well, the other report[0] also kinda looks like the NAPI thread keeps
running after it should have been disabled, so maybe they are in fact
related?

-Toke

[0] https://lore.kernel.org/r/000000000000c1...@google.com

Jakub Kicinski
Oct 18, 2021, 1:58:34 PM10/18/21
to Toke Høiland-Jørgensen, Vlad Buslov, Paolo Abeni, Daniel Borkmann, syzbot, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, da...@davemloft.net, ha...@kernel.org, john.fa...@gmail.com, ka...@fb.com, kps...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, songliu...@fb.com, syzkall...@googlegroups.com, y...@fb.com, joa...@gmail.com, Saeed Mahameed, Maxim Mikityanskiy
> [0] https://lore.kernel.org/r/000000000000c1...@google.com

Could be, if napi->state gets corrupted it may lose NAPI_STATE_LISTED.

719c57197010 ("net: make napi_disable() symmetric with enable")
3765996e4f0b ("napi: fix race inside napi_enable")
are the only changes that come to mind, but they look fine to me.
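Jakub's hypothesis rests on the NAPI add and delete paths being gated by a state bit. The sketch below models that gating in userspace. `struct sketch_napi`, `SKETCH_STATE_LISTED` and the helper names are hypothetical stand-ins for `struct napi_struct`, `NAPI_STATE_LISTED` and `netif_napi_add()`/`__netif_napi_del()`, and it assumes, as in kernels of that period, that add refuses an already-listed NAPI and delete does nothing for an unlisted one.

```c
#include <stdbool.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical stand-in for the NAPI_STATE_LISTED bit in napi->state. */
#define SKETCH_STATE_LISTED (1UL << 0)

struct sketch_napi {
	unsigned long state;
	struct list_head dev_list;	/* link on the device's NAPI list */
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static bool list_contains(const struct list_head *head,
			  const struct list_head *e)
{
	for (const struct list_head *p = head->next; p != head; p = p->next)
		if (p == e)
			return true;
	return false;
}

/* Add: refuse if the bit says "already listed", else set it and link. */
static bool sketch_napi_add(struct list_head *napi_list, struct sketch_napi *n)
{
	if (n->state & SKETCH_STATE_LISTED)
		return false;	/* the kernel warns and bails out here */
	n->state |= SKETCH_STATE_LISTED;
	n->dev_list.next = napi_list->next;
	n->dev_list.prev = napi_list;
	napi_list->next->prev = &n->dev_list;
	napi_list->next = &n->dev_list;
	return true;
}

/* Delete: unlink only if the bit says the entry is listed. */
static bool sketch_napi_del(struct sketch_napi *n)
{
	if (!(n->state & SKETCH_STATE_LISTED))
		return false;	/* silently skipped: nothing is unlinked */
	n->state &= ~SKETCH_STATE_LISTED;
	n->dev_list.prev->next = n->dev_list.next;
	n->dev_list.next->prev = n->dev_list.prev;
	return true;
}
```

If the bit is lost while the entry is still linked, delete silently skips the unlink. Freeing the owning structure then leaves a dangling entry on the list (the mlx5 use-after-free pattern), and a later add passes the bit check and links the same node twice (the double add reported by syzbot).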

Saeed Mahameed
Oct 18, 2021, 7:31:46 PM10/18/21
to Vlad Buslov, ku...@kernel.org, songliu...@fb.com, ha...@kernel.org, syzkall...@googlegroups.com, ka...@fb.com, da...@davemloft.net, john.fa...@gmail.com, and...@kernel.org, linux-...@vger.kernel.org, pab...@redhat.com, kps...@kernel.org, a...@kernel.org, joa...@gmail.com, y...@fb.com, to...@toke.dk, dan...@iogearbox.net, b...@vger.kernel.org, Maxim Mikityanskiy, net...@vger.kernel.org, syzbot+62e474...@syzkaller.appspotmail.com
Do you have core dumps?
Let's enable kernel.panic_on_oops with core dumps and look at it the next
time we see this. I really don't think mlx5 is leaking.

Jussi Maki
Oct 19, 2021, 6:11:40 AM10/19/21
to Paolo Abeni, Daniel Borkmann, syzbot, Andrii Nakryiko, a...@kernel.org, bpf, da...@davemloft.net, ha...@kernel.org, john.fa...@gmail.com, ka...@fb.com, kps...@kernel.org, ku...@kernel.org, linux-...@vger.kernel.org, Network Development, songliu...@fb.com, syzkall...@googlegroups.com, y...@fb.com, to...@toke.dk
Hey,

I took a look at the bond/XDP-related bits and couldn't find anything
obvious there. For what it's worth, I ran the syzbot repro
under the bpf-next tree (223f903e9c8) in the bpf vmtest.sh environment for
30 minutes without hitting this. An inconsistent tree might be the
cause.

Hillf Danton
Oct 23, 2021, 9:45:13 AM10/23/21
to Vlad Buslov, Paolo Abeni, Dmitry Vyukov, Daniel Borkmann, syzbot, LKML, syzkall...@googlegroups.com
On Mon, 18 Oct 2021 17:04:19 +0300 Vlad Buslov wrote:
> [ 3188.574916] RIP: 0033:0x7fc68ffd4c17
> [ 3188.237744] RIP: 0033:0x7f904ec94c17

Dmitry, what addresses are these RIPs pointing to?

Dmitry Vyukov
Oct 25, 2021, 2:41:59 AM10/25/21
to Hillf Danton, Vlad Buslov, Paolo Abeni, Daniel Borkmann, syzbot, LKML, syzkall...@googlegroups.com
This report did not come from syzkaller/syzbot. We need to ask Vlad.
For syzkaller/syzbot I wouldn't be able to answer such a question, but
I guess it's just the code that executes the sendmsg syscall instruction
in user space. What aspect of that code are you interested in?

Hillf Danton
Oct 26, 2021, 7:34:49 PM10/26/21
to Dmitry Vyukov, Vlad Buslov, Paolo Abeni, Daniel Borkmann, syzbot, LKML, syzkall...@googlegroups.com
No further questions; thanks for shedding some light on this.

Hillf

Paolo Abeni
Nov 5, 2021, 7:41:35 AM11/5/21
to syzbot, syzkall...@googlegroups.com
On Wed, 2021-10-13 at 04:40 -0700, syzbot wrote:
> HEAD commit: 683f29b781ae Add linux-next specific files for 20211008
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1525a614b00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=673b3589d970c
> dashboard link: https://syzkaller.appspot.com/bug?extid=62e474dd92a35e3060d8
> compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17c98e98b00000
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+62e474...@syzkaller.appspotmail.com
>
> IPv6: ADDRCONF(NETDEV_CHANGE): vcan0: link becomes ready
> list_add double add: new=ffff888023417160, prev=ffff88807de3a050, next=ffff888023417160.
> ------------[ cut here ]------------
> kernel BUG at lib/list_debug.c:29!
> invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> CPU: 0 PID: 9490 Comm: syz-executor.1 Not tainted 5.15.0-rc4-next-20211008-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:__list_add_valid.cold+0x26/0x3c lib/list_debug.c:29
> Code: b1 24 c3 fa 4c 89 e1 48 c7 c7 60 56 04 8a e8 f2 8c f1 ff 0f 0b 48 89 f2 4c 89 e1 48 89 ee 48 c7 c7 a0 57 04 8a e8 db 8c f1 ff <0f> 0b 48 89 f1 48 c7 c7 20 57 04 8a 4c 89 e6 e8 c7 8c f1 ff 0f 0b
> RSP: 0018:ffffc90002c26a48 EFLAGS: 00010286
> RAX: 0000000000000058 RBX: 0000000000000040 RCX: 0000000000000000
> RDX: ffff888023263a00 RSI: ffffffff815e0d78 RDI: fffff52000584d3b
> RBP: ffff888023417160 R08: 0000000000000058 R09: 0000000000000000
> R10: ffffffff815dab5e R11: 0000000000000000 R12: ffff888023417160
> R13: ffff888023417000 R14: ffff888023417160 R15: ffff888023417160
> FS: 00007f841e9e8700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 00000000601bd000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> __list_add_rcu include/linux/rculist.h:79 [inline]
> list_add_rcu include/linux/rculist.h:106 [inline]
> netif_napi_add+0x3fd/0x9c0 net/core/dev.c:6889
> veth_enable_xdp_range+0x1b1/0x300 drivers/net/veth.c:1009
> veth_enable_xdp+0x2a5/0x620 drivers/net/veth.c:1063
> veth_xdp_set drivers/net/veth.c:1483 [inline]
> veth_xdp+0x4d4/0x780 drivers/net/veth.c:1523
> bond_xdp_set drivers/net/bonding/bond_main.c:5217 [inline]
> bond_xdp+0x325/0x920 drivers/net/bonding/bond_main.c:5263
> dev_xdp_install+0xd5/0x270 net/core/dev.c:9365
> dev_xdp_attach+0x83d/0x1010 net/core/dev.c:9513
> dev_change_xdp_fd+0x246/0x300 net/core/dev.c:9753
> do_setlink+0x2fb4/0x3970 net/core/rtnetlink.c:2931
> rtnl_group_changelink net/core/rtnetlink.c:3242 [inline]
> __rtnl_newlink+0xc06/0x1750 net/core/rtnetlink.c:3396
> rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3506
> rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5572
> netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2491
> netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
> netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1345
> netlink_sendmsg+0x86d/0xda0 net/netlink/af_netlink.c:1916
> sock_sendmsg_nosec net/socket.c:704 [inline]
> sock_sendmsg+0xcf/0x120 net/socket.c:724
> ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
> ___sys_sendmsg+0xf3/0x170 net/socket.c:2463
> __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
> entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f841f2718d9
> Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007f841e9e8188 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> RAX: ffffffffffffffda RBX: 00007f841f375f60 RCX: 00007f841f2718d9
> RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
> RBP: 00007f841f2cbcb4 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007ffc8978d37f R14: 00007f841e9e8300 R15: 0000000000022000
> </TASK>
> Modules linked in:
> ---[ end trace 7281cadbc8534f23 ]---
> RIP: 0010:__list_add_valid.cold+0x26/0x3c lib/list_debug.c:29
> Code: b1 24 c3 fa 4c 89 e1 48 c7 c7 60 56 04 8a e8 f2 8c f1 ff 0f 0b 48 89 f2 4c 89 e1 48 89 ee 48 c7 c7 a0 57 04 8a e8 db 8c f1 ff <0f> 0b 48 89 f1 48 c7 c7 20 57 04 8a 4c 89 e6 e8 c7 8c f1 ff 0f 0b
> RSP: 0018:ffffc90002c26a48 EFLAGS: 00010286
> RAX: 0000000000000058 RBX: 0000000000000040 RCX: 0000000000000000
> RDX: ffff888023263a00 RSI: ffffffff815e0d78 RDI: fffff52000584d3b
> RBP: ffff888023417160 R08: 0000000000000058 R09: 0000000000000000
> R10: ffffffff815dab5e R11: 0000000000000000 R12: ffff888023417160
> R13: ffff888023417000 R14: ffff888023417160 R15: ffff888023417160
> FS: 00007f841e9e8700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 00000000601bd000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 000000000000040

At the very least, the error path lacks a netif_napi_del(); let's see if
that addresses the issue...

#syz test: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git
---
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 50eb43e5bf45..b78894c38933 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1025,6 +1025,8 @@ static int veth_enable_xdp_range(struct net_device *dev, int start, int end,
 err_reg_mem:
 	xdp_rxq_info_unreg(&priv->rq[i].xdp_rxq);
 err_rxq_reg:
+	if (!napi_already_on)
+		netif_napi_del(&priv->rq[i].xdp_napi);
 	for (i--; i >= start; i--) {
 		struct veth_rq *rq = &priv->rq[i];


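Paolo's point about the error path can be modeled in userspace. When registration fails at index i, the unwind loop starts at i - 1, so the NAPI added for the failing index itself is never deleted; the patch adds that one missing delete. In the sketch below, `struct sketch_rq`, the boolean flags and `fail_at` are hypothetical stand-ins for `struct veth_rq`, netif_napi_add()/netif_napi_del(), xdp_rxq_info_reg()/unreg() and an injected failure, and the err_reg_mem branch is omitted for brevity. Per syzbot's subsequent test run in this thread, this change alone did not stop the reproducer.

```c
#include <stdbool.h>

#define NQ 4

/* Hypothetical per-queue state standing in for struct veth_rq. */
struct sketch_rq {
	bool napi_added;	/* models netif_napi_add()/netif_napi_del() */
	bool rxq_reg;		/* models xdp_rxq_info_reg()/unreg() */
};

/* fail_at: index whose rxq registration fails (-1 means never). */
static int sketch_enable_range(struct sketch_rq *rq, int start, int end,
			       bool napi_already_on, int fail_at, bool with_fix)
{
	int i;

	for (i = start; i < end; i++) {
		if (!napi_already_on)
			rq[i].napi_added = true;
		if (i == fail_at)
			goto err_rxq_reg;
		rq[i].rxq_reg = true;
	}
	return 0;

err_rxq_reg:
	/* The proposed fix: undo the NAPI add done for the failing index. */
	if (with_fix && !napi_already_on)
		rq[i].napi_added = false;
	/* The existing unwind only covers indices before the failing one. */
	for (i--; i >= start; i--) {
		rq[i].rxq_reg = false;
		if (!napi_already_on)
			rq[i].napi_added = false;
	}
	return -1;
}
```

Without the fix, a failure at index 2 leaves rq[2] marked as added, which in the kernel means a NAPI still linked on the device list after the error return.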
syzbot
Nov 5, 2021, 7:46:08 AM11/5/21
to pab...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot tried to test the proposed patch but the build/boot failed:

failed to checkout kernel repo https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/---: failed to run ["git" "fetch" "--force" "a5a65480fd08f84422d56537d217faf190618d6e" "---"]: exit status 129
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git ---
patch: https://syzkaller.appspot.com/x/patch.diff?x=108ff25ab00000

Paolo Abeni
Nov 5, 2021, 7:53:32 AM11/5/21
to syzbot, syzkall...@googlegroups.com
whoops, missing branch name...

#syz test: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git master

syzbot
Nov 5, 2021, 8:10:12 AM11/5/21
to pab...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
BUG: corrupted list in netif_napi_add

IPv6: ADDRCONF(NETDEV_CHANGE): vcan0: link becomes ready
list_add double add: new=ffff888077a8d160, prev=ffff888077bc4050, next=ffff888077a8d160.
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:29!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 11009 Comm: syz-executor.4 Not tainted 5.15.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__list_add_valid.cold+0x26/0x3c lib/list_debug.c:29
Code: 74 17 d9 fa 4c 89 e1 48 c7 c7 80 6f e4 89 e8 aa 7f f1 ff 0f 0b 48 89 f2 4c 89 e1 48 89 ee 48 c7 c7 c0 70 e4 89 e8 93 7f f1 ff <0f> 0b 48 89 f1 48 c7 c7 40 70 e4 89 4c 89 e6 e8 7f 7f f1 ff 0f 0b
RSP: 0018:ffffc90002c9ea40 EFLAGS: 00010282
RAX: 0000000000000058 RBX: 0000000000000040 RCX: 0000000000000000
RDX: ffff888075d89d00 RSI: ffffffff815ef908 RDI: fffff52000593d3a
RBP: ffff888077a8d160 R08: 0000000000000058 R09: 0000000000000000
R10: ffffffff815e96ee R11: 0000000000000000 R12: ffff888077a8d160
R13: ffff888077a8d000 R14: ffff888077a8d160 R15: ffff888077a8d160
FS: 00007f8deccd9700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0768b03998 CR3: 0000000072242000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__list_add_rcu include/linux/rculist.h:79 [inline]
list_add_rcu include/linux/rculist.h:106 [inline]
netif_napi_add+0x3fd/0x9c0 net/core/dev.c:6913
veth_enable_xdp_range+0x1ba/0x370 drivers/net/veth.c:1009
veth_enable_xdp+0x2a5/0x620 drivers/net/veth.c:1065
veth_xdp_set drivers/net/veth.c:1485 [inline]
veth_xdp+0x4d4/0x780 drivers/net/veth.c:1525
bond_xdp_set drivers/net/bonding/bond_main.c:5217 [inline]
bond_xdp+0x325/0x920 drivers/net/bonding/bond_main.c:5263
dev_xdp_install+0xd5/0x270 net/core/dev.c:9384
dev_xdp_attach+0x83d/0x1010 net/core/dev.c:9532
dev_change_xdp_fd+0x246/0x300 net/core/dev.c:9772
do_setlink+0x2fb4/0x3970 net/core/rtnetlink.c:2931
rtnl_group_changelink net/core/rtnetlink.c:3242 [inline]
__rtnl_newlink+0xc06/0x1750 net/core/rtnetlink.c:3396
rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3506
rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5571
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2491
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x86d/0xda0 net/netlink/af_netlink.c:1916
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:724
____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
___sys_sendmsg+0xf3/0x170 net/socket.c:2463
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f8ded563a39
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f8deccd9188 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f8ded676f60 RCX: 00007f8ded563a39
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000004
RBP: 00007f8ded5bde8f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fff577b05ff R14: 00007f8deccd9300 R15: 0000000000022000
</TASK>
Modules linked in:
---[ end trace a469a1fc9da802bf ]---
RIP: 0010:__list_add_valid.cold+0x26/0x3c lib/list_debug.c:29
Code: 74 17 d9 fa 4c 89 e1 48 c7 c7 80 6f e4 89 e8 aa 7f f1 ff 0f 0b 48 89 f2 4c 89 e1 48 89 ee 48 c7 c7 c0 70 e4 89 e8 93 7f f1 ff <0f> 0b 48 89 f1 48 c7 c7 40 70 e4 89 4c 89 e6 e8 7f 7f f1 ff 0f 0b
RSP: 0018:ffffc90002c9ea40 EFLAGS: 00010282
RAX: 0000000000000058 RBX: 0000000000000040 RCX: 0000000000000000
RDX: ffff888075d89d00 RSI: ffffffff815ef908 RDI: fffff52000593d3a
RBP: ffff888077a8d160 R08: 0000000000000058 R09: 0000000000000000
R10: ffffffff815e96ee R11: 0000000000000000 R12: ffff888077a8d160
R13: ffff888077a8d000 R14: ffff888077a8d160 R15: ffff888077a8d160
FS: 00007f8deccd9700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0768b03998 CR3: 0000000072242000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


Tested on:

commit: 3f81c579 amt: Fix NULL but dereferenced coccicheck error
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=17b1232eb00000
kernel config: https://syzkaller.appspot.com/x/.config?x=2554266f3ca5bf4b
dashboard link: https://syzkaller.appspot.com/bug?extid=62e474dd92a35e3060d8
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=167d1f96b00000

syzbot

Dec 14, 2021, 2:52:09 AM
to alexandr...@intel.com, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, da...@davemloft.net, dvy...@google.com, edum...@google.com, ha...@kernel.org, hda...@sina.com, jesse.br...@intel.com, joa...@gmail.com, john.fa...@gmail.com, ka...@fb.com, kps...@kernel.org, ku...@kernel.org, linux-...@vger.kernel.org, max...@nvidia.com, net...@vger.kernel.org, pab...@redhat.com, sae...@nvidia.com, songliu...@fb.com, syzkall...@googlegroups.com, to...@toke.dk, vla...@nvidia.com, y...@fb.com
syzbot suspects this issue was fixed by commit:

commit 0315a075f1343966ea2d9a085666a88a69ea6a3d
Author: Alexander Lobakin <alexandr...@intel.com>
Date: Wed Nov 10 19:56:05 2021 +0000

net: fix premature exit from NAPI state polling in napi_disable()

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=138dffbeb00000
start commit: 911e3a46fb38 net: phy: Fix unsigned comparison with less t..
git tree: net-next
kernel config: https://syzkaller.appspot.com/x/.config?x=d36d2402e8523638
dashboard link: https://syzkaller.appspot.com/bug?extid=62e474dd92a35e3060d8
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=141592f2b00000

If the result looks correct, please mark the issue as fixed by replying with:

#syz fix: net: fix premature exit from NAPI state polling in napi_disable()

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

Dmitry Vyukov

Dec 14, 2021, 12:57:05 PM
to syzbot, alexandr...@intel.com, and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, da...@davemloft.net, edum...@google.com, ha...@kernel.org, hda...@sina.com, jesse.br...@intel.com, joa...@gmail.com, john.fa...@gmail.com, ka...@fb.com, kps...@kernel.org, ku...@kernel.org, linux-...@vger.kernel.org, max...@nvidia.com, net...@vger.kernel.org, pab...@redhat.com, sae...@nvidia.com, songliu...@fb.com, syzkall...@googlegroups.com, to...@toke.dk, vla...@nvidia.com, y...@fb.com
Looks reasonable based on the subsystem: