general protection fault in can_rx_register

syzbot

Jan 17, 2020, 8:46:13 AM
to da...@davemloft.net, dev....@vandijck-laurijssen.be, linu...@vger.kernel.org, linux-...@vger.kernel.org, m...@pengutronix.de, net...@vger.kernel.org, o.re...@pengutronix.de, sock...@hartkopp.net, syzkall...@googlegroups.com
Hello,

syzbot found the following crash on:

HEAD commit: f5ae2ea6 Fix built-in early-load Intel microcode alignment
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1033df15e00000
kernel config: https://syzkaller.appspot.com/x/.config?x=cfbb8fa33f49f9f3
dashboard link: https://syzkaller.appspot.com/bug?extid=c3ea30e1e2485573f953
compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13204f15e00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=138f5db9e00000

The bug was bisected to:

commit 9868b5d44f3df9dd75247acd23dddff0a42f79be
Author: Kurt Van Dijck <dev....@vandijck-laurijssen.be>
Date: Mon Oct 8 09:48:33 2018 +0000

can: introduce CAN_REQUIRED_SIZE macro

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=129bfdb9e00000
final crash: https://syzkaller.appspot.com/x/report.txt?x=119bfdb9e00000
console output: https://syzkaller.appspot.com/x/log.txt?x=169bfdb9e00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+c3ea30...@syzkaller.appspotmail.com
Fixes: 9868b5d44f3d ("can: introduce CAN_REQUIRED_SIZE macro")

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 9593 Comm: syz-executor302 Not tainted 5.5.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:hlist_add_head_rcu include/linux/rculist.h:528 [inline]
RIP: 0010:can_rx_register+0x43b/0x600 net/can/af_can.c:476
Code: 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 89 22 8a fa 4c 89 33 4d 89 e5 49 c1 ed 03 48 b8 00 00 00 00 00 fc ff df <41> 80 7c 05 00 00 74 08 4c 89 e7 e8 c5 21 8a fa 4d 8b 34 24 4c 89
RSP: 0018:ffffc90003e27d00 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff8880a77336c8 RCX: ffff88809306a100
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a77336c0
RBP: ffffc90003e27d58 R08: ffffffff87289cd6 R09: fffff520007c4f94
R10: fffff520007c4f94 R11: 0000000000000000 R12: 0000000000000008
R13: 0000000000000001 R14: ffff88809fbcf000 R15: ffff8880a7733690
FS: 00007fb132f26700(0000) GS:ffff8880aec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000178f590 CR3: 00000000996d6000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
raw_enable_filters net/can/raw.c:189 [inline]
raw_enable_allfilters net/can/raw.c:255 [inline]
raw_bind+0x326/0x1230 net/can/raw.c:428
__sys_bind+0x2bd/0x3a0 net/socket.c:1649
__do_sys_bind net/socket.c:1660 [inline]
__se_sys_bind net/socket.c:1658 [inline]
__x64_sys_bind+0x7a/0x90 net/socket.c:1658
do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x446ba9
Code: e8 0c e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 5b 07 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fb132f25d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
RAX: ffffffffffffffda RBX: 00000000006dbc88 RCX: 0000000000446ba9
RDX: 0000000000000008 RSI: 0000000020000180 RDI: 0000000000000003
RBP: 00000000006dbc80 R08: 00007fb132f26700 R09: 0000000000000000
R10: 00007fb132f26700 R11: 0000000000000246 R12: 00000000006dbc8c
R13: 0000000000000000 R14: 0000000000000000 R15: 068500100000003c
Modules linked in:
---[ end trace 0dedabb13ca8e7d7 ]---
RIP: 0010:hlist_add_head_rcu include/linux/rculist.h:528 [inline]
RIP: 0010:can_rx_register+0x43b/0x600 net/can/af_can.c:476
Code: 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 89 22 8a fa 4c
89 33 4d 89 e5 49 c1 ed 03 48 b8 00 00 00 00 00 fc ff df <41> 80 7c 05 00
00 74 08 4c 89 e7 e8 c5 21 8a fa 4d 8b 34 24 4c 89
RSP: 0018:ffffc90003e27d00 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff8880a77336c8 RCX: ffff88809306a100
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a77336c0
RBP: ffffc90003e27d58 R08: ffffffff87289cd6 R09: fffff520007c4f94
R10: fffff520007c4f94 R11: 0000000000000000 R12: 0000000000000008
R13: 0000000000000001 R14: ffff88809fbcf000 R15: ffff8880a7733690
FS: 00007fb132f26700(0000) GS:ffff8880aec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000178f590 CR3: 00000000996d6000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

Oliver Hartkopp

Jan 17, 2020, 3:03:21 PM
to dev....@vandijck-laurijssen.be, m...@pengutronix.de, o.re...@pengutronix.de, syzbot, da...@davemloft.net, linu...@vger.kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com
Hi Marc, Oleksij, Kurt,
include/linux/rculist.h:528 is

struct hlist_node *first = h->first;

which would mean that 'h' must be NULL.
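
For reference, the register dump in the report suggests the faulting pointer was small rather than exactly zero: R12 holds 0x8, R13 holds 0x1 (R12 shifted right by 3), and RAX holds dffffc0000000000. Below is a minimal user-space sketch of the address-to-shadow mapping that an inline KASAN check would perform, assuming the usual x86-64 generic KASAN parameters (shift of 3, offset 0xdffffc0000000000). The `demo_` names are hypothetical and are not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 generic KASAN parameters; the offset matches RAX in the report. */
#define DEMO_KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL
#define DEMO_KASAN_SHADOW_SHIFT  3

/* Map an address to the shadow byte that an inline KASAN check would read. */
static uint64_t demo_kasan_shadow_addr(uint64_t addr)
{
    return (addr >> DEMO_KASAN_SHADOW_SHIFT) + DEMO_KASAN_SHADOW_OFFSET;
}

/* With 48-bit virtual addresses, bits 63..47 must be all zeros or all ones. */
static bool demo_is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;
    return top == 0 || top == 0x1ffff;
}
```

With an input of 0x8, `demo_kasan_shadow_addr()` yields 0xdffffc0000000001, which `demo_is_canonical()` rejects. A non-canonical shadow address would explain why the report shows a general protection fault instead of a page fault.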

But the h parameter is rcv_list from
rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists);

That call cannot return NULL - at least when dev_rcv_lists is a proper
pointer to the dev_rcv_lists provided by can_dev_rcv_lists_find().

So either dev->ml_priv is NULL in the case of having a CAN interface
(here vxcan) or we have not allocated net->can.rx_alldev_list in
can_pernet_init() properly (which would lead to an -ENOMEM which is
reported to whom?).
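
One way a small non-NULL pointer could reach hlist_add_head_rcu() is sketched here: if the pointer to the per-device list structure is NULL, the address of one of its members is just that member's offset. The struct layout, the enum values and the `demo_` names are assumptions for illustration only; the real definitions live in the kernel's net/can code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's list head and per-device lists. */
struct demo_hlist_head { void *first; };

enum { DEMO_RX_ERR, DEMO_RX_ALL, DEMO_RX_MAX };

struct demo_dev_rcv_lists {
    struct demo_hlist_head rx[DEMO_RX_MAX];
    int entries;
};

/*
 * Compute the address of rx[idx] for a structure located at `base`.
 * Pure arithmetic: nothing is dereferenced, so base == 0 is safe here.
 */
static uintptr_t demo_list_addr(uintptr_t base, int idx)
{
    return base + offsetof(struct demo_dev_rcv_lists, rx)
                + (size_t)idx * sizeof(struct demo_hlist_head);
}
```

On a 64-bit build, `demo_list_addr(0, DEMO_RX_ALL)` evaluates to 8. That would be consistent with R12 = 0x8 in the report if the second list head was selected, but this is an inference and not something the report states.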

Hm. I'm lost. Any ideas?

Regards,
Oliver

Kurt Van Dijck

Jan 20, 2020, 4:11:56 AM
to Oliver Hartkopp, m...@pengutronix.de, o.re...@pengutronix.de, syzbot, da...@davemloft.net, linu...@vger.kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com
If bisect was right with this:

> >The bug was bisected to:
> >
> >commit 9868b5d44f3df9dd75247acd23dddff0a42f79be
> >Author: Kurt Van Dijck <dev....@vandijck-laurijssen.be>
> >Date:   Mon Oct 8 09:48:33 2018 +0000
> >
> >     can: introduce CAN_REQUIRED_SIZE macro

Then I'd start looking in malformed sockaddr_can data instead.
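
To make the sockaddr angle concrete, here is a rough sketch of a "required size" length check of the kind the bisected commit's title refers to. The struct is a simplified, hypothetical stand-in for struct sockaddr_can, and `DEMO_REQUIRED_SIZE` is an illustrative macro, not a copy of the kernel's definition.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-in for struct sockaddr_can. */
struct demo_sockaddr_can {
    unsigned short can_family;
    int can_ifindex;
    union {
        struct { unsigned int rx_id, tx_id; } tp;
    } can_addr;
};

/* Bytes a caller must supply so that `member` lies fully inside the buffer. */
#define DEMO_REQUIRED_SIZE(type, member) \
    (offsetof(type, member) + sizeof(((type *)0)->member))

/* Accept an address only if it is long enough to contain can_ifindex. */
static int demo_bind_len_ok(size_t len)
{
    return len >= DEMO_REQUIRED_SIZE(struct demo_sockaddr_can, can_ifindex);
}
```

With this layout the required size is 8 bytes, which matches the length passed to bind() in the report (RDX = 8). A check of this shape accepts an address that carries only the family and the interface index, so the rest of the structure is never examined.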

Is this code what triggers the bug?
> >C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=138f5db9e00000

Kind regards,
Kurt

Dmitry Vyukov

Jan 20, 2020, 4:23:09 AM
to Oliver Hartkopp, Marc Kleine-Budde, o.re...@pengutronix.de, syzbot, David Miller, linu...@vger.kernel.org, LKML, netdev, syzkaller-bugs
On Mon, Jan 20, 2020 at 10:11 AM Kurt Van Dijck
<dev....@vandijck-laurijssen.be> wrote:
>
> If bisect was right with this:
>
> > >The bug was bisected to:
> > >
> > >commit 9868b5d44f3df9dd75247acd23dddff0a42f79be
> > >Author: Kurt Van Dijck <dev....@vandijck-laurijssen.be>
> > >Date: Mon Oct 8 09:48:33 2018 +0000
> > >
> > > can: introduce CAN_REQUIRED_SIZE macro
>
> Then I'd start looking in malformed sockaddr_can data instead.
>
> Is this code what triggers the bug?
> > >C reproducer: https://syzkaller.appspot.com/x/repro.c?x=138f5db9e00000

yes

Oliver Hartkopp

Jan 20, 2020, 5:02:31 PM
to Dmitry Vyukov, Marc Kleine-Budde, o.re...@pengutronix.de, syzbot, David Miller, linu...@vger.kernel.org, LKML, netdev, syzkaller-bugs
Hi all,

On 20/01/2020 10.22, Dmitry Vyukov wrote:

>> Is this code what triggers the bug?
>>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=138f5db9e00000
>
> yes
>

(..)

>>>> RIP: 0010:hlist_add_head_rcu include/linux/rculist.h:528 [inline]
>>>> RIP: 0010:can_rx_register+0x43b/0x600 net/can/af_can.c:476
>>>
>>> include/linux/rculist.h:528 is
>>>
>>> struct hlist_node *first = h->first;
>>>
>>> which would mean that 'h' must be NULL.
>>>
>>> But the h parameter is rcv_list from
>>> rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists);
>>>
>>> Which can not return NULL - at least when dev_rcv_lists is a proper pointer
>>> to the dev_rcv_lists provided by can_dev_rcv_lists_find().
>>>
>>> So either dev->ml_priv is NULL in the case of having a CAN interface (here
>>> vxcan) ...

Added some code to check whether dev->ml_priv is NULL:

~/linux$ git diff
diff --git a/net/can/af_can.c b/net/can/af_can.c
index 128d37a4c2e0..6fb4ae4c359e 100644
--- a/net/can/af_can.c
+++ b/net/can/af_can.c
@@ -463,6 +463,10 @@ int can_rx_register(struct net *net, struct net_device *dev, canid_t can_id,
         spin_lock_bh(&net->can.rcvlists_lock);

         dev_rcv_lists = can_dev_rcv_lists_find(net, dev);
+        if (!dev_rcv_lists) {
+                pr_err("dev_rcv_lists == NULL! %p\n", dev);
+                goto out_unlock;
+        }
         rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists);

         rcv->can_id = can_id;
@@ -479,6 +483,7 @@ int can_rx_register(struct net *net, struct net_device *dev, canid_t can_id,
         rcv_lists_stats->rcv_entries++;
         rcv_lists_stats->rcv_entries_max = max(rcv_lists_stats->rcv_entries_max,
                                                rcv_lists_stats->rcv_entries);
+out_unlock:
         spin_unlock_bh(&net->can.rcvlists_lock);

         return err;

And the output (after some time) is:

[ 758.505841] netlink: 'crash': attribute type 1 has an invalid length.
[ 758.508045] bond7148: (slave vxcan1): The slave device specified does not support setting the MAC address
[ 758.508057] bond7148: (slave vxcan1): Error -22 calling dev_set_mtu
[ 758.532025] bond10413: (slave vxcan1): The slave device specified does not support setting the MAC address
[ 758.532043] bond10413: (slave vxcan1): Error -22 calling dev_set_mtu
[ 758.532254] dev_rcv_lists == NULL! 000000006b9d257f
[ 758.547392] netlink: 'crash': attribute type 1 has an invalid length.
[ 758.549310] bond7145: (slave vxcan1): The slave device specified does not support setting the MAC address
[ 758.549313] bond7145: (slave vxcan1): Error -22 calling dev_set_mtu
[ 758.550464] netlink: 'crash': attribute type 1 has an invalid length.
[ 758.552301] bond7146: (slave vxcan1): The slave device specified does not support setting the MAC address

So we do get an ml_priv pointer that is NULL, which should not be
possible given this code:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/can/dev.c#n743

Btw. the variable 'size' is set two times at the top of
alloc_candev_mqs() depending on echo_skb_max. This looks wrong.

Best regards,
Oliver

Oliver Hartkopp

Jan 20, 2020, 5:35:33 PM
to Dmitry Vyukov, Marc Kleine-Budde, o.re...@pengutronix.de, syzbot, David Miller, linu...@vger.kernel.org, LKML, netdev, syzkaller-bugs, Kurt Van Dijck
Answering myself ...
This reference doesn't point to the right code, as vxcan has its own
handling to assign ml_priv in vxcan.c.

> Btw. the variable 'size' is set two times at the top of
> alloc_candev_mqs() depending on echo_skb_max. This looks wrong.

No. It looks right; I just did not understand the ALIGN() macro at first sight.
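
For readers who stumble over the same thing, the following is a small sketch of how a size can legitimately be assigned twice when alignment padding is involved. `DEMO_ALIGN` mirrors the common round-up-to-power-of-two idiom; the function, its parameters and the alignment of 32 are hypothetical and only loosely modeled on the description above, not copied from alloc_candev_mqs().

```c
#include <stddef.h>

/* Round x up to the next multiple of a; a must be a power of two. */
#define DEMO_ALIGN(x, a) (((x) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

/*
 * Hypothetical layout: driver private data, then a mid-layer private
 * area, then an optional array of n_echo pointers.
 */
static size_t demo_layout_size(size_t priv, size_t ml_priv, size_t n_echo)
{
    /* First assignment: padded driver area plus the mid-layer area. */
    size_t size = DEMO_ALIGN(priv, 32) + ml_priv;

    /* Second assignment: only if a pointer array has to follow. */
    if (n_echo)
        size = DEMO_ALIGN(size, sizeof(void *)) + n_echo * sizeof(void *);

    return size;
}
```

For example, `DEMO_ALIGN(100, 32)` is 128, so `demo_layout_size(100, 20, 0)` returns 148. The second assignment reuses the first result and does not overwrite it blindly, which is why the pattern is correct.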

But it is still open why dev->ml_priv is not set correctly in vxcan.c as
all the settings for .priv_size and in vxcan_setup look fine.

Best regards,
Oliver

Kurt Van Dijck

Jan 21, 2020, 3:30:47 AM
to Oliver Hartkopp, Dmitry Vyukov, Marc Kleine-Budde, o.re...@pengutronix.de, syzbot, David Miller, linu...@vger.kernel.org, LKML, netdev, syzkaller-bugs
Maybe I got completely lost:
Shouldn't can_ml_priv and vxcan_priv be similar?
Where is the dev_rcv_lists in the vxcan case?

>
> Best regards,
> Oliver

Kurt Van Dijck

Jan 21, 2020, 3:36:11 AM
to Oliver Hartkopp, Dmitry Vyukov, Marc Kleine-Budde, o.re...@pengutronix.de, syzbot, David Miller, linu...@vger.kernel.org, LKML, netdev, syzkaller-bugs
IMHO, net/can/af_can.c:306 is wrong in the vxcan case.

>
> >
> > Best regards,
> > Oliver

Kurt Van Dijck

Jan 21, 2020, 1:54:11 PM
to Oliver Hartkopp, Dmitry Vyukov, Marc Kleine-Budde, o.re...@pengutronix.de, syzbot, David Miller, linu...@vger.kernel.org, LKML, netdev, syzkaller-bugs
On Tue, 21 Jan 2020 09:30:35 +0100, Kurt Van Dijck wrote:
I indeed got completely lost. vxcan_priv & can_ml_priv together form the
private part. I'll continue looking.
>
> >
> > Best regards,
> > Oliver

Oliver Hartkopp

Jan 21, 2020, 2:29:15 PM
to Dmitry Vyukov, Marc Kleine-Budde, o.re...@pengutronix.de, syzbot, David Miller, linu...@vger.kernel.org, LKML, netdev, syzkaller-bugs, Kurt Van Dijck
Hi Kurt,
I added some more debug output:

@@ -463,6 +463,10 @@ int can_rx_register(struct net *net, struct net_device *dev, canid_t can_id,
         spin_lock_bh(&net->can.rcvlists_lock);

         dev_rcv_lists = can_dev_rcv_lists_find(net, dev);
+        if (!dev_rcv_lists) {
+                pr_err("dev_rcv_lists == NULL! %p (%s)\n", dev, dev->name);
+                goto out_unlock;
+        }
         rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists);

         rcv->can_id = can_id;


and the output becomes:

[ 1814.644087] bond5130: (slave vxcan1): The slave device specified does not support setting the MAC address
[ 1814.644106] bond5130: (slave vxcan1): Error -22 calling dev_set_mtu
[ 1814.648867] bond5128: (slave vxcan1): The slave device specified does not support setting the MAC address
[ 1814.648904] bond5128: (slave vxcan1): Error -22 calling dev_set_mtu
[ 1814.649124] dev_rcv_lists == NULL! 000000008e41fb06 (bond5128)
[ 1814.696420] bond5129: (slave vxcan1): The slave device specified does not support setting the MAC address
[ 1814.696438] bond5129: (slave vxcan1): Error -22 calling dev_set_mtu

So it's not the vxcan1 netdev that causes the issue but (sporadically!!)
the bonding netdev.

Interestingly enough, the bonding device bond5128 obviously passes the

        if (dev && dev->type != ARPHRD_CAN)
                return -ENODEV;

test.

?!?
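
The puzzle can be modeled in a few lines: a device whose type field says CAN but whose ml_priv is NULL gets past a type-only check. This is a user-space model with hypothetical `demo_` names; the additional ml_priv test is just one conceivable guard and is not a fix proposed in this thread.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_ARPHRD_CAN 280 /* value of ARPHRD_CAN in <linux/if_arp.h> */

/* Hypothetical, heavily reduced model of a network device. */
struct demo_netdev {
    unsigned short type;
    void *ml_priv;
};

/* Type-only check, in the spirit of the test quoted above. */
static int demo_check_type(const struct demo_netdev *dev)
{
    if (dev && dev->type != DEMO_ARPHRD_CAN)
        return -ENODEV;
    return 0;
}

/* Same check plus an assumed extra guard on the private pointer. */
static int demo_check_type_and_priv(const struct demo_netdev *dev)
{
    int err = demo_check_type(dev);

    if (err)
        return err;
    if (dev && !dev->ml_priv)
        return -ENODEV;
    return 0;
}
```

A device with `type == DEMO_ARPHRD_CAN` and `ml_priv == NULL` is accepted by `demo_check_type()` but rejected by `demo_check_type_and_priv()`, which is the situation the debug output suggests for bond5128.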

Regards,
Oliver

Kurt Van Dijck

Jan 21, 2020, 2:47:14 PM
to Oliver Hartkopp, Dmitry Vyukov, Marc Kleine-Budde, o.re...@pengutronix.de, syzbot, David Miller, linu...@vger.kernel.org, LKML, netdev, syzkaller-bugs
Did you consider my hypothesis I sent you (at 20h22 tonight)?
I don't personally understand all the locks around networking, but your
observation supports my theory of a race condition.

>
> Regards,
> Oliver

syzbot

Apr 20, 2023, 1:19:44 PM
to syzkall...@googlegroups.com
Auto-closing this bug as obsolete.
No recent activity, existing reproducers are no longer triggering the issue.