[syzbot] [can?] KMSAN: uninit-value in can_receive (3)


syzbot

Sep 5, 2025, 9:36:29 AM
to linu...@vger.kernel.org, linux-...@vger.kernel.org, m...@pengutronix.de, sock...@hartkopp.net, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: c3812b15000c Merge tag 'scsi-fixes' of git://git.kernel.or..
git tree: upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=144ece64580000
kernel config: https://syzkaller.appspot.com/x/.config?x=b9c31a1485dceb8e
dashboard link: https://syzkaller.appspot.com/bug?extid=4b8a1e4690e64b018227
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=177841f8580000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=124ece64580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/7736716fa3f9/disk-c3812b15.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/170d88657b1e/vmlinux-c3812b15.xz
kernel image: https://storage.googleapis.com/syzbot-assets/b06f49d4c006/bzImage-c3812b15.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+4b8a1e...@syzkaller.appspotmail.com

=====================================================
BUG: KMSAN: uninit-value in can_receive+0x219/0x5d0 net/can/af_can.c:654
can_receive+0x219/0x5d0 net/can/af_can.c:654
can_rcv+0x209/0x3a0 net/can/af_can.c:688
__netif_receive_skb_one_core net/core/dev.c:5704 [inline]
__netif_receive_skb+0x42b/0xa00 net/core/dev.c:5817
process_backlog+0x4ad/0xa50 net/core/dev.c:6149
__napi_poll+0xe7/0x980 net/core/dev.c:6902
napi_poll net/core/dev.c:6971 [inline]
net_rx_action+0xa5a/0x19b0 net/core/dev.c:7093
handle_softirqs+0x1a0/0x7c0 kernel/softirq.c:561
__do_softirq+0x14/0x1a kernel/softirq.c:595
do_softirq+0x9a/0x100 kernel/softirq.c:462
__local_bh_enable_ip+0x9f/0xb0 kernel/softirq.c:389
local_bh_enable include/linux/bottom_half.h:33 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:919 [inline]
__dev_queue_xmit+0x2758/0x57d0 net/core/dev.c:4493
dev_queue_xmit include/linux/netdevice.h:3168 [inline]
can_send+0xf1c/0x13b0 net/can/af_can.c:277
isotp_sendmsg+0x1afc/0x2340 net/can/isotp.c:1087
sock_sendmsg_nosec net/socket.c:711 [inline]
__sock_sendmsg+0x30f/0x380 net/socket.c:726
____sys_sendmsg+0x903/0xb60 net/socket.c:2583
___sys_sendmsg+0x28d/0x3c0 net/socket.c:2637
__sys_sendmmsg+0x2ff/0x880 net/socket.c:2726
__do_sys_sendmmsg net/socket.c:2753 [inline]
__se_sys_sendmmsg net/socket.c:2750 [inline]
__x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2750
x64_sys_call+0x33c2/0x3c30 arch/x86/include/generated/asm/syscalls_64.h:308
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f

Uninit was created at:
slab_post_alloc_hook mm/slub.c:4125 [inline]
slab_alloc_node mm/slub.c:4168 [inline]
__do_kmalloc_node mm/slub.c:4297 [inline]
__kmalloc_node_track_caller_noprof+0x945/0x1240 mm/slub.c:4317
kmalloc_reserve+0x23e/0x4a0 net/core/skbuff.c:609
pskb_expand_head+0x226/0x1a60 net/core/skbuff.c:2275
netif_skb_check_for_xdp net/core/dev.c:5081 [inline]
netif_receive_generic_xdp net/core/dev.c:5112 [inline]
do_xdp_generic+0x9e3/0x15a0 net/core/dev.c:5180
__netif_receive_skb_core+0x25c3/0x6f10 net/core/dev.c:5524
__netif_receive_skb_one_core net/core/dev.c:5702 [inline]
__netif_receive_skb+0xca/0xa00 net/core/dev.c:5817
process_backlog+0x4ad/0xa50 net/core/dev.c:6149
__napi_poll+0xe7/0x980 net/core/dev.c:6902
napi_poll net/core/dev.c:6971 [inline]
net_rx_action+0xa5a/0x19b0 net/core/dev.c:7093
handle_softirqs+0x1a0/0x7c0 kernel/softirq.c:561
__do_softirq+0x14/0x1a kernel/softirq.c:595

CPU: 1 UID: 0 PID: 5804 Comm: syz-executor907 Not tainted 6.13.0-rc7-syzkaller-00039-gc3812b15000c #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/27/2024
=====================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Prithvi Tambewagh

Nov 18, 2025, 2:51:38 PM
to m...@pengutronix.de, sock...@hartkopp.net, Prithvi Tambewagh, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

The call trace suggests that the bug is caused by the change in headroom
made by pskb_expand_head(). The new headroom remains uninitialized, and
when can_receive() accesses can_skb_prv(skb)->skbcnt it indirectly reads
skb->head, which triggers the KMSAN uninitialized value report.

To fix this bug, I think we can call can_dropped_invalid_skb() in can_rcv()
just before calling can_receive(). Further, we can add a condition for these
sk_buff with uninitialized headroom to initialize the skb, the way it had
been done in the patch for an earlier packet injection case in a similar
KMSAN bug:
https://lore.kernel.org/linux-can/20191207183418.2...@hartkopp.net/

However, I am not sure on what basis I can filter the sk_buffs so that
only those with an uninitialized headroom are initialized via this path.
Is this the correct approach?

Thank you,
Prithvi

Prithvi Tambewagh

Nov 28, 2025, 12:48:54 PM
to m...@pengutronix.de, sock...@hartkopp.net, Prithvi Tambewagh, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

I would like to ask for feedback on the approach I proposed for fixing the
KMSAN: uninit-value in can_receive bug. Could you please share your
thoughts?

Thank you,
Prithvi Tambewagh

Oliver Hartkopp

Nov 29, 2025, 12:04:26 PM
to Prithvi Tambewagh, m...@pengutronix.de, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello Prithvi,

thanks for picking up this topic!

I had your mail in my open tabs and I was reading some code several
times without having a really good idea how to continue.

On 17.11.25 18:30, Prithvi Tambewagh wrote:

> The call trace suggests that the bug is caused by the change in headroom
> made by pskb_expand_head(). The new headroom remains uninitialized, and
> when can_receive() accesses can_skb_prv(skb)->skbcnt it indirectly reads
> skb->head, which triggers the KMSAN uninitialized value report.

Yes.

If you take a look at the KMSAN message:

https://lore.kernel.org/linux-can/68bae75b.050a022...@google.com/T/#m0372e223746b9da19cbf39348ab1cda52a5cfadc

I wonder why anybody is obviously fiddling with the skb->head here.

When initially creating skb for the CAN subsystem we use
can_skb_reserve() which does a

skb_reserve(skb, sizeof(struct can_skb_priv));

so that we get some headroom for struct can_skb_priv.

Then we access this struct by referencing skb->head:

static inline struct can_skb_priv *can_skb_prv(struct sk_buff *skb)
{
	return (struct can_skb_priv *)(skb->head);
}

If anybody now extends the headroom, skb->head will likely no longer
point to struct can_skb_priv, right?

> To fix this bug, I think we can call can_dropped_invalid_skb() in can_rcv()
> just before calling can_receive(). Further, we can add a condition for these
> sk_buff with uninitialized headroom to initialize the skb, the way it had
> been done in the patch for an earlier packet injection case in a similar
> KMSAN bug:
> https://lore.kernel.org/linux-can/20191207183418.2...@hartkopp.net/

No. This is definitely the wrong approach. You cannot wildly poke values
behind skb->head when the correctly initialized struct can_skb_priv
just sits somewhere else.

In contrast to the case in your referenced patch, we do not get an skb
from PF_PACKET here; we handle an skb that has been properly created in
isotp_sendmsg(), including can_skb_reserve() and an initialized struct
can_skb_priv.

> However, I am not getting on what basis can I filter the sk_buff so that
> only those with an uninitialized headroom will be initialized via this path.
> Is this the correct approach?

No.

When we are creating CAN skbs with [can_]skb_reserve(), the struct
can_skb_priv is located directly "before" the struct can_frame which is
at skb->data.

I'm therefore currently thinking in the direction of using skb->data
instead of skb->head as reference to struct can_skb_priv:

diff --git a/include/linux/can/skb.h b/include/linux/can/skb.h
index 1abc25a8d144..8822d7d2e3df 100644
--- a/include/linux/can/skb.h
+++ b/include/linux/can/skb.h
@@ -60,11 +60,11 @@ struct can_skb_priv {
 	struct can_frame cf[];
 };

 static inline struct can_skb_priv *can_skb_prv(struct sk_buff *skb)
 {
-	return (struct can_skb_priv *)(skb->head);
+	return (struct can_skb_priv *)(skb->data - sizeof(struct can_skb_priv));
 }

 static inline void can_skb_reserve(struct sk_buff *skb)
 {
 	skb_reserve(skb, sizeof(struct can_skb_priv));

I have not checked what effect this might have on this patch

https://lore.kernel.org/linux-can/20191207183418.2...@hartkopp.net/

when we initialize struct can_skb_priv inside skbs we did not create in
the CAN subsystem. The difference would be that we access struct
can_skb_priv via skb->data and not via skb->head. The effect on the
system should be similar.

What do you think about such approach?

Best regards,
Oliver

Prithvi Tambewagh

Nov 30, 2025, 7:04:52 AM
to sock...@hartkopp.net, Prithvi Tambewagh, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello Oliver,

Thanks for the feedback! As I am relatively new to kernel development,
this helped me understand more clearly how struct can_skb_priv is
reserved in the headroom. I agree with your patch.

I tested it locally using the reproducer program for this bug provided by
syzbot and it didn't crash the kernel. Also, I checked the patch here

https://lore.kernel.org/linux-can/20191207183418.2...@hartkopp.net/

looking at it, I think your patch will work fine with the above patch as
well, since data will be accessed at

skb->data - sizeof(struct can_skb_priv)

which is the intended place for it, given that can_skb_reserve()
increases the headroom by sizeof(struct can_skb_priv), reserving the
space just before skb->data.

I think it solves this specific KMSAN bug. Kindly correct me if I am wrong.

Would you like to fix this bug by sending your patch upstream? Or else
shall I send this patch upstream and mention your name in Suggested-by tag?

Thank you,
Prithvi Tambewagh

Oliver Hartkopp

Nov 30, 2025, 7:44:43 AM
to Prithvi Tambewagh, m...@pengutronix.de, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, net...@vger.kernel.org
>> https://lore.kernel.org/linux-can/20191207183418.28868-1-sock...@hartkopp.net/
> https://lore.kernel.org/linux-can/20191207183418.28868-1-sock...@hartkopp.net/
>
> when we initialize struct can_skb_priv inside skbs we did not create in
> the CAN subsystem. The difference would be that we access struct
> can_skb_priv via skb->data and not via skb->head. The effect to the
> system should be similar.
>
> What do you think about such approach?
>
> Best regards,
> Oliver
>

Hello Prithvi,

I'm answering in this mail thread as you answered on the other thread
which does not preserve the discussion above.

On 30.11.25 13:04, Prithvi Tambewagh wrote:
> Hello Oliver,
>
> Thanks for the feedback! I now understand how struct can_skb_priv is
> reserved in the headroom, more clearly, given that I am relatively new
> to kernel development. I agree on your patch.
>
> I tested it locally using the reproducer program for this bug provided by
> syzbot and it didn't crash the kernel. Also, I checked the patch here
>
> https://lore.kernel.org/linux-can/20191207183418.2...@hartkopp.net/
>
> looking at it, I think your patch will work fine with the above patch as
> well, since data will be accessed at
>
> skb->data - sizeof(struct can_skb_priv)
>
> which is the intended place for it, according to te action of
> can_skb_reserve() which increases headroom by length
> sizeof(struct can_skb_priv), reserving the space just before skb->data.
>
> I think it solves this specific KMSAN bug. Kindly correct me if I am wrong.

Yes. It solves that specific bug. But IMO we need to fix the root cause
of this issue.

The CAN skb is passed to NAPI and XDP code

kmalloc_reserve+0x23e/0x4a0 net/core/skbuff.c:609
pskb_expand_head+0x226/0x1a60 net/core/skbuff.c:2275
netif_skb_check_for_xdp net/core/dev.c:5081 [inline]
netif_receive_generic_xdp net/core/dev.c:5112 [inline]
do_xdp_generic+0x9e3/0x15a0 net/core/dev.c:5180
__netif_receive_skb_core+0x25c3/0x6f10 net/core/dev.c:5524

which invoked pskb_expand_head(), which manipulates skb->head and
therefore removes the reference to our struct can_skb_priv.

> Would you like to fix this bug by sending your patch upstream? Or else
> shall I send this patch upstream and mention your name in Suggested-by tag?

No, neither of those, as it will not fix the root cause.

IMO we first need to check who is using the headroom in CAN skbs and for
what reason. And if we are not able to safely control the headroom
for our struct can_skb_priv content, we might need to find another way to
store that content,
e.g. by creating this space behind skb->data or by adding new attributes to
struct sk_buff.

Best regards,
Oliver

Prithvi Tambewagh

Nov 30, 2025, 12:29:39 PM
to Oliver Hartkopp, m...@pungutronix.de, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, net...@vger.kernel.org
Hello Oliver,

Apologies for this. I was using git send-email and probably got the
Message-ID wrong. I have just set up mutt, so this should be correct now.
I will work in this direction. Just to confirm: you mean that we should
first check where the headroom is used and whether the data in the region
covered by struct can_skb_priv stays intact, and if it does not, we need
to ensure by other means that it stays intact, right?

>
>Best regards,
>Oliver

Thank You,
Prithvi

Oliver Hartkopp

Nov 30, 2025, 2:09:53 PM
to Prithvi Tambewagh, Marc Kleine-Budde, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, net...@vger.kernel.org
Hi Prithvi,
I have added skb_dump(KERN_WARNING, skb, true) in my local dummy_can.c
and sent some CAN frames with cansend.

CAN CC:

[ 3351.708018] skb len=16 headroom=16 headlen=16 tailroom=288
mac=(16,0) mac_len=0 net=(16,0) trans=16
shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0x0 start=0 offset=0 ip_summed=1 complete_sw=0 valid=0 level=0)
hash(0x0 sw=0 l4=0) proto=0x000c pkttype=5 iif=0
priority=0x0 mark=0x0 alloc_cpu=5 vlan_all=0x0
encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
[ 3351.708151] dev name=can0 feat=0x0000000000004008
[ 3351.708159] sk family=29 type=3 proto=0
[ 3351.708166] skb headroom: 00000000: 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 3351.708173] skb linear: 00000000: 23 01 00 00 04 00 00 00 11 22 33 44 00 00 00 00

(..)

CAN FD:

[ 3557.069471] skb len=72 headroom=16 headlen=72 tailroom=232
mac=(16,0) mac_len=0 net=(16,0) trans=16
shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0x0 start=0 offset=0 ip_summed=1 complete_sw=0 valid=0 level=0)
hash(0x0 sw=0 l4=0) proto=0x000d pkttype=5 iif=0
priority=0x0 mark=0x0 alloc_cpu=6 vlan_all=0x0
encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
[ 3557.069499] dev name=can0 feat=0x0000000000004008
[ 3557.069507] sk family=29 type=3 proto=0
[ 3557.069513] skb headroom: 00000000: 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 3557.069520] skb linear: 00000000: 33 03 00 00 10 05 00 00 00 11 22 33 44 55 66 77
[ 3557.069526] skb linear: 00000010: 88 aa bb cc dd ee ff 00 00 00 00 00 00 00 00 00

(..)

CAN XL:

[ 5477.498205] skb len=908 headroom=16 headlen=908 tailroom=804
mac=(16,0) mac_len=0 net=(16,0) trans=16
shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0x0 start=0 offset=0 ip_summed=1 complete_sw=0 valid=0 level=0)
hash(0x0 sw=0 l4=0) proto=0x000e pkttype=5 iif=0
priority=0x0 mark=0x0 alloc_cpu=6 vlan_all=0x0
encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
[ 5477.498236] dev name=can0 feat=0x0000000000004008
[ 5477.498244] sk family=29 type=3 proto=0
[ 5477.498251] skb headroom: 00000000: 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 5477.498258] skb linear: 00000000: b0 05 92 00 81 cd 80 03 cd b4 92 58 4c a1 f6 0c
[ 5477.498264] skb linear: 00000010: 1a c9 6d 0a 4c a1 f6 0c 1a c9 6d 0a 4c a1 f6 0c
[ 5477.498269] skb linear: 00000020: 1a c9 6d 0a 4c a1 f6 0c 1a c9 6d 0a 4c a1 f6 0c
[ 5477.498275] skb linear: 00000030: 1a c9 6d 0a 4c a1 f6 0c 1a c9 6d 0a 4c a1 f6 0c


I will also add skb_dump(KERN_WARNING, skb, true) in the CAN receive
path to see what's going on there.

My main problem with the KMSAN message
https://lore.kernel.org/linux-can/68bae75b.050a022...@google.com/
is that it uses

NAPI, XDP and therefore pskb_expand_head():

kmalloc_reserve+0x23e/0x4a0 net/core/skbuff.c:609
pskb_expand_head+0x226/0x1a60 net/core/skbuff.c:2275
netif_skb_check_for_xdp net/core/dev.c:5081 [inline]
netif_receive_generic_xdp net/core/dev.c:5112 [inline]
do_xdp_generic+0x9e3/0x15a0 net/core/dev.c:5180
__netif_receive_skb_core+0x25c3/0x6f10 net/core/dev.c:5524
__netif_receive_skb_one_core net/core/dev.c:5702 [inline]
__netif_receive_skb+0xca/0xa00 net/core/dev.c:5817
process_backlog+0x4ad/0xa50 net/core/dev.c:6149
__napi_poll+0xe7/0x980 net/core/dev.c:6902
napi_poll net/core/dev.c:6971 [inline]

As you can see in
https://syzkaller.appspot.com/x/log.txt?x=144ece64580000

[pid 5804] socket(AF_CAN, SOCK_DGRAM, CAN_ISOTP) = 5
[pid 5804] ioctl(5, SIOCGIFINDEX, {ifr_name="vxcan0", ifr_ifindex=20}) = 0

they are using the vxcan driver which is mainly derived from vcan.c and
veth.c (~2017). The veth.c driver supports all those GRO, NAPI and XDP
features today which vxcan.c still does NOT support.

Therefore I wonder how the NAPI and XDP code can be used together with
vxcan, and whether this is still the case today, as the syzkaller kernel
6.13.0-rc7-syzkaller-00039-gc3812b15000c is already one year old.

Many questions ...

Best regards,
Oliver

Prithvi

Dec 7, 2025, 1:45:13 PM
to Oliver Hartkopp, Marc Kleine-Budde, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, net...@vger.kernel.org
Hello Oliver,

First, I apologize for not getting back to this conversation sooner.
I have exams going on right now, and unfortunately my PC developed a
hardware issue, so I am using an older PC that does not work very well.
Hence I am not able to work on this right now.

However, I look forward to continuing to test this bug as soon as possible.
There are several things to analyse here.

Best regards,
Prithvi

Prithvi

Dec 20, 2025, 12:33:55 PM
to Oliver Hartkopp, Marc Kleine-Budde, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, net...@vger.kernel.org
On Sun, Nov 30, 2025 at 08:09:48PM +0100, Oliver Hartkopp wrote:
Hello Oliver,

I investigated further why the XDP path was chosen in spite of using
vxcan. I looked for dummy_can.c in the upstream tree but could not find
it; I might be missing something here. Could you please tell me where I can
find it? Meanwhile, I used GDB for the analysis.

I observed in the bug's strace log:

[pid 5804] bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_XDP, insn_cnt=3, insns=0x200000c0, license="syzkaller", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_XDP, prog_btf_fd=-1, func_info_rec_size=8, func_info=NULL, func_info_cnt=0, line_info_rec_size=16, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0, fd_array=NULL, ...}, 144) = 3
[pid 5804] socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE) = 4
[pid 5804] sendmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\x34\x00\x00\x00\x10\x00\x01\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80\x40\x01\x00\x00\x00\x01\x00\x0c\x00\x2b\x80\x08\x00\x01\x00\x03\x00\x00\x00\x08\x00\x1b\x00\x00\x00\x00\x00", iov_len=52}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_DONTWAIT|MSG_FASTOPEN}, 0) = 52
[pid 5804] socket(AF_CAN, SOCK_DGRAM, CAN_ISOTP) = 5
[pid 5804] ioctl(5, SIOCGIFINDEX, {ifr_name="vxcan0", ifr_ifindex=20}) = 0

Notably, before binding vxcan0 to the CAN socket, a BPF program is loaded.
I then tried using GDB to check and got the following insights:

(gdb) b vxcan_xmit
Breakpoint 23 at 0xffffffff88ca899e: file drivers/net/can/vxcan.c, line 38.
(gdb) delete 23
(gdb) b __sys_bpf
Breakpoint 24 at 0xffffffff81d2653e: file kernel/bpf/syscall.c, line 5752.
(gdb) b bpf_prog_load
Breakpoint 25 at 0xffffffff81d2cd80: file kernel/bpf/syscall.c, line 2736.
(gdb) b vxcan_xmit if (oskb->dev->name[0]=='v' && ((oskb->dev->name[1]=='x' && oskb->dev->name[2]=='c' && oskb->dev->name[3]=='a' && oskb->dev->name[4]=='n') || (oskb->dev->name[1]=='c' && oskb->dev->name[2]=='a' && oskb->dev->name[3]=='n')))
Breakpoint 26 at 0xffffffff88ca899e: file drivers/net/can/vxcan.c, line 38.
(gdb) b __netif_receive_skb if (skb->dev->name[0]=='v' && ((skb->dev->name[1]=='x' && skb->dev->name[2]=='c' && skb->dev->name[3]=='a' && skb->dev->name[4]=='n') || (skb->dev->name[1]=='c' && skb->dev->name[2]=='a' && skb->dev->name[3]=='n')))
Breakpoint 27 at 0xffffffff8ce3c310: file net/core/dev.c, line 5798.
(gdb) b do_xdp_generic if (pskb->dev->name[0]=='v' && ((pskb->dev->name[1]=='x' && pskb->dev->name[2]=='c' && pskb->dev->name[3]=='a' && pskb->dev->name[4]=='n') || (pskb->dev->name[1]=='c' && pskb->dev->name[2]=='a' && pskb->dev->name[3]=='n')))
Breakpoint 28 at 0xffffffff8cdfccd7: file net/core/dev.c, line 5171.
(gdb) b dev_xdp_attach if (dev->name[0]=='v' && ((dev->name[1]=='x' && dev->name[2]=='c' && dev->name[3]=='a' && dev->name[4]=='n') || (dev->name[1]=='c' && dev->name[2]=='a' && dev->name[3]=='n')))
Breakpoint 29 at 0xffffffff8ce18b4e: file net/core/dev.c, line 9610.

Thread 2 hit Breakpoint 24, __sys_bpf (cmd=cmd@entry=BPF_PROG_LOAD, uattr=..., size=size@entry=144) at kernel/bpf/syscall.c:5752
5752 {
(gdb) c
Continuing.

Thread 2 hit Breakpoint 25, bpf_prog_load (attr=attr@entry=0xffff88811c987d60, uattr=..., uattr_size=144) at kernel/bpf/syscall.c:2736
2736 {
(gdb) c
Continuing.
[Switching to Thread 1.1]

Thread 1 hit Breakpoint 29, dev_xdp_attach (dev=dev@entry=0xffff888124e78000, extack=extack@entry=0xffff88811c987858, link=link@entry=0x0 <fixed_percpu_data>, new_prog=new_prog@entry=0xffffc9000a516000, old_prog=old_prog@entry=0x0 <fixed_percpu_data>, flags=flags@entry=0) at net/core/dev.c:9610
9610 {
(gdb) p dev->name
$104 = "vcan0\000\000\000\000\000\000\000\000\000\000"
(gdb) p dev->xdp_prog
$105 = (struct bpf_prog *) 0x0 <fixed_percpu_data>
(gdb) c
Continuing.

Thread 1 hit Breakpoint 29, dev_xdp_attach (dev=dev@entry=0xffff88818e918000, extack=extack@entry=0xffff88811c987858, link=link@entry=0x0 <fixed_percpu_data>, new_prog=new_prog@entry=0xffffc9000a516000, old_prog=old_prog@entry=0x0 <fixed_percpu_data>, flags=flags@entry=0) at net/core/dev.c:9610
9610 {
(gdb) p dev->name
$106 = "vxcan0\000\000\000\000\000\000\000\000\000"
(gdb) p dev->xdp_prog
$107 = (struct bpf_prog *) 0x0 <fixed_percpu_data>
(gdb) c
Continuing.

Thread 1 hit Breakpoint 29, dev_xdp_attach (dev=dev@entry=0xffff88818e910000, extack=extack@entry=0xffff88811c987858, link=link@entry=0x0 <fixed_percpu_data>, new_prog=new_prog@entry=0xffffc9000a516000, old_prog=old_prog@entry=0x0 <fixed_percpu_data>, flags=flags@entry=0) at net/core/dev.c:9610
9610 {
(gdb) p dev->name
$108 = "vxcan1\000\000\000\000\000\000\000\000\000"
(gdb) p dev->xdp_prog
$109 = (struct bpf_prog *) 0x0 <fixed_percpu_data>
(gdb) c
Continuing.
[Switching to Thread 1.2]

Here, an attempt is made to attach the earlier BPF program to each of the
CAN devices present (I checked only CAN devices, since we are dealing with
the effect of XDP on the CAN networking stack). Before this, none of them
had a BPF program attached, which is why XDP was not attempted for these
CAN devices earlier.

Thread 2 hit Breakpoint 26, vxcan_xmit (oskb=0xffff888115d8a400, dev=0xffff88818e918000) at drivers/net/can/vxcan.c:38
38 {
(gdb) p oskb->dev->name
$110 = "vxcan0\000\000\000\000\000\000\000\000\000"
(gdb) p oskb->dev->xdp_prog
$111 = (struct bpf_prog *) 0xffffc9000a516000
(gdb) c
Continuing.

Thread 2 hit Breakpoint 27, __netif_receive_skb (skb=skb@entry=0xffff888115d8ab00) at net/core/dev.c:5798
5798 {
(gdb) p skb->dev->name
$112 = "vxcan1\000\000\000\000\000\000\000\000\000"
(gdb) p skb->dev->xdp_prog
$113 = (struct bpf_prog *) 0xffffc9000a516000
(gdb) c
Continuing.

Thread 2 hit Breakpoint 28, do_xdp_generic (xdp_prog=0xffffc9000a516000, pskb=0xffff88843fc05af8) at net/core/dev.c:5171
5171 {
(gdb) p pskb->dev->name
$114 = "vxcan1\000\000\000\000\000\000\000\000\000"
(gdb) p pskb->dev->xdp_prog
$115 = (struct bpf_prog *) 0xffffc9000a516000
(gdb) c
Continuing.

After this, the KMSAN bug is triggered. Hence, we can conclude that because of the
BPF program attached earlier, the CAN device takes the generic XDP path during RX,
which is available even though vxcan does not support XDP itself.

It seems that the way CAN devices use the headroom to store private skb-related
data is incompatible with the XDP path: at RX the generic networking stack needs
to expand the head, and after the expansion can_skb_prv() reads the still
uninitialized new headroom through skb->head.

So, I think we can solve this bug in the following ways:

1. As you suggested earlier, access struct can_skb_priv using:
(struct can_skb_priv *)(skb->data - sizeof(struct can_skb_priv))
This keeps the rest of the CAN networking stack working, since it expects
can_skb_priv just before skb->data, and it remains compatible with the
headroom expansion done during generic XDP.

2. Try to find some way so that XDP pathway is rejected by CAN devices at the beginning
itself, like for example in function dev_xdp_attach():

/* don't call drivers if the effective program didn't change */
if (new_prog != cur_prog) {
	bpf_op = dev_xdp_bpf_op(dev, mode);
	if (!bpf_op) {
		NL_SET_ERR_MSG(extack, "Underlying driver does not support XDP in native mode");
		return -EOPNOTSUPP;
	}

	err = dev_xdp_install(dev, mode, bpf_op, extack, flags, new_prog);
	if (err)
		return err;
}

or in some other appropriate way.

What do you think should be done next?

Best Regards,
Prithvi

Oliver Hartkopp

Dec 21, 2025, 1:29:48 PM
to Andrii Nakryiko, Prithvi, Marc Kleine-Budde, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, net...@vger.kernel.org
Hello Andrii,

we have a "KMSAN: uninit value" problem which is created by
netif_skb_check_for_xdp() and later pskb_expand_head().

The CAN netdev interfaces (ARPHRD_CAN) don't have XDP support and the
CAN bus related skbs allocate 16 bytes of private headroom.

Although CAN netdevs don't support XDP the KMSAN issue shows that the
headroom is expanded for CAN skbs and a following access to the CAN skb
private data via skb->head now reads from the beginning of the XDP
expanded head which is (of course) uninitialized.

Prithvi thankfully did some investigation (see below!) which confirmed my
suspicion that "someone is expanding our CAN skb headroom".

Prithvi also proposed two ways to solve the issue (at the end of his
mail below), where I think the first one is a bad hack (although it was
my idea).

The second idea is a change for dev_xdp_attach() where your expertise
would be necessary.

My suggestion would rather be to extend dev_xdp_mode()

https://elixir.bootlin.com/linux/v6.19-rc1/source/net/core/dev.c#L10170

in a way that allows XDP to be completely disabled for CAN skbs, e.g.
with a new XDP_FLAGS_DISABLED that keeps its hands off such skbs entirely.

Do you have any (better) idea how to preserve the private data in the
skb->head of CAN related skbs?

Many thanks and best regards,
Oliver

ps. original mail thread at
https://lore.kernel.org/linux-can/68bae75b.050a022...@google.com/

Marc Kleine-Budde

Dec 21, 2025, 2:06:50 PM
to Oliver Hartkopp, Andrii Nakryiko, Prithvi, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, net...@vger.kernel.org
On 21.12.2025 19:29:37, Oliver Hartkopp wrote:
> we have a "KMSAN: uninit value" problem which is created by
> netif_skb_check_for_xdp() and later pskb_expand_head().
>
> The CAN netdev interfaces (ARPHRD_CAN) don't have XDP support and the CAN
> bus related skbs allocate 16 bytes of pricate headroom.
>
> Although CAN netdevs don't support XDP the KMSAN issue shows that the
> headroom is expanded for CAN skbs and a following access to the CAN skb
> private data via skb->head now reads from the beginning of the XDP expanded
> head which is (of course) uninitialized.
>
> Prithvi thankfully did some investigation (see below!) which proved my
> estimation about "someone is expanding our CAN skb headroom".
>
> Prithvi also proposed two ways to solve the issue (at the end of his mail
> below), where I think the first one is a bad hack (although it was my idea).
>
> The second idea is a change for dev_xdp_attach() where your expertise would
> be necessary.
>
> My sugestion would rather go into the direction to extend dev_xdp_mode()
>
> https://elixir.bootlin.com/linux/v6.19-rc1/source/net/core/dev.c#L10170
>
> in a way that it allows to completely disable XDP for CAN skbs, e.g. with a
> new XDP_FLAGS_DISABLED that completely keeps the hands off such skbs.

That does not sound like a good idea to me.

> Do you have any (better) idea how to preserve the private data in the
> skb->head of CAN related skbs?

We probably have to place the data somewhere else.

regards,
Marc

--
Pengutronix e.K. | Marc Kleine-Budde |
Embedded Linux | https://www.pengutronix.de |
Vertretung Nürnberg | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-9 |

Oliver Hartkopp

Dec 21, 2025, 2:42:20 PM
to Marc Kleine-Budde, Andrii Nakryiko, Prithvi, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, net...@vger.kernel.org


On 21.12.25 20:06, Marc Kleine-Budde wrote:
> On 21.12.2025 19:29:37, Oliver Hartkopp wrote:
>> we have a "KMSAN: uninit value" problem which is created by
>> netif_skb_check_for_xdp() and later pskb_expand_head().
>>
>> The CAN netdev interfaces (ARPHRD_CAN) don't have XDP support and the CAN
>> bus related skbs allocate 16 bytes of pricate headroom.
>>
>> Although CAN netdevs don't support XDP the KMSAN issue shows that the
>> headroom is expanded for CAN skbs and a following access to the CAN skb
>> private data via skb->head now reads from the beginning of the XDP expanded
>> head which is (of course) uninitialized.
>>
>> Prithvi thankfully did some investigation (see below!) which proved my
>> estimation about "someone is expanding our CAN skb headroom".
>>
>> Prithvi also proposed two ways to solve the issue (at the end of his mail
>> below), where I think the first one is a bad hack (although it was my idea).
>>
>> The second idea is a change for dev_xdp_attach() where your expertise would
>> be necessary.
>>
>> My suggestion would rather go in the direction of extending dev_xdp_mode()
>>
>> https://elixir.bootlin.com/linux/v6.19-rc1/source/net/core/dev.c#L10170
>>
>> in a way that it allows to completely disable XDP for CAN skbs, e.g. with a
>> new XDP_FLAGS_DISABLED that completely keeps the hands off such skbs.
>
> That does not sound like a good idea to me.
>
>> Do you have any (better) idea how to preserve the private data in the
>> skb->head of CAN related skbs?
>
> We probably have to place the data somewhere else.

Maybe in the tail room or inside struct sk_buff with some #ifdef
CONFIG_CAN handling?

But let's wait for Andrii's feedback first, whether he is generally
aware of this XDP behavior effect on CAN skbs.
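
The headroom effect quoted above can be reproduced in a small userspace model. This is only a sketch under stated assumptions: `toy_skb`, `toy_priv`, `toy_alloc()` and `toy_expand_head()` are hypothetical stand-ins and not kernel APIs, and the 0xAA fill merely makes the otherwise uninitialized new headroom deterministic. The model assumes that expanding the head places the old buffer content behind the newly added headroom, so a lookup anchored at the start of the buffer no longer finds the private data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the private data kept at the start of the buffer. */
struct toy_priv {
	int ifindex;
	int skbcnt;
	unsigned int frame_len;
};

/* Hypothetical, heavily reduced model of an skb: only head, data and size. */
struct toy_skb {
	unsigned char *head;	/* start of the allocated buffer */
	unsigned char *data;	/* start of the payload */
	size_t size;		/* total buffer size */
};

/* The private data is looked up relative to head, as described in the thread. */
static struct toy_priv *toy_priv_of(struct toy_skb *skb)
{
	return (struct toy_priv *)skb->head;
}

/* Allocate a buffer whose headroom holds exactly one struct toy_priv. */
static int toy_alloc(struct toy_skb *skb, size_t payload, int ifindex)
{
	skb->size = sizeof(struct toy_priv) + payload;
	skb->head = calloc(1, skb->size);
	if (!skb->head)
		return -1;
	skb->data = skb->head + sizeof(struct toy_priv);
	toy_priv_of(skb)->ifindex = ifindex;
	return 0;
}

/* Model of a head expansion: nhead new bytes in front, old content behind. */
static int toy_expand_head(struct toy_skb *skb, size_t nhead)
{
	size_t headroom = (size_t)(skb->data - skb->head);
	unsigned char *buf = malloc(nhead + skb->size);

	if (!buf)
		return -1;
	memset(buf, 0xAA, nhead);	/* poison stands in for "uninitialized" */
	memcpy(buf + nhead, skb->head, skb->size);
	free(skb->head);
	skb->head = buf;
	skb->data = buf + nhead + headroom;
	skb->size += nhead;
	return 0;
}
```

In this model `toy_priv_of()` returns the poisoned bytes after `toy_expand_head()`, while the original values are still present directly in front of `data`.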

Best regards,
Oliver

Prithvi

Jan 2, 2026, 10:36:21 AM
to and...@kernel.org, sock...@hartkopp.net, m...@pengutronix.de, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, net...@vger.kernel.org
Hello Andrii,

Just a gentle ping on this thread

Thanks,
Prithvi

Jakub Kicinski

Jan 2, 2026, 3:04:09 PM
to Prithvi, and...@kernel.org, sock...@hartkopp.net, m...@pengutronix.de, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, net...@vger.kernel.org
On Fri, 2 Jan 2026 21:06:11 +0530 Prithvi wrote:
> Just a gentle ping on this thread

You're asking the wrong person, IIUC Andrii is tangentially involved
in XDP (via bpf links?):

XDP (eXpress Data Path)
M: Alexei Starovoitov <a...@kernel.org>
M: Daniel Borkmann <dan...@iogearbox.net>
M: David S. Miller <da...@davemloft.net>
M: Jakub Kicinski <ku...@kernel.org>
M: Jesper Dangaard Brouer <ha...@kernel.org>
M: John Fastabend <john.fa...@gmail.com>
R: Stanislav Fomichev <s...@fomichev.me>
L: net...@vger.kernel.org
L: b...@vger.kernel.org

Without looking too deeply - XDP has historically left the new space
uninitialized after push, expecting programs to immediately write the
headers in that space. syzbot had run into this in the past but I can't
find any references to past threads quickly :(

Oliver Hartkopp

Jan 3, 2026, 7:20:43 AM
to Jakub Kicinski, Prithvi, and...@kernel.org, m...@pengutronix.de, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, net...@vger.kernel.org
Hello Jakub,

thanks for stepping in!

On 02.01.26 21:04, Jakub Kicinski wrote:

> You're asking the wrong person, IIUC Andrii is tangentially involved
> in XDP (via bpf links?):
>
(..)
>
> Without looking too deeply - XDP has historically left the new space
> uninitialized after push, expecting programs to immediately write the
> headers in that space. syzbot had run into this in the past but I can't
> find any references to past threads quickly :(

To identify Andrii I mainly looked into the code with 'git blame' that
led to this problematic call chain:

pskb_expand_head+0x226/0x1a60 net/core/skbuff.c:2275
netif_skb_check_for_xdp net/core/dev.c:5081 [inline]
netif_receive_generic_xdp net/core/dev.c:5112 [inline]
do_xdp_generic+0x9e3/0x15a0 net/core/dev.c:5180

Keeping in mind that the syzkaller report refers to
6.13.0-rc7-syzkaller-00039-gc3812b15000c, I wonder if we can leave this
report as-is, as the problem might have been solved in the meantime?

In any case I wonder if we should add some code to re-check whether the
headroom of the CAN-related skbs is still consistent and has not been
changed in size by other players, and maybe add a WARN_ON_ONCE() before
dropping the skb in that case.

When the skb headroom is not safe to be used we need to be able to
identify and solve it.
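
Such a re-check could look roughly like the following userspace sketch. All names (`toy_skb`, `toy_can_headroom_ok()`, `toy_can_rcv()`, `TOY_PRIV_SIZE`) are hypothetical. The 16 bytes follow the private headroom size mentioned earlier in the thread, and the `fprintf()` only stands in for a kernel-side WARN_ON_ONCE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define TOY_PRIV_SIZE 16	/* private headroom size mentioned in the thread */

/* Hypothetical, reduced model of an skb: only head and data pointers. */
struct toy_skb {
	unsigned char *head;
	unsigned char *data;
};

static size_t toy_headroom(const struct toy_skb *skb)
{
	return (size_t)(skb->data - skb->head);
}

/* The headroom is consistent only if it still has the size we reserved. */
static bool toy_can_headroom_ok(const struct toy_skb *skb)
{
	return toy_headroom(skb) == TOY_PRIV_SIZE;
}

/* Returns 0 when the skb is accepted and -1 when it is dropped. */
static int toy_can_rcv(const struct toy_skb *skb)
{
	static bool warned;	/* warn only once, then drop silently */

	if (!toy_can_headroom_ok(skb)) {
		if (!warned) {
			warned = true;
			fprintf(stderr, "CAN skb headroom changed: %zu bytes\n",
				toy_headroom(skb));
		}
		return -1;
	}
	return 0;
}
```

A check of this kind only detects the modified headroom and drops the frame; it does not recover the private data.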

Best regards,
Oliver

Jakub Kicinski

Jan 4, 2026, 10:42:26 AM
to Oliver Hartkopp, Prithvi, and...@kernel.org, m...@pengutronix.de, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, net...@vger.kernel.org
Ugh, I should have looked at the report. The struct can_skb_priv
business is highly unconventional for the networking stack.
Would it be possible to kmalloc() this info and pass it to the socket
via shinfo->destructor_arg?

Oliver Hartkopp

Jan 5, 2026, 8:47:18 AM
to Jakub Kicinski, m...@pengutronix.de, Prithvi, and...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, net...@vger.kernel.org
Hello Jakub, all,
I did some more code investigation about struct skb_shared_info which
aims to be "invariant across clones".

Our struct can_skb_priv does the same and looks like this:

/**
 * struct can_skb_priv - private additional data inside CAN sk_buffs
 * @ifindex: ifindex of the first interface the CAN frame appeared on
 * @skbcnt: atomic counter to have an unique id together with skb pointer
 * @frame_len: length of CAN frame in data link layer
 * @cf: align to the following CAN frame at skb->data
 */
struct can_skb_priv {
	int ifindex;
	int skbcnt;
	unsigned int frame_len;
	struct can_frame cf[];
};

Where ifindex and skbcnt need to be invariant across clones.

The frame_len is some intelligent length value caching which might be
solved differently.

As the skbcnt is used as some incremented identifier to identify the CAN
skb in the receive path with RPS, we might use the existing skb->hash
space for it.

For the ifindex I would propose to store it in struct skb_shared_info:

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 86737076101d..f7233b8f461c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -604,10 +604,15 @@ struct skb_shared_info {
 		struct xsk_tx_metadata_compl xsk_meta;
 	};
 	unsigned int gso_type;
 	u32 tskey;

+#if IS_ENABLED(CONFIG_CAN)
+	/* initial CAN iif to avoid routing back to it (can-gw) */
+	int can_iif;
+#endif
+
 	/*
 	 * Warning : all fields before dataref are cleared in __alloc_skb()
 	 */
 	atomic_t dataref;

Would this be a suitable approach to get rid of struct can_skb_priv in
your opinion?

If so I would send three RFC patches:

- remove the need for can_skb_priv::frame_len
- make use of skb->hash instead of can_skb_priv::skbcnt
- move can_skb_priv::ifindex to skb_shared_info::can_iif

Which finally removes struct can_skb_priv and the highly unconventional
skb->head construction.

Best regards,
Oliver

Andrii Nakryiko

Jan 5, 2026, 4:30:26 PM
to Oliver Hartkopp, Jakub Kicinski, Prithvi, and...@kernel.org, m...@pengutronix.de, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, net...@vger.kernel.org
On Sat, Jan 3, 2026 at 4:21 AM Oliver Hartkopp <sock...@hartkopp.net> wrote:
>
> Hello Jakub,
>
> thanks for stepping in!
>
> On 02.01.26 21:04, Jakub Kicinski wrote:
>
> > You're asking the wrong person, IIUC Andrii is tangentially involved
> > in XDP (via bpf links?):
> >
> (..)
> >
> > Without looking too deeply - XDP has historically left the new space
> > uninitialized after push, expecting programs to immediately write the
> > headers in that space. syzbot had run into this in the past but I can't
> > find any references to past threads quickly :(
>
> To identify Andrii I mainly looked into the code with 'git blame' that

Hey, sorry for a late response, I've been out on vacation for the past
~2 weeks. But as Jakub correctly pointed out, I'm probably not the
right person to help with this, I touched XDP bits only superficially
to wire up some generic BPF infrastructure, while the issue at hand
goes deeper than that. I'll let you guys figure this out.

Daniel Borkmann

Jan 5, 2026, 5:11:33 PM
to Jakub Kicinski, Prithvi, and...@kernel.org, sock...@hartkopp.net, m...@pengutronix.de, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, net...@vger.kernel.org
On 1/2/26 9:04 PM, Jakub Kicinski wrote:
[...]
>
> Without looking too deeply - XDP has historically left the new space
> uninitialized after push, expecting programs to immediately write the
> headers in that space.
Correct, and the same also holds true for tc BPF programs. This is an
optimization: zeroing the space and then immediately writing the
headers from the BPF program cannot be reduced by the compiler to
just the latter write. Additionally, loading and attaching these programs
requires root capabilities (e.g. cap_bpf, cap_perfmon, cap_net_admin).

Jakub Kicinski

Jan 5, 2026, 6:26:42 PM
to Oliver Hartkopp, m...@pengutronix.de, Prithvi, and...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, net...@vger.kernel.org
On Mon, 5 Jan 2026 14:47:08 +0100 Oliver Hartkopp wrote:
> For the ifindex I would propose to store it in struct skb_shared_info:
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 86737076101d..f7233b8f461c 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -604,10 +604,15 @@ struct skb_shared_info {
>  		struct xsk_tx_metadata_compl xsk_meta;
>  	};
>  	unsigned int gso_type;
>  	u32 tskey;
>
> +#if IS_ENABLED(CONFIG_CAN)
> +	/* initial CAN iif to avoid routing back to it (can-gw) */
> +	int can_iif;
> +#endif
> +
>  	/*
>  	 * Warning : all fields before dataref are cleared in __alloc_skb()
>  	 */
>  	atomic_t dataref;
>
> Would this be a suitable approach to get rid of struct can_skb_priv in
> your opinion?

Possibly a naive question but why is skb_iif not working here?

Oliver Hartkopp

Jan 6, 2026, 7:04:51 AM
to Jakub Kicinski, m...@pengutronix.de, Prithvi, and...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, net...@vger.kernel.org
With the CAN gateway (net/can/gw.c) incoming CAN frames can be modified
and forwarded to other CAN interfaces via dev_queue_xmit(skb).

When such skb is echo'ed back after successful transmission via
netif_rx() this leads to skb->skb_iif = skb->dev->ifindex;

To prevent a loopback the CAN frame must not be sent back to the
originating interface - even when it has been routed to different CAN
interfaces in the meantime (which always overwrites skb_iif).

Therefore we need to maintain the "real original" incoming interface.

can_iif could also be named skb_initial_iif if someone else needs
such information too.
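
The need for a separate "initial" interface index can be illustrated with a small userspace model. It is a sketch only: `toy_skb`, `toy_netif_rx()` and `toy_gw_may_forward()` are hypothetical names, and the model assumes nothing more than what is described above, namely that every receive overwrites `skb_iif` while `can_iif` is set once and then left alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced model of the interface bookkeeping in an skb. */
struct toy_skb {
	int dev_ifindex;	/* interface the skb is currently attached to */
	int skb_iif;		/* overwritten on every receive */
	int can_iif;		/* first interface the frame appeared on, 0 = unset */
};

/* Receive on ifindex: skb_iif is always overwritten, can_iif is set only once. */
static void toy_netif_rx(struct toy_skb *skb, int ifindex)
{
	skb->dev_ifindex = ifindex;
	skb->skb_iif = ifindex;
	if (!skb->can_iif)
		skb->can_iif = ifindex;
}

/* Gateway decision: never forward a frame to the interface it originated from. */
static bool toy_gw_may_forward(const struct toy_skb *skb, int dst_ifindex)
{
	return skb->can_iif != dst_ifindex;
}
```

After a frame has been received on interface 1 and echoed back on interface 2, `skb_iif` holds 2, so a check based on `skb_iif` alone would allow forwarding back to interface 1. The check on `can_iif` still rejects it.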

Best regards,
Oliver


Jakub Kicinski

Jan 6, 2026, 7:23:09 PM
to Oliver Hartkopp, m...@pengutronix.de, Prithvi, and...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, net...@vger.kernel.org
On Tue, 6 Jan 2026 13:04:41 +0100 Oliver Hartkopp wrote:
> When such skb is echo'ed back after successful transmission via
> netif_rx() this leads to skb->skb_iif = skb->dev->ifindex;
>
> To prevent a loopback the CAN frame must not be sent back to the
> originating interface - even when it has been routed to different CAN
> interfaces in the meantime (which always overwrites skb_iif).
>
> Therefore we need to maintain the "real original" incoming interface.

Alternatively perhaps for this particular use case you could use
something like metadata_dst to mark the frame as forwarded / annotate
with the originating ifindex?

Oliver Hartkopp

Jan 7, 2026, 10:34:22 AM
to Jakub Kicinski, m...@pengutronix.de, Prithvi, and...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, net...@vger.kernel.org
Hello Jakub,
I looked into it, and the way skb_dst is shared in the union behind
cb[] does not look very promising for skbs that wander up and down in
the network layer. It is also pretty complex for just storing a single
interface index integer value.

While looking into _sk_redir to see how the _skb_refdst union is used,
I've seen that the _sk_redir function was removed from struct tcp_skb_cb
(commit e3526bb92a208).

Today we use skb->cb only for passing (address) information from the
network layer to the socket layer and user space. But the space in cb[]
could also hold the content we currently store in the problematic skb
headroom.

Would using skb->cb be a good approach for CAN skbs (that do not have
any of the Ethernet/TCP/IP requirements/features) or will there still be
networking code (besides CAN drivers and CAN network layer) that writes
into cb[] when passing the CAN skb up and down in the stack?

/**
 * struct can_skb_cb - private data inside CAN skb->cb
 * cb[] is 64 bit aligned which is also recommended for struct sockaddr_can
 * @magic: to check if someone wrote to our CAN skb->cb space
 * @flags: extra flags for CAN_RAW and CAN_BCM sockets
 * @can_addr: socket address information to userspace
 * @can_iif: ifindex of the first interface the CAN frame appeared on
 * @skbcnt: atomic counter to have an unique id together with skb pointer
 * @frame_len: bql length cache of CAN frame in data link layer
 */
struct can_skb_cb {
	u32 magic;
	u32 flags;
	struct sockaddr_can can_addr;
	int can_iif;
	int skbcnt;
	unsigned int frame_len;
};

If not: We also don't have vlans nor inner[protocol|headers] in CAN
where we might store the 4 byte can_iif integer ...
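
How the magic value could detect foreign writes to the cb[] space can be sketched in userspace as follows. This is an illustration under assumptions, not the proposed kernel code: all `toy_` names are hypothetical, `TOY_CB_MAGIC` is an arbitrary value, and `struct toy_sockaddr_can` is an opaque placeholder with an assumed size of 24 bytes instead of the real struct sockaddr_can. The 48 bytes correspond to the size of skb->cb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_CB_SIZE	48		/* size of skb->cb */
#define TOY_CB_MAGIC	0x43414e21u	/* arbitrary value chosen for this sketch */

/* Opaque placeholder for struct sockaddr_can; the 24 bytes are an assumption. */
struct toy_sockaddr_can {
	unsigned char opaque[24];
};

struct toy_can_skb_cb {
	uint32_t magic;
	uint32_t flags;
	struct toy_sockaddr_can can_addr;
	int can_iif;
	int skbcnt;
	unsigned int frame_len;
};

/* The private data must fit into the control buffer. */
_Static_assert(sizeof(struct toy_can_skb_cb) <= TOY_CB_SIZE,
	       "toy_can_skb_cb does not fit into cb[]");

/* Hypothetical, reduced model of an skb: only the control buffer. */
struct toy_skb {
	unsigned char cb[TOY_CB_SIZE];
};

/* Store the private data together with the magic value. */
static void toy_can_cb_init(struct toy_skb *skb, int can_iif)
{
	struct toy_can_skb_cb cb = { 0 };

	cb.magic = TOY_CB_MAGIC;
	cb.can_iif = can_iif;
	memcpy(skb->cb, &cb, sizeof(cb));
}

/* Copy the private data out; returns false if the magic value was overwritten. */
static bool toy_can_cb_get(const struct toy_skb *skb, struct toy_can_skb_cb *out)
{
	memcpy(out, skb->cb, sizeof(*out));
	return out->magic == TOY_CB_MAGIC;
}
```

The magic value only reveals that some other code wrote to cb[]. It cannot prevent the overwrite or restore the lost content.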

Oliver Hartkopp

Jan 7, 2026, 2:11:01 PM
to Jakub Kicinski, m...@pengutronix.de, Prithvi, and...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, net...@vger.kernel.org
Sorry for answering myself:

The below idea using skb->cb definitely does not work :-/

But as we never use encapsulation in CAN skbs we can use the
inner_protocol and inner_xxx_header space when skb->encapsulation is false:

union {
	/* encapsulation == true */
	struct {
		union {
			__be16 inner_protocol;
			__u8 inner_ipproto;
		};

		__u16 inner_transport_header;
		__u16 inner_network_header;
		__u16 inner_mac_header;
	};
	/* encapsulation == false */
	struct {
		int can_iif;
		__u16 can_frame_len;
	};
};
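
A userspace sketch of this overlay, assuming that the encapsulation flag alone decides which view of the union is valid. The names `toy_skb`, `toy_inner_hdrs`, `toy_can_ext`, `toy_set_can_iif()` and `toy_get_can_iif()` are hypothetical, and the fixed-width types from `<stdint.h>` replace the kernel types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* View used when encapsulation == true. */
struct toy_inner_hdrs {
	union {
		uint16_t inner_protocol;
		uint8_t inner_ipproto;
	};
	uint16_t inner_transport_header;
	uint16_t inner_network_header;
	uint16_t inner_mac_header;
};

/* View used when encapsulation == false. */
struct toy_can_ext {
	int can_iif;
	uint16_t can_frame_len;
};

/* The CAN view must not need more space than the fields it overlays. */
_Static_assert(sizeof(struct toy_can_ext) <= sizeof(struct toy_inner_hdrs),
	       "CAN data does not fit into the inner header space");

/* Hypothetical, reduced model of an skb: flag plus the shared space. */
struct toy_skb {
	bool encapsulation;
	union {
		struct toy_inner_hdrs inner;
		struct toy_can_ext can;
	};
};

/* Store can_iif only when the space is not used for encapsulation. */
static bool toy_set_can_iif(struct toy_skb *skb, int can_iif)
{
	if (skb->encapsulation)
		return false;
	skb->can.can_iif = can_iif;
	return true;
}

/* Read can_iif only when the space is not used for encapsulation. */
static bool toy_get_can_iif(const struct toy_skb *skb, int *can_iif)
{
	if (skb->encapsulation)
		return false;
	*can_iif = skb->can.can_iif;
	return true;
}
```

In this model the CAN view contains an `int` and therefore has a stricter alignment than the 16-bit inner header fields, which would have to be taken into account in the real structure layout.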


Best regards,
Oliver

Jakub Kicinski

Jan 8, 2026, 10:17:08 AM
to Oliver Hartkopp, m...@pengutronix.de, Prithvi, and...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, net...@vger.kernel.org
On Wed, 7 Jan 2026 16:34:13 +0100 Oliver Hartkopp wrote:
> > Alternatively perhaps for this particular use case you could use
> > something like metadata_dst to mark the frame as forwarded / annotate
> > with the originating ifindex?
>
> I looked into it and the way how skb_dst is shared in the union behind
> cb[] does not look very promising for skbs that wander up and down in
> the network layer.

Maybe I'm misunderstanding, but skb_dst is only unioned with some
socket layer (TCP and sockmsg) fields, not with cb[]. It'd be
problematic if CAN gw frames had to traverse routing but I don't
think they do?

Oliver Hartkopp

Jan 8, 2026, 11:27:36 AM
to Jakub Kicinski, m...@pengutronix.de, Prithvi, and...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, net...@vger.kernel.org
We are using skbs that are e.g. created at socket level and only
contain fixed struct can[|fd|xl]_frames that are written into the CAN
controller's registers at netdev level. The skb is just a dumb container
which passes through the qdiscs and is stored in the CAN device cache
until the CAN frame has been sent successfully on the CAN bus. Once it
has been sent, the skb is echoed back via netif_rx() so that all local
applications can see the real traffic on the CAN bus. So our skbs can go
down and up.

I did some more investigation and created 5 RFC patches that solve the
issue with the problematic private headroom (struct can_skb_priv) in CAN
skbs.

I'll continue testing - but it looks pretty good so far.

Best regards,
Oliver