[syzbot] [can?] KCSAN: data-race in bcm_can_tx / bcm_tx_setup (3)

7 views
Skip to first unread message

syzbot

unread,
Apr 25, 2023, 4:05:00 AM4/25/23
to da...@davemloft.net, edum...@google.com, ku...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, m...@pengutronix.de, net...@vger.kernel.org, pab...@redhat.com, sock...@hartkopp.net, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: 1a0beef98b58 Merge tag 'tpmdd-v6.4-rc1' of git://git.kerne..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1485f1dbc80000
kernel config: https://syzkaller.appspot.com/x/.config?x=501f7c86f7a05a13
dashboard link: https://syzkaller.appspot.com/bug?extid=e1786f049e71693263bf
compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/f06c11683242/disk-1a0beef9.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/5c0a1cd5a059/vmlinux-1a0beef9.xz
kernel image: https://storage.googleapis.com/syzbot-assets/e4c318183ce3/bzImage-1a0beef9.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+e1786f...@syzkaller.appspotmail.com

==================================================================
BUG: KCSAN: data-race in bcm_can_tx / bcm_tx_setup

write to 0xffff888137fcff10 of 4 bytes by task 10792 on cpu 0:
bcm_tx_setup+0x698/0xd30 net/can/bcm.c:995
bcm_sendmsg+0x38b/0x470 net/can/bcm.c:1355
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0x375/0x4c0 net/socket.c:2501
___sys_sendmsg net/socket.c:2555 [inline]
__sys_sendmsg+0x1e3/0x270 net/socket.c:2584
__do_sys_sendmsg net/socket.c:2593 [inline]
__se_sys_sendmsg net/socket.c:2591 [inline]
__x64_sys_sendmsg+0x46/0x50 net/socket.c:2591
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

write to 0xffff888137fcff10 of 4 bytes by interrupt on cpu 1:
bcm_can_tx+0x38a/0x410
bcm_tx_timeout_handler+0xdb/0x260
__run_hrtimer kernel/time/hrtimer.c:1685 [inline]
__hrtimer_run_queues+0x217/0x700 kernel/time/hrtimer.c:1749
hrtimer_run_softirq+0xd6/0x120 kernel/time/hrtimer.c:1766
__do_softirq+0xc1/0x265 kernel/softirq.c:571
invoke_softirq kernel/softirq.c:445 [inline]
__irq_exit_rcu+0x57/0xa0 kernel/softirq.c:650
sysvec_apic_timer_interrupt+0x6d/0x80 arch/x86/kernel/apic/apic.c:1107
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645
kcsan_setup_watchpoint+0x3fe/0x410 kernel/kcsan/core.c:696
string_nocheck lib/vsprintf.c:648 [inline]
string+0x16c/0x200 lib/vsprintf.c:726
vsnprintf+0xa09/0xe20 lib/vsprintf.c:2796
add_uevent_var+0xf0/0x1c0 lib/kobject_uevent.c:665
kobject_uevent_env+0x225/0x5b0 lib/kobject_uevent.c:539
kobject_uevent+0x1c/0x20 lib/kobject_uevent.c:642
__loop_clr_fd+0x1e0/0x3b0 drivers/block/loop.c:1167
lo_release+0xe4/0xf0 drivers/block/loop.c:1745
blkdev_put+0x3fb/0x470
kill_block_super+0x83/0xa0 fs/super.c:1410
deactivate_locked_super+0x6b/0xd0 fs/super.c:331
deactivate_super+0x9b/0xb0 fs/super.c:362
cleanup_mnt+0x272/0x2e0 fs/namespace.c:1177
__cleanup_mnt+0x19/0x20 fs/namespace.c:1184
task_work_run+0x123/0x160 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop+0xd1/0xe0 kernel/entry/common.c:171
exit_to_user_mode_prepare+0x6c/0xb0 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x26/0x140 kernel/entry/common.c:297
do_syscall_64+0x4d/0xc0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0x00000059 -> 0x00000000

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 3096 Comm: syz-executor.5 Not tainted 6.3.0-syzkaller-00113-g1a0beef98b58 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/14/2023
==================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

Dmitry Vyukov

unread,
Apr 25, 2023, 4:37:07 AM4/25/23
to syzbot, da...@davemloft.net, edum...@google.com, ku...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, m...@pengutronix.de, net...@vger.kernel.org, pab...@redhat.com, sock...@hartkopp.net, syzkall...@googlegroups.com
On Tue, 25 Apr 2023 at 10:05, syzbot
<syzbot+e1786f...@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 1a0beef98b58 Merge tag 'tpmdd-v6.4-rc1' of git://git.kerne..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1485f1dbc80000
> kernel config: https://syzkaller.appspot.com/x/.config?x=501f7c86f7a05a13
> dashboard link: https://syzkaller.appspot.com/bug?extid=e1786f049e71693263bf
> compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/f06c11683242/disk-1a0beef9.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/5c0a1cd5a059/vmlinux-1a0beef9.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/e4c318183ce3/bzImage-1a0beef9.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+e1786f...@syzkaller.appspotmail.com

op->currframe and probably other op fields are concurrently
read/modified by both bcm_tx_setup() and bcm_can_tx().
If I am reading the code correctly, it can lead to a wide range of
misbehavior, e.g. sending wrong/uninit data, reading/writing data
out-of-bounds, etc.
I think these functions need to be somehow serialized (stopping timers
before doing any modifications to op?).
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bug...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/00000000000059e1b705fa2494e4%40google.com.

Oliver Hartkopp

unread,
Apr 25, 2023, 5:37:01 PM4/25/23
to Dmitry Vyukov, syzbot, da...@davemloft.net, edum...@google.com, ku...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, m...@pengutronix.de, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
Hello Dmitry,
KCSAN has detected a very special case here:

The content of the CAN frames (in a running tx-job) has been altered and
the number of CAN frames has been reduced. (Increasing if the number of
CAN frames is not possible with an active tx-job/running hrtimer).

Or (alternatively) the TX_RESET_MULTI_IDX flag has been set.

In both cases op->currframe is set to zero to start the sequence of the
CAN frames in op->frames in the next(!) hrtimer execution.

So setting values in op->currframe to zero (as pointed out by KCSAN) is
always a good move.

When there would be a race between the op->currframe++ in bcm_can_tx()
and the test for
if (op->nframes != msg_head->nframes) in bcm_tx_setup() this would be
fixed with
if (op->currframe >= op->nframes) in bcm_can_tx().

But looking at the code again I'm not sure if we might /potentially/
lose the TX_RESET_MULTI_IDX feature when the unlocked op->currframe++ is
performed concurrently in bcm_can_tx().

So a short local locking around the op->currframe r/w operations in
bcm_can_tx() and bcm_tx_setup() would make sense IMO.

The code is intended to update CAN frame content (with a fixed
non-increasing length) lock-less on the fly and there should be no other
"wide range of misbehavior" cases here.

I will take a look and send a patch for the op->currframe locking.

Many thanks for looking into this and best regards,
Oliver

Dmitry Vyukov

unread,
Apr 26, 2023, 3:05:11 AM4/26/23
to Oliver Hartkopp, syzbot, da...@davemloft.net, edum...@google.com, ku...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, m...@pengutronix.de, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
bcm_tx_timeout_handler() must also be racing with bcm_tx_setup() and
it reads more fields (kt_ival1, kt_ival2, flags, count) while they are
being changed.
Can bcm_tx_timeout_handler() read unint/partially
init/inconsistent/stale values for these fields?
Also can't bcm_can_tx() read partially overwritten/messed cf data when
sending, since it's already being overwritten by bcm_tx_setup()?

Oliver Hartkopp

unread,
May 11, 2023, 1:56:33 PM5/11/23
to Dmitry Vyukov, syzbot, da...@davemloft.net, edum...@google.com, ku...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, m...@pengutronix.de, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
Hello Dmitry,
I needed to stare on the code some more time to boil it down to the
relevant and critical variables and functions that are concurrently
modified and executed from user context and the soft hrtimer.

And finally I had to figure out which kind of locking has to be used here:

https://docs.kernel.org/kernel-hacking/locking.html
"Locking Between User Context and Timers"

The RFC patch can be found here:
https://lore.kernel.org/linux-can/20230511174644....@hartkopp.net/T/#u

I'll do some more tests to check if the locking creates some 'real world
problems'.

Many thanks for your support!

Oliver

syzbot

unread,
Aug 6, 2023, 6:49:37 AM8/6/23
to syzkall...@googlegroups.com
Auto-closing this bug as obsolete.
Crashes did not happen for a while, no reproducer and no activity.
Reply all
Reply to author
Forward
0 new messages