[syzbot] BUG: unable to handle kernel access to user memory in schedule_tail


syzbot

Mar 10, 2021, 11:46:16 AM
to bri...@redhat.com, bse...@google.com, dietmar....@arm.com, juri....@redhat.com, linux-...@vger.kernel.org, mgo...@suse.de, mi...@redhat.com, pet...@infradead.org, ros...@goodmis.org, syzkall...@googlegroups.com, vincent...@linaro.org
Hello,

syzbot found the following issue on:

HEAD commit: 0d7588ab riscv: process: Fix no prototype for arch_dup_tas..
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes
console output: https://syzkaller.appspot.com/x/log.txt?x=1212c6e6d00000
kernel config: https://syzkaller.appspot.com/x/.config?x=e3c595255fb2d136
dashboard link: https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69
userspace arch: riscv64

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+e74b94...@syzkaller.appspotmail.com

Unable to handle kernel access to user memory without uaccess routines at virtual address 000000002749f0d0
Oops [#1]
Modules linked in:
CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
Hardware name: riscv-virtio,qemu (DT)
epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
ra : task_pid_vnr include/linux/sched.h:1421 [inline]
ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
t5 : ffffffc4043cafba t6 : 0000000000040000
status: 0000000000000120 badaddr: 000000002749f0d0 cause: 000000000000000f
Call Trace:
[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
[<ffffffe000005570>] ret_from_exception+0x0/0x14
Dumping ftrace buffer:
(ftrace buffer empty)
---[ end trace b5f8f9231dc87dda ]---


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

Dmitry Vyukov

Mar 10, 2021, 12:16:29 PM
to syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot
On Wed, Mar 10, 2021 at 5:46 PM syzbot
<syzbot+e74b94...@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 0d7588ab riscv: process: Fix no prototype for arch_dup_tas..
> git tree: git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes
> console output: https://syzkaller.appspot.com/x/log.txt?x=1212c6e6d00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=e3c595255fb2d136
> dashboard link: https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69
> userspace arch: riscv64
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+e74b94...@syzkaller.appspotmail.com

+riscv maintainers

This is riscv64-specific.
I've seen similar crashes in put_user in other places. It looks like
put_user crashes if the user address is not mapped/protected (?).

Ben Dooks

Mar 10, 2021, 5:24:43 PM
to Dmitry Vyukov, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot
On 10/03/2021 17:16, Dmitry Vyukov wrote:
> On Wed, Mar 10, 2021 at 5:46 PM syzbot
> <syzbot+e74b94...@syzkaller.appspotmail.com> wrote:
>>
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit: 0d7588ab riscv: process: Fix no prototype for arch_dup_tas..
>> git tree: git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes
>> console output: https://syzkaller.appspot.com/x/log.txt?x=1212c6e6d00000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=e3c595255fb2d136
>> dashboard link: https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69
>> userspace arch: riscv64
>>
>> Unfortunately, I don't have any reproducer for this issue yet.
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+e74b94...@syzkaller.appspotmail.com
>
> +riscv maintainers
>
> This is riscv64-specific.
> I've seen similar crashes in put_user in other places. It looks like
> put_user crashes if the user address is not mapped/protected (?).

The unmapped case should have been handled.

I think the issue is the check for user-mode access that was added. From
what I read, the code may be wrong in:

+ if (!user_mode(regs) && addr < TASK_SIZE &&
+ unlikely(!(regs->status & SR_SUM)))
+ die_kernel_fault("access to user memory without uaccess routines",
+ addr, regs);

I think the SR_SUM check might be wrong. As I read the standard,
SR_SUM should be set to disable user-space access, so the check
should be unlikely(regs->status & SR_SUM), to flag an access made
without having disabled the protection.

Without this, you can end up with an infinite loop in the fault handler.


--
Ben Dooks http://www.codethink.co.uk/
Senior Engineer Codethink - Providing Genius

https://www.codethink.co.uk/privacy.html

Alex Ghiti

Mar 11, 2021, 1:40:06 AM
to Ben Dooks, Dmitry Vyukov, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot
Hi Ben,

On 3/10/21 at 5:24 PM, Ben Dooks wrote:
The check that is done seems correct to me: "The SUM (permit Supervisor
User Memory access) bit modifies the privilege with which S-mode loads
and stores access virtual memory. *When SUM=0, S-mode memory accesses
to pages that are accessible by U-mode (U=1 in Figure 4.15) will fault*.
When SUM=1, these accesses are permitted. SUM has no effect when
page-based virtual memory is not in effect".
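
The quoted rule can be restated as a small predicate. This is only a user-space sketch of the spec wording, not kernel code: `s_mode_access_faults()` and `sum_rule_examples()` are names made up for illustration, and the only value taken from the kernel is SR_SUM (bit 18 of sstatus).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SR_SUM (UINT64_C(1) << 18) /* sstatus.SUM, the bit the kernel calls SR_SUM */

/*
 * Hypothetical helper: would an S-mode load/store to a mapped page fault
 * because of the SUM rule alone? page_is_user models the U bit of the PTE.
 */
static bool s_mode_access_faults(uint64_t sstatus, bool page_is_user)
{
	if (!page_is_user)
		return false;               /* SUM only matters for U=1 pages */
	return (sstatus & SR_SUM) == 0;     /* SUM=0: fault, SUM=1: permitted */
}

/* Worked cases using the status values that appear in this thread. */
static void sum_rule_examples(void)
{
	assert(s_mode_access_faults(0x120, true));    /* status in the oops: SUM clear */
	assert(!s_mode_access_faults(0x40120, true)); /* same status with SUM set */
	assert(!s_mode_access_faults(0x120, false));  /* kernel-only page: SUM irrelevant */
}
```

Under this reading, the `!(regs->status & SR_SUM)` form of the check matches the spec: a kernel access to a user page with SUM clear is the case that should not happen.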

I will try to reproduce the problem locally.

Thanks,

Alex

Dmitry Vyukov

Mar 11, 2021, 1:50:29 AM
to Alex Ghiti, Ben Dooks, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot
Weird. It crashes with this all the time:
https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69

Even on trivial programs that almost don't do anything.
Maybe it's a qemu bug? Do the registers look sane in the dump (SR_SUM, etc.)?


00:13:27 executing program 1:
openat$drirender128(0xffffffffffffff9c,
&(0x7f0000000040)='/dev/dri/renderD128\x00', 0x0, 0x0)

[ 812.318182][ T4833] Unable to handle kernel access to user memory
without uaccess routines at virtual address 00000000250b60d0
[ 812.322304][ T4833] Oops [#1]
[ 812.323196][ T4833] Modules linked in:
[ 812.324110][ T4833] CPU: 1 PID: 4833 Comm: syz-executor.1 Not
tainted 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
[ 812.325862][ T4833] Hardware name: riscv-virtio,qemu (DT)
[ 812.327561][ T4833] epc : schedule_tail+0x72/0xb2
[ 812.328640][ T4833] ra : schedule_tail+0x70/0xb2
[ 812.330088][ T4833] epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp
: ffffffe0238bbec0
[ 812.331312][ T4833] gp : ffffffe005d25378 tp : ffffffe00a275b00 t0
: 0000000000000000
[ 812.333014][ T4833] t1 : 0000000000000001 t2 : 00000000000f4240 s0
: ffffffe0238bbee0
[ 812.334137][ T4833] s1 : 00000000250b60d0 a0 : 0000000000000036 a1
: 0000000000000003
[ 812.336063][ T4833] a2 : 1ffffffc0cfa8b00 a3 : ffffffe0000c80cc a4
: 7f467e72c6adf800
[ 812.337398][ T4833] a5 : 0000000000000000 a6 : 0000000000f00000 a7
: ffffffe0000f8c84
[ 812.339287][ T4833] s2 : 0000000000040000 s3 : ffffffe0077a96c0 s4
: ffffffe020e67fe0
[ 812.340658][ T4833] s5 : 0000000000004020 s6 : ffffffe0077a9b58 s7
: ffffffe067d74850
[ 812.342492][ T4833] s8 : ffffffe067d73e18 s9 : 0000000000000000
s10: ffffffe00bd72280
[ 812.343668][ T4833] s11: 000000bd067bf638 t3 : 7f467e72c6adf800 t4
: ffffffc403ee7fb2
[ 812.345510][ T4833] t5 : ffffffc403ee7fba t6 : 0000000000040000
[ 812.347004][ T4833] status: 0000000000000120 badaddr:
00000000250b60d0 cause: 000000000000000f
[ 812.348091][ T4833] Call Trace:
[ 812.349291][ T4833] [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2
[ 812.350796][ T4833] [<ffffffe000005570>] ret_from_exception+0x0/0x14
[ 812.352799][ T4833] Dumping ftrace buffer:
[ 812.354328][ T4833] (ftrace buffer empty)
[ 812.428145][ T4833] ---[ end trace 94b077e4d677ee73 ]---


00:10:42 executing program 1:
bpf$ENABLE_STATS(0x20, 0x0, 0x0)
bpf$ENABLE_STATS(0x20, 0x0, 0x0)

[ 646.536862][ T5163] loop0: detected capacity change from 0 to 1
[ 646.566730][ T5165] Unable to handle kernel access to user memory
without uaccess routines at virtual address 00000000032f80d0
[ 646.586024][ T5165] Oops [#1]
[ 646.586640][ T5165] Modules linked in:
[ 646.587350][ T5165] CPU: 1 PID: 5165 Comm: syz-executor.1 Not
tainted 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
[ 646.588209][ T5165] Hardware name: riscv-virtio,qemu (DT)
[ 646.589019][ T5165] epc : schedule_tail+0x72/0xb2
[ 646.589811][ T5165] ra : schedule_tail+0x70/0xb2
[ 646.590435][ T5165] epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp
: ffffffe008013ec0
[ 646.591142][ T5165] gp : ffffffe005d25378 tp : ffffffe007634440 t0
: 0000000000000000
[ 646.591836][ T5165] t1 : 0000000000000001 t2 : 0000000000000008 s0
: ffffffe008013ee0
[ 646.592509][ T5165] s1 : 00000000032f80d0 a0 : 0000000000000004 a1
: 0000000000000003
[ 646.593188][ T5165] a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4
: 8d229faaffda9500
[ 646.593878][ T5165] a5 : 0000000000000000 a6 : 0000000000f00000 a7
: ffffffe000082eba
[ 646.594552][ T5165] s2 : 0000000000040000 s3 : ffffffe00c82c440 s4
: ffffffe00e61ffe0
[ 646.595253][ T5165] s5 : 0000000000004000 s6 : ffffffe067d57e00 s7
: ffffffe067d57850
[ 646.595938][ T5165] s8 : ffffffe067d56e18 s9 : ffffffe067d57e00
s10: ffffffe00c82c878
[ 646.596627][ T5165] s11: 000000967ba7a1cc t3 : 8d229faaffda9500 t4
: ffffffc4011bc79b
[ 646.597319][ T5165] t5 : ffffffc4011bc79d t6 : ffffffe008de3ce8
[ 646.597909][ T5165] status: 0000000000000120 badaddr:
00000000032f80d0 cause: 000000000000000f
[ 646.598682][ T5165] Call Trace:
[ 646.599294][ T5165] [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2
[ 646.600115][ T5165] [<ffffffe000005570>] ret_from_exception+0x0/0x14
[ 646.601333][ T5165] Dumping ftrace buffer:
[ 646.602322][ T5165] (ftrace buffer empty)
[ 646.663691][ T5165] ---[ end trace e7b7847ce74cdfca ]---

Dmitry Vyukov

Mar 11, 2021, 1:52:23 AM
to Alex Ghiti, Ben Dooks, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot
Is it reasonable that schedule_tail is called from ret_from_exception?
Maybe the issue is in ret_from_exception? I see it does something with
registers.

Ben Dooks

Mar 11, 2021, 5:41:51 AM
to Dmitry Vyukov, Alex Ghiti, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot
I'd not noticed this with an earlier kernel (5.10 and the user-fault
check patches), but this may be a qemu issue?

Ben Dooks

Mar 12, 2021, 8:50:00 AM
to Dmitry Vyukov, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot
On 10/03/2021 17:16, Dmitry Vyukov wrote:
> On Wed, Mar 10, 2021 at 5:46 PM syzbot
> <syzbot+e74b94...@syzkaller.appspotmail.com> wrote:
>>
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit: 0d7588ab riscv: process: Fix no prototype for arch_dup_tas..
>> git tree: git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes
>> console output: https://syzkaller.appspot.com/x/log.txt?x=1212c6e6d00000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=e3c595255fb2d136
>> dashboard link: https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69
>> userspace arch: riscv64
>>
>> Unfortunately, I don't have any reproducer for this issue yet.
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+e74b94...@syzkaller.appspotmail.com
>
> +riscv maintainers
>
> This is riscv64-specific.
> I've seen similar crashes in put_user in other places. It looks like
> put_user crashes if the user address is not mapped/protected (?).

I've been having a look, and this seems to be down to access of the
tsk->set_child_tid variable. I assume the fuzzing here is to pass a
bad address to clone?

From looking at the code, the put_user() code should have set the
relevant SR_SUM bit (the value for this, which is 1<<18, is in the
s2 register in the crash report), and from looking at the compiler
output from my gcc-10, the code looks to be doing the relevant csrs
and then csrc around the put_user.

So currently I do not understand how the above could have happened,
other than something re-tried the code sequence and ended up retrying
the faulting instruction without the SR_SUM bit set.
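
To make the register dumps easier to read, here is a small user-space sketch that names the sstatus bits relevant to this thread. The four masks are the values the kernel uses for SR_SIE, SR_SPIE, SR_SPP and SR_SUM; `describe_sstatus()` is a hypothetical helper, and every other sstatus bit is simply ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SR_SIE  UINT64_C(0x00000002) /* supervisor interrupt enable */
#define SR_SPIE UINT64_C(0x00000020) /* previous interrupt enable */
#define SR_SPP  UINT64_C(0x00000100) /* previous privilege was supervisor */
#define SR_SUM  UINT64_C(0x00040000) /* supervisor user memory access, 1 << 18 */

/* Hypothetical helper: write the names of the set bits into buf. */
static const char *describe_sstatus(uint64_t status, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len, "%s%s%s%s",
		 (status & SR_SUM)  ? "SUM "  : "",
		 (status & SR_SPP)  ? "SPP "  : "",
		 (status & SR_SPIE) ? "SPIE " : "",
		 (status & SR_SIE)  ? "SIE "  : "");
	return buf;
}
```

With this decoding, the `status: 0000000000000120` in the oops is SPP and SPIE with SUM clear, while s2 = 0x40000 is exactly the SR_SUM mask.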

Dmitry Vyukov

Mar 12, 2021, 10:12:30 AM
to Ben Dooks, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot
I would maybe blame qemu for randomly resetting SR_SUM, but it's
strange that 99% of these crashes are in schedule_tail. If it were
qemu, then they would be more evenly distributed...

Another observation: looking at a dozen crash logs, in none of
these cases was the fuzzer actually trying to fuzz clone with some insane
arguments. So it looks like completely normal clones (e.g. coming
from pthread_create) result in this crash.

I also wonder why there is ret_from_exception, is it normal? I see
handle_exception disables SR_SUM:
https://elixir.bootlin.com/linux/v5.12-rc2/source/arch/riscv/kernel/entry.S#L73

Alex Ghiti

Mar 12, 2021, 11:26:07 AM
to Dmitry Vyukov, Ben Dooks, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot


On 3/12/21 at 10:12 AM, Dmitry Vyukov wrote:
csrrc does the right thing: it clears the SR_SUM bit in status but saves
the previous value, which will get correctly restored.

("The CSRRC (Atomic Read and Clear Bits in CSR) instruction reads the
value of the CSR, zero-extends the value to XLEN bits, and writes it to
integer register rd. The initial value in integer register rs1 is treated
as a bit mask that specifies bit positions to be cleared in the CSR. Any
bit that is high in rs1 will cause the corresponding bit to be cleared in
the CSR, if that CSR bit is writable. Other bits in the CSR are
unaffected.")
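
The read-then-modify behaviour can be modelled in plain C. This is a sketch of the instruction semantics only, operating on an ordinary variable instead of a real CSR; `csrrs()`, `csrrc()` and `exception_round_trip()` are illustrative names, and the round trip is a simplification of what the entry code does with the saved status.

```c
#include <assert.h>
#include <stdint.h>

#define SR_SUM (UINT64_C(1) << 18)

/* Model of CSRRS: return the old CSR value, then set the bits in mask. */
static uint64_t csrrs(uint64_t *csr, uint64_t mask)
{
	uint64_t old = *csr;
	*csr |= mask;
	return old;
}

/* Model of CSRRC: return the old CSR value, then clear the bits in mask. */
static uint64_t csrrc(uint64_t *csr, uint64_t mask)
{
	uint64_t old = *csr;
	*csr &= ~mask;
	return old;
}

/* Simplified exception entry/exit: SUM is cleared for the handler, then restored. */
static uint64_t exception_round_trip(uint64_t sstatus)
{
	uint64_t saved = csrrc(&sstatus, SR_SUM); /* entry: handler runs with SUM=0 */
	assert((sstatus & SR_SUM) == 0);
	sstatus = saved;                          /* exit: interrupted value restored */
	return sstatus;
}
```

In this model an exception taken while SUM is set returns with SUM still set, which is why the entry path by itself should not lose the bit.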

> https://elixir.bootlin.com/linux/v5.12-rc2/source/arch/riscv/kernel/entry.S#L73

Still no luck for the moment, can't reproduce it locally, my test is
maybe not that good (I created threads all day long in order to trigger
the put_user of schedule_tail).

Given that the path you mention works most of the time, and that the
status register in the stack trace shows the SUM bit is not set whereas
it is set in put_user, I'm leaning toward some race condition (maybe an
interrupt that arrives at the "wrong" time) or a qemu issue as you
mentioned.

To eliminate qemu issues, do you have access to some HW? Or to
different qemu versions?

Ben Dooks

Mar 12, 2021, 11:30:07 AM
to Dmitry Vyukov, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot
So I think if SR_SUM is set, then it faults the access to user memory
which the _user() routines clear to allow them access.

I'm thinking there is at least one issue here:

- the test in fault is the wrong way around for die kernel
- the handler only catches this if the page has yet to be mapped.

So I think the test should be:

if (!user_mode(regs) && addr < TASK_SIZE &&
    unlikely(regs->status & SR_SUM))

This then should continue on and allow the rest of the handler to
complete mapping the page if it is not there.

I have been trying to create a very simple clone test, but so far it
has yet to actually trigger anything.

Ben Dooks

Mar 12, 2021, 11:34:26 AM
to Dmitry Vyukov, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot
I should have added there doesn't seem to be a good way to use mmap()
to allocate memory but not insert a vm-mapping post the mmap().

Ben Dooks

Mar 12, 2021, 11:36:53 AM
to Dmitry Vyukov, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot
How difficult is it to try building a branch with the above test
modified?

Dmitry Vyukov

Mar 12, 2021, 12:35:00 PM
to Ben Dooks, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot
I don't have access to hardware, I don't have other qemu versions ready to use.
But I can teach you how to run syzkaller locally :)
I am not sure anybody has run it on real riscv hardware at all. When
Tobias ported syzkaller, Tobias also used qemu I think.

I am now building with an inverted check to test locally.

I don't fully understand this code, but does handle_exception
reset SR_SUM around do_page_fault? If so, then looking at SR_SUM in
do_page_fault won't work with either a positive or a negative check.

Dmitry Vyukov

Mar 12, 2021, 12:39:09 PM
to Ben Dooks, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot
The inverted check crashes during boot:

--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -249,7 +249,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
 		flags |= FAULT_FLAG_USER;

 	if (!user_mode(regs) && addr < TASK_SIZE &&
-			unlikely(!(regs->status & SR_SUM)))
+			unlikely(regs->status & SR_SUM))
 		die_kernel_fault("access to user memory without uaccess routines",
 				addr, regs);


[ 77.349329][ T1] Run /sbin/init as init process
[ 77.868371][ T1] Unable to handle kernel access to user memory
without uaccess routines at virtual address 00000000000e8e39
[ 77.870355][ T1] Oops [#1]
[ 77.870766][ T1] Modules linked in:
[ 77.871326][ T1] CPU: 0 PID: 1 Comm: init Not tainted
5.12.0-rc2-00010-g0d7588ab9ef9-dirty #42
[ 77.872057][ T1] Hardware name: riscv-virtio,qemu (DT)
[ 77.872620][ T1] epc : __clear_user+0x36/0x4e
[ 77.873285][ T1] ra : padzero+0x9c/0xb0
[ 77.873849][ T1] epc : ffffffe000bb7136 ra : ffffffe0004f42a0 sp
: ffffffe006f8fbc0
[ 77.874438][ T1] gp : ffffffe005d25718 tp : ffffffe006f98000 t0
: 00000000000e8e40
[ 77.875031][ T1] t1 : 00000000000e9000 t2 : 000000000001c49c s0
: ffffffe006f8fbf0
[ 77.875618][ T1] s1 : 00000000000001c7 a0 : 00000000000e8e39 a1
: 00000000000001c7
[ 77.876204][ T1] a2 : 0000000000000002 a3 : 00000000000e9000 a4
: ffffffe006f99000
[ 77.876787][ T1] a5 : 0000000000000000 a6 : 0000000000f00000 a7
: ffffffe00031c088
[ 77.877367][ T1] s2 : 00000000000e8e39 s3 : 0000000000001000 s4
: 0000003ffffffe39
[ 77.877952][ T1] s5 : 00000000000e8e39 s6 : 00000000000e9570 s7
: 00000000000e8e39
[ 77.878535][ T1] s8 : 0000000000000001 s9 : 00000000000e8e39
s10: ffffffe00c65f608
[ 77.879126][ T1] s11: ffffffe00816e8d8 t3 : ea3af0fa372b8300 t4
: 0000000000000003
[ 77.879711][ T1] t5 : ffffffc401dc45d8 t6 : 0000000000040000
[ 77.880209][ T1] status: 0000000000040120 badaddr:
00000000000e8e39 cause: 000000000000000f
[ 77.880846][ T1] Call Trace:
[ 77.881213][ T1] [<ffffffe000bb7136>] __clear_user+0x36/0x4e
[ 77.881912][ T1] [<ffffffe0004f523e>] load_elf_binary+0xf8a/0x2400
[ 77.882562][ T1] [<ffffffe0003e1802>] bprm_execve+0x5b0/0x1080
[ 77.883145][ T1] [<ffffffe0003e38bc>] kernel_execve+0x204/0x288
[ 77.883727][ T1] [<ffffffe003b70e94>] run_init_process+0x1fe/0x212
[ 77.884337][ T1] [<ffffffe003b70ec6>] try_to_run_init_process+0x1e/0x66
[ 77.884956][ T1] [<ffffffe003bc0864>] kernel_init+0x14a/0x200
[ 77.885541][ T1] [<ffffffe000005570>] ret_from_exception+0x0/0x14
[ 77.886955][ T1] ---[ end trace 1e934d07b8a4bed8 ]---
[ 77.887705][ T1] Kernel panic - not syncing: Fatal exception
[ 77.888333][ T1] SMP: stopping secondary CPUs
[ 77.889357][ T1] Rebooting in 86400 seconds..
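
The two polarities of the check can be compared on the status values from the logs in this thread. This is a user-space sketch, not the kernel code: `dies_original()` and `dies_inverted()` are hypothetical names, and `SKETCH_TASK_SIZE` is an assumed placeholder bound (both faulting addresses are far below any plausible value).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SR_SUM           (UINT64_C(1) << 18)
#define SKETCH_TASK_SIZE (UINT64_C(1) << 38) /* assumption: placeholder for TASK_SIZE */

/* Check as it is in the tree: die when user memory is touched with SUM clear. */
static bool dies_original(bool user_mode, uint64_t addr, uint64_t status)
{
	return !user_mode && addr < SKETCH_TASK_SIZE && !(status & SR_SUM);
}

/* Inverted check from the test patch: die when SUM is set. */
static bool dies_inverted(bool user_mode, uint64_t addr, uint64_t status)
{
	return !user_mode && addr < SKETCH_TASK_SIZE && (status & SR_SUM);
}

/* Worked cases taken from the two oopses quoted in this thread. */
static void check_thread_cases(void)
{
	/* schedule_tail oops: badaddr 0x2749f0d0, status 0x120 (SUM clear). */
	assert(dies_original(false, 0x2749f0d0, 0x120));
	assert(!dies_inverted(false, 0x2749f0d0, 0x120));

	/* Boot oops in __clear_user: badaddr 0xe8e39, status 0x40120 (SUM set). */
	assert(!dies_original(false, 0xe8e39, 0x40120));
	assert(dies_inverted(false, 0xe8e39, 0x40120));
}
```

The boot crash is consistent with this: __clear_user is a legitimate uaccess routine running with SUM set, and the inverted check kills exactly that case on its first ordinary page fault.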

Ben Dooks

Mar 12, 2021, 3:12:55 PM
to Alex Ghiti, Dmitry Vyukov, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot
I think there may also be an understanding issue on what the SR_SUM
bit does. I thought if it is set, M->U accesses would fault, which is
why it gets set early on. But from reading the uaccess code it looks
like the uaccess code sets it on entry and then clears on exit.

I am very confused. Is there a master reference for rv64?

https://people.eecs.berkeley.edu/~krste/papers/riscv-privileged-v1.9.pdf
seems to state that PUM is the SR_SUM bit and that, if set, it disables
S-mode access to user memory.

Quote:
The PUM (Protect User Memory) bit modifies the privilege with which
S-mode loads, stores, and instruction fetches access virtual memory.
When PUM=0, translation and protection behave as normal. When PUM=1,
S-mode memory accesses to pages that are accessible by U-mode (U=1 in
Figure 4.19) will fault. PUM has no effect when executing in U-mode


>> https://elixir.bootlin.com/linux/v5.12-rc2/source/arch/riscv/kernel/entry.S#L73
>>
>
> Still no luck for the moment, can't reproduce it locally, my test is
> maybe not that good (I created threads all day long in order to trigger
> the put_user of schedule_tail).

It may of course depend on memory and other stuff. I did try to see if
it was possible to clone() with the child_tid address being a valid but
not mapped page...

> Given that the path you mention works most of the time, and that the
> status register in the stack trace shows the SUM bit is not set whereas
> it is set in put_user, I'm leaning toward some race condition (maybe an
> interrupt that arrives at the "wrong" time) or a qemu issue as you
> mentioned.

I suppose this is possible. From what I read it should get to the
point of being there with the SUM flag cleared, so either something
went wrong in trying to fix the instruction up or there's some other
error we're missing.

> To eliminate qemu issues, do you have access to some HW ? Or to
> different qemu versions ?

I do have access to a Microchip Polarfire board. I just need the
instructions on how to setup the test-code to make it work on the
hardware.

The config supplied takes /ages/ to boot on qemu even on my ryzen9.

Dmitry Vyukov

Mar 13, 2021, 2:21:09 AM
to Ben Dooks, Alex Ghiti, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot
For full syzkaller support, it would need to know how to reboot these
boards and get access to the console.
syzkaller has a stop-gap VM backend which just uses ssh to a physical
machine and expects the kernel to reboot on its own after any crashes.

But I actually managed to reproduce it in an even simpler setup.
Assuming you have Go 1.15 and riscv64 cross-compiler gcc installed

$ go get -u -d github.com/google/syzkaller/...
$ cd $GOPATH/src/github.com/google/syzkaller
$ make stress executor TARGETARCH=riscv64
$ scp bin/linux_riscv64/syz-execprog bin/linux_riscv64/syz-executor your_machine:/

Then run ./syz-stress on the machine.
On the first run it crashed with some other bug; on the second run
I got the crash in schedule_tail.
With qemu tcg I also added -slowdown=10 flag to syz-stress to scale
all timeouts, if native execution is faster, then you don't need it.

Ben Dooks

Mar 15, 2021, 12:55:32 PM
to Dmitry Vyukov, Alex Ghiti, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot
I have built the tools and got it to start.

It would be helpful for the dashboard to give the qemu version and
how it was launched (memory, cpus etc)

Ben Dooks

Mar 15, 2021, 5:38:49 PM
to Dmitry Vyukov, Alex Ghiti, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot
On 13/03/2021 07:20, Dmitry Vyukov wrote:
Ok, not sure what's going on. I get a lot of errors similar to:
>
> 2021/03/15 21:35:20 transitively unsupported: ioctl$SNAPSHOT_CREATE_IMAGE: no syscalls can create resource fd_snapshot, enable some syscalls that can create it [openat$snapshot]

Followed by:

> 2021/03/15 21:35:48 executed 0 programs
> 2021/03/15 21:35:48 failed to create execution environment: failed to mmap shm file: invalid argument

The qemu is 5.2.0 and root is Debian/unstable riscv64 (same as chroot
used to build the syz tools)

Dmitry Vyukov

Mar 16, 2021, 4:53:09 AM
to Ben Dooks, Alex Ghiti, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot
This is not an error, just a notification that some syscalls are not
enabled in the kernel and won't be fuzzed.

> Followed by:
>
> > 2021/03/15 21:35:48 executed 0 programs
> > 2021/03/15 21:35:48 failed to create execution environment: failed to mmap shm file: invalid argument
>
> The qemu is 5.2.0 and root is Debian/unstable riscv64 (same as chroot
> used to build the syz tools)

This is an error, but it is the first time I have ever seen it.
It comes from here:
https://github.com/google/syzkaller/blob/fdb2bb2c23ee709880407f56307e2800ad27e9ae/pkg/osutil/osutil_unix.go#L119-L121
There should be pretty simple logic inside of syscall.Mmap. Perhaps
you are using some older Go toolchain with incomplete riscv support?
I think I've used 1.14 and 1.15. But there is already 1.16. You can
always download a toolchain here:
https://golang.org/dl/

Ben Dooks

Mar 16, 2021, 7:35:55 AM
to Dmitry Vyukov, Alex Ghiti, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot
Hmm it would have been useful to print out what file it failed to map.

I've got go 1.15 from the debian/unstable riscv64 chroot.
I'll have a look at this in a bit to see if it throws the same issue on
a real system.

Dmitry Vyukov

Mar 16, 2021, 7:44:31 AM
to Ben Dooks, Alex Ghiti, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot
What do you want to do with the file name? It's not one of
pre-existing files, so the name won't tell the user much. It's just a
temp file, it won't exist afterwards and it's easy to create an
equivalent file.
It was created in that function with:

	f, err = ioutil.TempFile("./", "syzkaller-shm")
	if err != nil {
		err = fmt.Errorf("failed to create temp file: %v", err)
		return
	}
	if err = f.Truncate(int64(size)); err != nil {
		err = fmt.Errorf("failed to truncate shm file: %v", err)
		f.Close()
		os.Remove(f.Name())
		return
	}
	f.Close()
	fname := f.Name()
	f, err = os.OpenFile(f.Name(), os.O_RDWR, DefaultFilePerm)
	if err != nil {
		err = fmt.Errorf("failed to open shm file: %v", err)
		os.Remove(fname)
		return
	}
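
For anyone who wants to try the same sequence outside of Go, a rough C equivalent is sketched below. It is only an approximation of the steps above: `try_shm_mapping()` is a made-up helper, the mapping is assumed to be a shared read/write one, and the size syzkaller actually passes is not shown here, so the caller has to pick one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a temp file in dir, size it, and map it.
 * Returns 0 on success, -1 on failure after reporting the failing step.
 */
static int try_shm_mapping(const char *dir, size_t size)
{
	char path[4096];
	int n, fd;
	void *mem;

	assert(dir != NULL);
	n = snprintf(path, sizeof(path), "%s/syzkaller-shm-XXXXXX", dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	fd = mkstemp(path);                   /* temp file, like ioutil.TempFile */
	if (fd < 0) {
		perror("mkstemp");
		return -1;
	}
	unlink(path);                         /* the name is not needed once it is open */

	if (ftruncate(fd, (off_t)size) != 0) { /* like f.Truncate(size) */
		perror("ftruncate");
		close(fd);
		return -1;
	}

	/* Assumption: shared read/write mapping of the whole file. */
	mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (mem == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return -1;
	}

	((volatile char *)mem)[0] = 1;        /* touch the mapping once */
	munmap(mem, size);
	close(fd);
	return 0;
}
```

Calling it as `try_shm_mapping(".", some_size)` on the target machine would show whether the failure comes from the kernel side or from the Go toolchain.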

> I've got go 1.15 from the debian/unstable riscv64 chroot.
> I'll have a look at this in a bit to see if it throws the same issue on
> a real system.

Ben Dooks

Mar 18, 2021, 5:41:08 AM
to Dmitry Vyukov, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot, Terry Hu, Javier Jardón
I have reproduced this on qemu, but have not yet managed to get the real
hardware working with this branch.

I have a working hypothesis now, having added debug to check the
sstatus.SR_SUM flag and reviewed the assembly, I think this is
what is happening:

C code of "put_user(func(), address)" is generating code to do:

1: __enable_user_access();
2: cpu_reg = func();
3: assembly for *address = cpu_reg;
4: __disable_user_access();

I think the call to func() with all the sanitisers enabled allows
func() to possibly schedule out. The __switch_to() code does
not restore the original status registers, which means that if
there is IO during the sleep, SR_SUM may end up being cleared and
never re-set. We get back to 3 and fault, as 2 cleared the result of 1.

It is very possible no-one has seen this before, as the functions
involved in feeding put_user() are generally fairly small; unless
the system is both under load and has some reason to schedule, this
bug has probably been rare to unseen.

I think the correct solution is to store the SR_SUM bit status in
the thread_struct and make __switch_to() save/restore this when
changing between tasks/threads. Trying to re-order the code to
force swapping of 1 and 2 may reduce the bug's window.

Further thinking of the order of 1 and 2 is that we should probably
fix that order so that func() is not run with the user-space access
protection disabled.
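
The hypothesis and the proposed save/restore can be modelled in user space. This is a toy sketch under stated assumptions, not the actual patch: `struct sketch_task`, its `sum` field, `sketch_switch_to()` and `put_user_store_faults()` are all invented for illustration, and a single global stands in for the sstatus CSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SR_SUM (UINT64_C(1) << 18)

struct sketch_task {
	uint64_t sum;            /* hypothetical per-thread copy of the SUM bit */
};

static uint64_t sstatus;         /* stands in for the CPU's sstatus CSR */

/* With save_restore=false this models a switch that leaves SUM untouched. */
static void sketch_switch_to(struct sketch_task *prev, struct sketch_task *next,
			     bool save_restore)
{
	if (!save_restore)
		return;
	prev->sum = sstatus & SR_SUM;               /* remember prev's SUM state */
	sstatus = (sstatus & ~SR_SUM) | next->sum;  /* install next's SUM state */
}

/* Another task doing a complete uaccess: enable, store, disable. */
static void other_task_uaccess(void)
{
	sstatus |= SR_SUM;
	sstatus &= ~SR_SUM;
}

/* Returns true if step 3 (the store to user memory) would fault. */
static bool put_user_store_faults(bool save_restore)
{
	struct sketch_task a = { 0 }, b = { 0 };
	bool faults;

	sstatus = 0x120;
	sstatus |= SR_SUM;                        /* 1: __enable_user_access() */
	sketch_switch_to(&a, &b, save_restore);   /* 2: func() schedules out */
	other_task_uaccess();
	sketch_switch_to(&b, &a, save_restore);   /*    ... and back in */
	faults = !(sstatus & SR_SUM);             /* 3: *address = cpu_reg */
	sstatus &= ~SR_SUM;                       /* 4: __disable_user_access() */
	return faults;
}
```

Without the save/restore the store at step 3 runs with SUM clear and faults; with it, the bit set at step 1 survives the round trip through the other task.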

I'll try and make some sort of small test case to avoid having
to run syz-stress to provoke this.

Dmitry Vyukov

Mar 18, 2021, 6:05:45 AM
to Ben Dooks, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot, Terry Hu, Javier Jardón
Ouch!
Can't a preemptible kernel schedule at almost any instruction where
preemption is not disabled explicitly? But if it's disabled, then the
instrumented code won't schedule either, right? I suspect this may be
quite a bad issue for preempt kernels.

Shouldn't __put_user materialize the expression in a local var using
__typeof__ magic before __enable_user_access? I suspect it may
potentially lead to quite bad security implications.
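
A toy version of that idea, to show the ordering difference. This is a user-space sketch and not the kernel's __put_user: both macros, the `user_access_enabled` flag and `produce_value()` are hypothetical, and `__typeof__` is the GCC/Clang extension the kernel already relies on.

```c
#include <assert.h>
#include <stdbool.h>

static bool user_access_enabled;       /* stands in for sstatus.SUM */
static bool evaluated_with_access;     /* what produce_value() observed */

/* Hypothetical value producer that records the state it ran under. */
static int produce_value(void)
{
	evaluated_with_access = user_access_enabled;
	return 42;
}

/* Problematic order: x is evaluated inside the enabled window. */
#define put_user_unsafe_order(x, ptr) do {	\
	user_access_enabled = true;		\
	*(ptr) = (x);				\
	user_access_enabled = false;		\
} while (0)

/* Materialize x into a local first, then open the window for the store only. */
#define put_user_materialized(x, ptr) do {	\
	__typeof__(*(ptr)) val_ = (x);		\
	user_access_enabled = true;		\
	*(ptr) = val_;				\
	user_access_enabled = false;		\
} while (0)

/* Worked example: the same call through both macros. */
static void ordering_example(void)
{
	int dst = 0;

	put_user_unsafe_order(produce_value(), &dst);
	assert(evaluated_with_access);     /* func() ran with access enabled */

	put_user_materialized(produce_value(), &dst);
	assert(!evaluated_with_access);    /* func() ran before access was enabled */
	assert(dst == 42);
}
```

With the second form, whatever the expression does (including sleeping) happens before user access is enabled, so only the store itself runs inside the window.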

It can also make sense to add checks to schedule to check that it's
not called from unexpected contexts.

Ben Dooks

Mar 18, 2021, 8:52:24 AM
to Dmitry Vyukov, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot, Terry Hu, Javier Jardón
I wrote a kernel thread that does:

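#include <linux/delay.h>	/* msleep() */
#include <linux/kthread.h>	/* kthread_should_stop() */
#include <linux/printk.h>
#include <asm/csr.h>		/* SR_SUM */

/* Read the sstatus CSR; volatile so the read is repeated on every call. */
#define rd_sstatus() ({ unsigned long result;				\
	asm volatile("csrr %0, sstatus" : "=r"(result) :: "memory");	\
	result; })

static int test_thread1(void *data)
{
	unsigned int cpu = (unsigned int)(unsigned long)data;
	unsigned long status;

	pr_info("%s: thread starting on cpu %u\n", __func__, cpu);

	while (!kthread_should_stop()) {
		/* This thread does no uaccess, so SUM should never be set here. */
		status = rd_sstatus();
		if (status & SR_SUM)
			printk_ratelimited("%s: found sstaus=0x%08lx\n",
					   __func__, status);
		msleep(1);
	}

	pr_info("%s: thread exiting\n", __func__);
	return 0;
}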

And under the syz-stress I have the following

[ 1192.124018] test_thread1: found sstaus=0x00040022

This thread does not do any IO operations, yet during a stress run
it was entered with SR_SUM set (the 0x00040000) in the sstatus
field.

I think this is proof that #1 this is /rare/ and #2 we need to
make __switch_to() save at least the SR_SUM field.

Dmitry Vyukov

Mar 18, 2021, 10:34:43 AM
to Ben Dooks, Alex Ghiti, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar....@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot
Hi Ben,

syzbot will show info about the qemu version/args in the "VM info" column
when this commit is deployed (should happen by tomorrow):
https://github.com/google/syzkaller/commit/4a3131941837f1fab321bcdfcac13ac4fb480684