[syzbot] [mm?] BUG: unable to handle kernel paging request in copy_from_kernel_nofault


syzbot

Nov 19, 2023, 12:53:26 PM
to ak...@linux-foundation.org, b...@alien8.de, b...@suse.de, dave....@linux.intel.com, h...@zytor.com, linux-...@vger.kernel.org, linu...@kvack.org, lu...@kernel.org, mi...@redhat.com, net...@vger.kernel.org, pet...@infradead.org, syzkall...@googlegroups.com, tg...@linutronix.de, x...@kernel.org
Hello,

syzbot found the following issue on:

HEAD commit: 1fda5bb66ad8 bpf: Do not allocate percpu memory at init st..
git tree: bpf
console+strace: https://syzkaller.appspot.com/x/log.txt?x=12d99420e80000
kernel config: https://syzkaller.appspot.com/x/.config?x=2ae0ccd6bfde5eb0
dashboard link: https://syzkaller.appspot.com/bug?extid=72aa0161922eba61b50e
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16dff22f680000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1027dc70e80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/3e24d257ce8d/disk-1fda5bb6.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/eaa9caffb0e4/vmlinux-1fda5bb6.xz
kernel image: https://storage.googleapis.com/syzbot-assets/16182bbed726/bzImage-1fda5bb6.xz

The issue was bisected to:

commit ca247283781d754216395a41c5e8be8ec79a5f1c
Author: Andy Lutomirski <lu...@kernel.org>
Date: Wed Feb 10 02:33:45 2021 +0000

x86/fault: Don't run fixups for SMAP violations

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=103d92db680000
final oops: https://syzkaller.appspot.com/x/report.txt?x=123d92db680000
console output: https://syzkaller.appspot.com/x/log.txt?x=143d92db680000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+72aa01...@syzkaller.appspotmail.com
Fixes: ca247283781d ("x86/fault: Don't run fixups for SMAP violations")

BUG: unable to handle page fault for address: ffffffffff600000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD cd7a067 P4D cd7a067 PUD cd7c067 PMD cd9f067 PTE 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 5071 Comm: syz-executor322 Not tainted 6.6.0-syzkaller-15867-g1fda5bb66ad8 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
RIP: 0010:copy_from_kernel_nofault mm/maccess.c:36 [inline]
RIP: 0010:copy_from_kernel_nofault+0x86/0x240 mm/maccess.c:24
Code: ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ab 01 00 00 41 83 85 6c 17 00 00 01 eb 1e e8 ba 23 cf ff <48> 8b 45 00 49 89 04 24 48 83 c5 08 49 83 c4 08 48 83 eb 08 e8 a1
RSP: 0018:ffffc900038d7ae8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000008 RCX: ffffffff81b8690c
RDX: ffff888016ab0000 RSI: ffffffff81b868e6 RDI: 0000000000000007
RBP: ffffffffff600000 R08: 0000000000000007 R09: 0000000000000007
R10: 0000000000000008 R11: 0000000000000001 R12: ffffc900038d7b30
R13: ffff888016ab0000 R14: dffffc0000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff600000 CR3: 000000000cd77000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
bpf_probe_read_kernel_common include/linux/bpf.h:2747 [inline]
____bpf_probe_read_kernel kernel/trace/bpf_trace.c:236 [inline]
bpf_probe_read_kernel+0x26/0x70 kernel/trace/bpf_trace.c:233
bpf_prog_bd8b22826c103b08+0x42/0x44
bpf_dispatcher_nop_func include/linux/bpf.h:1196 [inline]
__bpf_prog_run include/linux/filter.h:651 [inline]
bpf_prog_run include/linux/filter.h:658 [inline]
__bpf_trace_run kernel/trace/bpf_trace.c:2307 [inline]
bpf_trace_run2+0x14e/0x410 kernel/trace/bpf_trace.c:2346
trace_kfree include/trace/events/kmem.h:94 [inline]
kfree+0xec/0x150 mm/slab_common.c:1043
vma_numab_state_free include/linux/mm.h:638 [inline]
__vm_area_free+0x3e/0x140 kernel/fork.c:525
remove_vma+0x128/0x170 mm/mmap.c:146
exit_mmap+0x453/0xa70 mm/mmap.c:3332
__mmput+0x12a/0x4d0 kernel/fork.c:1349
mmput+0x62/0x70 kernel/fork.c:1371
exit_mm kernel/exit.c:567 [inline]
do_exit+0x9ad/0x2ae0 kernel/exit.c:858
do_group_exit+0xd4/0x2a0 kernel/exit.c:1021
__do_sys_exit_group kernel/exit.c:1032 [inline]
__se_sys_exit_group kernel/exit.c:1030 [inline]
__x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1030
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7fe1c24c2dc9
Code: Unable to access opcode bytes at 0x7fe1c24c2d9f.
RSP: 002b:00007ffd4d4b8dc8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe1c24c2dc9
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
RBP: 00007fe1c253e290 R08: ffffffffffffffb8 R09: 0000000000000006
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe1c253e290
R13: 0000000000000000 R14: 00007fe1c253ece0 R15: 00007fe1c2494030
</TASK>
Modules linked in:
CR2: ffffffffff600000
---[ end trace 0000000000000000 ]---
RIP: 0010:copy_from_kernel_nofault mm/maccess.c:36 [inline]
RIP: 0010:copy_from_kernel_nofault+0x86/0x240 mm/maccess.c:24
Code: ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ab 01 00 00 41 83 85 6c 17 00 00 01 eb 1e e8 ba 23 cf ff <48> 8b 45 00 49 89 04 24 48 83 c5 08 49 83 c4 08 48 83 eb 08 e8 a1
RSP: 0018:ffffc900038d7ae8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000008 RCX: ffffffff81b8690c
RDX: ffff888016ab0000 RSI: ffffffff81b868e6 RDI: 0000000000000007
RBP: ffffffffff600000 R08: 0000000000000007 R09: 0000000000000007
R10: 0000000000000008 R11: 0000000000000001 R12: ffffc900038d7b30
R13: ffff888016ab0000 R14: dffffc0000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff600000 CR3: 000000000cd77000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess), 1 bytes skipped:
0: 03 0f add (%rdi),%ecx
2: b6 14 mov $0x14,%dh
4: 02 48 89 add -0x77(%rax),%cl
7: f8 clc
8: 83 e0 07 and $0x7,%eax
b: 83 c0 03 add $0x3,%eax
e: 38 d0 cmp %dl,%al
10: 7c 08 jl 0x1a
12: 84 d2 test %dl,%dl
14: 0f 85 ab 01 00 00 jne 0x1c5
1a: 41 83 85 6c 17 00 00 addl $0x1,0x176c(%r13)
21: 01
22: eb 1e jmp 0x42
24: e8 ba 23 cf ff call 0xffcf23e3
* 29: 48 8b 45 00 mov 0x0(%rbp),%rax <-- trapping instruction
2d: 49 89 04 24 mov %rax,(%r12)
31: 48 83 c5 08 add $0x8,%rbp
35: 49 83 c4 08 add $0x8,%r12
39: 48 83 eb 08 sub $0x8,%rbx
3d: e8 .byte 0xe8
3e: a1 .byte 0xa1


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Thomas Gleixner

Nov 21, 2023, 12:13:40 PM
to syzbot, ak...@linux-foundation.org, b...@alien8.de, b...@suse.de, dave....@linux.intel.com, h...@zytor.com, linux-...@vger.kernel.org, linu...@kvack.org, lu...@kernel.org, mi...@redhat.com, net...@vger.kernel.org, pet...@infradead.org, syzkall...@googlegroups.com, x...@kernel.org
On Sun, Nov 19 2023 at 09:53, syzbot wrote:
> HEAD commit: 1fda5bb66ad8 bpf: Do not allocate percpu memory at init st..
> git tree: bpf
> console+strace: https://syzkaller.appspot.com/x/log.txt?x=12d99420e80000
> kernel config: https://syzkaller.appspot.com/x/.config?x=2ae0ccd6bfde5eb0
> dashboard link: https://syzkaller.appspot.com/bug?extid=72aa0161922eba61b50e
> compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16dff22f680000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1027dc70e80000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/3e24d257ce8d/disk-1fda5bb6.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/eaa9caffb0e4/vmlinux-1fda5bb6.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/16182bbed726/bzImage-1fda5bb6.xz
>
> The issue was bisected to:
>
> commit ca247283781d754216395a41c5e8be8ec79a5f1c
> Author: Andy Lutomirski <lu...@kernel.org>
> Date: Wed Feb 10 02:33:45 2021 +0000
>
> x86/fault: Don't run fixups for SMAP violations

Reverting that makes the Oops go away, but wrongly so.

> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=103d92db680000
> final oops: https://syzkaller.appspot.com/x/report.txt?x=123d92db680000
> console output: https://syzkaller.appspot.com/x/log.txt?x=143d92db680000
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+72aa01...@syzkaller.appspotmail.com
> Fixes: ca247283781d ("x86/fault: Don't run fixups for SMAP violations")
>
> BUG: unable to handle page fault for address: ffffffffff600000

This is VSYSCALL_ADDR.

So the real question is why the BPF program tries to copy from the
VSYSCALL page, which is not mapped.
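For reference, the faulting address can be checked against the constant by hand. The sketch below assumes the x86 uapi definition of VSYSCALL_ADDR, (-10UL << 20), i.e. 10 MiB below the top of the 64-bit address space; the names with a _MODEL suffix and the helper function are made up for this illustration and are not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model of the x86 definition VSYSCALL_ADDR = (-10UL << 20).
 * The _MODEL name is hypothetical, chosen to avoid clashing with kernel headers.
 */
#define VSYSCALL_ADDR_MODEL ((uint64_t)-10 << 20)

/* Returns nonzero if a faulting address (CR2) is the start of the vsyscall page. */
static int cr2_is_vsyscall_base(uint64_t cr2)
{
	return cr2 == VSYSCALL_ADDR_MODEL;
}
```

With the CR2 value from the report, cr2_is_vsyscall_base(0xffffffffff600000ULL) is true.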

Thanks,

tglx

Jann Horn

Dec 8, 2023, 9:12:18 AM
to Thomas Gleixner, Alexei Starovoitov, Daniel Borkmann, John Fastabend, bpf, syzbot, ak...@linux-foundation.org, b...@alien8.de, b...@suse.de, dave....@linux.intel.com, h...@zytor.com, linux-...@vger.kernel.org, linu...@kvack.org, lu...@kernel.org, mi...@redhat.com, net...@vger.kernel.org, pet...@infradead.org, syzkall...@googlegroups.com, x...@kernel.org
The linked syz repro is:

r0 = bpf$PROG_LOAD(0x5, &(0x7f00000000c0)={0x11, 0xb,
&(0x7f0000000180)=@framed={{}, [@printk={@integer, {}, {}, {}, {},
{0x7, 0x0, 0xb, 0x3, 0x0, 0x0, 0xff600000}, {0x85, 0x0, 0x0, 0x71}}]},
&(0x7f0000000200)='GPL\x00', 0x0, 0x0, 0x0, 0x0, 0x0, '\x00', 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
0x90)
bpf$BPF_RAW_TRACEPOINT_OPEN(0x11,
&(0x7f0000000540)={&(0x7f0000000000)='kfree\x00', r0}, 0x10)

So syzkaller generated a BPF tracing program. 0x85 is BPF_JMP |
BPF_CALL, which is used to invoke BPF helpers; 0x71 is 113, which is
the number of the probe_read_kernel helper, which basically takes
arbitrary values as input and casts them to kernel pointers, and then
probe-reads them. And before that is some kinda ALU op with 0xff600000
as immediate.
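The decoding above can be sketched in plain C. The opcode constants follow the BPF instruction encoding (class in the low 3 bits, operation in the high 4 bits). Reading the ALU op as a 64-bit move-immediate is an assumption about the syz fields, not something the repro states directly; under that assumption the 32-bit immediate is sign-extended, which turns 0xff600000 into the faulting address. The helper names are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode layout from the BPF instruction encoding. */
#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_OP(code)    ((code) & 0xf0)
#define BPF_JMP         0x05
#define BPF_CALL        0x80

/* Helper number named in the message above (0x71). */
#define HELPER_PROBE_READ_KERNEL 113

/* True if the opcode byte encodes a helper call (BPF_JMP | BPF_CALL). */
static int is_helper_call(uint8_t code)
{
	return BPF_CLASS(code) == BPF_JMP && BPF_OP(code) == BPF_CALL;
}

/*
 * Sign-extend a 32-bit immediate to 64 bits, as 64-bit ALU ops with an
 * immediate operand do. Written without signed casts to stay portable.
 * Assumption: the ALU op in the repro is such a 64-bit move.
 */
static uint64_t sext_imm32(uint32_t imm)
{
	return (imm & 0x80000000u) ? (0xffffffff00000000ULL | imm) : imm;
}
```

Here is_helper_call(0x85) is true and sext_imm32(0xff600000) yields 0xffffffffff600000, matching the address in the oops.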

So it looks like the answer to that question is "the BPF program tries
to copy from the VSYSCALL page because syzkaller decided to write BPF
code that does specifically that, and the BPF helper let it do that".

copy_from_kernel_nofault() does check
copy_from_kernel_nofault_allowed() to make sure the pointer really is
a kernel pointer, and the X86 version of that rejects anything in the
userspace part of the address space. But it does not know about the
vsyscall area.
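A minimal userspace model of that check shows the gap. The constants assume 4-level paging (48 virtual address bits, 4 KiB pages), and the canonical-address test is an assumption about the rest of the function, which is not quoted in this thread; all names are made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for 4-level paging; not taken from the thread. */
#define PAGE_SIZE_MODEL     4096ULL
#define TASK_SIZE_MAX_MODEL ((1ULL << 47) - PAGE_SIZE_MODEL)
#define VIRT_BITS_MODEL     48

/* Canonical: bits 63..47 are either all zeros or all ones. */
static bool is_canonical(uint64_t vaddr)
{
	uint64_t top = vaddr >> (VIRT_BITS_MODEL - 1);

	return top == 0 || top == (1ULL << (64 - VIRT_BITS_MODEL + 1)) - 1;
}

/* Model of the check without any vsyscall handling. */
static bool nofault_allowed_model(uint64_t vaddr)
{
	/* Reject userspace addresses, including the guard page. */
	if (vaddr < TASK_SIZE_MAX_MODEL + PAGE_SIZE_MODEL)
		return false;

	return is_canonical(vaddr);
}
```

In this model the user RIP from the report (0x7fe1c24c2dc9) is rejected, while 0xffffffffff600000 is accepted like any other canonical kernel address.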

Thomas Gleixner

Dec 8, 2023, 4:01:19 PM
to Jann Horn, Alexei Starovoitov, Daniel Borkmann, John Fastabend, bpf, syzbot, ak...@linux-foundation.org, b...@alien8.de, b...@suse.de, dave....@linux.intel.com, h...@zytor.com, linux-...@vger.kernel.org, linu...@kvack.org, lu...@kernel.org, mi...@redhat.com, net...@vger.kernel.org, pet...@infradead.org, syzkall...@googlegroups.com, x...@kernel.org
Indeed.

> copy_from_kernel_nofault() does check
> copy_from_kernel_nofault_allowed() to make sure the pointer really is
> a kernel pointer, and the X86 version of that rejects anything in the
> userspace part of the address space. But it does not know about the
> vsyscall area.

That's curable. Untested fix below.

Thanks for the explanation!

tglx

---
diff --git a/arch/x86/mm/maccess.c b/arch/x86/mm/maccess.c
index 6993f026adec..8e846833aa37 100644
--- a/arch/x86/mm/maccess.c
+++ b/arch/x86/mm/maccess.c
@@ -3,6 +3,8 @@
 #include <linux/uaccess.h>
 #include <linux/kernel.h>

+#include <uapi/asm/vsyscall.h>
+
 #ifdef CONFIG_X86_64
 bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
 {
@@ -15,6 +17,9 @@ bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
 	if (vaddr < TASK_SIZE_MAX + PAGE_SIZE)
 		return false;

+	if ((vaddr & PAGE_MASK) == VSYSCALL_ADDR)
+		return false;
+
 	/*
 	 * Allow everything during early boot before 'x86_virt_bits'
 	 * is initialized. Needed for instruction decoding in early
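The effect of the added comparison can be tried out in userspace. This is only a sketch assuming 4 KiB pages and the vsyscall address from the report; the _MODEL names and the function name are made up and are not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed 4 KiB pages; address taken from the oops above. */
#define PAGE_SIZE_MODEL     4096ULL
#define PAGE_MASK_MODEL     (~(PAGE_SIZE_MODEL - 1))
#define VSYSCALL_ADDR_MODEL 0xffffffffff600000ULL

/* Same comparison as the added hunk: does vaddr fall in the vsyscall page? */
static bool starts_in_vsyscall_page(uint64_t vaddr)
{
	return (vaddr & PAGE_MASK_MODEL) == VSYSCALL_ADDR_MODEL;
}
```

Every start address from 0xffffffffff600000 through 0xffffffffff600fff is matched; the neighbouring pages are not. As in the patch, only the start address is compared and size is not consulted.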

Hou Tao

Dec 21, 2023, 12:20:42 PM
to Thomas Gleixner, bpf, syzbot, ak...@linux-foundation.org, b...@alien8.de, b...@suse.de, dave....@linux.intel.com, h...@zytor.com, linux-...@vger.kernel.org, linu...@kvack.org, lu...@kernel.org, mi...@redhat.com, net...@vger.kernel.org, pet...@infradead.org, syzkall...@googlegroups.com, x...@kernel.org, Jann Horn, Alexei Starovoitov, Daniel Borkmann, John Fastabend
Hi Thomas,

On 12/9/2023 5:01 AM, Thomas Gleixner wrote:
> diff --git a/arch/x86/mm/maccess.c b/arch/x86/mm/maccess.c
> index 6993f026adec..8e846833aa37 100644
> --- a/arch/x86/mm/maccess.c
> +++ b/arch/x86/mm/maccess.c
> @@ -3,6 +3,8 @@
>  #include <linux/uaccess.h>
>  #include <linux/kernel.h>
>
> +#include <uapi/asm/vsyscall.h>
> +
>  #ifdef CONFIG_X86_64
>  bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
>  {
> @@ -15,6 +17,9 @@ bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
>  	if (vaddr < TASK_SIZE_MAX + PAGE_SIZE)
>  		return false;
>
> +	if ((vaddr & PAGE_MASK) == VSYSCALL_ADDR)
> +		return false;
> +
>  	/*
>  	 * Allow everything during early boot before 'x86_virt_bits'
>  	 * is initialized. Needed for instruction decoding in early

Tested-by: Hou Tao <hou...@huawei.com>

Could you please post a formal patch for the fix? The patch fixes the
oops when bpf_probe_read_kernel() or similar bpf helpers [1] are used to
read from the vsyscall address, and you can add my Tested-by tag if
necessary.

[1]:
https://lore.kernel.org/bpf/CABOYnLynjBoFZOf3Z4BhaZkc5hx_kHfsjiW+UWLoB=w33L...@mail.gmail.com/
