BUG: unable to handle kernel paging request in __switch_to

syzbot

Dec 3, 2017, 12:49:03 PM

to b...@suse.de, dsaf...@virtuozzo.com, h...@zytor.com, linux-...@vger.kernel.org, lu...@kernel.org, m...@kylehuey.com, mi...@redhat.com, syzkall...@googlegroups.com, tg...@linutronix.de, x...@kernel.org

Hello,

syzkaller hit the following crash on
d127129e85a020879f334154300ddd3f7ec21c1e
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
compiler: gcc (GCC) 7.1.1 20170620
.config is attached
Raw console output is attached.

Unfortunately, I don't have any reproducer for this bug yet.

*** Guest State ***
BUG: unable to handle kernel paging request at fffffffffffffff8
IP: switch_fpu_prepare arch/x86/include/asm/fpu/internal.h:535 [inline]
IP: __switch_to+0x95b/0x1330 arch/x86/kernel/process_64.c:407
PGD 5e28067 P4D 5e28067 PUD 5e2a067 PMD 0
Oops: 0002 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 4355 Comm: syz-executor1 Not tainted 4.15.0-rc1-next-20171129+ #55
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801cf1e80c0 task.stack: ffff8801d03a8000
RIP: 0010:switch_fpu_prepare arch/x86/include/asm/fpu/internal.h:535 [inline]
RIP: 0010:__switch_to+0x95b/0x1330 arch/x86/kernel/process_64.c:407
RSP: 0018:ffff8801cb867468 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff8801cc0b8500 RCX: ffff8801cc0b9a00
RDX: 1ffff10039e3d2d0 RSI: 0000000000000000 RDI: ffff8801cf1e96c0
RBP: ffff8801cb867628 R08: ffff8801db427918 R09: 1ffff1003a075dfe
R10: ffff8801cf1e80c0 R11: 0000000000000003 R12: ffff8801cf1e80c0
R13: ffff8801cf1e96c0 R14: ffff8801cf1e9680 R15: ffff8801cf1e95c0
FS: 00007f16e6ea0700(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffffffffffff8 CR3: 00000001cc778000 CR4: 00000000001426f0
Call Trace:
Code: b8 00 00 00 00 00 fc ff df 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e d5 06 00 00 8b 85 70 fe ff ff 41 89 84 24 c0 15 00 00 <cc> 1f 44 00 00 65 8b 05 99 01 dc 7e 89 c0 48 0f a3 05 df 97 39
RIP: switch_fpu_prepare arch/x86/include/asm/fpu/internal.h:535 [inline] RSP: ffff8801cb867468
RIP: __switch_to+0x95b/0x1330 arch/x86/kernel/process_64.c:407 RSP: ffff8801cb867468
CR2: fffffffffffffff8
---[ end trace 6254ce9c3b92dfb6 ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
(ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..

---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzk...@googlegroups.com.
Please credit me with: Reported-by: syzbot <syzk...@googlegroups.com>

syzbot will keep track of this bug report.
Once a fix for this bug is committed, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug
report.
Note: all commands must start from beginning of the line in the email body.

config.txt

raw.log

Thomas Gleixner

Dec 14, 2017, 12:12:45 PM

to syzbot, b...@suse.de, dsaf...@virtuozzo.com, h...@zytor.com, linux-...@vger.kernel.org, lu...@kernel.org, m...@kylehuey.com, mi...@redhat.com, syzkall...@googlegroups.com, x...@kernel.org

<cc> is an int3 !?!?!

> 8b 05 99 01 dc 7e 89 c0 48 0f a3 05 df 97 39

That's the second report I'm staring at today which has CR2
fffffffffffffffx and points to a faulting instruction which does not make
any sense at all.

Thanks,

tglx

Linus Torvalds

Dec 14, 2017, 1:42:10 PM

to Thomas Gleixner, syzbot, Borislav Petkov, Dmitry Safonov, Peter Anvin, Linux Kernel Mailing List, Andrew Lutomirski, Kyle Huey, Ingo Molnar, syzkall...@googlegroups.com, the arch/x86 maintainers

On Thu, Dec 14, 2017 at 9:12 AM, Thomas Gleixner <tg...@linutronix.de> wrote:
> On Sun, 3 Dec 2017, syzbot wrote:
>> BUG: unable to handle kernel paging request at fffffffffffffff8

>> Oops: 0002 [#1] SMP KASAN

System write of a non-existent page.
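Linus is reading the "Oops: 0002" field, which for a page fault is the hardware error code, printed in hex. A minimal decoding sketch, covering only the low five bits described in the Intel SDM; the macro, struct and function names are hypothetical and are not the kernel's own:

```c
#include <stdbool.h>

/* Low bits of the x86 page-fault error code, per the Intel SDM. */
#define PF_BIT_PROT  (1u << 0) /* 0: page not present, 1: protection violation */
#define PF_BIT_WRITE (1u << 1) /* 0: read access, 1: write access */
#define PF_BIT_USER  (1u << 2) /* 0: supervisor ("system") access, 1: user access */
#define PF_BIT_RSVD  (1u << 3) /* reserved bit set in a paging-structure entry */
#define PF_BIT_INSTR (1u << 4) /* fault was an instruction fetch */

struct pf_info {
    bool not_present;  /* no mapping for the address at all */
    bool write;
    bool user;
    bool reserved_bit;
    bool instr_fetch;
};

/* Decode the hex value printed after "Oops:" for a page fault. */
struct pf_info decode_pf_error(unsigned code)
{
    struct pf_info info = {
        .not_present  = !(code & PF_BIT_PROT),
        .write        = (code & PF_BIT_WRITE) != 0,
        .user         = (code & PF_BIT_USER) != 0,
        .reserved_bit = (code & PF_BIT_RSVD) != 0,
        .instr_fetch  = (code & PF_BIT_INSTR) != 0,
    };
    return info;
}
```

For this report, decode_pf_error(0x0002) gives not_present and write set and user clear: a supervisor write to a non-present page.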

>> RIP: 0010:switch_fpu_prepare arch/x86/include/asm/fpu/internal.h:535 [inline]
>> RIP: 0010:__switch_to+0x95b/0x1330 arch/x86/kernel/process_64.c:407

This says it's

old_fpu->last_cpu = cpu;

and the code disassembly ends up looking something like this:

0: 48 c1 ea 03 shr $0x3,%rdx
4: 0f b6 04 02 movzbl (%rdx,%rax,1),%eax
8: 84 c0 test %al,%al
a: 74 08 je 0x14
c: 3c 03 cmp $0x3,%al
e: 0f 8e d5 06 00 00 jle 0x6e9
14: 8b 85 70 fe ff ff mov -0x190(%rbp),%eax
1a: 41 89 84 24 c0 15 00 mov %eax,0x15c0(%r12)
21: 00
22:* cc int3 <-- trapping instruction

where the preceding two "mov" instructions look like they might indeed be that

old_fpu->last_cpu = cpu;

thing, and the register state doesn't look insane for this.
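The disassembly can be lined up with the oops because the "Code:" line prints a window of instruction bytes around the faulting RIP and wraps the byte at RIP in angle brackets. A small sketch that finds that bracketed byte; the function name is hypothetical, and the input is just the hex bytes, without the "Code:" prefix:

```c
#include <stdio.h>

/*
 * Scan the hex bytes of an oops "Code:" line. Returns the zero-based
 * index of the byte written as <xx> (the byte at RIP) and stores its
 * value in *byte, or returns -1 if no such byte is found.
 */
int find_rip_byte(const char *code, unsigned *byte)
{
    char tok[8];
    int consumed;
    int index = 0;

    /* %s skips leading blanks; %n records how far we got. */
    while (sscanf(code, "%7s%n", tok, &consumed) == 1) {
        code += consumed;
        if (tok[0] == '<')
            return sscanf(tok, "<%2x>", byte) == 1 ? index : -1;
        index++;
    }
    return -1;
}
```

On this report's bytes it returns 43 with value 0xcc. Linus's listing starts at the tenth byte ("48 c1 ea 03"), and 9 + 0x22 = 43, so the int3 he marks at offset 0x22 is the same bracketed byte.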

So I think the RIP->line encoding is slightly off, and that "int3" is
almost certainly due to the very next thing after the write:

trace_x86_fpu_regs_deactivated(old_fpu);

and that actually makes sense if the test robot is doing some tracing,
particularly if it's just about to _start_ tracing, and it has
replaced the first byte of the instruction with 'int3' and is in the
process of doing the rewrite.
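The race Linus describes comes from how a live 5-byte instruction is rewritten on x86 (in kernels of this era, text_poke_bp() in arch/x86/kernel/alternative.c): put an int3 on the first byte, rewrite the remaining bytes, then replace the int3 with the new first byte, synchronizing all CPUs between the steps. The user-space model below only records what another CPU could fetch after each step; it does not model the synchronization or the int3 handler, and the jump bytes used in the test are made up:

```c
#include <stdint.h>
#include <string.h>

#define INT3_OPCODE 0xcc

/*
 * Patch a 5-byte instruction in three steps. snapshots[i] is what a
 * concurrently executing CPU could fetch right after step i.
 */
void patch_site(uint8_t site[5], const uint8_t new_insn[5], uint8_t snapshots[3][5])
{
    site[0] = INT3_OPCODE;              /* step 1: trap anyone executing the site */
    memcpy(snapshots[0], site, 5);

    memcpy(site + 1, new_insn + 1, 4);  /* step 2: rewrite the tail behind the int3 */
    memcpy(snapshots[1], site, 5);

    site[0] = new_insn[0];              /* step 3: install the real first byte */
    memcpy(snapshots[2], site, 5);
}
```

Starting from a 5-byte nop (0f 1f 44 00 00), the snapshot after step 1 is cc 1f 44 00 00, which is exactly the byte pattern at RIP in the oops.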

The fact that it then takes a system write fault is because some GDT
or IDT setup is screwed up. Or possibly the stack is screwed up and
started out as 0, and then the push to the stack would decrement the
stack pointer and try to push the error state or something.

> That's the second report I'm staring at today which has CR2
> fffffffffffffffx and points to a faulting instruction which does not make
> any sense at all.

That actually does make sense - see above. It just requires that race
with the instruction rewriting.

*Normally* we never actually take the "int3" exception, because
normally we'll have completed the rewrite before another CPU actually
executes the instruction that is being rewritten.

So I'm assuming this is with the page table isolation, and some
unusual case in exception handling got screwed up.

Linus

Andy Lutomirski

Dec 14, 2017, 1:55:09 PM

to Linus Torvalds, Thomas Gleixner, syzbot, Borislav Petkov, Dmitry Safonov, Peter Anvin, Linux Kernel Mailing List, Andrew Lutomirski, Kyle Huey, Ingo Molnar, syzkall...@googlegroups.com, the arch/x86 maintainers

SDM time. Assuming the CPU actually decoded int3 and tried to execute
it, I can see a couple possible outcomes:

1. Something's wrong with the IDT and it can't read the vector. I
think this would end up triple-faulting, though.

2. It actually tries to handle the breakpoint. A breakpoint is a
benign exception, so any exception encountered while delivering it
would result in serial delivery. I've never thought that serial
delivery made any sense -- presumably it just cancels the breakpoint
and delivers the other exception. So this *could* be a page fault hit
during delivery of the int3 exception. I don't believe it's a GDT
problem, though, because that would also likely lead to a triple
fault. What I *would* believe is that the IST table got messed up and
we're seeing the result of trying to push to the stack with the
initial RSP=0 so the fault hits at address -8.

I have no idea how that would happen, though. Especially since int3
from userspace would have exactly the same problem, and we exercise
that code in the selftests.

Linus Torvalds

Dec 14, 2017, 2:28:20 PM

to Andy Lutomirski, Thomas Gleixner, syzbot, Borislav Petkov, Dmitry Safonov, Peter Anvin, Linux Kernel Mailing List, Kyle Huey, Ingo Molnar, syzkall...@googlegroups.com, the arch/x86 maintainers

On Thu, Dec 14, 2017 at 10:54 AM, Andy Lutomirski <lu...@kernel.org> wrote:
>
> 2. It actually tries to handle the breakpoint. A breakpoint is a
> benign exception, so any exception encountered while delivering it
> would result in serial delivery.

I don't think that's the case. "int3" is entirely synchronous, and
doesn't have the same odd issues as a breakpoint trap (which honors RF
etc). It's literally just a one-byte shorthand for "int $3".

There should be no serial delivery, although obviously if it's a trap
gate (as opposed to an interrupt gate), you can get a normal external
interrupt on the first instruction of the exception handler.

But that's not what the oops says: it says it happens on the "int3" instruction.

Now, it is possible that the "int3" was written _after_ the CPU took a
real page fault on the original instruction, and that the original
instruction actually caused a perfectly normal page fault, and then we
just report the "int3" because another CPU overwrote the instruction
after the original instruction had already trapped.

But that makes very little sense either. I really do think it's the
"int3" itself that causes the page fault due to some IDT/GDT change.
Because that would actually make sense considering what has changed in
the tree that Thomas is running.

Plus I think the instruction that gets overwritten is just a 5-byte
nop isn't it? So it really shouldn't take a fault without the "int3"
overwriting.

[ Goes back to the original report ]

Yeah, so looking back at the "Code:" line, the faulting instruction
looked like this:

<cc> 1f 44 00 00

and a P6_NOP5 is

#define P6_NOP5 0x0f,0x1f,0x44,0x00,0

so it's definitely "first byte of a 5-byte nop has been overwritten
with an 'int3' instruction". The nop does not fault on its own.

Linus

Andy Lutomirski

Dec 14, 2017, 4:27:59 PM

to Linus Torvalds, Andy Lutomirski, Thomas Gleixner, syzbot, Borislav Petkov, Dmitry Safonov, Peter Anvin, Linux Kernel Mailing List, Kyle Huey, Ingo Molnar, syzkall...@googlegroups.com, the arch/x86 maintainers

On Thu, Dec 14, 2017 at 11:28 AM, Linus Torvalds
<torv...@linux-foundation.org> wrote:
> On Thu, Dec 14, 2017 at 10:54 AM, Andy Lutomirski <lu...@kernel.org> wrote:
>>
>> 2. It actually tries to handle the breakpoint. A breakpoint is a
>> benign exception, so any exception encountered while delivering it
>> would result in serial delivery.
>
> I don't think that's the case. "int3" is entirely synchronous, and
> doesn't have the same odd issues as a breakpoint trap (which honors RF
> etc). It's literally just a one-byte shorthand for "int $3".
>

The SDM says precisely the same thing about INT N, so, whichever way
you dice it, int3 is a benign exception.

> There should be no serial delivery, although obviously if it's a trap
> gate (as opposed to an interrupt gate), you can get a normal external
> interrupt on the first instruction of the exception handler.
>
> But that's not what the oops says: it says it happens on the "int3" instruction.
>
> Now, it is possible that the "int3" was written _after_ the CPU took a
> real page fault on the original instruction, and that the original
> instruction actually caused a perfectly normal page fault, and then we
> just report the "int3" because another CPU overwrote the instruction
> after the original instruction had already trapped.
>
> But that makes very little sense either. I really do think it's the
> "int3" itself that causes the page fault due to some IDT/GDT change.
> Because that would actually make sense considering what has changed in
> the tree that Thomas is running.

I still have trouble figuring out what IDT or GDT error would cause a page
fault and not a double-fault or triple-fault. So I like my
bogus-IST-in-the-TSS theory more, even if I have no idea how it would
happen. Entry stack underflow? Overflow of whatever is mapped just
above the TSS in that kernel? Some kind of fuckup where ioperm()
overwrote the IST? (I tested that, but who knows? This is a fuzz
test, after all.)

0xfffffffffffffff8 is *exactly* where the fault would be if the IST
contained zeros and the microcoded push of SS faulted.
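The arithmetic behind that claim: in 64-bit mode the CPU aligns the new RSP down to a 16-byte boundary and then pushes SS, RSP, RFLAGS, CS and RIP (int3 pushes no error code). With a zero IST entry, the first 8-byte push targets 0 - 8, which wraps to 0xfffffffffffffff8. A sketch of the address sequence, using unsigned 64-bit arithmetic to mirror RSP wrap-around; the function name is hypothetical:

```c
#include <stdint.h>

/*
 * Addresses written while the CPU builds a 64-bit interrupt frame,
 * starting from the stack pointer it loaded (for an IST vector, the
 * value in the TSS's IST slot). Push order: SS, RSP, RFLAGS, CS, RIP.
 */
void frame_push_addresses(uint64_t new_rsp, uint64_t out[5])
{
    uint64_t rsp = new_rsp & ~(uint64_t)0xf; /* align down to 16 bytes */

    for (int i = 0; i < 5; i++) {
        rsp -= 8;     /* unsigned, so 0 - 8 wraps to 0xfffffffffffffff8 */
        out[i] = rsp; /* out[0] is where SS would be pushed */
    }
}
```

If the first push faults, CR2 holds that address, which matches the CR2 value in the report.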

Hmm. There is another way that could happen. If the IDT ended up
with the wrong IST entry, we could get the same failure. But I don't
see how that would happen either.

Maybe it's the bloody debug_idt thing blowing up?

>
> Plus I think the instruction that gets overwritten is just a 5-byte
> nop isn't it? So it really shouldn't take a fault without the "int3"
> overwriting.

Unless it was being overwritten the other way and the oops hit while
tracing was being turned *off*.

--Andy

Linus Torvalds

Dec 14, 2017, 4:39:06 PM

to Andy Lutomirski, Thomas Gleixner, syzbot, Borislav Petkov, Dmitry Safonov, Peter Anvin, Linux Kernel Mailing List, Kyle Huey, Ingo Molnar, syzkall...@googlegroups.com, the arch/x86 maintainers

On Thu, Dec 14, 2017 at 1:27 PM, Andy Lutomirski <lu...@kernel.org> wrote:
> On Thu, Dec 14, 2017 at 11:28 AM, Linus Torvalds
> <torv...@linux-foundation.org> wrote:
>> I don't think that's the case. "int3" is entirely synchronous, and
>> doesn't have the same odd issues as a breakpoint trap (which honors RF
>> etc). It's literally just a one-byte shorthand for "int $3".
>
> The SDM says precisely the same thing about INT N, so, whichever way
> you dice it, int3 is a benign exception.

That just means that it doesn't double-fault when it takes the page fault.

Which we already know, because we see a page fault, not a double fault.
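For reference, the SDM decides between serial handling and a double fault from the classes of the two exceptions involved. A simplified sketch of those rules; the names are hypothetical, vector 8 (#DF) itself and the shutdown (triple fault) case are not modeled, and every vector not listed is treated as benign:

```c
/* Exception classes used by the SDM's double-fault rules. */
enum exc_class { EXC_BENIGN, EXC_CONTRIBUTORY, EXC_PAGE_FAULT };
enum nested_outcome { HANDLE_SERIALLY, DOUBLE_FAULT };

enum exc_class classify_vector(int vector)
{
    switch (vector) {
    case 0: case 10: case 11: case 12: case 13:
        return EXC_CONTRIBUTORY; /* #DE, #TS, #NP, #SS, #GP */
    case 14:
        return EXC_PAGE_FAULT;   /* #PF */
    default:
        return EXC_BENIGN;       /* e.g. #DB (1), NMI (2), #BP (3, int3) */
    }
}

/* What happens when `second` is raised while delivering `first`. */
enum nested_outcome on_nested_exception(int first, int second)
{
    enum exc_class a = classify_vector(first);
    enum exc_class b = classify_vector(second);

    if (a == EXC_CONTRIBUTORY && b == EXC_CONTRIBUTORY)
        return DOUBLE_FAULT;
    if (a == EXC_PAGE_FAULT && b != EXC_BENIGN)
        return DOUBLE_FAULT;
    return HANDLE_SERIALLY;      /* a benign first exception never escalates */
}
```

on_nested_exception(3, 14) returns HANDLE_SERIALLY, consistent with the report showing a plain page fault rather than a double fault.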

> 0xfffffffffffffff8 is *exactly* where the fault would be if the
> microcoded push of SS faulted if the IST contained zeros.

Yes, I suspect it's the stack that is buggered for some reason.

>> Plus I think the instruction that gets overwritten is just a 5-byte
>> nop isn't it? So it really shouldn't take a fault without the "int3"
>> overwriting.
>
> Unless it was being overwritten the other way and the oops hit while
> tracing was being turned *off*.

Doesn't really matter. The two forms of that instruction are "5-byte
nop" and "unconditional branch".

Neither of them will write to anything - the only page fault they
could take is for instruction fetch.

So it really must be the "int3" that fails. Unless we're looking at
some odd CPU errata, which sounds very very unlikely.

Linus

Dmitry Vyukov

Dec 15, 2017, 4:08:15 AM

to Linus Torvalds, Andy Lutomirski, Thomas Gleixner, syzbot, Borislav Petkov, Dmitry Safonov, Peter Anvin, Linux Kernel Mailing List, Kyle Huey, Ingo Molnar, syzkall...@googlegroups.com, the arch/x86 maintainers

FTR the commit is:

commit d127129e85a020879f334154300ddd3f7ec21c1e (HEAD, tag: next-20171129)
Author: Stephen Rothwell <s...@canb.auug.org.au>
Date: Wed Nov 29 14:09:56 2017 +1100
Add linux-next specific files for 20171129

You can get it from
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next-history.git
Compiler is this: https://storage.googleapis.com/syzkaller/gcc-7.tar.gz
Config was attached.

I've built this exact kernel and here is __switch_to disasm:
https://gist.githubusercontent.com/dvyukov/8137559f7da08fbe32f9018972a4498c/raw/0ef2abf723b117f0d0f0306fd50e216d50c5cecb/gistfile1.txt

__switch_to+0x95b seems to point to (?):

ffffffff81252f6b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)

which is a branch target alignment nop.
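The mapping works because the oops reports a location as "function+offset/size", so the faulting address is the function's start address plus the offset. A sketch of that computation; the helper name is hypothetical, and the start address 0xffffffff81252610 used in the test is back-computed from the line Dmitry quotes (0xffffffff81252f6b - 0x95b) rather than read from System.map:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Parse "func+0xOFF/0xSIZE" and return func_start + OFF. Returns 0 if
 * the string does not parse or the offset lies outside the function.
 */
uint64_t resolve_location(const char *loc, uint64_t func_start)
{
    char name[64];
    unsigned long long off, size;

    if (sscanf(loc, "%63[^+]+0x%llx/0x%llx", name, &off, &size) != 3)
        return 0;
    if (off >= size)
        return 0;
    return func_start + off;
}
```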

We have a bunch of semi-similar non-sense crashes on syzbot:

https://groups.google.com/forum/#!msg/syzkaller-bugs/zGz7AVtMBV0/X_-CPbjNAgAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/9nMSJo9jmGs/tkRYgZ-XAwAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/04-q4OZrerA/XfYdNnWXAwAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/6iC6rPtAHKQ/UiZ4fnWXAwAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/2zSDbzRIH_k/SLCMqmeXAwAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/uEsjx8VISco/Mwu_pbGWAwAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/kZ6Z7UQLbCQ/JHpjTGeXAwAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/UjYsJxiGxwU/mponQq2XAwAJ

Lots of them fault at address 0xfffffffffffffff8.

I have some suspicion towards KVM. Potentially a nested KVM messed up
host processor state (CRn or page tables), so that we then get these
weird crashes.

One question: what would a triple fault look like? I am asking because we
have hundreds of cases where the kernel just starts silently rebooting
while running some unprivileged syscalls:
https://groups.google.com/forum/#!msg/syzkaller-bugs/w8dkVNrgzrc/4mLJLOAbCgAJ
Can these be triple faults? The reproducer for that one also seems to be
related to KVM.

Dmitry Vyukov

Dec 15, 2017, 4:14:01 AM

to Linus Torvalds, Andy Lutomirski, Thomas Gleixner, syzbot, Borislav Petkov, Dmitry Safonov, Peter Anvin, Linux Kernel Mailing List, Kyle Huey, Ingo Molnar, syzkall...@googlegroups.com, the arch/x86 maintainers

Well, actually, replaying the log for this crash and for
https://groups.google.com/forum/#!msg/syzkaller-bugs/zGz7AVtMBV0/X_-CPbjNAgAJ
with:

./syz-execprog -procs=10 -sandbox=namespace -repeat=0 raw.txt
(you can find exact instructions on how to do this here
https://github.com/google/syzkaller/blob/master/docs/executing_syzkaller_programs.md)

I've got:

[ 121.553588] binder: 3856:3857 ioctl 40046205 0 returned -22
[ 121.557656] binder: 3856:3857 ERROR: BC_REGISTER_LOOPER called without request
[ 121.559744] binder: 3857 RLIMIT_NICE not set
[ 121.586339] binder: 3857 RLIMIT_NICE not set
[ 121.591764] binder: 3856:3857 unknown command 1400526783
[ 121.593226] binder: 3856:3857 ioctl c0306201 20002fd0 returned -22
[ 121.598292] binder: 3857 RLIMIT_NICE not set
[ 121.600827] binder: 3856:3857 ioctl c018620b 20000fe8 returned -14
[ 121.618284] binder: 3856:3857 BC_FREE_BUFFER uffffffffffffffff no match
[ 121.622181] binder: 3856:3857 got reply transaction with no transaction stack
[ 121.626345] binder: 3856:3857 transaction failed 29201/-71, size 72-56 line 2747
[ 121.628912] binder: 3856:3857 ioctl c0306201 20005fd0 returned -14
[ 121.635620] binder: unexpected work type, 4, not freed
[ 121.639753] binder: undelivered TRANSACTION_COMPLETE
[ 121.645213] binder: undelivered TRANSACTION_ERROR: 29201
[ 121.654860] binder: 3856:3857 BC_FREE_BUFFER u00000000ffffffff no match
[ 121.667216] *** Guest State ***
[ 121.667728] CR0: actual=0x0000000000000030, shadow=0x0000000060000010, gh_mask=fffffffffffffff7
early console in extract_kernel
input_data: 0x0000000005f13276
input_len: 0x0000000001e7fa4c
output: 0x0000000001000000
output_len: 0x0000000005c85958
kernel_total_size: 0x0000000006db2000

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
[ 0.000000] Linux version 4.15.0-rc1-next-20171129 (dvy...@dvyukov-z840.muc.corp.google.com) (gcc version 7.1.1 20170620 (GCC)) #1 SMP Fri Dec 15 09:25:01 CET 2017
[ 0.000000] Command line: kvm-intel.nested=1 kvm-intel.unrestricted_guest=1 kvm-intel.ept=1 kvm-intel.flexpriority=1 kvm-intel.vpid=1 kvm-intel.emulate_invalid_guest_state=1 kvm-intel.eptad=1 kvm-intel.enable_shadow_vmcs=1 kvm-intel.pml=1 kvm-intel.enable_apicv=1 console=ttyS0 root=/dev/sda earlyprintk=serial slub_debug=UZ vsyscall=native rodata=n oops=panic panic_on_warn=1 panic=86400
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[ 0.000000] e820: BIOS-provided physical RAM map:
...

Dmitry Vyukov

Dec 15, 2017, 4:38:44 AM

to Linus Torvalds, Andy Lutomirski, Thomas Gleixner, syzbot, Borislav Petkov, Dmitry Safonov, Peter Anvin, Linux Kernel Mailing List, Kyle Huey, Ingo Molnar, syzkall...@googlegroups.com, the arch/x86 maintainers, Paolo Bonzini, Radim Krčmář, KVM list, tiany...@intel.com, James Mattson, Wanpeng Li, David Hildenbrand

Well, the crash was minimized down to:

// autogenerated by syzkaller (http://github.com/google/syzkaller)
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <linux/kvm.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int main()
{
    // 0x80102 is O_RDWR | O_NOCTTY | O_CLOEXEC on x86-64
    int fd = open("/dev/kvm", 0x80102ul);
    // Create a VM without registering any guest memory
    int vm = ioctl(fd, KVM_CREATE_VM, 0);
    // Create a single vCPU with id 4
    int cpu = ioctl(vm, KVM_CREATE_VCPU, 4);
    // Enter the guest; return values are deliberately not checked
    ioctl(cpu, KVM_RUN, 0);
    return 0;
}

And, yes, this in fact triggers an instant reboot of the kernel (running in qemu).
Am I missing something here?

+kvm maintainers, you can see full thread here:
https://groups.google.com/forum/#!topic/syzkaller-bugs/_oveOKGm3jw

Wanpeng Li

Dec 15, 2017, 4:40:37 AM

to Dmitry Vyukov, Linus Torvalds, Andy Lutomirski, Thomas Gleixner, syzbot, Borislav Petkov, Dmitry Safonov, Peter Anvin, Linux Kernel Mailing List, Kyle Huey, Ingo Molnar, syzkall...@googlegroups.com, the arch/x86 maintainers, Paolo Bonzini, Radim Krčmář, KVM list, Lan, Tianyu, James Mattson, David Hildenbrand

I will have a try.

Regards,
Wanpeng Li

Thomas Gleixner

Dec 15, 2017, 4:49:56 AM

to Dmitry Vyukov, Linus Torvalds, Andy Lutomirski, syzbot, Borislav Petkov, Dmitry Safonov, Peter Anvin, Linux Kernel Mailing List, Kyle Huey, Ingo Molnar, syzkall...@googlegroups.com, the arch/x86 maintainers

On Fri, 15 Dec 2017, Dmitry Vyukov wrote:
> I've built this exact kernel and here is __switch_to disasm:
> https://gist.githubusercontent.com/dvyukov/8137559f7da08fbe32f9018972a4498c/raw/0ef2abf723b117f0d0f0306fd50e216d50c5cecb/gistfile1.txt
>
> __switch_to+0x95b seems to point to (?):
>
> ffffffff81252f6b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
>
> which is branch target alignment nop.

Which is a placeholder for a tracepoint, as Linus pointed out, and the
'faulting' instruction, which is int3, shows that a tracepoint
install/remove is in progress. Are your test cases fiddling with tracepoints?

Thanks,

tglx

David Hildenbrand

Dec 15, 2017, 4:52:03 AM

to Dmitry Vyukov, Linus Torvalds, Andy Lutomirski, Thomas Gleixner, syzbot, Borislav Petkov, Dmitry Safonov, Peter Anvin, Linux Kernel Mailing List, Kyle Huey, Ingo Molnar, syzkall...@googlegroups.com, the arch/x86 maintainers, Paolo Bonzini, Radim Krčmář, KVM list, tiany...@intel.com, James Mattson, Wanpeng Li

> int main()
> {
> int fd = open("/dev/kvm", 0x80102ul);
> int vm = ioctl(fd, KVM_CREATE_VM, 0);
> int cpu = ioctl(vm, KVM_CREATE_VCPU, 4);

Not even a memory region :) So maybe the first memory access directly
triggers a fault?

> ioctl(cpu, KVM_RUN, 0);
> return 0;
> }
>
> And, yes, this in fact triggers instant reboot of kernel (running in qemu).
> Am I missing something here?
>
> +kvm maintainers, you can see full thread here:
> https://groups.google.com/forum/#!topic/syzkaller-bugs/_oveOKGm3jw
>
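For contrast with the reproducer, which never gives the guest any RAM: guest memory is normally registered with the KVM_SET_USER_MEMORY_REGION ioctl on the VM file descriptor. A sketch of that step; the helper names are hypothetical, the caller is assumed to supply page-aligned host memory whose size is a multiple of the page size, and nothing here is claimed to be related to the reboot itself:

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Describe guest-physical range [gpa, gpa + size) backed by host memory at mem. */
struct kvm_userspace_memory_region make_region(uint32_t slot, uint64_t gpa,
                                               void *mem, uint64_t size)
{
    struct kvm_userspace_memory_region r = {
        .slot = slot,
        .flags = 0,
        .guest_phys_addr = gpa,
        .memory_size = size,
        .userspace_addr = (uint64_t)(uintptr_t)mem,
    };
    return r;
}

/* Register the slot with a VM fd from KVM_CREATE_VM; returns 0 on success, -1 on error. */
int add_guest_ram(int vm_fd, uint32_t slot, uint64_t gpa, void *mem, uint64_t size)
{
    struct kvm_userspace_memory_region r = make_region(slot, gpa, mem, size);

    return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &r);
}
```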

--

Thanks,

David / dhildenb

Wanpeng Li

Dec 15, 2017, 4:58:43 AM

to David Hildenbrand, Dmitry Vyukov, Linus Torvalds, Andy Lutomirski, Thomas Gleixner, syzbot, Borislav Petkov, Dmitry Safonov, Peter Anvin, Linux Kernel Mailing List, Kyle Huey, Ingo Molnar, syzkall...@googlegroups.com, the arch/x86 maintainers, Paolo Bonzini, Radim Krčmář, KVM list, Lan, Tianyu, James Mattson

I didn't see any issue after running the test.

Regards,
Wanpeng Li

Dmitry Vyukov

Dec 15, 2017, 5:03:00 AM

to Wanpeng Li, David Hildenbrand, Linus Torvalds, Andy Lutomirski, Thomas Gleixner, syzbot, Borislav Petkov, Dmitry Safonov, Peter Anvin, Linux Kernel Mailing List, Kyle Huey, Ingo Molnar, syzkall...@googlegroups.com, the arch/x86 maintainers, Paolo Bonzini, Radim Krčmář, KVM list, Lan, Tianyu, James Mattson

Yes, it's strange. But I can reproduce it. There must be something
different in our setups.
Here is how to build exact same kernel:
https://groups.google.com/d/msg/syzkaller-bugs/_oveOKGm3jw/vc1tXvsbCgAJ

Here is how I start qemu:

qemu-system-x86_64 -hda wheezy.img \
    -net user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic \
    -nographic -kernel arch/x86/boot/bzImage \
    -append "kvm-intel.nested=1 kvm-intel.unrestricted_guest=1 kvm-intel.ept=1 \
kvm-intel.flexpriority=1 kvm-intel.vpid=1 \
kvm-intel.emulate_invalid_guest_state=1 kvm-intel.eptad=1 \
kvm-intel.enable_shadow_vmcs=1 kvm-intel.pml=1 \
kvm-intel.enable_apicv=1 console=ttyS0 root=/dev/sda \
earlyprintk=serial slub_debug=UZ vsyscall=native rodata=n oops=panic \
panic_on_warn=1 panic=86400" \
    -enable-kvm -pidfile vm_pid -m 2G -smp 4 \
    -cpu host -usb -usbdevice mouse -usbdevice tablet -soundhw all

The image is here:
https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce

Host cpu is Intel(R) Xeon(R) CPU E5-2690 v3

Andy Lutomirski

Dec 15, 2017, 11:16:41 AM

to Dmitry Vyukov, Wanpeng Li, David Hildenbrand, Linus Torvalds, Andy Lutomirski, Thomas Gleixner, syzbot, Borislav Petkov, Dmitry Safonov, Peter Anvin, Linux Kernel Mailing List, Kyle Huey, Ingo Molnar, syzkall...@googlegroups.com, the arch/x86 maintainers, Paolo Bonzini, Radim Krčmář, KVM list, Lan, Tianyu, James Mattson

Looking more closely, you seem to be testing this:

commit d127129e85a020879f334154300ddd3f7ec21c1e (HEAD, tag: next-20171129)
Author: Stephen Rothwell <s...@canb.auug.org.au>
Date: Wed Nov 29 14:09:56 2017 +1100
Add linux-next specific files for 20171129

which is almost certainly missing this fix:

https://lkml.kernel.org/r/bc7296f4c8d86af71c31a17588c79d...@kernel.org

on account of the fix being sent the day after the tag.

The symptoms you're seeing are definitely consistent with a screwed up
TSS after VM exit.
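Why a broken TSS would produce exactly this fault: the CPU fetches IST stack pointers from fixed offsets in whatever TSS the task register points to, so if TR describes the wrong memory after VM exit (the fix title suggests code assumed the hardware TSS sits at the beginning of cpu_tss), an IST slot can read as zero. A sketch of the 64-bit TSS layout, assuming the struct below mirrors the kernel's struct x86_hw_tss; the struct copy and helper are illustrative only:

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit hardware TSS layout (Intel SDM, task management in 64-bit mode). */
struct hw_tss64 {
    uint32_t reserved1;
    uint64_t sp0;          /* stack used on entry from ring 3 */
    uint64_t sp1;
    uint64_t sp2;
    uint64_t reserved2;
    uint64_t ist[7];       /* IST1..IST7, selected by the IDT entry's IST index */
    uint32_t reserved3;
    uint32_t reserved4;
    uint16_t reserved5;
    uint16_t io_bitmap_base;
} __attribute__((packed));

_Static_assert(offsetof(struct hw_tss64, sp0) == 0x04, "RSP0 at 0x04");
_Static_assert(offsetof(struct hw_tss64, ist) == 0x24, "IST1 at 0x24");
_Static_assert(sizeof(struct hw_tss64) == 0x68, "104 bytes, TSS limit 0x67");

/* Address the CPU reads IST n (1..7) from, given the TSS base loaded into TR. */
uintptr_t ist_slot_address(uintptr_t tss_base, int n)
{
    return tss_base + offsetof(struct hw_tss64, ist) + (uintptr_t)(n - 1) * sizeof(uint64_t);
}
```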

Ingo Molnar

Dec 15, 2017, 11:45:02 AM

to Andy Lutomirski, Dmitry Vyukov, Wanpeng Li, David Hildenbrand, Linus Torvalds, Thomas Gleixner, syzbot, Borislav Petkov, Dmitry Safonov, Peter Anvin, Linux Kernel Mailing List, Kyle Huey, Ingo Molnar, syzkall...@googlegroups.com, the arch/x86 maintainers, Paolo Bonzini, Radim Krčmář, KVM list, Lan, Tianyu, James Mattson

Note that this should all be fixed in WIP.x86/pti.

If you have:

5ed1fcd523b9: x86/entry: Fix assumptions that the HW TSS is at the beginning of cpu_tss

then you should be fine.

Thanks,

Ingo

Dmitry Vyukov

Dec 19, 2017, 6:49:03 AM

to Ingo Molnar, Andy Lutomirski, Wanpeng Li, David Hildenbrand, Linus Torvalds, Thomas Gleixner, syzbot, Borislav Petkov, Dmitry Safonov, Peter Anvin, Linux Kernel Mailing List, Kyle Huey, Ingo Molnar, syzkall...@googlegroups.com, the arch/x86 maintainers, Paolo Bonzini, Radim Krčmář, KVM list, Lan, Tianyu, James Mattson

Let's tell syzbot about the fix:

#syz fix:
