x86: bad pte in pageattr_test


Dmitry Vyukov

Apr 11, 2016, 4:28:59 AM
to Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x...@kernel.org, LKML, Andrey Ryabinin, Konstantin Khlebnikov, syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin
Hello,

I've got the following WARNING while running syzkaller fuzzer:

CPA ffff880054118000: bad pte after revert 8000000054118363
------------[ cut here ]------------
WARNING: CPU: 2 PID: 1503 at arch/x86/mm/pageattr-test.c:226
pageattr_test+0xa6c/0xd10
NOT PASSED. Please report.
Modules linked in:
CPU: 2 PID: 1503 Comm: pageattr-test Not tainted 4.6.0-rc2+ #346
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
ffffffff87eb25c0 ffff88003b627a70 ffffffff82c8b17f ffffffff81490b58
fffffbfff0fd64b8 ffff88003b627ae8 0000000000000000 ffffffff86a77e00
ffffffff8129487c 0000000000000009 ffff88003b627ab8 ffffffff8136639f
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff82c8b17f>] dump_stack+0x12e/0x18f lib/dump_stack.c:51
[<ffffffff8136639f>] __warn+0x19f/0x1e0 kernel/panic.c:512
[<ffffffff8136648c>] warn_slowpath_fmt+0xac/0xd0 kernel/panic.c:527
[<ffffffff8129487c>] pageattr_test+0xa6c/0xd10 arch/x86/mm/pageattr-test.c:226
[<ffffffff81294b3b>] do_pageattr_test+0x1b/0x60 arch/x86/mm/pageattr-test.c:240
[<ffffffff813cde7f>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303
[<ffffffff867b6252>] ret_from_fork+0x22/0x50 arch/x86/entry/entry_64.S:392
---[ end trace e669a8d69b836be8 ]---

Unfortunately it is not reproducible; I tried bumping the checking frequency to 1.

On commit 541d8f4d59d79f5d37c8c726f723d42ff307db57 (Apr 5).

For the record, full syzkaller log:
https://gist.githubusercontent.com/dvyukov/323ff7275c5ac38156cb40caeacac057/raw/0836f8dd81024e441f81caebdcb73ca1221aef97/gistfile1.txt

Andrey Ryabinin

Apr 11, 2016, 4:52:24 AM
to Dmitry Vyukov, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x...@kernel.org, LKML, Konstantin Khlebnikov, syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin


On 04/11/2016 11:28 AM, Dmitry Vyukov wrote:
> Hello,
>
> I've got the following WARNING while running syzkaller fuzzer:
>
> CPA ffff880054118000: bad pte after revert 8000000054118363
> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 1503 at arch/x86/mm/pageattr-test.c:226
> pageattr_test+0xa6c/0xd10
> NOT PASSED. Please report.
> Modules linked in:
> CPU: 2 PID: 1503 Comm: pageattr-test Not tainted 4.6.0-rc2+ #346
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> ffffffff87eb25c0 ffff88003b627a70 ffffffff82c8b17f ffffffff81490b58
> fffffbfff0fd64b8 ffff88003b627ae8 0000000000000000 ffffffff86a77e00
> ffffffff8129487c 0000000000000009 ffff88003b627ab8 ffffffff8136639f
> Call Trace:
> [< inline >] __dump_stack lib/dump_stack.c:15
> [<ffffffff82c8b17f>] dump_stack+0x12e/0x18f lib/dump_stack.c:51
> [<ffffffff8136639f>] __warn+0x19f/0x1e0 kernel/panic.c:512
> [<ffffffff8136648c>] warn_slowpath_fmt+0xac/0xd0 kernel/panic.c:527
> [<ffffffff8129487c>] pageattr_test+0xa6c/0xd10 arch/x86/mm/pageattr-test.c:226
> [<ffffffff81294b3b>] do_pageattr_test+0x1b/0x60 arch/x86/mm/pageattr-test.c:240
> [<ffffffff813cde7f>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303

It seems that your script is buggy. This should be kthread() from kernel/kthread.c.

Dmitry Vyukov

Apr 11, 2016, 5:03:21 AM
to Andrey Ryabinin, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x...@kernel.org, LKML, Konstantin Khlebnikov, syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin
I probably used a non-matching vmlinux for symbolization. Please
check the raw report below if the symbolized one does not make sense.

\/\/\/\/\/\/\/\/

Andrey Ryabinin

Apr 11, 2016, 5:32:10 AM
to Dmitry Vyukov, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x...@kernel.org, LKML, Konstantin Khlebnikov, syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin
No, it's a bug in your script. To find the source location, it uses 'function_name + offset' instead of the absolute address.
We have two kthread() functions in the kernel, and this confuses your script.


E.g. with my vmlinux:
$ addr2line -i -e vmlinux ffffffff811b5290
/home/andrew/linux/kernel/kthread.c:178
$ addr2line -i -e kasan_conf/vmlinux ffffffff825c7240
/home/andrew/linux/drivers/block/aoe/aoecmd.c:1289


$ echo '[<ffffffff811b5290>] kthread+0x00/0x00' | python kasan_symbolize.py vmlinux
[<ffffffff811b5290>] kthread+0x00/0x00 drivers/block/aoe/aoecmd.c:462
$ echo '[<ffffffff825c7240>] kthread+0x00/0x00' | python kasan_symbolize.py vmlinux
[<ffffffff825c7240>] kthread+0x00/0x00 drivers/block/aoe/aoecmd.c:462
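To illustrate the ambiguity described above, here is a minimal Python sketch. The symbol table is a toy: the two addresses and source locations are borrowed from the addr2line output above and treated, purely for illustration, as if they came from a single image. The helper names (resolve_by_name, resolve_by_address) are hypothetical and are not taken from kasan_symbolize.py.

```python
import bisect

# Toy symbol table: (start address, symbol name, source location).
# Assumption: both entries are treated as part of one image for illustration.
SYMBOLS = sorted([
    (0xffffffff811b5290, "kthread", "kernel/kthread.c:178"),
    (0xffffffff825c7240, "kthread", "drivers/block/aoe/aoecmd.c:1289"),
])
STARTS = [start for start, _, _ in SYMBOLS]


def resolve_by_name(name):
    """Name-based lookup: returns every location whose symbol has this name."""
    return [loc for _, sym, loc in SYMBOLS if sym == name]


def resolve_by_address(addr):
    """Address-based lookup: the symbol with the greatest start <= addr.

    Returns the location of that symbol's start, or None if addr lies
    below every known symbol. Symbol sizes are not modelled here.
    """
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        return None
    return SYMBOLS[i][2]


# A name alone cannot tell the two functions apart...
print(resolve_by_name("kthread"))
# ...while an absolute address picks exactly one of them.
print(resolve_by_address(0xffffffff811b5290))
```

With duplicate static function names in the kernel, only the address-based lookup is unambiguous, which is the point made above.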

Dmitry Vyukov

Apr 11, 2016, 5:47:39 AM
to Andrey Ryabinin, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x...@kernel.org, LKML, Konstantin Khlebnikov, syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin
On Mon, Apr 11, 2016 at 11:32 AM, Andrey Ryabinin
Agree. Filed https://github.com/google/sanitizers/issues/668
Thanks!

Dmitry Vyukov

Jun 7, 2016, 5:34:21 AM
to Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x...@kernel.org, LKML, Andrey Ryabinin, Konstantin Khlebnikov, syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin
Ping.

Hit it again on af8c34ce6ae32addda3788d54a7e340cad22516b (4.7-rc2)

CPA ffff880059990000: bad pte 8000000059990060
------------[ cut here ]------------
WARNING: CPU: 1 PID: 1602 at arch/x86/mm/pageattr-test.c:226 [< none >] pageattr_test+0xa6f/0xd10 arch/x86/mm/pageattr-test.c:226
NOT PASSED. Please report.
Modules linked in:
CPU: 1 PID: 1602 Comm: pageattr-test Not tainted 4.7.0-rc2+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
ffffffff880b59e0 ffff88003a41fa70 ffffffff82cc5e2f ffffffff81495ed8
fffffbfff1016b3c ffff88003a41fae8 0000000000000000 ffffffff86c7a060
ffffffff8129ac8f 0000000000000009 ffff88003a41fab8 ffffffff8136d23f
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff82cc5e2f>] dump_stack+0x12e/0x18f lib/dump_stack.c:51
[<ffffffff8136d23f>] __warn+0x19f/0x1e0 kernel/panic.c:516
[<ffffffff8136d32c>] warn_slowpath_fmt+0xac/0xd0 kernel/panic.c:531
[<ffffffff8129ac8f>] pageattr_test+0xa6f/0xd10 arch/x86/mm/pageattr-test.c:226
[<ffffffff8129af4b>] do_pageattr_test+0x1b/0x60 arch/x86/mm/pageattr-test.c:240
[<ffffffff813d58df>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303
[<ffffffff86a8da0f>] ret_from_fork+0x1f/0x40 arch/x86/entry/entry_64.S:389
---[ end trace 1e5de26a2555fc00 ]---

Dmitry Vyukov

Jun 7, 2016, 5:34:44 AM
to Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x...@kernel.org, LKML, Andrey Ryabinin, Konstantin Khlebnikov, syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin
Should we delete this test if it is not important?

Dmitry Vyukov

Jun 10, 2016, 6:18:41 AM
to Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x...@kernel.org, LKML, Andrey Ryabinin, Konstantin Khlebnikov, syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin, linu...@kvack.org, Peter Zijlstra
On Thu, Jun 9, 2016 at 11:34 PM, Thomas Gleixner <tg...@linutronix.de> wrote:
> On Tue, 7 Jun 2016, Dmitry Vyukov wrote:
>> >> I've got the following WARNING while running syzkaller fuzzer:
>> >>
>> >> CPA ffff880054118000: bad pte after revert 8000000054118363
>>
>> > CPA ffff880059990000: bad pte 8000000059990060
>
> In both cases the PTE bit which the test modifies is in the wrong state.
>
>> Should we delete this test if it is not important?
>
> No. There is something badly wrong.
>
> PAGE_BIT_CPA_TEST is the same as PAGE_BIT_SPECIAL. And the latter is used by
> the mm code to mark user space mappings. The test code only modifies the
> direct mapping, i.e. the kernel side one.
>
> So something sets PAGE_BIT_SPECIAL on a kernel PTE. And that's definitely a
> bug.
>
> These are the last entries from your syzkaller log file of the first incident:
>
> r0 = perf_event_open(&(0x7f000000f000-0x78)={0x2, 0x78, 0x11, 0x7, 0xd537, 0x6, 0x0, 0xc1, 0xffff, 0x5, 0x0, 0x40, 0x4, 0x9, 0x5369, 0x8, 0x7, 0x8508, 0x3, 0x80, 0x0}, 0x0, 0xffffffff, 0xffffffffffffffff, 0x0)
> mmap(&(0x7f0000cbb000)=nil, (0x1000), 0x3, 0x32, 0xffffffffffffffff, 0x0)
> r1 = syz_open_dev$mouse(&(0x7f0000cbb000)="2f6465762f696e7075742f6d6f7573652300", 0x100, 0xa00)
> mmap(&(0x7f0000cbc000)=nil, (0x1000), 0x3, 0x32, 0xffffffffffffffff, 0x0)
> setsockopt$BT_SNDMTU(r1, 0x112, 0xc, &(0x7f0000cbc000)=0x5, 0x2)
> mmap(&(0x7f0000cbb000)=nil, (0x1000), 0x3, 0x32, 0xffffffffffffffff, 0x0)
> ioctl$EVIOCGEFFECTS(r1, 0x80044584, &(0x7f0000cbc000-0x942)=nil)
> r2 = fcntl$dupfd(r0, 0x406, r0)
> mmap(&(0x7f0000cbc000)=nil, (0x1000), 0x3, 0x32, 0xffffffffffffffff, 0x0)
> mmap(&(0x7f00002bf000)=nil, (0x1000), 0x3, 0x8010, 0xffffffffffffffff, 0x0)
> mmap(&(0x7f0000000000)=nil, (0x0), 0x3, 0x32, 0xffffffffffffffff, 0x0)
> pwritev(r2, &(0x7f00007e9000)=[{&(0x7f0000cbc000)=....
>
> Do you have log of the second one available as well?
>
> CC'ing mm and perf folks.
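
The two PTE values quoted above can be decoded bit by bit. The following Python sketch does that under stated assumptions: bits 0-8 and 63 are the architectural x86-64 PTE flags for a 4 KiB page, and bit 9 is a software-available bit that, per the explanation quoted above, is assumed to back both PAGE_BIT_CPA_TEST and PAGE_BIT_SPECIAL in this kernel. The function names are hypothetical.

```python
# Flag bits of an x86-64 PTE (4 KiB page). Bit 9 is software-available;
# labelling it SPECIAL / CPA_TEST is an assumption based on the thread.
PTE_FLAGS = {
    0: "PRESENT",
    1: "RW",
    2: "USER",
    3: "PWT",
    4: "PCD",
    5: "ACCESSED",
    6: "DIRTY",
    7: "PAT",
    8: "GLOBAL",
    9: "SOFTW1 (SPECIAL / CPA_TEST)",
    63: "NX",
}
CPA_TEST_BIT = 9  # assumed bit position, see above


def decode_pte(pte):
    """Return the names of all known flag bits set in a raw PTE value."""
    return [name for bit, name in sorted(PTE_FLAGS.items()) if pte & (1 << bit)]


def cpa_test_bit_set(pte):
    """True if the (assumed) CPA test bit is set in the raw PTE value."""
    return bool(pte & (1 << CPA_TEST_BIT))


for label, pte in [("bad pte after revert", 0x8000000054118363),
                   ("bad pte", 0x8000000059990060)]:
    print(label, hex(pte), decode_pte(pte), cpa_test_bit_set(pte))
```

Under these assumptions, the first value still has bit 9 set, while the second has bit 9 clear and also decodes without PRESENT or RW. If, as the wording of the two messages suggests, the first check runs after the bit should have been cleared and the second after it should have been set, both match the observation that the modified bit is in the wrong state.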


Here is the second log:
https://gist.githubusercontent.com/dvyukov/dd7970a5daaa7a30f6d37fa5592b56de/raw/f29182024538e604c95d989f7b398816c3c595dc/gistfile1.txt

I've hit it only twice. The first time I tried hard to reproduce it, with
no success. So unfortunately that's all we have.

Re logs: my setup executes up to 16 programs in parallel. So for
normal BUGs any of the preceding 16 programs can be guilty. But since
this check is asynchronous, it can be just any preceding program in
the log.

I would expect that it is triggered by some rarely executed, poorly
tested code. Maybe mmap of some device?

Dmitry Vyukov

Jun 10, 2016, 9:06:29 AM
to Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x...@kernel.org, LKML, Andrey Ryabinin, Konstantin Khlebnikov, syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin, linu...@kvack.org, Peter Zijlstra
On Fri, Jun 10, 2016 at 2:54 PM, Thomas Gleixner <tg...@linutronix.de> wrote:
> On Fri, 10 Jun 2016, Dmitry Vyukov wrote:
>> Here is the second log:
>> https://gist.githubusercontent.com/dvyukov/dd7970a5daaa7a30f6d37fa5592b56de/raw/f29182024538e604c95d989f7b398816c3c595dc/gistfile1.txt
>>
>> I've hit only twice. The first time I tried hard to reproduce it, with
>> no success. So unfortunately that's all we have.
>>
>> Re logs: my setup executes up to 16 programs in parallel. So for
>> normal BUGs any of the preceding 16 programs can be guilty. But since
>> this check is asynchronous, it can be just any preceding program in
>> the log.
>
> Ok.
>
>> I would expect that it is triggered by some rarely-executing poorly
>> tested code. Maybe mmap of some device?
>
> That's the mmap(dev) list which is common between the two log files:
>
> vcsn
> ircomm
> rfkill
> userio
> dspn
> mice
> midi
> sndpcmc
> hidraw0
> vga_arbiter
> lightnvm
> sr
>
> Dunno if that's the right direction, but exercising these a bit more might be
> worth a try.
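
A list of devices common to two logs, like the one quoted above, could be approximated mechanically. The Python sketch below is one possible approach and not necessarily how that list was produced. It assumes device opens appear in the logs as syz_open_dev$NAME(...) calls, as in the single example visible earlier in this thread. The function names and the sample log fragments are hypothetical.

```python
import re

# Matches syzkaller pseudo-syscalls of the form syz_open_dev$NAME(...).
# Assumption: this is how device opens show up in the logs.
OPEN_DEV_RE = re.compile(r"syz_open_dev\$(\w+)\(")


def opened_devices(log_text):
    """Return the set of NAME suffixes from syz_open_dev$NAME calls."""
    return set(OPEN_DEV_RE.findall(log_text))


def common_devices(log_a, log_b):
    """Return the sorted device names that appear in both logs."""
    return sorted(opened_devices(log_a) & opened_devices(log_b))


# Made-up log fragments, for illustration only.
log_a = """
r1 = syz_open_dev$mouse(&(0x7f0000cbb000)="2f6465762f696e7075742f6d6f7573652300", 0x100, 0xa00)
r2 = syz_open_dev$vcsn(&(0x7f0000001000)="00", 0x1, 0x0)
"""
log_b = """
r0 = syz_open_dev$vcsn(&(0x7f0000002000)="00", 0x2, 0x0)
r3 = syz_open_dev$sr(&(0x7f0000003000)="00", 0x0, 0x0)
"""
print(common_devices(log_a, log_b))
```

Since the check is asynchronous and any earlier program may be responsible, such an intersection only narrows the candidates; it does not identify the culprit.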


I have now been running both of these logs for several hours (2.5M
executions). No failures so far...