[PATCH] arm: port KCOV to arm

Dmitry Vyukov

Apr 26, 2018, 9:08:51 AM

to li...@armlinux.org.uk, mark.r...@arm.com, liuwe...@huawei.com, catalin...@arm.com, takuo.ko...@hitachi.com, at...@google.com, Dmitry Vyukov, linux-ar...@lists.infradead.org, syzk...@googlegroups.com

KCOV is a code coverage collection facility used, in particular, by the
syzkaller system call fuzzer. There is some interest in using syzkaller on
arm devices, so port KCOV to arm.

At the implementation level this merely declares that KCOV is supported and
disables instrumentation in 3 special cases. The reasons for disabling are
given in comments in the code.

Tested with qemu-system-arm/vexpress-a15.

Signed-off-by: Dmitry Vyukov <dvy...@google.com>
Cc: Russell King <li...@armlinux.org.uk>
Cc: Mark Rutland <mark.r...@arm.com>
Cc: Abbott Liu <liuwe...@huawei.com>
Cc: Catalin Marinas <catalin...@arm.com>
Cc: Koguchi Takuo <takuo.ko...@hitachi.com>
Cc: Atul Prakash <at...@google.com>
Cc: li...@armlinux.org.uk
Cc: linux-ar...@lists.infradead.org
Cc: syzk...@googlegroups.com
---
 arch/arm/Kconfig                  | 1 +
 arch/arm/boot/compressed/Makefile | 3 +++
 arch/arm/mm/Makefile              | 4 ++++
 arch/arm/vdso/Makefile            | 3 +++
 4 files changed, 11 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index a7f8e7f4b88f..60558a6bb744 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -105,6 +105,7 @@ config ARM
 	select REFCOUNT_FULL
 	select RTC_LIB
 	select SYS_SUPPORTS_APM_EMULATION
+	select ARCH_HAS_KCOV
 	# Above selects are sorted alphabetically; please add new ones
 	# according to that. Thanks.
 	help
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index 45a6b9b7af2a..5219700e9161 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -25,6 +25,9 @@ endif

 GCOV_PROFILE := n

+# Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
+KCOV_INSTRUMENT := n
+
 #
 # Architecture dependencies
 #
diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
index 9dbb84923e12..e8be5d904ac7 100644
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -8,6 +8,10 @@ obj-y += dma-mapping$(MMUEXT).o
 obj-$(CONFIG_MMU) += fault-armv.o flush.o idmap.o ioremap.o \
 mmap.o pgd.o mmu.o pageattr.o

+# Instrumenting fault.c causes infinite recursion between:
+# __dabt_svc -> do_DataAbort -> __sanitizer_cov_trace_pc -> __dabt_svc
+KCOV_INSTRUMENT_fault.o := n
+
 ifneq ($(CONFIG_MMU),y)
 obj-y += nommu.o
 obj-$(CONFIG_ARM_MPU) += pmsa-v7.o
diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile
index bb4118213fee..f4efff9d3afb 100644
--- a/arch/arm/vdso/Makefile
+++ b/arch/arm/vdso/Makefile
@@ -30,6 +30,9 @@ CFLAGS_vgettimeofday.o = -O2
 # Disable gcov profiling for VDSO code
 GCOV_PROFILE := n

+# Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
+KCOV_INSTRUMENT := n
+
 # Force dependency
 $(obj)/vdso.o : $(obj)/vdso.so

--
2.17.0.484.g0c8726318c-goog

Mark Rutland

Apr 26, 2018, 9:41:04 AM

to Dmitry Vyukov, li...@armlinux.org.uk, liuwe...@huawei.com, catalin...@arm.com, takuo.ko...@hitachi.com, at...@google.com, linux-ar...@lists.infradead.org, syzk...@googlegroups.com, marc.z...@arm.com, cd...@kernel.org

Hi Dmitry,

On Thu, Apr 26, 2018 at 03:08:46PM +0200, Dmitry Vyukov wrote:
> KCOV is a code coverage collection facility used, in particular, by the
> syzkaller system call fuzzer. There is some interest in using syzkaller on
> arm devices, so port KCOV to arm.
>
> At the implementation level this merely declares that KCOV is supported and
> disables instrumentation in 3 special cases. The reasons for disabling are
> given in comments in the code.
>
> Tested with qemu-system-arm/vexpress-a15.
>
> Signed-off-by: Dmitry Vyukov <dvy...@google.com>
> Cc: Russell King <li...@armlinux.org.uk>
> Cc: Mark Rutland <mark.r...@arm.com>
> Cc: Abbott Liu <liuwe...@huawei.com>
> Cc: Catalin Marinas <catalin...@arm.com>
> Cc: Koguchi Takuo <takuo.ko...@hitachi.com>
> Cc: Atul Prakash <at...@google.com>
> Cc: li...@armlinux.org.uk
> Cc: linux-ar...@lists.infradead.org
> Cc: syzk...@googlegroups.com
> ---
>  arch/arm/Kconfig                  | 1 +
>  arch/arm/boot/compressed/Makefile | 3 +++
>  arch/arm/mm/Makefile              | 4 ++++
>  arch/arm/vdso/Makefile            | 3 +++
>  4 files changed, 11 insertions(+)

The hyp code will also need to opt out of KCOV instrumentation.

i.e. arch/arm/kvm/hyp/Makefile will need:

KCOV_INSTRUMENT := n

... and we should probably pick up the other bits from the arm64 hyp
Makefile, i.e. all of:

# KVM code is run at a different exception code with a different map, so
# compiler instrumentation that inserts callbacks or checks into the code may
# cause crashes. Just disable it.
GCOV_PROFILE := n
KASAN_SANITIZE := n
UBSAN_SANITIZE := n
KCOV_INSTRUMENT := n

Why does __sanitizer_cov_trace_pc() cause a data abort?

We don't seem to have this issue on arm64, where our fault handling is
instrumented, so this seems suspect.

Thanks,
Mark.

Dmitry Vyukov

Apr 26, 2018, 9:48:11 AM

to Mark Rutland, Russell King - ARM Linux, Abbott Liu, Catalin Marinas, 小口琢夫 / KOGUCHI，TAKUO, Atul Prakash, Linux ARM, syzkaller, Marc Zyngier, cd...@kernel.org

I can blindly add them if you wish, but I don't have a way to test it.
I also need an explanatory comment as to why we disable this.
Otherwise I have to say "Mark said so" :)

p.s. KASAN does not exist on arm (yet).

I don't have an explanation. That's just what me and Takuo observed.
We've seen that it happens when __sanitizer_cov_trace_pc tries to
dereference current to check kcov mode.

Dmitry Vyukov

Apr 26, 2018, 9:49:02 AM

to Mark Rutland, Russell King - ARM Linux, Abbott Liu, Catalin Marinas, 小口琢夫 / KOGUCHI，TAKUO, Linux ARM, syzkaller, Marc Zyngier, cd...@kernel.org

-stale email

Mark Rutland

Apr 26, 2018, 10:29:54 AM

to Dmitry Vyukov, Russell King - ARM Linux, Abbott Liu, Catalin Marinas, 小口琢夫 / KOGUCHI，TAKUO, Linux ARM, syzkaller, Marc Zyngier, cd...@kernel.org

The rationale is that this code runs at hyp, with minimal code/data
mapped in its page tables (which are not the usual kernel page tables).
Instrumented code may call functions or access data structures which
aren't mapped, which will bring down the system.

> p.s. KASAN does not exist on arm (yet).

Sure. We can drop that line for now, or keep it -- it does no harm.

[...]

> >> +# Instrumenting fault.c causes infinite recursion between:
> >> +# __dabt_svc -> do_DataAbort -> __sanitizer_cov_trace_pc -> __dabt_svc
> >> +KCOV_INSTRUMENT_fault.o := n
> >
> > Why does __sanitizer_cov_trace_pc() cause a data abort?
> >
> > We don't seem to have this issue on arm64, where our fault handling is
> > instrumented, so this seems suspect.
>
> I don't have an explanation. That's just what me and Takuo observed.
> We've seen that it happens when __sanitizer_cov_trace_pc tries to
> dereference current to check kcov mode.

Huh. The only reason I can imagine that might happen is if the
compiler's generating a misaligned access requiring fixup. If your
compiler's doing that, it could presumably do that in the fault handling
code too, which would be a big problem.

If you happen to have a binary around, can you dump the disassembly for
your __sanitizer_cov_trace_pc?

Using the Linaro 17.05 arm-linux-gnueabihf-gcc 6.3 toolchain I get the
following:

00000000 <__sanitizer_cov_trace_pc>:
0: e52de004 push {lr} ; (str lr, [sp, #-4]!)
4: e1a0300d mov r3, sp
8: e3c33d7f bic r3, r3, #8128 ; 0x1fc0
c: e3a02c01 mov r2, #256 ; 0x100
10: e3c3303f bic r3, r3, #63 ; 0x3f
14: e340201f movt r2, #31
18: e5931004 ldr r1, [r3, #4]
1c: e1110002 tst r1, r2
20: 149df004 popne {pc} ; (ldrne pc, [sp], #4)
24: e593300c ldr r3, [r3, #12]
28: e5932508 ldr r2, [r3, #1288] ; 0x508
2c: e3520002 cmp r2, #2
30: 149df004 popne {pc} ; (ldrne pc, [sp], #4)
34: e5932510 ldr r2, [r3, #1296] ; 0x510
38: e593150c ldr r1, [r3, #1292] ; 0x50c
3c: e5923000 ldr r3, [r2]
40: e2833001 add r3, r3, #1
44: e1530001 cmp r3, r1
48: 3782e103 strcc lr, [r2, r3, lsl #2]
4c: 35823000 strcc r3, [r2]
50: e49df004 pop {pc} ; (ldr pc, [sp], #4)

... which looks sane/safe to me.

Thanks,
Mark.

Dmitry Vyukov

Apr 26, 2018, 10:58:24 AM

to Mark Rutland, Russell King - ARM Linux, Abbott Liu, Catalin Marinas, 小口琢夫 / KOGUCHI，TAKUO, Linux ARM, syzkaller, Marc Zyngier, cd...@kernel.org

Here is my disasm:

801dc1b0 <__sanitizer_cov_trace_pc>:
801dc1b0: e52de004 push {lr} ; (str lr, [sp, #-4]!)
801dc1b4: e1a0300d mov r3, sp
801dc1b8: e3c33d7f bic r3, r3, #8128 ; 0x1fc0
801dc1bc: e3a02c01 mov r2, #256 ; 0x100
801dc1c0: e3c3303f bic r3, r3, #63 ; 0x3f
801dc1c4: e340201f movt r2, #31
801dc1c8: e5931004 ldr r1, [r3, #4]
801dc1cc: e1110002 tst r1, r2
801dc1d0: 149df004 popne {pc} ; (ldrne pc, [sp], #4)
801dc1d4: e593300c ldr r3, [r3, #12]
801dc1d8: e5932be0 ldr r2, [r3, #3040] ; 0xbe0
801dc1dc: e3520002 cmp r2, #2
801dc1e0: 149df004 popne {pc} ; (ldrne pc, [sp], #4)
801dc1e4: e5932be8 ldr r2, [r3, #3048] ; 0xbe8
801dc1e8: e5931be4 ldr r1, [r3, #3044] ; 0xbe4
801dc1ec: e5923000 ldr r3, [r2]
801dc1f0: e2833001 add r3, r3, #1
801dc1f4: e1510003 cmp r1, r3
801dc1f8: 8782e103 strhi lr, [r2, r3, lsl #2]
801dc1fc: 85823000 strhi r3, [r2]
801dc200: e49df004 pop {pc} ; (ldr pc, [sp], #4)

Compiler is gcc version 7.2.0 (Debian 7.2.0-7).

I've now rebuilt without that change and will hopefully soon get
crashes to reconfirm.

Dmitry Vyukov

Apr 26, 2018, 11:04:32 AM

to Mark Rutland, Russell King - ARM Linux, Abbott Liu, Catalin Marinas, 小口琢夫 / KOGUCHI，TAKUO, Linux ARM, syzkaller, Marc Zyngier, cd...@kernel.org

Yes, a swarm of assorted crashes now. Here are 4:

buildroot login: Unable to handle kernel paging request at virtual
address c9db963e
pgd = c188b8a2
[c9db963e] *pgd=00000000
Internal error: Oops: 80000005 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 933 Comm: syz-executor3 Not tainted 4.17.0-rc2+ #4
Hardware name: ARM-Versatile Express
PC is at 0xc9db963e
LR is at do_work_pending+0xcc/0xf0
pc : [<c9db963e>] lr : [<8010e290>] psr: 80000093
sp : 9785dfb0 ip : 00000000 fp : 00000000
r10: 00000054 r9 : 9785c000 r8 : 00000000
r7 : 10c5387d r6 : ffffffff r5 : 20000030 r4 : 00031408
r3 : 9f749980 r2 : 00000000 r1 : 00000000 r0 : 00000000
Flags: Nzcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 9786006a DAC: 00000051
Process syz-executor3 (pid: 933, stack limit = 0xa0d2fc58)
Stack: (0x9785dfb0 to 0x9785e000)
dfa0: 0009c308 801dc1ec 60000193 ffffffff
dfc0: 9785e004 9fbd6990 801dc1ec 80101950 abf38000 00040000 abf38000 9f748cc0
dfe0: 80c08408 00000005 abf38000 9785e0d8 9fbd6990 9785e000 9ed5c480 80118a3c
[<8010e290>] (do_work_pending) from [<9fbd6990>] (0x9fbd6990)
Code: bad PC value
---[ end trace 4c3305535d90997d ]---
Kernel panic - not syncing: Fatal exception
CPU1: stopping
CPU: 1 PID: 928 Comm: syz-executor0 Tainted: G D 4.17.0-rc2+ #4
Hardware name: ARM-Versatile Express
[<80112f64>] (unwind_backtrace) from [<8010ede4>] (show_stack+0x18/0x1c)
[<8010ede4>] (show_stack) from [<807e55d0>] (dump_stack+0xcc/0x110)
[<807e55d0>] (dump_stack) from [<80111758>] (handle_IPI+0x1b0/0x1c0)
[<80111758>] (handle_IPI) from [<804985e8>] (gic_handle_irq+0xbc/0xc0)
[<804985e8>] (gic_handle_irq) from [<801019f0>] (__irq_svc+0x70/0x98)
Exception stack(0x9a6dfd90 to 0x9a6dfdd8)
fd80: 9eed1600 00000002 00000000 9ec68000
fda0: 0009b66e 100400fb 9eed1600 9b66e71d 768e5000 00000008 00000000 768e5000
fdc0: 9a6de000 9a6dfde0 8023ad74 801dc1b0 00000013 ffffffff
[<801019f0>] (__irq_svc) from [<801dc1b0>] (__sanitizer_cov_trace_pc+0x0/0x54)
[<801dc1b0>] (__sanitizer_cov_trace_pc) from [<9b66e71d>] (0x9b66e71d)
Rebooting in 86400 seconds..

=============================================================

buildroot login: Unable to handle kernel paging request at virtual
address c641ca60
Unhandled fault: page domain fault (0x81b) at 0x00000055
pgd = 071861d0
[c641ca60] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 954 Comm: syz-executor0 Not tainted 4.17.0-rc2+ #4
Hardware name: ARM-Versatile Express
PC is at trace_hardirqs_off_caller+0x2c/0x164
LR is at __dabt_svc+0x54/0xa0
pc : [<801755b0>] lr : [<80101934>] psr: 20000193
sp : 974e8040 ip : 00000051 fp : 97511da4
r10: 9eeacb40 r9 : 974e8000 r8 : 9fbe6990
r7 : 974e807c r6 : ffffffff r5 : 20000193 r4 : 801755b0
r3 : ffffe000 r2 : c641c56c r1 : 00000001 r0 : 80101934
Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 9eec006a DAC: 00000051
Process syz-executor0 (pid: 954, stack limit = 0xadce5611)
Stack: (0x974e8040 to 0x974e8000)
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
Exception stack(0x974e8048 to 0x974e8090)
8040: 80101934 00000001 c641c56c ffffe000 801755b0 20000193
8060: ffffffff 974e80d4 9fbe6990 974e8000 9eeacb40 97511da4 00000051 974e8098
8080: 80101934 801755b0 20000193 ffffffff
pgd = 78062a34
[<80101934>] (__dabt_svc) from [<801755b0>]
(trace_hardirqs_off_caller+0x2c/0x164)
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
Exception stack(0x974e80a0 to 0x974e80e8)
80a0: 80101934 00000001 c641c56c ffffe000 801755b0 20000193 ffffffff 974e812c
80c0: 9fbe6990 974e8000 9eeacb40 97511da4 00000051 974e80f0 80101934 801755b0
80e0: 20000193 ffffffff
[<80101934>] (__dabt_svc) from [<801755b0>]
(trace_hardirqs_off_caller+0x2c/0x164)
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
Exception stack(0x974e80f8 to 0x974e8140)
80e0: 80101934 00000001
8100: c641c56c ffffe000 801755b0 20000193 ffffffff 974e8184 9fbe6990 974e8000
8120: 9eeacb40 97511da4 00000051 974e8148 80101934 801755b0 20000193 ffffffff
[<80101934>] (__dabt_svc) from [<801755b0>]
(trace_hardirqs_off_caller+0x2c/0x164)
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
Exception stack(0x974e8150 to 0x974e8198)
8140: 80101934 00000001 c641c56c ffffe000
8160: 801755b0 20000193 ffffffff 974e81dc 9fbe6990 974e8000 9eeacb40 97511da4
8180: 00000051 974e81a0 80101934 801755b0 20000193 ffffffff
[<80101934>] (__dabt_svc) from [<801755b0>]
(trace_hardirqs_off_caller+0x2c/0x164)
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
Exception stack(0x974e81a8 to 0x974e81f0)
81a0: 80101934 00000001 c641c56c ffffe000 801755b0 20000193
81c0: ffffffff 974e8234 9fbe6990 974e8000 9eeacb40 97511da4 00000051 974e81f8
81e0: 80101934 801755b0 20000193 ffffffff
[<80101934>] (__dabt_svc) from [<801755b0>]
(trace_hardirqs_off_caller+0x2c/0x164)
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
Exception stack(0x974e8200 to 0x974e8248)
8200: 80101934 00000001 c641c56c ffffe000 801755b0 20000193 ffffffff 974e828c
8220: 9fbe6990 974e8000 9eeacb40 97511da4 00000051 974e8250 80101934 801755b0
8240: 20000193 ffffffff
[<80101934>] (__dabt_svc) from [<801755b0>]
(trace_hardirqs_off_caller+0x2c/0x164)
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
Exception stack(0x974e8258 to 0x974e82a0)
8240: 80101934 00000001
8260: c641c56c ffffe000 801755b0 20000193 ffffffff 974e82e4 9fbe6990 974e8000
8280: 9eeacb40 97511da4 00000051 974e82a8 80101934 801755b0 20000193 ffffffff
[<80101934>] (__dabt_svc) from [<801755b0>]
(trace_hardirqs_off_caller+0x2c/0x164)
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
[00000055] *pgd=97443835, *pte=00000000, *ppte=00000000
Internal error: : 81b [#2] SMP ARM
Modules linked in:
CPU: 0 PID: 942 Comm: syz-executor2 Not tainted 4.17.0-rc2+ #4
Exception stack(0x974e82b0 to 0x974e82f8)
82a0: 80101934 00000001 c641c56c ffffe000
82c0: 801755b0 20000193 ffffffff 974e833c 9fbe6990 974e8000 9eeacb40 97511da4
82e0: 00000051 974e8300 80101934 801755b0 20000193 ffffffff
[<80101934>] (__dabt_svc) from [<801755b0>]
(trace_hardirqs_off_caller+0x2c/0x164)
Hardware name: ARM-Versatile Express
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
Exception stack(0x974e8308 to 0x974e8350)
8300: 80101934 00000001 c641c56c ffffe000 801755b0 20000193
8320: ffffffff 974e8394 9fbe6990 974e8000 9eeacb40 97511da4 00000051 974e8358
8340: 80101934 801755b0 20000193 ffffffff
[<80101934>] (__dabt_svc) from [<801755b0>]
(trace_hardirqs_off_caller+0x2c/0x164)
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
Exception stack(0x974e8360 to 0x974e83a8)
8360: 80101934 00000001 c641c56c ffffe000 801755b0 20000193 ffffffff 974e83ec
8380: 9fbe6990 974e8000 9eeacb40 97511da4 00000051 974e83b0 80101934 801755b0
83a0: 20000193 ffffffff
[<80101934>] (__dabt_svc) from [<801755b0>]
(trace_hardirqs_off_caller+0x2c/0x164)
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
Exception stack(0x974e83b8 to 0x974e8400)
83a0: 80101934 00000001
83c0: c641c56c ffffe000 801755b0 20000193 ffffffff 974e8444 9fbe6990 974e8000
83e0: 9eeacb40 97511da4 00000051 974e8408 80101934 801755b0 20000193 ffffffff
[<80101934>] (__dabt_svc) from [<801755b0>]
(trace_hardirqs_off_caller+0x2c/0x164)
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
Exception stack(0x974e8410 to 0x974e8458)
8400: 80101934 00000001 c641c56c ffffe000
8420: 801755b0 20000193 ffffffff 974e849c 9fbe6990 974e8000 9eeacb40 97511da4
8440: 00000051 974e8460 80101934 801755b0 20000193 ffffffff
[<80101934>] (__dabt_svc) from [<801755b0>]
(trace_hardirqs_off_caller+0x2c/0x164)
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
Exception stack(0x974e8468 to 0x974e84b0)
8460: 80101934 00000001 c641c56c ffffe000 801755b0 20000193
8480: ffffffff 974e84f4 9fbe6990 974e8000 9eeacb40 97511da4 00000051 974e84b8
84a0: 80101934 801755b0 20000193 ffffffff
[<80101934>] (__dabt_svc) from [<801755b0>]
(trace_hardirqs_off_caller+0x2c/0x164)
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
Exception stack(0x974e84c0 to 0x974e8508)
84c0: 80101934 00000001 c641c56c ffffe000 801755b0 20000193 ffffffff 974e854c
84e0: 9fbe6990 974e8000 9eeacb40 97511da4 00000051 974e8510 80101934 801755b0
PC is at list_netdevice+0xc4/0x17c
LR is at list_netdevice+0xc4/0x17c
pc : [<806bac14>] lr : [<806bac14>] psr: 80000013
sp : 97451e28 ip : 00000000 fp : 00000000
r10: 97451e6c r9 : 00000000 r8 : 97490ab0
r7 : 9ee27810 r6 : 00000051 r5 : 974909c0 r4 : 9ee27800
r3 : 9f614c80 r2 : 00000000 r1 : 00000201 r0 : 000000d0
Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 9745406a DAC: 00000051
Process syz-executor2 (pid: 942, stack limit = 0xdd0292b9)
Stack: (0x97451e28 to 0x97452000)
8500: 20000193 ffffffff
[<80101934>] (__dabt_svc) from [<801755b0>]
(trace_hardirqs_off_caller+0x2c/0x164)
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
Exception stack(0x974e8518 to 0x974e8560)
8500: 80101934 00000001
8520: c641c56c ffffe000 801755b0 20000193 ffffffff 974e85a4 9fbe6990 974e8000
8540: 9eeacb40 97511da4 00000051 974e8568 80101934 801755b0 20000193 ffffffff
[<80101934>] (__dabt_svc) from [<801755b0>]
(trace_hardirqs_off_caller+0x2c/0x164)
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
Exception stack(0x974e8570 to 0x974e85b8)
8560: 80101934 00000001 c641c56c ffffe000
8580: 801755b0 20000193 ffffffff 974e85fc 9fbe6990 974e8000 9eeacb40 97511da4
85a0: 00000051 974e85c0 80101934 801755b0 20000193 ffffffff
[<80101934>] (__dabt_svc) from [<801755b0>]
(trace_hardirqs_off_caller+0x2c/0x164)
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
Exception stack(0x974e85c8 to 0x974e8610)
85c0: 80101934 00000001 c641c56c ffffe000 801755b0 20000193
85e0: ffffffff 974e8654 9fbe6990 974e8000 9eeacb40 97511da4 00000051 974e8618
8600: 80101934 801755b0 20000193 ffffffff
[<80101934>] (__dabt_svc) from [<801755b0>]
(trace_hardirqs_off_caller+0x2c/0x164)
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
Exception stack(0x974e8620 to 0x974e8668)
8620: 80101934 00000001 c641c56c ffffe000 801755b0 20000193 ffffffff 974e86ac
8640: 9fbe6990 974e8000 9eeacb40 97511da4 00000051 974e8670 80101934 801755b0
8660: 20000193 ffffffff
[<80101934>] (__dabt_svc) from [<801755b0>]
(trace_hardirqs_off_caller+0x2c/0x164)
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
Exception stack(0x974e8678 to 0x974e86c0)
8660: 80101934 00000001
8680: c641c56c ffffe000 801755b0 20000193 ffffffff 974e8704 9fbe6990 974e8000
1e20: 40000013 9ee27800 9ee27800 80c08408 00000000 00000001
86a0: 9eeacb40 97511da4 00000051 974e86c8 80101934 801755b0 20000193 ffffffff
[<80101934>] (__dabt_svc) from [<801755b0>]
(trace_hardirqs_off_caller+0x2c/0x164)
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
Exception stack(0x974e86d0 to 0x974e8718)
86c0: 80101934 00000001 c641c56c ffffe000
86e0: 801755b0 20000193 ffffffff 974e875c 9fbe6990 974e8000 9eeacb40 97511da4
8700: 00000051 974e8720 80101934 801755b0 20000193 ffffffff
[<80101934>] (__dabt_svc) from [<801755b0>]
(trace_hardirqs_off_caller+0x2c/0x164)
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
Exception stack(0x974e8728 to 0x974e8770)
8720: 80101934 00000001 c641c56c ffffe000 801755b0 20000193
8740: ffffffff 974e87b4 9fbe6990 974e8000 9eeacb40 97511da4 00000051 974e8778
8760: 80101934 801755b0 20000193 ffffffff
[<80101934>] (__dabt_svc) from [<801755b0>]
(trace_hardirqs_off_caller+0x2c/0x164)
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
Exception stack(0x974e8780 to 0x974e87c8)
8780: 80101934 00000001 c641c56c ffffe000 801755b0 20000193 ffffffff 974e880c
87a0: 9fbe6990 974e8000 9eeacb40 97511da4 00000051 974e87d0 80101934 801755b0
87c0: 20000193 ffffffff
[<80101934>] (__dabt_svc) from [<801755b0>]
(trace_hardirqs_off_caller+0x2c/0x164)
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
Exception stack(0x974e87d8 to 0x974e8820)
87c0: 80101934 00000001
87e0: c641c56c ffffe000 801755b0 20000193 ffffffff 974e8864 9fbe6990 974e8000
8800: 9eeacb40 97511da4 00000051 974e8828 80101934 801755b0 20000193 ffffffff
[<80101934>] (__dabt_svc) from [<801755b0>]
(trace_hardirqs_off_caller+0x2c/0x164)
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
Exception stack(0x974e8830 to 0x974e8878)
8820: 80101934 00000001 c641c56c ffffe000
8840: 801755b0 20000193 ffffffff 974e88bc 9fbe6990 974e8000 9eeacb40 97511da4
8860: 00000051 974e8880 80101934 801755b0 20000193 ffffffff
[<80101934>] (__dabt_svc) from [<801755b0>]
(trace_hardirqs_off_caller+0x2c/0x164)
[<801755b0>] (trace_hardirqs_off_caller) from [<80101934>]
(__dabt_svc+0x54/0xa0)
Exception stack(0x974e8888 to 0x974e88d0)
8880: 80101934 00000001 c641c56c ffffe000 801755b0 20000193
1e40: 00000001 806cbb58 806c2104 00000001 00000000 00000000 00000000 00000000
1e60: 00000000 00000000 00000001 9ee27800 00000000 c9db963e 00000000 9ee27800
1e80: 974909c0 00000000 00000000 97451ee4 80c08408 80c32a80 00000000 806cbc9c
1ea0: 9ee27800 805a5f20 00000001 00000001 805a5ed0 974909c0 00000000 806b35b8
1ec0: 80c27298 974909c0 80c32ae8 00000000 97451ee4 80c08408 80c32a80 806b3cf0
1ee0: 806b524c 97451ee4 97451ee4 c9db963e 00000001 974909c0 80c0fb3c 80c0fa7c
1f00: 9f40f500 00000000 9f5f8e80 00000000 00000000 806b530c 9f422960 80c41fac
1f20: 40000000 80c0fa7c 9f614c80 801519bc 00000015 00000001 80c0fa7c 40000000
1f40: 9f5f8e80 80c0fa7c 97451f80 9f614c80 97450000 80151f2c 00000000 c9db963e
88a0: ffffffff 974e8914 9fbe6990 974e8000 9eeacb40 97511da4 00000051 974e88d8
1f60: 40000000 80c08408 00000000 00000000 00000000 80124ffc 00000000 00000000
1f80: 00000000 c9db963e 00000002 7ef93d1c 00000000 000b0000 00000151 801011c4
1fa0: 97450000 80101000 7ef93d1c 00000000 40000000 7ef93cf8 000001b4 00100000
1fc0: 7ef93d1c 00000000 000b0000 00000151 00000004 00000000 00000000 00000000
1fe0: 00000000 7ef93d0c 00010547 00036578 00000030 40000000 00000000 00000000
[<806bac14>] (list_netdevice) from [<806cbb58>] (register_netdevice+0x5d8/0x6f8)
[<806cbb58>] (register_netdevice) from [<806cbc9c>] (register_netdev+0x24/0x40)
[<806cbc9c>] (register_netdev) from [<805a5f20>] (loopback_net_init+0x50/0xc4)
[<805a5f20>] (loopback_net_init) from [<806b35b8>] (ops_init+0xdc/0x190)
[<806b35b8>] (ops_init) from [<806b3cf0>] (setup_net+0xd8/0x230)
[<806b3cf0>] (setup_net) from [<806b530c>] (copy_net_ns+0x190/0x1e0)
[<806b530c>] (copy_net_ns) from [<801519bc>] (create_new_namespaces+0x118/0x280)
[<801519bc>] (create_new_namespaces) from [<80151f2c>]
(unshare_nsproxy_namespaces+0x8c/0xf8)
[<80151f2c>] (unshare_nsproxy_namespaces) from [<80124ffc>]
(ksys_unshare+0x24c/0x48c)
[<80124ffc>] (ksys_unshare) from [<80101000>] (ret_fast_syscall+0x0/0x28)
Exception stack(0x97451fa8 to 0x97451ff0)
1fa0: 7ef93d1c 00000000 40000000 7ef93cf8 000001b4 00100000
1fc0: 7ef93d1c 00000000 000b0000 00000151 00000004 00000000 00000000 00000000
1fe0: 00000000 7ef93d0c 00010547 00036578
Code: e3560000 e7827100 0a000001 ebec8566 (e5867004)
---[ end trace 6ace6175b5180e2d ]---
Kernel panic - not syncing: Fatal exception in interrupt
Unhandled fault: page domain fault (0x01b) at 0x00000be0
Unable to handle kernel paging request at virtual address 7087f618
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
SMP: failed to stop secondary CPUs
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unhandled fault: page domain fault (0x01b) at 0x00000be0
Unhandled fault: page domain fault (0x01b) at 0x00000244
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
pgd = ff081c69
[3028ec1f] *pgd=00000000
Rebooting in 86400 seconds..

=============================================

Kernel panic - not syncing: stack-protector: Kernel stack is corrupted
in: do_futex+0xf04/0xf88

CPU: 1 PID: 969 Comm: syz-executor2 Not tainted 4.17.0-rc2+ #4
Hardware name: ARM-Versatile Express
[<80112f64>] (unwind_backtrace) from [<8010ede4>] (show_stack+0x18/0x1c)
[<8010ede4>] (show_stack) from [<807e55d0>] (dump_stack+0xcc/0x110)
[<807e55d0>] (dump_stack) from [<80125dac>] (panic+0x11c/0x2f0)
[<80125dac>] (panic) from [<80125828>] (print_tainted+0x0/0xcc)
[<80125828>] (print_tainted) from [<978fbf68>] (0x978fbf68)
Unhandled fault: page domain fault (0x01b) at 0x000004f5
Unable to handle kernel paging request at virtual address 7414bd10
pgd = 16be7fe8
[7414bd10] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in:
CPU: -1700673763 PID: 0 Comm: Not tainted 4.17.0-rc2+ #4
Hardware name: ARM-Versatile Express
PC is at console_unlock+0x80/0x6c0
LR is at console_unlock+0x50/0x6c0
pc : [<801867bc>] lr : [<8018678c>] psr: a0000193
sp : 978cfde8 ip : 9aa1bc15 fp : 8134b578
r10: 20000193 r9 : 00000000 r8 : 00000000
r7 : 8134b578 r6 : 00000006 r5 : ffffe000 r4 : 00000000
r3 : fcd50e39 r2 : 0000001d r1 : 80c0842c r0 : 00000001
Flags: NzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 9a1cc06a DAC: 00000051
Process (pid: 0, stack limit = 0xf7eeb3d2)
Stack: (0x978cfde8 to 0x00002000)
[<801867bc>] (console_unlock) from [<801870cc>] (vprintk_emit+0x2d0/0x510)
[<801870cc>] (vprintk_emit) from [<80187520>] (vprintk_default+0x2c/0x34)
[<80187520>] (vprintk_default) from [<80188b6c>] (vprintk_func+0xc4/0x124)
[<80188b6c>] (vprintk_func) from [<80188364>] (printk+0x34/0x58)
[<80188364>] (printk) from [<80118b34>] (do_DataAbort+0x9c/0xf4)
[<80118b34>] (do_DataAbort) from [<8010193c>] (__dabt_svc+0x5c/0xa0)
Exception stack(0x978cffb0 to 0x978cfff8)
ffa0: 80101934 00000001 00000001 ffffe000
ffc0: 801755b0 20000193 ffffffff 978d003c 80c0d1f8 978d0000 9ed36480 978dbd54
ffe0: 80c0d1a8 978d0000 80101934 801755b0 20000193 ffffffff
Code: e203201f b1a03001 e59d102c e1a032c3 (e7913103)
---[ end trace d985f5a16c59cb8d ]---
SMP: failed to stop secondary CPUs
Rebooting in 86400 seconds..

===============================================

buildroot login: ------------[ cut here ]------------
Unable to handle kernel paging request at virtual address 73b23c48
pgd = 27ff7dff
Unable to handle kernel paging request at virtual address e79caeeb
pgd = 27ff7dff
Unable to handle kernel paging request at virtual address e79ccbe3
pgd = 27ff7dff
Unable to handle kernel paging request at virtual address e79ccbe3
pgd = 27ff7dff
Unable to handle kernel paging request at virtual address e79ccbe3
pgd = 27ff7dff
Unable to handle kernel paging request at virtual address e79ccbe3
pgd = 27ff7dff
Unable to handle kernel paging request at virtual address e79ccbe3
pgd = 27ff7dff
Unable to handle kernel paging request at virtual address e79ccbe3
pgd = 27ff7dff
Unable to handle kernel paging request at virtual address e79ccbe3
pgd = 27ff7dff
Unable to handle kernel paging request at virtual address e79ccbe3
pgd = 27ff7dff
Unable to handle kernel paging request at virtual address e79ccbe3
pgd = 27ff7dff
Unable to handle kernel paging request at virtual address e79ccbe3
pgd = 27ff7dff
Unable to handle kernel paging request at virtual address e79ccbe3
pgd = 27ff7dff
Unable to handle kernel paging request at virtual address e79ccbe3
pgd = 27ff7dff
Unable to handle kernel paging request at virtual address e79ccbe3
pgd = 27ff7dff
Unable to handle kernel paging request at virtual address e79ccbe3
pgd = 27ff7dff
Unable to handle kernel paging request at virtual address e79ccbe3
pgd = 27ff7dff
Unable to handle kernel paging request at virtual address e79ccbe3
pgd = 27ff7dff
Unable to handle kernel paging request at virtual address e79ccbe3
pgd = 27ff7dff
Unable to handle kernel paging request at virtual address d20d547a
Unable to handle kernel paging request at virtual address 7087f618
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
Unable to handle kernel paging request at virtual address 3028ec1f
pgd = 1f9b9281
[3028ec1f] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 0 Comm: Not tainted 4.17.0-rc2+ #4
Hardware name: ARM-Versatile Express
PC is at show_pte+0x28/0xd4
LR is at show_pte+0x28/0xd4
pc : [<801182d8>] lr : [<801182d8>] psr: 20000193
sp : 978da0e8 ip : 00000000 fp : 9fbd1510
r10: 8031140e r9 : 978da000 r8 : 3028ebff
r7 : 00000005 r6 : 00000181 r5 : 3028ec1f r4 : 3028ebff
r3 : 978da000 r2 : 001f0100 r1 : 9fbd1a2e r0 : 3028ebff
Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 978d006a DAC: 00000051
Process (pid: 0, stack limit = 0xc88e92ff)
Stack: (0x978da0e8 to 0x00002000)
[<801182d8>] (show_pte) from [<80118cc0>] (__do_kernel_fault.part.0+0x5c/0x80)
[<80118cc0>] (__do_kernel_fault.part.0) from [<801188ec>] (do_bad_area+0x0/0xa0)
[<801188ec>] (do_bad_area) from [<8491141e>] (0x8491141e)
Code: e34830c1 e1a06aa5 01a04003 eb030fb5 (e5941020)
---[ end trace 5c73d7479f0df7a7 ]---
Kernel panic - not syncing: Fatal exception in interrupt
CPU1: stopping
CPU: 1 PID: 926 Comm: syz-executor1 Tainted: G D 4.17.0-rc2+ #4
Hardware name: ARM-Versatile Express
[<80112f64>] (unwind_backtrace) from [<8010ede4>] (show_stack+0x18/0x1c)
[<8010ede4>] (show_stack) from [<807e55d0>] (dump_stack+0xcc/0x110)
[<807e55d0>] (dump_stack) from [<80111758>] (handle_IPI+0x1b0/0x1c0)
[<80111758>] (handle_IPI) from [<804985e8>] (gic_handle_irq+0xbc/0xc0)
[<804985e8>] (gic_handle_irq) from [<801019f0>] (__irq_svc+0x70/0x98)
Exception stack(0x98b75f08 to 0x98b75f50)
5f00: 9edfcbc8 00000000 00000000 00000055 80c08408 00000051
5f20: 7ee32e60 7ee32e60 801011c4 98b74000 00000000 00000000 98b74000 98b75f58
5f40: 8019ec9c 8019ecac 80000013 ffffffff
[<801019f0>] (__irq_svc) from [<8019ecac>] (put_timespec64+0x78/0xcc)
[<8019ecac>] (put_timespec64) from [<801ad904>] (sys_clock_gettime+0x84/0xcc)
[<801ad904>] (sys_clock_gettime) from [<80101000>] (ret_fast_syscall+0x0/0x28)
Exception stack(0x98b75fa8 to 0x98b75ff0)
5fa0: 00000002 7ee32eb0 00000001 7ee32e60 00000000 00000000
5fc0: 00000002 7ee32eb0 7ee32edc 00000107 0006c8f4 00000000 7ee336c4 00000000
5fe0: 00000107 7ee32e54 0003692f 0001bad6

Mark Rutland

Apr 27, 2018, 9:06:49 AM

to Dmitry Vyukov, Russell King - ARM Linux, Abbott Liu, Catalin Marinas, 小口琢夫 / KOGUCHI，TAKUO, Linux ARM, syzkaller, Marc Zyngier, cd...@kernel.org

These offsets for task_struct::{kcov_area,kcov_size} are *much* larger
than mine. Can you share your kernel config?

> > 801dc1ec: e5923000 ldr r3, [r2]
> > 801dc1f0: e2833001 add r3, r3, #1
> > 801dc1f4: e1510003 cmp r1, r3
> > 801dc1f8: 8782e103 strhi lr, [r2, r3, lsl #2]
> > 801dc1fc: 85823000 strhi r3, [r2]
> > 801dc200: e49df004 pop {pc} ; (ldr pc, [sp], #4)
> >
> > Compiler is gcc version 7.2.0 (Debian 7.2.0-7).

I also tried with the Linaro 17.11 GCC 7.2.1, and see the same codegen
as yours above, modulo the task_struct offsets.

> > I've now rebuilt without that change and will hopefully soon get
> > crashes to reconfirm.

Just to check, do you see this when starting userspace? i.e. without
opening any kcov files?

I can't reproduce the issue on real hardware on top of v4.17-rc2, when
booting and running a standard ARMv7 buildroot userspace. So the kcov
mode check seems fine to me.

> Yes, a swarm of assorted crashes now. Here are 4:
>
> buildroot login: Unable to handle kernel paging request at virtual
> address c9db963e
> pgd = c188b8a2
> [c9db963e] *pgd=00000000
> Internal error: Oops: 80000005 [#1] SMP ARM
> Modules linked in:
> CPU: 0 PID: 933 Comm: syz-executor3 Not tainted 4.17.0-rc2+ #4
> Hardware name: ARM-Versatile Express
> PC is at 0xc9db963e

That PC is the faulting address, which doesn't look like a valid kernel
image address given it's ~1G above the valid LR value down at
0x8010e290.
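
The "~1G" figure can be checked with plain address arithmetic. The sketch below is a standalone userspace illustration, not kernel code; the helper names are made up for this example. It also checks word alignment, on the assumption that the CPU was executing in ARM (A32) state, where instructions are 4 bytes long and word-aligned.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; these are not kernel APIs. */

/* Unsigned distance between two 32-bit addresses (hi must be >= lo). */
static uint32_t addr_delta(uint32_t hi, uint32_t lo)
{
	return hi - lo;
}

/* A32 (ARM-mode) instructions are fetched from word-aligned addresses. */
static int is_a32_aligned(uint32_t pc)
{
	return (pc & 3u) == 0;
}
```

With the values from the report, addr_delta(0xc9db963e, 0x8010e290) is 0x49cab3ae, roughly 1.15 GiB, and is_a32_aligned(0xc9db963e) is false, so the PC would not be a valid A32 instruction address either.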

> LR is at do_work_pending+0xcc/0xf0

Assuming your GCC's codegen is the same as mine, that's the LR set up by
the call to task_work_run(), immediately before we branch back to the
start of the loop. So either we blew up in task_work_run(), or we've
returned to the top of the loop.

At the top of the loop my GCC has a bl to __sanitizer_cov_trace_pc(),
which should set up the LR.

My task_work_run() doesn't tail-call to anything, so I don't currently
see how we could end up in this state. That could be down to text
corruption, or corruption of the state of an interrupted context.

If you don't already have STRICT_KERNEL_RWX enabled, could you try
turning it on?

Thanks,
Mark.

Dmitry Vyukov

Apr 27, 2018, 9:51:44 AM

to Mark Rutland, Russell King - ARM Linux, Abbott Liu, Catalin Marinas, 小口琢夫 / KOGUCHI，TAKUO, Linux ARM, syzkaller, Marc Zyngier, cd...@kernel.org

Attached. It's pretty much vexpress_defconfig with a few minor
additions. Here is a full description of what I am doing:
https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_arm-kernel.md

FWIW when I do "KCOV_INSTRUMENT_fault.o := n" everything works and I
see reasonable coverage.

>> > 801dc1ec: e5923000 ldr r3, [r2]
>> > 801dc1f0: e2833001 add r3, r3, #1
>> > 801dc1f4: e1510003 cmp r1, r3
>> > 801dc1f8: 8782e103 strhi lr, [r2, r3, lsl #2]
>> > 801dc1fc: 85823000 strhi r3, [r2]
>> > 801dc200: e49df004 pop {pc} ; (ldr pc, [sp], #4)
>> >
>> > Compiler is gcc version 7.2.0 (Debian 7.2.0-7).
>
> I also tried with the Linaro 17.11 GCC 7.2.1, and see the same codegen
> as yours above, modulo the task_struct offsets.
>
>> > I've now rebuilt without that change and will hopefully soon get
>> > crashes to reconfirm.
>
> Just to check, do you see this when starting userspace? i.e. without
> opening any kcov files?
>
> I can't reproduce the issue on real hardware on top of v4.17-rc2, when
> booting and running a standard ARMv7 buildroot userspace. So the kcov
> mode check seems fine to me.

It happens after brief fuzzing with syzkaller. So it's both kcov
opened and some weird syscall workload. Again, here is everything that
I am doing:
https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_arm-kernel.md

Trying.

.config

Dmitry Vyukov

Apr 27, 2018, 9:52:55 AM

to Mark Rutland, Russell King - ARM Linux, Abbott Liu, Catalin Marinas, 小口琢夫 / KOGUCHI，TAKUO, Linux ARM, syzkaller, Marc Zyngier, cd...@kernel.org

STRICT_KERNEL_RWX is already enabled in my config.

Mark Rutland

Apr 27, 2018, 12:18:17 PM

to Dmitry Vyukov, Russell King - ARM Linux, Abbott Liu, Catalin Marinas, 小口琢夫 / KOGUCHI，TAKUO, Linux ARM, syzkaller, Marc Zyngier, cd...@kernel.org

On Fri, Apr 27, 2018 at 03:51:22PM +0200, Dmitry Vyukov wrote:
> On Fri, Apr 27, 2018 at 3:06 PM, Mark Rutland <mark.r...@arm.com> wrote:
> > Can you share your kernel config?
>
> Attached. It's pretty much vexpress_defconfig with a few minor
> additions. Here is a full description of what I am doing:
> https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_arm-kernel.md

Cheers!

> FWIW when I do "KCOV_INSTRUMENT_fault.o := n" everything works and I
> see reasonable coverage.

While this may be the case, I think it's papering over a bug rather than
solving it.

[...]

> > I can't reproduce the issue on real hardware on top of v4.17-rc2, when
> > booting and running a standard ARMv7 buildroot userspace. So the kcov
> > mode check seems fine to me.
>
> It happens after brief fuzzing with syzkaller. So it's both kcov
> opened and some weird syscall workload. Again, here is everything that
> I am doing:
> https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_arm-kernel.md

I've set this up, and while I see RCU stalls and "no output from test
machine" warnings, I'm not seeing any reports with KCOV splats.

Are you somehow connecting to a VM which failed with no output?

Thanks,
Mark.

Dmitry Vyukov

Apr 27, 2018, 12:22:14 PM

to Mark Rutland, Russell King - ARM Linux, Abbott Liu, Catalin Marinas, 小口琢夫 / KOGUCHI，TAKUO, Linux ARM, syzkaller, Marc Zyngier, cd...@kernel.org

I've started seeing assorted crashes like these:

kernel panic: Fatal exception
unable to handle kernel paging request in migrate_task_rq_fair
BUG: spinlock bad magic in corrupted
unable to handle kernel paging request in trace_hardirqs_off_caller
unable to handle kernel paging request in kick_process
kernel panic: stack-protector: Kernel stack is corrupted in: do_futex
unable to handle kernel paging request in __sanitizer_cov_trace_pc
Unable to handle kernel paging request at virtual address ADDR

Do you see code coverage increasing?

Besides the compiler, I am not sure what else can be different between our
setups (mine is Debian's 7.2).

Mark Rutland

Apr 27, 2018, 12:33:48 PM

to Dmitry Vyukov, Russell King - ARM Linux, Abbott Liu, Catalin Marinas, 小口琢夫 / KOGUCHI，TAKUO, Linux ARM, syzkaller, Marc Zyngier, cd...@kernel.org

Just to check, is that with or without instrumentation in fault.c?

It might be worth enabling HARDENED_USERCOPY -- that should scream if we
corrupt task_struct via a uaccess.

> Do you see code coverage increasing?

Not so far. QEMU TCG on this machine is rather slow, so it might just be
that VMs are timing out at boot time.

> Besides the compiler, I am not sure what else can be different between our
> setups (mine is Debian's 7.2).

Could you give mine [1] a go? It's the Linaro 17.11
arm-linux-gnueabihf-gcc 7.2.1 toolchain.

I don't have a Debian 7 install up at the moment.

[1] https://releases.linaro.org/components/toolchain/binaries/latest/arm-linux-gnueabihf/gcc-linaro-7.2.1-2017.11-x86_64_arm-linux-gnueabihf.tar.xz

Thanks,
Mark.

takuo.ko...@hitachi.com

Apr 27, 2018, 7:39:28 PM

to syzkaller

I reproduced the "__dabt_svc -> do_DataAbort -> __sanitizer_cov_trace_pc -> __dabt_svc" issue that I had reported to Dmitry at the last SIL2LinuxMP workshop.

I am using,
Linaro GCC 7.2-2017.1
Linux version 4.17.0-rc2 + Dmitry's patch and removing "KCOV_INSTRUMENT_fault.o := n" from arch/arm/Makefile
My __sanitizer_cov_trace_pc is identical to the one Mark posted,
801cfd4c <__sanitizer_cov_trace_pc>:
801cfd4c:    e52de004     push    {lr}        ; (str lr, [sp, #-4]!)
801cfd50:    e1a0300d     mov    r3, sp
801cfd54:    e3c33d7f     bic    r3, r3, #8128    ; 0x1fc0
801cfd58:    e3a02c01     mov    r2, #256    ; 0x100
801cfd5c:    e3c3303f     bic    r3, r3, #63    ; 0x3f
801cfd60:    e340201f     movt    r2, #31
801cfd64:    e5931004     ldr    r1, [r3, #4]
801cfd68:    e1110002     tst    r1, r2
801cfd6c:    149df004     popne    {pc}        ; (ldrne pc, [sp], #4)
801cfd70:    e593300c     ldr    r3, [r3, #12]
801cfd74:    e5932508     ldr    r2, [r3, #1288]    ; 0x508
801cfd78:    e3520002     cmp    r2, #2
801cfd7c:    149df004     popne    {pc}        ; (ldrne pc, [sp], #4)
801cfd80:    e5932510     ldr    r2, [r3, #1296]    ; 0x510
801cfd84:    e593150c     ldr    r1, [r3, #1292]    ; 0x50c
801cfd88:    e5923000     ldr    r3, [r2]
801cfd8c:    e2833001     add    r3, r3, #1
801cfd90:    e1510003     cmp    r1, r3
801cfd94:    8782e103     strhi    lr, [r2, r3, lsl #2]
801cfd98:    85823000     strhi    r3, [r2]
801cfd9c:    e49df004     pop    {pc}        ; (ldr pc, [sp], #4)

As for syzkaller, I have changed cover_t to uint32, as Dmitry suggested.

2018/04/27 22:56:28 executing program 3:
r0 = socket$netlink(0x10, 0x3, 0x0)
getsockopt$sock_int(r0, 0x1, 0xf, &(0x7f0000000040), &(0x7f0000000180)=0x4)

syzkaller login: Unable to handle kernel paging request at virtual address e08032c4
pgd = 124f7ad8
[e08032c4] *pgd=00000000

Internal error: Oops: 5 [#1] SMP ARM

Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 1074 Comm: mdev Not tainted 4.17.0-rc2+ #30
Hardware name: ARM-Versatile Express
PC is at __sanitizer_cov_trace_pc+0x28/0x54 kernel/kcov.c:100
LR is at fsr_fs arch/arm/mm/fault.h:26 [inline]
LR is at do_DataAbort+0x28/0xf8 arch/arm/mm/fault.c:550
pc : [<801cfd74>]    lr : [<801190d0>]    psr: 60080193
sp : 9716c05c ip : 00000051 fp : 9716fda4
r10: 80902564 r9 : 9716c000 r8 : 97b89880
r7 : 9716c110 r6 : e08032c4 r5 : 00000005 r4 : 80c04c08
r3 : e0802dbc r2 : 001f0100 r1 : 00000000 r0 : 7f000000
Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 9ef3806a DAC: 00000051
Process mdev (pid: 1074, stack limit = 0x63b628ec)
Stack: (0x9716c05c to 0x9716c000)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716c110 to 0x9716c158)
c100:                                     e08032c4 00000000 001f0100 e0802dbc
c120: 80c04c08 00000005 e08032c4 9716c218 97b89880 9716c000 80902564 9716fda4
c140: 00000051 9716c164 801190d0 801cfd74 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716c218 to 0x9716c260)
c200:                                                       e08032c4 00000000
c220: 001f0100 e0802dbc 80c04c08 00000005 e08032c4 9716c320 97b89880 9716c000
c240: 80902564 9716fda4 00000051 9716c26c 801190d0 801cfd74 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716c320 to 0x9716c368)
c320: e08032c4 00000000 001f0100 e0802dbc 80c04c08 00000005 e08032c4 9716c428
c340: 97b89880 9716c000 80902564 9716fda4 00000051 9716c374 801190d0 801cfd74
c360: 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716c428 to 0x9716c470)
c420:                   e08032c4 00000000 001f0100 e0802dbc 80c04c08 00000005
c440: e08032c4 9716c530 97b89880 9716c000 80902564 9716fda4 00000051 9716c47c
c460: 801190d0 801cfd74 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716c530 to 0x9716c578)
c520:                                     e08032c4 00000000 001f0100 e0802dbc
c540: 80c04c08 00000005 e08032c4 9716c638 97b89880 9716c000 80902564 9716fda4
c560: 00000051 9716c584 801190d0 801cfd74 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716c638 to 0x9716c680)
c620:                                                       e08032c4 00000000
c640: 001f0100 e0802dbc 80c04c08 00000005 e08032c4 9716c740 97b89880 9716c000
c660: 80902564 9716fda4 00000051 9716c68c 801190d0 801cfd74 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716c740 to 0x9716c788)
c740: e08032c4 00000000 001f0100 e0802dbc 80c04c08 00000005 e08032c4 9716c848
c760: 97b89880 9716c000 80902564 9716fda4 00000051 9716c794 801190d0 801cfd74
c780: 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716c848 to 0x9716c890)
c840:                   e08032c4 00000000 001f0100 e0802dbc 80c04c08 00000005
c860: e08032c4 9716c950 97b89880 9716c000 80902564 9716fda4 00000051 9716c89c
c880: 801190d0 801cfd74 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716c950 to 0x9716c998)
c940:                                     e08032c4 00000000 001f0100 e0802dbc
c960: 80c04c08 00000005 e08032c4 9716ca58 97b89880 9716c000 80902564 9716fda4
c980: 00000051 9716c9a4 801190d0 801cfd74 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716ca58 to 0x9716caa0)
ca40:                                                       e08032c4 00000000
ca60: 001f0100 e0802dbc 80c04c08 00000005 e08032c4 9716cb60 97b89880 9716c000
ca80: 80902564 9716fda4 00000051 9716caac 801190d0 801cfd74 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716cb60 to 0x9716cba8)
cb60: e08032c4 00000000 001f0100 e0802dbc 80c04c08 00000005 e08032c4 9716cc68
cb80: 97b89880 9716c000 80902564 9716fda4 00000051 9716cbb4 801190d0 801cfd74
cba0: 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716cc68 to 0x9716ccb0)
cc60:                   e08032c4 00000000 001f0100 e0802dbc 80c04c08 00000005
cc80: e08032c4 9716cd70 97b89880 9716c000 80902564 9716fda4 00000051 9716ccbc
cca0: 801190d0 801cfd74 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716cd70 to 0x9716cdb8)
cd60:                                     e08032c4 00000000 001f0100 e0802dbc
cd80: 80c04c08 00000005 e08032c4 9716ce78 97b89880 9716c000 80902564 9716fda4
cda0: 00000051 9716cdc4 801190d0 801cfd74 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716ce78 to 0x9716cec0)
ce60:                                                       e08032c4 00000000
ce80: 001f0100 e0802dbc 80c04c08 00000005 e08032c4 9716cf80 97b89880 9716c000
cea0: 80902564 9716fda4 00000051 9716cecc 801190d0 801cfd74 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716cf80 to 0x9716cfc8)
cf80: e08032c4 00000000 001f0100 e0802dbc 80c04c08 00000005 e08032c4 9716d088
cfa0: 97b89880 9716c000 80902564 9716fda4 00000051 9716cfd4 801190d0 801cfd74
cfc0: 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716d088 to 0x9716d0d0)
d080:                   e08032c4 00000000 001f0100 e0802dbc 80c04c08 00000005
d0a0: e08032c4 9716d190 97b89880 9716c000 80902564 9716fda4 00000051 9716d0dc
d0c0: 801190d0 801cfd74 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716d190 to 0x9716d1d8)
d180:                                     e08032c4 00000000 001f0100 e0802dbc
d1a0: 80c04c08 00000005 e08032c4 9716d298 97b89880 9716c000 80902564 9716fda4
d1c0: 00000051 9716d1e4 801190d0 801cfd74 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716d298 to 0x9716d2e0)
d280:                                                       e08032c4 00000000
d2a0: 001f0100 e0802dbc 80c04c08 00000005 e08032c4 9716d3a0 97b89880 9716c000
d2c0: 80902564 9716fda4 00000051 9716d2ec 801190d0 801cfd74 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716d3a0 to 0x9716d3e8)
d3a0: e08032c4 00000000 001f0100 e0802dbc 80c04c08 00000005 e08032c4 9716d4a8
d3c0: 97b89880 9716c000 80902564 9716fda4 00000051 9716d3f4 801190d0 801cfd74
d3e0: 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716d4a8 to 0x9716d4f0)
d4a0:                   e08032c4 00000000 001f0100 e0802dbc 80c04c08 00000005
d4c0: e08032c4 9716d5b0 97b89880 9716c000 80902564 9716fda4 00000051 9716d4fc
d4e0: 801190d0 801cfd74 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716d5b0 to 0x9716d5f8)
d5a0:                                     e08032c4 00000000 001f0100 e0802dbc
d5c0: 80c04c08 00000005 e08032c4 9716d6b8 97b89880 9716c000 80902564 9716fda4
d5e0: 00000051 9716d604 801190d0 801cfd74 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716d6b8 to 0x9716d700)
d6a0:                                                       e08032c4 00000000
d6c0: 001f0100 e0802dbc 80c04c08 00000005 e08032c4 9716d7c0 97b89880 9716c000
d6e0: 80902564 9716fda4 00000051 9716d70c 801190d0 801cfd74 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716d7c0 to 0x9716d808)
d7c0: e08032c4 00000000 001f0100 e0802dbc 80c04c08 00000005 e08032c4 9716d8c8
d7e0: 97b89880 9716c000 80902564 9716fda4 00000051 9716d814 801190d0 801cfd74
d800: 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716d8c8 to 0x9716d910)
d8c0:                   e08032c4 00000000 001f0100 e0802dbc 80c04c08 00000005
d8e0: e08032c4 9716d9d0 97b89880 9716c000 80902564 9716fda4 00000051 9716d91c
d900: 801190d0 801cfd74 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716d9d0 to 0x9716da18)
d9c0:                                     e08032c4 00000000 001f0100 e0802dbc
d9e0: 80c04c08 00000005 e08032c4 9716dad8 97b89880 9716c000 80902564 9716fda4
da00: 00000051 9716da24 801190d0 801cfd74 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716dad8 to 0x9716db20)
dac0:                                                       e08032c4 00000000
dae0: 001f0100 e0802dbc 80c04c08 00000005 e08032c4 9716dbe0 97b89880 9716c000
db00: 80902564 9716fda4 00000051 9716db2c 801190d0 801cfd74 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716dbe0 to 0x9716dc28)
dbe0: e08032c4 00000000 001f0100 e0802dbc 80c04c08 00000005 e08032c4 9716dce8
dc00: 97b89880 9716c000 80902564 9716fda4 00000051 9716dc34 801190d0 801cfd74
dc20: 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716dce8 to 0x9716dd30)
dce0:                   e08032c4 00000000 001f0100 e0802dbc 80c04c08 00000005
dd00: e08032c4 9716ddf0 97b89880 9716c000 80902564 9716fda4 00000051 9716dd3c
dd20: 801190d0 801cfd74 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716ddf0 to 0x9716de38)
dde0:                                     e08032c4 00000000 001f0100 e0802dbc
de00: 80c04c08 00000005 e08032c4 9716def8 97b89880 9716c000 80902564 9716fda4
de20: 00000051 9716de44 801190d0 801cfd74 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
[<801190d0>] (do_DataAbort) from [<80101978>] (__dabt_svc+0x58/0x80)
Exception stack(0x9716def8 to 0x9716df40)
dee0:                                                       ade30000 00000000
df00: 001f0100 e0802dbc 80c04c08 00000005 ade30000 9716e000 97b89880 9716e000
df20: 80902564 9716fda4 00000051 9716df4c 801190d0 801cfd74 60080193 ffffffff
[<80101978>] (__dabt_svc) from [<801cfd74>] (__sanitizer_cov_trace_pc+0x28/0x54)
[<801cfd74>] (__sanitizer_cov_trace_pc) from [<801190d0>] (do_DataAbort+0x28/0xf8)
Code: e5931004 e1110002 149df004 e593300c (e5932508)
---[ end trace 8ff748f0857fa897 ]---

Thanks,
Takuo Koguchi

Takuo Koguchi

Apr 27, 2018, 9:02:42 PM

to syzkaller

I would like to fix some typos in my previous post.

I used
gcc version 7.2.1 20171011 (Linaro GCC 7.2-2017.11)
Linux version 4.17.0-rc2 + Dmitry's patch and removing "KCOV_INSTRUMENT_fault.o := n" from arch/arm/mm/Makefile

Takuo

Mark Rutland

Apr 30, 2018, 1:32:07 AM

to takuo.ko...@hitachi.com, syzkaller

Hi,

On Fri, Apr 27, 2018 at 04:39:28PM -0700, takuo.ko...@hitachi.com wrote:
> I reproduced "__dabt_svc -> do_DataAbort -> __sanitizer_cov_trace_pc ->
> __dabt_svc" issue I had reported to Dmitry at the last SIL2LinuxMP workshop.
>
> I am using,
> Linaro GCC 7.2-2017.1
> Linux version 4.17.0-rc2 + Dmitry's patch and removing
> "KCOV_INSTRUMENT_fault.o := n" from arch/arm/Makefile

Thanks for all this information!

I *think* that the underlying issue may be a corrupt thread_info::task field.

Analysis below, along with a hack to try to detect that case earlier.

> My __sanitizer_cov_trace_pc is identical to the one Mark posted,
> 801cfd4c <__sanitizer_cov_trace_pc>:
> 801cfd4c: e52de004 push {lr} ; (str lr, [sp, #-4]!)
> 801cfd50: e1a0300d mov r3, sp
> 801cfd54: e3c33d7f bic r3, r3, #8128 ; 0x1fc0
> 801cfd58: e3a02c01 mov r2, #256 ; 0x100
> 801cfd5c: e3c3303f bic r3, r3, #63 ; 0x3f
> 801cfd60: e340201f movt r2, #31
> 801cfd64: e5931004 ldr r1, [r3, #4]

At this point we have:

r3 is current_thread_info(), derived by masking the sp.
r2 is (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET), for in_task().
r1 is current_thread_info()->preempt_count, for in_task().

> 801cfd68: e1110002 tst r1, r2
> 801cfd6c: 149df004 popne {pc} ; (ldrne pc, [sp], #4)

Here we bail out if !in_task().

> 801cfd70: e593300c ldr r3, [r3, #12]

Here we load current_thread_info()->task (aka current) into r3...

> 801cfd74: e5932508 ldr r2, [r3, #1288] ; 0x508

... and here we load current->kcov_mode.

For the assembly above, that PC is:

> 801cfd74: e5932508 ldr r2, [r3, #1288] ; 0x508

... which *should* be fine, unless current_thread_info()->task has been corrupted.
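
To make the walk-through above easier to follow, here is a rough userspace model of what the disassembly does. It is only a sketch: the struct layouts and names are simplified stand-ins invented for this example (not the real thread_info or task_struct), THREAD_SIZE is inferred from the two bic instructions, and the constants are the ones visible in the assembly.

```c
#include <stdint.h>
#include <stddef.h>

#define THREAD_SIZE        8192u       /* inferred: bic #0x1fc0 then bic #0x3f clears the low 13 bits */
#define IN_TASK_MASK       0x001f0100u /* r2: NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET */
#define KCOV_MODE_TRACE_PC 2u          /* the value kcov_mode is compared against (cmp r2, #2) */

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_task {
	uint32_t kcov_mode;
	uint32_t kcov_size;  /* capacity of kcov_area, in 32-bit words */
	uint32_t *kcov_area; /* kcov_area[0] holds the number of recorded PCs */
};

struct model_thread_info {
	uint32_t preempt_count;
	struct model_task *task;
};

/* current_thread_info(): mask the stack pointer down to the stack base. */
static uint32_t thread_info_base(uint32_t sp)
{
	return sp & ~(THREAD_SIZE - 1u);
}

/* Returns 1 if caller_pc was recorded, 0 if the call bailed out early. */
static int model_trace_pc(const struct model_thread_info *ti, uint32_t caller_pc)
{
	struct model_task *t;
	uint32_t pos;

	if (ti->preempt_count & IN_TASK_MASK)   /* tst r1, r2; popne {pc} */
		return 0;

	t = ti->task;                           /* ldr r3, [r3, #12] */
	if (t->kcov_mode != KCOV_MODE_TRACE_PC) /* the faulting load in the report */
		return 0;

	pos = t->kcov_area[0] + 1;              /* ldr r3, [r2]; add r3, r3, #1 */
	if (t->kcov_size > pos) {               /* cmp r1, r3; strhi ... */
		t->kcov_area[pos] = caller_pc;
		t->kcov_area[0] = pos;
		return 1;
	}
	return 0;
}
```

For the sp in the report, thread_info_base(0x9716c05c) yields 0x9716c000. In this model, a bogus ti->task is dereferenced at the kcov_mode check, which corresponds to the load at the faulting PC.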

> LR is at fsr_fs arch/arm/mm/fault.h:26 [inline]
> LR is at do_DataAbort+0x28/0xf8 arch/arm/mm/fault.c:550
> pc : [<801cfd74>] lr : [<801190d0>] psr: 60080193
> sp : 9716c05c ip : 00000051 fp : 9716fda4
> r10: 80902564 r9 : 9716c000 r8 : 97b89880
> r7 : 9716c110 r6 : e08032c4 r5 : 00000005 r4 : 80c04c08
> r3 : e0802dbc r2 : 001f0100 r1 : 00000000 r0 : 7f000000

I'd expect task_struct to have stricter alignment than 4 bytes, since it has
some u64 members, so r3 doesn't look sufficiently aligned.
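
Both observations can be double-checked with a little arithmetic. The following is a minimal userspace sketch with made-up helper names; the 0x508 offset is simply the immediate from the ldr in this particular build, and 8-byte alignment is my assumption for a struct containing u64 members.

```c
#include <stdint.h>

/* Offset of task_struct::kcov_mode in this build: ldr r2, [r3, #1288]. */
#define KCOV_MODE_OFFSET 0x508u

/* Address that the faulting ldr would access for a given task pointer. */
static uint32_t kcov_mode_addr(uint32_t task)
{
	return task + KCOV_MODE_OFFSET;
}

/* align must be a power of two. */
static int is_aligned(uint32_t addr, uint32_t align)
{
	return (addr & (align - 1u)) == 0;
}
```

With r3 = 0xe0802dbc, kcov_mode_addr() gives 0xe08032c4, which is the reported faulting address, and is_aligned(0xe0802dbc, 8) is false while is_aligned(0xe0802dbc, 4) is true.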

> Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
> Control: 10c5387d Table: 9ef3806a DAC: 00000051
> Process mdev (pid: 1074, stack limit = 0x63b628ec)

Here we manage to access task information, and dump the stack, which means that
somehow the recursion terminated naturally, as we don't have stack overflow
detection on arm at the moment.

I bet that we've overflowed the original task's stack, and fallen into
*another* task's stack which happened to be adjacent, and the task information
above is not for the original task.
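
As a rough sanity check of that guess, the "Exception stack" frames in the log are 0x108 bytes apart (0x9716c110, 0x9716c218, 0x9716c320, ...). The sketch below assumes 8 KiB kernel stacks, as the sp masking in the disassembly suggests; the helper names are invented for this example.

```c
#include <stdint.h>

#define THREAD_SIZE  8192u  /* assumed 8 KiB kernel stack */
#define FRAME_STRIDE 0x108u /* spacing of successive exception frames in the log */

/* Base of the THREAD_SIZE-aligned region containing addr. */
static uint32_t stack_base(uint32_t addr)
{
	return addr & ~(THREAD_SIZE - 1u);
}

/* How many recursion frames of FRAME_STRIDE bytes fit in one stack. */
static uint32_t frames_per_stack(void)
{
	return THREAD_SIZE / FRAME_STRIDE;
}

/* Do two addresses fall within the same THREAD_SIZE-aligned stack? */
static int same_stack(uint32_t a, uint32_t b)
{
	return stack_base(a) == stack_base(b);
}
```

Under these assumptions about 31 levels of the recursion are enough to consume a whole stack, and the reported sp (0x9716c05c) and fp (0x9716fda4) fall into different 8 KiB regions, which is consistent with, though not proof of, the stack having run into a neighbouring one.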

I think that the underlying issue is that we've somehow corrupted the original
task's thread_info::task field, or maybe the stack pointer, and hence the
thread_info.

We could check that with something like the below (untested).

Thanks,
Mark.

---->8----
From 48c81ac48958b4e8496ffac9886600037cb381e3 Mon Sep 17 00:00:00 2001
From: Mark Rutland <mark.r...@arm.com>
Date: Mon, 30 Apr 2018 06:28:04 +0100
Subject: [PATCH] HACK: sanity check thread_info / task relationship

Hopefully this doesn't generate horrendous code...

Signed-off-by: Mark Rutland <mark.r...@arm.com>
---
arch/arm/include/asm/thread_info.h | 8 +++++++-
arch/arm/kernel/process.c | 5 +++++
2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index e71cc35de163..17075392782d 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -80,6 +80,8 @@ struct thread_info {
*/
register unsigned long current_stack_pointer asm ("sp");

+void __sanity_check_thread_info(struct thread_info *info);
+
/*
* how to get the thread information struct from C
*/
@@ -87,8 +89,12 @@ static inline struct thread_info *current_thread_info(void) __attribute_const__;

static inline struct thread_info *current_thread_info(void)
{
- return (struct thread_info *)
+ struct thread_info *info = (struct thread_info *)
(current_stack_pointer & ~(THREAD_SIZE - 1));
+
+ __sanity_check_thread_info(info);
+
+ return info;
}

#define thread_saved_pc(tsk) \
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 1523cb18b109..8fe5249a8aa8 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -39,6 +39,11 @@
#include <asm/tls.h>
#include <asm/vdso.h>

+void __sanity_check_thread_info(struct thread_info *info)
+{
+ BUG_ON(info != task_thread_info(info->task));
+}
+
#ifdef CONFIG_CC_STACKPROTECTOR
#include <linux/stackprotector.h>
unsigned long __stack_chk_guard __read_mostly;
--
2.11.0

Dmitry Vyukov

Apr 30, 2018, 7:35:01 AM

to Mark Rutland, 小口琢夫 / KOGUCHI，TAKUO, syzkaller

Kernel does not boot with this patch. No output from qemu at all.

Mark Rutland

Apr 30, 2018, 8:53:23 AM

to Dmitry Vyukov, 小口琢夫 / KOGUCHI，TAKUO, syzkaller

On Mon, Apr 30, 2018 at 01:34:38PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 30, 2018 at 7:31 AM, Mark Rutland <mark.r...@arm.com> wrote:

> > I bet that we've overflowed the original task's stack, and fallen into
> > *another* task's stack which happened to be adjacent, and the task information
> > above is not for the original task.
> >
> > I think that the underlying issue is that we've somehow corrupted the original
> > task's thread_info::task field, or maybe the stack pointer, and hence the
> > thread_info.
> >
> > We could check that with something like the below (untested).
>
> Kernel does not boot with this patch. No output from qemu at all.

Sorry about that. As process.c is instrumented, it turns any kcov check
recursive -- we can move the sanity check into kcov.c to avoid that.

I don't have a filesystem to hand right now, but this at least produced
some dmesg output all the way to not mounting a rootfs.

Thanks,
Mark.

---->8----
From 005c8f3334498e96c0509e1ff2d0b94b9a76a27c Mon Sep 17 00:00:00 2001
From: Mark Rutland <mark.r...@arm.com>
Date: Mon, 30 Apr 2018 06:28:04 +0100
Subject: [PATCH] HACK: sanity check thread_info / task relationship

Hopefully this doesn't generate horrendous code...

Signed-off-by: Mark Rutland <mark.r...@arm.com>
---
arch/arm/include/asm/thread_info.h | 8 +++++++-
kernel/kcov.c | 5 +++++
2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index e71cc35de163..17075392782d 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -80,6 +80,8 @@ struct thread_info {
*/
register unsigned long current_stack_pointer asm ("sp");

+void __sanity_check_thread_info(struct thread_info *info);
+
/*
* how to get the thread information struct from C
*/
@@ -87,8 +89,12 @@ static inline struct thread_info *current_thread_info(void) __attribute_const__;

static inline struct thread_info *current_thread_info(void)
{
- return (struct thread_info *)
+ struct thread_info *info = (struct thread_info *)
(current_stack_pointer & ~(THREAD_SIZE - 1));
+
+ __sanity_check_thread_info(info);
+
+ return info;
}

#define thread_saved_pc(tsk) \
diff --git a/kernel/kcov.c b/kernel/kcov.c
index 2c16f1ab5e10..7807d3f39ebd 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -25,6 +25,11 @@
/* Number of 64-bit words written per one comparison: */
#define KCOV_WORDS_PER_CMP 4

+void __sanity_check_thread_info(struct thread_info *info)
+{
+ BUG_ON(info != task_thread_info(info->task));
+}
+
/*
* kcov descriptor (one per opened debugfs file).
* State transitions of the descriptor:
--
2.11.0

Dmitry Vyukov

Apr 30, 2018, 9:35:14 AM

to Mark Rutland, 小口琢夫 / KOGUCHI，TAKUO, syzkaller

On Mon, Apr 30, 2018 at 2:53 PM, Mark Rutland <mark.r...@arm.com> wrote:
> On Mon, Apr 30, 2018 at 01:34:38PM +0200, Dmitry Vyukov wrote:
>> On Mon, Apr 30, 2018 at 7:31 AM, Mark Rutland <mark.r...@arm.com> wrote:
>
>> > I bet that we've overflowed the original task's stack, and fallen into
>> > *another* task's stack which happened to be adjacent, and the task information
>> > above is not for the original task.
>> >
>> > I think that the underlying issue is that we've somehow corrupted the original
>> > task's thread_info::task field, or maybe the stack pointer, and hence the
>> > thread_info.
>> >
>> > We could check that with something like the below (untested).
>>
>> Kernel does not boot with this patch. No output from qemu at all.
>
> Sorry about that. As process.c is instrumented, it turns any kcov check
> recursive -- we can move the sanity check into kcov.c to avoid that.
>
> I don't have a filesystem to hand right now, but this at least produced
> some dmesg output all the way to not mounting a rootfs.

This patch causes EFAULT in init. Happens even if I make
__sanity_check_thread_info empty (comment out BUG_ON):

$ qemu-system-arm -m 512 -smp 2 \
    -net nic -net user,host=10.0.2.10,hostfwd=tcp::10022-:22 \
    -display none -serial stdio \
    -machine vexpress-a15 -dtb arch/arm/boot/dts/vexpress-v2p-ca15-tc1.dtb \
    -sd /buildroot/output/images/rootfs.ext2 -snapshot \
    -kernel arch/arm/boot/zImage \
    -append "earlyprintk=serial console=ttyAMA0 root=/dev/mmcblk0"
pulseaudio: set_sink_input_volume() failed
pulseaudio: Reason: Invalid argument
pulseaudio: set_sink_input_mute() failed
pulseaudio: Reason: Invalid argument
** 416 printk messages dropped **
cpuidle: using governor ladder
hw-breakpoint: found 5 (+1 reserved) breakpoint and 4 watchpoint registers.
hw-breakpoint: maximum watchpoint size is 8 bytes.
Serial: AMBA PL011 UART driver
OF: amba_device_add() failed (-19) for /memory-controller@2b0a0000
OF: amba_device_add() failed (-19) for /memory-controller@7ffd0000
OF: amba_device_add() failed (-19) for /dma@7ffb0000
1c090000.uart: ttyAMA0 at MMIO 0x1c090000 (irq = 43, base_baud = 0) is
a PL011 rev1
console [ttyAMA0] enabled
1c0a0000.uart: ttyAMA1 at MMIO 0x1c0a0000 (irq = 44, base_baud = 0) is
a PL011 rev1
1c0b0000.uart: ttyAMA2 at MMIO 0x1c0b0000 (irq = 45, base_baud = 0) is
a PL011 rev1
1c0c0000.uart: ttyAMA3 at MMIO 0x1c0c0000 (irq = 46, base_baud = 0) is
a PL011 rev1
OF: amba_device_add() failed (-19) for
/smb@8000000/motherboard/iofpga@3,00000000/wdt@f0000
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
Advanced Linux Sound Architecture Driver Initialized.
clocksource: Switched to clocksource arch_sys_counter
NET: Registered protocol family 2
tcp_listen_portaddr_hash hash table entries: 256 (order: 1, 10240 bytes)
TCP established hash table entries: 4096 (order: 2, 16384 bytes)
TCP bind hash table entries: 4096 (order: 5, 147456 bytes)
TCP: Hash tables configured (established 4096 bind 4096)
UDP hash table entries: 256 (order: 2, 20480 bytes)
UDP-Lite hash table entries: 256 (order: 2, 20480 bytes)
NET: Registered protocol family 1
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
hw perfevents: no interrupt-affinity property for /pmu, guessing.
hw perfevents: enabled with armv7_cortex_a15 PMU driver, 1 counters available
workingset: timestamp_bits=30 max_order=17 bucket_order=0
squashfs: version 4.0 (2009/01/31) Phillip Lougher
jffs2: version 2.2. (NAND) © 2001-2006 Red Hat, Inc.
9p: Installing v9fs 9p2000 file system support
io scheduler noop registered (default)
io scheduler mq-deadline registered
io scheduler kyber registered
clcd-pl11x 1c1f0000.clcd: PL111 designer 41 rev2 at 0x1c1f0000
clcd-pl11x 1c1f0000.clcd: clcd@1f0000 hardware, 640x480@59 display
Console: switching to colour frame buffer device 80x30
8000000.flash: Found 2 x16 devices at 0x0 in 32-bit bank. Manufacturer
ID 0x000000 Chip ID 0x000000
Intel/Sharp Extended Query Table at 0x0031
Using buffer write method
8000000.flash: Found 2 x16 devices at 0x0 in 32-bit bank. Manufacturer
ID 0x000000 Chip ID 0x000000
Intel/Sharp Extended Query Table at 0x0031
Using buffer write method
Concatenating MTD devices:
(0): "8000000.flash"
(1): "8000000.flash"
into device "8000000.flash"
libphy: Fixed MDIO Bus: probed
libphy: smsc911x-mdio: probed
smsc911x 1a000000.ethernet eth0: MAC Address: 52:54:00:12:34:56
isp1760 1b000000.usb: bus width: 32, oc: digital
isp1760 1b000000.usb: NXP ISP1760 USB Host Controller
isp1760 1b000000.usb: new USB bus registered, assigned bus number 1
isp1760 1b000000.usb: Scratch test failed.
isp1760 1b000000.usb: can't setup: -19
isp1760 1b000000.usb: USB bus 1 deregistered
usbcore: registered new interface driver usb-storage
rtc-pl031 1c170000.rtc: rtc core: registered pl031 as rtc0
mmci-pl18x 1c050000.mmci: Got CD GPIO
mmci-pl18x 1c050000.mmci: Got WP GPIO
mmci-pl18x 1c050000.mmci: mmc0: PL181 manf 41 rev0 at 0x1c050000 irq 39,40 (pio)
input: AT Raw Set 2 keyboard as
/devices/platform/smb@8000000/smb@8000000:motherboard/smb@8000000:motherboard:iofpga@3,00000000/1c060000.kmi/serio0/input/input0
ledtrig-cpu: registered to indicate activity on CPUs
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver
mmc0: new SD card at address 4567
mmcblk0: mmc0:4567 QEMU! 1.00 GiB
aaci-pl041 1c040000.aaci: ARM AC'97 Interface PL041 rev0 at 0x1c040000, irq 38
aaci-pl041 1c040000.aaci: FIFO 512 entries
oprofile: using timer interrupt.
NET: Registered protocol family 17
9pnet: Installing 9P2000 support
Registering SWP/SWPB emulation handler
rtc-pl031 1c170000.rtc: setting system clock to 2018-04-30 13:30:25
UTC (1525095025)
ALSA device list:
#0: ARM AC'97 Interface PL041 rev0 at 0x1c040000, irq 38
input: ImExPS/2 Generic Explorer Mouse as
/devices/platform/smb@8000000/smb@8000000:motherboard/smb@8000000:motherboard:iofpga@3,00000000/1c070000.kmi/serio1/input/input2
random: fast init done
EXT4-fs (mmcblk0): mounted filesystem without journal. Opts: (null)
VFS: Mounted root (ext4 filesystem) readonly on device 179:0.
devtmpfs: mounted
Freeing unused kernel memory: 1024K
Starting init: /sbin/init exists but couldn't execute it (error -14)
Starting init: /bin/sh exists but couldn't execute it (error -14)
Kernel panic - not syncing: No working init found. Try passing init=
option to kernel. See Linux Documentation/admin-guide/init.rst for
guidance.
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.17.0-rc2+ #9
Hardware name: ARM-Versatile Express
[<80113c00>] (unwind_backtrace) from [<8010f804>] (show_stack+0x18/0x1c)
[<8010f804>] (show_stack) from [<807fea70>] (dump_stack+0xd4/0x118)
[<807fea70>] (dump_stack) from [<8012725c>] (panic+0x124/0x2f8)
[<8012725c>] (panic) from [<808181ac>] (kernel_init+0x158/0x16c)
[<808181ac>] (kernel_init) from [<801010b4>] (ret_from_fork+0x14/0x20)
Exception stack(0x9f451fb0 to 0x9f451ff8)
1fa0: 00000000 00000000 00000000 00000000
1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
CPU1: stopping
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.17.0-rc2+ #9
Hardware name: ARM-Versatile Express
[<80113c00>] (unwind_backtrace) from [<8010f804>] (show_stack+0x18/0x1c)
[<8010f804>] (show_stack) from [<807fea70>] (dump_stack+0xd4/0x118)
[<807fea70>] (dump_stack) from [<801123cc>] (handle_IPI+0x1b8/0x1c8)
[<801123cc>] (handle_IPI) from [<804a9e68>] (gic_handle_irq+0xbc/0xc0)
[<804a9e68>] (gic_handle_irq) from [<801019f0>] (__irq_svc+0x70/0x98)
Exception stack(0x9f477f60 to 0x9f477fa8)
7f60: 9f476000 1f096000 00000000 9f478cc0 00000000 ffffe000 80c0842c 00000002
7f80: 80c0846c 80b4fc10 9f476000 00000000 00000000 9f477fb0 8017d7fc 8010aa18
7fa0: 60000013 ffffffff
[<801019f0>] (__irq_svc) from [<8010aa18>] (arch_cpu_idle+0x30/0x4c)
[<8010aa18>] (arch_cpu_idle) from [<8016363c>] (do_idle+0x164/0x268)
[<8016363c>] (do_idle) from [<80163b98>] (cpu_startup_entry+0x18/0x20)
[<80163b98>] (cpu_startup_entry) from [<801025ec>] (__enable_mmu+0x0/0x14)
---[ end Kernel panic - not syncing: No working init found. Try
passing init= option to kernel. See Linux
Documentation/admin-guide/init.rst for guidance. ]---

Mark Rutland

May 3, 2018, 8:22:27 AM

to Dmitry Vyukov, 小口琢夫 / KOGUCHI，TAKUO, syzkaller

On Mon, Apr 30, 2018 at 03:34:52PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 30, 2018 at 2:53 PM, Mark Rutland <mark.r...@arm.com> wrote:
> > On Mon, Apr 30, 2018 at 01:34:38PM +0200, Dmitry Vyukov wrote:
> >> On Mon, Apr 30, 2018 at 7:31 AM, Mark Rutland <mark.r...@arm.com> wrote:
> >
> >> > I bet that we've overflowed the original task's stack, and fallen into
> >> > *another* task's stack which happened to be adjacent, and the task information
> >> > above is not for the original task.
> >> >
> >> > I think that the underlying issue is that we've somehow corrupted the original
> >> > task's thread_info::task field, or maybe the stack pointer, and hence the
> >> > thread_info.
> >> >
> >> > We could check that with something like the below (untested).
> >>
> >> Kernel does not boot with this patch. No output from qemu at all.
> >
> > Sorry about that. As process.c is instrumented, it turns any kcov check
> > recursive -- we can move the sanity check into kcov.c to avoid that.
> >
> > I don't have a filesystem to hand right now, but this at least produced
> > some dmesg output all the way to not mounting a rootfs.
>
> This patch causes EFAULT in init. Happens even if I make
> __sanity_check_thread_info empty (comment out BUG_ON):

Huh. I cannot explain that, but I can reproduce it locally.

I've gone digging here, and the underlying issue is a subtle interaction
with the way the vmalloc area works on arm, along with some
(instrumented) code that happens to get called during context-switch.

The problem is that we vmalloc the kcov_area, and this is lazily faulted
in for each mm. Because the fault handling code is instrumented, this
means we end up recursing through __sanitizer_cov_trace_pc() and the
fault handling code until we overflow the stack.
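
The recursion described above can be modelled in a few lines of user-space C. This is only a toy sketch under stated assumptions: the toy_* names are invented, a depth counter stands in for the finite kernel stack, and a boolean stands in for whether the kcov area is mapped in the current page tables.

```c
#include <stdbool.h>

#define TOY_MAX_DEPTH 64 /* stand-in for the finite kernel stack */

struct toy_mm {
	bool kcov_area_mapped; /* is the area present in these page tables? */
};

static int toy_depth;
static bool toy_overflowed;

static void toy_fault_handler(struct toy_mm *mm);

/* Models __sanitizer_cov_trace_pc(): it touches the coverage area. */
static void toy_trace_pc(struct toy_mm *mm)
{
	if (toy_overflowed)
		return;
	if (++toy_depth > TOY_MAX_DEPTH) {
		toy_overflowed = true; /* the real kernel would crash here */
		toy_depth--;
		return;
	}
	if (!mm->kcov_area_mapped)
		toy_fault_handler(mm); /* the access faults */
	toy_depth--;
}

/* Models an instrumented fault handler: the trace hook runs first. */
static void toy_fault_handler(struct toy_mm *mm)
{
	toy_trace_pc(mm);            /* compiler-inserted callback */
	mm->kcov_area_mapped = true; /* fixup, only reached while unwinding */
}

/* Returns true if the model ran out of "stack". */
static bool toy_run(bool prefaulted)
{
	struct toy_mm mm = { .kcov_area_mapped = prefaulted };

	toy_depth = 0;
	toy_overflowed = false;
	toy_trace_pc(&mm);
	return toy_overflowed;
}
```

toy_run(false) reports an overflow because the handler re-enters the trace hook before it can install the mapping, while toy_run(true), the already-mapped case, never enters the handler at all.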

I don't think that it's safe to only prevent instrumentation of
arch/arm/mm/fault.c, even if we seem to get away with that today.

AFAICT, the same is true for x86-64, and their fault handling code is
instrumented -- I'm not sure how they get away with this. If we can
figure that out, then we might be able to adopt the same approach for
arm.

Thanks,
Mark.

Dmitry Vyukov

May 3, 2018, 8:41:11 AM

to Mark Rutland, 小口琢夫 / KOGUCHI，TAKUO, syzkaller, Andrey Ryabinin

Humm... I don't know why it works on x86. Maybe +Andrey can shed some
light on it.

The problem is with the arm port of KCOV. Mark says that the KCOV output
area is lazily faulted in, so __sanitizer_cov_trace_pc can cause a fault.
The fault handler is instrumented with KCOV, so it calls
__sanitizer_cov_trace_pc again, which faults again, and this repeats
until the stack overflows.

I see that x86 do_page_fault in arch/x86/mm/fault.c _is_ instrumented.
Is it possible that in_task() returns false for x86 do_page_fault?
Or maybe the pages are somehow prefaulted on x86?

Andrey Ryabinin

May 3, 2018, 12:10:28 PM

to Dmitry Vyukov, Mark Rutland, 小口琢夫 / KOGUCHI，TAKUO, syzkaller

On x86-64, most of the time the kernel part of each process's page tables is in sync with swapper_pg_dir.
The kernel part of the top-level page table is simply copied into each new process (pgd_ctor()->clone_pgd_range()),
so the lower-level page tables are shared among all processes. Thus, all processes immediately see
a new vmalloc mapping, because it will most likely be added under an already existing pgd entry.

The page tables may get out of sync if a new in-kernel pgd entry is added. Existing processes will not see the new entry, since
clone_pgd_range() has already happened, but vmalloc_fault() should fix them up.
Obviously, this is super rare, since PGDIR_SIZE is 512GB with 4-level and don't-know-how-many-TB with 5-level page tables.
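
To put numbers on that: a top-level entry covers 1 << PGDIR_SHIFT bytes. A small sketch of the arithmetic, assuming the usual x86-64 values of PGDIR_SHIFT (39 with 4-level paging and 48 with 5-level paging); the helper name is illustrative:

```c
#include <stdint.h>

/* Size in GiB of the address range covered by one top-level (pgd) entry. */
static uint64_t sketch_pgdir_size_gib(unsigned int pgdir_shift)
{
	uint64_t bytes = (uint64_t)1 << pgdir_shift;

	return bytes >> 30; /* bytes -> GiB */
}
```

This gives 512 GiB per pgd entry with 4-level paging and 262144 GiB (256 TiB) with 5-level paging, which is why a vmalloc mapping landing under a brand-new pgd entry is so unlikely there.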

I expect the patch below will make kcov crash on x86-64 (but I haven't bothered to try it yet).
It forces the kcov area into the middle of the vmalloc range, which is probably empty, so the pgd entry should be empty too.

---
kernel/kcov.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/kcov.c b/kernel/kcov.c
index 2c16f1ab5e10..7684c49b0a21 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -275,7 +275,9 @@ static int kcov_mmap(struct file *filep, struct vm_area_struct *vma)
unsigned long size, off;
struct page *page;

- area = vmalloc_user(vma->vm_end - vma->vm_start);
+ area = __vmalloc_node_range(vma->vm_end - vma->vm_start, 1, VMALLOC_START+
+ ((VMALLOC_END - VMALLOC_START)/2)+1, VMALLOC_END,
+ GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE, __builtin_return_address(0));
if (!area)
return -ENOMEM;

--
2.16.1

Dmitry Vyukov

May 4, 2018, 1:33:51 AM

to Andrey Ryabinin, Mark Rutland, 小口琢夫 / KOGUCHI，TAKUO, syzkaller

Thanks!

Should we mlock these pages then?

Andrey Ryabinin

May 4, 2018, 4:51:02 AM

to Dmitry Vyukov, Mark Rutland, 小口琢夫 / KOGUCHI，TAKUO, syzkaller

Just prefault right after vmalloc_user(). This will work as long as the kcov area is allocated in the same process
that accesses it in __sanitizer_cov_trace_pc().

If someday we want to allocate the kcov area in one process and access it from another, then we'll need
either recursion protection or a call to vmalloc_sync_all() after the allocation.
AFAICS arm chose not to implement vmalloc_sync_all(), so it would need to be implemented.

Dmitry Vyukov

May 4, 2018, 5:09:26 AM

to Andrey Ryabinin, Mark Rutland, 小口琢夫 / KOGUCHI，TAKUO, syzkaller

Ah, this memory can't be swapped out, so we don't need to lock it.
Do we just do a memset, or is there a fancier way to pre-fault a region?

Andrey Ryabinin

May 4, 2018, 5:31:12 AM

to Dmitry Vyukov, Mark Rutland, 小口琢夫 / KOGUCHI，TAKUO, syzkaller

READ_ONCE();

But memset() might not be that bad either. vmalloc_user() can be replaced with vmalloc() without __GFP_ZERO, followed by memset().
That will do the prefault and the initialization in one go.
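
The prefault idea amounts to touching every page of the area once, right after allocating it. A minimal user-space sketch of that loop, assuming 4 KiB pages and using a volatile read as a stand-in for READ_ONCE(); in the kernel the buffer would come from vmalloc_user() rather than from ordinary memory, and the function name here is hypothetical:

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/*
 * Read one byte from every page so that each page is faulted in up front.
 * Returns the number of pages touched.
 */
static size_t sketch_prefault(const volatile unsigned char *area, size_t size)
{
	size_t touched = 0;

	for (size_t off = 0; off < size; off += SKETCH_PAGE_SIZE) {
		(void)area[off]; /* volatile read, cannot be optimised away */
		touched++;
	}
	return touched;
}
```

A read per page is enough to trigger the fault; the memset() variant does the same walk while also zeroing the buffer.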

Mark Rutland

May 4, 2018, 5:50:32 AM

to Andrey Ryabinin, Dmitry Vyukov, 小口琢夫 / KOGUCHI，TAKUO, syzkaller

I'd hacked this up locally, but it is insufficient on arm because
we don't switch task and mm atomically, and some code executed during
this window (while a different PGD is installed in the HW) is
instrumented.

> If someday we'll want to allocate kcov area from one process and
> access it from another, than we'll need either recursion protection,
> or call vmalloc_sync_all() after the allocation. AFAICS arm choose
> not to implement vmalloc_sync_all(), so it needs to be implemented.

This could solve the problem.

Thanks,
Mark.

Dmitry Vyukov

May 4, 2018, 6:00:57 AM

to Mark Rutland, Andrey Ryabinin, 小口琢夫 / KOGUCHI，TAKUO, syzkaller

On Fri, May 4, 2018 at 11:50 AM, Mark Rutland <mark.r...@arm.com> wrote:
> On Fri, May 04, 2018 at 11:52:05AM +0300, Andrey Ryabinin wrote:
>>
>>
>> On 05/04/2018 08:33 AM, Dmitry Vyukov wrote:
>> > On Thu, May 3, 2018 at 6:11 PM, Andrey Ryabinin <arya...@virtuozzo.com> wrote:
>> >> On x86-64 most of the time the page tables of kernel part in process are in sync with the swapper_pg_dir.
>> >> Kernel part of the top level page table is simply copied into new process (pgd_ctor()->clone_pgd_range()),
>> >> so lower levels page tables are shared among all processes. Thus, all process immediately see
>> >> the new vmalloc mapping, because most likely it will be added into already existing pgd.
>> >>
>> >> Page table may become out of sync if new in-kernel pgd table added. Existing processes will not see the new pgd, since
>> >> clone_pgd_range() already happened, but vmalloc_fault() should fix them up.
>> >> Obviously, this is super rare since PGDDIR_SIZE is 512GB on 4-level and don't-know-how-many-TB on 5-level page tables.
>> >
>> >
>> > Thanks!
>> >
>> > Should we mlock these pages then?
>>
>> Just do prefault right after vmalloc_user(). This will work as long as
>> kcov area allocated in the same process that accesses the area in
>> __sanitizer_cov_trace_pc().
>
> I'd hacked this up locally, but it is insufficient on arm because
> we don't switch task and mm atomically, and some code executed during
> this window (while a different PGD is installed in the HW) is
> instrumented.

If memset does not fix arm (which is our problem at hand), Mark, what
do you suggest?

Andrey Ryabinin

May 4, 2018, 6:11:13 AM

to Mark Rutland, Dmitry Vyukov, 小口琢夫 / KOGUCHI，TAKUO, syzkaller

On 05/04/2018 12:50 PM, Mark Rutland wrote:
> On Fri, May 04, 2018 at 11:52:05AM +0300, Andrey Ryabinin wrote:
>>
>>
>> On 05/04/2018 08:33 AM, Dmitry Vyukov wrote:
>>> On Thu, May 3, 2018 at 6:11 PM, Andrey Ryabinin <arya...@virtuozzo.com> wrote:
>>>> On x86-64 most of the time the page tables of kernel part in process are in sync with the swapper_pg_dir.
>>>> Kernel part of the top level page table is simply copied into new process (pgd_ctor()->clone_pgd_range()),
>>>> so lower levels page tables are shared among all processes. Thus, all process immediately see
>>>> the new vmalloc mapping, because most likely it will be added into already existing pgd.
>>>>
>>>> Page table may become out of sync if new in-kernel pgd table added. Existing processes will not see the new pgd, since
>>>> clone_pgd_range() already happened, but vmalloc_fault() should fix them up.
>>>> Obviously, this is super rare since PGDDIR_SIZE is 512GB on 4-level and don't-know-how-many-TB on 5-level page tables.
>>>
>>>
>>> Thanks!
>>>
>>> Should we mlock these pages then?
>>
>> Just do prefault right after vmalloc_user(). This will work as long as
>> kcov area allocated in the same process that accesses the area in
>> __sanitizer_cov_trace_pc().
>
> I'd hacked this up locally, but it is insufficient on arm because
> we don't switch task and mm atomically, and some code executed during
> this window (while a different PGD is installed in the HW) is
> instrumented.
>

I don't understand this.

Are you saying that after this:
a = vmalloc(PAGE_SIZE);
memset(a, 0, PAGE_SIZE);

something is still not fully set up in the arm case, so accesses to 'a' might still
fault? That would mean that every single memory access in memset() also faults.

Mark Rutland

May 4, 2018, 7:41:00 AM

to Andrey Ryabinin, Dmitry Vyukov, 小口琢夫 / KOGUCHI，TAKUO, syzkaller

After the above, the mapping is fully setup in the current task's page
tables. Practically everything in task context can safely access the
kcov area after this.

However, there is a small window during context-switch when the
current/prev task runs with the next task's page tables. During this
window, accesses to the kcov area will fault.

Because the scheduler code isn't instrumented, on x86 nothing happens to
access the kcov area during this window. However, on arm we call
instrumented code during this window:

context_switch(prev, next)
-> switch_mm_irqs_off(oldmm, mm, next);
// prev now running with next's page tables, kcov area not mapped
-> switch_to(prev, next, prev);
---> __switch_to(prev,task_thread_info(prev), task_thread_info(next));
-----> atomic_notifier_call_chain(thread_notify_head, THREAD_NOTIFY_SWITCH);
-------> <various notifiers here>

... this leads to recursive faults, overflowing the stack, etc.
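
The window can be illustrated with a small user-space model. This is a sketch only: the sim_* names are invented, and a boolean per mm stands in for whether prev's kcov area is mapped in that mm's page tables.

```c
#include <stdbool.h>

struct sim_mm {
	bool maps_prev_kcov_area; /* is prev's kcov area mapped in this mm? */
};

struct sim_cpu {
	const struct sim_mm *hw_mm; /* page tables installed in the hardware */
	bool running_prev;          /* still executing on behalf of prev? */
};

/* Would an instrumented call made right now fault on prev's kcov area? */
static bool sim_trace_pc_would_fault(const struct sim_cpu *cpu)
{
	return cpu->running_prev && !cpu->hw_mm->maps_prev_kcov_area;
}

/* Models switch_mm_irqs_off(): install next's page tables. */
static void sim_switch_mm(struct sim_cpu *cpu, const struct sim_mm *next_mm)
{
	cpu->hw_mm = next_mm;
}

/* Models switch_to(): from here on the CPU runs next, not prev. */
static void sim_switch_to(struct sim_cpu *cpu)
{
	cpu->running_prev = false;
}
```

Starting with prev's mm installed, sim_trace_pc_would_fault() is false; it becomes true after sim_switch_mm() and false again only after sim_switch_to(). Any instrumented code that runs between those two calls hits the problem.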

My best idea so far is to (somehow) have __sanitizer_cov_trace_pc detect
if we're in the middle of a context-switch, and return early if so.

If we can do that (along with pre-faulting), I think we'd be fine.

Thanks,
Mark.

Andrey Ryabinin

May 4, 2018, 8:41:24 AM

to Mark Rutland, Dmitry Vyukov, 小口琢夫 / KOGUCHI，TAKUO, syzkaller

x86 probably accesses the kcov area too. But on x86 the kcov area must be in a new pgd
to cause the fault, so we've just never seen this.

> context_switch(prev, next)
> -> switch_mm_irqs_off(oldmm, mm, next);
> // prev now running with next's page tables, kcov area not mapped
> -> switch_to(prev, next, prev);
> ---> __switch_to(prev,task_thread_info(prev), task_thread_info(next));
> -----> atomic_notifier_call_chain(thread_notify_head, THREAD_NOTIFY_SWITCH);
> -------> <various notifiers here>
>
> ... this leads to recursive faults, overflowing the stack, etc.

Makes sense. So we actually have two problems:

1) Page tables out of sync problem, which should be solved by prefaulting.
2) switch_mm() before switch_to().

> My best idea so far is to (somehow) have __sanitizer_cov_trace_pc detect
> if we're in the middle of a context-switch, and return early if so.
>

Off the top of my head: use preempt notifiers to change kcov_mode.
sched_out() changes kcov_mode, e.g. sets the top bit, which makes check_kcov_mode() return false,
and sched_in() clears that bit again.
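
A user-space sketch of that idea, under stated assumptions: the toy_* names, the mode values and the choice of flag bit are all made up for illustration and are not the kernel's definitions.

```c
#include <stdbool.h>

/* Toy mode values, chosen for illustration only. */
enum { TOY_MODE_DISABLED = 0, TOY_MODE_TRACE_PC = 2 };

/* Hypothetical "in context switch" flag kept in a high bit of the mode. */
#define TOY_IN_SWITCH (1u << 30)

struct toy_task {
	unsigned int kcov_mode;
};

/* Models check_kcov_mode(): tracing is allowed only on an exact match. */
static bool toy_check_mode(const struct toy_task *t, unsigned int needed)
{
	return t->kcov_mode == needed;
}

/* Called when the task is scheduled out: the flag makes the check fail. */
static void toy_sched_out(struct toy_task *t)
{
	t->kcov_mode |= TOY_IN_SWITCH;
}

/* Called when the task is scheduled back in: restore the original mode. */
static void toy_sched_in(struct toy_task *t)
{
	t->kcov_mode &= ~TOY_IN_SWITCH;
}
```

Because the check is an equality test, setting any extra bit is enough to make the trace hook bail out without a separate flag lookup on the hot path.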

Mark Rutland

May 4, 2018, 9:24:30 AM

to Andrey Ryabinin, Dmitry Vyukov, 小口琢夫 / KOGUCHI，TAKUO, syzkaller

On Fri, May 04, 2018 at 03:42:27PM +0300, Andrey Ryabinin wrote:
> On 05/04/2018 02:40 PM, Mark Rutland wrote:
> > However, there is a small window during context-switch when the
> > current/prev task runs with the next task's page tables. During this
> > window, accesses to the kcov area will fault.
> >
> > Because the scheduler code isn't instrumented, on x86 nothing happens to
> > access the kcov area during this window. However, on arm we call
> > instrumented code during this window:
>
> x86 probably access kcov area too. But on x86 kcov area must be in new pgd
> to cause the fault, so we've just never seen this.

Sounds plausible.

> > context_switch(prev, next)
> > -> switch_mm_irqs_off(oldmm, mm, next);
> > // prev now running with next's page tables, kcov area not mapped
> > -> switch_to(prev, next, prev);
> > ---> __switch_to(prev,task_thread_info(prev), task_thread_info(next));
> > -----> atomic_notifier_call_chain(thread_notify_head, THREAD_NOTIFY_SWITCH);
> > -------> <various notifiers here>
> >
> > ... this leads to recursive faults, overflowing the stack, etc.
>
>
> Makes sense. So we actually have two problems:
>
> 1) Page tables out of sync problem, which should be solved by prefaulting.
> 2) switch_mm() before switch_to().

Yup.

While digging into this, I also spotted a potential race against
KCOV_DISABLE on CONFIG_PREEMPT kernels, but so far that doesn't seem to
be adversely affecting us on arm64.

> > My best idea so far is to (somehow) have __sanitizer_cov_trace_pc detect
> > if we're in the middle of a context-switch, and return early if so.
> >
>
> From the top of my head - use preempt notifiers to change kcov_mode.
> sched_out() changes kcov_mode, e.g. sets top bit which will make check_kcov_mode() return false
> and sched_in() clears that bit back.

Yup.

I've implemented kcov_{prepare,finish}_switch() hooks called directly in
the scheduler (where they can be inlined). From a quick IRC chat the
scheduler folk (at least PeterZ) didn't seem violently opposed to the
idea.

That all appears to be working locally -- I will send patches shortly.

Thanks,
Mark.

Mark Rutland

May 4, 2018, 9:58:25 AM

to Andrey Ryabinin, Dmitry Vyukov, 小口琢夫 / KOGUCHI，TAKUO, syzkaller

On Fri, May 04, 2018 at 02:24:25PM +0100, Mark Rutland wrote:
> On Fri, May 04, 2018 at 03:42:27PM +0300, Andrey Ryabinin wrote:

> > Makes sense. So we actually have two problems:
> >
> > 1) Page tables out of sync problem, which should be solved by prefaulting.
> > 2) switch_mm() before switch_to().

> While digging into this, I also spotted a potential race against against
> KCOV_DISABLE on CONFIG_PREEMPT kernels, but so far that doesn't seem to
> be adversely affecting us on arm64.
>

> I've implemented kcov_{prepare,finish}_switch() hooks called directly in
> the scheduler (where they can be inlined). From a quick IRC chat the
> scheduler folk (at least PeterZ) didn't seem violently opposed to the
> idea.
>
> That all appears to be working locally -- I will send patches shortly.

I've posted that to LKML:

https://lkml.kernel.org/r/20180504135535.53...@arm.com

... which works for me in local testing (coverage increasing, other
bugs being triggered by syzkaller).

Thanks,
Mark.

Russell King - ARM Linux

May 8, 2018, 6:30:33 AM

to Dmitry Vyukov, mark.r...@arm.com, liuwe...@huawei.com, catalin...@arm.com, takuo.ko...@hitachi.com, at...@google.com, linux-ar...@lists.infradead.org, syzk...@googlegroups.com

On Thu, Apr 26, 2018 at 03:08:46PM +0200, Dmitry Vyukov wrote:
> KCOV is code coverage collection facility used, in particular, by syzkaller
> system call fuzzer. There is some interest in using syzkaller on arm devices.
> So port KCOV to arm.
>
> On implementation level this merely declares that KCOV is supported and
> disables instrumentation of 3 special cases. Reasons for disabling are
> commented in code.
>
> Tested with qemu-system-arm/vexpress-a15.
>
> Signed-off-by: Dmitry Vyukov <dvy...@google.com>
> Cc: Russell King <li...@armlinux.org.uk>
> Cc: Mark Rutland <mark.r...@arm.com>
> Cc: Abbott Liu <liuwe...@huawei.com>
> Cc: Catalin Marinas <catalin...@arm.com>
> Cc: Koguchi Takuo <takuo.ko...@hitachi.com>
> Cc: Atul Prakash <at...@google.com>
> Cc: li...@armlinux.org.uk
> Cc: linux-ar...@lists.infradead.org
> Cc: syzk...@googlegroups.com
> ---
> arch/arm/Kconfig | 1 +
> arch/arm/boot/compressed/Makefile | 3 +++
> arch/arm/mm/Makefile | 4 ++++
> arch/arm/vdso/Makefile | 3 +++
> 4 files changed, 11 insertions(+)
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index a7f8e7f4b88f..60558a6bb744 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -105,6 +105,7 @@ config ARM
> select REFCOUNT_FULL
> select RTC_LIB
> select SYS_SUPPORTS_APM_EMULATION
> + select ARCH_HAS_KCOV
> # Above selects are sorted alphabetically; please add new ones
> # according to that. Thanks.

Please read this comment and rework your patch, thanks.

Dmitry Vyukov

May 11, 2018, 10:32:54 AM

to li...@armlinux.org.uk, mark.r...@arm.com, liuwe...@huawei.com, catalin...@arm.com, inux-ar...@lists.infradead.org, linu...@kvack.org, Dmitry Vyukov, Koguchi Takuo, linux-ar...@lists.infradead.org, syzk...@googlegroups.com

KCOV is a code coverage collection facility used, in particular, by the
syzkaller system call fuzzer. There is some interest in using syzkaller on
arm devices, so port KCOV to arm.

At the implementation level this merely declares that KCOV is supported and
disables instrumentation of 3 special cases. The reasons for disabling are
given in comments in the code.

Tested with qemu-system-arm/vexpress-a15.

Signed-off-by: Dmitry Vyukov <dvy...@google.com>
Cc: Russell King <li...@armlinux.org.uk>
Cc: Mark Rutland <mark.r...@arm.com>
Cc: Abbott Liu <liuwe...@huawei.com>
Cc: Catalin Marinas <catalin...@arm.com>
Cc: Koguchi Takuo <takuo.ko...@hitachi.com>

Cc: linux-ar...@lists.infradead.org
Cc: linu...@kvack.org
Cc: syzk...@googlegroups.com

---

Changes since v1:
- remove disable of instrumentation for arch/arm/mm/fault.c
- disable instrumentation of arch/arm/kvm/hyp/*
- re-sort ARCH_HAS_KCOV alphabetically
---
arch/arm/Kconfig | 3 ++-
arch/arm/boot/compressed/Makefile | 3 +++
arch/arm/kvm/hyp/Makefile | 8 ++++++++
arch/arm/vdso/Makefile | 3 +++
4 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 3493f840e89c..34591796c36f 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -8,9 +8,10 @@ config ARM
select ARCH_HAS_DEVMEM_IS_ALLOWED
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE
+ select ARCH_HAS_KCOV
select ARCH_HAS_PTE_SPECIAL if ARM_LPAE
- select ARCH_HAS_SET_MEMORY
select ARCH_HAS_PHYS_TO_DMA
+ select ARCH_HAS_SET_MEMORY
select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
select ARCH_HAS_STRICT_MODULE_RWX if MMU
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index 6a4e7341ecd3..5f5f081e4879 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -25,6 +25,9 @@ endif

GCOV_PROFILE := n

+# Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
+KCOV_INSTRUMENT := n
+
#
# Architecture dependencies
#
diff --git a/arch/arm/kvm/hyp/Makefile b/arch/arm/kvm/hyp/Makefile
index 7fc0638f263a..d2b5ec9c4b92 100644
--- a/arch/arm/kvm/hyp/Makefile
+++ b/arch/arm/kvm/hyp/Makefile
@@ -23,3 +23,11 @@ obj-$(CONFIG_KVM_ARM_HOST) += hyp-entry.o
obj-$(CONFIG_KVM_ARM_HOST) += switch.o
CFLAGS_switch.o += $(CFLAGS_ARMV7VE)
obj-$(CONFIG_KVM_ARM_HOST) += s2-setup.o
+
+# KVM code is run at a different exception code with a different map, so
+# compiler instrumentation that inserts callbacks or checks into the code may
+# cause crashes. Just disable it.
+GCOV_PROFILE := n
+KASAN_SANITIZE := n
+UBSAN_SANITIZE := n
+KCOV_INSTRUMENT := n
diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile
index bb4118213fee..f4efff9d3afb 100644
--- a/arch/arm/vdso/Makefile
+++ b/arch/arm/vdso/Makefile
@@ -30,6 +30,9 @@ CFLAGS_vgettimeofday.o = -O2
# Disable gcov profiling for VDSO code
GCOV_PROFILE := n

+# Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
+KCOV_INSTRUMENT := n
+
# Force dependency
$(obj)/vdso.o : $(obj)/vdso.so

--
2.17.0.441.gb46fe60e1d-goog
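
The Makefile hunks above turn off KCOV for whole directories by setting KCOV_INSTRUMENT := n. Kbuild also accepts a per-object form, which is useful when only one file in a directory must stay uninstrumented. A minimal sketch of both forms follows; the object names foo.o and bar.o are hypothetical placeholders and are not files touched by this patch.

```make
# Hypothetical Makefile for a directory that builds foo.o and bar.o
# (placeholder names, not part of the patch in this thread).
obj-y += foo.o bar.o

# Exclude a single object from KCOV instrumentation; bar.o stays instrumented.
KCOV_INSTRUMENT_foo.o := n

# Alternatively, exclude every object built from this Makefile,
# as the hunks in this patch do:
# KCOV_INSTRUMENT := n
```

The directory-wide form is the simpler choice when nothing in the directory can call __sanitizer_cov_trace_pc(), as with code that is linked separately from the main kernel image.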

Dmitry Vyukov

May 11, 2018, 10:36:35 AM

to Russell King - ARM Linux, Mark Rutland, Abbott Liu, Catalin Marinas, Linux-MM, Andrew Morton, Dmitry Vyukov, Koguchi Takuo, Linux ARM, syzkaller

On Fri, May 11, 2018 at 4:32 PM, Dmitry Vyukov <dvy...@google.com> wrote:
> KCOV is code coverage collection facility used, in particular, by syzkaller
> system call fuzzer. There is some interest in using syzkaller on arm devices.
> So port KCOV to arm.
>
> On implementation level this merely declares that KCOV is supported and
> disables instrumentation of 3 special cases. Reasons for disabling are
> commented in code.
>
> Tested with qemu-system-arm/vexpress-a15.
>
> Signed-off-by: Dmitry Vyukov <dvy...@google.com>
> Cc: Russell King <li...@armlinux.org.uk>
> Cc: Mark Rutland <mark.r...@arm.com>
> Cc: Abbott Liu <liuwe...@huawei.com>
> Cc: Catalin Marinas <catalin...@arm.com>
> Cc: Koguchi Takuo <takuo.ko...@hitachi.com>
> Cc: linux-ar...@lists.infradead.org
> Cc: linu...@kvack.org
> Cc: syzk...@googlegroups.com
>
> ---
>
> Changes since v1:
> - remove disable of instrumentation for arch/arm/mm/fault.c
> - disable instrumentation of arch/arm/kvm/hyp/*
> - resort ARCH_HAS_KCOV alphabetically

Andrew, this is for MM tree because this depends on the following
patches in MM tree:

kcov: prefault the kcov_area
kcov: ensure irq code sees a valid area
sched/core / kcov: avoid kcov_area during task switch

Mark Rutland

May 11, 2018, 10:37:36 AM

to Dmitry Vyukov, li...@armlinux.org.uk, liuwe...@huawei.com, catalin...@arm.com, inux-ar...@lists.infradead.org, linu...@kvack.org, Koguchi Takuo, linux-ar...@lists.infradead.org, syzk...@googlegroups.com

It might be worth mentioning in the commit message that this also cleans
up an existing unordered entry in the arm Kconfig.

Otherwise, this looks good to me, assuming it goes in after my kcov core
fixups. FWIW:

Acked-by: Mark Rutland <mark.r...@arm.com>

Thanks,
Mark.

Dmitry Vyukov

May 11, 2018, 10:37:48 AM

to Russell King - ARM Linux, Mark Rutland, Abbott Liu, Catalin Marinas, 小口琢夫 / KOGUCHI，TAKUO, Atul Prakash, Linux ARM, syzkaller

Now that Mark's fixes are in mm tree, I mailed v2 with the following changes:
