Clang miscompiling arm64 kernel with BTI and PAC?


Will Deacon

Jun 15, 2020, 6:55:30 AM
to ndesau...@google.com, clang-bu...@googlegroups.com, linux-ar...@lists.infradead.org, mark.r...@arm.com, catalin...@arm.com, andro...@google.com, bro...@kernel.org
Hi Nick, [+android-kvm as FYI]

I just ran into a host panic when trying to spawn a KVM virtual machine
with 5.8-rc1 on arm64 (defconfig):

(I had to hack in code to dump the regs; I'll send a patch for that shortly)

[ 56.229757] Bad mode in Synchronous Abort handler detected on CPU0, code 0x34000003 -- BTI
[ 56.230439] CPU: 0 PID: 279 Comm: lkvm Not tainted 5.8.0-rc1-dirty #2
[ 56.230864] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[ 56.234182] pstate: 80000c05 (Nzcv daif -PAN -UAO BTYPE=j-)
[ 56.234646] pc : kvm_vm_ioctl_check_extension_generic+0x74/0x98
[ 56.235068] lr : kvm_dev_ioctl+0x94/0xbc
[ 56.237544] sp : ffff800010f4bdf0
[ 56.237797] x29: ffff800010f4bdf0 x28: ffff0000f9629c00
[ 56.238277] x27: 0000000000000000 x26: 0000000000000000
[ 56.238665] x25: 0000000000000000 x24: 0000000000000003
[ 56.241275] x23: 000000000000ae03 x22: 0000000000000046
[ 56.241708] x21: 00000000ffffffe7 x20: ffff0000f9621200
[ 56.242155] x19: ffff0000f9621200 x18: 0000000000000000
[ 56.242564] x17: 0000000000000000 x16: 0000000000000000
[ 56.242987] x15: 0000000000000000 x14: 0000000000000000
[ 56.245570] x13: 0000000000000000 x12: 0000000000000010
[ 56.245953] x11: ffffd68929392e14 x10: ffffd6892a17b879
[ 56.246420] x9 : 0000000000000043 x8 : 0000000000000000
[ 56.246787] x7 : 0000000000000000 x6 : 0000000000000000
[ 56.249737] x5 : 0000000000000000 x4 : 0000000000000000
[ 56.250236] x3 : 0000000000000046 x2 : 0000000000000046
[ 56.250644] x1 : 0000000000000046 x0 : 0000000000000001
[ 56.253312] Kernel panic - not syncing: bad mode
[ 56.253834] CPU: 0 PID: 279 Comm: lkvm Not tainted 5.8.0-rc1-dirty #2
[ 56.254225] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[ 56.254712] Call trace:
[ 56.254952] dump_backtrace+0x0/0x1d4
[ 56.255305] show_stack+0x1c/0x28
[ 56.255647] dump_stack+0xc4/0x128
[ 56.255905] panic+0x16c/0x35c
[ 56.256146] bad_el0_sync+0x0/0x58
[ 56.256403] el1_sync_handler+0xb4/0xe0
[ 56.256674] el1_sync+0x7c/0x100
[ 56.256928] kvm_vm_ioctl_check_extension_generic+0x74/0x98
[ 56.257286] __arm64_sys_ioctl+0x94/0xcc
[ 56.257569] el0_svc_common+0x9c/0x150
[ 56.257836] do_el0_svc+0x84/0x90
[ 56.258083] el0_sync_handler+0xf8/0x298
[ 56.258361] el0_sync+0x158/0x180
[ 56.258900] SMP: stopping secondary CPUs
[ 56.259594] Kernel Offset: 0x568919360000 from 0xffff800010000000
[ 56.259969] PHYS_OFFSET: 0xffffb50180000000
[ 56.260304] CPU features: 0x7e0152,20802028
[ 56.260599] Memory Limit: none
[ 56.261242] ---[ end Kernel panic - not syncing: bad mode ]---
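
For readers following along, the "code" and "pstate" values in the dump above can be decoded by hand. The following is a minimal Python sketch, not part of the original mail. The helper names are made up for illustration, and the field positions are assumptions based on the Armv8-A ESR_ELx and SPSR layouts, so check them against the architecture manual.

```python
# Assumed field layouts (Armv8-A): verify against the architecture manual.
ESR_EC_SHIFT = 26        # exception class is in ESR bits [31:26]
ESR_EC_MASK = 0x3F
ESR_EC_BTI = 0x0D        # "Branch Target Exception"
PSR_BTYPE_SHIFT = 10     # PSTATE.BTYPE is saved in SPSR bits [11:10]
PSR_BTYPE_MASK = 0x3


def esr_class(esr: int) -> int:
    """Return the exception class (EC) field of an ESR value."""
    return (esr >> ESR_EC_SHIFT) & ESR_EC_MASK


def esr_bti_btype(esr: int) -> int:
    """For a Branch Target Exception, the low two ISS bits hold BTYPE."""
    return esr & 0x3


def pstate_btype(pstate: int) -> int:
    """Return the BTYPE field of a saved PSTATE value."""
    return (pstate >> PSR_BTYPE_SHIFT) & PSR_BTYPE_MASK


esr = 0x34000003       # "code" from the "Bad mode" line
pstate = 0x80000c05    # "pstate" from the register dump

print(hex(esr_class(esr)))    # exception class
print(esr_bti_btype(esr))     # BTYPE recorded in the syndrome
print(pstate_btype(pstate))   # BTYPE recorded in PSTATE
```

Both BTYPE fields decode to 0b11, which is the value the log prints as "BTYPE=j-", and the exception class matches the "BTI" suffix on the first line.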

Looking at the disassembly for kvm_vm_ioctl_check_extension_generic, it
looks like this is a compiler bug:

ffff800010032da0 <kvm_vm_ioctl_check_extension_generic>:
ffff800010032da0: aa0003e8 mov x8, x0
ffff800010032da4: f102843f cmp x1, #0xa1
ffff800010032da8: 52800020 mov w0, #0x1 // #1
ffff800010032dac: 5400018c b.gt ffff800010032ddc <kvm_vm_ioctl_check_extension_generic+0x3c>
ffff800010032db0: d1000c29 sub x9, x1, #0x3
ffff800010032db4: f101dd3f cmp x9, #0x77
ffff800010032db8: 540002e8 b.hi ffff800010032e14 <kvm_vm_ioctl_check_extension_generic+0x74> // b.pmore
ffff800010032dbc: b0006f4a adrp x10, ffff800010e1b000 <vdso32_end>
ffff800010032dc0: 9121e54a add x10, x10, #0x879
ffff800010032dc4: 1000008b adr x11, ffff800010032dd4 <kvm_vm_ioctl_check_extension_generic+0x34>
ffff800010032dc8: 3869694c ldrb w12, [x10, x9]
ffff800010032dcc: 8b0c096b add x11, x11, x12, lsl #2
ffff800010032dd0: d61f0160 br x11

Here, the switch statement has been replaced by a jump table which we *tail
call* into. The register dump shows we're going to 0xffffd68929392e14 (x11),
which is ffff800010032e14 once the 0x568919360000 kernel offset is removed:

ffff800010032e14: d503233f paciasp
ffff800010032e18: a9bf7bfd stp x29, x30, [sp, #-16]!
ffff800010032e1c: 910003fd mov x29, sp
ffff800010032e20: aa0803e0 mov x0, x8
ffff800010032e24: 940017c0 bl ffff800010038d24 <kvm_vm_ioctl_check_extension>
ffff800010032e28: 93407c00 sxtw x0, w0
ffff800010032e2c: a8c17bfd ldp x29, x30, [sp], #16
ffff800010032e30: d50323bf autiasp
ffff800010032e34: d65f03c0 ret
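
The branch target can be cross-checked against the register dump. This is a small Python sketch, not part of the original mail, that redoes the 'add x11, x11, x12, lsl #2' computation and removes the KASLR offset. All constants are copied from the log and the listing; the variable and function names are made up for illustration.

```python
KERNEL_OFFSET = 0x568919360000        # "Kernel Offset" from the panic log
JUMP_BASE_LINK = 0xffff800010032dd4   # target of the 'adr x11' in the listing
X12 = 0x10                            # jump-table byte loaded by 'ldrb w12'
X11_RUNTIME = 0xffffd68929392e14      # x11 from the register dump


def jump_target(base: int, entry: int) -> int:
    """Mirror 'add x11, x11, x12, lsl #2': base plus entry scaled by 4."""
    return base + (entry << 2)


link_target = jump_target(JUMP_BASE_LINK, X12)
runtime_as_link = X11_RUNTIME - KERNEL_OFFSET

print(hex(link_target))       # 0xffff800010032e14
print(hex(runtime_as_link))   # 0xffff800010032e14
```

Both values agree with the address of the paciasp in the listing, so the BR really did land on that instruction.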

The problem is that the paciasp instruction is not BTYPE-compatible with BR;
it expects to be called with a branch-and-link, and so we panic. I think you
need to emit a 'bti j' here prior to the paciasp.
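
To make the compatibility rule concrete, here is a simplified Python model of the check, not part of the original mail. It assumes guarded pages throughout and assumes the control bit that stops paciasp from accepting BTYPE=0b11 is set. The table reflects my reading of the architecture and every name in it is illustrative.

```python
# BTYPE values left behind by the previous branch (guarded pages assumed).
BTYPE_NONE = 0b00         # not an indirect branch
BTYPE_BR_X16_X17 = 0b01   # BR via x16 or x17
BTYPE_CALL = 0b10         # BLR (branch-and-link)
BTYPE_JUMP = 0b11         # BR via any other register

# Which BTYPE values each landing instruction tolerates. The paciasp row
# assumes it does NOT accept BTYPE_JUMP, as in the report above.
ACCEPTS = {
    "bti c": {BTYPE_BR_X16_X17, BTYPE_CALL},
    "bti j": {BTYPE_BR_X16_X17, BTYPE_JUMP},
    "bti jc": {BTYPE_BR_X16_X17, BTYPE_CALL, BTYPE_JUMP},
    "paciasp": {BTYPE_BR_X16_X17, BTYPE_CALL},
}


def btype_after_br(reg: str) -> int:
    """BTYPE produced by 'br <reg>' executed from a guarded page."""
    return BTYPE_BR_X16_X17 if reg in ("x16", "x17") else BTYPE_JUMP


def lands_ok(first_insn: str, btype: int) -> bool:
    """True if landing on first_insn with this BTYPE raises no exception."""
    if btype == BTYPE_NONE:
        return True
    return btype in ACCEPTS.get(first_insn, set())


print(lands_ok("paciasp", btype_after_br("x11")))   # the failing case
print(lands_ok("bti j", btype_after_br("x11")))     # with the suggested fix
print(lands_ok("paciasp", BTYPE_CALL))              # ordinary function call
```

Under these assumptions the 'br x11' case fails on a bare paciasp and passes once a 'bti j' precedes it, while a normal BLR into paciasp keeps working.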

$ clang --version

Android (6443078 based on r383902) clang version 11.0.1 (https://android.googlesource.com/toolchain/llvm-project b397f81060ce6d701042b782172ed13bee898b79)

We currently support this for Clang 8+, but maybe we need to reconsider that
:(

Will

Mark Brown

Jun 15, 2020, 7:37:29 AM
to Will Deacon, ndesau...@google.com, clang-bu...@googlegroups.com, linux-ar...@lists.infradead.org, mark.r...@arm.com, catalin...@arm.com, andro...@google.com
On Mon, Jun 15, 2020 at 11:55:24AM +0100, Will Deacon wrote:

> We currently support this for Clang 8+, but maybe we need to reconsider that
> :(

Yes, looking a bit like that - this one is relatively rare but could
come up elsewhere so we can't just disable the functionality.

Mark Brown

Jun 15, 2020, 7:53:40 AM
to Will Deacon, ndesau...@google.com, clang-bu...@googlegroups.com, linux-ar...@lists.infradead.org, mark.r...@arm.com, catalin...@arm.com, andro...@google.com, danie...@arm.com
On Mon, Jun 15, 2020 at 11:55:24AM +0100, Will Deacon wrote:

> Here, the switch statement has been replaced by a jump table which we *tail
> call* into. The register dump shows we're going to 0xffffd68929392e14:

> ffff800010032e14: d503233f paciasp
> ffff800010032e18: a9bf7bfd stp x29, x30, [sp, #-16]!
> ffff800010032e1c: 910003fd mov x29, sp
> ffff800010032e20: aa0803e0 mov x0, x8
> ffff800010032e24: 940017c0 bl ffff800010038d24 <kvm_vm_ioctl_check_extension>
> ffff800010032e28: 93407c00 sxtw x0, w0
> ffff800010032e2c: a8c17bfd ldp x29, x30, [sp], #16
> ffff800010032e30: d50323bf autiasp
> ffff800010032e34: d65f03c0 ret

> The problem is that the paciasp instruction is not BTYPE-compatible with BR;
> it expects to be called with a branch-and-link, and so we panic. I think you
> need to emit a 'bti j' here prior to the paciasp.

I checked with our internal teams and they actually ran into this
recently with some other code, the patch:

https://reviews.llvm.org/D81746

([AArch64] Fix BTI instruction emission) should fix this, it's been
reviewed so should be merged shortly.

Will Deacon

Jun 15, 2020, 8:02:29 AM
to Mark Brown, ndesau...@google.com, clang-bu...@googlegroups.com, linux-ar...@lists.infradead.org, mark.r...@arm.com, catalin...@arm.com, andro...@google.com, danie...@arm.com
Cheers, that's good to hear. Shall we have a guess at the clang release
that will get the fix, or just disable in-kernel BTI with clang for now?

Will

Nathan Chancellor

Jun 15, 2020, 10:31:09 AM
to Will Deacon, Mark Brown, ndesau...@google.com, clang-bu...@googlegroups.com, linux-ar...@lists.infradead.org, mark.r...@arm.com, catalin...@arm.com, andro...@google.com, danie...@arm.com, tste...@redhat.com
[+ Tom, the clang 10 release manager]
This will be in clang 11 for sure. Tom, would it be too late to get this
in to clang 10.0.1? If it is not, I can open a PR.

Cheers,
Nathan

Daniel Kiss

Jun 15, 2020, 10:59:02 AM
to Nathan Chancellor, Will Deacon, Mark Brown, ndesau...@google.com, clang-bu...@googlegroups.com, linux-ar...@lists.infradead.org, Mark Rutland, Catalin Marinas, andro...@google.com, tste...@redhat.com
Sorry, I just saw this mail, I have opened a ticket for it already.
I hope it will make it into the 10.0.1. 

Thanks,
Daniel

Daniel Kiss

Jun 15, 2020, 11:00:43 AM
to Nathan Chancellor, Will Deacon, Mark Brown, ndesau...@google.com, clang-bu...@googlegroups.com, linux-ar...@lists.infradead.org, Mark Rutland, Catalin Marinas, andro...@google.com, tste...@redhat.com
Sorry, I just saw this mail, I have opened a ticket for it already.
https://bugs.llvm.org/show_bug.cgi?id=46327
I hope it will make it into the 10.0.1.

Thanks,
Daniel


Will Deacon

Jun 16, 2020, 1:37:34 PM
to Nathan Chancellor, Mark Brown, ndesau...@google.com, clang-bu...@googlegroups.com, linux-ar...@lists.infradead.org, mark.r...@arm.com, catalin...@arm.com, andro...@google.com, danie...@arm.com, tste...@redhat.com
Any update on this, please? I'd like to get the kernel fixed this week.

Cheers,

Will

Nathan Chancellor

Jun 16, 2020, 1:49:07 PM
to Will Deacon, Mark Brown, ndesau...@google.com, clang-bu...@googlegroups.com, linux-ar...@lists.infradead.org, mark.r...@arm.com, catalin...@arm.com, andro...@google.com, danie...@arm.com, tste...@redhat.com
The AArch64 backend owner said it should be okay to add to 10.0.1:
https://llvm.org/pr46327

Tom just needs to pick it, I see no reason to believe that won't happen
this week.

Cheers,
Nathan

Will Deacon

Jun 16, 2020, 1:55:25 PM
to Nathan Chancellor, Mark Brown, ndesau...@google.com, clang-bu...@googlegroups.com, linux-ar...@lists.infradead.org, mark.r...@arm.com, catalin...@arm.com, andro...@google.com, danie...@arm.com, tste...@redhat.com
On Tue, Jun 16, 2020 at 10:49:04AM -0700, Nathan Chancellor wrote:
> On Tue, Jun 16, 2020 at 06:37:28PM +0100, Will Deacon wrote:
> > On Mon, Jun 15, 2020 at 07:31:05AM -0700, Nathan Chancellor wrote:
> > > On Mon, Jun 15, 2020 at 01:02:23PM +0100, Will Deacon wrote:
> > > > On Mon, Jun 15, 2020 at 12:53:37PM +0100, Mark Brown wrote:
> > > > > ([AArch64] Fix BTI instruction emission) should fix this, it's been
> > > > > reviewed so should be merged shortly.
> > > >
> > > > Cheers, that's good to hear. Shall we have a guess at the clang release
> > > > that will get the fix, or just disable in-kernel BTI with clang for now?
> > > >
> > >
> > > This will be in clang 11 for sure. Tom, would it be too late to get this
> > > in to clang 10.0.1? If it is not, I can open a PR.
> >
> > Any update on this, please? I'd like to get the kernel fixed this week.
> >
> The AArch64 backend owner said it should be okay to add to 10.0.1:
> https://llvm.org/pr46327
>
> Tom just needs to pick it, I see no reason to believe that won't happen
> this week.

Brill, then I'll tentatively queue the diff below...

Thanks,

Will

--->8

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 31380da53689..4ae2419c14a8 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1630,6 +1630,8 @@ config ARM64_BTI_KERNEL
 	depends on CC_HAS_BRANCH_PROT_PAC_RET_BTI
 	# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94697
 	depends on !CC_IS_GCC || GCC_VERSION >= 100100
+	# https://reviews.llvm.org/rGb8ae3fdfa579dbf366b1bb1cbfdbf8c51db7fa55
+	depends on !CC_IS_CLANG || CLANG_VERSION >= 100001
 	depends on !(CC_IS_CLANG && GCOV_KERNEL)
 	depends on (!FUNCTION_GRAPH_TRACER || DYNAMIC_FTRACE_WITH_REGS)
 	help
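
For anyone unsure how the numeric thresholds in the diff map to compiler releases, here is a short Python sketch, not part of the original mail. It assumes the usual major * 10000 + minor * 100 + patch encoding for GCC_VERSION and CLANG_VERSION. The function names are made up for illustration and only model the two version lines, not the other dependencies.

```python
def kconfig_version(major: int, minor: int, patch: int) -> int:
    """Encode a compiler version the way GCC_VERSION/CLANG_VERSION are assumed to be."""
    return major * 10000 + minor * 100 + patch


def bti_kernel_allowed(compiler: str, major: int, minor: int, patch: int) -> bool:
    """Model only the two version checks from the diff above."""
    version = kconfig_version(major, minor, patch)
    if compiler == "gcc":
        return version >= 100100   # GCC 10.1.0 or later
    if compiler == "clang":
        return version >= 100001   # Clang 10.0.1 or later
    return True


print(kconfig_version(10, 0, 1))                # 100001
print(bti_kernel_allowed("clang", 10, 0, 0))    # False
print(bti_kernel_allowed("clang", 10, 0, 1))    # True
print(bti_kernel_allowed("gcc", 10, 1, 0))      # True
```

So 100001 corresponds to Clang 10.0.1 and 100100 to GCC 10.1.0.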

Tom Stellard

Jun 16, 2020, 2:10:16 PM
to Nathan Chancellor, Will Deacon, Mark Brown, ndesau...@google.com, clang-bu...@googlegroups.com, linux-ar...@lists.infradead.org, mark.r...@arm.com, catalin...@arm.com, andro...@google.com, danie...@arm.com
I have this in the list of fixes I'm working through. I'm trying to
get everything done by Thursday.

-Tom


Nick Desaulniers

Jun 16, 2020, 2:35:31 PM
to Will Deacon, Nathan Chancellor, Mark Brown, clang-built-linux, Linux ARM, Mark Rutland, Catalin Marinas, andro...@google.com, danie...@arm.com, Tom Stellard
That should be fine.
Acked-by: Nick Desaulniers <ndesau...@google.com>

--
Thanks,
~Nick Desaulniers

Fangrui Song

Jun 17, 2020, 2:36:26 AM
to Nick Desaulniers, Will Deacon, Nathan Chancellor, Mark Brown, clang-built-linux, Linux ARM, Mark Rutland, Catalin Marinas, andro...@google.com, danie...@arm.com, Tom Stellard
100001 is fine.

Tom has merged it into release/10.x
https://github.com/llvm/llvm-project/commit/bf89c5aeb8915d488fa1c790e1b237b62a49c01f

Daniel Kiss

Jun 18, 2020, 7:23:00 AM
to Will Deacon, Nick Desaulniers, Nathan Chancellor, Mark Brown, clang-built-linux, Linux ARM, Mark Rutland, Catalin Marinas, andro...@google.com, Tom Stellard, Fangrui Song
Hi Will,

I compiled v5.8-rc1 with the patched LLVM 10.0.1 (dc94773a91c85a05f4f249153cb1e9522b3beb5e).
The function you reported now looks good to me.

Thanks,
Daniel

0000000000006ae8 kvm_vm_ioctl_check_extension_generic:
6ae8: e8 03 00 aa mov x8, x0
6aec: 3f 84 02 f1 cmp x1, #161
6af0: 20 00 80 52 mov w0, #1
6af4: 8c 01 00 54 b.gt #48 <kvm_vm_ioctl_check_extension_generic+0x3c>
6af8: 29 0c 00 d1 sub x9, x1, #3
6afc: 3f dd 01 f1 cmp x9, #119
6b00: e8 02 00 54 b.hi #92 <kvm_vm_ioctl_check_extension_generic+0x74>
6b04: 0a 00 00 90 adrp x10, #0
6b08: 4a 01 00 91 add x10, x10, #0
6b0c: 8b 00 00 10 adr x11, #16
6b10: 4c 69 69 38 ldrb w12, [x10, x9]
6b14: 6b 09 0c 8b add x11, x11, x12, lsl #2
6b18: 60 01 1f d6 br x11
6b1c: 9f 24 03 d5 bti j
6b20: c0 03 5f d6 ret
6b24: 3f 88 02 f1 cmp x1, #162
6b28: a0 ff ff 54 b.eq #-12 <kvm_vm_ioctl_check_extension_generic+0x34>
6b2c: 3f d8 02 f1 cmp x1, #182
6b30: 60 ff ff 54 b.eq #-20 <kvm_vm_ioctl_check_extension_generic+0x34>
6b34: 3f a0 02 f1 cmp x1, #168
6b38: 21 01 00 54 b.ne #36 <kvm_vm_ioctl_check_extension_generic+0x74>
6b3c: 60 00 80 52 mov w0, #3
6b40: c0 03 5f d6 ret
6b44: 9f 24 03 d5 bti j
6b48: 00 40 80 52 mov w0, #512
6b4c: c0 03 5f d6 ret
6b50: 9f 24 03 d5 bti j
6b54: 00 00 82 52 mov w0, #4096
6b58: c0 03 5f d6 ret
6b5c: 9f 24 03 d5 bti j
6b60: 3f 23 03 d5 paciasp
6b64: fd 7b bf a9 stp x29, x30, [sp, #-16]!
6b68: fd 03 00 91 mov x29, sp
6b6c: e0 03 08 aa mov x0, x8
6b70: 00 00 00 94 bl #0 <kvm_vm_ioctl_check_extension_generic+0x88>
6b74: 00 7c 40 93 sxtw x0, w0
6b78: fd 7b c1 a8 ldp x29, x30, [sp], #16
6b7c: bf 23 03 d5 autiasp
6b80: c0 03 5f d6 ret