[PATCH 00/18] add support for Clang's Shadow Call Stack


Sami Tolvanen

Oct 18, 2019, 12:10:45 PM
to Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
This patch series adds support for Clang's Shadow Call Stack (SCS)
mitigation, which uses a separately allocated shadow stack to protect
against return address overwrites. More information can be found here:

https://clang.llvm.org/docs/ShadowCallStack.html

SCS is currently supported only on arm64, where the compiler requires
the x18 register to be reserved for holding the current task's shadow
stack pointer. Because of this, the series includes four patches from
Ard to remove x18 usage from assembly code and to reserve the register
from general allocation.

With -fsanitize=shadow-call-stack, the compiler injects instructions
into all non-leaf C functions to store the return address to the shadow
stack and to unconditionally reload it before returning. As a result,
SCS is incompatible with features that rely on modifying function
return addresses to alter control flow, such as function graph tracing
and kretprobes. A copy of the return address is still kept on the
kernel stack so that, for example, stack unwinding continues to work.
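
As a rough illustration (not part of this series, and the exact code the
compiler generates may differ), a non-leaf function is instrumented
approximately as follows on arm64:

static void bar(void) { }

static void foo(void)
{
	bar();
}

/*
 * With -fsanitize=shadow-call-stack, foo() is compiled roughly as
 *
 *	str	x30, [x18], #8		// push the return address to the shadow stack
 *	...				// usual prologue, body, and the call to bar()
 *	ldr	x30, [x18, #-8]!	// reload the return address before returning
 *	ret
 *
 * which is why x18 must always point to the current task's shadow stack.
 */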

SCS has minimal performance overhead, but allocating shadow stacks
increases kernel memory usage. The feature is therefore mostly useful
on hardware that lacks support for pointer authentication (PAC)
instructions. This series adds a return-oriented programming (ROP)
protection choice to the kernel configuration, under which other return
address protection options can be selected as they are added to the
kernel.


Ard Biesheuvel (4):
arm64/lib: copy_page: avoid x18 register in assembler code
arm64: kvm: stop treating register x18 as caller save
arm64: kernel: avoid x18 as an arbitrary temp register
arm64: kbuild: reserve reg x18 from general allocation by the compiler

Sami Tolvanen (14):
arm64: mm: don't use x18 in idmap_kpti_install_ng_mappings
add support for Clang's Shadow Call Stack (SCS)
scs: add accounting
scs: add support for stack usage debugging
trace: disable function graph tracing with SCS
kprobes: fix compilation without CONFIG_KRETPROBES
kprobes: disable kretprobes with SCS
arm64: reserve x18 only with Shadow Call Stack
arm64: preserve x18 when CPU is suspended
arm64: efi: restore x18 if it was corrupted
arm64: vdso: disable Shadow Call Stack
arm64: kprobes: fix kprobes without CONFIG_KRETPROBES
arm64: disable SCS for hypervisor code
arm64: implement Shadow Call Stack

Makefile | 6 +
arch/Kconfig | 41 ++++-
arch/arm64/Kconfig | 1 +
arch/arm64/Makefile | 4 +
arch/arm64/include/asm/scs.h | 60 ++++++++
arch/arm64/include/asm/stacktrace.h | 4 +
arch/arm64/include/asm/thread_info.h | 3 +
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/asm-offsets.c | 3 +
arch/arm64/kernel/cpu-reset.S | 4 +-
arch/arm64/kernel/efi-rt-wrapper.S | 7 +-
arch/arm64/kernel/entry.S | 23 +++
arch/arm64/kernel/head.S | 9 ++
arch/arm64/kernel/irq.c | 2 +
arch/arm64/kernel/probes/kprobes.c | 2 +
arch/arm64/kernel/process.c | 3 +
arch/arm64/kernel/scs.c | 39 +++++
arch/arm64/kernel/smp.c | 4 +
arch/arm64/kernel/vdso/Makefile | 2 +-
arch/arm64/kvm/hyp/Makefile | 3 +-
arch/arm64/kvm/hyp/entry.S | 12 +-
arch/arm64/lib/copy_page.S | 38 ++---
arch/arm64/mm/proc.S | 69 +++++----
drivers/base/node.c | 6 +
fs/proc/meminfo.c | 4 +
include/linux/compiler-clang.h | 2 +
include/linux/compiler_types.h | 4 +
include/linux/mmzone.h | 3 +
include/linux/scs.h | 88 +++++++++++
init/init_task.c | 6 +
init/main.c | 3 +
kernel/Makefile | 1 +
kernel/fork.c | 9 ++
kernel/kprobes.c | 38 ++---
kernel/sched/core.c | 2 +
kernel/sched/sched.h | 1 +
kernel/scs.c | 221 +++++++++++++++++++++++++++
kernel/trace/Kconfig | 1 +
mm/page_alloc.c | 6 +
mm/vmstat.c | 3 +
40 files changed, 656 insertions(+), 82 deletions(-)
create mode 100644 arch/arm64/include/asm/scs.h
create mode 100644 arch/arm64/kernel/scs.c
create mode 100644 include/linux/scs.h
create mode 100644 kernel/scs.c

--
2.23.0.866.gb869b98d4c-goog

Sami Tolvanen

Oct 18, 2019, 12:10:48 PM
to Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
idmap_kpti_install_ng_mappings uses x18 as a temporary register, which
will result in a conflict when x18 is reserved. Use x16 and x17 instead
where needed.

Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/arm64/mm/proc.S | 63 ++++++++++++++++++++++----------------------
1 file changed, 32 insertions(+), 31 deletions(-)

diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index a1e0592d1fbc..fdabf40a83c8 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -250,15 +250,15 @@ ENTRY(idmap_kpti_install_ng_mappings)
/* We're the boot CPU. Wait for the others to catch up */
sevl
1: wfe
- ldaxr w18, [flag_ptr]
- eor w18, w18, num_cpus
- cbnz w18, 1b
+ ldaxr w17, [flag_ptr]
+ eor w17, w17, num_cpus
+ cbnz w17, 1b

/* We need to walk swapper, so turn off the MMU. */
pre_disable_mmu_workaround
- mrs x18, sctlr_el1
- bic x18, x18, #SCTLR_ELx_M
- msr sctlr_el1, x18
+ mrs x17, sctlr_el1
+ bic x17, x17, #SCTLR_ELx_M
+ msr sctlr_el1, x17
isb

/* Everybody is enjoying the idmap, so we can rewrite swapper. */
@@ -281,9 +281,9 @@ skip_pgd:
isb

/* We're done: fire up the MMU again */
- mrs x18, sctlr_el1
- orr x18, x18, #SCTLR_ELx_M
- msr sctlr_el1, x18
+ mrs x17, sctlr_el1
+ orr x17, x17, #SCTLR_ELx_M
+ msr sctlr_el1, x17
isb

/*
@@ -353,46 +353,47 @@ skip_pte:
b.ne do_pte
b next_pmd

+ .unreq cpu
+ .unreq num_cpus
+ .unreq swapper_pa
+ .unreq cur_pgdp
+ .unreq end_pgdp
+ .unreq pgd
+ .unreq cur_pudp
+ .unreq end_pudp
+ .unreq pud
+ .unreq cur_pmdp
+ .unreq end_pmdp
+ .unreq pmd
+ .unreq cur_ptep
+ .unreq end_ptep
+ .unreq pte
+
/* Secondary CPUs end up here */
__idmap_kpti_secondary:
/* Uninstall swapper before surgery begins */
- __idmap_cpu_set_reserved_ttbr1 x18, x17
+ __idmap_cpu_set_reserved_ttbr1 x16, x17

/* Increment the flag to let the boot CPU we're ready */
-1: ldxr w18, [flag_ptr]
- add w18, w18, #1
- stxr w17, w18, [flag_ptr]
+1: ldxr w16, [flag_ptr]
+ add w16, w16, #1
+ stxr w17, w16, [flag_ptr]
cbnz w17, 1b

/* Wait for the boot CPU to finish messing around with swapper */
sevl
1: wfe
- ldxr w18, [flag_ptr]
- cbnz w18, 1b
+ ldxr w16, [flag_ptr]
+ cbnz w16, 1b

/* All done, act like nothing happened */
- offset_ttbr1 swapper_ttb, x18
+ offset_ttbr1 swapper_ttb, x16
msr ttbr1_el1, swapper_ttb
isb
ret

- .unreq cpu
- .unreq num_cpus
- .unreq swapper_pa
.unreq swapper_ttb
.unreq flag_ptr
- .unreq cur_pgdp
- .unreq end_pgdp
- .unreq pgd
- .unreq cur_pudp
- .unreq end_pudp
- .unreq pud
- .unreq cur_pmdp
- .unreq end_pmdp
- .unreq pmd
- .unreq cur_ptep
- .unreq end_ptep
- .unreq pte
ENDPROC(idmap_kpti_install_ng_mappings)
.popsection
#endif
--
2.23.0.866.gb869b98d4c-goog

Sami Tolvanen

Oct 18, 2019, 12:10:52 PM
to Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
From: Ard Biesheuvel <ard.bie...@linaro.org>

Register x18 will no longer be used as a caller save register in the
future, so stop using it in the copy_page() code.

Link: https://patchwork.kernel.org/patch/9836869/
Signed-off-by: Ard Biesheuvel <ard.bie...@linaro.org>
Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/arm64/lib/copy_page.S | 38 +++++++++++++++++++-------------------
1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/lib/copy_page.S b/arch/arm64/lib/copy_page.S
index bbb8562396af..8b562264c165 100644
--- a/arch/arm64/lib/copy_page.S
+++ b/arch/arm64/lib/copy_page.S
@@ -34,45 +34,45 @@ alternative_else_nop_endif
ldp x14, x15, [x1, #96]
ldp x16, x17, [x1, #112]

- mov x18, #(PAGE_SIZE - 128)
+ add x0, x0, #256
add x1, x1, #128
1:
- subs x18, x18, #128
+ tst x0, #(PAGE_SIZE - 1)

alternative_if ARM64_HAS_NO_HW_PREFETCH
prfm pldl1strm, [x1, #384]
alternative_else_nop_endif

- stnp x2, x3, [x0]
+ stnp x2, x3, [x0, #-256]
ldp x2, x3, [x1]
- stnp x4, x5, [x0, #16]
+ stnp x4, x5, [x0, #-240]
ldp x4, x5, [x1, #16]
- stnp x6, x7, [x0, #32]
+ stnp x6, x7, [x0, #-224]
ldp x6, x7, [x1, #32]
- stnp x8, x9, [x0, #48]
+ stnp x8, x9, [x0, #-208]
ldp x8, x9, [x1, #48]
- stnp x10, x11, [x0, #64]
+ stnp x10, x11, [x0, #-192]
ldp x10, x11, [x1, #64]
- stnp x12, x13, [x0, #80]
+ stnp x12, x13, [x0, #-176]
ldp x12, x13, [x1, #80]
- stnp x14, x15, [x0, #96]
+ stnp x14, x15, [x0, #-160]
ldp x14, x15, [x1, #96]
- stnp x16, x17, [x0, #112]
+ stnp x16, x17, [x0, #-144]
ldp x16, x17, [x1, #112]

add x0, x0, #128
add x1, x1, #128

- b.gt 1b
+ b.ne 1b

- stnp x2, x3, [x0]
- stnp x4, x5, [x0, #16]
- stnp x6, x7, [x0, #32]
- stnp x8, x9, [x0, #48]
- stnp x10, x11, [x0, #64]
- stnp x12, x13, [x0, #80]
- stnp x14, x15, [x0, #96]
- stnp x16, x17, [x0, #112]
+ stnp x2, x3, [x0, #-256]
+ stnp x4, x5, [x0, #-240]
+ stnp x6, x7, [x0, #-224]
+ stnp x8, x9, [x0, #-208]
+ stnp x10, x11, [x0, #-192]
+ stnp x12, x13, [x0, #-176]
+ stnp x14, x15, [x0, #-160]
+ stnp x16, x17, [x0, #-144]

ret
ENDPROC(copy_page)
--
2.23.0.866.gb869b98d4c-goog

Sami Tolvanen

Oct 18, 2019, 12:10:55 PM
to Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
From: Ard Biesheuvel <ard.bie...@linaro.org>

In preparation for using x18 as a task struct pointer register when
running in the kernel, stop treating it as caller save in the KVM
guest entry/exit code. Currently, the code assumes there is no need
to preserve it for the host, given that it would have been assumed
clobbered anyway by the function call to __guest_enter(). Instead,
preserve its value and restore it upon return.

Link: https://patchwork.kernel.org/patch/9836891/
Signed-off-by: Ard Biesheuvel <ard.bie...@linaro.org>
Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/arm64/kvm/hyp/entry.S | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index e5cc8d66bf53..20bd9a20ea27 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -23,6 +23,7 @@
.pushsection .hyp.text, "ax"

.macro save_callee_saved_regs ctxt
+ str x18, [\ctxt, #CPU_XREG_OFFSET(18)]
stp x19, x20, [\ctxt, #CPU_XREG_OFFSET(19)]
stp x21, x22, [\ctxt, #CPU_XREG_OFFSET(21)]
stp x23, x24, [\ctxt, #CPU_XREG_OFFSET(23)]
@@ -38,6 +39,7 @@
ldp x25, x26, [\ctxt, #CPU_XREG_OFFSET(25)]
ldp x27, x28, [\ctxt, #CPU_XREG_OFFSET(27)]
ldp x29, lr, [\ctxt, #CPU_XREG_OFFSET(29)]
+ ldr x18, [\ctxt, #CPU_XREG_OFFSET(18)]
.endm

/*
@@ -87,12 +89,9 @@ alternative_else_nop_endif
ldp x14, x15, [x18, #CPU_XREG_OFFSET(14)]
ldp x16, x17, [x18, #CPU_XREG_OFFSET(16)]

- // Restore guest regs x19-x29, lr
+ // Restore guest regs x18-x29, lr
restore_callee_saved_regs x18

- // Restore guest reg x18
- ldr x18, [x18, #CPU_XREG_OFFSET(18)]
-
// Do not touch any register after this!
eret
sb
@@ -114,7 +113,7 @@ ENTRY(__guest_exit)
// Retrieve the guest regs x0-x1 from the stack
ldp x2, x3, [sp], #16 // x0, x1

- // Store the guest regs x0-x1 and x4-x18
+ // Store the guest regs x0-x1 and x4-x17
stp x2, x3, [x1, #CPU_XREG_OFFSET(0)]
stp x4, x5, [x1, #CPU_XREG_OFFSET(4)]
stp x6, x7, [x1, #CPU_XREG_OFFSET(6)]
@@ -123,9 +122,8 @@ ENTRY(__guest_exit)
stp x12, x13, [x1, #CPU_XREG_OFFSET(12)]
stp x14, x15, [x1, #CPU_XREG_OFFSET(14)]
stp x16, x17, [x1, #CPU_XREG_OFFSET(16)]
- str x18, [x1, #CPU_XREG_OFFSET(18)]

- // Store the guest regs x19-x29, lr
+ // Store the guest regs x18-x29, lr
save_callee_saved_regs x1

get_host_ctxt x2, x3
--
2.23.0.866.gb869b98d4c-goog

Sami Tolvanen

Oct 18, 2019, 12:11:00 PM
to Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
From: Ard Biesheuvel <ard.bie...@linaro.org>

The code in __cpu_soft_restart() uses x18 as an arbitrary temp register,
which will shortly be disallowed. So use x8 instead.

Link: https://patchwork.kernel.org/patch/9836877/
Signed-off-by: Ard Biesheuvel <ard.bie...@linaro.org>
Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/arm64/kernel/cpu-reset.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/cpu-reset.S b/arch/arm64/kernel/cpu-reset.S
index 6ea337d464c4..32c7bf858dd9 100644
--- a/arch/arm64/kernel/cpu-reset.S
+++ b/arch/arm64/kernel/cpu-reset.S
@@ -42,11 +42,11 @@ ENTRY(__cpu_soft_restart)
mov x0, #HVC_SOFT_RESTART
hvc #0 // no return

-1: mov x18, x1 // entry
+1: mov x8, x1 // entry
mov x0, x2 // arg0
mov x1, x3 // arg1
mov x2, x4 // arg2
- br x18
+ br x8
ENDPROC(__cpu_soft_restart)

.popsection
--
2.23.0.866.gb869b98d4c-goog

Sami Tolvanen

Oct 18, 2019, 12:11:03 PM
to Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
From: Ard Biesheuvel <ard.bie...@linaro.org>

Before we can start using register x18 for a special purpose (as permitted
by the AAPCS64 ABI), we need to tell the compiler that it is off limits
for general allocation. So tag it as 'fixed', and remove the mention from
the LL/SC compiler flag override.

Link: https://patchwork.kernel.org/patch/9836881/
Signed-off-by: Ard Biesheuvel <ard.bie...@linaro.org>
Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/arm64/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 2c0238ce0551..1c7b276bc7c5 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -55,7 +55,7 @@ endif

KBUILD_CFLAGS += -mgeneral-regs-only $(lseinstr) $(brokengasinst) \
$(compat_vdso) $(cc_has_k_constraint)
-KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
+KBUILD_CFLAGS += -fno-asynchronous-unwind-tables -ffixed-x18
KBUILD_CFLAGS += $(call cc-disable-warning, psabi)
KBUILD_AFLAGS += $(lseinstr) $(brokengasinst) $(compat_vdso)

--
2.23.0.866.gb869b98d4c-goog

Sami Tolvanen

Oct 18, 2019, 12:11:06 PM
to Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
This change adds generic support for Clang's Shadow Call Stack, which
uses a shadow stack to protect return addresses from being overwritten
by an attacker. Details are available here:

https://clang.llvm.org/docs/ShadowCallStack.html
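
As a minimal usage sketch (the function below is made up for
illustration, not taken from this series), code that must not be
instrumented, for example because it runs before a shadow stack has
been set up, can opt out with the new attribute:

/* Hypothetical example: exclude one function from SCS instrumentation. */
static void __noscs early_helper(void)
{
	/* compiled without shadow call stack pushes and pops */
}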

Signed-off-by: Sami Tolvanen <samito...@google.com>
---
Makefile | 6 ++
arch/Kconfig | 39 ++++++++
include/linux/compiler-clang.h | 2 +
include/linux/compiler_types.h | 4 +
include/linux/scs.h | 88 ++++++++++++++++++
init/init_task.c | 6 ++
init/main.c | 3 +
kernel/Makefile | 1 +
kernel/fork.c | 9 ++
kernel/sched/core.c | 2 +
kernel/sched/sched.h | 1 +
kernel/scs.c | 162 +++++++++++++++++++++++++++++++++
12 files changed, 323 insertions(+)
create mode 100644 include/linux/scs.h
create mode 100644 kernel/scs.c

diff --git a/Makefile b/Makefile
index ffd7a912fc46..e401fa500f62 100644
--- a/Makefile
+++ b/Makefile
@@ -846,6 +846,12 @@ ifdef CONFIG_LIVEPATCH
KBUILD_CFLAGS += $(call cc-option, -flive-patching=inline-clone)
endif

+ifdef CONFIG_SHADOW_CALL_STACK
+KBUILD_CFLAGS += -fsanitize=shadow-call-stack
+DISABLE_SCS := -fno-sanitize=shadow-call-stack
+export DISABLE_SCS
+endif
+
# arch Makefile may override CC so keep this after arch Makefile is included
NOSTDINC_FLAGS += -nostdinc -isystem $(shell $(CC) -print-file-name=include)

diff --git a/arch/Kconfig b/arch/Kconfig
index 5f8a5d84dbbe..a222adda8130 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -521,6 +521,45 @@ config STACKPROTECTOR_STRONG
about 20% of all kernel functions, which increases the kernel code
size by about 2%.

+config ARCH_SUPPORTS_SHADOW_CALL_STACK
+ bool
+ help
+ An architecture should select this if it supports Clang's Shadow
+ Call Stack, has asm/scs.h, and implements runtime support for shadow
+ stack switching.
+
+config SHADOW_CALL_STACK_VMAP
+ def_bool n
+ depends on SHADOW_CALL_STACK
+ help
+ Use virtually mapped shadow call stacks. Selecting this option
+ provides better stack exhaustion protection, but increases per-thread
+ memory consumption as a full page is allocated for each shadow stack.
+
+choice
+ prompt "Return-oriented programming (ROP) protection"
+ default ROP_PROTECTION_NONE
+ help
+ This option controls kernel protections against return-oriented
+ programming (ROP) attacks.
+
+config ROP_PROTECTION_NONE
+ bool "None"
+
+config SHADOW_CALL_STACK
+ bool "Clang Shadow Call Stack"
+ depends on ARCH_SUPPORTS_SHADOW_CALL_STACK
+ depends on CC_IS_CLANG && CLANG_VERSION >= 70000
+ help
+ This option enables Clang's Shadow Call Stack, which uses a shadow
+ stack to protect function return addresses from being overwritten by
+ an attacker. More information can be found from Clang's
+ documentation:
+
+ https://clang.llvm.org/docs/ShadowCallStack.html
+
+endchoice
+
config HAVE_ARCH_WITHIN_STACK_FRAMES
bool
help
diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
index 333a6695a918..9af08391f205 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -42,3 +42,5 @@
* compilers, like ICC.
*/
#define barrier() __asm__ __volatile__("" : : : "memory")
+
+#define __noscs __attribute__((no_sanitize("shadow-call-stack")))
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 72393a8c1a6c..be5d5be4b1ae 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -202,6 +202,10 @@ struct ftrace_likely_data {
# define randomized_struct_fields_end
#endif

+#ifndef __noscs
+# define __noscs
+#endif
+
#ifndef asm_volatile_goto
#define asm_volatile_goto(x...) asm goto(x)
#endif
diff --git a/include/linux/scs.h b/include/linux/scs.h
new file mode 100644
index 000000000000..dfbd80faa528
--- /dev/null
+++ b/include/linux/scs.h
@@ -0,0 +1,88 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Shadow Call Stack support.
+ *
+ * Copyright (C) 2018 Google LLC
+ */
+
+#ifndef _LINUX_SCS_H
+#define _LINUX_SCS_H
+
+#include <linux/gfp.h>
+#include <linux/sched.h>
+#include <asm/page.h>
+
+#ifdef CONFIG_SHADOW_CALL_STACK
+
+#ifdef CONFIG_SHADOW_CALL_STACK_VMAP
+# define SCS_SIZE PAGE_SIZE
+#else
+# define SCS_SIZE 1024
+#endif
+
+#define SCS_GFP (GFP_KERNEL | __GFP_ZERO)
+
+extern unsigned long init_shadow_call_stack[];
+
+static inline void *task_scs(struct task_struct *tsk)
+{
+ return task_thread_info(tsk)->shadow_call_stack;
+}
+
+static inline void task_set_scs(struct task_struct *tsk, void *s)
+{
+ task_thread_info(tsk)->shadow_call_stack = s;
+}
+
+extern void scs_init(void);
+extern void scs_set_init_magic(struct task_struct *tsk);
+extern void scs_task_init(struct task_struct *tsk);
+extern void scs_task_reset(struct task_struct *tsk);
+extern int scs_prepare(struct task_struct *tsk, int node);
+extern bool scs_corrupted(struct task_struct *tsk);
+extern void scs_release(struct task_struct *tsk);
+
+#else /* CONFIG_SHADOW_CALL_STACK */
+
+static inline void *task_scs(struct task_struct *tsk)
+{
+ return 0;
+}
+
+static inline void task_set_scs(struct task_struct *tsk, void *s)
+{
+}
+
+static inline void scs_init(void)
+{
+}
+
+static inline void scs_set_init_magic(struct task_struct *tsk)
+{
+}
+
+static inline void scs_task_init(struct task_struct *tsk)
+{
+}
+
+static inline void scs_task_reset(struct task_struct *tsk)
+{
+}
+
+static inline int scs_prepare(struct task_struct *tsk, int node)
+{
+ return 0;
+}
+
+static inline bool scs_corrupted(struct task_struct *tsk)
+{
+ return false;
+}
+
+static inline void scs_release(struct task_struct *tsk)
+{
+}
+
+#endif /* CONFIG_SHADOW_CALL_STACK */
+
+#endif /* _LINUX_SCS_H */
diff --git a/init/init_task.c b/init/init_task.c
index 9e5cbe5eab7b..5e55ff45bbbf 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -11,6 +11,7 @@
#include <linux/mm.h>
#include <linux/audit.h>
#include <linux/numa.h>
+#include <linux/scs.h>

#include <asm/pgtable.h>
#include <linux/uaccess.h>
@@ -184,6 +185,11 @@ struct task_struct init_task
};
EXPORT_SYMBOL(init_task);

+#ifdef CONFIG_SHADOW_CALL_STACK
+unsigned long init_shadow_call_stack[SCS_SIZE / sizeof(long)]
+ __init_task_data __aligned(SCS_SIZE);
+#endif
+
/*
* Initial thread structure. Alignment of this is handled by a special
* linker map entry.
diff --git a/init/main.c b/init/main.c
index 91f6ebb30ef0..fb8bcdd729b9 100644
--- a/init/main.c
+++ b/init/main.c
@@ -93,6 +93,7 @@
#include <linux/rodata_test.h>
#include <linux/jump_label.h>
#include <linux/mem_encrypt.h>
+#include <linux/scs.h>

#include <asm/io.h>
#include <asm/bugs.h>
@@ -578,6 +579,8 @@ asmlinkage __visible void __init start_kernel(void)
char *after_dashes;

set_task_stack_end_magic(&init_task);
+ scs_set_init_magic(&init_task);
+
smp_setup_processor_id();
debug_objects_early_init();

diff --git a/kernel/Makefile b/kernel/Makefile
index daad787fb795..313dbd44d576 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
obj-$(CONFIG_IRQ_WORK) += irq_work.o
obj-$(CONFIG_CPU_PM) += cpu_pm.o
obj-$(CONFIG_BPF) += bpf/
+obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o

obj-$(CONFIG_PERF_EVENTS) += events/

diff --git a/kernel/fork.c b/kernel/fork.c
index bcdf53125210..ae7ebe9f0586 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -94,6 +94,7 @@
#include <linux/livepatch.h>
#include <linux/thread_info.h>
#include <linux/stackleak.h>
+#include <linux/scs.h>

#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -451,6 +452,8 @@ void put_task_stack(struct task_struct *tsk)

void free_task(struct task_struct *tsk)
{
+ scs_release(tsk);
+
#ifndef CONFIG_THREAD_INFO_IN_TASK
/*
* The task is finally done with both the stack and thread_info,
@@ -834,6 +837,8 @@ void __init fork_init(void)
NULL, free_vm_stack_cache);
#endif

+ scs_init();
+
lockdep_init_task(&init_task);
uprobes_init();
}
@@ -907,6 +912,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
clear_user_return_notifier(tsk);
clear_tsk_need_resched(tsk);
set_task_stack_end_magic(tsk);
+ scs_task_init(tsk);

#ifdef CONFIG_STACKPROTECTOR
tsk->stack_canary = get_random_canary();
@@ -2022,6 +2028,9 @@ static __latent_entropy struct task_struct *copy_process(
args->tls);
if (retval)
goto bad_fork_cleanup_io;
+ retval = scs_prepare(p, node);
+ if (retval)
+ goto bad_fork_cleanup_thread;

stackleak_task_init(p);

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dd05a378631a..e7faeb383008 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6013,6 +6013,8 @@ void init_idle(struct task_struct *idle, int cpu)
raw_spin_lock_irqsave(&idle->pi_lock, flags);
raw_spin_lock(&rq->lock);

+ scs_task_reset(idle);
+
__sched_fork(0, idle);
idle->state = TASK_RUNNING;
idle->se.exec_start = sched_clock();
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0db2c1b3361e..c153003a011c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -58,6 +58,7 @@
#include <linux/profile.h>
#include <linux/psi.h>
#include <linux/rcupdate_wait.h>
+#include <linux/scs.h>
#include <linux/security.h>
#include <linux/stop_machine.h>
#include <linux/suspend.h>
diff --git a/kernel/scs.c b/kernel/scs.c
new file mode 100644
index 000000000000..47324e8d313b
--- /dev/null
+++ b/kernel/scs.c
@@ -0,0 +1,162 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Shadow Call Stack support.
+ *
+ * Copyright (C) 2019 Google LLC
+ */
+
+#include <linux/cpuhotplug.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/scs.h>
+#include <linux/vmalloc.h>
+#include <asm/scs.h>
+
+#define SCS_END_MAGIC 0xaf0194819b1635f6UL
+
+static inline void *__scs_base(struct task_struct *tsk)
+{
+ return (void *)((uintptr_t)task_scs(tsk) & ~(SCS_SIZE - 1));
+}
+
+#ifdef CONFIG_SHADOW_CALL_STACK_VMAP
+
+/* Keep a cache of shadow stacks */
+#define SCS_CACHE_SIZE 2
+static DEFINE_PER_CPU(void *, scs_cache[SCS_CACHE_SIZE]);
+
+static void *scs_alloc(int node)
+{
+ int i;
+
+ for (i = 0; i < SCS_CACHE_SIZE; i++) {
+ void *s;
+
+ s = this_cpu_xchg(scs_cache[i], NULL);
+ if (s) {
+ memset(s, 0, SCS_SIZE);
+ return s;
+ }
+ }
+
+ return __vmalloc_node_range(SCS_SIZE, SCS_SIZE,
+ VMALLOC_START, VMALLOC_END,
+ SCS_GFP, PAGE_KERNEL, 0,
+ node, __builtin_return_address(0));
+}
+
+static void scs_free(void *s)
+{
+ int i;
+
+ for (i = 0; i < SCS_CACHE_SIZE; i++) {
+ if (this_cpu_cmpxchg(scs_cache[i], 0, s) != 0)
+ continue;
+
+ return;
+ }
+
+ vfree_atomic(s);
+}
+
+static int scs_cleanup(unsigned int cpu)
+{
+ int i;
+ void **cache = per_cpu_ptr(scs_cache, cpu);
+
+ for (i = 0; i < SCS_CACHE_SIZE; i++) {
+ vfree(cache[i]);
+ cache[i] = NULL;
+ }
+
+ return 0;
+}
+
+void __init scs_init(void)
+{
+ cpuhp_setup_state(CPUHP_BP_PREPARE_DYN, "scs:scs_cache", NULL,
+ scs_cleanup);
+}
+
+#else /* !CONFIG_SHADOW_CALL_STACK_VMAP */
+
+static struct kmem_cache *scs_cache;
+
+static inline void *scs_alloc(int node)
+{
+ return kmem_cache_alloc_node(scs_cache, SCS_GFP, node);
+}
+
+static inline void scs_free(void *s)
+{
+ kmem_cache_free(scs_cache, s);
+}
+
+void __init scs_init(void)
+{
+ scs_cache = kmem_cache_create("scs_cache", SCS_SIZE, SCS_SIZE,
+ 0, NULL);
+ WARN_ON(!scs_cache);
+}
+
+#endif /* CONFIG_SHADOW_CALL_STACK_VMAP */
+
+static inline unsigned long *scs_magic(struct task_struct *tsk)
+{
+ return (unsigned long *)(__scs_base(tsk) + SCS_SIZE - sizeof(long));
+}
+
+static inline void scs_set_magic(struct task_struct *tsk)
+{
+ *scs_magic(tsk) = SCS_END_MAGIC;
+}
+
+void scs_task_init(struct task_struct *tsk)
+{
+ task_set_scs(tsk, NULL);
+}
+
+void scs_task_reset(struct task_struct *tsk)
+{
+ task_set_scs(tsk, __scs_base(tsk));
+}
+
+void scs_set_init_magic(struct task_struct *tsk)
+{
+ scs_save(tsk);
+ scs_set_magic(tsk);
+ scs_load(tsk);
+}
+
+int scs_prepare(struct task_struct *tsk, int node)
+{
+ void *s;
+
+ s = scs_alloc(node);
+ if (!s)
+ return -ENOMEM;
+
+ task_set_scs(tsk, s);
+ scs_set_magic(tsk);
+
+ return 0;
+}
+
+bool scs_corrupted(struct task_struct *tsk)
+{
+ return *scs_magic(tsk) != SCS_END_MAGIC;
+}
+
+void scs_release(struct task_struct *tsk)
+{
+ void *s;
+
+ s = __scs_base(tsk);
+ if (!s)
+ return;
+
+ WARN_ON(scs_corrupted(tsk));
+
+ scs_task_init(tsk);
+ scs_free(s);
+}
--
2.23.0.866.gb869b98d4c-goog

Sami Tolvanen

Oct 18, 2019, 12:11:09 PM
to Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
This change adds accounting for the memory allocated for shadow stacks.

Signed-off-by: Sami Tolvanen <samito...@google.com>
---
drivers/base/node.c | 6 ++++++
fs/proc/meminfo.c | 4 ++++
include/linux/mmzone.h | 3 +++
kernel/scs.c | 20 ++++++++++++++++++++
mm/page_alloc.c | 6 ++++++
mm/vmstat.c | 3 +++
6 files changed, 42 insertions(+)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 296546ffed6c..111e58ec231e 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -415,6 +415,9 @@ static ssize_t node_read_meminfo(struct device *dev,
"Node %d AnonPages: %8lu kB\n"
"Node %d Shmem: %8lu kB\n"
"Node %d KernelStack: %8lu kB\n"
+#ifdef CONFIG_SHADOW_CALL_STACK
+ "Node %d ShadowCallStack:%8lu kB\n"
+#endif
"Node %d PageTables: %8lu kB\n"
"Node %d NFS_Unstable: %8lu kB\n"
"Node %d Bounce: %8lu kB\n"
@@ -438,6 +441,9 @@ static ssize_t node_read_meminfo(struct device *dev,
nid, K(node_page_state(pgdat, NR_ANON_MAPPED)),
nid, K(i.sharedram),
nid, sum_zone_node_page_state(nid, NR_KERNEL_STACK_KB),
+#ifdef CONFIG_SHADOW_CALL_STACK
+ nid, sum_zone_node_page_state(nid, NR_KERNEL_SCS_BYTES) / 1024,
+#endif
nid, K(sum_zone_node_page_state(nid, NR_PAGETABLE)),
nid, K(node_page_state(pgdat, NR_UNSTABLE_NFS)),
nid, K(sum_zone_node_page_state(nid, NR_BOUNCE)),
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index ac9247371871..df352e4bab90 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -103,6 +103,10 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
show_val_kb(m, "SUnreclaim: ", sunreclaim);
seq_printf(m, "KernelStack: %8lu kB\n",
global_zone_page_state(NR_KERNEL_STACK_KB));
+#ifdef CONFIG_SHADOW_CALL_STACK
+ seq_printf(m, "ShadowCallStack:%8lu kB\n",
+ global_zone_page_state(NR_KERNEL_SCS_BYTES) / 1024);
+#endif
show_val_kb(m, "PageTables: ",
global_zone_page_state(NR_PAGETABLE));

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index bda20282746b..fcb8c1708f9e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -200,6 +200,9 @@ enum zone_stat_item {
NR_MLOCK, /* mlock()ed pages found and moved off LRU */
NR_PAGETABLE, /* used for pagetables */
NR_KERNEL_STACK_KB, /* measured in KiB */
+#if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
+ NR_KERNEL_SCS_BYTES, /* measured in bytes */
+#endif
/* Second 128 byte cacheline */
NR_BOUNCE,
#if IS_ENABLED(CONFIG_ZSMALLOC)
diff --git a/kernel/scs.c b/kernel/scs.c
index 47324e8d313b..0e3cba49ea1a 100644
--- a/kernel/scs.c
+++ b/kernel/scs.c
@@ -7,9 +7,11 @@

#include <linux/cpuhotplug.h>
#include <linux/mm.h>
+#include <linux/mmzone.h>
#include <linux/slab.h>
#include <linux/scs.h>
#include <linux/vmalloc.h>
+#include <linux/vmstat.h>
#include <asm/scs.h>

#define SCS_END_MAGIC 0xaf0194819b1635f6UL
@@ -59,6 +61,11 @@ static void scs_free(void *s)
vfree_atomic(s);
}

+static struct page *__scs_page(struct task_struct *tsk)
+{
+ return vmalloc_to_page(__scs_base(tsk));
+}
+
static int scs_cleanup(unsigned int cpu)
{
int i;
@@ -92,6 +99,11 @@ static inline void scs_free(void *s)
kmem_cache_free(scs_cache, s);
}

+static struct page *__scs_page(struct task_struct *tsk)
+{
+ return virt_to_page(__scs_base(tsk));
+}
+
void __init scs_init(void)
{
scs_cache = kmem_cache_create("scs_cache", SCS_SIZE, SCS_SIZE,
@@ -128,6 +140,12 @@ void scs_set_init_magic(struct task_struct *tsk)
scs_load(tsk);
}

+static void scs_account(struct task_struct *tsk, int account)
+{
+ mod_zone_page_state(page_zone(__scs_page(tsk)), NR_KERNEL_SCS_BYTES,
+ account * SCS_SIZE);
+}
+
int scs_prepare(struct task_struct *tsk, int node)
{
void *s;
@@ -138,6 +156,7 @@ int scs_prepare(struct task_struct *tsk, int node)

task_set_scs(tsk, s);
scs_set_magic(tsk);
+ scs_account(tsk, 1);

return 0;
}
@@ -157,6 +176,7 @@ void scs_release(struct task_struct *tsk)

WARN_ON(scs_corrupted(tsk));

+ scs_account(tsk, -1);
scs_task_init(tsk);
scs_free(s);
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ecc3dbad606b..fe17d69d98a7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5361,6 +5361,9 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
" managed:%lukB"
" mlocked:%lukB"
" kernel_stack:%lukB"
+#ifdef CONFIG_SHADOW_CALL_STACK
+ " shadow_call_stack:%lukB"
+#endif
" pagetables:%lukB"
" bounce:%lukB"
" free_pcp:%lukB"
@@ -5382,6 +5385,9 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
K(zone_managed_pages(zone)),
K(zone_page_state(zone, NR_MLOCK)),
zone_page_state(zone, NR_KERNEL_STACK_KB),
+#ifdef CONFIG_SHADOW_CALL_STACK
+ zone_page_state(zone, NR_KERNEL_SCS_BYTES) / 1024,
+#endif
K(zone_page_state(zone, NR_PAGETABLE)),
K(zone_page_state(zone, NR_BOUNCE)),
K(free_pcp),
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 6afc892a148a..9fe4afe670fe 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1118,6 +1118,9 @@ const char * const vmstat_text[] = {
"nr_mlock",
"nr_page_table_pages",
"nr_kernel_stack",
+#if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
+ "nr_shadow_call_stack_bytes",
+#endif
"nr_bounce",
#if IS_ENABLED(CONFIG_ZSMALLOC)
"nr_zspages",
--
2.23.0.866.gb869b98d4c-goog

Sami Tolvanen

Oct 18, 2019, 12:11:12 PM
to Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
Implements CONFIG_DEBUG_STACK_USAGE for shadow stacks.

Signed-off-by: Sami Tolvanen <samito...@google.com>
---
kernel/scs.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)

diff --git a/kernel/scs.c b/kernel/scs.c
index 0e3cba49ea1a..1ec5c5a8dfae 100644
--- a/kernel/scs.c
+++ b/kernel/scs.c
@@ -161,6 +161,44 @@ int scs_prepare(struct task_struct *tsk, int node)
return 0;
}

+#ifdef CONFIG_DEBUG_STACK_USAGE
+static inline unsigned long scs_used(struct task_struct *tsk)
+{
+ unsigned long *p = __scs_base(tsk);
+ unsigned long *end = scs_magic(tsk);
+ uintptr_t s = (uintptr_t)p;
+
+ while (p < end && *p)
+ p++;
+
+ return (uintptr_t)p - s;
+}
+
+static void scs_check_usage(struct task_struct *tsk)
+{
+ static DEFINE_SPINLOCK(lock);
+ static unsigned long highest;
+ unsigned long used = scs_used(tsk);
+
+ if (used <= highest)
+ return;
+
+ spin_lock(&lock);
+
+ if (used > highest) {
+ pr_info("%s: highest shadow stack usage %lu bytes\n",
+ __func__, used);
+ highest = used;
+ }
+
+ spin_unlock(&lock);
+}
+#else
+static inline void scs_check_usage(struct task_struct *tsk)
+{
+}
+#endif
+
bool scs_corrupted(struct task_struct *tsk)
{
return *scs_magic(tsk) != SCS_END_MAGIC;
@@ -175,6 +213,7 @@ void scs_release(struct task_struct *tsk)
return;

WARN_ON(scs_corrupted(tsk));
+ scs_check_usage(tsk);

scs_account(tsk, -1);
scs_task_init(tsk);
--
2.23.0.866.gb869b98d4c-goog

Sami Tolvanen

Oct 18, 2019, 12:11:14 PM
to Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
With CONFIG_FUNCTION_GRAPH_TRACER, function return addresses are
modified in ftrace_graph_caller and prepare_ftrace_return to redirect
control flow to ftrace_return_to_handler. This is incompatible with
return address protection.

Signed-off-by: Sami Tolvanen <samito...@google.com>
---
kernel/trace/Kconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index e08527f50d2a..b7e5e3bfa0f4 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -161,6 +161,7 @@ config FUNCTION_GRAPH_TRACER
depends on HAVE_FUNCTION_GRAPH_TRACER
depends on FUNCTION_TRACER
depends on !X86_32 || !CC_OPTIMIZE_FOR_SIZE
+ depends on ROP_PROTECTION_NONE
default y
help
Enable the kernel to trace a function at both its return
--
2.23.0.866.gb869b98d4c-goog

Sami Tolvanen

Oct 18, 2019, 12:11:16 PM
to Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
kprobe_on_func_entry and arch_kprobe_on_func_entry need to be available
even if CONFIG_KRETPROBES is not selected.

Signed-off-by: Sami Tolvanen <samito...@google.com>
---
kernel/kprobes.c | 38 +++++++++++++++++++-------------------
1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 53534aa258a6..b5e20a4669b8 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1829,6 +1829,25 @@ unsigned long __weak arch_deref_entry_point(void *entry)
return (unsigned long)entry;
}

+bool __weak arch_kprobe_on_func_entry(unsigned long offset)
+{
+ return !offset;
+}
+
+bool kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset)
+{
+ kprobe_opcode_t *kp_addr = _kprobe_addr(addr, sym, offset);
+
+ if (IS_ERR(kp_addr))
+ return false;
+
+ if (!kallsyms_lookup_size_offset((unsigned long)kp_addr, NULL, &offset) ||
+ !arch_kprobe_on_func_entry(offset))
+ return false;
+
+ return true;
+}
+
#ifdef CONFIG_KRETPROBES
/*
* This kprobe pre_handler is registered with every kretprobe. When probe
@@ -1885,25 +1904,6 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
}
NOKPROBE_SYMBOL(pre_handler_kretprobe);

-bool __weak arch_kprobe_on_func_entry(unsigned long offset)
-{
- return !offset;
-}
-
-bool kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset)
-{
- kprobe_opcode_t *kp_addr = _kprobe_addr(addr, sym, offset);
-
- if (IS_ERR(kp_addr))
- return false;
-
- if (!kallsyms_lookup_size_offset((unsigned long)kp_addr, NULL, &offset) ||
- !arch_kprobe_on_func_entry(offset))
- return false;
-
- return true;
-}
-
int register_kretprobe(struct kretprobe *rp)
{
int ret = 0;
--
2.23.0.866.gb869b98d4c-goog

Sami Tolvanen

Oct 18, 2019, 12:11:19 PM
to Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
With CONFIG_KRETPROBES, function return addresses are modified to
redirect control flow to kretprobe_trampoline. This is incompatible with
return address protection.

Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index a222adda8130..4646e3b34925 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -171,7 +171,7 @@ config ARCH_USE_BUILTIN_BSWAP

config KRETPROBES
def_bool y
- depends on KPROBES && HAVE_KRETPROBES
+ depends on KPROBES && HAVE_KRETPROBES && ROP_PROTECTION_NONE

config USER_RETURN_NOTIFIER
bool
--
2.23.0.866.gb869b98d4c-goog

Sami Tolvanen

Oct 18, 2019, 12:11:22 PM
to Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
Only reserve x18 with CONFIG_SHADOW_CALL_STACK. Note that all external
kernel modules must also be built with -ffixed-x18 if the kernel uses SCS.

Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/arm64/Makefile | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 1c7b276bc7c5..ef76101201b2 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -55,7 +55,7 @@ endif

KBUILD_CFLAGS += -mgeneral-regs-only $(lseinstr) $(brokengasinst) \
$(compat_vdso) $(cc_has_k_constraint)
-KBUILD_CFLAGS += -fno-asynchronous-unwind-tables -ffixed-x18
+KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
KBUILD_CFLAGS += $(call cc-disable-warning, psabi)
KBUILD_AFLAGS += $(lseinstr) $(brokengasinst) $(compat_vdso)

@@ -72,6 +72,10 @@ stack_protector_prepare: prepare0
include/generated/asm-offsets.h))
endif

+ifeq ($(CONFIG_SHADOW_CALL_STACK), y)
+KBUILD_CFLAGS += -ffixed-x18
+endif
+
ifeq ($(CONFIG_CPU_BIG_ENDIAN), y)
KBUILD_CPPFLAGS += -mbig-endian
CHECKFLAGS += -D__AARCH64EB__
--
2.23.0.866.gb869b98d4c-goog

Sami Tolvanen

Oct 18, 2019, 12:11:24 PM
to Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
Don't lose the current task's shadow stack when the CPU is suspended.

Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/arm64/mm/proc.S | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index fdabf40a83c8..9a8bd4bc8549 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -73,6 +73,9 @@ alternative_endif
stp x8, x9, [x0, #48]
stp x10, x11, [x0, #64]
stp x12, x13, [x0, #80]
+#ifdef CONFIG_SHADOW_CALL_STACK
+ stp x18, xzr, [x0, #96]
+#endif
ret
ENDPROC(cpu_do_suspend)

@@ -89,6 +92,9 @@ ENTRY(cpu_do_resume)
ldp x9, x10, [x0, #48]
ldp x11, x12, [x0, #64]
ldp x13, x14, [x0, #80]
+#ifdef CONFIG_SHADOW_CALL_STACK
+ ldp x18, x19, [x0, #96]
+#endif
msr tpidr_el0, x2
msr tpidrro_el0, x3
msr contextidr_el1, x4
--
2.23.0.866.gb869b98d4c-goog

Sami Tolvanen

Oct 18, 2019, 12:11:26 PM
to Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
If we detect a corrupted x18 and SCS is enabled, restore the register
before jumping back to instrumented code.

Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/arm64/kernel/efi-rt-wrapper.S | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/efi-rt-wrapper.S b/arch/arm64/kernel/efi-rt-wrapper.S
index 3fc71106cb2b..945744f16086 100644
--- a/arch/arm64/kernel/efi-rt-wrapper.S
+++ b/arch/arm64/kernel/efi-rt-wrapper.S
@@ -34,5 +34,10 @@ ENTRY(__efi_rt_asm_wrapper)
ldp x29, x30, [sp], #32
b.ne 0f
ret
-0: b efi_handle_corrupted_x18 // tail call
+0:
+#ifdef CONFIG_SHADOW_CALL_STACK
+ /* Restore x18 before returning to instrumented code. */
+ mov x18, x2
+#endif
+ b efi_handle_corrupted_x18 // tail call
ENDPROC(__efi_rt_asm_wrapper)
--
2.23.0.866.gb869b98d4c-goog

Sami Tolvanen

Oct 18, 2019, 12:11:28 PM
to Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
The vDSO runs in userspace, where no shadow call stack has been set up,
so build it without SCS instrumentation.

Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/arm64/kernel/vdso/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
index dd2514bb1511..b23c963dd8df 100644
--- a/arch/arm64/kernel/vdso/Makefile
+++ b/arch/arm64/kernel/vdso/Makefile
@@ -19,7 +19,7 @@ obj-vdso := $(addprefix $(obj)/, $(obj-vdso))

ldflags-y := -shared -nostdlib -soname=linux-vdso.so.1 --hash-style=sysv \
--build-id -n -T
-
ccflags-y := -fno-common -fno-builtin -fno-stack-protector -ffixed-x18
+ccflags-y += $(DISABLE_SCS)
ccflags-y += -DDISABLE_BRANCH_PROFILING

--
2.23.0.866.gb869b98d4c-goog

Sami Tolvanen

Oct 18, 2019, 12:11:31 PM
to Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
This allows CONFIG_KRETPROBES to be disabled without disabling
kprobes entirely.

Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/arm64/kernel/probes/kprobes.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index c4452827419b..98230ae979ca 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -551,6 +551,7 @@ void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs)
return (void *)orig_ret_address;
}

+#ifdef CONFIG_KRETPROBES
void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
struct pt_regs *regs)
{
@@ -564,6 +565,7 @@ int __kprobes arch_trampoline_kprobe(struct kprobe *p)
{
return 0;
}
+#endif

int __init arch_init_kprobes(void)
{
--
2.23.0.866.gb869b98d4c-goog

Sami Tolvanen

Oct 18, 2019, 12:11:34 PM
to Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
The kernel's shadow call stacks are not available to the hypervisor
code running at EL2, so disable SCS instrumentation for it.

Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/arm64/kvm/hyp/Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
index ea710f674cb6..96073d81cb3b 100644
--- a/arch/arm64/kvm/hyp/Makefile
+++ b/arch/arm64/kvm/hyp/Makefile
@@ -4,7 +4,8 @@
#

ccflags-y += -fno-stack-protector -DDISABLE_BRANCH_PROFILING \
- $(DISABLE_STACKLEAK_PLUGIN)
+ $(DISABLE_STACKLEAK_PLUGIN) \
+ $(DISABLE_SCS)

KVM=../../../../virt/kvm

--
2.23.0.866.gb869b98d4c-goog

Sami Tolvanen

Oct 18, 2019, 12:11:36 PM
to Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
This change implements shadow stack switching, initial SCS set-up,
and interrupt shadow stacks for arm64.

Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/scs.h | 60 ++++++++++++++++++++++++++++
arch/arm64/include/asm/stacktrace.h | 4 ++
arch/arm64/include/asm/thread_info.h | 3 ++
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/asm-offsets.c | 3 ++
arch/arm64/kernel/entry.S | 23 +++++++++++
arch/arm64/kernel/head.S | 9 +++++
arch/arm64/kernel/irq.c | 2 +
arch/arm64/kernel/process.c | 3 ++
arch/arm64/kernel/scs.c | 39 ++++++++++++++++++
arch/arm64/kernel/smp.c | 4 ++
12 files changed, 152 insertions(+)
create mode 100644 arch/arm64/include/asm/scs.h
create mode 100644 arch/arm64/kernel/scs.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3f047afb982c..9bf179db5da9 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -66,6 +66,7 @@ config ARM64
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
select ARCH_SUPPORTS_MEMORY_FAILURE
+ select ARCH_SUPPORTS_SHADOW_CALL_STACK
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_INT128 if GCC_VERSION >= 50000 || CC_IS_CLANG
select ARCH_SUPPORTS_NUMA_BALANCING
diff --git a/arch/arm64/include/asm/scs.h b/arch/arm64/include/asm/scs.h
new file mode 100644
index 000000000000..14ba192dc6f0
--- /dev/null
+++ b/arch/arm64/include/asm/scs.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_SCS_H
+#define _ASM_SCS_H
+
+#ifndef __ASSEMBLY__
+
+#include <linux/scs.h>
+
+#ifdef CONFIG_SHADOW_CALL_STACK
+
+extern void scs_init_irq(void);
+
+static inline void scs_save(struct task_struct *tsk)
+{
+ void *s;
+
+ asm volatile("mov %0, x18" : "=r" (s));
+ task_set_scs(tsk, s);
+}
+
+static inline void scs_load(struct task_struct *tsk)
+{
+ asm volatile("mov x18, %0" : : "r" (task_scs(tsk)));
+ task_set_scs(tsk, NULL);
+}
+
+static inline void scs_thread_switch(struct task_struct *prev,
+ struct task_struct *next)
+{
+ scs_save(prev);
+ scs_load(next);
+
+ if (unlikely(scs_corrupted(prev)))
+ panic("corrupted shadow stack detected inside scheduler\n");
+}
+
+#else /* CONFIG_SHADOW_CALL_STACK */
+
+static inline void scs_init_irq(void)
+{
+}
+
+static inline void scs_save(struct task_struct *tsk)
+{
+}
+
+static inline void scs_load(struct task_struct *tsk)
+{
+}
+
+static inline void scs_thread_switch(struct task_struct *prev,
+ struct task_struct *next)
+{
+}
+
+#endif /* CONFIG_SHADOW_CALL_STACK */
+
+#endif /* __ASSEMBLY __ */
+
+#endif /* _ASM_SCS_H */
diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index 4d9b1f48dc39..b6cf32fb4efe 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -68,6 +68,10 @@ extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk);

DECLARE_PER_CPU(unsigned long *, irq_stack_ptr);

+#ifdef CONFIG_SHADOW_CALL_STACK
+DECLARE_PER_CPU(unsigned long *, irq_shadow_call_stack_ptr);
+#endif
+
static inline bool on_irq_stack(unsigned long sp,
struct stack_info *info)
{
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index f0cec4160136..8c73764b9ed2 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -41,6 +41,9 @@ struct thread_info {
#endif
} preempt;
};
+#ifdef CONFIG_SHADOW_CALL_STACK
+ void *shadow_call_stack;
+#endif
};

#define thread_saved_pc(tsk) \
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 478491f07b4f..b3995329d9e5 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -63,6 +63,7 @@ obj-$(CONFIG_CRASH_CORE) += crash_core.o
obj-$(CONFIG_ARM_SDE_INTERFACE) += sdei.o
obj-$(CONFIG_ARM64_SSBD) += ssbd.o
obj-$(CONFIG_ARM64_PTR_AUTH) += pointer_auth.o
+obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o

obj-y += vdso/ probes/
obj-$(CONFIG_COMPAT_VDSO) += vdso32/
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 214685760e1c..f6762b9ae1e1 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -33,6 +33,9 @@ int main(void)
DEFINE(TSK_TI_ADDR_LIMIT, offsetof(struct task_struct, thread_info.addr_limit));
#ifdef CONFIG_ARM64_SW_TTBR0_PAN
DEFINE(TSK_TI_TTBR0, offsetof(struct task_struct, thread_info.ttbr0));
+#endif
+#ifdef CONFIG_SHADOW_CALL_STACK
+ DEFINE(TSK_TI_SCS, offsetof(struct task_struct, thread_info.shadow_call_stack));
#endif
DEFINE(TSK_STACK, offsetof(struct task_struct, stack));
#ifdef CONFIG_STACKPROTECTOR
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index cf3bd2976e57..ca49938b99d0 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -172,6 +172,10 @@ alternative_cb_end

apply_ssbd 1, x22, x23

+#ifdef CONFIG_SHADOW_CALL_STACK
+ ldr x18, [tsk, #TSK_TI_SCS] // Restore shadow call stack
+ str xzr, [tsk, #TSK_TI_SCS]
+#endif
.else
add x21, sp, #S_FRAME_SIZE
get_current_task tsk
@@ -278,6 +282,12 @@ alternative_else_nop_endif
ct_user_enter
.endif

+#ifdef CONFIG_SHADOW_CALL_STACK
+ .if \el == 0
+ str x18, [tsk, #TSK_TI_SCS] // Save shadow call stack
+ .endif
+#endif
+
#ifdef CONFIG_ARM64_SW_TTBR0_PAN
/*
* Restore access to TTBR0_EL1. If returning to EL0, no need for SPSR
@@ -383,6 +393,9 @@ alternative_insn eret, nop, ARM64_UNMAP_KERNEL_AT_EL0

.macro irq_stack_entry
mov x19, sp // preserve the original sp
+#ifdef CONFIG_SHADOW_CALL_STACK
+ mov x20, x18 // preserve the original shadow stack
+#endif

/*
* Compare sp with the base of the task stack.
@@ -400,6 +413,12 @@ alternative_insn eret, nop, ARM64_UNMAP_KERNEL_AT_EL0

/* switch to the irq stack */
mov sp, x26
+
+#ifdef CONFIG_SHADOW_CALL_STACK
+ /* also switch to the irq shadow stack */
+ ldr_this_cpu x18, irq_shadow_call_stack_ptr, x26
+#endif
+
9998:
.endm

@@ -409,6 +428,10 @@ alternative_insn eret, nop, ARM64_UNMAP_KERNEL_AT_EL0
*/
.macro irq_stack_exit
mov sp, x19
+#ifdef CONFIG_SHADOW_CALL_STACK
+ /* x20 is also preserved */
+ mov x18, x20
+#endif
.endm

/* GPRs used by entry code */
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 989b1944cb71..2be977c6496f 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -27,6 +27,7 @@
#include <asm/pgtable-hwdef.h>
#include <asm/pgtable.h>
#include <asm/page.h>
+#include <asm/scs.h>
#include <asm/smp.h>
#include <asm/sysreg.h>
#include <asm/thread_info.h>
@@ -424,6 +425,10 @@ __primary_switched:
stp xzr, x30, [sp, #-16]!
mov x29, sp

+#ifdef CONFIG_SHADOW_CALL_STACK
+ adr_l x18, init_shadow_call_stack // Set shadow call stack
+#endif
+
str_l x21, __fdt_pointer, x5 // Save FDT pointer

ldr_l x4, kimage_vaddr // Save the offset between
@@ -731,6 +736,10 @@ __secondary_switched:
ldr x2, [x0, #CPU_BOOT_TASK]
cbz x2, __secondary_too_slow
msr sp_el0, x2
+#ifdef CONFIG_SHADOW_CALL_STACK
+ ldr x18, [x2, #TSK_TI_SCS] // Set shadow call stack
+ str xzr, [x2, #TSK_TI_SCS]
+#endif
mov x29, #0
mov x30, #0
b secondary_start_kernel
diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
index 04a327ccf84d..fe0ca522ff60 100644
--- a/arch/arm64/kernel/irq.c
+++ b/arch/arm64/kernel/irq.c
@@ -21,6 +21,7 @@
#include <linux/vmalloc.h>
#include <asm/daifflags.h>
#include <asm/vmap_stack.h>
+#include <asm/scs.h>

unsigned long irq_err_count;

@@ -63,6 +64,7 @@ static void init_irq_stacks(void)
void __init init_IRQ(void)
{
init_irq_stacks();
+ scs_init_irq();
irqchip_init();
if (!handle_arch_irq)
panic("No interrupt controller found.");
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 71f788cd2b18..4490632047d6 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -52,6 +52,7 @@
#include <asm/mmu_context.h>
#include <asm/processor.h>
#include <asm/pointer_auth.h>
+#include <asm/scs.h>
#include <asm/stacktrace.h>

#if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_STACKPROTECTOR_PER_TASK)
@@ -508,6 +509,8 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
ptrauth_thread_switch(next);
ssbs_thread_switch(next);

+ scs_thread_switch(prev, next);
+
/*
* Complete any pending TLB or cache maintenance on this CPU in case
* the thread migrates to a different CPU.
diff --git a/arch/arm64/kernel/scs.c b/arch/arm64/kernel/scs.c
new file mode 100644
index 000000000000..6f255072c9a9
--- /dev/null
+++ b/arch/arm64/kernel/scs.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Shadow Call Stack support.
+ *
+ * Copyright (C) 2019 Google LLC
+ */
+
+#include <linux/percpu.h>
+#include <linux/vmalloc.h>
+#include <asm/scs.h>
+
+DEFINE_PER_CPU(unsigned long *, irq_shadow_call_stack_ptr);
+
+#ifndef CONFIG_SHADOW_CALL_STACK_VMAP
+DEFINE_PER_CPU(unsigned long [SCS_SIZE/sizeof(long)], irq_shadow_call_stack)
+ __aligned(SCS_SIZE);
+#endif
+
+void scs_init_irq(void)
+{
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+#ifdef CONFIG_SHADOW_CALL_STACK_VMAP
+ unsigned long *p;
+
+ p = __vmalloc_node_range(SCS_SIZE, SCS_SIZE,
+ VMALLOC_START, VMALLOC_END,
+ SCS_GFP, PAGE_KERNEL,
+ 0, cpu_to_node(cpu),
+ __builtin_return_address(0));
+
+ per_cpu(irq_shadow_call_stack_ptr, cpu) = p;
+#else
+ per_cpu(irq_shadow_call_stack_ptr, cpu) =
+ per_cpu(irq_shadow_call_stack, cpu);
+#endif /* CONFIG_SHADOW_CALL_STACK_VMAP */
+ }
+}
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index dc9fe879c279..cc1938a585d2 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -44,6 +44,7 @@
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
#include <asm/processor.h>
+#include <asm/scs.h>
#include <asm/smp_plat.h>
#include <asm/sections.h>
#include <asm/tlbflush.h>
@@ -357,6 +358,9 @@ void cpu_die(void)
{
unsigned int cpu = smp_processor_id();

+ /* Save the shadow stack pointer before exiting the idle task */
+ scs_save(current);
+
idle_task_exit();

local_daif_mask();
--
2.23.0.866.gb869b98d4c-goog

Nick Desaulniers

Oct 18, 2019, 12:43:32 PM
to Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, clang-built-linux, kernel-h...@lists.openwall.com, Linux ARM, LKML
On Fri, Oct 18, 2019 at 9:10 AM Sami Tolvanen <samito...@google.com> wrote:
>
> idmap_kpti_install_ng_mappings uses x18 as a temporary register, which
> will result in a conflict when x18 is reserved. Use x16 and x17 instead
> where needed.
>
> Signed-off-by: Sami Tolvanen <samito...@google.com>

TIL about .req/.unreq. Seems like a nice way of marking "variable"
lifetime. Technically, only `pte` needed to be moved to reuse
{w|x}16, but moving most of the unreqs together is nicer than splitting
them apart. The usage all looks correct to me.
Reviewed-by: Nick Desaulniers <ndesau...@google.com>
--
Thanks,
~Nick Desaulniers

Nick Desaulniers

Oct 18, 2019, 12:49:34 PM
to Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, clang-built-linux, kernel-h...@lists.openwall.com, Linux ARM, LKML
On Fri, Oct 18, 2019 at 9:11 AM Sami Tolvanen <samito...@google.com> wrote:
>
> Don't lose the current task's shadow stack when the CPU is suspended.
>
> Signed-off-by: Sami Tolvanen <samito...@google.com>
> ---
> arch/arm64/mm/proc.S | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index fdabf40a83c8..9a8bd4bc8549 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -73,6 +73,9 @@ alternative_endif
> stp x8, x9, [x0, #48]
> stp x10, x11, [x0, #64]
> stp x12, x13, [x0, #80]
> +#ifdef CONFIG_SHADOW_CALL_STACK
> + stp x18, xzr, [x0, #96]

Could this be a str/ldr of just x18 rather than stp/ldp of x18 +
garbage? Maybe there's no real cost difference, or some kind of
alignment invariant?

> +#endif
> ret
> ENDPROC(cpu_do_suspend)
>
> @@ -89,6 +92,9 @@ ENTRY(cpu_do_resume)
> ldp x9, x10, [x0, #48]
> ldp x11, x12, [x0, #64]
> ldp x13, x14, [x0, #80]
> +#ifdef CONFIG_SHADOW_CALL_STACK
> + ldp x18, x19, [x0, #96]
> +#endif
> msr tpidr_el0, x2
> msr tpidrro_el0, x3
> msr contextidr_el1, x4
> --
> 2.23.0.866.gb869b98d4c-goog
>


--
Thanks,
~Nick Desaulniers

Joe Perches

unread,
Oct 18, 2019, 12:58:18 PM10/18/19
to Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org
On Fri, 2019-10-18 at 09:10 -0700, Sami Tolvanen wrote:
> This change adds generic support for Clang's Shadow Call Stack, which
> uses a shadow stack to protect return addresses from being overwritten
> by an attacker
[]
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
[]
> @@ -42,3 +42,5 @@
> * compilers, like ICC.
> */
> #define barrier() __asm__ __volatile__("" : : : "memory")
> +
> +#define __noscs __attribute__((no_sanitize("shadow-call-stack")))

trivia:

This should likely use the __ prefix and suffix form:

#define __noscs __attribute__((__no_sanitize__("shadow-call-stack")))

as should the __no_sanitize_address above this

> diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h

Steven Rostedt

unread,
Oct 18, 2019, 1:01:31 PM10/18/19
to Sami Tolvanen, Will Deacon, Catalin Marinas, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org
On Fri, 18 Oct 2019 09:10:24 -0700
Sami Tolvanen <samito...@google.com> wrote:

> With CONFIG_FUNCTION_GRAPH_TRACER, function return addresses are
> modified in ftrace_graph_caller and prepare_ftrace_return to redirect
> control flow to ftrace_return_to_handler. This is incompatible with
> return address protection.
>
> Signed-off-by: Sami Tolvanen <samito...@google.com>
> ---
> kernel/trace/Kconfig | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index e08527f50d2a..b7e5e3bfa0f4 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -161,6 +161,7 @@ config FUNCTION_GRAPH_TRACER
> depends on HAVE_FUNCTION_GRAPH_TRACER
> depends on FUNCTION_TRACER
> depends on !X86_32 || !CC_OPTIMIZE_FOR_SIZE
> + depends on ROP_PROTECTION_NONE

NAK, Put this in the arch code.

> default y
> help
> Enable the kernel to trace a function at both its return

-- Steve


diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 41a9b4257b72..d68339987604 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -149,7 +149,7 @@ config ARM64
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_TRACER
select HAVE_FUNCTION_ERROR_INJECTION
- select HAVE_FUNCTION_GRAPH_TRACER
+ select HAVE_FUNCTION_GRAPH_TRACER if ROP_PROTECTION_NONE
select HAVE_GCC_PLUGINS
select HAVE_HW_BREAKPOINT if PERF_EVENTS
select HAVE_IRQ_TIME_ACCOUNTING

Steven Rostedt

unread,
Oct 18, 2019, 1:03:00 PM10/18/19
to Sami Tolvanen, Will Deacon, Catalin Marinas, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Masami Hiramatsu

Added Masami who's the maintainer of kprobes.

-- Steve

Steven Rostedt

unread,
Oct 18, 2019, 1:04:32 PM10/18/19
to Sami Tolvanen, Will Deacon, Catalin Marinas, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Masami Hiramatsu

[ Added Masami ]

On Fri, 18 Oct 2019 09:10:26 -0700
Sami Tolvanen <samito...@google.com> wrote:

> With CONFIG_KRETPROBES, function return addresses are modified to
> redirect control flow to kretprobe_trampoline. This is incompatible with
> return address protection.
>
> Signed-off-by: Sami Tolvanen <samito...@google.com>
> ---
> arch/Kconfig | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index a222adda8130..4646e3b34925 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -171,7 +171,7 @@ config ARCH_USE_BUILTIN_BSWAP
>
> config KRETPROBES
> def_bool y
> - depends on KPROBES && HAVE_KRETPROBES
> + depends on KPROBES && HAVE_KRETPROBES && ROP_PROTECTION_NONE

Again, this belongs in the arch code.

-- Steve

>
> config USER_RETURN_NOTIFIER
> bool


diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 41a9b4257b72..65557d7e6b5e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -166,7 +166,7 @@ config ARM64
select HAVE_STACKPROTECTOR
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_KPROBES
- select HAVE_KRETPROBES
+ select HAVE_KRETPROBES if ROP_PROTECTION_NONE
select HAVE_GENERIC_VDSO
select IOMMU_DMA if IOMMU_SUPPORT
select IRQ_DOMAIN

Sami Tolvanen

unread,
Oct 18, 2019, 1:05:32 PM10/18/19
to Nick Desaulniers, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, clang-built-linux, Kernel Hardening, Linux ARM, LKML
On Fri, Oct 18, 2019 at 9:49 AM Nick Desaulniers
<ndesau...@google.com> wrote:
> > diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> > index fdabf40a83c8..9a8bd4bc8549 100644
> > --- a/arch/arm64/mm/proc.S
> > +++ b/arch/arm64/mm/proc.S
> > @@ -73,6 +73,9 @@ alternative_endif
> > stp x8, x9, [x0, #48]
> > stp x10, x11, [x0, #64]
> > stp x12, x13, [x0, #80]
> > +#ifdef CONFIG_SHADOW_CALL_STACK
> > + stp x18, xzr, [x0, #96]
>
> Could this be a str/ldr of just x18 rather than stp/ldp of x18 +
> garbage? Maybe there's no real cost difference, or some kind of
> alignment invariant?

Sure, this can be changed to str/ldr. I don't think there's a
noticeable difference in cost.

Sami

Sami Tolvanen

unread,
Oct 18, 2019, 1:08:29 PM10/18/19
to Steven Rostedt, Will Deacon, Catalin Marinas, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-built-linux, Kernel Hardening, linux-arm-kernel, LKML
On Fri, Oct 18, 2019 at 10:01 AM Steven Rostedt <ros...@goodmis.org> wrote:
> NAK, Put this in the arch code.

> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 41a9b4257b72..d68339987604 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -149,7 +149,7 @@ config ARM64
> select HAVE_FTRACE_MCOUNT_RECORD
> select HAVE_FUNCTION_TRACER
> select HAVE_FUNCTION_ERROR_INJECTION
> - select HAVE_FUNCTION_GRAPH_TRACER
> + select HAVE_FUNCTION_GRAPH_TRACER if ROP_PROTECTION_NONE
> select HAVE_GCC_PLUGINS
> select HAVE_HW_BREAKPOINT if PERF_EVENTS
> select HAVE_IRQ_TIME_ACCOUNTING

Thanks, Steven. I'll fix this and kretprobes in v2.

Sami

Nick Desaulniers

unread,
Oct 18, 2019, 1:08:50 PM10/18/19
to Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, clang-built-linux, kernel-h...@lists.openwall.com, Linux ARM, LKML, Miguel Ojeda
On Fri, Oct 18, 2019 at 9:11 AM Sami Tolvanen <samito...@google.com> wrote:
>
Version check LGTM.

> + help
> + This option enables Clang's Shadow Call Stack, which uses a shadow
> + stack to protect function return addresses from being overwritten by
> + an attacker. More information can be found from Clang's
> + documentation:
> +
> + https://clang.llvm.org/docs/ShadowCallStack.html
> +
> +endchoice
> +
> config HAVE_ARCH_WITHIN_STACK_FRAMES
> bool
> help
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index 333a6695a918..9af08391f205 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -42,3 +42,5 @@
> * compilers, like ICC.
> */
> #define barrier() __asm__ __volatile__("" : : : "memory")
> +
> +#define __noscs __attribute__((no_sanitize("shadow-call-stack")))

It looks like this attribute, (and thus a requirement to use this
feature), didn't exist until Clang 7.0: https://godbolt.org/z/p9u1we
(as noted above)

I think it's better to put __noscs behind a __has_attribute guard in
include/linux/compiler_attributes.h. Otherwise, what will happen when
Clang 6.0 sees __noscs, for example? (-Wunknown-sanitizers will
happen).

> diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
> index 72393a8c1a6c..be5d5be4b1ae 100644
> --- a/include/linux/compiler_types.h
> +++ b/include/linux/compiler_types.h
> @@ -202,6 +202,10 @@ struct ftrace_likely_data {
> # define randomized_struct_fields_end
> #endif
>
> +#ifndef __noscs
> +# define __noscs
> +#endif
> +

and then this can be removed.
--
Thanks,
~Nick Desaulniers

Sami Tolvanen

unread,
Oct 18, 2019, 1:11:28 PM10/18/19
to Nick Desaulniers, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, clang-built-linux, Kernel Hardening, Linux ARM, LKML, Miguel Ojeda
On Fri, Oct 18, 2019 at 10:08 AM 'Nick Desaulniers' via Clang Built
Linux <clang-bu...@googlegroups.com> wrote:
> > diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> > index 333a6695a918..9af08391f205 100644
> > --- a/include/linux/compiler-clang.h
> > +++ b/include/linux/compiler-clang.h
> > @@ -42,3 +42,5 @@
> > * compilers, like ICC.
> > */
> > #define barrier() __asm__ __volatile__("" : : : "memory")
> > +
> > +#define __noscs __attribute__((no_sanitize("shadow-call-stack")))
>
> It looks like this attribute, (and thus a requirement to use this
> feature), didn't exist until Clang 7.0: https://godbolt.org/z/p9u1we
> (as noted above)
>
> I think it's better to put __noscs behind a __has_attribute guard in
> include/linux/compiler_attributes.h. Otherwise, what will happen when
> Clang 6.0 sees __noscs, for example? (-Wunknown-sanitizers will
> happen).

Good point, I'll fix this in v2. Thanks.

Sami

Jann Horn

unread,
Oct 18, 2019, 1:13:19 PM10/18/19
to Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-built-linux, Kernel Hardening, linux-ar...@lists.infradead.org, kernel list
On Fri, Oct 18, 2019 at 6:16 PM Sami Tolvanen <samito...@google.com> wrote:
> This change implements shadow stack switching, initial SCS set-up,
> and interrupt shadow stacks for arm64.
[...]
> +static inline void scs_save(struct task_struct *tsk)
> +{
> + void *s;
> +
> + asm volatile("mov %0, x18" : "=r" (s));
> + task_set_scs(tsk, s);
> +}
> +
> +static inline void scs_load(struct task_struct *tsk)
> +{
> + asm volatile("mov x18, %0" : : "r" (task_scs(tsk)));
> + task_set_scs(tsk, NULL);
> +}

These things should probably be __always_inline or something like
that? If the compiler decides not to inline them (e.g. when called
from scs_thread_switch()), stuff will blow up, right?

> +static inline void scs_thread_switch(struct task_struct *prev,
> + struct task_struct *next)
> +{
> + scs_save(prev);
> + scs_load(next);
> +
> + if (unlikely(scs_corrupted(prev)))
> + panic("corrupted shadow stack detected inside scheduler\n");
> +}
[...]
> +#ifdef CONFIG_SHADOW_CALL_STACK
> +DECLARE_PER_CPU(unsigned long *, irq_shadow_call_stack_ptr);
> +#endif

If an attacker has a leak of some random function pointer to find the
ASLR base address, they'll know where irq_shadow_call_stack_ptr is.
With an arbitrary read (to read
irq_shadow_call_stack_ptr[sched_getcpu()]) followed by an arbitrary
write, they'd be able to overwrite the shadow stack. Or with just an
arbitrary write without a read, they could change
irq_shadow_call_stack_ptr[sched_getcpu()] to point elsewhere. This is
different from the intended protection level according to
<https://clang.llvm.org/docs/ShadowCallStack.html#security>, which
talks about "a runtime that avoids exposing the address of the shadow
call stack to attackers that can read arbitrary memory". Of course,
that's extremely hard to implement in the context of the kernel, where
you can see all the memory management data structures and all physical
memory.

You might want to write something in the cover letter about what the
benefits of this mechanism compared to STACKPROTECTOR are in the
context of the kernel, including a specific description of which types
of attacker capabilities this is supposed to defend against.

Sami Tolvanen

unread,
Oct 18, 2019, 1:18:52 PM10/18/19
to Jann Horn, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-built-linux, Kernel Hardening, linux-arm-kernel, kernel list
On Fri, Oct 18, 2019 at 10:13 AM Jann Horn <ja...@google.com> wrote:
> These things should probably be __always_inline or something like
> that? If the compiler decides not to inline them (e.g. when called
> from scs_thread_switch()), stuff will blow up, right?

Correct. I'll change these to __always_inline in v2. I think there
might be other places in the kernel where not inlining a static inline
function would break things, but there's no need to add more.
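
For reference, a rough sketch of that change (same bodies as quoted
above, only with the inlining forced):

static __always_inline void scs_save(struct task_struct *tsk)
{
	void *s;

	asm volatile("mov %0, x18" : "=r" (s));
	task_set_scs(tsk, s);
}

static __always_inline void scs_load(struct task_struct *tsk)
{
	asm volatile("mov x18, %0" : : "r" (task_scs(tsk)));
	task_set_scs(tsk, NULL);
}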

> This is different from the intended protection level according to
> <https://clang.llvm.org/docs/ShadowCallStack.html#security>, which
> talks about "a runtime that avoids exposing the address of the shadow
> call stack to attackers that can read arbitrary memory". Of course,
> that's extremely hard to implement in the context of the kernel, where
> you can see all the memory management data structures and all physical
> memory.

Yes, the security guarantees in the kernel are different as hiding
shadow stack pointers is more challenging.

> You might want to write something in the cover letter about what the
> benefits of this mechanism compared to STACKPROTECTOR are in the
> context of the kernel, including a specific description of which types
> of attacker capabilities this is supposed to defend against.

Sure, I'll add something about that in v2. Thanks.

Sami

Mark Rutland

unread,
Oct 18, 2019, 1:23:18 PM10/18/19
to Jann Horn, Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Nick Desaulniers, clang-built-linux, Kernel Hardening, linux-ar...@lists.infradead.org, kernel list
On Fri, Oct 18, 2019 at 07:12:52PM +0200, Jann Horn wrote:
> On Fri, Oct 18, 2019 at 6:16 PM Sami Tolvanen <samito...@google.com> wrote:
> > This change implements shadow stack switching, initial SCS set-up,
> > and interrupt shadow stacks for arm64.
> [...]
> > +static inline void scs_save(struct task_struct *tsk)
> > +{
> > + void *s;
> > +
> > + asm volatile("mov %0, x18" : "=r" (s));
> > + task_set_scs(tsk, s);
> > +}
> > +
> > +static inline void scs_load(struct task_struct *tsk)
> > +{
> > + asm volatile("mov x18, %0" : : "r" (task_scs(tsk)));
> > + task_set_scs(tsk, NULL);
> > +}
>
> These things should probably be __always_inline or something like
> that? If the compiler decides not to inline them (e.g. when called
> from scs_thread_switch()), stuff will blow up, right?

I think scs_save() would better live in assembly in cpu_switch_to(),
where we switch the stack and current. It shouldn't matter whether
scs_load() is inlined or not, since the x18 value _should_ be invariant
from the PoV of the task.

We just need to add a TSK_TI_SCS to asm-offsets.c, and then insert a
single LDR at the end:

mov sp, x9
msr sp_el0, x1
#ifdef CONFIG_SHADOW_CALL_STACK
ldr x18, [x1, TSK_TI_SCS]
#endif
ret

Thanks,
Mark.

Nick Desaulniers

unread,
Oct 18, 2019, 1:32:33 PM10/18/19
to Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, clang-built-linux, kernel-h...@lists.openwall.com, Linux ARM, LKML
On Fri, Oct 18, 2019 at 9:11 AM Sami Tolvanen <samito...@google.com> wrote:
>
> From: Ard Biesheuvel <ard.bie...@linaro.org>
>
> Before we can start using register x18 for a special purpose (as permitted
> by the AAPCS64 ABI), we need to tell the compiler that it is off limits
> for general allocation. So tag it as 'fixed',

yep, but...

> and remove the mention from
> the LL/SC compiler flag override.

was that cut/dropped from this patch?

>
> Link: https://patchwork.kernel.org/patch/9836881/

^ Looks like it. Maybe it doesn't matter, but if sending a V2, maybe
the commit message should be updated?

> Signed-off-by: Ard Biesheuvel <ard.bie...@linaro.org>
> Signed-off-by: Sami Tolvanen <samito...@google.com>

If sending a V2 with the above cleaned up, you may also include:
Reviewed-by: Nick Desaulniers <ndesau...@google.com>

I like how this does not conditionally reserve it based on the CONFIG
for SCS. Hopefully later patches don't wrap it, but I haven't looked
through all of them yet.

> ---
> arch/arm64/Makefile | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
> index 2c0238ce0551..1c7b276bc7c5 100644
> --- a/arch/arm64/Makefile
> +++ b/arch/arm64/Makefile
> @@ -55,7 +55,7 @@ endif
>
> KBUILD_CFLAGS += -mgeneral-regs-only $(lseinstr) $(brokengasinst) \
> $(compat_vdso) $(cc_has_k_constraint)
> -KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> +KBUILD_CFLAGS += -fno-asynchronous-unwind-tables -ffixed-x18
> KBUILD_CFLAGS += $(call cc-disable-warning, psabi)
> KBUILD_AFLAGS += $(lseinstr) $(brokengasinst) $(compat_vdso)
>

Sami Tolvanen

unread,
Oct 18, 2019, 1:36:02 PM10/18/19
to Mark Rutland, Jann Horn, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Nick Desaulniers, clang-built-linux, Kernel Hardening, linux-arm-kernel, kernel list
On Fri, Oct 18, 2019 at 10:23 AM Mark Rutland <mark.r...@arm.com> wrote:
> I think scs_save() would better live in assembly in cpu_switch_to(),
> where we switch the stack and current. It shouldn't matter whether
> scs_load() is inlined or not, since the x18 value _should_ be invariant
> from the PoV of the task.

Note that there's also a call to scs_save in cpu_die, because the
current task's shadow stack pointer is only stored in x18 and we don't
want to lose it.

> We just need to add a TSK_TI_SCS to asm-offsets.c, and then insert a
> single LDR at the end:
>
> mov sp, x9
> msr sp_el0, x1
> #ifdef CONFIG_SHADOW_CALL_STACK
> ldr x18, [x1, TSK_TI_SCS]
> #endif
> ret

TSK_TI_SCS is already defined, so yes, we could move this to
cpu_switch_to. I would still prefer to have the overflow check that's
in scs_thread_switch though.

Sami

Jann Horn

unread,
Oct 18, 2019, 1:42:34 PM10/18/19
to Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-built-linux, Kernel Hardening, linux-ar...@lists.infradead.org, kernel list
On Fri, Oct 18, 2019 at 6:14 PM Sami Tolvanen <samito...@google.com> wrote:
> This change adds generic support for Clang's Shadow Call Stack, which
> uses a shadow stack to protect return addresses from being overwritten
> by an attacker. Details are available here:
>
> https://clang.llvm.org/docs/ShadowCallStack.html

(As I mentioned in the other thread, the security documentation there
doesn't fit the kernel usecase.)

[...]
> +config SHADOW_CALL_STACK_VMAP
> + def_bool n
> + depends on SHADOW_CALL_STACK
> + help
> + Use virtually mapped shadow call stacks. Selecting this option
> + provides better stack exhaustion protection, but increases per-thread
> + memory consumption as a full page is allocated for each shadow stack.

Without CONFIG_SHADOW_CALL_STACK_VMAP, after 128 small stack frames,
you overflow into random physmap memory even if the main stack is
vmapped... I guess you can't get around that without making the SCS
instrumentation more verbose. :/

Could you maybe change things so that independent of whether you have
vmapped SCS or slab-allocated SCS, the scs_corrupted() check looks at
offset 1024-8 (where it currently is for the slab-allocated case)?
That way, code won't suddenly stop working when you disable
CONFIG_SHADOW_CALL_STACK_VMAP; and especially if you use
CONFIG_SHADOW_CALL_STACK_VMAP for development and testing but disable
it in production, that would be annoying.
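
In other words, something like this rough sketch, where the check always
looks at the last slot of the shadow stack region (SCS_END_MAGIC here is
just a placeholder for whatever magic value the series uses):

static inline unsigned long *__scs_magic(struct task_struct *tsk)
{
	return (unsigned long *)((char *)task_scs(tsk) + SCS_SIZE) - 1;
}

static inline bool scs_corrupted(struct task_struct *tsk)
{
	/* same end-of-stack slot whether or not the stack is vmapped */
	return READ_ONCE(*__scs_magic(tsk)) != SCS_END_MAGIC;
}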

Sami Tolvanen

unread,
Oct 18, 2019, 1:56:17 PM10/18/19
to Jann Horn, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-built-linux, Kernel Hardening, linux-arm-kernel, kernel list
On Fri, Oct 18, 2019 at 10:42 AM Jann Horn <ja...@google.com> wrote:
> (As I mentioned in the other thread, the security documentation there
> doesn't fit the kernel usecase.)

True. I'll add a note about it here too.

> Without CONFIG_SHADOW_CALL_STACK_VMAP, after 128 small stack frames,
> you overflow into random physmap memory even if the main stack is
> vmapped... I guess you can't get around that without making the SCS
> instrumentation more verbose. :/

That's correct. In our testing, 128 stack frames is nearly twice the
maximum amount that's been used (on an arm64 device), and for many use
cases, allocating a full page is simply too costly despite the
advantages.

> Could you maybe change things so that independent of whether you have
> vmapped SCS or slab-allocated SCS, the scs_corrupted() check looks at
> offset 1024-8 (where it currently is for the slab-allocated case)?
> That way, code won't suddenly stop working when you disable
> CONFIG_SHADOW_CALL_STACK_VMAP; and especially if you use
> CONFIG_SHADOW_CALL_STACK_VMAP for development and testing but disable
> it in production, that would be annoying.

Yes, that's a great idea. I'll change this in v2.

Sami

Miguel Ojeda

unread,
Oct 18, 2019, 2:33:05 PM10/18/19
to Sami Tolvanen, Nick Desaulniers, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, clang-built-linux, Kernel Hardening, Linux ARM, LKML
+1, please CC whenever you send it!

Cheers,
Miguel

Sami Tolvanen

unread,
Oct 18, 2019, 3:00:43 PM10/18/19
to Nick Desaulniers, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, clang-built-linux, Kernel Hardening, Linux ARM, LKML
On Fri, Oct 18, 2019 at 10:32 AM 'Nick Desaulniers' via Clang Built
Linux <clang-bu...@googlegroups.com> wrote:
> > and remove the mention from
> > the LL/SC compiler flag override.
>
> was that cut/dropped from this patch?
>
> >
> > Link: https://patchwork.kernel.org/patch/9836881/
>
> ^ Looks like it. Maybe it doesn't matter, but if sending a V2, maybe
> the commit message to be updated?

True. The original patch is from 2017 and the relevant part of
arm64/lib/Makefile no longer exists. I'll update this accordingly.

> I like how this does not conditionally reserve it based on the CONFIG
> for SCS. Hopefully later patches don't wrap it, but I haven't looked
> through all of them yet.

In a later patch x18 is only reserved with SCS. I'm fine with dropping
that patch and reserving it always, but wouldn't mind hearing thoughts
from the maintainers about this first.

Sami

Nick Desaulniers

unread,
Oct 18, 2019, 4:33:40 PM10/18/19
to Miguel Ojeda, Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, clang-built-linux, Kernel Hardening, Linux ARM, LKML
Sami pointed out to me off thread that __has_attribute would only
check `no_sanitize`, not `shadow-call-stack`. So maybe best to keep
the definition here (include/linux/compiler-clang.h), but wrapped in a
`__has_feature` check so that Clang 6.0 doesn't start complaining.
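
Something along these lines (a sketch; the feature name is the one from
the Clang documentation, so worth double-checking against older
releases):

/* include/linux/compiler-clang.h */
#if __has_feature(shadow_call_stack)
# define __noscs	__attribute__((__no_sanitize__("shadow-call-stack")))
#endif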
--
Thanks,
~Nick Desaulniers

Nick Desaulniers

unread,
Oct 18, 2019, 5:23:24 PM10/18/19
to Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, clang-built-linux, Kernel Hardening, Linux ARM, LKML
On Fri, Oct 18, 2019 at 9:11 AM 'Sami Tolvanen' via Clang Built Linux
<clang-bu...@googlegroups.com> wrote:
>
> Only reserve x18 with CONFIG_SHADOW_CALL_STACK. Note that all external
> kernel modules must also have x18 reserved if the kernel uses SCS.

Ah, ok. The tradeoff for maintainers to consider, either:
1. one less GPR for ALL kernel code or
2. remember not to use x18 in inline asm lest you potentially break SCS

This patch is 2 (the earlier patch was 1). Maybe we don't write
enough inline asm that this will be hard to remember, and we do have
CI in Android to watch for this (on mainline, not sure about -next).

Either way,
Acked-by: Nick Desaulniers <ndesau...@google.com>

>
> Signed-off-by: Sami Tolvanen <samito...@google.com>
> ---
> arch/arm64/Makefile | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
> index 1c7b276bc7c5..ef76101201b2 100644
> --- a/arch/arm64/Makefile
> +++ b/arch/arm64/Makefile
> @@ -55,7 +55,7 @@ endif
>
> KBUILD_CFLAGS += -mgeneral-regs-only $(lseinstr) $(brokengasinst) \
> $(compat_vdso) $(cc_has_k_constraint)
> -KBUILD_CFLAGS += -fno-asynchronous-unwind-tables -ffixed-x18
> +KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> KBUILD_CFLAGS += $(call cc-disable-warning, psabi)
> KBUILD_AFLAGS += $(lseinstr) $(brokengasinst) $(compat_vdso)
>
> @@ -72,6 +72,10 @@ stack_protector_prepare: prepare0
> include/generated/asm-offsets.h))
> endif
>
> +ifeq ($(CONFIG_SHADOW_CALL_STACK), y)
> +KBUILD_CFLAGS += -ffixed-x18
> +endif
> +
> ifeq ($(CONFIG_CPU_BIG_ENDIAN), y)
> KBUILD_CPPFLAGS += -mbig-endian
> CHECKFLAGS += -D__AARCH64EB__
> --
> 2.23.0.866.gb869b98d4c-goog
>



--
Thanks,
~Nick Desaulniers

Miguel Ojeda

unread,
Oct 18, 2019, 8:21:56 PM10/18/19
to Nick Desaulniers, Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, clang-built-linux, Kernel Hardening, Linux ARM, LKML
On Fri, Oct 18, 2019 at 10:33 PM Nick Desaulniers
<ndesau...@google.com> wrote:
>
> Sami pointed out to me off thread that __has_attribute would only
> check `no_sanitize`, not `shadow-call-stack`. So maybe best to keep
> the definition here (include/linux/compiler-clang.h), but wrapped in a
> `__has_feature` check so that Clang 6.0 doesn't start complaining.

Ah, good point -- agreed!

Cheers,
Miguel

Ard Biesheuvel

unread,
Oct 21, 2019, 2:12:38 AM10/21/19
to Sami Tolvanen, Nick Desaulniers, Will Deacon, Catalin Marinas, Steven Rostedt, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, clang-built-linux, Kernel Hardening, Linux ARM, LKML
Why would you reserve x18 if SCS is disabled? Given that this is a
choice that is made at code generation time, there is no justification
for always reserving it, since it will never be used for anything. (Of
course, this applies to generated code only - .S files should simply
be updated to avoid x18 altogether)

Also, please combine this patch with the one that reserves it
conditionally, no point in having both in the same series.

Ard Biesheuvel

unread,
Oct 21, 2019, 2:15:30 AM10/21/19
to Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-built-linux, Kernel Hardening, linux-arm-kernel, Linux Kernel Mailing List
On Fri, 18 Oct 2019 at 18:11, Sami Tolvanen <samito...@google.com> wrote:
>
> With CONFIG_FUNCTION_GRAPH_TRACER, function return addresses are
> modified in ftrace_graph_caller and prepare_ftrace_return to redirect
> control flow to ftrace_return_to_handler. This is incompatible with
> return address protection.
>
> Signed-off-by: Sami Tolvanen <samito...@google.com>

How difficult would it be to update the return address on the shadow
call stack along with the normal one? Not having to disable
infrastructure that is widely used by the distros would make this a
lot more palatable in the general case (even if it is Clang only at
the moment)


> ---
> kernel/trace/Kconfig | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index e08527f50d2a..b7e5e3bfa0f4 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -161,6 +161,7 @@ config FUNCTION_GRAPH_TRACER
> depends on HAVE_FUNCTION_GRAPH_TRACER
> depends on FUNCTION_TRACER
> depends on !X86_32 || !CC_OPTIMIZE_FOR_SIZE
> + depends on ROP_PROTECTION_NONE
> default y
> help
> Enable the kernel to trace a function at both its return
> --
> 2.23.0.866.gb869b98d4c-goog
>

Ard Biesheuvel

unread,
Oct 21, 2019, 2:19:34 AM10/21/19
to Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-built-linux, Kernel Hardening, linux-arm-kernel, Linux Kernel Mailing List
On Fri, 18 Oct 2019 at 18:10, Sami Tolvanen <samito...@google.com> wrote:
>
> From: Ard Biesheuvel <ard.bie...@linaro.org>
>
> In preparation of using x18 as a task struct pointer register when
> running in the kernel, stop treating it as caller save in the KVM
> guest entry/exit code. Currently, the code assumes there is no need
> to preserve it for the host, given that it would have been assumed
> clobbered anyway by the function call to __guest_enter(). Instead,
> preserve its value and restore it upon return.
>
> Link: https://patchwork.kernel.org/patch/9836891/
> Signed-off-by: Ard Biesheuvel <ard.bie...@linaro.org>
> Signed-off-by: Sami Tolvanen <samito...@google.com>

You might want to update the commit log to drop the reference to the
task struct pointer.

> ---
> arch/arm64/kvm/hyp/entry.S | 12 +++++-------
> 1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> index e5cc8d66bf53..20bd9a20ea27 100644
> --- a/arch/arm64/kvm/hyp/entry.S
> +++ b/arch/arm64/kvm/hyp/entry.S
> @@ -23,6 +23,7 @@
> .pushsection .hyp.text, "ax"
>
> .macro save_callee_saved_regs ctxt
> + str x18, [\ctxt, #CPU_XREG_OFFSET(18)]
> stp x19, x20, [\ctxt, #CPU_XREG_OFFSET(19)]
> stp x21, x22, [\ctxt, #CPU_XREG_OFFSET(21)]
> stp x23, x24, [\ctxt, #CPU_XREG_OFFSET(23)]
> @@ -38,6 +39,7 @@
> ldp x25, x26, [\ctxt, #CPU_XREG_OFFSET(25)]
> ldp x27, x28, [\ctxt, #CPU_XREG_OFFSET(27)]
> ldp x29, lr, [\ctxt, #CPU_XREG_OFFSET(29)]
> + ldr x18, [\ctxt, #CPU_XREG_OFFSET(18)]
> .endm
>
> /*
> @@ -87,12 +89,9 @@ alternative_else_nop_endif
> ldp x14, x15, [x18, #CPU_XREG_OFFSET(14)]
> ldp x16, x17, [x18, #CPU_XREG_OFFSET(16)]
>
> - // Restore guest regs x19-x29, lr
> + // Restore guest regs x18-x29, lr
> restore_callee_saved_regs x18
>
> - // Restore guest reg x18
> - ldr x18, [x18, #CPU_XREG_OFFSET(18)]
> -
> // Do not touch any register after this!
> eret
> sb
> @@ -114,7 +113,7 @@ ENTRY(__guest_exit)
> // Retrieve the guest regs x0-x1 from the stack
> ldp x2, x3, [sp], #16 // x0, x1
>
> - // Store the guest regs x0-x1 and x4-x18
> + // Store the guest regs x0-x1 and x4-x17
> stp x2, x3, [x1, #CPU_XREG_OFFSET(0)]
> stp x4, x5, [x1, #CPU_XREG_OFFSET(4)]
> stp x6, x7, [x1, #CPU_XREG_OFFSET(6)]
> @@ -123,9 +122,8 @@ ENTRY(__guest_exit)
> stp x12, x13, [x1, #CPU_XREG_OFFSET(12)]
> stp x14, x15, [x1, #CPU_XREG_OFFSET(14)]
> stp x16, x17, [x1, #CPU_XREG_OFFSET(16)]
> - str x18, [x1, #CPU_XREG_OFFSET(18)]
>
> - // Store the guest regs x19-x29, lr
> + // Store the guest regs x18-x29, lr
> save_callee_saved_regs x1
>
> get_host_ctxt x2, x3
> --
> 2.23.0.866.gb869b98d4c-goog
>

Ard Biesheuvel

unread,
Oct 21, 2019, 2:20:55 AM10/21/19
to Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-built-linux, Kernel Hardening, linux-arm-kernel, Linux Kernel Mailing List
On Fri, 18 Oct 2019 at 18:11, Sami Tolvanen <samito...@google.com> wrote:
>
> If we detect a corrupted x18 and SCS is enabled, restore the register
> before jumping back to instrumented code.
>

You'll have to elaborate a bit here and explain that this is
sufficient, given that we run EFI runtime services with interrupts
enabled.

> Signed-off-by: Sami Tolvanen <samito...@google.com>
> ---
> arch/arm64/kernel/efi-rt-wrapper.S | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kernel/efi-rt-wrapper.S b/arch/arm64/kernel/efi-rt-wrapper.S
> index 3fc71106cb2b..945744f16086 100644
> --- a/arch/arm64/kernel/efi-rt-wrapper.S
> +++ b/arch/arm64/kernel/efi-rt-wrapper.S
> @@ -34,5 +34,10 @@ ENTRY(__efi_rt_asm_wrapper)
> ldp x29, x30, [sp], #32
> b.ne 0f
> ret
> -0: b efi_handle_corrupted_x18 // tail call
> +0:
> +#ifdef CONFIG_SHADOW_CALL_STACK
> + /* Restore x18 before returning to instrumented code. */
> + mov x18, x2
> +#endif
> + b efi_handle_corrupted_x18 // tail call
> ENDPROC(__efi_rt_asm_wrapper)
> --
> 2.23.0.866.gb869b98d4c-goog
>

Ard Biesheuvel

unread,
Oct 21, 2019, 2:22:00 AM10/21/19
to Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-built-linux, Kernel Hardening, linux-arm-kernel, Linux Kernel Mailing List
On Fri, 18 Oct 2019 at 18:11, Sami Tolvanen <samito...@google.com> wrote:
>
> This allows CONFIG_KRETPROBES to be disabled without disabling
> kprobes entirely.
>
> Signed-off-by: Sami Tolvanen <samito...@google.com>

Can we make kretprobes work with the shadow call stack instead?

> ---
> arch/arm64/kernel/probes/kprobes.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
> index c4452827419b..98230ae979ca 100644
> --- a/arch/arm64/kernel/probes/kprobes.c
> +++ b/arch/arm64/kernel/probes/kprobes.c
> @@ -551,6 +551,7 @@ void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs)
> return (void *)orig_ret_address;
> }
>
> +#ifdef CONFIG_KRETPROBES
> void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
> struct pt_regs *regs)
> {
> @@ -564,6 +565,7 @@ int __kprobes arch_trampoline_kprobe(struct kprobe *p)
> {
> return 0;
> }
> +#endif
>
> int __init arch_init_kprobes(void)
> {
> --
> 2.23.0.866.gb869b98d4c-goog
>

Masami Hiramatsu

unread,
Oct 21, 2019, 5:13:44 AM10/21/19
to Steven Rostedt, Sami Tolvanen, Will Deacon, Catalin Marinas, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Masami Hiramatsu
On Fri, 18 Oct 2019 13:02:57 -0400
Steven Rostedt <ros...@goodmis.org> wrote:

>
> Added Masami who's the maintainer of kprobes.
>
> -- Steve
>
>
> On Fri, 18 Oct 2019 09:10:25 -0700
> Sami Tolvanen <samito...@google.com> wrote:
>
> > kprobe_on_func_entry and arch_kprobe_on_func_entry need to be available
> > even if CONFIG_KRETPROBES is not selected.

Good catch! Since all arches support kretprobes nowadays, I had missed
testing it.

Acked-by: Masami Hiramatsu <mhir...@kernel.org>

Thank you,
--
Masami Hiramatsu <mhir...@kernel.org>

Masami Hiramatsu

unread,
Oct 21, 2019, 5:15:41 AM10/21/19
to Steven Rostedt, Sami Tolvanen, Will Deacon, Catalin Marinas, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Masami Hiramatsu
On Fri, 18 Oct 2019 13:04:29 -0400
Steven Rostedt <ros...@goodmis.org> wrote:

>
> [ Added Masami ]
>
> On Fri, 18 Oct 2019 09:10:26 -0700
> Sami Tolvanen <samito...@google.com> wrote:
>
> > With CONFIG_KRETPROBES, function return addresses are modified to
> > redirect control flow to kretprobe_trampoline. This is incompatible with
> > return address protection.
> >
> > Signed-off-by: Sami Tolvanen <samito...@google.com>
> > ---
> > arch/Kconfig | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index a222adda8130..4646e3b34925 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -171,7 +171,7 @@ config ARCH_USE_BUILTIN_BSWAP
> >
> > config KRETPROBES
> > def_bool y
> > - depends on KPROBES && HAVE_KRETPROBES
> > + depends on KPROBES && HAVE_KRETPROBES && ROP_PROTECTION_NONE
>
> Again, this belongs in the arch code.

+1, below patch (from Steve) looks good to me.

Thank you,

>
> -- Steve
>
> >
> > config USER_RETURN_NOTIFIER
> > bool
>
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 41a9b4257b72..65557d7e6b5e 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -166,7 +166,7 @@ config ARM64
> select HAVE_STACKPROTECTOR
> select HAVE_SYSCALL_TRACEPOINTS
> select HAVE_KPROBES
> - select HAVE_KRETPROBES
> + select HAVE_KRETPROBES if ROP_PROTECTION_NONE
> select HAVE_GENERIC_VDSO
> select IOMMU_DMA if IOMMU_SUPPORT
> select IRQ_DOMAIN


--
Masami Hiramatsu <mhir...@kernel.org>

Masami Hiramatsu

unread,
Oct 21, 2019, 5:28:55 AM10/21/19
to Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org
Hi,

On Fri, 18 Oct 2019 09:10:15 -0700
Sami Tolvanen <samito...@google.com> wrote:

> This patch series adds support for Clang's Shadow Call Stack (SCS)
> mitigation, which uses a separately allocated shadow stack to protect
> against return address overwrites. More information can be found here:
>
> https://clang.llvm.org/docs/ShadowCallStack.html

Looks interesting, and like what function-graph tracing does...

>
> SCS is currently supported only on arm64, where the compiler requires
> the x18 register to be reserved for holding the current task's shadow
> stack pointer. Because of this, the series includes four patches from
> Ard to remove x18 usage from assembly code and to reserve the register
> from general allocation.
>
> With -fsanitize=shadow-call-stack, the compiler injects instructions
> to all non-leaf C functions to store the return address to the shadow
> stack and unconditionally load it again before returning. As a result,
> SCS is incompatible with features that rely on modifying function
> return addresses to alter control flow, such as function graph tracing
> and kretprobes. A copy of the return address is still kept in the
> kernel stack for compatibility with stack unwinding, for example.

Is it possible that kretprobes and function graph tracing modify the
SCS directly instead of changing the real stack in that case?

Thank you,

--
Masami Hiramatsu <mhir...@kernel.org>

Kees Cook

unread,
Oct 21, 2019, 12:07:00 PM10/21/19
to Ard Biesheuvel, Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Dave Martin, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-built-linux, Kernel Hardening, linux-arm-kernel, Linux Kernel Mailing List
On Mon, Oct 21, 2019 at 08:21:48AM +0200, Ard Biesheuvel wrote:
> On Fri, 18 Oct 2019 at 18:11, Sami Tolvanen <samito...@google.com> wrote:
> >
> > This allows CONFIG_KRETPROBES to be disabled without disabling
> > kprobes entirely.
> >
> > Signed-off-by: Sami Tolvanen <samito...@google.com>
>
> Can we make kretprobes work with the shadow call stack instead?

I've been viewing that as "next steps". This series is the first step:
actually gaining the feature and clearly indicating where future
improvements can live.

--
Kees Cook

Mark Rutland

unread,
Oct 21, 2019, 12:49:12 PM10/21/19
to Sami Tolvanen, Jann Horn, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Nick Desaulniers, clang-built-linux, Kernel Hardening, linux-arm-kernel, kernel list
The only bit that I think needs to be in cpu_switch_to() is the install
of the next task's shadow addr into x18.

Having a separate scs_check_overflow() sounds fine to me, as that only
has to read from the shadow stack. IIUC that's also for the prev task,
not next, in the current patches.
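
Roughly (a sketch, reusing the scs_corrupted() check this series already
has):

/* called for the outgoing task; only reads its shadow stack */
static inline void scs_check_overflow(struct task_struct *prev)
{
	if (unlikely(scs_corrupted(prev)))
		panic("corrupted shadow stack detected inside scheduler\n");
}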

Thanks,
Mark.

Mark Rutland

unread,
Oct 21, 2019, 12:56:58 PM10/21/19
to Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org
On Fri, Oct 18, 2019 at 09:10:28AM -0700, Sami Tolvanen wrote:
> Don't lose the current task's shadow stack when the CPU is suspended.
>
> Signed-off-by: Sami Tolvanen <samito...@google.com>
> ---
> arch/arm64/mm/proc.S | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index fdabf40a83c8..9a8bd4bc8549 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -73,6 +73,9 @@ alternative_endif
> stp x8, x9, [x0, #48]
> stp x10, x11, [x0, #64]
> stp x12, x13, [x0, #80]
> +#ifdef CONFIG_SHADOW_CALL_STACK
> + stp x18, xzr, [x0, #96]
> +#endif

This should have a corresponding change to cpu_suspend_ctx in
<asm/suspend.h>. Otherwise we're corrupting a portion of the stack.

Mark.

> ret
> ENDPROC(cpu_do_suspend)
>
> @@ -89,6 +92,9 @@ ENTRY(cpu_do_resume)
> ldp x9, x10, [x0, #48]
> ldp x11, x12, [x0, #64]
> ldp x13, x14, [x0, #80]
> +#ifdef CONFIG_SHADOW_CALL_STACK
> + ldp x18, x19, [x0, #96]
> +#endif
> msr tpidr_el0, x2
> msr tpidrro_el0, x3
> msr contextidr_el1, x4
> --
> 2.23.0.866.gb869b98d4c-goog
>

Sami Tolvanen

unread,
Oct 21, 2019, 4:43:24 PM10/21/19
to Ard Biesheuvel, Nick Desaulniers, Will Deacon, Catalin Marinas, Steven Rostedt, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, clang-built-linux, Kernel Hardening, Linux ARM, LKML
On Sun, Oct 20, 2019 at 11:12 PM Ard Biesheuvel
<ard.bie...@linaro.org> wrote:
> Also, please combine this patch with the one that reserves it
> conditionally, no point in having both in the same series.

Sure, I'll just drop this patch from v2 then and only reserve it with SCS.

Sami

Sami Tolvanen

unread,
Oct 21, 2019, 6:40:03 PM10/21/19
to Ard Biesheuvel, Will Deacon, Catalin Marinas, Steven Rostedt, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-built-linux, Kernel Hardening, linux-arm-kernel, Linux Kernel Mailing List
On Sun, Oct 20, 2019 at 11:20 PM Ard Biesheuvel
<ard.bie...@linaro.org> wrote:
> You'll have to elaborate a bit here and explain that this is
> sufficient, given that we run EFI runtime services with interrupts
> enabled.

I can add a note about this in v2. This is called with preemption
disabled and we have a separate interrupt shadow stack, so as far as I
can tell, this should be sufficient. Did you have concerns about this?

Sami

Sami Tolvanen

unread,
Oct 21, 2019, 6:43:27 PM10/21/19
to Mark Rutland, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Nick Desaulniers, clang-built-linux, Kernel Hardening, linux-arm-kernel, LKML
On Mon, Oct 21, 2019 at 9:56 AM Mark Rutland <mark.r...@arm.com> wrote:
> This should have a corresponding change to cpu_suspend_ctx in
> <asm/suspend.h>. Otherwise we're corrupting a portion of the stack.

Ugh, correct. I'll fix this in the next version. Thanks.

Sami

Ard Biesheuvel

unread,
Oct 22, 2019, 1:54:13 AM10/22/19
to Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, clang-built-linux, Kernel Hardening, linux-arm-kernel, Linux Kernel Mailing List
No concerns, but we should put the above clarification in the commit log.

Mark Rutland

unread,
Oct 22, 2019, 11:47:22 AM10/22/19
to Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Nick Desaulniers, clang-built-linux, Kernel Hardening, linux-arm-kernel, LKML
It's probably worth extending the comment above cpu_do_suspend to say:

| This must be kept in sync with struct cpu_suspend_ctx in
| <asm/suspend.h>

... to match what we have above struct cpu_suspend_ctx, and make this
more obvious in future.

Thanks,
Mark.

Mark Rutland

unread,
Oct 22, 2019, 12:00:20 PM10/22/19
to Nick Desaulniers, Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, clang-built-linux, Kernel Hardening, Linux ARM, LKML
On Fri, Oct 18, 2019 at 02:23:10PM -0700, Nick Desaulniers wrote:
> On Fri, Oct 18, 2019 at 9:11 AM 'Sami Tolvanen' via Clang Built Linux
> <clang-bu...@googlegroups.com> wrote:
> >
> > Only reserve x18 with CONFIG_SHADOW_CALL_STACK. Note that all external
> > kernel modules must also have x18 reserved if the kernel uses SCS.
>
> Ah, ok. The tradeoff for maintainers to consider, either:
> 1. one less GPR for ALL kernel code or
> 2. remember not to use x18 in inline as lest you potentially break SCS

This option only affects compiler-generated code, so I don't think that
matters.

I think it's fine to say that we should always avoid the use of x18 in
hand-written assembly (with manual register allocation), while also
allowing the compiler to use x18 if we're not using SCS.

This can be folded into the earlier patch which always reserved x18.

> This patch is 2 (the earlier patch was 1). Maybe we don't write
> enough inline asm that this will be hard to remember, and we do have
> CI in Android to watch for this (on mainline, not sure about -next).

I think that we can trust the set of people who regularly review arm64
assembly to remember this. We could also document this somewhere -- we
might need to document other constraints or conventions for assembly
in preparation for livepatching and so on.

If we wanted to, we could periodically grep for x18 to find any illicit
usage.

Thanks,
Mark.

Kees Cook

unread,
Oct 22, 2019, 12:27:44 PM10/22/19
to Mark Rutland, Nick Desaulniers, Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Laura Abbott, clang-built-linux, Kernel Hardening, Linux ARM, LKML
On Tue, Oct 22, 2019 at 05:00:10PM +0100, Mark Rutland wrote:
> If we wanted to, we could periodically grep for x18 to find any illicit
> usage.

Now we need objtool for arm64! :) (It seems CONFIG_HAVE_STACK_VALIDATION
is rather a narrow description for what objtool does now...)

--
Kees Cook

Mark Rutland

unread,
Oct 22, 2019, 12:28:35 PM10/22/19
to Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org
On Fri, Oct 18, 2019 at 09:10:21AM -0700, Sami Tolvanen wrote:
> This change adds generic support for Clang's Shadow Call Stack, which
> uses a shadow stack to protect return addresses from being overwritten
> by an attacker. Details are available here:
>
> https://clang.llvm.org/docs/ShadowCallStack.html
>
> Signed-off-by: Sami Tolvanen <samito...@google.com>
> ---
> Makefile | 6 ++
> arch/Kconfig | 39 ++++++++
> include/linux/compiler-clang.h | 2 +
> include/linux/compiler_types.h | 4 +
> include/linux/scs.h | 88 ++++++++++++++++++
> init/init_task.c | 6 ++
> init/main.c | 3 +
> kernel/Makefile | 1 +
> kernel/fork.c | 9 ++
> kernel/sched/core.c | 2 +
> kernel/sched/sched.h | 1 +
> kernel/scs.c | 162 +++++++++++++++++++++++++++++++++
> 12 files changed, 323 insertions(+)
> create mode 100644 include/linux/scs.h
> create mode 100644 kernel/scs.c
>
> diff --git a/Makefile b/Makefile
> index ffd7a912fc46..e401fa500f62 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -846,6 +846,12 @@ ifdef CONFIG_LIVEPATCH
> KBUILD_CFLAGS += $(call cc-option, -flive-patching=inline-clone)
> endif
>
> +ifdef CONFIG_SHADOW_CALL_STACK
> +KBUILD_CFLAGS += -fsanitize=shadow-call-stack
> +DISABLE_SCS := -fno-sanitize=shadow-call-stack
> +export DISABLE_SCS
> +endif

I think it would be preferable to follow the example of CC_FLAGS_FTRACE
so that this can be filtered out, e.g.

ifdef CONFIG_SHADOW_CALL_STACK
CFLAGS_SCS := -fsanitize=shadow-call-stack
KBUILD_CFLAGS += $(CFLAGS_SCS)
export CC_FLAGS_SCS
endif

... with removal being:

CFLAGS_REMOVE := $(CC_FLAGS_SCS)

... or:

CFLAGS_REMOVE_obj.o := $(CC_FLAGS_SCS)

That way you only need to define the flags once, so the enable and
disable falgs remain in sync by construction.

[...]

> +config ARCH_SUPPORTS_SHADOW_CALL_STACK
> + bool
> + help
> + An architecture should select this if it supports Clang's Shadow
> + Call Stack, has asm/scs.h, and implements runtime support for shadow
> + stack switching.
> +
> +config SHADOW_CALL_STACK_VMAP
> + def_bool n

A bool is default n, so you can just say bool here.

> + depends on SHADOW_CALL_STACK
> + help
> + Use virtually mapped shadow call stacks. Selecting this option
> + provides better stack exhaustion protection, but increases per-thread
> + memory consumption as a full page is allocated for each shadow stack.
> +
> +choice
> + prompt "Return-oriented programming (ROP) protection"
> + default ROP_PROTECTION_NONE
> + help
> + This option controls kernel protections against return-oriented
> + programming (ROP) attacks.

Are we expecting more options here in future?

> +config ROP_PROTECTION_NONE
> + bool "None"

IIUC this is for the benefit of the kretprobes Kconfig.

I think it would be better to make that depend on !SHADOW_CALL_STACK, as
it's plausible that we can add a different ROP protection mechanism that
is compatible with kretprobes.

> +config SHADOW_CALL_STACK
> + bool "Clang Shadow Call Stack"
> + depends on ARCH_SUPPORTS_SHADOW_CALL_STACK
> + depends on CC_IS_CLANG && CLANG_VERSION >= 70000

Is there a reason for an explicit version check rather than a
CC_HAS_<feature> check? e.g. was this available but broken in prior
versions of clang?

[...]

> +#define SCS_GFP (GFP_KERNEL | __GFP_ZERO)

Normally GFP_ is a prefix. For consistency, GFP_SCS would be preferable.

> +extern unsigned long init_shadow_call_stack[];

Do we need this exposed here? IIUC this is only assigned by assembly in
arch code.

[...]

> +void scs_set_init_magic(struct task_struct *tsk)
> +{
> + scs_save(tsk);
> + scs_set_magic(tsk);
> + scs_load(tsk);
> +}

Can we initialize this at compile time instead?

Thanks,
Mark.

Kees Cook

unread,
Oct 22, 2019, 12:30:56 PM10/22/19
to Mark Rutland, Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Laura Abbott, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org
^^^ was this meant to be CC_FLAGS_SCS here

> KBUILD_CFLAGS += $(CFLAGS_SCS)
^^^ and here?

> export CC_FLAGS_SCS
> endif
>
> ... with removal being:
>
> CFLAGS_REMOVE := $(CC_FLAGS_SCS)
>
> ... or:
>
> CFLAGS_REMOVE_obj.o := $(CC_FLAGS_SCS)
>
> That way you only need to define the flags once, so the enable and
> disable falgs remain in sync by construction.
>
> [...]

--
Kees Cook

Mark Rutland

unread,
Oct 22, 2019, 12:49:49 PM10/22/19
to Kees Cook, Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Laura Abbott, Nick Desaulniers, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org
On Tue, Oct 22, 2019 at 09:30:53AM -0700, Kees Cook wrote:
> On Tue, Oct 22, 2019 at 05:28:27PM +0100, Mark Rutland wrote:
> > On Fri, Oct 18, 2019 at 09:10:21AM -0700, Sami Tolvanen wrote:
> > > +ifdef CONFIG_SHADOW_CALL_STACK
> > > +KBUILD_CFLAGS += -fsanitize=shadow-call-stack
> > > +DISABLE_SCS := -fno-sanitize=shadow-call-stack
> > > +export DISABLE_SCS
> > > +endif
> >
> > I think it would be preferable to follow the example of CC_FLAGS_FTRACE
> > so that this can be filtered out, e.g.
> >
> > ifdef CONFIG_SHADOW_CALL_STACK
> > CFLAGS_SCS := -fsanitize=shadow-call-stack
> ^^^ was this meant to be CC_FLAGS_SCS here
>
> > KBUILD_CFLAGS += $(CFLAGS_SCS)
> ^^^ and here?

Whoops; yes in both cases...

> > export CC_FLAGS_SCS
> > endif
> >
> > ... with removal being:
> >
> > CFLAGS_REMOVE := $(CC_FLAGS_SCS)
> >
> > ... or:
> >
> > CFLAGS_REMOVE_obj.o := $(CC_FLAGS_SCS)
> >
> > That way you only need to define the flags once, so the enable and
> > disable falgs remain in sync by construction.
^^^^^ "flags" here, too.

Mark.

Marc Zyngier

unread,
Oct 22, 2019, 1:22:33 PM10/22/19
to Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Mark Rutland, Kees Cook, kernel-h...@lists.openwall.com, Nick Desaulniers, linux-...@vger.kernel.org, clang-bu...@googlegroups.com, Laura Abbott, Dave Martin, linux-ar...@lists.infradead.org
On Fri, 18 Oct 2019 09:10:18 -0700
Sami Tolvanen <samito...@google.com> wrote:

> From: Ard Biesheuvel <ard.bie...@linaro.org>
>
> In preparation of using x18 as a task struct pointer register when
> running in the kernel, stop treating it as caller save in the KVM
> guest entry/exit code. Currently, the code assumes there is no need
> to preserve it for the host, given that it would have been assumed
> clobbered anyway by the function call to __guest_enter(). Instead,
> preserve its value and restore it upon return.
>
> Link: https://patchwork.kernel.org/patch/9836891/
> Signed-off-by: Ard Biesheuvel <ard.bie...@linaro.org>
> Signed-off-by: Sami Tolvanen <samito...@google.com>
> ---
> arch/arm64/kvm/hyp/entry.S | 12 +++++-------
> 1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> index e5cc8d66bf53..20bd9a20ea27 100644
> --- a/arch/arm64/kvm/hyp/entry.S
> +++ b/arch/arm64/kvm/hyp/entry.S
> @@ -23,6 +23,7 @@
> .pushsection .hyp.text, "ax"
>
> .macro save_callee_saved_regs ctxt
> + str x18, [\ctxt, #CPU_XREG_OFFSET(18)]
> stp x19, x20, [\ctxt, #CPU_XREG_OFFSET(19)]
> stp x21, x22, [\ctxt, #CPU_XREG_OFFSET(21)]
> stp x23, x24, [\ctxt, #CPU_XREG_OFFSET(23)]
> @@ -38,6 +39,7 @@
> ldp x25, x26, [\ctxt, #CPU_XREG_OFFSET(25)]
> ldp x27, x28, [\ctxt, #CPU_XREG_OFFSET(27)]
> ldp x29, lr, [\ctxt, #CPU_XREG_OFFSET(29)]
> + ldr x18, [\ctxt, #CPU_XREG_OFFSET(18)]

There is now an assumption that ctxt is x18 (otherwise why would it be
out of order?). Please add a comment to that effect.

> .endm
>
> /*
> @@ -87,12 +89,9 @@ alternative_else_nop_endif
> ldp x14, x15, [x18, #CPU_XREG_OFFSET(14)]
> ldp x16, x17, [x18, #CPU_XREG_OFFSET(16)]
>
> - // Restore guest regs x19-x29, lr
> + // Restore guest regs x18-x29, lr
> restore_callee_saved_regs x18

Or you could elect another register such as x29 as the base, and keep
the above in a reasonable order.

>
> - // Restore guest reg x18
> - ldr x18, [x18, #CPU_XREG_OFFSET(18)]
> -
> // Do not touch any register after this!
> eret
> sb
> @@ -114,7 +113,7 @@ ENTRY(__guest_exit)
> // Retrieve the guest regs x0-x1 from the stack
> ldp x2, x3, [sp], #16 // x0, x1
>
> - // Store the guest regs x0-x1 and x4-x18
> + // Store the guest regs x0-x1 and x4-x17
> stp x2, x3, [x1, #CPU_XREG_OFFSET(0)]
> stp x4, x5, [x1, #CPU_XREG_OFFSET(4)]
> stp x6, x7, [x1, #CPU_XREG_OFFSET(6)]
> @@ -123,9 +122,8 @@ ENTRY(__guest_exit)
> stp x12, x13, [x1, #CPU_XREG_OFFSET(12)]
> stp x14, x15, [x1, #CPU_XREG_OFFSET(14)]
> stp x16, x17, [x1, #CPU_XREG_OFFSET(16)]
> - str x18, [x1, #CPU_XREG_OFFSET(18)]
>
> - // Store the guest regs x19-x29, lr
> + // Store the guest regs x18-x29, lr
> save_callee_saved_regs x1
>
> get_host_ctxt x2, x3

Thanks,

M.
--
Jazz is not dead. It just smells funny...

Sami Tolvanen

unread,
Oct 22, 2019, 3:26:16 PM10/22/19
to Mark Rutland, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Nick Desaulniers, clang-built-linux, Kernel Hardening, linux-arm-kernel, LKML
On Tue, Oct 22, 2019 at 9:28 AM Mark Rutland <mark.r...@arm.com> wrote:
> I think it would be preferable to follow the example of CC_FLAGS_FTRACE
> so that this can be filtered out, e.g.
>
> ifdef CONFIG_SHADOW_CALL_STACK
> CFLAGS_SCS := -fsanitize=shadow-call-stack
> KBUILD_CFLAGS += $(CFLAGS_SCS)
> export CC_FLAGS_SCS
> endif

Sure, SGTM.

> > +choice
> > + prompt "Return-oriented programming (ROP) protection"
> > + default ROP_PROTECTION_NONE
> > + help
> > + This option controls kernel protections against return-oriented
> > + programming (ROP) attacks.
>
> Are we expecting more options here in future?

Yes, I believe we'd be interested in seeing PAC support too once
hardware is more readily available.

> I think it would be better to make that depend on !SHADOW_CALL_STACK, as
> it's plausible that we can add a different ROP protection mechanism that
> is compatible with kretprobes.

OK, I can change that and remove the choice. We can always add it back
when other alternatives are added.

> > +config SHADOW_CALL_STACK
> > + bool "Clang Shadow Call Stack"
> > + depends on ARCH_SUPPORTS_SHADOW_CALL_STACK
> > + depends on CC_IS_CLANG && CLANG_VERSION >= 70000
>
> Is there a reason for an explicit version check rather than a
> CC_HAS_<feature> check? e.g. was this available but broken in prior
> versions of clang?

No, this feature was added in Clang 7. However,
-fsanitize=shadow-call-stack might require architecture-specific
flags, so a simple $(cc-option, -fsanitize=shadow-call-stack) in
arch/Kconfig is not going to work. I could add something like this to
arch/arm64/Kconfig though:

select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK
...
config CC_HAVE_SHADOW_CALL_STACK
def_bool $(cc-option, -fsanitize=shadow-call-stack -ffixed-x18)

And then drop CC_IS_CLANG and version check entirely. Thoughts?

> > +#define SCS_GFP (GFP_KERNEL | __GFP_ZERO)
>
> Normally GFP_ is a prefix. For consistency, GFP_SCS would be preferable.

Ack.

> > +extern unsigned long init_shadow_call_stack[];
>
> Do we need this exposed here? IIUC this is only assigned by assembly in
> arch code.

True, it's not needed.

> [...]
>
> > +void scs_set_init_magic(struct task_struct *tsk)
> > +{
> > + scs_save(tsk);
> > + scs_set_magic(tsk);
> > + scs_load(tsk);
> > +}
>
> Can we initialize this at compile time instead?

We can. I'll change this and drop the function.


Sami

Sami Tolvanen

unread,
Oct 22, 2019, 5:45:58 PM10/22/19
to Marc Zyngier, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Mark Rutland, Kees Cook, Kernel Hardening, Nick Desaulniers, LKML, clang-built-linux, Laura Abbott, Dave Martin, linux-arm-kernel
On Tue, Oct 22, 2019 at 10:22 AM Marc Zyngier <m...@kernel.org> wrote:
> > .macro save_callee_saved_regs ctxt
> > + str x18, [\ctxt, #CPU_XREG_OFFSET(18)]
> > stp x19, x20, [\ctxt, #CPU_XREG_OFFSET(19)]
> > stp x21, x22, [\ctxt, #CPU_XREG_OFFSET(21)]
> > stp x23, x24, [\ctxt, #CPU_XREG_OFFSET(23)]
> > @@ -38,6 +39,7 @@
> > ldp x25, x26, [\ctxt, #CPU_XREG_OFFSET(25)]
> > ldp x27, x28, [\ctxt, #CPU_XREG_OFFSET(27)]
> > ldp x29, lr, [\ctxt, #CPU_XREG_OFFSET(29)]
> > + ldr x18, [\ctxt, #CPU_XREG_OFFSET(18)]
>
> There is now an assumption that ctxt is x18 (otherwise why would it be
> out of order?). Please add a comment to that effect.

> > - // Restore guest regs x19-x29, lr
> > + // Restore guest regs x18-x29, lr
> > restore_callee_saved_regs x18
>
> Or you could elect another register such as x29 as the base, and keep
> the above in a reasonable order.

I'm fine with either option. Ard, any thoughts?

Sami

Sami Tolvanen

unread,
Oct 23, 2019, 12:59:23 PM10/23/19
to Mark Rutland, Masahiro Yamada, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Nick Desaulniers, clang-built-linux, Kernel Hardening, linux-arm-kernel, LKML
On Tue, Oct 22, 2019 at 9:28 AM Mark Rutland <mark.r...@arm.com> wrote:
> I think it would be preferable to follow the example of CC_FLAGS_FTRACE
> so that this can be filtered out, e.g.
>
> ifdef CONFIG_SHADOW_CALL_STACK
> CC_FLAGS_SCS := -fsanitize=shadow-call-stack
> KBUILD_CFLAGS += $(CC_FLAGS_SCS)
> export CC_FLAGS_SCS
> endif
>
> ... with removal being:
>
> CFLAGS_REMOVE := $(CC_FLAGS_SCS)
>
> ... or:
>
> CFLAGS_REMOVE_obj.o := $(CC_FLAGS_SCS)
>
> That way you only need to define the flags once, so the enable and
> disable flags remain in sync by construction.

CFLAGS_REMOVE appears to be implemented only for individual objects, which means
there's no convenient way to filter out flags for everything in
arch/arm64/kvm/hyp, for example. I could add a CFLAGS_REMOVE
separately for each object file, or we could add something like
ccflags-remove-y to complement ccflags-y, which should be relatively
simple. Masahiro, do you have any suggestions?
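
To sketch what I mean (hypothetical, since ccflags-remove-y doesn't
exist yet), a subdirectory Makefile could then simply do something
like:

ccflags-remove-y += $(CC_FLAGS_SCS)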

Sami

Masahiro Yamada

unread,
Oct 23, 2019, 9:48:20 PM10/23/19
to Sami Tolvanen, Mark Rutland, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Nick Desaulniers, clang-built-linux, Kernel Hardening, linux-arm-kernel, LKML
I am fine with 'ccflags-remove-y'.

Thanks.


--
Best Regards
Masahiro Yamada

Steven Rostedt

unread,
Oct 24, 2019, 8:04:22 AM10/24/19
to Sami Tolvanen, Mark Rutland, Masahiro Yamada, Will Deacon, Catalin Marinas, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Nick Desaulniers, clang-built-linux, Kernel Hardening, linux-arm-kernel, LKML
You can remove CFLAGS for a whole directory; lib, kernel/trace and
others do this. Look at kernel/trace/Makefile, where we have:

ORIG_CFLAGS := $(KBUILD_CFLAGS)
KBUILD_CFLAGS = $(subst $(CC_FLAGS_FTRACE),,$(ORIG_CFLAGS))

Where it removes CC_FLAGS_FTRACE from CFLAGS for all objects in the
directory.

-- Steve

Mark Rutland

unread,
Oct 24, 2019, 9:28:38 AM10/24/19
to Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Nick Desaulniers, clang-built-linux, Kernel Hardening, linux-arm-kernel, LKML
On Tue, Oct 22, 2019 at 12:26:02PM -0700, Sami Tolvanen wrote:
> On Tue, Oct 22, 2019 at 9:28 AM Mark Rutland <mark.r...@arm.com> wrote:

> > > +config SHADOW_CALL_STACK
> > > + bool "Clang Shadow Call Stack"
> > > + depends on ARCH_SUPPORTS_SHADOW_CALL_STACK
> > > + depends on CC_IS_CLANG && CLANG_VERSION >= 70000
> >
> > Is there a reason for an explicit version check rather than a
> > CC_HAS_<feature> check? e.g. was this available but broken in prior
> > versions of clang?
>
> No, this feature was added in Clang 7. However,
> -fsanitize=shadow-call-stack might require architecture-specific
> flags, so a simple $(cc-option, -fsanitize=shadow-call-stack) in
> arch/Kconfig is not going to work. I could add something like this to
> arch/arm64/Kconfig though:
>
> select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK
> ...
> config CC_HAVE_SHADOW_CALL_STACK
> def_bool $(cc-option, -fsanitize=shadow-call-stack -ffixed-x18)
>
> And then drop CC_IS_CLANG and version check entirely. Thoughts?

That sounds good to me, yes!

Thanks,
Mark.

Masahiro Yamada

unread,
Oct 24, 2019, 10:39:18 AM10/24/19
to Mark Rutland, Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Nick Desaulniers, clang-built-linux, Kernel Hardening, linux-arm-kernel, LKML
If you use cc-option, please add a comment like

# supported by Clang 7 or later.


I do not know the minimum supported clang version.
When we bump the minimum version to clang 7,
we can drop the cc-option test entirely.

Sami Tolvanen

unread,
Oct 24, 2019, 6:17:21 PM10/24/19
to Steven Rostedt, Mark Rutland, Masahiro Yamada, Will Deacon, Catalin Marinas, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Nick Desaulniers, clang-built-linux, Kernel Hardening, linux-arm-kernel, LKML
On Thu, Oct 24, 2019 at 5:04 AM Steven Rostedt <ros...@goodmis.org> wrote:
> You can remove a CFLAGS for a whole directory. lib, kernel/trace and
> others do this. Look at kernel/trace/Makefile, we have:
>
> ORIG_CFLAGS := $(KBUILD_CFLAGS)
> KBUILD_CFLAGS = $(subst $(CC_FLAGS_FTRACE),,$(ORIG_CFLAGS))

That definitely looks less invasive in this case than adding
ccflags-remove-y, since we only really need this for one directory.
I'll use this in v2. Thanks, Steven.

Sami

samito...@google.com

unread,
Oct 24, 2019, 6:51:37 PM10/24/19
to Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
This patch series adds support for Clang's Shadow Call Stack
(SCS) mitigation, which uses a separately allocated shadow stack
to protect against return address overwrites. More information
can be found here:

https://clang.llvm.org/docs/ShadowCallStack.html

SCS provides better protection against traditional buffer
overflows than CONFIG_STACKPROTECTOR_*, but it should be noted
that SCS security guarantees in the kernel differ from the ones
documented for user space. The kernel must store addresses of
shadow stacks used by other tasks and interrupt handlers in
memory, which means an attacker capable of reading and writing
arbitrary memory may be able to locate them and hijack control
flow by modifying shadow stacks that are not currently in use.

SCS is currently supported only on arm64, where the compiler
requires the x18 register to be reserved for holding the current
task's shadow stack pointer. Because of this, the series includes
patches from Ard to remove x18 usage from assembly code.

With -fsanitize=shadow-call-stack, the compiler injects
instructions into all non-leaf C functions to store the return
address to the shadow stack, and unconditionally load it again
before returning. As a result, SCS is currently incompatible
with features that rely on modifying function return addresses
to alter control flow, such as function graph tracing and
kretprobes, although it may be possible to later change these
features to modify the shadow stack instead. A copy of the return
address is still kept in the kernel stack for compatibility with
stack unwinding, for example.
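
For illustration (not part of this series), the instrumentation the
compiler emits for a non-leaf arm64 function is roughly the following,
with x18 holding the shadow stack pointer:

  str  x30, [x18], #8          // push the return address to the shadow stack
  stp  x29, x30, [sp, #-16]!   // a regular frame record is still set up
  ...                          // function body
  ldp  x29, x30, [sp], #16
  ldr  x30, [x18, #-8]!        // reload the return address from the shadow stack
  ret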

SCS has a minimal performance overhead, but allocating
shadow stacks increases kernel memory usage. The feature is
therefore mostly useful on hardware that lacks support for PAC
instructions.

Changes in v2:
- Changed Ard's KVM patch to use x29 instead of x18 for the
guest context, which makes restore_callee_saved_regs cleaner
- Updated help text (and commit messages) to point out
differences in security properties compared to user space SCS
- Cleaned up config options: removed the ROP protection choice,
replaced the CC_IS_CLANG dependency with an arch-specific
cc-option test, and moved disabling of incompatible config
options to an arch-specific Kconfig
- Added CC_FLAGS_SCS, which are filtered out where needed
instead of using DISABLE_SCS
- Added a __has_feature guard around __noscs for older clang
versions
- Changed the shadow stack overflow check for vmapped SCS to
use SCS_SIZE to avoid surprises when changing configs
- Renamed SCS_GFP to GFP_SCS
- Dropped the patch to reserve x18 unconditionally; it's now
only reserved with SCS
- Added a clarification of why restoring x18 in the EFI RT
wrapper is safe
- Added a missing change to arch/arm64/include/asm/suspend.h,
and a comment to arch/arm64/mm/proc.S to remind that struct
cpu_suspend_ctx must be kept in sync with the code
- Moved x18 loading/storing during a context switch to
cpu_switch_to(), renamed scs_thread_switch() to
scs_overflow_check(), and removed the now unused scs_load()
- Added compile-time initialization for init_shadow_call_stack
and removed scs_set_init_magic()


Ard Biesheuvel (2):
arm64/lib: copy_page: avoid x18 register in assembler code
arm64: kernel: avoid x18 as an arbitrary temp register

Sami Tolvanen (15):
arm64: mm: don't use x18 in idmap_kpti_install_ng_mappings
arm64: kvm: stop treating register x18 as caller save
add support for Clang's Shadow Call Stack (SCS)
scs: add accounting
scs: add support for stack usage debugging
kprobes: fix compilation without CONFIG_KRETPROBES
arm64: disable function graph tracing with SCS
arm64: disable kretprobes with SCS
arm64: reserve x18 from general allocation with SCS
arm64: preserve x18 when CPU is suspended
arm64: efi: restore x18 if it was corrupted
arm64: vdso: disable Shadow Call Stack
arm64: kprobes: fix kprobes without CONFIG_KRETPROBES
arm64: disable SCS for hypervisor code
arm64: implement Shadow Call Stack

Makefile | 6 +
arch/Kconfig | 33 +++++
arch/arm64/Kconfig | 9 +-
arch/arm64/Makefile | 4 +
arch/arm64/include/asm/scs.h | 45 ++++++
arch/arm64/include/asm/stacktrace.h | 4 +
arch/arm64/include/asm/suspend.h | 2 +-
arch/arm64/include/asm/thread_info.h | 3 +
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/asm-offsets.c | 3 +
arch/arm64/kernel/cpu-reset.S | 4 +-
arch/arm64/kernel/efi-rt-wrapper.S | 7 +-
arch/arm64/kernel/entry.S | 28 ++++
arch/arm64/kernel/head.S | 9 ++
arch/arm64/kernel/irq.c | 2 +
arch/arm64/kernel/probes/kprobes.c | 2 +
arch/arm64/kernel/process.c | 2 +
arch/arm64/kernel/scs.c | 39 +++++
arch/arm64/kernel/smp.c | 4 +
arch/arm64/kernel/vdso/Makefile | 2 +-
arch/arm64/kvm/hyp/Makefile | 3 +
arch/arm64/kvm/hyp/entry.S | 41 +++--
arch/arm64/lib/copy_page.S | 38 ++---
arch/arm64/mm/proc.S | 72 +++++----
drivers/base/node.c | 6 +
fs/proc/meminfo.c | 4 +
include/linux/compiler-clang.h | 6 +
include/linux/compiler_types.h | 4 +
include/linux/mmzone.h | 3 +
include/linux/scs.h | 78 ++++++++++
init/init_task.c | 8 +
kernel/Makefile | 1 +
kernel/fork.c | 9 ++
kernel/kprobes.c | 38 ++---
kernel/sched/core.c | 2 +
kernel/sched/sched.h | 1 +
kernel/scs.c | 214 +++++++++++++++++++++++++++
mm/page_alloc.c | 6 +
mm/vmstat.c | 3 +
39 files changed, 649 insertions(+), 97 deletions(-)
create mode 100644 arch/arm64/include/asm/scs.h
create mode 100644 arch/arm64/kernel/scs.c
create mode 100644 include/linux/scs.h
create mode 100644 kernel/scs.c

--
2.24.0.rc0.303.g954a862665-goog

samito...@google.com

unread,
Oct 24, 2019, 6:51:40 PM10/24/19
to Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
idmap_kpti_install_ng_mappings uses x18 as a temporary register, which
will result in a conflict when x18 is reserved. Use x16 and x17 instead
where needed.

Signed-off-by: Sami Tolvanen <samito...@google.com>
Reviewed-by: Nick Desaulniers <ndesau...@google.com>
---
arch/arm64/mm/proc.S | 63 ++++++++++++++++++++++----------------------
1 file changed, 32 insertions(+), 31 deletions(-)

diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index a1e0592d1fbc..fdabf40a83c8 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -250,15 +250,15 @@ ENTRY(idmap_kpti_install_ng_mappings)
/* We're the boot CPU. Wait for the others to catch up */
sevl
1: wfe
- ldaxr w18, [flag_ptr]
- eor w18, w18, num_cpus
- cbnz w18, 1b
+ ldaxr w17, [flag_ptr]
+ eor w17, w17, num_cpus
+ cbnz w17, 1b

/* We need to walk swapper, so turn off the MMU. */
pre_disable_mmu_workaround
- mrs x18, sctlr_el1
- bic x18, x18, #SCTLR_ELx_M
- msr sctlr_el1, x18
+ mrs x17, sctlr_el1
+ bic x17, x17, #SCTLR_ELx_M
+ msr sctlr_el1, x17
isb

/* Everybody is enjoying the idmap, so we can rewrite swapper. */
@@ -281,9 +281,9 @@ skip_pgd:
isb

/* We're done: fire up the MMU again */
- mrs x18, sctlr_el1
- orr x18, x18, #SCTLR_ELx_M
- msr sctlr_el1, x18
+ mrs x17, sctlr_el1
+ orr x17, x17, #SCTLR_ELx_M
+ msr sctlr_el1, x17
isb

/*
@@ -353,46 +353,47 @@ skip_pte:
b.ne do_pte
b next_pmd

+ .unreq cpu
+ .unreq num_cpus
+ .unreq swapper_pa
+ .unreq cur_pgdp
+ .unreq end_pgdp
+ .unreq pgd
+ .unreq cur_pudp
+ .unreq end_pudp
+ .unreq pud
+ .unreq cur_pmdp
+ .unreq end_pmdp
+ .unreq pmd
+ .unreq cur_ptep
+ .unreq end_ptep
+ .unreq pte
+
/* Secondary CPUs end up here */
__idmap_kpti_secondary:
/* Uninstall swapper before surgery begins */
- __idmap_cpu_set_reserved_ttbr1 x18, x17
+ __idmap_cpu_set_reserved_ttbr1 x16, x17

/* Increment the flag to let the boot CPU we're ready */
-1: ldxr w18, [flag_ptr]
- add w18, w18, #1
- stxr w17, w18, [flag_ptr]
+1: ldxr w16, [flag_ptr]
+ add w16, w16, #1
+ stxr w17, w16, [flag_ptr]
cbnz w17, 1b

/* Wait for the boot CPU to finish messing around with swapper */
sevl
1: wfe
- ldxr w18, [flag_ptr]
- cbnz w18, 1b
+ ldxr w16, [flag_ptr]
+ cbnz w16, 1b

/* All done, act like nothing happened */
- offset_ttbr1 swapper_ttb, x18
+ offset_ttbr1 swapper_ttb, x16
msr ttbr1_el1, swapper_ttb
isb
ret

- .unreq cpu
- .unreq num_cpus
- .unreq swapper_pa
.unreq swapper_ttb
.unreq flag_ptr
- .unreq cur_pgdp
- .unreq end_pgdp
- .unreq pgd
- .unreq cur_pudp
- .unreq end_pudp
- .unreq pud
- .unreq cur_pmdp
- .unreq end_pmdp
- .unreq pmd
- .unreq cur_ptep
- .unreq end_ptep
- .unreq pte
ENDPROC(idmap_kpti_install_ng_mappings)
.popsection
#endif
--
2.24.0.rc0.303.g954a862665-goog

samito...@google.com

unread,
Oct 24, 2019, 6:51:43 PM10/24/19
to Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
From: Ard Biesheuvel <ard.bie...@linaro.org>

Register x18 will no longer be used as a caller save register in the
future, so stop using it in the copy_page() code.

Link: https://patchwork.kernel.org/patch/9836869/
Signed-off-by: Ard Biesheuvel <ard.bie...@linaro.org>
Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/arm64/lib/copy_page.S | 38 +++++++++++++++++++-------------------
1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/lib/copy_page.S b/arch/arm64/lib/copy_page.S
index bbb8562396af..8b562264c165 100644
--- a/arch/arm64/lib/copy_page.S
+++ b/arch/arm64/lib/copy_page.S
@@ -34,45 +34,45 @@ alternative_else_nop_endif
ldp x14, x15, [x1, #96]
ldp x16, x17, [x1, #112]

- mov x18, #(PAGE_SIZE - 128)
+ add x0, x0, #256
add x1, x1, #128
1:
- subs x18, x18, #128
+ tst x0, #(PAGE_SIZE - 1)

alternative_if ARM64_HAS_NO_HW_PREFETCH
prfm pldl1strm, [x1, #384]
alternative_else_nop_endif

- stnp x2, x3, [x0]
+ stnp x2, x3, [x0, #-256]
ldp x2, x3, [x1]
- stnp x4, x5, [x0, #16]
+ stnp x4, x5, [x0, #-240]
ldp x4, x5, [x1, #16]
- stnp x6, x7, [x0, #32]
+ stnp x6, x7, [x0, #-224]
ldp x6, x7, [x1, #32]
- stnp x8, x9, [x0, #48]
+ stnp x8, x9, [x0, #-208]
ldp x8, x9, [x1, #48]
- stnp x10, x11, [x0, #64]
+ stnp x10, x11, [x0, #-192]
ldp x10, x11, [x1, #64]
- stnp x12, x13, [x0, #80]
+ stnp x12, x13, [x0, #-176]
ldp x12, x13, [x1, #80]
- stnp x14, x15, [x0, #96]
+ stnp x14, x15, [x0, #-160]
ldp x14, x15, [x1, #96]
- stnp x16, x17, [x0, #112]
+ stnp x16, x17, [x0, #-144]
ldp x16, x17, [x1, #112]

add x0, x0, #128
add x1, x1, #128

- b.gt 1b
+ b.ne 1b

- stnp x2, x3, [x0]
- stnp x4, x5, [x0, #16]
- stnp x6, x7, [x0, #32]
- stnp x8, x9, [x0, #48]
- stnp x10, x11, [x0, #64]
- stnp x12, x13, [x0, #80]
- stnp x14, x15, [x0, #96]
- stnp x16, x17, [x0, #112]
+ stnp x2, x3, [x0, #-256]
+ stnp x4, x5, [x0, #-240]
+ stnp x6, x7, [x0, #-224]
+ stnp x8, x9, [x0, #-208]
+ stnp x10, x11, [x0, #-192]
+ stnp x12, x13, [x0, #-176]
+ stnp x14, x15, [x0, #-160]
+ stnp x16, x17, [x0, #-144]

ret
ENDPROC(copy_page)
--
2.24.0.rc0.303.g954a862665-goog

samito...@google.com

unread,
Oct 24, 2019, 6:51:46 PM10/24/19
to Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
In preparation of reserving x18, stop treating it as caller save in
the KVM guest entry/exit code. Currently, the code assumes there is
no need to preserve it for the host, given that it would have been
assumed clobbered anyway by the function call to __guest_enter().
Instead, preserve its value and restore it upon return.

Co-developed-by: Ard Biesheuvel <ard.bie...@linaro.org>
Link: https://patchwork.kernel.org/patch/9836891/
[ updated commit message, switched from x18 to x29 for the guest context ]
Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/arm64/kvm/hyp/entry.S | 41 +++++++++++++++++++-------------------
1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index e5cc8d66bf53..c3c2d842c609 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -23,6 +23,7 @@
.pushsection .hyp.text, "ax"

.macro save_callee_saved_regs ctxt
+ str x18, [\ctxt, #CPU_XREG_OFFSET(18)]
stp x19, x20, [\ctxt, #CPU_XREG_OFFSET(19)]
stp x21, x22, [\ctxt, #CPU_XREG_OFFSET(21)]
stp x23, x24, [\ctxt, #CPU_XREG_OFFSET(23)]
@@ -32,6 +33,8 @@
.endm

.macro restore_callee_saved_regs ctxt
+ // We assume \ctxt is not x18-x28
+ ldr x18, [\ctxt, #CPU_XREG_OFFSET(18)]
ldp x19, x20, [\ctxt, #CPU_XREG_OFFSET(19)]
ldp x21, x22, [\ctxt, #CPU_XREG_OFFSET(21)]
ldp x23, x24, [\ctxt, #CPU_XREG_OFFSET(23)]
@@ -48,7 +51,7 @@ ENTRY(__guest_enter)
// x0: vcpu
// x1: host context
// x2-x17: clobbered by macros
- // x18: guest context
+ // x29: guest context

// Store the host regs
save_callee_saved_regs x1
@@ -67,31 +70,28 @@ alternative_else_nop_endif
ret

1:
- add x18, x0, #VCPU_CONTEXT
+ add x29, x0, #VCPU_CONTEXT

// Macro ptrauth_switch_to_guest format:
// ptrauth_switch_to_guest(guest cxt, tmp1, tmp2, tmp3)
// The below macro to restore guest keys is not implemented in C code
// as it may cause Pointer Authentication key signing mismatch errors
// when this feature is enabled for kernel code.
- ptrauth_switch_to_guest x18, x0, x1, x2
+ ptrauth_switch_to_guest x29, x0, x1, x2

// Restore guest regs x0-x17
- ldp x0, x1, [x18, #CPU_XREG_OFFSET(0)]
- ldp x2, x3, [x18, #CPU_XREG_OFFSET(2)]
- ldp x4, x5, [x18, #CPU_XREG_OFFSET(4)]
- ldp x6, x7, [x18, #CPU_XREG_OFFSET(6)]
- ldp x8, x9, [x18, #CPU_XREG_OFFSET(8)]
- ldp x10, x11, [x18, #CPU_XREG_OFFSET(10)]
- ldp x12, x13, [x18, #CPU_XREG_OFFSET(12)]
- ldp x14, x15, [x18, #CPU_XREG_OFFSET(14)]
- ldp x16, x17, [x18, #CPU_XREG_OFFSET(16)]
-
- // Restore guest regs x19-x29, lr
- restore_callee_saved_regs x18
-
- // Restore guest reg x18
- ldr x18, [x18, #CPU_XREG_OFFSET(18)]
+ ldp x0, x1, [x29, #CPU_XREG_OFFSET(0)]
+ ldp x2, x3, [x29, #CPU_XREG_OFFSET(2)]
+ ldp x4, x5, [x29, #CPU_XREG_OFFSET(4)]
+ ldp x6, x7, [x29, #CPU_XREG_OFFSET(6)]
+ ldp x8, x9, [x29, #CPU_XREG_OFFSET(8)]
+ ldp x10, x11, [x29, #CPU_XREG_OFFSET(10)]
+ ldp x12, x13, [x29, #CPU_XREG_OFFSET(12)]
+ ldp x14, x15, [x29, #CPU_XREG_OFFSET(14)]
+ ldp x16, x17, [x29, #CPU_XREG_OFFSET(16)]
+
+ // Restore guest regs x18-x29, lr
+ restore_callee_saved_regs x29

// Do not touch any register after this!
eret
@@ -114,7 +114,7 @@ ENTRY(__guest_exit)
// Retrieve the guest regs x0-x1 from the stack
ldp x2, x3, [sp], #16 // x0, x1

- // Store the guest regs x0-x1 and x4-x18
+ // Store the guest regs x0-x1 and x4-x17
stp x2, x3, [x1, #CPU_XREG_OFFSET(0)]
stp x4, x5, [x1, #CPU_XREG_OFFSET(4)]
stp x6, x7, [x1, #CPU_XREG_OFFSET(6)]
@@ -123,9 +123,8 @@ ENTRY(__guest_exit)
stp x12, x13, [x1, #CPU_XREG_OFFSET(12)]
stp x14, x15, [x1, #CPU_XREG_OFFSET(14)]
stp x16, x17, [x1, #CPU_XREG_OFFSET(16)]
- str x18, [x1, #CPU_XREG_OFFSET(18)]

- // Store the guest regs x19-x29, lr
+ // Store the guest regs x18-x29, lr
save_callee_saved_regs x1

get_host_ctxt x2, x3
--
2.24.0.rc0.303.g954a862665-goog

samito...@google.com

unread,
Oct 24, 2019, 6:51:51 PM10/24/19
to Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
From: Ard Biesheuvel <ard.bie...@linaro.org>

The code in __cpu_soft_restart() uses x18 as an arbitrary temp register,
which will shortly be disallowed. So use x8 instead.

Link: https://patchwork.kernel.org/patch/9836877/
Signed-off-by: Ard Biesheuvel <ard.bie...@linaro.org>
Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/arm64/kernel/cpu-reset.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/cpu-reset.S b/arch/arm64/kernel/cpu-reset.S
index 6ea337d464c4..32c7bf858dd9 100644
--- a/arch/arm64/kernel/cpu-reset.S
+++ b/arch/arm64/kernel/cpu-reset.S
@@ -42,11 +42,11 @@ ENTRY(__cpu_soft_restart)
mov x0, #HVC_SOFT_RESTART
hvc #0 // no return

-1: mov x18, x1 // entry
+1: mov x8, x1 // entry
mov x0, x2 // arg0
mov x1, x3 // arg1
mov x2, x4 // arg2
- br x18
+ br x8
ENDPROC(__cpu_soft_restart)

.popsection
--
2.24.0.rc0.303.g954a862665-goog

samito...@google.com

unread,
Oct 24, 2019, 6:51:56 PM10/24/19
to Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
This change adds generic support for Clang's Shadow Call Stack,
which uses a shadow stack to protect return addresses from being
overwritten by an attacker. Details are available here:

https://clang.llvm.org/docs/ShadowCallStack.html

Note that security guarantees in the kernel differ from the
ones documented for user space. The kernel must store addresses
of shadow stacks used by other tasks and interrupt handlers in
memory, which means an attacker capable of reading and writing
arbitrary memory may be able to locate them and hijack control
flow by modifying shadow stacks that are not currently in use.

Signed-off-by: Sami Tolvanen <samito...@google.com>
---
Makefile | 6 ++
arch/Kconfig | 33 +++++++
include/linux/compiler-clang.h | 6 ++
include/linux/compiler_types.h | 4 +
include/linux/scs.h | 78 +++++++++++++++++
init/init_task.c | 8 ++
kernel/Makefile | 1 +
kernel/fork.c | 9 ++
kernel/sched/core.c | 2 +
kernel/sched/sched.h | 1 +
kernel/scs.c | 155 +++++++++++++++++++++++++++++++++
11 files changed, 303 insertions(+)
create mode 100644 include/linux/scs.h
create mode 100644 kernel/scs.c

diff --git a/Makefile b/Makefile
index 5475cdb6d57d..2b5c59fb18f2 100644
--- a/Makefile
+++ b/Makefile
@@ -846,6 +846,12 @@ ifdef CONFIG_LIVEPATCH
KBUILD_CFLAGS += $(call cc-option, -flive-patching=inline-clone)
endif

+ifdef CONFIG_SHADOW_CALL_STACK
+CC_FLAGS_SCS := -fsanitize=shadow-call-stack
+KBUILD_CFLAGS += $(CC_FLAGS_SCS)
+export CC_FLAGS_SCS
+endif
+
# arch Makefile may override CC so keep this after arch Makefile is included
NOSTDINC_FLAGS += -nostdinc -isystem $(shell $(CC) -print-file-name=include)

diff --git a/arch/Kconfig b/arch/Kconfig
index 5f8a5d84dbbe..5e34cbcd8d6a 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -521,6 +521,39 @@ config STACKPROTECTOR_STRONG
about 20% of all kernel functions, which increases the kernel code
size by about 2%.

+config ARCH_SUPPORTS_SHADOW_CALL_STACK
+ bool
+ help
+ An architecture should select this if it supports Clang's Shadow
+ Call Stack, has asm/scs.h, and implements runtime support for shadow
+ stack switching.
+
+config SHADOW_CALL_STACK_VMAP
+ bool
+ depends on SHADOW_CALL_STACK
+ help
+ Use virtually mapped shadow call stacks. Selecting this option
+ provides better stack exhaustion protection, but increases per-thread
+ memory consumption as a full page is allocated for each shadow stack.
+
+config SHADOW_CALL_STACK
+ bool "Clang Shadow Call Stack"
+ depends on ARCH_SUPPORTS_SHADOW_CALL_STACK
+ help
+ This option enables Clang's Shadow Call Stack, which uses a
+ shadow stack to protect function return addresses from being
+ overwritten by an attacker. More information can be found from
+ Clang's documentation:
+
+ https://clang.llvm.org/docs/ShadowCallStack.html
+
+ Note that security guarantees in the kernel differ from the ones
+ documented for user space. The kernel must store addresses of shadow
+ stacks used by other tasks and interrupt handlers in memory, which
+ means an attacker capable of reading and writing arbitrary memory may
+ be able to locate them and hijack control flow by modifying shadow
+ stacks that are not currently in use.
+
config HAVE_ARCH_WITHIN_STACK_FRAMES
bool
help
diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
index 333a6695a918..afe5e24088b2 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -42,3 +42,9 @@
* compilers, like ICC.
*/
#define barrier() __asm__ __volatile__("" : : : "memory")
+
+#if __has_feature(shadow_call_stack)
+# define __noscs __attribute__((no_sanitize("shadow-call-stack")))
+#else
+# define __noscs
+#endif
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 72393a8c1a6c..be5d5be4b1ae 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -202,6 +202,10 @@ struct ftrace_likely_data {
# define randomized_struct_fields_end
#endif

+#ifndef __noscs
+# define __noscs
+#endif
+
#ifndef asm_volatile_goto
#define asm_volatile_goto(x...) asm goto(x)
#endif
diff --git a/include/linux/scs.h b/include/linux/scs.h
new file mode 100644
index 000000000000..c8b0ccfdd803
--- /dev/null
+++ b/include/linux/scs.h
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Shadow Call Stack support.
+ *
+ * Copyright (C) 2018 Google LLC
+ */
+
+#ifndef _LINUX_SCS_H
+#define _LINUX_SCS_H
+
+#include <linux/gfp.h>
+#include <linux/sched.h>
+#include <asm/page.h>
+
+#ifdef CONFIG_SHADOW_CALL_STACK
+
+#define SCS_SIZE 1024
+#define SCS_END_MAGIC 0xaf0194819b1635f6UL
+
+#define GFP_SCS (GFP_KERNEL | __GFP_ZERO)
+
+static inline void *task_scs(struct task_struct *tsk)
+{
+ return task_thread_info(tsk)->shadow_call_stack;
+}
+
+static inline void task_set_scs(struct task_struct *tsk, void *s)
+{
+ task_thread_info(tsk)->shadow_call_stack = s;
+}
+
+extern void scs_init(void);
+extern void scs_task_init(struct task_struct *tsk);
+extern void scs_task_reset(struct task_struct *tsk);
+extern int scs_prepare(struct task_struct *tsk, int node);
+extern bool scs_corrupted(struct task_struct *tsk);
+extern void scs_release(struct task_struct *tsk);
+
+#else /* CONFIG_SHADOW_CALL_STACK */
+
+static inline void *task_scs(struct task_struct *tsk)
+{
+ return 0;
+}
+
+static inline void task_set_scs(struct task_struct *tsk, void *s)
+{
+}
+
+static inline void scs_init(void)
+{
+}
+
+static inline void scs_task_init(struct task_struct *tsk)
+{
+}
+
+static inline void scs_task_reset(struct task_struct *tsk)
+{
+}
+
+static inline int scs_prepare(struct task_struct *tsk, int node)
+{
+ return 0;
+}
+
+static inline bool scs_corrupted(struct task_struct *tsk)
+{
+ return false;
+}
+
+static inline void scs_release(struct task_struct *tsk)
+{
+}
+
+#endif /* CONFIG_SHADOW_CALL_STACK */
+
+#endif /* _LINUX_SCS_H */
diff --git a/init/init_task.c b/init/init_task.c
index 9e5cbe5eab7b..cbd40460e903 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -11,6 +11,7 @@
#include <linux/mm.h>
#include <linux/audit.h>
#include <linux/numa.h>
+#include <linux/scs.h>

#include <asm/pgtable.h>
#include <linux/uaccess.h>
@@ -184,6 +185,13 @@ struct task_struct init_task
};
EXPORT_SYMBOL(init_task);

+#ifdef CONFIG_SHADOW_CALL_STACK
+unsigned long init_shadow_call_stack[SCS_SIZE / sizeof(long)] __init_task_data
+ __aligned(SCS_SIZE) = {
+ [(SCS_SIZE / sizeof(long)) - 1] = SCS_END_MAGIC
+};
+#endif
+
/*
* Initial thread structure. Alignment of this is handled by a special
* linker map entry.
diff --git a/kernel/Makefile b/kernel/Makefile
index daad787fb795..313dbd44d576 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
obj-$(CONFIG_IRQ_WORK) += irq_work.o
obj-$(CONFIG_CPU_PM) += cpu_pm.o
obj-$(CONFIG_BPF) += bpf/
+obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o

obj-$(CONFIG_PERF_EVENTS) += events/

diff --git a/kernel/fork.c b/kernel/fork.c
index bcdf53125210..ae7ebe9f0586 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -94,6 +94,7 @@
#include <linux/livepatch.h>
#include <linux/thread_info.h>
#include <linux/stackleak.h>
+#include <linux/scs.h>

#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -451,6 +452,8 @@ void put_task_stack(struct task_struct *tsk)

void free_task(struct task_struct *tsk)
{
+ scs_release(tsk);
+
#ifndef CONFIG_THREAD_INFO_IN_TASK
/*
* The task is finally done with both the stack and thread_info,
@@ -834,6 +837,8 @@ void __init fork_init(void)
NULL, free_vm_stack_cache);
#endif

+ scs_init();
+
lockdep_init_task(&init_task);
uprobes_init();
}
@@ -907,6 +912,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
clear_user_return_notifier(tsk);
clear_tsk_need_resched(tsk);
set_task_stack_end_magic(tsk);
+ scs_task_init(tsk);

#ifdef CONFIG_STACKPROTECTOR
tsk->stack_canary = get_random_canary();
@@ -2022,6 +2028,9 @@ static __latent_entropy struct task_struct *copy_process(
args->tls);
if (retval)
goto bad_fork_cleanup_io;
+ retval = scs_prepare(p, node);
+ if (retval)
+ goto bad_fork_cleanup_thread;

stackleak_task_init(p);

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dd05a378631a..e7faeb383008 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6013,6 +6013,8 @@ void init_idle(struct task_struct *idle, int cpu)
raw_spin_lock_irqsave(&idle->pi_lock, flags);
raw_spin_lock(&rq->lock);

+ scs_task_reset(idle);
+
__sched_fork(0, idle);
idle->state = TASK_RUNNING;
idle->se.exec_start = sched_clock();
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0db2c1b3361e..c153003a011c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -58,6 +58,7 @@
#include <linux/profile.h>
#include <linux/psi.h>
#include <linux/rcupdate_wait.h>
+#include <linux/scs.h>
#include <linux/security.h>
#include <linux/stop_machine.h>
#include <linux/suspend.h>
diff --git a/kernel/scs.c b/kernel/scs.c
new file mode 100644
index 000000000000..383d29e8c199
--- /dev/null
+++ b/kernel/scs.c
@@ -0,0 +1,155 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Shadow Call Stack support.
+ *
+ * Copyright (C) 2019 Google LLC
+ */
+
+#include <linux/cpuhotplug.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/scs.h>
+#include <linux/vmalloc.h>
+#include <asm/scs.h>
+
+static inline void *__scs_base(struct task_struct *tsk)
+{
+ return (void *)((uintptr_t)task_scs(tsk) & ~(SCS_SIZE - 1));
+}
+
+#ifdef CONFIG_SHADOW_CALL_STACK_VMAP
+
+/* Keep a cache of shadow stacks */
+#define SCS_CACHE_SIZE 2
+static DEFINE_PER_CPU(void *, scs_cache[SCS_CACHE_SIZE]);
+
+static void *scs_alloc(int node)
+{
+ int i;
+
+ for (i = 0; i < SCS_CACHE_SIZE; i++) {
+ void *s;
+
+ s = this_cpu_xchg(scs_cache[i], NULL);
+ if (s) {
+ memset(s, 0, SCS_SIZE);
+ return s;
+ }
+ }
+
+ BUILD_BUG_ON(SCS_SIZE > PAGE_SIZE);
+
+ return __vmalloc_node_range(PAGE_SIZE, SCS_SIZE,
+ VMALLOC_START, VMALLOC_END,
+ GFP_SCS, PAGE_KERNEL, 0,
+ node, __builtin_return_address(0));
+}
+
+static void scs_free(void *s)
+{
+ int i;
+
+ for (i = 0; i < SCS_CACHE_SIZE; i++) {
+ if (this_cpu_cmpxchg(scs_cache[i], 0, s) != 0)
+ continue;
+
+ return;
+ }
+
+ vfree_atomic(s);
+}
+
+static int scs_cleanup(unsigned int cpu)
+{
+ int i;
+ void **cache = per_cpu_ptr(scs_cache, cpu);
+
+ for (i = 0; i < SCS_CACHE_SIZE; i++) {
+ vfree(cache[i]);
+ cache[i] = NULL;
+ }
+
+ return 0;
+}
+
+void __init scs_init(void)
+{
+ cpuhp_setup_state(CPUHP_BP_PREPARE_DYN, "scs:scs_cache", NULL,
+ scs_cleanup);
+}
+
+#else /* !CONFIG_SHADOW_CALL_STACK_VMAP */
+
+static struct kmem_cache *scs_cache;
+
+static inline void *scs_alloc(int node)
+{
+ return kmem_cache_alloc_node(scs_cache, GFP_SCS, node);
+}
+
+static inline void scs_free(void *s)
+{
+ kmem_cache_free(scs_cache, s);
+}
+
+void __init scs_init(void)
+{
+ scs_cache = kmem_cache_create("scs_cache", SCS_SIZE, SCS_SIZE,
+ 0, NULL);
+ WARN_ON(!scs_cache);
+}
+
+#endif /* CONFIG_SHADOW_CALL_STACK_VMAP */
+
+static inline unsigned long *scs_magic(struct task_struct *tsk)
+{
+ return (unsigned long *)(__scs_base(tsk) + SCS_SIZE - sizeof(long));
+}
+
+static inline void scs_set_magic(struct task_struct *tsk)
+{
+ *scs_magic(tsk) = SCS_END_MAGIC;
+}
+
+void scs_task_init(struct task_struct *tsk)
+{
+ task_set_scs(tsk, NULL);
+}
+
+void scs_task_reset(struct task_struct *tsk)
+{
+ task_set_scs(tsk, __scs_base(tsk));
+}
+
+int scs_prepare(struct task_struct *tsk, int node)
+{
+ void *s;
+
+ s = scs_alloc(node);
+ if (!s)
+ return -ENOMEM;
+
+ task_set_scs(tsk, s);
+ scs_set_magic(tsk);
+
+ return 0;
+}
+
+bool scs_corrupted(struct task_struct *tsk)
+{
+ return *scs_magic(tsk) != SCS_END_MAGIC;
+}
+
+void scs_release(struct task_struct *tsk)
+{
+ void *s;
+
+ s = __scs_base(tsk);
+ if (!s)
+ return;
+
+ WARN_ON(scs_corrupted(tsk));
+
+ scs_task_init(tsk);
+ scs_free(s);
+}
--
2.24.0.rc0.303.g954a862665-goog

samito...@google.com

unread,
Oct 24, 2019, 6:51:59 PM10/24/19
to Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
This change adds accounting for the memory allocated for shadow stacks.

Signed-off-by: Sami Tolvanen <samito...@google.com>
---
drivers/base/node.c | 6 ++++++
fs/proc/meminfo.c | 4 ++++
include/linux/mmzone.h | 3 +++
kernel/scs.c | 20 ++++++++++++++++++++
mm/page_alloc.c | 6 ++++++
mm/vmstat.c | 3 +++
6 files changed, 42 insertions(+)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 296546ffed6c..111e58ec231e 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -415,6 +415,9 @@ static ssize_t node_read_meminfo(struct device *dev,
"Node %d AnonPages: %8lu kB\n"
"Node %d Shmem: %8lu kB\n"
"Node %d KernelStack: %8lu kB\n"
+#ifdef CONFIG_SHADOW_CALL_STACK
+ "Node %d ShadowCallStack:%8lu kB\n"
+#endif
"Node %d PageTables: %8lu kB\n"
"Node %d NFS_Unstable: %8lu kB\n"
"Node %d Bounce: %8lu kB\n"
@@ -438,6 +441,9 @@ static ssize_t node_read_meminfo(struct device *dev,
nid, K(node_page_state(pgdat, NR_ANON_MAPPED)),
nid, K(i.sharedram),
nid, sum_zone_node_page_state(nid, NR_KERNEL_STACK_KB),
+#ifdef CONFIG_SHADOW_CALL_STACK
+ nid, sum_zone_node_page_state(nid, NR_KERNEL_SCS_BYTES) / 1024,
+#endif
nid, K(sum_zone_node_page_state(nid, NR_PAGETABLE)),
nid, K(node_page_state(pgdat, NR_UNSTABLE_NFS)),
nid, K(sum_zone_node_page_state(nid, NR_BOUNCE)),
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 8c1f1bb1a5ce..49768005a79e 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -103,6 +103,10 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
show_val_kb(m, "SUnreclaim: ", sunreclaim);
seq_printf(m, "KernelStack: %8lu kB\n",
global_zone_page_state(NR_KERNEL_STACK_KB));
+#ifdef CONFIG_SHADOW_CALL_STACK
+ seq_printf(m, "ShadowCallStack:%8lu kB\n",
+ global_zone_page_state(NR_KERNEL_SCS_BYTES) / 1024);
+#endif
show_val_kb(m, "PageTables: ",
global_zone_page_state(NR_PAGETABLE));

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index bda20282746b..fcb8c1708f9e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -200,6 +200,9 @@ enum zone_stat_item {
NR_MLOCK, /* mlock()ed pages found and moved off LRU */
NR_PAGETABLE, /* used for pagetables */
NR_KERNEL_STACK_KB, /* measured in KiB */
+#if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
+ NR_KERNEL_SCS_BYTES, /* measured in bytes */
+#endif
/* Second 128 byte cacheline */
NR_BOUNCE,
#if IS_ENABLED(CONFIG_ZSMALLOC)
diff --git a/kernel/scs.c b/kernel/scs.c
index 383d29e8c199..b9e6e225254f 100644
--- a/kernel/scs.c
+++ b/kernel/scs.c
@@ -7,9 +7,11 @@

#include <linux/cpuhotplug.h>
#include <linux/mm.h>
+#include <linux/mmzone.h>
#include <linux/slab.h>
#include <linux/scs.h>
#include <linux/vmalloc.h>
+#include <linux/vmstat.h>
#include <asm/scs.h>

static inline void *__scs_base(struct task_struct *tsk)
@@ -59,6 +61,11 @@ static void scs_free(void *s)
vfree_atomic(s);
}

+static struct page *__scs_page(struct task_struct *tsk)
+{
+ return vmalloc_to_page(__scs_base(tsk));
+}
+
static int scs_cleanup(unsigned int cpu)
{
int i;
@@ -92,6 +99,11 @@ static inline void scs_free(void *s)
kmem_cache_free(scs_cache, s);
}

+static struct page *__scs_page(struct task_struct *tsk)
+{
+ return virt_to_page(__scs_base(tsk));
+}
+
void __init scs_init(void)
{
scs_cache = kmem_cache_create("scs_cache", SCS_SIZE, SCS_SIZE,
@@ -121,6 +133,12 @@ void scs_task_reset(struct task_struct *tsk)
task_set_scs(tsk, __scs_base(tsk));
}

+static void scs_account(struct task_struct *tsk, int account)
+{
+ mod_zone_page_state(page_zone(__scs_page(tsk)), NR_KERNEL_SCS_BYTES,
+ account * SCS_SIZE);
+}
+
int scs_prepare(struct task_struct *tsk, int node)
{
void *s;
@@ -131,6 +149,7 @@ int scs_prepare(struct task_struct *tsk, int node)

task_set_scs(tsk, s);
scs_set_magic(tsk);
+ scs_account(tsk, 1);

return 0;
}
@@ -150,6 +169,7 @@ void scs_release(struct task_struct *tsk)

WARN_ON(scs_corrupted(tsk));

+ scs_account(tsk, -1);
scs_task_init(tsk);
scs_free(s);
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ecc3dbad606b..fe17d69d98a7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5361,6 +5361,9 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
" managed:%lukB"
" mlocked:%lukB"
" kernel_stack:%lukB"
+#ifdef CONFIG_SHADOW_CALL_STACK
+ " shadow_call_stack:%lukB"
+#endif
" pagetables:%lukB"
" bounce:%lukB"
" free_pcp:%lukB"
@@ -5382,6 +5385,9 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
K(zone_managed_pages(zone)),
K(zone_page_state(zone, NR_MLOCK)),
zone_page_state(zone, NR_KERNEL_STACK_KB),
+#ifdef CONFIG_SHADOW_CALL_STACK
+ zone_page_state(zone, NR_KERNEL_SCS_BYTES) / 1024,
+#endif
K(zone_page_state(zone, NR_PAGETABLE)),
K(zone_page_state(zone, NR_BOUNCE)),
K(free_pcp),
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 6afc892a148a..9fe4afe670fe 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1118,6 +1118,9 @@ const char * const vmstat_text[] = {
"nr_mlock",
"nr_page_table_pages",
"nr_kernel_stack",
+#if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
+ "nr_shadow_call_stack_bytes",
+#endif
"nr_bounce",
#if IS_ENABLED(CONFIG_ZSMALLOC)
"nr_zspages",
--
2.24.0.rc0.303.g954a862665-goog

samito...@google.com

unread,
Oct 24, 2019, 6:52:03 PM10/24/19
to Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
Implements CONFIG_DEBUG_STACK_USAGE for shadow stacks.

Signed-off-by: Sami Tolvanen <samito...@google.com>
---
kernel/scs.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)

diff --git a/kernel/scs.c b/kernel/scs.c
index b9e6e225254f..a5bf7d12dc13 100644
--- a/kernel/scs.c
+++ b/kernel/scs.c
@@ -154,6 +154,44 @@ int scs_prepare(struct task_struct *tsk, int node)
return 0;
}

+#ifdef CONFIG_DEBUG_STACK_USAGE
+static inline unsigned long scs_used(struct task_struct *tsk)
+{
+ unsigned long *p = __scs_base(tsk);
+ unsigned long *end = scs_magic(tsk);
+ uintptr_t s = (uintptr_t)p;
+
+ while (p < end && *p)
+ p++;
+
+ return (uintptr_t)p - s;
+}
+
+static void scs_check_usage(struct task_struct *tsk)
+{
+ static DEFINE_SPINLOCK(lock);
+ static unsigned long highest;
+ unsigned long used = scs_used(tsk);
+
+ if (used <= highest)
+ return;
+
+ spin_lock(&lock);
+
+ if (used > highest) {
+ pr_info("%s: highest shadow stack usage %lu bytes\n",
+ __func__, used);
+ highest = used;
+ }
+
+ spin_unlock(&lock);
+}
+#else
+static inline void scs_check_usage(struct task_struct *tsk)
+{
+}
+#endif
+
bool scs_corrupted(struct task_struct *tsk)
{
return *scs_magic(tsk) != SCS_END_MAGIC;
@@ -168,6 +206,7 @@ void scs_release(struct task_struct *tsk)
return;

WARN_ON(scs_corrupted(tsk));
+ scs_check_usage(tsk);

scs_account(tsk, -1);
scs_task_init(tsk);
--
2.24.0.rc0.303.g954a862665-goog

samito...@google.com

unread,
Oct 24, 2019, 6:52:07 PM10/24/19
to Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
kprobe_on_func_entry and arch_kprobe_on_func_entry need to be available
even if CONFIG_KRETPROBES is not selected.

Signed-off-by: Sami Tolvanen <samito...@google.com>
Acked-by: Masami Hiramatsu <mhir...@kernel.org>
---
kernel/kprobes.c | 38 +++++++++++++++++++-------------------
1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 53534aa258a6..b5e20a4669b8 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1829,6 +1829,25 @@ unsigned long __weak arch_deref_entry_point(void *entry)
return (unsigned long)entry;
}

+bool __weak arch_kprobe_on_func_entry(unsigned long offset)
+{
+ return !offset;
+}
+
+bool kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset)
+{
+ kprobe_opcode_t *kp_addr = _kprobe_addr(addr, sym, offset);
+
+ if (IS_ERR(kp_addr))
+ return false;
+
+ if (!kallsyms_lookup_size_offset((unsigned long)kp_addr, NULL, &offset) ||
+ !arch_kprobe_on_func_entry(offset))
+ return false;
+
+ return true;
+}
+
#ifdef CONFIG_KRETPROBES
/*
* This kprobe pre_handler is registered with every kretprobe. When probe
@@ -1885,25 +1904,6 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
}
NOKPROBE_SYMBOL(pre_handler_kretprobe);

-bool __weak arch_kprobe_on_func_entry(unsigned long offset)
-{
- return !offset;
-}
-
-bool kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset)
-{
- kprobe_opcode_t *kp_addr = _kprobe_addr(addr, sym, offset);
-
- if (IS_ERR(kp_addr))
- return false;
-
- if (!kallsyms_lookup_size_offset((unsigned long)kp_addr, NULL, &offset) ||
- !arch_kprobe_on_func_entry(offset))
- return false;
-
- return true;
-}
-
int register_kretprobe(struct kretprobe *rp)
{
int ret = 0;
--
2.24.0.rc0.303.g954a862665-goog

samito...@google.com

unread,
Oct 24, 2019, 6:52:10 PM10/24/19
to Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
With CONFIG_FUNCTION_GRAPH_TRACER, function return addresses are
modified in ftrace_graph_caller and prepare_ftrace_return to redirect
control flow to ftrace_return_to_handler. This is incompatible with
SCS.

Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/arm64/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3f047afb982c..8cda176dad9a 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -148,7 +148,7 @@ config ARM64
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_TRACER
select HAVE_FUNCTION_ERROR_INJECTION
- select HAVE_FUNCTION_GRAPH_TRACER
+ select HAVE_FUNCTION_GRAPH_TRACER if !SHADOW_CALL_STACK
select HAVE_GCC_PLUGINS
select HAVE_HW_BREAKPOINT if PERF_EVENTS
select HAVE_IRQ_TIME_ACCOUNTING
--
2.24.0.rc0.303.g954a862665-goog

samito...@google.com

unread,
Oct 24, 2019, 6:52:16 PM10/24/19
to Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
With CONFIG_KRETPROBES, function return addresses are modified to
redirect control flow to kretprobe_trampoline. This is incompatible
with SCS.

Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/arm64/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8cda176dad9a..42867174920f 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -165,7 +165,7 @@ config ARM64
select HAVE_STACKPROTECTOR
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_KPROBES
- select HAVE_KRETPROBES
+ select HAVE_KRETPROBES if !SHADOW_CALL_STACK
select HAVE_GENERIC_VDSO
select IOMMU_DMA if IOMMU_SUPPORT
select IRQ_DOMAIN
--
2.24.0.rc0.303.g954a862665-goog

samito...@google.com

unread,
Oct 24, 2019, 6:52:20 PM10/24/19
to Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
Reserve the x18 register from general allocation when SCS is enabled,
because the compiler uses the register to store the current task's
shadow stack pointer. Note that all external kernel modules must also be
compiled with -ffixed-x18 if the kernel has SCS enabled.

Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/arm64/Makefile | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 2c0238ce0551..ef76101201b2 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -72,6 +72,10 @@ stack_protector_prepare: prepare0
include/generated/asm-offsets.h))
endif

+ifeq ($(CONFIG_SHADOW_CALL_STACK), y)
+KBUILD_CFLAGS += -ffixed-x18
+endif
+
ifeq ($(CONFIG_CPU_BIG_ENDIAN), y)
KBUILD_CPPFLAGS += -mbig-endian
CHECKFLAGS += -D__AARCH64EB__
--
2.24.0.rc0.303.g954a862665-goog

samito...@google.com

unread,
Oct 24, 2019, 6:52:25 PM10/24/19
to Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
Don't lose the current task's shadow stack when the CPU is suspended.

Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/arm64/include/asm/suspend.h | 2 +-
arch/arm64/mm/proc.S | 9 +++++++++
2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/suspend.h b/arch/arm64/include/asm/suspend.h
index 8939c87c4dce..0cde2f473971 100644
--- a/arch/arm64/include/asm/suspend.h
+++ b/arch/arm64/include/asm/suspend.h
@@ -2,7 +2,7 @@
#ifndef __ASM_SUSPEND_H
#define __ASM_SUSPEND_H

-#define NR_CTX_REGS 12
+#define NR_CTX_REGS 13
#define NR_CALLEE_SAVED_REGS 12

/*
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index fdabf40a83c8..0e7c353c9dfd 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -49,6 +49,8 @@
* cpu_do_suspend - save CPU registers context
*
* x0: virtual address of context pointer
+ *
+ * This must be kept in sync with struct cpu_suspend_ctx in <asm/suspend.h>.
*/
ENTRY(cpu_do_suspend)
mrs x2, tpidr_el0
@@ -73,6 +75,9 @@ alternative_endif
stp x8, x9, [x0, #48]
stp x10, x11, [x0, #64]
stp x12, x13, [x0, #80]
+#ifdef CONFIG_SHADOW_CALL_STACK
+ str x18, [x0, #96]
+#endif
ret
ENDPROC(cpu_do_suspend)

@@ -89,6 +94,10 @@ ENTRY(cpu_do_resume)
ldp x9, x10, [x0, #48]
ldp x11, x12, [x0, #64]
ldp x13, x14, [x0, #80]
+#ifdef CONFIG_SHADOW_CALL_STACK
+ ldr x18, [x0, #96]
+ str xzr, [x0, #96]
+#endif
msr tpidr_el0, x2
msr tpidrro_el0, x3
msr contextidr_el1, x4
--
2.24.0.rc0.303.g954a862665-goog

samito...@google.com

unread,
Oct 24, 2019, 6:52:27 PM10/24/19
to Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
If we detect a corrupted x18 and SCS is enabled, restore the register
before jumping back to instrumented code. This is safe, because the
wrapper is called with preemption disabled and a separate shadow stack
is used for interrupt handling.

Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/arm64/kernel/efi-rt-wrapper.S | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/efi-rt-wrapper.S b/arch/arm64/kernel/efi-rt-wrapper.S
index 3fc71106cb2b..945744f16086 100644
--- a/arch/arm64/kernel/efi-rt-wrapper.S
+++ b/arch/arm64/kernel/efi-rt-wrapper.S
@@ -34,5 +34,10 @@ ENTRY(__efi_rt_asm_wrapper)
ldp x29, x30, [sp], #32
b.ne 0f
ret
-0: b efi_handle_corrupted_x18 // tail call
+0:
+#ifdef CONFIG_SHADOW_CALL_STACK
+ /* Restore x18 before returning to instrumented code. */
+ mov x18, x2
+#endif
+ b efi_handle_corrupted_x18 // tail call
ENDPROC(__efi_rt_asm_wrapper)
--
2.24.0.rc0.303.g954a862665-goog

samito...@google.com

unread,
Oct 24, 2019, 6:52:30 PM10/24/19
to Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/arm64/kernel/vdso/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
index dd2514bb1511..a87a4f11724e 100644
--- a/arch/arm64/kernel/vdso/Makefile
+++ b/arch/arm64/kernel/vdso/Makefile
@@ -25,7 +25,7 @@ ccflags-y += -DDISABLE_BRANCH_PROFILING

VDSO_LDFLAGS := -Bsymbolic

-CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
+CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS)
KBUILD_CFLAGS += $(DISABLE_LTO)
KASAN_SANITIZE := n
UBSAN_SANITIZE := n
--
2.24.0.rc0.303.g954a862665-goog

samito...@google.com

unread,
Oct 24, 2019, 6:52:33 PM10/24/19
to Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
This allows CONFIG_KRETPROBES to be disabled without disabling
kprobes entirely.

Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/arm64/kernel/probes/kprobes.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index c4452827419b..98230ae979ca 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -551,6 +551,7 @@ void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs)
return (void *)orig_ret_address;
}

+#ifdef CONFIG_KRETPROBES
void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
struct pt_regs *regs)
{
@@ -564,6 +565,7 @@ int __kprobes arch_trampoline_kprobe(struct kprobe *p)
{
return 0;
}
+#endif

int __init arch_init_kprobes(void)
{
--
2.24.0.rc0.303.g954a862665-goog

samito...@google.com

unread,
Oct 24, 2019, 6:52:36 PM10/24/19
to Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/arm64/kvm/hyp/Makefile | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
index ea710f674cb6..8289ea086e5e 100644
--- a/arch/arm64/kvm/hyp/Makefile
+++ b/arch/arm64/kvm/hyp/Makefile
@@ -28,3 +28,6 @@ GCOV_PROFILE := n
KASAN_SANITIZE := n
UBSAN_SANITIZE := n
KCOV_INSTRUMENT := n
+
+ORIG_CFLAGS := $(KBUILD_CFLAGS)
+KBUILD_CFLAGS = $(subst $(CC_FLAGS_SCS),,$(ORIG_CFLAGS))
--
2.24.0.rc0.303.g954a862665-goog

samito...@google.com

unread,
Oct 24, 2019, 6:52:38 PM10/24/19
to Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Sami Tolvanen
This change implements shadow stack switching, initial SCS set-up,
and interrupt shadow stacks for arm64.

Signed-off-by: Sami Tolvanen <samito...@google.com>
---
arch/arm64/Kconfig | 5 ++++
arch/arm64/include/asm/scs.h | 45 ++++++++++++++++++++++++++++
arch/arm64/include/asm/stacktrace.h | 4 +++
arch/arm64/include/asm/thread_info.h | 3 ++
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/asm-offsets.c | 3 ++
arch/arm64/kernel/entry.S | 28 +++++++++++++++++
arch/arm64/kernel/head.S | 9 ++++++
arch/arm64/kernel/irq.c | 2 ++
arch/arm64/kernel/process.c | 2 ++
arch/arm64/kernel/scs.c | 39 ++++++++++++++++++++++++
arch/arm64/kernel/smp.c | 4 +++
12 files changed, 145 insertions(+)
create mode 100644 arch/arm64/include/asm/scs.h
create mode 100644 arch/arm64/kernel/scs.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 42867174920f..f4c94c5e8012 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -66,6 +66,7 @@ config ARM64
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
select ARCH_SUPPORTS_MEMORY_FAILURE
+ select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_INT128 if GCC_VERSION >= 50000 || CC_IS_CLANG
select ARCH_SUPPORTS_NUMA_BALANCING
@@ -948,6 +949,10 @@ config ARCH_HAS_CACHE_LINE_SIZE
config ARCH_ENABLE_SPLIT_PMD_PTLOCK
def_bool y if PGTABLE_LEVELS > 2

+# Supported by clang >= 7.0
+config CC_HAVE_SHADOW_CALL_STACK
+ def_bool $(cc-option, -fsanitize=shadow-call-stack -ffixed-x18)
+
config SECCOMP
bool "Enable seccomp to safely compute untrusted bytecode"
---help---
diff --git a/arch/arm64/include/asm/scs.h b/arch/arm64/include/asm/scs.h
new file mode 100644
index 000000000000..76dda1228935
--- /dev/null
+++ b/arch/arm64/include/asm/scs.h
@@ -0,0 +1,45 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_SCS_H
+#define _ASM_SCS_H
+
+#ifndef __ASSEMBLY__
+
+#include <linux/scs.h>
+
+#ifdef CONFIG_SHADOW_CALL_STACK
+
+extern void scs_init_irq(void);
+
+static __always_inline void scs_save(struct task_struct *tsk)
+{
+ void *s;
+
+ asm volatile("mov %0, x18" : "=r" (s));
+ task_set_scs(tsk, s);
+}
+
+static inline void scs_overflow_check(struct task_struct *tsk)
+{
+ if (unlikely(scs_corrupted(tsk)))
+ panic("corrupted shadow stack detected inside scheduler\n");
+}
+
+#else /* CONFIG_SHADOW_CALL_STACK */
+
+static inline void scs_init_irq(void)
+{
+}
+
+static inline void scs_save(struct task_struct *tsk)
+{
+}
+
+static inline void scs_overflow_check(struct task_struct *tsk)
+{
+}
+
+#endif /* CONFIG_SHADOW_CALL_STACK */
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* _ASM_SCS_H */
diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index 4d9b1f48dc39..b6cf32fb4efe 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -68,6 +68,10 @@ extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk);

DECLARE_PER_CPU(unsigned long *, irq_stack_ptr);

+#ifdef CONFIG_SHADOW_CALL_STACK
+DECLARE_PER_CPU(unsigned long *, irq_shadow_call_stack_ptr);
+#endif
+
static inline bool on_irq_stack(unsigned long sp,
struct stack_info *info)
{
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index f0cec4160136..8c73764b9ed2 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -41,6 +41,9 @@ struct thread_info {
#endif
} preempt;
};
+#ifdef CONFIG_SHADOW_CALL_STACK
+ void *shadow_call_stack;
+#endif
};

#define thread_saved_pc(tsk) \
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 478491f07b4f..b3995329d9e5 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -63,6 +63,7 @@ obj-$(CONFIG_CRASH_CORE) += crash_core.o
obj-$(CONFIG_ARM_SDE_INTERFACE) += sdei.o
obj-$(CONFIG_ARM64_SSBD) += ssbd.o
obj-$(CONFIG_ARM64_PTR_AUTH) += pointer_auth.o
+obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o

obj-y += vdso/ probes/
obj-$(CONFIG_COMPAT_VDSO) += vdso32/
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 214685760e1c..f6762b9ae1e1 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -33,6 +33,9 @@ int main(void)
DEFINE(TSK_TI_ADDR_LIMIT, offsetof(struct task_struct, thread_info.addr_limit));
#ifdef CONFIG_ARM64_SW_TTBR0_PAN
DEFINE(TSK_TI_TTBR0, offsetof(struct task_struct, thread_info.ttbr0));
+#endif
+#ifdef CONFIG_SHADOW_CALL_STACK
+ DEFINE(TSK_TI_SCS, offsetof(struct task_struct, thread_info.shadow_call_stack));
#endif
DEFINE(TSK_STACK, offsetof(struct task_struct, stack));
#ifdef CONFIG_STACKPROTECTOR
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index cf3bd2976e57..12a5bc209280 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -172,6 +172,10 @@ alternative_cb_end

apply_ssbd 1, x22, x23

+#ifdef CONFIG_SHADOW_CALL_STACK
+ ldr x18, [tsk, #TSK_TI_SCS] // Restore shadow call stack
+ str xzr, [tsk, #TSK_TI_SCS]
+#endif
.else
add x21, sp, #S_FRAME_SIZE
get_current_task tsk
@@ -278,6 +282,12 @@ alternative_else_nop_endif
ct_user_enter
.endif

+#ifdef CONFIG_SHADOW_CALL_STACK
+ .if \el == 0
+ str x18, [tsk, #TSK_TI_SCS] // Save shadow call stack
+ .endif
+#endif
+
#ifdef CONFIG_ARM64_SW_TTBR0_PAN
/*
* Restore access to TTBR0_EL1. If returning to EL0, no need for SPSR
@@ -383,6 +393,9 @@ alternative_insn eret, nop, ARM64_UNMAP_KERNEL_AT_EL0

.macro irq_stack_entry
mov x19, sp // preserve the original sp
+#ifdef CONFIG_SHADOW_CALL_STACK
+ mov x20, x18 // preserve the original shadow stack
+#endif

/*
* Compare sp with the base of the task stack.
@@ -400,6 +413,12 @@ alternative_insn eret, nop, ARM64_UNMAP_KERNEL_AT_EL0

/* switch to the irq stack */
mov sp, x26
+
+#ifdef CONFIG_SHADOW_CALL_STACK
+ /* also switch to the irq shadow stack */
+ ldr_this_cpu x18, irq_shadow_call_stack_ptr, x26
+#endif
+
9998:
.endm

@@ -409,6 +428,10 @@ alternative_insn eret, nop, ARM64_UNMAP_KERNEL_AT_EL0
*/
.macro irq_stack_exit
mov sp, x19
+#ifdef CONFIG_SHADOW_CALL_STACK
+ /* x20 is also preserved */
+ mov x18, x20
+#endif
.endm

/* GPRs used by entry code */
@@ -1155,6 +1178,11 @@ ENTRY(cpu_switch_to)
ldr lr, [x8]
mov sp, x9
msr sp_el0, x1
+#ifdef CONFIG_SHADOW_CALL_STACK
+ str x18, [x0, #TSK_TI_SCS]
+ ldr x18, [x1, #TSK_TI_SCS]
+ str xzr, [x1, #TSK_TI_SCS]
+#endif
ret
ENDPROC(cpu_switch_to)
NOKPROBE(cpu_switch_to)
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 989b1944cb71..2be977c6496f 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -27,6 +27,7 @@
#include <asm/pgtable-hwdef.h>
#include <asm/pgtable.h>
#include <asm/page.h>
+#include <asm/scs.h>
#include <asm/smp.h>
#include <asm/sysreg.h>
#include <asm/thread_info.h>
@@ -424,6 +425,10 @@ __primary_switched:
stp xzr, x30, [sp, #-16]!
mov x29, sp

+#ifdef CONFIG_SHADOW_CALL_STACK
+ adr_l x18, init_shadow_call_stack // Set shadow call stack
+#endif
+
str_l x21, __fdt_pointer, x5 // Save FDT pointer

ldr_l x4, kimage_vaddr // Save the offset between
@@ -731,6 +736,10 @@ __secondary_switched:
ldr x2, [x0, #CPU_BOOT_TASK]
cbz x2, __secondary_too_slow
msr sp_el0, x2
+#ifdef CONFIG_SHADOW_CALL_STACK
+ ldr x18, [x2, #TSK_TI_SCS] // Set shadow call stack
+ str xzr, [x2, #TSK_TI_SCS]
+#endif
mov x29, #0
mov x30, #0
b secondary_start_kernel
diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
index 04a327ccf84d..fe0ca522ff60 100644
--- a/arch/arm64/kernel/irq.c
+++ b/arch/arm64/kernel/irq.c
@@ -21,6 +21,7 @@
#include <linux/vmalloc.h>
#include <asm/daifflags.h>
#include <asm/vmap_stack.h>
+#include <asm/scs.h>

unsigned long irq_err_count;

@@ -63,6 +64,7 @@ static void init_irq_stacks(void)
void __init init_IRQ(void)
{
init_irq_stacks();
+ scs_init_irq();
irqchip_init();
if (!handle_arch_irq)
panic("No interrupt controller found.");
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 71f788cd2b18..5f0aec285848 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -52,6 +52,7 @@
#include <asm/mmu_context.h>
#include <asm/processor.h>
#include <asm/pointer_auth.h>
+#include <asm/scs.h>
#include <asm/stacktrace.h>

#if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_STACKPROTECTOR_PER_TASK)
@@ -507,6 +508,7 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
uao_thread_switch(next);
ptrauth_thread_switch(next);
ssbs_thread_switch(next);
+ scs_overflow_check(next);

/*
* Complete any pending TLB or cache maintenance on this CPU in case
diff --git a/arch/arm64/kernel/scs.c b/arch/arm64/kernel/scs.c
new file mode 100644
index 000000000000..6f255072c9a9
--- /dev/null
+++ b/arch/arm64/kernel/scs.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Shadow Call Stack support.
+ *
+ * Copyright (C) 2019 Google LLC
+ */
+
+#include <linux/percpu.h>
+#include <linux/vmalloc.h>
+#include <asm/scs.h>
+
+DEFINE_PER_CPU(unsigned long *, irq_shadow_call_stack_ptr);
+
+#ifndef CONFIG_SHADOW_CALL_STACK_VMAP
+DEFINE_PER_CPU(unsigned long [SCS_SIZE/sizeof(long)], irq_shadow_call_stack)
+ __aligned(SCS_SIZE);
+#endif
+
+void scs_init_irq(void)
+{
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+#ifdef CONFIG_SHADOW_CALL_STACK_VMAP
+ unsigned long *p;
+
+ p = __vmalloc_node_range(SCS_SIZE, SCS_SIZE,
+ VMALLOC_START, VMALLOC_END,
+ SCS_GFP, PAGE_KERNEL,
+ 0, cpu_to_node(cpu),
+ __builtin_return_address(0));
+
+ per_cpu(irq_shadow_call_stack_ptr, cpu) = p;
+#else
+ per_cpu(irq_shadow_call_stack_ptr, cpu) =
+ per_cpu(irq_shadow_call_stack, cpu);
+#endif /* CONFIG_SHADOW_CALL_STACK_VMAP */
+ }
+}
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index dc9fe879c279..cc1938a585d2 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -44,6 +44,7 @@
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
#include <asm/processor.h>
+#include <asm/scs.h>
#include <asm/smp_plat.h>
#include <asm/sections.h>
#include <asm/tlbflush.h>
@@ -357,6 +358,9 @@ void cpu_die(void)
{
unsigned int cpu = smp_processor_id();

+ /* Save the shadow stack pointer before exiting the idle task */
+ scs_save(current);
+
idle_task_exit();

local_daif_mask();
--
2.24.0.rc0.303.g954a862665-goog

Steven Rostedt

unread,
Oct 24, 2019, 9:20:19 PM10/24/19
to samito...@google.com, Will Deacon, Catalin Marinas, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org
On Thu, 24 Oct 2019 15:51:31 -0700
samito...@google.com wrote:

> Suggested-by: Steven Rostedt (VMware) <ros...@goodmis.org>

;-)
May want a comment above that stating:

# remove the SCS flags from all objects in this directory

-- Steve

Masahiro Yamada

unread,
Oct 24, 2019, 9:31:01 PM10/24/19
to Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, Jann Horn, Miguel Ojeda, clang-built-linux, Kernel Hardening, linux-arm-kernel, Linux Kernel Mailing List
$(subst ... ) is not the correct use here.

It works like sed, s/$(CC_FLAGS_SCS)//
instead of matching by word.




KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))

is more correct, and simpler.

Steven Rostedt

unread,
Oct 24, 2019, 9:43:04 PM10/24/19
to Masahiro Yamada, Sami Tolvanen, Will Deacon, Catalin Marinas, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, Jann Horn, Miguel Ojeda, clang-built-linux, Kernel Hardening, linux-arm-kernel, Linux Kernel Mailing List
I guess that would work too. Not sure why I never used it. I see mips
used it for their -pg flags.

-- Steve

Mark Rutland

unread,
Oct 25, 2019, 5:24:30 AM10/25/19
to samito...@google.com, Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org
On Thu, Oct 24, 2019 at 03:51:16PM -0700, samito...@google.com wrote:
> idmap_kpti_install_ng_mappings uses x18 as a temporary register, which
> will result in a conflict when x18 is reserved. Use x16 and x17 instead
> where needed.
>
> Signed-off-by: Sami Tolvanen <samito...@google.com>
> Reviewed-by: Nick Desaulniers <ndesau...@google.com>

AFAICT the new register assignment is sound, so FWIW:

Reviewed-by: Mark Rutland <mark.r...@arm.com>

I was going to suggest adding mnemonics for the remaining raw register
names, but after having a go locally I think it's cleaner as-is given
the registers are used in different widths for multiple purposes.

Thanks,
Mark.

Mark Rutland

unread,
Oct 25, 2019, 5:41:43 AM10/25/19
to samito...@google.com, Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org
For legibility, could we make the offset and bias explicit in the STNPs
so that these line up? e.g.

stnp x4, x5, [x0, #16 - 256]
ldp x4, x5, [x1, #16]

... that'd make it much easier to see by eye that this is sound, much as
I trust my mental arithmetic. ;)
... likewise here:

stnp xt1, xt2, [x0, #off - 256]

I don't see a nicer way to write this sequence without using an
additional register, so with those changes:

Reviewed-by: Mark Rutland <mark.r...@arm.com>

Thanks,
Mark.

>
> ret
> ENDPROC(copy_page)
> --
> 2.24.0.rc0.303.g954a862665-goog
>

Mark Rutland

unread,
Oct 25, 2019, 6:02:38 AM10/25/19
to samito...@google.com, Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org
Minor nit, but could we make the title a bit more specific (and more
uniform across the series)? e.g.

arm64: kvm: avoid x18 in __cpu_soft_restart

That makes things a bit nicer when trawling through git logs as the
scope of the patch is clearer.

On Thu, Oct 24, 2019 at 03:51:19PM -0700, samito...@google.com wrote:
> From: Ard Biesheuvel <ard.bie...@linaro.org>
>
> The code in __cpu_soft_restart() uses x18 as an arbitrary temp register,
> which will shortly be disallowed. So use x8 instead.
>
> Link: https://patchwork.kernel.org/patch/9836877/
> Signed-off-by: Ard Biesheuvel <ard.bie...@linaro.org>
> Signed-off-by: Sami Tolvanen <samito...@google.com>

Either way:

Reviewed-by: Mark Rutland <mark.r...@arm.com>

Mark Rutland

unread,
Oct 25, 2019, 6:56:49 AM10/25/19
to samito...@google.com, Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org
Huh. I didn't realise it was valid to have a space after the `#` like
this. I see we're very inconsistent about style on that front, so this
is fine, I'll just have to get used to it. :)
I think it'd be worth a comment on how this size was chosen. IIRC this is
empirical?

> +#define SCS_END_MAGIC 0xaf0194819b1635f6UL

Keyboard smash? ... or is there a prize for whoever figures out the
secret? ;)

> +
> +#define GFP_SCS (GFP_KERNEL | __GFP_ZERO)
> +
> +static inline void *task_scs(struct task_struct *tsk)
> +{
> + return task_thread_info(tsk)->shadow_call_stack;
> +}
> +
> +static inline void task_set_scs(struct task_struct *tsk, void *s)
> +{
> + task_thread_info(tsk)->shadow_call_stack = s;
> +}

This should probably be named get and set, or have:

#define task_scs(tsk) (task_thread_info(tsk)->shadow_call_stack)

... which can have a trivial implementation as NULL for the !SCS case.
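
i.e. something like (sketch only, same names as above):

#ifdef CONFIG_SHADOW_CALL_STACK
#define task_scs(tsk)	(task_thread_info(tsk)->shadow_call_stack)
#else
#define task_scs(tsk)	((void *)NULL)
#endif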

> +
> +extern void scs_init(void);
> +extern void scs_task_init(struct task_struct *tsk);
> +extern void scs_task_reset(struct task_struct *tsk);
> +extern int scs_prepare(struct task_struct *tsk, int node);
> +extern bool scs_corrupted(struct task_struct *tsk);
> +extern void scs_release(struct task_struct *tsk);
> +
> +#else /* CONFIG_SHADOW_CALL_STACK */
> +
> +static inline void *task_scs(struct task_struct *tsk)
> +{
> + return 0;
> +}

For all the trivial wrappers you can put the implementation on the same
line as the prototype. That makes it a bit easier to compare against the
prototypes on the other side of the ifdeffery.

e.g. this lot can be:

static inline void *task_scs(struct task_struct *tsk) { return 0; }
static inline void task_set_scs(struct task_struct *tsk, void *s) { }
static inline void scs_init(void) { }
...
Nit: alphabetical order, please (this should come before stackleak.h).
Can we please fold scs_prepare() into scs_task_init() and do this in
dup_task_struct()? That way we set this up consistently in one place,
where we're also allocating the regular stack.
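
i.e. have one helper that both allocates the shadow stack and installs the
end magic, called from dup_task_struct() next to the regular stack setup.
Something like (only a sketch, built from the helpers in this patch; exact
error handling and accounting aside):

int scs_task_init(struct task_struct *tsk, int node)
{
	void *s = scs_alloc(node);

	if (!s)
		return -ENOMEM;

	task_set_scs(tsk, s);
	*scs_magic(tsk) = SCS_END_MAGIC;

	return 0;
}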

Arguably stackleak_task_init() would better fit there too.

>
> stackleak_task_init(p);
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index dd05a378631a..e7faeb383008 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6013,6 +6013,8 @@ void init_idle(struct task_struct *idle, int cpu)
> raw_spin_lock_irqsave(&idle->pi_lock, flags);
> raw_spin_lock(&rq->lock);
>
> + scs_task_reset(idle);

I'm a bit confused by this -- please see comments below on
scs_task_reset().
Nit: alphabetical order, please.

> +#include <linux/vmalloc.h>
> +#include <asm/scs.h>
> +
> +static inline void *__scs_base(struct task_struct *tsk)
> +{
> + return (void *)((uintptr_t)task_scs(tsk) & ~(SCS_SIZE - 1));
> +}

We only ever assign the base to task_scs(tsk), with the current live
value being in a register that we don't read. Are we expecting arch code
to keep this up-to-date with the register value?

I would have expected that we just leave this as the base (as we do for
the regular stack in the task struct), and it's down to arch code to
save/restore the current value where necessary.

Am I missing some caveat with that approach?

> +
> +#ifdef CONFIG_SHADOW_CALL_STACK_VMAP
> +
> +/* Keep a cache of shadow stacks */
> +#define SCS_CACHE_SIZE 2
> +static DEFINE_PER_CPU(void *, scs_cache[SCS_CACHE_SIZE]);
> +
> +static void *scs_alloc(int node)
> +{
> + int i;
> +
> + for (i = 0; i < SCS_CACHE_SIZE; i++) {
> + void *s;
> +
> + s = this_cpu_xchg(scs_cache[i], NULL);
> + if (s) {
> + memset(s, 0, SCS_SIZE);
> + return s;
> + }
> + }
> +
> + BUILD_BUG_ON(SCS_SIZE > PAGE_SIZE);

It's probably worth a comment on why we rely on SCS_SIZE <= PAGE_SIZE.
Slightly simpler as:

return (unsigned long *)(__scs_base(tsk) + SCS_SIZE) - 1;

Thanks,
Mark.

Mark Rutland

unread,
Oct 25, 2019, 7:03:19 AM10/25/19
to samito...@google.com, Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-bu...@googlegroups.com, kernel-h...@lists.openwall.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org
On Thu, Oct 24, 2019 at 03:51:24PM -0700, samito...@google.com wrote:
> With CONFIG_FUNCTION_GRAPH_TRACER, function return addresses are
> modified in ftrace_graph_caller and prepare_ftrace_return to redirect
> control flow to ftrace_return_to_handler. This is incompatible with
> SCS.

I'm guessing it's difficult to always figure out the SCS slot for an
instrumented callsite unless we pass this explicitly from the ftrace
entry code, so we'd probably have to change some common infrastructure
for that.

We have a similar issue with pointer authentication, and we're solving
that with -fpatchable-function-entry, which allows us to hook the
callsite before it does anything with the return address. IIUC we could
use the same mechanism here (and avoid introducing a third).

Are there plans to implement -fpatchable-function-entry on the clang
side?

Thanks,
Mark.

Nick Desaulniers

unread,
Oct 25, 2019, 12:22:43 PM10/25/19
to Sami Tolvanen, Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-built-linux, Kernel Hardening, Linux ARM, LKML
prefer:

for ...:
  if foo() == 0:
    return

to:

for ...:
  if foo() != 0:
    continue
  return
--
Thanks,
~Nick Desaulniers

Sami Tolvanen

unread,
Oct 25, 2019, 3:24:47 PM10/25/19
to Masahiro Yamada, Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Mark Rutland, Nick Desaulniers, Jann Horn, Miguel Ojeda, clang-built-linux, Kernel Hardening, linux-arm-kernel, Linux Kernel Mailing List
On Thu, Oct 24, 2019 at 6:31 PM Masahiro Yamada
<yamada....@socionext.com> wrote:
> $(subst ... ) is not the correct use here.
>
> It works like sed, s/$(CC_FLAGS_SCS)//
> instead of matching by word.
>
>
>
>
> KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
>
> is more correct, and simpler.

Thanks, I will change this in v3.

Sami

Sami Tolvanen

unread,
Oct 25, 2019, 4:49:34 PM10/25/19
to Mark Rutland, Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu, Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott, Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada, clang-built-linux, Kernel Hardening, linux-arm-kernel, LKML
On Fri, Oct 25, 2019 at 3:56 AM Mark Rutland <mark.r...@arm.com> wrote:
> > +#define SCS_SIZE 1024
>
> I think it'd be worth a comment on how this size was chosen. IIRC this is
> empirical?

Correct. I'll add a comment.

> > +#define SCS_END_MAGIC 0xaf0194819b1635f6UL
>
> Keyboard smash? ... or is there a prize for whoever figures out the
> secret? ;)

It's a random number, so if someone figures out a secret in it,
they'll definitely deserve a prize. :)

> > +static inline void task_set_scs(struct task_struct *tsk, void *s)
> > +{
> > + task_thread_info(tsk)->shadow_call_stack = s;
> > +}
>
> This should probably be named get and set, or have:
>
> #define task_scs(tsk) (task_thread_info(tsk)->shadow_call_stack)
>
> ... which can have a trivial implementation as NULL for the !SCS case.

Sure, sounds good.

> For all the trivial wrappers you can put the implementation on the same
> line as the prototype. That makes it a bit easier to compare against the
> prototypes on the other side of the ifdeffery.
>
> e.g. this lot can be:
>
> static inline void *task_scs(struct task_struct *tsk) { return 0; }
> static inline void task_set_scs(struct task_struct *tsk, void *s) { }
> static inline void scs_init(void) { }

Agreed.

> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index bcdf53125210..ae7ebe9f0586 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -94,6 +94,7 @@
> > #include <linux/livepatch.h>
> > #include <linux/thread_info.h>
> > #include <linux/stackleak.h>
> > +#include <linux/scs.h>
>
> Nit: alphabetical order, please (this should come before stackleak.h).

The includes in kernel/fork.c aren't in alphabetical order, so I just
added this to the end here.

> > + retval = scs_prepare(p, node);
> > + if (retval)
> > + goto bad_fork_cleanup_thread;
>
> Can we please fold scs_prepare() into scs_task_init() and do this in
> dup_task_struct()? That way we set this up consistently in one place,
> where we're also allocating the regular stack.

Yes, that does sound cleaner. I'll change that.

> > + scs_task_reset(idle);
>
> I'm a bit confused by this -- please see comments below on
> scs_task_reset().

> > +static inline void *__scs_base(struct task_struct *tsk)
> > +{
> > + return (void *)((uintptr_t)task_scs(tsk) & ~(SCS_SIZE - 1));
> > +}
>
> We only ever assign the base to task_scs(tsk), with the current live
> value being in a register that we don't read. Are we expecting arch code
> to keep this up-to-date with the register value?
>
> I would have expected that we just leave this as the base (as we do for
> the regular stack in the task struct), and it's down to arch code to
> save/restore the current value where necessary.
>
> Am I missing some caveat with that approach?

To keep the address of the currently active shadow stack out of
memory, the arm64 implementation clears this field when it loads x18
and saves the current value before a context switch. The generic code
doesn't expect the arch code to necessarily do so, but does allow it.
This requires us to use __scs_base() when accessing the base pointer
and to reset it in idle tasks before they're reused, hence
scs_task_reset().
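
In other words, resetting just points the field back at the base again,
roughly:

void scs_task_reset(struct task_struct *tsk)
{
	task_set_scs(tsk, __scs_base(tsk));
}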

> > + BUILD_BUG_ON(SCS_SIZE > PAGE_SIZE);
>
> It's probably worth a comment on why we rely on SCS_SIZE <= PAGE_SIZE.

Ack.

> > +static inline unsigned long *scs_magic(struct task_struct *tsk)
> > +{
> > + return (unsigned long *)(__scs_base(tsk) + SCS_SIZE - sizeof(long));
>
> Slightly simpler as:
>
> return (unsigned long *)(__scs_base(tsk) + SCS_SIZE) - 1;

Yes, that's a bit cleaner.

I'll fix these in v3. Thank you for the review!

Sami