[PATCH v2 1/3] compiler_types: Introduce the Clang __preserve_most function attribute


Marco Elver

Aug 4, 2023, 5:06:52 AM
to el...@google.com, Andrew Morton, Kees Cook, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
[1]: "On X86-64 and AArch64 targets, this attribute changes the calling
convention of a function. The preserve_most calling convention attempts
to make the code in the caller as unintrusive as possible. This
convention behaves identically to the C calling convention on how
arguments and return values are passed, but it uses a different set of
caller/callee-saved registers. This alleviates the burden of saving and
recovering a large register set before and after the call in the
caller."

[1] https://clang.llvm.org/docs/AttributeReference.html#preserve-most

Introduce the attribute to compiler_types.h as __preserve_most.

Use of this attribute results in better code generation for calls to
very rarely called functions, such as error-reporting functions, or
rarely executed slow paths.
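
As a rough illustration of the intended usage (a sketch only -- the function
names below are hypothetical and not part of this patch), a rarely taken
reporting path can be marked so that its hot-path callers no longer have to
spill live registers around the call:

  /* Rarely taken reporting slow path; almost never called from hot code. */
  extern void __preserve_most __cold hypothetical_report_error(const void *addr);

  static inline void hypothetical_check(const void *addr, unsigned long val)
  {
          /*
           * Hot path: because the callee is __preserve_most, the caller may
           * keep its live values in registers across this call instead of
           * saving and restoring them.
           */
          if (unlikely(!val))
                  hypothetical_report_error(addr);
  }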

Beware that the attribute conflicts with instrumentation calls inserted
on function entry which do not use __preserve_most themselves. Notably,
function tracing which assumes the normal C calling convention for the
given architecture. Where the attribute is supported, __preserve_most
will imply notrace. It is recommended to restrict use of the attribute
to functions that should or already disable tracing.

Signed-off-by: Marco Elver <el...@google.com>
---
v2:
* Imply notrace, to avoid any conflicts with tracing which is inserted
on function entry. See added comments.
---
include/linux/compiler_types.h | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)

diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 547ea1ff806e..12c4540335b7 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -106,6 +106,33 @@ static inline void __chk_io_ptr(const volatile void __iomem *ptr) { }
#define __cold
#endif

+/*
+ * On x86-64 and arm64 targets, __preserve_most changes the calling convention
+ * of a function to make the code in the caller as unintrusive as possible. This
+ * convention behaves identically to the C calling convention on how arguments
+ * and return values are passed, but uses a different set of caller- and callee-
+ * saved registers.
+ *
+ * The purpose is to alleviate the burden of saving and recovering a large
+ * register set before and after the call in the caller. This is beneficial for
+ * rarely taken slow paths, such as error-reporting functions that may be called
+ * from hot paths.
+ *
+ * Note: This may conflict with instrumentation inserted on function entry which
+ * does not use __preserve_most or equivalent convention (if in assembly). Since
+ * function tracing assumes the normal C calling convention, where the attribute
+ * is supported, __preserve_most implies notrace.
+ *
+ * Optional: not supported by gcc.
+ *
+ * clang: https://clang.llvm.org/docs/AttributeReference.html#preserve-most
+ */
+#if __has_attribute(__preserve_most__)
+# define __preserve_most notrace __attribute__((__preserve_most__))
+#else
+# define __preserve_most
+#endif
+
/* Builtins */

/*
--
2.41.0.640.ga95def55d0-goog

Marco Elver

Aug 4, 2023, 5:06:55 AM
to el...@google.com, Andrew Morton, Kees Cook, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
Turn the list debug checking functions __list_*_valid() into inline
functions that wrap the out-of-line functions. Care is taken to ensure
the inline wrappers are always inlined, so that additional compiler
instrumentation (such as sanitizers) does not result in redundant
outlining.

This change is preparation for performing checks in the inline wrappers.

No functional change intended.

Signed-off-by: Marco Elver <el...@google.com>
---
arch/arm64/kvm/hyp/nvhe/list_debug.c | 6 +++---
include/linux/list.h | 15 +++++++++++++--
lib/list_debug.c | 11 +++++------
3 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/list_debug.c b/arch/arm64/kvm/hyp/nvhe/list_debug.c
index d68abd7ea124..589284496ac5 100644
--- a/arch/arm64/kvm/hyp/nvhe/list_debug.c
+++ b/arch/arm64/kvm/hyp/nvhe/list_debug.c
@@ -26,8 +26,8 @@ static inline __must_check bool nvhe_check_data_corruption(bool v)

/* The predicates checked here are taken from lib/list_debug.c. */

-bool __list_add_valid(struct list_head *new, struct list_head *prev,
- struct list_head *next)
+bool ___list_add_valid(struct list_head *new, struct list_head *prev,
+ struct list_head *next)
{
if (NVHE_CHECK_DATA_CORRUPTION(next->prev != prev) ||
NVHE_CHECK_DATA_CORRUPTION(prev->next != next) ||
@@ -37,7 +37,7 @@ bool __list_add_valid(struct list_head *new, struct list_head *prev,
return true;
}

-bool __list_del_entry_valid(struct list_head *entry)
+bool ___list_del_entry_valid(struct list_head *entry)
{
struct list_head *prev, *next;

diff --git a/include/linux/list.h b/include/linux/list.h
index f10344dbad4d..e0b2cf904409 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -39,10 +39,21 @@ static inline void INIT_LIST_HEAD(struct list_head *list)
}

#ifdef CONFIG_DEBUG_LIST
-extern bool __list_add_valid(struct list_head *new,
+extern bool ___list_add_valid(struct list_head *new,
struct list_head *prev,
struct list_head *next);
-extern bool __list_del_entry_valid(struct list_head *entry);
+static __always_inline bool __list_add_valid(struct list_head *new,
+ struct list_head *prev,
+ struct list_head *next)
+{
+ return ___list_add_valid(new, prev, next);
+}
+
+extern bool ___list_del_entry_valid(struct list_head *entry);
+static __always_inline bool __list_del_entry_valid(struct list_head *entry)
+{
+ return ___list_del_entry_valid(entry);
+}
#else
static inline bool __list_add_valid(struct list_head *new,
struct list_head *prev,
diff --git a/lib/list_debug.c b/lib/list_debug.c
index d98d43f80958..fd69009cc696 100644
--- a/lib/list_debug.c
+++ b/lib/list_debug.c
@@ -17,8 +17,8 @@
* attempt).
*/

-bool __list_add_valid(struct list_head *new, struct list_head *prev,
- struct list_head *next)
+bool ___list_add_valid(struct list_head *new, struct list_head *prev,
+ struct list_head *next)
{
if (CHECK_DATA_CORRUPTION(prev == NULL,
"list_add corruption. prev is NULL.\n") ||
@@ -37,9 +37,9 @@ bool __list_add_valid(struct list_head *new, struct list_head *prev,

return true;
}
-EXPORT_SYMBOL(__list_add_valid);
+EXPORT_SYMBOL(___list_add_valid);

-bool __list_del_entry_valid(struct list_head *entry)
+bool ___list_del_entry_valid(struct list_head *entry)
{
struct list_head *prev, *next;

@@ -65,6 +65,5 @@ bool __list_del_entry_valid(struct list_head *entry)
return false;

return true;
-
}
-EXPORT_SYMBOL(__list_del_entry_valid);
+EXPORT_SYMBOL(___list_del_entry_valid);
--
2.41.0.640.ga95def55d0-goog

Marco Elver

Aug 4, 2023, 5:06:57 AM
to el...@google.com, Andrew Morton, Kees Cook, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
Numerous production kernel configs (see [1, 2]) are choosing to enable
CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened
configs [3]. The feature has never been designed with performance in
mind, yet common list manipulation is happening across hot paths all
over the kernel.

Introduce CONFIG_DEBUG_LIST_MINIMAL, which performs list pointer
checking inline, and only upon list corruption delegates to the
reporting slow path.

To generate optimal machine code with CONFIG_DEBUG_LIST_MINIMAL:

1. Elide checking for pointer values which upon dereference would
result in an immediate access fault -- therefore "minimal" checks.
The trade-off is lower-quality error reports.

2. Use the newly introduced __preserve_most function attribute
(available with Clang, but not yet with GCC) to minimize the code
footprint for calling the reporting slow path. As a result,
function size of callers is reduced by avoiding saving registers
before calling the rarely called reporting slow path.

Note that all TUs in lib/Makefile already disable function tracing,
including list_debug.c, and __preserve_most's implied notrace has
no effect in this case.

3. Because the inline checks are a subset of the full set of checks in
___list_*_valid(), always return false if the inline checks failed.
This avoids a redundant compare and conditional branch right after
returning from the slow path.

As a side-effect of the checks being inline, if the compiler can prove
some condition to always be true, it can completely elide some checks.
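
For example (purely illustrative, not part of the patch): if a list was just
initialized in the same function, the inlined pre-conditions are visible to
the compiler and it has a good chance of dropping both the check and the
slow-path call:

  void hypothetical_example(struct list_head *node)
  {
          LIST_HEAD(q);   /* q.next == q.prev == &q, known to the compiler */

          /*
           * With CONFIG_DEBUG_LIST_MINIMAL, the inlined pre-conditions
           * (next->prev == prev && prev->next == next && ...) can be proven
           * to hold for the freshly initialized list, so ideally no check
           * and no call to the reporting slow path is emitted for this
           * list_add().
           */
          list_add(node, &q);
  }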

Running netperf with CONFIG_DEBUG_LIST_MINIMAL (using a Clang compiler
with "preserve_most") shows throughput improvements, in my case of ~7%
on average (up to 20-30% on some test cases).

Link: https://r.android.com/1266735 [1]
Link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/main/config [2]
Link: https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project/Recommended_Settings [3]
Signed-off-by: Marco Elver <el...@google.com>
---
v2:
* Note that lib/Makefile disables function tracing for everything and
__preserve_most's implied notrace is a noop here.
---
arch/arm64/kvm/hyp/nvhe/list_debug.c | 2 +
include/linux/list.h | 56 +++++++++++++++++++++++++---
lib/Kconfig.debug | 15 ++++++++
lib/list_debug.c | 2 +
4 files changed, 69 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/list_debug.c b/arch/arm64/kvm/hyp/nvhe/list_debug.c
index 589284496ac5..df718e29f6d4 100644
--- a/arch/arm64/kvm/hyp/nvhe/list_debug.c
+++ b/arch/arm64/kvm/hyp/nvhe/list_debug.c
@@ -26,6 +26,7 @@ static inline __must_check bool nvhe_check_data_corruption(bool v)

/* The predicates checked here are taken from lib/list_debug.c. */

+__list_valid_slowpath
bool ___list_add_valid(struct list_head *new, struct list_head *prev,
struct list_head *next)
{
@@ -37,6 +38,7 @@ bool ___list_add_valid(struct list_head *new, struct list_head *prev,
return true;
}

+__list_valid_slowpath
bool ___list_del_entry_valid(struct list_head *entry)
{
struct list_head *prev, *next;
diff --git a/include/linux/list.h b/include/linux/list.h
index e0b2cf904409..a28a215a3eb1 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -39,20 +39,64 @@ static inline void INIT_LIST_HEAD(struct list_head *list)
}

#ifdef CONFIG_DEBUG_LIST
-extern bool ___list_add_valid(struct list_head *new,
- struct list_head *prev,
- struct list_head *next);
+
+#ifdef CONFIG_DEBUG_LIST_MINIMAL
+# define __list_valid_slowpath __cold __preserve_most
+#else
+# define __list_valid_slowpath
+#endif
+
+extern bool __list_valid_slowpath ___list_add_valid(struct list_head *new,
+ struct list_head *prev,
+ struct list_head *next);
static __always_inline bool __list_add_valid(struct list_head *new,
struct list_head *prev,
struct list_head *next)
{
- return ___list_add_valid(new, prev, next);
+ bool ret = true;
+
+ if (IS_ENABLED(CONFIG_DEBUG_LIST_MINIMAL)) {
+ /*
+ * In the minimal config, elide checking if next and prev are
+ * NULL, since the immediate dereference of them below would
+ * result in a fault if NULL.
+ *
+ * With the minimal config we can afford to inline the checks,
+ * which also gives the compiler a chance to elide some of them
+ * completely if they can be proven at compile-time. If one of
+ * the pre-conditions does not hold, the slow-path will show a
+ * report which pre-condition failed.
+ */
+ if (likely(next->prev == prev && prev->next == next && new != prev && new != next))
+ return true;
+ ret = false;
+ }
+
+ ret &= ___list_add_valid(new, prev, next);
+ return ret;
}

-extern bool ___list_del_entry_valid(struct list_head *entry);
+extern bool __list_valid_slowpath ___list_del_entry_valid(struct list_head *entry);
static __always_inline bool __list_del_entry_valid(struct list_head *entry)
{
- return ___list_del_entry_valid(entry);
+ bool ret = true;
+
+ if (IS_ENABLED(CONFIG_DEBUG_LIST_MINIMAL)) {
+ struct list_head *prev = entry->prev;
+ struct list_head *next = entry->next;
+
+ /*
+ * In the minimal config, elide checking if next and prev are
+ * NULL, LIST_POISON1 or LIST_POISON2, since the immediate
+ * dereference of them below would result in a fault.
+ */
+ if (likely(prev->next == entry && next->prev == entry))
+ return true;
+ ret = false;
+ }
+
+ ret &= ___list_del_entry_valid(entry);
+ return ret;
}
#else
static inline bool __list_add_valid(struct list_head *new,
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index fbc89baf7de6..e72cf08af0fa 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1680,6 +1680,21 @@ config DEBUG_LIST

If unsure, say N.

+config DEBUG_LIST_MINIMAL
+ bool "Minimal linked list debug checks"
+ default !DEBUG_KERNEL
+ depends on DEBUG_LIST
+ help
+ Only perform the minimal set of checks in the linked-list walking
+ routines to catch corruptions that are not guaranteed to result in an
+ immediate access fault.
+
+ This trades lower quality error reports for improved performance: the
+ generated code should be more optimal and provide trade-offs that may
+ better serve safety- and performance-critical environments.
+
+ If unsure, say Y.
+
config DEBUG_PLIST
bool "Debug priority linked list manipulation"
depends on DEBUG_KERNEL
diff --git a/lib/list_debug.c b/lib/list_debug.c
index fd69009cc696..daad32855f0d 100644
--- a/lib/list_debug.c
+++ b/lib/list_debug.c
@@ -17,6 +17,7 @@
* attempt).
*/

+__list_valid_slowpath
bool ___list_add_valid(struct list_head *new, struct list_head *prev,
struct list_head *next)
{
@@ -39,6 +40,7 @@ bool ___list_add_valid(struct list_head *new, struct list_head *prev,
}
EXPORT_SYMBOL(___list_add_valid);

+__list_valid_slowpath
bool ___list_del_entry_valid(struct list_head *entry)
{
struct list_head *prev, *next;
--
2.41.0.640.ga95def55d0-goog

Marco Elver

Aug 4, 2023, 1:50:28 PM
to Steven Rostedt, Andrew Morton, Kees Cook, Guenter Roeck, Peter Zijlstra, Mark Rutland, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
On Fri, 4 Aug 2023 at 18:03, Steven Rostedt <ros...@goodmis.org> wrote:
>
> On Fri, 4 Aug 2023 11:02:57 +0200
> Marco Elver <el...@google.com> wrote:
>
> > Turn the list debug checking functions __list_*_valid() into inline
> > functions that wrap the out-of-line functions. Care is taken to ensure
> > the inline wrappers are always inlined, so that additional compiler
> > instrumentation (such as sanitizers) does not result in redundant
> > outlining.
> >
> > This change is preparation for performing checks in the inline wrappers.
> >
> > No functional change intended.
>
> I think the entire underscoring functions calling more underscoring
> functions in the kernel is an abomination. Yes, there's lots of precedence
> to this craziness, but let's not extend it.
>
> Can we give actual real names to why the function is "special" besides that
> it now has another underscore added to it?
>
> I've been guilty of this madness myself, but I have learned the errors of
> my ways, and have been avoiding doing so in any new code I write.

That's fair. We can call them __list_*_valid() (inline), and
__list_*_valid_or_report() ?

Marco Elver

Aug 4, 2023, 2:08:50 PM
to Steven Rostedt, Andrew Morton, Kees Cook, Guenter Roeck, Peter Zijlstra, Mark Rutland, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
On Fri, 4 Aug 2023 at 19:59, Steven Rostedt <ros...@goodmis.org> wrote:
>
> On Fri, 4 Aug 2023 13:57:57 -0400
> Steven Rostedt <ros...@goodmis.org> wrote:
> > __list_*_valid_check() ?

Well, in patch 3/3, the inline function will also do a reduced set of
checks, so "valid_check" is also misleading because both will do
checks.

The key distinguishing thing between the inline and non-inline version
is that the non-inline version will check more things, and also
produce reports.

So I can see

1. __list_*_valid_or_report()
2. __list_*_full_valid()

To be appropriate. Preference?

> I have to admit, I think the main reason kernel developers default to using
> these useless underscores is because kernel developers are notoriously
> lousy at naming. ;-)

Heh, naming is hard. ;-)

Peter Zijlstra

Aug 4, 2023, 2:14:56 PM
to Marco Elver, Andrew Morton, Kees Cook, Guenter Roeck, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
This seems to shrink the ARM64 vmlinux just a little and mirrors what we
do on x86 in asm. I'll leave it to the arm64 people to judge if this is
worth the hassle.

Index: linux-2.6/arch/arm64/include/asm/preempt.h
===================================================================
--- linux-2.6.orig/arch/arm64/include/asm/preempt.h
+++ linux-2.6/arch/arm64/include/asm/preempt.h
@@ -88,15 +88,18 @@ void preempt_schedule_notrace(void);
#ifdef CONFIG_PREEMPT_DYNAMIC

DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
-void dynamic_preempt_schedule(void);
+void __preserve_most dynamic_preempt_schedule(void);
#define __preempt_schedule() dynamic_preempt_schedule()
-void dynamic_preempt_schedule_notrace(void);
+void __preserve_most dynamic_preempt_schedule_notrace(void);
#define __preempt_schedule_notrace() dynamic_preempt_schedule_notrace()

#else /* CONFIG_PREEMPT_DYNAMIC */

-#define __preempt_schedule() preempt_schedule()
-#define __preempt_schedule_notrace() preempt_schedule_notrace()
+void __preserve_most preserve_preempt_schedule(void);
+void __preserve_most preserve_preempt_schedule_notrace(void);
+
+#define __preempt_schedule() preserve_preempt_schedule()
+#define __preempt_schedule_notrace() preserve_preempt_schedule_notrace()

#endif /* CONFIG_PREEMPT_DYNAMIC */
#endif /* CONFIG_PREEMPTION */
Index: linux-2.6/kernel/sched/core.c
===================================================================
--- linux-2.6.orig/kernel/sched/core.c
+++ linux-2.6/kernel/sched/core.c
@@ -6915,7 +6915,7 @@ DEFINE_STATIC_CALL(preempt_schedule, pre
EXPORT_STATIC_CALL_TRAMP(preempt_schedule);
#elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
static DEFINE_STATIC_KEY_TRUE(sk_dynamic_preempt_schedule);
-void __sched notrace dynamic_preempt_schedule(void)
+void __sched __preserve_most dynamic_preempt_schedule(void)
{
if (!static_branch_unlikely(&sk_dynamic_preempt_schedule))
return;
@@ -6924,6 +6924,11 @@ void __sched notrace dynamic_preempt_sch
NOKPROBE_SYMBOL(dynamic_preempt_schedule);
EXPORT_SYMBOL(dynamic_preempt_schedule);
#endif
+#else
+void __sched __preserve_most preserve_preempt_schedule(void)
+{
+ preempt_schedule();
+}
#endif

/**
@@ -6988,7 +6993,7 @@ DEFINE_STATIC_CALL(preempt_schedule_notr
EXPORT_STATIC_CALL_TRAMP(preempt_schedule_notrace);
#elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
static DEFINE_STATIC_KEY_TRUE(sk_dynamic_preempt_schedule_notrace);
-void __sched notrace dynamic_preempt_schedule_notrace(void)
+void __sched __preserve_most dynamic_preempt_schedule_notrace(void)
{
if (!static_branch_unlikely(&sk_dynamic_preempt_schedule_notrace))
return;
@@ -6997,6 +7002,11 @@ void __sched notrace dynamic_preempt_sch
NOKPROBE_SYMBOL(dynamic_preempt_schedule_notrace);
EXPORT_SYMBOL(dynamic_preempt_schedule_notrace);
#endif
+#else
+void __sched __preserve_most preserve_preempt_schedule_notrace(void)
+{
+ preempt_schedule_notrace();
+}
#endif

#endif /* CONFIG_PREEMPTION */

Peter Zijlstra

Aug 4, 2023, 2:19:20 PM
to Steven Rostedt, Marco Elver, Andrew Morton, Kees Cook, Guenter Roeck, Mark Rutland, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
On Fri, Aug 04, 2023 at 01:59:02PM -0400, Steven Rostedt wrote:
> On Fri, 4 Aug 2023 13:57:57 -0400
> Steven Rostedt <ros...@goodmis.org> wrote:
> > __list_*_valid_check() ?
> >
>
> I have to admit, I think the main reason kernel developers default to using
> these useless underscores is because kernel developers are notoriously
> lousy at naming. ;-)

Well, that and I detest novella length identifiers.

Miguel Ojeda

Aug 5, 2023, 2:30:37 AM
to Steven Rostedt, Marco Elver, Andrew Morton, Kees Cook, Guenter Roeck, Peter Zijlstra, Mark Rutland, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
On Fri, Aug 4, 2023 at 6:03 PM Steven Rostedt <ros...@goodmis.org> wrote:
>
> Can we give actual real names to why the function is "special" besides that
> it now has another underscore added to it?

+1, some docs on top of that can also help a lot to explain the
intended difference quickly to a reader.

Cheers,
Miguel

Miguel Ojeda

Aug 5, 2023, 2:35:55 AM
to Marco Elver, Andrew Morton, Kees Cook, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
On Fri, Aug 4, 2023 at 11:06 AM Marco Elver <el...@google.com> wrote:
>
> will imply notrace. It is recommended to restrict use of the attribute
> to functions that should or already disable tracing.

Should the last sentence here be added into the code comment?

Apart from that this looks fine from a `compiler_attributes.h`
perspective (even if we are not there anymore):

Reviewed-by: Miguel Ojeda <oj...@kernel.org>

Cheers,
Miguel

Florian Weimer

Aug 7, 2023, 7:41:19 AM
to Marco Elver, Andrew Morton, Kees Cook, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
* Marco Elver:

> [1]: "On X86-64 and AArch64 targets, this attribute changes the calling
> convention of a function. The preserve_most calling convention attempts
> to make the code in the caller as unintrusive as possible. This
> convention behaves identically to the C calling convention on how
> arguments and return values are passed, but it uses a different set of
> caller/callee-saved registers. This alleviates the burden of saving and
> recovering a large register set before and after the call in the
> caller."
>
> [1] https://clang.llvm.org/docs/AttributeReference.html#preserve-most

You dropped the interesting part:

| If the arguments are passed in callee-saved registers, then they will
| be preserved by the callee across the call. This doesn’t apply for
| values returned in callee-saved registers.
|
| · On X86-64 the callee preserves all general purpose registers, except
| for R11. R11 can be used as a scratch register. Floating-point
| registers (XMMs/YMMs) are not preserved and need to be saved by the
| caller.
|
| · On AArch64 the callee preserve all general purpose registers, except
| X0-X8 and X16-X18.

Ideally, this would be documented in the respective psABI supplement.
I filled in some gaps and filed:

Document the ABI for __preserve_most__ function calls
<https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/45>

Doesn't this change impact the kernel module ABI?

I would really expect a check here

> +#if __has_attribute(__preserve_most__)
> +# define __preserve_most notrace __attribute__((__preserve_most__))
> +#else
> +# define __preserve_most
> +#endif

that this is not a compilation for a module. Otherwise modules built
with a compiler with __preserve_most__ attribute support are
incompatible with kernels built with a compiler without that attribute.
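
(Roughly along these lines, as a sketch of the idea -- apply the attribute
only when building built-in code, never for module objects:)

  #if __has_attribute(__preserve_most__) && !defined(MODULE)
  # define __preserve_most notrace __attribute__((__preserve_most__))
  #else
  # define __preserve_most
  #endif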

Thanks,
Florian

Marco Elver

Aug 7, 2023, 8:25:05 AM
to Florian Weimer, Andrew Morton, Kees Cook, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
On Mon, 7 Aug 2023 at 13:41, Florian Weimer <fwe...@redhat.com> wrote:
>
> * Marco Elver:
>
> > [1]: "On X86-64 and AArch64 targets, this attribute changes the calling
> > convention of a function. The preserve_most calling convention attempts
> > to make the code in the caller as unintrusive as possible. This
> > convention behaves identically to the C calling convention on how
> > arguments and return values are passed, but it uses a different set of
> > caller/callee-saved registers. This alleviates the burden of saving and
> > recovering a large register set before and after the call in the
> > caller."
> >
> > [1] https://clang.llvm.org/docs/AttributeReference.html#preserve-most
>
> You dropped the interesting part:

I will add it back for the kernel documentation.

> | If the arguments are passed in callee-saved registers, then they will
> | be preserved by the callee across the call. This doesn’t apply for
> | values returned in callee-saved registers.
> |
> | · On X86-64 the callee preserves all general purpose registers, except
> | for R11. R11 can be used as a scratch register. Floating-point
> | registers (XMMs/YMMs) are not preserved and need to be saved by the
> | caller.
> |
> | · On AArch64 the callee preserve all general purpose registers, except
> | X0-X8 and X16-X18.
>
> Ideally, this would be documented in the respective psABI supplement.
> I filled in some gaps and filed:
>
> Document the ABI for __preserve_most__ function calls
> <https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/45>

Good idea. I had already created
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110899, and we need a
better spec to proceed for GCC anyway.

> Doesn't this change impact the kernel module ABI?
>
> I would really expect a check here
>
> > +#if __has_attribute(__preserve_most__)
> > +# define __preserve_most notrace __attribute__((__preserve_most__))
> > +#else
> > +# define __preserve_most
> > +#endif
>
> that this is not a compilation for a module. Otherwise modules built
> with a compiler with __preserve_most__ attribute support are
> incompatible with kernels built with a compiler without that attribute.

That's true, but is it a real problem? Isn't it known that kernel
modules built for a kernel with a different config (incl. compiler) are
not guaranteed to work? See IBT, CFI schemes, kernel sanitizers, etc.?

If we were to start trying to introduce some kind of minimal kernel-to-
module ABI so that modules and kernels built with different toolchains
keep working together, we'd need a mechanism to guarantee this minimal
ABI or prohibit incompatible modules and kernels somehow. Is there a
precedent for this somewhere?

Peter Zijlstra

Aug 7, 2023, 8:31:53 AM
to Florian Weimer, Marco Elver, Andrew Morton, Kees Cook, Guenter Roeck, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
We have a metric ton of options that can break module ABI. If you're
daft enough to not build with the exact same compiler and .config you
get to keep the pieces.

Florian Weimer

Aug 7, 2023, 8:37:04 AM
to Marco Elver, Andrew Morton, Kees Cook, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
* Marco Elver:

> Good idea. I had already created
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110899, and we need
> better spec to proceed for GCC anyway.

Thanks for the reference.

>> Doesn't this change impact the kernel module ABI?
>>
>> I would really expect a check here
>>
>> > +#if __has_attribute(__preserve_most__)
>> > +# define __preserve_most notrace __attribute__((__preserve_most__))
>> > +#else
>> > +# define __preserve_most
>> > +#endif
>>
>> that this is not a compilation for a module. Otherwise modules built
>> with a compiler with __preserve_most__ attribute support are
>> incompatible with kernels built with a compiler without that attribute.
>
> That's true, but is it a real problem? Isn't it known that trying to
> make kernel modules built for a kernel with a different config (incl.
> compiler) is not guaranteed to work? See IBT, CFI schemes, kernel
> sanitizers, etc?
>
> If we were to start trying to introduce some kind of minimal kernel to
> module ABI so that modules and kernels built with different toolchains
> keep working together, we'd need a mechanism to guarantee this minimal
> ABI or prohibit incompatible modules and kernels somehow. Is there a
> precedence for this somewhere?

I think the GCC vs Clang thing is expected to work today, isn't it?
Using the Clang-based BPF tools with a GCC-compiled kernel requires a
matching ABI.

The other things you listed result in fairly obvious breakage, sometimes
even module loading failures. Unconditional crashes are possible as
well. With __preserve_most__, the issues are much more subtle and may
only appear for some kernel/module compiler combinations and
optimization settings. The impact of incorrectly clobbered registers
tends to be like that.

Thanks,
Florian

Jakub Jelinek

Aug 7, 2023, 8:38:22 AM
to Marco Elver, Florian Weimer, Andrew Morton, Kees Cook, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
On Mon, Aug 07, 2023 at 02:24:26PM +0200, Marco Elver wrote:
> > | If the arguments are passed in callee-saved registers, then they will
> > | be preserved by the callee across the call. This doesn’t apply for
> > | values returned in callee-saved registers.
> > |
> > | · On X86-64 the callee preserves all general purpose registers, except
> > | for R11. R11 can be used as a scratch register. Floating-point
> > | registers (XMMs/YMMs) are not preserved and need to be saved by the
> > | caller.
> > |
> > | · On AArch64 the callee preserve all general purpose registers, except
> > | X0-X8 and X16-X18.
> >
> > Ideally, this would be documented in the respective psABI supplement.
> > I filled in some gaps and filed:
> >
> > Document the ABI for __preserve_most__ function calls
> > <https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/45>
>
> Good idea. I had already created
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110899, and we need
> better spec to proceed for GCC anyway.

"Registers used for passing arguments
are preserved by the called function, but registers used for
returning results are not."

You mean just GPRs or also vector SSE or MMX registers? Because if some
of those are to be preserved by callee, there is an issue that they need
to be e.g. handled during unwinding, with all the consequences. It is hard
to impossible to guess what size needs to be saved/restored, both normally
or during unwinding. As caller could be say -mavx512f and expect
preservation of all 512 bits and callee -msse2 or -mavx{,2},
or caller -mavx{,2} and expect preservation of all 256 bits and callee -msse2 etc.
MSABI "solves" that by making just the low 128 bits preserved and upper bits
clobbered.

Jakub

Florian Weimer

Aug 7, 2023, 8:43:53 AM
to Jakub Jelinek, Marco Elver, Andrew Morton, Kees Cook, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
* Jakub Jelinek:

> On Mon, Aug 07, 2023 at 02:24:26PM +0200, Marco Elver wrote:
>> > | If the arguments are passed in callee-saved registers, then they will
>> > | be preserved by the callee across the call. This doesn’t apply for
>> > | values returned in callee-saved registers.
>> > |
>> > | · On X86-64 the callee preserves all general purpose registers, except
>> > | for R11. R11 can be used as a scratch register. Floating-point
>> > | registers (XMMs/YMMs) are not preserved and need to be saved by the
>> > | caller.
>> > |
>> > | · On AArch64 the callee preserve all general purpose registers, except
>> > | X0-X8 and X16-X18.
>> >
>> > Ideally, this would be documented in the respective psABI supplement.
>> > I filled in some gaps and filed:
>> >
>> > Document the ABI for __preserve_most__ function calls
>> > <https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/45>
>>
>> Good idea. I had already created
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110899, and we need
>> better spec to proceed for GCC anyway.
>
> "Registers used for passing arguments
> are preserved by the called function, but registers used for
> returning results are not."
>
> You mean just GPRs or also vector SSE or MMX registers?

I think this is pretty clear for x86-64:

| Floating-point registers (XMMs/YMMs) are not preserved and need to be
| saved by the caller.

The issue is more with future GPR extensions like APX.

Thanks,
Florian

Jakub Jelinek

Aug 7, 2023, 9:07:05 AM
to Florian Weimer, Marco Elver, Andrew Morton, Kees Cook, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
The above wording conflicts with that, so it should be clarified.

Jakub

Marco Elver

Aug 7, 2023, 9:07:59 AM
to Florian Weimer, Andrew Morton, Kees Cook, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
I, personally, wouldn't bet on it. It very much depends on the kernel
config used.

> Using the Clang-based BPF tools with a GCC-compiled kernel requires a
> matching ABI.

BPF is a different story altogether, and falls more into the category
of user space to kernel ABI, which the kernel has strong guarantees
on.

> The other things you listed result in fairly obvious breakage, sometimes
> even module loading failures. Unconditional crashes are possible as
> well. With __preserve_most__, the issues are much more subtle and may
> only appear for some kernel/module compiler combinations and
> optimization settings. The impact of incorrectly clobbered registers
> tends to be like that.

One way around this would be to make the availability of the attribute
a Kconfig variable. Then externally compiled kernel modules should do
the right thing, since they ought to use the right .config when being
built.
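
A sketch of what that could look like (CONFIG_CC_HAS_PRESERVE_MOST is a
made-up name here, mirroring how other CC_HAS_* symbols are probed):

  config CC_HAS_PRESERVE_MOST
          def_bool $(success,echo 'void __attribute__((__preserve_most__)) f(void) {}' | $(CC) -x c - -c -o /dev/null -Werror)

and in compiler_types.h:

  #ifdef CONFIG_CC_HAS_PRESERVE_MOST
  # define __preserve_most notrace __attribute__((__preserve_most__))
  #else
  # define __preserve_most
  #endif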

I can make that change for v3.

Peter Zijlstra

Aug 7, 2023, 11:06:32 AM
to Florian Weimer, Marco Elver, Andrew Morton, Kees Cook, Guenter Roeck, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
On Mon, Aug 07, 2023 at 02:36:53PM +0200, Florian Weimer wrote:

> I think the GCC vs Clang thing is expected to work today, isn't it?
> Using the Clang-based BPF tools with a GCC-compiled kernel requires a
> matching ABI.

Nope, all bets are off. There is no module ABI, in the widest possible
sense.

There's all sorts of subtle breakage; I don't remember the exact
details, but IIRC building the kernel with a compiler that has
asm-goto-output and modules with a compiler that doesn't gets you
fireworks.

We absolutely don't even bother tracking any of this.

There are also things like GCC plugins: they can randomly (literally, in
the case of struct randomization) change things around in ways that
aren't compatible with what the other compiler does -- or perhaps even a
later version of the same compiler.


Nick Desaulniers

Aug 7, 2023, 11:27:41 AM
to Florian Weimer, Marco Elver, Andrew Morton, Kees Cook, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Tom Rix, Miguel Ojeda, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org, Jakub Jelinek, Greg KH
Surely the Linux kernel has a stable ABI for modules right? Nah.
https://www.kernel.org/doc/Documentation/process/stable-api-nonsense.rst

>
> Thanks,
> Florian
>


--
Thanks,
~Nick Desaulniers

Marco Elver

Aug 8, 2023, 6:21:29 AM
to el...@google.com, Andrew Morton, Kees Cook, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, Sami Tolvanen, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
[1]: "On X86-64 and AArch64 targets, this attribute changes the calling
convention of a function. The preserve_most calling convention attempts
to make the code in the caller as unintrusive as possible. This
convention behaves identically to the C calling convention on how
arguments and return values are passed, but it uses a different set of
caller/callee-saved registers. This alleviates the burden of saving and
recovering a large register set before and after the call in the caller.
If the arguments are passed in callee-saved registers, then they will be
preserved by the callee across the call. This doesn't apply for values
returned in callee-saved registers.

* On X86-64 the callee preserves all general purpose registers, except
for R11. R11 can be used as a scratch register. Floating-point
registers (XMMs/YMMs) are not preserved and need to be saved by the
caller.

* On AArch64 the callee preserve all general purpose registers, except
X0-X8 and X16-X18."

[1] https://clang.llvm.org/docs/AttributeReference.html#preserve-most

Introduce the attribute to compiler_types.h as __preserve_most.

Use of this attribute results in better code generation for calls to
very rarely called functions, such as error-reporting functions, or
rarely executed slow paths.

Beware that the attribute conflicts with instrumentation calls inserted
on function entry which do not use __preserve_most themselves. Notably,
function tracing which assumes the normal C calling convention for the
given architecture. Where the attribute is supported, __preserve_most
will imply notrace. It is recommended to restrict use of the attribute
to functions that should or already disable tracing.

The attribute may be supported by a future GCC version (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110899).

Signed-off-by: Marco Elver <el...@google.com>
Reviewed-by: Miguel Ojeda <oj...@kernel.org>
Acked-by: Steven Rostedt (Google) <ros...@goodmis.org>
---
v3:
* Quote more from LLVM documentation about which registers are
callee/caller with preserve_most.
* Code comment to restrict use where tracing is meant to be disabled.

v2:
* Imply notrace, to avoid any conflicts with tracing which is inserted
on function entry. See added comments.
---
include/linux/compiler_types.h | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)

diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 547ea1ff806e..c88488715a39 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -106,6 +106,34 @@ static inline void __chk_io_ptr(const volatile void __iomem *ptr) { }
#define __cold
#endif

+/*
+ * On x86-64 and arm64 targets, __preserve_most changes the calling convention
+ * of a function to make the code in the caller as unintrusive as possible. This
+ * convention behaves identically to the C calling convention on how arguments
+ * and return values are passed, but uses a different set of caller- and callee-
+ * saved registers.
+ *
+ * The purpose is to alleviate the burden of saving and recovering a large
+ * register set before and after the call in the caller. This is beneficial for
+ * rarely taken slow paths, such as error-reporting functions that may be called
+ * from hot paths.
+ *
+ * Note: This may conflict with instrumentation inserted on function entry which
+ * does not use __preserve_most or equivalent convention (if in assembly). Since
+ * function tracing assumes the normal C calling convention, where the attribute
+ * is supported, __preserve_most implies notrace. It is recommended to restrict
+ * use of the attribute to functions that should or already disable tracing.
+ *
+ * Optional: not supported by gcc.
+ *
+ * clang: https://clang.llvm.org/docs/AttributeReference.html#preserve-most
+ */
+#if __has_attribute(__preserve_most__)
+# define __preserve_most notrace __attribute__((__preserve_most__))
+#else
+# define __preserve_most
+#endif

Marco Elver

Aug 8, 2023, 6:21:31 AM
to el...@google.com, Andrew Morton, Kees Cook, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, Sami Tolvanen, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
Turn the list debug checking functions __list_*_valid() into inline
functions that wrap the out-of-line functions. Care is taken to ensure
the inline wrappers are always inlined, so that additional compiler
instrumentation (such as sanitizers) does not result in redundant
outlining.

This change is preparation for performing checks in the inline wrappers.

No functional change intended.

Signed-off-by: Marco Elver <el...@google.com>
---
v3:
* Rename ___list_*_valid() to __list_*_valid_or_report().
* Some documentation.
---
arch/arm64/kvm/hyp/nvhe/list_debug.c | 6 ++---
include/linux/list.h | 37 +++++++++++++++++++++++++---
lib/list_debug.c | 11 ++++-----
3 files changed, 41 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/list_debug.c b/arch/arm64/kvm/hyp/nvhe/list_debug.c
index d68abd7ea124..16266a939a4c 100644
--- a/arch/arm64/kvm/hyp/nvhe/list_debug.c
+++ b/arch/arm64/kvm/hyp/nvhe/list_debug.c
@@ -26,8 +26,8 @@ static inline __must_check bool nvhe_check_data_corruption(bool v)

/* The predicates checked here are taken from lib/list_debug.c. */

-bool __list_add_valid(struct list_head *new, struct list_head *prev,
- struct list_head *next)
+bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
+ struct list_head *next)
{
if (NVHE_CHECK_DATA_CORRUPTION(next->prev != prev) ||
NVHE_CHECK_DATA_CORRUPTION(prev->next != next) ||
@@ -37,7 +37,7 @@ bool __list_add_valid(struct list_head *new, struct list_head *prev,
return true;
}

-bool __list_del_entry_valid(struct list_head *entry)
+bool __list_del_entry_valid_or_report(struct list_head *entry)
{
struct list_head *prev, *next;

diff --git a/include/linux/list.h b/include/linux/list.h
index f10344dbad4d..130c6a1bb45c 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -39,10 +39,39 @@ static inline void INIT_LIST_HEAD(struct list_head *list)
}

#ifdef CONFIG_DEBUG_LIST
-extern bool __list_add_valid(struct list_head *new,
- struct list_head *prev,
- struct list_head *next);
-extern bool __list_del_entry_valid(struct list_head *entry);
+/*
+ * Performs the full set of list corruption checks before __list_add().
+ * On list corruption reports a warning, and returns false.
+ */
+extern bool __list_add_valid_or_report(struct list_head *new,
+ struct list_head *prev,
+ struct list_head *next);
+
+/*
+ * Performs list corruption checks before __list_add(). Returns false if a
+ * corruption is detected, true otherwise.
+ */
+static __always_inline bool __list_add_valid(struct list_head *new,
+ struct list_head *prev,
+ struct list_head *next)
+{
+ return __list_add_valid_or_report(new, prev, next);
+}
+
+/*
+ * Performs the full set of list corruption checks before __list_del_entry().
+ * On list corruption reports a warning, and returns false.
+ */
+extern bool __list_del_entry_valid_or_report(struct list_head *entry);
+
+/*
+ * Performs list corruption checks before __list_del_entry(). Returns false if a
+ * corruption is detected, true otherwise.
+ */
+static __always_inline bool __list_del_entry_valid(struct list_head *entry)
+{
+ return __list_del_entry_valid_or_report(entry);
+}
#else
static inline bool __list_add_valid(struct list_head *new,
struct list_head *prev,
diff --git a/lib/list_debug.c b/lib/list_debug.c
index d98d43f80958..2def33b1491f 100644
--- a/lib/list_debug.c
+++ b/lib/list_debug.c
@@ -17,8 +17,8 @@
* attempt).
*/

-bool __list_add_valid(struct list_head *new, struct list_head *prev,
- struct list_head *next)
+bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
+ struct list_head *next)
{
if (CHECK_DATA_CORRUPTION(prev == NULL,
"list_add corruption. prev is NULL.\n") ||
@@ -37,9 +37,9 @@ bool __list_add_valid(struct list_head *new, struct list_head *prev,

return true;
}
-EXPORT_SYMBOL(__list_add_valid);
+EXPORT_SYMBOL(__list_add_valid_or_report);

-bool __list_del_entry_valid(struct list_head *entry)
+bool __list_del_entry_valid_or_report(struct list_head *entry)
{
struct list_head *prev, *next;

@@ -65,6 +65,5 @@ bool __list_del_entry_valid(struct list_head *entry)
return false;

return true;
-
}
-EXPORT_SYMBOL(__list_del_entry_valid);
+EXPORT_SYMBOL(__list_del_entry_valid_or_report);
--
2.41.0.640.ga95def55d0-goog

Marco Elver

Aug 8, 2023, 6:21:34 AM
to el...@google.com, Andrew Morton, Kees Cook, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, Sami Tolvanen, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
Signed-off-by: Marco Elver <el...@google.com>
---
v3:
* Rename ___list_*_valid() to __list_*_valid_or_report().
* More comments.

v2:
* Note that lib/Makefile disables function tracing for everything and
__preserve_most's implied notrace is a noop here.
---
arch/arm64/kvm/hyp/nvhe/list_debug.c | 2 +
include/linux/list.h | 64 +++++++++++++++++++++++++---
lib/Kconfig.debug | 15 +++++++
lib/list_debug.c | 2 +
4 files changed, 77 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/list_debug.c b/arch/arm64/kvm/hyp/nvhe/list_debug.c
index 16266a939a4c..46a2d4f2b3c6 100644
--- a/arch/arm64/kvm/hyp/nvhe/list_debug.c
+++ b/arch/arm64/kvm/hyp/nvhe/list_debug.c
@@ -26,6 +26,7 @@ static inline __must_check bool nvhe_check_data_corruption(bool v)

/* The predicates checked here are taken from lib/list_debug.c. */

+__list_valid_slowpath
bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
struct list_head *next)
{
@@ -37,6 +38,7 @@ bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
return true;
}

+__list_valid_slowpath
bool __list_del_entry_valid_or_report(struct list_head *entry)
{
struct list_head *prev, *next;
diff --git a/include/linux/list.h b/include/linux/list.h
index 130c6a1bb45c..066fe33e99bf 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -39,38 +39,90 @@ static inline void INIT_LIST_HEAD(struct list_head *list)
}

#ifdef CONFIG_DEBUG_LIST
+
+#ifdef CONFIG_DEBUG_LIST_MINIMAL
+# define __list_valid_slowpath __cold __preserve_most
+#else
+# define __list_valid_slowpath
+#endif
+
/*
* Performs the full set of list corruption checks before __list_add().
* On list corruption reports a warning, and returns false.
*/
-extern bool __list_add_valid_or_report(struct list_head *new,
- struct list_head *prev,
- struct list_head *next);
+extern bool __list_valid_slowpath __list_add_valid_or_report(struct list_head *new,
+ struct list_head *prev,
+ struct list_head *next);

/*
* Performs list corruption checks before __list_add(). Returns false if a
* corruption is detected, true otherwise.
+ *
+ * With CONFIG_DEBUG_LIST_MINIMAL set, performs minimal list integrity checking
+ * (that do not result in a fault) inline, and only if a corruption is detected
+ * calls the reporting function __list_add_valid_or_report().
*/
static __always_inline bool __list_add_valid(struct list_head *new,
struct list_head *prev,
struct list_head *next)
{
- return __list_add_valid_or_report(new, prev, next);
+ bool ret = true;
+
+ if (IS_ENABLED(CONFIG_DEBUG_LIST_MINIMAL)) {
+ /*
+ * In the minimal config, elide checking if next and prev are
+ * NULL, since the immediate dereference of them below would
+ * result in a fault if NULL.
+ *
+ * With the minimal config we can afford to inline the checks,
+ * which also gives the compiler a chance to elide some of them
+ * completely if they can be proven at compile-time. If one of
+ * the pre-conditions does not hold, the slow-path will show a
+ * report which pre-condition failed.
+ */
+ if (likely(next->prev == prev && prev->next == next && new != prev && new != next))
+ return true;
+ ret = false;
+ }
+
+ ret &= __list_add_valid_or_report(new, prev, next);
+ return ret;
}

/*
* Performs the full set of list corruption checks before __list_del_entry().
* On list corruption reports a warning, and returns false.
*/
-extern bool __list_del_entry_valid_or_report(struct list_head *entry);
+extern bool __list_valid_slowpath __list_del_entry_valid_or_report(struct list_head *entry);

/*
* Performs list corruption checks before __list_del_entry(). Returns false if a
* corruption is detected, true otherwise.
+ *
+ * With CONFIG_DEBUG_LIST_MINIMAL set, performs minimal list integrity checking
+ * (that do not result in a fault) inline, and only if a corruption is detected
+ * calls the reporting function __list_del_entry_valid_or_report().
*/
static __always_inline bool __list_del_entry_valid(struct list_head *entry)
{
- return __list_del_entry_valid_or_report(entry);
+ bool ret = true;
+
+ if (IS_ENABLED(CONFIG_DEBUG_LIST_MINIMAL)) {
+ struct list_head *prev = entry->prev;
+ struct list_head *next = entry->next;
+
+ /*
+ * In the minimal config, elide checking if next and prev are
+ * NULL, LIST_POISON1 or LIST_POISON2, since the immediate
+ * dereference of them below would result in a fault.
+ */
+ if (likely(prev->next == entry && next->prev == entry))
+ return true;
+ ret = false;
+ }
+
+ ret &= __list_del_entry_valid_or_report(entry);
+ return ret;
}
#else
static inline bool __list_add_valid(struct list_head *new,
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index fbc89baf7de6..e72cf08af0fa 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1680,6 +1680,21 @@ config DEBUG_LIST

If unsure, say N.

+config DEBUG_LIST_MINIMAL
+ bool "Minimal linked list debug checks"
+ default !DEBUG_KERNEL
+ depends on DEBUG_LIST
+ help
+ Only perform the minimal set of checks in the linked-list walking
+ routines to catch corruptions that are not guaranteed to result in an
+ immediate access fault.
+
+ This trades lower quality error reports for improved performance: the
+ generated code should be more optimal and provide trade-offs that may
+ better serve safety- and performance-critical environments.
+
+ If unsure, say Y.
+
config DEBUG_PLIST
bool "Debug priority linked list manipulation"
depends on DEBUG_KERNEL
diff --git a/lib/list_debug.c b/lib/list_debug.c
index 2def33b1491f..0ff547910dd0 100644
--- a/lib/list_debug.c
+++ b/lib/list_debug.c
@@ -17,6 +17,7 @@
* attempt).
*/

+__list_valid_slowpath
bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
struct list_head *next)
{
@@ -39,6 +40,7 @@ bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
}
EXPORT_SYMBOL(__list_add_valid_or_report);

+__list_valid_slowpath
bool __list_del_entry_valid_or_report(struct list_head *entry)
{
struct list_head *prev, *next;
--
2.41.0.640.ga95def55d0-goog

Peter Zijlstra

Aug 8, 2023, 6:57:18 AM
to Florian Weimer, Marco Elver, Andrew Morton, Kees Cook, Guenter Roeck, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org, Josh Poimboeuf
On Mon, Aug 07, 2023 at 01:41:07PM +0200, Florian Weimer wrote:
So in the GCC bugzilla you raise the point of unwinding.

So in arch/x86 we've been doing what this attribute proposes (except
without that weird r11 exception) for a long while.

We simply do: asm("call foo_thunk"); instead of a C call, and then
have the 'thunk' thing PUSH/POP all the registers around a regular C
function.
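
A minimal sketch of the pattern (all names made up for illustration; the
real kernel thunks also handle details like stack alignment, objtool
annotations and return thunks):

	void foo_slowpath(void);	/* the real, rarely-called C function */

	static __always_inline void foo(void)
	{
		/* Hot path: a bare call, no caller-saved register spills. */
		asm volatile("call foo_thunk" : : : "memory");
	}

	/* The thunk saves/restores the caller-clobbered GPRs around the call. */
	asm(".pushsection .text\n"
	    "foo_thunk:\n"
	    "	push %rax; push %rcx; push %rdx; push %rsi; push %rdi\n"
	    "	push %r8;  push %r9;  push %r10; push %r11\n"
	    "	call foo_slowpath\n"
	    "	pop %r11; pop %r10; pop %r9; pop %r8\n"
	    "	pop %rdi; pop %rsi; pop %rdx; pop %rcx; pop %rax\n"
	    "	ret\n"
	    ".popsection\n");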

Paravirt does quite a lot of that as well.

In a very few cases we implement a function specifically to avoid all
the PUSH/POP nonsense because it's so small the stack overhead kills it.

For unwinding this means that the 'thunk' becomes invisible when IP is
not inside it. But since the thunk is purely 'uninteresting' PUSH/POP
around a real C call, this is not an issue.

[[ tail calls leaving their sibling invisible is a far more annoying
issue ]]

If the function is one of those special things, then it will be a leaf
function and we get to see it through the IP.


Now, the problem with __preserve_most is that it makes it really easy to
deviate from this pattern, you can trivially write a function that is
not a trivial wrapper and then does not show up on unwind. This might
indeed be a problem.

Ofc. the kernel doesn't longjmp much, and we also don't throw many
exceptions, but the live-patch people might care (although ORC unwinder
should be able to deal with all this perfectly fine).

In userspace, would not .eh_frame still be able to unwind this?

Florian Weimer

unread,
Aug 8, 2023, 7:41:15 AM8/8/23
to Peter Zijlstra, Marco Elver, Andrew Morton, Kees Cook, Guenter Roeck, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org, Josh Poimboeuf
* Peter Zijlstra:

> Now, the problem with __preserve_most is that it makes it really easy to
> deviate from this pattern, you can trivially write a function that is
> not a trivial wrapper and then does not show up on unwind. This might
> indeed be a problem.

Backtrace generation shouldn't be impacted by a compiler implementation
of __preserve_most__. If unwinding implies restoring register contents,
the question becomes whether the unwinder can be taught to do this
natively. For .eh_frame/PT_GNU_EH_FRAME-based unwinders and
__preserve_most__, I think that's true because they already support
custom ABIs (and GCC uses them for local functions). In other cases, if
the unwinder does not support the extra registers, then it might still
be possible to compensate for that via code generation (e.g., setjmp
won't be __preserve_most__, so the compiler would have to preserve
register contents by other means, also accounting for the returns-twice
nature, likewise for exception handling landing pads).

But __preserve_all__ is a completely different beast. I *think* it is
possible to do this with helpers (state size, state save, state restore)
and strategically placed restores after returns-twice functions and the
like, but people disagree. This has come up before in the context of
the s390x vector ABI and the desire to add new callee-saved registers.
We just couldn't make that work at the time. On the other hand,
__preserve_all__ goes in the other direction (opting in to extra saves),
so it may be conceptually easier.

Thanks,
Florian

Mark Rutland

unread,
Aug 8, 2023, 8:35:42 AM8/8/23
to Marco Elver, Andrew Morton, Kees Cook, Guenter Roeck, Peter Zijlstra, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, Sami Tolvanen, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
So long as this implies notrace I believe it is safe, so:

Acked-by: Mark Rutland <mark.r...@arm.com>

Mark.

Kees Cook

unread,
Aug 8, 2023, 5:27:17 PM8/8/23
to Marco Elver, Andrew Morton, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, Sami Tolvanen, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
On Tue, Aug 08, 2023 at 12:17:27PM +0200, Marco Elver wrote:
> Numerous production kernel configs (see [1, 2]) are choosing to enable
> CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened
> configs [3]. The feature has never been designed with performance in
> mind, yet common list manipulation is happening across hot paths all
> over the kernel.
>
> Introduce CONFIG_DEBUG_LIST_MINIMAL, which performs list pointer
> checking inline, and only upon list corruption delegates to the
> reporting slow path.

I'd really like to get away from calling this "DEBUG", since it's used
more for hardening (CONFIG_LIST_HARDENED?). Will Deacon spent some time
making this better a while back, but the series never landed. Do you
have a bit of time to look through it?

https://github.com/KSPP/linux/issues/10
https://lore.kernel.org/lkml/2020032415364...@kernel.org/

-Kees

--
Kees Cook

Marco Elver

unread,
Aug 9, 2023, 3:36:09 AM8/9/23
to Kees Cook, Andrew Morton, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, Sami Tolvanen, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
I'm fine renaming this one. But there are other issues that Will's
series solves, which I don't want this series to depend on. We can try
to sort them out separately.

The main problem here is that DEBUG_LIST has been designed to be
friendly for debugging (incl. checking poison values and NULL). Some
kernel devs may still want that, but for production use it is pointless
and wasteful.

So what I can propose is to introduce CONFIG_LIST_HARDENED that
doesn't depend on CONFIG_DEBUG_LIST, but instead selects it, because
we still use that code to produce a report.

If there are other list types that have similar debug checks, but
where we can optimize performance by eliding some and moving them
inline, we can do the same (CONFIG_*_HARDENED). If the checks are
already optimized, we could just rename them to CONFIG_*_HARDENED and
remove the DEBUG name.

I'm not a big fan of the 2 modes either, but that's what we get if we
want to support the debugging and hardening use cases. Inlining the
full set of checks does not work, for both performance and size reasons.
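
Roughly, the difference in what ends up at each call site looks like this
(simplified sketch, not the exact code from the series):

	/* Debugging flavour: full checks, always out of line. */
	if (!__list_add_valid_or_report(new, prev, next))
		return;

	/* Hardening flavour: minimal checks inline; the __cold
	 * __preserve_most report function is only called on corruption. */
	if (unlikely(!(next->prev == prev && prev->next == next &&
		       new != prev && new != next)))
		__list_add_valid_or_report(new, prev, next);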

Marco Elver

unread,
Aug 9, 2023, 5:57:29 AM8/9/23
to Kees Cook, Andrew Morton, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, Sami Tolvanen, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
On Wed, Aug 09, 2023 at 09:35AM +0200, Marco Elver wrote:

> > I'd really like to get away from calling this "DEBUG", since it's used
> > more for hardening (CONFIG_LIST_HARDENED?). Will Deacon spent some time
> > making this better a while back, but the series never landed. Do you
> > have a bit of time to look through it?
> >
> > https://github.com/KSPP/linux/issues/10
> > https://lore.kernel.org/lkml/2020032415364...@kernel.org/
>
> I'm fine renaming this one. But there are other issues that Will's
> series solves, which I don't want this series to depend on. We can try
> to sort them out separately.
>
> The main problem here is that DEBUG_LIST has been designed to be
> friendly for debugging (incl. checking poison values and NULL). Some
> kernel devs may still want that, but for production use is pointless
> and wasteful.
>
> So what I can propose is to introduce CONFIG_LIST_HARDENED that
> doesn't depend on CONFIG_DEBUG_LIST, but instead selects it, because
> we still use that code to produce a report.

How about the below?

We'll add CONFIG_HARDEN_LIST (in Kconfig.hardening), which is
independent of CONFIG_DEBUG_LIST. For the implementation it selects
DEBUG_LIST, but that is irrelevant for users.

This will get us the best of both worlds: a version for hardening that
should remain as fast as possible, and one for debugging with better
reports.
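
In Kconfig terms the relationship is roughly this (sketch only; the full
patch below has the real dependencies and help texts):

	config HARDEN_LIST
		bool "Check integrity of linked list manipulation"
		select DEBUG_LIST	# reuse the reporting code

	config DEBUG_LIST
		bool "Debug linked list manipulation" if !HARDEN_LIST
		depends on DEBUG_KERNEL || BUG_ON_DATA_CORRUPTION || HARDEN_LIST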

------ >8 ------

From: Marco Elver <el...@google.com>
Date: Thu, 27 Jul 2023 22:19:02 +0200
Subject: [PATCH v4 3/3] list: Introduce CONFIG_HARDEN_LIST

Numerous production kernel configs (see [1, 2]) are choosing to enable
CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened
configs [3]. The motivation behind this is that the option can be used
as a security hardening feature (e.g. CVE-2019-2215 and CVE-2019-2025
are mitigated by the option [4]).

The feature has never been designed with performance in mind, yet common
list manipulation is happening across hot paths all over the kernel.

Introduce CONFIG_HARDEN_LIST, which performs list pointer checking
inline, and only upon list corruption calls the reporting slow path.

To generate optimal machine code with CONFIG_HARDEN_LIST:

1. Elide checking for pointer values which upon dereference would
result in an immediate access fault -- therefore "minimal" checks.
The trade-off is lower-quality error reports.

2. Use the newly introduced __preserve_most function attribute
(available with Clang, but not yet with GCC) to minimize the code
footprint for calling the reporting slow path. As a result,
function size of callers is reduced by avoiding saving registers
before calling the rarely called reporting slow path.

Note that all TUs in lib/Makefile already disable function tracing,
including list_debug.c, and __preserve_most's implied notrace has
no effect in this case.

3. Because the inline checks are a subset of the full set of checks in
__list_*_valid_or_report(), always return false if the inline
checks failed. This avoids redundant compare and conditional
branch right after return from the slow path.

As a side-effect of the checks being inline, if the compiler can prove
some condition to always be true, it can completely elide some checks.

Running netperf with CONFIG_HARDEN_LIST (using a Clang compiler with
"preserve_most") shows throughput improvements, in my case of ~7% on
average (up to 20-30% on some test cases).

Link: https://r.android.com/1266735 [1]
Link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/main/config [2]
Link: https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project/Recommended_Settings [3]
Link: https://googleprojectzero.blogspot.com/2019/11/bad-binder-android-in-wild-exploit.html [4]
Signed-off-by: Marco Elver <el...@google.com>
---
v4:
* Rename to CONFIG_HARDEN_LIST, which can independently be selected from
CONFIG_DEBUG_LIST.

v3:
* Rename ___list_*_valid() to __list_*_valid_or_report().
* More comments.

v2:
* Note that lib/Makefile disables function tracing for everything and
__preserve_most's implied notrace is a noop here.
---
arch/arm64/kvm/hyp/nvhe/list_debug.c | 2 +
include/linux/list.h | 64 +++++++++++++++++++++++++---
lib/Kconfig.debug | 12 ++++--
lib/list_debug.c | 2 +
security/Kconfig.hardening | 14 ++++++
5 files changed, 84 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/list_debug.c b/arch/arm64/kvm/hyp/nvhe/list_debug.c
index 16266a939a4c..46a2d4f2b3c6 100644
--- a/arch/arm64/kvm/hyp/nvhe/list_debug.c
+++ b/arch/arm64/kvm/hyp/nvhe/list_debug.c
@@ -26,6 +26,7 @@ static inline __must_check bool nvhe_check_data_corruption(bool v)

/* The predicates checked here are taken from lib/list_debug.c. */

+__list_valid_slowpath
bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
struct list_head *next)
{
@@ -37,6 +38,7 @@ bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
return true;
}

+__list_valid_slowpath
bool __list_del_entry_valid_or_report(struct list_head *entry)
{
struct list_head *prev, *next;
diff --git a/include/linux/list.h b/include/linux/list.h
index 130c6a1bb45c..1c7f70b7cc7a 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -39,38 +39,90 @@ static inline void INIT_LIST_HEAD(struct list_head *list)
}

#ifdef CONFIG_DEBUG_LIST
+
+#ifdef CONFIG_HARDEN_LIST
+# define __list_valid_slowpath __cold __preserve_most
+#else
+# define __list_valid_slowpath
+#endif
+
/*
* Performs the full set of list corruption checks before __list_add().
* On list corruption reports a warning, and returns false.
*/
-extern bool __list_add_valid_or_report(struct list_head *new,
- struct list_head *prev,
- struct list_head *next);
+extern bool __list_valid_slowpath __list_add_valid_or_report(struct list_head *new,
+ struct list_head *prev,
+ struct list_head *next);

/*
* Performs list corruption checks before __list_add(). Returns false if a
* corruption is detected, true otherwise.
+ *
+ * With CONFIG_HARDEN_LIST set, performs minimal list integrity checks (that
+ * do not result in a fault) inline, and only if a corruption is detected calls
+ * the reporting function __list_add_valid_or_report().
*/
static __always_inline bool __list_add_valid(struct list_head *new,
struct list_head *prev,
struct list_head *next)
{
- return __list_add_valid_or_report(new, prev, next);
+ bool ret = true;
+
+ if (IS_ENABLED(CONFIG_HARDEN_LIST)) {
+ /*
+ * With the hardening version, elide checking if next and prev
+ * are NULL, since the immediate dereference of them below would
+ * result in a fault if NULL.
+ *
+ * With the reduced set of checks, we can afford to inline the
+ * checks, which also gives the compiler a chance to elide some
+ * of them completely if they can be proven at compile-time. If
+ * one of the pre-conditions does not hold, the slow-path will
+ * show a report which pre-condition failed.
+ */
+ if (likely(next->prev == prev && prev->next == next && new != prev && new != next))
+ return true;
+ ret = false;
+ }
+
+ ret &= __list_add_valid_or_report(new, prev, next);
+ return ret;
}

/*
* Performs the full set of list corruption checks before __list_del_entry().
* On list corruption reports a warning, and returns false.
*/
-extern bool __list_del_entry_valid_or_report(struct list_head *entry);
+extern bool __list_valid_slowpath __list_del_entry_valid_or_report(struct list_head *entry);

/*
* Performs list corruption checks before __list_del_entry(). Returns false if a
* corruption is detected, true otherwise.
+ *
+ * With CONFIG_HARDEN_LIST set, performs minimal list integrity checks (that
+ * do not result in a fault) inline, and only if a corruption is detected calls
+ * the reporting function __list_del_entry_valid_or_report().
*/
static __always_inline bool __list_del_entry_valid(struct list_head *entry)
{
- return __list_del_entry_valid_or_report(entry);
+ bool ret = true;
+
+ if (IS_ENABLED(CONFIG_HARDEN_LIST)) {
+ struct list_head *prev = entry->prev;
+ struct list_head *next = entry->next;
+
+ /*
+ * With the hardening version, elide checking if next and prev
+ * are NULL, LIST_POISON1 or LIST_POISON2, since the immediate
+ * dereference of them below would result in a fault.
+ */
+ if (likely(prev->next == entry && next->prev == entry))
+ return true;
+ ret = false;
+ }
+
+ ret &= __list_del_entry_valid_or_report(entry);
+ return ret;
}
#else
static inline bool __list_add_valid(struct list_head *new,
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index fbc89baf7de6..6b0de78fb2da 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1672,11 +1672,15 @@ config HAVE_DEBUG_BUGVERBOSE
menu "Debug kernel data structures"

config DEBUG_LIST
- bool "Debug linked list manipulation"
- depends on DEBUG_KERNEL || BUG_ON_DATA_CORRUPTION
+ bool "Debug linked list manipulation" if !HARDEN_LIST
+ depends on DEBUG_KERNEL || BUG_ON_DATA_CORRUPTION || HARDEN_LIST
help
- Enable this to turn on extended checks in the linked-list
- walking routines.
+ Enable this to turn on extended checks in the linked-list walking
+ routines.
+
+ If you care about performance, you should enable CONFIG_HARDEN_LIST
+ instead. This option alone trades better quality error reports for
+ worse performance, and is more suitable for debugging.

If unsure, say N.

diff --git a/lib/list_debug.c b/lib/list_debug.c
index 2def33b1491f..0ff547910dd0 100644
--- a/lib/list_debug.c
+++ b/lib/list_debug.c
@@ -17,6 +17,7 @@
* attempt).
*/

+__list_valid_slowpath
bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
struct list_head *next)
{
@@ -39,6 +40,7 @@ bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
}
EXPORT_SYMBOL(__list_add_valid_or_report);

+__list_valid_slowpath
bool __list_del_entry_valid_or_report(struct list_head *entry)
{
struct list_head *prev, *next;
diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index 0f295961e773..a8aef895f13d 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -279,6 +279,20 @@ config ZERO_CALL_USED_REGS

endmenu

+menu "Hardening of kernel data structures"
+
+config HARDEN_LIST
+ bool "Check integrity of linked list manipulation"
+ select DEBUG_LIST
+ help
+ Minimal integrity checking in the linked-list manipulation routines
+ to catch memory corruptions that are not guaranteed to result in an
+ immediate access fault.
+
+ If unsure, say N.
+
+endmenu
+
config CC_HAS_RANDSTRUCT
def_bool $(cc-option,-frandomize-layout-seed-file=/dev/null)
# Randstruct was first added in Clang 15, but it isn't safe to use until
--
2.41.0.640.ga95def55d0-goog

Marco Elver

unread,
Aug 9, 2023, 12:32:45 PM8/9/23
to Steven Rostedt, Kees Cook, Andrew Morton, Guenter Roeck, Peter Zijlstra, Mark Rutland, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, Sami Tolvanen, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
On Wed, Aug 09, 2023 at 11:30AM -0400, Steven Rostedt wrote:
[...]
>
> I would actually prefer DEBUG_LIST to select HARDEN_LIST and not the other
> way around. It logically doesn't make sense that HARDEN_LIST would select
> DEBUG_LIST. That is, I could by default want HARDEN_LIST always on, but not
> DEBUG_LIST (because who knows, it may add other features I don't want). But
> then, I may have stumbled over something and want more info, and enable
> DEBUG_LIST (while still having HARDEN_LIST) enabled.
>
> I think you are looking at this from an implementation perspective and not
> the normal developer one.
>
[...]
>
> That is, if DEBUG_LIST is enabled, we always call the
> __list_add_valid_or_report(), but if only HARDEN_LIST is enabled, then we
> do the shortcut.

Good point - I think this is better. See below tentative v4.

Kees: Does that also look more like what you had in mind?

Thanks,
-- Marco

------ >8 ------

From: Marco Elver <el...@google.com>
Date: Thu, 27 Jul 2023 22:19:02 +0200
Subject: [PATCH] list: Introduce CONFIG_HARDEN_LIST

Numerous production kernel configs (see [1, 2]) are choosing to enable
CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened
configs [3]. The motivation behind this is that the option can be used
as a security hardening feature (e.g. CVE-2019-2215 and CVE-2019-2025
are mitigated by the option [4]).

The feature has never been designed with performance in mind, yet common
list manipulation is happening across hot paths all over the kernel.

Introduce CONFIG_HARDEN_LIST, which performs list pointer checking
inline, and only upon list corruption calls the reporting slow path.

Since DEBUG_LIST is functionally a superset of HARDEN_LIST, the Kconfig
variables are designed to reflect that: DEBUG_LIST selects HARDEN_LIST,
whereas HARDEN_LIST itself has no dependency on DEBUG_LIST.
arch/arm64/kvm/hyp/nvhe/Makefile | 2 +-
arch/arm64/kvm/hyp/nvhe/list_debug.c | 2 +
include/linux/list.h | 64 +++++++++++++++++++++++++---
lib/Kconfig.debug | 9 +++-
lib/Makefile | 2 +-
lib/list_debug.c | 5 ++-
security/Kconfig.hardening | 13 ++++++
7 files changed, 86 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 9ddc025e4b86..c89c85a41ac4 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -25,7 +25,7 @@ hyp-obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o
cache.o setup.o mm.o mem_protect.o sys_regs.o pkvm.o stacktrace.o ffa.o
hyp-obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o
-hyp-obj-$(CONFIG_DEBUG_LIST) += list_debug.o
+hyp-obj-$(CONFIG_HARDEN_LIST) += list_debug.o
hyp-obj-y += $(lib-objs)

##
diff --git a/arch/arm64/kvm/hyp/nvhe/list_debug.c b/arch/arm64/kvm/hyp/nvhe/list_debug.c
index 16266a939a4c..46a2d4f2b3c6 100644
--- a/arch/arm64/kvm/hyp/nvhe/list_debug.c
+++ b/arch/arm64/kvm/hyp/nvhe/list_debug.c
@@ -26,6 +26,7 @@ static inline __must_check bool nvhe_check_data_corruption(bool v)

/* The predicates checked here are taken from lib/list_debug.c. */

+__list_valid_slowpath
bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
struct list_head *next)
{
@@ -37,6 +38,7 @@ bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
return true;
}

+__list_valid_slowpath
bool __list_del_entry_valid_or_report(struct list_head *entry)
{
struct list_head *prev, *next;
diff --git a/include/linux/list.h b/include/linux/list.h
index 130c6a1bb45c..ef899c27c68b 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -38,39 +38,91 @@ static inline void INIT_LIST_HEAD(struct list_head *list)
WRITE_ONCE(list->prev, list);
}

+#ifdef CONFIG_HARDEN_LIST
+
#ifdef CONFIG_DEBUG_LIST
+# define __list_valid_slowpath
+#else
+# define __list_valid_slowpath __cold __preserve_most
+#endif
+
/*
* Performs the full set of list corruption checks before __list_add().
* On list corruption reports a warning, and returns false.
*/
-extern bool __list_add_valid_or_report(struct list_head *new,
- struct list_head *prev,
- struct list_head *next);
+extern bool __list_valid_slowpath __list_add_valid_or_report(struct list_head *new,
+ struct list_head *prev,
+ struct list_head *next);

/*
* Performs list corruption checks before __list_add(). Returns false if a
* corruption is detected, true otherwise.
+ *
+ * With CONFIG_HARDEN_LIST only, performs minimal list integrity checks (that
+ * do not result in a fault) inline, and only if a corruption is detected calls
+ * the reporting function __list_add_valid_or_report().
*/
static __always_inline bool __list_add_valid(struct list_head *new,
struct list_head *prev,
struct list_head *next)
{
- return __list_add_valid_or_report(new, prev, next);
+ bool ret = true;
+
+ if (!IS_ENABLED(CONFIG_DEBUG_LIST)) {
+ /*
+ * With the hardening version, elide checking if next and prev
+ * are NULL, since the immediate dereference of them below would
+ * result in a fault if NULL.
+ *
+ * With the reduced set of checks, we can afford to inline the
+ * checks, which also gives the compiler a chance to elide some
+ * of them completely if they can be proven at compile-time. If
+ * one of the pre-conditions does not hold, the slow-path will
+ * show a report which pre-condition failed.
+ */
+ if (likely(next->prev == prev && prev->next == next && new != prev && new != next))
+ return true;
+ ret = false;
+ }
+
+ ret &= __list_add_valid_or_report(new, prev, next);
+ return ret;
 }

/*
* Performs the full set of list corruption checks before __list_del_entry().
* On list corruption reports a warning, and returns false.
*/
-extern bool __list_del_entry_valid_or_report(struct list_head *entry);
+extern bool __list_valid_slowpath __list_del_entry_valid_or_report(struct list_head *entry);

/*
* Performs list corruption checks before __list_del_entry(). Returns false if a
* corruption is detected, true otherwise.
+ *
+ * With CONFIG_HARDEN_LIST only, performs minimal list integrity checks (that
+ * do not result in a fault) inline, and only if a corruption is detected calls
+ * the reporting function __list_del_entry_valid_or_report().
*/
static __always_inline bool __list_del_entry_valid(struct list_head *entry)
{
- return __list_del_entry_valid_or_report(entry);
+ bool ret = true;
+
+ if (!IS_ENABLED(CONFIG_DEBUG_LIST)) {
+ struct list_head *prev = entry->prev;
+ struct list_head *next = entry->next;
+
+ /*
+ * With the hardening version, elide checking if next and prev
+ * are NULL, LIST_POISON1 or LIST_POISON2, since the immediate
+ * dereference of them below would result in a fault.
+ */
+ if (likely(prev->next == entry && next->prev == entry))
+ return true;
+ ret = false;
+ }
+
+ ret &= __list_del_entry_valid_or_report(entry);
+ return ret;
}
#else
static inline bool __list_add_valid(struct list_head *new,
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index fbc89baf7de6..9e38956d6f50 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1674,9 +1674,14 @@ menu "Debug kernel data structures"
config DEBUG_LIST
bool "Debug linked list manipulation"
depends on DEBUG_KERNEL || BUG_ON_DATA_CORRUPTION
+ select HARDEN_LIST
help
- Enable this to turn on extended checks in the linked-list
- walking routines.
+ Enable this to turn on extended checks in the linked-list walking
+ routines.
+
+ This option trades better quality error reports for performance, and
+ is more suitable for kernel debugging. If you care about performance,
+ you should only enable CONFIG_HARDEN_LIST instead.

If unsure, say N.

diff --git a/lib/Makefile b/lib/Makefile
index 42d307ade225..a7ebc9090f04 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -161,7 +161,7 @@ obj-$(CONFIG_BTREE) += btree.o
obj-$(CONFIG_INTERVAL_TREE) += interval_tree.o
obj-$(CONFIG_ASSOCIATIVE_ARRAY) += assoc_array.o
obj-$(CONFIG_DEBUG_PREEMPT) += smp_processor_id.o
-obj-$(CONFIG_DEBUG_LIST) += list_debug.o
+obj-$(CONFIG_HARDEN_LIST) += list_debug.o
obj-$(CONFIG_DEBUG_OBJECTS) += debugobjects.o

obj-$(CONFIG_BITREVERSE) += bitrev.o
diff --git a/lib/list_debug.c b/lib/list_debug.c
index 2def33b1491f..38ddc7c01eab 100644
--- a/lib/list_debug.c
+++ b/lib/list_debug.c
@@ -2,7 +2,8 @@
* Copyright 2006, Red Hat, Inc., Dave Jones
* Released under the General Public License (GPL).
*
- * This file contains the linked list validation for DEBUG_LIST.
+ * This file contains the linked list validation and error reporting for
+ * HARDEN_LIST and DEBUG_LIST.
*/

#include <linux/export.h>
@@ -17,6 +18,7 @@
* attempt).
*/

+__list_valid_slowpath
bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
struct list_head *next)
{
@@ -39,6 +41,7 @@ bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
}
EXPORT_SYMBOL(__list_add_valid_or_report);

+__list_valid_slowpath
bool __list_del_entry_valid_or_report(struct list_head *entry)
{
struct list_head *prev, *next;
diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index 0f295961e773..19aa2b7d2f64 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -279,6 +279,19 @@ config ZERO_CALL_USED_REGS

endmenu

+menu "Hardening of kernel data structures"
+
+config HARDEN_LIST
+ bool "Check integrity of linked list manipulation"

Kees Cook

unread,
Aug 10, 2023, 4:12:02 PM8/10/23
to Marco Elver, Steven Rostedt, Andrew Morton, Guenter Roeck, Peter Zijlstra, Mark Rutland, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, Sami Tolvanen, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
On Wed, Aug 09, 2023 at 06:32:37PM +0200, Marco Elver wrote:
> On Wed, Aug 09, 2023 at 11:30AM -0400, Steven Rostedt wrote:
> [...]
> >
> > I would actually prefer DEBUG_LIST to select HARDEN_LIST and not the other
> > way around. It logically doesn't make sense that HARDEN_LIST would select
> > DEBUG_LIST. That is, I could by default want HARDEN_LIST always on, but not
> > DEBUG_LIST (because who knows, it may add other features I don't want). But
> > then, I may have stumbled over something and want more info, and enable
> > DEBUG_LIST (while still having HARDEN_LIST) enabled.
> >
> > I think you are looking at this from an implementation perspective and not
> > the normal developer one.
> >
> [...]
> >
> > That is, if DEBUG_LIST is enabled, we always call the
> > __list_add_valid_or_report(), but if only HARDEN_LIST is enabled, then we
> > do the shortcut.
>
> Good point - I think this is better. See below tentative v4.
>
> Kees: Does that also look more like what you had in mind?

Yeah, this looks good. My only nit would be a naming one. All the
other hardening features are named "HARDENED", but perhaps the "ED"
is redundant in the others. Still, consistency seems nicer. What do you
think of CONFIG_LIST_HARDENED ? (The modern trend for Kconfig naming tends
to keep the subsystem name first and then apply optional elements after.)

One note: do the LKDTM list hardening tests still pass? i.e.
CORRUPT_LIST_ADD
CORRUPT_LIST_DEL

> [...]
> + /*
> + * With the hardening version, elide checking if next and prev
> + * are NULL, LIST_POISON1 or LIST_POISON2, since the immediate
> + * dereference of them below would result in a fault.
> + */
> + if (likely(prev->next == entry && next->prev == entry))
> + return true;

I'm not super excited about skipping those checks, since they are
values that can be reached through kernel list management confusion. If
an attacker is using a system where the zero-page has been mapped
and is accessible (i.e. lacking SMAP etc), then attacks could still
be constructed. However, I do recognize this chain of exploitation
prerequisites is getting rather long, so probably this is a reasonable
trade off on modern systems.
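
(For reference, the poison values being discussed are defined in
include/linux/poison.h as

	#define LIST_POISON1  ((void *) 0x100 + POISON_POINTER_DELTA)
	#define LIST_POISON2  ((void *) 0x122 + POISON_POINTER_DELTA)

so without a CONFIG_ILLEGAL_POINTER_VALUE delta they are low addresses
that may well be mappable, which is what the full DEBUG_LIST checks
guard against.)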

-Kees

--
Kees Cook

Marco Elver

unread,
Aug 11, 2023, 5:11:29 AM8/11/23
to Kees Cook, Steven Rostedt, Andrew Morton, Guenter Roeck, Peter Zijlstra, Mark Rutland, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, Sami Tolvanen, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
On Thu, 10 Aug 2023 at 22:12, Kees Cook <kees...@chromium.org> wrote:
>
> On Wed, Aug 09, 2023 at 06:32:37PM +0200, Marco Elver wrote:
> > On Wed, Aug 09, 2023 at 11:30AM -0400, Steven Rostedt wrote:
> > [...]
> > >
> > > I would actually prefer DEBUG_LIST to select HARDEN_LIST and not the other
> > > way around. It logically doesn't make sense that HARDEN_LIST would select
> > > DEBUG_LIST. That is, I could by default want HARDEN_LIST always on, but not
> > > DEBUG_LIST (because who knows, it may add other features I don't want). But
> > > then, I may have stumbled over something and want more info, and enable
> > > DEBUG_LIST (while still having HARDEN_LIST) enabled.
> > >
> > > I think you are looking at this from an implementation perspective and not
> > > the normal developer one.
> > >
> > [...]
> > >
> > > That is, if DEBUG_LIST is enabled, we always call the
> > > __list_add_valid_or_report(), but if only HARDEN_LIST is enabled, then we
> > > do the shortcut.
> >
> > Good point - I think this is better. See below tentative v4.
> >
> > Kees: Does that also look more like what you had in mind?
>
> Yeah, this looks good. My only nit would be a naming one. All the
> other hardening features are named "HARDENED", but perhaps the "ED"
> is redundant in the others. Still, consistency seems nicer. What do you
> think of CONFIG_LIST_HARDENED ? (The modern trend for Kconfig naming tends
> to keep the subsystem name first and then apply optional elements after.)

Naming is a bit all over. :-/
I agree with the <subsystem>_<suboption> scheme, generally. I think
initially I tried to keep the name shorter, and also find a good
counter-part to DEBUG_<suboption>, therefore HARDEN_LIST.

Let's just change it to CONFIG_LIST_HARDENED, given the existing
"HARDENED" options.

I don't have a strong preference.

> One note: do the LKDTM list hardening tests still pass? i.e.
> CORRUPT_LIST_ADD
> CORRUPT_LIST_DEL

Yes, they do. Though I need to also adjust BUG_ON_DATA_CORRUPTION to
select LIST_HARDENED, and the test should check for the new option
(which is implied by DEBUG_LIST now). There will be an additional
patch to adjust that.

> > [...]
> > + /*
> > + * With the hardening version, elide checking if next and prev
> > + * are NULL, LIST_POISON1 or LIST_POISON2, since the immediate
> > + * dereference of them below would result in a fault.
> > + */
> > + if (likely(prev->next == entry && next->prev == entry))
> > + return true;
>
> I'm not super excited about skipping those checks, since they are
> values that can be reached through kernel list management confusion. If
> an attacker is using a system where the zero-page has been mapped
> and is accessible (i.e. lacking SMAP etc), then attacks could still
> be constructed. However, I do recognize this chain of exploitation
> prerequisites is getting rather long, so probably this is a reasonable
> trade off on modern systems.

Sure, it's a trade-off for systems which do have the bare minimum of
modern hardware security features.

Marco Elver

unread,
Aug 11, 2023, 11:19:59 AM8/11/23
to el...@google.com, Andrew Morton, Kees Cook, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Arnd Bergmann, Greg Kroah-Hartman, Paul Moore, James Morris, Serge E. Hallyn, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, Sami Tolvanen, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, linux-secu...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
Note: The additional preprocessor check against architecture should not
be necessary if __has_attribute() only returns true where supported;
also see https://github.com/ClangBuiltLinux/linux/issues/1908. But until
__has_attribute() does the right thing, we also guard by known-supported
architectures to avoid build warnings on other architectures.

The attribute may be supported by a future GCC version (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110899).

Signed-off-by: Marco Elver <el...@google.com>
Reviewed-by: Miguel Ojeda <oj...@kernel.org>
Reviewed-by: Nick Desaulniers <ndesau...@google.com>
Acked-by: Steven Rostedt (Google) <ros...@goodmis.org>
Acked-by: Mark Rutland <mark.r...@arm.com>
---
v4:
* Guard attribute based on known-supported architectures to avoid
compiler warnings about the attribute being ignored.

v3:
* Quote more from LLVM documentation about which registers are
callee/caller with preserve_most.
* Code comment to restrict use where tracing is meant to be disabled.

v2:
* Imply notrace, to avoid any conflicts with tracing which is inserted
on function entry. See added comments.
---
include/linux/compiler_types.h | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)

diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 547ea1ff806e..c523c6683789 100644
+#if __has_attribute(__preserve_most__) && (defined(CONFIG_X86_64) || defined(CONFIG_ARM64))
+# define __preserve_most notrace __attribute__((__preserve_most__))
+#else
+# define __preserve_most
+#endif
+
/* Builtins */

/*
--
2.41.0.694.ge786442a9b-goog

Marco Elver

unread,
Aug 11, 2023, 11:20:02 AM8/11/23
to el...@google.com, Andrew Morton, Kees Cook, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Arnd Bergmann, Greg Kroah-Hartman, Paul Moore, James Morris, Serge E. Hallyn, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, Sami Tolvanen, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, linux-secu...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
Turn the list debug checking functions __list_*_valid() into inline
functions that wrap the out-of-line functions. Care is taken to ensure
the inline wrappers are always inlined, so that additional compiler
instrumentation (such as sanitizers) does not result in redundant
outlining.

This change is preparation for performing checks in the inline wrappers.

No functional change intended.

Signed-off-by: Marco Elver <el...@google.com>
---
v3:
* Rename ___list_*_valid() to __list_*_valid_or_report().
* Some documentation.
---
arch/arm64/kvm/hyp/nvhe/list_debug.c | 6 ++---
include/linux/list.h | 37 +++++++++++++++++++++++++---
lib/list_debug.c | 11 ++++-----
3 files changed, 41 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/list_debug.c b/arch/arm64/kvm/hyp/nvhe/list_debug.c
index d68abd7ea124..16266a939a4c 100644
--- a/arch/arm64/kvm/hyp/nvhe/list_debug.c
+++ b/arch/arm64/kvm/hyp/nvhe/list_debug.c
@@ -26,8 +26,8 @@ static inline __must_check bool nvhe_check_data_corruption(bool v)

/* The predicates checked here are taken from lib/list_debug.c. */

-bool __list_add_valid(struct list_head *new, struct list_head *prev,
- struct list_head *next)
+bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
+ struct list_head *next)
{
if (NVHE_CHECK_DATA_CORRUPTION(next->prev != prev) ||
NVHE_CHECK_DATA_CORRUPTION(prev->next != next) ||
@@ -37,7 +37,7 @@ bool __list_add_valid(struct list_head *new, struct list_head *prev,
return true;
}

-bool __list_del_entry_valid(struct list_head *entry)
+bool __list_del_entry_valid_or_report(struct list_head *entry)
{
struct list_head *prev, *next;

diff --git a/include/linux/list.h b/include/linux/list.h
index f10344dbad4d..130c6a1bb45c 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -39,10 +39,39 @@ static inline void INIT_LIST_HEAD(struct list_head *list)
}

#ifdef CONFIG_DEBUG_LIST
-extern bool __list_add_valid(struct list_head *new,
- struct list_head *prev,
- struct list_head *next);
-extern bool __list_del_entry_valid(struct list_head *entry);
+/*
+ * Performs the full set of list corruption checks before __list_add().
+ * On list corruption reports a warning, and returns false.
+ */
+extern bool __list_add_valid_or_report(struct list_head *new,
+ struct list_head *prev,
+ struct list_head *next);
+
+/*
+ * Performs list corruption checks before __list_add(). Returns false if a
+ * corruption is detected, true otherwise.
+ */
+static __always_inline bool __list_add_valid(struct list_head *new,
+ struct list_head *prev,
+ struct list_head *next)
+{
+ return __list_add_valid_or_report(new, prev, next);
+}
+
+/*
+ * Performs the full set of list corruption checks before __list_del_entry().
+ * On list corruption reports a warning, and returns false.
+ */
+extern bool __list_del_entry_valid_or_report(struct list_head *entry);
+
+/*
+ * Performs list corruption checks before __list_del_entry(). Returns false if a
+ * corruption is detected, true otherwise.
+ */
+static __always_inline bool __list_del_entry_valid(struct list_head *entry)
+{
+ return __list_del_entry_valid_or_report(entry);
+}
#else
static inline bool __list_add_valid(struct list_head *new,
struct list_head *prev,
diff --git a/lib/list_debug.c b/lib/list_debug.c
index d98d43f80958..2def33b1491f 100644
--- a/lib/list_debug.c
+++ b/lib/list_debug.c
@@ -17,8 +17,8 @@
* attempt).
*/

-bool __list_add_valid(struct list_head *new, struct list_head *prev,
- struct list_head *next)
+bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
+ struct list_head *next)
{
if (CHECK_DATA_CORRUPTION(prev == NULL,
"list_add corruption. prev is NULL.\n") ||
@@ -37,9 +37,9 @@ bool __list_add_valid(struct list_head *new, struct list_head *prev,

return true;
}
-EXPORT_SYMBOL(__list_add_valid);
+EXPORT_SYMBOL(__list_add_valid_or_report);

-bool __list_del_entry_valid(struct list_head *entry)
+bool __list_del_entry_valid_or_report(struct list_head *entry)
{
struct list_head *prev, *next;

@@ -65,6 +65,5 @@ bool __list_del_entry_valid(struct list_head *entry)
return false;

return true;
-
}
-EXPORT_SYMBOL(__list_del_entry_valid);
+EXPORT_SYMBOL(__list_del_entry_valid_or_report);
--
2.41.0.694.ge786442a9b-goog

Marco Elver

unread,
Aug 11, 2023, 11:20:05 AM8/11/23
to el...@google.com, Andrew Morton, Kees Cook, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Arnd Bergmann, Greg Kroah-Hartman, Paul Moore, James Morris, Serge E. Hallyn, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, Sami Tolvanen, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, linux-secu...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
Numerous production kernel configs (see [1, 2]) are choosing to enable
CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened
configs [3]. The motivation behind this is that the option can be used
as a security hardening feature (e.g. CVE-2019-2215 and CVE-2019-2025
are mitigated by the option [4]).

The feature has never been designed with performance in mind, yet common
list manipulation is happening across hot paths all over the kernel.

Introduce CONFIG_LIST_HARDENED, which performs list pointer checking
inline, and only upon list corruption calls the reporting slow path.

To generate optimal machine code with CONFIG_LIST_HARDENED:

1. Elide checking for pointer values which upon dereference would
result in an immediate access fault (i.e. minimal hardening
checks). The trade-off is lower-quality error reports.

2. Use the __preserve_most function attribute (available with Clang,
but not yet with GCC) to minimize the code footprint for calling
the reporting slow path. As a result, function size of callers is
reduced by avoiding saving registers before calling the rarely
called reporting slow path.

Note that all TUs in lib/Makefile already disable function tracing,
including list_debug.c, and __preserve_most's implied notrace has
no effect in this case.

3. Because the inline checks are a subset of the full set of checks in
__list_*_valid_or_report(), always return false if the inline
checks failed. This avoids redundant compare and conditional
branch right after return from the slow path.

As a side-effect of the checks being inline, if the compiler can prove
some condition to always be true, it can completely elide some checks.

Since DEBUG_LIST is functionally a superset of LIST_HARDENED, the
Kconfig variables are changed to reflect that: DEBUG_LIST selects
LIST_HARDENED, whereas LIST_HARDENED itself has no dependency on
DEBUG_LIST.

Running netperf with CONFIG_LIST_HARDENED (using a Clang compiler with
Signed-off-by: Marco Elver <el...@google.com>
---
v4:
* Rename to CONFIG_LIST_HARDENED, which can independently be selected
from CONFIG_DEBUG_LIST.
* LKDTM test should just check CONFIG_LIST_HARDENED (which is also
implied by DEBUG_LIST).
* Comment word smithing.

v3:
* Rename ___list_*_valid() to __list_*_valid_or_report().
* More comments.

v2:
* Note that lib/Makefile disables function tracing for everything and
__preserve_most's implied notrace is a noop here.
---
arch/arm64/kvm/hyp/nvhe/Makefile | 2 +-
arch/arm64/kvm/hyp/nvhe/list_debug.c | 2 +
drivers/misc/lkdtm/bugs.c | 4 +-
include/linux/list.h | 64 +++++++++++++++++++++++++---
lib/Kconfig.debug | 9 +++-
lib/Makefile | 2 +-
lib/list_debug.c | 5 ++-
security/Kconfig.hardening | 13 ++++++
8 files changed, 88 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 9ddc025e4b86..2250253a6429 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -25,7 +25,7 @@ hyp-obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o
cache.o setup.o mm.o mem_protect.o sys_regs.o pkvm.o stacktrace.o ffa.o
hyp-obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o
-hyp-obj-$(CONFIG_DEBUG_LIST) += list_debug.o
+hyp-obj-$(CONFIG_LIST_HARDENED) += list_debug.o
hyp-obj-y += $(lib-objs)

##
diff --git a/arch/arm64/kvm/hyp/nvhe/list_debug.c b/arch/arm64/kvm/hyp/nvhe/list_debug.c
index 16266a939a4c..46a2d4f2b3c6 100644
--- a/arch/arm64/kvm/hyp/nvhe/list_debug.c
+++ b/arch/arm64/kvm/hyp/nvhe/list_debug.c
@@ -26,6 +26,7 @@ static inline __must_check bool nvhe_check_data_corruption(bool v)

/* The predicates checked here are taken from lib/list_debug.c. */

+__list_valid_slowpath
bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
struct list_head *next)
{
@@ -37,6 +38,7 @@ bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
return true;
}

+__list_valid_slowpath
bool __list_del_entry_valid_or_report(struct list_head *entry)
{
struct list_head *prev, *next;
diff --git a/drivers/misc/lkdtm/bugs.c b/drivers/misc/lkdtm/bugs.c
index 3c95600ab2f7..963b4dee6a7d 100644
--- a/drivers/misc/lkdtm/bugs.c
+++ b/drivers/misc/lkdtm/bugs.c
@@ -393,7 +393,7 @@ static void lkdtm_CORRUPT_LIST_ADD(void)
pr_err("Overwrite did not happen, but no BUG?!\n");
else {
pr_err("list_add() corruption not detected!\n");
- pr_expected_config(CONFIG_DEBUG_LIST);
+ pr_expected_config(CONFIG_LIST_HARDENED);
}
}

@@ -420,7 +420,7 @@ static void lkdtm_CORRUPT_LIST_DEL(void)
pr_err("Overwrite did not happen, but no BUG?!\n");
else {
pr_err("list_del() corruption not detected!\n");
- pr_expected_config(CONFIG_DEBUG_LIST);
+ pr_expected_config(CONFIG_LIST_HARDENED);
}
}

diff --git a/include/linux/list.h b/include/linux/list.h
index 130c6a1bb45c..164b4d0e9d2a 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -38,39 +38,91 @@ static inline void INIT_LIST_HEAD(struct list_head *list)
WRITE_ONCE(list->prev, list);
}

+#ifdef CONFIG_LIST_HARDENED
+
#ifdef CONFIG_DEBUG_LIST
+# define __list_valid_slowpath
+#else
+# define __list_valid_slowpath __cold __preserve_most
+#endif
+
/*
* Performs the full set of list corruption checks before __list_add().
* On list corruption reports a warning, and returns false.
*/
-extern bool __list_add_valid_or_report(struct list_head *new,
- struct list_head *prev,
- struct list_head *next);
+extern bool __list_valid_slowpath __list_add_valid_or_report(struct list_head *new,
+ struct list_head *prev,
+ struct list_head *next);

/*
* Performs list corruption checks before __list_add(). Returns false if a
* corruption is detected, true otherwise.
+ *
+ * With CONFIG_LIST_HARDENED only, performs minimal list integrity checking
+ * inline to catch non-faulting corruptions, and only if a corruption is
+ * detected calls the reporting function __list_add_valid_or_report().
*/
static __always_inline bool __list_add_valid(struct list_head *new,
struct list_head *prev,
struct list_head *next)
{
- return __list_add_valid_or_report(new, prev, next);
+ bool ret = true;
+
+ if (!IS_ENABLED(CONFIG_DEBUG_LIST)) {
+ /*
+ * With the hardening version, elide checking if next and prev
+ * are NULL, since the immediate dereference of them below would
+ * result in a fault if NULL.
+ *
+ * With the reduced set of checks, we can afford to inline the
+ * checks, which also gives the compiler a chance to elide some
+ * of them completely if they can be proven at compile-time. If
+ * one of the pre-conditions does not hold, the slow-path will
+ * show a report which pre-condition failed.
+ */
+ if (likely(next->prev == prev && prev->next == next && new != prev && new != next))
+ return true;
+ ret = false;
+ }
+
+ ret &= __list_add_valid_or_report(new, prev, next);
+ return ret;
}

/*
* Performs the full set of list corruption checks before __list_del_entry().
* On list corruption reports a warning, and returns false.
*/
-extern bool __list_del_entry_valid_or_report(struct list_head *entry);
+extern bool __list_valid_slowpath __list_del_entry_valid_or_report(struct list_head *entry);

/*
* Performs list corruption checks before __list_del_entry(). Returns false if a
* corruption is detected, true otherwise.
+ *
+ * With CONFIG_LIST_HARDENED only, performs minimal list integrity checking
+ * inline to catch non-faulting corruptions, and only if a corruption is
+ * detected calls the reporting function __list_del_entry_valid_or_report().
*/
static __always_inline bool __list_del_entry_valid(struct list_head *entry)
{
- return __list_del_entry_valid_or_report(entry);
+ bool ret = true;
+
+ if (!IS_ENABLED(CONFIG_DEBUG_LIST)) {
+ struct list_head *prev = entry->prev;
+ struct list_head *next = entry->next;
+
+ /*
+ * With the hardening version, elide checking if next and prev
+ * are NULL, LIST_POISON1 or LIST_POISON2, since the immediate
+ * dereference of them below would result in a fault.
+ */
+ if (likely(prev->next == entry && next->prev == entry))
+ return true;
+ ret = false;
+ }
+
+ ret &= __list_del_entry_valid_or_report(entry);
+ return ret;
}
#else
static inline bool __list_add_valid(struct list_head *new,
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index fbc89baf7de6..c38745ad46eb 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1674,9 +1674,14 @@ menu "Debug kernel data structures"
config DEBUG_LIST
bool "Debug linked list manipulation"
depends on DEBUG_KERNEL || BUG_ON_DATA_CORRUPTION
+ select LIST_HARDENED
help
- Enable this to turn on extended checks in the linked-list
- walking routines.
+ Enable this to turn on extended checks in the linked-list walking
+ routines.
+
+ This option trades better quality error reports for performance, and
+ is more suitable for kernel debugging. If you care about performance,
+ you should only enable CONFIG_LIST_HARDENED instead.

If unsure, say N.

diff --git a/lib/Makefile b/lib/Makefile
index 1ffae65bb7ee..d1397785ec16 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -167,7 +167,7 @@ obj-$(CONFIG_BTREE) += btree.o
obj-$(CONFIG_INTERVAL_TREE) += interval_tree.o
obj-$(CONFIG_ASSOCIATIVE_ARRAY) += assoc_array.o
obj-$(CONFIG_DEBUG_PREEMPT) += smp_processor_id.o
-obj-$(CONFIG_DEBUG_LIST) += list_debug.o
+obj-$(CONFIG_LIST_HARDENED) += list_debug.o
obj-$(CONFIG_DEBUG_OBJECTS) += debugobjects.o

obj-$(CONFIG_BITREVERSE) += bitrev.o
diff --git a/lib/list_debug.c b/lib/list_debug.c
index 2def33b1491f..db602417febf 100644
--- a/lib/list_debug.c
+++ b/lib/list_debug.c
@@ -2,7 +2,8 @@
* Copyright 2006, Red Hat, Inc., Dave Jones
* Released under the General Public License (GPL).
*
- * This file contains the linked list validation for DEBUG_LIST.
+ * This file contains the linked list validation and error reporting for
+ * LIST_HARDENED and DEBUG_LIST.
*/

#include <linux/export.h>
@@ -17,6 +18,7 @@
* attempt).
*/

+__list_valid_slowpath
bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
struct list_head *next)
{
@@ -39,6 +41,7 @@ bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
}
EXPORT_SYMBOL(__list_add_valid_or_report);

+__list_valid_slowpath
bool __list_del_entry_valid_or_report(struct list_head *entry)
{
struct list_head *prev, *next;
diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index 0f295961e773..ffc3c702b461 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -279,6 +279,19 @@ config ZERO_CALL_USED_REGS

endmenu

+menu "Hardening of kernel data structures"
+
+config LIST_HARDENED
+ bool "Check integrity of linked list manipulation"
+ help
+ Minimal integrity checking in the linked-list manipulation routines
+ to catch memory corruptions that are not guaranteed to result in an
+ immediate access fault.
+
+ If unsure, say N.
+
+endmenu
+
config CC_HAS_RANDSTRUCT
def_bool $(cc-option,-frandomize-layout-seed-file=/dev/null)
# Randstruct was first added in Clang 15, but it isn't safe to use until
--
2.41.0.694.ge786442a9b-goog

Marco Elver

unread,
Aug 11, 2023, 11:20:09 AM8/11/23
to el...@google.com, Andrew Morton, Kees Cook, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Arnd Bergmann, Greg Kroah-Hartman, Paul Moore, James Morris, Serge E. Hallyn, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, Sami Tolvanen, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, linux-secu...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
BUG_ON_DATA_CORRUPTION is turning detected corruptions of list data
structures from WARNings into BUGs. This can be useful to stop further
corruptions or even exploitation attempts.

However, the option has less to do with debugging than with hardening.
With the introduction of LIST_HARDENED, it makes more sense to move it
to the hardening options, where it selects LIST_HARDENED instead.

Without this change, combining BUG_ON_DATA_CORRUPTION with LIST_HARDENED
alone wouldn't be possible, because DEBUG_LIST would always be selected
by BUG_ON_DATA_CORRUPTION.

Signed-off-by: Marco Elver <el...@google.com>
---
v4:
* New patch, after LIST_HARDENED was made independent of DEBUG_LIST, and
now DEBUG_LIST depends on LIST_HARDENED.
---
lib/Kconfig.debug | 12 +-----------
security/Kconfig.hardening | 10 ++++++++++
2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c38745ad46eb..c7348d1fabe5 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1673,7 +1673,7 @@ menu "Debug kernel data structures"

config DEBUG_LIST
bool "Debug linked list manipulation"
- depends on DEBUG_KERNEL || BUG_ON_DATA_CORRUPTION
+ depends on DEBUG_KERNEL
select LIST_HARDENED
help
Enable this to turn on extended checks in the linked-list walking
@@ -1715,16 +1715,6 @@ config DEBUG_NOTIFIERS
This is a relatively cheap check but if you care about maximum
performance, say N.

-config BUG_ON_DATA_CORRUPTION
- bool "Trigger a BUG when data corruption is detected"
- select DEBUG_LIST
- help
- Select this option if the kernel should BUG when it encounters
- data corruption in kernel memory structures when they get checked
- for validity.
-
- If unsure, say N.
-
config DEBUG_MAPLE_TREE
bool "Debug maple trees"
depends on DEBUG_KERNEL
diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index ffc3c702b461..2cff851ebfd7 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -290,6 +290,16 @@ config LIST_HARDENED

If unsure, say N.

+config BUG_ON_DATA_CORRUPTION
+ bool "Trigger a BUG when data corruption is detected"
+ select LIST_HARDENED
+ help
+ Select this option if the kernel should BUG when it encounters
+ data corruption in kernel memory structures when they get checked
+ for validity.
+
+ If unsure, say N.
+
endmenu

config CC_HAS_RANDSTRUCT
--
2.41.0.694.ge786442a9b-goog

Kees Cook

unread,
Aug 14, 2023, 7:21:45 PM8/14/23
to Marco Elver, Andrew Morton, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Arnd Bergmann, Greg Kroah-Hartman, Paul Moore, James Morris, Serge E. Hallyn, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, Sami Tolvanen, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, linux-secu...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
Should this go via -mm, the hardening tree, or something else? I'm happy
to carry it if no one else wants it?

-Kees

--
Kees Cook

Marco Elver

unread,
Aug 15, 2023, 2:29:47 PM8/15/23
to Kees Cook, Andrew Morton, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Arnd Bergmann, Greg Kroah-Hartman, Paul Moore, James Morris, Serge E. Hallyn, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, Sami Tolvanen, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, linux-secu...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
v3 of this series is already in mm-unstable, and has had some -next
exposure (which was helpful in uncovering some additional issues).
Therefore, I think it's appropriate that it continues in mm and Andrew
picks up the latest v4 here.

Your official Ack would nevertheless be much appreciated!

Thanks,
-- Marco

Andrew Morton

unread,
Aug 15, 2023, 5:31:22 PM8/15/23
to Kees Cook, Marco Elver, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Arnd Bergmann, Greg Kroah-Hartman, Paul Moore, James Morris, Serge E. Hallyn, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, Sami Tolvanen, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, linux-secu...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
On Mon, 14 Aug 2023 16:21:43 -0700 Kees Cook <kees...@chromium.org> wrote:

> Should this go via -mm, the hardening tree, or something else? I'm happy
> to carry it if no one else wants it?

Please do so.

Kees Cook

unread,
Aug 15, 2023, 5:58:38 PM8/15/23
to Andrew Morton, Marco Elver, Kees Cook, Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Arnd Bergmann, Greg Kroah-Hartman, Paul Moore, James Morris, Serge E. Hallyn, Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda, Sami Tolvanen, linux-ar...@lists.infradead.org, kvm...@lists.linux.dev, linux-...@vger.kernel.org, linux-secu...@vger.kernel.org, ll...@lists.linux.dev, Dmitry Vyukov, Alexander Potapenko, kasa...@googlegroups.com, linux-to...@vger.kernel.org
On Fri, 11 Aug 2023 17:18:38 +0200, Marco Elver wrote:
> [1]: "On X86-64 and AArch64 targets, this attribute changes the calling
> convention of a function. The preserve_most calling convention attempts
> to make the code in the caller as unintrusive as possible. This
> convention behaves identically to the C calling convention on how
> arguments and return values are passed, but it uses a different set of
> caller/callee-saved registers. This alleviates the burden of saving and
> recovering a large register set before and after the call in the caller.
> If the arguments are passed in callee-saved registers, then they will be
> preserved by the callee across the call. This doesn't apply for values
> returned in callee-saved registers.
>
> [...]

Applied to for-next/hardening, thanks!

[1/4] compiler_types: Introduce the Clang __preserve_most function attribute
https://git.kernel.org/kees/c/7a0fd5e16785
[2/4] list_debug: Introduce inline wrappers for debug checks
https://git.kernel.org/kees/c/b16c42c8fde8
[3/4] list: Introduce CONFIG_LIST_HARDENED
https://git.kernel.org/kees/c/aebc7b0d8d91
[4/4] hardening: Move BUG_ON_DATA_CORRUPTION to hardening options
https://git.kernel.org/kees/c/aa9f10d57056

Take care,

--
Kees Cook
