[PATCH] x86: Alias memset to __builtin_memset.

13 views
Skip to first unread message

Clement Courbet

unread,
Mar 23, 2020, 7:43:06 AM3/23/20
to Clement Courbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, linux-...@vger.kernel.org, clang-bu...@googlegroups.com
Recent compilers know the meaning of builtins (`memset`,
`memcpy`, ...) and can replace calls by inline code when
deemed better. For example, `memset(p, 0, 4)` will be lowered
to a four-byte zero store.

When using -ffreestanding (this is the case e.g. building on
clang), these optimizations are disabled. This means that **all**
memsets, including those with small, constant sizes, will result
in an actual call to memset.

We have identified several spots where we have high CPU usage
because of this. For example, a single one of these memsets is
responsible for about 0.3% of our total CPU usage in the kernel.

Aliasing `memset` to `__builtin_memset` allows the compiler to
perform this optimization even when -ffreestanding is used.
There is no change when -ffreestanding is not used.

Below is a diff (clang) for `update_sg_lb_stats()`, which
includes the aforementionned hot memset:
memset(sgs, 0, sizeof(*sgs));

Diff:
movq %rsi, %rbx ~~~> movq $0x0, 0x40(%r8)
movq %rdi, %r15 ~~~> movq $0x0, 0x38(%r8)
movl $0x48, %edx ~~~> movq $0x0, 0x30(%r8)
movq %r8, %rdi ~~~> movq $0x0, 0x28(%r8)
xorl %esi, %esi ~~~> movq $0x0, 0x20(%r8)
callq <memset> ~~~> movq $0x0, 0x18(%r8)
~~~> movq $0x0, 0x10(%r8)
~~~> movq $0x0, 0x8(%r8)
~~~> movq $0x0, (%r8)

In terms of code size, this grows the clang-built kernel a
bit (+0.022%):
440285512 vmlinux.clang.after
440383608 vmlinux.clang.before

Signed-off-by: Clement Courbet <cou...@google.com>
---
arch/x86/include/asm/string_64.h | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 75314c3dbe47..7073c25aa4a3 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -18,6 +18,15 @@ extern void *__memcpy(void *to, const void *from, size_t len);
void *memset(void *s, int c, size_t n);
void *__memset(void *s, int c, size_t n);

+/* Recent compilers can generate much better code for known size and/or
+ * fill values, and will fallback on `memset` if they fail.
+ * We alias `memset` to `__builtin_memset` explicitly to inform the compiler to
+ * perform this optimization even when -ffreestanding is used.
+ */
+#if (__GNUC__ >= 4)
+#define memset(s, c, count) __builtin_memset(s, c, count)
+#endif
+
#define __HAVE_ARCH_MEMSET16
static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
{
@@ -74,6 +83,7 @@ int strcmp(const char *cs, const char *ct);
#undef memcpy
#define memcpy(dst, src, len) __memcpy(dst, src, len)
#define memmove(dst, src, len) __memmove(dst, src, len)
+#undef memset
#define memset(s, c, n) __memset(s, c, n)

#ifndef __NO_FORTIFY
--
2.25.1.696.g5e7596f4ac-goog

Nathan Chancellor

unread,
Mar 23, 2020, 11:47:24 AM3/23/20
to Clement Courbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, linux-...@vger.kernel.org, clang-bu...@googlegroups.com
This ifdef is unnecessary, this will always be true because the minimum
supported GCC version is 4.6 and clang pretends it is 4.2.1. It appears
the Intel compiler will pretend to be whatever whatever GCC version the
host is using (no idea if it is still used by anyone or truly supported
but still worth mentioning) so it would still always be true because of
the GCC 4.6 requirement.

I cannot comment on the validity of the patch even though the reasoning
seems sound to me.

Cheers,
Nathan

kbuild test robot

unread,
Mar 23, 2020, 4:16:54 PM3/23/20
to Clement Courbet, kbuil...@lists.01.org, clang-bu...@googlegroups.com
Hi Clement,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/x86/core]
[also build test ERROR on v5.6-rc7 next-20200323]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Clement-Courbet/x86-Alias-memset-to-__builtin_memset/20200324-004007
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 248ed51048c40d36728e70914e38bffd7821da57
config: x86_64-randconfig-h003-20200323 (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 006244152d6c7dd6a390ff89b236cc7801834b46)
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER=clang make.cross ARCH=x86_64

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <l...@intel.com>

All errors (new ones prefixed by >>):

In file included from scripts/mod/devicetable-offsets.c:3:
In file included from include/linux/mod_devicetable.h:13:
In file included from include/linux/uuid.h:12:
>> include/linux/string.h:358:24: error: definition of builtin function '__builtin_memset'
__FORTIFY_INLINE void *memset(void *p, int c, __kernel_size_t size)
^
arch/x86/include/asm/string_64.h:27:29: note: expanded from macro 'memset'
#define memset(s, c, count) __builtin_memset(s, c, count)
^
1 error generated.
make[2]: *** [scripts/Makefile.build:99: scripts/mod/devicetable-offsets.s] Error 1
make[2]: Target '__build' not remade because of errors.
make[1]: *** [Makefile:1112: prepare0] Error 2
make[1]: Target 'prepare' not remade because of errors.
make: *** [Makefile:179: sub-make] Error 2
27 real 8 user 13 sys 79.89% cpu make prepare

vim +/__builtin_memset +358 include/linux/string.h

6974f0c4555e28 Daniel Micay 2017-07-12 357
6974f0c4555e28 Daniel Micay 2017-07-12 @358 __FORTIFY_INLINE void *memset(void *p, int c, __kernel_size_t size)
6974f0c4555e28 Daniel Micay 2017-07-12 359 {
6974f0c4555e28 Daniel Micay 2017-07-12 360 size_t p_size = __builtin_object_size(p, 0);
6974f0c4555e28 Daniel Micay 2017-07-12 361 if (__builtin_constant_p(size) && p_size < size)
6974f0c4555e28 Daniel Micay 2017-07-12 362 __write_overflow();
6974f0c4555e28 Daniel Micay 2017-07-12 363 if (p_size < size)
6974f0c4555e28 Daniel Micay 2017-07-12 364 fortify_panic(__func__);
6974f0c4555e28 Daniel Micay 2017-07-12 365 return __builtin_memset(p, c, size);
6974f0c4555e28 Daniel Micay 2017-07-12 366 }
6974f0c4555e28 Daniel Micay 2017-07-12 367

:::::: The code at line 358 was first introduced by commit
:::::: 6974f0c4555e285ab217cee58b6e874f776ff409 include/linux/string.h: add the option of fortified string.h functions

:::::: TO: Daniel Micay <danie...@gmail.com>
:::::: CC: Linus Torvalds <torv...@linux-foundation.org>

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuil...@lists.01.org
.config.gz

Kees Cook

unread,
Mar 23, 2020, 5:59:40 PM3/23/20
to Clement Courbet, kbuild test robot, kbuil...@lists.01.org, clang-bu...@googlegroups.com
This needs to be within #ifndef CONFIG_FORTIFY_SOURCE. FORTIFY already
does this. Additionally, shouldn't this just be done universally,
instead of in x86-specific code?

-Kees
> --
> You received this message because you are subscribed to the Google Groups "Clang Built Linux" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-li...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/202003240432.AGknaLzA%25lkp%40intel.com.



--
Kees Cook

Nick Desaulniers

unread,
Mar 23, 2020, 9:22:20 PM3/23/20
to Clement Courbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), LKML, clang-built-linux
On Mon, Mar 23, 2020 at 4:43 AM 'Clement Courbet' via Clang Built
Linux <clang-bu...@googlegroups.com> wrote:
>
> Recent compilers know the meaning of builtins (`memset`,
> `memcpy`, ...) and can replace calls by inline code when
> deemed better. For example, `memset(p, 0, 4)` will be lowered
> to a four-byte zero store.
>
> When using -ffreestanding (this is the case e.g. building on
> clang), these optimizations are disabled. This means that **all**
> memsets, including those with small, constant sizes, will result
> in an actual call to memset.

Isn't this only added for 32b x86 (if I'm reading arch/x86/Makefile
right)? Who's adding it for 64b?

arch/x86/Makefile has a comment:
88 # temporary until string.h is fixed
89 KBUILD_CFLAGS += -ffreestanding
Did you look into fixing that?

>
> We have identified several spots where we have high CPU usage
> because of this. For example, a single one of these memsets is
> responsible for about 0.3% of our total CPU usage in the kernel.
>
> Aliasing `memset` to `__builtin_memset` allows the compiler to
> perform this optimization even when -ffreestanding is used.
> There is no change when -ffreestanding is not used.
>
> Below is a diff (clang) for `update_sg_lb_stats()`, which
> includes the aforementionned hot memset:
> memset(sgs, 0, sizeof(*sgs));
>
> Diff:
> movq %rsi, %rbx ~~~> movq $0x0, 0x40(%r8)
> movq %rdi, %r15 ~~~> movq $0x0, 0x38(%r8)
> movl $0x48, %edx ~~~> movq $0x0, 0x30(%r8)
> movq %r8, %rdi ~~~> movq $0x0, 0x28(%r8)
> xorl %esi, %esi ~~~> movq $0x0, 0x20(%r8)
> callq <memset> ~~~> movq $0x0, 0x18(%r8)
> ~~~> movq $0x0, 0x10(%r8)
> ~~~> movq $0x0, 0x8(%r8)
> ~~~> movq $0x0, (%r8)
>
> In terms of code size, this grows the clang-built kernel a
> bit (+0.022%):
> 440285512 vmlinux.clang.after
> 440383608 vmlinux.clang.before

The before number looks bigger? Did it shrink in size, or was the
above mislabeled?
--
Thanks,
~Nick Desaulniers

Nick Desaulniers

unread,
Mar 23, 2020, 9:26:47 PM3/23/20
to Clement Courbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), LKML, clang-built-linux
Further, `make CC=clang -j71 kernel/sched/fair.o V=1` doesn't show
`-ffreestanding` being used. Any idea where it's coming from in your
build? Maybe a local modification to be removed?
--
Thanks,
~Nick Desaulniers

Clement Courbet

unread,
Mar 24, 2020, 10:06:45 AM3/24/20
to Nathan Chancellor, Kees Cook, Nick Desaulniers, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Clement Courbet, linux-...@vger.kernel.org, clang-bu...@googlegroups.com
Thanks for the comments. Answers below.

> This ifdef is unnecessary
> This needs to be within #ifndef CONFIG_FORTIFY_SOURCE

Thanks, fixed.

> shouldn't this just be done universally ?

It looks like every architecture does its own magic with memory
functions. I'm not very familiar with how the linux kernel is
organized, do you have a pointer for where this would go if enabled
globally ?

> Who's adding it for 64b?
> Any idea where it's coming from in your
> build? Maybe a local modification to be removed?

Actually this is from our local build configuration. Apparently this
is needed to build on some architectures that redefine common
symbols, but the authors seemed to feel strongly that this should be
on for all architectures. I've reached out to the authors for an
extended rationale.
On the other hand I think that there is no reason to prevent people
from building with freestanding if we can easily allow them to.



Clement Courbet

unread,
Mar 24, 2020, 10:07:51 AM3/24/20
to Nathan Chancellor, Kees Cook, Nick Desaulniers, Clement Courbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, linux-...@vger.kernel.org, clang-bu...@googlegroups.com
changes in v2:
- Removed ifdef(GNUC >= 4).
- Disabled change when CONFIG_FORTIFY_SOURCE is set.
---
arch/x86/include/asm/string_64.h | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 75314c3dbe47..9cfce0a840a4 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -18,6 +18,15 @@ extern void *__memcpy(void *to, const void *from, size_t len);
void *memset(void *s, int c, size_t n);
void *__memset(void *s, int c, size_t n);

+/* Recent compilers can generate much better code for known size and/or
+ * fill values, and will fallback on `memset` if they fail.
+ * We alias `memset` to `__builtin_memset` explicitly to inform the compiler to
+ * perform this optimization even when -ffreestanding is used.
+ */
+#if !defined(CONFIG_FORTIFY_SOURCE)

Joe Perches

unread,
Mar 24, 2020, 11:09:57 AM3/24/20
to Clement Courbet, Nathan Chancellor, Kees Cook, Nick Desaulniers, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, linux-...@vger.kernel.org, clang-bu...@googlegroups.com
On Tue, 2020-03-24 at 15:07 +0100, Clement Courbet wrote:
> Recent compilers know the meaning of builtins (`memset`,
> `memcpy`, ...) and can replace calls by inline code when
> deemed better. For example, `memset(p, 0, 4)` will be lowered
> to a four-byte zero store.
>
> When using -ffreestanding (this is the case e.g. building on
> clang), these optimizations are disabled. This means that **all**
> memsets, including those with small, constant sizes, will result
> in an actual call to memset.
[]
> In terms of code size, this grows the clang-built kernel a
> bit (+0.022%):
> 440285512 vmlinux.clang.after
> 440383608 vmlinux.clang.before

This shows the kernel getting smaller not larger.
Is this still reversed or is this correct?


Clement Courbet

unread,
Mar 24, 2020, 11:59:25 AM3/24/20
to Nathan Chancellor, Kees Cook, Nick Desaulniers, Joe Perches, Clement Courbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, linux-...@vger.kernel.org, clang-bu...@googlegroups.com
In terms of code size, this shrins the clang-built kernel a
bit (-0.022%):
440383608 vmlinux.clang.before
440285512 vmlinux.clang.after

Signed-off-by: Clement Courbet <cou...@google.com>

---

changes in v2:
- Removed ifdef(GNUC >= 4).
- Disabled change when CONFIG_FORTIFY_SOURCE is set.

changes in v3:
- Fixed commit message: the kernel shrinks in size.
---
arch/x86/include/asm/string_64.h | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 75314c3dbe47..9cfce0a840a4 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -18,6 +18,15 @@ extern void *__memcpy(void *to, const void *from, size_t len);
void *memset(void *s, int c, size_t n);
void *__memset(void *s, int c, size_t n);

+/* Recent compilers can generate much better code for known size and/or
+ * fill values, and will fallback on `memset` if they fail.
+ * We alias `memset` to `__builtin_memset` explicitly to inform the compiler to
+ * perform this optimization even when -ffreestanding is used.
+ */
+#if !defined(CONFIG_FORTIFY_SOURCE)

Clement Courbet

unread,
Mar 24, 2020, 12:02:22 PM3/24/20
to Nathan Chancellor, Kees Cook, Nick Desaulniers, Joe Perches, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Clement Courbet, linux-...@vger.kernel.org, clang-bu...@googlegroups.com
> This shows the kernel getting smaller not larger.
> Is this still reversed or is this correct?

Yes, sorry. This was wrong. The kernel actully shrinks. Fixed.


Nick Desaulniers

unread,
Mar 24, 2020, 1:30:00 PM3/24/20
to Clement Courbet, Nathan Chancellor, Kees Cook, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), LKML, clang-built-linux
On Tue, Mar 24, 2020 at 7:06 AM Clement Courbet <cou...@google.com> wrote:
>
> Thanks for the comments. Answers below.
>
> > This ifdef is unnecessary
> > This needs to be within #ifndef CONFIG_FORTIFY_SOURCE
>
> Thanks, fixed.
>
> > shouldn't this just be done universally ?
>
> It looks like every architecture does its own magic with memory
> functions. I'm not very familiar with how the linux kernel is
> organized, do you have a pointer for where this would go if enabled
> globally ?
>
> > Who's adding it for 64b?
> > Any idea where it's coming from in your
> > build? Maybe a local modification to be removed?
>
> Actually this is from our local build configuration. Apparently this

Not sure we should modify this for someone's downstream tree to work
around an issue they introduced. If you want to file a bug
internally, I'd be happy to comment on it.

Does __builtin_memset detect support for `rep stosb`, then patch the
kernel to always use it or not? The kernel is limited in that we use
-mno-sse and friends to avoid saving/restoring vector registers on
context switch unless kernel_fpu_{begin|end}() is called, which the
compiler doesn't insert for memcpy's.

Did you verify what this change does for other compilers?

> is needed to build on some architectures that redefine common
> symbols, but the authors seemed to feel strongly that this should be

Sounds like a linkage error or issue with symbol visibility; I don't
see how -ffreestanding should have any bearing on that.

> on for all architectures. I've reached out to the authors for an
> extended rationale.
> On the other hand I think that there is no reason to prevent people
> from building with freestanding if we can easily allow them to.

I disagree. The codegen in the kernel is very tricky to get right;
it's somewhat an embedded system, yet provides many symbols that would
have been provided by libc, yet it doesn't use vector operations for
the general case. Adding -ffreestanding to optimize one hot memset in
one function is using a really big hammer to solve the wrong problem.
--
Thanks,
~Nick Desaulniers

Clement Courbet

unread,
Mar 25, 2020, 3:59:40 AM3/25/20
to Nathan Chancellor, Kees Cook, Nick Desaulniers, Joe Perches, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Clement Courbet, linux-...@vger.kernel.org, clang-bu...@googlegroups.com
> Not sure we should modify this for someone's downstream tree to work
> around an issue they introduced. If you want to file a bug
> internally, I'd be happy to comment on it.

I don't have a strong opinion on that. I don't know the policy of the
linux kernel w.r.t. this. There is an internal bug for this, where
kernel maintainers suggested I upstream this patch.

> Does __builtin_memset detect support for `rep stosb`, then patch the
> kernel to always use it or not?

__builtin_memset just allows the compiler to recognize that this has the
semantics of a memset, exactly like it happens when -freestanding is not
specified.

In terms of what compilers do when expanding the memset, it depends on
the size.
gcc or clang obviously do not generate vector code when -no-sse is
specified, as this would break promises.

clang inlines stores for small sizes and switches to memset as the size
increases: https://godbolt.org/z/_X16xt

gcc inlines stores for tiny sizes, then switches to repstos for larger
sizes: https://godbolt.org/z/m-G233 This behaviour is additionally
configurable through command line flags: https://godbolt.org/z/wsoJpQ

> Did you verify what this change does for other compilers?

Are there other compilers are used to build the kernel on x86 ?

icc does the same as gcc and clang: https://godbolt.org/z/yCju_D

> yet it doesn't use vector operations for the general case

I'm not sure how vector operations relate to freestanding, or this change.

> Adding -ffreestanding to optimize one hot memset in
> one function is using a really big hammer to solve the wrong
> problem.

-ffreestanding was not added to optimize anything. It was already there.
If anything -ffreestanding actually pessimizes a lot of the code
generation, as it prevents the compiler from recognizing the semantics
of common primitives. This is what this change is trying to fix.
Removing -ffreestanding from the options is another (easier?) way to fix
the problem. This change is a stab at accomodating both those who want
freestanding and those who want performance.

The hot memset I mentionned was just the hottest one. But as you can imagine
there are many constant-size memsets spread across many functions, some of
which are also very hot, the others constituting a long tail which is not
negligible.


Bernd Petrovitsch

unread,
Mar 25, 2020, 7:33:49 AM3/25/20
to Clement Courbet, Nathan Chancellor, Kees Cook, Nick Desaulniers, Joe Perches, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, linux-...@vger.kernel.org, clang-bu...@googlegroups.com
Hi all!

Sry for being late at the party:

On 24/03/2020 16:59, Clement Courbet wrote:
[...]
> ---
> arch/x86/include/asm/string_64.h | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
> index 75314c3dbe47..9cfce0a840a4 100644
> --- a/arch/x86/include/asm/string_64.h
> +++ b/arch/x86/include/asm/string_64.h
> @@ -18,6 +18,15 @@ extern void *__memcpy(void *to, const void *from, size_t len);
> void *memset(void *s, int c, size_t n);
> void *__memset(void *s, int c, size_t n);
>
> +/* Recent compilers can generate much better code for known size and/or
> + * fill values, and will fallback on `memset` if they fail.
> + * We alias `memset` to `__builtin_memset` explicitly to inform the compiler to
> + * perform this optimization even when -ffreestanding is used.
> + */
> +#if !defined(CONFIG_FORTIFY_SOURCE)
> +#define memset(s, c, count) __builtin_memset(s, c, count)
To be on the safe side, the usual way to write macros is like

#define memset(s, c, count) __builtin_memset((s), (c), (count))

as no one know what is passed as parameter to memset() and the extra
pair of parentheses don't hurt.

And similar below (and I fear there are more places).

Or did I miss something in the Kernel?

> +#endif
> +
> #define __HAVE_ARCH_MEMSET16
> static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
> {
> @@ -74,6 +83,7 @@ int strcmp(const char *cs, const char *ct);
> #undef memcpy
> #define memcpy(dst, src, len) __memcpy(dst, src, len)
#define memcpy(dst, src, len) __memcpy((dst), (src), (len))
> #define memmove(dst, src, len) __memmove(dst, src, len)
#define memmove(dst, src, len) __memmove((dst), (src), (len))
> +#undef memset
> #define memset(s, c, n) __memset(s, c, n)
#define memset(s, c, n) __memset((s), (c), (n))
>
> #ifndef __NO_FORTIFY
>

MfG,
Bernd
--
Bernd Petrovitsch Email : be...@petrovitsch.priv.at
LUGA : http://www.luga.at

Clement Courbet

unread,
Mar 26, 2020, 8:26:05 AM3/26/20
to Nathan Chancellor, Kees Cook, Nick Desaulniers, Joe Perches, Bernd Petrovitsch, Clement Courbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, linux-...@vger.kernel.org, clang-bu...@googlegroups.com
In terms of code size, this shrins the clang-built kernel a
bit (-0.022%):
440383608 vmlinux.clang.before
440285512 vmlinux.clang.after

Signed-off-by: Clement Courbet <cou...@google.com>

---

changes in v2:
- Removed ifdef(GNUC >= 4).
- Disabled change when CONFIG_FORTIFY_SOURCE is set.

changes in v3:
- Fixed commit message: the kernel shrinks in size.

changes in v4:
- Properly parenthesize the macro.
---
arch/x86/include/asm/string_64.h | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 75314c3dbe47..1bfa825e9ad3 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -18,6 +18,15 @@ extern void *__memcpy(void *to, const void *from, size_t len);
void *memset(void *s, int c, size_t n);
void *__memset(void *s, int c, size_t n);

+/* Recent compilers can generate much better code for known size and/or
+ * fill values, and will fallback on `memset` if they fail.
+ * We alias `memset` to `__builtin_memset` explicitly to inform the compiler to
+ * perform this optimization even when -ffreestanding is used.
+ */
+#if !defined(CONFIG_FORTIFY_SOURCE)
+#define memset(s, c, count) __builtin_memset((s), (c), (count))

Clement Courbet

unread,
Mar 26, 2020, 8:38:51 AM3/26/20
to Nathan Chancellor, Kees Cook, Nick Desaulniers, Joe Perches, Bernd Petrovitsch, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Segher Boessenkool, Greg Kroah-Hartman, Allison Randal, Clement Courbet, linuxp...@lists.ozlabs.org, linux-...@vger.kernel.org, clang-bu...@googlegroups.com
I discussed with the original authors who added freestanding to our
build. It turns out that it was added globally but this was just to
to workaround powerpc not compiling under clang, but they felt the
fix was appropriate globally.

Now Nick has dug up https://lkml.org/lkml/2019/8/29/1300, which
advises against freestanding. Also, I've did some research and
discovered that the original reason for using freestanding for
powerpc has been fixed here:
https://lore.kernel.org/linuxppc-dev/20191119045712.396...@gmail.com/

I'm going to remove -ffreestanding from downstream, so we don't really need
this anymore, sorry for waisting people's time.

I wonder if the freestanding fix from the aforementioned patch is really needed
though. I think that clang is actually right to point out the issue.
I don't see any reason why setjmp()/longjmp() are declared as taking longs
rather than ints. The implementation looks like it only ever propagates the
value (in longjmp) or sets it to 1 (in setjmp), and we only ever call longjmp
with integer parameters. But I'm not a PowerPC expert, so I might
be misreading the code.


So it seems that we could just remove freestanding altogether and rewrite the
code to:

diff --git a/arch/powerpc/include/asm/setjmp.h b/arch/powerpc/include/asm/setjmp.h
index 279d03a1eec6..7941ae68fe21 100644
--- a/arch/powerpc/include/asm/setjmp.h
+++ b/arch/powerpc/include/asm/setjmp.h
@@ -12,7 +12,9 @@

#define JMP_BUF_LEN 23
-extern long setjmp(long *);
-extern void longjmp(long *, long);
+typedef long * jmp_buf;
+
+extern int setjmp(jmp_buf);
+extern void longjmp(jmp_buf, int);

I'm happy to send a patch for this, and get rid of more -ffreestanding.
Opinions ?

Nick Desaulniers

unread,
Mar 26, 2020, 1:18:59 PM3/26/20
to Clement Courbet, Nathan Chancellor, Kees Cook, Joe Perches, Bernd Petrovitsch, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), Segher Boessenkool, Greg Kroah-Hartman, Allison Randal, linuxppc-dev, LKML, clang-built-linux
Yes, I think the above diff and additionally cleaning up
-ffreestanding from arch/powerpc/kernel/Makefile and
arch/powerpc/xmon/Makefile would be a much better approach. If that's
good enough to fix the warning, I kind of can't believe we didn't spot
that in the original code review!

Actually, the god awful warning was:
./arch/powerpc/include/asm/setjmp.h:10:13: error: declaration of
built-in function 'setjmp' requires the declaration of the 'jmp_buf'
type, commonly provided in the header <setjmp.h>.
[-Werror,-Wincomplete-setjmp-declaration]
extern long setjmp(long *) __attribute__((returns_twice));
^
So jmp_buf was missing, wasn't used, but also the long vs int confusion.

I tested the above diff, all calls to setjmp under arch/powerpc/ just
compare the return value against 0. Callers of longjmp just pass 1
for the final parameter. So the above changes should be no functional
change.
--
Thanks,
~Nick Desaulniers

Michael Ellerman

unread,
Mar 27, 2020, 12:06:21 AM3/27/20
to Clement Courbet, Nathan Chancellor, Kees Cook, Nick Desaulniers, Joe Perches, Bernd Petrovitsch, Benjamin Herrenschmidt, Paul Mackerras, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Segher Boessenkool, Greg Kroah-Hartman, Allison Randal, Clement Courbet, linuxp...@lists.ozlabs.org, linux-...@vger.kernel.org, clang-bu...@googlegroups.com
If it works then it looks like a much better fix than using -ffreestanding.

Please submit a patch with a change log etc. and I'd be happy to merge
it.

cheers

Segher Boessenkool

unread,
Mar 27, 2020, 1:13:03 PM3/27/20
to Clement Courbet, Nathan Chancellor, Kees Cook, Nick Desaulniers, Joe Perches, Bernd Petrovitsch, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Greg Kroah-Hartman, Allison Randal, linuxp...@lists.ozlabs.org, linux-...@vger.kernel.org, clang-bu...@googlegroups.com
Hi!

On Thu, Mar 26, 2020 at 01:38:39PM +0100, Clement Courbet wrote:
> --- a/arch/powerpc/include/asm/setjmp.h
> +++ b/arch/powerpc/include/asm/setjmp.h
> @@ -12,7 +12,9 @@
>
> #define JMP_BUF_LEN 23
> -extern long setjmp(long *);
> -extern void longjmp(long *, long);
> +typedef long * jmp_buf;
> +
> +extern int setjmp(jmp_buf);
> +extern void longjmp(jmp_buf, int);
>
> I'm happy to send a patch for this, and get rid of more -ffreestanding.
> Opinions ?

Pedantically, jmp_buf should be an array type. But, this will probably
work fine, and it certainly is better than what we had before.

You could do
typedef long jmp_buf[JMP_BUF_LEN];
perhaps?

Thanks,


Segher
Reply all
Reply to author
Forward
0 new messages