Re: [PATCH v3 13/14] RISC-V: add infrastructure to allow different str* implementations

3 views
Skip to first unread message

Ard Biesheuvel

unread,
Dec 2, 2022, 3:20:00 AM12/2/22
to Palmer Dabbelt, Nick Desaulniers, clang-built-linux, he...@sntech.de, Ard Biesheuvel, linux...@lists.infradead.org, christoph...@vrull.eu, prabhaka...@gmail.com, Conor Dooley, philipp...@vrull.eu, ajo...@ventanamicro.com, emil.renne...@canonical.com, linu...@vger.kernel.org, heiko.s...@vrull.eu, Conor Dooley
On Fri, 2 Dec 2022 at 05:08, Palmer Dabbelt <pal...@dabbelt.com> wrote:
>
> On Wed, 30 Nov 2022 14:56:13 PST (-0800), he...@sntech.de wrote:
> > From: Heiko Stuebner <heiko.s...@vrull.eu>
> >
> > Depending on supported extensions on specific RISC-V cores,
> > optimized str* functions might make sense.
> >
> > This adds basic infrastructure to allow patching the function calls
> > via alternatives later on.
> >
> > The main idea is to have the core str* functions be inline functions
> > which then call the most optimized variant and this call then be
> > replaced via alternatives.
> >
> > The big advantage is that we don't need additional calls.
> > Though we need to duplicate the generic functions as the main code
> > expects either itself or the architecture to provide the str* functions.
> >
> > The added *_generic functions are done in assembler (taken from
> > disassembling the main-kernel functions for now) to allow us to control
> > the used registers.
> >
> > Reviewed-by: Conor Dooley <conor....@microchip.com>
> > Signed-off-by: Heiko Stuebner <heiko.s...@vrull.eu>
> > ---
> > arch/riscv/Makefile | 3 ++
> > arch/riscv/include/asm/string.h | 66 +++++++++++++++++++++++++++++++++
> > arch/riscv/kernel/image-vars.h | 6 +--
> > arch/riscv/lib/Makefile | 3 ++
> > arch/riscv/lib/strcmp.S | 38 +++++++++++++++++++
> > arch/riscv/lib/strlen.S | 29 +++++++++++++++
> > arch/riscv/lib/strncmp.S | 41 ++++++++++++++++++++
> > 7 files changed, 183 insertions(+), 3 deletions(-)
> > create mode 100644 arch/riscv/lib/strcmp.S
> > create mode 100644 arch/riscv/lib/strlen.S
> > create mode 100644 arch/riscv/lib/strncmp.S
> >
> > diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> > index 0d13b597cb55..581e4370c2a6 100644
> > --- a/arch/riscv/Makefile
> > +++ b/arch/riscv/Makefile
> > @@ -80,6 +80,9 @@ ifeq ($(CONFIG_PERF_EVENTS),y)
> > KBUILD_CFLAGS += -fno-omit-frame-pointer
> > endif
> >
> > +# strchr is special case, as gcc might want to call its own strlen from there
> > +KBUILD_CFLAGS += -fno-builtin-strlen -fno-builtin-strcmp -fno-builtin-strncmp -fno-builtin-strchr
>
> I was poking around Ard's comment on that -DRISCV_EFI to try and figure
> out what it was doing, but I think this is the bigger issue. I haven't
> benchmarked anything, but my guess is that turning off support for these
> builtin routines will outweigh the faster outline implementations of
> these routines. I don't have any benchmarks to prove that, but in
> general compilers are pretty smart about handling these builtin routines
> in the common cases and deferring that to runtime is probably the wrong
> option.
>

Indeed. Case in point:

riscv64-linux-gnugcc -O -S -o - -xc - <<<"int foo() { return strlen(\"bar\"); }"

gives me

li a0,3
ret

whereas this

riscv64-linux-gnu-gcc -fno-builtin-strlen -O -S -o - -xc - <<<"int
foo() { return strlen(\"bar\"); }"

gives me

.LC0:
.string "bar"
.text
.align 1
.globl foo
.type foo, @function
foo:
addi sp,sp,-16
sd ra,8(sp)
lla a0,.LC0
call strlen@plt
ld ra,8(sp)
addi sp,sp,16
jr ra

Other str* and mem* routines are optimized in similar ways when
dealing with small sizes or compile time constants.

Also, I'd recommend avoiding redefining these prototypes to static
inline as it deviates from the official prototypes in the C library.

> I haven't looked all that closely at this patch set. Is there some
> reason it's necessary to disable the builtin handling? If not then my
> guess is it's better to leave that enabled unless some benchmarks show
> otherwise (and I don't know of any Zb* hardware to test against).
>

Why not have a single generic version of each in the binary with a
patchable NOP at the start, and patch that to an optimized version
when available? That way. all of the early code can remain as is.
Reply all
Reply to author
Forward
0 new messages