[llvm-dev] [RFC][clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

Guillaume Chatelet via llvm-dev

Apr 26, 2019, 7:47:56 AM
to llvm...@lists.llvm.org
TL;DR:
Defining memory functions in C / C++ results in a chicken-and-egg problem: Clang can rewrite the code into semantically equivalent calls to libc. None of `-fno-builtin-memcpy`, `-ffreestanding`, or `-nostdlib` provides a satisfactory answer to the problem.

Goal
Create libc's memory functions (e.g. `memcpy`, `memset`, `memcmp`, ...) in C++ to benefit from the compiler's knowledge and profile-guided optimizations.

Current state
LLVM is allowed to replace a piece of code that looks like a memcpy with an IR intrinsic that implements the same semantics, namely `call void @llvm.memcpy.p0i8.p0i8.i64` (e.g. https://godbolt.org/z/0y1Yqh).

This is a problem when writing a libc memory function, as the compiler may choose to replace the implementation with a call to itself (e.g. https://godbolt.org/z/eg0p_E).

Using `-fno-builtin-memcpy` prevents the compiler from understanding that an expression has memory copy semantics, effectively removing `@llvm.memcpy` at the IR level: https://godbolt.org/z/lnCIIh. In this specific example, the vectorizer kicks in and the generated code is quite good. Unfortunately this is not always the case: https://godbolt.org/z/mHpAYe.

In addition, `-fno-builtin-memcpy` prevents the compiler from understanding that a piece of code has memory copy semantics, but it does not prevent the compiler from generating calls to libc's `memcpy`, for instance:
Using `__builtin_memcpy`: https://godbolt.org/z/O0sjIl
Passing big structs by value: https://godbolt.org/z/4BUDc0

In both cases, the generated `@llvm.memcpy` IR intrinsic is lowered into a libc `memcpy` call.
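The two cases above can be sketched as follows. The names `Big`, `copy_with_builtin`, `first_byte` and `pass_by_value` are made up for illustration, and whether a call to `memcpy` is actually emitted depends on the target, the sizes involved and the optimization level:

```cpp
#include <cstddef>

struct Big {
  char bytes[4096];  // large enough that a copy is unlikely to be expanded inline
};

// Case 1: explicit builtin. With a size only known at run time, the
// resulting @llvm.memcpy is typically lowered to a call to `memcpy`.
void copy_with_builtin(void *dst, const void *src, std::size_t n) {
  __builtin_memcpy(dst, src, n);
}

// noinline keeps the by-value parameter copy visible in the generated code.
__attribute__((noinline)) int first_byte(Big b) { return b.bytes[0]; }

// Case 2: passing `b` by value copies 4096 bytes, and the compiler may
// implement that copy with a call to `memcpy`.
int pass_by_value(const Big &b) { return first_byte(b); }
```

Neither function mentions `memcpy` by name, yet both can end up depending on it at link time.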

We would like to use `__builtin_memcpy` to communicate the semantics to the compiler, but prevent it from generating calls to libc.

One could argue that this is the purpose of `-ffreestanding`, but the standard leaves many freestanding requirements implementation-defined (see https://en.cppreference.com/w/cpp/freestanding).

In practice, making sure that `-ffreestanding` never calls libc memory functions will probably do more harm than good. People using `-ffreestanding` now expect the compiler to call these functions, and the code bloat from inlining can be problematic for the embedded world (see comments in https://reviews.llvm.org/D60719).

Proposals
We envision two approaches: an attribute to prevent the compiler from synthesizing calls or a set of builtins to communicate the intent more precisely to the compiler.

  1. A function/module attribute to disable synthesis of calls

    1.1 A specific attribute to disable the synthesis of a single call
__attribute__((disable_call_synthesis("memcpy")))
Question: Is it possible to specify the attribute several times on a function to disable many calls?

    1.2 A specific attribute to disable synthesis of all libc calls
__attribute__((disable_libc_call_synthesis))
With this one we are losing precision and we may inline too much. There is also the question of what is considered a libc function, since LLVM mainly defines target library calls.

    1.3 Stretch - a specific attribute to redirect a single synthesizable function.
This one would help explore the impact of replacing a synthesized function call with another function but is not strictly required to solve the problem at hand.
__attribute__((redirect_synthesized_calls("memcpy", "my_memcpy")))

  2. A set of builtins in clang to communicate the intent clearly

__builtin_memcpy_alwaysinline(...)
__builtin_memmove_alwaysinline(...)
__builtin_memset_alwaysinline(...)

To achieve this we may have to provide new IR builtins (e.g. `@llvm.alwaysinline_memcpy`) which can be a lot of work.

David Chisnall via llvm-dev

Apr 29, 2019, 4:48:30 AM
to llvm...@lists.llvm.org
On 26/04/2019 12:47, Guillaume Chatelet via llvm-dev wrote:
>     1.2 A specific attribute to disable synthesis of all libc calls
> __attribute__((disable_libc_call_synthesis))
> With this one we are losing precision and we may inline too much. There
> is also the question of what is considered a libc function, LLVM mainly
> defines target library calls.

Target library is probably more relevant than libc. We have a number of
issues with libm on tier 2 platforms for FreeBSD without assembly fast
paths. This requires work-arounds for the fact that clang likes to say
'oh, this function seems to be calling X on the result of Y, and I know
that this can be more efficient if you replace that sequence with Z',
ignoring the fact that this case is an implementation of Z.
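One illustration of that pattern, offered as an assumption rather than as the FreeBSD case referred to above: a portable fallback for `exp2` written in terms of `pow`. LLVM's library-call simplification can rewrite `pow(2.0, x)` into `exp2(x)`, so if the function below (named `my_exp2` here to keep the example safe) were the platform's `exp2`, the optimized body could become a call to itself.

```cpp
#include <cmath>

// Hypothetical libm-style fallback: exp2 implemented via pow.
// If this function were named `exp2`, an optimizer that folds
// pow(2.0, x) into exp2(x) would turn the body into a self-call.
double my_exp2(double x) { return std::pow(2.0, x); }
```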

The same thing is true in Objective-C runtime implementations, where we
need to be careful to avoid LLVM performing optimisations on the ARC
functions that result in infinite recursion.

There are numerous cases of compiler-rt suffering from the same issue.

TL;DR: This is a really important problem for clang and your proposed
solution 1 looks like it is far more broadly applicable.

David
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Guillaume Chatelet via llvm-dev

Apr 30, 2019, 10:01:46 AM
to David Chisnall, t.p.no...@gmail.com, llvm...@lists.llvm.org
Thx for the feedback David.

So we're heading toward a broader
> __attribute__((disable_call_synthesis))

David, what do you think about the additional version that restricts the effect to a few named functions?
> e.g. __attribute__((disable_call_synthesis("memset", "memcpy", "sqrt")))

A warning should be issued if the arguments are not part of RuntimeLibcalls.def.

Also I'd like to get your take on whether it makes sense to have this attribute apply to functions only or at module level as well.

Thx,
Guillaume

Xinliang David Li via llvm-dev

Apr 30, 2019, 11:52:02 AM
to Guillaume Chatelet, llvm-dev
On Tue, Apr 30, 2019 at 7:01 AM Guillaume Chatelet via llvm-dev <llvm...@lists.llvm.org> wrote:
> Thx for the feedback David.
>
> So we're heading toward a broader
> > __attribute__((disable_call_synthesis))
>
> David, what do you think about the additional version that restricts the effect to a few named functions?
> > e.g. __attribute__((disable_call_synthesis("memset", "memcpy", "sqrt")))


Nit: the attribute basically just states that there is no runtime support for these functions in this context, so why not directly name it so:

__attribute__((no_runtime_for("memcpy", "memset", "sqrt")))

It still allows the compiler to synthesize calls to builtins that are *guaranteed* to be inline expanded later (if that is available).

David

David Chisnall via llvm-dev

Apr 30, 2019, 12:28:34 PM
to Guillaume Chatelet, t.p.no...@gmail.com, llvm...@lists.llvm.org
On 30/04/2019 15:01, Guillaume Chatelet wrote:
> David, what do you think about the additional version that restricts the
> effect to a few named functions?
> > e.g. __attribute__((disable_call_synthesis("memset", "memcpy", "sqrt")))

I would find that exceptionally useful. For the libm example,
preventing LLVM from synthesising calls to other libm functions that may
call this one would be the fine-grained control that we want. For an
Objective-C runtime, being able to explicitly disable synthesising ARC
calls would be similarly useful (though I can no longer construct an
example where LLVM does the wrong thing, so maybe this is fixed already
in the ARC passes).

Guillaume Chatelet via llvm-dev

May 7, 2019, 5:49:28 AM
to David Chisnall, llvm...@lists.llvm.org
A POC patch is available here for discussion