On 2015-10-22 at 08:02 "'Davide Libenzi' via Akaros" <aka...@googlegroups.com> wrote:
> A solution could be to have "if gcc use builtins, otherwise fall back
> to open coded".
> The if-gcc is actually going to be, in reality, wider than that,
> because many compilers support most of them.
>
> Note: Many GCC string/memory functions have a preamble that aligns
> pointers, and then use the SSE instructions with xmm registers.
This is a problem for the kernel and for some user code (vcore context
code). In those contexts we generally don't do anything that could
touch the FP/MMX/XMM registers.
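
To make that concrete, here is a rough sketch of a "builtins if
available, else open coded" selection with an opt-out for code that
must stay off the FP/XMM registers. The names (NO_BUILTIN_MEM,
open_memcpy, my_memcpy) are made up for illustration and are not
anything in the tree.

```c
#include <stddef.h>

/* Hypothetical opt-out: define NO_BUILTIN_MEM when building code that
 * must not touch the FP/MMX/XMM registers (kernel, vcore context). */
#if (defined(__GNUC__) || defined(__clang__)) && !defined(NO_BUILTIN_MEM)
#define USE_BUILTIN_MEM 1
#else
#define USE_BUILTIN_MEM 0
#endif

/* Open-coded fallback: a plain byte-at-a-time copy. */
static void *open_memcpy(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n--)
		*d++ = *s++;
	return dst;
}

/* Picks the builtin when the compiler has it and it is allowed,
 * otherwise falls back to the open-coded version. */
static inline void *my_memcpy(void *dst, const void *src, size_t n)
{
#if USE_BUILTIN_MEM
	return __builtin_memcpy(dst, src, n);
#else
	return open_memcpy(dst, src, n);
#endif
}
```

The opt-out alone may not be enough, since the compiler is free to
vectorize or pattern-match the byte loop. Code that really must avoid
those registers would probably also need build flags such as -mno-sse.
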
> This is a big win for large memory blocks, but not so much for
> blocks smaller than 32 bytes (cycles spent in align code is not
> countered by savings). We ended up, at my previous job, having two
> sets of functions. One using glibc ones (which mapped to builtins),
> and open coded ones (which we used when we knew the blocks we were
> working on were relatively short). For things like memcpy() or
> memset() on big blocks, the win in using SSE is pretty big.
For kernel code, if we know we're memcpying/memsetting large blocks,
perhaps we could save and restore the FPU state around a special
large-block memcpy/memset function. The overhead of the save/restore
would raise the break-even size at which we switch from one style
(the current version) to the SSE style.
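
Roughly, the dispatch could look like the sketch below. Everything in
it is hypothetical: the threshold value, struct fp_state, and the
save/restore helpers are stand-ins (the stubs only count calls), and
large_copy_sse() simply calls memcpy() where a real SSE loop would go.

```c
#include <stddef.h>
#include <string.h>

/* Assumed break-even size; the real value would have to be measured,
 * since it depends on the cost of the FP state save/restore. */
#define LARGE_COPY_THRESHOLD 4096

/* Placeholder for whatever the real FP state area looks like. */
struct fp_state {
	unsigned char area[512];
};

/* Stubs standing in for the real save/restore; they only count calls. */
static int fp_saves, fp_restores;

static void save_fp_state_stub(struct fp_state *s)
{
	(void)s;
	fp_saves++;
}

static void restore_fp_state_stub(struct fp_state *s)
{
	(void)s;
	fp_restores++;
}

/* "Current style": byte-at-a-time copy. */
static void *small_copy(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n--)
		*d++ = *s++;
	return dst;
}

/* Stand-in for an SSE copy loop; just defers to memcpy() here. */
static void *large_copy_sse(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* Small blocks skip the save/restore cost entirely; large blocks pay
 * for it once and then use the wide copy. */
void *kmemcpy(void *dst, const void *src, size_t n)
{
	struct fp_state st;

	if (n < LARGE_COPY_THRESHOLD)
		return small_copy(dst, src, n);

	save_fp_state_stub(&st);
	large_copy_sse(dst, src, n);
	restore_fp_state_stub(&st);
	return dst;
}
```

The same shape would apply to memset.
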
Barret