Bonita Montero <Bonita....@gmail.com> wrote:
>> Note that if the array contains, for example, integers or floating point
>> values, and the function in question performs some arithmetic on those
>> values, the array having a fixed known size helps the compiler quite a
>> lot in optimization (more precisely, SIMD vectorization).
>
> LOL, when you believe in auto-vectorization you also believe in Santa
> Claus. Auto-vectorization only works for some predefined code patterns
> the compiler knows. In most cases you would have to do the vectorization
> yourself through SIMD intrinsics.

Have you even looked at what modern compilers like recent versions of
gcc and clang produce using automatic vectorization? Because I have.
It's not always perfect, but it's miles better than no vectorization
of any kind. Often they manage surprisingly sophisticated automatic
vectorization of relatively complex structures.

Consider, for example, this kind of code:

//-------------------------------------------------------------------
#include <array>
#include <cstddef> // std::size_t

struct Point4D
{
    float x, y, z, w;
};

using Point4DVec = std::array<Point4D, 4>;

// Computes (v1[i] - v2[i]) * f componentwise for each of the 4 points.
Point4DVec foo(const Point4DVec& v1, const Point4DVec& v2, float f)
{
    Point4DVec result;
    for(std::size_t i = 0; i < result.size(); ++i)
    {
        result[i].x = (v1[i].x - v2[i].x) * f;
        result[i].y = (v1[i].y - v2[i].y) * f;
        result[i].z = (v1[i].z - v2[i].z) * f;
        result[i].w = (v1[i].w - v2[i].w) * f;
    }
    return result;
}
//-------------------------------------------------------------------

With optimizations and AVX enabled, gcc compiles that to:

        vbroadcastss    ymm2, xmm0
        vmovups         ymm3, YMMWORD PTR [rsi]
        vmovups         ymm0, YMMWORD PTR [rsi+32]
        vsubps          ymm1, ymm3, YMMWORD PTR [rdx]
        vsubps          ymm0, ymm0, YMMWORD PTR [rdx+32]
        mov             rax, rdi
        vmulps          ymm1, ymm1, ymm2
        vmulps          ymm0, ymm0, ymm2
        vmovups         YMMWORD PTR [rdi], ymm1
        vmovups         YMMWORD PTR [rdi+32], ymm0
        vzeroupper
        ret
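
To check how much the fixed, known size contributes, one could compile a
runtime-sized counterpart with the same flags and compare the output. Below
is a minimal sketch of such a counterpart. The name foo_dynamic and its
std::vector-based signature are assumptions made up for this illustration;
they are not part of the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point4D
{
    float x, y, z, w;
};

// Hypothetical runtime-sized counterpart of foo() (name and signature are
// assumptions for illustration). The element count is only known at run
// time, so the compiler cannot fully unroll the loop the way it can for a
// fixed count of 4.
std::vector<Point4D> foo_dynamic(const std::vector<Point4D>& v1,
                                 const std::vector<Point4D>& v2, float f)
{
    assert(v1.size() == v2.size()); // precondition: equal lengths
    std::vector<Point4D> result(v1.size());
    for(std::size_t i = 0; i < result.size(); ++i)
    {
        result[i].x = (v1[i].x - v2[i].x) * f;
        result[i].y = (v1[i].y - v2[i].y) * f;
        result[i].z = (v1[i].z - v2[i].z) * f;
        result[i].w = (v1[i].w - v2[i].w) * f;
    }
    return result;
}
```

What the compiler emits for this version depends on its version and flags,
so it is worth looking at the two listings side by side rather than
assuming either outcome.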