I’m running LLVM 3.5.1 on an ARM Juno reference box (Cortex-A53 and Cortex-A57) with Fedora 22 and gcc 4.9.2.
I tried compiling some simple functions like dot product and axpy() into assembly to see if any of the SIMD instructions were generated (they weren’t).
Perhaps I’m missing some compiler flag to enable it.
Does anyone know the status of SIMD code generation for AArch64?
Is anyone coordinating or leading this effort, if there is one?
Which compiler flags have you been using?
There is definitely support for AArch64’s SIMD instructions, but their use depends on what the vectorizers can do with your code.
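As a rough illustration of the kind of loop the loop vectorizer can usually handle, here is a minimal axpy sketch. The function name, its signature, and the use of restrict are assumptions made for this example, not code from the original poster. The restrict qualifiers tell the compiler that x and y do not overlap, which generally removes the need for runtime overlap checks before the vector loop.

```c
#include <stddef.h>

/* Hypothetical axpy: y[i] += a * x[i] for i in [0, n).
 * restrict is a promise from the caller that x and y do not alias. */
void axpy(size_t n, float a, const float *restrict x, float *restrict y) {
    for (size_t i = 0; i < n; i++) {
        y[i] += a * x[i];
    }
}
```

Whether SIMD instructions actually come out still depends on the compiler version and flags, so it is worth checking the generated assembly.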
So far, all I have tried is -O3, with and without “-mcpu=cortex-a57”.
I’m new to LLVM so I’m not familiar with what optimization flags are available.
I tried poking around in the LLVM documentation but haven’t found a definitive list.
The clang man page is skimpy on details.
You can try something along the lines of “-O3 -mcpu=cortex-a57 -mfpu=neon -ffast-math”
% clang -S -O3 -mcpu=cortex-a57 -ffast-math -Rpass-analysis=loop-vectorize dot.c
dot.c:15:1: remark: loop not vectorized: value that could not be identified as
reduction is used outside the loop [-Rpass-analysis=loop-vectorize]
}
^
dot.c:15:1: note: could not determine the original source location for :0:0
I found “llvm-as < /dev/null | llc -march=aarch64 -mattr=help”, which listed a bunch of features, but when I tried
adding “-mfpu=neon” or “-mattr=+neon”, clang complained that the option was unrecognized.
Attachments: dot.s, dot.c
Better. With this test I see:
% clang -S -O3 -Rpass=loop-vectorize test.c
test.c:3:3: remark: vectorized loop (vectorization factor: 4, unrolling
interleave factor: 2) [-Rpass=loop-vectorize]
for(i = 0; i < 1000; i++) {
^
% clang -S -O3 -o test1.s -mcpu=cortex-a57 -Rpass=loop-vectorize test.c
test.c:3:3: remark: vectorized loop (vectorization factor: 4, unrolling
interleave factor: 4) [-Rpass=loop-vectorize]
for(i = 0; i < 1000; i++) {
^
Both use SIMD instructions.
Changing the code to use a variable for the loop limit works OK, as does changing int to float.
So I guess it is the return in dot.c that is causing a problem.
I will file a bug since I think the vectorizer should handle that case.
float foo(float *b, float *c) {
  int i;
  float v = 0.0;
  for(i = 0; i < 1000; i++) {
    v += b[i] + c[i];
  }
  return v;
}
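
For comparison, a dot product written in the same shape as foo might look like the sketch below. dot.c itself is not reproduced in this thread, so the function name dot, its signature, and the loop body are assumptions for illustration only and may differ from the file that produced the "loop not vectorized" remark.

```c
#include <stddef.h>

/* Hypothetical dot product: accumulate a[i] * b[i] over n elements
 * and return the sum after the loop, as foo does with v. */
float dot(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Compiling a file like this with the -Rpass=loop-vectorize and -Rpass-analysis=loop-vectorize options used earlier in the thread should report whether the loop was vectorized and, if not, why.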