[LLVMdev] ARM NEON intrinsics in clang

Stanislav Manilov

Sep 26, 2013, 7:22:25 AM

to llv...@cs.uiuc.edu

Hello LLVM Devs,

I am starting my PhD on Automatic Parallelization for DSP and want to play with some ARM NEON intrinsics for a start. I spent the last three days trying to compile a version of LLVM that would allow me to compile sources that contain these intrinsics, but with no success.

In the process I found out that clang doesn't support NEON (as per http://blog.llvm.org/2010/04/arm-advanced-simd-neon-intrinsics-and.html), but there has been at least some effort in adding it (https://www.codeaurora.org/patches/quic/llvm/32040/clang-Initial-Neon-support.patch).

I also tried compiling LLVM 2.9 + llvm-gcc but that failed too many times and I gave up. After some discussions with colleagues (notably Alberto Magni, who added OpenCL support to clang some time ago http://lists.cs.uiuc.edu/pipermail/cfe-dev/2010-November/012293.html) my current plan is to implement the ARM NEON intrinsics as a shared library, using attributes as in:

typedef float float4 __attribute__((ext_vector_type(4)));

or if that doesn't work, I will try to implement the intrinsics in clang itself (not sure this is the best way of doing it).
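For illustration, a minimal sketch of what one such library-style intrinsic could look like, built on the vector type above. The names my_vaddq_f32, float4_load and float4_store are hypothetical, and the vector_size fallback for non-clang compilers is an assumption added so the sketch builds elsewhere; it is not part of the real arm_neon.h.

```c
#include <assert.h>
#include <stddef.h>

/* Four-lane float vector. ext_vector_type is a clang extension; the
 * vector_size fallback is an assumption for GCC-compatible compilers. */
#if defined(__clang__)
typedef float float4 __attribute__((ext_vector_type(4)));
#else
typedef float float4 __attribute__((vector_size(16)));
#endif

/* Hypothetical stand-in for vaddq_f32: lane-wise addition. */
static inline float4 my_vaddq_f32(float4 a, float4 b) {
    return a + b;
}

/* Hypothetical helper: load four consecutive floats into a vector. */
static inline float4 float4_load(const float *p) {
    assert(p != NULL);
    float4 v;
    for (int i = 0; i < 4; ++i)
        v[i] = p[i];
    return v;
}

/* Hypothetical helper: store the four lanes back to memory. */
static inline void float4_store(float *p, float4 v) {
    assert(p != NULL);
    for (int i = 0; i < 4; ++i)
        p[i] = v[i];
}
```

Keeping the wrappers static inline in a header, rather than in a separately compiled shared library, lets the compiler see through each call.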

Ideally, I want to be able to compile C code that includes ARM NEON intrinsics to other targets (TI processors, e.g.).

Suggestions, comments, and recommendations are very welcome.

Kind regards,

- Stan

--

Stan Manilov

1st year Ph.D. student

2013 Graduate in B.Sc. Computer Science and Mathematics

The University of Edinburgh

Renato Golin

Sep 26, 2013, 11:01:58 AM

to Stanislav Manilov, LLVM Dev

On 26 September 2013 12:22, Stanislav Manilov <S.Z.M...@sms.ed.ac.uk> wrote:

> In the process I found out that clang doesn't support NEON (as per http://blog.llvm.org/2010/04/arm-advanced-simd-neon-intrinsics-and.html), but there has been at least some effort in adding it (https://www.codeaurora.org/patches/quic/llvm/32040/clang-Initial-Neon-support.patch).

Hi Stanislav,

LLVM has supported NEON on ARM32 for a very long time. The commit you're referring to is about AArch64, and yes, support for ARM64 NEON is patchy at the moment, but it's progressing quite quickly. What back-end are you trying to use? 32-bit or 64-bit?

> I also tried compiling LLVM 2.9 + llvm-gcc but that failed too many times and I gave up. After some discussions with colleagues (notably Alberto Magni, who added OpenCL support to clang some time ago http://lists.cs.uiuc.edu/pipermail/cfe-dev/2010-November/012293.html) my current plan is to implement the ARM NEON intrinsics as a shared library, using attributes as in:

LLVM 2.9 is really old, and llvm-gcc is discontinued, so I wouldn't even try that. If you don't want to use trunk, I recommend using LLVM with Clang 3.3 and seeing what you get.

> typedef float float4 __attribute__((ext_vector_type(4)));
>
> or if that doesn't work, I will try to implement the intrinsics in clang itself (not sure this is the best way of doing it).
>
> Ideally, I want to be able to compile C code that includes ARM NEON intrinsics to other targets (TI processors, e.g.).

So, if I get it right, you have a file with ARM NEON intrinsics (the ones defined in arm_neon.h), you passed it through LLVM 2.9 with the llvm-gcc front-end, and it failed.

Since 2010, LLVM has been able to compile every single NEON instruction, but you should use LLVM's own version of arm_neon.h, since the type definitions do vary between toolchains. In the end, they amount to the same thing on each toolchain, but their representation can be different.

I suggest you try with Clang 3.3 and if that fails, we'll start from there.

cheers,

--renato

Tim Northover

Sep 26, 2013, 11:07:31 AM

to Stanislav Manilov, LLVM Developers Mailing List

Hi Stan,

> I spent the last three days trying to compile a version of LLVM that would
> allow me to compile sources that contain these intrinsics, but with no success.

Ok. This we can probably help with. Did you manage to build a version
of Clang (preferably from git/subversion)?

If so, you're probably having problems cross-compiling. Renato's
recently worked on some documentation in this area:
http://clang.llvm.org/docs/CrossCompilation.html.

But for a quick hack, you could try:

$ cat > neon.c
#include <arm_neon.h>

float32x4_t my_func(float32x4_t lhs, float32x4_t rhs) {
    return vaddq_f32(lhs, rhs);
}
$ clang --target=arm-linux-gnueabihf -mcpu=cortex-a15 -ffreestanding \
    -O3 -S -o - neon.c

("ffreestanding" will dodge any issues with your supporting toolchain,
but won't work for larger tests. You've got to actually solve the
issues before you start running code).

> In the process I found out that clang doesn't support NEON (as per
> http://blog.llvm.org/2010/04/arm-advanced-simd-neon-intrinsics-and.html),

That's rather out of date, I'm afraid. 32-bit ARM does support both
NEON intrinsics and a reasonable amount of LLVM's own
auto-vectorisation (which is in its early stages, but we have some
kind of loop and SLP vectorisation going on).

> but there has been at least some effort in adding it
> (https://www.codeaurora.org/patches/quic/llvm/32040/clang-Initial-Neon-support.patch).

That patch is part of the effort to implement NEON (instructions and
intrinsics) on the 64-bit ARM architecture (AArch64).

> I also tried compiling LLVM 2.9 + llvm-gcc but that failed too many times
> and I gave up.

Yep. llvm-gcc is long dead, and LLVM 2.9 isn't much healthier.

> current plan is to implement the ARM NEON intrinsics as a shared library,
> using attributes as in:

That would probably be possible, but very bad from a performance
perspective. The whole point of NEON intrinsics is to speed up vector
code; if you've got the overhead of a call/return for each intrinsic,
with completely fixed registers around each call, you'll be in for a
world of pain.

> Ideally, I want to be able to compile C code that includes ARM NEON
> intrinsics to other targets (TI processors, e.g.).

Now that's going to be harder. LLVM itself doesn't support any TI
processors, for a start. And many of the NEON intrinsics (those with
more complex semantics) compile to LLVM IR with LLVM-level intrinsics,
which are only supported in the ARM backend.

Your shared library idea would work semantically, of course. But I'm
not sure what useful information could be extracted from it.
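
As a rough illustration of the "works semantically" point, here is a sketch of how an intrinsic with more involved semantics could be emulated in plain C on a target without NEON. A saturating 16-bit add (in the spirit of vqadd_s16) is used as the example; the names portable_vqadd_s16 and saturate_s16 are hypothetical, and the pointer-based interface is an assumption made for simplicity rather than the real arm_neon.h signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit intermediate result into the int16_t range. */
static inline int16_t saturate_s16(int32_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Hypothetical scalar emulation of a 4-lane saturating add.
 * Each lane is widened to 32 bits so the sum cannot overflow,
 * then clamped back to 16 bits. */
static inline void portable_vqadd_s16(int16_t out[4],
                                      const int16_t a[4],
                                      const int16_t b[4]) {
    assert(out != NULL && a != NULL && b != NULL);
    for (int i = 0; i < 4; ++i)
        out[i] = saturate_s16((int32_t)a[i] + (int32_t)b[i]);
}
```

This reproduces the lane-wise result, but a backend that only sees the scalar loop has to rediscover the vector operation on its own.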

Cheers.

Tim.
_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Stanislav Manilov

Sep 26, 2013, 12:45:02 PM

to Tim Northover, LLVM Developers Mailing List

Hello Tim,

>> I spent the last three days trying to compile a version of LLVM that would
>> allow me to compile sources that contain these intrinsics, but with no success.
>
> Ok. This we can probably help with. Did you manage to build a version
> of Clang (preferably from git/subversion)?

Yes, I managed to build the latest (r191291) svn revision of LLVM + clang.

> If so, you're probably having problems cross-compiling. Renato's
> recently worked on some documentation in this area:
> http://clang.llvm.org/docs/CrossCompilation.html.
>
> But for a quick hack, you could try:
>
> $ cat > neon.c
> #include <arm_neon.h>
>
> float32x4_t my_func(float32x4_t lhs, float32x4_t rhs) {
>     return vaddq_f32(lhs, rhs);
> }
> $ clang --target=arm-linux-gnueabihf -mcpu=cortex-a15 -ffreestanding \
>     -O3 -S -o - neon.c
>
> ("ffreestanding" will dodge any issues with your supporting toolchain,
> but won't work for larger tests. You've got to actually solve the
> issues before you start running code).

This works, which is great! My confusion came from not knowing the right combination of flags for cross-compiling to ARM, and from getting "#error "NEON support not enabled"" when I got them wrong. Combined with the outdated information on the internet, that led me to believe that NEON was not supported.

I will read that cross compilation guide before asking further questions about this set of flags.
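
A small sketch of how code can guard against that error instead of tripping over it: only include arm_neon.h when the compiler reports NEON, and fall back to a scalar loop otherwise. The macro names __ARM_NEON and __ARM_NEON__ are the predefined feature macros as I understand them; HAVE_NEON and add_f32 are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Only pull in arm_neon.h when the target flags actually enable NEON;
 * otherwise the header stops the build with an #error. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for n floats.
 * n must be a multiple of 4 so both paths handle the same input. */
static inline void add_f32(float *out, const float *a, const float *b,
                           size_t n) {
    assert(n % 4 == 0);
#if HAVE_NEON
    /* NEON path: four lanes per iteration. */
    for (size_t i = 0; i < n; i += 4)
        vst1q_f32(out + i, vaddq_f32(vld1q_f32(a + i), vld1q_f32(b + i)));
#else
    /* Scalar path for builds without NEON (e.g. a native x86 build). */
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

Built natively on the host this takes the scalar path; built with the ARM target flags from Tim's example it should take the NEON path.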

>> In the process I found out that clang doesn't support NEON (as per
>> http://blog.llvm.org/2010/04/arm-advanced-simd-neon-intrinsics-and.html),
>
> That's rather out of date, I'm afraid. 32-bit ARM does support both
> NEON intrinsics and a reasonable amount of LLVM's own
> auto-vectorisation (which is in its early stages, but we have some
> kind of loop and SLP vectorisation going on).
>
>> but there has been at least some effort in adding it
>> (https://www.codeaurora.org/patches/quic/llvm/32040/clang-Initial-Neon-support.patch).
>
> That patch is part of the effort to implement NEON (instructions and
> intrinsics) on the 64-bit ARM architecture (AArch64).

Great! It seemed quite confusing that this is the main official information one gets when searching for "arm neon clang llvm", especially when parts of the documentation (http://clang.llvm.org/docs/LanguageExtensions.html#langext-vectors) claim that clang supports NEON. I am happy that it actually does.

>> I also tried compiling LLVM 2.9 + llvm-gcc but that failed too many times
>> and I gave up.
>
> Yep. llvm-gcc is long dead, and LLVM 2.9 isn't much healthier.

I was thinking it was just me being a noob.

>> current plan is to implement the ARM NEON intrinsics as a shared library,
>> using attributes as in:
>
> That would probably be possible, but very bad from a performance
> perspective. The whole point of NEON intrinsics is to speed up vector
> code; if you've got the overhead of a call/return for each intrinsic,
> with completely fixed registers around each call, you'll be in for a
> world of pain.
>
>> Ideally, I want to be able to compile C code that includes ARM NEON
>> intrinsics to other targets (TI processors, e.g.).
>
> Now that's going to be harder. LLVM itself doesn't support any TI
> processors, for a start. And many of the NEON intrinsics (those with
> more complex semantics) compile to LLVM IR with LLVM-level intrinsics,
> which are only supported in the ARM backend.
>
> Your shared library idea would work semantically, of course. But I'm
> not sure what useful information could be extracted from it.

That was my plan for adding NEON support to clang, which, as I now know, has thankfully already been done by someone who is more aware of how the platform works. The TI processors were a bad example; PowerPC is maybe a better one, as I just checked and LLVM has a backend for such processors. My current goal is exactly to add support for such LLVM-level intrinsics to a non-ARM backend, so that ARM-specific C code (code that contains NEON intrinsics) can be compiled for another target.

Thanks a lot for your time and help. I will try to set up my cross-compilation toolchain and ask again if I get seriously stuck.

Cheers,

- Stan

--

Stan Manilov

1st year Ph.D. student

2013 Graduate in BSc Computer Science and Mathematics

The University of Edinburgh

Renato Golin

Sep 26, 2013, 1:07:01 PM

to Stanislav Manilov, LLVM Dev

On 26 September 2013 17:52, Stanislav Manilov <stanisla...@gmail.com> wrote:

> To answer your question I am testing on a pandaboard currently, which has an arm cortex-a9 processor, which I think is 64-bit.

Cortex-A9 is still 32-bit, so you'll have all the support you need. ;)

> however it doesn't if I remove the -ffreestanding flag. I need to figure this out next.

Can you at least compile the file to assembly (.s)? You won't be able to build Tim's example into an executable because there is no main in it.

cheers,

--renato

Renato Golin

Sep 26, 2013, 2:47:12 PM

to Stanislav Manilov, LLVM Dev

On 26 September 2013 18:13, Stanislav Manilov <stanisla...@gmail.com> wrote:

> which I suspect has something to do with the fact that in /usr/include I have a folder called x86_64-linux-gnu but not one called arm-linux-gnueabihf. Am I even remotely right?

Yes, you are, and the docs should (hopefully) have all the information you need to get past that, and other common problems. ;)

cheers,

--renato

Stanislav Manilov

Sep 26, 2013, 12:52:09 PM

to Renato Golin, LLVM Dev

Hello Renato,

It turned out I just didn't do the cross-compilation correctly, and Tim Northover already pointed me to a guide you have written on it (http://clang.llvm.org/docs/CrossCompilation.html), so I will read that before continuing with my efforts.

To answer your question I am testing on a pandaboard currently, which has an arm cortex-a9 processor, which I think is 64-bit.

I am very happy to compile the latest code and am successfully doing so. I tried to compile release 2.9 because I (wrongly) believed that I needed llvm-gcc in order to compile NEON code with LLVM.

Tim's minimalist example worked on my clang 3.4:

$ cat > neon.c
#include <arm_neon.h>

float32x4_t my_func(float32x4_t lhs, float32x4_t rhs) {
    return vaddq_f32(lhs, rhs);
}
$ clang --target=arm-linux-gnueabihf -mcpu=cortex-a15 -ffreestanding \
    -O3 -S -o - neon.c

However, it doesn't work if I remove the -ffreestanding flag. I need to figure this out next.

Thank you for your help.

Cheers,

- Stan


Stanislav Manilov

Sep 26, 2013, 1:13:31 PM

to Renato Golin, LLVM Dev

>> To answer your question I am testing on a pandaboard currently, which has an arm cortex-a9 processor, which I think is 64-bit.
>
> Cortex-A9 is still 32-bit, so you'll have all the support you need. ;)

Ah, Okay, embarrassing...

>> however it doesn't if I remove the -ffreestanding flag. I need to figure this out next.
>
> Can you at least compile the file to assembly (.s)? You won't be able to build Tim's example into an executable because there is no main in it.

I can compile to assembly with the -ffreestanding flag on, but without it I get:

In file included from neon.c:1:
In file included from /home/stan/Fortress/Dev/llvm/build-trunk/Debug+Asserts/bin/../lib/clang/3.4/include/arm_neon.h:31:
In file included from /home/stan/Fortress/Dev/llvm/build-trunk/Debug+Asserts/bin/../lib/clang/3.4/include/stdint.h:64:
In file included from /usr/include/stdint.h:25:
In file included from /usr/include/features.h:341:
/usr/include/stdc-predef.h:30:10: fatal error: 'bits/predefs.h' file not found
#include <bits/predefs.h>

which I suspect has something to do with the fact that in /usr/include I have a folder called x86_64-linux-gnu but not one called arm-linux-gnueabihf. Am I even remotely right?