Is SIMD support forthcoming?


gauta...@gmail.com

Sep 21, 2014, 10:30:58 PM
to golan...@googlegroups.com
Hi guys,

I know this topic has been brought up before, but I have some new questions whose answers I haven't been able to find. Does Go have any plans to add SIMD support? There are several ways one could go about doing this. Probably the best way would be to add a separate package that provides specialized SIMD routines. This is what D has done (core.simd), what C compilers have done (for example, emmintrin.h in Clang), and what Rust is planning to do as well. Another way would be to have really good autovectorization support. This approach is really hard, since one essentially has to build a "sufficiently smart compiler", but it is possible, as shown by the Intel C/C++ compilers. For all I know, Go already has autovectorization enabled; if so, how does one hint to the compiler that a routine can be vectorized? It seems to me that the lack of vectorization is probably one of the reasons Go performance still lags behind C performance by a factor of 1.5 to 2. Anyway, thanks in advance for any responses.

Ian Lance Taylor

Sep 22, 2014, 11:11:01 AM
to gauta...@gmail.com, golang-nuts
The gccgo compiler has auto-vectorization support via the GCC
optimizers, of course, but I don't know how easy it would be to get
vectorized code generated for Go. The slice bounds checks might get in
the way. It might be necessary for the frontend to hoist bounds checks
out of loops before passing the code on to the middle-end.
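
To illustrate what hoisting a bounds check out of a loop means, here is a minimal hand-written sketch in Go. The function name dot and the reslicing idiom are assumptions chosen for illustration, not something taken from gccgo or gc. Whether a given compiler then drops the per-iteration check on b[i] depends on the compiler and its version.

```go
package main

import "fmt"

// dot is a hypothetical example function: it returns the dot product
// of a and b over the first len(a) elements.
func dot(a, b []float64) float64 {
	// One explicit length check before the loop, instead of relying
	// on an implicit check for every b[i] inside it.
	if len(b) < len(a) {
		panic("dot: b is shorter than a")
	}
	// Reslicing makes len(b) == len(a), so every index produced by
	// ranging over a is known to be in range for b as well.
	b = b[:len(a)]

	var sum float64
	for i, x := range a {
		sum += x * b[i]
	}
	return sum
}

func main() {
	fmt.Println(dot([]float64{1, 2, 3}, []float64{4, 5, 6}))
}
```

The loop body is then a plain multiply-add over two equal-length slices, which is the shape a vectorizer generally needs to see.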

I'm not aware of anybody looking at auto-vectorization in the gc
compiler.

One problem with separate packages with specialized SIMD routines is
that they are inherently processor-specific. So I'm skeptical about
putting any such package in the standard library. Anybody could write
such a package accessible via go get, of course.

Also, the gc compiler is currently unable to inline functions written
in assembler code, so a significant amount, perhaps all, of the
benefit will be lost. This is a solvable problem but I'm not aware of
anybody working on that, either.

Ian

Brendan Tracey

Sep 22, 2014, 2:19:16 PM
to golan...@googlegroups.com, gauta...@gmail.com
I'm not a Go developer, but this is my take on the situation.

The Go team is not opposed to having vectorization, but it is not at the top of their priority list. It has been implied in the past that moving the compiler to Go will enable such optimizations. It had been hoped that the compiler transition would happen for 1.4, but I believe it has slipped to 1.5 because the focus of this release was on moving the runtime to Go (this paves the way for a more advanced GC). I would imagine that the compiler transition will happen for 1.5, at which point the top priority (aside from bug fixes) will be making the compiler look like a Go program rather than a C program. This will probably happen during 1.5 and 1.6. It seems that at that point they are interested in having SSA for Go. Once that's done, it should be easier to implement more complicated optimizations.

In my experience, removing bounds checking can provide up to a 30% improvement in performance for numeric code. I would guess such an optimization will be implemented eventually, because the performance gains would be seen throughout the Go ecosystem. I personally would love to see autovectorization, but in the past the developers have not expressed much interest in floating-point computational speed. Who knows, though: with the compiler and runtime in Go, both structured as nice Go packages, with a moving and compacting GC, and a robust cache-optimizing scheduler, maybe vectorization will be next on their list.

Go is designed to enable vectorization, but it takes doing. I personally don't expect it before 1.7; however, it is an open source project.

Sep 25, 2014, 8:36:50 AM
to golan...@googlegroups.com, gauta...@gmail.com
In my opinion, the Intel C/C++ compiler 15.0 isn't a sufficiently smart compiler. Compiling https://github.com/tul-project/benchmarks/blob/master/c/mandelbrot.c with ICC 15.0.0 (command line: icc -fast mandelbrot.c, cpu: Haswell) generates scalar instructions only. A sufficiently smart compiler would be able to auto-vectorize and auto-parallelize the code.
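
For context, here is a minimal Go sketch of a typical Mandelbrot escape-time kernel. It is not the linked benchmark, and the name escapeTime and its parameters are assumptions for illustration only. The loop exits as soon as the point escapes, so the trip count depends on the data, and each iteration depends on the previous one. A compiler can therefore only vectorize across neighbouring points, not along the iterations of a single point.

```go
package main

import "fmt"

// escapeTime is a hypothetical helper: it returns the number of
// iterations of z = z*z + c completed before |z|^2 exceeds 4,
// or maxIter if the point never escapes within that budget.
func escapeTime(cr, ci float64, maxIter int) int {
	zr, zi := 0.0, 0.0
	for i := 0; i < maxIter; i++ {
		// Data-dependent early exit: different points leave the
		// loop after different numbers of iterations.
		if zr*zr+zi*zi > 4.0 {
			return i
		}
		// Each step needs the previous z (loop-carried dependency).
		zr, zi = zr*zr-zi*zi+cr, 2*zr*zi+ci
	}
	return maxIter
}

func main() {
	fmt.Println(escapeTime(0, 0, 50), escapeTime(2, 2, 50))
}
```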