[go-nuts] Go viewpoint on SIMD intrinsics

orthochronous

May 12, 2010, 9:46:06 AM
to golang-nuts
Hi,

after searching around for a while I can't find anything either way on
the Go viewpoint on SIMD intrinsics. Does anyone on the language
design side have a view on whether SIMD intrinsics should/shouldn't be
made available in Go? (To be clear, I'm not talking about automatic
vectorisation being applied to scalar code, but analogues of the C-
style intrinsics _mm_add_epi16 (on Intel) or vaddq_u64 (on ARM NEON)
that allow intrinsics to be mixed with standard higher-level code for,
e.g., control flow.)
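
To make the distinction concrete, here is a minimal sketch in plain Go of what a lane-wise 16-bit add in the style of _mm_add_epi16 computes, with ordinary Go control flow around it. The type Int16x8 and the function AddInt16x8 are hypothetical names invented for illustration; nothing like them exists in Go, and the inner loop only stands in for what a single SIMD instruction would do.

```go
package main

import "fmt"

// Int16x8 is a hypothetical eight-lane vector of 16-bit integers.
// It is an illustration only, not a Go language or library type.
type Int16x8 [8]int16

// AddInt16x8 adds two vectors lane by lane. Each lane wraps around on
// overflow, which is the behaviour of the non-saturating _mm_add_epi16.
func AddInt16x8(a, b Int16x8) Int16x8 {
	var r Int16x8
	for i := range r {
		r[i] = a[i] + b[i]
	}
	return r
}

func main() {
	acc := Int16x8{}
	step := Int16x8{1, 2, 3, 4, 5, 6, 7, 32767}
	// Ordinary control flow wrapped around the vector operation.
	for n := 0; n < 2; n++ {
		acc = AddInt16x8(acc, step)
	}
	fmt.Println(acc) // [2 4 6 8 10 12 14 -2]
}
```

The last lane shows the wraparound: 32767 + 32767 does not fit in 16 bits, so the lane ends up as -2.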

I can understand that there may be good reasons for not wanting to add
support for them: off the top of my head, you've got to be able to
specify variable alignment and you introduce the possibility of
writing heavily chip-class specific code. On the other hand, for
certain classes of application it can be quite important to be able to
use these facilities from a high-level, general purpose language. So
I'm just trying to get an idea if this lies in Go's eventual future.

Many thanks for any enlightenment,
Regards,
Orthochronous

Andrew Gerrand

May 12, 2010, 3:48:34 PM
to orthochronous, golang-nuts
I doubt Go will ever support SIMD explicitly at the language level. Go
is currently architecture-agnostic, and I don't see any reason why
that should change. It's conceivable that some primitive data types
and operations could be added to the language that would be compiled
to SIMD-specific code on architectures that support it. This is not on
the roadmap right now, though.

Andrew

orthochronous

May 12, 2010, 5:06:34 PM
to Andrew Gerrand, golang-nuts
Thanks for the info. (Incidentally, if you get to thinking about
primitive data types, be sure to look at the complications of integer
SIMD operations, and not just the relatively simple floating point
SIMD, when you try to plan things.) Trading range of applicability
for simplicity and platform independence seems like a good choice for Go.

Regards,
Orthochronous

Pete Wilson

May 12, 2010, 11:45:47 PM
to golang-nuts
I think that eschewing intrinsics for stuff like SIMD is a Very Good
Idea.

Register-SIMD is an idea whose hardware implementation is disarmingly
simple, but whose programming model poses difficulties.

Given the concepts of goroutines and channels, we'd probably be better
off supposing the existence of different, but equally-simple, hardware
that supported explicit parallelisation of loops (into goroutines) or
the automatic vectorisation thereof (into goroutines).
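
As a rough sketch of what "explicit parallelisation of loops (into goroutines)" could look like in today's Go, the example below splits an element-wise add into contiguous chunks, one goroutine per chunk. The function name parallelAdd and the chunking scheme are assumptions made for illustration; on existing hardware the goroutines are simply scheduled onto ordinary cores, not onto the kind of machine described here.

```go
package main

import (
	"fmt"
	"sync"
)

// parallelAdd computes dst[i] = a[i] + b[i] for every index of dst.
// The index range is split into at most `workers` contiguous chunks and
// each chunk is handled by its own goroutine.
// a and b must be at least as long as dst.
func parallelAdd(dst, a, b []float32, workers int) {
	n := len(dst)
	if workers < 1 {
		workers = 1
	}
	chunk := (n + workers - 1) / workers // ceiling division
	var wg sync.WaitGroup
	for start := 0; start < n; start += chunk {
		end := start + chunk
		if end > n {
			end = n
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			// Chunks do not overlap, so no locking is needed.
			for i := lo; i < hi; i++ {
				dst[i] = a[i] + b[i]
			}
		}(start, end)
	}
	wg.Wait()
}

func main() {
	a := []float32{1, 2, 3, 4, 5}
	b := []float32{10, 20, 30, 40, 50}
	dst := make([]float32, len(a))
	parallelAdd(dst, a, b, 2)
	fmt.Println(dst) // [11 22 33 44 55]
}
```

Whether this pays off depends on the chunk size: for short loops the cost of starting goroutines is likely to outweigh the work done in each one.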

The hardware is straightforward - a swarm of execution units (like in
SIMD, but lots), but with simple hardware surrounding each unit
collection, forming a very simple processor (probably with HW support
for lowcost messaging) with appropriate interprocessor interconnect.
As described, one spends a bit too much power fetching instructions,
but there are ways round that.

And when there's no 'vector' work to do, the machines are rather
useful general-purpose processors - unlike the idiot SIMD units...

-- Pete


orthochronous

May 13, 2010, 4:27:28 AM
to Pete Wilson, golang-nuts
On Thu, May 13, 2010 at 4:45 AM, Pete Wilson <pe...@kivadesigngroupe.com> wrote:
> I think that eschewing intrinsics for stuff like SIMD is a Very Good
> Idea.
>
> Register-SIMD is an idea whose hardware implementation is disarmingly
> simple, but whose programming model poses difficulties.

I'll just note that "Everyone has an unconscious tendency to assume
the kind of programs they write are the kinds of programs everyone
writes". In theory, short-vector SIMD instructions are exactly what
you want for the image processing, image analysis and statistical
modelling that I do; for a lot of things, including writing compilers
or text-servers, they are almost completely useless. Unfortunately,
Intel SSE has developed piecemeal, implementing the operations Intel
believed programmers wanted (e.g., on an Atom CPU I can use a vector
minimum operation on 16-bit integers, but there's no vector minimum on
32-bit integers until SSE4) rather than a complete set of operations,
and the instruction set "limitation" that each operation can only
refer to two registers (removed in SSE5-something) gives some weird
instructions. There are far fewer, but not zero, implementation
idiocies in the better-designed ARM NEON SIMD instructions, which makes
programming with them simpler. SIMD instructions are undoubtedly very
difficult to use for general computation, but they were designed for a
different part of the simple data parallel/low chip power budget space.
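
To illustrate the kind of workaround a missing 32-bit vector minimum forces, here is a minimal sketch in plain Go of building a lane-wise minimum out of a compare that yields a mask, followed by and, and-not and or. The type Int32x4 and the functions cmpGT and minViaMask are hypothetical names for illustration; they model the general compare-and-select idiom rather than any particular instruction sequence.

```go
package main

import "fmt"

// Int32x4 is a hypothetical four-lane vector of 32-bit integers.
// It is an illustration only, not a Go or SSE type.
type Int32x4 [4]int32

// cmpGT mimics a lane-wise signed compare: a lane is all ones (-1)
// where a > b and all zeros otherwise.
func cmpGT(a, b Int32x4) Int32x4 {
	var m Int32x4
	for i := range m {
		if a[i] > b[i] {
			m[i] = -1
		}
	}
	return m
}

// minViaMask selects b where a > b and a elsewhere, using only
// bitwise operations on the mask: (b AND m) OR (a AND NOT m).
func minViaMask(a, b Int32x4) Int32x4 {
	m := cmpGT(a, b)
	var r Int32x4
	for i := range r {
		r[i] = (b[i] & m[i]) | (a[i] &^ m[i])
	}
	return r
}

func main() {
	a := Int32x4{1, -5, 70000, 0}
	b := Int32x4{2, -9, 65536, 0}
	fmt.Println(minViaMask(a, b)) // [1 -9 65536 0]
}
```

One operation in the 16-bit case becomes four here, which is the sort of asymmetry that makes the instruction set awkward to target.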

(The only reason I'm asking about Go is that I'm surveying various
next-generation languages to see if there's a better language than
C++ for building a big interactive image processing and analysis
library. Go is clearly not the appropriate medium for the kind of code
that I will be writing, but I can entirely understand that simplicity
and machine independence are more important for Go. General purpose
languages tend to develop the sprawl of C++, so staying focussed is
good.)

> Given the concepts of goroutines and channels, we'd probably be better
> off supposing the existence of different, but equally-simple, hardware
> that supported explicit parallelisation of loops (into goroutines) or
> the automatic vectorisation thereof (into goroutines).
>
> The hardware is straightforward - a swarm of execution units (like in
> SIMD, but lots), but with simple hardware surrounding each unit
> collection, forming a very simple processor (probably with HW support
> for lowcost messaging) with appropriate interprocessor interconnect.
> As described, one spends a bit too much power fetching instructions,
> but there are ways round that.
>
> And when there's no 'vector' work to do, the machines are rather
> useful general-purpose processors - unlike the idiot SIMD units...

I'll just note that I'm not aware of any current chip, particularly
any designed with low power usage in mind, that implements this kind
of circuitry.

Regards,
Orthochronous

Joe Poirier

May 13, 2010, 10:18:40 AM
to golang-nuts
> > Given the concepts of goroutines and channels, we'd probably be better
> > off supposing the existence of different, but equally-simple, hardware
> > that supported explicit parallelisation of loops (into goroutines) or
> > the automatic vectorisation thereof (into goroutines).
>
> > The hardware is straightforward - a swarm of execution units (like in
> > SIMD, but lots), but with simple hardware surrounding each unit
> > collection, forming a very simple processor (probably with HW support
> > for lowcost messaging) with appropriate interprocessor interconnect.
> > As described, one spends a bit too much power fetching instructions,
> > but there are ways round that.
>
> > And when there's no 'vector' work to do, the machines are rather
> > useful general-purpose processors - unlike the idiot SIMD units...
>
> I'll just note that I'm not aware of any current chip, particularly
> any designed with low power usage in mind, that implements this kind
> of circuitry.
>

Chuck Moore's Forth chip and Sandbridge SoC come to mind:

http://www.intellasys.net/
http://www.greenarrays.com/
http://www.sandbridgetech.com/

-joe

mg

May 13, 2010, 3:37:05 PM
to golang-nuts
On May 12, 3:48 pm, Andrew Gerrand <a...@golang.org> wrote:
> I doubt Go will ever support SIMD explicitly at the language level.

Maybe there could be an extension mechanism for body-less function
declarations, somewhat similar to how Sun's inline template files work,
which are also used in their mediaLib. As with the Sun Studio
compilers, it would probably require the compiler to correctly schedule
the passed-on assembly instructions (including SIMD instructions).

http://developers.sun.com/solaris/articles/inlining.html
http://developers.sun.com/solaris/articles/inline_assembly.html
http://developers.sun.com/solaris/articles/vis.html
http://developers.sun.com/sunstudio/documentation/ss12u1/mr/man/man1/inline.1.html
http://docs.sun.com/doc/819-2242/libmlib-3lib
http://en.wikipedia.org/wiki/mediaLib