The gccgo compiler gets auto-vectorization support from the GCC
optimizers, of course, but I don't know how easy it would be to get
vectorized code generated for Go. The slice bounds checks might get in
the way. It might be necessary for the frontend to hoist bounds checks
out of loops before passing the code on to the middle-end.
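
As a rough source-level sketch of what hoisting a bounds check out of a
loop means: the names addNaive and addHoisted are made up for
illustration, and whether a given compiler then drops the per-iteration
check or vectorizes the loop is an assumption, not something established
here.

```go
package main

import "fmt"

// addNaive indexes src inside the loop. Since nothing relates len(src)
// to len(dst), src[i] may need a bounds check on every iteration.
func addNaive(dst, src []float64) {
	for i := range dst {
		dst[i] += src[i]
	}
}

// addHoisted does the length check once, before the loop, and then
// reslices src to exactly len(dst). Inside the loop, i < len(dst) ==
// len(src), so a per-iteration check on src[i] is provably redundant.
// Unlike addNaive, a too-short src panics before any element is updated.
func addHoisted(dst, src []float64) {
	if len(src) < len(dst) {
		panic("addHoisted: src is shorter than dst")
	}
	src = src[:len(dst)]
	for i := range dst {
		dst[i] += src[i]
	}
}

func main() {
	a := []float64{1, 2, 3}
	b := []float64{1, 2, 3}
	src := []float64{10, 20, 30}
	addNaive(a, src)
	addHoisted(b, src)
	fmt.Println(a, b) // prints "[11 22 33] [11 22 33]"
}
```

Both functions compute the same result on valid input; the second just
makes the single up-front check explicit, which is roughly what the
frontend would have to do on the programmer's behalf.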

I'm not aware of anybody looking at auto-vectorization in the gc
compiler.

One problem with separate packages of specialized SIMD routines is
that they are inherently processor-specific, so I'm skeptical about
putting any such package in the standard library. Anybody could write
such a package and make it available via go get, of course.

Also, the gc compiler is currently unable to inline functions written
in assembly, so every call to a small SIMD routine pays the full
function call overhead, and a significant amount, perhaps all, of the
benefit will be lost. This is a solvable problem, but I'm not aware of
anybody working on that, either.
Ian