Hi,
I haven't played too much with cgo, but after a few minutes, here's what I've got.
- Direct conversion (casting) between float(s) and __m128 is unsafe and not recommended. Instead, you should load your floats into an __m128 value and store the result back out with the load/store intrinsics.
- You'll need to convert Go's float32 to C.float (or to a *C.float pointing at the first element, if passing the whole slice).
- The compiler flags were just getting in my way (I've never played with them), and without them the code seems to run just fine. Maybe somebody can shed some light on them :)
I believe the code below does what you aimed for ;).
Cheers,
Peter
PS: Of course, you could call the load method directly instead of wrapping it; I just thought it was cleaner this way in this demo code.