For this bodyless func header: func f40pc(i *[4]float32)
there is this file, "f40pc_amd64.s" (in this case for the formula sqrt(x+1)):
// func f40pc(i *[4]float32)
TEXT ·f40pc+0(SB),$16-8
MOVQ i+0(FP),AX // get 64-bit address from first parameter
MOVUPS (AX),X0 // load 128 bits (4 x float32) from memory; unaligned load, since Go only guarantees 4-byte alignment for a [4]float32
MOVSS $(1.0),X1 // load single-precision constant 1.0 into the lower 32 bits
SHUFPS $0x00,X1,X1 // duplicate it 4 times across the register
ADDPS X1,X0 // parallel add
SQRTPS X0,X0 // parallel sqrt in-place
MOVUPS X0,(AX) // put the 128 bits back at the same address
RET
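For checking the routine above, a plain Go version of the same formula is handy to compare against. This is only a sketch: the name f40pcRef is made up here, and it assumes the intent is exactly "replace each of the four values x with sqrt(x+1), in place".

```go
package main

import (
	"fmt"
	"math"
)

// f40pcRef is a hypothetical pure-Go stand-in for the assembly routine:
// it overwrites each of the four lanes x with sqrt(x+1).
func f40pcRef(i *[4]float32) {
	for n := range i {
		// the add happens in float32 (like ADDPS), the sqrt result is
		// converted back to float32 (like SQRTPS)
		i[n] = float32(math.Sqrt(float64(i[n] + 1)))
	}
}

func main() {
	v := [4]float32{0, 3, 8, 15}
	f40pcRef(&v)
	fmt.Println(v) // prints [1 2 3 4]
}
```

Feeding the same input to both versions and comparing the arrays is a quick sanity test for the assembly.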
HEADS-UP: the assembler used by Go reads left-to-right, so the destination is the last operand, but most assemblers and documentation I've seen (Intel syntax) put the destination first, i.e. right-to-left.
It took me a while to figure out that the shuffle instruction here (3 operands) has the 'selector' operand first, when all the documentation I found has it last. It's just an assembler choice, but it trips you up unless you know.
A bit inconsistent really, since inside actual Go code the convention is right-to-left (the destination of an assignment is on the left).
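To make the selector byte less mysterious, here is a small Go model of what the shuffle does, as a sketch only. The function shufps is made up for illustration, and its arguments are deliberately in the Go assembler's order (selector, source, destination). It follows the documented behaviour of the instruction: the low two result lanes are picked from the destination register, the high two from the source, two selector bits per lane.

```go
package main

import "fmt"

// shufps is a hypothetical model of SHUFPS, with operands in Go assembler
// order: selector first, then source, then destination.
// It returns the new contents of the destination register.
func shufps(sel uint8, src, dst [4]float32) [4]float32 {
	return [4]float32{
		dst[sel&3],      // lane 0: selector bits 1..0 pick from dst
		dst[(sel>>2)&3], // lane 1: selector bits 3..2 pick from dst
		src[(sel>>4)&3], // lane 2: selector bits 5..4 pick from src
		src[(sel>>6)&3], // lane 3: selector bits 7..6 pick from src
	}
}

func main() {
	// X1 after MOVSS $(1.0),X1: 1.0 in the low lane, zeros above it.
	x1 := [4]float32{1, 0, 0, 0}
	fmt.Println(shufps(0x00, x1, x1)) // prints [1 1 1 1]

	// A different selector, 0x1B, reverses the lanes.
	a := [4]float32{10, 20, 30, 40}
	fmt.Println(shufps(0x1B, a, a)) // prints [40 30 20 10]
}
```

With the selector $0x00 and the same register as both source and destination, every lane picks lane 0, which is why the 1.0 ends up copied four times.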
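Also: the assembly coming from -S needs a middle dot '·' prefixed to the function name to make it work. As far as I can tell the dot stands in for the package separator, so ·f40pc means f40pc in the current package.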