I had a similar problem and I don't know how to solve it. In Fortran I do the following.
Input:
predgemm matrix a
4.0000 1.0000 1.0000 2.0000
0.0000 3.0000 4.0000 1.0000
0.0000 1.0000 3.0000 1.0000
0.0000 0.0000 0.0000 6.0000
predgemm matrix c
-4.0000 7.0000 1.0000 12.0000
-9.0000 2.0000 -2.0000 -2.0000
-4.0000 2.0000 -2.0000 8.0000
-1.0000 1.0000 -1.0000 19.0000
is=4
nb=3
mb=1
ldc=lda=4
CALL dgemm2('N', 'N', is-1, nb, mb, -one, a(1, is), lda, c(is, js), ldc, one, c(1, js), ldc)
dgemm matrix a
1 2 3 4
1 4.0000 1.0000 1.0000 2.0000
2 0.0000 3.0000 4.0000 1.0000
3 0.0000 1.0000 3.0000 1.0000
4 0.0000 0.0000 0.0000 6.0000
dgemm matrix c
1 2 3 4
1 -2.0000 5.0000 3.0000 12.0000
2 -8.0000 1.0000 -1.0000 -2.0000
3 -3.0000 1.0000 -1.0000 8.0000
4 -1.0000 1.0000 -1.0000 19.0000
In Go, with the same values (stored column-major in flat slices):
[]float64 len: 16, cap: 16, [4,0,0,0,1,3,1,0,1,4,3,0,2,1,1,6]
[]float64 len: 16, cap: 16, [-4,-9,-4,-0.9999999999999998,7,2,2,1,1,-2,-2,-0.9999999999999998,12,-2,8,19]
bi.Dgemm(blas.Trans, blas.NoTrans, is-1, nb, mb, -1, a[lda*(is-1):], lda, c[is-1+ldc*(js-1)+3:], ldc, 1, c[ldc*(js-1):], ldc)
and I get a different result:
[]float64 len: 16, cap: 16, [-8,-11,-6,-0.9999999999999998,5,1,1,1,0,-3,-2,-0.9999999999999998,12,-2,8,19]
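One possible explanation, assuming bi is a row-major BLAS implementation (the Go BLAS packages I know of are row-major, but the post does not say which one bi is): a column-major matrix read as row-major is its transpose, so the Fortran update C = C - A*B becomes C^T = C^T - B^T*A^T on the same memory. That means passing the Fortran B operand first and the Fortran A operand second, swapping m and n, and keeping both flags as NoTrans. The extra +3 in the offset of c(is, js) would also go away. The sketch below uses a hand-written row-major dgemmRowMajor as a hypothetical stand-in for bi.Dgemm, with updateBlock as a hypothetical helper name; it assumes js=1 (not stated above, but suggested by the Fortran output) and rounds -0.9999999999999998 to -1 to keep the arithmetic exact.

```go
package main

import "fmt"

// dgemmRowMajor is a hypothetical stand-in for a row-major, no-transpose
// Dgemm: C = alpha*A*B + beta*C, with A m×k, B k×n, C m×n, and element
// (i, j) of a matrix with leading dimension ld stored at [i*ld+j].
func dgemmRowMajor(m, n, k int, alpha float64, a []float64, lda int,
	b []float64, ldb int, beta float64, c []float64, ldc int) {
	for i := 0; i < m; i++ {
		for j := 0; j < n; j++ {
			var sum float64
			for l := 0; l < k; l++ {
				sum += a[i*lda+l] * b[l*ldb+j]
			}
			c[i*ldc+j] = alpha*sum + beta*c[i*ldc+j]
		}
	}
}

// updateBlock mirrors the Fortran call
//   dgemm('N','N', is-1, nb, mb, -one, a(1,is), lda, c(is,js), ldc, one, c(1,js), ldc)
// on column-major data held in flat slices (is and js are 1-based).
func updateBlock(a, c []float64, is, js, nb, mb, lda, ldc int) {
	aOff := lda * (is - 1)          // Fortran a(1, is)
	bOff := (is - 1) + ldc*(js-1)   // Fortran c(is, js)
	cOff := ldc * (js - 1)          // Fortran c(1, js)
	// Operands swapped and m, n swapped: row-major sees the transposes.
	dgemmRowMajor(nb, is-1, mb, -1,
		c[bOff:], ldc, // Fortran B goes first
		a[aOff:], lda, // Fortran A goes second
		1, c[cOff:], ldc)
}

func main() {
	a := []float64{4, 0, 0, 0, 1, 3, 1, 0, 1, 4, 3, 0, 2, 1, 1, 6}
	c := []float64{-4, -9, -4, -1, 7, 2, 2, 1, 1, -2, -2, -1, 12, -2, 8, 19}
	updateBlock(a, c, 4, 1, 3, 1, 4, 4) // is=4, js=1 (assumed), nb=3, mb=1
	fmt.Println(c)
}
```

With these inputs it prints [-2 -8 -3 -1 5 1 1 1 3 -1 -1 -1 12 -2 8 19], which is the Fortran result read column by column. If this diagnosis applies, the equivalent real call would be along the lines of bi.Dgemm(blas.NoTrans, blas.NoTrans, nb, is-1, mb, -1, c[is-1+ldc*(js-1):], ldc, a[lda*(is-1):], lda, 1, c[ldc*(js-1):], ldc), but I have not run that against the actual library.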