Follow up on this, I've dug into the DTGSY2 code doing side by side debugging and I'm finding discrepancies in results of Dgemm, which uses Daxpy under the hood.
This is the code:
Go:
bi.Dgemm(blas.NoTrans, blas.NoTrans, is, nb, mb, -1,
a[is:], lda, rhs, mb, 1, c[js:], ldc)
Fortran:
CALL DGEMM( 'N', 'N', IS-1, NB, MB, -ONE,
$ A( 1, IS ), LDA, RHS( 1 ), MB, ONE,
$ C( 1, JS ), LDC )
Here are the inputs which differ: (All leading dimensions are = 4)
FORTRAN:
A(1,1)@16
[16]
[1]: 4
[2]: 0
[3]: 0
[4]: 0
[5]: 1
[6]: 3
[7]: 1
[8]: 0
[9]: 1
[10]: 4
[11]: 3
[12]: 0
[13]: 2
[14]: 1
[15]: 1
[16]: 6
RHS(1)@8
[8]
[1]: 2
[2]: 1
[3]: -1
[4]: 3
[5]: 3
[6]: 1
[7]: -1
[8]: 2
C(1,1)@16
[16]
[1]: 1.0000000000000004
[2]: -1.0000000000000004
[3]: -0.99999999999999956
[4]: -0.99999999999999978
[5]: 9.0000000000000018
[6]: 2
[7]: 1
[8]: 1
[9]: 7.0000000000000018
[10]: -1
[11]: 3
[12]: -0.99999999999999978
[13]: 16
[14]: -0.99999999999999978
[15]: 7
[16]: 20
IS
2
JS
2
LDA
4
LDC
4
GO:
a
[]float64 len: 16, cap: 16, [4,1,1,2,0,3,4,1,0,1,3,1,0,0,0,6]
rhs
[]float64 len: 8, cap: 8, [2,1,-1,3,3,1,-1,2]
c
[]float64 len: 16, cap: 16, [1.0000000000000004,9.000000000000002,7.000000000000002,16,-1.0000000000000004,2,-1,-0.9999999999999998,-0.9999999999999996,1,3,7,-0.9999999999999998,1,-0.9999999999999998,20]
is
1
nb
2
mb
2
lda
4
js
1
The result is as follows:
Fortran:
C(1,1)@16
[16]
[1]: 1.0000000000000004
[2]: -1.0000000000000004
[3]: -0.99999999999999956
[4]: -0.99999999999999978
[5]: 6.0000000000000018
[6]: 2
[7]: 1
[8]: 1
[9]: 5.0000000000000018
[10]: -1
[11]: 3
[12]: -0.99999999999999978
[13]: 16
[14]: -0.99999999999999978
[15]: 7
[16]: 20
GO:
c
[]float64 len: 16, cap: 16, [1.0000000000000004,8.000000000000002,3.0000000000000018,16,-1.0000000000000004,2,-1,-0.9999999999999998,-0.9999999999999996,1,3,7,-0.9999999999999998,1,-0.9999999999999998,20]
As you can see, most values match, except for the
second value in Go (fifth in Fortran) and the
third value in Go.
Here is the Go output array (C) in column major representation for the discrepancy to be more visual:
call reorder(c,4,4)
[0]: 1.0000000000000004
[1]: -1.0000000000000004
[2]: -0.9999999999999996
[3]: -0.9999999999999998
[4]: 8.000000000000002
[5]: 2
[6]: 1
[7]: 1
[8]: 3.0000000000000018
[9]: -1
[10]: 3
[11]: -0.9999999999999998
[12]: 16
[13]: -0.9999999999999998
[14]: 7
[15]: 20