Below is a selection of my benchmarks (on amd64). More results and the code are at https://github.com/harrydb/go/tree/master/matrix
Note that it is not my intention to write a full matrix package; gomatrix is great. I just wanted to teach myself how to implement things like this and see whether I could make them faster.
Cheers,
Harry
BenchmarkMulDouglas__1024 1 1876168000 ns/op
BenchmarkMulStrassPar1024-2 1 1296087000 ns/op
BenchmarkMulDouglas__512 5 264489000 ns/op
BenchmarkMulStrassPar512-2 10 199333100 ns/op
BenchmarkMulStrassen_512 5 298305000 ns/op
BenchmarkMulBLAS_____512 5 562627800 ns/op
BenchmarkMulSimple___512 5 636601200 ns/op
BenchmarkMulGomatrix_512-2 5 783658800 ns/op
BenchmarkMulGomatrix_512 5 659225400 ns/op
BenchmarkMulDouglas__256 50 36059340 ns/op
BenchmarkMulStrassPar256-2 100 34300640 ns/op
BenchmarkMulStrassen_256 50 41638480 ns/op
BenchmarkMulBLAS_____256 50 54319880 ns/op
BenchmarkMulSimple___256 20 75649650 ns/op
BenchmarkMulGomatrix_256-2 20 93713050 ns/op
BenchmarkMulGomatrix_256 20 78700000 ns/op
BenchmarkMulBLAS______32 20000 95448 ns/op
BenchmarkMulSimple____32 10000 147572 ns/op
BenchmarkMulGomatrix__32 10000 146323 ns/op
BenchmarkMulGomatrix__32-2 10000 311446 ns/op
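
For anyone not used to the output format: each line is the benchmark name, the number of iterations run, and the average time per iteration. A "-2" suffix on a name means the run used GOMAXPROCS=2. Here is a rough sketch of how a benchmark like the "Simple" ones could be written. It is not the code from the repository; mulSimple, filled and benchmarkMulSimple are made-up names for illustration, and the flat row-major storage is my assumption.

```go
package main

import (
	"fmt"
	"testing"
)

// mulSimple returns the n x n product a*b using the textbook triple loop.
// Hypothetical helper: matrices are flat row-major slices of length n*n.
func mulSimple(a, b []float64, n int) []float64 {
	c := make([]float64, n*n)
	for i := 0; i < n; i++ {
		for k := 0; k < n; k++ {
			aik := a[i*n+k]
			for j := 0; j < n; j++ {
				c[i*n+j] += aik * b[k*n+j]
			}
		}
	}
	return c
}

// filled returns an n x n matrix with every entry set to v.
func filled(n int, v float64) []float64 {
	m := make([]float64, n*n)
	for i := range m {
		m[i] = v
	}
	return m
}

// benchmarkMulSimple times mulSimple on n x n inputs.
// The setup is excluded from the measurement by ResetTimer.
func benchmarkMulSimple(b *testing.B, n int) {
	x, y := filled(n, 1), filled(n, 2)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		mulSimple(x, y, n)
	}
}

func main() {
	// testing.Benchmark runs the function outside of "go test"
	// and picks the iteration count itself.
	r := testing.Benchmark(func(b *testing.B) { benchmarkMulSimple(b, 32) })
	fmt.Printf("MulSimple 32: %d iterations, %d ns/op\n", r.N, r.NsPerOp())
}
```

In a real package the benchmark would normally live in a _test.go file as a func BenchmarkXxx(b *testing.B) and be run through "go test" with the benchmark flag; the main function here is only there so the sketch runs on its own.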