Hi everyone!
I recently downloaded the OpenBLAS library and ran a comparison using the benchmark suite from the ATLAS project.
I have a Haswell Core i7 CPU with FMA3.
According to the kernel notes, "dgemm" with a single thread should reach roughly 45 GFLOPS.
Unfortunately, my benchmarks only reach about 28 GFLOPS. I built the library with a plain "make", which produced two libraries:
"libopenblas.a" and "libopenblas-Haswell-**.a". I tested both with the benchmark suite, which calls the Fortran function "dgemm_" and compares it with the C interface of ATLAS.
The results are the same no matter which library I use. For linking I used the -pthread option, which I assume is what gives single-threaded execution.
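For reference, here is a minimal sketch of how a GFLOPS figure for "dgemm" is usually derived from a timing, assuming the conventional count of 2*m*n*k floating-point operations per call. The function names and the example timing are hypothetical and are not taken from the ATLAS benchmark suite.

```python
def dgemm_flops(m, n, k):
    """Conventional operation count for C = alpha*A*B + beta*C:
    one multiply and one add per inner-product term, i.e. 2*m*n*k."""
    return 2.0 * m * n * k


def gflops(m, n, k, seconds):
    """Convert a measured wall-clock time for one dgemm call into GFLOPS."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return dgemm_flops(m, n, k) / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical measurement: a 2000 x 2000 x 2000 multiply taking 0.5 s.
    print(f"{gflops(2000, 2000, 2000, 0.5):.1f} GFLOPS")
```

If two benchmarks count operations differently, their GFLOPS numbers are not directly comparable, so it is worth checking which formula each one uses.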
I tried this on two different Linux systems, Ubuntu (Debian-based) and Manjaro (Arch-based), with the same results.
Maybe you can tell me what I did wrong, or what the reasons for this poor performance might be.
For comparison, I had a trial version of MKL, linked it statically in the same way as OpenBLAS, and got 45.2 GFLOPS.
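To put the 45 and 28 GFLOPS figures in context, here is a rough sketch of a theoretical single-core double-precision peak estimate. It assumes a Haswell core with two 256-bit FMA units (4 doubles per vector, 2 operations per FMA); the 3.0 GHz clock is a placeholder, since the actual clock of this CPU is not stated above.

```python
def peak_gflops(clock_ghz, fma_units=2, simd_doubles=4, flops_per_fma=2):
    """Theoretical double-precision peak for one core, in GFLOPS.

    Defaults assume a Haswell core: two 256-bit FMA units, each handling
    4 doubles per cycle, with a fused multiply-add counted as 2 operations.
    """
    return clock_ghz * fma_units * simd_doubles * flops_per_fma


if __name__ == "__main__":
    clock = 3.0  # GHz, placeholder value; substitute the real sustained clock
    peak = peak_gflops(clock)
    for measured in (45.2, 28.0):
        print(f"{measured} GFLOPS is {100 * measured / peak:.0f}% of {peak:.0f} GFLOPS peak")
```

The ratio of measured to peak throughput gives a quick indication of whether a kernel is making full use of the FMA units.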
Thank you for any help.
Marcel Sachtleben