Benchmark Results with the OpenBLAS library


Marcel Sachtleben

May 26, 2015, 10:36:42 AM
to openbla...@googlegroups.com
Hi everyone!

I recently downloaded the OpenBLAS library and compared it against ATLAS using a benchmark suite from the ATLAS project.
I have a Haswell Core i7 CPU with FMA3.
The kernel notes state that single-threaded "dgemm" should reach roughly 45 GFLOPS.
Unfortunately, my benchmarks only reach about 28 GFLOPS. I built the library with a plain "make", which produced two libraries:
"libopenblas.a" and "libopenblas-Haswell-**.a". I tested both with the benchmark suite, which calls the Fortran function "dgemm_" and compares it with the C interface of ATLAS.
The results are the same no matter which library I use. For linking I used the -pthread option, which I assume is for single-threading.

I tried it on two different Linux systems, Ubuntu (Debian-based) and Manjaro (Arch-based), with the same results.
Maybe you can tell me what I did wrong or what the reasons for this poor performance are.
I had a trial version of the MKL library for comparison, linked it statically in the same way as OpenBLAS, and got 45.2 GFLOPS.

Thank You for any help

Marcel Sachtleben
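
Editor's note on the threading question above: `-pthread` only links the POSIX threads library and does not by itself restrict OpenBLAS to one thread. OpenBLAS reads its thread count from the `OPENBLAS_NUM_THREADS` environment variable. The following is a minimal Python sketch of launching a benchmark with that variable set to 1. The helper names and the binary name `./my_dgemm_bench` are hypothetical and not from the thread.

```python
import os
import subprocess


def single_thread_env(base_env=None):
    """Return a copy of the environment that asks OpenBLAS for one thread."""
    env = dict(os.environ if base_env is None else base_env)
    env["OPENBLAS_NUM_THREADS"] = "1"  # environment variable honored by OpenBLAS
    return env


def run_single_threaded(cmd):
    """Run a benchmark command (a list of arguments) with one OpenBLAS thread."""
    return subprocess.run(cmd, env=single_thread_env(), check=True)


if __name__ == "__main__":
    # "./my_dgemm_bench" is a placeholder; substitute the real benchmark binary:
    # run_single_threaded(["./my_dgemm_bench"])
    print(single_thread_env()["OPENBLAS_NUM_THREADS"])
```

Setting the variable in the shell before starting the benchmark has the same effect. The helper copies the environment so the parent process is left unchanged.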

Zhang Xianyi

May 26, 2015, 12:19:29 PM
to Marcel Sachtleben, openbla...@googlegroups.com
Hi Marcel,

What is the exact CPU model? And what matrix sizes did you use as input?


Marcel Sachtleben

May 27, 2015, 1:24:22 PM
to openbla...@googlegroups.com, marcel.s...@gmail.com
I use an Intel(R) Core(TM) i7-4710MQ in my notebook.
The matrix sizes ranged from 500x500 to 5000x5000 in steps of 500.

I will add a screenshot from the benchmark suite:

It's even worse than I thought. The comparison is against the ATLAS implementation.

I linked both libopenblas.a and libopenblas_haswellp-r0.2.13.a.
I set the link flags to "-pthread -lm" as well as "-lpthread -lm", which didn't make any difference.

For comparison I added the benchmark result from Intel's MKL as well.

Don't know what I'm doing wrong.


ATLAS_vs_OpenBLAS.txt
ATLAS_vs_OpenBLAS_haswellp.txt
ATLAS_vs_MKL.txt
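
Editor's note: for square matrices of order n, dgemm is conventionally counted as 2·n³ floating-point operations (n³ multiplications and n³ additions), so a measured run time converts directly to GFLOPS. A short Python sketch for the sizes mentioned above follows. The function names are illustrative and not part of any benchmark suite.

```python
def dgemm_flops(n):
    """Conventional flop count for an n x n by n x n dgemm: 2 * n^3."""
    return 2 * n**3


def gflops(n, seconds):
    """Convert a measured dgemm run time into GFLOPS."""
    return dgemm_flops(n) / seconds / 1e9


# Matrix orders from the post: 500 to 5000 in steps of 500.
sizes = list(range(500, 5001, 500))

if __name__ == "__main__":
    for n in sizes:
        # Time one dgemm call would take at the two rates discussed in the thread.
        t_28 = dgemm_flops(n) / 28e9
        t_45 = dgemm_flops(n) / 45e9
        print(f"n={n:5d}  {t_28:8.4f} s at 28 GFLOPS  {t_45:8.4f} s at 45 GFLOPS")
```

The smallest sizes finish in a few milliseconds, so timer resolution and warm-up can distort their results more than those of the largest sizes.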

Werner Saar

May 28, 2015, 4:40:13 AM
to openbla...@googlegroups.com
Hi,

Attached are benchmark results that I created today on the Haswell machine in our lab.
I used the dgemm benchmark in the "benchmark" folder of OpenBLAS.
The benchmarks ran at 3.9 GHz. Because your machine is a little slower (only 3.4 GHz maximum),
45 GFLOPS is OK.

Regards
Werner
dgemm.png
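
Editor's note: the clock-speed argument can be checked with a quick calculation. A Haswell core has two 256-bit FMA units, each handling 4 doubles per cycle at 2 flops per fused multiply-add, giving 16 double-precision flops per cycle. The Python sketch below uses the 3.4 GHz and 3.9 GHz figures from the message above. It assumes the core actually sustains those clocks under load, and the function names are illustrative.

```python
def peak_dp_gflops(ghz, fma_units=2, doubles_per_vector=4):
    """Theoretical single-core double-precision peak in GFLOPS.

    Defaults describe a Haswell core: 2 FMA units, 4 doubles per
    256-bit vector, 2 flops per fused multiply-add.
    """
    return ghz * fma_units * doubles_per_vector * 2


def efficiency(measured_gflops, ghz):
    """Fraction of the theoretical peak that a measurement reaches."""
    return measured_gflops / peak_dp_gflops(ghz)


if __name__ == "__main__":
    for ghz in (3.4, 3.9):
        print(f"{ghz} GHz: peak {peak_dp_gflops(ghz):.1f} GFLOPS")
    for measured in (28.0, 45.0):
        print(f"{measured} GFLOPS at 3.4 GHz: {efficiency(measured, 3.4):.0%} of peak")
```

Under these assumptions, 45 GFLOPS is a bit over 80% of the 3.4 GHz peak, while 28 GFLOPS is only about half of it.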

ckl

May 28, 2015, 5:40:57 AM
to openbla...@googlegroups.com
Hi,


On Thursday, May 28, 2015 at 10:40:13 UTC+2, Werner Saar wrote:
Hi,

Attached are benchmark results that I created today on the Haswell machine in our lab.
I used the dgemm benchmark in the "benchmark" folder of OpenBLAS.
The benchmarks ran at 3.9 GHz. Because your machine is a little slower (only 3.4 GHz maximum),
45 GFLOPS is OK.

Are there any benchmark results available that were produced on Windows?

Cheers,

Carl
 

Werner Saar

May 28, 2015, 5:46:19 AM
to openbla...@googlegroups.com
Hi,

Sorry, I don't have a Windows machine,
so I cannot run benchmarks on that platform.

Regards
Werner

ckl

May 28, 2015, 5:57:53 AM
to openbla...@googlegroups.com
There seem to be some differences between Linux and Windows with multi-threaded kernels; see e.g. https://github.com/xianyi/OpenBLAS/issues/532. Do you have an idea what to do about that? Would it help to run the benchmark code without any comparison to MKL or ATLAS?

Werner Saar

May 28, 2015, 6:26:04 AM
to openbla...@googlegroups.com
Hi,

You should not run different benchmarks in quick succession, because
the CPUs get hot and then run at lower frequencies.
Wait at least one or two minutes before running the next benchmark.

I also think it is a bad idea to mix benchmark runs,
as xdl3blastst does (caching issues).

I will try to compile OpenBLAS/benchmark on Windows.

Regards
Werner
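
Editor's note: the advice above (one benchmark at a time, with a pause in between) could be scripted roughly as follows. This is only a sketch in Python. The function name, the 90-second default, and the callables standing in for benchmarks are assumptions, not part of OpenBLAS or ATLAS.

```python
import time


def run_with_cooldown(benchmarks, cooldown_s=90.0,
                      sleep=time.sleep, clock=time.perf_counter):
    """Run (name, callable) pairs one after another, pausing between runs.

    The pause gives the CPU time to cool down so that a later benchmark
    is not penalized by thermal throttling caused by an earlier one.
    Returns a dict mapping each name to its wall-clock time in seconds.
    """
    results = {}
    for index, (name, fn) in enumerate(benchmarks):
        if index > 0:
            sleep(cooldown_s)  # no pause is needed before the first run
        start = clock()
        fn()
        results[name] = clock() - start
    return results


if __name__ == "__main__":
    # Placeholder workloads; real use would launch the actual benchmark programs.
    demo = [("first", lambda: sum(range(100_000))),
            ("second", lambda: sum(range(100_000)))]
    print(run_with_cooldown(demo, cooldown_s=1.0))
```

Running each library's benchmark as a separate entry, rather than interleaving them in one process, also avoids one library's run warming the caches for the next.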

ckl

May 28, 2015, 7:23:44 AM
to openbla...@googlegroups.com
I use the following Makefile.rule for Windows builds:

# 64bit
VERSION = 0.2.14
DYNAMIC_ARCH = 1
CC = gcc
FC = gfortran
BINARY = 64
USE_THREAD = 1
USE_OPENMP = 0
NUM_THREADS = 32
NO_WARMUP = 1
NO_AFFINITY = 1
CONSISTENT_FPCSR = 1
PREFIX = /tmp/openblas
COMMON_OPT = -O2 -march=x86-64 -mtune=generic
FCOMMON_OPT = -frecursive
MAX_STACK_ALLOC = 2048

# 32bit
..
BINARY = 32
..
COMMON_OPT = -O2 -march=pentium4 -mtune=generic -mfpmath=sse -msse2
...

For the 32-bit build I removed three non-SSE2 targets so that I could use the -msse2 switch.

One question remains for Windows builds:

Is it useful to set the following rule?

USE_SIMPLE_THREADED_LEVEL3 = 1

- Carl