Remarkable variations in execution times for same matrix sizes for DGEMM on Intel Haswell i5-4590

Ravi Manumachu

Dec 30, 2015, 7:31:56 PM

to OpenBLAS-users

Dear All,

I am observing a lot of variation in the execution times of OpenBLAS DGEMM on an Intel Haswell machine, whose spec is shown below:

Architecture: x86_64

model name : Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz

CPU(s): 4

Thread(s) per core: 1

Core(s) per socket: 4

Socket(s): 1

NUMA node(s): 1

L1d cache: 32K

L1i cache: 32K

L2 cache: 256K

L3 cache: 6144K

In the experiments, the DGEMM routine multiplies two square matrices of size NxN, with N varied from 16960 to 17088. The attached plot shows the variation in execution times; similar variation is observed for other matrix sizes too. The OPENBLAS_NUM_THREADS environment variable is not set, so OpenBLAS is assumed to use 4 threads. Each experimental point in the plot is the average of 5 executions.
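To make the averaging concrete, here is a minimal sketch of how each point could be timed and how the spread around the mean could be reported next to it. It is not the actual benchmark code: `time_runs`, `summarize`, and `stand_in_workload` are hypothetical names, and the small pure-Python matrix product only stands in for the real DGEMM call.

```python
import statistics
import time


def time_runs(workload, repeats=5):
    """Run `workload` `repeats` times and return the wall-clock time of each run."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return mean, min, max, sample standard deviation, and coefficient of variation."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return {
        "mean": mean,
        "min": min(samples),
        "max": max(samples),
        "stdev": stdev,
        "cv": stdev / mean if mean > 0 else 0.0,
    }


def stand_in_workload(n=40):
    """Hypothetical placeholder for the DGEMM call: a naive n x n matrix product."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Five repetitions per point, matching the averaging described above.
    stats = summarize(time_runs(stand_in_workload, repeats=5))
    print(stats)
```

Reporting the minimum and the coefficient of variation alongside the mean would show whether the variation comes from a few slow outlier runs or from a consistent difference between neighbouring values of N.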

Please also note that when OpenBLAS was compiled, the following line was left commented out in Makefile.rule:

# If you want to disable CPU/Memory affinity on Linux.

#NO_AFFINITY = 1

This is to allow OpenBLAS to use CPU affinity. This flag, however, does not seem to make any difference (whether it is commented out or not).
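To remove the guesswork about the thread count and the cores in use, the setup could be pinned down explicitly before the library is loaded. The following is only a sketch under assumptions: `configure_threads` is a hypothetical helper, the value 4 matches the core count listed above, and the affinity query relies on `os.sched_getaffinity`, which is available on Linux only.

```python
import os


def configure_threads(num_threads=4):
    """Set OPENBLAS_NUM_THREADS explicitly and report the cores this process may use.

    The variable has to be set before OpenBLAS is loaded, because the library
    reads it when it initializes.
    """
    os.environ["OPENBLAS_NUM_THREADS"] = str(num_threads)
    # os.sched_getaffinity exists on Linux; on other platforms return None.
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return None


if __name__ == "__main__":
    cores = configure_threads(4)
    print("OPENBLAS_NUM_THREADS =", os.environ["OPENBLAS_NUM_THREADS"])
    print("allowed cores:", cores)
```

Printing the allowed cores before each run would at least confirm whether the process is restricted to the same set of cores across the experiments.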

Please let me know if this is a known issue, or if there are any workarounds to minimize the deviations in execution times.

Regards

Ravi

dgemm_execution_times_haswell-i5-4590.png
