Performance comparison of OpenBLAS, MKL and Matlab for executing DGEMM


Santa Claus

Jul 1, 2014, 5:35:37 AM
to openbla...@googlegroups.com
Dear all,

Thanks to the instructions at https://github.com/xianyi/OpenBLAS/wiki/How-to-use-OpenBLAS-in-Microsoft-Visual-Studio, I have successfully built the binary "libopenblas.dll" as well as the import library "libopenblas.lib" from openblas-v0.2.10-pre3-src.tar.gz <http://sourceforge.net/projects/openblas/files/v0.2.10-pre/openblas-v0.2.10-pre3-src.tar.gz/download> in the MSYS shell of MinGW 20131004.

Two randomly generated 5000-by-5000 real matrices A and B were used to test DGEMM as provided by OpenBLAS v0.2.10 pre3 (linked as a DLL), by MKL 11.1 update 1 (linked statically), and by Matlab R2012a (via the expression C = A * B). The corresponding timing results are shown in the figures below.
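Concretely, each timing in the figures was obtained with a call of roughly the following shape (a minimal C sketch, not my exact harness; the random initialization, the clock()-based timing and the column-major layout are only illustrative assumptions):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <cblas.h>   /* cblas.h as shipped with OpenBLAS (MKL uses mkl_cblas.h) */

    #define N 5000

    int main(void)
    {
        double *A = malloc((size_t)N * N * sizeof(double));
        double *B = malloc((size_t)N * N * sizeof(double));
        double *C = malloc((size_t)N * N * sizeof(double));
        clock_t t0, t1;
        size_t i;

        srand(0);
        for (i = 0; i < (size_t)N * N; i++) {
            A[i] = (double)rand() / RAND_MAX;
            B[i] = (double)rand() / RAND_MAX;
            C[i] = 0.0;
        }

        /* C = 1.0 * A * B + 0.0 * C, all matrices N x N, column-major.
         * clock() is good enough on Windows, where it reports wall time;
         * a dedicated wall-clock timer is preferable in general. */
        t0 = clock();
        cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                    N, N, N, 1.0, A, N, B, N, 0.0, C, N);
        t1 = clock();

        printf("DGEMM time: %.2f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

        free(A); free(B); free(C);
        return 0;
    }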

a. Execution for the first time: <https://lh6.googleusercontent.com/-gPSLFKhx-vA/U7JztcCS8rI/AAAAAAAAAB0/rO-eIjqdCSY/s1600/OpenBLAS+v0.2.10+pre3_1.png>
b. Execution for the second time: <https://lh4.googleusercontent.com/-g3pOcabl_O8/U7Jz8Nyt-GI/AAAAAAAAAB8/wP5CE6mIjXs/s1600/OpenBLAS+v0.2.10+pre3_2.png>

Fig 1. Time consumed by DGEMM of OpenBLAS v0.2.10 pre3


<https://lh3.googleusercontent.com/-qLsTmc9lvy4/U7JykxBBbAI/AAAAAAAAABg/s2Huc-pbfr4/s1600/MKL+11.1+update1.png>
Fig 2. Time consumed by DGEMM of MKL 11.1 update 1


<https://lh6.googleusercontent.com/-I5506j7TW8I/U7J0-HVDMhI/AAAAAAAAACI/vkEK4DaoUdo/s1600/Matlab+R2012a.png>
Fig 3. Time consumed by the expression "C = A * B" in Matlab R2012a

Figure 3 indicates that the results computed by MKL and OpenBLAS agree well with those returned by Matlab. However, as shown in Figure 1, OpenBLAS exhibits not only the slowest but also an unsteady computing speed.
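"Agree well" means the element-wise differences between the result matrices are tiny; I checked it with something along these lines (a sketch; C_openblas and C_mkl are placeholder names for the two 5000-by-5000 results stored in the same order):

    #include <math.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Maximum absolute element-wise difference of two n-by-n matrices. */
    static double max_abs_diff(const double *X, const double *Y, size_t n)
    {
        double m = 0.0;
        size_t i;
        for (i = 0; i < n * n; i++) {
            double d = fabs(X[i] - Y[i]);
            if (d > m)
                m = d;
        }
        return m;
    }

    /* Usage (names are placeholders):
     *   printf("max |C_openblas - C_mkl| = %g\n",
     *          max_abs_diff(C_openblas, C_mkl, 5000));
     */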


When the foregoing "libopenblas.dll" and "libopenblas.lib" (which I built myself) are replaced by the earlier release OpenBLAS-v0.2.8-x86-Win.zip <http://sourceforge.net/projects/openblas/files/v0.2.8/OpenBLAS-v0.2.8-x86-Win.zip/download>, OpenBLAS speeds up significantly and takes nearly half the time of its competitors, as shown in Figure 4. This is a definitely, positively, absolutely great boost to the computing speed of BLAS!!! (sorry for my excitement)

<https://lh6.googleusercontent.com/-t1u0jIWHouo/U7J74IYDnHI/AAAAAAAAACY/fWb-AoTrA5w/s1600/OpenBLAS+v0.2.8.png>
Fig 4. Time consumed by DGEMM of OpenBLAS v0.2.8


Finally, would you be kind enough to answer the following questions:

1) Why does OpenBLAS v0.2.8 outperform OpenBLAS v0.2.10 pre3 by so much?

2) Have I missed or mistaken something in the build process?


Thanks in advance for your replies.


Best regards,

Wenkai Zhao

Werner Saar

Jul 1, 2014, 5:51:49 AM
to openbla...@googlegroups.com
On 01.07.2014 11:35, Santa Claus wrote:
> [original message quoted in full; trimmed]
Hi,

I need some information: what is your processor, how did you build OpenBLAS, and did you build a 32-bit or 64-bit binary?

Best regards
Werner

Santa Claus

Jul 1, 2014, 10:51:24 AM
to openbla...@googlegroups.com, wern...@googlemail.com
Hi Werner,

Glad to see your quick response and sorry for my carelessness.

My processor is a Pentium(R) Dual-Core CPU E5200 @ 2.5 GHz (2 cores), and the OS is 32-bit Windows XP (obsolete, sorry...).

My build consists of the following steps (consolidated in the sketch below):
1. Open the MSYS shell and change into the directory "/E/openblas-v0.2.10-pre3";
2. Type "make" and wait for it to complete;
3. Type "make PREFIX=./exports install" to place "libopenblas.dll" and the include files in the directory "./exports";
4. Change into "./exports" and type "lib /machine:i386 /def:libopenblas.def" to generate the import library "libopenblas.lib".
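For clarity, the same sequence typed out in one place (the comments are only annotations; I did not pass any extra make options such as TARGET or BINARY):

    # in the MSYS shell of MinGW
    cd /E/openblas-v0.2.10-pre3
    make                                     # auto-detects the CPU and builds the library
    make PREFIX=./exports install            # copies libopenblas.dll and the include files into ./exports
    cd exports
    lib /machine:i386 /def:libopenblas.def   # MSVC's lib.exe: generates the import library libopenblas.lib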

Thank you very much.

Best regards,
Wenkai Zhao

On Tuesday, July 1, 2014 at 5:51:49 PM UTC+8, Werner Saar wrote:

Zhang Xianyi

Jul 3, 2014, 3:42:03 AM
to Santa Claus, openbla...@googlegroups.com, Werner Saar
Hi Wenkai, Werner,

The Dual-Core E5200 is a Penryn-microarchitecture processor, so OpenBLAS uses the kernel/x86/gemm_kernel_2x4_penryn.S kernel on 32-bit Windows. I think this kernel comes from GotoBLAS2.

Xianyi



Santa Claus

Jul 3, 2014, 11:23:48 PM
to openbla...@googlegroups.com, zwk...@gmail.com, wern...@googlemail.com
Hi Xianyi,

So pleased to see your reply!

OpenBLAS actually beats both MKL and Matlab in terms of speed when the BLAS routines are called.

However, the LAPACK subroutines in OpenBLAS do not perform as well as the BLAS subroutines. Executing DGEEV from LAPACK on a 500-by-500 real matrix, OpenBLAS took 1.11 s, while MKL and Matlab took 0.74 s and 0.61 s respectively.
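For reference, the DGEEV timing was taken roughly like this (a sketch, not my exact harness; it calls the Fortran symbol dgeev_ directly, requests eigenvalues only, and reuses the simple clock()-based timing from the DGEMM test):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Fortran LAPACK routine as exported by libopenblas (and MKL) */
    extern void dgeev_(char *jobvl, char *jobvr, int *n, double *a, int *lda,
                       double *wr, double *wi, double *vl, int *ldvl,
                       double *vr, int *ldvr, double *work, int *lwork,
                       int *info);

    #define N 500

    int main(void)
    {
        int n = N, lda = N, ldvl = 1, ldvr = 1, lwork = -1, info, i;
        char jobvl = 'N', jobvr = 'N';            /* eigenvalues only */
        double *a  = malloc((size_t)N * N * sizeof(double));
        double *wr = malloc(N * sizeof(double));
        double *wi = malloc(N * sizeof(double));
        double vl_dummy[1], vr_dummy[1], wkopt, *work;
        clock_t t0, t1;

        srand(0);
        for (i = 0; i < N * N; i++)
            a[i] = (double)rand() / RAND_MAX;

        /* workspace query (lwork = -1), then the actual computation */
        dgeev_(&jobvl, &jobvr, &n, a, &lda, wr, wi, vl_dummy, &ldvl,
               vr_dummy, &ldvr, &wkopt, &lwork, &info);
        lwork = (int)wkopt;
        work = malloc((size_t)lwork * sizeof(double));

        t0 = clock();
        dgeev_(&jobvl, &jobvr, &n, a, &lda, wr, wi, vl_dummy, &ldvl,
               vr_dummy, &ldvr, work, &lwork, &info);
        t1 = clock();

        printf("DGEEV time: %.2f s (info = %d)\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC, info);
        return 0;
    }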

In my view, if the BLAS routines have been accelerated significantly, shouldn't the LAPACK routines be accelerated accordingly as well?

Sincerely yours,
Wenkai Zhao

On Thursday, July 3, 2014 at 3:42:03 PM UTC+8, Zhang Xianyi wrote:

José Luis García Pallero

Jul 4, 2014, 4:22:23 AM
to Santa Claus, openbla...@googlegroups.com, Werner Saar
2014-07-04 5:23 GMT+02:00 Santa Claus <zwk...@gmail.com>:
> [previous message quoted in full; trimmed]

Hello:

This is not entirely accurate. The reference LAPACK is not itself parallelized, so its parallel performance relies entirely on the BLAS. Obviously, the faster the BLAS, the faster LAPACK is as well, but it does not follow that if one version of OpenBLAS is 1.2x faster than another, LAPACK has to be 1.2x faster too. The LAPACK routines in MKL are probably parallelized at a level above the BLAS, so their performance is better than that of the reference LAPACK + BLAS.
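To put a rough number on it (the fractions below are invented, purely to illustrate the arithmetic):

    #include <stdio.h>

    /* Amdahl-style estimate: if a fraction f of DGEEV's runtime is spent
     * inside BLAS and the BLAS part becomes s times faster, the routine
     * as a whole only speeds up by 1 / ((1 - f) + f / s). */
    int main(void)
    {
        double f = 0.7;   /* assumed fraction of time spent in BLAS (invented) */
        double s = 2.0;   /* assumed speedup of the BLAS part (invented)       */
        printf("overall DGEEV speedup: %.2fx\n", 1.0 / ((1.0 - f) + f / s));
        return 0;
    }

With those numbers, the whole DGEEV call gets only about 1.5x faster even though its BLAS part is 2x faster.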

Best regards
--
*****************************************
José Luis García Pallero
jgpa...@gmail.com
(o<
/ / \
V_/_
Use Debian GNU/Linux and enjoy!
*****************************************

Santa Claus

Jul 4, 2014, 9:58:24 AM
to openbla...@googlegroups.com, zwk...@gmail.com, wern...@googlemail.com
Dear Pallero,

Thanks for your detailed explanations, which have corrected my poor extrapolation.

Since the LAPACK routines are bundled in the OpenBLAS library, I assumed that driver routines like xGEEV or xGESVD had also been parallelized in the same way as the BLAS subroutines.

Best regards,
Wenkai Zhao

On Friday, July 4, 2014 at 4:22:23 PM UTC+8, José Luis García Pallero wrote: