Building an optimized OpenBLAS package


Yuan Xu

Dec 12, 2013, 7:23:28 AM
to openbla...@googlegroups.com
Hi all,

I am new to OpenBLAS. Thanks for the wonderful package.
I have an Ubuntu system (Intel Core i7) and tried OpenBLAS from the Debian package; it is really fast.

Then I downloaded the source code from git and built the library with 'make'. The resulting library is libopenblas_sandybridgep-r0.2.8.so.
It detects my CPU correctly, but according to my test it is slower than the one from the Debian package.

Did I miss some option for building an optimized OpenBLAS?
Thanks!

PS:
* The OpenBLAS from the Debian package is version 0.2.6 (it claims to switch kernels according to the running architecture).

Zhang Xianyi

Dec 12, 2013, 7:47:22 AM
to Yuan Xu, openbla...@googlegroups.com
Thank you for using OpenBLAS.
Did you just test dot?

Xianyi



Yuan Xu

Dec 12, 2013, 7:48:48 AM
to openbla...@googlegroups.com, Yuan Xu
Yes, only dot

Yuan Xu

Dec 12, 2013, 2:33:28 PM
to Zhang Xianyi, openbla...@googlegroups.com
Hi Xianyi,

Thanks for your reply.
After limiting the number of threads to 1, both versions perform the same.
With the number of threads set to 4, the Debian-packaged version gets faster, but the locally compiled version performs the same as before.

The output of 'make' is:

 OpenBLAS build complete.

  OS               ... Linux             
  Architecture     ... x86_64               
  BINARY           ... 64bit                 
  C compiler       ... GCC  (command line : gcc)
  Fortran compiler ... GFORTRAN  (command line : gfortran)
  Library Name     ... libopenblas_sandybridgep-r0.2.8.a (Multi threaded; Max num-threads is 4)

So, what am I missing?

Best Regards,

Xu, Yuan


On Thu, Dec 12, 2013 at 1:53 PM, Zhang Xianyi <traits...@gmail.com> wrote:
> Please set the environment variable before your test:
> export OPENBLAS_NUM_THREADS=1
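
A minimal sketch of how the 1-thread versus 4-thread comparison could be scripted. `OPENBLAS_NUM_THREADS` is the variable named above; the benchmark executable `./dot_bench` and the helper name `run_with_threads` are assumptions made up for illustration and are not part of the thread.

```shell
#!/bin/sh
# Run a command with OPENBLAS_NUM_THREADS set for that command only.
# Usage: run_with_threads <count> <command> [args...]
run_with_threads() {
    threads=$1
    shift
    env OPENBLAS_NUM_THREADS="$threads" "$@"
}

# Hypothetical benchmark binary linked against the library under test.
BENCH=${BENCH:-./dot_bench}

if [ -x "$BENCH" ]; then
    for n in 1 4; do
        echo "== OPENBLAS_NUM_THREADS=$n =="
        run_with_threads "$n" "$BENCH"
    done
else
    echo "benchmark $BENCH not found; nothing to run" >&2
fi
```

Using `env` keeps the setting local to each run, so the two measurements cannot leak into each other through an earlier `export`.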



Carter Schonwald

Dec 14, 2013, 3:46:29 PM
to Yuan Xu, Zhang Xianyi, openbla...@googlegroups.com
Could it be something like having MPI enabled or not?

Yuan Xu

Dec 14, 2013, 4:26:10 PM
to Carter Schonwald, Zhang Xianyi, openbla...@googlegroups.com
How do I enable/disable MPI? Thanks!

Best Regards,

Xu, Yuan

Zhang Xianyi

Dec 14, 2013, 8:35:59 PM
to Yuan Xu, openbla...@googlegroups.com



2013/12/13 Yuan Xu <xuyu...@gmail.com>

> Hi Xianyi,
>
> Thanks for your reply.
> After limiting the number of threads to 1, both versions perform the same.
> With the number of threads set to 4, the Debian-packaged version gets faster, but the locally compiled version performs the same as before.

Sorry, I have no idea about the Debian-packaged version.

> The output of 'make' is:
>
>  OpenBLAS build complete.
>
>   OS               ... Linux
>   Architecture     ... x86_64
>   BINARY           ... 64bit
>   C compiler       ... GCC  (command line : gcc)
>   Fortran compiler ... GFORTRAN  (command line : gfortran)
>   Library Name     ... libopenblas_sandybridgep-r0.2.8.a (Multi threaded; Max num-threads is 4)

It looks fine.
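
The part of the summary that matters for this question is the last line, which reports a multi-threaded build with a maximum of 4 threads. A small sketch for pulling that value out of build output follows. The helper name `max_threads_from_log` is made up for illustration, and the pattern assumes the summary line has the same wording as the one quoted in this thread.

```shell
#!/bin/sh
# Read an OpenBLAS build summary on stdin and print the "Max num-threads" value.
# Prints nothing and returns non-zero if no such summary line is found.
max_threads_from_log() {
    sed -n 's/.*Library Name.*Max num-threads is \([0-9][0-9]*\)).*/\1/p' | grep .
}

# Demonstration on the summary line quoted in this thread.
sample='  Library Name     ... libopenblas_sandybridgep-r0.2.8.a (Multi threaded; Max num-threads is 4)'
printf '%s\n' "$sample" | max_threads_from_log
```

To use it on a real build, one option is to save the output first, for example with `make 2>&1 | tee build.log`, and then feed `build.log` to the function on stdin.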

Zhang Xianyi

Dec 14, 2013, 8:36:04 PM
to Yuan Xu, Carter Schonwald, openbla...@googlegroups.com
What do you mean by enable/disable MPI?
You can run `make USE_THREAD=0` to build a single-threaded library.

At runtime, you can call `void openblas_set_num_threads(int num_threads);`.
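
The build-time side of this can be sketched as a few make invocations. `USE_THREAD` comes from the reply above; `NUM_THREADS` is, as far as I know, the make variable in OpenBLAS's Makefile.rule that sets the maximum thread count, so check that file in your checkout before relying on it. The helper `openblas_make_cmd` and the variant names are hypothetical, and the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Print (do not run) the make command line for a named build variant.
openblas_make_cmd() {
    case "$1" in
        # Plain build: detects the CPU and builds a multi-threaded library.
        default)  echo "make" ;;
        # Single-threaded library, as suggested in the reply above.
        single)   echo "make USE_THREAD=0" ;;
        # Threaded library with the maximum thread count set explicitly.
        threads4) echo "make USE_THREAD=1 NUM_THREADS=4" ;;
        *)        echo "unknown variant: $1" >&2; return 1 ;;
    esac
}

for v in default single threads4; do
    printf '%-9s -> %s\n' "$v" "$(openblas_make_cmd "$v")"
done
```

Run the printed command from the top of the OpenBLAS source tree; a `make clean` between variants avoids mixing object files from different configurations.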




Yuan Xu

Dec 15, 2013, 3:53:39 PM
to Zhang Xianyi, openbla...@googlegroups.com
When the number of threads is changed from 1 to 4, the performance should be better on an i7 CPU, right?
It wasn't in my test, but the precompiled Debian version gets better as expected.
Now my question is: does building with just the 'make' command (without any optional flags) produce an optimized binary?

Thanks!

Best Regards,

Xu, Yuan

Zhang Xianyi

Dec 19, 2013, 2:27:18 AM
to Yuan Xu, openbla...@googlegroups.com



2013/12/16 Yuan Xu <xuyu...@gmail.com>

> When the number of threads is changed from 1 to 4, the performance should be better on an i7 CPU, right?
> It wasn't in my test, but the precompiled Debian version gets better as expected.
> Now my question is: does building with just the 'make' command (without any optional flags) produce an optimized binary?

Yes, by default 'make' produces an optimized build.