How to compile Julia with NVBLAS instead of libopenblas


John Smith

Nov 12, 2014, 4:35:20 PM
to julia...@googlegroups.com


Hello,

Does anybody out there know how to compile Julia with NVBLAS, the cuBLAS-based drop-in replacement for BLAS? cuBLAS is not open source, but it has been freely available since CUDA 6.0.
According to my preliminary tests, multiplying random Float32 matrices (size 8K x 8K) is roughly 6x faster on a GTX 760 than on a quad-core i7, so this seems to be quite a gain.
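
For reference, a rough sketch of the timing I mean, in Julia itself (sizes and numbers will of course depend on your hardware):

n = 8192
A = rand(Float32, n, n); B = rand(Float32, n, n)
A*B           # warm up; the first call includes compilation
@time A*B     # roughly 6x faster with the GPU BLAS in my tests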

More specifically, two questions:

1. With octave, one can simply switch BLAS versions, e.g.

"LD_PRELOAD=/usr/local/cuda-6.5/lib64/libnvblas.so octave" 

or

"LD_PRELOAD=/usr/lib/openblas-base/libopenblas.so.0 octave"

However, with Julia this does not work. Why?
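
For concreteness, the analogous invocation, which starts Julia fine but leaves the matrix products on the CPU:

"LD_PRELOAD=/usr/local/cuda-6.5/lib64/libnvblas.so julia"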

2. When compiling the Julia source from GitHub, what do I have to change in the Makefile and Make.inc in order to replace Julia's default libopenblas.so.0 with libnvblas.so?

Please note that the speed gain is dramatic only for Float32, but this is still quite important, as my codes run much faster with NVBLAS.

I look forward to your suggestions. Thank you for your time.

John Smith

cdm

Nov 12, 2014, 4:55:01 PM
to julia...@googlegroups.com

this may be helpful ...



i have not tried adding this package and have no experience with it.

good luck,

cdm

Elliot Saba

Nov 12, 2014, 5:47:04 PM
to julia...@googlegroups.com
When compiling your Julia, you need to set the following make variables:

LIBBLAS=-lnvblas
LIBLAPACK=-lnvblas
USE_SYSTEM_BLAS=1
USE_SYSTEM_LAPACK=1

I'm assuming that libnvblas provides LAPACK as well. If it doesn't, you may run into issues, because the LAPACK library needs access to BLAS functionality itself.
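
For example, these can go in a Make.user file at the top of the source tree; the CUDA path below is just a guess based on your earlier message:

USE_SYSTEM_BLAS=1
USE_SYSTEM_LAPACK=1
LIBBLAS=-L/usr/local/cuda-6.5/lib64 -lnvblas
LIBLAPACK=-L/usr/local/cuda-6.5/lib64 -lnvblas
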
-E

John Smith

Nov 13, 2014, 3:46:38 AM
to julia...@googlegroups.com
Thanks for your input. Not much luck with my first attempts at CUBLAS.jl (some errors), but this is exactly what I am looking for: basically the equivalent of Python's cudamat, i.e. the ability to multiply two matrices on the GPU and send the result back to the host.

Regarding recompilation, I have tried linking explicitly to the available /usr/local/cuda-6.5/lib64/libnvblas.so in Make.inc:

LIBBLAS = -L/usr/local/cuda-6.5/lib64 -lnvblas
LIBBLASNAME = libnvblas

The configure check for BLAS passes, but the one for LAPACK fails:

checking for sgemm_ in -L/usr/local/cuda-6.5/lib64 -lnvblas... yes
checking for cheev_ in -L/usr/local/cuda-6.5/lib64 -lnvblas... no
...
make: *** [release] Error 2

It is a shame that it is so hard to compile Julia with BLAS but without LAPACK, or with the BLAS part swapped for cuBLAS while falling back to the CPU LAPACK. Julia has a lot of other goodies, and my applications only need matrix products, yet the build gets stuck on cheev_, an eigenvalue routine I do not need at all. This seems to be exactly the issue Elliot Saba anticipated.

BTW, in Octave it is easy to switch the BLAS to libnvblas.so while leaving the LAPACK part to libopenblas.so (see the first part of my post; I forgot to mention that one must also place an nvblas.conf file, and there are more details at http://www.tuicool.com/articles/mQb6bu, which seems quite useful). So the "old wheel" Octave is in a way much closer to "hybrid computing", and I wish Julia had the same functionality: there would be no need for any external libraries or extra syntax to multiply two matrices on the GPU, as in CUBLAS.jl (if you get it working). In Octave you run the same code, which saves a lot of time and debugging.
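
For reference, a minimal nvblas.conf along the lines of the NVIDIA docs (the CPU BLAS path is specific to my machine):

NVBLAS_CPU_BLAS_LIB /usr/lib/openblas-base/libopenblas.so.0
NVBLAS_GPU_LIST ALL
NVBLAS_LOGFILE nvblas.log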

Nick Henderson

Nov 13, 2014, 4:53:02 PM
to julia...@googlegroups.com
Hi John, I've been meaning to get back to work on CUBLAS.jl.  I have not found the time since the academic quarter started.  The CUBLAS.jl code requires my fork of CUDArt.jl:


Give that a try.  I have not looked at it for some weeks now.

Cheers,
Nick

Viral Shah

Nov 13, 2014, 10:00:40 PM
to julia...@googlegroups.com
I think USE_SYSTEM_LAPACK should be 0. That way Julia will use the system-provided BLAS, which would be the CUDA one here, and build LAPACK from source on top of it. This is meant to work, and if it does not, please file an issue.
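
In other words, something like this in Make.user (an untested sketch; the CUDA path is taken from the earlier messages):

USE_SYSTEM_BLAS=1
USE_SYSTEM_LAPACK=0
LIBBLAS=-L/usr/local/cuda-6.5/lib64 -lnvblas
LIBBLASNAME=libnvblas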

-viral