RFE: Make BLAS/LAPACK implementation a runtime choice.

Volker Braun

unread,
Dec 23, 2012, 7:23:16 AM12/23/12
to sage-...@googlegroups.com
The handling of BLAS/LAPACK libraries sucks: Sage basically hardcodes ATLAS or OS X Accelerate into the build process. Matrix operations are one place where using ISA extensions and a careful choice of blocking to match the processor cache really matters, but we have no mechanism to use a good implementation in binary builds.

My proposal is to make the BLAS library configurable at runtime, with defaults chosen by an automatic benchmark run. This is similar to what GMP/MPIR does, but at a higher level. Instead of writing our own "FAT" BLAS, we use a wrapper library that resolves the BLAS library at runtime (using dlopen/dlsym) to the reference/netlib BLAS, ATLAS, OpenBLAS, Intel MKL, or any other implementation. The backend can be selected with the LINEAR_ALGEBRA_CONFIG environment variable.

Ultimately, there will be a Python program to build different BLAS/LAPACK libraries and install them either in /usr/lib/linear_algebra/NAME/ (or /usr/lib64 or any compile-time prefix) or $(HOME)/.local/lib/linear_algebra/NAME/ where NAME is a combination of implementation and compile options used. The LINEAR_ALGEBRA_CONFIG environment variable will then be set to NAME.

As a demo, I wrote the wrapper library linalg_cblas.so, see


which wraps two CBLAS functions. For now, LINEAR_ALGEBRA_CONFIG is used as the name (possibly with full path) of the CBLAS library, though this will be changed. The example program


is just a call to these two CBLAS functions and is linked with -llinalg_cblas -ldl. Then the CBLAS library can be selected at runtime (here switched between Sage GSL CBLAS, the Fedora GSL CBLAS, and ATLAS):

[vbraun@volker-desktop linear_algebra]$ LINEAR_ALGEBRA_CONFIG=~/Sage/sage/local/lib/libgslcblas.so ./test/startup 
Address of cblas_zdotc_sub: d817a0b0
Address of cblas_dgemm: d815ffb0
destructor
[vbraun@volker-desktop linear_algebra]$ LINEAR_ALGEBRA_CONFIG=libgslcblas.so.0 ./test/startup 
Address of cblas_zdotc_sub: c30260f0
Address of cblas_dgemm: c300c000
destructor
[vbraun@volker-desktop linear_algebra]$ LINEAR_ALGEBRA_CONFIG=libcblas.so ./test/startup 
Address of cblas_zdotc_sub: c58159e0
Address of cblas_dgemm: c580c420
destructor
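
Roughly, the wrapper boils down to something like this (a simplified sketch, not the actual linalg_cblas.so source: the CBLAS enum arguments are reduced to int, only one of the two functions is shown, and error handling is minimal):

/* linalg_cblas.c -- hypothetical sketch of a dlopen/dlsym wrapper.
 * Build: gcc -shared -fPIC -o liblinalg_cblas.so linalg_cblas.c -ldl */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

static void *backend = NULL;

/* Open the backend library named by LINEAR_ALGEBRA_CONFIG once. */
static void *get_backend(void)
{
    if (!backend) {
        const char *name = getenv("LINEAR_ALGEBRA_CONFIG");
        if (!name)
            name = "libblas.so.3";  /* assumed fallback */
        backend = dlopen(name, RTLD_NOW | RTLD_LOCAL);
        if (!backend) {
            fprintf(stderr, "cannot load BLAS backend: %s\n", dlerror());
            exit(1);
        }
    }
    return backend;
}

/* Wrapper for cblas_dgemm: resolve the real symbol on first use,
 * cache the function pointer, and forward the call. */
void cblas_dgemm(int order, int transa, int transb,
                 int m, int n, int k, double alpha,
                 const double *a, int lda, const double *b, int ldb,
                 double beta, double *c, int ldc)
{
    typedef void (*dgemm_t)(int, int, int, int, int, int, double,
                            const double *, int, const double *, int,
                            double, double *, int);
    static dgemm_t real_dgemm = NULL;
    if (!real_dgemm)
        real_dgemm = (dgemm_t) dlsym(get_backend(), "cblas_dgemm");
    real_dgemm(order, transa, transb, m, n, k,
               alpha, a, lda, b, ldb, beta, c, ldc);
}

Since the function pointer is cached after the first call, the steady-state overhead is one extra indirect call per BLAS invocation.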

The computational cost is, of course, an extra stack frame whenever you call a BLAS/LAPACK function. But even that overhead is smaller than any operation you can do in Python.

In the Sage build process we would then build the wrapper library and the reference BLAS by default. The wrapper will try to pick up system BLAS/LAPACK implementations if they are available. Installing ATLAS or OpenBLAS would then be optional. Right now we compile ATLAS in parallel with everything else, which makes little sense if you want to get accurate timings. Also, the ATLAS library interface changed in ATLAS 3.10, so we'll have to modify everything sooner or later. Finally, ATLAS 3.10 doesn't work on Itanium any more and there is not much interest in making it work there, so if we ever want to get AVX support, say, on modern Intel chips, we need to fall back to a dumb implementation on Itanium (or face the fact that Itanium is dead).

Julien Puydt

unread,
Dec 23, 2012, 7:43:17 AM12/23/12
to sage-...@googlegroups.com
On 23/12/2012 13:23, Volker Braun wrote:
> The handling of BLAS/LAPACK libraries sucks: Sage basically hardcodes
> ATLAS or OS X Accelerate into the build process. Matrix operations are
> one place where using ISA extensions and a careful choice of blocking to
> match the processor cache really matters, but we have no mechanism to
> use a good implementation in binary builds.
>
> My proposal is to make the BLAS library configurable at runtime, with
> defaults chosen by an automatic benchmark run. This is similar to what
> GMP/MPIR does, but at a higher level. Instead of writing our own "FAT"
> BLAS, we use a wrapper library that resolves the BLAS library at runtime
> (using dlopen/dlsym) to the reference/netlib BLAS, ATLAS, OpenBLAS,
> Intel MKL, or any other implementation. The backend can be selected with
> the LINEAR_ALGEBRA_CONFIG environment variable.


Why is a wrapper needed? The same code should build equally well against one
implementation as against another... and it can even run against a different
implementation than the one it was built with, or a different one on each run, in fact.

For example, in Debian there is a /usr/lib/libblas.so... which is a
symlink to the real BLAS (I'm simplifying a little here -- there is an
'alternatives' mechanism which is more involved). Then all binary
packages which need a BLAS just depend on this pseudo-BLAS. You can pull
the rug out from under their feet however and whenever you want,
without even needing a recompilation.

Snark on #sagemath

Volker Braun

unread,
Dec 23, 2012, 8:18:10 AM12/23/12
to sage-...@googlegroups.com
You can pull the rug out from under your feet if you have a dumb BLAS library that doesn't depend on anything. It gets a bit more tricky if you have library versions and dependencies (e.g. the ATLAS cblas depends on pthreads and libatlas, which you nowadays have to specify on the linker command line).

Julien Puydt

unread,
Dec 23, 2012, 9:56:01 AM12/23/12
to sage-...@googlegroups.com
On 23/12/2012 14:18, Volker Braun wrote:
> You can pull the rug out from under your feet if you have a dumb BLAS
> library that doesn't depend on anything. It gets a bit more tricky if
> you have library versions and dependencies (e.g. the ATLAS cblas depends
> on pthreads and libatlas, which you nowadays have to specify on the
> linker command line).

At compile time, the linker indeed has to know that the particular
libblas library you use has such-and-such dependencies, but it will only
record the dependency on libblas itself. The runtime linker, when loading
your executable, will link to the now-available libblas, see that it
depends on something else, and link to that something else too. And that
means that if you change the implementation (and with it the dependencies
of libblas), your executable still works the same.

To demonstrate that it works as I describe, you'll find attached a trivial
test. Run "sh autogen.sh", then make, then go have a look in .libs/
using ldd: there you'll find that the libvisible library depends on a
libhidden library, but that the test program only links to libvisible.

This is why I think wrappers are not needed,

Snark on #sagemath
test-0.0.0.tar.bz2

Volker Braun

unread,
Dec 23, 2012, 10:26:45 AM12/23/12
to sage-...@googlegroups.com
I know, of course, that the runtime linker can resolve the symbols even if you didn't specify them at compile time; otherwise dlopen()/dlsym() would never work. But in practice that means you have to switch your libblas.so symlink all the time: whenever you compile something you need the dumb BLAS, and when you run Sage you need the fast BLAS. What if you run cython() in Sage to access BLAS from C? How is that going to work if you have multiple Sage processes? What if Sage is installed system-wide and $SAGE_LOCAL/lib is not user-writeable?



Julien Puydt

unread,
Dec 23, 2012, 10:41:10 AM12/23/12
to sage-...@googlegroups.com
On 23/12/2012 16:26, Volker Braun wrote:
My point is that as long as BLAS is a common API, you don't need to do
anything special to make it easy to change from one implementation to another.

Nothing forces you to change the BLAS from compile-time to run-time; I
really don't understand your objection: it's about being able to do
something, not being forced to do it.

What justifies adding a wrapper in the code when the linker is basically
already one?

Snark on #sagemath

Volker Braun

unread,
Dec 23, 2012, 11:08:52 AM12/23/12
to sage-...@googlegroups.com
On Sunday, December 23, 2012 3:41:10 PM UTC, Snark wrote:
Nothing forces you to change the BLAS from compile-time to run-time

But we want to:

A) compile and link against BLAS (from 3rd-party packages and from our own Python extension classes), and
B) run Sage with a fast BLAS, which will require extra linker arguments if you link against it at compile time.

How are our setup.py and 3rd-party spkgs going to know which libraries to link against (at compile time) in order to get BLAS?

One could have a script that spits out the linker options for the currently-active BLAS. Of course, everything will then, for example, pick up dependencies on pthreads and libatlas if ATLAS was the active BLAS back when it was compiled, and break if you later decide to delete the ATLAS install.

Also, changing the BLAS that way still requires write access to SAGE_LOCAL/lib since the linker has (or should have) hardcoded the RPATH in the binary. One could LD_PRELOAD the chosen BLAS, but I think we should try to get away from LD_PRELOAD/LD_LIBRARY_PATH hacks and not cement them in.

William Stein

unread,
Dec 23, 2012, 11:56:15 AM12/23/12
to sage-...@googlegroups.com
On Sun, Dec 23, 2012 at 4:23 AM, Volker Braun <vbrau...@gmail.com> wrote:
> The handling of BLAS/LAPACK libraries sucks: Sage basically hardcodes ATLAS or OS X Accelerate into the build process. Matrix operations are one place where using ISA extensions and a careful choice of blocking to match the processor cache really matters, but we have no mechanism to use a good implementation in binary builds.
>
> My proposal is to make the BLAS library configurable at runtime, with defaults chosen by an automatic benchmark run. This is similar to what GMP/MPIR does, but at a higher level. Instead of writing our own "FAT" BLAS, we use a wrapper library that resolves the BLAS library at runtime (using dlopen/dlsym) to the reference/netlib BLAS, ATLAS, OpenBLAS, Intel MKL, or any other implementation.


I was at a conference a week ago with a bunch of applied mathematicians who are extremely experienced in high-performance *numerical* computing (at least compared to me). One strongly suggested we simply dump ATLAS for OpenBLAS (https://github.com/xianyi/OpenBLAS). OpenBLAS does not support OS X, but that might be OK, since I don't think we build ATLAS on OS X anyway.

I'm not for or against the above proposal (yet), but I'm putting it out there. The argument was that OpenBLAS has significant and broad developer momentum around it, whereas ATLAS is almost all Clint Whaley's project, and OpenBLAS is much easier to build. I've not verified any of these claims. (Both projects are, I think, BSD-licensed.)

William
 

--
William Stein
Professor of Mathematics
University of Washington
http://wstein.org

Julien Puydt

unread,
Dec 23, 2012, 12:13:56 PM12/23/12
to sage-...@googlegroups.com
On 23/12/2012 17:56, William Stein wrote:
> I was at a conference a week ago with a bunch of applied mathematicians
> who are extremely experienced in high-performance *numerical* computing
> (at least compared to me). One strongly suggested we simply dump ATLAS
> for OpenBLAS (https://github.com/xianyi/OpenBLAS). OpenBLAS does
> not support OS X, but that might be OK, since I don't think we build
> ATLAS on OS X anyway.
>
> I'm not for or against the above proposal (yet), but I'm putting it out
> there. The argument was that OpenBLAS has significant and broad
> developer momentum around it, whereas ATLAS is almost all Clint Whaley's
> project, and OpenBLAS is much easier to build. I've not verified any of
> these claims. (Both projects are, I think, BSD-licensed.)

I was all for throwing ATLAS out a few days ago; I have since been in touch
with Clint Whaley (reporting my issues with #10508 [1] upstream), and he
has been extremely responsive and efficient. That has definitely softened
my point of view on the matter.

In any case, the question of ATLAS vs OpenBLAS can only be asked once
Sage is BLAS-implementation-agnostic; and that is the point of
this thread, if I am not mistaken.

Snark on #sagemath

PS:
[1]
https://sourceforge.net/tracker/?func=detail&aid=3598167&group_id=23725&atid=379482

William Stein

unread,
Dec 23, 2012, 12:36:04 PM12/23/12
to sage-...@googlegroups.com


On Sun, Dec 23, 2012 at 9:13 AM, Julien Puydt <julien...@laposte.net> wrote:


> I was all for throwing ATLAS out a few days ago; I have since been in touch with Clint Whaley (reporting my issues with #10508 [1] upstream), and he has been extremely responsive and efficient. That has definitely softened my point of view on the matter.
>
> In any case, the question of ATLAS vs OpenBLAS can only be asked once Sage is BLAS-implementation-agnostic;

Why? That's like saying the question of git vs Mercurial can only be asked when Sage is revision-control-system-agnostic.


> and that is the point of this thread, if I am not mistaken.
>
> Snark on #sagemath

> PS:
> [1] https://sourceforge.net/tracker/?func=detail&aid=3598167&group_id=23725&atid=379482

Francois Bissey

unread,
Dec 23, 2012, 12:43:32 PM12/23/12
to <sage-devel@googlegroups.com>
Why do I have OpenBLAS installed on my (Intel) Mac, then? Recent OpenBLAS requires clang, though; I haven't managed to compile it with gcc. Overall I like OpenBLAS; I even put it on a BlueGene/P.

Francois


Julien Puydt

unread,
Dec 23, 2012, 1:02:40 PM12/23/12
to sage-...@googlegroups.com
On 23/12/2012 18:36, William Stein wrote:
> On Sun, Dec 23, 2012 at 9:13 AM, Julien Puydt <julien...@laposte.net
> <mailto:julien...@laposte.net>> wrote:
> In any case, the question of ATLAS vs OpenBLAS can only be asked
> once Sage is BLAS-implementation-agnostic;
>
> Why? That's like saying the question of git vs Mercurial can only be
> asked when Sage is revision-control-system-agnostic.

My claim is that to replace ATLAS with OpenBLAS, you'll need to modify
the code in a way which will make it easy to go back and forth.
Snark on #sagemath

Volker Braun

unread,
Dec 23, 2012, 1:26:21 PM12/23/12
to sage-...@googlegroups.com
I don't want to *only* support OpenBLAS and hardcode that into the build scripts instead of ATLAS. For one, it doesn't support ARM (= not at all) or AMD Bulldozer (= probably not that fast on the newest AMD chips). There are also bugs, so I would rather implement a more flexible setup than jump ship completely and then find that it crashes & burns on some ancient Itanium box that happens to be on the "Sage supported platforms" list.

For the record, the supported targets are listed in https://github.com/xianyi/OpenBLAS/blob/develop/TargetList.txt. OpenBLAS can, in theory, be compiled on OS X; I'm not sure how it compares to Apple's own sauce.

Having a wrapper library would also give us much more control to deal with bugs in BLAS implementations: e.g. one could use ATLAS but override a buggy function with the reference BLAS. The downside is, of course, added complexity. It is also technically slower, but I don't think in a measurable way. The most performance-critical people will always go for static linking and non-PIC binaries, but that's not really an option for a general-purpose system.
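
Schematically, such an override could live in the wrapper's symbol resolution; a sketch (the handles would be opened by the wrapper as in the earlier sketch, and the buggy function named here is purely hypothetical):

#include <dlfcn.h>
#include <string.h>

/* Handles opened elsewhere by the wrapper. */
extern void *backend_handle;    /* e.g. ATLAS */
extern void *reference_handle;  /* reference/netlib BLAS */

/* Hypothetical override list: functions known to be buggy in the
 * selected backend are routed to the reference implementation.
 * This could be read from a per-backend configuration file. */
static const char *overrides[] = { "cblas_ztrsm", NULL };

static void *resolve(const char *symbol)
{
    for (const char **o = overrides; *o; o++)
        if (strcmp(*o, symbol) == 0)
            return dlsym(reference_handle, symbol);
    return dlsym(backend_handle, symbol);
}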

Volker Braun

unread,
Dec 23, 2012, 1:37:09 PM12/23/12
to sage-...@googlegroups.com
On a related note, does anybody know MAGMA (not the number theory system, silly)? The BLAS implementation, aka Matrix Algebra on GPU and Multicore Architectures: http://icl.cs.utk.edu/magma/

Julien Puydt

unread,
Dec 24, 2012, 5:02:30 AM12/24/12
to sage-...@googlegroups.com
On 23/12/2012 18:13, Julien Puydt wrote:
> I was all for throwing ATLAS out a few days ago; I have since been in touch
> with Clint Whaley (reporting my issues with #10508 [1] upstream), and he
> has been extremely responsive and efficient. That has definitely softened
> my point of view on the matter.

I just compiled the hot-from-the-stove development release 3.11.3, and it
built in about two hours on my ARM box. I'm now eagerly waiting for a
stable version containing the relevant changes.

I still think that, from a purely architectural point of view, it would be
better to have BLAS-agnosticism in Sage, but ATLAS has gone way up in my
esteem!

Snark on #sagemath

Jason Grout

unread,
Dec 25, 2012, 2:38:48 AM12/25/12
to sage-...@googlegroups.com
On 12/23/12 11:37 AM, Volker Braun wrote:
> On a related note, does anybody know MAGMA (not the number theory
> system, silly)? The BLAS implementation, aka Matrix Algebra on GPU and
> Multicore Architectures: http://icl.cs.utk.edu/magma/
>

I've been keeping an eye on MAGMA and PLASMA.

Also, the other day, Ralf Gommers announced on the scipy list [1] that
Intel was offering free MKL licenses for building/testing scientific
python software (specifically, stuff listed at
http://numfocus.org/projects-2/projects/). In a later message [2], he
said Intel would be pleased if they offered binaries for download that
were built against MKL. I assume this means the end-user would still
have to have an MKL license; if so, it doesn't completely resolve the
ATLAS/OpenBLAS issue in this thread. However, as long as we are talking
about switching BLAS implementations, it might be interesting to also
look at what it would take to build against MKL. If there is interest,
we might try asking Intel about the same sort of deal for distributing
Sage binaries built against MKL.

Thanks,

Jason


[1]
http://thread.gmane.org/gmane.comp.python.numeric.general/52372/focus=52455

[2]
http://thread.gmane.org/gmane.comp.python.numeric.general/52372/focus=52455



Volker Braun

unread,
Dec 25, 2012, 5:43:50 AM12/25/12
to sage-...@googlegroups.com
The commercial version of MKL lets you redistribute the library. There isn't much else to it, really, but I guess that means you can't redistribute the headers and documentation. I guess Intel is not much concerned about possible piracy by research institutes. In fact, I don't understand why Intel isn't giving MKL away for free; they are in the hardware business, after all. Probably not enough competition from AMD.

I'm not quite clear on whether that binary distribution would be compatible with the GPL. I take it it's a "borderline case". See http://www.gnu.org/licenses/gpl-faq.html#GPLAndPlugins:

----------------------------------------------------
If a program released under the GPL uses plug-ins, what are the requirements for the licenses of a plug-in? (#GPLAndPlugins)
It depends on how the program invokes its plug-ins. If the program uses fork and exec to invoke plug-ins, then the plug-ins are separate programs, so the license for the main program makes no requirements for them.

If the program dynamically links plug-ins, and they make function calls to each other and share data structures, we believe they form a single program, which must be treated as an extension of both the main program and the plug-ins. This means the plug-ins must be released under the GPL or a GPL-compatible free software license, and that the terms of the GPL must be followed when those plug-ins are distributed.

If the program dynamically links plug-ins, but the communication between them is limited to invoking the ‘main’ function of the plug-in with some options and waiting for it to return, that is a borderline case.
----------------------------------------------------

In any case, it would be nice to have access to MKL, even if only to make sure that you can link to it in the privacy of your own home.