ENV["VAR"] = "value"
Assuming that OpenBLAS checks that at run-time and not at startup time, that should do the trick. Currently we don't have a threading model — we only have a distributed multi-process model. So this wouldn't be an issue yet, but it's definitely something to keep in mind when the time comes to do some sort of threading. It's entirely possible that the kind of threading that MKL does is the only kind we'll ever support — i.e. automatic parallelization of conceptually sequential but vectorized code.
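Going back to the environment-variable route, a concrete version of the snippet above would look like this at the julia prompt, assuming OPENBLAS_NUM_THREADS is the variable OpenBLAS consults and that it gets re-read when a computation starts rather than only at library load:

    ENV["OPENBLAS_NUM_THREADS"] = "1"   # limit OpenBLAS to one thread

    A = rand(1000, 1000)
    B = A * A   # gemm call; single-threaded if the setting took effect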
Julia itself is not multi-threaded yet, although it can easily call multi-threaded libraries such as OpenBLAS and others. But the moment you turn on multi-threading in all the libraries, you can easily run into the same problem in Julia. This could be avoided by using Intel's Threading Building Blocks or Apple's Blocks, but then those have to be used throughout your code for parallelism, and you can easily end up with problems if you mix multiple parallelism models.
It is very difficult for a generic thread scheduler such as the one in the OS to figure out the right thing to do. Julia has Tasks, which give us concurrency, and a process-based parallelism model with communication. On multi-core machines, we can optimize the IPC to get much of the benefit of multi-threaded code. In theory, with a combination of Tasks and process parallelism, we can get all the latency hiding and speedups you need.
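As a rough sketch of what that combination looks like (written with the current Distributed API names, so treat the exact spellings as an assumption relative to this thread): the worker processes do the parallel work, and a Task hides the latency of waiting for their results.

    using Distributed
    addprocs(2)                                  # two local worker processes
    @everywhere heavy(x) = sum(sin, x)           # stand-in for real work

    futures = [remotecall(heavy, w, rand(10^6)) for w in workers()]

    t = @async sum(fetch, futures)               # Task overlaps the wait with other local work
    # ... do other things here ...
    println(fetch(t))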
My approach has been to keep it simple and use the single-threaded versions of all libraries, leaving the parallelism to be figured out by Julia. Obviously, this has some way to go, and some people would benefit from multi-threading. For those cases, we can provide some switches to allow multi-threaded libraries (OpenBLAS and FFTW to start with), but with some warnings.
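As a sketch of what such switches could look like, using the names that present-day Julia and the FFTW.jl package expose (so the exact spellings are an assumption in the context of this thread):

    using LinearAlgebra, FFTW

    BLAS.set_num_threads(4)    # let OpenBLAS use 4 threads
    FFTW.set_num_threads(4)    # applies to FFTW plans created after this call

    # and back down when mixing with julia-level process parallelism
    BLAS.set_num_threads(1)
    FFTW.set_num_threads(1)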
-viral
> This is a very common problem that manifests itself in every
> parallel computing code that tries to integrate more than one
> library. In most MPI codes that use pthreads and a couple of
> different libraries, this is a common issue. It arises from the
> fact that, in this case, one is trying to use two different
> models of parallelism which are not aware of each other.
It's actually worse, because even if everyone uses the same model of
parallelism, few libraries are written to take into account that some
other library would also like to be parallel. There is still a lot of
work to be done before we can compose parallel libraries freely.
> My approach has been to keep it simple and use the single-threaded
> versions of all libraries, leaving the parallelism to be figured
> out by Julia. Obviously, this has some way to go, and some people
> would benefit from multi-threading. For those cases, we can
> provide some switches to allow multi-threaded libraries (OpenBLAS
> and FFTW to start with), but with some warnings.
A solution that has worked well in my experience is to enable
parallelism for specific libraries using both an environment variable
and an optional function argument when calling a function of that
library, with the function argument taking precedence. The environment
variable provides a very simple solution that works for many
applications and doesn't require touching any source code. The
function argument is for fine-tuning in more complex situations.
Both mechanisms should allow setting the number of cores to use for
any given library.
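A minimal sketch of that pattern in Julia; the MYLIB_NUM_THREADS
variable and the compute/do_work functions below are made up for
illustration, and the keyword argument overrides the environment
variable:

    do_work(x, nthreads) = (println("using $nthreads thread(s)"); sum(x))  # placeholder for the real library call

    function compute(x; nthreads::Union{Int,Nothing}=nothing)
        n = nthreads !== nothing ? nthreads :
            parse(Int, get(ENV, "MYLIB_NUM_THREADS", "1"))
        return do_work(x, n)
    end

    compute(rand(100))               # falls back to MYLIB_NUM_THREADS (default 1)
    compute(rand(100), nthreads=8)   # explicit argument takes precedence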
Konrad.