Julia backslash performance vs MATLAB backslash


Mingming Wang

May 30, 2013, 11:10:59 PM
to julia...@googlegroups.com
Hi, 

I am trying to port my MATLAB program to Julia. The for loop is about 25% faster in Julia, but the backslash is about 10 times slower. It seems that in MATLAB the backslash is parallelized automatically. Is there any plan in Julia to do this? BTW, the matrix I am solving is sparse and symmetric.

Thanks

Viral Shah

May 31, 2013, 2:11:25 AM
to julia...@googlegroups.com
How many cores do you have? Are you using the precompiled Mac binaries for 0.2pre? Those have multi-threaded BLAS disabled, as it was crashing.

I believe that the sparse solver is not parallelized, but the underlying BLAS calls are.

Can you provide a self-contained example and file an issue, so that we can make sure this gets addressed when the BLAS threading is enabled again?

-viral

Steven G. Johnson

Jun 1, 2013, 8:58:49 AM
to julia...@googlegroups.com


On Friday, May 31, 2013 2:11:25 AM UTC-4, Viral Shah wrote:
How many cores do you have? Are you using the precompiled Mac binaries for 0.2pre? Those have multi-threaded BLAS disabled, as it was crashing.

I believe that the sparse solver is not parallelized, but the underlying BLAS calls are.

Sparse solves make relatively little use of the BLAS.   (Even in a supernodal algorithm, they only use the BLAS for small blocks, too small to benefit from multiple cores.)

We need a parallelized sparse solver (e.g. via PETSc).

Mingming Wang

Jun 2, 2013, 11:48:15 PM
to julia...@googlegroups.com
I made some modifications; right now it's 4x slower.

I have put the test example here: https://github.com/blademwang11/Topopt

I am indeed using precompiled mac binaries for 0.2pre.

Viral Shah

Jun 3, 2013, 1:42:29 AM
to julia...@googlegroups.com
Tim, great to see you on the julia list!

Does SuiteSparse have a function that implements the polyalgorithm for \ ? If so, it would be simplest for us to use that rather than developing a polyalgorithm in julia ourselves. 

-viral

On Sunday, June 2, 2013 4:05:54 PM UTC+5:30, Tim Davis wrote:
If you're seeing a 10x slowdown as compared to x=A\b in MATLAB, then it could be due to several reasons.  You might be using the wrong solver.  MATLAB's x=A\b is a metasolver that picks the solver based on the matrix properties.  Julia might be picking the wrong solver.  Or, you might not be using the multicore BLAS.  I do all my flops in the BLAS in x=A\b, except for extremely sparse matrices (for which I use a non-BLAS based solver).  With the BLAS, x=A\b can reach up to 50% of the theoretical peak of  a multicore machine, depending on the problem (25% is more typical).

If you link to the standard Fortran BLAS, you will see a 10x slowdown, typically, as compared to x=A\b in MATLAB.

Toss me the matrix and I'll give it a try.  You can upload it to http://www.cise.ufl.edu/dropbox/www .

(I'm the author of most of the sparse solvers in x=A\b in MATLAB).

thanks,
Tim

Viral Shah

Jun 3, 2013, 1:47:49 AM
to julia...@googlegroups.com
Steve,

I agree that we should plug in the various parallel sparse direct and iterative solvers through PETSc. However, we should ensure that our BLAS and such are configured correctly as well. 

I am planning on adding routines for openblas to set the number of threads. Can we also have a function that can set the number of threads for FFTW?
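
(Both knobs did eventually become one-liners. With the modern names from the LinearAlgebra stdlib and the FFTW.jl package, shown here as a hedged aside rather than the 2013 API, the calls look like this:)

using LinearAlgebra, FFTW
BLAS.set_num_threads(4)   # OpenBLAS worker threads
FFTW.set_num_threads(4)   # threads used when FFTW plans are created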

-viral

Tim Davis

Jun 3, 2013, 7:43:37 AM
to julia...@googlegroups.com
The closest thing I have is the "factorize" function in my MATLAB_Tools directory.  It's written in MATLAB.

You can see how MATLAB picks its solver with this option:

spparms ('spumoni', 3)
x=A\b ;

Then set spparms ('spumoni', 0) to make it quiet again.

Douglas Bates

Jun 4, 2013, 10:48:39 AM
to julia...@googlegroups.com
On Thursday, May 30, 2013 10:10:59 PM UTC-5, Mingming Wang wrote:
Hi, 

I am trying to port my MATLAB program to Julia. The for loop is about 25% faster in Julia, but the backslash is about 10 times slower. It seems that in MATLAB the backslash is parallelized automatically. Is there any plan in Julia to do this? BTW, the matrix I am solving is sparse and symmetric.

For a sparse symmetric matrix try

cholfact(A)\b

The simple

A\b

call will always use an LU decomposition from UMFPACK.
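
A minimal sketch of the idea, assuming a sparse symmetric positive definite A and a vector b as in the original post: the factorization is computed once and can be reused across right-hand sides.

F = cholfact(A)   # CHOLMOD sparse Cholesky; throws if A is not positive definite
x = F \ b         # two sparse triangular solves, cheap relative to the factorization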
 

Tim Davis

Jun 4, 2013, 10:58:27 AM
to julia...@googlegroups.com, Douglas Bates
That can explain the slowdown in A\b. If you have a symmetric positive definite matrix,
then even a fast LU can suffer far more fill-in than Cholesky, if it decides to pivot
off the diagonal because it's worried about the numerical values.

The 10x effect is likely not due to the BLAS or parallelization; it's most likely due to fill-in.

This is why spparms('spumoni',3) in MATLAB is helpful.

(p.s.  I'm guessing that an email reply to this thread will get to the julia group,
but let me know if that's not the case).

John Myles White

Jun 4, 2013, 11:34:44 AM
to julia...@googlegroups.com
Yes, the whole Julia group receives e-mails to the mailing list as replies to a specific thread.

 -- John

Viral Shah

Jun 4, 2013, 2:14:34 PM
to julia...@googlegroups.com

We should do the polyalgorithm that Tim pointed out. Perhaps best to have an issue for this.

-viral

Douglas Bates

Jun 4, 2013, 2:49:09 PM
to julia...@googlegroups.com
On Tuesday, June 4, 2013 1:14:34 PM UTC-5, Viral Shah wrote:

We should do the polyalgorithm that Tim pointed out. Perhaps best to have an issue for this.

Do you mean just to check if the matrix is Hermitian and, if so, attempt a sparse Cholesky? I suppose the check could be a bit more sophisticated, in that we now have an iterator for the diagonal elements and can check for non-positive diagonal elements before attempting the Cholesky. The bottom line is always that you need to perform the Cholesky factorization in order to find out whether it will succeed.

Tim or Steve Johnson:  Is it worthwhile doing any further checking on the structure of the matrix, perhaps checking the value of the quadratic form evaluated on randomly oriented unit vectors, before trying the Cholesky factorization or would that be a waste of time?
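
For concreteness, a minimal Julia sketch of the check-then-fall-back pattern under discussion (the name trychol is hypothetical, and real entries are assumed so the diagonal test is meaningful):

function trychol(A::SparseMatrixCSC, b)
    # cheap necessary conditions first: Hermitian structure, positive diagonal
    if ishermitian(A) && all(diag(A) .> 0)
        try
            return cholfact(A) \ b   # the factorization itself is the only real test
        catch
            # not positive definite after all; fall through to LU
        end
    end
    return lufact(A) \ b             # general fallback (UMFPACK)
end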

Mingming Wang

Jun 4, 2013, 4:08:55 PM
to julia...@googlegroups.com
In my case, cholfact(A)\b is 20% faster than A\b. The matrix A is a sparse, symmetric positive definite stiffness matrix generated by a linear quadrilateral finite element discretization.

Viral Shah

Jun 4, 2013, 11:11:48 PM
to julia...@googlegroups.com
Doug,

Ideally, backslash needs to look for diagonal matrices, triangular matrices and permutations thereof, banded matrices, and least squares problems (non-square). If the matrix is square and symmetric/Hermitian, with a heavy diagonal(?), Cholesky can be attempted, with a fallback to LU. I believe we do some of this in the dense \ polyalgorithm, but I am not sure we look for the banded cases yet.
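
A hedged outline of that dispatch order (qrfact, lufact, and ishermitian are real functions of the era; the other predicate and solver names are hypothetical placeholders):

function polysolve(A::SparseMatrixCSC, b)
    m, n = size(A)
    m == n || return qrfact(A) \ b               # non-square: least squares via sparse QR
    isdiagonal(A)    && return diagsolve(A, b)   # hypothetical O(n) diagonal scaling
    istriangular(A)  && return trisolve(A, b)    # hypothetical substitution solve
    isbandeddense(A) && return bandsolve(A, b)   # hypothetical banded solver
    # (permutation-of-triangular cases omitted for brevity)
    return trychol(A, b)   # Cholesky attempt with LU fallback, as sketched earlier
end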

This is what Octave does:

This is Tim's Factorize for solving linear and least squares systems:

-viral

Viral Shah

Jun 4, 2013, 11:31:21 PM
to julia...@googlegroups.com

Stefan Karpinski

Jun 5, 2013, 12:12:12 AM
to Julia Users
Goodness. This is why there needs to be a polyalgorithm – no mortal user could know all of this stuff!

Viral Shah

Jun 5, 2013, 2:39:15 AM
to julia...@googlegroups.com
I guess it is the last 20 years of sparse solver work packed into one function. Not many fields can boast of providing this level of usability out of their work. :-)

There is a class of people who believe that things like \ encourage black-box usage, with people doing things they do not understand, and there are others who believe in standing on the shoulders of giants.

I find that we have taken a good approach in Julia: we have \, and it will have the perfect polyalgorithm at some point. But you also have the option of digging deeper with interfaces such as lufact(), cholfact(), and qrfact(), and finally, if even that does not work out for you, calling the LAPACK and SuiteSparse functions directly.

-viral

Ehsan Eftekhari

Jan 5, 2015, 7:21:59 AM
to julia...@googlegroups.com
I'm solving a diffusion equation in Matlab on a 3D uniform grid (31x32x33) and in Julia. I use "\" to solve the linear system of equations. Here is the performance of the linear solver in Julia:
elapsed time: 2.743971424 seconds (35236720 bytes allocated)

and Matlab (I used spparms('spumoni',1) to see what "\" does in Matlab):
sp\: bandwidth = 1056+1+1056.
sp\: is A diagonal? no.
sp\: is band density (0.00) > bandden (0.50) to try banded solver? no.
sp\: is A triangular? no.
sp\: is A morally triangular? no.
sp\: is A a candidate for Cholesky (symmetric, real positive diagonal)? no.
sp\: use Unsymmetric MultiFrontal PACKage with automatic reordering.
sp\: UMFPACK's factorization was successful.
sp\: UMFPACK's solve was successful.
Elapsed time is 0.819120 seconds.

I have uploaded the sparse matrix (M) and the right-hand-side vector (RHS) in a .mat file here:
https://drive.google.com/open?id=0B8OOfC6oWXEPV2xYTWFMZTljU00&authuser=0

I read in the documentation that Julia uses UMFPACK for sparse matrices. My question is: why is UMFPACK faster when it is called from Matlab?

The matlab and julia codes are here:
https://drive.google.com/open?id=0B8OOfC6oWXEPbXFnYlh2TFBKV1k&authuser=0
https://drive.google.com/open?id=0B8OOfC6oWXEPdlNfOEFKbnV5MlE&authuser=0

and the FVM codes are here:
https://github.com/simulkade/FVTool
https://github.com/simulkade/JFVM

Thanks a lot in advance,

Ehsan

Tim Davis

Jan 5, 2015, 2:01:34 PM
to julia...@googlegroups.com
The difference could be the BLAS.  MATLAB comes with its own BLAS library, and the performance
of the BLAS has a huge impact on the performance of UMFPACK, particularly for 3D discretizations.

Viral Shah

Jan 5, 2015, 3:11:47 PM
to julia...@googlegroups.com, da...@tamu.edu
The BLAS will certainly make a difference, but OpenBLAS is reasonably good. 

I also wonder what is happening in our \ polyalgorithm. The profile suggests the code is trying a Cholesky decomposition, but it really shouldn't, since the matrix is not symmetric. If I just do lufact(), which essentially calls UMFPACK, I can match the Matlab timing:

@time F = lufact(d["M"]); F \ d["RHS"];
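
For completeness, a hedged sketch of the whole sequence (assumes the MAT.jl package and the variable names M and RHS from the uploaded file; the filename is a placeholder):

using MAT
d = matread("diffusion.mat")     # placeholder filename
@time F = lufact(d["M"])         # UMFPACK factorization: the expensive step
@time x = F \ vec(d["RHS"])      # the solve reuses the factors and is cheap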

-viral

Tim Davis

Jan 5, 2015, 3:27:23 PM
to julia...@googlegroups.com, Tim Davis
That does sound like a glitch in the "\" algorithm, rather than in UMFPACK.  The OpenBLAS is pretty good.

This is very nice in Julia:

F = lufact (d["M"]) ; F \ d["RHS"]

That's a great idea to have a factorization object like that.  I have a MATLAB toolbox that does
the same thing, but it's not a built-in function inside MATLAB.  It's written in M, so it can be slow for
small matrices.   With it, however, I can do:

F = factorize (A) ;    % does an LU, Cholesky, QR, SVD, or whatever.  Uses my polyalgorithm for "\".
x = F\b ;

I can do S = inverse(A); which returns a factorization, not an inverse, but with a flag
set so that S*b does A\b (and yes, S\b would do A*b, since S keeps a copy of A inside it, as well).

You can also specify the factorization, such as

F = factorize (A, 'lu') ;
F = factorize (A, 'svd') ;   % etc.

It's in SuiteSparse/MATLAB_tools/Factorize, if you're interested.  I've suggested the same
feature to The MathWorks.

My factorize function includes a backslash polyalgorithm, if you're interested in taking a look.

Algorithm 930: FACTORIZE: an object-oriented linear system solver for MATLAB. T. A. Davis, ACM Transactions on Mathematical Software, Vol. 39, Issue 4, pp. 28:1-28:18, 2013.

Viral Shah

Jan 5, 2015, 3:44:05 PM
to julia...@googlegroups.com, Tim Davis
Tim - thanks for the reference. The paper will come in handy. This is a longstanding issue that we just haven’t got around to addressing yet, but perhaps now is a good time.

https://github.com/JuliaLang/julia/issues/3295

We have a very simplistic factorize() for sparse matrices that must have been implemented as a stopgap. This is what it currently does and that explains everything.

# placing factorize here for now. Maybe add a new file
function factorize(A::SparseMatrixCSC)
    m, n = size(A)
    if m == n
        Ac = cholfact(A)
        Ac.c.minor == m && ishermitian(A) && return Ac
    end
    return lufact(A)
end
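
A minimal corrected sketch, for comparison (reordering the checks so the Hermitian test gates the Cholesky attempt; an illustration only, not the eventual Base fix):

function factorize(A::SparseMatrixCSC)
    m, n = size(A)
    if m == n && ishermitian(A)    # test symmetry before doing any numeric work
        try
            return cholfact(A)     # may still fail if A is not positive definite
        catch
        end
    end
    return lufact(A)
end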

-viral

Tim Davis

Jan 5, 2015, 3:52:24 PM
to Viral Shah, julia...@googlegroups.com, Tim Davis
Oops. Yes, your factorize function is broken. You might try mine instead, from my Factorize package.

I have a symmetry checker in CHOLMOD. It checks whether the matrix is symmetric with positive diagonal entries. I think I have a MATLAB interface for it too. The code is efficient, since it doesn't form A transpose, and it quits early, as soon as asymmetry is detected.

It does rely on the fact that MATLAB requires its sparse matrices to have sorted row indices
in each column, however.
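
A hedged Julia sketch of the same idea on a CSC matrix (the function name is hypothetical; like the CHOLMOD code, it relies on sorted row indices, never forms the transpose, quits at the first asymmetry, and assumes real entries):

function issym_posdiag(A::SparseMatrixCSC)
    m, n = size(A)
    m == n || return false
    colptr, rowval, nzval = A.colptr, A.rowval, A.nzval
    for j = 1:n
        founddiag = false
        for p = colptr[j]:(colptr[j+1] - 1)
            i = rowval[p]
            if i == j
                nzval[p] > 0 || return false   # diagonal entry must be positive
                founddiag = true
            else
                # binary-search for A[j,i] among column i's sorted row indices
                lo, hi = colptr[i], colptr[i+1] - 1
                k = searchsortedfirst(rowval, j, lo, hi, Base.Order.Forward)
                (k > hi || rowval[k] != j) && return false   # pattern asymmetry
                nzval[k] == nzval[p] || return false         # value asymmetry
            end
        end
        founddiag || return false   # a structurally zero diagonal fails the check
    end
    return true
end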

Viral Shah

Jan 5, 2015, 3:56:12 PM
to Tim Davis, julia...@googlegroups.com
Thanks, that is great. I was wondering about the symmetry checker - we currently have a naive one, but I can just use the CHOLMOD one now.

-viral

Ehsan Eftekhari

Jan 5, 2015, 4:09:33 PM
to julia...@googlegroups.com, da...@tamu.edu
Following your advice, I tried the code again; this time I also used the MUMPS solver from https://github.com/lruthotto/MUMPS.jl
I used a 42x43x44 grid. These are the results:

MUMPS: elapsed time: 2.09091471 seconds
lufact: elapsed time: 5.01038297 seconds (9952832 bytes allocated)
backslash: elapsed time: 16.604061696 seconds (80189136 bytes allocated, 0.45% gc time)

and in Matlab:
Elapsed time is 5.423656 seconds.

Thanks a lot Tim and Viral for your quick and helpful comments.

Kind regards,
Ehsan

Stefan Karpinski

Jan 5, 2015, 6:09:46 PM
to Julia Users, da...@tamu.edu
A word of legal caution: Tim, I believe some (all?) of your SuiteSparse code is GPL, and since Julia is MIT (although not all of its libraries are), we can look at pseudocode but cannot copy GPL code while legally keeping the MIT license on Julia's standard library.

Also, thanks so much for helping with this.

Kevin Squire

Jan 5, 2015, 9:01:45 PM
to julia...@googlegroups.com
Since Tim wrote the code (presumably?), couldn't he give permission to license it under MIT?  (Assuming he was okay with that, of course!).

Cheers,
   Kevin

Viral Shah

Jan 5, 2015, 9:44:38 PM
to julia...@googlegroups.com

I believe that it is the University of Florida that owns the copyright, and they would lose licencing revenue. I would love it too if we could have these under the MIT licence, but it may not be a realistic expectation.

Looking at the paper is the best way to go. Jiahao has already produced the pseudocode in the issue, and we do similar things in our dense \.

-viral

Viral Shah

Jan 5, 2015, 11:29:36 PM
to julia...@googlegroups.com
This is similar to the FFTW situation, where the license is held by MIT.

-viral

Tim Davis

Jan 6, 2015, 2:49:30 PM
to julia...@googlegroups.com
Most of my code in SuiteSparse is under my copyright, not the University of Florida's. (I'm now at Texas A&M, by the way...)

Most of SuiteSparse is LGPL or GPL, but the Factorize package itself is 2-clause BSD (attached).

So you can use the Factorize package as you wish. Factorize does connect to sparse Cholesky (chol in MATLAB), sparse LU, etc., but those are different packages (and all of them are GPL or LGPL). The backslash polyalgorithm is in Factorize itself, however, and is thus 2-clause BSD.

[Attachment: license.txt]

Stefan Karpinski

Jan 6, 2015, 4:13:55 PM
to Julia Users
2-clause BSD is basically MIT-equivalent, so that works.

Tony Kelman

Jan 6, 2015, 6:00:59 PM
to julia...@googlegroups.com
That's good to hear on the BSD license, and thanks for correcting our misunderstanding of the SuiteSparse licensing situation.

One thing I'd like us to be clearer about in our notation for Julia: when we say ldltfact, we only have a modified-Cholesky, diagonal-D variant of the LDL^T factorization hooked up. In the Factorize paper and in Matlab, ldl is a Bunch-Kaufman LDL^T where D is a block-diagonal matrix with either 1x1 or 2x2 blocks. Most of the people in this thread are well aware of this, but for the sake of any who aren't, I'd like us to make the distinction clearer.

Bunch-Kaufman block-diagonal LDL^T can handle symmetric indefinite matrices and do useful things like give you the inertia (the number of positive, negative, and zero eigenvalues) of the matrix. This is mandatory in many optimization applications. Modified Cholesky with diagonal D can only handle semidefinite matrices or, with extra care, some special classes like quasidefinite matrices that appear in a subset (e.g. convex optimization) of symmetric problems.

I believe CSparse does have a Bunch-Kaufman LDL^T implementation, but it's not as high-performance as, say, CHOLMOD or UMFPACK. MATLAB uses HSL MA57 for sparse ldl, which is an excellent, well-regarded piece of Fortran code, but unfortunately it is not under a redistributable license. The public-domain MUMPS code has this functionality and is exposed by several Julia packages, none of which currently meet the cross-platform build-system requirements that Base Julia has.
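
A tiny concrete illustration of why the block pivots matter, offered as a hedged aside:

A = [0.0 1.0; 1.0 0.0]   # symmetric, indefinite: eigenvalues +1 and -1
# Diagonal-D LDL^T stalls immediately on the zero pivot A[1,1], whereas
# Bunch-Kaufman takes the whole matrix as a single 2x2 block pivot (L = I,
# D = A) and can report the inertia (1, 1, 0): one positive, one negative,
# and no zero eigenvalues.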

-Tony

Tim Davis

Jan 6, 2015, 9:14:04 PM
to julia...@googlegroups.com
CSparse doesn't have a proper Bunch-Kaufman LDL^T with 2-by-2 pivoting. It just allows 1-by-1 pivots, and it doesn't check whether they are small. MA57 is the best code out there for that, but if MUMPS can do it, then that would be the best option. My SuiteSparse is missing this feature.