Can't figure out why Julia code takes 2x time of identical Python


Eric Martin

Dec 24, 2013, 5:17:16 PM
to julia...@googlegroups.com
Hi,

I'm a new Julia user, and I decided to write a slightly optimized QR decomposition for tridiagonal matrices (using Householder projections).

My code (and its output) is here: https://gist.github.com/lightcatcher/8118181 . I've confirmed that both implementations are correct (they match each other and the output from qr and np.linalg.qr), but the Julia code consistently takes twice as long as the Python code.

From profiling the Julia code, I know that a very large part of the runtime (> 90%) is spent doing the matrix multiplications to update matrices R and Qt. I checked that my numpy installation is using LAPACK and BLAS in /usr/lib (ie not using MKL), but I don't know how to check which library Julia is using for matrix operations (but I assume its the same one).

Further interesting tidbits:
In Julia, using the built-in 'qr' takes ~0.13s.
In Python, using np.linalg.qr takes ~0.25s, so my Python tridiagonal QR decomposition method is faster (as it takes ~0.2s).

Also, it's (marginally but consistently) faster in Julia to just call "copy(T)" rather than "full(Tridiagonal(map(i -> diag(T, i), -1:1)...))", which seems odd to me because "copy" should be O(N^2) and the other method (only copying the sub, main, and super diagonals) is O(N), where N is the dimension of a square matrix.


So, can anyone shed some light on why my Julia code is 2x as slow as the very similar Python code?
Also, a tangential question: does Julia have an equivalent to "if __name__ == '__main__'"? (In other words, how can I automatically run main?)

Thanks for the help,
Eric

Eric Martin

Dec 24, 2013, 5:36:09 PM
to julia...@googlegroups.com
Interesting note:

I updated my Python and Julia code to avoid double copying the slices from the arrays (I was copying R[i:i+1, :] twice; now I just do it once and reuse that copy). This decreased the Julia time to ~0.30 seconds (cutting about 0.08s) but didn't change the amount of time Python was taking, which I find quite odd; maybe numpy is somehow intelligent about not double copying slices?

Now Julia code takes 1.5x the amount of time as Python code.

-Eric

Ivar Nesje

Dec 24, 2013, 6:04:45 PM
to julia...@googlegroups.com
I don't understand what your code does, but I know that Julia currently makes a copy of an array when you do slicing. Numpy gives a reference. You might try to pass the array and index instead of a slice. For maximal performance in Julia you might try to write explicit loops instead of vectorized expressions.
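For anyone following along, the difference is easy to demonstrate on the NumPy side (a small sketch of mine, not from the original post): a basic slice in NumPy is a view that shares memory with the parent array, whereas Julia's slicing (as of the versions discussed here) returns a copy.

```python
import numpy as np

A = np.arange(12.0).reshape(3, 4)

# Basic slicing in NumPy returns a view: no data is copied,
# so writing through the slice mutates the parent array.
row = A[1, :]
row[0] = -99.0
assert A[1, 0] == -99.0

# An explicit .copy() detaches the data, which is what Julia's
# slicing semantics effectively do for you on every slice.
row_copy = A[2, :].copy()
row_copy[0] = 7.0
assert A[2, 0] == 8.0  # parent unaffected
```

This is why the same slicing-heavy inner loop can allocate and copy far more in Julia than in NumPy.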

Stefan Karpinski

Dec 24, 2013, 6:45:39 PM
to Julia Users
On Tue, Dec 24, 2013 at 5:17 PM, Eric Martin <e.a.mar...@gmail.com> wrote:
Also, it's (marginally but consistently) faster in Julia to just call "copy(T)" rather than "full(Tridiagonal(map(i -> diag(T, i), -1:1)...))", which seems odd to me because "copy" should be O(N^2) and the other method (only copying the sub, main, and super diagonals) is O(N), where N is the dimension of a square matrix.

Both do work proportional to the number of elements in the matrix, which is O(N^2). The copy is a *lot* simpler, however. All it does is a memcpy, which is not much slower than just filling the matrix with zeros:

julia> A = randn(10000,10000);

julia> @time copy(A);
elapsed time: 0.432631311 seconds (800450296 bytes allocated)

julia> @time zeros(10000,10000);
elapsed time: 0.365747207 seconds (800000128 bytes allocated)

Eric Martin

Dec 24, 2013, 7:31:56 PM
to julia...@googlegroups.com
Ivar:
Thanks, I had assumed numpy made a copy when slicing rather than returning a reference (which numpy calls a view). I found that reducing from a double copy to a single copy cut ~0.08s, so reducing from a single copy to no copy should put the time at roughly ~0.22s, which is very similar to numpy's time.

Stefan:
Thanks, I forgot that malloc doesn't return 0-filled memory (so that initializing a tridiagonal matrix still involves setting all of the elements not on the tridiagonal to 0).
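To illustrate the same point in NumPy terms (sizes and names here are illustrative, not from the original code): only O(N) numbers describe a tridiagonal matrix, but materializing it as a dense array still writes all N^2 entries, zeros included.

```python
import numpy as np

N = 5
sub = np.full(N - 1, 1.0)   # subdiagonal
main = np.full(N, 2.0)      # main diagonal
sup = np.full(N - 1, 3.0)   # superdiagonal

# O(N) data in, but the dense result has N*N entries, every one of
# which had to be written -- including the zeros off the three bands.
T = np.diag(sub, -1) + np.diag(main) + np.diag(sup, 1)

assert T.shape == (N, N)
assert np.count_nonzero(T) == 3 * N - 2  # only 3N-2 entries are nonzero
```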

So, does Julia have some equivalent to a numpy view (so that I can perform the matrix multiplication only on the slice of the matrix without having to copy that slice), or would I need to just hard-code this matrix multiplication (on a submatrix) to avoid copying?

-Eric

Ivar Nesje

Dec 24, 2013, 7:55:23 PM
to julia...@googlegroups.com
There is a SubArray implementation, sub(), but it has some performance issues related to indexing that should be fixed before it becomes default for slicing.

For small arrays I would guess that hardcoded multiplication is faster than calling a BLAS.
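To make the "hardcoded multiplication" idea concrete (this sketch is mine, written in Python for brevity): for a fixed tiny size the product can be written out explicitly, avoiding library-call overhead while computing exactly the same result as the general routine.

```python
import numpy as np

def matmul_2x2(a, b):
    """Hard-coded 2x2 product: 8 multiplies, 4 adds, no library dispatch."""
    return np.array([
        [a[0, 0] * b[0, 0] + a[0, 1] * b[1, 0],
         a[0, 0] * b[0, 1] + a[0, 1] * b[1, 1]],
        [a[1, 0] * b[0, 0] + a[1, 1] * b[1, 0],
         a[1, 0] * b[0, 1] + a[1, 1] * b[1, 1]],
    ])

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0, 6.0], [7.0, 8.0]])
assert np.allclose(matmul_2x2(a, b), np.dot(a, b))
```

For operands this small, the fixed cost of dispatching into a BLAS routine can exceed the arithmetic itself.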

You can find info about your Julia installation by calling versioninfo().

Alan Edelman

Dec 25, 2013, 8:42:50 AM
to julia...@googlegroups.com
You say Householder, but I would say you could be "chasing the subdiagonal" with Givens rotations.
You are using a full array, but you only need three diagonals of R, and if you save Q as n-1 rotation angles you don't even need a dense Q.

This raises the question of why you are doing QR on a symmetric tridiagonal at all. Hopefully it's not a step toward eigenvalues, because chasing the bulge is the way to go for that.
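For readers unfamiliar with the term: a Givens rotation is the 2x2 orthogonal transform that annihilates one subdiagonal entry at a time, and storing one (c, s) pair per step is the "n-1 angles" representation of Q. A minimal NumPy sketch (my own illustration, not the poster's code):

```python
import numpy as np

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    r = np.hypot(a, b)
    return (1.0, 0.0) if r == 0.0 else (a / r, b / r)

# Annihilate T[1, 0] of a small symmetric tridiagonal by rotating rows 0 and 1.
T = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])
c, s = givens(T[0, 0], T[1, 0])
G = np.array([[c, s], [-s, c]])   # one "angle" of Q, stored as (c, s)
T[0:2, :] = G @ T[0:2, :]

assert abs(T[1, 0]) < 1e-12       # subdiagonal entry is now zero
```

Repeating this down the subdiagonal yields R in banded form while Q is held implicitly as the sequence of rotations.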

Eric Martin

Dec 30, 2013, 11:22:43 PM
to julia...@googlegroups.com
@all who provided Julia optimization advice

Thanks for the help optimizing my code. I performed 2 more optimizations to my previously posted code (current code is here: https://gist.github.com/lightcatcher/8118181 ):
1. Hardcoding both matrix multiplications so that they occur in the same loop, which lets me avoid copying the matrices.
2. Computing Q rather than Q', which allows me to iterate over the matrix in a cache-friendly manner.

Note only the Julia code is updated (not the timings or the Python code in the gist).

These reduced my tridiagonal QR times to ~0.04s compared to the 0.14s required by Julia's built-in 'qr' method, making my Julia tridiagonal QR implementation faster than any of the alternatives I looked at.

@Alan

I agree I'm "chasing the subdiagonal", but I'm doing it with Householder reflections and only using the non-zero components of the "u" vector (where the Householder projection is P = I - u*u' and only the first 2 components of u are non-zero).
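A minimal sketch of that two-element reflector, written here in NumPy with the standard normalization P = I - 2*v*v'/(v'*v) (my reconstruction for illustration; the posted code instead scales u so that P = I - u*u'):

```python
import numpy as np

def householder_2(x):
    """Reflector vector v (2 entries) mapping x to (+/-||x||, 0)."""
    v = x.copy()
    # Add sign(x[0]) * ||x|| to v[0] to avoid cancellation.
    v[0] += np.copysign(np.linalg.norm(x), x[0])
    return v

T = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])
v = householder_2(T[0:2, 0])
# Apply P = I - 2 v v' / (v' v) to rows 0 and 1 only; the remaining
# rows are untouched, which is what makes the tridiagonal case cheap.
T[0:2, :] -= (2.0 / (v @ v)) * np.outer(v, v @ T[0:2, :])

assert abs(T[1, 0]) < 1e-12   # subdiagonal entry eliminated
```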

I'm writing all of this code to learn Julia as well as to better understand some of the material from my numerical linear algebra class. I am writing the QR decomposition to use in QR iteration to find the eigenvalues of a tridiagonal matrix (as part of computing low-rank SVD by Lanczos iteration).

I've read about "chasing the bulge" and the implicit QR algorithm, but I haven't been able to find a description (or implementation/pseudocode) that I could easily understand. Is there any chance you could point me to an implementation, pseudocode, or a good explanation of the algorithm? Do you happen to know of an exposition of the implicit QR algorithm that uses Householder projections rather than Givens rotations?

In particular, I've looked at the description in Demmel's "Applied Numerical Linear Algebra" book and am having trouble understanding the algorithm from that. If you happen to be familiar with that book, the only real coverage of implicit QR algorithm seems to be example 4.10, which doesn't make sense to me if matrix A is already tridiagonal or upper Hessenberg (as it just looks like a way to convert a matrix to upper Hessenberg form).

Thanks for any help with the numerical linear algebra, and thanks to all for the optimization advice. Sorry if the numerical linear algebra stuff is off-topic for this mailing list.

-Eric

Stefan Karpinski

Dec 31, 2013, 1:39:24 PM
to Julia Users
On Mon, Dec 30, 2013 at 11:22 PM, Eric Martin <e.a.mar...@gmail.com> wrote:
Sorry if the numerical linear algebra stuff is off-topic for this mailing list.

Numerical linear algebra stuff will never be off-topic for this list :-)

Jiahao Chen

Jan 3, 2014, 9:26:03 AM
to julia...@googlegroups.com
Have you tried Golub and Van Loan's Matrix Computations? In the 3/e, it is on p. 418 ff (Section 8.3.5). They call it zero-chasing, but it's the same thing.

Andreas Noack Jensen

Jan 3, 2014, 1:34:39 PM
to julia...@googlegroups.com
I got the point after reading the double shift QR section of



2014/1/3 Jiahao Chen <cji...@gmail.com>

Have you tried Golub and Van Loan's Matrix Computations? In the 3/e, it is on p. 418 ff (Section 8.3.5). They call it zero-chasing, but it's the same thing.


I've read about "chasing the bulge" and the implicit QR algorithm, but I haven't been able to find a description (or implementation/pseudocode) that I could easily understand. Is there any chance you could point me to an implementation, pseudocode, or a good explanation of the algorithm? Do you happen to know of an exposition of the implicit QR algorithm that uses Householder projections rather than Givens rotations?




--
Med venlig hilsen

Andreas Noack Jensen

Alan Edelman

Jan 3, 2014, 1:52:36 PM
to julia...@googlegroups.com
sometimes it's called implicit QR

Jack Poulson

Mar 9, 2014, 2:39:30 PM
to julia...@googlegroups.com
On Tuesday, December 24, 2013 7:55:23 PM UTC-5, Ivar Nesje wrote:

I apologize if I've missed subsequent emails on this topic, but I teach graduate numerical linear algebra and am eagerly anticipating matrix views being introduced into a stable release of Julia, so that I can use them for concise, self-contained code samples as an alternative to Octave/MATLAB (which does not have 'free' matrix views). I've spoken with a couple of the developers on GitHub and offline about this, but was hoping for a public/documented response.

Introducing matrix views would allow me to mirror the style of my C++ numerical linear algebra library within Julia (for example, see the following for a distributed unblocked algorithm for the reduction to upper Hessenberg form: https://github.com/elemental/Elemental/blob/43e979a75aef41261670bbd549ed7fafbd986b67/include/elemental/lapack-like/condense/Hessenberg/UUnb.hpp#L95).

Jack

Milan Bouchet-Valat

Mar 9, 2014, 3:08:53 PM
to julia...@googlegroups.com
On Sunday, March 9, 2014 at 11:39 -0700, Jack Poulson wrote:
> On Tuesday, December 24, 2013 7:55:23 PM UTC-5, Ivar Nesje wrote:
> There is a SubArray implementation, sub(), but it has some
> performance issues related to indexing that should be fixed
> before it becomes default for slicing.
>
> For small arrays I would guess that hardcoded multiplication
> is faster than calling a BLAS.
>
> You can find info about your Julia installation by calling versioninfo()
>
>
> I apologize if I've missed subsequent emails on this topic, but I
> teach graduate numerical linear algebra and am heavily anticipating
> matrix views being introduced into a stable release of Julia so that I
> can use them for concise, self-contained code samples as an
> alternative to Octave/MATLAB (which does not have 'free' matrix
> views). I've spoken with a couple of the developers on GitHub and
> offline about this but was hoping for a public/documented response.
I think you will find more information than you could have dreamed of
here. ;-)

https://github.com/JuliaLang/julia/pull/5556


Regards

Jack Poulson

Mar 9, 2014, 3:21:30 PM
to julia...@googlegroups.com

Thank you, Milan. I'm still a bit confused, though: the (long) discussion did not seem to conclusively state whether the pull request would make it into the 0.3 release. I noticed that there is a 22-hour-old 0.3 prerelease, so I assume that the decision has been made already. Will 0.3 contain the new functionality?

Jack

Ivar Nesje

Mar 9, 2014, 3:25:13 PM
to julia...@googlegroups.com
I think you can just do `Pkg.add("ArrayViews")` to get the new functionality.

The change in Base Julia will be that slicing will return an ArrayView automatically. Before that change, you will have to `using ArrayViews` and use the view() function instead.

Ivar

Stefan Karpinski

Mar 9, 2014, 3:31:57 PM
to Julia Users
That pull request is unfortunately not quite ready to go into the 0.3 release candidate. It will definitely go into 0.4, however.