Re: [julia-dev] Julia/Fortran performance example


Rhys Ulerich

Oct 8, 2012, 9:10:46 PM
to juli...@googlegroups.com
> ... The code whitens colored noise by computing a
> Cholesky factorization of a covariance matrix and applies the
> lower-triangular factor or its inverse to a vector. It exploits the
> matrix's special structure in our application to do these operations in time
> linear in the length of the vector.

Out of curiosity, does this structure have a name?

- Rhys

Bradley Alpert

Oct 8, 2012, 9:52:47 PM
to juli...@googlegroups.com
It is in the class of matrices that are semiseparable of rank k.  (Actually, our class is a subset of these, but fast algorithms exist for the larger class.)
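For readers unfamiliar with the term, here is a minimal sketch (not the actual structure from our application, and with made-up random generators) of why a rank-k semiseparable matrix admits matrix-vector products in time linear in n:

```julia
# A rank-k semiseparable matrix has its lower triangle drawn from one rank-k
# outer product and its strictly upper triangle from another, so a
# matrix-vector product costs O(n*k) instead of O(n^2).
using LinearAlgebra

function semisep_matvec(U, V, P, Q, x)
    n, k = size(U)
    y = zeros(eltype(x), n)
    s = zeros(eltype(x), k)          # running sum of x[j]*V[j,:] for j <= i
    for i in 1:n
        s .+= x[i] .* V[i, :]
        y[i] += dot(U[i, :], s)      # lower triangle (including diagonal)
    end
    t = zeros(eltype(x), k)          # running sum of x[j]*Q[j,:] for j > i
    for i in n:-1:1
        y[i] += dot(P[i, :], t)      # strictly upper triangle
        t .+= x[i] .* Q[i, :]
    end
    return y
end

n, k = 8, 2
U = randn(n, k); V = randn(n, k); P = randn(n, k); Q = randn(n, k)
A = tril(U * V') + triu(P * Q', 1)   # dense form, built only for comparison
x = randn(n)
# semisep_matvec(U, V, P, Q, x) matches A * x in O(n*k) work
```

The running sums s and t are what make the sweep linear: each entry of x is touched once per generator column.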

Brad

Bradley Alpert

Oct 10, 2012, 10:33:59 AM
to juli...@googlegroups.com
Perhaps I should ask these questions differently.  Is it feasible for someone (ignorant of LLVM technology) to see the code that Julia's JIT compiler produces?  Or is such ignorance easily remedied?  Thanks for any pointers!

Brad

Stefan Karpinski

Oct 10, 2012, 11:04:47 AM
to juli...@googlegroups.com
I haven't had a chance to look at your code yet, but the disassemble function will show the LLVM assembly code for a function applied to a specific type signature. For example:

julia> disassemble(+,(Int,Int))

define %jl_value_t* @"+1023"(%jl_value_t*, %jl_value_t**, i32) {
top:
  %3 = load %jl_value_t** %1, align 8, !dbg !8757
  %4 = getelementptr inbounds %jl_value_t* %3, i64 0, i32 0, !dbg !8757
  %5 = getelementptr %jl_value_t** %4, i64 1, !dbg !8757
  %6 = bitcast %jl_value_t** %5 to i64*, !dbg !8757
  %7 = load i64* %6, align 8, !dbg !8757
  %8 = getelementptr %jl_value_t** %1, i64 1, !dbg !8757
  %9 = load %jl_value_t** %8, align 8, !dbg !8757
  %10 = getelementptr inbounds %jl_value_t* %9, i64 0, i32 0, !dbg !8757
  %11 = getelementptr %jl_value_t** %10, i64 1, !dbg !8757
  %12 = bitcast %jl_value_t** %11 to i64*, !dbg !8757
  %13 = load i64* %12, align 8, !dbg !8757
  %14 = add i64 %13, %7, !dbg !8764
  %15 = call %jl_value_t* @jl_box_int64(i64 %14), !dbg !8764
  ret %jl_value_t* %15, !dbg !8764
}

As you can see, LLVM assembly code is pretty verbose; much of it produces no actual machine code in the end, but you can get a rough sense of what's happening.
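A note for later readers (this postdates the thread, so take it as an editorial aside): disassemble was an early name for this facility, and newer Julia releases expose the same inspection as code_llvm and code_native:

```julia
# In current Julia the code-inspection entry points live in InteractiveUtils
# (loaded automatically in the REPL, imported explicitly in scripts).
using InteractiveUtils

code_llvm(+, (Int, Int))     # prints the LLVM IR for Int + Int
code_native(+, (Int, Int))   # prints the native machine assembly
```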


Tim Holy

Oct 10, 2012, 11:14:56 AM
to juli...@googlegroups.com
I'll let others answer your more recent question about inspecting the produced
code, but here are a few comments on those four lines. I haven't done any testing
on your code myself, so take this with a shaker full of salt.

Here's my main guess: you're passing a whole bunch of variables in as a tuple.
I'm not certain, but I think that defeats some of julia's optimization
abilities. Try passing them in as separate arguments: rather than
covprodut(x,cholsav), write covprodut(x,cholsav...) and write
cholprodut out with each variable as an explicit input.
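To make the suggestion concrete, here is a toy sketch (with hypothetical names, not the thread's actual covprodut) of the same computation written with a tuple argument versus with the tuple splatted into separate, individually typed arguments:

```julia
# Tuple passed whole: the function indexes into one aggregate argument.
f_tuple(x, args) = x * args[1] + args[2]

# Tuple splatted: each field arrives as its own argument at the call site.
f_args(x, a, b) = x * a + b

cholsav = (2.0, 3.0)
f_tuple(1.5, cholsav)      # tuple argument
f_args(1.5, cholsav...)    # splatting the tuple with ...
```

Both calls compute 1.5 * 2.0 + 3.0; only the calling convention differs.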

On Monday, October 08, 2012 04:19:06 PM Bradley Alpert wrote:
> We are therefore paying a computational cost factor of about 35 for Julia
> compared to Fortran for the solve, which is primarily execution of these
> four lines of code
>
> for i=1:n
>     x[i]=(y[i]-real(sum(ss)))*dsuminv[i]
>     ss=(ss+x[i]*conj(d[:,i])).*conjb
> end
>
> where typeof(ss) and typeof(conjb) are Array{Complex128,1} of length k=11,
> typeof(d) is Array{Complex128,2} with first dimension also of length k. (I
> tried making the sum(ss) and the array operations on the third line into
> explicit loops, but those changes caused a dramatic slowdown!)

sum was, until a couple of days ago, using the KBN algorithm (more accurate
but slower, especially on some machines). You may want to test again with a
very recent checkout.

I am surprised about the dramatic slowdown by explicitly writing out the
loops; that's not my usual experience. I wonder if you need more type
declarations? It's only in a minority of cases that those help (julia is quite
good at inference), but I've definitely seen it happen (sometimes quite
dramatically so).

Hope at least some of this points in the right direction.

--Tim

Mike Nolta

Oct 10, 2012, 12:52:00 PM
to juli...@googlegroups.com
Yeah, I'm surprised too, but when I turned the 3rd line into an
explicit loop it sped up by an order of magnitude.

https://github.com/nolta/code-bits/blob/master/covar3.jl
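For illustration, a guess at the shape of that devectorization (the real code is in the covar3.jl link above): the vectorized update of ss rewritten as an explicit loop over its k entries, so no temporary arrays are allocated inside the hot loop.

```julia
# Devectorized form of  ss = (ss + x[i]*conj(d[:,i])).*conjb :
# update each of the k entries in place instead of building temporaries.
function update_ss!(ss, d, conjb, xi, i)
    for j in eachindex(ss)
        ss[j] = (ss[j] + xi * conj(d[j, i])) * conjb[j]
    end
    return ss
end

k, n = 11, 5
ss    = randn(ComplexF64, k)
conjb = randn(ComplexF64, k)
d     = randn(ComplexF64, k, n)
```

The vectorized line allocates three temporary length-k arrays per iteration; the loop allocates none, which is typically where the order-of-magnitude difference comes from.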

Mike


Bradley Alpert

Oct 10, 2012, 2:09:15 PM
to juli...@googlegroups.com
Wow, this is strange!  I just repeated the test, with my original third line and Mike's replacement, and got times of 3.21 s and 9.08 s, respectively.  I am running under Linux with Julia Version 0.0.0+97687299.r5727, Commit 5727f3a01e (2012-09-26 13:00:45).  Must be a problem with my build.

Brad

Mike Nolta

Oct 10, 2012, 3:08:42 PM
to juli...@googlegroups.com
Huh, you're right. I also made a couple of tweaks (eliminating globals
and inlining the tuple args) which didn't affect the vectorized code's
performance, but turned out to have a big impact on the loopy code.

-Mike

Bradley Alpert

Oct 10, 2012, 8:05:31 PM
to juli...@googlegroups.com
Thank you, Mike.  The effect of inlining the tuple args is dramatic: a factor of 40 in the code with the loop!  (This elucidates a limitation of the compiler.)  Anyway, it puts Julia within a factor of two of Fortran on the solve, which is very nice.

Brad

Jeff Bezanson

Oct 10, 2012, 10:10:13 PM
to juli...@googlegroups.com
Good, I will try to work on this. Tuples are difficult since they tend
to cause exponential type explosions (with m possible element types there
are m^n tuple types of length n), which the compiler tries to avoid. We
can probably improve our heuristics. I know that sounds a bit lame, but
in practice you do in fact need heuristics :)
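To put rough numbers on that blow-up (illustrative values only, not figures from the compiler):

```julia
# With m candidate element types, there are m^n distinct concrete tuple
# types of length n, so specializing on all of them quickly becomes
# hopeless even for modest m and n.
m, n = 4, 10
m^n   # 4^10 = 1_048_576 tuple types from just four element types
```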

Tim Holy

Oct 10, 2012, 10:38:35 PM
to juli...@googlegroups.com
On Wednesday, October 10, 2012 10:10:13 PM Jeff Bezanson wrote:
> Good, I will try to work on this.

See also https://github.com/JuliaLang/julia/commit/73d8ca4eb, which might be
related.
