Performance compared to Matlab


Vishnu Raj

unread,
Oct 18, 2015, 7:17:50 AM10/18/15
to julia-users
Although Julia homepage shows using Julia over Matlab gains more in performance, my experience is quite opposite.
I was trying to simulate channel evolution using Jakes Model for wireless communication system.

Matlab code is:
function [ h, tf ] = Jakes_Flat( fd, Ts, Ns, t0, E0, phi_N )
%JAKES_FLAT
%   Inputs:
%       fd, Ts, Ns  : Doppler frequency, sampling time, number of samples
%       t0, E0      : initial time, channel power
%       phi_N       : initial phase of the maximum Doppler frequency sinusoid
%
%   Outputs:
%       h, tf       : complex fading vector, current time

   
if nargin < 6,  phi_N = 0;  end
if nargin < 5,  E0 = 1;     end
if nargin < 4,  t0 = 0;     end

N0    = 8;                  % As suggested by Jakes
N     = 4*N0 + 2;           % An accurate approximation
wd    = 2*pi*fd;            % Maximum Doppler frequency [rad]
t     = t0 + [0:Ns-1]*Ts;   % Time vector
tf    = t(end) + Ts;        % Final time
coswt = [ sqrt(2)*cos(wd*t); 2*cos(wd*cos(2*pi/N*[1:N0]')*t) ];
h     = E0/sqrt(2*N0+1)*exp(j*[phi_N pi/(N0+1)*[1:N0]])*coswt;
end

My call results in :
>> tic; Jakes_Flat( 926, 1E-6, 50000, 0, 1, 0 ); toc
Elapsed time is 0.008357 seconds.


My corresponding Julia code is
function Jakes_Flat( fd, Ts, Ns, t0 = 0, E0 = 1, phi_N = 0 )
# Inputs:
#   fd, Ts, Ns : Doppler frequency, sampling time, number of samples
#   t0, E0     : initial time, channel power
#   phi_N      : initial phase of the maximum Doppler frequency sinusoid
# Outputs:
#   h, tf      : complex fading vector, current time
  N0    = 8;                  # As suggested by Jakes
  N     = 4*N0+2;             # An accurate approximation
  wd    = 2*pi*fd;            # Maximum Doppler frequency
  t     = t0 + [0:Ns-1]*Ts;
  tf    = t[end] + Ts;
  coswt = [ sqrt(2)*cos(wd*t'); 2*cos(wd*cos(2*pi/N*[1:N0])*t') ]
  h     = E0/sqrt(2*N0+1)*exp(im*[ phi_N pi/(N0+1)*[1:N0]']) * coswt
  return h, tf;
end
# Saved this as "jakes_model.jl"


My first call results in
julia> include( "jakes_model.jl" )
Jakes_Flat (generic function with 4 methods)

julia> @time Jakes_Flat( 926, 1e-6, 50000, 0, 1, 0 )
elapsed time: 0.65922234 seconds (61018916 bytes allocated)

julia> @time Jakes_Flat( 926, 1e-6, 50000, 0, 1, 0 )
elapsed time: 0.042468906 seconds (17204712 bytes allocated, 63.06% gc time)

For the first execution, Julia takes a huge amount of time. On the second call, even though Julia takes considerably less time (0.042468906 s) than the first (0.65922234 s), it is still much slower than Matlab (0.008357 s).
I'm using Matlab R2014b and Julia v0.3.10 on Mac OS X 10.10.

- vish

Jason Merrill

unread,
Oct 18, 2015, 8:49:30 AM10/18/15
to julia-users
Two quick pieces of advice:

Try

@code_warntype Jakes_Flat( 926, 1e-6, 50000, 0, 1, 0 )

For some reason, the type of the variable "h" is not being inferred tightly (i.e. it has type Any). If you can sort things out so that h is typed more concretely, I suspect you'll be able to match Matlab's performance.

However, if you're going for speed, this code is begging to be devectorized. It creates tons of fairly large intermediate arrays. In Julia, don't be afraid of for loops.

It may actually be easier to devectorize the code than it will be to find and fix the typing issue in the original, because the original uses some fairly complicated array concatenation.
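To illustrate, here is one possible devectorized sketch of the same computation (my own rewrite for this thread, not the original author's code; the name Jakes_Flat_loop is made up):

```julia
# Hypothetical loop-based rewrite of Jakes_Flat: a sketch, not a tuned implementation.
function Jakes_Flat_loop(fd, Ts, Ns, t0 = 0.0, E0 = 1.0, phi_N = 0.0)
    N0 = 8                        # as suggested by Jakes
    N  = 4*N0 + 2
    wd = 2*pi*fd                  # maximum Doppler frequency [rad]
    scale = E0/sqrt(2*N0 + 1)
    h = zeros(Complex{Float64}, Ns)
    for k in 1:Ns
        t = t0 + (k - 1)*Ts
        acc = exp(im*phi_N) * sqrt(2)*cos(wd*t)          # the n = 0 oscillator
        for n in 1:N0
            acc += exp(im*pi*n/(N0 + 1)) * 2*cos(wd*cos(2*pi*n/N)*t)
        end
        h[k] = scale*acc          # same sum the row-vector * matrix product computes
    end
    return h, t0 + Ns*Ts          # tf = t[end] + Ts
end
```

Each sample accumulates its N0 + 1 oscillator terms directly, so the (N0+1) x Ns coswt matrix is never built at all.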

Andras Niedermayer

unread,
Oct 18, 2015, 8:54:04 AM10/18/15
to julia-users
There is a type instability (see here) that slows down your code. 

@code_warntype Jakes_Flat( 926, 1e-6, 50000, 0, 1, 0 )
shows that variable h has type Any.

I managed to track down the type instability to the concatenation 
[ phi_N pi/(N0+1)*[1:N0]']

You can get around the type instability with this modification:
function Jakes_Flat( fd, Ts, Ns, t0 = 0, E0 = 1, phi_N = 0 )
# Inputs:
#
# Outputs:
  N0    = 8;                  # As suggested by Jakes
  N     = 4*N0+2;             # An accurate approximation
  wd    = 2*pi*fd;            # Maximum Doppler frequency
  t     = t0 + [0:Ns-1;]*Ts;
  tf    = t[end] + Ts;
  coswt = [ sqrt(2)*cos(wd*t'); 2*cos(wd*cos(2*pi/N*[1:N0;])*t') ]

  temp          = zeros(1,N0+1)
  temp[1,2:end] = pi/(N0+1)*[1:N0;]'
  temp[1,1]     = phi_N
  h = E0/sqrt(2*N0+1)*exp(im*temp) * coswt
  return h, tf;
end

This is about 30% faster than before, but still not as fast as Matlab. Maybe someone else knows the reason for the remaining speed difference (I guess it's due to temporary arrays).

Andras Niedermayer

unread,
Oct 18, 2015, 9:06:51 AM10/18/15
to julia-users
The type instability looks like a bug in Julia to me, filed issue:

Daniel Carrera

unread,
Oct 18, 2015, 9:41:54 AM10/18/15
to julia-users
Hello,

Other people have already given advice on how to speed up the code. I just want to comment that Julia really is faster than Matlab, but the way that you make code faster in Julia is almost the opposite of how you do it in Matlab. Specifically, in Matlab the advice is that if you want the code to be fast, you need to eliminate every loop you can and write vectorized code instead. This is because Matlab loops are slow. But Julia loops are fast, and vectorized code creates a lot of overhead in the form of temporary variables, garbage collection, and extra loops. So in Julia you optimize code by putting everything into loops. The upshot is that if you take a Matlab-optimized program and just do a direct line-by-line conversion to Julia, the Julia version can easily be slower. But by the same token, if you took a Julia-optimized program and converted it line-by-line to Matlab, the Matlab version would be ridiculously slow.

Oh, and in Julia you also care about types. If the compiler can correctly infer the types of your variables, it will generate more optimal code.

Cheers,
Daniel.

Kristoffer Carlsson

unread,
Oct 18, 2015, 9:47:30 AM10/18/15
to julia-users
It's worth saying that Julia is not faster than Matlab if most of your time is spent computing cosines over a large array, which is what this code is doing.

Glen O

unread,
Oct 19, 2015, 1:03:13 AM10/19/15
to julia-users
How many cores does your computer have?

Recent versions of Matlab automatically parallelise calculations like the coswt calculations, I believe, whereas Julia requires explicit request for parallel computations. A quick profiling in Julia reveals that it's actually the coswt matrix construction that is taking the vast majority of the time.
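For anyone who wants to reproduce that kind of measurement, a minimal profiling sketch (the work function below is a made-up stand-in for the real code; `using Profile` is needed on newer Julia versions, while on 0.3 the Profile module is available out of the box):

```julia
using Profile   # required on Julia 0.7+; built in on older versions

# Made-up workload standing in for the expensive coswt construction.
function work()
    s = 0.0
    for x in 1:10^6
        s += cos(x)
    end
    return s
end

work()            # warm up so JIT compilation doesn't dominate the samples
Profile.clear()
@profile work()
Profile.print()   # prints a call tree showing where the samples landed
```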

Phil Tomson

unread,
Oct 19, 2015, 3:05:11 PM10/19/15
to julia-users
Several comments here mention the need to de-vectorize code and use for-loops instead. However, vectorized code is a lot more compact and generally easier to read than lots of for-loops. I seem to recall that there was discussion in the past about speeding up vectorized code in Julia so that it could be on par with devectorized code performance: is this still something being worked on for future versions?

Otherwise, if we keep telling people that they need to convert their code to use for-loops, I think Julia isn't going to seem very compelling for people looking for alternatives to Matlab, R, etc.

Milan Bouchet-Valat

unread,
Oct 19, 2015, 3:15:12 PM10/19/15
to julia...@googlegroups.com
Le lundi 19 octobre 2015 à 12:05 -0700, Phil Tomson a écrit :
> Several comments here about the need to de-vectorize code and use for
> -loops instead. However, vectorized code is a lot more compact and
> generally easier to read than lots of for-loops. I seem to recall
> that there was discussion in the past about speeding up vectorized
> code in Julia so that it could be on par with the vectorized code
> performance - is this still something being worked on for future
> versions?
>
> Otherwise, if we keep telling people that they need to convert their
> code to use for-loops, I think Julia isn't going to seem very
> compelling for people looking for alternatives to Matlab, R, etc.
There's a long discussion, with still no perfect solution :
https://github.com/JuliaLang/julia/issues/8450

But I'm confident something will finally be done about this. :-)


Regards

David Gold

unread,
Oct 19, 2015, 3:20:24 PM10/19/15
to julia-users
One doesn't always need to write the loops oneself. Oftentimes switching from a pure operator (e.g. broadcast) to its in-place counterpart (e.g. broadcast!) can make a world of difference:

julia> A = rand(5_000_000);

julia> function f(A)
    sum(A .+ A.^2)
end
f (generic function with 1 method)

julia> f(A); @time f(A)
0.182246 seconds (206 allocations: 76.307 MB, 5.70% gc time)
4.166856979458311e6


julia> _g(x, y) = x + y^2
_g (generic function with 1 method)

julia> function g(A)
    temp = Array(Float64,5_000_000)
    broadcast!(_g, temp, A, A)
    sum(temp)
end
g (generic function with 1 method)

julia> g(A); @time g(A)
0.023660 seconds (8 allocations: 38.147 MB, 28.73% gc time)
4.166856979458311e6

Thus it may help to cultivate knowledgeable use of map! and broadcast! as opposed to pure broadcasted operators amongst newcomers to Julia.
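In the same spirit, when the result is a reduction anyway, passing the function straight to sum skips the temporary array entirely (a small sketch using the two-argument form of sum):

```julia
A = rand(1_000)

# sum folds the function into the reduction, so no temporary vector is allocated.
s = sum(x -> x + x^2, A)
```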

Steven G. Johnson

unread,
Oct 19, 2015, 3:31:01 PM10/19/15
to julia-users


On Monday, October 19, 2015 at 3:05:11 PM UTC-4, Phil Tomson wrote:
Otherwise, if we keep telling people that they need to convert their code to use for-loops, I think Julia isn't going to seem very compelling for people looking for alternatives to Matlab, R, etc.

You only need loops if you are optimizing critical code, and that is usually only a small percentage of most people's code.

It's not like vectorized code is terrible (much worse than Matlab or R), just that it is often not as fast as C; this is also true in Matlab or R: a whole sequence of vectorized operations like x + 2*y.^2 + z.^3 .* (x - 5) is not nearly as fast as fusing all the operations into a single C (or Julia) loop. The exception, of course, is when Matlab's vectorized operations are multi-threaded (as in the original post in this thread); then they can be faster than a non-parallel Julia (or C) vectorized operation or explicit loop.
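As a concrete sketch of that fusion (my own illustrative helper, not code from this thread):

```julia
# Single fused pass: no intermediate arrays for y.^2, z.^3, or (x - 5).
function fused(x, y, z)
    out = similar(x)
    for i in 1:length(x)
        out[i] = x[i] + 2*y[i]^2 + z[i]^3 * (x[i] - 5)
    end
    return out
end
```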

 

Daniel Carrera

unread,
Oct 19, 2015, 5:15:31 PM10/19/15
to julia...@googlegroups.com
Julia vectorized code should certainly be as fast as Matlab and NumPy vectorized code, and to the best of my knowledge, it already is. But de-vectorized code will always have an advantage because you have low-level control, and you can avoid temp variables.

Cheers,
Daniel.

John Pearson

unread,
Oct 19, 2015, 7:51:39 PM10/19/15
to julia-users
This can be true, but I think a lot of us forget just how much time it takes to learn to write compact vectorized code, and how difficult it can be to vectorize absolutely everything. For instance, it's great to be able to write
 
x' * A * x

in Matlab, but a whole ton of code using repmat or bsxfun often becomes easier to read (IMO) when you can actually look at a single line in a loop and understand a) how many indices everything has and b) what the [i, j, k, ...] entry of the resulting array looks like.

I have been forced to write a lot of very difficult-to-parse code in Matlab simply in order to avoid what would have been straightforward loops.
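For what it's worth, even the quadratic form above reads fine as an explicit loop in Julia (an illustrative sketch; the name quadform is mine):

```julia
# x' * A * x written out: every index's role is explicit.
function quadform(A, x)
    s = zero(eltype(x))
    for j in 1:length(x), i in 1:length(x)
        s += x[i] * A[i, j] * x[j]
    end
    return s
end
```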

On Monday, October 19, 2015 at 3:05:11 PM UTC-4, Phil Tomson wrote:

Gabriel Gellner

unread,
Oct 21, 2015, 3:11:45 AM10/21/15
to julia-users
Is it possible to tell Julia to run the vectorized code in parallel? Looking at the documentation I see that you can do it easily for the looped version, but is there a macro or something to pass to the vectorized version so that it becomes parallel?

Stefan Karpinski

unread,
Oct 21, 2015, 12:46:00 PM10/21/15
to Julia Users
On Tue, Oct 20, 2015 at 1:07 PM, Gabriel Gellner <gabriel...@gmail.com> wrote:
Is it possible to tell Julia to run the vectorized code in parallel? Looking at the documentation I see that you can do it easily for the looped version but is there a macro or something to pass the vectorized version so that is becomes parallel?

Not yet but once we introduce threads to the language, it will be. 

Kristoffer Carlsson

unread,
Oct 21, 2015, 4:17:18 PM10/21/15
to julia-users
For fun (and science) I tried out the new package https://github.com/IntelLabs/ParallelAccelerator.jl for this problem.

Here is the code:

function Jakes_Flat( fd, Ts, Ns, t0 = 0, E0 = 1, phi_N = 0 )
# Inputs:
#
# Outputs:
  N0  = 8;                  # As suggested by Jakes
  N   = 4*N0+2;             # An accurate approximation
  wd  = 2*pi*fd;            # Maximum Doppler frequency
  
  ts   = collect(t0 + (0:Ns-1)*Ts)
  tf  = ts[end] + Ts;
  ns  = collect(1:N0)       # renamed from Ns to avoid shadowing the argument

  coswt = [ cosvec(ts, wd)'; cosmat(ts, ns, wd, N) ]
  h = E0/sqrt(2*N0+1)*exp(im*[ phi_N pi/(N0+1)*(1:N0)']) * coswt
  return h, tf;
end

@acc function cosvec(ts, wd)
    Float64[sqrt(2)*cos(wd*t) for t in ts]
end

@acc function cosmat(ts, Ns, wd, N)
    Float64[2*cos(wd*cos(2*pi/N*n)*t) for n in Ns, t in ts]
end


Benchmarking this I get:

julia> @time Jakes_Flat( 926, 1e-6, 50000, 0, 1, 0 )
  0.004779 seconds (115 allocations: 4.965 MB)

and without calling the accelerated functions (by putting @noacc in front of the function calls), I get:

julia> @time Jakes_Flat_noacc( 926, 1e-6, 50000, 0, 1, 0 )
  0.019072 seconds (75 allocations: 8.396 MB)

The matlab code on my computer runs at:

>> tic; Jakes_Flat( 926, 1E-6, 50000, 0, 1, 0 ); toc
Elapsed time is 0.009936 seconds.

So.. great victory for ParallelAccelerator.jl?

On Sunday, October 18, 2015 at 1:17:50 PM UTC+2, Vishnu Raj wrote:

Kristoffer Carlsson

unread,
Oct 21, 2015, 4:23:53 PM10/21/15
to julia-users
Btw, it is really cool to see Julia running at 400% CPU when running a list comprehension.

I did some more benchmarks with a larger N to reduce the noise a bit, and the difference between Matlab and Julia is actually not that great. However, tying with Matlab's parallelized vectorized operations is great imho.

julia> @time h, f = Jakes_Flat( 926, 1e-6, 5000000, 0, 1, 0 )
  0.585940 seconds (153 allocations: 495.918 MB, 12.47% gc time)

>> tic; Jakes_Flat( 926, 1E-6, 5000000, 0, 1, 0 ); toc
Elapsed time is 0.609867 seconds.

Lindsey Kuper

unread,
Oct 21, 2015, 5:56:39 PM10/21/15
to julia-users
It's fantastic to see some good ParallelAccelerator results "in the wild"!  Thanks for sharing.

Lindsey

jcal...@gmail.com

unread,
Oct 31, 2015, 12:33:08 PM10/31/15
to julia-users
Since people seem to recommend de-vectorisation a lot, I should comment on my observation that de-vectorisation in Julia often makes things slower... I will get some code out and start a new topic on this...