Any 0.5 performance tips?

533 views

Andrew

unread,
Sep 28, 2016, 22:15:20
to julia-users
My large project is much (3-4x?) slower under 0.5. I know there are a variety of open issues about this to be hopefully fixed in the 0.5.x timeframe, but are there any general workarounds at the moment?

My project includes the following in case it's relevant:
  • Many nested functions forming closures, which I pass to optimization and equation solving functions. In 0.4 I used the Base.call trick on custom types to make performant closures. I rewrote the code for 0.5 to just use regular anonymous functions since they are fast now.
  • ForwardDiff, splines (Dierckx mostly)
  • Large custom immutable types which carry parameters. These get passed around. I have been considering just making all my parameters global constants rather than passing them around. It seems that this could get them inlined into my functions and save time. However, all my simple tests show global constants perform exactly the same as explicitly passing the parameters, so as long as this still holds in big codes this shouldn't matter.
  • A lot of nested function calls. I prefer to write lots of small functions instead of one big one.
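For reference, the 0.4-era Base.call trick and its 0.5 replacement amount to the same thing: since 0.5, a closure lowers to a callable struct automatically. A minimal sketch in modern syntax (the names here are illustrative, not from this project):

```julia
# Callable type: the modern spelling of the old Base.call trick.
struct ObjWrapper{P}
    p::P
end
(o::ObjWrapper)(x) = o.p * x + 1.0

# Plain closure: since 0.5 this compiles to an equivalent callable struct.
make_obj(p) = x -> p * x + 1.0

ow = ObjWrapper(2.0)
cl = make_obj(2.0)
ow(3.0) == cl(3.0)  # both evaluate to 7.0
```

Because both forms carry the captured parameter as a concretely typed field, they should optimize identically when the capture is type-stable.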

K leo

unread,
Sep 29, 2016, 00:44:04
to julia-users
I just discovered a few days ago that keyword arguments in functions are very slow, though I guess they are no slower in 0.5 than they were in 0.4.
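The usual workaround in that era was to keep keyword arguments out of hot loops by forwarding to a positional inner function; a minimal sketch (function names are made up for illustration):

```julia
# Keyword-argument surface: convenient at the API level, but keyword
# calls were not type-specialized in the 0.4/0.5 era.
solve(x; tol=1e-8, maxiter=100) = _solve(x, tol, maxiter)

# Positional inner function: the hot path, fully specialized on
# the concrete argument types.
function _solve(x, tol, maxiter)
    iters = 0
    while abs(x) > tol && iters < maxiter
        x /= 2
        iters += 1
    end
    return x, iters
end

solve(1.0)  # returns (x, iters) with abs(x) <= 1e-8
```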

Kristoffer Carlsson

unread,
Sep 29, 2016, 01:37:18
to julia-users
Look for Core.Box in @code_warntype. See https://github.com/JuliaLang/julia/issues/15276

Andrew

unread,
Sep 29, 2016, 14:15:35
to julia-users
I've used @code_warntype everywhere I can think to, and I've only found one Core.Box. The @code_warntype output looks like this:

Variables:
  #self#::#innerloop#3133{#bellman_obj}
  state::State{IdioState,AggState}
  EVspline::Dierckx.Spline1D
  model::Model{CRRA_Family,AggState}
  policy::PolicyFunctions{Array{Float64,6},Array{Int64,6}}
  OO::NW
  #3130::##3130#3134{State{IdioState,AggState},Dierckx.Spline1D,Model{CRRA_Family,AggState},PolicyFunctions{Array{Float64,6},Array{Int64,6}},NW,#bellman_obj}

Body:
  begin 
      #3130::##3130#3134{State{IdioState,AggState},Dierckx.Spline1D,Model{CRRA_Family,AggState},PolicyFunctions{Array{Float64,6},Array{Int64,6}},NW,#bellman_obj} = $(Expr(:new, ##3130#3134{State{IdioState,AggState},Dierckx.Spline1D,Model{CRRA_Family,AggState},PolicyFunctions{Array{Float64,6},Array{Int64,6}},NW,#bellman_obj}, :(state), :(EVspline), :(model), :(policy), :(OO), :((Core.getfield)(#self#,:bellman_obj)::#bellman_obj)))
      SSAValue(0) = #3130::##3130#3134{State{IdioState,AggState},Dierckx.Spline1D,Model{CRRA_Family,AggState},PolicyFunctions{Array{Float64,6},Array{Int64,6}},NW,#bellman_obj}
      (Core.setfield!)((Core.getfield)(#self#::#innerloop#3133{#bellman_obj},:obj)::CORE.BOX,:contents,SSAValue(0))::##3130#3134{State{IdioState,AggState},Dierckx.Spline1D,Model{CRRA_Family,AggState},PolicyFunctions{Array{Float64,6},Array{Int64,6}},NW,#bellman_obj}
      return SSAValue(0)
  end::##3130#3134{State{IdioState,AggState},Dierckx.Spline1D,Model{CRRA_Family,AggState},PolicyFunctions{Array{Float64,6},Array{Int64,6}},NW,#bellman_obj}


I put the CORE.BOX in all caps near the bottom.

I have no idea if this is actually a problem. The return type is stable. Also, this function appears in an outer loop.

What I noticed putting a @time in places is that in 0.5, occasionally calls to my nonlinear equation solver take a really long time, like here:

  0.069224 seconds (9.62 k allocations: 487.873 KB)
  0.000007 seconds (39 allocations: 1.922 KB)
  0.000006 seconds (29 allocations: 1.391 KB)
  0.000011 seconds (74 allocations: 3.781 KB)
  0.000009 seconds (54 allocations: 2.719 KB)
  0.000008 seconds (54 allocations: 2.719 KB)
  0.000008 seconds (49 allocations: 2.453 KB)
  0.000007 seconds (44 allocations: 2.188 KB)
  0.000007 seconds (44 allocations: 2.188 KB)
  0.000006 seconds (39 allocations: 1.922 KB)
  0.000007 seconds (39 allocations: 1.922 KB)
  0.000006 seconds (39 allocations: 1.922 KB)
  0.000005 seconds (34 allocations: 1.656 KB)
  0.000005 seconds (34 allocations: 1.656 KB)
  0.000004 seconds (29 allocations: 1.391 KB)
  0.000004 seconds (24 allocations: 1.125 KB)
  0.007399 seconds (248 allocations: 15.453 KB)
  0.000009 seconds (30 allocations: 1.594 KB)
  0.000004 seconds (25 allocations: 1.328 KB)
  0.000004 seconds (25 allocations: 1.328 KB)

  0.000010 seconds (70 allocations: 3.719 KB)
  0.072703 seconds (41.74 k allocations: 1.615 MB)

Andrew

unread,
Sep 29, 2016, 14:21:51
to julia-users
In the 0.4 version the above times are pretty consistent. I never observe calls with several thousand allocations. I wonder if compilation is occurring repeatedly.

This isn't terribly pressing for me since I'm not currently working on this project, but if there's an easy fix it would be useful for future work.

(sorry I didn't mean to post twice. For some reason hitting spacebar was interpreted as the post command?)

Tim Holy

unread,
Sep 29, 2016, 15:36:46
to julia...@googlegroups.com
No real clue about what's happening, but my immediate thought was that if your algorithm is iterative and uses some kind of threshold to decide convergence, then it seems possible that a change in the accuracy of some computation might lead to it getting "stuck" occasionally due to roundoff error. That's probably more likely to happen because of some kind of worsening rather than some improvement, but either is conceivable.

If that's even a possible explanation, I'd check for unusually-large numbers of iterations and then print some kind of convergence info.

Best,
--Tim
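One way to check this suggestion is to wrap the objective so evaluation counts can be compared across versions; a sketch with a toy solver (all names here are hypothetical):

```julia
# Counting wrapper: records how many times the objective is evaluated,
# so iteration counts can be compared between Julia versions.
mutable struct Counted{F}
    f::F
    n::Int
end
Counted(f) = Counted(f, 0)
(c::Counted)(x) = (c.n += 1; c.f(x))

# Toy bisection solver to exercise the wrapper.
function bisect(f, lo, hi; tol=1e-10)
    while hi - lo > tol
        mid = (lo + hi) / 2
        f(mid) < 0 ? (lo = mid) : (hi = mid)
    end
    return (lo + hi) / 2
end

obj = Counted(x -> x^2 - 2)
root = bisect(obj, 0.0, 2.0)
# obj.n now holds the number of objective evaluations
```

If `obj.n` differs between 0.4 and 0.5 on the same inputs, the convergence path changed; if it is identical, the slowdown is elsewhere.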

Andrew

unread,
Sep 29, 2016, 21:45:10
to julia-users
I checked, and my objective function is evaluated exactly as many times under 0.4 as it is under 0.5. The number of iterations must be the same.

I also looked at the times more precisely. For one particular function call in the code, I have:

0.4 with old code: 6.7s, 18.5M allocations
0.4 with 0.5-style code (regular anonymous functions): 11.6s, 141M allocations
0.5: 36.2s, 189M allocations

Surprisingly, 0.4 is still much faster even without the fast anonymous functions trick. It doesn't look like 0.5 is generating many more allocations than 0.4 on the same code, the time is just a lot slower.

Kristoffer Carlsson

unread,
Sep 30, 2016, 01:56:23
to julia-users
Perhaps you can run @profile together with ProfileView.jl and compare the two Julia versions.
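A minimal way to do that with the standard-library profiler (the hot function below is a stand-in for the real workload; ProfileView.jl renders the same collected data graphically):

```julia
using Profile

# Stand-in workload; substitute the real slow function.
function hot(n)
    s = 0.0
    for i in 1:n
        s += sqrt(i)
    end
    return s
end

hot(10)                    # warm up: compile before profiling
Profile.clear()
@profile hot(10_000_000)
Profile.print(maxdepth=8)  # text listing; compare hot frames across versions
```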

Mauro

unread,
Sep 30, 2016, 02:30:02
to julia...@googlegroups.com

Sounds like you're not far off a minimal working example. Post it and I'm sure it will be dissected in no time. (And an issue can be filed.)


Andrew

unread,
Oct 14, 2016, 13:51:15
to julia-users
I've found the main problem. I have a function which repeatedly accesses a 6-dimensional array in a loop. This function took no time in 0.4 but is very slow in 0.5. My problem looks very similar to issue 18774 (https://github.com/JuliaLang/julia/issues/18774).
Here's an example:

A3 = rand(10, 10, 10);
function test3(A, nx1, nx2, nx3)
  for i = 1:10_000_000
    A[nx1, nx2, nx3]
  end
end

A5 = rand(10, 10, 10, 10, 10);
function test5(A, nx1, nx2, nx3, nx4, nx5)
  for i = 1:10_000_000
    A[nx1, nx2, nx3, nx4, nx5]
  end
end

A6 = rand(10, 10, 10, 10, 10, 10);
function test6(A, nx1, nx2, nx3, nx4, nx5, nx6)
  for i = 1:10_000_000
    A[nx1, nx2, nx3, nx4, nx5, nx6]
  end
end
function test6_fast(A, nx1, nx2, nx3, nx4, nx5, nx6)
  Asize = size(A)
  for i = 1:10_000_000
    A[sub2ind(Asize, nx1, nx2, nx3, nx4, nx5, nx6 )]
  end
end
@time test3(A3, 1, 1, 1)
@time test5(A5, 1, 1, 1, 1, 1)
@time test6(A6, 1, 1, 1, 1, 1, 1)
@time test6_fast(A6, 1, 1, 1, 1, 1, 1)


test6 takes 0.01s in 0.4 but 15s in 0.5. Using a linear index fixes the problem.
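For readers on later versions: sub2ind was removed in Julia 1.0 in favor of LinearIndices, so the same linear-index workaround would now read something like this (a sketch, with a reduced loop count and an accumulator so the load isn't dead code):

```julia
A6 = rand(10, 10, 10, 10, 10, 10)

# Modern replacement for sub2ind(size(A), i...): a LinearIndices object
# maps a Cartesian index tuple to a single linear offset.
function test6_linear(A, nx1, nx2, nx3, nx4, nx5, nx6)
    L = LinearIndices(A)
    s = 0.0
    for i in 1:1_000_000
        s += A[L[nx1, nx2, nx3, nx4, nx5, nx6]]
    end
    return s
end

test6_linear(A6, 1, 1, 1, 1, 1, 1)
```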

Mauro

unread,
Oct 14, 2016, 13:57:17
to julia...@googlegroups.com
Good detective work, please post your code in the issue!
