Derivative free optimization


Gilberto Noronha

Nov 12, 2013, 5:11:29 PM
to julia...@googlegroups.com
Hi,

I was trying to experiment a bit with Julia and decided to implement the Nelder-Mead algorithm (http://en.wikipedia.org/wiki/Nelder%E2%80%93Mead_method).

I was quite disappointed when I saw that my implementation was 10 times slower than my Java implementation (and I am not even a Java programmer!).
So I decided to benchmark my Julia code against some alternatives. It seems to have almost the same performance as Optim (https://github.com/JuliaOpt/Optim.jl)
and to be at least three times slower than NLopt (https://github.com/JuliaOpt/NLopt.jl) when minimizing the Rosenbrock function with three decision variables.
But NLopt calls into C to do most of the heavy work, so that is not a fair comparison (it may seem odd that this does not lead to a bigger improvement, but that might be because of all the times C needs to call back into the Julia objective).

I did read the performance tips, and I am aware of things like splitting code into multiple functions and devectorizing loops.
My questions are:
1) Is there another Nelder-Mead implementation (in pure Julia) that I could benchmark against?
2) How is Julia's performance when using custom types? I ask because I created a custom type for my implementation.

Sorry for not posting code. My implementation is a bit messy and I am still ashamed of it. I am also not particularly attached to Nelder-Mead; I am mostly trying to get a sense of how performant Julia can be in my typical use cases.
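For reference, the Rosenbrock function mentioned above can be written in a few lines of Julia. This is the textbook generalized form (the exact variant benchmarked in the thread is not shown), with its global minimum of 0 at x = (1, 1, ..., 1):

```julia
# Generalized Rosenbrock function: a sum over consecutive pairs of variables.
# Global minimum is 0, attained at x = (1, 1, ..., 1).
function rosenbrock(x)
    s = 0.0
    for i = 1:length(x) - 1
        s += 100.0 * (x[i+1] - x[i]^2)^2 + (1.0 - x[i])^2
    end
    return s
end

rosenbrock([1.0, 1.0, 1.0])  # → 0.0
```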
   
   

Tim Holy

Nov 12, 2013, 10:21:31 PM
to julia...@googlegroups.com
Unless your implementations are algorithmically identical in Java and Julia
(right down to floating-point roundoff errors), comparing performance on
something like an optimization problem can be delicate. One might take 10
times as many steps as the other simply because of subtle differences in
convergence criteria, etc., in which case the performance comparison is
essentially meaningless. If this is a possibility, a more "microscopic" look
at specific aspects of the algorithm might be warranted.

I'd recommend some profiling:
http://docs.julialang.org/en/latest/stdlib/profile/
https://github.com/timholy/ProfileView.jl
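A minimal sketch of what that profiling might look like. The `workload` function here is just a placeholder for a Nelder-Mead run, not code from this thread (on current Julia you need `using Profile` first; in 2013 the Profile machinery lived in Base):

```julia
using Profile  # needed on current Julia; built into Base in 2013

# Placeholder workload standing in for an optimizer run.
function workload()
    s = 0.0
    for i = 1:10^6
        s += sin(i)
    end
    return s
end

workload()           # run once first so compilation time is not profiled
@profile workload()  # collect stack samples while the function runs
Profile.print()      # text report; ProfileView.view() shows it graphically
```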

Best,
--Tim

John Myles White

Nov 12, 2013, 10:27:24 PM
to julia...@googlegroups.com
I'd reiterate Tim's point that performance comparisons are especially difficult to make for optimization. In particular, are the algorithms doing the same number of iterations and reaching the same states along the way?

The Optim code often uses a much stricter convergence threshold than other systems I've worked with. We've done very little to optimize the defaults and are probably too conservative.

-- John

Gilberto Noronha

Nov 12, 2013, 11:58:33 PM
to julia...@googlegroups.com
Yes, the algorithm was identical and I did make sure that they were both taking the same number of iterations and going to the same place.

After I posted here, I was able to improve the performance significantly. The new code is still more than 3 times slower than Java but I learned some things:

1) Passing a slice to a function can be slow. At many points I had small functions taking a column of an array and doing some operations on it. By replacing the function call with inline code, I gained some performance.
2) copy and deepcopy also seem to be slow.

Regarding point 2, I wrote some example code:

function fillCols!(s, v, m)
  const n = size(s, 2)
  for j = 1:m
    for i = 1:n
      s[:, i] = deepcopy(v)
    end
  end
end

function fillCols2!(s, v, m)
  const n = size(s, 2)
  for j = 1:m
    for i = 1:n
      s[:, i] = copy(v)
    end
  end
end

function fillCols3!(s, v, m)
  const n = size(s, 2)
  const nrows = size(s, 1)
  for j = 1:m
    for i = 1:n
      for k = 1:nrows
        s[k, i] = v[k]
      end
    end
  end
end

function main()
  m = 10000
  n = 100
  v = linspace(1, 10, n)
  s = zeros(n, n)

  tic()
  fillCols!(s, v, m)
  toc()

  tic()
  fillCols2!(s, v, m)
  toc()

  tic()
  fillCols3!(s, v, m)
  toc()
end

This code is using three different ways to fill the columns of a matrix with a vector (does Julia have a built-in function for that?). On my machine, I get the following times:
main()
elapsed time: 0.686688603 seconds
elapsed time: 0.450131321 seconds
elapsed time: 0.17002956 seconds

So using a for loop is much faster than relying on copy and/or deepcopy. 
Am I doing something wrong?

John Myles White

Nov 13, 2013, 2:05:48 AM
to julia...@googlegroups.com
deepcopy() and copy() allocate new memory on every call, whereas your final for-loop does not. You might want to use copy! instead.

— John

John Lynch

Nov 13, 2013, 2:30:16 AM
to julia...@googlegroups.com
Here are three versions: starting with your fastest code, going up to John's copy!, and going down to one that uses a slice to fill one dimension.


function fillCols2!(s, v, m)
  const n = size(s, 2)
  for j = 1:m
    for i = 1:n
      copy!(s[:, i], v)
    end
  end
end

function fillCols3!(s, v, m)
  const n = size(s, 2)
  const nrows = size(s, 1)
  for j = 1:m, i = 1:n, k = 1:nrows
    s[k, i] = v[k]
  end
end

function fillCols4!(s, v, m)
  const n = size(s, 2)
  const nrows = size(s, 1)
  for j = 1:m
    for i = 1:n
      #for k = 1:nrows
      #  s[k, i] = v[k]
      #end
      s[:, i] = v
    end
  end
  return s
end

function main()
  m = 10000
  n = 100
  v = linspace(1, 10, n)
  s = zeros(Float64, (n, n))

  tic()
  fillCols2!(s, v, m)
  toc()

  tic()
  fillCols3!(s, v, m)
  toc()

  tic()
  fillCols4!(s, v, m)
  toc()
  s
end

main()

# your times:
#elapsed time: 0.686688603 seconds
#elapsed time: 0.450131321 seconds
#elapsed time: 0.17002956 seconds

# for my 3:
#elapsed time: 0.513350673 seconds  using copy!
#elapsed time: 0.129654345 seconds  your final one
#elapsed time: 0.089774852 seconds  final one but slice for k = 1:nrows
#  not entirely sure the last one is right but it should be


John Lynch

Nov 13, 2013, 3:07:48 AM
to julia...@googlegroups.com
Rereading the comment about functions slowing things down, I thought I'd try breaking your fastest option up into three functions. I've been reading the manual, and it seemed to recommend using more functions.

This is now my fillCols5!. It is actually marginally faster than the version with only the one function and inline code, so I suspect you may have resolved some speed issues while you moved from functions to inline code. Alternatively, it may help in some circumstances but not others.



function fillCols3!(s, v, m)
  const n = size(s, 2)
  const nrows = size(s, 1)
  for j = 1:m
    for i = 1:n
      for k = 1:nrows
        s[k, i] = v[k]
      end
    end
  end
end

function fillCols4!(s, v, m)
  const n = size(s, 2)
  const nrows = size(s, 1)
  for j = 1:m
    for i = 1:n
      #for k = 1:nrows
      #  s[k, i] = v[k]
      #end
      s[:, i] = v
    end
  end
  return s
end

f(s, v, i, nrows) = for k = 1:nrows
  s[k, i] = v[k]
end

g(s, v, n, nrows) = for i = 1:n
  f(s, v, i, nrows)
end

function fillCols5!(s, v, m)
  const n = size(s, 2)
  const nrows = size(s, 1)
  for j = 1:m
    g(s, v, n, nrows)
  end
end

function main()
  m = 10000
  n = 100
  v = linspace(1, 10, n)
  s = zeros((n, n))

  tic()
  fillCols3!(s, v, m)
  toc()

  tic()
  fillCols4!(s, v, m)
  toc()

  tic()
  fillCols5!(s, v, m)
  toc()
end

main()

#elapsed time: 0.142128951 seconds  standard
#elapsed time: 0.093676331 seconds  slice
#elapsed time: 0.132192888 seconds  functions split


Stefan Karpinski

Nov 13, 2013, 7:22:41 PM
to Julia Users
Regarding your original question about using custom types, that depends on how your type is declared and used. Julia's immutable types can generally provide zero-overhead abstractions. Mutable types often introduce some overhead, but it shouldn't be too bad. It seems that most of the overhead here is coming from allocating and copying. Am I correct in guessing that the Java version doesn't do much copying? Also, why would you ever need to deepcopy anything in an optimization algorithm?
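For what it's worth, a tiny sketch of the zero-overhead case Stefan describes, in the `immutable` syntax of this era (modern Julia spells it `struct`; the `Point` type and `sqnorm` function are illustrative, not from the thread):

```julia
# An immutable type: instances can be stored inline, with no heap
# allocation per instance.
immutable Point
    x::Float64
    y::Float64
end

# Operations on it compile down to the same code as on two bare Float64s.
sqnorm(p::Point) = p.x * p.x + p.y * p.y

sqnorm(Point(3.0, 4.0))  # → 25.0
```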

Gilberto Noronha

Nov 13, 2013, 7:55:37 PM
to julia...@googlegroups.com
I did not have time to try John's suggestions before (thanks!).

I tried John's approach with the multiple loops and it takes about the same time as my calculation with a loop (if I run it multiple times, it can be a bit faster or slower but it seems to be noise).

The slice method is the fastest for me too. I did not try it before because I somehow thought that it did not deepcopy.

copy! seems to be almost as slow as copy.

Stefan, to answer your question, I need deepcopies because the algorithm works like this:
1) The user provides a guess.
2) I use that guess to construct a "simplex" of n + 1 points, where n is the number of decision variables. Typically the first point is the guess, and the other ones are simple variations of it.
Of course I could make the algorithm modify the provided guess, but that was not done in the Java version, so it would no longer be an apples-to-apples comparison.
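The construction in steps 1)-2) might be sketched like this. The `initial_simplex` name and the fixed perturbation are illustrative choices, not taken from the thread's implementation; the point is that one shallow copy per vertex suffices and the caller's guess is never mutated:

```julia
# Build an initial simplex of n + 1 vertices from a user-supplied guess.
# The guess is only read, never mutated; each vertex gets its own copy.
function initial_simplex(guess, step)
    simplex = Vector{Float64}[]
    push!(simplex, copy(guess))      # vertex 1: a copy of the guess itself
    for i = 1:length(guess)
        p = copy(guess)              # one shallow copy per vertex is enough
        p[i] += step                 # vary a single coordinate
        push!(simplex, p)
    end
    return simplex
end

simplex = initial_simplex([0.0, 0.0, 0.0], 0.1)
# 4 vertices for 3 decision variables; the caller's guess is untouched
```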

In my line of research I often run into problems where I have to maximize/minimize the same function thousands, if not millions, of times, so that initial copy ends up making a real difference.
But it is not a big vector: in the Rosenbrock case there were only three decision variables, so the vector had 3 elements (in my research I rarely deal with more than 10 decision variables).
In C++ I could just put that small array on the stack, and everything would be copied in a heartbeat without any memory allocation (incidentally, that is part of the reason the GSL's implementation of multidimensional minimization is inefficient when you are dealing with few decision variables: they malloc everything).
My guess is that the Java Virtual Machine was being smart and doing something like that for me. Julia apparently was not, so I need to be more careful.

I asked about custom types because I got the impression (maybe I am wrong?) that you guys (the Julia developers) are much more focused on numerical work (i.e., doubles) and have not spent as much time optimizing abstractions, but that is just a guess; I did not measure anything.

Iain Dunning

Nov 13, 2013, 10:03:29 PM
to julia...@googlegroups.com
Could you post your code, maybe in a gist (https://gist.github.com/)? It doesn't matter if it's clean or not, we won't judge!

Gunnar Farnebäck

Nov 14, 2013, 2:02:01 PM
to julia...@googlegroups.com
On Thursday, November 14, 2013 at 01:55:37 UTC+1, Gilberto Noronha wrote:
copy! seems to be almost as slow as copy.

If you're talking about looping around

copy!(s[:,i], v)

it's both slow and ineffective. What can be expected to happen is that s[:, i] allocates a temporary vector and copies elements from s into it. Then copy! copies the elements of v into that temporary, which afterwards goes out of scope and is garbage collected at some point. Some of this may be optimized away, but in any case s is unaffected.

There have been discussions about changing slicing so that it produces a reference to the original data instead of a copy, in which case the code above would both work and be potentially fast. In the meantime the fastest solution is probably to employ unsafe_copy! or unsafe_view, the latter being available from the NumericExtensions package. Both are as dangerous as their names indicate.

By the way, s[:,i] = v works differently since left hand side indexing calls on setindex! instead of getindex.
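A tiny demonstration of that difference, under the copying-slice semantics Gunnar describes: copy! fills a temporary that is immediately discarded, while left-hand-side indexing writes into s via setindex!.

```julia
s = zeros(3, 2)
v = [1.0, 2.0, 3.0]

copy!(s[:, 1], v)   # s[:, 1] *reads out* a fresh copy; copy! fills that copy
@assert s[:, 1] == [0.0, 0.0, 0.0]   # ...so s itself is untouched

s[:, 1] = v         # left-hand-side indexing calls setindex! on s directly
@assert s[:, 1] == v                 # now the column really changed
```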

Gilberto Noronha

Nov 14, 2013, 3:33:13 PM
to julia...@googlegroups.com
I might post my code on a repo after I have some time to clean it up.

But, like I said before, this wasn't meant to be about the Nelder-Mead algorithm. And it is also not about making Julia as fast as Java (I actually hate Java). I was just experimenting with the language.

Is there a good reason why slices make copies instead of pointing to the original data? This behavior seems counter intuitive to me. And what is the best way to pass a subarray to a function with copying? 

Thanks

Gilberto Noronha

Nov 14, 2013, 3:34:26 PM
to julia...@googlegroups.com
Sorry, the question was supposed to be:
"What is the best way to pass a subarray to a function without copying?" 

Tim Holy

Nov 14, 2013, 4:21:20 PM
to julia...@googlegroups.com
On Thursday, November 14, 2013 12:34:26 PM Gilberto Noronha wrote:
> Sorry, the question was supposed to be:
> "What is the best way to pass a subarray to a function without copying?"

You can find information about SubArrays in the help, and probably quite a bit of discussion in the mailing list archives.

Main points:
- you create them with sub() and slice().
- although it's gotten a lot better, performance is not always what you might wish; 1d slices should not be a problem.
- there are many plans to make SubArrays better.

--Tim
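A small era-appropriate sketch of Tim's first point: sub() builds a SubArray that references the parent's data, so the column is passed without copying. (The `colsum` helper is illustrative; on modern Julia, sub() and slice() were later replaced by `view(s, :, i)` / `@view s[:, i]`.)

```julia
# Walks any indexable column without caring how it is stored.
function colsum(col)
    t = 0.0
    for k = 1:length(col)
        t += col[k]
    end
    return t
end

s = [1.0 4.0; 2.0 5.0; 3.0 6.0]
c = sub(s, :, 2)   # a view of the second column; no copy of its data is made
colsum(c)          # → 15.0
```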

Robert Feldt

Nov 26, 2013, 9:31:26 AM
to julia...@googlegroups.com
Hi Gilberto,

Not sure this is relevant but for benchmarking/comparing optimizers the "classic" paper is by Dolan and Moré:

http://arxiv.org/pdf/cs/0102001.pdf

and I will implement such performance profiles in BlackBoxOptim.jl as the main way to compare/evaluate different optimization algorithms:


Currently I have not focused on performance at all, though; I am just adding optimizers at this point.

Cheers,

Robert Feldt