I did not have time to try John's suggestions before (thanks!).
I tried John's approach with the multiple loops, and it takes about the same time as my calculation with a loop (if I run it multiple times, it can be a bit faster or slower, but that seems to be noise).
The slice method is the fastest for me too. I did not try it before because I somehow thought that it did not deepcopy.
copy! seems to be almost as slow as copy.
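For reference, here is roughly the kind of micro-benchmark I have in mind (the names and iteration count are illustrative, and exact timings will of course depend on the machine and Julia version):

function copy_loop!(dest, src)
    @inbounds for i in eachindex(src)
        dest[i] = src[i]        # explicit element-by-element copy
    end
    return dest
end

src  = rand(3)
dest = similar(src)

@time for _ in 1:10^6; copy_loop!(dest, src); end   # manual loop, no allocation
@time for _ in 1:10^6; v = src[:]; end              # slicing allocates a fresh vector
@time for _ in 1:10^6; v = copy(src); end           # copy also allocates a fresh vector
@time for _ in 1:10^6; copy!(dest, src); end        # in-place copy (copyto! is the newer name)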
Stefan, to answer your question, I need deepcopies because the algorithm works like this:
1) The user provides a guess.
2) I use that guess to construct a "simplex" of n + 1 points, where n is the number of decision variables. Typically the first point is the guess, and the other ones are simple variations of the guess (a rough sketch is below).
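For concreteness, a minimal sketch of step 2, assuming a Nelder-Mead-style setup; the function name and the perturbation rule (adding a fixed step to one coordinate) are illustrative, not exactly what my code does. Note that for a flat Vector{Float64}, copy is enough; deepcopy would only matter if the elements themselves were mutable.

function build_simplex(guess::Vector{Float64}; step = 0.1)
    n = length(guess)
    simplex = Vector{Vector{Float64}}(undef, n + 1)
    simplex[1] = copy(guess)        # first vertex is a copy of the guess
    for i in 1:n
        vertex = copy(guess)        # each remaining vertex needs its own copy...
        vertex[i] += step           # ...perturbed along one coordinate
        simplex[i + 1] = vertex
    end
    return simplex
end

build_simplex([1.0, 1.0, 1.0])      # n = 3 gives 4 vertices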
Of course I could make the algorithm modify the provided guess, but that was not done in Java, so it would not be an apples-to-apples comparison anymore.
In my line of research, I often stumble upon problems where I have to maximize/minimize the same function thousands, if not millions, of times, so that initial copy ends up making a real difference.
But it is not a big vector. In the Rosenbrock case, there were only three decision variables, so the vector had 3 elements (in my research I rarely deal with more than 10 decision variables).
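For reference, the objective is along these lines (a generalized Rosenbrock; my exact test function may differ slightly), so the argument is just a 3-element vector and the per-call copy is tiny, but it happens a huge number of times:

function rosenbrock(x::AbstractVector)
    s = 0.0
    for i in 1:length(x) - 1
        s += 100.0 * (x[i + 1] - x[i]^2)^2 + (1.0 - x[i])^2
    end
    return s
end

rosenbrock([1.0, 1.0, 1.0])   # 0.0 at the global minimum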
In C++, I could just put that small array on the stack, and everything would be copied in a heartbeat, without any memory allocation (by the way, that is part of the reason GSL's implementation of multidimensional minimization is inefficient when you are dealing with few decision variables: they malloc everything).
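For what it is worth, the closest Julia analogue I can think of (just an illustration, not something I benchmarked) is an immutable tuple: a tuple of 3 Float64s is a plain bits value, so "copying" it is just copying the value and never touches the heap, unlike copying a Vector{Float64}.

point = (1.0, 2.0, 3.0)      # NTuple{3,Float64}, an isbits value
copy_of_point = point        # a plain value copy; no heap allocation
isbits(point)                # true: can live on the stack / in registers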
My guess is that the Java Virtual Machine was being smart and doing something like that for me. But Julia apparently was not, so I need to be more careful.
I asked about custom types because I got the impression (maybe I am wrong?) that you guys (Julia developers) are much more focused on numerical stuff (i.e., doubles) and have not spent as much time optimizing abstractions, but that is just a guess; I did not measure anything.