
Difference between fminunc and fminsearch


Jack

Jun 18, 2006, 12:24:54 PM
Does anyone know the main difference between fminunc and
fminsearch? Both find the minimum of a multivariate function.
Thanks in advance

John D'Errico

Jun 18, 2006, 3:13:57 PM
The main difference?

The spelling of the names? ;-)

The fact that they use very different
algorithms?

You might take a read through this doc:

<http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=8553&objectType=FILE>

Section 21 talks a bit about the differences.
I tried to do a better job in the most recent
copy, which is not up there yet. So here is the
new text for that section. Note that fminunc
offers two different algorithms as options:
the large-scale one and the older medium-scale
one.

John

%% 22. Understanding how they work
%{
While I will not even try to give a full description of how any
optimizer works, a little understanding is worth a tremendous
amount when there are problems. True understanding can only come
from studying an algorithm in some depth. I can't offer that
here.

Instead, I'll try to show the broad differences between some of
the methods, and suggest when one might prefer one method over
another.

Previously in this text I've referred to optimizers as tools
that operate on a black box, or compared them to a blindfolded
individual on a hillside. These are useful analogies, but they
don't really tell you enough to visualize how the tools work.
We'll start with the method used in fminsearch, since it may
well be the most frequently used of these tools.

Fminsearch is a Nelder/Mead polytope algorithm. Some call it a
simplex method, but this may confuse things with linear
programming. It works by evaluating the objective function over
a polytope of points in the parameter space. In two dimensions
this will be a triangle; in n dimensions the polytope (or
simplex) is composed of n+1 points. The basic algorithm is
simple. Compare these n+1 points, and choose the WORST one to
delete. Replace this bad point with its reflection through the
centroid of the remaining points in the polytope. You can
visualize this method as flopping a triangle around the
parameter space until it finds the optimum. Where appropriate,
the polytope can shrink or grow in size.

I'll note that this basic Nelder/Mead code is simple, it
requires no gradient information at all, and can even survive
an occasional slope or function discontinuity. The downside of
a polytope method is how slowly it converges, especially on
higher dimensional problems. I will happily use fminsearch for
2 or 3 variable problems, especially if it's something I will
need only infrequently. I will rarely if ever consider
fminsearch for more than 5 or 6 variables. (Feel free to
disagree. You may have more patience than I do.)

The other point to remember about a polytope method is the stopping
criterion. These methods are generally allowed to terminate when
all the function values over the polytope have the same value
within the function tolerance. (I have also seen the standard
deviation of the function values over the polytope used to compare
to the function tolerance.)

We can contrast the polytope algorithm to the more traditional
Newton-like family of methods, combined with a line search.
These methods include many of the optimizers in the
optimization toolbox, such as fminunc, lsqcurvefit, fsolve,
etc. Whereas a polytope method lays down a polytope on the
objective function surface, then moves the polytope around,
these line-search methods attempt to be more intelligent in
their operation. The basic idea is to form a locally linear
approximation to the problem at hand, then solve the linearized
problem exactly. This determines a direction to search along,
plus a predicted step length in that direction from your
current point in the parameter space. (I already discussed line
searches in section 9.) This is why these tools require a
gradient (or Jacobian as appropriate) of the objective. This
derivative information is used to compute the necessary locally
linear approximation.

The other feature of interest about these methods is how they
"learn" about the surface in question. The very first iteration
taken might typically be a steepest descent step. After the
first iteration, the optimizer can gradually form an adaptive
approximation to the local Hessian matrix of the objective.
This, in effect, tells the optimizer what the local curvature
of the surface looks like.

%}
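The reflect-the-worst-vertex idea described above can be sketched
in a few lines. This is an illustration only, not fminsearch's
actual implementation, which also includes expansion, contraction,
and more careful shrink logic:

```matlab
% Toy reflection-only polytope search in 2-d. Reflect the worst
% vertex through the centroid of the others; if that fails to
% improve, shrink toward the best vertex.
f = @(X) (1-X(1)).^2 + 105*(X(2)-X(1).^2).^2;
V = [-6 4; -5.5 4; -6 4.5];        % n+1 = 3 vertices in 2-d
for iter = 1:500
  fv = [f(V(1,:)); f(V(2,:)); f(V(3,:))];
  [fv,order] = sort(fv);           % best vertex first, worst last
  V = V(order,:);
  c = mean(V(1:2,:),1);            % centroid of the non-worst vertices
  xr = 2*c - V(3,:);               % reflect the worst vertex
  if f(xr) < fv(3)
    V(3,:) = xr;                   % accept the reflection
  else
    V(2,:) = (V(2,:)+V(1,:))/2;    % shrink toward the best vertex
    V(3,:) = (V(3,:)+V(1,:))/2;
  end
end
V(1,:)           % best vertex found; the true minimum is at [1 1]
```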

%%

% As a comparison, we will look again at the Rosenbrock function.
rosen = @(X) (1-X(:,1)).^2 + 105*(X(:,2)-X(:,1).^2).^2;
% first, generate a contour surface
[xc,yc] = meshgrid(-7:.05:7);
zc = rosen([xc(:),yc(:)]);
zc = reshape(zc,size(xc));

close all
contour(xc,yc,zc,[.1 1 4 16 64 256 1024 4096])
hold on

% now do an optimization using optimplot to plot the points
opts = optimset('fminsearch');
opts.OutputFcn = @optimplot;
opts.Display = 'iter';

Xfinal = fminsearch(rosen,[-6,4],opts);
hold off

%%

% try the same optimization using fminunc

close
contour(xc,yc,zc,[.1 1 4 16 64 256 1024 4096])
hold on

% now do an optimization using optimplot to plot the points
opts = optimset('fminunc');
opts.OutputFcn = @optimplot;
opts.Display = 'iter';

Xfinal = fminunc(rosen,[-6,4],opts);
hold off

% Note the difference between these two optimizers. It's most
% obvious at the start, when fminsearch began with a relatively
% large simplex, which flopped down to the valley more slowly
% than fminunc did. Fminsearch also clearly overshoots the
% valley at first, then recovers. Also note that when it did
% hit the bottom of the valley, fminunc was able to take at
% least a few large initial steps, until the valley became so
% curved and so flat that it too was forced into short steps.
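The "learning" described above can be sketched as a bare-bones
quasi-Newton (BFGS-style) iteration: with H initialized to the
identity, the first step is steepest descent, and each rank-two
update afterward accumulates curvature information. This is an
illustration only, not the algorithm fminunc actually uses:

```matlab
% Bare-bones BFGS with a finite-difference gradient and a
% backtracking line search (illustration only; fminunc is far
% more sophisticated and robust).
f = @(X) (1-X(1)).^2 + 105*(X(2)-X(1).^2).^2;
grad = @(X) [f(X+[1e-6 0])-f(X-[1e-6 0]); ...
             f(X+[0 1e-6])-f(X-[0 1e-6])]/2e-6;
x = [-6 4]'; H = eye(2);     % H approximates the inverse Hessian
g = grad(x');
for iter = 1:100
  d = -H*g;                  % first step: H = I, so steepest descent
  t = 1;                     % backtracking (Armijo) line search
  while t > 1e-12 && f((x+t*d)') > f(x') + 1e-4*t*(g'*d)
    t = t/2;
  end
  s = t*d; xnew = x + s;
  gnew = grad(xnew'); y = gnew - g;
  if y'*s > 0                % BFGS rank-two update of inverse Hessian
    rho = 1/(y'*s);
    H = (eye(2)-rho*s*y')*H*(eye(2)-rho*y*s') + rho*(s*s');
  end
  x = xnew; g = gnew;
end
x'                           % moves toward the minimizer at [1 1]
```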

Anne

Jun 20, 2006, 9:49:32 AM
Hi John!

A very long but very complete answer! I want to optimize a model
without any particular constraints, so I tried to do it with
fminsearch. But it doesn't work. I copied your example "rosen",
but an error message appears at this line:


rosen = @(X) (1-X(:,1)).^2 + 105*(X(:,2)-X(:,1).^2).^2;

It concerns @(X): it doesn't accept '(' after '@'. Do you know
why and what I have to do? My question is certainly a little bit
stupid, but the same syntax is used in the Matlab documentation
(it doesn't work there either), and it is blocking my research.
Thank you very much for your answer (if you have understood my
problem)
Anne

Randy Poe

Jun 20, 2006, 10:24:11 AM

This is an anonymous function, introduced in Matlab 7. Do you
know what version of Matlab you are running?

Anonymous functions:
http://www.mathworks.com/access/helpdesk/help/techdoc/matlab_prog/f4-70115.html

Type the command VER to get your version.

- Randy

Anne

Jun 20, 2006, 10:34:17 AM
Hello Randy!
Thank you for your answer! I have only Matlab 6.5 ... So I think
I have to create functions in other .m files and call them. As
we say in French: "c'est pas gagné!" ("it's not won yet!")
Anne

John D'Errico

Jun 20, 2006, 2:37:29 PM

Sorry. I tend to forget that many people
live with an older release of Matlab, from
before anonymous functions became available.

Use an inline function instead:

rosen = inline('(1-X(:,1)).^2 + 105*(X(:,2)-X(:,1).^2).^2','X');

John

karlo

Jun 20, 2006, 4:54:24 PM
Hi,
thanks for your complete explanation!

I ran your code in Matlab 7 and I got the following error

??? Undefined command/function 'optimplot'.

what's that?

John D'Errico

Jun 20, 2006, 7:33:55 PM
karlo wrote:
>
>
> Hi,
> thanks for your complete explanation!
>
> I ran your code in Matlab 7 and I got the following error
>
> ??? Undefined command/function 'optimplot'.
>
> what's that?

A function that I wrote to plot the
trajectories of the optimizers. It
was to be included with the tool.
Here it is.

Sorry. (A sheepish look is on my face.)

John

function stop = optimplot(x, optimValues, state)
% plots the current point of a 2-d optimization
stop = false;
hold on
plot(x(1),x(2),'.')
drawnow

Anne

Jun 21, 2006, 5:28:20 AM
Hello!
Thank you for your response, John! I tried with inline and it
worked! So I have used it to solve my own optimization problem.
I want to find m, c, k (which I have put in a vector mck for the
optimization), so that the movement of the model (which uses m,
c, k) and the movement of the system (a vector of 20001 numbers,
a function of time) are as close as possible. But I think time
and the movement of the model are treated as variables! So it
doesn't work, and it's not what I want!
Does someone know how I can optimize my function over a large
time domain?
Thank you very much for every suggestion.
(I hope I have been understandable!)
Anne

John D'Errico

Jun 21, 2006, 8:36:53 AM

I'm not sure. It's early in the morning
for me. Not that I'm very sharp in the
afternoon either. ;-)

So you have three scalars (m,c,k)
(let me guess, magenta, cyan & black?)
that you wish to vary to optimize the
system.

The function is a vector of 20001 elements,
that is a function of time, varying
essentially over 0:20000. The shape of this
functional form depends upon the values
of (m,c,k)?

Let me also guess that you have an aim
for this function?

Please correct me where I'm mistaken
before we go any further.

John

Anne

Jun 21, 2006, 9:22:10 AM
Good morning John!

Here it's 15 o'clock!

> I'm not sure. It's early in the morning
> for me. Not that I'm very sharp in the
> afternoon either. ;-)
>
> So you have three scalars (m,c,k)
> (let me guess, magenta, cyan & black?)
> that you wish to vary to optimize the
> system.

That's right, except for the significance of the letters: these
are the letters generally used in France to represent the
variables of a dynamic system: mass, damping and spring rate!

> The function is a vector of 20001 elements,
> that is a function of time, varying
> essentially over 0:20000. The shape of this
> functional form depends upon the values
> of (m,c,k)?

Yes it is!

> Let me also guess that you have an aim
> for this function?

I have a three degrees of freedom system (20001 known
time-dependent values) and I want to identify it with a one
degree of freedom model, then two, three and four degrees of
freedom models; at first I begin with the simplest.
Identification consists in finding m, c, k such that the
difference between the movement of the system and that of the
model is minimal. So I try to find m, c, k by optimizing the
difference between these two signals.

But inline accepts only one variable, mck, and doesn't account
for the fact that I want to optimize over a large domain of
time.

> Please correct me where I'm mistaken
> before we go any further.
>
> John

I think you see what I am trying to do, so let me know your
suggestion!
Thank you for being so nice!
Anne
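One common pattern for this kind of identification problem (a
sketch using an invented toy 1-dof damped oscillator, not Anne's
actual system) is to put the objective in its own .m file and
keep the time vector and the measured signal fixed by passing
them as extra arguments after the options; fminsearch in Matlab
6.5 forwards these extra arguments to the objective, which
returns a single scalar: the sum of squared differences over the
whole time domain.

```matlab
% mckfit.m -- scalar misfit between a toy 1-dof damped oscillator
% model and measured data (the model here is invented for
% illustration; a real system will differ).
function err = mckfit(mck, t, ydata)
m = mck(1); c = mck(2); k = mck(3);
zeta = c/(2*sqrt(k*m));               % damping ratio
wd = sqrt(k/m)*sqrt(1 - zeta^2);      % damped natural frequency
ymodel = exp(-c/(2*m)*t).*cos(wd*t);  % free response, unit initial displacement
err = sum(abs(ymodel - ydata).^2);    % abs guards against a complex
                                      % response if the search wanders
                                      % into overdamping (zeta > 1)

% At the command line, t and ydata stay fixed; only mck is varied:
%   t = linspace(0,20,20001);
%   ydata = exp(-0.05*t).*cos(0.99*t);     % "measured" signal
%   mck = fminsearch('mckfit',[2 0.3 3],optimset('fminsearch'),t,ydata)
```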
