I have a model where I am using fmincon, and I am able to get the right solution. However, reaching the right answer is highly dependent on my starting guess. I would like to calibrate my model to real data, and in this case I don't know what to use as a starting guess. So I am looking to increase the stability of the optimization, so that even starting guesses far away from the real solution will produce the right answer.
I am therefore thinking of including analytical gradients and Hessians for my objective function and constraints - is this the right thing to do? And what considerations should go into choosing an algorithm?
Furthermore, I have tried looking at the examples on how to include gradients and Hessians at:
However, I haven't completely understood it. I read it as though I should write my constraint like this:
function [c,ceq,gradc,gradceq]=MyConstraint(x)
c=[];
ceq=x^2-5;
gradc=[];
gradceq=jacobian(ceq,x);
end
However, in this case x isn't a symbolic variable, right? So how can jacobian calculate the gradient? If this is wrong, I guess I would need to declare new temporary symbolic variables, then use 'jacobian', and lastly substitute my parameter values for the symbolic variables. Is this correct?
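If symbolic variables aren't the way to go, my best guess from the documentation is that the derivatives are simply meant to be coded by hand as ordinary numeric expressions, something like this (only a sketch of how I read it, using the toy constraint above):
function [c,ceq,gradc,gradceq]=MyConstraint(x)
% Toy equality constraint x^2 - 5 = 0 with a hand-coded derivative.
c=[];              % no nonlinear inequality constraints
ceq=x^2-5;         % evaluated numerically, no symbolic variables needed
if nargout > 2     % fmincon only asks for these when the constraint-gradient option is on
    gradc=[];      % no inequality constraint gradients
    gradceq=2*x;   % d(ceq)/dx written out by hand, one column per equality constraint
end
end
Is that the intended way to do it?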
Thanks!!
Mads
No. Derivatives like the Hessian and gradient give you local information only, no matter how you compute them. They can't help direct you to a global solution.
You should describe your problem to us so that we might make recommendations on how to come up with a good initial guess.
You are likely wasting your time.
Providing analytical derivatives usually gains little in terms of robustness to bad starting values. The numerical gradient is generally not the problem here; it is the bad starting values!
Sometimes providing the derivatives will help, but the gain may be only in terms of speed. Even that is not always a given, since those derivatives are often more complex to compute.
Again, the problem is in your bad starting values.
John
Then I won't waste time on that.
I have tried doing a surface plot of the function value against the two variables that I am minimizing over, and the plot doesn't show multiple minima - at least as far as I can see. However, it does look like the gradients are very flat near the minimum. Could this be why it finds other solutions, and how do I cope with this? I would like to post the figure of the plot, but I don't know how.
Thanks
Mads
I don't know what happened, but somehow, your replies disappeared...
But I understood that I shouldn't waste time on including the hessian and gradients. Thanks.
The gradient should be flat (i.e. close to zero) near the minimum. Do you mean that it is also flat far away, i.e. is there a long trench in the surface plot? If so, you have a poorly scaled problem, and computing the analytical Hessian should indeed help.
If there are no multiple minima in the graph of the function, then the algorithm is not finding other solutions - it is not finding a solution at all. The algorithm is simply not managing to converge quickly enough within the number of iterations that you are running. Using the analytical Hessian should accelerate it, although it might be better if you scaled your variables more appropriately.
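For example (just a rough sketch with made-up scale factors and hypothetical function names), you can let fmincon work in rescaled variables and undo the scaling inside the objective and constraints:
% Rough sketch: optimize over scaled variables y, with x = s.*y, so that
% the components of y vary on comparable scales near the solution.
s   = [1e3; 1e-2];                  % made-up scale factors, one per variable
obj = @(y) myObjective(s.*y);       % myObjective is your original objective
con = @(y) myConstraint(s.*y);      % same change of variables in the constraints
y0  = x0./s;                        % rescale the starting guess as well
y   = fmincon(obj, y0, [], [], [], [], [], [], con, options);
x   = s.*y;                         % map the answer back to the original units
(If you also supply hand-coded gradients, remember they pick up a factor of s from the chain rule.)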
> The gradient should be flat (i.e. close to zero) near the minimum. Do you mean that it is also flat far away, i.e. is there a long trench in the surface plot? If so, you have a poorly scaled problem, and computing the analytical Hessian should indeed help.
That's a good point.
> If there are no multiple minima in the graph of the function, then the algorithm is not finding other solutions - it is not finding a solution at all. The algorithm is simply not managing to converge quickly enough within the number of iterations that you are running. Using the analytical Hessian should accelerate it, although it might be better if you scaled your variables more appropriately.
Theoretically, the minimization algorithm could use the Hessian and gradient to fully compensate for bad scaling. But we have no idea how the algorithm is implemented in detail, so we cannot make that assumption.
Bruno
It is exactly one long trench. Should I only include the hessian or also the gradients? And is it the hessian of both the objective function and constraints?
In scaling the variables, should they be of the same magnitude as the objective function?
Thanks,
Mads
The gradients, too.
> And is it the hessian of both the objective function and constraints?
yes.
> In scaling the variables, should they be of the same magnitude as the objective function?
There's no science to this. Just make sure you aren't measuring some variables in millimeters and others in kilometers. Ideally, they would be in units such that the Hessian eigenvalues are all close to 1 near the solution.
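One rough way to check this after the fact (a sketch; obj, x0, nonlcon and options stand in for your own problem, and the Hessian fmincon returns is only an approximation for some algorithms):
% Sketch: inspect the Hessian fmincon returns at the solution to judge
% the scaling. Eigenvalues spread over many orders of magnitude mean the
% variables are still badly scaled (a long, flat trench).
[x, fval, exitflag, output, lambda, grad, H] = ...
    fmincon(obj, x0, [], [], [], [], [], [], nonlcon, options);
eig(H)     % ideally all of comparable size (roughly 1) after rescaling
cond(H)    % a very large condition number indicates poor scaling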
Ideally, their *variation* should be comparable. The Hessian *after rescaling* (at the minimum) should be close to an orthogonal matrix.
Bruno
> Ideally, their *variation* should be comparable. The Hessian *after rescaling* (at the minimum) should be close to an orthogonal matrix.
---
Since the Hessian must also be symmetric, does this not imply that it is the identity matrix?
> Since the Hessian must also be symmetric, does this not imply that it is the identity matrix?
Not necessarily.
Bruno
It also has to be positive semidefinite.
I'm pretty sure an orthogonal p.s.d. matrix has to be the identity. Since it is orthogonal, all of its eigenvalues are of magnitude 1. Since it is p.s.d., all the eigenvalues must also be real and nonnegative, and hence equal to 1. Since it is symmetric with all eigenvalues equal to 1, it must have the eigendecomposition
Hessian = R'*I*R = I;   % R is orthogonal
Anyway, this of course assumes the minimum is in the interior of the constrained region....
> I'm pretty sure an orthogonal p.s.d. matrix has to be the identity...
Yes, you are actually right.
Bruno
I will try this!
Thank you all!
I have now considered including the analytical expressions for the Hessian and gradient, but I have abandoned this, since they are extremely large expressions and take too long to calculate.
Instead I have tried to change the scale of the problem, and this has helped a lot. Now my program finds the right solution for a wider range of starting guesses. However, I am fumbling in the dark here. I have no idea which variables, and in what way, I should change them. Is there any guidance on how to deal with poor scaling? Is there perhaps somewhere I can read about this?
Thanks!!
Mads
Like I said, you ideally want to scale the variables so that the Hessian at the minimum becomes similar to the identity matrix.
Because one doesn't know a priori where the minimum is, and because Hessian computations are often difficult, ad hoc approximations are often made. For example, some people approximate the Hessian as a diagonal matrix, i.e. by computing the diagonal entries only. Likewise, instead of computing it at the a priori unknown minimum, some people compute it at the initial guess, which has to be pretty good anyway if you're dealing with functions having multiple local minima.
In short, there are a variety of techniques, all of them ad hoc. Most books on optimization will discuss them to some degree.
Aside from this, note also that fmincon allows you to supply whatever approximation to the Hessian that you like. For example, if you think a diagonal approximation will be good enough (and sufficiently cheaper than computing the whole matrix) you can pass that instead.
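In newer MATLAB releases, for example, that might look roughly like this (a sketch; hessdiag, obj, x0 and nonlcon are placeholders for your own quantities, and note that the interior-point Hessian interface also expects the analytic gradients to be supplied):
% Sketch: pass a cheap diagonal Hessian approximation to fmincon.
% hessdiag is a hypothetical vector of second-derivative estimates;
% constraint curvature is simply ignored here, a further approximation.
hessfcn = @(x, lambda) diag(hessdiag);     % approximate Hessian of the Lagrangian
options = optimoptions('fmincon', ...
    'Algorithm', 'interior-point', ...
    'SpecifyObjectiveGradient', true, ...  % analytic gradients are also expected
    'SpecifyConstraintGradient', true, ... % when a HessianFcn is used
    'HessianFcn', hessfcn);
x = fmincon(obj, x0, [], [], [], [], [], [], nonlcon, options);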
> Aside from this, note also that fmincon allows you to supply whatever approximation to the Hessian that you like. For example, if you think a diagonal approximation will be good enough (and sufficiently cheaper than computing the whole matrix) you can pass that instead.
====================
You could also pass a constant matrix for the Hessian to fmincon.
Sometimes people will approximate the Hessian as constant by evaluating it at the initial guess only. You can generally expect this to work well, though, only if the initial guess is close enough to the minimum that the function is well approximated by its 2nd-order Taylor expansion.
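A sketch of that idea, using the same interface as above (myHessian is a hypothetical function that evaluates your analytical Hessian at a point):
% Sketch: evaluate the Hessian once, at the initial guess, and return
% the same constant matrix at every iteration.
H0      = myHessian(x0);          % hypothetical analytical Hessian at x0
hessfcn = @(x, lambda) H0;        % constant Hessian approximation
options = optimoptions(options, 'HessianFcn', hessfcn);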