
fmincon jumpy behavior and possibility of a max step specification


Andrea Giannuzzi

Mar 19, 2012, 12:40:10 PM
Hi all,
I'm using a package that employs fmincon to optimize a 10-parameter log-likelihood function, but I get very "jumpy" behavior from the minimizer in the very first part of the optimization. For instance, even when I specify the *true* parameters as the initial ones (which I know are the true ones, because I am fitting the correct auto-regressive model to a time series generated with those parameters), after about ten iterations (during which fmincon perturbs one parameter at a time) I systematically see jumps like from:

[2, -4;
-0.5, 0.5]

to

[5.65,-7.67;
11.16, 38.24]

(wrt the first four parameters only)

Well, this fmincon trial step could be wonderful in terms of directional derivative (I don't know), but it is terrible in terms of goodness of the estimate. In fact, with these parameters I get a conditional estimate *far* away from the dependent variable, hence loglik=0 (which in turn causes a lot of trouble in my routines). The same thing happens many times until the estimates stabilize on values closer to the true ones (but still far from the initial values, whose "information" seems to get lost in the process).

This leads to the main question: is there a way to prevent this from happening? Or, equivalently, is there a way to set a maximum step size for the optimizer? I read about the DiffMaxChange option, but I tried it with active-set, sqp and interior-point without results. I also read Steve Lord replying here as follows:

>You don't change the step size. FMINCON calculates the steps it takes based
>on which direction and distance will give it the best decrease in function
>value. To change this, you'd probably have to go in and rip out sections of
>the helper functions FMINCON uses ... I really don't think that you don't
>want to do this. It would get quite messy.

Which seems to put a tombstone on my problem... Is mine really a hopeless goal? I would swap fmincon for something similar if needed; it is hard to believe that one can't impose such a simple restriction somehow.
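For reference, this is roughly how I set it up (just a sketch; the algorithm and the DiffMaxChange value shown are only examples of what I tried):

options = optimset('Algorithm','interior-point', ...   % also tried 'sqp' and 'active-set'
                   'DiffMaxChange',0.1, ...            % hoped this would cap the step, but it had no visible effect
                   'Display','iter');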

Thank you in advance,
Andrea

Matt J

Mar 19, 2012, 2:15:15 PM
"Andrea Giannuzzi" <andrea.remove.th...@gmail.nosp.am.com> wrote in message <jk7nha$n68$1...@newscl01ah.mathworks.com>...
> Hi all,
> I'm using a package that employ fmincon to optimize a 10-param loglik function, but I get a very "jumpy" behavior from the minimizer in the very first part of the optimization. For instance, even specifying the *true* parameters as the initial ones (which I know being the true because I am fitting the true auto-regressive model over a time series generated with these parameters) after ten iteration (during which fmincon tries to perturb each parameter at the time), I systematically experiences jumps like from:
>
> [2, -4;
> -0.5, 0.5]
>
> to
>
> [5.65,-7.67;
> 11.16, 38.24]
===================

Are you really sure that the true parameters are supposed to give a minimizer? If you are generating the time series with simulated measurement noise, there's no reason why the true parameters should minimize the function. The location of the minimum will be influenced by the simulated errors.

If you are simulating without measurement noise, then what you report seems to say you have a bug in your implementation of the objective function. If you are initializing at the minimum, FMINCON should recognize that you are at a solution right away based on the gradient and other optimality conditions. Perhaps you have a non-differentiable cost function which is preventing FMINCON from identifying the true parameters as a stationary point of your objective. The "jumpiness" you describe sounds consistent with that.
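A quick way to test this is to estimate the gradient at your starting point yourself. A rough sketch, assuming your objective is available as a function handle fun and your starting point is the vector x0 (both names are just placeholders):

f0 = fun(x0);
h  = 1e-6;                        % finite-difference step
g  = zeros(size(x0));
for j = 1:numel(x0)
    xj    = x0;
    xj(j) = xj(j) + h;
    g(j)  = (fun(xj) - f0)/h;     % forward-difference derivative w.r.t. parameter j
end
disp(g)                           % large entries mean x0 is far from satisfying first-order optimality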

Andrea Giannuzzi

Mar 19, 2012, 4:03:14 PM
Dear Matt,

I want to thank you, because you highlighted a meaningful point: I was expecting an estimate very close to the true parameters, notwithstanding the white-noise disturbances, because of the sample size I am using (up to 15k periods). But yes, it is certainly possible that I am over-confident about the consistency of my estimator.

In any case, I think we can agree that, given a true value of (say) -0.5 and a final convergence towards -0.47, it is not particularly desirable that the optimizer initially jumps from -0.5 to +11.16 while searching for the minimum, because doing so is:

* in the best scenario, inefficient (given the convergence value);
* in the worst scenario, less accurate than if the optimizer remained around the initial (and more reasonable) value.

Honestly, I am not sure about the second point, but since, due to the loglik=0 problem, I get many

"User objective function returned NaN; trying a new point..."

in the "iter" output, I am very worried of the chance of losses in the informations provided by good initial parameters and I would definitely prefer to cap the size of the jumps if possible,

All the best,
Andrea




"Matt J" wrote in message <jk7t3j$eg9$1...@newscl01ah.mathworks.com>...

shhebaz ghouri

Mar 19, 2012, 4:55:18 PM
I am not talking only about the unit step function. I mean a question like this:

y = cos(x) for x >= 0
y = sin(x) for x < 0

How do we write this function in MATLAB?

Nasser M. Abbasi

Mar 19, 2012, 5:24:45 PM
f = @(x) ~heaviside(x).*sin(x) + heaviside(x).*cos(x)   % heaviside ships with the Symbolic Math Toolbox
x = -4*pi:0.1:4*pi;
plot(x,f(x),'o-')
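If heaviside is unavailable, a minimal alternative with logical masks (note it gives cos(0)=1 at x=0, whereas heaviside(0)=0.5 halves the value there):

f = @(x) (x>=0).*cos(x) + (x<0).*sin(x);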

--Nasser

Alan Weiss

Mar 20, 2012, 9:11:41 AM
Perhaps you should try different fmincon algorithms. Since you don't say
which one you use, you might be using 'active-set' (which you usually
get if you don't specify one). Try using 'interior-point' instead.
http://www.mathworks.com/help/toolbox/optim/ug/brhkghv-18.html#bsbqd7i

Also, a likelihood function often has many local optima. You might want
to use many starting points to ensure that you obtain a global optimum.
If you have Global Optimization Toolbox, take a look at
http://www.mathworks.com/help/toolbox/gads/bsc59ag-1.html
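For example, a minimal GlobalSearch sketch (assuming your objective handle is fun, your start point is x0, and lb, ub, options are your bounds and fmincon options):

gs      = GlobalSearch;
problem = createOptimProblem('fmincon','objective',fun,'x0',x0, ...
                             'lb',lb,'ub',ub,'options',options);
[x,fval] = run(gs,problem);       % runs fmincon from multiple start points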

Alan Weiss
MATLAB mathematical toolbox documentation

Matt J

Mar 20, 2012, 9:38:26 AM
"Andrea Giannuzzi" <andrea.remove.th...@gmail.nosp.am.com> wrote in message <jk83e2$8f0$1...@newscl01ah.mathworks.com>...
>
> Honestly, I am not sure about the second point, but since due to the loglik=0 problem I get a number of
>
> "User objective function returned NaN; trying a new point..."
>
> in the "iter" output, I am very worried of the chance of losses in the informations provided by good initial parameters and I would definitely prefer to cap the size of the jumps if possible,
=======================

You should show your code. The symptoms you're describing don't sound like a problem with the way you're running FMINCON (maximum step sizes and so forth); it sounds like you have bugs in your objective function. It would be easier to determine that if we could see your code.

Andrea Giannuzzi

Mar 20, 2012, 10:24:12 AM
@Alan

Thank you Alan,

You pointed out another significant suggestion that could improve my estimates. I haven't implemented a GlobalSearch scheme yet, but eventually I'll embed one in order to increase the chance of ending up in "good" local minima.

Again, however, this does not rule out the opportunity of making the solver more efficient by requiring "smoother" behavior if possible (and when there are reasons, as is the case here, to trust the initial parameters), and hence by limiting the step size of the solver.

(As I said in the opening post, I tried interior-point, sqp and active-set as well, without significant differences with respect to the issue I'm describing here.)


@Matt

I'll try to summarize the focal points. I'm calling fmincon in the following way

[param]=fmincon(@(param)model_Lik(dep,indep,indep,param,k,S,advOpt,dispOut),param0, A,b,Aeq,beq,lB,uB,[],options);

where "model_Lik.m" starts with

function [sumlik,Output,logLikVec]=model_Lik(dep,indep,indep,param,k,S,advOpt,disp_out)

and has at the end

sumlik=-sum(log(f(2:end)));

where "f" is a vector containing the filtered probability densities of the dependent observation being generated using "param". Given Cond_mean=indep*param, I compute the pdf according to simple a multivariate Normal distribution:

pdf=1/(((2*pi())^(nc/2))*sqrt(det(covMat))).* exp(-0.5.*sum((dep(:,:)-Cond_mean(:,:))*inv(covMat).*(dep(:,:)-Cond_mean(:,:)),2));

As I noted before, the problem is that when fmincon jumps, I consequently get a Cond_mean=indep*param far away from the dependent observations (dep), and therefore pdf=0 for some rows (i.e., smaller than the smallest representable floating-point number). In the end, the filtering equation that delivers "f" hits a 0/0=NaN, which enters -sum(log(f(2:end))), and that's all.
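Just to make the underflow concrete, a quick check in the command window (the exponents are only illustrative):

exp(-700)   % about 1e-304, still representable as a double
exp(-750)   % exactly 0: once -0.5*(Mahalanobis distance) falls below roughly -745, the pdf underflows to 0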

In fact, I don't think there is much in this code that could help avoid sumloglik=0: the problem is that fmincon believes such a jump is a good idea, and moving accordingly implies a pdf of zero for the reasons I tried to explain.

Please tell me if something is still unclear and I'll do my best to explain further.

Thank you very much for all the help you have provided so far!

Andrea

Matt J

Mar 20, 2012, 2:13:22 PM
"Andrea Giannuzzi" <andrea.remove.th...@gmail.nosp.am.com> wrote in message <jka3uc$ld9$1...@newscl01ah.mathworks.com>...
>
> pdf=1/(((2*pi())^(nc/2))*sqrt(det(covMat))).* exp(-0.5.*sum((dep(:,:)-Cond_mean(:,:))*inv(covMat).*(dep(:,:)-Cond_mean(:,:)),2));
>
> As I noted before, the problem is that when fmincon jumps, I get as a consequence a Cond_mean=x*param far away from the dependent observation (dep), and therefore a pdf=0 for some rows (ie, less than the minimum floating point number). At the end, the filtered equation which delivers "f" faces a 0/0=NaN which enters the -sum(log(f(2:end))) and that's all.
>
> In fact I think there are not a lot of insights in this code which could help avoiding sumloglik=0: the problem is that fmincon believes that it is a good idea to do such a jump, and moving accordingly implies a pdf of zero for the reasons I tried to explain.
===================

It's a bit hard to know what's going on without seeing how f depends on pdf, i.e., without understanding the filtering operation on pdf that you mention. I can think of only two reasons why fmincon wants to jump, though:

(1) The gradient at your initial point is large, which means it is far from optimal. If it is far from optimal, it makes sense to jump.

(2) The Hessian is close to singular -- the same thing that causes Newton's method to jump when you hit a saddle point.

You can try to limit the jumps by adding ub and lb bound constraints on param. Or, if applicable, you can try to think of ways to improve the conditioning of the problem and make the Hessian less singular.
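For the first option, a minimal sketch (the +/-5 box below is purely illustrative; choose bounds that make sense for your model):

lB = param0 - 5;   % lower bounds centered on the starting point
uB = param0 + 5;   % upper bounds
[param] = fmincon(@(param)model_Lik(dep,indep,indep,param,k,S,advOpt,dispOut), ...
                  param0, A,b,Aeq,beq, lB,uB, [], options);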

Matt J

Mar 20, 2012, 3:40:17 PM
"Andrea Giannuzzi" <andrea.remove.th...@gmail.nosp.am.com> wrote in message <jka3uc$ld9$1...@newscl01ah.mathworks.com>...
>
>
> pdf=1/(((2*pi())^(nc/2))*sqrt(det(covMat))).* exp(-0.5.*sum((dep(:,:)-Cond_mean(:,:))*inv(covMat).*(dep(:,:)-Cond_mean(:,:)),2));
=============

Incidentally, you are slowing your code down greatly by doing things like
dep(:,:) and Cond_mean(:,:). Why do you think you need the (:,:) at all?

Also, you are doing lots of extra computation by repeatedly computing the constants
1/(((2*pi())^(nc/2))*sqrt(det(covMat))) every time the function is called. It would be better to pre-compute these things only once.

Finally, there are all the usual evils of using the INV function.
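For instance, a rough sketch of those two points together (names as in your snippet; nc, covMat, dep and Cond_mean are assumed to be defined as in your code, and covMat is assumed positive definite so chol applies):

c     = 1/((2*pi)^(nc/2)*sqrt(det(covMat)));   % normalization constant, computed once
R     = chol(covMat);                          % factor once: covMat = R'*R
resid = dep - Cond_mean;
z     = resid / R;                             % solves z*R = resid, i.e. avoids inv(covMat)
pdf   = c * exp(-0.5*sum(z.^2, 2));            % same row-by-row values as before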

Andrea Giannuzzi

Mar 20, 2012, 4:48:11 PM
The very last missing piece is:

pdf = (...)              % Gaussian pdf calculated as I pointed out earlier
(...)                    % (the elided code presumably initializes E(1,:) and f(1,1))
for i = 2:number_rows    % i.e., loop over the observations
    f(i,1) = ones(2,1)'*(MarkovMatrix*E(i-1,:)'.*pdf(i,:)');   % filtered likelihood of observation i
    E(i,:) = (MarkovMatrix*E(i-1,:)'.*pdf(i,:)')/f(i,1);       % filtered state probabilities (0/0 = NaN when pdf(i,:) == [0 0])
end

(It is a 2-state setting.) The first line in the loop gives the filtered likelihood, the second gives the filtered probability of being in each state (and the latter is fed back into the former at the next iteration). From here you can appreciate the 0/0=NaN problem: it occurs whenever, for some "i", pdf(i,:)==[0 0]. Once the loop has ended, the log-likelihood is computed as the sum of the logs of "f".

Thank you for pointing out the inefficiencies in the Gaussian pdf calculation: I am not the author of the code and, despite the impressive work of the original programmer, it is possible (and natural) that there is room for small improvements and cleanup.

All the best,
Andrea


=============

"Matt J" wrote in message <jkahc2$9ms$1...@newscl01ah.mathworks.com>...

Matt J

Mar 20, 2012, 10:01:29 PM
"Andrea Giannuzzi" <andrea.remove.th...@gmail.nosp.am.com> wrote in message <jkaqeb$c9f$1...@newscl01ah.mathworks.com>...
> The very last missing piece is:
>
> pdf=(...) % Gaussian pdf calculated as I pointed out earlier
> (...)
> for i=1:number_rows % ie, number of observations
> f(i,1)=ones(2,1)'*(MarkovMatrix*E(i-1,:)'.*pdf(i,:)');
> E(i,:)=((MarkovMatrix*E(i-1,:)'.*pdf(i,:)')/f(i,1));
> end
================


There's nothing obviously bad about the above that I can see. However, there clearly are regions where the objective function is undefined, in particular wherever some f(i,1) <= 0, so that log(f(i,1)) is not defined (and f(i,1) == 0 also produces the 0/0 = NaN in the update of E).

You will have to add constraints to discourage the algorithm from straying into these regions, and ideally also from straying into regions where the floating point computations are numerically sensitive.
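One possible shape for such a constraint, purely as a sketch of the mechanism: min_filtered_density below is a hypothetical helper (you would have to write it) that re-runs your filter for a given param and returns the smallest pdf value produced. Note that the min() makes the constraint nonsmooth, so treat this only as an illustration:

floorVal = 1e-300;   % illustrative floor, keeping the pdf away from underflow
nonlcon  = @(param) deal(floorVal - min_filtered_density(param), []);   % returns [c, ceq], with c <= 0 required
[param]  = fmincon(@(param)model_Lik(dep,indep,indep,param,k,S,advOpt,dispOut), ...
                   param0, A,b,Aeq,beq, lB,uB, nonlcon, options);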

Andrea Giannuzzi

Mar 21, 2012, 7:29:11 AM
I see. Anyway, it seems that I have to give up my original aim of setting a maximum step size for the course of the optimization. Is that correct? (By now it seems natural to me to draw this conclusion, but I think it would be useful to state it clearly for future reference.)

Thank you,
Andrea


=========
"Matt J" wrote in message <jkbcpp$6c9$1...@newscl01ah.mathworks.com>...

Matt J

Mar 21, 2012, 9:54:12 AM
"Andrea Giannuzzi" <andrea.remove.th...@gmail.nosp.am.com> wrote in message <jkce27$avs$1...@newscl01ah.mathworks.com>...
> I see.. Anyway, it seems that I have to give up in my original aim of setting a certain step-size maximum for the route of the optimization ..is it correct? (Now it is somehow natural to me to infer such a conclusion, but I think it would be useful to definitely clarify this point for future reference.)
=================

There are ways to limit the step-size using optimset. My feeling, though, is that it is a clumsier way to solve this problem. You currently only know the jump-size starting at one particular point. You don't really know exactly how the step-size will need to be limited as the algorithm progresses to other points.

Andrea Giannuzzi

Mar 21, 2012, 10:54:22 AM
Sorry, but why are you so cautious about actually explaining some of these methods? Please don't misread me: I am grateful for your (insightful) warnings, and I totally agree that turning off the "smart" way fmincon does its job is something of a hack, but it should be less potentially harmful than, for instance, adding constraints to the optimization region, shouldn't it? After all, it seems that fmincon iterates as long as "TolX" is not reached, so there should be no critical downside to requiring fmincon to proceed only by smaller steps: if in doing so TolX is (eventually) reached earlier, the otherwise-big step would have been wrong anyway; if TolX is not reached earlier, the only downside is a greater number of iterations (and if, in the extreme, the iteration limit became the usual stopping trigger, I could easily raise that limit). Yes, the two approaches could lead to different minima, but if I had evidence (for whatever reason) of a good local minimum *around* the starting value, why should I be worried about imposing a maximum step size that prevents the optimizer from (erroneously) going far away and then retracing its own steps?

>You don't really know exactly how the step-size will need to
> be limited as the algorithm progresses to other points.

Hmm, I think that by setting only the maximum I would still leave the algorithm plenty of degrees of freedom: are the subsequent directional derivatives still consistent with the initially expected large swing? Then do more iterations. Do the subsequent directional derivatives shrink in absolute value? Then reduce the step size as much as you like.

I will resort to this option carefully, I promise, but if I wanted to try it (even just for diagnostic purposes), how should I do it?



===============

"Matt J" wrote in message <jkcmi4$ae5$1...@newscl01ah.mathworks.com>...

Matt J

Mar 21, 2012, 2:21:12 PM
"Andrea Giannuzzi" <andrea.remove.th...@gmail.nosp.am.com> wrote in message <jkcq2u$mm4$1...@newscl01ah.mathworks.com>...
>
> Sorry, but why do you are so cautious in actually explaining some of these methods?
================

Not sure what you mean by cautious. I thought I explained everything to you!


> After all, it seems that fmincon iterates as long as "TolX" is not reached,
==================

No, there are several more stopping criteria that influence how long fmincon will iterate: TolFun, TolCon, etc.
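They are all set the same way through optimset, e.g. (values purely illustrative):

options = optimset(options,'TolX',1e-10,'TolFun',1e-10,'TolCon',1e-8);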


>should be no critical downsides in requiring fmincon to proceed only by smaller steps:
====================

There's no downside. I just don't think it's guaranteed to help. And I no longer think it's possible anyway: originally I thought the DiffMaxChange option would let you control the maximum step size, but now I see that it only governs the finite-difference derivative calculations.




> >You don't really know exactly how the step-size will need to
> > be limited as the algorithm progresses to other points.
>
> Uhm, I think that by setting the maximum only I am still leaving a lot of degree of freedom to the algorithm: further directional derivatives still consistent with initial expected large swing? do more iterations; following directional derivatives decrease in absolute value? reduce by whatever you want the step size..
================

But even if you could do this, how would you choose the maximum step-size effectively? If the iterations start to move in small steps toward the dangerous 0/0=NaN region, you will have to take smaller and smaller steps to avoid it.