"Sargondjani" wrote in message <jugktq$aot$1...@newscl01ah.mathworks.com>...
> @ Matt: Yes, X is bounded to >0
===========
No, the constraints in your code bound X to >=0, which is an important distinction for what we're talking about.
> 1) It was a bit difficult to post results, and i know enough already :-)
==============
But you've asked us to comment on your results. We can't do that unless you share them with us :-)
> 2) i wanted fmincon to handle the linear constraint as a non-linear one, because in my real problem it will be a non-linear constraint. It is an inequality constraint, but i know it will always be binding. i just ran some tests and it seems that it is then better to insert it as an equality constraint. (can you confirm this?)
===========
How can I confirm that it is better? Again, I can't run your code, so I don't know what you are seeing. Is it "better" in terms of speed? In terms of accuracy? I can imagine it would improve both, since telling FMINCON that the constraint is binding lets it narrow its search.
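To make the comparison concrete, here is a sketch in Python/SciPy (a stand-in for the MATLAB/FMINCON setup, since I can't run your code) of a hypothetical problem where an inequality constraint is known to be binding at the optimum, solved both ways. The objective and constraint here are made up for illustration; they are not your problem.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical problem: minimize sum((x-2)^2) subject to sum(x) <= 1.
# The unconstrained minimum has sum(x) = 6, so the constraint binds.
obj = lambda x: np.sum((x - 2.0) ** 2)
x0 = np.zeros(3)

# Same constraint posed as an inequality and as an equality:
r_ineq = minimize(obj, x0, method="SLSQP",
                  constraints={"type": "ineq", "fun": lambda x: 1.0 - np.sum(x)})
r_eq = minimize(obj, x0, method="SLSQP",
                constraints={"type": "eq", "fun": lambda x: 1.0 - np.sum(x)})

# Both should land on the same binding solution (sum(x) = 1, x = 1/3 each);
# comparing iteration counts shows whether the equality form helps the solver.
print(r_ineq.nit, r_eq.nit)
print(r_ineq.x, r_eq.x)
```

The same experiment in MATLAB would move the constraint between the `c` and `ceq` outputs of the nonlinear constraint function passed to FMINCON.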
> 3) with X>0 the function sqrt(X) is differentiable, no?
==========
The constraints are X>=0, so no. The gradient contains the terms 1/2/sqrt(X(i)), which are undefined wherever any X(i)=0.
In any case, as long as the iterations of FMINCON are lucky enough not to go toward the boundary {X(i)=0}, this should be of no consequence.
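To see the boundary issue numerically, here is a small Python sketch of those gradient terms (the objective itself is whatever your code defines; only the 1/2/sqrt(X(i)) terms from above are used here):

```python
import numpy as np

# The gradient terms 1/2/sqrt(X(i)) blow up as X(i) approaches the
# X >= 0 boundary, and are undefined at exactly X(i) = 0.
def grad_term(x):
    return 1.0 / (2.0 * np.sqrt(x))

for x in [1.0, 1e-4, 1e-8]:
    print(x, grad_term(x))  # grows without bound as x -> 0
```

This is why FMINCON is only safe here as long as its iterates stay away from the boundary.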
> 4) in your 'pet example' the scaling did improve convergence (as reflected in the condition number), so i was expecting some similar improvement if i scaled my problem,
> but it turns out that scaling makes getting the finite differences right much more difficult (in this particular case at least).
===============
Then you've been ignoring what I've been saying: FMINCON already does this condition-number normalization for you, and normalizing something twice doesn't make it more normalized!
In any case, the scaling you've chosen, 1./S, is not a good choice. You can check this by calculating the Hessian of the objective function (which here is the same as the Hessian of the Lagrange function) at the desired solution X=1, both with and without the scaling; you will see that it leaves the condition number unchanged.
Without scaling, the Hessian is H=.75*diag(S), so cond(H)=1e6.
With your scaling, the Hessian becomes H=.75*diag(1./S), so cond(H) is still 1e6.
The scaling you really want is sqrt(1./S), which turns the Hessian into .75*eye(N) with cond(H)=1.
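Here is the check above as a Python/NumPy sketch (MATLAB would be analogous). The scale factors S are assumed to span six orders of magnitude, matching cond(H)=1e6; substituting X = diag(d)*Y transforms the Hessian to diag(d)*H*diag(d):

```python
import numpy as np

# Assumed scale factors spanning 1..1e6, so cond(.75*diag(S)) = 1e6.
S = np.logspace(0, 6, 5)
H = 0.75 * np.diag(S)               # unscaled Hessian at X = 1
print(np.linalg.cond(H))            # ~1e6

# Your scaling d = 1./S leaves the condition number unchanged:
d_bad = 1.0 / S
H_bad = np.diag(d_bad) @ H @ np.diag(d_bad)   # = .75*diag(1./S)
print(np.linalg.cond(H_bad))        # still ~1e6

# The scaling d = sqrt(1./S) gives .75*I, which is perfectly conditioned:
d_good = np.sqrt(1.0 / S)
H_good = np.diag(d_good) @ H @ np.diag(d_good)
print(np.linalg.cond(H_good))       # ~1
```

The square root is the key: the diagonal scaling enters the transformed Hessian twice, so to cancel diag(S) you need each factor to contribute sqrt(1./S).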
Note that because the binding constraints are linear, their contribution to the Hessian of the overall Lagrange function is zero. So, I disagree with Bruno that the constraint Hessian is what matters in this particular case.