Avoiding expansion of squares


Mark L. Stone

Sep 26, 2016, 9:08:40 PM
to YALMIP
Consider the following Rosenbrock on steroids problem:

m = 1e6;
X = sdpvar(2,1);
assign(X,[-1.2;1]);
Obj = m*(X(2)-X(1)^2)^2+(1-X(1))^2
sol = optimize(-10<=X<=10,Obj,sdpsettings('solver','baron','baron.maxtime',7200,'usex0',1))

The file passed from YALMIP via matbar to baron to solve the problem has this objective:

-2*x1 + x1^2 + 1000000*(x2^2) - 2000000*(x1^2)*x2 + 1000000*(x1^4) + 1

YALMIP has expanded the square.  When baron tries to solve this, it doesn't recognize the non-negative sum of squares, and therefore does not immediately come up with a lower bound of 0.
When the matbar interface to baron is directly called from MATLAB, the objective retains its form as sum of squares, and baron immediately determines a lower bound of 0.
So it appears that YALMIP has expanded the squares before passing model info to matbar.
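As a plain-Python illustration (not YALMIP; the function names here are just for this sketch), the two forms are algebraically identical, but only the factored form makes the non-negativity obvious:

```python
m = 1e6

def obj_factored(x1, x2):
    # sum-of-squares form: each term is a square, so the value is trivially >= 0
    return m * (x2 - x1**2)**2 + (1 - x1)**2

def obj_expanded(x1, x2):
    # the expanded polynomial passed on to baron; the >= 0 structure is hidden
    return -2*x1 + x1**2 + m*x2**2 - 2*m*x1**2*x2 + m*x1**4 + 1

for x1, x2 in [(-1.2, 1.0), (0.5, 2.0), (3.0, -4.0)]:
    a, b = obj_factored(x1, x2), obj_expanded(x1, x2)
    assert abs(a - b) <= 1e-9 * max(1.0, abs(a))   # same function...
    assert a >= 0.0                                # ...but only one form shows this
```

A global solver seeing only the expanded polynomial has to derive the lower bound 0 by analysis instead of reading it off the structure.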

Is there any way to change this behavior in YALMIP?  For instance, an "inert" square or something similar which doesn't get expanded?

Thanks.

Johan Löfberg

Sep 27, 2016, 2:24:15 AM
to YALMIP
Yes, this is a direct consequence of the fact that YALMIP doesn't parse anything, but simply evaluates expressions. This means that structures like this get lost.

Of course, one could introduce a new operator which doesn't expand, but I am not too fond of special operators and prefer vanilla MATLAB syntax.

For this particular case, I am a bit surprised that baron doesn't presolve this better, as YALMIP's built-in global solver derives the lower bound 0 already during presolve (after an upper bound of ~0 is established by a nonlinear solve, which baron has available in the root node too). YALMIP does not use any more information but works with the flat expanded model.
>> sol = optimize([-10<=X<=10],Obj,sdpsettings('solver','bmibnb'))
* Starting YALMIP global branch & bound.
* Branch-variables : 1
* Upper solver     : fmincon
* Lower solver     : CPLEX
* LP solver        : CPLEX
 Node       Upper      Gap(%)       Lower    Open
    1 :   -1.164E-10     0.00      0.000E+00   0  Poor bound  
* Finished.  Cost: -1.1642e-10 Gap: 0
* Timing: 87% spent in upper solver (1 problems solved)
*         1% spent in lower solver (3 problems solved)
*         3% spent in LP-based domain reduction (4 problems solved)

To avoid loss of knowledge from expanded expressions, you have to explicitly work with the decomposition through equalities, which helps baron here as the objective now is trivially non-negative without much analysis:
e = sdpvar(2,1);
sol = optimize([-10<=X<=10, e == [X(2)-X(1)^2;1-X(1)]],e'*e,sdpsettings('solver','baron'))





Johan Löfberg

Sep 27, 2016, 2:50:40 AM
to YALMIP
Correction: An upper bound of 0 is not required for bmibnb to derive the lower bound 0. The lower bound 0 is derived directly from the quasi-quadratic form of the objective (presolved from the flattened expression).

Mark L. Stone

Sep 27, 2016, 8:42:35 AM
to YALMIP
Thanks Johan.  That does the trick.

Placing all the "residuals" in a vector e and using the quadratic objective e'*e looks like the way to deal with any nonlinear least squares problem. Would there be a downside to doing this if a local solver such as FMINCON, KNITRO, SNOPT, or IPOPT were being used (possibly in conjunction with other linear or nonlinear constraints)?

Johan Löfberg

Sep 27, 2016, 8:58:23 AM
to YALMIP
The downside is that you might take a possibly convex nonlinear objective and turn it into a guaranteed nonconvex constraint. Some solvers might not enjoy that. YALMIP does this a lot though: any high-level operator f(g(x)) is written as f(z), z == g(x) immediately when there is more than one term in g(x), so you have probably solved problems of this type before without knowing it.
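A minimal sketch of that graph rewrite (hypothetical plain Python, not YALMIP's actual internals): minimizing f(g(x)) becomes minimizing f(z) subject to z == g(x), and the two problems agree wherever the new equality holds:

```python
def lift(f, g):
    """Rewrite min f(g(x)) as min f(z) s.t. z == g(x)."""
    objective = lambda x, z: f(z)        # outer operator sees only the new variable
    residual = lambda x, z: z - g(x)     # equality constraint: must be exactly 0
    return objective, residual

f = lambda z: z * z          # outer operator, e.g. a square
g = lambda x: 1.0 - x        # inner expression with more than one term
obj, res = lift(f, g)

x = 0.25
z = g(x)                     # a feasible point of the lifted problem
assert res(x, z) == 0.0
assert obj(x, z) == f(g(x))  # lifted objective matches the original there
```

Even when f is convex, the equality constraint z == g(x) is nonconvex whenever g is nonlinear, which is exactly the downside described above.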

Mark L. Stone

Sep 27, 2016, 9:28:19 AM
to YALMIP
I played around a bit with local solvers using both my original version and the e version (I inserted a factor of sqrt(m) in the first component of e to make the formulations equivalent).

It seems that the e'*e formulation greatly decreases the number of iterations and function evaluations, particularly for large m, in both KNITRO and FMINCON.  However, with for instance m = 1e6, using KNITRO, depending on starting point and bounds, the e'*e formulation can converge, claiming a local optimum with small OptError, to various interior points whose gradient is extremely non-zero. So I don't think it is just a numerical or tolerance issue with the very bad scaling and big numbers at the starting point messing up the optimality tolerance (but maybe I'm wrong about that).  So is KNITRO solving a nonlinear equality constrained problem which has non-global local minima, and therefore could it be finding legitimate local optima to the problems it is being provided?  If so, this would be very undesirable behavior on a real-world nonlinear least squares problem.  I didn't get to any non-global optimum with FMINCON in the cases I tried.

Mark L. Stone

Sep 27, 2016, 9:35:54 AM
to YALMIP
KNITRO and IPOPT are both showing 2 variables with lower and upper bounds.  KNITRO shows 1 linear equality constraint and 1 nonlinear equality constraint.  IPOPT shows 2 equality constraints.  So I guess they could be solving the same problem.  What would the one linear equality and one nonlinear equality constraint be?

Mark L. Stone

Sep 27, 2016, 9:40:35 AM
to YALMIP
So on second thought, something like a sum of squares operator might be nice for nonlinear least squares. Note that CVX has a function sum_square.

I'd rather have a solver take more iterations, than to introduce spurious local minima.
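For reference, sum_square is just the sum of elementwise squares; a plain-Python stand-in (not the CVX or YALMIP implementation) is:

```python
def sum_square(v):
    # sum of squares of the entries of v; trivially non-negative
    return sum(x * x for x in v)

assert sum_square([3.0, 4.0]) == 25.0
assert sum_square([]) == 0
```

The point of having it as a named operator is that the modeling layer can keep the sum-of-squares structure instead of expanding it.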

Mark L. Stone

Sep 27, 2016, 9:53:45 AM
to YALMIP
O.k., I'm an idiot. I guess the nonlinear equality constraint is for e(1) and the linear equality constraint is for e(2).

Mark L. Stone

Sep 27, 2016, 10:05:26 AM
to YALMIP
It looks like I messed up some of those KNITRO runs. Now it appears that the e formulation not only uses fewer iterations, but also has improved numerics over the original version.  The original version was sometimes claiming to have converged to a local optimum at points which clearly are not local optima - perhaps it was getting screwed up by huge gradient values at the starting points I was using.

Mark L. Stone

Sep 27, 2016, 10:11:13 AM
to YALMIP
It seems that the e formulation has almost neutralized the horrible ill-conditioning of the original formulation, because the Hessian of the objective has condition number 1. I guess something still creeps into the Lagrangian through the nonlinear constraint, so that's not perfectly conditioned, but it is much better than the original, and the nonlinear equality constraint is not being squared.
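The conditioning gap can be checked by hand. Below is a plain-Python sketch (derivatives worked out analytically, m = 1e6 as in the first post) of the Hessian of the original objective m*(X(2)-X(1)^2)^2 + (1-X(1))^2 at the minimizer (1, 1), versus the Hessian of e'*e in e, which is 2*I with condition number exactly 1:

```python
m = 1e6

# Hessian of m*(x2 - x1^2)^2 + (1 - x1)^2 at (x1, x2) = (1, 1):
# d2/dx1^2  = -4*m*(x2 - x1^2) + 8*m*x1^2 + 2  -> 8*m + 2
# d2/dx1dx2 = -4*m*x1                           -> -4*m
# d2/dx2^2  = 2*m
h11, h12, h22 = 8*m + 2, -4*m, 2*m

# eigenvalues of a symmetric 2x2 matrix from trace and determinant
tr = h11 + h22
det = h11 * h22 - h12 * h12
disc = (tr * tr / 4 - det) ** 0.5
lam_max, lam_min = tr / 2 + disc, tr / 2 - disc

cond_original = lam_max / lam_min
assert lam_min > 0                 # positive definite at the minimizer
assert cond_original > 1e7         # horribly ill-conditioned already for m = 1e6
# the Hessian of e'*e in e is 2*I, whose condition number is exactly 1
```

The condition number of the expanded objective's Hessian grows on the order of m, which is consistent with the solvers struggling more and more as m increases.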

Johan Löfberg

Sep 27, 2016, 10:19:14 AM
to YALMIP
Here's something for experimenting with. It cannot be used with baron, as baron doesn't know what sum_square is.
sum_square.m

Johan Löfberg

Sep 27, 2016, 10:23:53 AM
to YALMIP
Quick fix so that it at least runs in bmibnb.
sum_square.m

Johan Löfberg

Sep 27, 2016, 10:34:21 AM
to YALMIP
I'll add baron support later tonight

Mark L. Stone

Sep 27, 2016, 10:55:54 AM
to YALMIP
Thanks for the sum_square.

Fun fact:

m=1e16;
X = sdpvar(2,1);
e = sdpvar(2,1);
assign(X,[-80000;90000]);
sol = optimize([-100000<=X<=100000,e==[sqrt(m)*(X(2)-X(1)^2);1-X(1)]],e'*e,sdpsettings('solver','knitro'))

takes 18 iterations and 76 function evaluations (KNITRO 10.1.2)

The same thing, but in FMINCON (MATLAB R2014A) takes 17424 iterations and 95184 function evaluations.

Interestingly, using the above parameter values,
Obj = sum_square([sqrt(m)*(X(2)-X(1)^2),1-X(1)])
KNITRO converges to the correct solution, not a bogusly declared local optimum as with the original formulation.





Mark L. Stone

Sep 27, 2016, 10:59:11 AM
to YALMIP
And on the same problem as immediately above, FMINCON is now down to 200 iterations and 1020 function evaluations vs. 17424 iterations and 95184 function evaluations using the original formulation.

Mark L. Stone

Sep 27, 2016, 3:34:03 PM
to YALMIP
My post immediately above refers to the use of sum_square with FMINCON.

Mark L. Stone

Oct 10, 2016, 1:46:07 PM
to YALMIP
Johan, I'm trying to understand why FMINCON and KNITRO do (so much) better using the sum_square formulation of the objective function rather than the original formulation.  I understand why that is the case for BARON, but I don't understand why for the local solvers FMINCON and KNITRO.  What is the difference in the problems being passed by YALMIP to these solvers using sum_square vs. the original formulation?

Original formulation (really nasty, horribly ill-conditioned Rosenbrock on ultra-steroids, which is far beyond the ability of FMINCON or KNITRO to handle)
m=1e16;
X = sdpvar(2,1);
assign(X,[-80000;90000]);
Obj = m*(X(2)-X(1)^2)^2+(1-X(1))^2
sol = optimize(-100000<=X<=100000,Obj,sdpsettings('solver','knitro'))

sum_square formulation (which KNITRO solves in 18 iterations, 76 function evaluations, and FMINCON solves in 17424 iterations, 95184 function evaluations)
m=1e16;
X = sdpvar(2,1);
assign(X,[-80000;90000]);
Obj = sum_square([sqrt(m)*(X(2)-X(1)^2),1-X(1)])
sol = optimize(-100000<=X<=100000,Obj,sdpsettings('solver','knitro'))





Johan Löfberg

Oct 10, 2016, 2:00:22 PM
to YALMIP
With sum_square you will have a very nicely conditioned objective and nonlinear quadratic equalities, which knitro simply performs better on compared to a model with a quartic objective
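One way to see why the lifted residual form behaves so well, sketched in plain Python (a hypothetical illustration, not what KNITRO actually does internally): with the residual vector e(X) = [sqrt(m)*(X(2)-X(1)^2); 1-X(1)] and its square 2x2 Jacobian, a Gauss-Newton step is just a Newton step on e(X) = 0, and the huge factor sqrt(m) scales an entire row of both e and the Jacobian, so it cancels out of the step:

```python
m = 1e16
c = m ** 0.5                         # sqrt(m) = 1e8

def residual(x1, x2):
    return c * (x2 - x1**2), 1.0 - x1

def gauss_newton_step(x1, x2):
    r1, r2 = residual(x1, x2)
    # Jacobian of the residual: [[-2*c*x1, c], [-1, 0]]; solve J d = -r.
    d1 = r2                          # row 2: -d1 = -r2
    d2 = -r1 / c + 2.0 * x1 * d1     # row 1, after dividing out c
    return x1 + d1, x2 + d2

x1, x2 = -80000.0, 90000.0           # the starting point from the post
for _ in range(5):
    x1, x2 = gauss_newton_step(x1, x2)
assert abs(x1 - 1.0) < 1e-6 and abs(x2 - 1.0) < 1e-6
```

This mirrors Johan's point: the lifted model exposes well-scaled quadratic structure, whereas the flat quartic objective carries a Hessian with condition number on the order of m.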

Mark L. Stone

Oct 10, 2016, 2:41:49 PM
to YALMIP
So are you saying that sum_square produces the same formulation as the e'*e with constraints on e formulation? If so, then I understand why the improvement is obtained.  But does that mean that sum_square could turn a convex problem into a non-convex problem with a spurious non-global local minimum, which might be the solution returned by the solver?

Johan Löfberg

Oct 10, 2016, 2:51:45 PM
to YALMIP
Yes.

The way YALMIP's nonlinear support is implemented is that all expressions in nonlinear functions are normalized, which means introduction of intermediate variables. This might change in future versions. In the develop branch, you can see some massive performance gains for problems of the type min f(Ax+b) s.t. g(Cx+d)<=0, where these intermediate variables no longer are used (well, they are used, but never exposed to the solver; instead YALMIP eliminates them after gradient evaluations etc). These ideas might be extended to avoid exposing intermediate variables to the solver in more cases.

This takes several seconds in the current version, but finishes trivially in the develop branch. There is really only 1 variable, but YALMIP introduces 1000 new variables to simplify derivatives etc.
sdpvar x
optimize([],-entropy(x + randn(1,1000)))
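To sketch the idea being avoided (a hypothetical plain-Python illustration, not YALMIP's code): for -entropy(x + c) with a scalar x, all 1000 shifted terms share the single true variable, so the gradient collapses into one chain-rule sum and no intermediate variables z_i = x + c_i ever need to be exposed to the solver:

```python
import math

# stand-in offsets for randn(1,1000), shifted so x + c_i stays positive at x = 0.5
c = [0.001 * i for i in range(1, 1001)]

def f(x):
    # -entropy(x + c) = sum_i (x + c_i) * log(x + c_i)
    return sum((x + ci) * math.log(x + ci) for ci in c)

def grad_f(x):
    # chain rule: d/dx (x + ci)*log(x + ci) = log(x + ci) + 1
    return sum(math.log(x + ci) + 1.0 for ci in c)

x, h = 0.5, 1e-6
fd = (f(x + h) - f(x - h)) / (2.0 * h)   # finite-difference sanity check
assert abs(fd - grad_f(x)) < 1e-4 * abs(grad_f(x))
```

The solver then only ever sees one variable and one gradient entry, instead of 1000 equality constraints tying the intermediate variables to x.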





Mark L. Stone

Oct 10, 2016, 7:23:12 PM
to YALMIP
Hmm, I ran


sdpvar x
optimize([],-entropy(x + randn(1,1000)))

with both ecos and scs, under both yalmip_R20160930 and the develop branch time-stamped the morning of 20161009.

ecos was many times faster than scs (direct or indirect).  But ecos seemed the same under both YALMIP versions, as did scs: ~0.32 sec in YALMIP, 0.37 sec in the solver for ecos under both the 20160930 and develop versions of YALMIP.

Does the develop version of YALMIP I got have what you are talking about? I think my configurations really were as I described, but I suppose I could have screwed up.

Is using the develop branch as "safe" as using the latest released version, or has it perhaps not undergone the full regimen of pre-release testing, and so is riskier for certain things?

Johan Löfberg

Oct 11, 2016, 1:57:35 AM
to YALMIP
This has only been updated for some callback layers (fmincon, ipopt, knitro) so far. Adding it for solvers which work on cone representations is something I didn't think of, but it should be pretty easy, and reasonable, to do.

Well, develop is where fixes and new features end up after development on separate branches, and every time I push things to develop, a test harness is run automatically. It runs hundreds or thousands of small and large tests for over 20 minutes, so the likelihood of regression bugs is fairly low if it passes. If there is a green flag after the last commit (https://github.com/yalmip/YALMIP/commits/develop), it is essentially as safe as master, as these are the only tests I have.

Johan Löfberg

Oct 11, 2016, 7:08:45 AM
to YALMIP
Quick test reveals that ecos does not benefit from having these intermediate variables eliminated. The operations ecos performs on these while knowing the structure (exponential cone) are so simple and sparse that it does not make a difference.

Mark L. Stone

Oct 11, 2016, 2:47:03 PM
to YALMIP
scs didn't seem to benefit either - I guess that's because it also natively supports the exponential cone. It's about 40 times slower than ecos, I guess in large part due to being a first order solver.

Of the solvers I tried (only with the develop version, not with R20160930), KNITRO was the fastest.  FMINCON and IPOPT solved it as well. BMIBNB solved it (I put in bounds -10 <= x <= 10) using CPLEX and ecos.

BARON doesn't like entropy (same error with or without assign and usex0,1)
 
sdpvar x
assign(x,1);
optimize([],-entropy(x + randn(1,1000)),sdpsettings('solver','baron','usex0',1,'debug',1))
Error using baron (line 179)
There was an error evaluating a user supplied function. Please examine the below error and correct your function.

Error using "entropy" (line 51)
SDPVAR/ENTROPY called with CHAR argument?

Error Trace:
- Error in "callbaron/@(x)-1*entropy(x(2))" (line 0)

Error in callbaron (line 79)
[x,fval,exitflag,info,allsol] = baron(obj,A,rl,ru,lb,ub,con,cl,cu,xtype,x0,opts);

Error in solvesdp (line 350)
   eval(['output = ' solver.call '(interfacedata);']);

Error in optimize (line 31)
[varargout{1:nargout}] = solvesdp(varargin{:});

So I guess this is a bug report: don't provide a model to BARON using entropy.  entropy is not a function BARON knows how to deal with.  It doesn't even deal with trig functions.

Johan Löfberg

Oct 11, 2016, 3:07:50 PM
to YALMIP
I haven't tested it on scs (and my test on ecos is not in develop, only on my machine).

Yes, ecos and scs are exponential cone solvers (the only ones available). For things like a nonlinear program with an entropy cost, my opinion is that an exponential cone solver is fairly redundant. Better to simply use a nonlinear solver and overload the entropy operator and supply the gradient (as is done for nonlinear solvers). An exponential cone representation of entropy forces you to work with every single x*log(x) element and have a cone for each term. Actively working with exponential cone representations would be relevant first when there is a mix of SOCP, SDP and exponential cones.

baron is not based on callbacks, so it can only work with functions it knows and has support for, and this list is pretty short. Hence, to support entropy in baron, I would have to replace the entropy term with a sum of logarithmic terms in the string describing the objective.
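A toy sketch of that string rewrite (plain Python with a hypothetical helper name; the actual interface code differs): each shifted term is spelled out as an explicit (x + c_i)*log(x + c_i) expression that a parser-based solver can understand:

```python
def entropy_to_baron_string(offsets, var="x1"):
    # -entropy(x + c) rewritten as a sum of (x + c_i)*log(x + c_i) terms
    terms = ["({v} + {c})*log({v} + {c})".format(v=var, c=ci) for ci in offsets]
    return " + ".join(terms)

s = entropy_to_baron_string([0.25, 1.5])
assert s == "(x1 + 0.25)*log(x1 + 0.25) + (x1 + 1.5)*log(x1 + 1.5)"
```

The resulting string uses only operators from the short list such a solver supports, at the cost of losing the compact entropy operator.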

Mark L. Stone

Oct 11, 2016, 3:49:25 PM
to YALMIP
On this problem (multiple runs), KNITRO is the fastest, then FMINCON, IPOPT, and ecos, with scs bringing up the rear, way back (those pesky first order methods being slower than second order ones).  KNITRO is about 26 times faster than ecos.  All of this is using the 10/09 develop version.

Mark L. Stone

Oct 12, 2016, 3:06:04 AM
to YALMIP
KNITRO does pretty well with the sum of squares, but it is possible to over-tax it with really ill-conditioned problems.
I added another dimension to Rosenbrock, and added another nonlinear squared term, which adds a second nonlinear constraint to the formulation. I made the two nonlinear constraints pretty tough.

X = sdpvar(3,1);
e = sdpvar(3,1);
m = 1e16;
n = 1e32;
assign(X,[-80000;90000;54000]);
sol = optimize([-100000<=X<=100000,e==[sqrt(n)*(X(3)-X(2)^2);sqrt(m)*(X(2)-X(1)^2);1-X(1)]],e'*e,sdpsettings('solver','knitro'))

The problem got more and more difficult for KNITRO to solve as n goes up through the 1e20s and into the 1e30s, but it still solves it just fine through n = 1e39. At n = 1e40, KNITRO descended for a while, then started ascending, then descended again. It was fast right at the end.
   iteration   objective value
    9207     5.090720e-007
    9208     1.443674e-012 
    9209     9.990564e-022
    9210     1.627170e-028 
    9211     1.419804e-028 
    9212     1.419804e-028 
    9213     1.419804e-028

Of course, without the sum_square or e'*e formulation, KNITRO wouldn't have a chance on any of these.  All of these (even harmless little old n = 1e20) are way too difficult for FMINCON (using default settings for everything except very high MaxIter and MaxFun values) to handle with any number of iterations and function evaluations - it just goes gradually wandering all over the place.

Mark L. Stone

Oct 12, 2016, 3:36:30 AM
to YALMIP
Update: for m = 1e16, n = 1e20 (still pretty easy compared to what KNITRO solved), FMINCON finally declared a local optimum at an objective value of 9.569204e-15, after 276920 iterations and 1578136 objective function evaluations.

Mark L. Stone

Oct 12, 2016, 3:42:24 AM
to YALMIP
KNITRO solved this same "easy" problem in 18 iterations and 76 objective function evaluations.  I guess this is another case of you get what you pay for.  You pay a fair amount for FMINCON, which takes 276920 iterations and 1578136 objective function evaluations.  Or you can pay quite a bit more for KNITRO, which takes 18 iterations and 76 objective function evaluations.