Symbolic vs. Automatic Differentiation of Matrices


Mark L. Stone

Dec 9, 2013, 3:52:54 PM
to yal...@googlegroups.com
Consider the following toy problem for illustration purposes:
B = sdpvar(4,4,'full')
Bsq = B^2
Constraints = [0 <= B(:) <= 1]
Objective = Bsq(1,3) + 2*Bsq(2,1) + Bsq(4,4) % random made-up nonsense for this example

It appears that YALMIP does a fully symbolic matrix multiply on the elements of B in order to form Bsq, as a Computer Algebra System would if given a matrix with symbolic entries. This approach suffers from a severe curse of dimensionality as the matrix dimension and the power of the matrix increase. Is it possible to get YALMIP to treat such a model via "numerical" means, and use automatic differentiation on functions of a matrix power, where the elements of B are the optimization variables and the objective is defined in terms of an integer power of B? Such an approach ought to scale much more gracefully as matrix dimension and power increase. If the symbolic generation of B^2 can be avoided, is it still possible for YALMIP to recognize the problem as a QP and feed it to a QP solver?

Thanks.

Johan Löfberg

Dec 10, 2013, 7:57:46 AM
to yal...@googlegroups.com
Yes, YALMIP is based on internal symbolic machinery, and it will run into performance issues as models grow, since it has to keep track of all monomial terms. However, with careful definitions this is typically not a huge problem compared to the time spent in the solver.

A good thing to know is that there is highly optimized code for squaring an untouched simple vector:

N = 25;
yalmip('clear')
B = sdpvar(N,N,'full')
tic
Bsq = B^2;
toc

yalmip('clear')
B = sdpvar(N^2,1,'full')
tic
Bsq = reshape(B.^2,N,N);
toc
Linear matrix variable 25x25 (full, real, 625 variables)
Elapsed time is 25.044241 seconds.
Linear matrix variable 625x1 (full, real, 625 variables)
Elapsed time is 0.030131 seconds.

This is for instance used when defining huge quadratic programs. Instead of

N = 1e5;
x = sdpvar(N,1);
R = sprandn(N,N,0.01); Q = R'*R;
obj = x'*Q*x; % Slooooow
con = [-1 <= x <= 1]

we define
z = sdpvar(N,1);
obj = z'*z;  % Fast
con = [z == R*x, -1 <= x <= 1]; % No problems for good solvers
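The algebra behind this reformulation: with Q = R'*R, x'*Q*x = (R*x)'*(R*x) = z'*z, so introducing z == R*x leaves the optimum unchanged while only N quadratic monomials appear symbolically instead of O(N^2). A small numpy check of the identity (a sketch, not YALMIP code):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
x = rng.standard_normal(N)
R = rng.standard_normal((N, N))
Q = R.T @ R  # positive semidefinite by construction

# Direct objective x'Qx: symbolically, up to O(N^2) quadratic monomials
obj_direct = x @ Q @ x

# Reformulated: z = Rx, objective z'z: only N quadratic terms in z
z = R @ x
obj_reformulated = z @ z

assert np.isclose(obj_direct, obj_reformulated)
```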





Mark L. Stone

Dec 10, 2013, 1:23:31 PM
to yal...@googlegroups.com
Johan,

I don't understand your recommended code, which squares a vector and then reshapes it to a matrix. That appears to be forming B.^2, i.e., the element-wise square of B, not B^2, which is the matrix multiplication B*B, which is what the "slow" method computes and what I want to do. If your second code is changed to
B=sdpvar(N^2,1,'full')
Bsq=reshape(B,N,N)^2
then there does not appear to be any computation time advantage vs. the first method
B=sdpvar(N,N,'full')
Bsq=B^2;

Which of us has the misunderstanding?
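For reference, the distinction in question can be checked numerically. In numpy notation, B ** 2 corresponds to MATLAB's elementwise B.^2, while B @ B corresponds to the matrix product B^2 (a sketch, not YALMIP code):

```python
import numpy as np

B = np.array([[1.0, 2.0],
              [3.0, 4.0]])

elementwise = B ** 2     # MATLAB B.^2: squares each entry
matrix_square = B @ B    # MATLAB B^2:  the product B*B

# (B*B)[0,0] = 1*1 + 2*3 = 7, while (B.^2)[0,0] = 1
assert elementwise[0, 0] == 1.0
assert matrix_square[0, 0] == 7.0
assert not np.allclose(elementwise, matrix_square)
```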

As for the QP. I see the computational advantage of computing with the Cholesky factor R rather than the multiplied out (R'*R) version as you have shown.

Thanks.

Johan Löfberg

Dec 10, 2013, 1:24:41 PM
to yal...@googlegroups.com
Ah, missed that. The B^2 object is a pretty nasty symbolic expression.

As for the second comment, the main point there is the trick of working with an expression that involves only O(N) symbolic quadratic terms, avoiding the O(N^2) terms the obvious model would generate. For the solver there is no gain (but typically no major loss either), but for the symbolic machinery the difference is massive.

Johan Löfberg

Dec 10, 2013, 1:35:18 PM
to yal...@googlegroups.com
BTW, is this the actual model you want to solve, i.e., B non-symmetric and objective trace(C*B*B)?

Mark L. Stone

Dec 10, 2013, 1:36:16 PM
to yal...@googlegroups.com
So is the bottom line that there is no way to have the gradient of a function of the elements of a matrix power evaluated in YALMIP by (matrix-level) automatic differentiation, as opposed to symbolic expansion of the power prior to differentiation?

Johan Löfberg

Dec 10, 2013, 1:45:46 PM
to yal...@googlegroups.com
Correct. Polynomial expressions are expanded when defined.

A solution, if using a general nonlinear solver, would be to define an R^(N^2) -> R operator similar to, e.g., the sdpvar/entropy operator. Such an operator, and its derivative, is only evaluated when required.

Mark L. Stone

Dec 10, 2013, 2:31:11 PM
to yal...@googlegroups.com
So you mean something that looks like this https://github.com/johanlofberg/YALMIP/blob/master/operators/entropy.m ?

I believe the probability of your writing it correctly is as high as mine is low. :)

If implemented and used, would this result in automatic differentiation being used to evaluate gradients for an optimizer, avoiding all purely symbolic multiplication, and thereby bypassing the curse of dimensionality arising from symbolic manipulation?

Johan Löfberg

Dec 10, 2013, 3:06:05 PM
to yal...@googlegroups.com
The operator trace(C*B*B') would be as follows (I implemented this operator instead of C*B*B since I am a bit unsure about the derivative of the non-symmetrized case):

function varargout = stone(varargin)
switch class(varargin{1})

    case 'double'
        B = varargin{1};
        B = reshape(B,sqrt(length(B)),[]);
        C = varargin{2};
        varargout{1} = trace(C*B*B');

    case 'sdpvar'
        varargout{1} = yalmip('define',mfilename,varargin{1:2});

    case 'char'
        B = varargin{3};
        C = varargin{4};

        operator = struct('convexity','none','monotonicity','none','definiteness','none','model','callback');
        operator.derivative = @(x) derivative(x,C);

        varargout{1} = [];
        varargout{2} = operator;
        varargout{3} = B;

    otherwise
        error('What?');
end

function d = derivative(B,C)

B = reshape(B,sqrt(length(B)),[]);
d = C*B+B*C;
d = d(:);
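Since the derivative of such a callback operator is hand-coded, a finite-difference comparison is a quick way to validate whatever formula it returns. As a numpy sketch (not YALMIP code): for f(B) = trace(C*B*B'), the analytic gradient with respect to B works out to (C + C')*B, and central differences on this quadratic function reproduce it essentially exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
B = rng.standard_normal((n, n))
C = rng.standard_normal((n, n))
C = C @ C.T  # symmetric, mirroring the C = C*C' construction

def f(B):
    return np.trace(C @ B @ B.T)

# Analytic gradient of trace(C B B') with respect to B
grad_analytic = (C + C.T) @ B

# Central finite differences, entry by entry
h = 1e-6
grad_fd = np.zeros_like(B)
for i in range(n):
    for j in range(n):
        E = np.zeros_like(B)
        E[i, j] = h
        grad_fd[i, j] = (f(B + E) - f(B - E)) / (2 * h)

assert np.allclose(grad_fd, grad_analytic, atol=1e-4)
```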

Testing:

X = sdpvar(3,3,'full');
C = randn(3); C = C*C';
Y = randn(3);
solvesdp([],trace(C*(X-Y)*(X-Y)'));
double(X)

ans =

   -0.1269    0.1098   -1.9673
    1.0703    0.0481   -0.6273
    0.7669   -0.6433    1.3464

solvesdp([],stone(X-Y,C),sdpsettings('solver','ipopt'));
double(X)

ans =

   -0.1269    0.1098   -1.9673
    1.0703    0.0481   -0.6273
    0.7669   -0.6433    1.3464



Mark L. Stone

Dec 10, 2013, 4:38:55 PM
to yal...@googlegroups.com
Johan,

Very nice.  Thanks!!

Edi

Sep 12, 2014, 8:08:26 AM
to yal...@googlegroups.com
Dear Johan,

I wonder how to get the power of a matrix in YALMIP.
For instance, I have a matrix B and I want to get B^k, where k is an sdpvar.


Regards
Edmond

Johan Löfberg

Sep 12, 2014, 8:24:12 AM
to yal...@googlegroups.com
Why are you posting the question under this old thread? Your question has nothing to do with the thread.

It is not supported, if you mean A^k = A*A*A*A.... It is an absolutely horrific nonlinear operator.

If you want to do an elementwise power, you have to use a for-loop and raise each element (as there is a bug in that code when the argument is a matrix).


Edi

Sep 12, 2014, 8:43:13 AM
to yal...@googlegroups.com
Dear Johan,
Sorry for posting in this thread.

I want to find the k-th power of a matrix B and not an elementwise power. Any suggestion for this?


Thanks
Edmond

Johan Löfberg

Sep 12, 2014, 8:44:18 AM
to yal...@googlegroups.com
Depends on the context

Edi

Sep 12, 2014, 8:49:27 AM
to yal...@googlegroups.com
I have an objective function which depends on B^k. All I want to do is optimize this objective function with respect to k.
But the thing is, I am stuck on how to get B^k, where k is an sdpvar.

Johan Löfberg

Sep 12, 2014, 8:57:15 AM
to yal...@googlegroups.com
Is k real or integer? Is B small or large? What is the range of k? What are the other constraints like? How does B^k enter the objective? Etc. Most likely impossible to do in a nice way, but you never know.


Edi

Sep 12, 2014, 9:09:05 AM
to yal...@googlegroups.com
k is an integer, B can probably be up to 100x100 and symmetric, and the range of k is [1:200]. B^k enters the objective through W_matrix1, and k enters the objective through quantization_sensor_total_var_1.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Pf=0.1;
w3=1;

W = diag(B^k);
W_matrix1 = W(:,1);
%%%%%%%%%%%%%%%%%%%%%

eig_W_new = eigs(B,N);

for i = 2:N
    quant_second_term(i) = (1-eig_W_new(i)^(2*k))/(1-eig_W_new(i)^2);
end

quantization_sensor_total_var_2 = sum(quant_second_term);

quantization_sensor_total_var_1 = quantization_var*(epsE^2)*(k+quantization_sensor_total_var_2);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Objective=qfunc(((qfuncinv(Pf)*sqrt((1/N)*W_matrix1*sum(((w3.^2).*(2*M_1(fff)*(sigma.^4))))+quantization_sensor_total_var))-(1/N)*M_1(fff)*sum(w3.*((sigma.^2).*SNR_i)))/(sqrt((1/N)*W_matrix1*sum((w3.^2).*(2*M_1(fff)*(sigma.^4).*(1+2*SNR_i)))+quantization_sensor_total_var)));

Constraint=P_trans*k<=P_tot;

Johan Löfberg

Sep 12, 2014, 10:37:22 AM
to yal...@googlegroups.com
Ouch.

Hmm, well, the good news is that it's Friday. Yay.

The bad news is that you will never get this into YALMIP with any sensible efficiency. No other modelling language either I suspect. Even worse, I think no general purpose solver will be able to address this problem with any kind of hope of succeeding.

Your best bet is probably to solve 200 problems, one for each k. Those 200 problems you probably have to solve using some home-made solver.
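The gridding approach is just a loop over the 200 candidate values of k, keeping the feasible one with the best objective. A Python sketch, where `objective` and `feasible` are hypothetical stand-ins (the real objective would come from the model above; any callables work here):

```python
import numpy as np

def objective(k):
    # Hypothetical stand-in for the real objective, which depends on B^k
    # through W_matrix1 and on k directly; anything callable works here.
    return (k - 37) ** 2 + 10 * np.log(k)

def feasible(k, P_trans=1.0, P_tot=200.0):
    # The power constraint from the post: P_trans*k <= P_tot
    return P_trans * k <= P_tot

# Enumerate the integer grid k = 1..200 and pick the best feasible point
candidates = [k for k in range(1, 201) if feasible(k)]
best_k = min(candidates, key=objective)
```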

Edi

Sep 18, 2014, 6:39:16 AM
to yal...@googlegroups.com
Dear Johan,

I simplified the problem somewhat, and now my objective is convex with respect to k. I still have the matrix multiplication in the numerator and I am not sure how to perform that in YALMIP. Note that k can now be relaxed to real values, and that the matrix W_original is non-negative and ergodic.
I would be grateful if you could help with this or offer any comments/ideas.


%======================================================================
k = sdpvar(1,1);
assign(k,1);

y1 = rand(10,1);
y2 = rand(10,1);
Var_max = 5;
eig_w_original = eig(W_original);
epsE = 0.01;
quantization_var = 2;
M = 10;

Var1 = Var_max*((1/M)+(eig_w_original(2)^(2*k))*(1-(1/M)))+(epsE^2)*(quantization_var*k+((1-eig_w_original(2)^(2*k))/(1-eig_w_original(2)^2))*(M-1)*quantization_var);

x1 = W_original^(k)*y1;
x2 = W_original^(k)*y2;

Objective_final = ((x1-x2).^2)./(Var1);

Constraints = [steps_final_original <= 100]

ops = sdpsettings('solver','bmibnb','bmibnb.maxiter',50,'usex0',1)
sol = solvesdp(Constraints,Objective_final,ops)
%==================================================================================

Johan Löfberg

Sep 18, 2014, 6:51:41 AM
to yal...@googlegroups.com
You cannot (should not; it might work due to a bug) do DOUBLEMATRIX^SDPVAR.

Your objective does not make sense, as it is a matrix.

If k is your only decision variable, it would be overkill to use anything but simple gridding.


Edi

Sep 18, 2014, 6:54:28 AM
to yal...@googlegroups.com
Sorry, the new objective has to be like this. It is actually a scalar.

Objective_final=((x1(2)-x2(2)).^2)./(Var1);

Johan Löfberg

Sep 18, 2014, 7:06:38 AM
to yal...@googlegroups.com
If you for some reason absolutely want to solve this using nonlinear optimization, instead of performing the trivial gridding, you can scalarize the power

sdpvar k
Var1=V...
[V,D] = eig(W_original);
x1 = V*diag(diag(D).^k)*V'*y1;
x2 = V*diag(diag(D).^k)*V'*y2;
Objective = ((x1(2)-x2(2)).^2)./(Var1);
solvesdp([10 >= k >= 1e-3],Objective,sdpsettings('solver','bmibnb'))

BMIBNB actually solves the problem to global optimality then in roughly 5 seconds on my machine (gridding takes 0.5s)
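The scalarization rests on the identity W^k = V*diag(d.^k)*V' for a symmetric W with eigendecomposition W = V*diag(d)*V', which also extends to real k when the eigenvalues are positive. A numpy check for an integer power (a sketch, not YALMIP code):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))
W = A @ A.T + n * np.eye(n)  # symmetric positive definite
y = rng.standard_normal(n)
k = 3

d, V = np.linalg.eigh(W)     # W = V @ diag(d) @ V.T

# Matrix power applied to y, two ways
x_power = np.linalg.matrix_power(W, k) @ y
x_eig = V @ np.diag(d ** k) @ V.T @ y

assert np.allclose(x_power, x_eig)
```

Because the eigenvalues d are positive here, d ** k is well defined for non-integer k as well, which is what lets k become a continuous decision variable.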

Edi

Sep 18, 2014, 7:27:20 AM
to yal...@googlegroups.com
I get this error:


Undefined function 'flush' for input arguments of type 'double'.

Error in sdpvar/power (line 13)
x = flush(x);

Johan Löfberg

Sep 18, 2014, 8:03:56 AM
to yal...@googlegroups.com
Update to the latest release

Johan Löfberg

Sep 18, 2014, 9:36:18 AM
to yal...@googlegroups.com
Add the known property that the objective is non-negative, and the gap is closed much faster

solvesdp([10 >= k>=1e-3,Objective>=0],Objective,sdpsettings('solver','bmibnb','usex0',1))

Since this info is only of use in the lower bound computations and bound propagation, and is redundant in the nonlinear upper bound solves, you can add it as a cut:

solvesdp([10 >= k>=1e-3,cut(Objective>=0)],Objective,sdpsettings('solver','bmibnb','usex0',1))
