[Emergency] How do I deal with large-scale variables, particularly a large covariance matrix? (Out of memory)


XM XM

Oct 10, 2016, 1:41:02 AM
to YALMIP
I am working on a large QP problem. The setup is as follows:

r = sdpvar(len_r, 1); % len_r = 70000
r_mean = ones(len_r, 1); % this is a user-defined value
cov_mat = correlation(); % a computed covariance matrix, guaranteed to be positive definite; its size is 70000*70000

Constraints = [-1 <= r <= 1, others] % all constraints are linear

My objective is to minimize (r - r_mean)' * cov_mat^(-1) * (r - r_mean)

The problem was originally set up as

Objective = transpose(r - r_mean) * cov_mat^(-1) * (r - r_mean);
options = sdpsettings('verbose',1,'solver','cplex','cplex.qpmethod',1);
sol = solvesdp(Constraints,Objective, options);
if sol.problem == 0
 solution = sparse(double(r));
else
 display('Hmm, something went wrong!');
 yalmiperror(sol.problem);
end 


I know it is very expensive to invert cov_mat, and it is not necessary either. So I have tried setting the objective as

Objective = transpose(r - r_mean) * (cov_mat\(r - r_mean));
and 
Objective = matrix_partition_comp( len_r, cov_mat, r_mean, r );

Here, in the function matrix_partition_comp, I use the idea of matrix partitioning from http://math.stackexchange.com/questions/2735/solving-very-large-matrices-in-pieces. Specifically, I partition the big 70000*70000 cov_mat into 4 equal-size blocks, partition (r - r_mean) into 2 equal-size vectors, and do the corresponding block computations, much like parallel computing does.

Unfortunately, I always receive the error message:

Error using sdpvar/mtimes
Out of memory. Type HELP MEMORY for your options.

It is worth mentioning that if cov_mat is a simple identity matrix, then CPLEX works. However, cov_mat in my problem cannot be that simple; it is usually complicated.

I know I have too many variables (r) to solve for, but I have to. Can anyone help me with this problem?



Johan Löfberg

Oct 10, 2016, 1:59:58 AM
to YALMIP
With N = 70000, you have to write it using a factorization:

R = ... % factorize cleverly; since you have low rank, R should be M x N with M < N
e == sdpvar(M,1)
optimize([-1 <= r <= 1, e == R*(r - r_mean)],e'*e)


XM XM

Oct 10, 2016, 2:27:22 AM
to YALMIP
Hi Johan, 

Thanks for your quick response! Please correct me if I misunderstood you. Do you mean I need to find an R (M*N, M<N) such that R^T * R = cov_mat^(-1)? Well, the problem is, I only know cov_mat, so is there an established procedure that I can reference?

Then, suppose I get R; do I need to change the code set-up to

r = sdpvar(len_r, 1); % len_r = 70000
r_mean = ones(len_r, 1); % this is a user-defined value
cov_mat = correlation(); % a computed covariance matrix, guaranteed to be positive definite; its size is 70000*70000
e == sdpvar(M, 1);
Constraints = [-1 <= r <= 1, others, e == R*(r - r_mean) ] % all constraints are linear
Objective = e' * e;
options = sdpsettings('verbose',1,'solver','cplex','cplex.qpmethod',1);
sol = solvesdp(Constraints,Objective, options);
if sol.problem == 0
 solution = sparse(double(r));
else
 display('Hmm, something went wrong!');
 yalmiperror(sol.problem);
end 

BTW, should I use e == sdpvar(M,1) or e = sdpvar(M,1)?

Johan Löfberg

Oct 10, 2016, 4:00:41 AM
to YALMIP
Well, R'*R = cov_mat^-1 is the definition of the Cholesky factorization of cov_mat^-1.

I don't think you want to invert first, though; it is probably better to work with e = R\(r - r_mean), where R is the Cholesky factor of cov_mat (I might have a transpose wrong somewhere, but that is the general idea).

If cov_mat is dense, or its Cholesky factor is, you will not be able to solve this.

Johan Löfberg

Oct 10, 2016, 4:51:33 AM
to YALMIP
It should be R'\(r - r_mean).

z'*inv(X)*z = z'*inv(R'*R)*z = z'*inv(R)*inv(R')*z = e'*e, where e == R'\z. If the inverse Cholesky factor is much denser than the Cholesky factor itself, use the constraint R'*e == z instead.
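As a quick numerical sanity check of this identity (a small random instance; the sizes and matrix are illustrative only, not from the actual problem):

n = 5;
A = randn(n); X = A'*A + n*eye(n); % a symmetric positive definite test matrix
R = chol(X);                       % X == R'*R, with R upper triangular
z = randn(n,1);
e = R'\z;                          % forward-solve with the lower-triangular R'
disp(z'*(X\z) - e'*e)              % should be ~0 up to rounding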

Johan Löfberg

Oct 10, 2016, 4:54:56 AM
to YALMIP
Objective = transpose(r - r_mean) * (cov_mat\(r - r_mean)); is exactly the same as defining the inverse: YALMIP works on a symbolic expression of the right-hand side, so the back-solve is done against all basis vectors of the right-hand side, which typically would be all unit basis vectors spanning r.

The reason you want to work with the factor instead is that x'*S*x in the worst case will generate O(70000^2) symbolic expressions in YALMIP, i.e., all monomials in a quadratic function.
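For concreteness, a small sketch contrasting the two formulations (illustrative sizes and a random test matrix; R is assumed to satisfy cov_mat == R'*R, e.g. R = chol(cov_mat)):

N = 500;
A = sprandn(N,N,1e-3); cov_mat = A'*A + speye(N); % sparse positive definite test matrix
r_mean = ones(N,1);
R = chol(cov_mat);
r = sdpvar(N,1);
% (a) quadratic form: YALMIP may build up to O(N^2) monomials
% Objective = (r - r_mean)' * (cov_mat \ (r - r_mean));
% (b) factored form: N extra variables and only N quadratic monomials
e = sdpvar(N,1);
optimize([-1 <= r <= 1, R'*e == r - r_mean], e'*e)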

XM XM

Oct 10, 2016, 12:20:19 PM
to YALMIP
Hi Johan, 

Thanks so much for your detailed explanation, and for helping me understand the way YALMIP works! My cov_mat is sparse, so I think I can get R. My final question is: is the following set-up correct? I saw that in your first response you wrote "e == sdpvar(M, 1)"; should it be e == sdpvar(M, 1) or e = sdpvar(M, 1)? Many many many thanks!

r = sdpvar(len_r, 1); % len_r = 70000
r_mean = ones(len_r, 1); % this is a user-defined value

e == sdpvar(M, 1); % Suppose R is M*len_r
Constraints = [-1 <= r <= 1, others, e == R'\(r - r_mean) ] % all constraints are linear, R is known
Objective = e' * e;

XM XM

Oct 10, 2016, 12:26:43 PM
to YALMIP
Hi Johan,

BTW, for the Cholesky decomposition, if I am not wrong, R should be an upper-triangular matrix with the same size as cov_mat. So if cov_mat is N*N, R is also N*N?

So e here is essentially N*1, right?

Johan Löfberg

Oct 10, 2016, 12:38:22 PM
to YALMIP
It should be

e = sdpvar(M, 1);
Constraints = [-1 <= r <= 1, others, e == R'\(r - r_mean) ]

If R is a complete Cholesky factor, then M = N. The case M < N is for the general low-rank factorization case, such as ones(100) = ones(100,1)*ones(1,100).
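If such a low-rank factor is actually needed, one way to obtain it for a PSD matrix is via the eigendecomposition; a minimal sketch on the rank-1 example above (illustrative only):

S = ones(100);                        % rank-1 PSD example
[V, D] = eig(S);
d = diag(D);
keep = d > 1e-10;                     % keep only the (numerically) nonzero eigenvalues
R = diag(sqrt(d(keep))) * V(:,keep)'; % S == R'*R, with R of size M x N, M = rank(S)
norm(S - R'*R, 'fro')                 % should be ~0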

Johan Löfberg

Oct 10, 2016, 12:41:39 PM
to YALMIP
Yes, a complete Cholesky factorization of an N*N matrix is N*N. I'm writing general examples to account for low-rank factorizations.

XM XM

Oct 10, 2016, 12:58:23 PM
to YALMIP
Thanks for this clarification! Since my cov_mat is large (though sparse), it is difficult for me to find a sparse R of size M*N (M could be far less than N) such that R' * R = cov_mat. But you suggest R = chol(cov_mat), which saves me a lot of time.

May I confirm that, even if R is dense (that is, the upper triangle of R is not sparse), the model you suggest is still better than the original model, which needs O(70000^2)?

Actually, when I got R = chol(cov_mat), I wanted to see whether R's upper triangle is sparse, so I used sum(R(:)==0) to check the sparsity, but it failed because of out of memory...

Johan Löfberg

Oct 10, 2016, 1:12:11 PM
to YALMIP
Finding a low-rank factorization with M < N must be done if you use this approach when the matrix isn't positive definite, as the matrix doesn't have a Cholesky factorization in those cases.

You would never be able to work with a dense 70000*70000 matrix in MATLAB. That is roughly 40 GB of memory. YALMIP would get extremely slow even if you managed to load the data.

The variable-change approach described here is essentially only for the purpose of avoiding work with a symbolic quadratic function with up to N^2 monomials, and instead working with an expression with at most N symbolic terms. In the end, when the solver is called, the data is still there somewhere. Some solvers might benefit from having the data in the quadratic cost and no equalities, and some might work better with it moved to the equalities, being then guaranteed a perfectly well-conditioned quadratic objective.

XM XM

Oct 10, 2016, 1:32:53 PM
to YALMIP
Thanks for clarifying these fundamentals! 

In terms of my problem, the covariance matrix cov_mat I am working with is definitely sparse and PSD. 

I want to confirm that, even if R is dense in the upper triangle, the model e == R'\z, min e'*e is still better than the original model, which needs O(70000^2)?

Johan Löfberg

Oct 10, 2016, 1:55:23 PM
to YALMIP
For the solver, it may or may not make a difference; you still have O(70000^2) bytes of memory. In terms of symbolic manipulations in YALMIP, it makes a huge difference, as you will only have N additional linear monomials to work with (plus a dense linear symbolic multiplication, which will start to get slow, of course) and N quadratic monomials, compared to N^2 quadratic monomials, which just won't work for N larger than 1000 or so, perhaps.

XM XM

Oct 10, 2016, 2:35:24 PM
to YALMIP
Hi Johan, 

Thanks! So MATLAB is still able to handle the problem even if R is dense in the upper triangle. Based on your explanation, I think that if the inverse Cholesky factor is sparser than R, we would be better off working with the inverse Cholesky factor, because we can save computation time. But at least we do not need to worry about "out of memory".

BTW, in your opinion, which solver is better, CPLEX or Gurobi? I understand this question may not be appropriate for you, so you do not need to respond if you are not comfortable with it.

Greatly appreciate your HUGE help!

Johan Löfberg

Oct 10, 2016, 2:43:00 PM
to YALMIP
As I said, if R is dense (i.e., the underlying matrix X = R'*R is dense), you will not be able to do anything for N larger than, say, 20000. You will run out of memory, and you cannot perform any O(N^3) operations in MATLAB, so you will not be able to factorize it, work with it in YALMIP, or solve optimization problems with that data.

If by inverse Cholesky you mean inv(R), I don't see why that matrix should be any sparser than R. Inversion is typically an operation that destroys sparsity.

I have no opinion on CPLEX vs Gurobi vs MOSEK. You can find problems where any one of them is best.

Mark L. Stone

Oct 10, 2016, 2:48:37 PM
to YALMIP
Johan, so for large QPs, are you better off calling the QP solver directly, e.g., cplexqp, without going through YALMIP? Presuming you know how to formulate the problem, including the constraints, that is. Directly calling the QP solver still allows you to use the factorized form, and avoids all symbolic manipulation rather than just reducing it to O(N). Am I missing something?

Mark L. Stone

Oct 10, 2016, 2:52:45 PM
to YALMIP
In my own stochastic optimizer in MATLAB, I call QP and nonnegative least squares solvers directly (without using YALMIP), but use YALMIP (optimize or optimizer) for SDP subproblems.

Johan Löfberg

Oct 10, 2016, 2:53:04 PM
to YALMIP
Of course, if the problem is trivial enough, the fastest approach is always to get rid of the modelling layer and call the solver directly.

Johan Löfberg

Oct 10, 2016, 3:08:00 PM
to YALMIP

In this model, 90% of the time is spent on the factorization, 1% in YALMIP, and 9% in the solver:

S = sprandn(20000,20000,5e-5); % sparse random matrix
S = S + S' + 100*speye(2e4);   % symmetrize and shift to make it positive definite
R = chol(S);
e = sdpvar(2e4,1);
x = sdpvar(2e4,1);
optimize([e == R*(x-randn(2e4,1)), -1 <= x <= 1],e'*e)

Clearly, it would have been much faster to solve the problem directly with a solver call using S as the quadratic cost.
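As a rough sketch of such a direct call, using quadprog's interface for illustration (cplexqp provides a similar quadprog-style interface), for the objective (x-m)'*S*(x-m) with the box constraints above:

n = 2e4;
S = sprandn(n,n,5e-5); S = S + S' + 100*speye(n); % same sparse SPD test matrix as above
m = randn(n,1);
H = 2*S;      % quadprog minimizes 0.5*x'*H*x + f'*x
f = -2*(S*m); % linear term from expanding (x-m)'*S*(x-m)
x = quadprog(H, f, [], [], [], [], -ones(n,1), ones(n,1));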

XM XM

Oct 10, 2016, 3:15:28 PM
to YALMIP
Hi Johan, for my problem, since the size is 70000, I cannot avoid the factorization, as in this 20000 example, right?

Johan Löfberg

Oct 10, 2016, 3:19:35 PM
to YALMIP
Of course you can avoid factorizing it. If you simply call the solver directly without YALMIP, there is no need to factorize. And if you do want to factorize, whether that is expensive depends on the data. The following factorization of a sparse 70000x70000 matrix takes 0 s:

R = chol(speye(7e4));

XM XM

Oct 10, 2016, 3:25:04 PM
to YALMIP
Yes, you are right: whether the factorization is cheap depends on the matrix. If the matrix is sparse, it takes a second.

For my problem, my covariance matrix cov_mat is sparse, though less sparse than a simple 70000*70000 identity matrix, so it takes about 2 seconds to get R = chol(cov_mat).

I think I will stick with YALMIP: the factorization takes little time, so it seems there would not be much difference if I called an optimization solver directly without YALMIP.

Johan Löfberg

Oct 10, 2016, 3:30:47 PM
to YALMIP
Sounds like you are lucky then, with a low fill-in. You can have very sparse matrices with not-that-sparse Cholesky factors (large fill-in).

XM XM

Oct 10, 2016, 3:37:06 PM
to YALMIP
Yes, my cov_mat is sparse (though not as sparse as the identity), and the Cholesky factor may not be that sparse (I do not know how to check how dense it is). I will definitely try your suggested factorization model.

Thanks again for helping with this problem. I have been stuck on it for a while, with many failed attempts.

Johan Löfberg

Oct 10, 2016, 3:41:44 PM
to YALMIP
If you actually manage to compute the Cholesky factor of a 70000x70000 matrix in 2 s, the Cholesky factor is extremely sparse. A measure of sparsity is simply nnz(R)/numel(R).

XM XM

Oct 10, 2016, 4:06:00 PM
to YALMIP
Greatly appreciated!!!