Understanding SDP solving time and memory usage -- Advice on faster formulation?

Nathan Dwek

May 6, 2025, 3:51:59 PM
to mo...@googlegroups.com
Hello,

I am trying to solve a problem of the form

    minimize    sum_{s=1..S} || e_s * X - R_s ||_F^2
    subject to  X >> 0

where e_s is a scalar for s = 1, ..., S, each R_s is symmetric and sparse with the same sparsity pattern across all s, and X should have that same sparsity pattern as well.

The initial hope was to solve this for matrices of size ~10,000 on "reasonable hardware" (a Xeon CPU with around 200 GB of RAM).

Please find attached the current implementation for the problem above (lines 6-50 are not directly relevant as their role is to produce the input data e_s and R_s). The implementation uses LMIs and attempts to take advantage of the sparsity pattern to reduce the number of decision variables and improve overall solving time.

Unfortunately, the solving time of this implementation is not at all encouraging: it rapidly explodes for matrices of size > 100, almost independently of the degree of sparsity.

I have no formal training in SDP, relatively limited experience with MOSEK (certainly with SDPs), and I use it in a fairly "black box" manner through MATLAB, so I do not really know how to approach or diagnose this performance. I am looking for help understanding the solving time and scaling of the above problem, both from an analytical perspective and in terms of any design pitfalls I might have fallen into when implementing it.

Below is some additional practical information:

The implementation has been run on three systems, using the old matlab toolbox API:
  1. Linux laptop with 16GB RAM:
    Matlab version   : 24.2.0.2740171 (R2024b) Update 1
    Architecture     : GLNXA64
    mosekopt path    : mosek/10.0/toolbox/r2017aom/mosekopt.mexa64
    MOSEK version    : 10.0.26
  2. Windows laptop:
    Matlab version   : 24.2.0.2863752 (R2024b) Update 5
    Architecture     : PCWIN64
    mosekopt path    : Mosek\11.0\toolbox\r2019bom\mosekopt.mexw64
    MOSEK version    : 11.0.17
  3. Xeon desktop with 36 cores and 255GB RAM:
    Matlab version   : 23.2.0.2365128 (R2023b)
    Architecture     : PCWIN64
    mosekopt path    : mosek\10.1\toolbox\R2017aom\mosekopt.mexw64
    MOSEK version    : 10.1.11

The attached code can be used to generate a task file.

Thank you first of all for the amazing solver, and second for any potential advice/help,
Kind Regards,
Nathan
demo_forced_sparsity.m

Michal Adamaszek

May 7, 2025, 2:26:04 AM
to mosek
Hi,

In your code you only minimize the sum of squares over the sparsity pattern, not the full Frobenius norm. I don't think there is any reason to assume that the remaining entries of the optimal X will be zero, unless that somehow follows from your application. I assume this is intentional and we want to continue like that.
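In other words (my reading of your code), the objective being minimized is

    sum_s sum_{(i,j) in pattern} ( e_s * X_ij - (R_s)_ij )^2

rather than the sum over all (i,j).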

By exploiting sparsity this way you are then destroying it when you add the full vectorized constraint vec(X) >> 0 through the SVECPSD cone. That is equivalent to dim*(dim+1)/2 additional constraints. The better approach would be to define X as a semidefinite variable (barvar) and use its entries in the evaluation of the norm. This way only the entries appearing in the sparsity pattern enter the model, and the total number of constraints will be roughly proportional to S*density. Here is my back-of-the-envelope experiment with CVX for S=1 (I didn't run your code because of the missing randsample(), but this is roughly equivalent to my barvar approach):

dim = 1000;

R = sprand(dim, dim, 0.02);   % random sparse test data
k = find(R);                  % linear indices of the sparsity pattern

e = 1.3;
cvx_solver mosek

cvx_begin sdp
cvx_solver_settings('write', 'model2.ptf')
variable X(dim, dim) semidefinite
minimize norm(vec(e*X(k) - R(k)))
cvx_end

That solves nicely and perhaps you can get to a few thousand this way.

Michal

Nathan Dwek

May 7, 2025, 6:00:40 AM
to mosek
Hi, thanks for the prompt response,

X is indeed desired to have the same sparsity pattern as the different R_s (which all have the same sparsity pattern). This is why I was trying to reduce the problem size by minimizing the sum of squares over the sparsity pattern only.

Unfortunately, I don't fully understand what you mean by
"you are then destroying it by adding the full vectorized constraint vec(X)>>0 through the SVECPSD cone."

My idea was to have only the nonzero coefficients of the lower triangle of X in the decision variable, and to create the full vectorized lower triangle of X through F*x + g, which results in a lot of true zeros in this conic vector. I feel like I must be misunderstanding something, then. Could you elaborate on how you see this?
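For concreteness, here is a schematic of the construction I mean (not the attached code verbatim; svecidx, w and m are stand-ins for what the demo actually computes):

m = nnz(tril(R));             % number of pattern entries in the lower triangle
n = dim*(dim+1)/2;            % length of svec(X)

% Scatter the m decision variables x into the full svec'd lower triangle;
% every entry of F*x + g outside the pattern is a true zero.
% svecidx holds the positions of the pattern entries in svec order,
% w the svec scaling (1 on the diagonal, sqrt(2) off it).
F = sparse(svecidx, 1:m, w, n, m);
g = zeros(n, 1);

% Affine conic constraint F*x + g in SVEC_PSD_CONE(n), old toolbox API
% (res from [r, res] = mosekopt('symbcon')):
prob.f    = F;
prob.g    = g;
prob.accs = [res.symbcon.MSK_DOMAIN_SVEC_PSD_CONE, n];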

Thank you for the example you provide. It is not exactly what I am after, as it does not enforce the sparsity pattern of R on X, but it is still interesting to me. In particular, I have the following question about the formulation choice and expected solving time/complexity.

Some context first: clearly your example runs faster than what I came up with. Looking at the PTF, I see that CVX formulates the Frobenius norm (over the sparsity pattern) by taking, for each nonzero coefficient, the inner product of X with a matrix that is nonzero only at that coefficient, and then proceeding "as expected" from there. This results in as many symmetric matrices as there are nonzeros. Actually, this is how I first implemented the Frobenius norm on my side as well (I don't think there is another way if X is a barvar, given that the only available operations are inner products), and it resulted in poor performance. I then read the documentation and conversations in this group, which led me to the LMI formulation I presented in my first email.
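(If I read the PTF correctly, the scalarization for each pattern entry (i,j) is

    <A_ij, X> = X_ij,   where A_ij has a 1 at (i,i) if i = j, and 1/2 at both (i,j) and (j,i) otherwise,

and these affine expressions are then stacked into a single quadratic cone.)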

So my question is: did I get it wrong that this large number of symmetric matrices and inner products would result in poor performance? Is the LMI choice predictably slower in this case? How come the CVX formulation is "so fast" even though it relies on large matrices instead of smaller vectors?

On a slightly unrelated note, I attach a simplified version of the initial demo that does not use randsample. R has a slightly different sparsity pattern as a result and is created in a naive way, but I believe that does not really matter for this conversation.

Thanks again for your time and looking forward to your explanations,
Kind Regards,
Nathan
demo_googlegroup.m

Michal Adamaszek

May 7, 2025, 6:42:17 AM
to mosek
Yes, I missed the part that your svecpsd constraint restricts some (or, really, most) entries of the PSD matrix to 0, sorry. But in any case, if you want to impose that the entries outside the sparsity pattern are 0, then you have around dim*(dim+1)/2 constraints just to declare that, whether you use the svecpsd cone or a barvar, and that is what is really killing your performance.
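To put a number on it: for the dim ~ 10,000 you are aiming for, that is already about 5*10^7 constraints before the objective even enters the model.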

So my code should really look like this:

dim = 1000;

R = sprand(dim, dim, 0.02);
R = R + R';        % symmetrize so the pattern is symmetric
k = find(R);       % indices inside the sparsity pattern
z = find(~R);      % indices outside the pattern

e = 1.3;
cvx_solver mosek

cvx_begin sdp
cvx_solver_settings('write', 'model3.ptf')

variable X(dim, dim) semidefinite
minimize norm(vec(e*X(k) - R(k)))
X(z) == 0
cvx_end


But that means you should really be dualizing the problem, as CVX does in this case. See the discussion at https://docs.mosek.com/latest/faq/faq.html#why-is-my-sdp-lmi-slow-or-running-out-of-memory and the links in that question.

In particular, the dual of my toy model with S=1 above is something like (modulo my mistakes)

maximize  <V(k), R(k)>
s.t.      norm(V(k)) <= 1
          V >> 0

and that operates only on the number of entries in the sparsity pattern.
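In CVX that would read roughly as follows (my sketch; same plus/minus disclaimers):

cvx_begin sdp
variable V(dim, dim) semidefinite
maximize( V(k)' * R(k) )     % <V(k), R(k)>, pattern entries only
subject to
    norm(V(k)) <= 1
cvx_end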

Nathan Dwek

May 7, 2025, 6:53:18 AM
to mosek
Thank you for your time, I really appreciate it.

I will take a look at your example and the resources you point to w.r.t. dualizing; I think it's past time I dug more seriously into that side. I'll make sure to come back with at least some minimal observations/feedback/conclusions, but that might take a bit.



Nathan Dwek

May 7, 2025, 8:42:24 AM
to mosek
Just cross-checking with the PTF file: is it possible that the dual problem is

minimize  <V(k), R(k)>   (instead of maximize)
s.t.      norm(V(k)) <= 1
          V >> 0

Once again, to make sure my intentions are understood as positive: I am asking this not for the sake of arguing or finding mistakes, but because I am trying to verify my understanding of the dualization, and I know my own understanding is quite limited.

BTW, if you'd like to take this as an opportunity to elaborate a bit on how you did the dualization "pen and paper", and how one (I 🙂) would recover X from V, that would be wonderful, but I realize I am asking for a lot. I am trying to familiarize myself with the theory of it all, but having someone hold your hand while doing it is certainly a convenience.

Thank you for giving this a chance!
Nathan




Michal Adamaszek

May 8, 2025, 5:02:57 AM
to mosek
Yes, I missed one minus sign.

I wrote up my line of thinking more carefully in the attached PDF, to convince myself, and I hope it convinces you as well. The plus/minus signs, \frac{1}{2}'s and \sqrt{2}'s are not guaranteed correct ;).

Michal
frobenius.pdf

Nathan Dwek

May 9, 2025, 3:02:15 AM
to mosek
Thank you so much! I appreciate your patience and this is very helpful.

For posterity's sake, attached is a function that implements the problem you suggest, with great success.

I need to take some time to investigate carefully, but a first, very (too?) early takeaway is that while CVX dualizes automatically, for S>1 it comes up with a different formulation that seems (?) significantly less efficient and whose solve time scales explosively with dimension.

I'll share more when/if I come to stronger conclusions; maybe it is an interesting topic for CVX itself.

I would like to reiterate my appreciation for your help and patience!
Kind Regards,
Nathan
fro_dual.m