Optimization of logdet on spectrahedronfactory


Xinyang Li

Feb 3, 2023, 11:11:17 AM
to Manopt
Hi manopt team,

I encountered a problem in my research. My optimization problem is:

max_X logdet(I + sum_i H_i * X * X^T * H_i)

where the H_i are NxM matrices for i = 1, ..., num_H, X is an MxS (S < M) matrix of unit Frobenius norm to be optimized, and I is the identity matrix. I optimized this problem on spectrahedronfactory and noticed that X * X^T always converges to the same matrix, hence the optimal value also always converges to the same value, regardless of the solver and initial point used.
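
Here is a rough multi-start sketch of what I mean (it assumes Manopt is installed and reuses the cost helper from the full snippet below):

M = 8; N = 8; num_H = 6; S = 2;
H = randn(N, M, num_H);
manifold = spectrahedronfactory(M, S);
problem.M = manifold;
problem.cost = @(x) -cost(num_H, H, N, x);
problem = manoptAD(problem);
for trial = 1:5
    x0 = manifold.rand();                         % random initial point
    [x1, c1] = steepestdescent(problem, x0);
    [x2, c2] = trustregions(problem, x0);
    % X*X^T is invariant under X -> X*Q, so compare the Gram matrices:
    fprintf('trial %d: SD %.6f, TR %.6f, |x1*x1'' - x2*x2''|_F = %.2e\n', ...
            trial, c1, c2, norm(x1*x1' - x2*x2', 'fro'));
end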

This problem is non-convex in Euclidean space, so I have no idea whether the obtained result is the global optimum. But it interests me very much, as it is relevant to my research. I do not have a mathematics background and have hardly any experience with Riemannian geometry, so could you please give me some hints or references about it?

Thank you so much in advance!

A small code snippet to reproduce the problem:
M = 8;
N = 8;

num_H = 6;

H = zeros(N, M, num_H);
for i = 1:num_H
    H(:,:,i) = randn(N, M);
end

%%
S = 2;
manifold = spectrahedronfactory(M, S);
problem.M = manifold;
problem.cost = @(x) -cost(num_H, H, N, x);   % minimize the negative
problem = manoptAD(problem);                 % automatic differentiation
[x, xcost, info] = steepestdescent(problem);

% Display some statistics.
figure;
semilogy([info.iter], [info.gradnorm], '.-');
xlabel('Iteration #');
ylabel('Gradient norm');
title('Convergence on the spectrahedron');

x * x'
xcost

function obj = cost(num_H, H, N, x)
    % Accumulate sum_i H_i * X * X^T * H_i^T
    % (named acc to avoid shadowing the builtin sum()).
    acc = 0;
    for i = 1:num_H
        acc = acc + H(:,:,i) * (x * x') * H(:,:,i)';
    end
    obj = log(det(eye(N) + acc));
end

Ronny Bergmann

Feb 7, 2023, 2:40:38 AM
to Manopt
Hej,
I do not work much with the Matlab Manopt package, but concerning your question on convexity: it might indeed be that the problem is convex on a manifold. Let me try to give you a bit of intuition by taking the sphere S2 (the surface of a ball in R3). One thing that is _very_ different is the “way to walk”: in R3 you can just take a point x and a direction \xi and “walk” by saying x + \xi. Do the same on the sphere, and you immediately leave the manifold. So first we need to know “which directions can we walk into”, and the answer is the tangent space – informally, the collection of all derivatives of curves running through a point; in our picture, literally the plane tangent to a point on the sphere.
Now the + (or walking along a straight line in R3) is replaced by so-called geodesics: a geodesic is the shortest curve connecting two given points that stays on the manifold (where I left out the technical detail of how to measure length, which is also not unique).
With all these ideas at least mentioned (I'd recommend reading do Carmo, “Riemannian Geometry”), we can get an idea of convexity. Again, in Rn you measure convexity along lines: the function stays below the line connecting (x, f(x)) and (y, f(y)) if you look at (x + t(y-x), f(x + t(y-x))). Here y - x = \xi is the direction to walk into from the previous part.
On a manifold, convexity is considered along geodesics (we do not have lines). That means that if you check a function for convexity in R3 and then on S2, on S2 you choose two points and their connecting geodesic. This geodesic is the great arc running through both points, and that is different from the connecting line.
(Here it's nice to have the sphere at hand to compare the two.) So – there are functions that are convex on a manifold (in your case the spectrahedron) and not convex when considered in the embedding Euclidean space (in your case a space of matrices).
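
To make this concrete, here is a tiny numeric illustration in plain Matlab (nothing Manopt-specific; the function f is arbitrary – the point is only that the two restrictions differ):

% Evaluate a function along the chord in R3 versus along the
% great-circle geodesic on S2 between the same two points.
f = @(p) p(1)^2 + 2*p(2)^2 + 3*p(3)^2;    % any smooth function
x = [1; 0; 0];
y = [0; 1; 1] / sqrt(2);
theta = acos(x' * y);                     % angle = geodesic distance
t = linspace(0, 1, 101);
f_line = zeros(size(t));
f_geo = zeros(size(t));
for k = 1:numel(t)
    f_line(k) = f(x + t(k) * (y - x));    % straight line (leaves S2)
    gamma = (sin((1 - t(k)) * theta) * x + sin(t(k) * theta) * y) / sin(theta);
    f_geo(k) = f(gamma);                  % great-circle arc (stays on S2)
end
plot(t, f_line, t, f_geo);
legend('along the chord in R3', 'along the geodesic on S2');

Convexity in the Euclidean sense is about the first curve, geodesic convexity about the second – and the two can disagree.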

Whether that is the case for your function – I do not know; for that I would have to sit down for a while and get more familiar with the spectrahedron (again) as well as with your function. But I hope this helps as a first (somewhat unmathematical) intuition for why this might be the case.

Best
Ronny

Xinyang Li

Feb 7, 2023, 8:08:01 AM
to Manopt
Hi Ronny,

First, there's a typo in the problem: there should be a transpose on the last H_i. The problem is:
max_X logdet(I + sum_i H_i * X * X^T * H_i^T)

Thanks for the suggestions. I have actually tried to prove its geodesic convexity, but failed. Here is what I have done:

In the beginning, I was unsure whether it is convex on the manifold, so I looked for some books and papers about that:

Absil, P.-A., R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008.
Vishnoi, N. K. "Geodesic convex optimization: Differentiation on manifolds, geodesics, and convexity." arXiv preprint arXiv:1806.06373 (2018).
Ferreira, O. P., A. N. Iusem, and S. Z. Németh. "Concepts and techniques of optimization on the sphere." TOP 22 (2014): 1148–1170.

Then I tried to prove its convexity by plugging in the formula for geodesics on the sphere, but after struggling with the complicated and messy expressions for several days, I was not able to show it. Then I noticed that if X is a solution to the problem, then XQ is also a solution for any orthogonal matrix Q, so I turned to the quotient manifold – the spectrahedron in our case, i.e., the sphere quotiented by the orthogonal group. After getting familiar with the basics of quotient manifolds, such as the horizontal space, I started to prove geodesic convexity on the quotient manifold. But again I failed, because the proof became even more complicated than before.
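
For what it's worth, one could at least probe this numerically with a midpoint check along the following lines (a sketch, not a proof: I align the representatives over the orthogonal group with a Procrustes step and then use the geodesic of the unit Frobenius-norm sphere, assuming that curve is a reasonable proxy for the quotient geodesic; cost is the helper from my first post):

M = 8; N = 8; num_H = 6; S = 2;
H = randn(N, M, num_H);
f = @(x) -cost(num_H, H, N, x);           % we minimize the negative
violations = 0;
for trial = 1:1000
    X = randn(M, S); X = X / norm(X, 'fro');
    Y = randn(M, S); Y = Y / norm(Y, 'fro');
    [U, ~, V] = svd(Y' * X);              % Procrustes: align Y to X over O(S)
    Y = Y * (U * V');
    c = max(min(trace(X' * Y), 1), -1);   % clamp for acos
    theta = acos(c);
    if theta < 1e-8, continue; end
    mid = sin(theta/2) * (X + Y) / sin(theta);   % geodesic midpoint
    if f(mid) > (f(X) + f(Y)) / 2 + 1e-10
        violations = violations + 1;      % midpoint inequality fails
    end
end
fprintf('%d midpoint violations out of 1000 trials\n', violations);

A single genuine violation would already rule out convexity along these particular curves.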

After that I came across this paper:
Journée, M., F. Bach, P.-A. Absil, and R. Sepulchre. "Low-rank optimization on the cone of positive semidefinite matrices." SIAM Journal on Optimization 20.5 (2010): 2327–2351.

It gave me the following insight:
If X of size MxS (S < M) is a solution to the problem on spectrahedron(M, S), i.e., the Riemannian gradient vanishes at X when we apply gradient descent methods, then the matrix X' = [X, 0] of size Mx(S+1), obtained by appending a column of zeros to X, also yields a vanishing Riemannian gradient on spectrahedron(M, S+1) – but X' is usually not a solution to that problem. That means the problem has points at which the Riemannian gradient is zero but which are not optima. This brings me to the conjecture that the problem is not geodesically convex. Yet the simulation always converges to the same result instead of to those stationary points, which makes me very curious about the properties of this problem.
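
This observation can be checked numerically, e.g. with a sketch like the following (it reuses the cost helper from my first post and Manopt's getGradient tool):

M = 8; N = 8; num_H = 6; S = 2;
H = randn(N, M, num_H);

% Solve on spectrahedron(M, S):
prob1.M = spectrahedronfactory(M, S);
prob1.cost = @(x) -cost(num_H, H, N, x);
prob1 = manoptAD(prob1);
x = trustregions(prob1);

% Lift to spectrahedron(M, S+1) by appending a zero column
% (the Frobenius norm is unchanged, so [x, 0] is a valid point):
prob2.M = spectrahedronfactory(M, S + 1);
prob2.cost = @(x) -cost(num_H, H, N, x);
prob2 = manoptAD(prob2);
x_lift = [x, zeros(M, 1)];

g = getGradient(prob2, x_lift);           % Riemannian gradient at the lift
fprintf('gradient norm at [X, 0]: %g\n', prob2.M.norm(x_lift, g));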

Then I thought of pseudoconvexity and wondered whether it can be generalized to Riemannian manifolds. But I must admit that this is beyond my knowledge, and I might not have time to dive into it: I have already spent one month on this single problem and have to return to my research project...

If you could provide further insights, I would appreciate it very much! If not, it doesn't matter, and thank you all the same; I will come back to this problem after finishing my current project. I will also read the book you suggested for a deeper understanding of Riemannian geometry. Thank you so much!

Best,
Xinyang