Hessian Check for Stiefel Complex Manifold-Mixed Results

74 views
Skip to first unread message

jasbr...@gmail.com

unread,
May 6, 2019, 9:49:31 PM5/6/19
to Manopt
Hello.

I’m trying to optimize the function real(trace(C.U*.A.C)) where C and A are square complex matrices and U is a unitary matrix. For the factory, I used the Complex Stiefel Manifold with the gradient C.U.A+C*.U.A*. For some matrices C and A, the hessian check succeeds and optimizes properly. With other matrices C and A, the hessian check fails, yet the optimization still completes in ~5 steps and fits my expectations. For random matrices, the hessian check fails and will not converge. Naturally, I would think that my gradient function is not correct, but have seen on an old forum post with a similar problem that it was actually the manifold having an error. The examples are demonstrated below:

From my experiences using
n=length(C);
manifold=stiefelcomplexfactory(n,n,1);
problem.M=manifold;

problem.cost = @(U) -real(trace(C*U'*A*U));
problem.egrad = @(U) -(C*U*A+C'*U*A');

The following matrices give different results

A=[-0.4682 + 0.3235i, 0.8288 ; 0.32 , -0.4682 + 0.3235i ];
C=[-0.9656 + 0.2952i, 0.6655 ; 0.2 , -0.9656 + 0.2952i ];
Hessian Check Succeeds and Converges

A=[0,0.5861+0.5861i;0.2263+0.2263i,0]
C=[-0.9656 + 0.2952i, 0.6655 ; 0.2 , -0.9656 + 0.2952i ];
Hessian Check begins to deviate but algorithm converges in 4-5 steps

A=[-0.1206+0.4621i,0.1989+0.0384i;0.1860+0.2813i,0.0904-0.4059i];
C=[-0.4125+0.0886i,0.0024-0.2933i;0.1679-0.1481i,-0.1644-0.2378i];
Hessian check fails and no convergence in 1000 iterations.


Do you have any advice as to why these cases are acting so strangely and how to remedy them?

Nicolas Boumal

unread,
May 7, 2019, 8:26:35 AM5/7/19
to Manopt
Hello,

Thanks for your detailed question. (In your cost function description in your post, you meant real(trace(C.U*.A.C)) instead of real(trace(C.U*.A.U)), right?)

I ran your code: for the second set of matrices, checkgradient fails, so the problem occurs there already, not necessarily with the Hessian.

Here is the correct code for the gradient:

problem.egrad = @(U) -(A*U*C + A'*U*C');

And this is for the Hessian:

problem.ehess = @(U, Udot) -(A*Udot*C + A'*Udot*C');

This should converge pretty fast.

Best,
Nicolas
Message has been deleted

jasbr...@gmail.com

unread,
May 7, 2019, 2:20:01 PM5/7/19
to Manopt
Thank you so much. The cost function in my introduction had a typo, but the one I had listed below was correct, and thank you for fixing the gradient.

Do you think it would be possible to add a constraint to this optimization? My ultimate goal would be able to maximize the cost function with the restraint that the imaginary part is 0 (i.e. I'm looking to maximize the function on the real axis). Would you recommend that I look into editing the manifold factory to reflect the unitary matrices that satisfy this property, or would it be better to find a way to change the gradient?

Nicolas Boumal

unread,
May 7, 2019, 2:26:14 PM5/7/19
to Manopt
Hello,

I'm not sure I got the question right -- you want to optimize the same cost function over n-by-n real, orthogonal matrices, instead of over n-by-n complex, unitary matrices?

If so, you can do this:

manifold = stiefelfactory(n, n, 1); 
problem.M = manifold; 

problem.cost  = @(U) -real(trace(C*U'*A*U)); 
problem.egrad = @(U) -real(A*U*C + A'*U*C');
problem.ehess = @(U, Udot) -real(A*Udot*C + A'*Udot*C');

That is: work on Stiefel (which is real by default) instead of complex Stiefel, and make sure you keep only the real part of the gradient and Hessian.

Is that what you were asking for?

Best,
Nicolas

jasbr...@gmail.com

unread,
May 9, 2019, 2:51:43 PM5/9/19
to Manopt
Hi Nicholas.

Just to make clarify, I was hoping to optimize real(trace(C.U’.A.U)) such that imag(trace(C.U’.A.U))=0. Based on your comment, I checked to see if the Stiefelfactory would satisfy this constraint, but when the matrices A and C are complex, the trace will also have a nonzero imaginary part (also the optimization is not as great as the complex factory). For
A=[0.7924,0.2500;0.3486,0.3450] and C=[0.2997,0.6652;0.1590,0.6842], I tried the following code:

n=length(C);
manifold = stiefelfactory(n, n, 1);
problem.M = manifold;

problem.cost = @(U) -real(trace(C*U'*A*U));
problem.egrad = @(U) -real(A*U*C + A'*U*C');
problem.ehess = @(U, Udot) -real(A*Udot*C + A'*Udot*C');
[x] = trustregions(problem);
real(trace(C*x'*A*x))

and

manifold = stiefelcomplexfactory(n, n, 1);
problem.M = manifold;

problem.cost = @(U) -real(trace(C*U'*A*U));
problem.egrad = @(U) -(A*U*C + A'*U*C');
problem.ehess = @(U, Udot) -(A*Udot*C + A'*Udot*C');
[y] = trustregions(problem);
real(trace(C*y'*A*y))

The first method returns 0.8744 and the second method returns 0.9243. From my experience, the stiefelcomplexfactory optimizes very consistently with my expectations, but will often have a nonzero imaginary part. Would you prefer it if I opened a new thread for this question?

Thanks,
Jason




Nicolas Boumal

unread,
May 9, 2019, 3:24:22 PM5/9/19
to Manopt
Hello Jason,

I see. This is a bit more complicated. Do we know that there exists a unitary matrix U for which trace(C*U'*A*U) has zero imaginary part?

I tried running this to find one:

clear all; close all; clc;
A=[-0.4682 + 0.3235i, 0.8288 ; 0.32 , -0.4682 + 0.3235i ]; 
C=[-0.9656 + 0.2952i, 0.6655 ; 0.2 , -0.9656 + 0.2952i ]; 
n=length(C); 
manifold=stiefelcomplexfactory(n,n,1); 
problem.M=manifold; 
problem.cost = @(U) abs(imag(trace(C*U'*A*U))); % it's nonsmooth, but it's just a small example
U = rlbfgs(problem); % this solver might fare fine with non-smoothness
trace(C*U'*A*U) % outputs 0.7132 - 0.4136i

It didn't succeed.

Best,
Nicolas

jasbr...@gmail.com

unread,
May 9, 2019, 5:32:10 PM5/9/19
to Manopt
Hello Nicolas.

I apologize about the inconsistencies. I am working on a project to plot the boundary of the set {trace(C.U'.A.U)| U unitary}. It's been proven that this set is star-shaped and has its center at Trace(A)*Trace(C)/N where N is the length of the matrices, so we can assume that the sets are centered at the origin (let A=A-trace(A)/N*I). My first goal was to optimize for the greatest real part and then afterwards add on the constraint that the imaginary part is 0, however I neglected to change the matrices to reflect the center being at the origin. So given that trace(A)=0, there should always exist a unitary matrix such that imag(trace(C.U',A.U) )=0. I hope this clarifies the matter and thank you for taking the time to answer my questions.

Thanks,
Jason

Nicolas Boumal

unread,
May 10, 2019, 8:06:28 AM5/10/19
to manopt...@googlegroups.com
Ok, got it. So here's the code I use -- it's very "ad hoc"; given the very low dimension of the problem, it seems to work fine:

clear all; close all; clc;

A=[-0.4682 + 0.3235i, 0.8288 ; 0.32 , -0.4682 + 0.3235i ]; 
C=[-0.9656 + 0.2952i, 0.6655 ; 0.2 , -0.9656 + 0.2952i ]; 

% A=[0,0.5861+0.5861i;0.2263+0.2263i,0] ;
% C=[-0.9656 + 0.2952i, 0.6655 ; 0.2 , -0.9656 + 0.2952i ]; 

% A=[-0.1206+0.4621i,0.1989+0.0384i;0.1860+0.2813i,0.0904-0.4059i]; 
% C=[-0.4125+0.0886i,0.0024-0.2933i;0.1679-0.1481i,-0.1644-0.2378i]; 


A = A - mean(diag(A))*eye(size(A));


n = length(C); 
manifold = stiefelcomplexfactory(n, n, 1); 
problem.M = manifold; 

mu = 10; % play with this parameter
problem.cost = @(U) - real(trace(C*U'*A*U)) + mu*abs(imag(trace(C*U'*A*U)));

% The function is nonsmooth, so there may not exist a gradient; regardless: by default, Manopt approximates the gradient with finite differences, and since here the problem is low dimensional, that's affordable. So, when the gradient exists, we'll get a good approximation for it. When we're at or close to a point where the gradient is undefined, we may get something meaningless, but BFGS is often fairly good at handling that (this is not a theorem.)

U = rlbfgs(problem);
real(trace(C*U'*A*U))
imag(trace(C*U'*A*U))


Note that, running this program multiple times, I do tend to get a very small imaginary part (1e-9 or so), but I get quite different real parts, presumably depending on the (random) initialization.

Also, to get the boundary of the set you mentioned, you'll probably want to solve this problem with a cost function that "rotates" the complex number trace(C*U'*A*U), using a unit-modulus complex number, to get access to other points of the boundary.

Best,
Nicolas

jasbr...@gmail.com

unread,
May 16, 2019, 3:45:44 AM5/16/19
to Manopt
Hi Nicolas.

This code works great; occasionally it has a rough starting seed, but the optimization seems very consistent overall. This is exactly what I was looking for and thank you so much for taking the time to help.

Thanks,
Jason

Nicolas Boumal

unread,
May 16, 2019, 9:46:08 AM5/16/19
to Manopt
Great to hear!
Reply all
Reply to author
Forward
0 new messages