Simple optimization problem on Complex Stiefel manifold does not work well.


jch...@gmail.com

Jan 23, 2019, 3:30:11 PM
to Manopt
Hi,

I'm new to Manopt. Thank you for creating the library - it looks awesome.

As a first toy problem, I've decided to construct one which looks pretty trivial to me.

The idea is to optimize an orthonormal basis V of C^n (a unitary matrix), hence optimizing over the complex Stiefel manifold.

The optimization goal is to align V to some basis U.
Alignment means that every column of V needs to point in the same direction as the corresponding column of U; a complex phase difference between them is OK (their dot product may lie anywhere on the unit circle).
Thus the cost function is
C = trace(1 - abs(V' * U) .^ 2)

The egrad that I have computed is:
g = -2 * v .* u .* conj(u);

checkgradient seems to be happy.

You can find my code at the following gist:
https://gist.github.com/jchayat/d70a97ba33640038a98d5b3bac86d6fb


Here's the problem: unless I start at a point close to U, the optimization does not converge. None of the solvers managed to converge from a random starting point (I did not fiddle with the parameters too much).

The reason I believe this should not be a hard problem: we can simply rotate the initial guess by U * V_init' to reach U. Also, because we ignore phase differences, there should be no issue of "handedness".


What am I missing? Is this a hard optimization problem (lots of saddle points or local minima)? Is it a hyperparameter problem? Something else?


Your help is much appreciated!
Best regards,
Jon

Nicolas Boumal

Jan 23, 2019, 5:15:53 PM
to Manopt
Hello Jon,

Thanks for your clear question.

I just ran your code: checkgradient fails here. I get a slope of 1 instead of a slope of 2. Did it give a slope of 2 for you?

If I remove the gradient (commenting out problem.egrad = ...), then Manopt approximates the gradient with finite differences (this is quite slow, so it's not recommended, but it's nice just to check things). When I do that, trustregions converges very nicely.

Can you check the gradient again?

Best,
Nicolas

jch...@gmail.com

Jan 24, 2019, 9:42:42 AM
to Manopt
Hi Nicolas,

Thank you for your prompt reply.

Indeed, after checking again, the gradient calculation turned out to be incorrect.
I did not discover the problem earlier because I had checked with U = I, which happens to coincide with the initial guess for V.

For reference, the correct gradient is:
g = -2 * u * diag(dot(u, v));

After fixing the gradient, the optimization process was able to quickly converge!

The full corrected source can be found here:
https://gist.github.com/jchayat/bfe9b4da27f5a84dfcedbe03dd721693
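
For reference, the core of the corrected setup looks roughly like this (the gist above has the full, runnable version; n is just an illustrative size):

    n = 8;
    [U, ~] = qr(randn(n) + 1i*randn(n));           % random target unitary basis
    problem.M     = stiefelcomplexfactory(n, n);   % complex Stiefel manifold (square case)
    problem.cost  = @(v) trace(1 - abs(v' * U).^2);
    problem.egrad = @(v) -2 * U * diag(dot(U, v)); % corrected Euclidean gradient
    checkgradient(problem);                        % now shows a slope of 2
    v_opt = trustregions(problem);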


Follow up question:
I'm trying to compute the Hessian, but I'm not sure exactly what I'm supposed to calculate. As far as I understand, I'm not supposed to compute the complete Hessian, but rather the derivative of the gradient along a given direction. So for a scalar function on R^n, the gradient is a vector, the Hessian is a matrix, and the directional derivative of the gradient is again a vector. In our case, the gradient is already shaped as a matrix, so the directional derivative should be a matrix as well. Am I understanding this correctly?

Is the directional derivative the same as the Hessian times a unit-norm direction vector? (Unfortunately, my understanding of differential geometry is somewhat limited.) Also, how does all of this play out in the case of a complex gradient (the gradient itself was computed using Wirtinger calculus)?

I got a little confused with all of the details here and can't seem to compute the correct thing.

Could you maybe help with deriving the correct Hessian?

Thanks,
Jon

Nicolas Boumal

Jan 24, 2019, 10:02:59 AM
to manopt...@googlegroups.com
Hello Jon,

Great, glad that you could fix the gradient.

Short answer to your question regarding the Hessian: just add this to your code:

    function h = ehess(v, vdot)
        h = -2 * u * diag(dot(u, vdot));
    end

    problem.ehess = @ehess;

This code for the (Euclidean) Hessian is obtained by differentiating the expression for the (Euclidean) gradient with respect to v along the direction vdot; since that expression is linear in v, this simply amounts to replacing v with vdot. Both v and vdot are matrices, and the output h is also a matrix; all of them have the same size.
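
By the way, once the Hessian is in place, you can test it numerically with checkhessian(problem), which plays the same role for the Hessian as checkgradient does for the gradient.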


Somewhat longer answer:

When you work with points in R^n as column vectors, it is true that the gradient at v is itself a vector in R^n, and we typically think of the Hessian at v as a matrix of size nxn such that, if we apply that matrix to a vector vdot, we get a vector in R^n: the directional derivative of the gradient at v along vdot.

The directional derivative can be taken along any vector, it doesn't have to be a unit vector.

More generally, if we work in a linear space E, then the gradient at v is an element of E, and the Hessian is a linear operator from E to E. Applying the Hessian to vdot (an element of E) yields another element of E.

In your case, the linear space E is the set of complex matrices of size nxn. So points, the gradient, and the result of applying the Hessian to a direction vdot are all matrices of size nxn.

About the fact that you work with complex matrices: the important point is to give E the structure of a real linear space. The traditional way to do this is to treat the real and imaginary parts separately, so that we really think of E as the set of pairs of real matrices (A, B) of size nxn each, where U = A + iB.

You can endow this space with a real inner product as follows:

<U, V> = real(trace(U'*V)) = real(sum(dot(U, V))).

(Notice that U' is the Hermitian conjugate transpose of U.)
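
Concretely, in MATLAB (ip here is just a name for the scalar result):

    ip = real(U(:)' * V(:));   % same value as real(trace(U'*V))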

I believe this view is compatible with what's called Wirtinger calculus, but I'm not completely sure.

I hope this helps.

Best wishes,
Nicolas

jch...@gmail.com

Jan 26, 2019, 11:55:22 AM
to Manopt
You're awesome.

Again, the problem was carelessness when taking the derivative.
Works well now.

Jon
