Ignoring Invariance for a manifold


ujjal2...@gmail.com

Apr 3, 2019, 1:13:55 AM
to Manopt
Hi

I have a basic question. Say we have an objective f(L), where L is a d x l matrix, and we want to constrain L^T L = I, i.e., we want L to lie on the Stiefel manifold St(l, d). But suppose f(L) is invariant under right-multiplication of L by any orthogonal matrix O of size l x l. Then the correct geometry to consider is a quotient manifold for L, namely the Grassmannian G(l, d).

In https://epubs.siam.org/doi/pdf/10.1137/080731359 it is mentioned that if we ignore the invariance, then the solutions are not isolated. This issue is not harmful for simple gradient schemes but it greatly affects the convergence of second-order methods.

My question is:
1. Is it conceptually wrong to ignore the invariance?
2. Empirically, it won't matter much if we ignore the invariance, at least in some problems? At the end of the day, all we want is an orthogonal matrix L, right? (At least in problems where we are getting good convergence using GD or CGD.)


Regards
Ujjal

Nicolas Boumal

Apr 3, 2019, 8:10:37 AM
to Manopt
Hello Ujjal,

Thank you for your clear question. Here are a few thoughts on this:

> "If we ignore the invariance, then the solutions are not isolated. This issue is not harmful for simple gradient schemes but it greatly affects the convergence of second-order methods."

This is correct, in theory at least. Gradient descent doesn't really care whether one takes the quotient or not, since the gradient is naturally orthogonal to the invariance directions (the cost function is constant along those). So, mostly, gradient descent does the right thing automatically. One caveat: if the retraction is not invariant under the choice of equivalence class representative, then the resulting algorithm is not really iterating on the quotient space. The polar retraction, for example, works on both Stiefel and Grassmann.

As for second-order methods, it is true that, in theory, if solutions are not isolated, then their local convergence could be affected, and (although I do not remember where this is written out explicitly) I am fairly sure there are examples of functions for which fast local convergence fails because the local minimizers are not isolated. That said, in practice, I haven't seen such effects.
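For the polar retraction specifically, here is a quick numerical check (a sketch of my own, with arbitrary dimensions) that it is equivariant under the O(l) action, and hence well defined on the quotient:

d = 8; l = 3;
L = orth(randn(d, l));              % a point on St(l, d): L'*L = I
U = randn(d, l);
U = U - L*((L'*U + U'*L)/2);        % project onto the tangent space at L
[O, ~] = qr(randn(l));              % a random l x l orthogonal matrix
polar = @(Y) Y / sqrtm(Y'*Y);       % polar retraction applied to L + U
norm(polar(L*O + U*O) - polar(L + U)*O, 'fro')  % ~1e-15: Retr(LO, UO) = Retr(L, U)*O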

About the paper you reference: there is something special to that story. Specifically, they optimize over a matrix V, while really only caring about the product VV*. Admittedly, there is an O(r) invariance here, as V and VQ are equivalent for any r x r orthogonal matrix Q. But (and this point is glossed over in that paper) the quotient space is not a quotient manifold, unless we explicitly restrict our attention to matrices V of full column rank. One way to see this is that, in a quotient manifold, the orbits / fibers / equivalence classes must be submanifolds of the same dimension, yet the equivalence class of, say, the zero matrix is a single point (a quick numerical illustration of this follows after the links below). This is problematic because, as they show in their analysis, we actually hope to converge to a rank-deficient matrix V. So, in this particular setup, I strongly recommend not taking the quotient. This is how I approached this problem in the following papers with my co-authors:

https://arxiv.org/abs/1606.04970 (shorter conference version)
https://arxiv.org/abs/1806.03763 (extension, more technical)
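
Here is the promised illustration of the orbit-dimension issue (my own sketch, not taken from those papers): the tangent space to the orbit {V*Q : Q in O(r)} at V is {V*Omega : Omega skew-symmetric}, and its dimension collapses at the zero matrix, so the orbits cannot all be submanifolds of the same dimension:

n = 5; r = 3;
orbit_dim(randn(n, r))    % r(r-1)/2 = 3 for a full column rank V
orbit_dim(zeros(n, r))    % 0: the orbit of the zero matrix is a single point

function d = orbit_dim(V)
    % Dimension of span{vec(V*Omega) : Omega skew-symmetric}.
    r = size(V, 2);
    cols = [];
    for i = 1:r
        for j = i+1:r
            Om = zeros(r); Om(i, j) = 1; Om(j, i) = -1;  % skew basis element
            cols = [cols, reshape(V*Om, [], 1)];         %#ok<AGROW>
        end
    end
    d = rank(cols);
end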


> "1. Is it conceptually wrong to ignore the invariance?"

I would say no. It's certainly nice, both mathematically and numerically, to harness the invariance when it's there and it's clear how to, but, pragmatically, I see no fundamental wrongdoing in ignoring it.

> "Empirically, it won't matter much if we ignore the invariance, at least in some problems?"

Empirically, I agree. I can't say things will be fine in all cases (due to the loss of isolation), but in my experience, empirically, things tend to be fine. For analysis, it's another story.

ujjal2...@gmail.com

Apr 3, 2019, 1:52:02 PM
to Manopt
Hi Nicolas

Thanks a lot for the detailed answer. I have a few more questions:

1. (Apologies in case it is already covered somewhere; I could not find it.) Do you have a separate manifold utility (like the product manifold) for implementing quotient manifolds?

2. Particularly, I have the following problem:

f(B,C) = x^T*B*C^{1/2}*C^{1/2}*B^T*x, where x is a d x 1 vector, B is a d x l matrix constrained to the Stiefel manifold St(l, d), and C is an l x l matrix.

I also have another vector to learn, call it v, an m x 1 vector. Hence, my overall search space would be a product manifold defined as:

tuple.v = euclideanfactory(m);
tuple.B = stiefelfactory(d, l);
tuple.C = euclideanfactory(l,l);

M = productmanifold(tuple);

But f(BQ, Q^T*C*Q) = f(B,C) for all Q in the orthogonal group O_l. Hence, we are dealing with a quotient manifold, right?

3. Assuming my question in (2) to be yes, how do we implement it in Manopt?

4. What if I restrict C to be a diagonal matrix? Then I can simply learn a vector c representing the diagonal of C, and would like to implement it as follows:

tuple.v = euclideanfactory(m);
tuple.B = stiefelfactory(d, l);
tuple.c = euclideanfactory(l);

M = productmanifold(tuple);

Is it correct to do so?

5. Would all my arguments for (2) and (3) above hold the same if I constrain C to be on the SPD manifold, with the following implementation:

tuple.v = euclideanfactory(m);
tuple.B = stiefelfactory(d, l);
tuple.C = sympositivedefinitefactory(l);

M = productmanifold(tuple);

Am I right?

Kindly help.

Regards
Ujjal

Nicolas Boumal

Apr 3, 2019, 2:04:16 PM
to Manopt
Hello Ujjal,

You can find some answers below -- unfortunately I won't have time right now to answer everything fully.


On Wednesday, April 3, 2019 at 1:52:02 PM UTC-4, ujjal2...@gmail.com wrote:
> Hi Nicolas
>
> Thanks a lot for the detailed answer. I have a few more questions:
>
> 1. (Apologies in case it is already covered somewhere; I could not find it.) Do you have a separate manifold utility (like the product manifold) for implementing quotient manifolds?


No. Unfortunately, figuring out the geometry of a quotient manifold is not as direct as figuring out the geometry of a product of known manifolds. You'll have to do the work manually. The best way is to follow the development in Absil et al.'s 2008 book.
 

> 2. Particularly, I have the following problem:
>
> f(B,C) = x^T*B*C^{1/2}*C^{1/2}*B^T*x, where x is a d x 1 vector, B is a d x l matrix constrained to the Stiefel manifold St(l, d), and C is an l x l matrix.
>
> I also have another vector to learn, call it v, an m x 1 vector. Hence, my overall search space would be a product manifold defined as:
>
> tuple.v = euclideanfactory(m);
> tuple.B = stiefelfactory(d, l);
> tuple.C = euclideanfactory(l,l);
>
> M = productmanifold(tuple);
>
> But f(BQ, Q^T*C*Q) = f(B,C) for all Q in the orthogonal group O_l. Hence, we are dealing with a quotient manifold, right?


It certainly looks like it. To be sure, you can verify the assumptions of Theorem 21.10 in Lee's book on differential geometry from 2012.
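
For instance, a quick numerical sanity check of the claimed invariance (my own sketch; dimensions are arbitrary, and I make C symmetric positive definite so that C^{1/2} is real):

d = 6; l = 3;
x = randn(d, 1);
B = orth(randn(d, l));               % B on St(l, d)
C = randn(l); C = C*C' + eye(l);     % a symmetric positive definite C
f = @(B, C) x' * B * sqrtm(C) * sqrtm(C) * B' * x;
[Q, ~] = qr(randn(l));               % a random Q in O_l
f(B, C) - f(B*Q, Q'*C*Q)             % ~0 up to round-off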
 

> 3. Assuming my question in (2) to be yes, how do we implement it in Manopt?


Following the developments in Absil's book, you would first figure out the orbits (equivalence classes under the group action): these should be submanifolds of the total space. The tangent spaces to these submanifolds are called the vertical spaces. They correspond to uninteresting directions, since moving along them stays within an equivalence class. The orthogonal complements of the vertical spaces, called the horizontal spaces, are in one-to-one correspondence with the (abstract) tangent spaces of the quotient manifold, so you would use horizontal vectors to represent tangent vectors as matrices. From there, you'll need a retraction that respects the invariance. How to get Riemannian gradients and Riemannian Hessians is also explained in Absil's book. This is fairly subtle, so it's best to take some time to read the relevant parts of the book.
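
To make the vertical/horizontal distinction concrete, here is a small sketch (my own, for the simpler Grassmann case St(l, d)/O(l) rather than for your full (B, C) problem). The vertical space at B is {B*Omega : Omega skew-symmetric}, and removing that component from a tangent vector leaves its horizontal part:

d = 8; l = 3;
symm = @(A) (A + A')/2;
skew = @(A) (A - A')/2;
B = orth(randn(d, l));           % B'*B = I
W = randn(d, l);
U = W - B*symm(B'*W);            % a tangent vector to St(l, d) at B
H = U - B*skew(B'*U);            % subtract the vertical component B*Omega
norm(B'*H, 'fro')                % ~0: horizontal vectors satisfy B'*H = 0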
 

> 4. What if I restrict C to be a diagonal matrix? Then I can simply learn a vector c representing the diagonal of C, and would like to implement it as follows:
>
> tuple.v = euclideanfactory(m);
> tuple.B = stiefelfactory(d, l);
> tuple.c = euclideanfactory(l);
>
> M = productmanifold(tuple);
>
> Is it correct to do so?


It's not completely clear to me, but mostly because I didn't take enough time to think about it.
 

> 5. Would all my arguments for (2) and (3) above hold the same if I constrain C to be on the SPD manifold, with the following implementation:
>
> tuple.v = euclideanfactory(m);
> tuple.B = stiefelfactory(d, l);
> tuple.C = sympositivedefinitefactory(l);
>
> M = productmanifold(tuple);
>
> Am I right?


Same here.

Best of luck!
Nicolas
