Condition number for Riemannian Hessian

Jesus Briales

Feb 3, 2017, 1:43:13 AM
to Manopt
Hi everyone :)
I have a question regarding the computation of the condition number of the Riemannian Hessian, when it has some inherent zero eigenvalues we want to disregard.
The simplest approach would be to use `hessianspectrum` to obtain the whole set of eigenvalues, then take the ratio of the largest and the smallest non-zero eigenvalues. This approach, however, does not scale well.
The simplest alternative I could think of is then to use `hessianextreme`. Computing the maximum eigenvalue is straightforward, but in the case of the minimum non-zero eigenvalue I'm not sure how I should proceed. I think `hessianextreme` can only return 1 eigenvalue for now. How difficult would it be to modify `hessianextreme` to return K eigenvalues on either side, in a similar fashion to `eigs` or `svds` in Matlab?
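
For concreteness, here is a minimal sketch of the brute-force route, assuming a Manopt `problem` structure and a point `x`; the tolerance for deciding which eigenvalues count as the inherent zeros is a hypothetical choice:

```matlab
% Brute force via hessianspectrum: compute the whole spectrum, drop the
% inherent zeros, take the ratio. The 1e-10 tolerance is a guess.
lambdas = hessianspectrum(problem, x);
nonzero = lambdas(abs(lambdas) > 1e-10);
kappa   = max(abs(nonzero)) / min(abs(nonzero));

% hessianextreme gives one eigenpair per end; 'min' would land on the
% inherent zeros, which is exactly the difficulty raised above.
[umax, lmax] = hessianextreme(problem, x, 'max');
```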

Best,
Jesus

Nicolas Boumal

Feb 3, 2017, 10:20:16 AM
to Manopt
Hello Jesus,

You are right that hessianextreme only returns 1 eigenvalue (and eigenvector) on each end of the spectrum.


Here is my simplest answer:


Open up hessianspectrum and locate the calls to eigs (line 160 if there is no preconditioner): all you need to do is modify those calls so that the tool computes only the top / bottom eigenvalues.
If you work out a clean, backward-compatible way of modifying hessianspectrum to allow that, I'm interested in seeing it.

Trouble that may arise: the modification should be easy enough if M.vecmatareisometries() returns true for your manifold and there is no preconditioner (or the preconditioner is available as a square root), because then you can use eigs with the 'LA' and 'SA' inputs; otherwise, the Hessian operator is not symmetric, and even though its eigenvalues are indeed real, eigs has no way to know that, so you won't be able to use 'LA' and 'SA'.

The way I see it, it would be acceptable to extend hessianspectrum only if those conditions are met (vecmatareisometries() returns true, and if there is a preconditioner, it is given as a square root).
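
For illustration, a sketch of what those modified calls could look like in the symmetric case; hess_vec (the vectorized Hessian operator) and n (its dimension) are names assumed from the internals of hessianspectrum:

```matlab
% Legacy eigs interface with a function handle; declaring the operator
% symmetric is what makes the algebraic options 'LA' / 'SA' available.
opts.issym  = true;
opts.isreal = true;
lambda_max = eigs(hess_vec, n, 1, 'LA', opts);  % largest algebraic eigenvalue
lambda_min = eigs(hess_vec, n, 1, 'SA', opts);  % smallest algebraic eigenvalue
% Caveat: the structural zero eigenvalues may still be the ones found
% at the 'SA' end; see the rest of the thread for ways around that.
```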

(Note that since 3.0 there is a way to create generic isometric vec/mat operations, using tangentorthobasis: https://www.manopt.org/reference/manopt/tools/tangentorthobasis.html)



----- Other ideas that came up first, but are more complex (still worth considering though):


To obtain k eigenvalues on either side, you would need to implement a new tool to replace tangentspherefactory:
a new tool one would call tangentstiefelfactory or tangentgrassmannfactory.

From there, it would be fairly straightforward. But implementing this tool might take a few hours (or, let's be honest, a couple days ;) and I'm not sure what the best way to do it is.

If you decide to give it a try, this tool from Manopt 3.0 will most likely help: orthogonalize.
It gives the equivalent of a qr orthogonalization (a basis for a retraction).
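
A hedged sketch of how one might call it, assuming basis is a cell array of linearly independent tangent vectors at x:

```matlab
% orthogonalize runs a Gram-Schmidt-type procedure on tangent vectors,
% playing the role of a qr factorization on the tangent space.
[orthobasis, R] = orthogonalize(M, x, basis);
```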


Another approach, more ad hoc, could be to start from the manifold you currently use and work out a version of it where you quotient out the invariance that leads to the 0's in the spectrum of the Hessian. Then, hessianextreme on that manifold would yield your intended result... but that's also a lot of work, and it is usable only for this manifold.


Yet another one, and perhaps the simplest one conceptually: use the new tool from 3.0 called hessianmatrix.
This will create an actual matrix whose spectrum is identical to that of the Hessian. Then, you can call eigs(...) on that matrix to get the extremes of the spectrum (or even eig(...) if that is sufficiently fast on that particular matrix).
The call to hessianmatrix will be slow though... I'm not sure you would gain a lot.
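
Sketched out (with problem and x as before, and a hypothetical tolerance to drop the inherent zeros):

```matlab
% Build a dim x dim matrix with the same spectrum as the Hessian,
% then read off the eigenvalues directly.
H = hessianmatrix(problem, x);
lambdas = eig((H + H')/2);     % symmetrize to guard against round-off
nonzero = lambdas(abs(lambdas) > 1e-10);
kappa   = max(abs(nonzero)) / min(abs(nonzero));
```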


Best,
Nicolas

Jesus Briales

Feb 3, 2017, 2:46:33 PM
to Manopt
Hi Nicolas, a great answer as usual!
I will try these (probably the simplest ones for now, as I'm in a hurry these days), but it's definitely good to collect all these ideas. Maybe in the future I could try some of the more complex ones :)

- Jesus

Jesus Briales

Feb 5, 2017, 4:15:44 AM
to Manopt
Hi Nicolas,
I'm currently looking at how to modify `hessianspectrum` to get only the top / bottom eigenvalues.
Now that I have read the code more thoroughly I see that, since there are *n-dim* zero eigenvalues corresponding
to the directions outside the tangent space, the trick is to compute only the *dim* largest-magnitude eigenvalues,
thus ensuring that the eigenvectors outside the tangent space are ignored.
The obstacle in the alternative top/bottom computation we were discussing is then how to apply `eigs` with the 'SM' option
and still ignore the eigenvectors outside the tangent space. Otherwise we would bear the extra burden of computing those *n-dim* zero eigenvalues...
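
In code, the existing trick reads roughly like this (a paraphrase of the internals; `hess_vec`, `n` and `opts` are assumed names for the vectorized Hessian handle, its dimension and the eigs options):

```matlab
% Request only the dim largest-magnitude eigenvalues; the n - dim
% structural zeros along the normal directions are then never computed.
d = problem.M.dim();
lambdas = eigs(hess_vec, n, d, 'LM', opts);
```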

Regards, 

- Jesus

Jesus Briales

Feb 6, 2017, 7:16:07 AM
to Manopt
Hi again,
I have made some progress on this issue over the last few days by modifying `hessianspectrum`.
Unfortunately, I have some crazy weeks coming up due to deadlines, so I will not be able to clean up and contribute anything for a while. Still, I think it will be useful to share some insights here so that we can discuss them:

- How to compute the Smallest Magnitude (SM) eigenvalues in the spectrum?
  The first thing `eigs` needs is a function handle AFUN that computes A\x. That is, in our case:
  * we need to apply the inverse `hess_vec_inv(u_vec)` of the vectorized Hessian operator `hess_vec(u_vec)`.
    For that purpose we could use either a Manopt-based approach like the one in `preconhessiansolve`,
    or an iterative method for the linear (vectorized) problem `hess_vec(u_vec) = d_vec`.
    This can be done e.g. with the symmetric LQ method, `symmlq` in Matlab,
    which is similar to `pcg` but admits indefinite matrices (as the Hessian need not be positive definite);
    see the sketch after this list.
    One should be careful though about the zero eigenvalues of the vectorized Hessian operator during the inversion.
  Then, even with the inverse operator in hand, calling `eigs` with the 'SM' option would usually find first all those *n-dim* zero eigenvalues
  due to the normal directions in the ambient space. A (simple?) way to prevent this seems to be:
  * to implement specific `vec` and `mat` operators of minimal dimensionality (that of the tangent space),
    so that the vectorized domain parameterizes the tangent space only. This way we avoid the spurious zero eigenvalues.
    Care should of course be taken to choose `vec` so that it is an isometry.
  With these two ingredients, we should be able to modify `hessianspectrum` so that it accepts two new arguments 'K' and 'SIGMA',
  akin to `eigs`, defining the number of eigenvalues to compute and their type (usually either 'LM' or 'SM').
  This should be done in a backward-compatible way, but once done it would provide a nice interface quite similar
  to that of the standard `eigs` tool in Matlab.
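
Here is a rough sketch of that first ingredient; the tolerance and iteration cap for `symmlq` are hypothetical choices, and `k` is the number of eigenvalues requested:

```matlab
% AFUN for eigs with 'SM': apply the inverse of the vectorized Hessian
% by solving hess_vec(u_vec) = d_vec iteratively. symmlq accepts
% symmetric indefinite operators, which pcg does not.
hess_vec_inv = @(d_vec) symmlq(hess_vec, d_vec, 1e-8, 500);
lambdas_sm = eigs(hess_vec_inv, n, k, 'SM', opts);
```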

On a different note, I have also observed some interesting facts about computing the spectrum when a preconditioner is involved:
- How to obtain a (simpler) `sqrtprecon`?
  From the documentation (and the code in `hessianspectrum`) it is established that `sqrtprecon` should be an operator
  such that `precon = sqrtprecon o sqrtprecon`, that is, the unique non-negative square root.
  That `sqrtprecon` operator is defined from and to the tangent space TxM.
  I have tried a more relaxed version of the square root, such that `precon = sqrtprecon o (sqrtprecon)*`,
  where * stands for the adjoint of the operator. This more general `sqrtprecon` would be defined
  from a (more generic) linear space to the tangent space TxM (and the opposite for the adjoint).
  The good thing is that this version of `sqrtprecon` can be obtained more easily, e.g. through a Cholesky decomposition
  of the symmetric operator `precon`. A composition of the type `sqrtprec_adj_vec(hess_vec(sqrtprec_vec(z_vec)))`
  is then still symmetric and efficient for finding the eigenvalues using `eigs` (a sketch follows below).
  This seems to work at least in the cases I have tried with the Stiefel manifold,
  although I have not been able to combine this general `sqrtprecon` / `(sqrtprecon)*` decomposition
  with the minimal vectorization approach that removes the spurious eigenvalues.
  This is because the composition `(sqrtprecon)* o Hess o sqrtprecon` does not map TxM to TxM directly,
  but goes through the intermediate domain introduced by the decomposition `sqrtprecon o (sqrtprecon)*`,
  making the search for an appropriate minimal `vec` operator harder in this case.
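
A hedged sketch of that composition, under the assumption that the vectorized preconditioner is available as a dense symmetric positive definite matrix `P_mat` (a hypothetical name):

```matlab
% Relaxed square root via Cholesky: P_mat = L*L' with L lower triangular.
L = chol(P_mat, 'lower');
% L' * H * L is symmetric and shares its eigenvalues with H * P
% (by a similarity argument), so eigs can treat it as symmetric.
sym_hess_vec = @(z_vec) L' * hess_vec(L * z_vec);
```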
  
Well, it's difficult to compress everything, but I hope these hints will be useful to keep working on this
and to discuss possible avenues :)
I'm not an expert in most of the issues arising here, so I would appreciate any theoretical or terminological corrections :)

Best,
Jesus

Nicolas Boumal

Feb 6, 2017, 1:57:46 PM
to Manopt
Hello Jesus,

I only have time for a short input at this stage:

The asymmetric case is quite hard to handle. We happen to know that the eigenvalues are real, but we cannot exploit that if the operator is asymmetric. So we end up being forced to use LM and SM in eigs, which is quite different from what we really want. Indeed, if the operator is indefinite, then there is no way to reliably get the top and bottom eigenvalues according to their order on the real line: we can only get them according to their order in magnitude, which is not enough. Of course you could try to work around that with some kind of shooting method / bisection, but that looks like it would be too expensive for the purpose.

(The reason all of this is not an issue in the current tool is that hessianspectrum gets the /whole/ spectrum; in that scenario, using LM actually works, because all we need to do is exclude the eigenvalues that are structurally 0: these indeed have the smallest magnitude, and we know how many of them to exclude.)

Also, using SM, which requires the terribly expensive operation of solving linear systems, will not give you either end of the spectrum: it will only give the eigenvalues closest to zero in absolute value.

So, I would advise restricting attention to the symmetric case. This allows us to use SA and LA, which reliably identify both ends of the spectrum on the real line.

The first required ingredient to be in the symmetric case is an isometric vec/mat pair. A lot of factories give that already, and for those that don't, it might be easy enough to change them. Alternatively, there are now new tools on the GitHub version of Manopt: lincomb and tangent2vec. You can use them to generically create an isometric vec/mat pair, with the added benefit that the vector representation of tangent vectors has length exactly M.dim(). This sounds perfect, but the caveat is that you need to generate an orthonormal basis for the tangent space first, using tangentorthobasis. This is pretty slow if the dimension of the manifold is high. The advantage is that the complexity of this step is independent of the complexity of Hessian computations (it doesn't need any).
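
For instance, a sketch of the generic construction using those tools (with M the manifold structure and x the point):

```matlab
% Generic isometric vec/mat pair from an orthonormal tangent basis.
% Building B is the slow step when M.dim() is large, as noted above.
B   = tangentorthobasis(M, x, M.dim());   % orthonormal basis of TxM
vec = @(u) tangent2vec(M, x, B, u);       % tangent vector -> coordinates
mat = @(u_vec) lincomb(M, x, B, u_vec);   % coordinates -> tangent vector
```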


The second ingredient required for this is access to a sqrtprecon. This I believe is difficult to circumvent efficiently.


About your second point: I think your proposal works out.

Here is my reasoning for it: if H is the Hessian operator and P is the preconditioner (both are symmetric), then we are interested in the eigenvalues of HP (which is not symmetric). I can apply a similarity transformation without changing the eigenvalues: conjugating by sqrtm(P) gives sqrtm(P) (HP) inv(sqrtm(P)) = sqrtm(P) H sqrtm(P), which is symmetric because sqrtm(P) is symmetric. This is where the sqrtprecon business comes from. You are suggesting to work with a kind of Cholesky factor of P instead of the square root. So let's say that P = LL*. Then HP = HLL*. Conjugating by L* gives L* (HLL*) inv(L*) = L* H L, which is also symmetric and has the same eigenvalues as HP. So yes, that also works.
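
This is easy to sanity-check numerically on small random matrices (a toy sketch):

```matlab
% Check: the eigenvalues of H*P match those of L'*H*L when P = L*L'.
n = 5;
H = randn(n); H = (H + H')/2;        % symmetric "Hessian"
P = randn(n); P = P*P' + n*eye(n);   % symmetric positive definite "precon"
L = chol(P, 'lower');                % P = L*L'
S = L'*H*L; S = (S + S')/2;          % symmetrize against round-off
disp(norm(sort(real(eig(H*P))) - sort(eig(S))))   % ~ 0 up to round-off
```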

The only practical issue I see with this is that we would now need to ask the user to supply both L and L* as operators, instead of only sqrtm(P). I am not sure that is nicer for the user.

Best,
Nicolas


