Basics on preconditioners

Jesus Briales

Dec 29, 2016, 4:08:13 PM
to Manopt
Hi everyone,
I'm dealing with preconditioners for the Conjugate Gradient method.
I think I understand the idea of preconditioning for usual linear systems, but I haven't yet fully understood how to define the corresponding operators when working on manifolds.
While I'm currently reading some of the theory to get a better picture, it would be very helpful to get some feedback on the particular question I'm trying to answer:
I'm dealing with a quadratic cost of the form tr(X'*M*X), where X is m-by-n and belongs to a manifold (e.g. Stiefel).
I wonder if there exists a simple preconditioner akin to the diagonal (Jacobi) preconditioner when a manifold is involved.

I read the related information in the tutorial thoroughly, and I also tried looking at the folder /solvers/preconditioners, but I only found the preconhessiansolve example there.

I'm sorry if the question is not clear enough or if I have made any conceptual mistake; I'm afraid I'm still missing some important concepts here.
Still, any answer or comment related to this would help me improve my understanding of the topic.

Regards,

- Jesus

Nicolas Boumal

Dec 30, 2016, 11:17:19 AM
to Manopt
Hello Jesus,

The preconditioner, which you supply through problem.precon, is a function that takes as input a point X and a tangent vector H at X, and returns another tangent vector at X. It must be a linear, symmetric, positive definite operator.

Ideally, it should approximate the inverse of the Hessian of the cost function, which itself is a symmetric, linear operator on the tangent space at X (but not necessarily positive definite).

It is kind of an art to design preconditioners. On one hand, you want them to approximate the inverse Hessian well. On the other hand, you want them to be fast to compute.

A first step would be to obtain an explicit expression for the Hessian (the Riemannian Hessian, that is) of your cost function on the specific manifold you use, and try to see if you can simplify it to the point that it is easily inverted. One example where this is done explicitly on the Grassmann manifold is the paper "Low-rank matrix completion via preconditioned optimization on the Grassmann manifold"; see Section 3.4.

Another easy thing to try: approximate M by its diagonal, forcing all entries to be positive, D = diag(max(.1, diag(M))), and use problem.precon = @(X, H) problem.M.proj(X, D\H);

(Of course, you would not implement it quite that way: since you know D is diagonal, you should use bsxfun to scale the rows of H. But at least this is easy to try. There is no good reason why this should work, though.)
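
To make that concrete, here is a rough, self-contained sketch on a random instance (the sizes and data below are made up for illustration):

% Rough sketch of the diagonal preconditioner above, for
% min tr(X'*M*X) over the Stiefel manifold, on random data.
% (Note: M here is the cost matrix; problem.M is the manifold.)
n = 100; p = 5;
A = randn(n); M = A + A';                  % a random symmetric cost matrix
problem.M     = stiefelfactory(n, p);
problem.cost  = @(X) trace(X'*M*X);
problem.egrad = @(X) 2*M*X;

d = max(0.1, diag(M));                     % clamped diagonal, all positive
% Scale the rows of H by 1./d, then project back to the tangent space.
problem.precon = @(X, H) problem.M.proj(X, bsxfun(@rdivide, H, d));

X = trustregions(problem);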

I hope this helps!

Nicolas


BM

Dec 31, 2016, 10:51:54 PM
to Manopt
Hello Jesus,

To add to Nicolas' comments, there is this paper at http://www.unige.ch/math/vandereycken/papers/published_VandereyckenV_2010.pdf that shows how to compute a preconditioner for structured problems on submanifolds. In principle, we could exploit that for your problem as well.

I have also worked on preconditioning a bit; see my paper on Riemannian preconditioning. The idea is to propose a Riemannian metric first and then work on the manifold with that metric.

Regards,
BM

Jesus Briales

Jan 6, 2017, 4:23:47 AM
to Manopt
Thank you both for the detailed answers! They have turned out to be really useful for filling the gaps in my understanding of preconditioning in the Riemannian case! :)
Actually, after several days working on this, I think I may have found some promising contributions for the problem I'm working on. So, again, thanks a lot for the help!
Regretfully, with the Christmas break over, I may not have the time to develop these ideas further for a while, but I will keep you updated if I get anything interesting.

Best regards,

- Jesus

Jesus Briales

Jan 23, 2017, 4:33:59 AM
to Manopt
Hi again,
I'm recovering this problem with preconditioning where I left it and I'm facing the following subproblem:

The preconditioning finally amounts to solving a linear system Q*U+N=T,
where U in T_xM (the tangent space of a manifold) and N in N_xM (the normal space of the manifold) are my unknowns.
The dimensions are (dn)x(dn) for the (psd) block matrix Q, and (dn)x(d) for the block vectors U, N and T.
The conditions that U and N lie in the tangent and normal spaces at a certain point x in the manifold are linear,
so I have been able so far (after vectorization) to build a complete linear system that yields the correct solution.
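
To make this concrete, here is a rough sketch of that vectorized system; proj_tangent(x, .) is a stand-in for the manifold's closed-form tangent projector at x, and m = d*n is the number of rows of U, N and T:

m = d*n;
Pt = zeros(m*d);                  % matrix of the tangent projector
for j = 1:m*d
    Ej = zeros(m, d); Ej(j) = 1;  % j-th basis element (column-major)
    PEj = proj_tangent(x, Ej);    % assumed closed-form projector
    Pt(:, j) = PEj(:);
end

Qbig = kron(eye(d), Q);           % vec(Q*U) = Qbig*vec(U)
A = [Qbig,           eye(m*d);    % Q*U + N = T
     eye(m*d) - Pt,  zeros(m*d);  % (I - Pt)*vec(U) = 0: U in the tangent space
     zeros(m*d),     Pt];         % Pt*vec(N) = 0: N in the normal space
b = [T(:); zeros(2*m*d, 1)];
z = A \ b;                        % consistent overdetermined system
U = reshape(z(1:m*d), m, d);
N = reshape(z(m*d+1:end), m, d);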

The approach above, however, does not exploit several powerful tools that are available:
- a previously computed Cholesky decomposition Q = L*L';
- the cheap closed-form projections of any (dn)x(d) point Y onto the tangent and normal spaces where U and N lie.

I wondered if there might be some alternative approach, using optimization on the manifold,
that provides a more efficient way to solve the linear system above.
I was considering a reformulation as the constrained minimization of a quadratic objective:
   min ||Q*U+N-T||_F^2 s.t. U in T_xM, N in N_xM.
The unconstrained quadratic has infinitely many solutions attaining the minimum value 0,
but, as proved by the prior approach, once restricted to the search spaces for U and N there is a unique global minimum.
I'm not sure whether this workaround is really promising, or whether it will converge fast, but it may be interesting to try.

Looking at how to solve this problem using Manopt, I saw there is already the *tangentspacefactory*, which I could use,
and also a similar example given by *preconhessiansolve*. These are a great reference to begin with,
but I would appreciate your feedback and any additional materials or comments on this approach.
I assume, e.g., that I should define a factory for the normal space (only the projector function changes?).
Is that all there is to do? Am I missing anything else?

Remark: I'm especially interested in exploiting the fact that I can readily invert the matrix Q.
Note: if you need further details on any part of the explanation above, or a more readable document, just say so!

Best regards!

Jesus

BM

Jan 24, 2017, 1:12:07 AM
to Manopt
Hello Jesus, 

Thanks for your query. 

The optimization problem 

min ||Q*U+N-T||_F^2 s.t. U in T_xM, N in N_xM

seems to be the best bet for solving the system iteratively. We could use standard solvers, but I wonder how efficient they would be.

Personally, to solve the above problem efficiently, I would again use Manopt (with trustregions), where my variable U belongs to the (linear) manifold M1 := {U : U'*X + X'*U = 0} and the variable N belongs to the (linear) manifold M2 := {N : N in the normal space characterization}. To use Manopt, we basically need to write the manifold factories for M1 and M2, which are straightforward: they use the projection operators directly. The cool thing is that the tangent space of M1 at U is M1 itself (nothing fancy, but worth observing when writing down the manifold factory, and it makes everything pretty efficiently doable).
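
To sketch what I mean (normalspacefactory is a hypothetical analogue of Manopt's tangentspacefactory, identical except for using the normal-space projector):

elems.U   = tangentspacefactory(M, x);
elems.N   = normalspacefactory(M, x);   % hypothetical; same pattern as tangentspacefactory
problem.M = productmanifold(elems);

problem.cost  = @(Z) norm(Q*Z.U + Z.N - T, 'fro')^2;
problem.egrad = @(Z) struct('U', 2*Q'*(Q*Z.U + Z.N - T), ...
                            'N', 2*(Q*Z.U + Z.N - T));

% Without an explicit Hessian, trustregions approximates it by finite
% differences of the gradient.
Z = trustregions(problem);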

Let me know if you require concrete help.

Regards,
Bamdev

Jesus Briales

Jan 24, 2017, 8:47:57 AM
to Manopt
Hi Bamdev,
thanks for the tips!
I'm trying to implement it, but I'm having some issues with the tangent and normal spaces (linear manifolds).
Specifically, the Hessian check is failing for the problem I define. I'm working on it and will try to attach some test code later;
meanwhile, some questions have arisen in the process and I would appreciate it if anyone could shed some light:
Let TxM and NxM be the tangent and normal spaces (linear manifolds), x the point in the original manifold M, y a point in the linear manifold and u a general vector.
- The tangent space projector for TxM does not depend on y but only on the fixed x, so proj_TxM(y,u) = proj_M(x,u).
  Similarly, for NxM the projection operator is just the projection onto the normal space of M at x.
  Do these assertions make sense? :)
- How are the Riemannian gradient and Hessian related to their Euclidean counterparts? Is there any simple relation in this (linear manifold) case?
  I assume there must be some kind of correction to the Euclidean operators, since, e.g., for f(y) nothing ensures that the Euclidean gradient lies in the tangent space TxM.
  For the Riemannian gradient I assume the projection is all I need; it's the Riemannian Hessian that I'm lost about :)

Kind regards,
Jesus Briales

Jesus Briales

Jan 24, 2017, 4:41:37 PM
to Manopt
Hi again,
Just to confirm: I got the problem involving optimization on both the tangent and normal spaces working :) This finally settles my questions.
As always, thanks again for Manopt, a great tool :D

- J Briales

Nicolas Boumal

Jan 24, 2017, 5:55:58 PM
to Manopt
Great to hear! :)

Nicolas

BM

Jan 25, 2017, 12:58:32 AM
to Manopt
Hello Jesus,

It is nice to know that you got it working (using Manopt to solve min ||Q*U+N-T||_F^2 s.t. U in T_xM, N in N_xM, right?).

To complete the discussion, below are my answers to your earlier questions.


Let TxM and NxM be the tangent and normal spaces (linear manifolds), x the point in the original manifold M, y a point in the linear manifold and u a general vector.
- The tangent space projector for TxM does not depend on y but only on the fixed x, so proj_TxM(y,u) = proj_M(x,u).
Similarly, for NxM the projection operator is just the projection onto the normal space of M at x.
Do these assertions make sense? :)


Yes, they do, and that's why we can use the tangent and normal space projection operators directly.



- How are the Riemannian gradient and Hessian related to their Euclidean counterparts? Is there any simple relation in this (linear manifold) case? I assume there must be some kind of correction to the Euclidean operators, since, e.g., for f(y) nothing ensures that the Euclidean gradient lies in the tangent space TxM.
For the Riemannian gradient I assume the projection is all I need; it's the Riemannian Hessian that I'm lost about :)


The Riemannian gradient and Hessian are the projected versions of their Euclidean counterparts.
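
For a linear manifold with the usual inner product, this holds exactly, with no curvature correction, because the projector is constant. In the notation above:

grad f(y) = proj_TxM(egrad f(y)),
Hess f(y)[u] = proj_TxM(ehess f(y)[u]),

and likewise with the normal-space projector for NxM.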

Some comments on writing the factories for the "linear" manifolds (a sketch follows after the steps below):

1) We take the euclideanfactory.m file as the template.
2) We introduce the projections when defining M.egrad2rgrad, M.ehess2rhess, M.rand, M.randvec, and M.tangent.
3) For M.exp, we don't need the projection operator (just ensure that the initial point is on the linear manifold).
  
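A minimal sketch of such a factory following these steps (proj is the projector onto the space in question, e.g. @(u) M.proj(x, u) for the tangent space at x; the normal-space version just swaps in the other projector):

function N = linearspacefactory(x, proj)
    % Step 1: the Euclidean factory is the template.
    N = euclideanfactory(size(x, 1), size(x, 2));

    % Step 2: introduce the projection where it matters.
    N.proj        = @(y, u) proj(u);
    N.tangent     = N.proj;
    N.egrad2rgrad = @(y, eg) proj(eg);
    N.ehess2rhess = @(y, eg, eh, d) proj(eh);
    N.rand        = @() proj(randn(size(x)));
    N.randvec     = @(y) unitnorm(proj(randn(size(x))));

    % Step 3: no projection in the exponential, provided the initial
    % point already lies on the linear manifold.
    N.exp = @expo;
    function y = expo(y, u, t)
        if nargin < 3
            t = 1;
        end
        y = y + t*u;
    end
end

function v = unitnorm(v)
    v = v / norm(v(:));
end
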
Regards,
Bamdev

Jesus Briales

Jan 25, 2017, 7:21:18 AM
to Manopt
Exactly Bamdev,
Your answer was perfect for completeness. With those tips, it is quite easy to reproduce the tangent and normal (linear) manifolds.
Nevertheless, I wouldn't mind contributing the version I developed when I get a spare moment (maybe next week), if you find it interesting :)

By the way, I found something suspicious while developing the factories. I checked the code of `tangentspacefactory`
and it doesn't seem to perform the operations you described above (projecting in M.tangent, M.egrad2rgrad, M.ehess2rhess, etc.).
Is this a mistake? Or is tangentspacefactory doing something different from what I intended?

Regards,

J Briales

BM

Jan 25, 2017, 8:37:22 AM
to Manopt
Hello Jesus, 

Nevertheless, I wouldn't mind contributing the version I developed when I get a spare moment (maybe next week), if you find it interesting :)

That would be nice. Let's take this offline with Nicolas when you are ready.


By the way, I found something suspicious while developing the factories. I checked the code of `tangentspacefactory`
and it doesn't seem to perform the operations you described above (projecting in M.tangent, M.egrad2rgrad, M.ehess2rhess, etc.).

I was referring to the euclideanfactory, which is at http://www.manopt.org/reference/manopt/manifolds/euclidean/euclideanfactory.html

Regards,
Bamdev


Nicolas Boumal

Jan 25, 2017, 8:39:39 AM
to Manopt
Hello,

Now that I look at it, I'm also surprised that the code in tangentspacefactory for tangent, egrad2rgrad and ehess2rhess doesn't have a projection in it. It should look more like the code in the other Euclidean subspace factories. To be checked a bit later today.

Thanks,
Nicolas

Nicolas Boumal

Jan 25, 2017, 2:55:30 PM
to Manopt
Hello Jesus,

Thanks again for pointing this out. I pushed the changes to the GitHub version of Manopt. If you would like to update your files and let us know if the new version works for you, that would be great!


Thanks,
Nicolas

Jesus Briales

Jan 28, 2017, 6:55:28 AM
to Manopt
Hi Nicolas,
the functions tangent, egrad2rgrad, ehess2rhess... seem right now. But regarding the projection in N:
since the tangent space is described by the projection of the original manifold at a given point x, shouldn't it be
     N.proj  = @(y, u) M.proj(x, u),
using the same notation as in tangentspacefactory?
I'm not sure about it, just asking :) You know I'm still kind of a beginner in Riemannian geometry and related topics!

Regards,
Jesus

BM

Jan 29, 2017, 9:54:32 AM
to Manopt
Hello, 

A small diversion on solving Q*U+N=T, where U belongs to the tangent space and N belongs to the normal space. 

Apart from the earlier approach (solving the minimization problem min ||Q*U + N - T||^2_F), there is one more approach (equally efficient, I believe): using Matlab's PCG directly.

Let's assume that U and N are of size n-by-r. The pseudocode is below.

mytol = 1e-6;
mymaxiter = 50;
xsol = pcg(@compute_matrix_system, T(:), mytol, mymaxiter);

function lhsx = compute_matrix_system(x)
    Ux = reshape(x(1:n*r), n, r);
    Nx = reshape(x(n*r+1:end), n, r);

    Ux = applytangentprojector(Ux);
    Nx = applynormalprojector(Nx);

    LHSx = Q*Ux + Nx;
    lhsx = LHSx(:);

    disp(norm(lhsx - T(:)));
end

Usol = reshape(xsol(1:n*r), n, r);
Nsol = reshape(xsol(n*r+1:end), n, r);

% Final solution
Usol = applytangentprojector(Usol);
Nsol = applynormalprojector(Nsol);

Regards,
Bamdev

Jesus Briales

Jan 29, 2017, 1:49:44 PM
to Manopt
Hi Bamdev!
That looks good and pretty simple. I will give it a try when I have a moment and compare it to the other approaches I've been trying :)
A general problem I'm facing here, though, is that I'm trying to alleviate bad conditioning that stems from the matrix Q itself,
so plain PCG retains the convergence problem unless I find a good preconditioner for this new linear system.
I'm currently looking into that :) In fact, I have a Cholesky decomposition of Q available, so inverting Q is feasible in practice.
However, introducing the projections interferes with the overall expression.
I would appreciate any ideas or suggestions :)

Best,

Jesus

BM

Jan 29, 2017, 1:58:37 PM
to manopt...@googlegroups.com
Hello Jesus, 

There are different ways of using preconditioners with PCG.

In your case, we could supply a preconditioner handle (look at the documentation of PCG) using the following pseudocode.

function Peta = applyprecon(eta)
    etaU = reshape(eta(1:n*r), n, r);
    etaN = reshape(eta(n*r+1:end), n, r);

    % Project before applying the preconditioner.
    etaU = applytangentprojector(etaU);
    etaN = applynormalprojector(etaN);

    PetaU = Q\etaU;
    PetaN = etaN;

    PetaU = applytangentprojector(PetaU);
    PetaN = applynormalprojector(PetaN);

    Peta = [PetaU(:); PetaN(:)];
end

Let me know if this works out for you.

Regards,
Bamdev

BM

Feb 2, 2017, 4:29:25 AM
to manopt...@googlegroups.com
Hello all, 

Here is the final set of pseudocode in case we want to use PCG to solve the problem.

We intend to solve the problem min ||Q*U + N - T||^2_F subject to U in the tangent space and N in the normal space. Equivalently, we want to solve the following gradient system:
applytangentprojector(Q*Q*U + Q*N - Q*T) = 0 and
applynormalprojector(Q*U + N - T) = 0,
where U is in the tangent space and N is in the normal space.

mytol = 1e-6;
mymaxiter = 100;

rhs1 = applytangentprojector(Q*T);
rhs2 = applynormalprojector(T);
rhs = [rhs1(:); rhs2(:)];

xsol = pcg(@compute_matrix_system, rhs, mytol, mymaxiter, @applyprecon);

function lhsx = compute_matrix_system(x)
    Ux = reshape(x(1:n*r), n, r);
    Nx = reshape(x(n*r+1:end), n, r);

    % We can get away without the following two projections,
    % provided we initialize "x" in the linear manifolds' space.
    Ux = applytangentprojector(Ux);
    Nx = applynormalprojector(Nx);

    lhs1 = applytangentprojector(Q*(Q*Ux) + Q*Nx);
    lhs2 = applynormalprojector(Q*Ux + Nx);
    lhsx = [lhs1(:); lhs2(:)];

    disp(norm(lhsx - rhs));
end

function Peta = applyprecon(eta)
    etaU = reshape(eta(1:n*r), n, r);
    etaN = reshape(eta(n*r+1:end), n, r);

    % We can get away without the following two projections.
    etaU = applytangentprojector(etaU);
    etaN = applynormalprojector(etaN);

    PetaU = Q\(Q\etaU); % If Q is rank deficient, use (Q*Q + lambda*eye(n))\etaU
                        % instead, with e.g. lambda = 1e-5.
    PetaN = etaN;

    % These two projections are required.
    PetaU = applytangentprojector(PetaU);
    PetaN = applynormalprojector(PetaN);

    Peta = [PetaU(:); PetaN(:)];
end

Usol = reshape(xsol(1:n*r), n, r);
Nsol = reshape(xsol(n*r+1:end), n, r);

% Final solution
Usol = applytangentprojector(Usol);
Nsol = applynormalprojector(Nsol);

Hope this helps.

Regards,
Bamdev

PS. It should be noted that PCG is applicable only to symmetric systems; hence we apply PCG to the gradient system of ||Q*U + N - T||^2_F. This should be efficient. A different approach, however, would be to use lsqr to solve the rectangular system Q*U + N = T directly, with the constraints that U belongs to the tangent space and N belongs to the normal space. This requires setting up the function handles for lsqr appropriately (a sketch is below); I have not tested the lsqr approach, though. A difference to note is that the condition numbers of the symmetric gradient system and of the rectangular system Q*U + N = T are different.
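
An untested sketch of that lsqr variant, with the same projector helpers as above (lsqr needs both the operator and its adjoint):

function y = afun(z, flag)
    if strcmp(flag, 'notransp')
        % Forward operator: [U(:); N(:)] -> vec(Q*U + N), with the
        % constraints baked in through the projectors.
        U = applytangentprojector(reshape(z(1:n*r), n, r));
        N = applynormalprojector(reshape(z(n*r+1:end), n, r));
        Y = Q*U + N;
        y = Y(:);
    else
        % Adjoint operator.
        Y  = reshape(z, n, r);
        AU = applytangentprojector(Q'*Y);
        AN = applynormalprojector(Y);
        y  = [AU(:); AN(:)];
    end
end

zsol = lsqr(@afun, T(:), 1e-6, 100);
Usol = applytangentprojector(reshape(zsol(1:n*r), n, r));
Nsol = applynormalprojector(reshape(zsol(n*r+1:end), n, r));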

Nicolas Boumal

Feb 2, 2017, 9:17:21 AM
to Manopt
Thank you, Jesus! You are absolutely correct. And your comment also showed that I had introduced a similar error in N.ehess2rhess. Here are the corrected lines:

    N.proj  = @(y, u) M.proj(x, u);
    N.ehess2rhess = @(y, eg, eh, d) M.proj(x, eh);

This is all updated in the GitHub version of Manopt.
If you could test your code that uses this tool, to make sure it's not broken, that would be terrific.

Thanks again!
Nicolas

Jesus Briales

Feb 3, 2017, 1:58:45 AM
to Manopt
Hi everyone,

Thanks for the detailed pseudocode, Bamdev, it was pretty useful. I checked that, using your tips, the problem can indeed be solved.
Not surprisingly though, after trying a lot of different approaches in the hope of finding a good preconditioner, I realized the simplest one gives the best trade-off (at least for now).
In a nutshell, to provide some feedback: the problem I originally proposed to solve was Q*U+N=T, with U in T_xM and N in N_xM (the tangent and normal spaces of the manifold, respectively). This actually originated from Proj_T(Q*U)=T, where Proj_T is the projection onto the tangent space. After seeing the code you proposed, it seems that providing as the `precon` method the simpler preconditioner consisting of
etaU = applytangentprojector(etaU);
PetaU = Q\etaU;
PetaU = applytangentprojector(PetaU);
gives pretty good results compared to no preconditioning.
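
Written as a Manopt handle, this is roughly (a sketch reusing a Cholesky factor of Q computed once; I omit the pivoting for simplicity):

R = chol(Q);   % Q = R'*R, factored once up front
problem.precon = @(X, eta) ...
    problem.M.proj(X, R \ (R' \ problem.M.proj(X, eta)));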

As for Nicolas' comment: we thank you both for the great work you're doing with this toolbox :) I will check the latest version and let you know if everything runs fine.

Best,
- Jesus

BM

Feb 3, 2017, 4:54:19 AM
to Manopt
This actually originated from Proj_T(Q*U)=T, where Proj_T is the projection onto the tangent space. After seeing the code you proposed, it seems that providing as the `precon` method the simpler preconditioner consisting of
etaU = applytangentprojector(etaU);
PetaU = Q\etaU;
PetaU = applytangentprojector(PetaU);
gives pretty good results compared to no preconditioning.

That's great. 

A more robust version would be to endow the Stiefel manifold with the Riemannian metric g_U(eta_U, xi_U) = trace(eta_U'*Q*xi_U), where eta_U and xi_U are tangent vectors. We create a Stiefel manifold factory with this new metric. The orthogonal projection operation (w.r.t. the new metric) is also pretty straightforward. The benefit of such a factory is that we can directly call the steepestdescent solver (and other Manopt solvers), which should then behave like a "preconditioned" steepest descent algorithm.

This particular Riemannian metric is discussed in Table 3 of http://epubs.siam.org/doi/pdf/10.1137/140970860 (put A = Q, B = I, \omega_2 = 0, and \omega_1 = 1).
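
A rough, untested sketch of the pieces that change in such a factory (assuming Q is symmetric positive definite; lyap is from the Control System Toolbox; this would live inside a factory function):

M = stiefelfactory(n, p);
M.inner = @(X, eta, xi) trace(eta'*Q*xi);   % the new metric
M.norm  = @(X, eta) sqrt(trace(eta'*Q*eta));
M.proj  = @qproj;
% With the new metric, the Riemannian gradient is no longer the plain
% orthogonal projection of the Euclidean gradient:
M.egrad2rgrad = @(X, eg) qproj(X, Q \ eg);

function Peta = qproj(X, eta)
    % g-orthogonal projection onto {Z : X'*Z + Z'*X = 0}. The normal
    % space w.r.t. g is {Q\(X*S) : S symmetric}, which leads to a small
    % p-by-p Lyapunov equation for S.
    QiX = Q \ X;
    A   = X'*QiX;                         % symmetric positive definite
    S   = lyap(A, -(X'*eta + eta'*X));    % solves A*S + S*A = X'*eta + eta'*X
    Peta = eta - QiX*S;                   % now X'*Peta + Peta'*X = 0
end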

Regards,
Bamdev


Jesus Briales

Feb 3, 2017, 5:24:09 AM
to Manopt
Great comment Bamdev, that's actually the approach I wanted to try next.
I read your paper on Riemannian preconditioning when you suggested it at the beginning of this thread, and I must say I found it really interesting and promising. Regretfully, at that moment I didn't feel confident enough about Riemannian manifolds to give it a try.
Seeing your comment now, however, I feel it might be easier to try than I expected.
According to your comment, I assume the only necessary changes are in the methods `M.inner(x,u1,u2)` and `M.proj(x,u)`?
Even though I assume it can't be too complicated, I would appreciate a good reference on how to derive the orthogonal projection for the new metric. So far, I have only used the projection for the usual Stiefel manifold as a black box, and I wouldn't know how to derive it for the new metric.
Finally, an important detail, which I don't know whether it can be handled in your framework: the manifold I'm working with is a product of Stiefel manifolds rather than a single Stiefel manifold. Namely, I'm using the `stiefelstackedfactory`. Do you think your approach could be adapted to this case?

Best,

- Jesus

PS: I looked for some reference code on this at your webpage https://bamdevmishra.com/codes/preconditioning/, but couldn't find anything. I assume you have probably not made it available yet?

Nicolas Boumal

Feb 3, 2017, 9:56:58 AM
to Manopt
Hello again,

Two quick comments about things that came up:

 1) If you use Q\ as a preconditioner (and Q is constant, I suppose?), then I would encourage you to exploit lu or chol together with linsolve in Matlab (a minimal sketch follows after point 2). I'm not sure how familiar you are with numerical linear algebra, but the point is that Q can be preprocessed such that solving linear systems in Q will be much faster afterwards. -- Also, keep in mind that the preconditioner should be a symmetric, positive definite operator on the tangent space. If that's not the case, algorithms may get in trouble.

 2) When you change the metric of a submanifold (that is, you pick something other than the standard inner product), your manifold is no longer a Riemannian submanifold, and it is no longer true that the Riemannian gradient is simply the orthogonal projection of the Euclidean gradient onto the tangent space. This is perhaps a bit too complicated to explain in full detail here, but I wanted to bring your attention to it. I believe this is explained in Bamdev's paper about preconditioning.
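
Here is a minimal sketch of point 1 (your Q and problem structure assumed):

% Factor Q once with chol; then each application of the preconditioner
% costs only two triangular solves, and linsolve skips the structure
% checks that backslash would redo every time.
R = chol(Q);                          % Q = R'*R, done once
optsLT = struct('LT', true);          % lower-triangular system
optsUT = struct('UT', true);          % upper-triangular system
applyQinv = @(V) linsolve(R, linsolve(R', V, optsLT), optsUT);

% Project, solve, project: this keeps the operator symmetric positive
% definite on the tangent space.
problem.precon = @(X, eta) problem.M.proj(X, applyQinv(problem.M.proj(X, eta)));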

Best,
Nicolas

Jesus Briales

Feb 3, 2017, 2:42:40 PM
to Manopt
Hi Nicolas,
About 1), you're quite right. I was already using a Cholesky decomposition (with pivoting), so although I write inv(Q) for clarity, I really mean solving the triangular systems in the background :)
With respect to 2), it's true that one must be careful after changing the metric, as this modifies many other basic operations related to the manifold. Luckily, as you said, Bamdev's paper is quite enlightening in that sense.

- Jesus