Parametrizing the retraction on a manifold


Florian

Apr 24, 2014, 5:00:32 AM
to manopt...@googlegroups.com
Dear Nicolas and Bamdev,

Dear Manopt users,

I am optimizing a function on a Grassmann manifold using Manopt (with the grassmannfactory file). To justify the choice of retraction, I would like to measure the running time saved by using a retraction, compared to a gradient descent that follows geodesics.

Hence, I was wondering how to do this in a clean and neat way. So far, the quick and dirty option would be to fork the grassmannfactory and specify the geodesic (i.e., the exponential map) as the retraction. I would be very glad to hear any suggestions about this.

Moreover, I would like to compare the QR retraction to the SVD retraction, and I was wondering whether there is a proper way to do this (one that does not involve forking a factory file and commenting out portions of code).

Thanks in advance for your help, and thank you again for this very nice toolbox.

Nicolas Boumal

Apr 24, 2014, 10:28:25 AM
to manopt...@googlegroups.com
Hello Florian,

I think a quick way to do it is as follows:

Gr = grassmannfactory(...);
Gr.retr = Gr.exp;

That gives you a Grassmann factory where the retraction and the exponential are the same.
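
For instance, to see the effect on a full run, here is a minimal sketch along these lines (the cost, the matrix A and the dimensions are just placeholders for illustration; any smooth cost on the Grassmannian would do):

n = 100; p = 5;
A = randn(n); A = (A + A')/2;         % some symmetric matrix

Gr = grassmannfactory(n, p);
Gr.retr = Gr.exp;                     % take geodesic steps instead of retraction steps

problem.M = Gr;
problem.cost  = @(X) -trace(X'*A*X);  % dominant invariant subspace cost
problem.egrad = @(X) -2*A*X;          % Euclidean gradient; Manopt converts it to the Riemannian gradient

X = trustregions(problem);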

If you want to use just any retraction, you could do this:

Gr = grassmannfactory(...);
Gr.retr = @myretraction;
function Y = myretraction(X, U, t)
   ... code here ...
end

I believe this should work. This way, no file copying.
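
For example, here is a minimal sketch of such a custom retraction, QR-based and written for the single-subspace case k = 1 (the function name qr_retraction and the n-by-p representation of points are just illustrative):

n = 100; p = 5;
Gr = grassmannfactory(n, p);
Gr.retr = @qr_retraction;

function Y = qr_retraction(X, U, t)
% Move along X + t*U in the embedding space, then re-orthonormalize the
% columns with a thin QR factorization. Only the column space matters on
% the Grassmannian, so we need not worry about column signs.
    if nargin < 3
        t = 1.0;
    end
    [Y, ~] = qr(X + t*U, 0);
end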

There is one caveat: other functions in Gr might use the retraction internally, in which case they would still call the old one (their handles were created inside the factory and do not see your override). So it is better to check explicitly in the factory file whether that is the case. I believe it is not for this factory, but it could be for others.

Thanks for posting this interesting question on the forum!

Cheers,
Nicolas

Florian

May 7, 2014, 4:03:18 AM
to manopt...@googlegroups.com
Hello Nicolas,

Thanks for your answer (and sorry for my own delayed reply).
I modified the file 'dominant_invariant_subspace' and it works (I did not run into the problem you mentioned, but it is good to know it can occur).

So far, I have noticed that the QR retraction may be slightly faster than the SVD retraction, but I did not check this extensively.
On the other hand, using the exponential map instead of a retraction in a trust-region method does not seem to be a good idea, since the method oscillates a lot around the optimal solution.

Cheers,
Florian

Nicolas Boumal

May 7, 2014, 3:24:16 PM
to manopt...@googlegroups.com
It oscillates? That's weird, it shouldn't do that. At least, I see no reason why it would with the exponential and not with a retraction.
Perhaps there is an issue with the exponential?

Nicolas Boumal

May 7, 2014, 8:07:17 PM
to manopt...@googlegroups.com
Hello Florian,

Thank you for passing me your files offline.

I also see oscillations when the retraction is set to the exponential, and none if the retraction is used. Weird!

I checked the exponential on the Grassmannian and assessed whether it might behave poorly (numerically) when the step is very small, as you suggested. But looking at it, I see no reason for that: the SVD is stable, cosines and sines are very nice functions, and after that we only do matrix multiplications, so it should be alright I suppose... Or perhaps we lose orthonormality as iterations accumulate? That's possible too...

Looking into this also made me realize that the retraction for Grassmann is buggy! It only works if k = 1 (working on a single Grassmannian, which thankfully is the common case). I'll fix that in the next release, but I can't do this properly until I'm back in Belgium at the end of the month. I should really start using GitHub!

Cheers, and big thanks for reporting this!

Nicolas

Nicolas Boumal

May 7, 2014, 8:18:10 PM
to manopt...@googlegroups.com
Aha, so there might indeed be an issue with losing orthonormality. I ran the code with this options structure:

options.statsfun = @mystatsfun;
function stats = mystatsfun(problem, X, stats)
    disp(norm(X'*X - eye(p)));
end

This way, at each iteration, we display the distance from X'*X to the identity. It should be zero, but numerically we might be off a bit.

With the retraction, this remains nicely close to 1e-15, but with the exponential we get numbers as high as 1e-10, and then the theorems that are supposed to make all of this work, well, don't work anymore.
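
For reference, the same check can also be recorded instead of displayed, so that it ends up in the info struct returned by the solver (a sketch; the field name orthoerr and the use of trustregions are just illustrative):

options.statsfun = @recordorthoerr;
[X, xcost, info] = trustregions(problem, [], options);
semilogy([info.orthoerr], '.-');      % plot the orthonormality drift over the iterations

function stats = recordorthoerr(problem, X, stats)
    % Store the deviation of X'*X from the identity at each iteration.
    stats.orthoerr = norm(X'*X - eye(size(X, 2)));
end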

All in all, this is what the code for the retraction and the exponential should look like in the grassmannfactory.m file (the changes: the retraction now uses the polar factorization via an SVD instead of the QR, and the exponential re-orthonormalizes its output):

    M.retr = @retraction;
    function Y = retraction(X, U, t)
        if nargin < 3
            t = 1.0;
        end
        Y = X + t*U;
        for i = 1 : k
            % We do not need to worry about flipping signs of columns here,
            % since only the column space is important, not the actual
            % columns. Compare this with the Stiefel manifold.
            % [Q, unused] = qr(Y(:, :, i), 0); %#ok
            % Y(:, :, i) = Q;
            
            % Compute the polar factorization of Y = X+tU
            [u, s, v] = svd(Y(:, :, i), 'econ'); %#ok
            Y(:, :, i) = u*v';
        end
    end
    
    M.exp = @exponential;
    function Y = exponential(X, U, t)
        if nargin == 3
            tU = t*U;
        else
            tU = U;
        end
        Y = zeros(size(X));
        for i = 1 : k
            [u, s, v] = svd(tU(:, :, i), 0);
            cos_s = diag(cos(diag(s)));
            sin_s = diag(sin(diag(s)));
            Y(:, :, i) = X(:, :, i)*v*cos_s*v' + u*sin_s*v';
            % From numerical experiments, seems necessary to
            % re-orthonormalize. This is overall quite expensive.
            [q, unused] = qr(Y(:, :, i), 0); %#ok
            Y(:, :, i) = q;
        end
    end

No oscillations anymore, but the exponential is, well, a tad expensive (which is not necessarily surprising: that's why we use retractions!).
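
As a rough way to quantify that cost difference outside of a solver run, here is a small timing sketch (the dimensions, the step size and the use of timeit are arbitrary choices, just for illustration):

n = 1000; p = 10;
Gr = grassmannfactory(n, p);
X = Gr.rand();
U = Gr.randvec(X);

% Time one retraction step and one exponential step from the same point.
t_retr = timeit(@() Gr.retr(X, U, 0.1));
t_exp  = timeit(@() Gr.exp(X, U, 0.1));
fprintf('retraction: %.2e s, exponential: %.2e s\n', t_retr, t_exp);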

Thank you very much for bringing this up!

Best,
Nicolas

Florian

May 7, 2014, 9:30:47 PM
to manopt...@googlegroups.com
Nice catch!
I did not do much, but anyway, I am glad my comment could help.

And thanks for debugging the retraction.

By the way, going back to the original topic, I was wondering: for the Grassmann manifold, which retraction should be used?
That is, what are the arguments for using the SVD rather than the QR?

Cheers.

Nicolas Boumal

May 7, 2014, 9:38:20 PM
to manopt...@googlegroups.com
It's a question of compatibility with the vector transport (which only matters for the conjugate gradient algorithm). If you go to my web page, then to the project page on low-rank matrix completion, you can download the RTRMC zip file. In there is a journal paper that was never published. At the end of Section 2, about the geometry, I explain why we use the SVD rather than the QR. Besides, the SVD is a bit slower, but it is remarkably stable numerically, so I don't mind :).
Cheers,
Nicolas

Florian

May 7, 2014, 10:38:50 PM
to manopt...@googlegroups.com
Are you referring to this tech report:
http://perso.uclouvain.be/nicolas.boumal/papers/Boumal_Absil_RTRMC_extended.pdf
If not, I am afraid I did not find any .pdf or .tex in the .zip file :-(

Anyway, I think I got the idea.
So far, I was only referring to the book "Optimization Algorithms on Matrix Manifolds", but I could not find in it any argument for the choice of retraction on the Grassmannian. So this reference could indeed be very helpful.

Thanks again :-) .
Cheers,

Florian


Nicolas Boumal

May 7, 2014, 11:44:10 PM
to manopt...@googlegroups.com
My bad, this is currently only described in my thesis, actually.
You can get it from the publications page on my home page; go to Section 2 in Chapter 4, at the very end of that section.
Hope this is accurate this time, I'm writing from mobile ;)
Nicolas

Florian

May 8, 2014, 12:41:19 AM
to manopt...@googlegroups.com
OK, I found it.
It is indeed at the very end of the second section (numbered Section 4.1) of Chapter 4.

Thanks.
Florian

BM

May 8, 2014, 6:41:05 AM
to manopt...@googlegroups.com
Hello Nicolas, 

For the retraction, you could use only u rather than the product u*v' from the SVD.

I agree with you on the SVD vs. QR discussion to some extent. However, since we need only the left subspace, one need not go for the SVD; QR should suffice, in my view. Also, QR is actually invoked in computing the SVD: http://en.wikipedia.org/wiki/Singular_value_decomposition#Calculating_the_SVD.

Regards,
Bamdev

Nicolas Boumal

May 8, 2014, 11:11:59 AM
to manopt...@googlegroups.com
Hello Bamdev,

I agree with you that you get the right column space by keeping just u, and you don't need u*v' for that matter. And if you're going to just keep u, then you might as well use a QR rather than an SVD. I completely agree, and that is why the initial version of this geometry used a QR retraction.

But it later appeared that u is not sufficient if we want the retraction to be compatible with the vector transport. See page 73 in my thesis.

Basically, you want certain invariances. This stems from the fact that in practice we work with liftings: points on the Grassmannian are equivalence classes, but numerically we work with a representative of those classes. Then, we have to make sure that our algorithm takes the exact same steps (in the abstract space) regardless of which representative we happened to choose.

Let U, V be orthonormal matrices representing points (equivalence classes) [U], [V] on the Grassmannian; let H be a representative of a tangent vector at [U], lifted at U; let Q be an orthogonal matrix, so that [UQ] = [U] for all Q. Then, we want this:

Retr_UQ(HQ) = Retr_U(H) Q
Transp_{VQ<-UQ}(HQ) = Transp_{V<-U}(H) Q

You do get these invariances if you use the polar factor instead of the QR for the retraction, combined with a simple orthogonal projection to the tangent space for the transport.
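
To make this concrete, here is a quick numerical check of the first invariance, comparing the polar factor with the QR factor (a sketch; the dimensions and the random data are arbitrary):

n = 8; p = 3;
[U, ~] = qr(randn(n, p), 0);          % orthonormal representative of [U]
H = randn(n, p);
H = H - U*(U'*H);                     % horizontal lift at U of a tangent vector
[Q, ~] = qr(randn(p));                % arbitrary p-by-p orthogonal matrix

% Polar (SVD) retraction at U and at the equivalent representative UQ:
[u1, ~, v1] = svd(U + H, 'econ');
Y1 = u1*v1';
[u2, ~, v2] = svd(U*Q + H*Q, 'econ');
Y2 = u2*v2';
fprintf('polar invariance error: %g\n', norm(Y2 - Y1*Q));   % near machine precision

% QR retraction at U and at UQ:
[Z1, ~] = qr(U + H, 0);
[Z2, ~] = qr(U*Q + H*Q, 0);
fprintf('QR invariance error:    %g\n', norm(Z2 - Z1*Q));    % generally not small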

Cheers,
Nicolas

BM

May 9, 2014, 9:32:44 AM
to manopt...@googlegroups.com
Naice.