Matrix completion implementation using Grassmann product manifolds


Hiroyuki Kasai

unread,
Oct 24, 2014, 5:20:46 AM10/24/14
to manopt...@googlegroups.com
Dear all, 

To understand the product manifold implementation, I created the following test code.

This is a basic matrix completion problem. I used two Grassmann manifolds. 
The data generation uses Nicolas's code from his other projects.

My question is whether this code is generally correct or completely wrong, setting aside efficiency. 
The optimization does not seem to work properly. 

In addition, checkgradient reports "directional derivatives might be erroneous", but I cannot figure out why.

The cost function is assumed to be

f = 0.5 * || C .* (UV' - X) ||_F^2

and the Euclidean gradient is

g.U = S * V;
g.V = S' * U;

where S = C .* (UV' - X) and C denotes the entrywise mask.

I am looking forward to your advice.

Best regards,

Hiro

=======================================================

function [x] = mc_test()
    n = 500;
    m = 500;
    r = 10;
    q = 4*r*(n+m-r); 

    A = randn(n, r)/r.^.25;
    B = randn(r, m)/r.^.25;
    
    [I, J, q] = randmask(n, m, q);    

    X = spmaskmult(A, B, I, J);

    % I indexes rows (1..n) and J indexes columns (1..m)
    mask = sparse(double(I), double(J), ones(q, 1), n, m, q); 
    X_mat = sparse(double(I), double(J), X, n, m, q);
    
    Man.U = grassmannfactory(n, r);
    Man.V = grassmannfactory(m, r);
    problem.M = productmanifold(Man);
    
    % Generate initial x
    [u, b, v] = svds(mask.*X_mat, r);
    x.U(:, :, 1) = u*(b.^(0.5)); 
    x.V(:, :, 1) = v*(b.^(0.5));

    problem.cost = @cost;
    function f = cost(x)
        f = norm(mask.*(x.U * x.V' - X_mat), 'fro').^2;
    end

    problem.egrad = @egrad;
    function g = egrad(x)
        S = mask.*(x.U * x.V' - X_mat);
        g.U = S * x.V;
        g.V = S' * x.U;
    end

    % Check numerically whether gradient is correct
    checkgradient(problem);

    %[x, xcost, info, options] = trustregions(problem, x);
    [x, xcost, info, options] = conjugategradient(problem, x);
   
end

Nicolas Boumal

unread,
Oct 24, 2014, 7:47:54 AM10/24/14
to manopt...@googlegroups.com
Dear Hiroyuki,

In general, it will not be possible to approximate a matrix X by a product UV' where both U and V are orthonormal matrices. Consider the singular value decomposition: it represents a matrix as USV', with an additional scaling matrix S in the middle.

Beyond that, you are very close to having the correct gradient. Here is the modified code:

    problem.cost = @cost;
    function f = cost(x)
        f = .5*norm(mask.*(x.U * x.V' - X_mat), 'fro').^2;
    end

    problem.egrad = @egrad;
    function g = egrad(x)
        S = mask.^2 .*(x.U * x.V' - X_mat);
        g.U = S * x.V;
        g.V = S' * x.U;
    end

As you can see, you were missing a factor 1/2 in the cost (because the derivative of x^2 is 2x, a factor 2 appears there). Also, in the gradient, the mask should be squared (entry-wise).
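
In more detail, here is a short sketch of where both corrections come from, writing C for the mask and \odot for the entrywise product:

    f(U, V) = \frac{1}{2} \| C \odot (U V^T - X) \|_F^2

    D f(U, V)[H_U] = \langle C \odot (U V^T - X), \; C \odot (H_U V^T) \rangle
                   = \langle C \odot C \odot (U V^T - X), \; H_U V^T \rangle
                   = \langle (C \odot C \odot (U V^T - X)) V, \; H_U \rangle,

so \nabla_U f = (C \odot C \odot (U V^T - X)) V, and by the same argument \nabla_V f = (C \odot C \odot (U V^T - X))^T U. For a 0/1 mask, C \odot C equals C, so squaring does not change the numbers here, but it keeps the gradient correct for general (weighted) masks.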

This gives the correct gradient, but the method is not very useful because of my comment above.

You may add a factor S to your product manifold, though. That factor could be symmetric positive definite, for example (sympositivedefinitefactory.m).

Cheers,
Nicolas

Hiroyuki Kasai

unread,
Oct 24, 2014, 8:58:32 AM10/24/14
to manopt...@googlegroups.com
Dear Nicolas,

Thank you for your prompt and helpful advice.

Regarding the manifolds, you are completely right. I will take a look at sympositivedefinitefactory.m.

The cost and gradient functions are also helpful.

Thank you!

Hiro

Hiroyuki Kasai

unread,
Oct 24, 2014, 9:11:08 AM10/24/14
to manopt...@googlegroups.com
Dear Nicolas,

I have one more thing.

If you use the product manifold Gr(n,p) x Sym(p) x Gr(n,p) and start from an initial point given by the SVD, 
are the resulting matrices U, S, V completely different from the initial ones, 
because S no longer has a diagonal restriction?

In addition, is this product manifold close to fixedrankfactory_3factors or not? 
I think the metric and some other aspects should be different.

Regards,

Hiro 


Nicolas Boumal

unread,
Oct 24, 2014, 9:52:13 AM10/24/14
to manopt...@googlegroups.com
Only the product of the three matrices is important, not the individual matrices. That is, unless you add something to the cost function that makes the individual matrices important.

I don't quite recall the geometry you mention, and I am on the move. Can't check it out right now, sorry.

Best,
Nicolas

BM

unread,
Oct 26, 2014, 6:22:15 AM10/26/14
to manopt...@googlegroups.com
Dear Hiroyuki,

Here are my replies.


If you use the product manifold Gr(n,p) x Sym(p) x Gr(n,p) and start from an initial point given by the SVD, 
are the resulting matrices U, S, V completely different from the initial ones, 
because S no longer has a diagonal restriction?

No, it is not going to be different. As Nicolas said, the product is what matters, not the individual characterization.


In addition, is this product manifold close to fixedrankfactory_3factors or not? 
I think the metric and some other aspects should be different.


Your understanding is correct. The factory that you mentioned indeed factors the matrix into three terms, the middle one being a symmetric positive definite matrix. The metric is of course different, and it is a quotient space, but it is not *completely* equivalent to \Grass \times PD \times \Grass.


Regards,
BM

Nicolas Boumal

unread,
Oct 26, 2014, 11:32:40 AM10/26/14
to
Actually, come to think of it, without taking the quotient, Grass x Pos x Grass is not a correct geometry.

You have a function f(U, S, V) where you specify that U, V are on Grassmannians and S is positive semidefinite.

Saying that U and V are on Grassmannians means that, for any orthogonal matrices P and Q, you should have: f(UP, S, VQ) = f(U, S, V).

But that's just not true for your cost function. It would be true if S were replaced by P'SQ, which is essentially what is done in the three-factor geometry, but it is not true if you just take the product.

So, you really should be working with Stiefel x Pos x Stiefel. That would be correct. You would be missing a quotient, but that is not necessarily a problem. (We like to think that the quotient makes things tidier and mathematically cleaner, but you can get a decent algorithm without taking the quotient. It just depends on the application.)
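
Concretely, here is a minimal sketch of what that Stiefel x Pos x Stiefel setup could look like in Manopt, dropped into the same mc_test function and reusing mask, X_mat, n, m and r from the script at the top of the thread. The cost and gradient below are my own rough sketch of the three-factor model, not a tested implementation, and not efficient since U*S*V' is formed densely:

    elems.U = stiefelfactory(n, r);
    elems.S = sympositivedefinitefactory(r);
    elems.V = stiefelfactory(m, r);
    problem.M = productmanifold(elems);

    problem.cost = @cost3;
    function f = cost3(x)
        f = .5*norm(mask.*(x.U * x.S * x.V' - X_mat), 'fro').^2;
    end

    problem.egrad = @egrad3;
    function g = egrad3(x)
        sym = @(A) .5*(A + A');                       % symmetrize the S-gradient
        R = mask.^2 .* (x.U * x.S * x.V' - X_mat);    % residual on observed entries
        g.U = R * x.V * x.S;         % since <R, dU*S*V'> = <R*V*S', dU> and S is symmetric
        g.S = sym(x.U' * R * x.V);   % since <R, U*dS*V'> = <U'*R*V, dS>
        g.V = R' * x.U * x.S;        % since <R, U*S*dV'> = <R'*U*S, dV>
    end

    checkgradient(problem);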

Cheers,
Nicolas

Hiroyuki Kasai

unread,
Oct 27, 2014, 3:29:52 AM10/27/14
to manopt...@googlegroups.com
Dear Nicolas and Bamdev,

I thank you both for correcting my mistakes and misunderstandings. I am sorry that these questions are not 
specific to Manopt itself, but rather to one application, namely the matrix completion problem.

However, this point still raises some basic questions for me.

I understand that f(UP, S, VQ) = f(U, S, V) does not hold for my cost function, whereas f(UP, P'SQ, VQ) = f(U, S, V) 
does hold. 

Actually, borrowing the description from Bamdev's paper "R3MC: A Riemannian three-factor algorithm for 
low-rank matrix completion", the critical points in the space St x R x St (in this case S is R) are not isolated under the 
action of orthogonal matrices P_1, P_2 \in O(r), so this is really an optimization problem on the equivalence 
classes, St x R x St / (O x O). 

So, if you do not use the "fixedrankfactory_3factors" factory and instead use a general product manifold framework, I still 
wonder whether St x Pos x St is appropriate, or whether we need to work with f(UP, P'SQ, VQ) on Gr x Pos x Gr 
instead of St x Pos x St. 

In the former case, according to Bamdev's paper, this seems NOT to be appropriate because the critical points 
are not isolated. 

In the latter case, it is not clear to me how to define P and Q and the corresponding gradients. In Ngo's paper, "Scaled gradient 
on Grassmann manifolds for matrix completion," the authors seem to adopt the bi-Grassmann manifold Gr x Gr, and 
S (or P'SQ) seems to be treated separately. 

Come to think of it, is the reason that Ngo's paper is correct that the authors adopt the product manifold 
Gr x Gr instead of Gr x Pos x Gr? If so, it would make sense to me. Is this correct?

I would greatly appreciate your help in untangling my understanding. In addition, I am sorry if these topics 
are not quite appropriate for this forum.  

Best regards,

Hiro




Nicolas Boumal

unread,
Oct 27, 2014, 4:15:25 AM10/27/14
to manopt...@googlegroups.com
Hello Hiroyuki,

Come to think of it, is the reason that Ngo's paper is correct that the authors adopt the product manifold 
Gr x Gr instead of Gr x Pos x Gr? If so, it would make sense to me. Is this correct?

Yes, that is correct. In their approach, they have a cost function depending on U and V, and their cost is itself a minimization problem:

f(U, V) = min_S [ ... some function of USV' ... ] 

If you replace U and/or V by UP, VQ, then the optimal S will automatically change too, and become P'SQ, such that the product USV' remains the same. Hence, Grassmann is indeed appropriate here.
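
To make this concrete, here is a rough sketch (mine, not code from that paper, and the helper name best_S is just for illustration) of that inner minimization for the matrix completion cost: for fixed U and V, each observed entry of U*S*V' is linear in the entries of S, so the optimal S is a small linear least-squares problem. The variables I, J and X below are the index vectors and the vector of observed values from the script at the top of the thread:

    % Sketch: S = argmin_S 0.5*||mask.*(U*S*V' - X_mat)||_F^2 in closed form.
    % Each observed entry satisfies (U*S*V')(i, j) = kron(V(j, :), U(i, :)) * S(:),
    % so vec(S) solves an ordinary least-squares problem with q equations and r^2 unknowns.
    function S = best_S(U, V, I, J, X, r)
        q = numel(I);
        Amat = zeros(q, r^2);
        for k = 1:q
            Amat(k, :) = kron(V(J(k), :), U(I(k), :));
        end
        S = reshape(Amat \ X(:), r, r);
    end

Replacing U by U*P and V by V*Q multiplies each row of Amat by kron(Q, P) on the right, so the least-squares solution changes from vec(S) to kron(Q, P)'*vec(S), i.e. from S to P'*S*Q: exactly the invariance that makes the bi-Grassmann geometry legitimate here.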

Best,
Nicolas

Hiroyuki Kasai

unread,
Oct 27, 2014, 4:52:32 AM10/27/14
to manopt...@googlegroups.com
Hello Nicolas,
I see. That makes sense! In the case of your paper, "Low-rank matrix completion via preconditioned optimization on the Grassmann manifold," you do not use the bi-Grassmann geometry but a single Grassmann geometry, and then compute W from the current U. I see.

I really thank you for your help. If I get other questions, I will post them again.

Regards,

Hiro

Hiroyuki Kasai

unread,
Oct 28, 2014, 2:15:23 AM10/28/14
to manopt...@googlegroups.com
Dear Bamdev,

I still have one question about the following:

If you use the product manifold Gr(n,p) x Sym(p) x Gr(n,p) and start from an initial point given by the SVD, 
are the resulting matrices U, S, V completely different from the initial ones, 
because S no longer has a diagonal restriction?

No, it is not going to be different. As Nicolas said, the product is what matters, not the individual characterization.

Do you mean that the resulting matrices U, S, and V when using St(p,m) x Sym(p) x Gr(p,n), or U and V when 
using the bi-Grassmann product Gr(p,m) x Gr(p,n), are NOT going to be different from those of the (thin?) SVD, which 
corresponds to St(p,m) x Diag+(p) x Gr(p,n)?

In Section 6.1 of your paper, "Low-Rank optimization with trace norm penalty", you showed one such comparison.
There, the results seem to be completely different. Is my understanding wrong? Or is this result specific to 
the cost function in (1)?

I guess my understanding is wrong. I would appreciate it if you could correct me.

Best regards,

Hiro

Nicolas Boumal

unread,
Oct 28, 2014, 5:06:17 AM10/28/14
to manopt...@googlegroups.com
Dear Hiroyuki,

The specific matrices U, S and V are irrelevant. Only their product matters, since only their product appears in the cost function.
So, yes, in general, you will get different matrices than those you would obtain with, say, svds in Matlab. But that does not mean they do not encode the same information.

I am not sure how to explain this differently; I apologize if it feels like I am just repeating the same thing and if this does not help. Perhaps Bamdev will have another take on this?

Best,
Nicolas

BM

unread,
Oct 28, 2014, 7:47:18 PM10/28/14
to
[Edited]

Hello Hiroyuki,

Thanks for the questions. They are quite relevant. I would like to add to Nicolas' answer. Below is my opinion.

It all depends on how we optimize. I will try to be a bit more general here. Say X has the SVD X = USV', where U and V are on Stiefel manifolds and S is a diagonal matrix with positive entries. Now, if we want to optimize on the fixed-rank manifold, one way is to optimize over U, S, and V.

Case 1: Optimize U, S, V simultaneously, on St(p,m) x Diag+(p) x St(p,n). This performs poorly; this is what is shown in Section 6.1 of the paper mentioned above. It seems that it is difficult to handle the positive diagonal elements of S owing to discrete permutations. 

Here the symmetry is due to permutation of rows and columns.


Case 2: Optimize U, S, V simultaneously, but take the factor S to be a *symmetric positive definite* matrix instead of a diagonal matrix. That is, work on St(p,m) x Sym(p) x St(p,n). This leads to a particular quotient structure, and factoring out the quotient leads to a geometry. This performs better.

Here the symmetry is with respect to orthogonal matrices.


Conclusion from Case 1 and Case 2: It is better to deal with a factorization that results in a smooth quotient structure, e.g., Case 2.


Case 3: Optimize U, S, V simultaneously, and this time allow S to be a non-singular matrix (it need not be symmetric), i.e., work on St(p,m) x GL(p) x St(p,n). This leads to a particular quotient structure again. It also allows for metric tuning, i.e., we can design a particular Riemannian metric to suit the cost function.

Here the symmetry is with respect to a product of orthogonal matrices.


Case 4: We follow an alternating strategy. First, we update U and V simultaneously. Second, fixing U and V, we update S. We alternate between these two steps. This leads to the geometry of Ngo and Saad (and earlier Keshavan et al.). This works nicely because the step of updating S can be solved in closed form by exploiting the structure of matrix completion.

When we update U and V simultaneously, it turns out that they lie on the bi-Grassmann manifold. Why? Because, consider the points (U, V) and (UP, VQ), where P and Q are orthogonal matrices. Observe that for the second step to update S, we have min_S  f(U, S, V) = min_S   f(UP, S, VQ) for all P and Q. 

In other words, argmin_S  f(UP, S, VQ) = P'  ( argmin_S  f(U, S, V)) Q.



Case 5: Nicolas' work uses a two-factor decomposition for a fixed-rank matrix, say X = UW'. It eliminates one of the variables (W) by exploiting the particular structure of the matrix completion cost function. Finally, we have one variable (U), and it belongs to the Grassmann manifold. This is an optimization problem on the Grassmann manifold. 
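
As an illustration (my own sketch, not code from that paper, with a hypothetical helper name): for a fixed orthonormal U, the elimination step W = argmin_W 0.5*||mask.*(U*W' - X_mat)||_F^2 decouples over the columns of X, so each row of W solves a small least-squares problem over the observed entries of the corresponding column:

    % Sketch: closed-form W given U for the two-factor model X ~ U*W'.
    % For column j, only the observed rows obs = find(mask(:, j)) matter:
    %     W(j, :)' = argmin_w || U(obs, :)*w - X_mat(obs, j) ||_2.
    function W = best_W(U, X_mat, mask, r)
        m = size(X_mat, 2);
        W = zeros(m, r);
        for j = 1:m
            obs = find(mask(:, j));
            W(j, :) = (U(obs, :) \ full(X_mat(obs, j)))';
        end
    end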


Case 6: Until now we have tried to update U, S, V either simultaneously, alternately, or by eliminating variables. What if we update the product USV' directly? That is, broadly speaking, what is done in Bart Vandereycken's work.


Some questions: 

Why so many ideas? Because each has a motivation of its own, and gives an interesting perspective. Each of them exploits certain structures efficiently. Together they show the versatile nature of manifold optimization.

Which should be preferred? This is a research question. :)

Regards,
BM


Hiroyuki Kasai

unread,
Oct 29, 2014, 1:09:05 AM10/29/14
to manopt...@googlegroups.com
Dear Bamdev,

I thank you for your great mini lecture!

I have confirmed that what you explain fits with your papers and with the advice that you 
and Nicolas gave me in the past.
I think I have understood it.

Once again, I am sorry to bother you both, and I thank you both for this Q&A.

Best regards,

Hiro