Thanks for the questions; they are quite relevant. I would like to add to Nicolas' answer. Below is my opinion.
It all depends on how we optimize, so let me be a bit more general here. Say X has the SVD X = USV', where U and V lie on Stiefel manifolds (they have orthonormal columns) and S is a diagonal matrix with positive entries. If we want to optimize on the fixed-rank manifold, one way is to optimize over U, S, and V.
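To fix notation, here is a minimal numerical sketch (NumPy, with made-up sizes) that only checks that the SVD factors indeed live on the spaces above:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 6, 5, 2  # hypothetical sizes; p is the fixed rank

# Build a rank-p matrix and take its thin SVD: X = U S V'.
X = rng.standard_normal((m, p)) @ rng.standard_normal((p, n))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U, S, V = U[:, :p], np.diag(s[:p]), Vt[:p].T

# U and V have orthonormal columns, i.e., they lie on the Stiefel
# manifolds St(p, m) and St(p, n); S is diagonal with positive entries.
assert np.allclose(U.T @ U, np.eye(p))
assert np.allclose(V.T @ V, np.eye(p))
assert np.all(s[:p] > 0) and np.allclose(U @ S @ V.T, X)
```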
Case 1: Optimize U, S, and V simultaneously on St(p,m) x Diag+(p) x St(p,n). This performs poorly, as shown in Section 6.1 of the mentioned paper. It seems difficult to handle the positive diagonal entries of S owing to the discrete permutation symmetry.
Here the symmetry is due to permutation of rows and columns.
Case 2: Optimize U, S, and V simultaneously, but take the factor S to be a *symmetric positive definite* matrix instead of a diagonal one, i.e., work on St(p,m) x Sym+(p) x St(p,n). This leads to a particular quotient structure, and factoring out the symmetry gives a smooth Riemannian geometry. This performs better.
Here the symmetry is with respect to orthogonal matrices.
Conclusion from Case 1 and Case 2: It is better to deal with a factorization that results in a smooth quotient structure, e.g., Case 2.
Case 3: Optimize U, S, and V simultaneously, this time allowing S to be any non-singular matrix (it need not be symmetric), i.e., work on St(p,m) x GL(p) x St(p,n). This again leads to a particular quotient structure and additionally allows for metric tuning: we can design a Riemannian metric to suit the cost function.
Here the symmetry is with respect to a product of orthogonal matrices.
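The three symmetries in Cases 1-3 are easy to verify numerically. A small sketch (NumPy, with made-up sizes) checking that each group action leaves the product USV' unchanged:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 6, 5, 3
U, _ = np.linalg.qr(rng.standard_normal((m, p)))   # point on St(p, m)
V, _ = np.linalg.qr(rng.standard_normal((n, p)))   # point on St(p, n)

def rand_orth(p, rng):
    Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
    return Q

# Case 1: diagonal S; a simultaneous permutation leaves USV' unchanged.
S1 = np.diag(rng.uniform(1, 2, p))
Pi = np.eye(p)[rng.permutation(p)]                 # permutation matrix
assert np.allclose(U @ S1 @ V.T, (U @ Pi) @ (Pi.T @ S1 @ Pi) @ (V @ Pi).T)

# Case 2: symmetric positive definite S; one orthogonal O acts on all factors.
A = rng.standard_normal((p, p)); S2 = A @ A.T + p * np.eye(p)
O = rand_orth(p, rng)
assert np.allclose(U @ S2 @ V.T, (U @ O) @ (O.T @ S2 @ O) @ (V @ O).T)

# Case 3: non-singular S; a *pair* (P, Q) of orthogonal matrices acts.
S3 = rng.standard_normal((p, p)) + p * np.eye(p)
P, Q = rand_orth(p, rng), rand_orth(p, rng)
assert np.allclose(U @ S3 @ V.T, (U @ P) @ (P.T @ S3 @ Q) @ (V @ Q).T)
```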
Case 4: We follow an alternating strategy: first update U and V simultaneously; then, fixing U and V, update S; and alternate between these two steps. This leads to the geometry of Ngo and Saad (and earlier Keshavan et al.). It works nicely because the S-update can be solved in closed form by exploiting the structure of matrix completion.
When we update U and V simultaneously, it turns out that they lie on the bi-Grassmann manifold. Why? Consider the points (U, V) and (UP, VQ), where P and Q are orthogonal matrices. For the second step, which updates S, we have min_S f(U, S, V) = min_S f(UP, S, VQ) for all P and Q.
In other words, argmin_S f(UP, S, VQ) = P' ( argmin_S f(U, S, V)) Q.
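Here is a toy sketch of why the S-step is closed-form and of the invariance above, on a made-up matrix-completion instance (the sizes and the least-squares formulation are my own illustration, not the exact algorithm of Ngo and Saad):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, p = 20, 15, 2

# Toy data: a rank-p matrix observed on a random mask.
Xtrue = rng.standard_normal((m, p)) @ rng.standard_normal((p, n))
rows, cols = np.nonzero(rng.random((m, n)) < 0.5)
y = Xtrue[rows, cols]

def best_S(U, V):
    # The S-update is linear least squares in vec(S), because the model
    # entry is U_i S V_j' = kron(U_i, V_j) . vec(S) for observed (i, j).
    A = (U[rows][:, :, None] * V[cols][:, None, :]).reshape(len(rows), -1)
    s, *_ = np.linalg.lstsq(A, y, rcond=None)
    return s.reshape(p, p), np.sum((A @ s - y) ** 2)

U, _ = np.linalg.qr(rng.standard_normal((m, p)))
V, _ = np.linalg.qr(rng.standard_normal((n, p)))
P, _ = np.linalg.qr(rng.standard_normal((p, p)))   # orthogonal
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))   # orthogonal

S_uv, c_uv = best_S(U, V)
S_pq, c_pq = best_S(U @ P, V @ Q)

# The optimal cost depends only on the column spaces of U and V ...
assert np.isclose(c_uv, c_pq)
# ... and the minimizer transforms equivariantly: argmin at (UP, VQ) is P' S Q.
assert np.allclose(S_pq, P.T @ S_uv @ Q)
```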
Case 5: Nicolas' work uses a two-factor decomposition of a fixed-rank matrix, say X = UW'. It eliminates one of the variables (W) by exploiting the particular structure of the matrix completion cost function. We are then left with one variable (U), which belongs to the Grassmann manifold, so this is an optimization problem on the Grassmann manifold.
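A quick numerical illustration of the elimination idea (a toy sketch, not Nicolas' actual algorithm): after solving for W in closed form, the remaining cost depends on U only through its column space, which is exactly the Grassmann invariance:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, p = 20, 15, 2
Xtrue = rng.standard_normal((m, p)) @ rng.standard_normal((p, n))
mask = rng.random((m, n)) < 0.5

def g(U):
    # Eliminate W: for each column j of X, the observed rows give a small
    # least-squares problem for the j-th row of W; return the optimal cost.
    cost = 0.0
    for j in range(n):
        obs = mask[:, j]
        w, *_ = np.linalg.lstsq(U[obs], Xtrue[obs, j], rcond=None)
        cost += np.sum((U[obs] @ w - Xtrue[obs, j]) ** 2)
    return cost

U, _ = np.linalg.qr(rng.standard_normal((m, p)))
M = rng.standard_normal((p, p))   # generic, hence invertible, p x p matrix

# g depends only on the column space of U: g(U M) = g(U), so the true
# search space is the Grassmann manifold of p-dimensional subspaces.
assert np.isclose(g(U), g(U @ M))
```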
Case 6: So far we have updated U, S, and V either simultaneously, in alternation, or by eliminating variables. What if we update the product USV' directly? That is, broadly, what is done in Bart Vandereycken's work.
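As a rough illustration of this viewpoint (a plain projected-gradient sketch with a truncated-SVD projection, not Bart's actual Riemannian algorithm, which works with tangent spaces and retractions), one can take a gradient step on the full matrix and truncate back to rank p:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, p = 20, 15, 2

# Toy matrix-completion data: rank-p ground truth, random observation mask.
Xtrue = rng.standard_normal((m, p)) @ rng.standard_normal((p, n))
mask = rng.random((m, n)) < 0.5

def truncate(Y, p):
    # Best rank-p approximation (Eckart-Young): maps back onto the manifold.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :p] * s[:p]) @ Vt[:p]

def obs_cost(X):
    return 0.5 * np.sum((mask * (X - Xtrue)) ** 2)

# Update X = USV' as a whole: a Euclidean gradient step on the
# observed-entry cost, then truncation back to the rank-p set.
X = truncate(mask * Xtrue, p)
c0 = obs_cost(X)
for _ in range(200):
    X = truncate(X - mask * (X - Xtrue), p)  # unit step on gradient mask*(X - Xtrue)
c1 = obs_cost(X)

# With unit step size this projected-gradient scheme never increases the cost.
assert c1 <= c0 + 1e-9
```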
Some questions:
Why so many ideas? Because each has a motivation of its own and gives an interesting perspective; each exploits certain structures efficiently. Together they show the versatile nature of manifold optimization.
Which should be preferred? This is a research question. :)
Regards,
BM