Quotient under Permutation group

30 views
Skip to first unread message

Carlos Feres

unread,
Sep 8, 2021, 2:35:36 PMSep 8
to Manopt
Hi all,

This is more of a theoretical question, even when relevant to the optimization problem I'm working on. A simplified version of the cost function is

f(W) = sum_{i=1}^N g(W*e_i)

with e_i the i-th canonical vector, i.e., the cost function depends on the columns of W only, with no particular ordering. Note that, because of the cost function structure, I don't care about permutations of W, that is, if \hat{W} is an optimum, \hat{W}*P with P a permutation matrix is also an optimum. 
(I don't know if it's important for the following question, but I can say that we want W in the Stiefel manifold { W^H * W = I } to have all columns pairwise orthogonal, if that helps further).

My question is that I don't know if it is really needed (theoretically) to quotient over the group of permutation matrices for theoretical guarantees or good behavior (besides that numerically, everything works fine). Moreover, I haven't seen any works taking the quotient with respect to the group of permutation matrices. 

Why is this? I have thought of some ideas, but can't reach a final satisfactory answer. 
- Is it that the equivalence class under permutation matrices are disconnected components?
- Or maybe, the solvers intrinsically quotient under the group of permutation matrices when performing SVDs, eigendecompositions, or other factorizations?

This might be too broad of a question, but any insights would be much appreciated. Thanks!

Ronny Bergmann

unread,
Sep 9, 2021, 7:30:41 AMSep 9
to Manopt
Hi,

if I understand you question correctly, then the setting that is a little more general is, that instead of only permutations, you allow for rotations (or a multiplication with any orthonormal matrix, not just permutation matrices) – that is you are only interested in the _span_ of the columns or in other words the Grassmann manifold. Usually Grassmann is still represented by a matrix (the W you have) but the W then represents a whole subspace (or in other words its equivalence class). Some functions change, for sure, but not all. You seem to be kind of “between” Stiefel and Grassmann, so maybe most people directly go to Grassmann in your case?

Carlos Feres

unread,
Sep 9, 2021, 2:46:25 PMSep 9
to Manopt
Yes, I am asking for a situation "in between" Stiefel and Grassman: my cost function is not invariant under orthonormal transformation of W (i.e., f(WQ) ~= f(W) for Q orthonormal), but it is under permutations of columns (i.e. f(WP) = f(W) with P a permutation matrix). 
It might be that nobody has tried a Riemmanian geometry in this type of cost functions yet. 
Even then, I have not seen any work dealing with quotients under the group of permutation matrices, and that's where my question comes from.

Nicolas Boumal

unread,
Sep 12, 2021, 3:50:40 AMSep 12
to Manopt
Hello Carlos,

You are right that the quotient of the Stiefel manifold by the permutation group is a smooth manifold: you can confirm this using the main theorem in Section 9.2 of my book, and checking that the group action which consists in permuting the columns of a Stiefel matrix is smooth, free and proper. Smooth is clear (it's a matrix product), proper is clear (the group is compact) and free is clear (a Stiefel matrix never has two identical columns, so permuting columns necessarily changes the matrix).

It's not a particularly "interesting" quotient manifold though, because the total space (Stiefel) and the quotient have the same dimension. That's because the permutation group is a Lie group of dimension 0 (it's discrete).

As a result, the quotient map $\pi$ from Stiefel to the quotient is actually a local diffeomorphism (and with the usual Riemannian metrics it would even be an isometry): as far as local behavior is concerned, optimizing on the total space or on the quotient space is exactly the same. So there's no need to adapt algorithms either.

That is not to say that there aren't interesting global phenomena worth studying. For example, it's often the case that symmetries cause the landscape of the function to be more complex (saddles etc.), without making the problem fundamentally harder to solve. This can have ramifications such as benign non-convexity.

I hope this clarifies things. Let me know if not.

Best,
Nicolas

Carlos Feres

unread,
Sep 12, 2021, 4:37:48 AMSep 12
to Manopt
Hello Nicolas,

Thanks for your enlightening answer: it is concise and very clear. Indeed, the function I'm studying has benign non-convexity, but it is good to know that locally, things remain the same without adapting algorithms. 
I might have been at loss with technical concepts, but intuitively, I might have made some sense with my comment above as understanding the quotient under permutation matrices as "disconnected components", and thus they are not affecting search directions. 

Thanks for your help!

Nicolas Boumal

unread,
Sep 12, 2021, 4:56:04 AMSep 12
to Manopt
Hello Carlos,

It's interesting that you have a form of benign non-convexity here: can you tell us a bit more about the problem, or is there a paper where I can read more about it?

Best,
Nicolas
Reply all
Reply to author
Forward
0 new messages