Hello Francisco,
One way to think about it is this: a model for the problem you're trying to solve is that you want to minimize, over matrices X, the function
f(X) = \sum_{(i,j) \in S} (X_{ij} - Y_{ij})^2
where S is a certain set of positions in the matrix. We can rewrite this as follows: define M to be a sparse, binary matrix, with M_{ij} = 1 if (i, j) \in S, and M_{ij} = 0 otherwise. Then, we can write
f(X) = || (X - Y) .* M ||_F^2
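(Just to make this concrete, here is a rough Matlab sketch of how you could build M and evaluate f; all names and numbers below, m, n, k, I, J, Yobs, are placeholders I made up for illustration.)

m = 100; n = 80;                         % size of the matrices (made-up example)
k = 500;                                 % number of observed entries
idx = randperm(m*n, k);                  % k distinct positions, so M stays binary
[I, J] = ind2sub([m, n], idx(:));        % row/column indices of the observed entries
Yobs = randn(k, 1);                      % observed values Y_{ij} for (i, j) in S
M = sparse(I, J, 1, m, n);               % sparse binary mask: M_{ij} = 1 iff (i, j) in S
Y = sparse(I, J, Yobs, m, n);            % Y only needs to be correct on the mask
cost = @(X) norm((X - Y).*M, 'fro')^2;   % f(X) = || (X - Y) .* M ||_F^2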
So, f is a squared Frobenius norm, except we've "masked" the difference X - Y before taking that norm. We can write this as
f(X) = < (X - Y).*M, (X - Y).*M >
where <A, B> = trace(A'*B). This is easy to differentiate:
Df(X)[Z] = 2 < Z .* M, (X - Y) .* M >
and within the inner product, we can move an entry-wise product from one argument to the other (indeed, <A, B> = trace(A'*B) = \sum_{i,j} A_{ij} B_{ij}, so multiplying A entry-wise by M gives the same value as multiplying B entry-wise by M):
Df(X)[Z] = 2 < Z, (X - Y) .* M .* M >
This is true for all directions Z, hence
grad f(X) = 2 (X - Y) .* M .* M
(Notice that, since M is binary, M .* M = M, so this simplifies to grad f(X) = 2 (X - Y) .* M.)
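(Continuing the sketch from above, you can sanity-check this formula against a finite-difference estimate of the directional derivative; X, Z and t below are again made up for illustration.)

egrad = @(X) 2*(X - Y).*M;                         % the gradient we just derived
X = randn(m, n);                                   % a random test point
Z = randn(m, n);                                   % a random direction
t = 1e-6;
fd = (cost(X + t*Z) - cost(X - t*Z)) / (2*t);      % finite-difference estimate of Df(X)[Z]
ip = full(sum(sum(egrad(X) .* Z)));                % <grad f(X), Z>
fprintf('finite difference: %g, analytic: %g\n', fd, ip);   % these two should agree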
In other words: the gradient of the masked Frobenius norm is just the usual gradient, masked.
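(If the reason you're asking is to feed this to Manopt, which is just a guess on my part, then the cost and this Euclidean gradient plug in directly; replace the factory below by whatever manifold you actually optimize over.)

problem.M = euclideanfactory(m, n);               % note: problem.M is the manifold, unrelated to the mask M
problem.cost  = @(X) norm((X - Y).*M, 'fro')^2;   % f(X), reusing Y and M from the sketch above
problem.egrad = @(X) 2*(X - Y).*M;                % Euclidean gradient; Manopt converts it if needed
checkgradient(problem);                           % numerical gradient check
Xopt = trustregions(problem);                     % run a solver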
Hope this helps,
Nicolas