Solving for the gradient of rotation matrices


Chao Zhang

Oct 14, 2015, 3:55:07 PM
to Manopt
Hi guys,

I'm having a problem deriving the gradient for an optimisation problem on the Stiefel manifold (d x d x k). The optimisation variables are k (with k > 1) matrices T_{i}, each of size d x d.

The objective function is: 
\sum_{i,j=1}^{k} \trace\left( (T_{i}^{-1} T_{j} X^{j}) \, L^{j} \, (T_{i}^{-1} T_{j} X^{j})^{T} \right)
where L^{j} is a symmetric matrix that does not depend on T_{j}.
My aim is to get the gradient with respect to the T variables (there are k of them in total).
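
For concreteness, my setup looks roughly like the sketch below (the sizes and data are made up, and it assumes Manopt is on the path; the piece I cannot write is the gradient):

% setup_sketch.m -- illustrative only; sizes and data are made up.
function setup_sketch()
    d = 4;  k = 3;
    X = randn(d, d, k);                      % X(:,:,j) plays the role of X^j
    L = zeros(d, d, k);
    for j = 1 : k
        A = randn(d);
        L(:,:,j) = A + A';                   % L(:,:,j) = L^j, symmetric, fixed data
    end
    problem.M    = stiefelfactory(d, d, k);  % k orthonormal d x d matrices
    problem.cost = @(T) mycost(T, X, L);
    % problem.egrad = @(T) ... ;             % <-- the part I do not know how to write
    problem.cost(problem.M.rand())           % evaluate the cost at a random point
end

function c = mycost(T, X, L)
    k = size(T, 3);
    c = 0;
    for i = 1 : k
        for j = 1 : k
            M = T(:,:,i) \ (T(:,:,j) * X(:,:,j));   % T_i^{-1} T_j X^j
            c = c + trace(M * L(:,:,j) * M');
        end
    end
end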

Could anyone give me some ideas on how to do this?

Cheers,
Chao

Nicolas Boumal

Oct 26, 2015, 6:05:16 AM
to Manopt
Hello Chao,

Sorry for the long delay in our response.

My preferred way of deriving the gradient is to obtain an expression for the directional derivatives first (also called Fréchet derivatives).

In your case, inverses of matrices appear. For this, you will need the following formula:

f(X) = inv(X)

Df(X)[H] = - inv(X) * H * inv(X)

Notice how, if X is a scalar, this reduces to the well-known formula for the derivative of 1/x, which is -1/x^2.
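
If it helps, here is a quick numerical sanity check of that formula in plain MATLAB (the sizes and matrices below are arbitrary):

% Finite-difference check of Df(X)[H] = -inv(X)*H*inv(X).
d = 5;
X = randn(d) + d * eye(d);                % some well-conditioned square matrix
H = randn(d);                             % an arbitrary direction
t = 1e-7;
fd    = (inv(X + t*H) - inv(X)) / t;      % finite-difference approximation
exact = -inv(X) * H * inv(X);             % the formula above
norm(fd - exact, 'fro') / norm(exact, 'fro')

The displayed relative error should be small (roughly of order t), which is what you expect from a correct first-order formula checked with a forward difference.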

I hope this helps,

Best,

Nicolas

cz...@york.ac.uk

Oct 27, 2015, 6:09:49 AM
to Manopt
Hi Nicolas, 

Thanks for your kind reply!

Actually, I was doing something very similar to what you suggested. The only problem is that I couldn't get past the checkgradient routine.
After a while, I figured out that if my matrix variables are all orthogonal, this cost term does not change at all as the rotation matrices are optimised. Therefore, the directional derivative is zero in every direction, and any gradient direction computed numerically is arbitrary. I guess that, to make this term meaningful, I have to relax the variables to ordinary matrices rather than orthogonal ones.
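
For example, here is a quick plain-MATLAB check on a single term (d and the data are arbitrary); the value does not depend on which orthogonal T_i and T_j I plug in:

% One term of the cost with two independent random orthogonal pairs (T_i, T_j).
d = 5;
Xj = randn(d);
A  = randn(d);
Lj = A + A';                                  % a symmetric L^j
vals = zeros(2, 1);
for trial = 1 : 2
    [Ti, ~] = qr(randn(d));                   % random orthogonal T_i
    [Tj, ~] = qr(randn(d));                   % random orthogonal T_j
    M = Ti' * Tj * Xj;                        % T_i^{-1} T_j X^j, with T_i^{-1} = T_i'
    vals(trial) = trace(M * Lj * M');
end
vals(1) - vals(2)                             % essentially zero (round-off only)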

Cheers,
Chao

Nicolas Boumal

Oct 27, 2015, 6:32:49 AM
to manopt...@googlegroups.com
Hello again Chao,

Now that you mention it, you are right: inside a trace, you may cyclically permute the matrices (e.g., Trace(ABC) = Trace(CAB)). So in your case, if the T_i's are orthogonal, the cost function is constant and the gradient is zero.
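
To spell out one term of the sum: write M = T_i^T T_j X^j (using T_i^{-1} = T_i^T for orthogonal T_i). Then

Trace( M L^j M^T ) = Trace( M^T M L^j ) = Trace( X^{jT} T_j^T T_i T_i^T T_j X^j L^j ) = Trace( X^{jT} X^j L^j ),

since T_i T_i^T = I and T_j^T T_j = I. The last expression does not involve the T's at all, so every term of the sum is constant.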

To fix this, you should perhaps reconsider the reasons you wanted to make the T_i's orthogonal to begin with. If you allow the T_i's to be "free", then perhaps T_i = 0 for all i will be the (uninteresting) optimal solution. Determining how best to fix this will require going back to the original problem and the mathematical modelling phase.

Best,

Nicolas

