Solving for the gradient of rotation matrices


Chao Zhang

Oct 14, 2015, 3:55:07 PM
to Manopt
Hi guys,

I'm having a problem deriving the gradient for an optimisation problem on the Stiefel manifold (d x d x k). The optimisation variables are k (with k > 1) matrices T_{i}, each of size d x d.

The objective function is: 
\sum_{i,j=1}^{k} \trace\left( (T_{i}^{-1} T_{j} X^{j}) \, L^{j} \, (T_{i}^{-1} T_{j} X^{j})^{T} \right)
where L^{j} is a symmetric matrix that does not depend on T_{j}.
My aim is to get the gradient with respect to the T variables (there are k of them in total).
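
For concreteness, my setup looks roughly like the sketch below (the sizes and data are made up, and it assumes Manopt is on the path; the piece I cannot write is the gradient):

% setup_sketch.m -- illustrative only; sizes and data are made up.
function setup_sketch()
    d = 4;  k = 3;
    X = randn(d, d, k);                      % X(:,:,j) plays the role of X^j
    L = zeros(d, d, k);
    for j = 1 : k
        A = randn(d);
        L(:,:,j) = A + A';                   % L(:,:,j) = L^j, symmetric, fixed data
    end
    problem.M    = stiefelfactory(d, d, k);  % k orthonormal d x d matrices
    problem.cost = @(T) mycost(T, X, L);
    % problem.egrad = @(T) ... ;             % <-- the part I do not know how to write
    problem.cost(problem.M.rand())           % evaluate the cost at a random point
end

function c = mycost(T, X, L)
    k = size(T, 3);
    c = 0;
    for i = 1 : k
        for j = 1 : k
            M = T(:,:,i) \ (T(:,:,j) * X(:,:,j));   % T_i^{-1} T_j X^j
            c = c + trace(M * L(:,:,j) * M');
        end
    end
end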

Could anyone give me some ideas on how to do this?

Cheers,
Chao

Nicolas Boumal

Oct 26, 2015, 6:05:16 AM
to Manopt
Hello Chao,

Sorry for the long delay in our response.

My preferred way of deriving the gradient is to obtain an expression for the directional derivatives first (also called Fréchet derivatives).

In your case, inverses of matrices appear. For this, you will need the following formula:

f(X) = inv(X)

Df(X)[H] = - inv(X) * H * inv(X)

Notice how, if X is a scalar, this reduces to the well-known formula for the derivative of 1/x, which is -1/x^2.
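
If it helps, here is a quick numerical sanity check of that formula in plain MATLAB (the sizes and matrices below are arbitrary):

% Finite-difference check of Df(X)[H] = -inv(X)*H*inv(X).
d = 5;
X = randn(d) + d * eye(d);                % some well-conditioned square matrix
H = randn(d);                             % an arbitrary direction
t = 1e-7;
fd    = (inv(X + t*H) - inv(X)) / t;      % finite-difference approximation
exact = -inv(X) * H * inv(X);             % the formula above
norm(fd - exact, 'fro') / norm(exact, 'fro')

The displayed relative error should be small (roughly of order t), which is what you expect from a correct first-order formula checked with a forward difference.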

I hope this helps,

Best,

Nicolas

cz...@york.ac.uk

Oct 27, 2015, 6:09:49 AM
to Manopt
Hi Nicolas, 

Thanks for your kind reply!

Actually, I was doing something very similar to what you suggested. The only problem is that I couldn't get past the checkgradient routine.
After a while, I figured out that if my matrix variables are all orthogonal, this cost term does not change at all as the rotation matrices are optimised. Therefore, the directional derivative is zero in every direction, and any gradient direction computed numerically is arbitrary. I guess that, to make this term meaningful, I have to relax the variables to ordinary matrices rather than orthogonal ones.
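
For example, here is a quick plain-MATLAB check on a single term (d and the data are arbitrary); the value does not depend on which orthogonal T_i and T_j I plug in:

% One term of the cost with two independent random orthogonal pairs (T_i, T_j).
d = 5;
Xj = randn(d);
A  = randn(d);
Lj = A + A';                                  % a symmetric L^j
vals = zeros(2, 1);
for trial = 1 : 2
    [Ti, ~] = qr(randn(d));                   % random orthogonal T_i
    [Tj, ~] = qr(randn(d));                   % random orthogonal T_j
    M = Ti' * Tj * Xj;                        % T_i^{-1} T_j X^j, with T_i^{-1} = T_i'
    vals(trial) = trace(M * Lj * M');
end
vals(1) - vals(2)                             % essentially zero (round-off only)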

Cheers,
Chao

Nicolas Boumal

Oct 27, 2015, 6:32:49 AM
to manopt...@googlegroups.com
Hello again Chao,

Now that you mention it, you are right: inside a trace, you may cyclically permute the matrices (e.g., Trace(ABC) = Trace(CAB)). So in your case, if the T_i's are orthogonal, the cost function is constant and the gradient is zero.
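
To spell out one term of the sum: write M = T_i^T T_j X^j (using T_i^{-1} = T_i^T for orthogonal T_i). Then

Trace( M L^j M^T ) = Trace( M^T M L^j ) = Trace( X^{jT} T_j^T T_i T_i^T T_j X^j L^j ) = Trace( X^{jT} X^j L^j ),

since T_i T_i^T = I and T_j^T T_j = I. The last expression does not involve the T's at all, so every term of the sum is constant.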

To fix this, you should perhaps reconsider the reasons you wanted to make the T_i's orthogonal to begin with. If you allow the T_i's to be "free", then perhaps T_i = 0 for all i will be the (uninteresting) optimal solution. Determining how best to fix this will require going back to the original problem and the mathematical modelling phase.

Best,

Nicolas

