Differentiating a cost function based on a matrix logarithm


Florian
Nov 7, 2013, 10:35:52 AM
to manopt...@googlegroups.com
Hello,

In a previous post on this forum, you kindly provided me with code for handling optimization problems on symmetric positive definite (SPD) matrices, and it works pretty well on various cost functions on this manifold.

I am currently trying to optimize a cost function of the form:
F(X) = Trace(log(X^0.5 A X^0.5) log(X^0.5 B X^0.5)), with X, A and B SPD matrices (and X^0.5 being the matrix square root of X).

It turns out that computing the gradient of F (with respect to X) is a rather tricky question. Indeed, when checking my gradient numerically with Manopt, it seems that I made a mistake in my computation.
Even though it is not directly related to the Manopt toolbox, I was wondering if you would have some hints about the calculation of this gradient?

Thanks again for sharing all this work under an open-source licence!

Florian

BM
Nov 7, 2013, 10:37:42 AM
to manopt...@googlegroups.com
Is log the matrix logarithm?

Florian
Nov 7, 2013, 11:16:08 AM
to manopt...@googlegroups.com
Yes (sorry for the misleading notation), I am using the matrix logarithm (and not the element-wise logarithm).
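
In Matlab notation, the cost I am trying to minimize looks like this (just a minimal sketch, assuming A, B and X are SPD matrices already defined):

Xs = sqrtm(X);                               % matrix square root X^0.5
F  = trace(logm(Xs*A*Xs) * logm(Xs*B*Xs));   % Trace(logm(X^0.5 A X^0.5) logm(X^0.5 B X^0.5))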

Nicolas Boumal
Nov 7, 2013, 11:18:10 AM
to manopt...@googlegroups.com
Hello Florian,

Your function F is very similar to functions f and g in this paper:
(go directly to Section 4).

There, an algorithm is given to compute directional derivatives of the matrix logarithm (which is the tricky part). That algorithm is explained in more detail in
(The main difference is that the first paper deals with PSD matrices (like you) whereas the second paper deals with rotation matrices.)

Attached, please find Matlab files (written in 2010, I haven't used them much since...) to compute the directional derivatives of logm and expm, as explained in the paper.

I hope this can help.

Are you trying to compute the Euclidean gradient or the Riemannian gradient? If the latter, which geometry are you using? (I imagine the same one as in the first paper linked.)

Don't hesitate to let us know if this doesn't quite help you; I'll try to be more explicit then. Also, don't hesitate to share the details here if it works.
dexpm.m
dfunm.m
dlogm.m

floria...@gmail.com
Nov 8, 2013, 12:18:16 PM
to manopt...@googlegroups.com
Hello,

Thank you for your answers.

Indeed, my cost function is very similar to the function g.
I will implement the gradient to check this out (thanks for your source code) and I will let you know.

To answer your question, I am looking for the Riemannian gradient (and I intend to use the affine-invariant metric).

For now, after reading those two papers, the details of how to compute the gradient are not totally clear to me. Anyway, thanks for your help!

Nicolas Boumal
Nov 25, 2013, 9:18:41 AM
to manopt...@googlegroups.com
Hello Florian,

Since you asked for a more detailed explanation of this, I believe the relevant part of my master's thesis is a good place to start:


Pages 49 and onward: Sections 5.1 to 5.3 explain how to obtain the gradient for a function which is a bit more complicated than yours, but which includes yours as a subpart, so the information is in there, and I was rather explicit in the derivation (as far as I remember). The papers I referenced above are condensed versions of parts of my master's thesis, so this should be easier to read. That being said, when writing the papers I became aware of some shortcuts you can take (such as observing that the differential of the matrix logarithm is self-adjoint, and things like that), and those are not in the thesis.
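
As a side note: once you have the Euclidean gradient (the symmetric matrix egrad such that DF(X)[H] = Trace(egrad * H) for every symmetric H), converting it to the Riemannian gradient for the affine-invariant metric <U, V>_X = Trace(X^-1 U X^-1 V) amounts to a congruence with X. A small Matlab sketch (the names egrad and rgrad are just placeholders for illustration):

symm  = @(M) (M + M')/2;       % symmetrization, to stay in the space of symmetric matrices
rgrad = X * symm(egrad) * X;   % Riemannian gradient for the affine-invariant metric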

Cheers,

Nicolas

Florian
Nov 28, 2013, 9:41:23 AM
to manopt...@googlegroups.com
Hello Nicolas,

This is indeed very helpful!

Thanks again for your help.

Nicolas Boumal
Jun 24, 2014, 12:51:54 PM
to manopt...@googlegroups.com
I can't resist posting here a nice trick that Nick Higham showed at a talk a couple of weeks ago (it's an old trick, but I never knew about it).

If f is a matrix function (think for example: logm, expm, sqrtm, ...), and you want to compute its directional derivative at X along H (its Fréchet derivative), then you can do so using the matrix function itself, owing to this identity:

f([X, H ; 0, X]) = [f(X), Df(X)[H] ; 0, f(X)]

where 0 is a square matrix of zeros the same size as X and H, and Df(X)[H] is the quantity we're interested in.

So, a very, very simple way of computing the derivative of the matrix logarithm we were discussing above is:

function D = dlogm(X, H)
% function D = dlogm(X, H)
%
% Code to compute the directional derivative (the Fréchet derivative) of
% the matrix logarithm logm at X along H, according to a very nice trick
% Nick Higham mentioned in a talk I attended. This code will also
% work for expm, sqrtm, ... with appropriate modifications.
% Nicolas Boumal, UCLouvain, June 24, 2014.
    
    n = size(X, 1);
    
    assert(length(size(X)) == 2,     'X and H must be square matrices.');
    assert(length(size(H)) == 2,     'X and H must be square matrices.');
    assert(size(X, 1) == size(X, 2), 'X and H must be square matrices.');
    assert(all(size(X) == size(H)),  'X and H must have the same size.');
    
    Z = zeros(n);
    A = logm([X, H ; Z, X]);
    D = A(1:n, (n+1):end);
    
end
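
A quick way to sanity check this, for example (a small snippet, not part of the function above), is to compare against a first-order finite difference:

n = 5;
X = randn(n); X = X*X' + eye(n);        % a symmetric positive definite matrix
H = randn(n); H = (H + H')/2;           % a symmetric direction
t = 1e-7;
D   = dlogm(X, H);                      % Fréchet derivative via the block-matrix trick
Dfd = (logm(X + t*H) - logm(X)) / t;    % forward finite difference approximation
norm(D - Dfd, 'fro') / norm(D, 'fro')   % relative error should be small, roughly on the order of t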

Nicolas Boumal
Aug 19, 2014, 11:16:32 AM
to manopt...@googlegroups.com
Here is a reference for the trick above:


Theorem 2.1.