Gradient of a cost function on symmetric positive definite manifold


sirinuku...@gmail.com
Jan 19, 2014, 12:58:46 AM
to manopt...@googlegroups.com
Hi,

First of all many thanks for such an excellent toolbox. It really helps me with what I'm doing.

In the previous post, Florian asked about the gradient of a cost function of this form:

F(X,A,B) = Trace( log(X^0.5 A X^0.5) log(X^0.5 B X^0.5) ), where X, A, and B are symmetric positive definite matrices, X^0.5 is the matrix square root of X, and log denotes the matrix logarithm.

My question is rather simple: what is the gradient of F(X,A,B) with respect to A? Having looked into some of the material you provided in the previous post, I found it is close to the gradient of the squared geodesic distance, but not exactly the same.

Many thanks,
Korsuk

Nicolas Boumal
Jan 20, 2014, 4:02:41 AM
to manopt...@googlegroups.com
Dear Korsuk,

Thank you for your message.
I believe the function you are interested in is in this paper, but with different names.

Go to page 4234 (the third page in the PDF). A function g is defined there:

g(A, B, C) = <Log_A(B), Log_A(C)>_A,

where <.,.>_A is the inner product at A (see equation (5) and also equation (3) for the "normal" inner product), and Log_A(.) is the log map at A (see just below equation (5), where log is the matrix logarithm).

Thus, the function g is equal to:

g(A, B, C) = Trace( log(A^-0.5 B A^-0.5) log(A^-0.5 C A^-0.5) ).
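As a quick numerical sanity check of this expression, here is a standalone Python/SciPy sketch (the names `spd_g` and `random_spd` are my own, not from the paper or from Manopt):

```python
import numpy as np
from scipy.linalg import logm, fractional_matrix_power

def random_spd(n, seed):
    # Random symmetric positive definite matrix.
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n, n))
    return W @ W.T + n * np.eye(n)

def spd_g(A, B, C):
    # g(A, B, C) = trace( log(A^-.5 B A^-.5) log(A^-.5 C A^-.5) ).
    Aih = fractional_matrix_power(A, -0.5)
    return np.trace(logm(Aih @ B @ Aih) @ logm(Aih @ C @ Aih)).real

A, B, C = (random_spd(4, s) for s in (0, 1, 2))
# g is symmetric in its last two arguments (trace(PQ) = trace(QP)):
print(np.isclose(spd_g(A, B, C), spd_g(A, C, B)))
# g(A, A, C) = 0, since Log_A(A) = 0:
print(abs(spd_g(A, A, C)) < 1e-10)
```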

This is the same function as yours but with these changes:

F becomes g
X becomes A^-1
A becomes B
B becomes C

You want the gradient of F with respect to A. This is thus equivalent to the gradient of g with respect to B.
This gradient is given in the right column of page 4234.

The formula there is a bit involved: note the paragraph just below it which explains the notation, and note that you need code for the directional derivative of the matrix logarithm. Code for this was given in the discussion with Florian if I recall correctly.
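For completeness, one standard way to get that directional derivative (a Python/SciPy sketch of my own, not necessarily the code from the earlier thread) uses the block-matrix identity: for a matrix function f, the upper-right block of f([[X, E], [0, X]]) equals the directional derivative Df(X)[E].

```python
import numpy as np
from scipy.linalg import logm

def dlogm(X, E):
    # Directional derivative of the matrix logarithm at X along E,
    # read off the upper-right block of logm([[X, E], [0, X]]).
    n = X.shape[0]
    blk = np.block([[X, E], [np.zeros((n, n)), X]])
    return logm(blk)[:n, n:]

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
X = W @ W.T + 4 * np.eye(4)            # SPD test point
E = rng.standard_normal((4, 4))
E = E + E.T                            # symmetric direction

# Check against a central finite difference:
t = 1e-6
fd = (logm(X + t * E) - logm(X - t * E)) / (2 * t)
print(np.max(np.abs(dlogm(X, E) - fd)))
```

The same block-matrix trick works for other matrix functions such as expm (see Higham's "Functions of Matrices" for the general statement).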

Don't hesitate if you need further help.

Best,
Nicolas

sirinuku...@gmail.com
Jan 21, 2014, 1:03:31 AM
to manopt...@googlegroups.com
Hi Nicolas,

Thank you so much for the material. It's very helpful. I implemented it and it's working fine.

There are other queries:

1) When the trust-regions method is used, it always shows a message that the Hessian is not provided and that a numerical approximation is used instead. I was wondering if you could point out how to derive the Hessian of

g(A, B, C) = Trace( log(A^-0.5 B A^-0.5) log(A^-0.5 B A^-0.5) ) + Trace( log(A^-0.5 B A^-0.5) log(A^-0.5 C A^-0.5) )

with respect to B.

2) The literature you pointed to mentions the Log-Euclidean metric. Since I have no background in differential geometry, it is not clear to me how this metric works. How can I find the gradient and Hessian of g(A,B,C) with respect to B in this metric?

Many thanks,
Korsuk

Nicolas Boumal
Jan 21, 2014, 3:19:00 AM
to manopt...@googlegroups.com
Hello Korsuk,

Good to know it works!

1) Computing the Hessian is difficult here, because you would need a second-order derivative of the matrix logarithm. I didn't try very hard, and I never got around to doing it. If you do, could you please post your answer here? I'd be interested, and other readers might be too.

If the warning annoys you, you can disable it with the following command:
warning('off', 'manopt:getHessian:approx');


2) If you take the matrix logarithm of a positive matrix, you get a symmetric matrix. The set of symmetric matrices is a vector space (a linear space). The idea behind the Log-Euclidean metric is to take the matrix logarithm of your data, then work in the space of symmetric matrices as you would usually, and finally take the matrix exponential of your answer to go back to the positive definite matrices. Because of this, it is very easy to work with the Log-Euclidean metric. (Using that metric explicitly on the space of positive definite matrices is not so simple: it was not designed for it.)
Here is the relevant reference:
So I suppose you would have to go back to the origin of your problem and see, from the start, how you can go to the log-domain before doing any treatment.
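As an illustration of that recipe, here is a plain Python/SciPy sketch (not Manopt code; the helper names are mine): take matrix logs of the data, do ordinary Euclidean computations on the symmetric matrices, and map back with the matrix exponential.

```python
import numpy as np
from scipy.linalg import logm, expm

def log_euclidean_mean(mats):
    # Average in the log-domain, then exponentiate back to the SPD cone.
    logs = [logm(M) for M in mats]
    return expm(sum(logs) / len(logs)).real   # .real drops round-off imaginary parts

def log_euclidean_dist(A, B):
    # Log-Euclidean distance: Frobenius distance between the matrix logs.
    return np.linalg.norm(logm(A) - logm(B), 'fro')

rng = np.random.default_rng(1)
mats = []
for _ in range(3):
    W = rng.standard_normal((3, 3))
    mats.append(W @ W.T + 3 * np.eye(3))

M = log_euclidean_mean(mats)
# The result is again symmetric positive definite:
print(np.allclose(M, M.T), np.all(np.linalg.eigvalsh(M) > 0))
```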

I hope this helps (sorry that both answers are more or less negative). As always: do not hesitate if you have further questions.

Cheers,

Nicolas

sirinuku...@gmail.com
Jan 22, 2014, 7:36:10 AM
to manopt...@googlegroups.com
Hi Nicolas,

Hopefully, if I manage to work out the formula of Hessian, I will post it here.

The Log-Euclidean metric seems promising for solving my original problem efficiently. The number of parameters involved in my cost function is quite high, so I hope this alternative will help reduce the computational complexity of the optimisation. It will take me some time to understand the concept of the Log-Euclidean metric, and I'm not quite sure how to perform optimisation on a space of positive symmetric matrices. Can you give me some pointers?

Really appreciate your help.

Best wishes,
Korsuk

Nicolas Boumal
Jan 22, 2014, 8:38:12 AM
to manopt...@googlegroups.com
Hello,

Once you take the matrix log of your positive-definite data, you land in the space of symmetric matrices (not positive symmetric matrices as you mention in your latest post). The set of symmetric matrices is simply a vector space, so you have many options to deal with that.

One option is to use Manopt: since Manopt can deal with Riemannian manifolds, it can deal with vector spaces in particular.
Here is some code you can put in a file named "symmetricfactory.m". You can then use this factory as a manifold (as you would normally with Manopt).
This is untested code: it was just hacked together in the hope that you will find it useful. If it is, we might put it in the next release.

Here is some example code that looks for the symmetric matrix closest to some given matrix A (arbitrary) in the Frobenius norm. Of course, the output should just be the symmetric part of A (and such is the case).

A = randn(5)
problem.M = symmetricfactory(5)
problem.cost = @(X) .5*norm(X-A, 'fro')^2;
problem.egrad = @(X) X-A;
checkgradient(problem)
trustregions(problem)

You can use the same factory to work with k symmetric matrices of size n at the same time, stored in a 3D array (or 3D matrix) of size n x n x k.

Cheers,
Nicolas



-------------------------------


function M = symmetricfactory(n, k)
% Returns a manifold struct to optimize over k symmetric matrices of size n
%
% function M = symmetricfactory(n)
% function M = symmetricfactory(n, k)
%
% Returns M, a structure describing the Euclidean space of n-by-n symmetric
% matrices equipped with the standard Frobenius distance and associated
% trace inner product, as a manifold for Manopt.
% By default, k = 1. If k > 1, points and vectors are stored in 3D matrices
% X of size nxnxk such that each slice X(:, :, i), for i = 1:k, is
% symmetric.

% This file is part of Manopt: www.manopt.org.
% Original author: Nicolas Boumal, Jan. 22, 2014.
% Contributors: 
% Change log: 
    
    if ~exist('k', 'var') || isempty(k)
        k = 1;
    end

    M.name = @() sprintf('(Symmetric matrices of size %d)^%d', n, k);
    
    M.dim = @() k*n*(n+1)/2;
    
    M.inner = @(x, d1, d2) d1(:).'*d2(:);
    
    M.norm = @(x, d) norm(d(:), 'fro');
    
    M.dist = @(x, y) norm(x(:)-y(:), 'fro');
    
    M.typicaldist = @() sqrt(k)*n;
    
    M.proj = @(x, d) multisym(d);
    
    M.egrad2rgrad = M.proj;
    
    %M.ehess2rhess = @(x, eg, eh, d) eh;
    
    M.tangent = @(x, d) d;
    
    M.exp = @exp;
    function y = exp(x, d, t)
        if nargin == 3
            y = x + t*d;
        else
            y = x + d;
        end
    end
    
    M.retr = M.exp;
    M.log = @(x, y) y-x;

    M.hash = @(x) ['z' hashmd5(x(:))];
    
    M.rand = @() multisym(randn(n, n, k));
    
    M.randvec = @randvec;
    function u = randvec(x) %#ok<INUSD>
        u = multisym(randn(n, n, k));
        u = u / norm(u(:), 'fro');
    end
    
    M.lincomb = @lincomb;
    function v = lincomb(x, a1, d1, a2, d2) %#ok<INUSL>
        if nargin == 3
            v = a1*d1;
        elseif nargin == 5
            v = a1*d1 + a2*d2;
        else
            error('Bad usage of symmetricfactory.lincomb');
        end
    end
    
    M.zerovec = @(x) zeros(n, n, k);
    
    M.transp = @(x1, x2, d) d;
    
    M.pairmean = @(x1, x2) .5*(x1+x2);
    
    M.vec = @(x, u_mat) u_mat(:);
    M.mat = @(x, u_vec) reshape(u_vec, [n, n, k]);
    M.vecmatareisometries = @() true;

end

sirinuku...@gmail.com
Jan 28, 2014, 3:21:25 PM
to manopt...@googlegroups.com
Thanks so much. I'm still struggling to better understand the Log-Euclidean metric and haven't tried the code in my application yet. But with your example it seems to work fine.

Best wishes,
Korsuk

sirinuku...@gmail.com
Feb 10, 2014, 3:15:14 AM
to manopt...@googlegroups.com
Hi Nicolas,

Can you please check if the Riemannian distance in sympositivedefinitefactory.m is correct?

M.dist = @dist;
function d = dist(X, Y)
    d = norm(logm(X\Y), 'fro');
end

it does not give the same value as

d = norm(logm(sqrtm(Y)\X/sqrtm(X)),'fro') or
d = sqrt(sum(log(eig(X\Y)).^2))

Best wishes,
Korsuk

BM
Feb 10, 2014, 3:53:41 AM
to manopt...@googlegroups.com
Hello, 

For the time being, could you replace the code with the following?

M.dist = @dist;
function d = dist(X, Y)
    L = chol(X);
    d = M.norm(X, L'*logm(L'\Y/L)*L);
end

Could you let us know whether this works for you?
We are looking into the other formulations. It is intriguing, given that all the formulas are mathematically equivalent.

Regards,
Bamdev

P.S. Your alternate suggestion should be d = norm(logm(sqrtm(X)\Y/sqrtm(X)),'fro')  
instead of d = norm(logm(sqrtm(Y)\X/sqrtm(X)),'fro'). 

Nicolas Boumal
Feb 10, 2014, 4:26:01 AM
to manopt...@googlegroups.com
Hmm, that's weird.

One thing that's probably important: even if X and Y are symmetric (and positive definite), X\Y is not symmetric in general. Hence, even though it is true that
norm(log(eig(X)))
and
norm(logm(X), 'fro')
are equal, it is not true anymore if you replace X by X\Y.

That will probably fit somewhere in the explanation.
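That observation is easy to reproduce numerically; here is a NumPy/SciPy sketch of my own, independent of Manopt:

```python
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(0)
def spd(n):
    # Random symmetric positive definite matrix.
    W = rng.standard_normal((n, n))
    return W @ W.T + n * np.eye(n)

X, Y = spd(5), spd(5)
XinvY = np.linalg.solve(X, Y)    # X\Y: not symmetric in general

# For the symmetric matrix X, the two quantities agree:
a = np.linalg.norm(np.log(np.linalg.eigvalsh(X)))
b = np.linalg.norm(logm(X), 'fro')
print(np.isclose(a, b))          # True

# For the non-symmetric X\Y (its eigenvalues are still real and
# positive, since it is similar to an SPD matrix), they do not:
c = np.linalg.norm(np.log(np.linalg.eigvals(XinvY).real))
d = np.linalg.norm(logm(XinvY), 'fro')
print(c, d)                      # different values
```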

This being said, we have the following thing that is peculiar:

M = sympositivedefinitefactory(5)
X = M.rand()
Xdot = M.randvec(X);
Z = M.exp(X, Xdot)
fprintf('Xdot is unit norm: %g\n', M.norm(X, Xdot));

So we built Z as the exponential from X with a unit-norm vector.
And indeed:

M.dist(X, Z) % gives 1

But,

norm(logm(sqrtm(Z)\X/sqrtm(Z)),'fro')

does not give 1.

This is weird, because we do want the distance to be one (if the exponential is correct)...

This needs further investigation. Thanks for bringing it up!

PS: Bamdev, I believe the formula you give with the Cholesky factorization is equivalent to the current formula for M.dist, no?

BM
Feb 10, 2014, 6:26:06 AM
to manopt...@googlegroups.com
You are right. My suggestion doesn't really help. It's a fancy way of doing the same thing. :(

Nicolas Boumal
Mar 5, 2014, 3:42:43 AM
to manopt...@googlegroups.com
Looking into this some more, I realize there is an even more important error in this geometry.
Here is the code for the inner product between two tangent vectors and the code for the norm of a tangent vector:

M.inner = @(X, eta, zeta) trace( (X\eta) * (X\zeta) );
M.norm = @(X, eta) norm(X\eta, 'fro');

These are actually incompatible. Indeed, running this code:

M = sympositivedefinitefactory(5)
X = M.rand()
Xdot = M.randvec(X);
sqrt(M.inner(X, Xdot, Xdot))
M.norm(X, Xdot)

the last two numbers should be the same, but they are not.

The mistake is that M.inner(X, eta, eta) = trace( (X\eta)^2 ), but that is not the same as trace( (X\eta)' * (X\eta) ) since X\eta is not symmetric (even though X and eta are symmetric). Thus, it is not equal to the (squared) Frobenius norm of X\eta.
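The incompatibility is easy to see outside Manopt. In this NumPy sketch (names mine), `inner` plays the role of M.inner(X, eta, eta) and `fro2` the role of M.norm(X, eta)^2:

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 5))
X = W @ W.T + 5 * np.eye(5)         # symmetric positive definite point
E = rng.standard_normal((5, 5))
eta = E + E.T                       # symmetric tangent vector

A = np.linalg.solve(X, eta)         # X\eta: non-symmetric in general

inner = np.trace(A @ A)             # trace((X\eta)^2), as in M.inner
fro2 = np.linalg.norm(A, 'fro')**2  # trace((X\eta)' (X\eta)), as in M.norm^2
print(inner, fro2)                  # different values

# trace((X\eta)^2) is nonetheless a valid squared norm: it equals
# ||X^-.5 eta X^-.5||_F^2, which is positive for eta != 0.
Xih = fractional_matrix_power(X, -0.5)
S = Xih @ eta @ Xih
print(np.isclose(inner, np.linalg.norm(S, 'fro')**2))   # True
```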

Turns out that this mistake made the distance function, logarithmic map and exponential map all look compatible.

I'll see what I can do to correct this.

Thank you again for bringing this up!

Nicolas

Nicolas Boumal
Mar 5, 2014, 4:46:51 AM
to manopt...@googlegroups.com
Attached to this message, please find a revised geometry file for sympositivedefinitefactory.m.

Could you please let me know if this is better? That would be really useful!

As far as I can tell, it is now correct.

norm_X(H) = sqrt(trace((X\H)^2))
dist(X, Y) = sqrt(trace(logm(X\Y)^2))
Exp_X(H) = X*expm(X\H)
Log_X(Y) = X*logm(X\Y)

(Notice how we only use the inverse of X, not its square root or inverse square root. These functions do take into account the fact that X\Y and X\H are not symmetric.)
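Those four formulas can be cross-checked numerically. Here is a Python/SciPy sketch of my own (the helper names are made up) repeating the experiment from earlier in the thread:

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
X = W @ W.T + 4 * np.eye(4)        # point on the SPD manifold
E = rng.standard_normal((4, 4))
H = E + E.T                        # symmetric tangent vector at X

def spd_norm(X, H):
    A = np.linalg.solve(X, H)
    return np.sqrt(np.trace(A @ A).real)        # sqrt(trace((X\H)^2))

def spd_dist(X, Y):
    L = logm(np.linalg.solve(X, Y))
    return np.sqrt(np.trace(L @ L).real)        # sqrt(trace(logm(X\Y)^2))

def spd_exp(X, H):
    return X @ expm(np.linalg.solve(X, H))      # X*expm(X\H)

def spd_log(X, Y):
    return X @ logm(np.linalg.solve(X, Y))      # X*logm(X\Y)

Y = spd_exp(X, H)
# The check that failed with the old code now passes:
print(np.isclose(spd_dist(X, Y), spd_norm(X, H)))
# Exp and Log invert each other:
print(np.allclose(spd_log(X, Y), H))
```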

Cheers!
Nicolas
sympositivedefinitefactory.m
Message has been deleted

Nicolas Boumal
Apr 11, 2014, 7:47:11 AM
to manopt...@googlegroups.com
Dear Korsuk,

I see that you replied to this thread, but somehow your message was deleted; I apologize for the mishap.

Thank you for getting back in touch with us!

Best,
Nicolas