Gradient and product manifold


bahu....@gmail.com

Sep 18, 2014, 3:55:20 PM
to manopt...@googlegroups.com
Hi, the following code produces a gradient error (here, W{1} and W{2} are symmetric matrices, and k >= 1):

function X = min_LMF_test1(W, k)

    n = size(W{1}, 1);

    Man.A1 = symmetricfactory(k);   % symmetric matrices A_1
    Man.A2 = symmetricfactory(k);   % symmetric matrices A_2
    Man.P  = stiefelfactory(n, k);  % Stiefel manifold for P

    problem.M = productmanifold(Man);

    % Define the problem cost function
    problem.cost = @cost;

    function f = cost(X)
        f = norm(W{1} - X.P*X.A1*X.P', 'fro')^2 + ...
            norm(W{2} - X.P*X.A2*X.P', 'fro')^2;
    end

    % Define the problem gradient
    problem.grad = @(X) problem.M.egrad2rgrad(X, egrad(X));

    function g = egrad(X)
        g.A1 = -2*X.P'*W{1}*X.P + 2*X.A1;
        g.A2 = -2*X.P'*W{2}*X.P + 2*X.A2;

        g.P = -4*(W{1}*X.P*X.A1 + ...
                  W{2}*X.P*X.A2);
    end

    % Check numerically whether the gradient is correct
    checkgradient(problem);
    drawnow;

end

The strange thing is the following: if I drop the variable X.A2 and change the manifold, the cost function, and the gradient accordingly, then the gradient checks out:

function X = min_LMF_test1(W, k)

    n = size(W{1}, 1);

    Man.A1 = symmetricfactory(k);   % symmetric matrices A_1
    % Man.A2 = symmetricfactory(k); % symmetric matrices A_2
    Man.P  = stiefelfactory(n, k);  % Stiefel manifold for P

    problem.M = productmanifold(Man);

    % Define the problem cost function
    problem.cost = @cost;

    function f = cost(X)
        f = norm(W{1} - X.P*X.A1*X.P', 'fro')^2; % + ...
            % norm(W{2} - X.P*X.A2*X.P', 'fro')^2;
    end

    % Define the problem gradient
    problem.grad = @(X) problem.M.egrad2rgrad(X, egrad(X));

    function g = egrad(X)
        g.A1 = -2*X.P'*W{1}*X.P + 2*X.A1;
        % g.A2 = -2*X.P'*W{2}*X.P + 2*X.A2;

        g.P = -4*(W{1}*X.P*X.A1); % + ...
            % W{2}*X.P*X.A2);
    end

    % Check numerically whether the gradient is correct
    checkgradient(problem);
    drawnow;

end


BM

Sep 18, 2014, 4:58:07 PM
to manopt...@googlegroups.com
Hello Bahu,

The gradient computation was not correct. Attached is the corrected one. 

Regards,
BM
min_LMF_test1_BM.html

bahu....@gmail.com

Sep 19, 2014, 3:56:35 AM
to manopt...@googlegroups.com
Thanks for your comment! I think I understand what happened, which may be of general interest. What I did was the following:

Instead of

f = norm(W{1} - X.P*X.A1*X.P', 'fro')^2 + ...
    norm(W{2} - X.P*X.A2*X.P', 'fro')^2;

let's consider for simplicity the functional

f = norm(W - P*A*P', 'fro')^2,    (W symmetric).

Using the identity norm(X,'fro')^2 = trace(X'*X), on the manifold P'P = I, f is the same as

f0 = -2 trace(WPAP') + const.

So I calculated the gradients of f0. Of course the Euclidean gradients of f and f0 are not identical, not even on the manifold; the correct gradient is the one you gave in your post.

But here is the strange part: minimizing f and f0 with Manopt leads to the same results (up to a rotation). So even though checkgradient reports that the gradient is wrong, the optimization algorithm works well. I think this is because f0 is a good approximation of f on P'P = I. Can this be put into a general scheme?


BM

Sep 19, 2014, 6:13:31 AM
to manopt...@googlegroups.com
I think f0 should not be considered an approximation of f. In that context, I believe the critical points have to be different. Could you verify again that they differ by rotations only?

You should also change the norms of A and W. Try making them bigger to see the difference.

Do let us know about your observations.

Regards,
BM


Nicolas Boumal

Sep 19, 2014, 8:04:00 AM
to manopt...@googlegroups.com
Actually, it would be fine to simplify the cost function using the manifold definition. For example, we do it when we optimize the Rayleigh quotient over the sphere: instead of working with x'Ax / x'x, we work with x'Ax, because we constrain x'x = 1 (sphere).
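For concreteness, here is a minimal sketch of that Rayleigh quotient example with Manopt (the size n and the random symmetric matrix A are made up for illustration):

n = 100;
A = randn(n); A = (A + A')/2;     % a random symmetric matrix
problem.M = spherefactory(n);     % the sphere enforces x'x = 1
problem.cost = @(x) x'*A*x;       % simplified cost: the denominator x'x is dropped
% 2Ax is the exact Euclidean gradient of the simplified cost, everywhere:
problem.grad = @(x) problem.M.egrad2rgrad(x, 2*A*x);
checkgradient(problem);           % passes: no terms were lost in the simplification
drawnow;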

But in your case, the simplification appears not to be correct. You said the following:

Let's consider for simplicity the functional

f = norm(W - P*A*P', 'fro')^2,    (W symmetric).

Using the identity norm(X,'fro')^2 = trace(X'*X), on the manifold P'P = I, f is the same as

f0 = -2 trace(WPAP') + const.


But that "const." includes a term corresponding to the squared norm of A, and that is not constant with respect to your variables. So probably your gradient was just missing that term. (This echoes Bamdev's suggestion of changing the norm of A.)
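To spell that out: for symmetric W and A, on the manifold P'P = I,

norm(W - P*A*P', 'fro')^2 = norm(W,'fro')^2 - 2*trace(W*P*A*P') + trace(A*A),

since trace((P*A*P')'*(P*A*P')) = trace(A*P'*P*A*P'*P) = trace(A*A). A quick numerical check of this identity (the sizes are arbitrary):

n = 8; k = 3;
W = randn(n); W = W + W';         % symmetric W
A = randn(k); A = (A + A')/2;     % symmetric A
[P, ~] = qr(randn(n, k), 0);      % orthonormal columns: P'*P = eye(k)
lhs = norm(W - P*A*P', 'fro')^2;
rhs = norm(W, 'fro')^2 - 2*trace(W*P*A*P') + trace(A*A);
disp(abs(lhs - rhs))              % agrees up to round-off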

Cheers,
Nicolas

bahu....@gmail.com

Sep 19, 2014, 1:58:04 PM
to manopt...@googlegroups.com
Nicolas, you are right, there is a typo in my post. It should read

f0 = -2 trace(WPAP') + trace(A^2) + const;

and that is exactly the version I made use of in my computations. Differentiating f0, one gets

g0.A = 2(A - P'WP)
g0.P = -4WPA

which is what one obtains by inserting P'P = I into the exact Euclidean gradient of f,

g.A = 2P'(PAP' - W)P
g.P = 4(PAP' - W)PA

except for the leftover term 4PA^2 in g.P; that term lies in the normal space at P, so it vanishes under the projection in egrad2rgrad.
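A small numerical check of that last point (a sketch assuming Manopt is on the path; W, A and the sizes are random test data):

n = 8; k = 3;
M = stiefelfactory(n, k);
P = M.rand();                     % random Stiefel point, P'*P = eye(k)
W = randn(n); W = W + W';         % symmetric W
A = randn(k); A = (A + A')/2;     % symmetric A
gP  = 4*(P*A*P' - W)*P*A;         % exact Euclidean gradient of f w.r.t. P
g0P = -4*W*P*A;                   % Euclidean gradient of f0 w.r.t. P
% Their difference 4*P*A^2 is normal to the manifold, so the
% projected (Riemannian) gradients coincide up to round-off:
disp(norm(M.egrad2rgrad(P, gP) - M.egrad2rgrad(P, g0P), 'fro'))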


Thanks for your interest,
B.S.

bahu....@gmail.com

Sep 19, 2014, 2:35:27 PM
to manopt...@googlegroups.com
These are the functions for minimizing f and f0, respectively.

______________________________________________________________
function X = check_min_f(W, k, X0)

    n = size(W, 1);

    Man.A = symmetricfactory(k);    % symmetric matrices A
    Man.P = stiefelfactory(n, k);   % Stiefel manifold for P

    problem.M = productmanifold(Man);

    % Define the problem cost function
    problem.cost = @cost;

    function f = cost(X)
        f = norm(W - X.P*X.A*X.P', 'fro')^2;
    end

    % Define the problem gradient
    problem.grad = @(X) problem.M.egrad2rgrad(X, egrad(X));

    function g = egrad(X)
        % auxiliary variable
        S = 2*(X.P*X.A*X.P' - W);

        g.A = X.P'*S*X.P;
        g.P = 2*S*X.P*X.A;
    end

    % Check numerically whether the gradient is correct
    checkgradient(problem);
    drawnow;

    % Options for the minimization algorithm
    options.verbosity = 2;
    options.maxiter = 500;

    X = conjugategradient(problem, X0, options);

end
_________________________________________________________
AND NOW f0:
_________________________________________________________
function X = check_min_f0(W, k, X0)

    n = size(W, 1);

    Man.A = symmetricfactory(k);    % symmetric matrices A
    Man.P = stiefelfactory(n, k);   % Stiefel manifold for P

    problem.M = productmanifold(Man);

    % Define the problem cost function
    problem.cost = @cost;

    function f = cost(X)
        f = -2*trace(W*X.P*X.A*X.P') + trace(X.A*X.A);
    end

    % Define the problem gradient
    problem.grad = @(X) problem.M.egrad2rgrad(X, egrad(X));

    function g = egrad(X)
        g.A = 2*(X.A - X.P'*W*X.P);
        g.P = -4*W*X.P*X.A;
    end

    % Check numerically whether the gradient is correct
    checkgradient(problem);
    drawnow;

    % Options for the minimization algorithm
    options.verbosity = 2;
    options.maxiter = 500;

    X = conjugategradient(problem, X0, options);

end
__________________________________________________________

I did some tests with

W = rand(10);
W = W + W';
k = 5; X0 = [];

and then confirmed that the X.P results of minimizing both functions are related by a rotation (using procrustes()).
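For reference, a sketch of that comparison (the variable names are illustrative; procrustes comes from the Statistics Toolbox, with scaling disabled so only the orthogonal part is fitted):

Xf  = check_min_f(W, k, X0);
Xf0 = check_min_f0(W, k, X0);
% Fit an orthogonal transform mapping Xf0.P onto Xf.P; a tiny
% dissimilarity d means the two P's agree up to a rotation.
[d, Z, transform] = procrustes(Xf.P, Xf0.P, 'scaling', false);
disp(d)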

It is also instructive to use the result of minimizing one of these functions as the starting point for the other; in that case, the results agree.
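For instance (a sketch; with X0 = [], conjugategradient starts from a random point):

Xf = check_min_f(W, k, []);     % minimize f from a random start
X  = check_min_f0(W, k, Xf);    % warm-start f0 at the minimizer of f
% The two results agree.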

Best,
B.S.
