function [ X ] = min_LMF_test1(W,k)
n = size(W{1},1); % side length of the symmetric matrices W{1}, W{2}
Man.A1 = symmetricfactory(k); % Symmetric matrices A_1
Man.A2 = symmetricfactory(k); % Symmetric matrices A_2
Man.P = stiefelfactory(n, k); % Stiefel manifold for P
problem.M = productmanifold(Man);
% Define the problem cost function
problem.cost = @cost;
function f = cost(X)
f=norm(W{1}-X.P*X.A1*X.P','fro')^2 + ...
norm(W{2}-X.P*X.A2*X.P','fro')^2;
end
% Define the problem gradient
problem.grad = @(X) problem.M.egrad2rgrad(X, egrad(X));
function g = egrad(X)
g.A1=-2*X.P'*W{1}*X.P +2*X.A1;
g.A2=-2*X.P'*W{2}*X.P +2*X.A2;
g.P= -4*(W{1}*X.P*X.A1 + ...
W{2}*X.P*X.A2);
end
% Check numerically whether gradient is correct
checkgradient(problem);
drawnow;
end
The strange thing is the following: if I remove the variable X.A2 and change the manifold, the cost function, and the gradient accordingly, then the gradient check passes:
function [ X ] = min_LMF_test1(W,k)
n = size(W{1},1);
Man.A1 = symmetricfactory(k); % Symmetric matrices A_1
% Man.A2 = symmetricfactory(k); % Symmetric matrices A_2
Man.P = stiefelfactory(n, k); % Stiefel manifold for P
problem.M = productmanifold(Man);
% Define the problem cost function
problem.cost = @cost;
function f = cost(X)
f=norm(W{1}-X.P*X.A1*X.P','fro')^2 ;% + ...
% norm(W{2}-X.P*X.A2*X.P','fro')^2;
end
% Define the problem gradient
problem.grad = @(X) problem.M.egrad2rgrad(X, egrad(X));
function g = egrad(X)
g.A1=-2*X.P'*W{1}*X.P +2*X.A1;
% g.A2=-2*X.P'*W{2}*X.P +2*X.A2;
g.P= -4*(W{1}*X.P*X.A1);% + ...
% W{2}*X.P*X.A2);
end
% Check numerically whether gradient is correct
checkgradient(problem);
drawnow;
end
Instead of
f = norm(W{1} - X.P*X.A1*X.P','fro')^2 + ...
norm(W{2} - X.P*X.A2*X.P','fro')^2;
let's consider for simplicity the functional
f = norm(W - P*A*P','fro')^2   (W symmetric).
Using the identity norm(X,'fro')^2 = trace(X'*X), on the manifold P'*P = I, f is the same as
f0 = -2*trace(W*P*A*P') + const.
So I calculated the gradients with respect to f0. Of course the Euclidean gradients of f and f0 are not identical, not even on the manifold; the correct gradient is the one given in your post.
But: minimizing f and f0 with Manopt leads to the same results (up to a rotation). So even though checkgradient reports an error, the optimization algorithm works well. I think this is the case because f0 is a good approximation to f on P'*P = I. Can this be put into a general scheme?
Let's consider for simplicity the functional
f = norm(W - P*A*P','fro')^2   (W symmetric).
Using the identity norm(X,'fro')^2 = trace(X'*X), on the manifold P'*P = I we have f = norm(W,'fro')^2 - 2*trace(W*P*A*P') + trace(A*A), so instead of
f0 = -2*trace(W*P*A*P') + const.
the reduced functional should read
f0 = -2*trace(W*P*A*P') + trace(A*A) + const;
and that is exactly the version I made use of in my computations. Differentiating f0, one gets
g0.A = 2*(A - P'*W*P)
g0.P = -4*W*P*A
and this is exactly the result one obtains by inserting P'*P = I into the exact Euclidean gradient of f, which is
g.A = 2*P'*(P*A*P' - W)*P
g.P = 4*(P*A*P' - W)*P*A
(for g.P, the remaining term 4*P*A*A is normal to the Stiefel manifold at P and is removed by the projection in egrad2rgrad).
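Here is a minimal numerical sketch of these two facts (assuming Manopt's stiefelfactory is on the path; n, k, W, A, P are arbitrary test data): on the manifold, f and f0 differ only by the constant norm(W,'fro')^2, and the two Euclidean gradients with respect to P project to the same tangent vector.

n = 10; k = 5;
W = rand(n); W = W + W';               % random symmetric test matrix
A = rand(k); A = A + A';               % random symmetric A
St = stiefelfactory(n, k);
P = St.rand();                         % random point, so P'*P = I up to round-off
% f and f0 differ only by the constant norm(W,'fro')^2 on the manifold:
f  = norm(W - P*A*P','fro')^2;
f0 = -2*trace(W*P*A*P') + trace(A*A);
disp(f - f0 - norm(W,'fro')^2)         % approximately zero
% The Euclidean gradients with respect to P differ by 4*P*A*A ...
gP  = 4*(P*A*P' - W)*P*A;              % exact Euclidean gradient of f
g0P = -4*W*P*A;                        % Euclidean gradient of f0
% ... but that difference is normal to St(n,k), so the projections coincide:
disp(norm(St.proj(P, gP) - St.proj(P, g0P),'fro'))   % approximately zero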
Thanks for your interest,
B.S.
______________________________________________________________
function [ X ] = check_min_f(W,k,X0)
n = size(W,1);
Man.A = symmetricfactory(k); % Symmetric matrices A
Man.P = stiefelfactory(n, k); % Stiefel manifold for P
problem.M = productmanifold(Man);
% Define the problem cost function
problem.cost = @cost;
function f = cost(X)
f = norm(W - X.P*X.A*X.P','fro')^2;
end
% Define the problem gradient
problem.grad = @(X) problem.M.egrad2rgrad(X, egrad(X));
function g = egrad(X)
% auxiliary variable: common factor of both gradients
S = 2*(X.P*X.A*X.P' - W);
g.A = X.P'*S*X.P;
g.P = 2*S*X.P*X.A;
end
% Check numerically whether gradient is correct
checkgradient(problem);
drawnow;
% options for the minimization algorithm
options.verbosity=2;
options.maxiter=500;
[X] = conjugategradient(problem, X0, options);
end
_________________________________________________________
AND NOW f0:
_________________________________________________________
function [ X ] = check_min_f0(W,k,X0)
n = size(W,1);
Man.A = symmetricfactory(k); % Symmetric matrices A
Man.P = stiefelfactory(n, k); % Stiefel manifold for P
problem.M = productmanifold(Man);
% Define the problem cost function
problem.cost = @cost;
function f = cost(X)
f = -2*trace(W*X.P*X.A*X.P') + trace(X.A*X.A);
end
% Define the problem gradient
problem.grad = @(X) problem.M.egrad2rgrad(X, egrad(X));
function g = egrad(X)
g.A=2*(X.A-X.P'*W*X.P);
g.P=-4*W*X.P*X.A;
end
% Check numerically whether the gradient is correct
checkgradient(problem);
drawnow;
% options for the minimization algorithm
options.verbosity=2;
options.maxiter=500;
[X] = conjugategradient(problem, X0, options);
end
__________________________________________________________
I did some tests with
W = rand(10); W = W + W';
k = 5; X0 = [];
and then confirmed that the X.P results of minimizing both functions are related by a rotation (using procrustes()).
It is also instructive to use the result of minimizing one of these functions as the starting point for the other one; in this case, the results agree.
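For reference, a sketch of how this comparison can be reproduced (assuming the two functions above are on the path and that procrustes from the Statistics Toolbox is available; the variable names are arbitrary):

W = rand(10); W = W + W';
k = 5; X0 = [];
Xf  = check_min_f(W, k, X0);            % minimize f
Xf0 = check_min_f0(W, k, X0);           % minimize f0
% compare the P factors up to an orthogonal transformation:
d = procrustes(Xf.P, Xf0.P);            % small d: the P factors match up to a Procrustes transform
disp(d)
% use one minimizer as the starting point for the other cost:
Xcross = check_min_f0(W, k, Xf);        % should essentially reproduce Xf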
Best,
B.S.