
Jul 20, 2023, 1:39:23 PM

to Manopt

Hello,

I have a problem with non-convergence of a model on the Stiefel manifold.

Here is the code:

********************

n = 20;
p = 5;

A_1 = zeros(n);            % (unused in the cost below)
A_2 = randn(n);
A_2 = .5*(A_2 + A_2');     % symmetrize

manifold = stiefelfactory(n, p);
problem.M = manifold;
problem.cost  = @(x) trace(x'*(diag_of(x)*A_2'*diag_of(x))*x);
problem.egrad = @(x) grad_of(A_1, A_2, x);

checkgradient(problem);

[X, xcost, info, options] = barzilaiborwein(problem);

figure;
semilogy([info.iter], [info.gradnorm], '.-');
xlabel('Iteration number');
ylabel('Norm of the gradient of f');

function D = diag_of(X)
    % Diagonal matrix with entries 1/sqrt((X*X')_ii).
    D = diag(1 ./ sqrt(diag(X*X')));
end

function G = grad_of(A_1, A_2, X)
    % Euclidean gradient of the cost above.
    Y = X*X';
    D = diag(1 ./ sqrt(diag(Y)));
    H_2 = D*A_2*D;
    F = D^2 .* (H_2*Y);
    S = H_2 - F;
    G = 2*S*X;
end


********************

The slope is around 2 (1.99991), so the gradient checks out, but the solver doesn't converge. I've tried multiple solvers, all with the same result. Why does it fail in this case?

Thank you in advance,

Jul 20, 2023, 4:53:52 PM

to Manopt

Hi,


I tried your code and it puzzled me too. My best guess is that perhaps the cost function is not differentiable everywhere and that the optimizer gets stuck in regions where the cost indeed isn't differentiable.

To test this hypothesis, I modified the call to the solver as follows:

options.statsfun = statsfunhelper('mindiag', @(x) min(abs(diag(x*x'))));

[X, xcost, info, options] = trustregions(problem, [], options);

This makes it so that, at each iteration, the solver records the minimum absolute value of the diagonal entries of x*x', and stores that number in the info struct-array.

Now, display the results as:

semilogy([info.mindiag])

and you will see that this quantity becomes dangerously close to zero. Unfortunately, that means x comes too close to a region where the cost function does not have a gradient (because your cost function divides by diag(x*x')).

You may want to reconsider the cost function to make it smooth. Alternatively, you could try nonsmooth solvers. I tried pso and neldermead, but they don't work on Stiefel (they need a distance or a log map, which aren't known). I tried rlbfgs, which often behaves well on nonsmooth functions, but here it reaches points with very small values on diag(x*x') even faster -- though perhaps that's not such a bad thing from the optimization point of view; that I cannot assess.

Best,

Nicolas

Jul 20, 2023, 5:42:22 PM

to Manopt

Thank you for the response,

I forgot about the non-differentiability part.

How can I make it smooth? If you have some references to read, that would be great.

P.S. You can forget about the private message I sent earlier.

Best,

Jul 21, 2023, 9:41:03 AM

to Manopt

It's hard to tell what would make sense for your particular application; one option is to regularize by adding a small positive constant to diag(X*X') before you compute the inverse square root. Effectively, this replaces functions of the form 1/sqrt(z) by 1/sqrt(1e-4 + z), for example (I chose 1e-4 at random here).
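Concretely, one way to sketch this in the code from the first post is to replace diag_of with a regularized variant (diag_of_reg and eps_reg are names I'm introducing here; the value 1e-4 is arbitrary and should be tuned for your application):

function D = diag_of_reg(X)
    % Regularized version of diag_of: add a small positive constant
    % before taking the inverse square root, so the cost stays smooth
    % even when some diagonal entries of X*X' approach zero.
    eps_reg = 1e-4;   % small positive constant (chosen arbitrarily)
    D = diag(1 ./ sqrt(eps_reg + diag(X*X')));
end

Note that the Euclidean gradient in grad_of must then be rederived to match the regularized cost, otherwise checkgradient will no longer report a slope of 2.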
