Checkgradient is good but it doesn't converge


Alaeddine Zahir

Jul 20, 2023, 1:39:23 PM
to Manopt
I have a problem with non-convergence of the model on the Stiefel manifold.
Here is the code:
n = 20;
p = 5;
A_2 = randn(n);
A_2 = .5*(A_2 + A_2');
A_1 = [];  % A_1 appeared in the original snippet but is unused by this cost

manifold = stiefelfactory(n, p);
problem.M = manifold;
problem.cost  = @(x) trace(x'*(diag_of(x)*A_2'*diag_of(x))*x);
problem.egrad = @(x) grad_of(A_1, A_2, x);

checkgradient(problem);  % slope test mentioned below

[X, xcost, info, options] = barzilaiborwein(problem);
semilogy([info.iter], [info.gradnorm], '.-');
xlabel('Iteration number');
ylabel('Norm of the gradient of f');

function D = diag_of(X)
    D = diag(1 ./ sqrt(diag(X*X')));  % inverse row norms of X
end
function G = grad_of(A_1, A_2, X)
    % Body was truncated in the archive; reconstructed gradient of the posted cost (A_1 unused).
    d = 1 ./ sqrt(diag(X*X'));
    D = diag(d);
    c = (A_2 .* (X*X')) * d;
    G = 2*D*A_2*D*X - 2*diag(d.^3 .* c)*X;
end
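The gradient helper above was garbled in the archive, so the formula shown there is a reconstruction, not the original. Here is a NumPy sketch (not from the thread; all names are illustrative) that derives the Euclidean gradient of the posted cost trace(X'*D*A*D*X), with D = diag(1./sqrt(diag(X*X'))), and checks it by central finite differences:

```python
import numpy as np

# Cost from the thread: f(X) = trace(X' D A D X), D = diag(1/sqrt(diag(X X'))).
rng = np.random.default_rng(0)
n, p = 8, 3                       # small sizes, just for the check
A = rng.standard_normal((n, n))
A = 0.5 * (A + A.T)               # symmetric, like A_2 in the thread
X = rng.standard_normal((n, p))   # rows are nonzero with probability 1

def cost(X):
    d = 1.0 / np.sqrt(np.sum(X * X, axis=1))  # diag(X X') = squared row norms
    D = np.diag(d)
    return np.trace(X.T @ D @ A @ D @ X)

def egrad(X):
    # Product rule: one term from X at fixed D, one from D's dependence on X.
    G = X @ X.T
    d = 1.0 / np.sqrt(np.diag(G))
    D = np.diag(d)
    c = (A * G) @ d                           # c_i = sum_j A_ij (X X')_ij d_j
    return 2 * D @ A @ D @ X - 2 * np.diag(d**3 * c) @ X

# Central finite-difference check along a random direction U.
U = rng.standard_normal((n, p))
h = 1e-6
fd = (cost(X + h * U) - cost(X - h * U)) / (2 * h)
an = np.sum(egrad(X) * U)
rel = abs(fd - an) / max(1.0, abs(an))
print(rel)  # should be tiny
```

This mirrors what Manopt's checkgradient verifies: the directional derivative predicted by the gradient matches the finite-difference slope.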
The slope from checkgradient comes out around 2 (1.99991), so the gradient is OK, but the solver doesn't converge. I've tried multiple solvers, all with the same result. Why does it fail in this case?
Thank you in advance,

Nicolas Boumal

Jul 20, 2023, 4:53:52 PM
to Manopt

I tried your code and it puzzled me too. My best guess is that the cost function is not differentiable everywhere and that the optimizer gets stuck in regions where the cost indeed isn't differentiable.

To test this hypothesis, I modified the call to the solver as follows:

options.statsfun = statsfunhelper('mindiag', @(x) min(abs(diag(x*x'))));
[X, xcost, info, options] = trustregions(problem, [], options);

This makes it so that, at each iteration, the solver records the minimum absolute value of the diagonal entries of x*x', and stores that number in the info struct-array.

Now, display the results as:

semilogy([info.iter], [info.mindiag], '.-');

and you will see that this quantity becomes dangerously close to zero. Unfortunately, that means x comes too close to a region where the cost function does not have a gradient (because your cost function divides by diag(x*x')).

You may want to reconsider the cost function to make it smooth. Alternatively, you could try nonsmooth solvers. I tried pso and neldermead, but they don't work on Stiefel (they need a distance or a log map, which aren't known). I tried rlbfgs, which often behaves well on nonsmooth functions, but here it goes to points with very small values on diag(x*x') even faster; then again, perhaps that's not such a bad thing from the optimization point of view; this I cannot assess.


Alaeddine Zahir

Jul 20, 2023, 5:42:22 PM
to Manopt
Thank you for the response.
I forgot about the non-differentiability part.
How can I make it smooth? If you have some references to read, that would be great.

P.S. You can forget about the private message I sent earlier.

Nicolas Boumal

Jul 21, 2023, 9:41:03 AM
to Manopt
It's hard to tell what would make sense for your particular application; one option is to regularize by adding a small positive constant to diag(X*X') before you compute the inverse square root. Effectively, this replaces functions of the form 1/sqrt(z) by 1/sqrt(1e-4 + z), for example (I chose 1e-4 at random here).
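As a rough NumPy sketch of that suggestion (the function name is made up and the 1e-4 constant is the arbitrary choice from the message):

```python
import numpy as np

def inv_sqrt_rows(X, eps=1e-4):
    # Regularized version of 1 ./ sqrt(diag(X*X')): 1/sqrt(z) becomes
    # 1/sqrt(eps + z), which stays bounded and smooth even when a row
    # of X shrinks to zero.
    return 1.0 / np.sqrt(eps + np.sum(X * X, axis=1))

X = np.zeros((4, 2))           # worst case: every row norm is exactly zero
d = inv_sqrt_rows(X)
print(np.all(np.isfinite(d)))  # prints True: each entry is 1/sqrt(1e-4), no division by zero
```

With this change, the quantity that statsfunhelper tracked above can no longer drive the scaling to infinity, so gradients remain well defined everywhere on the manifold.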