Gradient check + grad norm + Initial values

Lu Wang

May 23, 2022, 1:35:59 PM
to Manopt
Hi Nicolas,

Many thanks for your contributions to Manopt! I have three questions that have been troubling me a lot. I would appreciate any suggestions you can provide. Thanks a lot!

1) Gradient check
When I use the checkgradient() function, the result looks like the following picture.
[attachment: checkgradient plot]
And the comments say:
The slope should be 2. It appears to be: 1.
If it is far from 2, then directional derivatives might be erroneous.
The residual should be 0, or very close. Residual: 4.20787e-13.
If it is far from 0, then the gradient is not in the tangent space.

In certain cases (e.g., hyperbolicfactory), the tangency test is inconclusive.
My question is: what does "over at least a few orders" mean? Is it acceptable to use the result when the slope is 1 instead of 2?

2) Grad norm
Given different problem settings (e.g. parameters, random variables), the grad norm sometimes goes up and sometimes goes down, like:
[attachments: two gradient-norm plots]
Does this matter? Or, as long as the grad norm converges, is it fine whether it goes up or down along the way?

3) Initial values
In my usage of Manopt (spherecomplexfactory & conjugategradient, complexcirclefactory & conjugategradient), the initialization seems to impact the optimization a lot. The obtained x is just a point near the initial value x0. In this case, my optimization problem is actually not solved, even though the grad norm has converged.

Looking forward to your reply!
Best regards
Lu


Ronny Bergmann

May 24, 2022, 7:36:06 AM
to Manopt
Hi Lu,
I can also answer here (because I did the gradient check in Julia).

1) The slope should be 2 (the dashed line); otherwise it is an indicator that you have a coding error in your gradient function. "Over a few orders of magnitude" means that the blue line and the dashed line should be parallel over a sizable range of step sizes (for very small and very large values, rounding errors, for example, might affect the result). A slope of 1, as I said, is an indicator that either your gradient or your cost is wrong (i.e., they do not match).
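To make this concrete outside of Manopt, here is a small Matlab toy example (my own illustration, not your problem): with a correct gradient, the first-order Taylor error shrinks like t^2, which gives the slope of 2 on the log-log plot; with a wrong gradient, it only shrinks like t.

% Toy illustration: f(x) = ||x||^2 has gradient 2x.
f     = @(x) norm(x)^2;
grad  = @(x) 2*x;    % correct gradient -> slope 2
wrong = @(x) x;      % wrong gradient   -> slope 1
x = randn(5, 1);
d = randn(5, 1);
t = logspace(-8, 0, 50);
err_ok  = arrayfun(@(s) abs(f(x + s*d) - f(x) - s*(grad(x)'*d)),  t);
err_bad = arrayfun(@(s) abs(f(x + s*d) - f(x) - s*(wrong(x)'*d)), t);
loglog(t, err_ok, t, err_bad, t, t.^2, '--');
legend('correct gradient', 'wrong gradient', 't^2 reference');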

2) The gradient norm should go to zero, since you are aiming to find a critical point (where the gradient norm is zero).

3) Yes, on the sphere there are no globally (geodesically) convex functions, so the local minimum you run into depends on your starting point (initialisation). But with a correct gradient (see 1)) you should have convergence, depending a little on your cost, your gradient, and which solver you use.
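If the dependence on the starting point is a problem in practice, one simple (if crude) workaround is a multi-start: run the solver from several random initial points and keep the best result. A rough Matlab sketch, assuming your problem structure is already set up:

% Multi-start sketch; assumes 'problem' (with M, cost, egrad) is defined.
best_cost = inf;
best_x = [];
for trial = 1:10
    x0 = problem.M.rand();                      % random initial point
    [x, fx] = conjugategradient(problem, x0);   % or any other solver
    if fx < best_cost
        best_cost = fx;
        best_x = x;
    end
end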

Lu Wang

May 24, 2022, 8:20:31 AM
to Manopt
Hi Ronny,

Thanks so much for your quick reply! Your answers let me know that there is something wrong with the derivation in my problem. (My objective/cost function is very complicated.)
Since I have been stuck on this simulation for a long time, your reply gives me new ideas for debugging. So thanks again! Dankeschön :)

Best regards,
Lu

Lu Wang

May 31, 2022, 9:28:26 AM
to Manopt
Hi Ronny,

I have a follow-up question about the Gradient check.

As you explained, if I use checkgradient() and the slope shows 1 instead of 2, my cost function and Euclidean gradient do not match.

I think my derivation is not wrong, but the slope still shows 1. The details are shown below:

In my paper, the cost function is min_W || W W' - Rideal ||_F^2 (W and Rideal are both matrices; W is the variable; Rideal is known and Hermitian).
The Euclidean gradient is 4 (W W' - Rideal) W.

I also used http://www.matrixcalculus.org/ to check whether my derivation is right, and it shows the same result.


And this is how I programmed it:

% First attempt:
problem.cost  = @(x) norm(W*W'-Rideal,'fro')^2;
problem.egrad = @(x) 4*(W*W'-Rideal)*W;

% Second attempt:
problem.cost  = @(x) norm(W*W'-Rideal,'fro')^2;
problem.egrad = @(x) 2*(W*W'-Rideal)*W + 2*(W*W'+(-Rideal)')*W;

I tried both of these two ways of programming, but the results of checkgradient() are the same in both cases (the two lines are not parallel), namely:

The slope should be 2. It appears to be: 1.
If it is far from 2, then directional derivatives might be erroneous.


This makes me quite confused. Could you help me take a look at what the reason could be, when you have time?
Thanks so much!

Best regards,
Lu Wang

Nicolas Boumal

Jun 1, 2022, 7:41:28 AM
to Manopt
Hello Lu,

Instead of  @(x), you want to write @(W) in the definitions of problem.cost and problem.egrad.

This should work then (the code below passes the checkgradient test):

clear all; close all; clc;
n = 5;
Rideal = randn(n) + 1i*randn(n);
Rideal = Rideal + Rideal';   % make Rideal Hermitian
problem.M = euclideancomplexfactory(n, n);
problem.cost = @(W) norm(W*W'-Rideal,'fro')^2;
problem.egrad = @(W) 4*(W*W'-Rideal)*W;
checkgradient(problem);


Best,
Nicolas

Lu Wang

Jun 1, 2022, 9:46:36 AM
to Manopt
Hi Nicolas,

Thanks so much for your quick reply and also for pointing out this @(x) error. I had not realized it before. This helps me a lot!
I will learn more basics in both math and programming. Have a nice day!

BR,
Lu

Lu Wang

Jun 7, 2022, 6:11:32 AM
to Manopt
Hi Nicolas,

Hope everything is going well with you. I have two new questions about the Manopt library:

1) I use manifold optimization to optimize three variables. Their grad-norm results generally show a decreasing tendency and are finally close to 0, as shown below:
[attachments: gradient-norm plots for W, WRF, and v]
My question is: compared to the grad norm of v (the third figure), the grad norms of W and WRF (the first two figures) generally show a downward trend, but they fluctuate up and down a lot.
Do you know what this implies? For example, that the optimization of W and WRF is not stable?

2) The data structure of my variable is block diagonal, as shown below. In other words, the matrix is sparse, containing a lot of zeros.
[attachment: sketch of the block-diagonal structure]
Do you know whether the Manopt library can deal with such a sparse matrix? I mean, there would certainly be an optimization result, but if I optimized each block of this sparse matrix separately with the same algorithm, the final result might be better than the one obtained by optimizing over the whole sparse matrix at once.
Do you have any suggestions for optimizing over a sparse matrix as a variable?

Thanks a lot!

Best regards,
Lu Wang

Nicolas Boumal

Jun 7, 2022, 7:11:06 AM
to Manopt
Hello Lu,

For the first point, it is not uncommon to see the gradient norm oscillate as the algorithm progresses, especially with product manifolds. However, in the plots you showed, the fluctuations are indeed quite large, and the overall downward trend is quite slow. I imagine you already ran checkgradient and it all checks out. Did you try various algorithms?
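For instance, something along these lines lets you run two solvers on the same problem structure and compare how the gradient norm evolves (just a sketch):

[x1, f1, info1] = conjugategradient(problem);
[x2, f2, info2] = rlbfgs(problem);
semilogy([info1.iter], [info1.gradnorm], [info2.iter], [info2.gradnorm]);
legend('conjugategradient', 'rlbfgs');
xlabel('iteration'); ylabel('gradient norm');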

For the second point: if your matrix is just block diagonal with, say, 2 diagonal blocks of size n-by-n each, then it may be easiest to use euclideanfactory(n, n, 2) and to optimize over those blocks directly. If your sparsity pattern is more general than that, it's a bit tricky in Matlab because of how sparse matrices are stored, but you could still try to use euclideansparsefactory.
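Here is a rough sketch of the 2-block idea with a real-valued placeholder cost (I pass the size as a vector here; the variable is then an n-by-n-by-2 array whose slices are the two diagonal blocks):

% Placeholder symmetric targets, one per block.
n = 4;
R1 = randn(n); R1 = R1 + R1';
R2 = randn(n); R2 = R2 + R2';
problem.M = euclideanfactory([n, n, 2]);
problem.cost  = @(X) norm(X(:,:,1)*X(:,:,1)' - R1, 'fro')^2 + ...
                     norm(X(:,:,2)*X(:,:,2)' - R2, 'fro')^2;
problem.egrad = @(X) cat(3, 4*(X(:,:,1)*X(:,:,1)' - R1)*X(:,:,1), ...
                            4*(X(:,:,2)*X(:,:,2)' - R2)*X(:,:,2));
checkgradient(problem);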

Best,
Nicolas

Lu Wang

Jun 7, 2022, 8:32:55 AM
to Manopt
Hi Nicolas,

Thanks so much for your quick reply!
First, some supplementary information:
1) All three variables passed checkgradient(). And the block-diagonal matrix is a general one, not just 2 blocks.
2) For the 3 figures in my last email:
Variable W is a block-diagonal matrix. I use spherecomplexfactory with conjugategradient to obtain the results.
Variable WRF is a block-diagonal matrix. I use complexcirclefactory with conjugategradient to obtain the results.
Variable v is just a vector. I use complexcirclefactory with conjugategradient to obtain the results.

When you asked whether I tried other algorithms, I take it you mean other solvers in the library? So I tested some other solvers (which need only the cost function and the Euclidean gradient), with results shown below:
  • barzilaiborwein
[attachments: gradient-norm plots for W and WRF]
  • steepestdescent
[attachments: gradient-norm plots for W and WRF]
  • rlbfgs
[attachments: gradient-norm plots for W and WRF]
According to these results, it seems that the choice of algorithm does influence the optimization.
For variable W, rlbfgs seems to work better than the others.
For variable WRF, all of them seem to go down quite slowly.

Do you have any suggestions? For example, should I adjust the tolgradnorm/maxiter/maxtime/minstepsize, etc., instead of using the default values?
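If I understand the option syntax correctly, I mean something like the following (the values are just placeholders, not tuned):

options.tolgradnorm = 1e-8;    % stop once the gradient norm is below this
options.maxiter     = 2000;    % allow more iterations than the default
options.minstepsize = 1e-12;   % lower threshold before the line search gives up
[x, fx, info] = rlbfgs(problem, x0, options);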

3) For the block-diagonal matrix, I checked euclideansparsefactory, but my problem definition is fixed: for example, the amplitude of each variable should be 1, so I use complexcirclefactory.
So I am thinking of separating the block-diagonal matrix using an iteration/loop to avoid the sparsity, something like the sketch below. What do you think of that?
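(In the sketch, blocksizes, blockcost and blockegrad are placeholders for my real per-block sizes, cost and gradient.)

% Rough sketch: optimize each diagonal block separately in a loop.
numblocks = numel(blocksizes);
blocks = cell(numblocks, 1);
for k = 1:numblocks
    p = struct();
    p.M     = complexcirclefactory(blocksizes(k));  % unit-modulus entries
    p.cost  = @(w) blockcost(w, k);
    p.egrad = @(w) blockegrad(w, k);
    blocks{k} = conjugategradient(p);
end
% ... then reassemble the blocks into the full block-diagonal variable.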

Thanks again!

Best regards,
Lu Wang

Nicolas Boumal

Jun 7, 2022, 9:28:49 AM
to Manopt
Interesting that the behavior varies so wildly, and that rlbfgs seems fairly good.

Can you also try with trustregions as the optimization algorithm?

> So I am thinking of separating the block-diagonal matrix using an iteration/loop to avoid the sparsity.

That sounds good.

Lu Wang

Jun 7, 2022, 11:14:28 AM
to Manopt
Hi Nicolas,

I tried trustregions (without providing the Hessian). Only the second variable, WRF, works well, and the fluctuation disappears :)
[attachment: gradient-norm plot for WRF with trustregions]
Thanks for that. I will test more.

Best regards,
Lu