Some Questions About the checkgradient Function


Zhao Xingyu

Dec 1, 2025, 2:04:02 AM
to Manopt

Hello,

I have encountered some issues while using the checkgradient function. The results vary across multiple runs. For example:

During one execution, the output was:

# Gradient check
The slope should be 2. It appears to be: 1.99973.
If it is far from 2, then the gradient might be erroneous.
The gradient at x must be a tangent vector at x.
If so, the following number is zero up to machine precision: 3.70803e-16.
If it is far from 0, the gradient is not tangent.

This suggests that the gradient computation is correct.

However, when I ran the function again, the result changed to:

# Gradient check
The slope should be 2. It appears to be: 1.00084.
If it is far from 2, then the gradient might be erroneous.
The gradient at x must be a tangent vector at x.
If so, the following number is zero up to machine precision: 5.18517e-16.
If it is far from 0, the gradient is not tangent.

This now indicates a potential issue with the gradient.

I am confused about why such inconsistencies occur. Could you help clarify whether my gradient derivation is actually correct?

Thank you for your assistance.

Sincerely, Xingyu Zhao

Nicolas Boumal

Dec 1, 2025, 4:33:43 AM
to Manopt
Hello,

This could have several causes. Here are a few:

One possibility is that the function may be differentiable only at some points / along some directions, but not globally. Is that possible for your cost function?

Another possibility is that the cost function, as implemented, may not be deterministic. Is there any randomness used in the computation of f and grad f?

Yet another possibility is that caching may be inconsistent. Are you using caching to reduce redundant computations (e.g., via the store structure)?
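If so, a pattern like the following keeps the cache consistent. The quadratic cost on a sphere here is only an illustration, not your problem:

n = 100;
A = randn(n); A = (A + A') / 2;             % a symmetric matrix, for illustration
problem.M = spherefactory(n);
problem.cost  = @(x, store) mycost(A, x, store);
problem.egrad = @(x, store) myegrad(A, x, store);

function [f, store] = mycost(A, x, store)
    if ~isfield(store, 'Ax')
        store.Ax = A * x;                   % computed once per point x
    end
    f = -x' * store.Ax;
end

function [g, store] = myegrad(A, x, store)
    if ~isfield(store, 'Ax')
        store.Ax = A * x;                   % same cached quantity as in the cost
    end
    g = -2 * store.Ax;                      % Euclidean gradient of -x'*A*x
end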

By default, calling checkgradient(problem) picks a point x and a tangent vector v at random, then performs the check along a retraction curve c(t) = retraction(x, tv). What happens if you fix x and v yourself and call checkgradient(problem, x, v)? Do you still see different outcomes when you call this code twice in a row?
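For instance, something along these lines (x0 and v0 are just illustrative names):

x0 = problem.M.rand();           % pick a point once and reuse it
v0 = problem.M.randvec(x0);      % pick a tangent vector at x0 once
checkgradient(problem, x0, v0);
checkgradient(problem, x0, v0);  % a second call should now behave identically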

Best,
Nicolas

Zhao Xingyu

Dec 2, 2025, 6:24:39 AM
to Manopt

Thank you for your reply. Based on your suggestions, I have reviewed my code and the optimization problem. My observations are as follows:

  1. My function is globally differentiable with respect to the optimization variables.

  2. My function is deterministic and does not involve any stochastic components.

  3. I performed the gradient check at a fixed point (using checkgradient(problem, elems_ini)), and the issue still occurs.

My optimization problem is built using the productmanifold operation:

elems.z = euclideanfactory(n);           % Euclidean factor for the variable z
elems.Y = spherecomplexfactory(n, m);    % complex sphere factor for the variable Y
manifold = productmanifold(elems);
problem.M = manifold;
problem.cost = @(elems) cost_func(para, elems);
problem.egrad = @(elems) grad_func(para, elems);

It is worth noting that within grad_func I applied the chain rule: I first computed the derivative with respect to x, and then multiplied by cos(z). I wonder whether the non-one-to-one mapping x = sin(z) could be affecting my gradient. Is that possible?
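To illustrate what I mean, here is a toy version of that chain-rule step (the derivative dfdx is a placeholder, not my actual cost):

z = linspace(-1, 1, 5)';      % some sample values of the variable z
dfdx = @(x) 2 * x;            % placeholder for the derivative of f w.r.t. x
x = sin(z);                   % the reparameterization x = sin(z)
dfdz = dfdx(x) .* cos(z);     % chain rule: df/dz = (df/dx) * cos(z)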

Zhao Xingyu

Dec 2, 2025, 6:28:32 AM
to Manopt

[Attachments: two checkgradient plots, 微信图片_20251202192716_4941_88.png and 微信图片_20251202192716_4940_88.png]

Nicolas Boumal

Dec 2, 2025, 7:16:20 AM
to Manopt
Interesting: the blue curve does have the right slope on some interval of values of t in both plots. Based on these plots, to me, the gradient check is a success in both cases. The orange segment is added by the checkgradient tool afterwards in an effort to find an interval over which the blue curve is as flat as possible, but in this case (left figure) there happen to be two fairly flat pieces in the blue curve, so on some runs it picks the wrong one.

In the end, what matters is that the blue curve should have the right slope over some (not too small) interval, and this is indeed the case in both plots. All good.

I see the error reaches very high values (~10^5) for large t: this may indicate that the function value f itself is huge at random points. It may help to normalize f. For example, if f is a sum over n terms, then it may be a good idea to divide f by n (and then also the gradient and Hessian, of course).
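A minimal sketch of that rescaling, reusing the handles quoted earlier in this thread (structfun divides each field of the gradient struct by n):

problem.cost  = @(elems) cost_func(para, elems) / n;
problem.egrad = @(elems) structfun(@(g) g / n, grad_func(para, elems), ...
                                   'UniformOutput', false);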
