Gradient checking with a function of a random variable


Ryan Harvey

Apr 16, 2025, 1:06:11 PM
to Manopt
Hello All,

I have a question about the gradient checker when the cost function involves a random variable (e.g., solving a nonlinear least-squares maximum-likelihood problem for a parameter on a Riemannian manifold).

I've gone through and verified that my gradient function is correct, but the gradient checker occasionally reports that it is way off. The majority of the time it shows the gradient as very accurate (e.g., below).
[Attachment: ExampleWithPassingCheckGradient.png]
This has a slope of 2 in t over about five orders of magnitude, and the tangent-space check is at 1e-14, so it looks correct.

However, occasionally (about once every 15 or 20 trials) I will get a realization of the sample where the gradient checker suggests that the gradient is far off (e.g., below):

[Attachment: ExampleWithFailingCheckGradient.png]

In this case, the slope in t is 1.0002 in the highlighted section, so roughly half of the desired slope of 2. The tangent-space check displays exactly 0. rather than a small floating-point number (which I'm guessing just means it is below machine precision in Matlab).

These are both calling the same (costgrad) function on the same manifold, and the only difference is in the realization of the random variable in the cost function. 

Does anyone have any guesses as to what is going on here? 

Best,
Ryan 

Bamdev Mishra

Apr 17, 2025, 12:05:36 AM
to Ryan Harvey, Manopt
Might be a numerical issue. 

Best,

--
http://www.manopt.org

Nicolas Boumal

Apr 17, 2025, 8:59:49 AM
to Manopt
Interesting situation. Can you tell us more about the manifold you are working on (which factory), and the cost function itself?

Ryan Harvey

Apr 17, 2025, 9:44:58 AM
to Manopt
The manifold is a product manifold of a 2-sphere and a 2D Euclidean space. It is similar to my previous post; this case is actually a special case of the physics-derived system I was using in that post (and, curiously, I don't have this issue with the "exact" nonlinear cost function). There is a direction vector x, a distance value d, and a velocity v (whose direction is -x, so we don't need to find it). d and v are constrained, so I use a smooth reparameterization (as suggested by Nicolas previously, thank you for that), with d = eta(1)^2 and v = (v_max + v_min)/2 + (v_max - v_min)/2 * sin(eta(2)), so that the Euclidean factory doesn't need to be bounded directly.
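In Manopt terms, the setup looks roughly like this (a sketch; the exact factory calls and field names may not match my code verbatim, and v_max, v_min are the velocity bounds):

    % Sketch only: factory calls and field names are approximate.
    elems.x   = spherefactory(3);     % direction x on the 2-sphere in R^3
    elems.eta = euclideanfactory(2);  % unconstrained parameters for d and v
    M = productmanifold(elems);

    % Smooth reparameterization, so the Euclidean factor needs no bounds:
    d = @(eta) eta(1)^2;
    v = @(eta) (v_max + v_min)/2 + (v_max - v_min)/2 * sin(eta(2));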

The cost function is one that my PI and I have been calling "pseudo-linear" (though, if there is another name that makes more sense in a smooth-manifolds context, I'm all ears). I've attached a screenshot of it, but it's basically:

f(x,d,v|z) = (1/2)(Ax + b(x)d - vz)^T Sigma^(-1) (Ax + b(x)d - vz)

where A is a known Nx3 matrix of rank 3, b(x) is an Nx1 vector whose entries are the inner products b_i = <y_i, y_i>_x with each y_i a known 3x1 vector, and z is a normally distributed random variable on R^N with a known covariance matrix, whose mean we are approximating by (Ax + b(x)d)/v.
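As a sketch, the cost evaluation is then roughly the following, where pseudolinear_cost and b_of_x are placeholder names and b_of_x(x) is a helper that assembles the N-by-1 vector of the b_i entries:

    % Sketch only: z is one fixed realization of the random variable.
    problem.M = M;
    problem.cost = @(P) pseudolinear_cost(P, A, b_of_x, Sigma, z, v_max, v_min);

    function f = pseudolinear_cost(P, A, b_of_x, Sigma, z, v_max, v_min)
        d = P.eta(1)^2;
        v = (v_max + v_min)/2 + (v_max - v_min)/2 * sin(P.eta(2));
        r = A*P.x + b_of_x(P.x)*d - v*z;   % residual
        f = 0.5 * (r' * (Sigma \ r));      % (1/2) r' * inv(Sigma) * r
    end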

I've double-checked that my Euclidean gradient is correct analytically, and that the Manopt implementation does the projection onto the tangent space correctly (for the x components). It also generally passes checkgradient, but every once in a while I'll get a sample of z where the slope in t is very close to 1.00 instead of the 2.00 it's supposed to be.

Best,
Ryan 
[Attachment: Screenshot from 2025-04-17 09-39-05.png]

Ryan Harvey

Apr 17, 2025, 9:50:44 AM
to Manopt
Note that in the above, there is a composition that is hidden, so that the implemented cost function is really:

f(x,eta|z) = (1/2)(Ax + b(x)d(eta) - v(eta)z)^T Sigma^(-1) (Ax + b(x)d(eta) - v(eta)z)

I didn't write out the composition before, but I should specify that it is there, and the implementation applies the chain rule from that composition in the gradient calculation.
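Spelled out (just restating that composition), with r = Ax + b(x)d(eta) - v(eta)z, the eta components of the Euclidean gradient are

df/deta(1) = (b(x)^T Sigma^(-1) r) * 2*eta(1)
df/deta(2) = (-z^T Sigma^(-1) r) * (v_max - v_min)/2 * cos(eta(2))

with the x part unchanged from the non-composed version.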

Nicolas Boumal

Apr 17, 2025, 10:02:46 AM
to Manopt
The manifold is certainly innocuous enough (if you had been optimizing on an incomplete manifold, then one explanation might have been that from time to time the gradient check makes you look at f along a retracted curve that "falls off" the manifold; but that's not possible here).

As for the cost function itself, I was thinking that your situation might arise if there are if/else branches or some other sources of nonsmoothness, but I don't see this here from the description.

I imagine you do a "clear all" at the beginning of every run?

Your point about randomness should, in principle, not matter: once the random data has been generated, it is fixed from that point on, and the cost function itself should be deterministic (from the standpoint of Manopt). The only situation I can imagine here is that perhaps from time to time the random samples make the cost function non-differentiable, but that is not obvious from the cost description. Sigma is always invertible, I presume?

Ryan Harvey

Apr 17, 2025, 10:14:05 AM
to Manopt
There are no meaningful if/else branches in the cost function, so I don't think that is the reason. The only if/else branch decides whether to renormalize the cost function by dividing it by the square root of the maximum diagonal entry of Sigma, but that option is always on and doesn't depend on the state or input variables, so it shouldn't change between trials. It's just an additional option I add to an options struct.

I do clear everything except for the final saved result between trials (which goes in a separate struct that Manopt never touches), so it shouldn't be holding onto anything from the last iteration in terms of the problem setup.

The Sigma matrix is simply Sigma = sigma^2 * (eye(N) + rho * (ones(N) - eye(N))), so we have a matrix with sigma^2 on the diagonal and rho*sigma^2 on the off-diagonals, with rho being 0.05 in the cases I tested. In principle this rho could be anything in (-1, 1). The 0.05 just comes from some empirical testing on real z data, but in practice we could have arbitrary off-diagonal correlations provided they normalize properly. Sigma should always be invertible and is initialized the same way on every trial.
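As a quick sanity check, something like this (a sketch):

    % Equicorrelation covariance: sigma^2 on the diagonal, rho*sigma^2 off it.
    Sigma = sigma^2 * (eye(N) + rho * (ones(N) - eye(N)));
    % Its eigenvalues are sigma^2*(1 - rho) (multiplicity N-1) and
    % sigma^2*(1 + (N-1)*rho), so it is positive definite exactly when
    % -1/(N-1) < rho < 1.
    [~, p] = chol(Sigma);
    assert(p == 0, 'Sigma is not positive definite');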

I will say that from testing it appears the dependence on x "dominates" (in a non-formal sense), such that being wrong in the direction vector costs much more than being wrong in d or v. That's (unfortunately) just inherent to the geometry of A and the y_i's, but it should always be a positive-definite cost function; there's nowhere on the manifold where it should become indefinite or negative definite.

Ryan Harvey

Apr 17, 2025, 10:15:14 AM
to Manopt
I forgot to add: when I renormalize, I divide both the cost and the gradient by the same value. Both are done inside the same if branch, so if it normalizes the cost function, it necessarily normalizes the gradient as well.
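Roughly (a sketch; normalize_by_sigma is a stand-in for my option flag, and g is the Euclidean gradient stored as a struct matching the product manifold):

    if normalize_by_sigma
        s = sqrt(max(diag(Sigma)));
        f = f / s;          % renormalized cost
        g.x   = g.x / s;    % renormalized gradient, same scalar s
        g.eta = g.eta / s;
    end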

Nicolas Boumal

Apr 22, 2025, 2:35:29 AM
to Manopt
> dependence on x "dominates" (in a non-formal way) such that being wrong in the direction vector costs you much more than being wrong in d or v.

If you want to test whether the gradient is perhaps wrong for some of the variables but not all (in a way that wouldn't obviously appear because of this dominance you mentioned), you might try the following:

a) when calling checkgradient, you can specify a point x and a direction u. You might choose the direction u to be "0" along some of the variables, so that (depending on your retraction) they would be left unchanged along the retraction curve that is used for the finite-difference-based gradient check.

b) you could force some of the variables to remain constant by using constantfactory as a replacement for parts of your productmanifold. A rough sketch of both options follows.
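For instance, something like this (just a sketch; the variable names and struct field names are placeholders for whatever your code uses):

    % a) check along a direction that is zero on the eta factor:
    M  = problem.M;                 % your product manifold
    x0 = M.rand();
    u  = M.zerovec(x0);             % all-zero tangent vector on the product
    Sx = spherefactory(3);
    u.x = Sx.randvec(x0.x);         % nonzero only on the sphere component
    checkgradient(problem, x0, u);

    % b) freeze eta by swapping in constantfactory, then re-check:
    elems.x   = spherefactory(3);
    elems.eta = constantfactory(x0.eta);   % eta held fixed at x0.eta
    problem2 = problem;
    problem2.M = productmanifold(elems);
    checkgradient(problem2);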
