Problems calculating autodiff gradient

51 views
Skip to first unread message

Evan Zalys

unread,
Jan 15, 2025, 12:16:14 PMJan 15
to Ceres Solver
Hi all,

I'm trying to do some fairly basic curve fitting in Ceres. Suppose the following residual

struct SinusoidalResidual {
  static constexpr std::size_t num_parameters = 4;
  static constexpr std::array parameter_names = {"amplitude"sv, "frequency"sv, "phase"sv, "offset"sv};

  SinusoidalResidual(double x, double y) : x_(x), y_(y) {}

  template <typename T>
  bool operator()(const T* const amplitude, const T* const frequency, const T* const phase,
                  const T* const offset, T* residual) const {

    residual[0] = y_ - (0.5*amplitude[0] * sin(2*std::numbers::pi*frequency[0] * x_ + phase[0]) + offset[0]);
    return true;
  }

 private:
  // Observations for a sample.
  const double x_;
  const double y_;
};

When trying to fit some data with check_gradients on, I get the following --

.iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
   0  1.395790e+01    0.00e+00    1.07e+02   0.00e+00   0.00e+00  1.00e+04        0    5.91e-01    6.37e-01
Iteration: 0 cost: 13.9579
Amplitude: 0.2, Frequency: 8.4, Phase: 0.203, Offset: 0.304

Gradient Error detected!
Extra info for this residual: Residual block id 207; depends on parameters [0x560bb1a36b50, 0x560bb1a36b58, 0x560bb1a36b60, 0x560bb1a36b68]

Detected 1 bad Jacobian component(s). Worst relative error was 0.769602.

========== Jacobian for block 0: (1 by 1)) ==========
 block  row  col        user dx/dy    num diff dx/dy         abs error    relative error         parameter          residual
     0    0    0          0.498173          0.498173                 0                 0               0.2         0.0505298
========== Jacobian for block 1: (1 by 1)) ==========
 block  row  col        user dx/dy    num diff dx/dy         abs error    relative error         parameter          residual
     1    0    0        -0.0174482       -0.00402002         0.0134282          0.769602               8.4         0.0505298 ------ (1,0,0) Relative error worse than 1e-08
========== Jacobian for block 2: (1 by 1)) ==========
 block  row  col        user dx/dy    num diff dx/dy         abs error    relative error         parameter          residual
     2    0    0       -0.00854044       -0.00854044       9.19403e-16       1.07653e-13             0.203         0.0505298
========== Jacobian for block 3: (1 by 1)) ==========
 block  row  col        user dx/dy    num diff dx/dy         abs error    relative error         parameter          residual
     3    0    0                -1                -1       1.11022e-16       1.11022e-16             0.304         0.0505298

Gradient Error detected!
Extra info for this residual: Residual block id 208; depends on parameters [0x560bb1a36b50, 0x560bb1a36b58, 0x560bb1a36b60, 0x560bb1a36b68]


Indeed when running the solver, it tends to run off into nowhere. Typically the amplitude component gets driven to zero and the frequency goes somewhere crazy. What might be going on?

Best wishes,
Evan

Sameer Agarwal

unread,
Jan 15, 2025, 1:16:36 PMJan 15
to ceres-...@googlegroups.com
Hi Evan,

Your autodiff functor looks fine. We use numeric differentiation to check the user provided derivatives (regardless of where they come from). so it is possible that in this case the numeric derivatives being computed have an error in them. 
The functor is simple enough that you should be able to check the analytic derivatives by hand to see which of the two is causing problems. I would be quite surprised if autodiff has a bug here.
As for the solver running off to nowhere, if the residual is going down as it goes to nowhere, then you have a local minima in that region, but if the residual is not changing much and its going into nowhere land, then your problem may actually be ill-posed, where there is a basin of parameters, possibly infinite where the data has more or less the same fit.

Sameer



--
You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/ceres-solver/d04ec3d6-9625-4c26-bb8d-cf08c0c71648n%40googlegroups.com.

Evan Zalys

unread,
Jan 17, 2025, 1:49:53 PMJan 17
to Ceres Solver
Calculating the derivatives analytically sounds like a reasonable idea, I can give that a shot. That said, using scipy.optimize.curve_fit for the exact same objective function, and testing initial conditions with the functor I've shared above and with curve_fit finds that curve_fit converges frequently when ceres and my functor do not. Specifically this:

def offset_sine(x: np.ndarray, amplitude: float, frequency: float, phase: float, offset: float) -> np.ndarray:
    return offset + 0.5 * amplitude * np.sin(2 * np.pi * frequency * x + phase)

curve_fit(offset_sine, x, truth, p0=p0)


There's some missing piece of the puzzle here. I'm not sure if the problem is ill-posed or not, but somehow curve_fit is dealing with it better.

Best wishes,
Evan

Sameer Agarwal

unread,
Jan 17, 2025, 2:33:35 PMJan 17
to ceres-...@googlegroups.com

Do you have some sample data and code you can share with me?


Evan Zalys

unread,
Jan 24, 2025, 4:09:46 PMJan 24
to Ceres Solver
It's part of a bigger project I can't share right now, so I'll have to isolate the section that's showing this behavior and post it -- which I think will be worth it. Thank you a ton for the help. That said, I'll mention that I checked the derivatives by hand and they're all correct. Who coulda guessed. I'll attach a run with the intermediate params dumped out.

If I generate data using the parameters amplitude=0.102, frequency=0.404, phase=0.203, offset=0.304, the solver will proceed as follows

Iteration: 0 cost: 0.441371 Amplitude: 0.0857004, Frequency: 0.481405, Phase: 0.235273, Offset: 0.298447
Iteration: 1 cost: 0.240721 Amplitude: 0.00438387, Frequency: 0.480992, Phase: -0.476709, Offset: 0.310304
Iteration: 2 cost: 0.485988 Amplitude: 0.00438387, Frequency: 0.480992, Phase: -0.476709, Offset: 0.310304
Iteration: 3 cost: 0.48616 Amplitude: 0.00438387, Frequency: 0.480992, Phase: -0.476709, Offset: 0.310304
Iteration: 4 cost: 0.487111 Amplitude: 0.00438387, Frequency: 0.480992, Phase: -0.476709, Offset: 0.310304
Iteration: 5 cost: 0.489607 Amplitude: 0.00438387, Frequency: 0.480992, Phase: -0.476709, Offset: 0.310304
Iteration: 6 cost: 0.231591 Amplitude: 0.0389326, Frequency: -0.155596, Phase: 1.45354, Offset: 0.309244
Iteration: 7 cost: 0.226107 Amplitude: 0.0328169, Frequency: -0.136398, Phase: 1.08505, Offset: 0.312284
Iteration: 8 cost: 0.224782 Amplitude: 0.0365051, Frequency: -0.123301, Phase: 0.841453, Offset: 0.314294
Iteration: 9 cost: 0.22387 Amplitude: 0.0416112, Frequency: -0.113275, Phase: 0.655842, Offset: 0.316804
Iteration: 10 cost: 0.223103 Amplitude: 0.048511, Frequency: -0.1018, Phase: 0.43861, Offset: 0.32036
Iteration: 11 cost: 0.222588 Amplitude: 0.0559479, Frequency: -0.0937488, Phase: 0.287256, Offset: 0.324101
Iteration: 12 cost: 0.222188 Amplitude: 0.065573, Frequency: -0.0847778, Phase: 0.113814, Offset: 0.329006
Iteration: 13 cost: 0.221906 Amplitude: 0.0748036, Frequency: -0.0790416, Phase: 0.0038673, Offset: 0.333645
Iteration: 14 cost: 0.221687 Amplitude: 0.0874053, Frequency: -0.0719109, Phase: -0.136117, Offset: 0.340015
Iteration: 15 cost: 0.221519 Amplitude: 0.098203, Frequency: -0.0678837, Phase: -0.214161, Offset: 0.345421
Iteration: 16 cost: 0.22139 Amplitude: 0.115207, Frequency: -0.0616707, Phase: -0.337335, Offset: 0.35398
....

So it seems to drop, come back up, go down, and stagnate. Again what's most curious is scipy.optimize.curve_fit seems to have no problem with this set of initial conditions. If this isn't enough more information to be helpful, I'll write a minimal test case and post it on github, with sincerest appreciation for your help.

Best wishes,
Evan

Evan Zalys

unread,
Jan 24, 2025, 5:17:59 PMJan 24
to Ceres Solver
Sorry for the double reply but it appears as if I've resolved the problem by making options.initial_trust_region_radius = 1; Will probably experiment with this a bit. I'm curious how the 1e4 was chosen, if I'm band-aiding over something by doing this, these sorts of questions.

Best wishes,
Evan

On Friday, January 17, 2025 at 12:33:35 PM UTC-7 sandwi...@gmail.com wrote:
Reply all
Reply to author
Forward
0 new messages