Autodiff converges slower than numeric differentiation


Hannes Ovrén

Jun 23, 2016, 5:25:23 AM
to Ceres Solver
Hi!

I am using Ceres to optimize the control knots of a spline in SE(3), and I am having problems with automatic differentiation.
In my simple test case, the solution converges quickly with numeric differentiation (~4 iterations) but very slowly with automatic differentiation (~40 iterations).
Each iteration is also 2-3 times slower with autodiff.

Here is the output of my simple test case:

------------------------- NUMERIC ------------------------------------
Solving...
iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
   0  4.539716e+02    0.00e+00    1.94e+00   0.00e+00   0.00e+00  1.00e+04        0    2.03e-02    2.50e-02
   1  4.278010e+01    4.11e+02    3.65e+00   2.30e+01   9.06e-01  2.15e+04        1    2.74e-02    5.24e-02
   2  3.885080e-02    4.27e+01    2.86e-01   6.42e+00   9.99e-01  6.45e+04        1    2.62e-02    7.86e-02
   3  3.527025e-07    3.89e-02    7.94e-04   1.98e-01   1.00e+00  1.93e+05        1    2.57e-02    1.04e-01

Solver Summary (v 1.11.0-eigen-(3.2.8)-lapack-suitesparse-(4.4.5)-cxsparse-(3.1.4)-openmp)

                                     Original                  Reduced
Parameter blocks                           10                       10
Parameters                                 70                       70
Effective parameters                       60                       60
Residual blocks                             1                        1
Residual                                  206                      206

Minimizer                        TRUST_REGION

Sparse linear algebra library    SUITE_SPARSE
Trust region strategy     LEVENBERG_MARQUARDT

                                        Given                     Used
Linear solver                    SPARSE_SCHUR             SPARSE_SCHUR
Threads                                     1                        1
Linear solver threads                       1                        1
Linear solver ordering              AUTOMATIC                     1, 9

Cost:
Initial                          4.539716e+02
Final                            3.527025e-07
Change                           4.539716e+02

Minimizer iterations                        3
Successful steps                            3
Unsuccessful steps                          0

Time (in seconds):
Preprocessor                           0.0046

  Residual evaluation                  0.0008
  Jacobian evaluation                  0.0958
  Linear solver                        0.0036
Minimizer                              0.1006

Postprocessor                          0.0000
Total                                  0.1052

Termination:                      CONVERGENCE (Parameter tolerance reached. Relative step_norm: 2.247801e-05 <= 1.000000e-04.)



------------------------- AUTO ---------------------------------------
Solving...
iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
   0  4.539716e+02    0.00e+00    1.94e+00   0.00e+00   0.00e+00  1.00e+04        0    5.42e-02    5.43e-02
   1  4.278010e+01    4.11e+02    8.92e+01   2.30e+01   9.06e-01  2.15e+04        1    5.73e-02    1.12e-01
   2  6.968391e+00    3.58e+01    3.20e+01   3.48e+01   9.57e-01  6.45e+04        1    6.13e-02    1.73e-01
   3  4.608782e+00    2.36e+00    1.36e+02   2.23e+01   7.94e-01  8.08e+04        1    5.94e-02    2.32e-01
   4  4.003150e+00    6.06e-01    1.34e+02   5.62e+00   9.54e-01  2.42e+05        1    5.73e-02    2.90e-01
   5  3.269654e+00    7.35e-01    3.39e+02   5.00e+00   9.04e-01  5.12e+05        1    5.67e-02    3.46e-01
   6  2.450740e+00    8.19e-01    1.98e+02   2.60e+00   9.88e-01  1.54e+06        1    5.78e-02    4.04e-01
   7  1.002151e+00    1.45e+00    3.95e+02   7.26e+00   9.76e-01  4.61e+06        1    5.90e-02    4.63e-01
   8  8.544217e-01    1.48e-01    9.16e+01   1.82e+00   9.53e-01  1.38e+07        1    5.79e-02    5.21e-01
   9  6.177940e-01    2.37e-01    1.13e+02   5.42e+00   9.15e-01  3.23e+07        1    5.72e-02    5.78e-01
  10  1.803505e-01    4.38e-01    1.56e+02   3.84e+00   9.83e-01  9.68e+07        1    5.74e-02    6.36e-01
  11  7.534106e-02    1.05e-01    7.31e+00   1.82e+00   9.69e-01  2.91e+08        1    5.69e-02    6.93e-01
  12  1.973327e-02    5.56e-02    5.18e+02   7.24e+00   8.71e-01  4.90e+08        1    5.81e-02    7.51e-01
  13  1.973327e-02    0.00e+00    5.18e+02   0.00e+00   0.00e+00  2.45e+08        1    9.01e-04    7.52e-01
  14  1.973327e-02   -1.03e-04    0.00e+00   7.45e-01  -6.68e-02  6.13e+07        1    1.08e-03    7.53e-01
  15  1.880588e-02    7.31e-04    1.51e+02   7.17e-01   2.85e-01  5.68e+07        1    5.82e-02    8.11e-01
  16  1.691065e-02    1.84e-03    2.61e+03   2.27e-01   8.97e-01  1.14e+08        1    5.78e-02    8.69e-01
  17  1.635031e-02    5.56e-04    1.89e+03   8.18e-02   9.53e-01  3.42e+08        1    5.81e-02    9.27e-01
  18  1.635031e-02    0.00e+00    1.89e+03   0.00e+00   0.00e+00  1.71e+08        1    8.92e-04    9.28e-01
  19  1.635031e-02    0.00e+00    1.89e+03   0.00e+00   0.00e+00  4.28e+07        1    8.10e-04    9.29e-01
  20  1.591807e-02    4.44e-04    1.43e+03   3.29e-02   9.98e-01  1.28e+08        1    6.52e-02    9.94e-01
  21  1.592116e-02    5.29e-05    3.95e+03   2.16e-01   2.09e-01  1.07e+08        1    5.93e-02    1.05e+00
  22  1.566454e-02    3.01e-04    3.46e+03   7.47e-02   6.28e-01  1.09e+08        1    5.85e-02    1.11e+00
  23  1.566454e-02    0.00e+00    3.46e+03   0.00e+00   0.00e+00  5.45e+07        1    6.06e-04    1.11e+00
  24  1.541450e-02    2.70e-04    3.13e+03   1.80e-02   7.03e-01  5.85e+07        1    5.84e-02    1.17e+00
  25  1.521305e-02    2.15e-04    2.78e+03   1.97e-02   7.47e-01  6.65e+07        1    5.81e-02    1.23e+00
  26  1.509922e-02    1.32e-04    2.71e+03   7.05e-03   6.66e-01  6.90e+07        1    5.73e-02    1.29e+00
  27  1.507768e-02    4.38e-05    2.49e+03   1.27e-02   4.44e-01  6.89e+07        1    5.82e-02    1.34e+00
  28  1.508256e-02    4.55e-06    2.52e+03   1.53e-02   6.53e-02  4.16e+07        1    5.73e-02    1.40e+00
  29  1.505078e-02    4.46e-05    2.38e+03   1.46e-02   4.65e-01  4.16e+07        1    5.89e-02    1.46e+00
  30  1.498873e-02    8.13e-05    2.26e+03   1.05e-02   6.35e-01  4.24e+07        1    5.89e-02    1.52e+00
  31  1.494304e-02    6.37e-05    2.19e+03   7.71e-03   5.82e-01  4.26e+07        1    5.81e-02    1.58e+00
  32  1.489138e-02    6.98e-05    2.07e+03   5.93e-03   6.59e-01  4.40e+07        1    5.79e-02    1.64e+00
  33  1.485574e-02    5.46e-05    2.06e+03   4.76e-03   6.23e-01  4.47e+07        1    5.84e-02    1.69e+00
  34  1.482602e-02    4.85e-05    1.98e+03   3.19e-03   6.23e-01  4.54e+07        1    5.74e-02    1.75e+00
  35  1.480335e-02    4.27e-05    1.99e+03   4.69e-03   5.90e-01  4.56e+07        1    6.35e-02    1.82e+00
  36  1.475347e-02    6.99e-05    1.88e+03   2.51e-03   7.30e-01  5.06e+07        1    5.94e-02    1.87e+00
  37  1.475347e-02    0.00e+00    1.88e+03   0.00e+00   0.00e+00  2.53e+07        1    6.19e-04    1.88e+00
  38  1.473419e-02    3.93e-05    1.83e+03   2.93e-03   6.37e-01  2.58e+07        1    5.87e-02    1.93e+00

Solver Summary (v 1.11.0-eigen-(3.2.8)-lapack-suitesparse-(4.4.5)-cxsparse-(3.1.4)-openmp)

                                     Original                  Reduced
Parameter blocks                           10                       10
Parameters                                 70                       70
Effective parameters                       60                       60
Residual blocks                             1                        1
Residual                                  206                      206

Minimizer                        TRUST_REGION

Sparse linear algebra library    SUITE_SPARSE
Trust region strategy     LEVENBERG_MARQUARDT

                                        Given                     Used
Linear solver                    SPARSE_SCHUR             SPARSE_SCHUR
Threads                                     1                        1
Linear solver threads                       1                        1
Linear solver ordering              AUTOMATIC                     1, 9

Cost:
Initial                          4.539716e+02
Final                            1.473419e-02
Change                           4.539569e+02

Minimizer iterations                       38
Successful steps                           32
Unsuccessful steps                          6

Time (in seconds):
Preprocessor                           0.0001

  Residual evaluation                  0.0069
  Jacobian evaluation                  1.9002
  Linear solver                        0.0247
Minimizer                              1.9348

Postprocessor                          0.0000
Total                                  1.9349

Termination:                      CONVERGENCE (Parameter tolerance reached. Relative step_norm: 9.831943e-05 <= 1.000000e-04.)

It seems like the size of the gradient is consistently overestimated. The above is a toy example, but I see similar behaviour in my real problem, which is more complicated.

The spline uses the SE(3) implementation provided by the Sophus package, which seems to be well known here. Specifically, I have been using the version provided by Steven Lovegrove (https://github.com/stevenlovegrove/Sophus),
since it supports Ceres and provides a local parameterization.

The code for the minimal example above can be found here: https://github.com/hovren/minimal_ceres_sophus

Since this is my first attempt to use Ceres, I am not sure where to look for the problem: it could be in my spline implementation, or in Sophus (or even in Ceres, but I guess that is less likely).

Any help or input is much appreciated. Thanks!

Regards,
Hannes Ovrén

Sameer Agarwal

Jun 23, 2016, 5:29:27 AM
to Ceres Solver
First note: you should not use SPARSE_SCHUR; it is only for bundle adjustment problems.
Your problem is small enough that you should switch to DENSE_QR. It is indeed odd that your Jacobian evaluation is taking so long.
Are you building your code in release mode? CMake by default does a debug build, and that has a huge impact on automatic differentiation performance.
Sameer



--
You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ceres-solver/1375fec5-09cf-4617-ab71-24c4180fad29%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hannes Ovrén

Jun 23, 2016, 5:34:51 AM
to Ceres Solver


On Thursday, June 23, 2016 at 11:29:27 AM UTC+2, Sameer Agarwal wrote:
First note: you should not use SPARSE_SCHUR; it is only for bundle adjustment problems.
Your problem is small enough that you should switch to DENSE_QR. It is indeed odd that your Jacobian evaluation is taking so long.

The reason for using SPARSE_SCHUR is that my real cost function is a bundle adjustment problem.
But I tried switching to DENSE_QR, and I get the same result. So that does not seem to be the issue.
 
Are you building your code in release mode? CMake by default does a debug build, and that has a huge impact on automatic differentiation performance.
Sameer


It is built in release mode.

/Hannes

Sameer Agarwal

Jun 23, 2016, 5:38:34 AM
to Ceres Solver
I am not sure how well the autodiff code in Sophus works, so I cannot comment on it. It is possible for autodiff code to be less efficient than computing finite differences.

I am not surprised that there is a difference in the numerical performance of the two differentiation modes, but that autodiff performs so much worse is surprising. It could have something to do with how well or poorly conditioned your problem is; perhaps you are getting lucky with the path taken by the numerical differentiation code?

Another suggestion: rule out any Sophus-related errors by explicitly using an axis-angle + translation parameterization for SE(3).
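For reference, the axis-angle route can be prototyped without any Sophus dependency. The sketch below is illustrative only (the function name and structure are mine, not from the thread); it mirrors the kind of Rodrigues-formula rotation that ceres::AngleAxisRotatePoint provides, and templating on T keeps it usable with Ceres Jets:

```cpp
#include <cassert>
#include <cmath>

// Rodrigues rotation of a 3-vector by an angle-axis vector aa (direction =
// axis, norm = angle in radians). Templated on T so the same code works for
// plain doubles and for autodiff scalar types such as ceres::Jet.
// Illustrative sketch only; AngleAxisRotate is a hypothetical name.
template <typename T>
void AngleAxisRotate(const T aa[3], const T pt[3], T out[3]) {
  using std::sqrt; using std::sin; using std::cos;
  const T theta2 = aa[0]*aa[0] + aa[1]*aa[1] + aa[2]*aa[2];
  if (theta2 > T(1e-12)) {
    const T theta = sqrt(theta2);
    const T c = cos(theta);
    const T s = sin(theta);
    const T w[3] = {aa[0] / theta, aa[1] / theta, aa[2] / theta};  // unit axis
    const T wxp[3] = {w[1]*pt[2] - w[2]*pt[1],                     // w x pt
                      w[2]*pt[0] - w[0]*pt[2],
                      w[0]*pt[1] - w[1]*pt[0]};
    const T wdp = w[0]*pt[0] + w[1]*pt[1] + w[2]*pt[2];            // w . pt
    for (int i = 0; i < 3; ++i) {
      out[i] = c*pt[i] + s*wxp[i] + (T(1) - c)*wdp*w[i];
    }
  } else {
    // Near-zero rotation: first-order approximation R*p ~= p + aa x p.
    out[0] = pt[0] + aa[1]*pt[2] - aa[2]*pt[1];
    out[1] = pt[1] + aa[2]*pt[0] - aa[0]*pt[2];
    out[2] = pt[2] + aa[0]*pt[1] - aa[1]*pt[0];
  }
}
```

The translation part of SE(3) is then just three more parameters applied after the rotation, which sidesteps Sophus entirely while debugging.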

Sameer



Alex Stewart

Jun 23, 2016, 6:35:20 AM
to ceres-...@googlegroups.com
So it looks like you’re implementing Spline Fusion from Steven, Alonso & Gabe: http://www.stevenlovegrove.com/content/SplineFusion/Lovegrove_etal_BMVC2013.pdf

I think at least part of your problem is that you are using T (which maps to Jet or double) for *time* here:


when it should only be the parameter type of the control point transforms, G_{world}_{control_point}, which are represented as Sophus::SE3Group<T> types.

Several things are important here. Crucially, part of the elegance of the Spline Fusion formulation is that *which* control points are weighted, *and* their assigned weights for a given observation, are *constant* throughout the optimisation and known a priori, since the spline is parameterised in time. It is assumed that you know unambiguously the time of each observation, that this is immutable, and that you have defined the start/end times and the control point period for the spline to be optimised. Once the latter is defined, the set of control points which bound the observation (in time) is both defined and constant, as is the separation (in time) of the observation from each control point, which defines its weight.
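To make the constancy concrete: in the cumulative cubic B-spline formulation of the Spline Fusion paper, the weights are polynomials in the normalized time u in [0, 1) of the observation within its knot interval, so once the knot times are fixed the weights are plain numbers. A minimal standalone sketch of those basis weights (plain doubles, no Sophus; the function name is mine):

```cpp
#include <cassert>
#include <cmath>

// Cumulative cubic B-spline basis from the Spline Fusion formulation
// (Lovegrove et al., BMVC 2013). u is the normalized time of the
// observation inside its knot interval, u in [0, 1). B[0] is always 1
// (it multiplies the absolute pose of the first bounding control point);
// B[1..3] weight the incremental poses of the remaining three control
// points. Since u is fixed per observation, all four weights are
// constants. Function name is illustrative, not from the thread.
void CumulativeBasis(double u, double B[4]) {
  const double u2 = u * u;
  const double u3 = u2 * u;
  B[0] = 1.0;
  B[1] = (5.0 + 3.0*u - 3.0*u2 + u3) / 6.0;
  B[2] = (1.0 + 3.0*u + 3.0*u2 - 2.0*u3) / 6.0;
  B[3] = u3 / 6.0;
}
```

At u = 0 this gives (1, 5/6, 1/6, 0), the cumulative sums of the standard cubic B-spline basis (1/6, 4/6, 1/6, 0), as expected.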

What you are then optimising is the control point poses; you are not optimising anything to do with the weights. This means that your operator()() should accept the four (fixed) bounding control point parameter blocks, which you can associate with the cost function when you add it to Ceres. The constructor for your cost function should also take the constant weight to be applied to each control point.

Very roughly - something like this:

template <template <class, int> class SEGroupTypeBase>
struct WeightedLieAlgebraErrorCost {
  WeightedLieAlgebraErrorCost(
      const SEGroupTypeBase<double, 0>& G_external_observation,
      const Eigen::Vector4d& control_point_weights,
      const LieAlgebraErrorWeightMatrix& lie_algebra_error_weight_matrix);

  template <typename T>
  bool operator()(const T* const control_point_im1,
                  const T* const control_point_i,
                  const T* const control_point_ip1,
                  const T* const control_point_ip2,
                  T* residuals) const {
    const Eigen::Map<const SEGroupTypeBase<T, 0> >
        G_external_control_point_im1(control_point_im1);
    const Eigen::Map<const SEGroupTypeBase<T, 0> >
        G_external_control_point_i(control_point_i);
    const Eigen::Map<const SEGroupTypeBase<T, 0> >
        G_external_control_point_ip1(control_point_ip1);
    const Eigen::Map<const SEGroupTypeBase<T, 0> >
        G_external_control_point_ip2(control_point_ip2);

    // Use the weights to weight the mapped control point poses.
    return true;
  }

  const Eigen::Vector4d weights;
  ...
};

-Alex

Hannes Ovrén

Jun 23, 2016, 7:30:59 AM
to Ceres Solver


On Thursday, June 23, 2016 at 12:35:20 PM UTC+2, Lex wrote:
So it looks like you’re implementing Spline Fusion from Steven, Alonso & Gabe: http://www.stevenlovegrove.com/content/SplineFusion/Lovegrove_etal_BMVC2013.pdf

That is indeed the case.
 

I think at least part of your problem is that you are using T (which maps to Jet or double) for *time* here:


when it should only be the parameter type of the control points transforms: G_{world}_{control_point} which are represented as Sophus::SE3Group<T> types.

Several things are important here. Crucially, part of the elegance of the Spline Fusion formulation is that *which* control points are weighted, *and* their assigned weights for a given observation, are *constant* throughout the optimisation and known a priori, since the spline is parameterised in time. It is assumed that you know unambiguously the time of each observation, that this is immutable, and that you have defined the start/end times and the control point period for the spline to be optimised. Once the latter is defined, the set of control points which bound the observation (in time) is both defined and constant, as is the separation (in time) of the observation from each control point, which defines its weight.

What you are then optimising is the control point poses; you are not optimising anything to do with the weights. This means that your operator()() should accept the four (fixed) bounding control point parameter blocks, which you can associate with the cost function when you add it to Ceres. The constructor for your cost function should also take the constant weight to be applied to each control point.

Yes, spline fusion is really elegant :)

The spline evaluation time is templated for two reasons:

1. The reprojection error calls a function that does rolling shutter projection, which calls the spline evaluation multiple times for different timestamps/rows in the image.
For reasons I can't remember right now, I had to template the time parameter to get that working. I should probably look at that again, though.

2. For my purposes I can't assume that I know the true time; instead there will be some time transformation that is minimized as well.
Thus, I can't assume that I know the knot weights beforehand.
Instead I have a window around each observation, such that I only optimize over the N >= 4 spline knots [T_k, T_k+1, ..., T_k+N-1].

I tried changing the evaluate function to have time as type double instead of T, but the problem persists :(

/Hannes

Hannes Ovrén

Jun 23, 2016, 7:43:24 AM
to Ceres Solver


On Thursday, June 23, 2016 at 11:38:34 AM UTC+2, Sameer Agarwal wrote:
I am not sure how well the autodiff code in Sophus works, so I cannot comment on it. It is possible for autodiff code to be less efficient than computing finite differences.

I am not surprised that there is a difference in the numerical performance of the two differentiation modes, but that autodiff performs so much worse is surprising. It could have something to do with how well or poorly conditioned your problem is; perhaps you are getting lucky with the path taken by the numerical differentiation code?

I don't think it is luck in this case. I can minimize different, much harder cost functions using numeric differentiation.
 

Another suggestion: rule out any Sophus-related errors by explicitly using an axis-angle + translation parameterization for SE(3).

That would probably be a good idea. But I guess fixing it in Sophus (if that is indeed the problem) would be the best solution.
My guess is that the SE3Type::log() and SE3Type::exp() functions are the problem, since Steven's code already contains Ceres tests for the easier parts.

In general, are there any hints or best practices on how to debug this kind of problem?

/Hannes

Sameer Agarwal

Jun 23, 2016, 10:36:33 AM
to Ceres Solver
If you are convinced it is not luck, then something is not right with your derivatives, or your function is too noisy and the derivatives are not giving any useful information.

William Rucklidge

Jun 23, 2016, 10:50:58 AM
to ceres-...@googlegroups.com
There's definitely something not right with the derivatives. Take a look at the first two steps of the two cases:
Numerical diff:
iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
   0  4.539716e+02    0.00e+00    1.94e+00   0.00e+00   0.00e+00  1.00e+04        0    2.03e-02    2.50e-02
   1  4.278010e+01    4.11e+02    3.65e+00   2.30e+01   9.06e-01  2.15e+04        1    2.74e-02    5.24e-02

Autodiff:
iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
   0  4.539716e+02    0.00e+00    1.94e+00   0.00e+00   0.00e+00  1.00e+04        0    5.42e-02    5.43e-02
   1  4.278010e+01    4.11e+02    8.92e+01   2.30e+01   9.06e-01  2.15e+04        1    5.73e-02    1.12e-01

Check out the gradient at iteration 1. It looks like iteration 0 evaluated the same gradient in the two cases, so it took the same step. But at iteration 1, numerical diff gave you a gradient magnitude of 3.65, while autodiff gave you a gradient magnitude of 89.2.

You may find gradient_checker.h useful in figuring out where the difference is coming from.
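For readers unfamiliar with it, the underlying idea of such a checker is just to compare the gradient your cost function reports against central differences. A toy standalone version of that idea (this is NOT the ceres::GradientChecker API; all names here are illustrative):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Compares a claimed gradient dfdx of a scalar function f at x against
// central differences and returns the largest absolute discrepancy.
// A toy sketch of what a gradient checker does, not the Ceres API.
template <typename F>
double MaxGradientError(F f, const double* x, const double* dfdx, int n,
                        double h = 1e-6) {
  double worst = 0.0;
  std::vector<double> xp(x, x + n);
  for (int i = 0; i < n; ++i) {
    const double xi = xp[i];
    xp[i] = xi + h;
    const double fplus = f(xp.data());
    xp[i] = xi - h;
    const double fminus = f(xp.data());
    xp[i] = xi;  // restore the perturbed coordinate
    const double numeric = (fplus - fminus) / (2.0 * h);
    worst = std::max(worst, std::fabs(numeric - dfdx[i]));
  }
  return worst;
}
```

A correct gradient should give an error near the finite-difference noise floor, while a wrong component stands out by orders of magnitude, which is exactly the symptom visible in the two logs above.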

-wjr


Adrian Haarbach

Jun 24, 2016, 8:46:50 AM
to Ceres Solver
Hi,

Even though you are using automatic differentiation for the cost function, the LocalParameterizationSE3 uses analytic Jacobians (in the file local_parameterization.h). I once had a similar problem and was (and still am) unsure whether the Jacobians provided there are correct, so I tested your code with an AutoDiffLocalParameterization of the Sophus SE3 plus operator (basically replacing double with T), as mentioned in one of my previous posts.

However, I still get the exact same gradient magnitude of 8.92e+01 in iteration 1. So the error must lie somewhere else...

Adrian

Hannes Ovrén

Jun 28, 2016, 7:38:23 AM
to Ceres Solver
Hi all!

Thanks for all the replies, and suggestions.

I have made a few more observations which raised some questions.

1. If I change the stride template parameter to DynamicAutoDiffCostFunction from kStride=4 to kStride=2, I seem to get similar or identical results to the numeric derivatives.
Any clue why? My spontaneous guess is that it has something to do with the small number of parameter blocks.
Unfortunately, changing to the non-dynamic version is probably not an alternative, as the number of parameter blocks is likely to stay close to, but above, 10.
The evaluation speed of numeric and autodiff seems to be the same in this case.

In the following two questions, autodiff and numeric give (more or less) exactly the same result.

2. Failed steps (cost_change < 0) seem to always coincide with gradient_max_norm = 0, as can be seen at iteration 14 in the example output above.
However, looking at the source code I can't tell whether the gradient norm is merely reported as 0 in this case, or whether the gradient actually *is* zero and that is what causes the failed step and thus the negative cost change.
I.e., I am not sure which way the implication arrow points :)

3. Changing the problem conditions slightly makes the problem either converge nicely, or not.
The parameters in this toy example are the number of spline knots to estimate (N) and the number of uniformly distributed evaluation times along the spline (K).
For example, N=10, K=30 converges nicely to a very low cost (4 iterations, cost=4e-10, #residuals=206), but N=10, K=5 converges slowly to a worse result (22 iterations, cost=9, #residuals=31).

I really don't like the behaviour in point 3, but I am not sure how to debug where in my code the problem occurs. Any suggestions?

/Hannes

Keir Mierle

Jun 28, 2016, 2:54:39 PM
to ceres-...@googlegroups.com
On Tue, Jun 28, 2016 at 4:38 AM, Hannes Ovrén <kig...@gmail.com> wrote:
Hi all!

Thanks for all the replies, and suggestions.

I have made a few more observations which raised some questions.

1. If I change the stride template parameter to DynamicAutoDiffCostFunction from kStride=4 to kStride=2, I seem to get similar or identical results to the numeric derivatives.
Any clue why? My spontaneous guess is that it has something to do with the small number of parameter blocks.

Interesting! It could be that there is a bug in DynamicAutoDiffCostFunction. Can you try other strides? Every stride should give the same value.
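To see why every stride must agree: a strided evaluator carries only kStride directional derivatives per pass and makes several passes over the function. A hand-rolled sketch of that chunking with a minimal dual-number type (illustrative only, not Ceres code; f(x) = x0*x1 + x1*x2 is an arbitrary stand-in function):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// A scalar carrying a value plus a block of partial derivatives, like a
// stripped-down ceres::Jet with a runtime-sized derivative part. Only the
// operations needed below are implemented. Illustrative only.
struct Dual {
  double v;               // value
  std::vector<double> d;  // partials w.r.t. the currently seeded inputs
  Dual(double val, int n) : v(val), d(n, 0.0) {}
};

Dual operator*(const Dual& a, const Dual& b) {
  Dual r(a.v * b.v, static_cast<int>(a.d.size()));
  for (size_t i = 0; i < r.d.size(); ++i) r.d[i] = a.d[i]*b.v + a.v*b.d[i];
  return r;
}

Dual operator+(const Dual& a, const Dual& b) {
  Dual r(a.v + b.v, static_cast<int>(a.d.size()));
  for (size_t i = 0; i < r.d.size(); ++i) r.d[i] = a.d[i] + b.d[i];
  return r;
}

// Gradient of the stand-in function f(x) = x0*x1 + x1*x2, evaluated the
// way a strided autodiff evaluator works: seed `stride` inputs per pass,
// run f, copy that chunk of partials out, repeat. Whatever the stride,
// the assembled gradient must be identical.
std::vector<double> GradientChunked(const std::vector<double>& x, int stride) {
  const int n = static_cast<int>(x.size());
  std::vector<double> grad(n, 0.0);
  for (int start = 0; start < n; start += stride) {
    const int m = std::min(stride, n - start);
    std::vector<Dual> xd;
    for (int i = 0; i < n; ++i) xd.push_back(Dual(x[i], m));
    for (int k = 0; k < m; ++k) xd[start + k].d[k] = 1.0;  // seed this chunk
    const Dual f = xd[0]*xd[1] + xd[1]*xd[2];
    for (int k = 0; k < m; ++k) grad[start + k] = f.d[k];
  }
  return grad;
}
```

If results vary with the chunk size, something outside the math (stale state between passes, a compiler issue, or a bug in the chunking) has to be responsible.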

With that said, please use the gradient checker. It will compare against numeric differences.
 


Hannes Ovrén

Jun 29, 2016, 5:15:12 AM
to Ceres Solver
Hi!

I have updated my test case at https://github.com/hovren/minimal_ceres_sophus with a probe_gradient executable that uses the gradient checker class.
The probing point is found by letting the optimizer take a single iteration.
I got the following results:

Error kStride=1: 31.5911
Error kStride=2: 2.01037e-07
Error kStride=3: 1.74004
Error kStride=4: 1.74004
Error kStride=5: 1.74004
Error kStride=6: 1.74004
Error kStride=7: 1.74004
Error kStride=8: 1.74004

So it seems like there is something wrong here.

(Note: You should be able to run it as "./probe_gradient --logtostderr=1" to get the debug output from the gradient checker.)

/Hannes

Sameer Agarwal

Jun 29, 2016, 5:28:29 PM
to Ceres Solver
Hannes 
This looks like a problem. I am traveling right now; I will take a look at this once I am back home over the weekend.
Sameer


Sameer Agarwal

Jul 9, 2016, 6:00:30 PM
to ceres-...@googlegroups.com
Hannes,
Thank you for sharing the example code. Sorry it has taken me so long to look at this.

I built your example against the current version of ceres in the git repo (same for Sophus, I used the version from github), and I did NOT get the same results as you. 
I get a consistent error of 2.14e-7. The log is below. 

Could you tell me,

1. What platform are you doing the experiment on? I am working on OS X using clang (Apple LLVM version 7.3.0, clang-703.0.31).
2. What version of Ceres are you using? I am using the version at HEAD.

Sameer

============= GRADIENT CHECKER ====================

Reports the gradient_result.error_jacobians value for different values of kStride.
Run with --logtostderr=1 to see log output from GradientChecker.

iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
   0  1.333211e+02    0.00e+00    2.65e+00   0.00e+00   0.00e+00  1.00e+04        0    1.88e-03    2.09e-03
   1  1.644687e+01    1.17e+02    4.70e+00   1.51e+01   8.80e-01  1.78e+04        1    2.61e-03    4.73e-03

------------------------- CHECK kStride=1 ---------------------------------------
Error kStride=1: 2.14753e-07
------------------------- CHECK kStride=2 ---------------------------------------
Error kStride=2: 2.14753e-07
------------------------- CHECK kStride=3 ---------------------------------------
Error kStride=3: 2.14753e-07
------------------------- CHECK kStride=4 ---------------------------------------
Error kStride=4: 2.14753e-07
------------------------- CHECK kStride=5 ---------------------------------------
Error kStride=5: 2.14753e-07
------------------------- CHECK kStride=6 ---------------------------------------
Error kStride=6: 2.14753e-07
------------------------- CHECK kStride=7 ---------------------------------------
Error kStride=7: 2.14753e-07
------------------------- CHECK kStride=8 ---------------------------------------
Error kStride=8: 2.14753e-07


Hannes Ovrén

Jul 11, 2016, 3:20:21 AM
to Ceres Solver


On Sunday, July 10, 2016 at 12:00:30 AM UTC+2, Sameer Agarwal wrote:
Hannes,
Thank you for sharing the example code. Sorry it has taken me so long to look at this.

No problem. I appreciate the fast feedback.
 

I built your example against the current version of ceres in the git repo (same for Sophus, I used the version from github), and I did NOT get the same results as you. 
I get a consistent error of 2.14e-7. The log is below. 

Could you tell me,

1. What platform are you doing the experiment on? I am working on OS X using clang (Apple LLVM version 7.3.0, clang-703.0.31).
2. What version of Ceres are you using? I am using the version at HEAD.

OK, so I tried running that example again using both the current Ceres master and the 1.11.0 version that is bundled with Fedora 24.
I get the exact same behaviour with both Ceres versions: it works with -DCMAKE_BUILD_TYPE=DEBUG but not with RELEASE.
With the Debug build I get the value 2.01037e-07 for all values of kStride, for both Ceres versions. This is not exactly the same value as you got, but it is close.
With Release, the values differ for different kStride.

I am using Fedora Linux 24, x86-64, with gcc (GCC) 6.1.1 20160621 (Red Hat 6.1.1-3).
Eigen is at version 3.2.8.
I am using the latest Sophus version from Lovegrove's repo: https://github.com/stevenlovegrove/Sophus

Not sure if relevant, but I *did* have to disable a bunch of warnings that were generated by Eigen after updating to the latest Sophus, because it
seems like the newer GCC 6 is not playing nice with Eigen 3.2.8.
Specifically, there were errors about "ignoring attributes on template argument" and "std::binder1st is deprecated".

/Hannes

Sameer Agarwal

Jul 11, 2016, 9:01:56 AM
to ceres-...@googlegroups.com
Hannes,

OK, so I tried running that example again using both the current Ceres master and the 1.11.0 version that is bundled with Fedora 24.
I get the exact same behaviour with both Ceres versions: it works with -DCMAKE_BUILD_TYPE=DEBUG but not with RELEASE.
With the Debug build I get the value 2.01037e-07 for all values of kStride, for both Ceres versions. This is not exactly the same value as you got, but it is close.
With Release, the values differ for different kStride.

This is starting to look like a compiler bug, where something about the optimization is going awry. 
 
I am using Fedora Linux 24, x86-64, with gcc (GCC) 6.1.1 20160621 (Red Hat 6.1.1-3).
Eigen is at version 3.2.8.
I am using the latest Sophus version from Lovegroves repo: https://github.com/stevenlovegrove/Sophus

Not sure if relevant, but I *did* have to disable a bunch of warnings that were generated by Eigen after updating to the latest Sophus, because it
seems like the newer GCC 6 is not playing nice with Eigen 3.2.8.
Specifically, there were errors about "ignoring attributes on template argument" and "std::binder1st is deprecated".

We use Eigen extensively in our autodiff code. Do you have C++11 mode enabled in your build? Also, it may be worth checking whether the Eigen test suite passes on your machine and compiler combo.

Sameer
 

Hannes Ovrén

Jul 11, 2016, 10:16:57 AM
to Ceres Solver


On Monday, July 11, 2016 at 3:01:56 PM UTC+2, Sameer Agarwal wrote:
Hannes,

OK, so I tried running that example again using both the current Ceres master and the 1.11.0 version that is bundled with Fedora 24.
I get the exact same behaviour with both Ceres versions: it works with -DCMAKE_BUILD_TYPE=DEBUG but not with RELEASE.
With the Debug build I get the value 2.01037e-07 for all values of kStride, for both Ceres versions. This is not exactly the same value as you got, but it is close.
With Release, the values differ for different kStride.

This is starting to look like a compiler bug, where something about the optimization is going awry. 

Yes, this seems quite likely.
 
 
I am using Fedora Linux 24, x86-64, with gcc (GCC) 6.1.1 20160621 (Red Hat 6.1.1-3).
Eigen is at version 3.2.8.
I am using the latest Sophus version from Lovegroves repo: https://github.com/stevenlovegrove/Sophus

Not sure if relevant, but I *did* have to disable a bunch of warnings that were generated by Eigen after updating to the latest Sophus, because it
seems like the newer GCC 6 is not playing nice with Eigen 3.2.8.
Specifically, there were errors about "ignoring attributes on template argument" and "std::binder1st is deprecated".

We use Eigen extensively in our autodiff code. Do you have C++11 mode enabled in your build? Also, it may be worth checking whether the Eigen test suite passes on your machine and compiler combo.


Yes, I am using C++11.

No, the Eigen test suite for Eigen version 3.2.8 does not pass completely:

The following tests FAILED:
572 - umfpack_support_1 (Failed)
573 - umfpack_support_2 (Failed)
634 - openglsupport (Failed)
656 - gmres_2 (Failed)
659 - levenberg_marquardt (Failed)

Not sure which of these failures are actually relevant, though, since some of them are in the "unsupported" directory of the Eigen source code.

/Hannes

Sameer Agarwal

Jul 11, 2016, 10:20:40 AM
to Ceres Solver
Is it possible to use Sophus without C++11 support? If so, it may be worth trying that out to see if it makes a difference to the compiler.

Hannes Ovrén

Jul 11, 2016, 10:47:57 AM
to Ceres Solver


On Monday, July 11, 2016 at 4:20:40 PM UTC+2, Sameer Agarwal wrote:
Is it possible to use Sophus without C++11 support? If so, it may be worth trying that out to see if it makes a difference to the compiler.

It is a header-only library, so the only way to test that is to build all the code in non-C++11 mode.
If I understood correctly, Ceres requires shared_ptr and thus at least C++0x? I tried recompiling my code with -std=c++0x, with no luck (same problem).

/Hannes