NaNs in cost function autodiff

Thomas Sharpless

unread,

Jul 6, 2015, 1:34:18 PM7/6/15

to ceres-...@googlegroups.com

Hi

I'm new to ceres and trying get my first optimization to work. The job is a 2-camera bundle adjustment, with matched pairs of rays on camera centers instead of internal matched image points (i.e. I'm using calibrated cameras). My cost function is supposed to compute the angles between the observed rays and the directions to the fitted 3D points. It looks reasonable to me,and compiles ok (with gcc 4.8.2) but is generating NaNs during Jacobian evaluation.

Can any of you experts see what is wrong with it?

/* per-point error estimate (autodifferentiated)

    ctor takes addrs of two constant rays

    ()  takes global rotation and translation params

        and the fitted point for this ray pair

    residual gets angles between fitted and given rays

*/

struct rayCloudError {

    rayCloudError( const double * pl, const double * pr )

        : l_(pl), r_(pr) {}

  // angle between 2 unit vectors, and its sine and cosine

    template <typename T>

    T uva( const T * r1,

           const T * r2,

           T * sa, T * ca )

    const {

        *ca = ceres::DotProduct(r1, r2);

        T t[3] = {

            r2[0] - r1[0] * *ca,

            r2[1] - r1[1] * *ca,

            r2[2] - r1[2] * *ca

};

        *sa = sqrt(ceres::DotProduct(t, t));

        return atan2(*sa, *ca);

  // normalize a vector, return its length

    template <typename T>

    T nrv( T* v ) const {

        T l = sqrt(ceres::DotProduct(v, v));

        T s = T(0.0);

        if( l > T(1.e-8)) s = T(1.0) / l;

        v[0] *= s;

        v[1] *= s;

        v[2] *= s;

        return l;

    template <typename T>

    bool operator()( const T * const rrrxyz,    // [6] global

                     const T * ppt,             // [3] per point

                     T * residuals ) const      // [2] per point

        T ds, dc, p1[3], p2[3];

        ceres::AngleAxisRotatePoint( rrrxyz, ppt, p2 );

        for(int i = 0; i < 3; i++){

            p1[i] = ppt[i];

            p2[i] = p2[i] - rrrxyz[3 + i];

        T l1 = nrv( p1 );

        T l2 = nrv( p2 );

        residuals[0] = uva( p1, (const T*)l_, &ds, &dc );

        residuals[1] = uva( p2, (const T*)r_, &ds, &dc );

        return true;

    static ceres::CostFunction* Create( const double * lft,

                                        const double * rgt ) {

    return (new ceres::AutoDiffCostFunction<rayCloudError, 2, 6, 3>(

                new rayCloudError(lft, rgt)));

  // pointers to rays

  const double * l_, * r_;

};

William Rucklidge

unread,

Jul 6, 2015, 1:41:00 PM7/6/15

to ceres-...@googlegroups.com

Just glancing at your code: you're using sqrt() which is always problematic when computing derivatives (sqrt'(0) is infinite). See if you're calling it with anything that's zero or very near zero.

-wjr

--
You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ceres-solver/1b5019f4-6216-41dc-86e6-2a24d990ffb6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Thomas Sharpless

unread,

Jul 6, 2015, 3:34:00 PM7/6/15

to ceres-...@googlegroups.com

wjr,
I have verified that none of the vectors passed to this function during initialization is anywhere near zero.
However it looks like the problem is happening when ceres sends in Jets for autodifferentiation. I have the feeling those could be "near zero", but I have no control over them.
Perhaps there is a "Jet-friendly" vector length function in ceres?
-- Tom

Sameer Agarwal

unread,

Jul 6, 2015, 3:44:28 PM7/6/15

to ceres-...@googlegroups.com

Thomas,

The way I diagnose these errors is by a combination of bisection and printing out the jets as they are evaluated.

Yes it is possible that one of the operations being performed on the jets is producing an NaN. If you can provide me with a reproduction, I will see what I can do to fix it.

Sameer

To view this discussion on the web visit https://groups.google.com/d/msgid/ceres-solver/7140be66-a039-49f1-a206-d3ba769594ad%40googlegroups.com.

Thomas Sharpless

unread,

Jul 6, 2015, 9:43:58 PM7/6/15

to ceres-...@googlegroups.com

Thanks, Sameer.

It seems this happens when some of the initial residuals for the 2nd ray (rotated and shifted camera) are near zero. Testing with synthetic data, if I generate the rays from the 3D points then I get NaNs under the 2nd residual (but not the first one). If I perturb the rays even a little (order of 1e-7) the optimizer runs, and terminates on PARAMETER_TOLERANCE with values that look reasonable.

In the live data 8k pairs of rays are given and I initialize the 3D points to points of closest approach using reasonable guesses for the rotation and translation. In that case ceres is reporting all NaNs under both residuals. I have verified that there are no NaNs in the input. I guess I could perturb the initial points (and will try that) but this is a hit-or-miss solution and I'd prefer a solid one.

BTW this is with ceres version 1.8.0. I have version 1.10.0 source but my current MinGW-w64 compiler (gcc 4.8.2) crashes on one of its internal functions. Hoping to be able to build it soon (will use msvc if all else fails).

-- Tom

...

Thomas Sharpless

unread,

Jul 6, 2015, 10:39:58 PM7/6/15

to ceres-...@googlegroups.com

Modifying the initial point values or parameter settings makes no difference. I still get this:

Error in evaluating the ResidualBlock.

There are two possible reasons. Either the CostFunction did not evaluate and fil
l all
residual and jacobians that were requested or there was a non-finite value (nan/
infinite)
generated during the or jacobian computation.

Residual Block size: 2 parameter blocks x 2 residuals

For each parameter block, the value of the parameters are printed in the first c
olumn
and the value of the jacobian under the corresponding residual. If a ParameterBl
ock was
held constant then the corresponding jacobian is printed as 'Not Computed'. If a
n entry
of the Jacobian/residual array was requested but was not written to by user code
, it is
indicated by 'Uninitialized'. This is an error. Residuals or Jacobian values eva
luating
to Inf or NaN is also an error.

Residuals: 1.5708 1.5708

Parameter Block 0, size: 6

0 | nan nan
-0 | nan nan
0 | nan nan
1 | nan nan
0 | nan nan
0 | nan nan

Parameter Block 1, size: 3

0 | nan nan
0 | nan nan
0 | nan nan

D:\solver\ceres-solver-1.8.0\internal\ceres\trust_region_minimizer.cc:133 Termin
ating: Residual and Jacobian evaluation failed.

The modified values show up in the left column of this listing, the other 2 columns are nans

Here is the full text of the program that gave this result. The only nonstandard code is in the initialization of the 3D points. That can safely be ignored as it does not matter what values I give to those points.

/* rayBundler.cpp  copyright T K Sharpless

    part of the Panini stitching libraries

  Fit a 3d model to paired rays on two calibrated

  cameras (whose intrinsics have already been used

  to compute the ray directions).

  The model is

    Camera A defines the coordinate system up

    to scale.

    Scale unit is the distance between cameras.

    Camera B is rotated by RR and translated

    by 1 unit in the direction epi.

    Each ray pair has a fitted 3D point.

  If the rays are correctly matched points from two

  views of the same scene, the differences of polar

  angles with respect to epi (that are not 0) should

  all have the same sign.

  The main function, alignRayClouds, initializes the

  3D points using the passed values of the rotation

  and translation parameters.  It tries to validate

  the matches and may rearrange the data arrays, putting

  rejected pairs at the end.  It returns the number of

  accepted pairs, that belong to the final model.

  The optimizer is basically the ceres simple bundler

  adjuster example with camera intrinsics removed from

  the error evaluation.  The new error measure is the

  angle between the fitted and given rays.

*/

#include "rayBundler.h"

#include "ceres/ceres.h"

#include "ceres/rotation.h"

#include "spheregeom.h"

// bundle adjustment

static bool adjust( const double * pa,

                    const double * pb,

                    double * ppt,

                    int npnts,

                    double * rrrxyz

    ceres::Problem problem;

    for( int i = 0; i < npnts; i++ ){

        ceres::CostFunction * pcf =

            rayCloudError::Create( pa+3*i, pb+3*i );

        problem.AddResidualBlock( pcf,

                                  NULL,

                                  rrrxyz,

                                  ppt+3*i  );

    ceres::Solver::Options options;

    options.logging_type = ceres::SILENT;

    options.linear_solver_type = ceres::DENSE_SCHUR;

    ceres::Solver::Summary summary;

    ceres::Solve(options, &problem, &summary );

    std::cout<< summary.FullReport() << "\n";

  // return false if optimizer failed

    return summary.error.size() == 0;

int alignRayClouds( const double * raysA,  // input unit vectors

                    const double * raysB,  //   as A-B pairs

                    double * pnts,         // output 3D points

                    int npnts,

                    double & yaw, double & pitch, double & roll,   // output rotation of B

                    double & epix, double & epiy, double & epiz    // output translation of B

    double pry[3] = { pitch, roll, yaw };

    double RM[9];

    ceres::EulerAnglesToRotationMatrix<double>( pry, 3, RM );

    double rrrxyz[6] = { 0, 0, 0, epix, epiy, epiz };

  // initialize the angle-axis rotation

    ceres::RotationMatrixToAngleAxis( RM, rrrxyz );

#if 0   // makes no difference

  // try perturbing...

    for(int i = 0; i < 6; i++ )

        rrrxyz[i] += 0.0005;

#endif

  /* initialize the 3D points

     apply the given rotation and translation to raysB

     then compute point of nearest approach of rays

*/

    for( int i = 0; i < npnts; i++ ){

        int k = 3 * i;

        const double * ra = raysA + k,

                     * rb = raysB + k;

        double * pt = pnts + k;

        double p2[3];

        ceres::AngleAxisRotatePoint( rrrxyz, rb, p2 );

        gp3d orga(0,0,0),

             orgb(epix,epiy,epiz),

             pnta(ra),

             pntb(p2);

        plck3d pla(orga, pnta),

               plb(orgb, pntb);

        seg3d seg;

        double d = pla.nearPnts(plb, 0, &seg);

        gp3d m = seg.midpoint();

        if(m.isNaN()){

            m = gp3d(-1, -1, -1);

        } else if(m.isZero()){

            m = gp3d(1, 1, -1);

// makes no difference     m.clear();

// makes no difference     m = gp3d(0,0,-1000);

        m.put(pt);

    int nacc = npnts;

    bool ok = true;

    ok = adjust(raysA, raysB, pnts, nacc, rrrxyz);

    if(ok) return nacc;

    return 0;

-- Tom

...

Sameer Agarwal

unread,

Jul 6, 2015, 10:57:27 PM7/6/15

to ceres-...@googlegroups.com

Thomas, can you give the left and right ray values you used to create the cost function and then the parameter values that cause it to create an nan ?

Sameer

Thomas Sharpless

unread,

Jul 7, 2015, 11:04:39 AM7/7/15

to ceres-...@googlegroups.com

Sameer, can you give me a recipe for testing a "T" for nan? I can't find a way of applying std::isnan() that works with a Jet.

Thomas Sharpless

unread,

Jul 7, 2015, 11:10:24 AM7/7/15

to ceres-...@googlegroups.com

None of the ray or point values I'm passing in is zero or contains a NaN.

...

Sameer Agarwal

unread,

Jul 7, 2015, 11:36:20 AM7/7/15

to ceres-...@googlegroups.com

Thomas,

It is not the input but one of the intermediate values that is an NaN.

To check validity of a Jet or doubles, use ceres::IsNaN or ceres::IsInfinite or ceres::IsFinite. They are defined in jet.h and fpclassify.h

Sameer

--

You received this message because you are subscribed to the Google Groups "Ceres Solver" group.

To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/ceres-solver/4223dd0f-5082-4eb3-b157-3d57324133f1%40googlegroups.com.

Thomas Sharpless

unread,

Jul 7, 2015, 3:28:40 PM7/7/15

to ceres-...@googlegroups.com

thanks, Sameer.

I built a debug version of ceres so can now break at the error report function which is a member of ResidualBlock.

In the failing block, member index = 8680 of 8682 residual blocks. So I tried telling adjust there are only 8679 points, and sure enough, the optimizer started up OK. It terminated on parameter tolerance after 16 steps, only 2 of them "successful". I have not checked the result yet.

Adding checks with ceres::isNaN() and IsInfinite reveals that the nans are indeed generated in the sqrt() in function uva(). However real the fault seems to be a near-overflow in *ca = dotproduct(r1, r2) which has several magnitude e+302 numbers in it. There is nothing obviously wrong with the ray pair from which the Jets r1 and r2 were generated, but somehow their dot product is pathological.

uva() is just computing the angle between two vectors, so if there is a more Jet-friendly way to do this, I'd like to hear about it. Similarly with normalizing a vector, for which I am also using sqrt(dotprduct()).

In the wider view, I am also wondering why, when I skip the bad point, ceres is only able to reduce the residual error by less than 10% before quitting.

-- Tom

Sameer Agarwal

unread,

Jul 7, 2015, 3:39:24 PM7/7/15

to ceres-...@googlegroups.com

Thomas,

If you give me the ray pairs and the input data to that particular residual that is causing problems I can look at it more carefully to see what is going on.

I am not sure what is going on with your algorithm convergence, but lets first get the derivative computation correct.

Sameer

--

You received this message because you are subscribed to the Google Groups "Ceres Solver" group.

To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/ceres-solver/86f6d65d-51c1-4f64-b8a2-2bc911b19a66%40googlegroups.com.

Thomas Sharpless

unread,

Jul 7, 2015, 5:02:03 PM7/7/15

to ceres-...@googlegroups.com

Sameer,

It is not the data values. The optimizer also runs when I skip the first 3 points, or all but the last 3 points, or indeed drop 2 points at random. So it is some kind of memory allocation / indexing bug. So long as I post at least 8681 points, evaluation fails at point index 8680.

This is not the sort of thing programmers want to hear. So let's chalk it up to the weird build environment I'm using, and turn to the important question: why is my model getting fitted so badly, and how to fix that.

-- Tom

...

Sameer Agarwal

unread,

Jul 7, 2015, 5:03:07 PM7/7/15

to ceres-...@googlegroups.com

I am concerned that your derivatives have other problems which is causing the optimizer to converge prematurely.

--

You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/ceres-solver/80bce433-befe-4978-bc01-27fe3c86da4e%40googlegroups.com.

Thomas Sharpless

unread,

Jul 7, 2015, 5:18:01 PM7/7/15

to ceres-...@googlegroups.com

Me, too. I suggest I build the current ceres, and this program, with MSVC, and see if it does not behave better. This particular app does not need to be portable as it depends on a front end that is currently Windows-only.

--Tom

...

Sameer Agarwal

unread,

Jul 7, 2015, 5:18:52 PM7/7/15

to ceres-...@googlegroups.com

Sounds good.

--

You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/ceres-solver/48eaa08f-1a03-42f2-a981-7a339a0914b8%40googlegroups.com.

Reply all

Reply to author

Forward