NaNs in cost function autodiff

1,678 views
Skip to first unread message

Thomas Sharpless

unread,
Jul 6, 2015, 1:34:18 PM7/6/15
to ceres-...@googlegroups.com
Hi

I'm new to ceres and trying get my first optimization to work.  The job is a 2-camera  bundle adjustment, with matched pairs of rays on camera centers instead of internal matched image points (i.e. I'm using calibrated cameras).  My cost function is supposed to compute the angles between the observed rays and the directions to the fitted 3D points.  It looks reasonable to me,and compiles ok (with gcc 4.8.2) but is generating NaNs during Jacobian evaluation. 

Can any of you experts see what is wrong with it?


/* per-point error estimate (autodifferentiated)
    ctor takes addrs of two constant rays
    ()  takes global rotation and translation params
        and the fitted point for this ray pair
    residual gets angles between fitted and given rays
*/
struct rayCloudError {
    rayCloudError( const double * pl, const double * pr )
        : l_(pl), r_(pr) {}
  // angle between 2 unit vectors, and its sine and cosine
    template <typename T>
    T uva( const T * r1,
           const T * r2,
           T * sa, T * ca )
    const {
        *ca = ceres::DotProduct(r1, r2);
        T t[3] = {
            r2[0] - r1[0] * *ca,
            r2[1] - r1[1] * *ca,
            r2[2] - r1[2] * *ca
        };
        *sa = sqrt(ceres::DotProduct(t, t));
        return atan2(*sa, *ca);
    }

  // normalize a vector, return its length
    template <typename T>
    T nrv( T* v ) const {
        T l = sqrt(ceres::DotProduct(v, v));
        T s = T(0.0);
        if( l > T(1.e-8)) s = T(1.0) / l;
        v[0] *= s;
        v[1] *= s;
        v[2] *= s;
        return l;
    }

    template <typename T>
    bool operator()( const T * const rrrxyz,    // [6] global
                     const T * ppt,             // [3] per point
                     T * residuals ) const      // [2] per point
    {
        T ds, dc, p1[3], p2[3];
        ceres::AngleAxisRotatePoint( rrrxyz, ppt, p2 );
        for(int i = 0; i < 3; i++){
            p1[i] = ppt[i];
            p2[i] = p2[i] - rrrxyz[3 + i];
        }
        T l1 = nrv( p1 );
        T l2 = nrv( p2 );
        residuals[0] = uva( p1, (const T*)l_, &ds, &dc );
        residuals[1] = uva( p2, (const T*)r_, &ds, &dc );
        return true;
    }

    static ceres::CostFunction* Create( const double * lft,
                                        const double * rgt ) {
    return (new ceres::AutoDiffCostFunction<rayCloudError, 2, 6, 3>(
                new rayCloudError(lft, rgt)));
  }
  // pointers to rays
  const double * l_, * r_;
};


William Rucklidge

unread,
Jul 6, 2015, 1:41:00 PM7/6/15
to ceres-...@googlegroups.com
Just glancing at your code: you're using sqrt() which is always problematic when computing derivatives (sqrt'(0) is infinite). See if you're calling it with anything that's zero or very near zero.

-wjr


--
You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ceres-solver/1b5019f4-6216-41dc-86e6-2a24d990ffb6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Thomas Sharpless

unread,
Jul 6, 2015, 3:34:00 PM7/6/15
to ceres-...@googlegroups.com
wjr,
I have verified that none of the vectors passed to this function during initialization is anywhere near zero.
However it looks like the problem is happening when ceres sends in Jets for autodifferentiation.  I have the feeling those could be "near zero", but I have no control over them.
Perhaps there is a "Jet-friendly" vector length function in ceres?
-- Tom

Sameer Agarwal

unread,
Jul 6, 2015, 3:44:28 PM7/6/15
to ceres-...@googlegroups.com
Thomas,

The way I diagnose these errors is by a combination of bisection and printing out the jets as they are evaluated. 

Yes it is possible that one of the operations being performed on the jets is producing an NaN. If you can provide me with a reproduction, I will see what I can do to fix it.

Sameer


Thomas Sharpless

unread,
Jul 6, 2015, 9:43:58 PM7/6/15
to ceres-...@googlegroups.com
Thanks, Sameer.

It seems this happens when some of the initial residuals for the 2nd ray (rotated and shifted camera) are near zero.  Testing with synthetic data, if I generate the rays from the 3D points then I get NaNs under the 2nd residual (but not the first one).  If I perturb the rays even a little (order of 1e-7) the optimizer runs, and terminates on PARAMETER_TOLERANCE with values that look reasonable.

In the live data 8k pairs of rays are given and I initialize the 3D points to points of closest approach using reasonable guesses for the rotation and translation.  In that case ceres is reporting all NaNs under both residuals.  I have verified that there are no NaNs in the input.   I guess I could perturb the initial points (and will try that) but this is a hit-or-miss solution and I'd prefer a solid one.

BTW this is with ceres version 1.8.0.  I have version 1.10.0 source but my current MinGW-w64 compiler (gcc 4.8.2) crashes on one of its internal functions.  Hoping to be able to build it soon (will use msvc if all else fails).

-- Tom
...

Thomas Sharpless

unread,
Jul 6, 2015, 10:39:58 PM7/6/15
to ceres-...@googlegroups.com
Modifying the initial point values or parameter settings makes no difference.  I still get this:
Error in evaluating the ResidualBlock.

There are two possible reasons. Either the CostFunction did not evaluate and fil
l all
residual and jacobians that were requested or there was a non-finite value (nan/
infinite)
generated during the or jacobian computation.

Residual Block size: 2 parameter blocks x 2 residuals

For each parameter block, the value of the parameters are printed in the first c
olumn
and the value of the jacobian under the corresponding residual. If a ParameterBl
ock was
held constant then the corresponding jacobian is printed as 'Not Computed'. If a
n entry
of the Jacobian/residual array was requested but was not written to by user code
, it is
indicated by 'Uninitialized'. This is an error. Residuals or Jacobian values eva
luating
to Inf or NaN is also an error.

Residuals:           1.5708       1.5708

Parameter Block 0, size: 6

           0 |          nan          nan
          -0 |          nan          nan
           0 |          nan          nan
           1 |          nan          nan
           0 |          nan          nan
           0 |          nan          nan

Parameter Block 1, size: 3

           0 |          nan          nan
           0 |          nan          nan
           0 |          nan          nan



D:\solver\ceres-solver-1.8.0\internal\ceres\trust_region_minimizer.cc:133 Termin
ating: Residual and Jacobian evaluation failed.

The modified values show up in the left column of this listing, the other 2 columns are nans 

Here is the full text of the program that gave this result.  The only nonstandard code is in the initialization of the 3D points.  That can safely be ignored as it does not matter what values I give to those points.

/* rayBundler.cpp  copyright T K Sharpless
    part of the Panini stitching libraries

  Fit a 3d model to paired rays on two calibrated
  cameras (whose intrinsics have already been used
  to compute the ray directions).

  The model is
    Camera A defines the coordinate system up
    to scale.
    Scale unit is the distance between cameras.
    Camera B is rotated by RR and translated
    by 1 unit in the direction epi.
    Each ray pair has a fitted 3D point.

  If the rays are correctly matched points from two
  views of the same scene, the differences of polar
  angles with respect to epi (that are not 0) should
  all have the same sign.

  The main function, alignRayClouds, initializes the
  3D points using the passed values of the rotation
  and translation parameters.  It tries to validate
  the matches and may rearrange the data arrays, putting
  rejected pairs at the end.  It returns the number of
  accepted pairs, that belong to the final model.

  The optimizer is basically the ceres simple bundler
  adjuster example with camera intrinsics removed from
  the error evaluation.  The new error measure is the
  angle between the fitted and given rays.

*/
#include "rayBundler.h"
#include "ceres/ceres.h"
#include "ceres/rotation.h"
#include "spheregeom.h"
// bundle adjustment
static bool adjust( const double * pa,
                    const double * pb,
                    double * ppt,
                    int npnts,
                    double * rrrxyz
                  )
{

    ceres::Problem problem;
    for( int i = 0; i < npnts; i++ ){
        ceres::CostFunction * pcf =
            rayCloudError::Create( pa+3*i, pb+3*i );
        problem.AddResidualBlock( pcf,
                                  NULL,
                                  rrrxyz,
                                  ppt+3*i  );
    }
    ceres::Solver::Options options;
    options.logging_type = ceres::SILENT;
    options.linear_solver_type = ceres::DENSE_SCHUR;

    ceres::Solver::Summary summary;
    ceres::Solve(options, &problem, &summary );
    std::cout<< summary.FullReport() << "\n";
  // return false if optimizer failed
    return summary.error.size() == 0;
}

int alignRayClouds( const double * raysA,  // input unit vectors
                    const double * raysB,  //   as A-B pairs
                    double * pnts,         // output 3D points
                    int npnts,
                    double & yaw, double & pitch, double & roll,   // output rotation of B
                    double & epix, double & epiy, double & epiz    // output translation of B
                  )
{
    double pry[3] = { pitch, roll, yaw };
    double RM[9];
    ceres::EulerAnglesToRotationMatrix<double>( pry, 3, RM );
    double rrrxyz[6] = { 0, 0, 0, epix, epiy, epiz };
  // initialize the angle-axis rotation
    ceres::RotationMatrixToAngleAxis( RM, rrrxyz );
#if 0   // makes no difference
  // try perturbing...
    for(int i = 0; i < 6; i++ )
        rrrxyz[i] += 0.0005;
#endif

  /* initialize the 3D points
     apply the given rotation and translation to raysB
     then compute point of nearest approach of rays
  */
    for( int i = 0; i < npnts; i++ ){
        int k = 3 * i;
        const double * ra = raysA + k,
                     * rb = raysB + k;
        double * pt = pnts + k;
        double p2[3];
        ceres::AngleAxisRotatePoint( rrrxyz, rb, p2 );
        gp3d orga(0,0,0),
             orgb(epix,epiy,epiz),
             pnta(ra),
             pntb(p2);
        plck3d pla(orga, pnta),
               plb(orgb, pntb);
        seg3d seg;
        double d = pla.nearPnts(plb, 0, &seg);
        gp3d m = seg.midpoint();
        if(m.isNaN()){
            m = gp3d(-1, -1, -1);
        } else if(m.isZero()){
            m = gp3d(1, 1, -1);
        }
// makes no difference     m.clear();
// makes no difference     m = gp3d(0,0,-1000);
        m.put(pt);
    }

    int nacc = npnts;
    bool ok = true;
    ok = adjust(raysA, raysB, pnts, nacc, rrrxyz);
    if(ok) return nacc;
    return 0;
}

-- Tom 
...

Sameer Agarwal

unread,
Jul 6, 2015, 10:57:27 PM7/6/15
to ceres-...@googlegroups.com
Thomas, can you give the left and right ray values you used to create the cost function and then the parameter values that cause it to create an nan ?

Sameer

Thomas Sharpless

unread,
Jul 7, 2015, 11:04:39 AM7/7/15
to ceres-...@googlegroups.com
Sameer, can you give me a recipe for testing a "T" for nan? I can't find a way of applying std::isnan() that works with a Jet.

Thomas Sharpless

unread,
Jul 7, 2015, 11:10:24 AM7/7/15
to ceres-...@googlegroups.com
None of the ray or point values I'm passing in is zero or contains a NaN.
...

Sameer Agarwal

unread,
Jul 7, 2015, 11:36:20 AM7/7/15
to ceres-...@googlegroups.com
Thomas,
It is not the input but one of the intermediate values that is an NaN.
To check validity of a Jet or doubles, use ceres::IsNaN  or ceres::IsInfinite or ceres::IsFinite. They are defined in jet.h and fpclassify.h
Sameer


--
You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.

Thomas Sharpless

unread,
Jul 7, 2015, 3:28:40 PM7/7/15
to ceres-...@googlegroups.com
thanks, Sameer.

I built a debug version of ceres so can now break at the error report function which is a member of ResidualBlock.
In the failing block, member index = 8680 of 8682 residual blocks. So I tried telling adjust there are only 8679 points, and sure enough, the optimizer started up OK.  It terminated on parameter tolerance after 16 steps, only 2 of them "successful".  I have not checked the result yet.

Adding checks with ceres::isNaN() and IsInfinite reveals that the nans are indeed generated in the sqrt() in function uva().  However real the fault seems to be a near-overflow in  *ca = dotproduct(r1, r2) which has several magnitude e+302 numbers in it. There is nothing obviously wrong with the ray pair from which the Jets r1 and r2 were generated, but somehow their dot product is pathological.

uva() is just computing the angle between two vectors, so if there is a more Jet-friendly way to do this, I'd like to hear about it.  Similarly with normalizing a vector, for which I am also using sqrt(dotprduct()).

In the wider view, I am also wondering why, when I skip the bad point, ceres is only able to reduce the residual error by less than 10% before quitting.  

-- Tom

Sameer Agarwal

unread,
Jul 7, 2015, 3:39:24 PM7/7/15
to ceres-...@googlegroups.com
Thomas,

If you give me the ray pairs and the input data to that particular residual that is causing problems I can look at it more carefully to see what is going on.

I am not sure what is going on with your algorithm convergence, but lets first get the derivative computation correct.

Sameer



--
You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.

Thomas Sharpless

unread,
Jul 7, 2015, 5:02:03 PM7/7/15
to ceres-...@googlegroups.com
Sameer,
It is not the data values.  The optimizer also runs when I skip the first 3 points, or all but the last 3 points, or indeed drop 2 points at random.  So it is some kind of memory allocation / indexing bug.  So long as I post at least 8681 points, evaluation fails at point index 8680.

This is not the sort of thing programmers want to hear.  So let's chalk it up to the weird build environment I'm using, and turn to the important question: why is my model getting fitted so badly, and how to fix that.

-- Tom
...

Sameer Agarwal

unread,
Jul 7, 2015, 5:03:07 PM7/7/15
to ceres-...@googlegroups.com
I am concerned that your derivatives have other problems which is causing the optimizer to converge prematurely.


--
You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.

Thomas Sharpless

unread,
Jul 7, 2015, 5:18:01 PM7/7/15
to ceres-...@googlegroups.com
Me, too.  I suggest I build the current ceres, and this program, with MSVC, and see if it does not behave better.  This particular app does not need to be portable as it depends on a front end that is currently Windows-only.
--Tom
...

Sameer Agarwal

unread,
Jul 7, 2015, 5:18:52 PM7/7/15
to ceres-...@googlegroups.com
Sounds good.

--
You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages