Slow Jacobian evaluation


jens.d...@gmail.com

May 14, 2017, 9:50:26 AM
to Ceres Solver
Hi,

I'm using Ceres Solver for bundle adjustment. All my bundle adjustment code is written in C++/CLI and is mostly based on the bundle_adjuster.cc example (using autodiff).
I'm seeing very long Jacobian evaluation times while bundling. I don't know whether it's the size of my problem, my implementation or my build that's at fault.

platform: windows 10
compiler: msvc (visual studio 2017)
I have built Ceres as a shared library in Release mode with SuiteSparse (built using https://github.com/jlblancoc/suitesparse-metis-for-windows) and MKL for LAPACK/BLAS.

When building the Ceres binaries I had to disable optimization (/Od), because enabling it caused an internal compiler error (could this have a big impact on performance?). Inline function expansion was still left at 'Any Suitable' (/Ob2). Is there anything else to keep in mind when building Ceres/SuiteSparse on Windows?

Thanks


Below is the output of the bundle adjustment of 303 camera positions, 1 sensor and 228707 tie points (on an Intel i7 4712):

iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
   0  6.692679e+06    0.00e+00    7.80e+09   0.00e+00   0.00e+00  1.00e+04        0    8.73e+01    9.74e+01    (0) reprojerror: 1,66995254173999
   1  2.493742e+06    4.20e+06    1.35e+09   8.13e+01   8.67e-01  1.65e+04        1    1.01e+02    1.99e+02     (1) reprojerror: 1,60412814795569 
   2  1.961772e+06    5.32e+05    4.61e+08   6.57e+01   8.23e-01  2.27e+04        1    9.30e+01    2.93e+02     (2) reprojerror: 1,60222112893366
   3  1.867735e+06    9.40e+04    1.42e+08   6.51e+01   8.16e-01  3.03e+04        1    9.29e+01    3.86e+02     (3) reprojerror: 1,60242512460827
   4  2.370079e+06   -5.02e+05    0.00e+00   6.81e+01  -2.39e+01  1.52e+04        1    5.67e+00    3.92e+02    (4) reprojerror: 1,60242512460827 
   5  1.879732e+06   -1.20e+04    0.00e+00   3.91e+01  -5.76e-01  3.79e+03        1    3.92e+00    3.96e+02     (5) reprojerror: 1,60242512460827 
   6  1.853042e+06    1.47e+04    9.06e+07   1.32e+01   7.11e-01  4.10e+03        1    9.16e+01    4.88e+02     (6) reprojerror: 1,60205086952319 
   7  1.855796e+06   -2.75e+03    0.00e+00   1.25e+01  -4.35e-01  2.05e+03        1    5.71e+00    4.94e+02     (7) reprojerror: 1,60205086952319 
   8  1.855261e+06   -2.22e+03    0.00e+00   7.22e+00  -3.54e-01  5.12e+02        1    3.98e+00    4.98e+02     (8) reprojerror: 1,60205086952319 
   9  1.853674e+06   -6.32e+02    0.00e+00   2.50e+00  -1.04e-01  6.40e+01        1    4.03e+00    5.03e+02     (9) reprojerror: 1,60205086952319 
  10  1.851763e+06    1.28e+03    4.18e+07   7.97e-01   2.52e-01  5.71e+01        1    9.50e+01    5.98e+02    (10) reprojerror: 1,60217444368666 
  11  1.849476e+06    2.29e+03    2.18e+07   4.64e-01   7.20e-01  6.24e+01        1    8.87e+01    6.87e+02    (11) reprojerror: 1,60211402361321  
  12  1.851250e+06   -1.77e+03    0.00e+00   6.19e-01  -1.90e+00  3.12e+01        1    5.50e+00    6.93e+02   (12) reprojerror: 1,60211402361321 
  13  1.850789e+06   -1.31e+03    0.00e+00   5.26e-01  -1.59e+00  7.79e+00        1    3.74e+00    6.97e+02   (13) reprojerror: 1,60211402361321
  14  1.849755e+06   -2.79e+02    0.00e+00   3.55e-01  -4.71e-01  9.74e-01        1    3.79e+00    7.01e+02     (14) reprojerror: 1,60211402361321  
D:\libraries\ceres-solver-master\internal\ceres\trust_region_minimizer.cc:703 Terminating: Function tolerance reached. |cost_change|/cost: 8.487573e-05 <= 1.000000e-04
Final report:

Solver Summary (v 1.13.0-eigen-(3.3.3)-lapack-suitesparse-(4.5.5)-cxsparse-(3.1.9)-openmp)

                                     Original                  Reduced
Parameter blocks                       229011                   229011
Parameters                             687948                   687948
Effective parameters                   687946                   687946
Residual blocks                        885894                   885894
Residual                              1771788                  1771788

Minimizer                        TRUST_REGION

Sparse linear algebra library    SUITE_SPARSE
Trust region strategy     LEVENBERG_MARQUARDT

                                        Given                     Used
Linear solver                    SPARSE_SCHUR             SPARSE_SCHUR
Threads                                     8                        8
Linear solver threads                       8                        8
Linear solver ordering              AUTOMATIC               228707,304
Schur structure                         2,3,d                    2,3,d

Cost:
Initial                          6.692679e+06
Final                            1.849476e+06
Change                           4.843204e+06

Minimizer iterations                       15
Successful steps                            7
Unsuccessful steps                          8

Time (in seconds):
Preprocessor                          10.0891

  Residual evaluation                  2.8553
  Jacobian evaluation                607.6319
  Linear solver                       52.6175
Minimizer                            695.4765

Postprocessor                          0.2051
Total                                705.7707

Termination:                      CONVERGENCE (Function tolerance reached. |cost_change|/cost: 8.487573e-05 <= 1.000000e-04)
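To put the Jacobian figure from this summary in perspective, here is a small C++ sketch of the arithmetic. It assumes one Jacobian evaluation per row of the log with a nonzero |gradient| (iterations 0, 1, 2, 3, 6, 10 and 11, i.e. the 7 successful steps); RunTimes, Share, MicrosecondsPerBlock and PrintReport are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdio>

// Timing figures as reported in a Ceres "Solver Summary" (seconds).
struct RunTimes {
  double jacobian_s;   // "Jacobian evaluation"
  double minimizer_s;  // "Minimizer"
  double total_s;      // "Total"
};

// Fraction of a larger time budget spent in one phase.
double Share(double part_s, double whole_s) {
  assert(whole_s > 0.0);
  return part_s / whole_s;
}

// Average time to evaluate one residual block's Jacobian, in microseconds.
// Assumption: jacobian_evals is read off the iteration log, not reported by Ceres.
double MicrosecondsPerBlock(double jacobian_s, int jacobian_evals, double residual_blocks) {
  assert(jacobian_evals > 0 && residual_blocks > 0.0);
  return 1e6 * jacobian_s / (jacobian_evals * residual_blocks);
}

void PrintReport(const RunTimes& t, int jacobian_evals, double residual_blocks) {
  std::printf("Jacobian share of minimizer time: %.1f%%\n", 100.0 * Share(t.jacobian_s, t.minimizer_s));
  std::printf("Jacobian share of total time:     %.1f%%\n", 100.0 * Share(t.jacobian_s, t.total_s));
  std::printf("time per residual block:          %.1f us\n",
              MicrosecondsPerBlock(t.jacobian_s, jacobian_evals, residual_blocks));
}
```

With the numbers above, PrintReport({607.6319, 695.4765, 705.7707}, 7, 885894.0) reports about 87.4% of minimizer time, 86.1% of total time, and roughly 98.0 us per residual block per evaluation, for a functor with only 2 residuals and 18 parameters.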

Sameer Agarwal

May 14, 2017, 11:22:26 AM
to Ceres Solver

Without optimization enabled, automatic differentiation is extremely slow.


--
You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ceres-solver/556854ff-33ce-4b1c-aa01-ca1111e9dbb0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jens.d...@gmail.com

May 14, 2017, 11:57:26 AM
to Ceres Solver
Thank you for the fast reply, Sameer! I'll switch back to VS2015 and hope that fixes the internal compiler error.

jens.d...@gmail.com

May 14, 2017, 4:40:37 PM
to Ceres Solver
I have rebuilt Ceres with VS2015 with optimization enabled. Performance has improved by around 2x, but Jacobian evaluation still accounts for ~90% of the total time. Is there any other way I could have messed up my build?


Solver Summary (v 1.13.0-eigen-(3.3.90)-lapack-suitesparse-(4.5.5)-cxsparse-(3.1.9)-openmp)

                                     Original                  Reduced
Parameter blocks                       229011                   229011
Parameters                             687948                   687948
Effective parameters                   687946                   687946
Residual blocks                        885894                   885894
Residual                              1771788                  1771788

Minimizer                        TRUST_REGION

Sparse linear algebra library    SUITE_SPARSE
Trust region strategy     LEVENBERG_MARQUARDT

                                        Given                     Used
Linear solver                    SPARSE_SCHUR             SPARSE_SCHUR
Threads                                     8                        8
Linear solver threads                       8                        8
Linear solver ordering              AUTOMATIC               228707,304
Schur structure                         2,3,d                    2,3,d

Cost:
Initial                          6.692679e+06
Final                            1.849476e+06
Change                           4.843204e+06

Minimizer iterations                       15
Successful steps                            7
Unsuccessful steps                          8

Time (in seconds):
Preprocessor                           5.8341

  Residual evaluation                  1.6550
  Jacobian evaluation                343.6808
  Linear solver                       13.7191
Minimizer                            366.2774

Postprocessor                          0.0874
Total                                372.1989
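The "around 2x" and "~90%" figures can be checked directly against the two summaries. A minimal sketch (Speedup and CompareBuilds are hypothetical names); note that the two runs also used different Eigen versions (3.3.3 vs 3.3.90), so the speedup is not purely due to the optimization flags.

```cpp
#include <cassert>
#include <cstdio>

// Ratio of old to new wall-clock time.
double Speedup(double before_s, double after_s) {
  assert(after_s > 0.0);
  return before_s / after_s;
}

// First summary (Ceres built with /Od) vs. second summary (VS2015, optimized).
void CompareBuilds() {
  const double jacobian_od = 607.6319, jacobian_opt = 343.6808;
  const double total_od = 705.7707, total_opt = 372.1989;
  std::printf("Jacobian evaluation speedup: %.2fx\n", Speedup(jacobian_od, jacobian_opt));
  std::printf("total time speedup:          %.2fx\n", Speedup(total_od, total_opt));
  std::printf("Jacobian share of new total: %.1f%%\n", 100.0 * jacobian_opt / total_opt);
}
```

CompareBuilds() prints a 1.77x Jacobian speedup, a 1.90x total speedup and a 92.3% Jacobian share, so the optimized build helped overall but left the Jacobian share essentially unchanged.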

Sameer Agarwal

May 14, 2017, 4:50:20 PM
to Ceres Solver

That definitely does not look normal.
Are you sure you are compiling with a high enough optimization level that the templates are being inlined?

Sameer
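One detail relevant to this question: AutoDiffCostFunction and ceres::Jet are header-only templates, so the autodiff code is instantiated in whichever translation unit defines and wraps the cost functor, not inside the prebuilt Ceres library. The flags that matter are those of that file. A minimal sketch (BuildReport is a hypothetical helper) that reports at compile time which compiler built a translation unit and whether the usual markers are set. MSVC does not define __OPTIMIZE__, so on MSVC only NDEBUG is a hint, and only a rough one (CMake's Release configuration normally passes /DNDEBUG alongside /O2).

```cpp
#include <cassert>
#include <string>

// Compile this in the same target, with the same flags, as the cost functors.
std::string BuildReport() {
  std::string report;
#if defined(__INTEL_COMPILER)
  report += "compiler: Intel ICC " + std::to_string(__INTEL_COMPILER) + "\n";
#elif defined(__clang__)
  report += "compiler: Clang\n";
#elif defined(_MSC_VER)
  report += "compiler: MSVC " + std::to_string(_MSC_VER) + "\n";
#elif defined(__GNUC__)
  report += "compiler: GCC " + std::to_string(__GNUC__) + "\n";
#else
  report += "compiler: unknown\n";
#endif

#if defined(__OPTIMIZE__)
  report += "__OPTIMIZE__: defined (GCC/Clang optimized build)\n";
#else
  report += "__OPTIMIZE__: not defined (unoptimized, or a compiler such as MSVC that never sets it)\n";
#endif

#if defined(NDEBUG)
  report += "NDEBUG: defined (asserts off, typical of Release)\n";
#else
  report += "NDEBUG: not defined (asserts on, typical of Debug)\n";
#endif
  return report;
}
```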



jens.d...@gmail.com

May 14, 2017, 6:10:07 PM
to Ceres Solver
I built Ceres several times with different optimization settings (including the highest) and got similar results. I don't need high performance, as my code is currently for research purposes where speed doesn't matter. However, I'm still curious why exactly I can't get it to run any faster.

Jens


Time (in seconds):
Preprocessor                           5.3926

  Residual evaluation                  1.4335
  Jacobian evaluation                326.7548
  Linear solver                       12.6374
Minimizer                            347.8599

Postprocessor                          0.0798
Total                                353.3323


In case it might matter, here is my reprojection struct:

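#include "ceres/rotation.h"  // ceres::AngleAxisRotatePoint

// Assumed layout of the 9-value intrinsics block (the post does not show
// where these OFFSET_* constants are defined).
enum {
  OFFSET_FOCAL_LENGTH_X,
  OFFSET_FOCAL_LENGTH_Y,
  OFFSET_PRINCIPAL_POINT_X,
  OFFSET_PRINCIPAL_POINT_Y,
  OFFSET_K1,
  OFFSET_K2,
  OFFSET_K3,
  OFFSET_P1,
  OFFSET_P2,
};

struct ReprojectionErrorSingleCamera {
  ReprojectionErrorSingleCamera(const double observed_x, const double observed_y)
      : observed_x(observed_x), observed_y(observed_y) {}

  // intrinsics: 9 camera intrinsics; R_t: angle-axis rotation (3) followed by
  // translation (3); X: 3D point; residuals: 2 reprojection residuals in pixels.
  template <typename T>
  bool operator()(const T* const intrinsics, const T* const R_t, const T* const X,
                  T* residuals) const {
    const T& focal_length_x = intrinsics[OFFSET_FOCAL_LENGTH_X];
    const T& focal_length_y = intrinsics[OFFSET_FOCAL_LENGTH_Y];
    const T& principal_point_x = intrinsics[OFFSET_PRINCIPAL_POINT_X];
    const T& principal_point_y = intrinsics[OFFSET_PRINCIPAL_POINT_Y];
    const T& k1 = intrinsics[OFFSET_K1];
    const T& k2 = intrinsics[OFFSET_K2];
    const T& k3 = intrinsics[OFFSET_K3];
    const T& p1 = intrinsics[OFFSET_P1];
    const T& p2 = intrinsics[OFFSET_P2];

    // Rotate and translate the point into the camera frame.
    T x[3];
    ceres::AngleAxisRotatePoint(R_t, X, x);
    x[0] += R_t[3];
    x[1] += R_t[4];
    x[2] += R_t[5];

    // Perspective division onto the normalized image plane.
    const T xn = x[0] / x[2];
    const T yn = x[1] / x[2];

    // Radial (k1, k2, k3) and tangential (p1, p2) distortion, applied to the
    // normalized coordinates xn, yn.
    const T r2 = xn * xn + yn * yn;
    const T r4 = r2 * r2;
    const T r6 = r4 * r2;
    const T r_coeff = T(1) + k1 * r2 + k2 * r4 + k3 * r6;
    const T xd = xn * r_coeff + T(2) * p1 * xn * yn + p2 * (r2 + T(2) * xn * xn);
    const T yd = yn * r_coeff + T(2) * p2 * xn * yn + p1 * (r2 + T(2) * yn * yn);

    // Map to pixel coordinates and compare with the observation.
    const T predicted_x = focal_length_x * xd + principal_point_x;
    const T predicted_y = focal_length_y * yd + principal_point_y;
    residuals[0] = predicted_x - T(observed_x);
    residuals[1] = predicted_y - T(observed_y);
    return true;
  }

  const double observed_x;
  const double observed_y;
};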
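The functor is templated so that the same code runs with T = double and with a type that carries derivatives. A minimal sketch of that idea: Dual is a hypothetical one-partial stand-in for ceres::Jet<double, N> (with AutoDiffCostFunction, N is the sum of the parameter block sizes, 9 + 6 + 3 = 18 here), and Distort repeats only the distortion step of the functor. Every + and * becomes a call to a small operator; with a real Jet each call works on a whole vector of partials, which is why autodiff speed depends so heavily on the compiler inlining those calls.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical forward-mode dual number: a value plus one derivative.
struct Dual {
  double a;  // value
  double v;  // derivative with respect to the seeded input
  Dual(double a_ = 0.0, double v_ = 0.0) : a(a_), v(v_) {}
};

// Sum and product rules; ceres::Jet applies the same rules to N partials.
inline Dual operator+(const Dual& x, const Dual& y) { return Dual(x.a + y.a, x.v + y.v); }
inline Dual operator*(const Dual& x, const Dual& y) {
  return Dual(x.a * y.a, x.a * y.v + x.v * y.a);
}

// Distortion step of ReprojectionErrorSingleCamera on normalized coordinates.
template <typename T>
void Distort(const T& xn, const T& yn, const T& k1, const T& k2, const T& k3,
             const T& p1, const T& p2, T* xd, T* yd) {
  const T r2 = xn * xn + yn * yn;
  const T r4 = r2 * r2;
  const T r6 = r4 * r2;
  const T r_coeff = T(1) + k1 * r2 + k2 * r4 + k3 * r6;
  *xd = xn * r_coeff + T(2) * p1 * xn * yn + p2 * (r2 + T(2) * xn * xn);
  *yd = yn * r_coeff + T(2) * p2 * xn * yn + p1 * (r2 + T(2) * yn * yn);
}

// d(xd)/d(k1): seed k1 with derivative 1 and every other input with 0.
double DxdDk1(double xn, double yn, double k1, double k2, double k3, double p1, double p2) {
  Dual xd, yd;
  Distort(Dual(xn), Dual(yn), Dual(k1, 1.0), Dual(k2), Dual(k3), Dual(p1), Dual(p2), &xd, &yd);
  return xd.v;
}

// Analytically, d(xd)/d(k1) = xn * r2 with r2 = xn^2 + yn^2.
bool MatchesAnalytic(double xn, double yn, double k1, double k2, double k3, double p1, double p2) {
  const double r2 = xn * xn + yn * yn;
  return std::fabs(DxdDk1(xn, yn, k1, k2, k3, p1, p2) - xn * r2) < 1e-12;
}
```

For example, DxdDk1(0.3, -0.2, 0.1, 0.01, 0.001, 0.001, 0.002) returns xn * r2 = 0.3 * 0.13 = 0.039, up to rounding.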

Sameer Agarwal

May 16, 2017, 12:07:09 AM
to Ceres Solver
Jens,
The code looks sane. I do not use Visual Studio, so it's hard for me to tell what is going on with your compiler. But slow Jacobian evaluation with autodiff happens when the compiler is unable to inline the templated functions.
Sameer
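The effect of inlining can be seen in isolation with a small timing sketch: the same dual-number product written once as a force-inlined function and once forced out of line. The NOINLINE/FORCEINLINE macros map to __declspec(noinline)/__forceinline on MSVC and to the GCC/Clang attributes elsewhere; D, MulInline, MulOutOfLine, TimeProducts and SameResult are hypothetical names. Timings depend entirely on compiler, flags and machine, so no numbers are claimed here.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#define FORCEINLINE __forceinline
#else
#define NOINLINE __attribute__((noinline))
#define FORCEINLINE inline __attribute__((always_inline))
#endif

// Value plus one derivative, as in a one-partial Jet.
struct D {
  double a, v;
};

// Identical product rule; only the inlining behavior differs.
FORCEINLINE D MulInline(D x, D y) { return {x.a * y.a, x.a * y.v + x.v * y.a}; }
NOINLINE D MulOutOfLine(D x, D y) { return {x.a * y.a, x.a * y.v + x.v * y.a}; }

// Runs n chained products with the given multiply and returns elapsed seconds.
template <D (*Mul)(D, D)>
double TimeProducts(int n, D* result) {
  assert(n >= 0);
  const auto start = std::chrono::steady_clock::now();
  D acc{1.0, 0.0};
  const D step{1.0000001, 1.0};
  for (int i = 0; i < n; ++i) acc = Mul(acc, step);
  const auto stop = std::chrono::steady_clock::now();
  *result = acc;
  return std::chrono::duration<double>(stop - start).count();
}

// True if two results agree to a tight relative tolerance.
bool SameResult(D x, D y) {
  auto close = [](double p, double q) { return std::fabs(p - q) <= 1e-12 * std::fabs(q); };
  return close(x.a, y.a) && close(x.v, y.v);
}
```

Calling TimeProducts<MulInline> and TimeProducts<MulOutOfLine> with a large n in an optimized build and comparing the returned times shows the cost of one call per arithmetic operation; SameResult confirms that both variants compute the same values.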



jens.d...@gmail.com

May 16, 2017, 4:08:59 AM
to Ceres Solver
Thank you for the assistance, Sameer. I may look into this at a later time. If I find a solution, I'll be sure to post it.

jens.d...@gmail.com

May 18, 2017, 5:05:22 PM
to Ceres Solver
I managed to 'fix' it by simply compiling my code with ICC (the Intel C++ compiler).
Since my app is written in C++/CLI, I couldn't use ICC directly. I had to create a second, native DLL containing just the cost functions and compile it with ICC (I also tried compiling this DLL with MSVC, but the problem persisted).
Evaluation time went from 400+ seconds to just under 8 seconds.

Apparently MSVC is just really bad at inlining here, even with all the __forceinline's in both Ceres and Eigen. I'm sure there is a way to fix it with MSVC in my case, but I haven't managed to find it.

For anyone who has managed to build Ceres Solver with MSVC and get decent minimizer times with autodiff, I'd still be very interested in seeing your compiler settings/command line.

Jens