Is bundle adjustment really 3x faster in Julia than using ceres-solver?


Oliver Woodford

Aug 23, 2023, 8:10:38 PM
to Ceres Solver
Hi all

I've been writing an NLLS solver, inspired by ceres-solver, in Julia. Today I ran a like-for-like comparison for the first time, and found my library to be 3x faster than ceres. TBH I'm very surprised, and slightly sceptical, about this result, though I believe I built ceres with -O3 compiler optimization. I'm writing a blog post about it, but first I wanted to see if anyone else can run the two codes and confirm the results. I don't want to have made any mistakes building ceres-solver.

By like-for-like performance, I mean identical cost functions, the same type of linear solver (sparse LDL, using SuiteSparse; I haven’t implemented any Schur complement functionality yet), same number of iterations, and the same number of computation threads: one.

The Julia result I get is:
NLLSsolver optimization took 7.876930 seconds and 36 iterations to reduce the cost from 973669.449272 to 14382.738395 (a 98.52% reduction), using:
   38 cost computations in 0.065379 seconds (0.83% of total time),
   36 gradient computations in 1.550764 seconds (19.69% of total time),
   38 linear solver computations in 5.094126 seconds (64.67% of total time),
   0.041203 seconds for initialization (0.52% of total time), and
   1.125458 seconds for other stuff (14.29% of total time).
Reason(s) for termination:
   Relative decrease in cost below threshold.

The ceres-solver result I get is:
Initial                          9.736694e+05
Final                            1.438271e+04
Change                           9.592867e+05

Minimizer iterations                       39
Successful steps                           32
Unsuccessful steps                          7

Time (in seconds):
Preprocessor                         0.114813

  Residual only evaluation           1.178426 (31)
  Jacobian & residual evaluation     5.431002 (32)
  Linear solver                      5.845264 (38)
Minimizer                           23.274499

Postprocessor                        0.003208
Total                               23.392520

These were both run on the same machine, with an Apple M1 Pro CPU. I used the dubrovnik-problem-16-22106 BAL dataset.

You can reproduce the Julia results by installing and running Julia version 1.9.2, then installing the NLLSsolver package within Julia (by typing ]add NLLSsolver at the Julia prompt), and simply running the code in the examples/BundleAdjustment.jl file. This will automatically download the data for you.

For ceres, I used this clone of ceres-solver, forked today, with some minor changes to the bundle_adjuster application parameters to ensure the like-for-like optimization. Unzip the dubrovnik-problem-16-22106-pre.txt.bz2 file that NLLSsolver downloaded for you, then build and run the bundle_adjuster application, providing as input the path to the data file.

The results indicate the following:
  • In terms of the objective, starting values were identical, and final values were within 0.0002% of each other, which is encouraging.

  • Overall, my method is 3x faster.

  • The linear solver times, which in my method account for 65% of the total time, are comparable; mine is 1.16x faster, using a pure Julia re-implementation of SuiteSparse, rather than SuiteSparse itself.

  • My cost function computation is 22x faster.

  • My Jacobian computation is 4x faster.

  • All the other stuff is 10x faster in my implementation.

Can anyone confirm that these figures are correct (approximately)? If you need any help doing so, please let me know.

Many thanks.

Sameer Agarwal

Aug 23, 2023, 8:27:53 PM
Hi Oliver,
I have an M1 Pro. Here is the set of commands I ran. For me the solver runs in 3 seconds end to end. See below.

cd /tmp
git clone
cd ceres-solver/
git checkout dev
mkdir cmake-build
cd cmake-build
cmake ../ -G Ninja
ninja bundle_adjuster
./bin/bundle_adjuster --input=${HOME}/Downloads/problem-16-22106-pre.txt

iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
   0  9.736694e+05    0.00e+00    1.88e+06   0.00e+00   0.00e+00  1.00e+04        0    2.56e-02    4.06e-02
   1  1.074873e+05    8.66e+05    1.18e+06   0.00e+00   1.59e+00  3.00e+04        1    1.27e-01    1.68e-01
   2  5.421333e+04    5.33e+04    4.35e+05   1.16e+03   7.13e-01  3.25e+04        1    8.39e-02    2.52e-01
...
  38  1.438271e+04    2.30e-02    3.78e+00   1.46e-01   2.00e+00  2.18e+15        1    8.20e-02    3.04e+00

Solver Summary (v 2.2.0-eigen-(3.4.0)-lapack-suitesparse-(7.1.0)-metis-(5.1.0)-acceleratesparse-eigensparse)

                                     Original                  Reduced
Parameter blocks                        22122                    22122
Parameters                              66478                    66478
Effective parameters                    66462                    66462
Residual blocks                         83718                    83718
Residuals                              167436                   167436

Minimizer                        TRUST_REGION
Trust region strategy     LEVENBERG_MARQUARDT
Sparse linear algebra library    SUITE_SPARSE

                                        Given                     Used
Threads                                     1                        1
Linear solver ordering               22106,16                 22106,16

Initial                          9.736694e+05
Final                            1.438271e+04
Change                           9.592867e+05

Minimizer iterations                       39
Successful steps                           32
Unsuccessful steps                          7

Time (in seconds):
Preprocessor                         0.014988

  Residual only evaluation           0.084480 (31)
  Jacobian & residual evaluation     0.647794 (32)
  Linear solver                      2.145158 (38)
Minimizer                            3.030012

Postprocessor                        0.000629
Total                                3.045629

Termination:                   NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 38.)



Oliver Woodford

Aug 24, 2023, 8:21:23 AM
to Ceres Solver
Many thanks, Sameer. That's very helpful, and more like the numbers I was expecting. I'll try out your build commands (I used Xcode before) and see if I can replicate that performance once I'm back at my machine.

Oliver Woodford

Aug 30, 2023, 11:07:14 AM
to Ceres Solver
TL;DR: No, it isn't.

I have replicated timings very similar to yours, Sameer, on my laptop, using the build commands you shared. Thanks again for those. So, in the end, Julia/my implementation is about 2.5x slower, with most of that slow-down coming from the linear solver (which isn't code I wrote).

However, I note that my cost evaluation is actually about 1.58x faster. I suspect some of this is down to the fact that I'm storing rotations as matrices, not quaternions, which speeds up rotations of points.


Sameer Agarwal

Aug 30, 2023, 11:12:44 AM
Hi Oliver,

Yes, if you are not doing trigonometric function evaluations, that can certainly affect the Jacobian evaluation time.
It is also possible that you have better function inlining and autodiff performance in general in Julia.
We are depending on a combination of Eigen code and the C++ compiler doing the right thing with common subexpression elimination etc.
We know from experience that, when performance is critical, hand-crafted analytic derivatives are generally speaking faster than autodiff.


Oliver Woodford

Aug 30, 2023, 3:14:26 PM
to Ceres Solver
Hi Sameer

Note that I was just referring to cost (i.e. residuals) evaluation, not cost and Jacobian. My Jacobian calculation time is slower, though this time includes more work, as it actually updates the normal equations too. I personally doubt Julia has better autodiff performance, as the code is JIT compiled: my program is compiled in considerably less time than ceres, so there's less opportunity for such compiler optimizations. And Julia uses LLVM, just like clang, for compilation. I'm also using forward-mode autodiff with dual numbers, the same as ceres.

Also note that multiplying a point by a quaternion doesn't involve any trigonometric functions. It does involve more multiplications and additions than with a matrix, though. When normalizing the quaternions (which the code currently does), there's also a 1/sqrt. Even just assuming unit quaternions (which they are when using the manifold) brings ceres to within 1.28x of my code for the cost evaluation. This gain applies to the Jacobian computation too, of course.
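To make the arithmetic being compared here concrete, a small self-contained Python sketch (purely illustrative; not ceres or NLLSsolver code) of the two point-rotation paths. The unit-quaternion route uses no trig, but roughly twice the multiply/add count of the 3x3 matrix product:

```python
import math

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def quat_rotate(q, p):
    # p' = p + 2w (v x p) + 2 v x (v x p), for unit q = (w, v):
    # only multiplies and adds, no trig, but ~18 mults and ~12 adds.
    w, v = q[0], q[1:]
    t = cross(v, p)
    t = (2.0*t[0], 2.0*t[1], 2.0*t[2])
    u = cross(v, t)
    return tuple(p[i] + w*t[i] + u[i] for i in range(3))

def quat_to_matrix(q):
    # Standard unit-quaternion-to-rotation-matrix conversion.
    w, x, y, z = q
    return ((1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)),
            (2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)),
            (2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)))

def mat_rotate(R, p):
    # 9 multiplies and 6 adds: roughly half the work of quat_rotate.
    return tuple(sum(R[i][j]*p[j] for j in range(3)) for i in range(3))

# Rotate a point 30 degrees about the z-axis, both ways.
angle = math.radians(30.0)
q = (math.cos(angle/2), 0.0, 0.0, math.sin(angle/2))
p = (1.0, 2.0, 3.0)
pq = quat_rotate(q, p)
pm = mat_rotate(quat_to_matrix(q), p)
assert all(abs(a - b) < 1e-12 for a, b in zip(pq, pm))
```

Converting each rotation to a matrix once and reusing it pays off whenever the same rotation is applied to many points, which is exactly the bundle adjustment access pattern.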

Definitely nothing will beat (well) hand-written residual and Jacobian computation. However, the goal of my library (like ceres) is to provide flexibility and ease of use, with decent performance. Julia's multiple dispatch functionality should allow it to be even more flexible than ceres, without impacting performance too much. But it looks like it will always be a bit slower.


Sameer Agarwal

Aug 30, 2023, 3:38:01 PM
Oh, I thought when you said cost, you meant cost and Jacobian.
Cost evaluations are usually a small enough part of the overall solve time that I do not worry about them too much. They do become a thing if there are bounds constraints, as the line search to enforce them can get expensive.


Oliver Woodford

Aug 30, 2023, 3:50:30 PM
to Ceres Solver
Fair enough. But any speed-ups to cost evaluation tend to also apply to the Jacobian evaluation, since you're just multiplying the work by the length of the autodiff dual numbers. Ignore my Jacobian times: they encapsulate more work than ceres's timings. Basically, you can speed up bundle adjustment Jacobian computation a fair bit with some minor tweaks, even without hand-written Jacobians.
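That scaling argument can be sketched with a toy forward-mode dual number in Python (a simplified stand-in for ceres's Jet type; the residual below is made up): every arithmetic operation repeats its scalar work once per partial derivative, so any saving in the residual arithmetic multiplies straight through into the Jacobian.

```python
class Dual:
    """Toy forward-mode dual number: a value plus a tuple of partials."""

    def __init__(self, value, partials):
        self.value = value        # function value
        self.partials = partials  # partial derivatives w.r.t. each input

    def __add__(self, other):
        # One scalar add for the value, plus one per partial.
        return Dual(self.value + other.value,
                    tuple(sp + op for sp, op in
                          zip(self.partials, other.partials)))

    def __mul__(self, other):
        # One scalar multiply for the value; the product rule costs
        # two multiplies and an add per partial.
        return Dual(self.value * other.value,
                    tuple(self.value * op + other.value * sp for sp, op in
                          zip(self.partials, other.partials)))

def residual(a, b):
    # Made-up toy residual: r(a, b) = a*b + a.
    return a * b + a

# Seed each variable with a unit partial to get dr/da and dr/db in one pass.
a = Dual(3.0, (1.0, 0.0))
b = Dual(2.0, (0.0, 1.0))
r = residual(a, b)
print(r.value, r.partials)  # r = 9.0, dr/da = b + 1 = 3.0, dr/db = a = 3.0
```

With n partials per dual, evaluating the Jacobian is roughly n+1 times the cost of evaluating the residual alone, which is why cheaper residual arithmetic speeds both up.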