
Aug 23, 2023, 8:10:38 PM

to Ceres Solver

Hi all

I've been writing an NLLS solver in Julia, inspired by ceres-solver. Today I ran a like-for-like comparison for the first time, and found my library to be 3x faster than ceres. To be honest, I'm very surprised, and slightly sceptical, about this result, but I believe I built ceres with -O3 compiler optimization. I'm writing a blog post about it, but first I wanted to see if anyone else can run the two codes and confirm the results. I don't want to have made any mistakes building ceres-solver.

By like-for-like performance, I mean identical cost functions, the same type of linear solver (sparse LDL, using SuiteSparse; I haven’t implemented any Schur complement functionality yet), same number of iterations, and the same number of computation threads: one.
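
For anyone unfamiliar with the setup being compared: each iteration of such a solver linearizes the residuals and solves the normal equations for a step. Below is a minimal dense sketch of that step in Python, on a hypothetical toy curve-fitting problem (not code from either library; both libraries factorize a sparse system rather than doing a dense solve):

```python
import numpy as np

def residuals(x):
    # Toy nonlinear least-squares problem (hypothetical, for illustration):
    # fit y = exp(a*t) + b to synthetic data generated with a=0.5, b=2.0.
    a, b = x
    t = np.linspace(0.0, 1.0, 20)
    y = np.exp(0.5 * t) + 2.0
    return np.exp(a * t) + b - y

def jacobian(x):
    a, b = x
    t = np.linspace(0.0, 1.0, 20)
    J = np.empty((t.size, 2))
    J[:, 0] = t * np.exp(a * t)   # d r / d a
    J[:, 1] = 1.0                 # d r / d b
    return J

def gauss_newton_step(x):
    # Solve the normal equations J^T J dx = -J^T r.
    # Ceres and NLLSsolver factorize the (sparse) system instead of
    # forming a dense solve, but the linear algebra is the same.
    r = residuals(x)
    J = jacobian(x)
    dx = np.linalg.solve(J.T @ J, -J.T @ r)
    return x + dx

x = np.array([0.0, 0.0])
for _ in range(10):
    x = gauss_newton_step(x)
```

In the benchmark itself this linear solve is done with a sparse LDL factorization via SuiteSparse (or a Julia re-implementation of it), which is where most of the reported time goes.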

The Julia result I get is:

NLLSsolver optimization took 7.876930 seconds and 36 iterations to reduce the cost from 973669.449272 to 14382.738395 (a 98.52% reduction), using:

38 cost computations in 0.065379 seconds (0.83% of total time),

36 gradient computations in 1.550764 seconds (19.69% of total time),

38 linear solver computations in 5.094126 seconds (64.67% of total time),

0.041203 seconds for initialization (0.52% of total time), and

1.125458 seconds for other stuff (14.29% of total time).

Reason(s) for termination:

Relative decrease in cost below threshold.


The ceres-solver result I get is:

Cost:

Initial 9.736694e+05

Final 1.438271e+04

Change 9.592867e+05

Minimizer iterations 39

Successful steps 32

Unsuccessful steps 7

Time (in seconds):

Preprocessor 0.114813

Residual only evaluation 1.178426 (31)

Jacobian & residual evaluation 5.431002 (32)

Linear solver 5.845264 (38)

Minimizer 23.274499

Postprocessor 0.003208

Total 23.392520


These were both run on the same machine, with an Apple M1 Pro CPU. I used the dubrovnik-problem-16-22106 BAL dataset.

You can reproduce the Julia results by installing and running Julia version 1.9.2, then installing the NLLSsolver package within Julia (by typing ]add NLLSsolver at the Julia prompt), and simply running the code in the examples/BundleAdjustment.jl file. This will automatically download the data for you.

For ceres, I used this clone of ceres-solver, forked today, with some minor changes to the bundle_adjuster application parameters to ensure the like-for-like optimization. Unzip the dubrovnik-problem-16-22106-pre.txt.bz2 file that NLLSsolver downloaded for you, then build and run the bundle_adjuster application, providing the path to the data file as input.

The results indicate the following:

In terms of the objective, starting values were identical, and final values were within 0.0002% of each other, which is encouraging.

Overall, my method is 3x faster:

- The linear solver times, which in my method account for 65% of the total time, are comparable; mine is 1.16x faster, using a pure Julia re-implementation of SuiteSparse instead of SuiteSparse itself.

- My cost function computation is 22x faster.

- My Jacobian computation is 4x faster.

- All the other stuff is 10x faster in my implementation.

Can anyone confirm that these figures are correct (approximately)? If you need any help doing so, please let me know.

Many thanks.

Oliver

Aug 23, 2023, 8:27:53 PM

to ceres-...@googlegroups.com

Hi Oliver,

I have an M1 Pro. Here is the set of commands I ran. For me, the solver runs in 3 seconds end to end. See below.

cd /tmp

git clone https://github.com/ojwoodford/ceres-solver/

cd ceres-solver/

git checkout dev

mkdir cmake-build

cd cmake-build

cmake ../ -G Ninja

ninja bundle_adjuster

./bin/bundle_adjuster --input=${HOME}/Downloads/problem-16-22106-pre.txt

iter cost cost_change |gradient| |step| tr_ratio tr_radius ls_iter iter_time total_time

0 9.736694e+05 0.00e+00 1.88e+06 0.00e+00 0.00e+00 1.00e+04 0 2.56e-02 4.06e-02

1 1.074873e+05 8.66e+05 1.18e+06 0.00e+00 1.59e+00 3.00e+04 1 1.27e-01 1.68e-01

2 5.421333e+04 5.33e+04 4.35e+05 1.16e+03 7.13e-01 3.25e+04 1 8.39e-02 2.52e-01

<snip>

38 1.438271e+04 2.30e-02 3.78e+00 1.46e-01 2.00e+00 2.18e+15 1 8.20e-02 3.04e+00

Solver Summary (v 2.2.0-eigen-(3.4.0)-lapack-suitesparse-(7.1.0)-metis-(5.1.0)-acceleratesparse-eigensparse)

Original Reduced

Parameter blocks 22122 22122

Parameters 66478 66478

Effective parameters 66462 66462

Residual blocks 83718 83718

Residuals 167436 167436

Minimizer TRUST_REGION

Trust region strategy LEVENBERG_MARQUARDT

Sparse linear algebra library SUITE_SPARSE

Given Used

Linear solver SPARSE_NORMAL_CHOLESKY SPARSE_NORMAL_CHOLESKY

Threads 1 1

Linear solver ordering 22106,16 22106,16

Cost:

Initial 9.736694e+05

Final 1.438271e+04

Change 9.592867e+05

Minimizer iterations 39

Successful steps 32

Unsuccessful steps 7

Time (in seconds):

Preprocessor 0.014988

Residual only evaluation 0.084480 (31)

Jacobian & residual evaluation 0.647794 (32)

Linear solver 2.145158 (38)

Minimizer 3.030012

Postprocessor 0.000629

Total 3.045629

Termination: NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 38.)

Sameer

--

You received this message because you are subscribed to the Google Groups "Ceres Solver" group.

To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/ceres-solver/aff3d50f-762b-4be0-86a6-18b08f2d8909n%40googlegroups.com.

Aug 24, 2023, 8:21:23 AM

to Ceres Solver

Many thanks, Sameer. That's very helpful, and more like the numbers I was expecting. I'll try out your build commands (I used xcode before), and see if I can replicate that performance, once I'm back at my machine.

Cheers,

Oliver

Aug 30, 2023, 11:07:14 AM

to Ceres Solver

TL;DR: No, it isn't.

However, I note that my cost evaluation is actually about 1.58x faster. I suspect some of this is down to the fact that I'm storing rotations as matrices, not quaternions, which speeds up rotations of points.

Oliver

Aug 30, 2023, 11:12:44 AM

to ceres-...@googlegroups.com

Hi Oliver,

Yes, if you are not evaluating trigonometric functions, that can certainly affect the Jacobian evaluation time.

It is also possible that you have better function inlining, and autodiff performance in general, in Julia.

We are depending on a combination of Eigen code and the C++ compiler doing the right thing with common subexpression elimination etc.

We know from experience that, when performance is critical, hand-crafted analytic derivatives are generally speaking faster than autodiff.

Sameer


Aug 30, 2023, 3:14:26 PM

to Ceres Solver

Hi Sameer

Note that I was just referring to cost (i.e. residuals) evaluation, not cost and Jacobian. My Jacobian calculation time is slower, though this time includes more work, as it actually updates the normal equations too. I personally doubt Julia has better autodiff performance as the code is JIT compiled - my program is compiled in considerably less time than ceres, so there's less opportunity for such compiler optimizations. And Julia uses LLVM, just like clang, for compilation. I'm also using forward mode autodiff with dual numbers, same as ceres.
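
To make the forward-mode autodiff mechanism referred to above concrete, here is a minimal Python sketch of dual numbers (an independent illustration, not the API of either library): each dual carries a value and a derivative, and arithmetic propagates both via the chain rule.

```python
import math

class Dual:
    # Minimal forward-mode autodiff with dual numbers: a sketch of the
    # general technique, not code from Ceres or NLLSsolver.
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__

    def __mul__(self, other):
        # Product rule: (uv)' = u v' + u' v
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)
    __rmul__ = __mul__

def dsin(x):
    # Chain rule for sin: d(sin u) = cos(u) * du
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)

# Differentiate f(x) = x * sin(x) at x = 2.0.
# Seed the derivative with 1 for the variable of interest.
x = Dual(2.0, 1.0)
y = x * dsin(x)   # y.deriv == sin(2) + 2*cos(2)
```

Evaluating residuals on duals whose derivative parts are seeded per parameter is what produces a Jacobian column by column, which is why cost-evaluation speedups tend to carry over to Jacobian evaluation.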

Also note that multiplying a point by a quaternion doesn't involve any trigonometric functions. It does involve more multiplications and additions than with a matrix though. When normalizing the quaternions (which the code currently does), there's also a 1/sqrt. Even just assuming unit quaternions (which they are when using the manifold) gets ceres to within 1.28x slower than my code for the cost evaluation. This gain applies to the Jacobian computation, too, of course.
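
For concreteness, here is a small Python check (independent of either library) that rotating a point by a unit quaternion involves no trigonometric functions, just extra multiply-adds, and agrees with rotating by the equivalent matrix:

```python
import numpy as np

def quat_to_matrix(q):
    # Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix.
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def quat_rotate(q, p):
    # Rotate point p by unit quaternion q (equivalent to q * p * conj(q)):
    # no trig, but more multiply-adds than the 9-multiply R @ p.
    w = q[0]
    v = np.asarray(q[1:])
    t = 2.0 * np.cross(v, p)
    return p + w * t + np.cross(v, t)

theta = 0.3
q = np.array([np.cos(theta / 2), 0.0, 0.0, np.sin(theta / 2)])  # about z
p = np.array([1.0, 2.0, 3.0])
```

Storing the rotation as a matrix amortizes the quaternion-to-matrix conversion across all points sharing that rotation, which is one plausible source of the cost-evaluation gap.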

Definitely nothing will beat (well) hand written residual and Jacobian computation. However, the goal of my library (like ceres) is to provide flexibility and ease of use, with decent performance. Julia's multiple dispatch functionality should allow it to be more flexible even than ceres, without impacting performance too much. But it looks like it will always be a bit slower.

Best,

Oliver

Aug 30, 2023, 3:38:01 PM

to ceres-...@googlegroups.com

oh I thought when you said cost, you meant cost and jacobian.

cost evaluations are usually a fairly small part of the overall solve time, so I do not worry about them too much. they do become a thing if there are bounds constraints, where the line search to enforce them can get expensive.

Sameer


Aug 30, 2023, 3:50:30 PM

to Ceres Solver

Fair enough. But any speed-ups to cost evaluation tend to also apply to the Jacobian evaluation, since you're just multiplying the work by the length of the autodiff dual numbers. Ignore my Jacobian times - they encapsulate more work than ceres' timings. Basically you can speed up bundle adjustment Jacobian computation a fair bit with some minor tweaks, even without hand-written Jacobians.
