Slow optimization of big problems


cDc

Apr 6, 2017, 11:01:12 AM
to Ceres Solver
Hi,

I am using Ceres for solving SfM bundle adjustment problems. I am getting good results, thank you for this great library. However, I am having trouble with solving big problems. As an example, when trying to solve a scene containing 1600 cameras, all sharing the same intrinsics, with 1.4M 3D points (each seen by about 8 cameras), it takes between 1.5 and 2.5 hours (depending on how accurate the initial state is) on an i7-7700K at 5 GHz. I use the following options:

    options.num_threads = 8;
    options.max_num_iterations = 100;
    options.callbacks.push_back(&terminator);
    options.eta = 1e-1;
    options.use_nonmonotonic_steps = true;
    options.max_consecutive_nonmonotonic_steps = 5;
    options.trust_region_strategy_type = LEVENBERG_MARQUARDT;
    options.dogleg_type = TRADITIONAL_DOGLEG;
    options.use_inner_iterations = false;
    options.linear_solver_type = ITERATIVE_SCHUR;
    options.preconditioner_type = SCHUR_JACOBI;
    options.visibility_clustering_type = SINGLE_LINKAGE;
    options.use_explicit_schur_complement = false;
    options.num_linear_solver_threads = 8;

Is this normal or am I doing something wrong?
And on the same topic: the speed in debug mode is extremely slow (~5 min for only 2 cameras with 1k points). Is there a way to speed up Ceres in debug mode?

Thank you,
Dan

Sameer Agarwal

Apr 6, 2017, 11:32:48 AM
to Ceres Solver

Dan,
Are you building in release mode?
Can you share the execution log and the output of Solver::Summary::FullReport()?
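For reference, a minimal sketch of producing both (assuming the usual Solve() call; the problem construction is elided):

    #include <iostream>
    #include "ceres/ceres.h"

    // ... build the ceres::Problem as before ...
    ceres::Solver::Options options;
    options.minimizer_progress_to_stdout = true;  // per-iteration execution log
    ceres::Solver::Summary summary;
    ceres::Solve(options, &problem, &summary);
    std::cout << summary.FullReport() << "\n";    // timing and configuration breakdown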
Sameer



cDc

Apr 6, 2017, 5:34:02 PM
to Ceres Solver
Sameer,

In release, of course, since the debug version is unusable. I am on Windows 10 x64, with Eigen 3.2.10 and SuiteSparse 4.4.4, though on Ubuntu I get similar timings.
I noticed that even though 8 threads are set in the options, 95% of the time only one thread is used.
Please find attached the log.
Does Ceres support dumping and loading its state (problem, options, etc.) to a file for debugging? If so, I can send you the example so that you can run it locally.

Regards,
Dan
log.txt

Sameer Agarwal

Apr 6, 2017, 8:21:40 PM
to Ceres Solver
Dan,
All your time is being spent in the linear solver, and that part of the solver is not multi-threaded, so the number of threads does not matter. Only the preconditioner construction is threaded. We have tried threading the matrix-vector multiply before and it did not result in much improvement.

The bigger problem here seems to be that your problem is likely poorly conditioned, so you should look into how you can re-scale it to be better conditioned. You may also want to try the CLUSTER_JACOBI preconditioner to see if it helps.
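A minimal sketch of that change, in the same style as the options you posted (CANONICAL_VIEWS is simply the default clustering choice; SINGLE_LINKAGE is the cheaper alternative):

    options.linear_solver_type = ITERATIVE_SCHUR;
    options.preconditioner_type = CLUSTER_JACOBI;
    // visibility_clustering_type is only consulted by the CLUSTER_* preconditioners.
    options.visibility_clustering_type = CANONICAL_VIEWS;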

Sameer



cDc

Apr 6, 2017, 11:19:58 PM
to Ceres Solver
Sameer,

Any suggestions for improving the problem conditioning, other than via the preconditioner?
Can the condition number be displayed, so that I can measure the effect of any change I make in this regard?

Thanks!

Sameer Agarwal

Apr 7, 2017, 12:23:43 AM
to ceres-...@googlegroups.com
Dan,
There is no one general solution. Scaling the variables to be of the same general magnitude is one step in this direction.

The other is that I recommend you look at the more expensive preconditioners like CLUSTER_JACOBI. Since you are spending so much time solving, the setup cost of these more expensive preconditioners may be worth it.

Sameer



cDc

Apr 7, 2017, 5:25:06 AM
to Ceres Solver
Sameer,

That is strange, as I normalize all parameters: the scene is centered so that the mean is 0 and scaled to 1, and the camera intrinsics are normalized with respect to the image resolution.
I just tried cluster_jacobi: it improves the speed a bit (both in time and convergence), but not significantly. Something must be fundamentally wrong with my pipeline, since the same scene is globally adjusted in other SfM pipelines (though not using Ceres) in a few minutes, versus a few hours in mine.

Dan

Sameer Agarwal

Apr 7, 2017, 9:29:06 AM
to ceres-...@googlegroups.com
Dan,
A couple of things.

1. It may be worth looking at the -v=3 logs to see if the time is indeed going into the CG iterations and not somewhere else (see the sketch after this list for enabling that level of logging).

2. When you say you solve it in other pipelines much faster, are you saying you are able to perform bundle adjustment on the same set of cameras and points much faster using another optimization library? Or is it a different reconstruction method?
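A sketch of enabling that level of logging, assuming Ceres was built against a full glog installation (FLAGS_v is glog's verbosity flag):

    #include "glog/logging.h"

    int main(int argc, char** argv) {
      google::InitGoogleLogging(argv[0]);
      FLAGS_v = 3;  // equivalent to passing -v=3 on the command line
      // ... build and solve the problem as usual ...
      return 0;
    }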


Sameer



cDc

Apr 7, 2017, 11:13:38 AM
to Ceres Solver
1. I cannot find a way to set the verbosity level for miniglog (I'm not using glog).
2. It is a different reconstruction method; I am comparing, though, the final bundle adjustment of all the cameras with a similar number of 3D points.

Pierre Moulon

Apr 7, 2017, 11:50:50 AM
to ceres-...@googlegroups.com
If you are using miniglog you can control the verbosity level by defining the log level in miniglog/glog/logging.h.

You need to find the one that matches v=3:


// define a default LOG severity
#define MAX_LOG_LEVEL (ERROR)


Regards/Cordialement,
Pierre M


Sameer Agarwal

Apr 7, 2017, 2:29:59 PM
to ceres-...@googlegroups.com
When you are using a different reconstruction method, how is the final bundle adjustment being performed? What bundle adjustment algorithm is being used?

Sameer


cDc

Apr 21, 2017, 9:16:40 AM
to Ceres Solver
In the bal_problem example the scene is normalized to scale 100:
   "Scale so that the median absolute deviation of the resulting reconstruction is 100"
Please explain: why 100? What is the logic behind this value? What should the scale difference between the scene and the residuals be for optimal conditioning of the problem?

Thank you!

Sameer Agarwal

Apr 21, 2017, 11:37:24 PM
to Ceres Solver
It's a heuristic.
Anything where the size of the reconstruction is neither too large nor too small will do.
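For concreteness, a rough sketch of that normalization idea (my paraphrase, not the library code; the cameras have to be transformed consistently as well): center the points at their per-axis median and rescale so that the median absolute deviation of the point norms becomes 100.

    #include <algorithm>
    #include <array>
    #include <cmath>
    #include <vector>

    double Median(std::vector<double> v) {
      std::nth_element(v.begin(), v.begin() + v.size() / 2, v.end());
      return v[v.size() / 2];
    }

    void NormalizePoints(std::vector<std::array<double, 3>>& points,
                         double target_mad = 100.0) {
      // Per-axis median of the point cloud.
      std::array<double, 3> median;
      for (int d = 0; d < 3; ++d) {
        std::vector<double> coord(points.size());
        for (size_t i = 0; i < points.size(); ++i) coord[i] = points[i][d];
        median[d] = Median(coord);
      }
      // Deviation of each centered point from the median.
      std::vector<double> deviation(points.size());
      for (size_t i = 0; i < points.size(); ++i) {
        double norm2 = 0;
        for (int d = 0; d < 3; ++d) {
          const double diff = points[i][d] - median[d];
          norm2 += diff * diff;
        }
        deviation[i] = std::sqrt(norm2);
      }
      // Center and rescale so the median deviation equals target_mad.
      const double scale = target_mad / Median(deviation);
      for (auto& p : points)
        for (int d = 0; d < 3; ++d) p[d] = scale * (p[d] - median[d]);
    }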
Sameer

cDc

Nov 18, 2017, 7:09:19 AM
to Ceres Solver
Hi Sameer,

I recently discovered the issue that causes Ceres to run so slowly. For big BA problems with more than 1000 images I configured Ceres to use:
sparse_linear_algebra_library_type = SUITE_SPARSE;
linear_solver_type = ITERATIVE_SCHUR;
preconditioner_type = CLUSTER_JACOBI;
visibility_clustering_type = SINGLE_LINKAGE;
with all other params left at their defaults. Following your documentation, variations on this seem to be appropriate for SfM.

However, anything other than:
sparse_linear_algebra_library_type = SUITE_SPARSE;
linear_solver_type = SPARSE_SCHUR;
preconditioner_type = JACOBI;
visibility_clustering_type = CANONICAL_VIEWS;
performs 20x slower.

For example:
sparse_linear_algebra_library_type = SUITE_SPARSE;
linear_solver_type = ITERATIVE_SCHUR;
preconditioner_type = SCHUR_JACOBI;
visibility_clustering_type = CANONICAL_VIEWS;
solves the BA in 2165 s, while the previous one takes only 109 s. See the attached Ceres logs (I used EIGEN_SPARSE for these two runs as the preprocessor time is much lower compared to SUITE_SPARSE, while the rest is the same).

What am I doing wrong here? Why do I get such bad performance with the "optimal" configuration for SfM?

Thanks!


slow.log
fast.log

Sameer Agarwal

Nov 18, 2017, 5:35:45 PM
to ceres-...@googlegroups.com
Dan,

The difference here is that SPARSE_SCHUR is a factorization-based linear solver, whereas ITERATIVE_SCHUR solves the system using Conjugate Gradients.

The cost (time and memory) of solving a linear system using SPARSE_SCHUR depends on the number of cameras and their relative connectivity, and it is a fixed cost per iteration of the solver.

The cost of solving a linear system using ITERATIVE_SCHUR depends on the conditioning of the problem even though the cost of one iteration (modulo the construction of the preconditioner) is much smaller and fixed.

There is no optimal linear solver for bundle adjustment. 

The choice of the particular linear solver to use depends on the problem size and conditioning. In your case, you are solving for about 1000 cameras, which is well within reach of a sparse Cholesky factorization library like SuiteSparse/CHOLMOD. This cost, depending on the connectivity of your cameras, can rise sharply as the number of cameras goes up.
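In other words, for a problem of this size something like the following (a sketch, in the same style as your options) is usually the safe default:

    options.linear_solver_type = SPARSE_SCHUR;
    options.sparse_linear_algebra_library_type = SUITE_SPARSE;
    // preconditioner_type and visibility_clustering_type play no role for SPARSE_SCHUR.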

ITERATIVE_SCHUR, it appears, is spending far too much time iterating, which indicates that the problem is poorly conditioned and the preconditioner is not helping much (BTW, visibility_clustering_type only affects the CLUSTER_JACOBI and CLUSTER_TRIDIAGONAL preconditioners).

Sameer

cDc

Nov 19, 2017, 3:11:18 AM
to Ceres Solver
Thank you for the answer, Sameer, but I'm not sure the picture is much clearer in my mind. I knew that ITERATIVE_SCHUR solves the system using Conjugate Gradients, and that is the reason I was hoping it would perform faster than the exact SPARSE_SCHUR solver. We discussed conditioning before, and while there is not much more I can do to improve on what I am already doing, it is strange that all the problems are poorly conditioned: the slow Ceres solving happens every single time, for every single problem, no matter the dataset, the scene, the parameters optimized, etc. I saw in your documentation that visibility_clustering_type only affects the CLUSTER_JACOBI and CLUSTER_TRIDIAGONAL preconditioners, and that is why I also tried CLUSTER_JACOBI, as you can see in my example (actually that was my default configuration). It doesn't help; it is as slow as it gets.

What should I understand from this, that there is nothing better than SPARSE_SCHUR? I doubt that, so I must be doing something else wrong. Any idea what it could be? Just to remind you about the problem conditioning: I normalize the scene as in your BA sample app. Apart from that, my observations (known pixel projections) are also normalized by the image width. There is only one set of intrinsics shared by all 1000 cameras. Apart from this I cannot think of anything out of the ordinary.

Sameer Agarwal

Nov 21, 2017, 3:04:43 PM
to ceres-...@googlegroups.com
On Sun, Nov 19, 2017 at 12:11 AM cDc <cdc.s...@gmail.com> wrote:
Thank you for the answer, Sameer, but I'm not sure the picture is much clearer in my mind. I knew that ITERATIVE_SCHUR solves the system using Conjugate Gradients, and that is the reason I was hoping it would perform faster than the exact SPARSE_SCHUR solver.

The speed of conjugate gradients is a function of the conditioning of the matrix. There are no guarantees about it being faster than SPARSE_SCHUR. For up to a couple of thousand cameras, SPARSE_SCHUR is the way to go.
 
We discussed conditioning before, and while there is not much more I can do to improve on what I am already doing, it is strange that all the problems are poorly conditioned: the slow Ceres solving happens every single time, for every single problem, no matter the dataset, the scene, the parameters optimized, etc. I saw in your documentation that visibility_clustering_type only affects the CLUSTER_JACOBI and CLUSTER_TRIDIAGONAL preconditioners, and that is why I also tried CLUSTER_JACOBI, as you can see in my example (actually that was my default configuration). It doesn't help; it is as slow as it gets.

What should I understand from this, that there is nothing better than SPARSE_SCHUR?

If your problem is small to medium sized, DENSE_SCHUR/SPARSE_SCHUR are, generally speaking, unbeatable.
 
I doubt that, so I must be doing something else wrong. Any idea what it could be? Just to remind you about the problem conditioning: I normalize the scene as in your BA sample app. Apart from that, my observations (known pixel projections) are also normalized by the image width. There is only one set of intrinsics shared by all 1000 cameras. Apart from this I cannot think of anything out of the ordinary.

Shared intrinsics can make the problem quite poorly conditioned. Were the problems you were solving earlier also with shared intrinsics? 

Sameer


 

cDc

Nov 21, 2017, 5:10:34 PM
to Ceres Solver
Yes, the earlier problems, like all other modern SfM BA problems, use shared intrinsics. I imagine this is an issue for the solver, but sharing the intrinsics improves the quality of the reconstructed scene a lot. Indeed, Ceres doesn't seem to be optimized for this case, as the specialized Schur code does not support a variable f-block size, which here is caused by having only one block of 9 parameters for the intrinsics while the other 2000+ blocks are poses of mostly 6 params. How much does using a specialized Schur eliminator help?
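For concreteness, this is roughly how I set up the problem (a sketch only; Observation, observations and ReprojectionError are placeholder names for my own types, and the functor definition is omitted):

    #include <array>
    #include <vector>
    #include "ceres/ceres.h"

    // ReprojectionError: my 2-residual auto-diff functor over the 9 shared
    // intrinsics, a 6-parameter pose and a 3-parameter point (definition omitted).

    double intrinsics[9];                        // single block shared by every camera
    std::vector<std::array<double, 6>> poses;    // one 6-parameter block per camera
    std::vector<std::array<double, 3>> points;   // one 3-parameter block per 3D point

    ceres::Problem problem;
    for (const Observation& obs : observations) {
      ceres::CostFunction* cost =
          new ceres::AutoDiffCostFunction<ReprojectionError, 2, 9, 6, 3>(
              new ReprojectionError(obs.x, obs.y));
      problem.AddResidualBlock(cost, nullptr, intrinsics,
                               poses[obs.camera].data(),
                               points[obs.point].data());
    }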

Sameer Agarwal

Nov 27, 2017, 12:58:11 AM
to ceres-...@googlegroups.com
On Tue, Nov 21, 2017 at 2:10 PM cDc <cdc.s...@gmail.com> wrote:
Yes, the earlier problems, like all other modern SfM BA problems, use shared intrinsics. I imagine this is an issue for the solver, but sharing the intrinsics improves the quality of the reconstructed scene a lot. Indeed, Ceres doesn't seem to be optimized for this case, as the specialized Schur code does not support a variable f-block size, which here is caused by having only one block of 9 parameters for the intrinsics while the other 2000+ blocks are poses of mostly 6 params. How much does using a specialized Schur eliminator help?

The variable f-block size only affects the runtime performance of some low-level linear algebra routines; it has no impact on the convergence behavior. The preconditioners we use are definitely not optimized for the shared-intrinsics case. This may be something worth looking into.

Sameer


 