I have two questions about solving large problems.
I'm solving a few hundred Ceres problems. Most of them are relatively small (hundreds to a few hundred thousand residuals and parameters), but a handful are quite large.
I'm running on a 24-core / 48-hyperthread machine with two NUMA domains and 160 GB of RAM.
I solve the small problems in parallel, 48 at a time with 1 thread each; while doing so, I get 100% CPU usage on the system. Once I get to the larger problems, however, my application is always killed for running out of memory.
The first question concerns multithreading:
To save memory as the problems get larger, I switch to solving one problem at a time with solver_options.num_threads = 48. However, CPU usage then drops to about 10%. I'm using linear_solver_type = SPARSE_NORMAL_CHOLESKY and have tried both sparse_linear_algebra_library_type = EIGEN_SPARSE (Eigen 3.3.9, compiled with default defines) and sparse_linear_algebra_library_type = SUITE_SPARSE (SuiteSparse 4.5.6), but with both backends I see very low CPU usage when solving one Ceres problem at a time with solver_options.num_threads = 48.
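For reference, my solver configuration looks roughly like this (a simplified sketch, not my exact code; `problem` stands in for the ceres::Problem being solved):

```cpp
#include "ceres/ceres.h"

ceres::Solver::Options solver_options;
solver_options.num_threads = 48;
solver_options.linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;

// Tried both backends; CPU usage stays around 10% either way:
solver_options.sparse_linear_algebra_library_type = ceres::EIGEN_SPARSE;
// solver_options.sparse_linear_algebra_library_type = ceres::SUITE_SPARSE;

ceres::Solver::Summary summary;
ceres::Solve(solver_options, &problem, &summary);
```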
So the first question is: is there something I can do to get better CPU utilization when solving one big, sparse ceres::Problem at a time with solver_options.num_threads = 48?
Second question: with the largest problems, I run out of memory even when solving them one at a time, already while building the problem.
My currently largest problem has about a billion residuals (in a billion residual blocks) and about 50 million parameters (in about 50 million parameter blocks).
It is extremely sparse, though. Each residual depends on only 80 parameters (in 11 parameter blocks), and a few hundred residuals depend on each parameter block. The cost functors and the observation data (which I have already compressed as well as possible and share between cost functors) take about 50 GB, so the rest appears to be overhead from building the problem.
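In case the construction pattern matters, a stripped-down sketch of how I build the problem (all names here are placeholders, not my actual code):

```cpp
#include "ceres/ceres.h"
#include <vector>

ceres::Problem::Options problem_options;
// I keep ownership of the cost functions myself so they can share
// the compressed observation storage.
problem_options.cost_function_ownership = ceres::DO_NOT_TAKE_OWNERSHIP;
ceres::Problem problem(problem_options);

// ~1 billion residual blocks, each touching 11 parameter blocks
// (80 parameters in total per residual).
for (const Observation& obs : observations) {
  std::vector<double*> blocks = ParameterBlocksFor(obs);  // 11 pointers
  problem.AddResidualBlock(CostFunctionFor(obs),  // placeholder factory
                           nullptr,               // no loss function
                           blocks);
}
```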
What options do I have to reduce the memory footprint of building and solving such a large, sparse problem?
Thanks!