Inquiry on Thread Configuration in Ceres Solver


bobo si

Aug 13, 2025, 2:15:52 AM
to Ceres Solver
Dear all,

I have been struggling with an issue for several days. My current platform is a multi-core aarch64 system using Ceres Solver. When Ceres multithreading is enabled, CPU utilization is unevenly distributed across cores (70%, 10%, 10%, 10%). I would like to achieve a more balanced workload.

After reviewing the documentation and forum discussions (see the References below), I identified the current configuration as Ceres multithreading with single-threaded BLAS. To improve the CPU balance, I attempted to switch to single-threaded Ceres with multithreaded BLAS.

I compiled a multithreaded OpenBLAS version and verified its behavior via cblas_dgemm, which confirmed that the thread count matched the compilation options. However, when I integrate this build with Ceres, BLAS multithreading fails to activate: the total number of threads remains equal to Ceres' thread count, even with Ceres set to a single thread.

Question: Is the "Ceres single-threading & BLAS multithreading" configuration feasible? If so, what are the correct implementation steps?

Implementation:
I tested this configuration in example/bundle_adjuster with the following approach:

Compilation Options for OpenBLAS:
Added the following flags to OpenBLAS' compilation:
-DUSE_THREAD=1 -DUSE_OPENMP=1 -DNUM_THREADS=8

Compilation Options for Ceres Solver:
Enabled the following flags in Ceres' compilation:
-DCERES_NO_OPENMP=ON -DBLAS_VENDOR=OpenBLAS

Code Modification in bundle_adjuster:
Updated the thread configuration as follows:
DEFINE_int32(num_threads, 1, "...");

Just to add: I noticed the following code snippet in bundle_adjuster:
DEFINE_string(sparse_linear_algebra_library, "suite_sparse", "...");
Could the inability to enable OpenBLAS multithreading in Ceres be related to the SuiteSparse configuration?

Looking forward to your response. Thanks.
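One cheap thing to rule out for a setup like the one described above is an environment variable capping the BLAS thread count. As far as the OpenBLAS README describes it, a build made with USE_OPENMP=1 takes its thread count from OMP_NUM_THREADS, while OPENBLAS_NUM_THREADS applies to non-OpenMP builds. The following is only a minimal diagnostic sketch under that assumption; `EnvOrUnset` and `DescribeBlasThreadEnv` are hypothetical helper names, not part of Ceres or OpenBLAS, and the code does not query either library.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Returns the value of an environment variable, or "(unset)" if it is absent.
std::string EnvOrUnset(const char* name) {
  assert(name != nullptr);
  const char* value = std::getenv(name);
  return value ? std::string(value) : std::string("(unset)");
}

// Hypothetical helper: summarizes the two thread-related variables on one
// line so they can be logged right before calling the solver.
std::string DescribeBlasThreadEnv() {
  return "OMP_NUM_THREADS=" + EnvOrUnset("OMP_NUM_THREADS") +
         " OPENBLAS_NUM_THREADS=" + EnvOrUnset("OPENBLAS_NUM_THREADS");
}
```

Logging the result of `DescribeBlasThreadEnv()` at the start of bundle_adjuster would show whether the process inherited a limit of 1 from the shell or a launcher script.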

Dmitriy Korchemkin

Aug 13, 2025, 4:01:04 AM
to Ceres Solver
Hi,

The amount of parallelism depends on problem size, linear solver being used, etc.

Can you share the full report (Solver::Summary::FullReport()) and, if possible, your optimization problem?

bobo si

Aug 17, 2025, 10:33:47 PM
to Ceres Solver
Hi, 
I pasted some reports below:

1.
Solver Summary (v 2.2.0-eigen-(3.4.0)-lapack-suitesparse-(7.10.1)-eigensparse)

                                     Original                  Reduced
Parameter blocks                           55                        2
Parameters                                242                       16
Effective parameters                      189                       15
Residual blocks                            56                       52
Residuals                                 144                      117

Minimizer                        TRUST_REGION
Trust region strategy                  DOGLEG (TRADITIONAL)
Sparse linear algebra library    SUITE_SPARSE

                                        Given                     Used
Linear solver          SPARSE_NORMAL_CHOLESKY   SPARSE_NORMAL_CHOLESKY
Threads                                     4                        4
Linear solver ordering              AUTOMATIC                        2

Cost:
Initial                          9.913740e+02
Final                            5.930128e+01
Change                           9.320727e+02

Minimizer iterations                        4
Successful steps                            4
Unsuccessful steps                          0

Time (in seconds):
Preprocessor                         0.000650

  Residual only evaluation           0.000223 (3)
  Jacobian & residual evaluation     0.003340 (4)
  Linear solver                      0.000378 (3)
Minimizer                            0.004483

Postprocessor                        0.000022
Total                                0.005156

Termination:                   NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 3.)

2.
Solver Summary (v 2.2.0-eigen-(3.4.0)-lapack-suitesparse-(7.10.1)-eigensparse)

                                     Original                  Reduced
Parameter blocks                           79                        2
Parameters                                338                       16
Effective parameters                      261                       15
Residual blocks                            86                       82
Residuals                                 204                      177

Minimizer                        TRUST_REGION
Trust region strategy                  DOGLEG (TRADITIONAL)
Sparse linear algebra library    SUITE_SPARSE

                                        Given                     Used
Linear solver          SPARSE_NORMAL_CHOLESKY   SPARSE_NORMAL_CHOLESKY
Threads                                     4                        4
Linear solver ordering              AUTOMATIC                        2

Cost:
Initial                          1.603648e+02
Final                            1.375430e+02
Change                           2.282171e+01

Minimizer iterations                        4
Successful steps                            4
Unsuccessful steps                          0

Time (in seconds):
Preprocessor                         0.000639

  Residual only evaluation           0.000153 (3)
  Jacobian & residual evaluation     0.002622 (4)
  Linear solver                      0.000358 (3)
Minimizer                            0.003633

Postprocessor                        0.000028
Total                                0.004300

Termination:                   NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 3.)

3.
Solver Summary (v 2.2.0-eigen-(3.4.0)-lapack-suitesparse-(7.10.1)-eigensparse)

                                     Original                  Reduced
Parameter blocks                           72                        2
Parameters                                310                       16
Effective parameters                      240                       15
Residual blocks                            81                       77
Residuals                                 194                      167

Minimizer                        TRUST_REGION
Trust region strategy                  DOGLEG (TRADITIONAL)
Sparse linear algebra library    SUITE_SPARSE

                                        Given                     Used
Linear solver          SPARSE_NORMAL_CHOLESKY   SPARSE_NORMAL_CHOLESKY
Threads                                     4                        4
Linear solver ordering              AUTOMATIC                        2

Cost:
Initial                          2.534165e+02
Final                            1.968733e+02
Change                           5.654321e+01

Minimizer iterations                        4
Successful steps                            4
Unsuccessful steps                          0

Time (in seconds):
Preprocessor                         0.000611

  Residual only evaluation           0.000230 (3)
  Jacobian & residual evaluation     0.002686 (4)
  Linear solver                      0.000346 (3)
Minimizer                            0.003809

Postprocessor                        0.000024
Total                                0.004445

Termination:                   NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 3.)

4.
Solver Summary (v 2.2.0-eigen-(3.4.0)-lapack-suitesparse-(7.10.1)-eigensparse)

                                     Original                  Reduced
Parameter blocks                          826                      560
Parameters                               4534                     2406
Effective parameters                     3861                     1866
Residual blocks                          2265                     1921
Residuals                                7454                     4202

Minimizer                        TRUST_REGION
Trust region strategy                  DOGLEG (TRADITIONAL)
Sparse linear algebra library    SUITE_SPARSE

                                        Given                     Used
Linear solver          SPARSE_NORMAL_CHOLESKY   SPARSE_NORMAL_CHOLESKY
Threads                                     4                        4
Linear solver ordering              AUTOMATIC                      560

Cost:
Initial                          3.478014e+03
Final                            1.702650e+03
Change                           1.775365e+03

Minimizer iterations                        6
Successful steps                            6
Unsuccessful steps                          0

Time (in seconds):
Preprocessor                         0.023454

  Residual only evaluation           0.007089 (5)
  Jacobian & residual evaluation     0.024464 (6)
  Linear solver                      0.106971 (5)
Minimizer                            0.161416

Postprocessor                        0.001996
Total                                0.186866

Termination:                   NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 5.)

Sameer Agarwal

Aug 17, 2025, 11:50:38 PM
to ceres-...@googlegroups.com
Thanks for sharing the output.

Fundamentally these are small problems which won't benefit much from threading.

Further, you are using SPARSE_NORMAL_CHOLESKY, which is not threaded. So the only threading benefit you can hope for is in evaluating the Jacobian and residuals, plus whatever you gain if the underlying BLAS operations that CHOLMOD performs are large enough to benefit from a threaded BLAS (I doubt they are).

Even for your largest problem, you may want to see if using DENSE_NORMAL_CHOLESKY is better or a bit faster for you.

Sameer 
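The uneven per-core load from the first message is roughly what the timings in report 4 would predict if only evaluation is threaded. The sketch below is a back-of-the-envelope model, not a description of how Ceres schedules work: it assumes residual and Jacobian evaluation keep every thread busy, while all other phases (including the linear solver) run on the calling thread alone. `SolveTimings` and `EstimateBusyFraction` are hypothetical names introduced for illustration; the numbers in `kReport4` are copied from report 4.

```cpp
#include <cassert>
#include <vector>

// Phase timings in seconds, as printed by Solver::Summary::FullReport().
struct SolveTimings {
  double residual_eval;  // "Residual only evaluation"
  double jacobian_eval;  // "Jacobian & residual evaluation"
  double total;          // "Total"
};

// Values copied from report 4 in this thread.
const SolveTimings kReport4 = {0.007089, 0.024464, 0.186866};

// Rough model (an assumption): evaluation occupies all threads, everything
// else occupies only thread 0. Returns the estimated fraction of wall-clock
// time during which each thread is busy.
std::vector<double> EstimateBusyFraction(const SolveTimings& t,
                                         int num_threads) {
  assert(num_threads > 0 && t.total > 0.0);
  const double parallel_share = (t.residual_eval + t.jacobian_eval) / t.total;
  std::vector<double> busy(num_threads, parallel_share);
  busy[0] = 1.0;  // the calling thread also runs every serial phase
  return busy;
}
```

Calling `EstimateBusyFraction(kReport4, 4)` gives about 100% for the calling thread and about 17% for each of the other three, which has the same lopsided shape as the reported 70%, 10%, 10%, 10% split. Under this model, adding threads cannot even out the load as long as the linear solver dominates the run time.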



bobo si

Aug 22, 2025, 6:24:12 PM
to Ceres Solver
Thank you for your advice. I tried a multithreaded SuiteSparse build. It worked, but with a 10% slowdown. 😅