CUDA support for curve fitting

Ursela Barteczko

Dec 11, 2022, 4:39:25 PM
to Ceres Solver
Hello Sameer,

I read that CUDA support for DENSE_QR is available in the new Ceres version. That's great news! I am very new to CUDA (this is my first time working with it), and the only hint I found was to enable USE_CUDA. So I added set(USE_CUDA 1) to my CMakeLists, added CUDA as a language to the project, and renamed my file with a .cu extension.
This compiles and runs fine, but takes exactly the same amount of time as before. Do I need to make more adjustments to the Ceres code? My problem looks much like the curve fitting example from the tutorial, just bigger.
Thank you for any ideas or pointers to further resources.

Cheers,
Ursela

Sameer Agarwal

Dec 11, 2022, 4:46:04 PM
to ceres-...@googlegroups.com
Ursela, 
Can you share the output of Summary::FullReport()? How big are your problems?
Also, did you set Solver::Options::dense_linear_algebra_library_type to CUDA?


Ursela Barteczko

Dec 12, 2022, 9:34:00 AM
to Ceres Solver
Hi Sameer,
Thank you for the hint. I added using ceres::CUDA; and now have the following solver options:

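Solver::Summary summary;
Solver::Options options;
// Ask Ceres to run the dense factorization through its CUDA backend.
options.dense_linear_algebra_library_type = CUDA;
options.linear_solver_type = ceres::DENSE_QR;
// Print one line per iteration while the minimizer runs.
options.minimizer_progress_to_stdout = true;

Solve(options, &problem, &summary);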

I went through your test files to find some additional hints, and I also tried searching GitHub.
Is there an easy minimal example of Ceres with CUDA that I missed?

This is the output of the summary:

Residual blocks                          8000                     8000
Residuals                                8000                     8000

Minimizer                        TRUST_REGION

Dense linear algebra library             CUDA
Trust region strategy     LEVENBERG_MARQUARDT

                                        Given                     Used
Linear solver                        DENSE_QR                 DENSE_QR
Threads                                     1                        1
Linear solver ordering              AUTOMATIC                        2

Cost:
Initial                          4.769645e+07
Final                            1.834855e+02
Change                           4.769627e+07

Minimizer iterations                        6
Successful steps                            6
Unsuccessful steps                          0

Time (in seconds):
Preprocessor                         0.669198

  Residual only evaluation           0.014601 (6)
  Jacobian & residual evaluation     4.412829 (6)
  Linear solver                      0.017282 (6)
Minimizer                            4.450352

Postprocessor                        0.000144
Total                                5.119694

Termination:                      CONVERGENCE (Function tolerance reached. |cost_change|/cost: 1.087332e-09 <= 1.000000e-06)

This one was pretty fast, but my other models are more complicated and running them takes up to two days.
I guess the problem is that the number of threads is 1? What can I do about this?
Thank you for your help,
Ursela

Sameer Agarwal

Dec 12, 2022, 9:36:33 AM
to ceres-...@googlegroups.com
The top of the summary is cut off. How many parameters/parameter blocks do you have?
Can you share the Summary::FullReport() output for a larger/longer-running instance?

Sameer 

Ursela Barteczko

Dec 12, 2022, 9:43:58 AM
to Ceres Solver
Here is the missing top of the first summary:

Solver Summary (v 2.1.0-eigen-(3.4.0)-lapack-suitesparse-(5.10.1)-cxsparse-(3.2.0)-eigensparse-no_openmp-cuda-(11070))

                                     Original                  Reduced
Parameter blocks                            2                        2
Parameters                                 31                       31

More complicated instance:
The number of parameter blocks is at most 4 across the models I am testing.

Ceres Solver Report: Iterations: 251, Initial cost: 8.842008e+05, Final cost: 1.240999e+02, Termination: NO_CONVERGENCE

Solver Summary (v 2.1.0-eigen-(3.4.0)-lapack-suitesparse-(5.10.1)-cxsparse-(3.2.0)-eigensparse-no_openmp-cuda-(11070))

                                     Original                  Reduced
Parameter blocks                            3                        3
Parameters                                 61                       61

Residual blocks                          8000                     8000
Residuals                                8000                     8000

Minimizer                        TRUST_REGION

Dense linear algebra library             CUDA
Trust region strategy     LEVENBERG_MARQUARDT

                                        Given                     Used
Linear solver                        DENSE_QR                 DENSE_QR
Threads                                     1                        1
Linear solver ordering              AUTOMATIC                        3

Cost:
Initial                          8.842008e+05
Final                            1.240999e+02
Change                           8.840767e+05

Minimizer iterations                      251
Successful steps                          177
Unsuccessful steps                         74

Time (in seconds):
Preprocessor                         0.002877

  Residual only evaluation           5.622990 (250)
  Jacobian & residual evaluation   523.481755 (177)
  Linear solver                      1.022119 (250)
Minimizer                          530.371524

Postprocessor                        0.000249
Total                              530.374650

Termination:                   NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 250.)


Sameer Agarwal

Dec 12, 2022, 9:59:16 AM
to ceres-...@googlegroups.com
Ursela,

Nearly all the time here seems to be going into the Jacobian evaluation (about 523 of the 530 seconds). The Jacobian itself is quite small, and the resulting linear system is easily solved even on the CPU. The simplest thing you can do is increase the number of threads, which will parallelize the Jacobian and residual evaluation.

Are you using numeric differentiation?

Sameer 
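
To make that breakdown concrete, the timings from the longer report above can be turned into fractions of the total run time. The sketch below is only arithmetic on the numbers Ceres printed; the function and constant names are made up for this illustration, and the speedup figure is an Amdahl's-law style upper bound that assumes every other phase stays exactly as slow as it is now.

```cpp
#include <cassert>
#include <cstdio>

// Timings (in seconds) copied from the 251-iteration report in this thread.
constexpr double kTotalSeconds = 530.374650;
constexpr double kJacobianSeconds = 523.481755;
constexpr double kLinearSolverSeconds = 1.022119;

// Fraction of the total wall time spent in one phase of the solve.
double FractionOfTotal(double phase_seconds, double total_seconds) {
  assert(total_seconds > 0.0);
  return phase_seconds / total_seconds;
}

// Best-case overall speedup if one phase became free and everything else
// stayed the same (Amdahl's law).
double MaxSpeedupIfPhaseWereFree(double phase_seconds, double total_seconds) {
  assert(phase_seconds < total_seconds);
  return total_seconds / (total_seconds - phase_seconds);
}

// Prints the share of each phase and the best-case gain from removing it.
void PrintBreakdown() {
  std::printf("Jacobian evaluation: %.1f%% of total, at most %.1fx if free\n",
              100.0 * FractionOfTotal(kJacobianSeconds, kTotalSeconds),
              MaxSpeedupIfPhaseWereFree(kJacobianSeconds, kTotalSeconds));
  std::printf("Linear solver:       %.1f%% of total, at most %.3fx if free\n",
              100.0 * FractionOfTotal(kLinearSolverSeconds, kTotalSeconds),
              MaxSpeedupIfPhaseWereFree(kLinearSolverSeconds, kTotalSeconds));
}
```

Calling PrintBreakdown() from main shows why moving the linear solver to the GPU made no visible difference: the solver accounts for well under 1% of the run, so the effort is better spent on the Jacobian evaluation.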

Sameer Agarwal

Dec 12, 2022, 10:01:02 AM
to ceres-...@googlegroups.com
Also, unless your problem is rather poorly scaled, you can use DENSE_NORMAL_CHOLESKY as your linear solver; it will be very fast.
Sameer 

Ursela Barteczko

Dec 12, 2022, 10:07:53 AM
to Ceres Solver
I am using AutoDiffCostFunction.
I have never worked with threads. Does Ceres have built-in support for that, or is it a general setup thing?
Thank you so much for your patience, by the way.
I guess it is 'poorly scaled', as we are also trying to find some interactions between the features, so I would prefer to have the CUDA support.
But I will definitely also try the DENSE_NORMAL_CHOLESKY solver then, thank you!

Sameer Agarwal

Dec 12, 2022, 10:25:30 AM
to ceres-...@googlegroups.com
Are you compiling your code with optimization enabled? AutoDiffCostFunction relies quite heavily on the compiler to do a bunch of optimizations, without which it can be quite slow.

As for threads, Solver::Options::num_threads is all you need to set. Set it to the number of cores you have on your machine.

Sameer 
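
One way to pick that number without hard-coding it is to ask the C++ standard library. This is a minimal sketch, not part of Ceres: DetectedCoreCount and UseAvailableCores are hypothetical helper names, and the template only assumes that the options object has an integer num_threads member, as Solver::Options does. Note that std::thread::hardware_concurrency() reports logical cores, which may be more than the physical core count.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Logical core count reported by the runtime. The standard allows
// hardware_concurrency() to return 0 when the value is unknown, so fall
// back to a single thread in that case.
inline int DetectedCoreCount() {
  const unsigned int n = std::thread::hardware_concurrency();
  return n == 0 ? 1 : static_cast<int>(n);
}

// Sets `num_threads` on any options struct that has such an integer member
// (for example ceres::Solver::Options), capped at `max_threads`.
template <typename Options>
void UseAvailableCores(Options& options, int max_threads) {
  assert(max_threads >= 1);
  options.num_threads = std::min(DetectedCoreCount(), max_threads);
}
```

With the solver options from earlier in the thread, this would be called as UseAvailableCores(options, 16); just before Solve(options, &problem, &summary);. The cap is there so that a shared machine is not saturated by accident.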

Ursela Barteczko

Dec 12, 2022, 10:53:52 AM
to Ceres Solver
OK, great! Increasing the number of threads is already helping me a great deal. I am not totally sure about the optimization; maybe I will switch to numeric differentiation in that case.
This has definitely saved me a lot of time already!

Thank you and have a great day,
Ursela

Sameer Agarwal

Dec 12, 2022, 11:07:09 AM
to ceres-...@googlegroups.com
Don't switch to numeric differentiation. Make sure that you are building with optimizations enabled.
Sameer 
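
If it is unclear whether a binary was built with optimization, a compile-time check can settle it. The sketch below relies on the __OPTIMIZE__ macro, which GCC and Clang define at -O1 and above; other compilers such as MSVC do not define it, so a false result there only means "unknown". The names kOptimizedBuild and CheckOptimizedBuild are made up for this illustration. With CMake and a single-configuration generator, an optimized build is usually obtained by configuring with -DCMAKE_BUILD_TYPE=Release.

```cpp
#include <cstdio>

// True when the compiler reports that optimization was enabled
// (GCC/Clang only; see the note above about other compilers).
#if defined(__OPTIMIZE__)
constexpr bool kOptimizedBuild = true;
#else
constexpr bool kOptimizedBuild = false;
#endif

// Prints a warning to stderr when the build looks unoptimized.
// Returns true when optimization was detected.
inline bool CheckOptimizedBuild() {
  if (!kOptimizedBuild) {
    std::fprintf(stderr,
                 "warning: no compiler optimization detected; "
                 "automatic differentiation may be very slow\n");
  }
  return kOptimizedBuild;
}
```

Calling CheckOptimizedBuild() once at program start, before the solve, is enough to catch an accidental debug build.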
