How can I best exploit the sparsity nature of my problem?

Jean-Pierre Gervasoni

unread,

Jan 25, 2016, 11:03:26 AM1/25/16

to Ceres Solver

I am trying to use Ceres to solve the pressure / flow problem in a process simulation software. The problem to be solved is sparse. Here is an example of the obtained jacobian matrix for a very simple problem:

	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14

0	2,326	0	0	0	0	0	-1,211	0	0	0	0	0	0	0	0
1	0	2,973	-1,425	0	0	0	0	0	0	0	0	0	0	0	0
2	0	-1,475	2,876	0	-1,375	0	0	0	0	0	0	0	0	0	0
3	0	0	0	2,898	0	0	0	0	0	0	-1,487	-1,385	0	0	0
4	0	0	-1,426	0	2,777	0	0	0	0	0	0	0	0	0	-1,324
5	0	0	0	0	0	2,572	0	0	0	0	0	0	-1,216	0	-1,327
6	-1,15	0	0	0	0	0	2,448	0	0	-1,27	0	0	0	0	0
7	0	0	0	0	0	0	0	2,681	0	-1,271	0	-1,382	0	0	0
8	0	0	0	0	0	0	0	0	2,343	0	0	0	-1,218	0	0
9	0	0	0	0	0	0	-1,212	-1,327	0	2,566	0	0	0	0	0
10	0	0	0	-1,438	0	0	0	0	0	0	2,999	0	0	0	0
11	0	0	0	-1,436	0	0	0	-1,329	0	0	0	2,791	0	0	0
12	0	0	0	0	0	-1,274	0	0	-1,157	0	0	0	2,461	0	0
13	0	0	0	0	0	0	0	0	0	0	0	0	0	0,02309	0
14	0	0	0	0	-1,377	-1,272	0	0	0	0	0	0	0	0	2,677

Using boost cuthill_mckee_ordering routine, one can obtain the following matrix:

	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14

0	0,02309	0	0	0	0	0	0	0	0	0	0	0	0	0	0
1	0	2,973	-1,425	0	0	0	0	0	0	0	0	0	0	0	0
2	0	-1,475	2,876	-1,375	0	0	0	0	0	0	0	0	0	0	0
3	0	0	-1,426	2,777	-1,324	0	0	0	0	0	0	0	0	0	0
4	0	0	0	-1,377	2,677	-1,272	0	0	0	0	0	0	0	0	0
5	0	0	0	0	-1,327	2,572	-1,216	0	0	0	0	0	0	0	0
6	0	0	0	0	0	-1,274	2,461	-1,157	0	0	0	0	0	0	0
7	0	0	0	0	0	0	-1,218	2,343	0	0	0	0	0	0	0
8	0	0	0	0	0	0	0	0	2,326	-1,211	0	0	0	0	0
9	0	0	0	0	0	0	0	0	-1,15	2,448	-1,27	0	0	0	0
10	0	0	0	0	0	0	0	0	0	-1,212	2,566	-1,327	0	0	0
11	0	0	0	0	0	0	0	0	0	0	-1,271	2,681	-1,382	0	0
12	0	0	0	0	0	0	0	0	0	0	0	-1,329	2,791	-1,436	0
13	0	0	0	0	0	0	0	0	0	0	0	0	-1,385	2,898	-1,487
14	0	0	0	0	0	0	0	0	0	0	0	0	0	-1,438	2,999

I am currently trying to solve this using only one parameter block and one residual block:

auto pCostFunction = new CostFunctor(p_DynProblem, (int)m_NbVariables, (int)m_NbVariables);

m_Problem.AddParameterBlock(&m_Values[0], (int)m_NbVariables);

m_Problem.AddResidualBlock(pCostFunction, nullptr, &m_Values[0]);

My understanding of parameter and residual blocks leads me to believe that this could be best solved by dividing the above problem in 3, by defining 3 blocks of residuals:

	Parameter block
residual block 1	0
residual block 2	1-2-3-4-5-6-7
residual block 3	8-9-10-11-12-13-14

But each sub-problem is quite sparse. Isn't there a way to further exploit sparsity ?

Sameer Agarwal

unread,

Jan 25, 2016, 11:07:52 AM1/25/16

to Ceres Solver

Jean,

What is the size of the problem you are interested in solving? how many residuals and how many parameters ?

Since we are dealing with general sparsity here, how about starting with every parameter being its own parameter block and every residual being its own residual black.

From there I would look at what residual computations have the same dependence on parameters and can be combined into a single residual block, and similarly what parameters co-occur almost always and should be combined into a single parameter block.

Sameer

--
You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ceres-solver/8513acc0-63b0-4da4-9063-faab3c36ba8f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jean-Pierre Gervasoni

unread,

Jan 25, 2016, 11:21:08 AM1/25/16

to Ceres Solver

Sameer,

The number of parameters is equal to the number of residuals. The problem size can vary from very few to several tens of thousands.

I could declare one parameter per parameter block and one residual per residual block. Does that mean that a parameter can contribute to different residual blocks ?

Sameer Agarwal

unread,

Jan 25, 2016, 11:30:51 AM1/25/16

to Ceres Solver

Yes any parameter can be in any residual block.

To view this discussion on the web visit https://groups.google.com/d/msgid/ceres-solver/0d916331-dd7a-4c1f-a868-07d6f3a58887%40googlegroups.com.

Sameer Agarwal

unread,

Jan 25, 2016, 11:32:09 AM1/25/16

to Ceres Solver

Or rather any parameter block can contribute to any residual block.
Sameer

To view this discussion on the web visit https://groups.google.com/d/msgid/ceres-solver/CABqdRUDfXe5hg_-rqZJ0k%2BgU4UrmZysAMho_6RtTPOLFkTZL8A%40mail.gmail.com.

Jean-Pierre Gervasoni

unread,

Feb 2, 2016, 4:23:42 AM2/2/16

to Ceres Solver

Dear Sameer,

I have set up a sample side project in order to tune the ceres solver more easily. I chose 2D heat transfer because it exhibits some similarities with my problem (sparsity, problem size).

Compilation of Ceres under Windows

- Visual Studio 2015

- latest git sources

- CXSparse fork https://github.com/PetterS/CXSparse

- Intel Math Kernel Library

- miniglog

Compilation / linking issues (solved)

I modified PATH to allow MKL detection by cmake (FindBlas.cmake and FindLapack.cmake). I add to alter two source files:

/internal/ceres/covariance_impl.cc:

#pragma omp parallel for num_threads(num_threads) schedule(dynamic) collapse(2)

#pragma omp parallel for num_threads(num_threads) schedule(dynamic) // collapse(2) <--- unkown keyword

\include\ceres\jet.h

jet.h(489): error C4996: 'j0': The POSIX name for this item is deprecated. Instead, use the ISO C and C++ conformant name: _j0. See online help for details.

j0 was replaced by _j0

j1 was replaced by _j1

jn was replaced by _jn

I chose miniglog because I had linking issues with gflags / glog.

Problem setup

m_Options.minimizer_progress_to_stdout = false;

m_Options.logging_type = ceres::SILENT;

m_Options.minimizer_type = ceres::MinimizerType::LINE_SEARCH;

m_Options.linear_solver_type = ceres::LinearSolverType::SPARSE_NORMAL_CHOLESKY;

m_Options.use_inner_iterations = false;

m_Options.use_nonmonotonic_steps = false;

m_Options.line_search_direction_type = ceres::LineSearchDirectionType::LBFGS;

m_Options.line_search_type = ceres::LineSearchType::WOLFE;

m_Options.line_search_interpolation_type = ceres::LineSearchInterpolationType::BISECTION;

m_Options.use_approximate_eigenvalue_bfgs_scaling = true;

m_Options.parameter_tolerance = 1e-6;

m_Options.function_tolerance = 1e-6;

m_Options.gradient_tolerance = 1e-6;

m_Options.max_num_iterations = 2000;

m_Options.num_threads = 1;

m_Options.num_linear_solver_threads = 1;

Residual block definition (1 block per unknown)

for (int j = 0; j < m_H; ++j)

{

for (int i = 0; i < m_L; ++i)

{

m_Problem.AddResidualBlock(new CostFunctor(...), nullptr, parameters[0], parameters[1], parameters[2], parameters[3], parameters[4]);

}

There are m_H*m_L unknowns / residual blocks. A residual only depends on 5 parameters.

CostFunctor constructor

The constructor is used to set some constants and size residual and parameter blocks:

set_num_residuals(1);

mutable_parameter_block_sizes()->resize(5, 1);

CostFunctor evaluate

The real problem is nonlinear, this one was setup just to play with ceres configuration

if (jacobians != nullptr)

{

if (jacobians[0] != nullptr) { jacobians[0][0] = -sz; }

if (jacobians[1] != nullptr) { jacobians[1][ 0] =-sx; }

if (jacobians[2] != nullptr) { jacobians[2][0] = (1.0 + 2.0*sz + 2.0*sx); }

if (jacobians[3] != nullptr) { jacobians[3][0] = -sz; }

if (jacobians[4] != nullptr) { jacobians[4][0] = -sx; }

}

residuals[0] = -sz*parameters[0][0] - sx*parameters[1][0] + (1.0 + 2.0*sz + 2.0*sx)*parameters[2][0] - sz*parameters[3][0] - sx*parameters[4][0] - T - Q*f;

Questions / Remarks:

OpenMP related:

With only one thread, the CPU load is between 10 and 20%, and the simulation (for a 200x200 size) runs at ~5fps. I can increase the number of threads, but it only increase the CPU load without inreasing the fps. If i set num_threads to 4 by example, the CPU load is 50% but the fps remain const. I have read somewhere on this google group that the general nonlinear solver wasn't multithreaded - that must be the reason.

CostFunction related:

Is it efficient to declare one costfunction per residual ? I have read that I could use one global cost function, but how the cost function can now on which residual it is working ?

Performance:

I think with only 10% of CPU load there is still room for improvement - but honestly, I don't know how to speed up the computation. Do you have any tips ?

Thank you again, working with Ceres is fun (head scratching sometimes but fun) .

Best regards,

Jean-Pierre

Jean-Pierre Gervasoni

unread,

Feb 2, 2016, 7:52:10 AM2/2/16

to Ceres Solver

Here is an example of summary:

Solver Summary (v 1.12.0-eigen-(3.2.92)-lapack-cxsparse-(2.3.0)-openmp)

Original Reduced

Parameter blocks 40000 40000

Parameters 40000 40000

Residual blocks 40000 40000

Residual 40000 40000

Minimizer LINE_SEARCH

Line search direction LBFGS (20)

Line search type BISECTION WOLFE

Given Used

Threads 1 1

Cost:

Initial 9.833939e+03

Final 5.913305e-04

Change 9.833938e+03

Minimizer iterations 4

Line search steps 5

Time (in seconds):

Preprocessor 0.0120

Residual evaluation 0.0000

Line search cost evaluation 0.0000

Jacobian evaluation 0.0949

Line search gradient evaluation 0.0472

Line search polynomial minimization 0.0000

Minimizer 0.1848

Postprocessor 0.0026

Total 0.1995

Termination: CONVERGENCE (Parameter tolerance reached. Relative step_norm: 5.665696e-07 <= 1.000000e-06.)

Sameer Agarwal

unread,

Feb 2, 2016, 9:45:56 AM2/2/16

to ceres-...@googlegroups.com

Jean Pierre,

My comments are inline.

<SNIP>

Problem setup
m_Options.minimizer_progress_to_stdout = false;
m_Options.logging_type = ceres::SILENT;
m_Options.minimizer_type = ceres::MinimizerType::LINE_SEARCH;

Why are you using the line_search minimizer?

Questions / Remarks:

OpenMP related:
With only one thread, the CPU load is between 10 and 20%, and the simulation (for a 200x200 size) runs at ~5fps. I can increase the number of threads, but it only increase the CPU load without inreasing the fps. If i set num_threads to 4 by example, the CPU load is 50% but the fps remain const. I have read somewhere on this google group that the general nonlinear solver wasn't multithreaded - that must be the reason.

This is likely because of the line search minimizer. Only the gradient evaluation is threaded. The bulk of the work is in the LBFGS update which is not threaded.

If you were to switch to the trust region minimizer, you will get better solution quality but the per iteration time will likely go up as there is more work to do.

CostFunction related:
Is it efficient to declare one costfunction per residual ? I have read that I could use one global cost function, but how the cost function can now on which residual it is working ?

Only in memory. All you need to do is to create the cost function once and just pass the same object to the AddResidualBlock method. Ceres maintains the binding between the parameter blocks and cost functions internally.

CostFunction* cost_function = new CostFunctor();

for (int j = 0; j < m_H; ++j)

{

for (int i = 0; i < m_L; ++i)

{

m_Problem.AddResidualBlock(cost_function nullptr, parameters[0], parameters[1], parameters[2], parameters[3], parameters[4]);

}

will do the needful.

Performance:
I think with only 10% of CPU load there is still room for improvement - but honestly, I don't know how to speed up the computation. Do you have any tips ?

I am a bit surprised by the 10% cpu load, maybe worth profiling more carefully?

Sameer

To view this discussion on the web visit https://groups.google.com/d/msgid/ceres-solver/dee93236-7f42-4b33-9c16-45b9cc8904b3%40googlegroups.com.

Sameer Agarwal

unread,

Feb 2, 2016, 9:47:39 AM2/2/16

to ceres-...@googlegroups.com

On Tue, Feb 2, 2016 at 4:52 AM Jean-Pierre Gervasoni <jean.pierr...@gmail.com> wrote:

Here is an example of summary:

Solver Summary (v 1.12.0-eigen-(3.2.92)-lapack-cxsparse-(2.3.0)-openmp)

Original Reduced
Parameter blocks 40000 40000
Parameters 40000 40000
Residual blocks 40000 40000
Residual 40000 40000

Minimizer LINE_SEARCH
Line search direction LBFGS (20)
Line search type BISECTION WOLFE

BISECTION is not a terribly efficient line search.

Given Used
Threads 1 1

Cost:
Initial 9.833939e+03
Final 5.913305e-04
Change 9.833938e+03

Minimizer iterations 4
Line search steps 5

Time (in seconds):
Preprocessor 0.0120

Residual evaluation 0.0000
Line search cost evaluation 0.0000
Jacobian evaluation 0.0949
Line search gradient evaluation 0.0472
Line search polynomial minimization 0.0000
Minimizer 0.1848

Postprocessor 0.0026
Total 0.1995

as you can see the bulk of the time is going in jacobian evaluation, but not that much, you should expect to get some improvement from using more threads but not a dramatic improvement.

Sameer

To view this discussion on the web visit https://groups.google.com/d/msgid/ceres-solver/8c827196-e63c-4d57-a6b7-0fc25d91d37f%40googlegroups.com.

Reply all

Reply to author

Forward