How can I best exploit the sparsity nature of my problem?


Jean-Pierre Gervasoni

Jan 25, 2016, 11:03:26 AM
to Ceres Solver
I am trying to use Ceres to solve the pressure/flow problem in a process simulation software. The problem to be solved is sparse. Here is an example of the Jacobian matrix obtained for a very simple problem:


0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
0 2.326 0 0 0 0 0 -1.211 0 0 0 0 0 0 0 0
1 0 2.973 -1.425 0 0 0 0 0 0 0 0 0 0 0 0
2 0 -1.475 2.876 0 -1.375 0 0 0 0 0 0 0 0 0 0
3 0 0 0 2.898 0 0 0 0 0 0 -1.487 -1.385 0 0 0
4 0 0 -1.426 0 2.777 0 0 0 0 0 0 0 0 0 -1.324
5 0 0 0 0 0 2.572 0 0 0 0 0 0 -1.216 0 -1.327
6 -1.15 0 0 0 0 0 2.448 0 0 -1.27 0 0 0 0 0
7 0 0 0 0 0 0 0 2.681 0 -1.271 0 -1.382 0 0 0
8 0 0 0 0 0 0 0 0 2.343 0 0 0 -1.218 0 0
9 0 0 0 0 0 0 -1.212 -1.327 0 2.566 0 0 0 0 0
10 0 0 0 -1.438 0 0 0 0 0 0 2.999 0 0 0 0
11 0 0 0 -1.436 0 0 0 -1.329 0 0 0 2.791 0 0 0
12 0 0 0 0 0 -1.274 0 0 -1.157 0 0 0 2.461 0 0
13 0 0 0 0 0 0 0 0 0 0 0 0 0 0.02309 0
14 0 0 0 0 -1.377 -1.272 0 0 0 0 0 0 0 0 2.677


Using the Boost cuthill_mckee_ordering routine, one can obtain the following reordered matrix:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
0 0.02309 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 2.973 -1.425 0 0 0 0 0 0 0 0 0 0 0 0
2 0 -1.475 2.876 -1.375 0 0 0 0 0 0 0 0 0 0 0
3 0 0 -1.426 2.777 -1.324 0 0 0 0 0 0 0 0 0 0
4 0 0 0 -1.377 2.677 -1.272 0 0 0 0 0 0 0 0 0
5 0 0 0 0 -1.327 2.572 -1.216 0 0 0 0 0 0 0 0
6 0 0 0 0 0 -1.274 2.461 -1.157 0 0 0 0 0 0 0
7 0 0 0 0 0 0 -1.218 2.343 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 2.326 -1.211 0 0 0 0 0
9 0 0 0 0 0 0 0 0 -1.15 2.448 -1.27 0 0 0 0
10 0 0 0 0 0 0 0 0 0 -1.212 2.566 -1.327 0 0 0
11 0 0 0 0 0 0 0 0 0 0 -1.271 2.681 -1.382 0 0
12 0 0 0 0 0 0 0 0 0 0 0 -1.329 2.791 -1.436 0
13 0 0 0 0 0 0 0 0 0 0 0 0 -1.385 2.898 -1.487
14 0 0 0 0 0 0 0 0 0 0 0 0 0 -1.438 2.999

I am currently trying to solve this using only one parameter block and one residual block:

auto pCostFunction = new CostFunctor(p_DynProblem, (int)m_NbVariables, (int)m_NbVariables);
m_Problem.AddParameterBlock(&m_Values[0], (int)m_NbVariables);
m_Problem.AddResidualBlock(pCostFunction, nullptr, &m_Values[0]);

My understanding of parameter and residual blocks leads me to believe that this could be best solved by dividing the above problem into 3, by defining 3 residual blocks:

residual block 1: parameter 0
residual block 2: parameters 1-2-3-4-5-6-7
residual block 3: parameters 8-9-10-11-12-13-14

But each sub-problem is still quite sparse. Isn't there a way to further exploit sparsity?




Sameer Agarwal

Jan 25, 2016, 11:07:52 AM
to Ceres Solver
Jean,

What is the size of the problem you are interested in solving? How many residuals and how many parameters?

Since we are dealing with general sparsity here, how about starting with every parameter being its own parameter block and every residual being its own residual block.

From there I would look at what residual computations have the same dependence on parameters and can be combined into a single residual block, and similarly what parameters co-occur almost always and should be combined into a single parameter block.

Sameer


--
You received this message because you are subscribed to the Google Groups "Ceres Solver" group.

Jean-Pierre Gervasoni

Jan 25, 2016, 11:21:08 AM
to Ceres Solver
Sameer,

The number of parameters is equal to the number of residuals. The problem size can vary from a handful of unknowns to several tens of thousands.

I could declare one parameter per parameter block and one residual per residual block. Does that mean that a parameter can contribute to different residual blocks?

Sameer Agarwal

Jan 25, 2016, 11:30:51 AM
to Ceres Solver

Yes, any parameter can be in any residual block.


Sameer Agarwal

Jan 25, 2016, 11:32:09 AM
to Ceres Solver

Or rather, any parameter block can contribute to any residual block.
Sameer


Jean-Pierre Gervasoni

Feb 2, 2016, 4:23:42 AM
to Ceres Solver
Dear Sameer,

I have set up a sample side project in order to tune the Ceres solver more easily. I chose 2D heat transfer because it exhibits some similarities with my problem (sparsity, problem size).

Compilation of Ceres under Windows
- Visual Studio 2015
- latest git sources
- Intel Math Kernel Library 
- miniglog

Compilation / linking issues (solved)
I modified PATH to allow MKL detection by CMake (FindBlas.cmake and FindLapack.cmake). I had to alter two source files:

/internal/ceres/covariance_impl.cc (the collapse clause is rejected as an unknown keyword by MSVC's OpenMP support):
original:   #pragma omp parallel for num_threads(num_threads) schedule(dynamic) collapse(2)
changed to: #pragma omp parallel for num_threads(num_threads) schedule(dynamic) // collapse(2)


\include\ceres\jet.h
jet.h(489): error C4996: 'j0': The POSIX name for this item is deprecated. Instead, use the ISO C and C++ conformant name: _j0. See online help for details.
j0 was replaced by _j0
j1 was replaced by _j1
jn was replaced by _jn

I chose miniglog because I had linking issues with gflags / glog. 

Problem setup
m_Options.minimizer_progress_to_stdout = false;
m_Options.logging_type = ceres::SILENT;
m_Options.minimizer_type = ceres::MinimizerType::LINE_SEARCH;
m_Options.linear_solver_type = ceres::LinearSolverType::SPARSE_NORMAL_CHOLESKY;
m_Options.use_inner_iterations = false;
m_Options.use_nonmonotonic_steps = false;
m_Options.line_search_direction_type = ceres::LineSearchDirectionType::LBFGS;
m_Options.line_search_type = ceres::LineSearchType::WOLFE;
m_Options.line_search_interpolation_type = ceres::LineSearchInterpolationType::BISECTION;
m_Options.use_approximate_eigenvalue_bfgs_scaling = true;
m_Options.parameter_tolerance = 1e-6;
m_Options.function_tolerance = 1e-6;
m_Options.gradient_tolerance = 1e-6;
m_Options.max_num_iterations = 2000;
m_Options.num_threads = 1;
m_Options.num_linear_solver_threads = 1;

Residual block definition (1 block per unknown)
for (int j = 0; j < m_H; ++j)
{
    for (int i = 0; i < m_L; ++i)
    {
        m_Problem.AddResidualBlock(new CostFunctor(...), nullptr, parameters[0], parameters[1], parameters[2], parameters[3], parameters[4]);
    }
}
There are m_H*m_L unknowns / residual blocks. A residual only depends on 5 parameters. 

CostFunctor constructor
The constructor is used to set some constants and size residual and parameter blocks:
set_num_residuals(1);
mutable_parameter_block_sizes()->resize(5, 1);

CostFunctor evaluate
The real problem is nonlinear; this one was set up just to play with the Ceres configuration.
if (jacobians != nullptr)
{
    if (jacobians[0] != nullptr) { jacobians[0][0] = -sz; }
    if (jacobians[1] != nullptr) { jacobians[1][0] = -sx; }
    if (jacobians[2] != nullptr) { jacobians[2][0] = 1.0 + 2.0*sz + 2.0*sx; }
    if (jacobians[3] != nullptr) { jacobians[3][0] = -sz; }
    if (jacobians[4] != nullptr) { jacobians[4][0] = -sx; }
}

residuals[0] = -sz*parameters[0][0] - sx*parameters[1][0] + (1.0 + 2.0*sz + 2.0*sx)*parameters[2][0] - sz*parameters[3][0] - sx*parameters[4][0] - T - Q*f;

Questions / Remarks:

OpenMP related:
With only one thread, the CPU load is between 10 and 20%, and the simulation (for a 200x200 grid) runs at ~5 fps. I can increase the number of threads, but that only increases the CPU load without increasing the fps. If I set num_threads to 4, for example, the CPU load is 50% but the fps remains constant. I have read somewhere on this Google group that the general nonlinear solver isn't multithreaded - that must be the reason.

CostFunction related:
Is it efficient to declare one cost function per residual? I have read that I could use one global cost function, but how can the cost function know which residual it is working on?

Performance:
I think with only 10% CPU load there is still room for improvement - but honestly, I don't know how to speed up the computation. Do you have any tips?

Thank you again, working with Ceres is fun (head scratching sometimes but fun) .

Best regards,

Jean-Pierre 

Jean-Pierre Gervasoni

Feb 2, 2016, 7:52:10 AM
to Ceres Solver
Here is an example of summary:

Solver Summary (v 1.12.0-eigen-(3.2.92)-lapack-cxsparse-(2.3.0)-openmp)

                                     Original                  Reduced
Parameter blocks                        40000                    40000
Parameters                              40000                    40000
Residual blocks                         40000                    40000
Residuals                               40000                    40000

Minimizer                         LINE_SEARCH
Line search direction         LBFGS (20)
Line search type               BISECTION WOLFE

                                       Given                     Used
Threads                                     1                        1

Cost:
Initial                          9.833939e+03
Final                            5.913305e-04
Change                           9.833938e+03

Minimizer iterations                        4
Line search steps                           5

Time (in seconds):
Preprocessor                           0.0120

  Residual evaluation                  0.0000
    Line search cost evaluation        0.0000
  Jacobian evaluation                  0.0949
    Line search gradient evaluation    0.0472
  Line search polynomial minimization  0.0000
Minimizer                              0.1848

Postprocessor                          0.0026
Total                                  0.1995

Termination:                      CONVERGENCE (Parameter tolerance reached. Relative step_norm: 5.665696e-07 <= 1.000000e-06.)

Sameer Agarwal

Feb 2, 2016, 9:45:56 AM
to ceres-...@googlegroups.com
Jean-Pierre,
My comments are inline.

<SNIP>

Problem setup
m_Options.minimizer_progress_to_stdout = false;
m_Options.logging_type = ceres::SILENT;
m_Options.minimizer_type = ceres::MinimizerType::LINE_SEARCH;

Why are you using the line_search minimizer? 
 
Questions / Remarks:

OpenMP related:
With only one thread, the CPU load is between 10 and 20%, and the simulation (for a 200x200 grid) runs at ~5 fps. I can increase the number of threads, but that only increases the CPU load without increasing the fps. If I set num_threads to 4, for example, the CPU load is 50% but the fps remains constant. I have read somewhere on this Google group that the general nonlinear solver isn't multithreaded - that must be the reason.

This is likely because of the line search minimizer. Only the gradient evaluation is threaded. The bulk of the work is in the LBFGS update which is not threaded.

If you were to switch to the trust region minimizer, you will get better solution quality but the per iteration time will likely go up as there is more work to do.
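As a concrete starting point, switching to the trust region minimizer is a small change to the options shown earlier. This is a sketch, not a tuned configuration; the fields and enums are standard Solver::Options settings:

```cpp
m_Options.minimizer_type = ceres::TRUST_REGION;
m_Options.trust_region_strategy_type = ceres::LEVENBERG_MARQUARDT;
// SPARSE_NORMAL_CHOLESKY exploits the Jacobian's sparsity in the linear solves.
m_Options.linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;
```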
 
CostFunction related:
Is it efficient to declare one cost function per residual? I have read that I could use one global cost function, but how can the cost function know which residual it is working on?

Only in memory. All you need to do is create the cost function once and pass the same object to the AddResidualBlock method. Ceres maintains the binding between the parameter blocks and cost functions internally.
 
CostFunction* cost_function = new CostFunctor();
for (int j = 0; j < m_H; ++j)
{
    for (int i = 0; i < m_L; ++i)
    {
        m_Problem.AddResidualBlock(cost_function, nullptr, parameters[0], parameters[1], parameters[2], parameters[3], parameters[4]);
    }
}

will do the needful.


Performance:
I think with only 10% of CPU load there is still room for improvement - but honestly, I don't know how to speed up the computation. Do you have any tips ?

I am a bit surprised by the 10% cpu load, maybe worth profiling more carefully?

Sameer
 

Sameer Agarwal

Feb 2, 2016, 9:47:39 AM
to ceres-...@googlegroups.com
On Tue, Feb 2, 2016 at 4:52 AM Jean-Pierre Gervasoni <jean.pierr...@gmail.com> wrote:
Here is an example of summary:

Solver Summary (v 1.12.0-eigen-(3.2.92)-lapack-cxsparse-(2.3.0)-openmp)

                                     Original                  Reduced
Parameter blocks                        40000                    40000
Parameters                              40000                    40000
Residual blocks                         40000                    40000
Residuals                               40000                    40000

Minimizer                         LINE_SEARCH
Line search direction         LBFGS (20)
Line search type               BISECTION WOLFE

BISECTION is not a terribly efficient line search.  
 

                                       Given                     Used
Threads                                     1                        1

Cost:
Initial                          9.833939e+03
Final                            5.913305e-04
Change                           9.833938e+03

Minimizer iterations                        4
Line search steps                           5

Time (in seconds):
Preprocessor                           0.0120

  Residual evaluation                  0.0000
    Line search cost evaluation        0.0000
  Jacobian evaluation                  0.0949
    Line search gradient evaluation    0.0472
  Line search polynomial minimization  0.0000
Minimizer                              0.1848

Postprocessor                          0.0026
Total                                  0.1995

As you can see, the bulk of the time is going into Jacobian evaluation, but not that much of it; you should expect some improvement from using more threads, but not a dramatic one.

Sameer


 