Storage of Jacobians


puzzlepaint

unread,
Aug 3, 2018, 12:40:13 PM8/3/18
to Ceres Solver
Hi,

when using Ceres to solve a problem, I noticed that (unless I am mistaken, in which case I am sorry for asking a dumb question :) ) Ceres always seems to store the residual Jacobians as a (sparse) matrix of size [total_num_residuals x total_num_effective_parameters] (at least, internal/ceres/compressed_row_jacobian_writer.cc allocates such a matrix), and this presumably consumes a lot of memory if the number of residuals is very high.

If that observation is correct, I am wondering why the Jacobians are always stored this way. With a normal-equations-based solver, wouldn't it often save large amounts of memory to store only the coefficients of the least-squares update equation, i.e. a (potentially sparse) matrix of size [total_num_effective_parameters x total_num_effective_parameters] and a vector of size [total_num_effective_parameters x 1], making the storage independent of the number of residuals? In other words, do what e.g. DenseNormalCholeskySolver does to accumulate these coefficients, but do it already during the residual Jacobian computation, so that one never has to store all the Jacobians at the same time. Or is there a benefit to storing the residual Jacobians individually?

Thanks,
Thomas

Sameer Agarwal

unread,
Aug 3, 2018, 2:40:51 PM8/3/18
to ceres-...@googlegroups.com
We use different Jacobian storage depending on the linear solver type chosen by the user.
For the cases where we actually construct the normal equations, it is possible to save memory by building the normal equations directly.

But then there is another tradeoff, and that is threading: we can quite simply thread the evaluation of the Jacobian without contention, because its rows do not interact. With normal equations, we would need per-parameter-block-pair mutexes to implement threading.


Sameer


--
You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ceres-solver/04be9e8e-b646-4b59-a105-e822f14d40a8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Oliver Woodford

unread,
Oct 16, 2018, 2:34:34 PM10/16/18
to Ceres Solver
Hi Sameer

I'm facing exactly this problem. Storing residuals and Jacobians requires too much memory, whereas storing gradient and Hessian is feasible (basically, num_residuals_per_block >> num_parameters_per_block). You talk about a way to build normal equations. But am I right in thinking that in the way that ceres is architected, you must store the residuals and Jacobians? Or is there a way to avoid this?

Thanks,
Olly

Sameer Agarwal

unread,
Oct 16, 2018, 2:36:30 PM10/16/18
to ceres-...@googlegroups.com

No, unfortunately not, short of significantly re-architecting the system.
What size of system are you trying to solve, and with which linear solver?
Can you share Summary::FullReport() for a typical (smaller) system?
Sameer

 


Oliver Woodford

unread,
Oct 16, 2018, 4:35:20 PM10/16/18
to Ceres Solver
This is the summary on a *much* smaller problem:

Solver Summary (v 2.0.0-eigen-(3.3.5)-lapack-suitesparse-(5.3.0)-cxsparse-(3.1.9)-acceleratesparse-eigensparse-no_openmp)

                                     Original                  Reduced
Parameter blocks                         1562                     1561
Parameters                               4784                     4777
Effective parameters                     4770                     4764
Residual blocks                          3510                     3510
Residuals                              673920                   673920

Minimizer                        TRUST_REGION

Dense linear algebra library            EIGEN
Trust region strategy     LEVENBERG_MARQUARDT

                                        Given                     Used
Linear solver                     DENSE_SCHUR              DENSE_SCHUR
Threads                                     1                        1
Linear solver ordering              AUTOMATIC                  1534,27
Schur structure                       192,3,6                    d,d,d

Cost:
Initial                          8.004881e+02
Final                            6.216486e+02
Change                           1.788396e+02

Minimizer iterations                       52
Successful steps                           47
Unsuccessful steps                          5

Time (in seconds):
Preprocessor                         0.010246

  Residual only evaluation           1.071377 (52)
  Jacobian & residual evaluation    12.653191 (47)
  Linear solver                      7.167313 (52)
Minimizer                           23.880280

Postprocessor                        0.000180
Total                               23.890706

Termination:                      CONVERGENCE (Function tolerance reached. |cost_change|/cost: 7.634968e-07 <= 1.000000e-06)

The reason I ask is that for my full problem size, the size of the Jacobian actually causes the int member num_nonzeros_ of BlockSparseMatrix to wrap around, so even if I wanted to run it on a machine with enough memory, I can't. If you want to support larger problems, the SparseMatrix container needs to use larger (or unsigned) types for the size and index values, and to explicitly check for saturation to detect overflow. Note that checking for overflow by asserting on a negative value (block_sparse_matrix.cc:79) won't always work; the count can wrap all the way around to a positive number again.

Oliver Woodford

unread,
Oct 16, 2018, 4:53:12 PM10/16/18
to Ceres Solver
Sameer,

I think what I need is a solver that computes g and H from r and J for each residual block, when the residual is evaluated, and then accumulates g and H in a dense or sparse matrix. Is the architecture such that I could do this by creating a new ProblemImpl or Program implementation? Or is the r-and-J formulation too deeply embedded?

Thanks,
Olly

Sameer Agarwal

unread,
Oct 16, 2018, 7:03:13 PM10/16/18
to ceres-...@googlegroups.com
Oliver,
Thanks for sharing. Some comments and questions

Is this a structure-from-motion problem? I am curious as to why you are using DENSE_SCHUR. The Schur structure being detected is quite weird; in fact, it's not clear to me that it's working at all. Do you really have a 192-dimensional parameter block? Also, how much sparsity do you have in your Jacobian?

 
The reason I ask is that for my full problem size, the size of Jacobian actually causes the member int num_nonzeros_ of BlockSparseMatrix to wrap around, so even if I wanted to run it on a machine with enough memory, I can't.

Hmm, this is a known bug.


I will look at fixing it.
 
If you want to support larger problems, the SparseMatrix container needs to use larger, unsigned types for the size and index values, and explicitly check for saturation to detect overflow. Note that checking for overflow by asserting on a negative value (block_sparse_matrix.cc:79) won't work all the time; you can wrap all the way round to a positive number again.

Yes, I have been hesitant to use long ints for indexing because they have a non-trivial memory cost on smaller problems.

Sameer


Sameer Agarwal

unread,
Oct 16, 2018, 7:05:25 PM10/16/18
to ceres-...@googlegroups.com
Oliver,
Unfortunately there is no simple way to do this :/
What is the size of the problem you are ultimately interested in solving?
Sameer


Oliver Woodford

unread,
Oct 16, 2018, 7:56:24 PM10/16/18
to Ceres Solver
Hi Sameer

Yes, my residual blocks are 192 long, with parameter blocks of 6, 6, 6, 6 and 3. It is a bipartite problem, similar in structure to a bundle adjustment problem: there are many of the last parameter block (like landmarks) and far fewer of the first four blocks (like cameras), and no residual block links two landmarks, so they should be automatically Schur-complemented out (I hope).

The problem I'm struggling with (which is one of the smaller ones I want to solve) has 294 "camera" blocks, 91319 "landmark" blocks and 535007 residual blocks. This leads to a memory requirement of 192*27*535007*8 bytes = 20.7 GB for the Jacobian alone. However, the dense Schur complement matrix should be only (294*6) x (294*6) (a mere 23.7 MB). So as you can see, it's a large-scale problem. But then, Ceres is a large-scale solver :).

Best,
Olly

Oliver Woodford

unread,
Oct 16, 2018, 7:58:15 PM10/16/18
to Ceres Solver
Sorry, not bipartite! The camera blocks are linked through residuals. But the landmarks can easily be complemented out still.

Sameer Agarwal

unread,
Oct 16, 2018, 8:16:55 PM10/16/18
to ceres-...@googlegroups.com
Thanks for the background Oliver.


So your problem is primarily one of building a large enough BlockSparseMatrix more than anything else. I think a better path here is to fix the indexing problem in block_sparse_matrix to allow for larger matrices, rather than trying to introduce a Gauss-Newton Hessian abstraction.

Sameer


Markus Moll

unread,
Oct 17, 2018, 2:18:49 AM10/17/18
to ceres-...@googlegroups.com

Hi


Just a silly idea...

Would it be prohibitively expensive to compute your Hessian (H = J^T J) and gradient (g = J^T r) at every step, and to use a dense Cholesky factorization of H (H = L^T L) to create a fake Jacobian (L) and residual (L^-T g)? That should at least work, shouldn't it? (It would be horribly wasteful computation-wise, but...)

Markus
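[Editor's note] For what it's worth, the algebra of Markus's construction does reproduce the Gauss-Newton normal equations; a sketch using his H = L^T L convention:

```latex
% With the Cholesky factorization H = L^{T} L, define the fake Jacobian
% and fake residual
\tilde{J} = L, \qquad \tilde{r} = L^{-T} g .
% The normal equations built from (\tilde{J}, \tilde{r}) then match the
% originals:
\tilde{J}^{T}\tilde{J} = L^{T} L = H = J^{T} J, \qquad
\tilde{J}^{T}\tilde{r} = L^{T} L^{-T} g = g = J^{T} r .
```

The surrogate residual norm, \|\tilde{r}\|^2 = g^T H^{-1} g, differs from \|r\|^2, however, so trust-region cost bookkeeping would not carry over directly.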

Markus Moll

unread,
Oct 17, 2018, 4:52:41 AM10/17/18
to ceres-...@googlegroups.com

Sorry for the noise; I hadn't thought this through. I guess you can safely ignore that "idea".


Oliver Woodford

unread,
Oct 17, 2018, 2:50:09 PM10/17/18
to Ceres Solver
Hi Markus

I had a similar idea, which was, for each residual block, to do an SVD of J to produce a J and r which are only 27x27 and 27x1, instead of 192x27 and 192x1. This would reduce memory requirements by a factor of 7.1. So not totally silly.

Olly
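[Editor's note] A thin QR factorization achieves the same per-block compression Oliver describes (and is cheaper than an SVD); a sketch of why the normal equations are preserved:

```latex
% Thin QR of one residual block's Jacobian, J = Q R, with
% J \in \mathbb{R}^{192 \times 27}, \; Q^{T} Q = I, \;
% R \in \mathbb{R}^{27 \times 27}:
J^{T} J = R^{T} Q^{T} Q R = R^{T} R, \qquad
J^{T} r = R^{T} (Q^{T} r) .
% Storing only (R, \, Q^{T} r), i.e. 27 x 27 and 27 x 1, preserves the
% block's contribution to the normal equations, a 192/27 \approx 7.1x
% saving, matching Oliver's figure.
```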

Sameer Agarwal

unread,
Oct 18, 2018, 2:39:33 PM10/18/18
to ceres-...@googlegroups.com
I am looking into changing the indexing inside ceres, it is a non-trivial change and will take some time.
Sameer



Oliver Woodford

unread,
Oct 22, 2018, 6:30:39 PM10/22/18
to Ceres Solver



Sameer

Changing the type of the num_nonzeros_ member variable to int64_t (plus associated changes) fixed the issue for me. No need to change the size of the indices.

Olly

  

Sameer Agarwal

unread,
Oct 22, 2018, 6:33:15 PM10/22/18
to ceres-...@googlegroups.com
Interesting, I was making a much deeper change.
Would you care to share your change on Gerrit?


Oliver Woodford

unread,
Oct 22, 2018, 9:23:28 PM10/22/18
to Ceres Solver
I can't, unfortunately. My company has not signed Google's CLA. But it's very small. Change the type in block_sparse_matrix.h, then fix all the compile errors by changing the types elsewhere, and static_cast all writes to the variable. 7 files changed with 13 additions and 13 deletions.

Sameer Agarwal

unread,
Oct 23, 2018, 9:37:21 AM10/23/18
to ceres-...@googlegroups.com
No problem, I think I know what needs to be done.

Robert

unread,
Aug 9, 2022, 11:13:40 AM8/9/22
to Ceres Solver
Hi all,

my apologies for digging up this topic in 2022, but I am currently running into the same problem as the original author and Oliver.

I have a few hundred unknowns but several millions of observations, which I currently need to limit due to memory constraints (~8 million for 32GB RAM or ~16 million for 64GB).
It would be nice to use a normal-equations-based solver and "stream" observations into the Hessian approximation J^T J and the right-hand side J^T f, basically accumulating J_k^T J_k and J_k^T f_k.
(I would probably need to accumulate block-wise in multiple steps, i.e. stream in blocks of a few hundreds, and then add those blocks, to avoid numerical cancellation effects, but that will be a technicality to solve afterwards.)

Is there a way Ceres supports this by now, or do you have any hints on how I could use the existing framework without breaking too many paradigms/mechanics? I'm thinking of setting up "partial" problems and naively extracting/merging the partial Jacobians.

Any hints are appreciated
- Robert

Sameer Agarwal

unread,
Aug 9, 2022, 11:23:08 AM8/9/22
to ceres-...@googlegroups.com
Robert,
I looked into this issue, and going from 32-bit indices to 64-bit indices is a very large change to the Ceres Solver internals.
Currently we do not support a way for you to input J'J instead of J :/

Having said that, if you only have a few hundred unknowns, do you really need millions of observations to estimate them?
Sameer



Sameer Agarwal

unread,
Aug 12, 2022, 9:59:41 AM8/12/22
to ceres-...@googlegroups.com
Robert,
Your particular case, where you have a dense Jacobian and are using the DENSE_NORMAL_CHOLESKY solver, can likely be dealt with. It is still a large change, but smaller than the one that requires changing all the indexing across all matrices.

If you are not using any loss functions, then another path would be for you to hack TinySolver for your purposes. There you can change TinySolver to compute J'J directly.
Sameer



On Thu, Aug 11, 2022 at 9:41 PM Robert <winkler...@gmx.net> wrote:
Hi Sameer,

thanks for your quick reply! I understand that changing something that is barely needed/requested is not a good investment of time.

In our case, we observed that we get better results when using very many (up to tens of millions of) data points, probably because not all of the unknowns are equally observable and it is difficult to automatically "pre-select" a good batch of data.
For now, I will probably try some "trivial block Kaczmarz" approach, i.e. using different subsets of the observations in different iterations, and see whether it converges to an accurate solution (or at all).

Александр Иванов

unread,
Aug 18, 2022, 9:23:02 AM8/18/22
to Ceres Solver
Hi!

I am not sure if this will help. I faced the same problem as you (the total number of nonzero Jacobian elements in my problem is about 3426464133 > 2^31). As was recommended above, I manually set
```
ceres::Solver::Options solver_options;
solver_options.linear_solver_type = ceres::DENSE_NORMAL_CHOLESKY;
```
It seems that in this case no full Jacobian is allocated, but at the same time no threading is used. At least in my case it worked.

Александр Иванов

unread,
Aug 18, 2022, 9:31:14 AM8/18/22
to Ceres Solver
It seems that the main problem is the silent integer overflow. Since changing the indexing is a big change, maybe adding a more verbose message to the num_nonzeros_ > 0 check is an option, perhaps with a recommendation to change the solver type. But I would also add a check to any operation that increases num_nonzeros_, since according to the C++ standard signed integer overflow is undefined behavior (unlike unsigned integer overflow, which wraps modulo 2^32 for uint32_t).
