Suggestion: Method to obtain covariance submatrix without JTJ inversion


Nima Keivan

May 12, 2013, 6:14:50 PM
to ceres-...@googlegroups.com
I have noticed posts on the forum regarding obtaining covariance values for solutions computed through Ceres. I'm also in this basket as I need to obtain marginal distributions of parameters solved by Ceres. 

Unfortunately, obtaining the full covariance matrix will require inversion of the Hessian approximation JTJ. Some recent posts have inquired about obtaining the Jacobian at the solution, presumably in order to undertake the JTJ inversion operation outside Ceres. The new Evaluation feature (introduced in 1.5 I believe) is very useful in this regard, but it hits the auto-diff pipeline a second time after the solve, which can be undesirable at times. Furthermore, inverting JTJ for large problems might simply be too costly.

I have been looking into this issue and it seems there is an elegant solution to obtaining a subset of the covariance matrix without needing to calculate it in its entirety [1][2]. It seems to be only applicable to factored approaches, which are the Cholesky and QR solvers in Ceres. In both cases, a factor matrix R is internally calculated during the solve process. 

____________________________
Some analysis ahead; if you'd just like to see the proposed implementation, feel free to skip this section.

In short, a particular column of the covariance matrix C can be calculated by solving the following:

(1) R^T R C_i = e_i

Where R is the Cholesky or QR factor matrix, C_i is the covariance column of interest, and e_i is the i-th column of the identity matrix. This relation holds since C = (J^T J)^{-1} and R^T R = J^T J.

(1) can be solved using forward/backward substitution as in cholmod_solve(), and will produce the i-th column of C. This can be useful for state estimation applications where we are only interested in the covariance columns of the latest state X. If X has dimension n, we require n solves to obtain the n covariance columns containing the variances of X along with the covariances with all other parameters. Note that X is not the full solution state vector, but a small subset of it, corresponding to the parameters for which we are interested in obtaining covariances.

If we are only interested in the variance of X, we can further optimize the calculation. In this case, we only need the n x n submatrix of C that contains the variances on its diagonal. If we place the parameters of X at the end of the state vector, then due to the triangularity of R, only n substitutions are needed per column of covariance. This requires a total of n^2 substitutions to obtain the variances of X, which is a massive saving over inverting a potentially enormous J^T J, or solving for n full columns of C. The cholmod_solve2() function already supports solving for a subset of the state vector; all that is needed for efficiency is that this subset be at the end.
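For concreteness, here is a minimal dense sketch of (1) in Eigen (the function name and the use of a dense LLT are illustrative, not part of the proposal): one forward and one backward substitution against the Cholesky factor yield the i-th covariance column.

```cpp
#include <Eigen/Dense>

// Dense illustration of (1): C_i = (J^T J)^{-1} e_i via two triangular
// solves with the Cholesky factor, where J^T J = R^T R and R = L^T.
Eigen::VectorXd CovarianceColumn(const Eigen::MatrixXd& J, int i) {
  const Eigen::MatrixXd JtJ = J.transpose() * J;
  const Eigen::LLT<Eigen::MatrixXd> llt(JtJ);  // JtJ = L * L^T
  const Eigen::VectorXd e_i = Eigen::VectorXd::Unit(JtJ.cols(), i);
  const Eigen::VectorXd y = llt.matrixL().solve(e_i);  // forward:  R^T y = e_i
  return llt.matrixU().solve(y);                       // backward: R C_i = y
}
```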

____________________________
Implementation details:

The proposed implementation would be to allow the user to "select" a particular parameter block for covariance extraction. Ceres would then place this parameter block at the end of the state vector when calculating the factors. 

In the particular solver implementation, the appropriate identity column corresponding to each parameter (e_i) is used alongside the factor matrix R to solve for either the whole column (variances and cross-covariances) or just the n x n covariance submatrix, where n is the size of the Ceres parameter block. In the latter case, only the required number of substitutions is performed, either by taking a submatrix of R (in the case of Eigen-based solvers) or by using cholmod_solve2() for sparse solvers.
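To make the "submatrix of R" route concrete (a derivation spelled out here, not taken from the post): partitioning the upper-triangular factor with the selected block last, the trailing n x n block of C = (R^T R)^{-1} depends only on the trailing corner R22, since C_22 = R22^{-1} R22^{-T} = (R22^T R22)^{-1}. A dense sketch, assuming R has already been computed and the names are illustrative:

```cpp
#include <Eigen/Dense>

// Sketch: with the selected parameters ordered last, only the trailing
// n x n corner R22 of the upper-triangular factor R is needed, because
// the trailing block of C = (R^T R)^{-1} is R22^{-1} * R22^{-T}.
Eigen::MatrixXd TailCovariance(const Eigen::MatrixXd& R, int n) {
  const Eigen::MatrixXd R22 = R.bottomRightCorner(n, n);
  // Invert the small triangle by back substitution against the identity.
  const Eigen::MatrixXd R22_inv = R22.triangularView<Eigen::Upper>().solve(
      Eigen::MatrixXd::Identity(n, n));
  return R22_inv * R22_inv.transpose();
}
```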

If presented as a feature, I believe this would greatly streamline the use of Ceres in state estimation and filtering applications, and would also obviate the need to calculate and manually manipulate the Jacobian/Hessian. However, my exposure to the Ceres codebase is still limited, so I may have missed some major roadblocks to implementation.

Sameer Agarwal

May 12, 2013, 6:42:53 PM
to ceres-...@googlegroups.com
Nima,

On Sun, May 12, 2013 at 3:14 PM, Nima Keivan <nim...@gmail.com> wrote:
I have noticed posts on the forum regarding obtaining covariance values for solutions computed through Ceres. I'm also in this basket as I need to obtain marginal distributions of parameters solved by Ceres. 

Are you primarily interested in the block diagonals of the covariance matrix, or the off-diagonal entries too?
 
Unfortunately, obtaining the full covariance matrix will require inversion of the Hessian approximation JTJ. Some recent posts have inquired about obtaining the Jacobian at the solution, presumably in order to undertake the JTJ inversion operation outside Ceres. The new Evaluation feature (introduced in 1.5 I believe) is very useful in this regard, but it hits the auto-diff pipeline a second time after the solve, which can be undesirable at times. Furthermore, inverting JTJ for large problems might simply be too costly.

How big is your Jacobian, and how much time are you spending on evaluating it? If the Jacobian evaluation is a performance bottleneck, I am quite curious to hear about it.

As for inversion: inverting all of J'J is not feasible, but I do not think you need to do that.


cholmod-extra is one way to only evaluate the entries that you need once you have the Cholesky factorization. Part of the trick here would be to cache the symbolic factorization and re-use it across subsequent calls to the Cholesky factorization routines.
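At the CHOLMOD level, that caching is the standard analyze/factorize split; a sketch (illustrative names, not code from Ceres or the thread): the symbolic analysis depends only on the sparsity pattern, so it can be reused for as long as the pattern is unchanged.

```cpp
#include <cholmod.h>

// Sketch of caching the symbolic factorization: cholmod_analyze() depends
// only on the sparsity pattern of A, so it can run once and be reused;
// cholmod_factorize() redoes only the numeric work for new values in A.
void FactorizeWithCachedSymbolic(cholmod_sparse* A, cholmod_factor** L,
                                 cholmod_common* cc) {
  if (*L == NULL) {
    *L = cholmod_analyze(A, cc);  // symbolic: cache across solves
  }
  cholmod_factorize(A, *L, cc);   // numeric: run per solve
  // ... cholmod_solve() / cholmod_solve2() against *L ...
}
```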

I have been looking into this issue and it seems there is an elegant solution to obtaining a subset of the covariance matrix without needing to calculate it in its entirety [1][2]. It seems to be only applicable to factored approaches, which are the Cholesky and QR solvers in Ceres. In both cases, a factor matrix R is internally calculated during the solve process. 

____________________________
Some analysis ahead; if you'd just like to see the proposed implementation, feel free to skip this section.

In short, a particular column of the covariance matrix C can be calculated by solving the following:

(1) R^T R C_i = e_i

Where R is the Cholesky or QR factor matrix, C_i is the covariance column of interest, and e_i is the i-th column of the identity matrix. This relation holds since C = (J^T J)^{-1} and R^T R = J^T J.

(1) can be solved using forward/backward substitution as in cholmod_solve(), and will produce the i-th column of C. This can be useful for state estimation applications where we are only interested in the covariance columns of the latest state X. If X has dimension n, we require n solves to obtain the n covariance columns containing the variances of X along with the covariances with all other parameters. Note that X is not the full solution state vector, but a small subset of it, corresponding to the parameters for which we are interested in obtaining covariances.

If we are only interested in the variance of X, we can further optimize the calculation. In this case, we only need the n x n submatrix of C that contains the variances on its diagonal. If we place the parameters of X at the end of the state vector, then due to the triangularity of R, only n substitutions are needed per column of covariance. This requires a total of n^2 substitutions to obtain the variances of X, which is a massive saving over inverting a potentially enormous J^T J, or solving for n full columns of C. The cholmod_solve2() function already supports solving for a subset of the state vector; all that is needed for efficiency is that this subset be at the end.

I have just seen cholmod_solve2(), but I have not played with it enough to have a sense of its performance. I am inclined to think that the cholmod-extra solution is going to be much faster and more flexible in general, since it allows you to compute the entries that you desire directly.
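For reference, a single-column solve with cholmod_solve2() might look like the following sketch (the function name is hypothetical; `factor` and `cc` are assumed to come from the surrounding solver). Passing a sparse Bset tells CHOLMOD the right-hand side has a single nonzero, so it computes only the reachable subset of the solution and returns that pattern in Xset.

```cpp
#include <cholmod.h>

// Sketch: solve (J^T J) x = e_i against a precomputed factor, letting
// cholmod_solve2() exploit the sparsity of the right-hand side.
cholmod_dense* SolveForColumn(cholmod_factor* factor, int i, int n,
                              cholmod_common* cc) {
  cholmod_dense* e_i = cholmod_zeros(n, 1, CHOLMOD_REAL, cc);
  static_cast<double*>(e_i->x)[i] = 1.0;
  // Pattern of the right-hand side: a single nonzero in row i.
  cholmod_sparse* b_set = cholmod_dense_to_sparse(e_i, 1, cc);
  cholmod_dense* x = NULL;
  cholmod_sparse* x_set = NULL;  // pattern of the entries of x computed
  cholmod_dense* y = NULL;  // workspace; a real impl would reuse across calls
  cholmod_dense* e = NULL;  // workspace; a real impl would reuse across calls
  cholmod_solve2(CHOLMOD_A, factor, e_i, b_set, &x, &x_set, &y, &e, cc);
  cholmod_free_dense(&y, cc);
  cholmod_free_dense(&e, cc);
  cholmod_free_sparse(&x_set, cc);
  cholmod_free_sparse(&b_set, cc);
  cholmod_free_dense(&e_i, cc);
  return x;  // caller owns x; entries outside the solved pattern are undefined
}
```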

 ____________________________
Implementation details:

The proposed implementation would be to allow the user to "select" a particular parameter block for covariance extraction. Ceres would then place this parameter block at the end of the state vector when calculating the factors. 

This won't quite work, since different linear solvers have differing ordering requirements, e.g., the Schur-complement-based solvers which are used in SLAM as well as bundle adjustment.

If presented as a feature, I believe this would greatly streamline the use of Ceres in state estimation and filtering applications, and would also obviate the need to calculate and manually manipulate the Jacobian/Hessian. However, my exposure to the Ceres codebase is still limited, so I may have missed some major roadblocks to implementation.

Before we go down the road of including this functionality in the main solver, instead of as a post-processing step as is currently our plan, I would like to understand the performance bottleneck a bit better. Is it the Jacobian evaluation, the Cholesky factorization, or the inversion?

Could you tell us a bit more about your workload: the size of the problem, which part of the covariance matrix you are interested in, etc.?

Thanks,
Sameer


 


Keir Mierle

May 12, 2013, 7:02:06 PM
to ceres-...@googlegroups.com
Also if you would be willing to share your cost function, that would be useful as well.

Nima Keivan

May 12, 2013, 7:27:20 PM
to ceres-...@googlegroups.com
Hi Sameer, 

Are you primarily interested in the block diagonals of the covariance matrix, or the off-diagonal entries too?
 
At this point I'm primarily interested in the diagonals.
 
How big is your Jacobian, and how much time are you spending on evaluating it? If the Jacobian evaluation is a performance bottleneck, I am quite curious to hear about it.
 
We are looking at one to several thousand parameters. The Jacobian calculation is definitely not the bottleneck; however, I would prefer not to do it twice if I can avoid it.

As for inversion: inverting all of J'J is not feasible, but I do not think you need to do that.


cholmod-extra is one way to only evaluate the entries that you need once you have the Cholesky factorization. Part of the trick here would be to cache the symbolic factorization and re-use it across subsequent calls to the Cholesky factorization routines.

I had not seen cholmod-extra before. From what I gather, this still calculates the entire inverse, but exploits the sparsity structure that it finds in the factor. As you pointed out, it would be feasible to cache the factor matrix and re-use it later for specific solvers. I will look into how fast this runs. However, I don't see where I can specify the exact submatrix coordinates in cholmod-extra; it seems to return the full inverse? That said, the recursive algorithm in the reference can calculate any particular member of the inverse. I'm not sure whether this is faster than the substitution method or not.

This won't quite work, since different linear solvers have differing ordering requirements, e.g., the Schur-complement-based solvers which are used in SLAM as well as bundle adjustment.

In this case I believe the substitution will be more expensive, as the ordering will necessitate substituting equations which are not directly solving for the required inverse values.
 
Before we go down the road of including this functionality in the main solver, instead of as a post-processing step as is currently our plan, I would like to understand the performance bottleneck a bit better. Is it the Jacobian evaluation, the Cholesky factorization, or the inversion?

I think this might change from application to application. For simple problems with large numbers of parameters, the inversion might dominate, in which case using the already-computed factor in the solver makes sense. In our application, due to the large number of parameters, I would guess that the factorization is the bottleneck in the optimization, as the Jacobians are relatively simple. However, for the same reason, I would still be averse to re-evaluating the Jacobians just to compute variances.

Could you tell us a bit more about your workload: the size of the problem, which part of the covariance matrix you are interested in, etc.?

The application is for a real time SLAM system. The parameter count varies depending on the number of landmarks tracked, but is more than 1000 parameters. The requirement is to obtain the variances for the current pose, which would be a 6x6 submatrix in the covariance matrix.

I might have a deeper look into cholmod-extra in order to compare its performance to substitution-based methods. I think for our needs, both of these solutions would work quite well.

Thanks,
Nima

Sameer Agarwal

May 12, 2013, 7:38:19 PM
to ceres-...@googlegroups.com


I had not seen cholmod-extra before. From what I gather, this still calculates the entire inverse, but exploits the sparsity structure that it finds in the factor. As you pointed out, it would be feasible to cache the factor matrix and re-use it later for specific solvers. I will look into how fast this runs. However, I don't see where I can specify the exact submatrix coordinates in cholmod-extra; it seems to return the full inverse? That said, the recursive algorithm in the reference can calculate any particular member of the inverse. I'm not sure whether this is faster than the substitution method or not.

From my understanding it does not compute the entire inverse. You can give it a binary matrix mask corresponding to the entries that you want to compute and it will just compute those.

 
Before we go down the road of including this functionality in the main solver, instead of as a post-processing step as is currently our plan, I would like to understand the performance bottleneck a bit better. Is it the Jacobian evaluation, the Cholesky factorization, or the inversion?

I think this might change from application to application. For simple problems with large numbers of parameters, the inversion might dominate, in which case using the already-computed factor in the solver makes sense. In our application, due to the large number of parameters, I would guess that the factorization is the bottleneck in the optimization, as the Jacobians are relatively simple. However, for the same reason, I would still be averse to re-evaluating the Jacobians just to compute variances.

Fair enough.
 

Could you tell us a bit more about your workload: the size of the problem, which part of the covariance matrix you are interested in, etc.?

The application is for a real time SLAM system. The parameter count varies depending on the number of landmarks tracked, but is more than 1000 parameters. The requirement is to obtain the variances for the current pose, which would be a 6x6 submatrix in the covariance matrix.

So in this case, it is actually worth writing something more specialized, basically something which will do the Schur complement trick and then invert the reduced camera matrix.
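For readers following along, the identity behind this is the standard block-inverse result (notation added here, not from the thread): with the parameters split into poses x_p and landmarks x_l,

```latex
H = J^\top J = \begin{bmatrix} U & W \\ W^\top & V \end{bmatrix},
\qquad
\operatorname{Cov}(x_p) = \bigl(H^{-1}\bigr)_{pp} = \bigl(U - W V^{-1} W^\top\bigr)^{-1},
```

so only the block-diagonal landmark matrix V has to be inverted, and the dense work happens at the size of the small pose block, i.e., the reduced camera matrix.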
 
I might have a deeper look into chomod-extra in order to compare its performance to substitution-based methods. I think for our needs, both of these solutions would work quite well.

It would be great if you are able to experiment with them and report back. It would help decide which of these two algorithmic approaches we should use.
 
Thanks,
Sameer


Thanks,
Nima

Sameer Agarwal

May 12, 2013, 10:33:19 PM
to ceres-...@googlegroups.com
Nima,

I looked at the cholmod_solve2 function a bit more closely and I think it's an excellent idea and offers a fairly robust way of estimating the sparse inverse of J'J.

I am going to write a small covariance estimation routine using it and see how that does.  You can track the progress on


One question I have for you and for others planning on using this functionality is: do you expect the sparsity structure of the problem to remain the same across calls or not? I am asking to see if it's worth caching the symbolic factorization. I am going to start my implementation assuming that caching is not worth it, and go from there.

Sameer

Nima Keivan

May 13, 2013, 10:42:52 AM
to ceres-...@googlegroups.com
I agree that it might not be worth caching the factorization matrix between solves. Just curious, which method are you going to use to get the covariance using cholmod_solve2? Would you calculate it during the solve and pass it back via the problem structure, or would it be calculated after the solve?

You expressed an interest in keeping the covariance calculation outside the main solver. Do you think it would be possible to get access to the factor matrix for selected solvers? That way the user could opt to use it as they wish, or they could use potential Ceres functions which would use the factor to calculate sections of the covariance matrix. I'm not sure if this would work with all solvers in Ceres though.

Sameer Agarwal

May 13, 2013, 11:06:44 AM
to ceres-...@googlegroups.com
Nima,

On Mon, May 13, 2013 at 7:42 AM, Nima Keivan <nim...@gmail.com> wrote:
I agree that it might not be worth caching the factorization matrix between solves. Just curious, which method are you going to use to get the covariance using cholmod_solve2? Would you calculate it during the solve and pass it back via the problem structure, or would it be calculated after the solve?

As a post process after the solve. 

You expressed an interest in keeping the covariance calculation outside the main solver. Do you think it would be possible to get access to the factor matrix for selected solvers? That way the user could opt to use it as they wish, or they could use potential Ceres functions which would use the factor to calculate sections of the covariance matrix. I'm not sure if this would work with all solvers in Ceres though.

There are several problems with giving access to the factors:

1. Factors are permuted and the user getting access to the permutation is hard. 
2. The matrix being factorized is not J'J. It is scaled for numerical stability and in the case of LM, a diagonal matrix has been added to it. 

So even if the factors were available to you, they would not be useful. Further there is the issue of correctly handling local parameterizations.

This is why I am inclined to pursue my current route, which is going to be a covariance object to which you give the problem and the set of blocks in the covariance matrix that you are interested in. It figures out the cheapest way to do the computation and gives you random access to covariances.

Sameer

Sameer Agarwal

May 19, 2013, 1:32:20 PM
to ceres-...@googlegroups.com
Ceres now has covariance estimation support checked into the master branch.
Please take a look at include/ceres/covariance.h and let us know if it suits your needs.
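Usage, going by the documentation in that header, looks roughly like the following sketch (`problem` and `pose` are illustrative names; assume `pose` is a 6-dimensional parameter block):

```cpp
#include <utility>
#include <vector>
#include "ceres/covariance.h"
#include "ceres/problem.h"

// Ask the Covariance object for the 6x6 covariance block of `pose`.
bool PoseCovariance(ceres::Problem* problem, const double* pose,
                    double covariance_pose[6 * 6]) {
  ceres::Covariance::Options options;
  ceres::Covariance covariance(options);

  std::vector<std::pair<const double*, const double*> > blocks;
  blocks.push_back(std::make_pair(pose, pose));

  if (!covariance.Compute(blocks, problem)) {
    return false;  // e.g., J^T J was rank deficient
  }
  // On success, the requested block is written out in row-major order.
  return covariance.GetCovarianceBlock(pose, pose, covariance_pose);
}
```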

Sameer

Sergey Sharybin

May 19, 2013, 1:42:19 PM
to ceres-...@googlegroups.com
Hey Sameer!

This is just awesome news! Unfortunately, I'm very much exhausted today; I will give it a try for our keyframe selection code tomorrow and let you guys know.

Thanks for the commit! :)
With best regards, Sergey Sharybin

Nima Keivan

May 20, 2013, 12:08:08 PM
to ceres-...@googlegroups.com
Thank you Sameer, that was very quick! I just tried to include it in our project, and got the following compile error:

invalid application of 'sizeof' to incomplete type 'ceres::internal::CovarianceImpl'

It seems as though the destructor of the scoped_ptr instance inside Covariance doesn't deal well with the forward declaration of CovarianceImpl. The problem was fixed when I added an empty destructor to covariance.h (following the pattern of problem.h). Is this a problem that can be solved without modifying covariance.h? Sorry if I missed something obvious.
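For anyone hitting the same compile error, the pattern Nima describes looks like this (a minimal sketch, not the actual Ceres header): declare the destructor where CovarianceImpl is incomplete and define it in the .cc file where it is complete, so scoped_ptr never instantiates delete against the forward declaration.

```cpp
// covariance.h (sketch)
namespace ceres {
namespace internal {
class CovarianceImpl;  // forward declaration only; sizeof is unknown here
}  // namespace internal

class Covariance {
 public:
  ~Covariance();  // declared here, defined where CovarianceImpl is complete
 private:
  internal::scoped_ptr<internal::CovarianceImpl> impl_;
};
}  // namespace ceres

// covariance.cc (sketch)
// #include "covariance_impl.h"        // CovarianceImpl is now complete
// ceres::Covariance::~Covariance() {} // delete instantiated safely here
```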

Sameer Agarwal

May 20, 2013, 12:11:45 PM
to ceres-...@googlegroups.com
thanks for the bug report. Fixing now.
Sameer

Sameer Agarwal

May 20, 2013, 12:17:48 PM
to ceres-...@googlegroups.com
https://ceres-solver-review.googlesource.com/#/c/3262/

is submitted for review and should be checked in whenever Keir wakes up :)

Sameer


Sameer Agarwal

May 20, 2013, 4:01:31 PM
to ceres-...@googlegroups.com
Merged.
Sameer

Nima Keivan

May 21, 2013, 11:57:45 AM
to ceres-...@googlegroups.com
Thanks Sameer. The covariance calculation seems to be working and producing valid results on our SLAM setup, based on a cursory observation. I will do a more detailed analysis and report back with any inconsistencies. The main issue for us, however, is that the solve time has more than doubled. I'm guessing this is because variable reordering is not being performed prior to the Cholesky decomposition, so it takes even longer than the normal solve operation?

Do you have any plans to potentially improve the performance of the covariance toolchain? I know you were previously thinking about caching the Cholesky factor. Could this potentially be used to perform the covariance calculation without necessitating another Jacobian evaluation and Cholesky decomposition? I understand that there is some reordering that takes place in the general solve case. However, if the ordering were known, my understanding is that it would still be possible to reconstruct the original covariance submatrix if the columns of the full reordered covariance matrix were available. Obviously this would only work with the sparse/dense Cholesky solvers, but it would be a major speedup for those cases.
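The reordering point can be made precise (standard bookkeeping, spelled out here rather than taken from the post): if the solver factorizes with a fill-reducing permutation P, then

```latex
P \, (J^\top J) \, P^\top = R^\top R
\quad\Longrightarrow\quad
C = (J^\top J)^{-1} = P^\top (R^\top R)^{-1} P,
```

so entry (i, j) of C is just the entry of (R^T R)^{-1} at the permuted indices, and a cached factor together with its permutation would in principle suffice to recover the original covariance submatrix.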

Just some ideas. Thanks again for getting this working so quickly!

Sameer Agarwal

May 21, 2013, 12:43:04 PM
to ceres-...@googlegroups.com
Nima,
The current implementation is not particularly optimized. I wanted to get correctness and API sorted out before working on performance. That said, before we dive into ways of fixing things I would like to understand where the time is going.  So a couple of questions.  

What is the size/structure of your Jacobian?
What version of SuiteSparse are you using? If you are using something older than 4.2.0, can you upgrade and report back?

My suspicion is that it's not the factorization but rather the inversion which is taking time. There are two ways of accelerating the inversion computation: one by doing a sparse solve, the other by threading. I now know how to do the latter; the former will require some indexing gymnastics that I need to figure out. But both of them are on my radar.

Is it possible for you to share an example problem/code that I can run and benchmark?  Also when you say the solve time has doubled, how much actual wall time are we talking about?

Sameer

Nima Keivan

May 21, 2013, 1:13:34 PM
to ceres-...@googlegroups.com
What is the size/structure of your Jacobian?
What version of SuiteSparse are you using? If you are using something older than 4.2.0, can you upgrade and report back?
 
We are at about 600 parameters and 5000 residuals. The structure is that of a simple bundle adjustment problem with 3D poses and landmarks. It turns out I was using SuiteSparse 4 (as it is the version on MacPorts); I will upgrade and report back.
 
My suspicion is that it's not the factorization but rather the inversion which is taking time. There are two ways of accelerating the inversion computation: one by doing a sparse solve, the other by threading. I now know how to do the latter; the former will require some indexing gymnastics that I need to figure out. But both of them are on my radar.

By inversion do you mean the calls to cholmod_solve and cholmod_solve2 in covariance_impl.cc? As I understand it, these should only be performing forward/backward substitution (which is order n^2), whereas the call to cholmod_factorize which produces the factor is order n^3. My guess is that the lion's share of the cost is in this call, which is why I was hoping to be able to avoid calling it twice.

Is it possible for you to share an example problem/code that I can run and benchmark?  Also when you say the solve time has doubled, how much actual wall time are we talking about?

I'll ask and see if it would be possible to share the code, although it's very tightly linked into a much larger project. 

I realized that I had made a mistake in my time calculation. The actual time for the solve (without covariance) is averaging at 17ms whereas the solve+covariance is averaging at 27ms. So the increased cost is a little less than what I previously mentioned. Sorry about that.

Sameer Agarwal

May 21, 2013, 1:30:44 PM
to ceres-...@googlegroups.com
On Tue, May 21, 2013 at 10:13 AM, Nima Keivan <nim...@gmail.com> wrote:
What is the size/structure of your Jacobian?
What version of SuiteSparse are you using? If you are using something older than 4.2.0, can you upgrade and report back?
 
We are at about 600 parameters and 5000 residuals. The structure is that of a simple bundle adjustment problem with 3D poses and landmarks. It turns out I was using SuiteSparse 4 (as it is the version on MacPorts); I will upgrade and report back.

We may be able to improve things a bit here by giving CHOLMOD a custom ordering. But we should wait to do this and see if cholmod_factorize is indeed the bottleneck here.
  
My suspicion is that it's not the factorization but rather the inversion which is taking time. There are two ways of accelerating the inversion computation: one by doing a sparse solve, the other by threading. I now know how to do the latter; the former will require some indexing gymnastics that I need to figure out. But both of them are on my radar.

By inversion do you mean the calls to cholmod_solve and cholmod_solve2 in covariance_impl.cc? As I understand it, these should only be performing forward/backward substitution (which is order n^2), whereas the call to cholmod_factorize which produces the factor is order n^3.

Yes, but the accounting is not quite that straightforward, since the matrices are sparse. The cubic factor depends on the fill-in produced by the ordering; it's not straight-up n^3 in the number of columns. I will add some timers to the covariance object in a forthcoming CL that will let you see where the time is being spent a bit better.
 

Is it possible for you to share an example problem/code that I can run and benchmark?  Also when you say the solve time has doubled, how much actual wall time are we talking about?

I'll ask and see if it would be possible to share the code, although it's very tightly linked into a much larger project. 

I realized that I had made a mistake in my time calculation. The actual time for the solve (without covariance) is averaging at 17ms whereas the solve+covariance is averaging at 27ms. So the increased cost is a little less than what I previously mentioned. Sorry about that.

No problem.
Sameer

Sameer Agarwal

May 21, 2013, 1:41:47 PM
to ceres-...@googlegroups.com
Nima,
Here is a patch that adds some timers to the covariance computation code. Could you patch this into your build
and then run your code with 

-logtostderr -vmodule=wall_time=3 

and share the results. This will give us a much better idea of where the time is being consumed.

Sameer


timer.patch