Question about sparse problem


tr...@cornell.edu

Mar 19, 2014, 11:31:39 AM
to ceres-...@googlegroups.com
Hi,

I'm working on a sparse problem with a sparsity structure that changes over time. I know that each residual will only depend on a small number of the parameters, but I don't know which ones until I evaluate the function, and the parameters can move to different residuals as the optimization proceeds.

I'm currently modeling this as a dense problem, which works ok, but some of the problems require too much memory. Is there a good way to model this type of problem in ceres currently? If not, how much work would it be to implement? Basically I'm thinking about something that has the same interface as a dense problem (one parameter block and one residual block), but that accepts the jacobian as a sparse matrix rather than a dense matrix.
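For concreteness, an interface like the one described above would hand back something like a compressed-row-storage (CRS) Jacobian instead of a dense array. Here is a minimal, self-contained sketch of that storage format (the struct and its layout are illustrative, not part of the Ceres API):

```cpp
#include <vector>

// Minimal compressed-row-storage (CRS) sketch of a sparse Jacobian: only the
// nonzero entries are stored, together with their column indices and per-row
// offsets, so memory scales with the number of nonzeros rather than with
// num_rows * num_cols.
struct CrsJacobian {
  int num_rows = 0, num_cols = 0;
  std::vector<int> rows;       // size num_rows + 1; row i occupies [rows[i], rows[i+1])
  std::vector<int> cols;       // column index of each stored value
  std::vector<double> values;  // the nonzero entries, row by row

  // Dense-style read access: returns 0 for entries that are not stored.
  double at(int r, int c) const {
    for (int k = rows[r]; k < rows[r + 1]; ++k) {
      if (cols[k] == c) return values[k];
    }
    return 0.0;
  }
};
```

For example, a 2 x 4 matrix with nonzeros at (0,1), (0,3), and (1,0) would be stored as rows = {0, 2, 3}, cols = {1, 3, 0}, values = {j01, j03, j10}.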

Thanks, Tim

Sameer Agarwal

Mar 19, 2014, 12:55:04 PM
to ceres-...@googlegroups.com
Hi Tim,

There are two different kinds of sparsity in a problem: structural sparsity and numerical sparsity. Structural sparsity comes from knowing a priori that certain elements of the jacobian are zero, generally by virtue of there not being any interaction between a pair of parameters.

Numerical sparsity depends on the exact value of the parameters. 

Structural sparsity remains constant over the life of an optimization problem. Numerical sparsity comes and goes. 

Are you saying that the numerical sparsity of your problem is changing, or is it the structural sparsity?

If it is the numerical sparsity, then there isn't much we can do. The problem is that we assume the sparsity of the matrix is constant over the life of the solve. This allows us to compute fill-reducing orderings, allocate the memory for storing the factorization just once, etc.

Or are you talking about limitations in the Problem API that prevent you from expressing your structural sparsity correctly?

Sameer




--
You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ceres-solver/a18f6fdc-4fda-44cb-8f81-0549f148d196%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

tr...@cornell.edu

Mar 19, 2014, 1:28:51 PM
to ceres-...@googlegroups.com
Thanks for the input, Sameer. I'm actually not sure how to classify it. Basically, for each residual, I know (a priori) that only a small number of parameters affect it, and the number of parameters that affect each residual is constant. But each residual has a different set of parameters, and these sets can change depending on the values of the parameters. So it's similar to structural sparsity in that I know a priori (given the values of the parameters) which elements of the jacobian are 0, and the number of zeros is constant over the lifetime of the problem (except for numerical zeros that can pop up). But it's like numerical sparsity in that the structure depends on the values of the parameters, which can change at each step.

So it seems there's not much to be done, given that the sparsity depends on the values of the parameters.

Thanks for the help, Tim

Sameer Agarwal

Mar 19, 2014, 1:32:18 PM
to ceres-...@googlegroups.com
On Wed, Mar 19, 2014 at 10:28 AM, <tr...@cornell.edu> wrote:
Thanks for the input Sameer. I'm actually not sure how to classify it. Basically, for each residual, I know (a priori) that only a small number of parameters affect it. The number of parameters that affects each residual is constant. But for each residual it is a different set of parameters. And these sets can change depending on the values of the parameters. So it's similar to structural sparsity in that I know a priori (given the value of the parameters) which elements of the jacobian are 0, and the number of zeros is constant over the lifetime of the problem (except for numerical zeros that can pop up).

So do you have some sort of switching logic inside your residual blocks based on the numerical values of parameters?

 
But it's like numerical sparsity in that the structure depends on the value of the parameters, which can change at each step.

Yes this looks like numerical sparsity.
 
So it seems there's not much to be done, given that the sparsity depends on the values of the parameters.

If the jacobian is becoming too huge and the time and space cost is not worth it, you may want to look into using the LINE_SEARCH minimizer in Ceres with LBFGS. It has fairly minimal memory requirements, but trades that for convergence speed and solution quality, which may or may not be an issue depending on your problem.
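For reference, selecting that minimizer takes only a couple of lines of solver configuration. A minimal sketch (max_lbfgs_rank is shown with its default value; it controls the memory/quality trade-off):

```cpp
ceres::Solver::Options options;
options.minimizer_type = ceres::LINE_SEARCH;
options.line_search_direction_type = ceres::LBFGS;
// Rank of the L-BFGS inverse-Hessian approximation: higher uses more
// memory but can produce better search directions.
options.max_lbfgs_rank = 20;
```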

Sameer

 


tr...@cornell.edu

Mar 19, 2014, 1:47:57 PM
to ceres-...@googlegroups.com
Yes, there's fairly simple switching logic based on the parameter values to evaluate only the elements of the jacobian which aren't zero. Thanks for the suggestion, I'll look into using LINE_SEARCH with LBFGS.

Sameer Agarwal

Mar 19, 2014, 2:04:26 PM
to ceres-...@googlegroups.com
And how do you guarantee that the number of parameter blocks actually used in a residual block is constant? Or is your claim that it is generally sparse, but you do not care about the exact nature of the sparsity, i.e., the exact number of zeros?

How big are your problems?

tr...@cornell.edu

Mar 19, 2014, 2:39:52 PM
to ceres-...@googlegroups.com


On Wednesday, March 19, 2014 2:04:26 PM UTC-4, Sameer Agarwal wrote:
And how do you guarantee that the number of parameter blocks actually used in a residual block is constant? Or is your claim that it is generally sparse but you do not care about the exact nature of the sparsity as in the number of zeros...
 
If you're asking how I use the API, I model it as just having one residual block and one parameter block, and evaluate the jacobian myself. If you're asking why the problem is like that, I'm optimizing an approximation of a function over a surface. So the residuals are points on the surface, and they are affected by the k nearest control points. We optimize the position (3 parameters) and value (3 parameters) of each control point. So the control points can move around and change which residuals they affect.


How big are your problems?

Some can be as big as 200000 x 18000. But k is small (~30), and each row of the jacobian has a maximum of 6*k nonzeros.
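The moving sparsity pattern can be made concrete with a small sketch: each residual's parameter set is just the k nearest control points at the current parameter values, recomputed whenever the control points move. Brute-force search is shown for clarity; the names are illustrative, not the actual code from this thread:

```cpp
#include <algorithm>
#include <array>
#include <vector>

using Point = std::array<double, 3>;

// Squared Euclidean distance; sufficient for nearest-neighbor ordering.
double dist2(const Point& a, const Point& b) {
  double s = 0.0;
  for (int i = 0; i < 3; ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
  return s;
}

// Indices of the k control points nearest to a sample point. Each residual
// row then has nonzeros only in the 6*k Jacobian columns belonging to these
// control points (3 position + 3 value parameters each), and the set changes
// as the control points move during the optimization.
std::vector<int> kNearest(const Point& sample,
                          const std::vector<Point>& controls, int k) {
  std::vector<int> idx(controls.size());
  for (int i = 0; i < static_cast<int>(idx.size()); ++i) idx[i] = i;
  std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                    [&](int a, int b) {
                      return dist2(sample, controls[a]) <
                             dist2(sample, controls[b]);
                    });
  idx.resize(k);
  return idx;
}
```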

Sameer Agarwal

Mar 19, 2014, 4:03:06 PM
to ceres-...@googlegroups.com
If you're asking how I use the API, I model it as just having one residual block and one parameter block, and evaluate the jacobian myself. If you're asking why the problem is like that, I'm optimizing an approximation of a function over a surface. So the residuals are points on the surface, and they are affected by the k nearest control points. We optimize the position (3 parameters) and value (3 parameters) of each control point. So the control points can move around and change which residuals they affect.

So inside the residual evaluation you have a k-nearest-neighbor query. This sounds like moving-least-squares surface fitting? Isn't this function non-differentiable?



How big are your problems?

Some can be as big as 200000 x 18000. But k is small (~30), and each row of the jacobian has a maximum of 6*k nonzeros.


Yes, I can see why you are looking to save memory and time here. One way around this might be a middle ground, where you do not go to the extreme of assuming each point is connected to every other point, but rather that each point can only interact with some K >> k points around it in the initial starting configuration, where K is a generous overestimate of k. Assuming that you are starting from a reasonable initial configuration, this should allow you to substantially increase the sparsity of your problem at the cost of some additional accounting inside each residual block.

Sameer


 

Sameer Agarwal

Apr 28, 2014, 4:11:29 PM
to ceres-...@googlegroups.com
So it turns out that right around the time we were having this conversation, Richard Stebbing had a patch in the works that lets Ceres handle problems with dynamic sparsity. It has now been merged into the master branch.

We would be curious to hear of your experiences with this code. All you need to do is set Solver::Options::dynamic_sparsity to true. 
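For anyone finding this thread later: the option lives on Solver::Options and is intended for use with the sparse-normal-Cholesky linear solver, so a minimal configuration looks something like this sketch:

```cpp
ceres::Solver::Options options;
options.linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;
// Re-analyze the sparsity pattern of the Jacobian at each iteration
// instead of assuming it is fixed for the life of the solve.
options.dynamic_sparsity = true;
```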

Thanks,
Sameer

Oliver Woodford

May 12, 2015, 5:37:57 PM
to ceres-...@googlegroups.com
On Monday, April 28, 2014 Sameer Agarwal wrote:
So it turns out, right around the time we were having this conversation Richard Stebbing had a patch in the works which lets Ceres handle problems with dynamic sparsity. This has now been merged into the master branch. 
We would be curious to hear of your experiences with this code. All you need to do is set Solver::Options::dynamic_sparsity to true.  

I have a mesh surface fitting problem, similar to the ellipse_approximation example, but one dimension higher. I've looked at that code, though haven't yet run it. However, I note a couple of things:
1) It doesn't use auto-differentiation
2) It seems to have the index to the nearest line segment as a parameter, to be optimized in the loop.

My goal is this:
1) To alternate between an iteration of the continuous problem (updating the shape of my surface), and an update to the data point to surface element assignment (which is discrete and non-differentiable, and dynamically defines the structural sparsity of the problem).
2) To use automatic differentiation on the continuous part.
3) To exploit the sparsity by not computing a dense Jacobian (as this would use too much memory).

I could achieve all this by creating a new ceres problem for each iteration, but I expect this carries some unnecessary overhead. Instead, it would be great if you could simply update the parameter pointers to each residual block, in the loop (e.g. in the iteration callback, or between calls to solve), and have ceres redo any computation necessary to handle the change in structure. Note that things like the number of residual blocks and the number of parameters they depend on would not change.

Is this possible, currently?

Many thanks,
Oliver 

Sameer Agarwal

May 12, 2015, 11:55:19 PM
to ceres-...@googlegroups.com
Oliver,

My replies are inline.

I have a mesh surface fitting problem, similar to the ellipse_approximation example, but one dimension higher. I've looked at that code, though haven't yet run it. However, I note a couple of things:
1) It doesn't use auto-differentiation
2) It seems to have the index to the nearest line segment as a parameter, to be optimized in the loop.

My goal is this:
1) To alternate between an iteration of the continuous problem (updating the shape of my surface), and an update to the data point to surface element assignment (which is discrete and non-differentiable, and dynamically defines the structural sparsity of the problem).
2) To use automatic differentiation on the continuous part.
3) To exploit the sparsity by not computing a dense Jacobian (as this would use too much memory).

I could achieve all this by creating a new ceres problem for each iteration, but I expect this carries some unnecessary overhead.

Before we worry about this overhead, it's worth actually measuring it. Constructing a Ceres problem is usually quite fast; it is nowhere near the cost of actually solving the problem. Unless you are solving really small problems or latency is a big issue for you, this should be fine.
 
Instead, it would be great if you could simply update the parameter pointers to each residual block, in the loop (e.g. in the iteration callback, or between calls to solve), and have ceres redo any computation necessary to handle the change in structure.

This is not possible. We make an internal copy of the parameter block values and work with it. Changing structure like this is a recipe for much trouble. 

Sameer

 
Note that things like the number of residual blocks and the number of parameters they depend on would not change.

Is this possible, currently?

Many thanks,
Oliver 


Oliver Woodford

May 13, 2015, 5:39:09 AM
to ceres-...@googlegroups.com
Thanks, Sameer. My replies inline, too. 


I could achieve all this by creating a new ceres problem for each iteration, but I expect this carries some unnecessary overhead.

Before we worry about this overhead, its worth actually measuring this. Constructing a ceres problem is usually quite fast, it is nowhere near the cost of actually solving a problem. Unless you are solving really small problems or latency is a big issue for you, this should be fine.

I'll do this.
 
 
Instead, it would be great if you could simply update the parameter pointers to each residual block, in the loop (e.g. in the iteration callback, or between calls to solve), and have ceres redo any computation necessary to handle the change in structure.

This is not possible. We make an internal copy of the parameter block values and work with it. Changing structure like this is a recipe for much trouble. 

Efficiency is important on this project. I've realised that there are some other issues which might make it necessary to roll my own optimizer anyway. For example, many residuals reuse the same intermediate computation on the variables. I don't know if there's a way in Ceres to do that computation only once? I guess not.

Thanks,
Oliver

Keir Mierle

May 13, 2015, 2:21:19 PM
to ceres-...@googlegroups.com
Having each residual evaluated independently makes it possible for Ceres to play some tricks - like compute all residuals in parallel, or in the case of inner iterations, run coordinate descent on a subset of the parameters (and only evaluate the relevant residuals).

We've had this request before (to share computation) but have decided not to add it since it would result in pervasive changes throughout Ceres, to only benefit a portion of users. With that said, it would be nice to better understand your use case; what part of the computation is shared?

Thanks,
Keir

Oliver Woodford

May 13, 2015, 6:41:30 PM
to ceres-...@googlegroups.com
Hi Keir
 
Having each residual evaluated independently makes it possible for Ceres to play some tricks - like compute all residuals in parallel, or in the case of inner iterations, run coordinate descent on a subset of the parameters (and only evaluate the relevant residuals).

We've had this request before (to share computation) but have decided not to add it since it would result in pervasive changes throughout Ceres, to only benefit a portion of users.

Yes, I can well imagine. Like I say, I think it's something I should implement myself.
 
With that said, it would be nice to better understand your use case; what part of the computation is shared?

I'm optimizing the vertices of a triangular mesh. Those vertices can be used to compute a coordinate frame for each face (which involves a fair bit of computation). Each coordinate frame is used in, on average, 10 or so different residuals. If the structure is known, then you can always split those residuals into distinct blocks, so the coordinate frames are only computed once. However, as I mentioned, my sparsity structure changes. But then I could simply create a new Ceres instance per iteration, as Sameer suggested. Hmm.
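One common way to get the sharing without changing the solver is to memoize the per-face computation outside of Ceres and invalidate the cache whenever the parameters change (e.g. from an iteration callback). A minimal sketch, with entirely hypothetical names and a double standing in for the actual frame type:

```cpp
#include <functional>
#include <unordered_map>

// Hypothetical cache for an expensive per-face quantity (e.g. a coordinate
// frame), so the ~10 residuals touching the same face compute it only once
// per evaluation. Not part of the Ceres API.
struct FrameCache {
  std::unordered_map<int, double> cache;  // face id -> cached quantity
  int computations = 0;                   // instrumentation only

  double Get(int face, const std::function<double(int)>& compute) {
    auto it = cache.find(face);
    if (it != cache.end()) return it->second;
    ++computations;
    double value = compute(face);
    cache[face] = value;
    return value;
  }

  // Must be called whenever the mesh vertices move, e.g. once per iteration.
  void Invalidate() { cache.clear(); }
};
```

The caveat is exactly the one Keir raises: a shared cache like this has to be made thread-safe if residuals are evaluated in parallel.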

Oliver
 

Andrew Fitzgibbon

May 14, 2015, 5:04:56 AM
to ceres-...@googlegroups.com

Hi Keir, [Hi Olly!]

I guess one option you've probably considered is just to provide a version of Evaluate which returns the Jacobian in some sparse storage format, e.g. CRS.  Then people with complicated requirements could just use one giant parameter block and handle their own Jacobian computation. 

Of course they lose some of what Ceres provides, notably autodiff, but there's still a lot of value in what remains.   I've seen a lot of poorly implemented hand-rolled optimizers, which would be avoided if people used Ceres even partially.

And often in problems like this there are one or two different parameter types, e.g. vertices and colours, so that one can use a small number of large parameter blocks, and coarse block sparsity is still visible to Schur, inner iterations, etc.

Andrew

PS Olly, we've used Ceres for problems like yours in a few papers (Jon Taylor & al, CVPR14, Mingsong Dou & al, CVPR15, Sameh Khamis & al, CVPR15), you should drop by MSR to chat  :)

 

Sameer Agarwal

May 14, 2015, 9:57:09 AM
to ceres-...@googlegroups.com
Andrew,

Providing an external evaluator has been on our list of things to do for a while now. Technically it is straightforward, but it requires some API design work to get right. I will look into it soonish.

Sameer





Oliver Woodford

May 16, 2015, 7:32:56 AM
to ceres-...@googlegroups.com

On Thursday May 14 2015 Andrew Fitzgibbon wrote:

Olly, we've used Ceres for problems like yours in a few papers (Jon Taylor & al, CVPR14, Mingsong Dou & al, CVPR15, Sameh Khamis & al, CVPR15), you should drop by MSR to chat  :)

Thanks for the references and invitation, Andrew. I'll take a look at those, and if I decide to go down this route I'll certainly take you up on your offer. Olly 

Oliver Woodford

May 18, 2015, 11:38:47 AM
to ceres-...@googlegroups.com
Hi Sameer
 

I've done the timings for my problem. Here's what I get:
0.017s - Problem instantiation (allocating and adding residual blocks)
0.010s - Pre-processing time
0.076s - Time per iteration
0.0004s - Post processing time

So if I were to create a new problem every iteration, that would increase the optimization time by around 35% over not needing to do this. I'd be happy to have a quick Skype chat to discuss further if you'd like. Just email me direct if you'd like to arrange that.

Best wishes,
Oliver

Andrew Fitzgibbon

May 19, 2015, 4:46:50 AM
to ceres-...@googlegroups.com
Interesting. Looks feasible then, at least as a baseline for experimentation. Once you learn exactly which optimizer settings work, you'll probably want to go custom-everything for the production code, so 1.35x is not a killer for testing.

Also, 17ms for the initialization is still pretty high. Can you break out the cost of allocating the residual blocks? Then for each restart you would simply update the vertex indices for each residual block rather than reallocating and rebuilding the whole block.

Also, if you do go this route, you'll need to propagate enough of Ceres's internal state to simulate a true next iteration. E.g. if the optimizer is Levenberg-Marquardt, you would have to recover the lambda (a.k.a. regularization) parameter after the single iteration and then feed it into Ceres before the next restart.
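Ceres does expose enough to approximate this: the per-iteration trust-region radius (which plays the role of 1/lambda, up to scaling) is reported in IterationSummary and can be fed back in via Solver::Options::initial_trust_region_radius. A sketch of the idea, assuming the standard IterationCallback API:

```cpp
// Record the final trust-region radius via an iteration callback so it can
// be carried across solver restarts.
class RadiusRecorder : public ceres::IterationCallback {
 public:
  double last_radius = -1.0;
  ceres::CallbackReturnType operator()(
      const ceres::IterationSummary& summary) override {
    last_radius = summary.trust_region_radius;
    return ceres::SOLVER_CONTINUE;
  }
};

// Per outer iteration (sketch):
//   RadiusRecorder recorder;
//   options.callbacks.push_back(&recorder);
//   ceres::Solve(options, &problem, &summary);
//   ...rebuild the problem with the new sparsity...
//   options.initial_trust_region_radius = recorder.last_radius;
```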

Oliver Woodford

May 19, 2015, 8:53:19 AM
to ceres-...@googlegroups.com


On Tuesday, May 19, 2015 at 9:46:50 AM UTC+1, Andrew Fitzgibbon wrote:


Interesting.   Looks feasible then at least as a baseline for experimentation.   Once you learn exactly what optimizer settings work, you'll probably want to go custom-everything for the production code, so x1.35 is not a killer for testing.

Agreed. 
 
Also, 17ms for the initialization is still pretty high.  Can you break out the cost of allocating the residual blocks, then for each restart you would simply update the vertex indices for each residual block rather than reallocating and rebuilding the whole block.

You can. You just have to make sure the solver doesn't deallocate the cost functions, using the relevant ownership setting in the options structure. I had considered this, but hadn't checked whether it was possible until just now, so the timing doesn't account for that.
 
Also, if you do go this route, you'll need to propagate enough of Ceres's internal state to simulate a true next iteration.   E.g. if the optimizer is Levenberg Marquardt, you would have to recover the Lambda (a.k.a. regularization) parameter after the single iteration and then feed it into Ceres before the next restart.

Ideally. The summary structure doesn't return that value, currently. Very easy to add in, I'm sure. I would, but I'm using a compiled library at the moment. Building ceres on Windows was a bit finicky, so I'm just using a lib that was given to me.

I'm also using Eigen's sparse solver instead of SuiteSparse. Using the latter would speed up the minimization, making the preprocessing steps relatively more expensive.

To put those times into some context, the problem had 9720 variables, and 11571 residuals, all in blocks of 1.

Sameer Agarwal

Jun 18, 2015, 5:41:08 PM
to ceres-...@googlegroups.com

I'm also using Eigen's sparse solver instead of SuiteSparse. Using the latter would speed up the minimization, making the preprocessing steps relatively more expensive.

To put those times into some context, the problem had 9720 variables, and 11571 residuals, all in blocks of 1.

That is not a terribly huge problem, but Eigen's sparse solver is not terribly fast either. The preprocessing that Ceres does for SuiteSparse is not that expensive. I recommend using SuiteSparse instead of the simplicial Cholesky that Eigen is doing.

Sameer

  
