Beginner ceres-user question

1,426 views

Christopher

Feb 14, 2013, 5:58:31 AM
to ceres-...@googlegroups.com
Hi,

I am a master's thesis student at Linköping University in Sweden. I'm currently working on non-rigid registration / embedded shape deformation [1, 2, 3].
First of all I want to say that this is my first experience with this kind of optimization, so bear with me =)

I basically have a graph which is a coarse sampling of an embedded mesh. Each graph node x_i has an associated affine transformation Ai (3x3 matrix) and a translation vector bi. These affine transformations are solved for by minimizing a set of energy functions. One of these energy functions has the form

E_smooth = sum_{i in v} sum_{j in N(i)} || Ai (x_j - x_i) + x_i + b_i - (x_j + b_j) ||^2

where v is the set of graph nodes and N(i) is the set of graph nodes adjacent to node i. This brings me to my question. The way I have implemented this with Ceres right now is to define a CostFunction which calculates a single residual of the total sum. To clarify, since I want to use the AutoDiff functionality, I have implemented a cost functor ESmooth as:

struct ESmooth
{
  ESmooth(const T_Vec3& xi, const T_Vec3& xj, const Weight& weight);

  template <typename U>
  bool operator()(const U* Ai, const U* bi, const U* bj, U* residual) const;
};
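A minimal sketch of how the operator() body might be filled in, assuming Ai is a row-major 3x3 block and the residual is the weighted term Ai(x_j - x_i) + x_i + b_i - (x_j + b_j); the vector and weight types are simplified to raw doubles here, so this is illustrative rather than my exact code:

```cpp
// Sketch only: T_Vec3 and Weight are simplified to raw doubles.
struct ESmooth {
  ESmooth(const double* xi, const double* xj, double weight)
      : xi_(xi), xj_(xj), w_(weight) {}

  // Residual of one smoothness term, scaled by the weight:
  //   r = w * (Ai (xj - xi) + xi + bi - (xj + bj))
  // Ai is a row-major 3x3 block; bi, bj are 3-vectors.
  template <typename U>
  bool operator()(const U* Ai, const U* bi, const U* bj, U* residual) const {
    for (int k = 0; k < 3; ++k) {
      U r = U(xi_[k]) + bi[k] - U(xj_[k]) - bj[k];
      for (int m = 0; m < 3; ++m) {
        r += Ai[3 * k + m] * U(xj_[m] - xi_[m]);
      }
      residual[k] = U(w_) * r;
    }
    return true;
  }

  const double* xi_;
  const double* xj_;
  double w_;
};
```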

Is there any other way of implementing this? It would be nice if one could define a CostFunction that calculates the inner sum of E_smooth and pass a list of adjacent nodes to the cost functor. That way I wouldn't have to create lots of residual blocks every time.

Best,
Christopher


[1] Morten Bojsen-Hansen, Hao Li, and Chris Wojtan. 2012. Tracking surfaces with evolving topology. ACM Trans. Graph. 31, 4, Article 53 (July 2012), 10 pages. DOI=10.1145/2185520.2185549 http://doi.acm.org/10.1145/2185520.2185549

[2] Hao Li, Bart Adams, Leonidas J. Guibas, and Mark Pauly. 2009. Robust single-view geometry and motion reconstruction. In ACM SIGGRAPH Asia 2009 papers (SIGGRAPH Asia '09). ACM, New York, NY, USA, Article 175, 10 pages. DOI=10.1145/1661412.1618521 http://doi.acm.org/10.1145/1661412.1618521

[3] Robert W. Sumner, Johannes Schmid, and Mark Pauly. 2007. Embedded deformation for shape manipulation. ACM Trans. Graph. 26, 3, Article 80 (July 2007). DOI=10.1145/1276377.1276478 http://doi.acm.org/10.1145/1276377.1276478

Sameer Agarwal

Feb 14, 2013, 11:41:58 AM
to ceres-...@googlegroups.com
Christopher,

The way you have implemented the sum is perfectly fine and is a good way of doing so. Are you worried about having too many terms in the optimization? Is speed a concern?

You could implement a functor which computes a variable number of residuals instead of just 3, and do all the terms for a single vertex in your mesh in one go. Look at the documentation for AutoDiffCostFunction; it allows a dynamic number of residuals. That will result in some memory savings in the size of the Problem object and should speed up the residual/jacobian evaluation a bit.

Let me know if the documentation of AutoDiffCostFunction does not make sense; I can show you a more explicit example.

Sameer






Christopher

Feb 18, 2013, 4:15:45 AM
to ceres-...@googlegroups.com
Speed is definitely a concern, so if that could speed up the evaluation I will try it out, since the minimization (as of now) is really slow compared to the results presented in the papers I'm reading. (Those solvers are probably optimized for this specific problem, so I'm not expecting the same.)

Regarding dynamic residuals: I haven't studied the internals of Ceres, but as I have it right now I guess it is simple to know that Ai, bi and bj are the parameters of this residual block (ESmooth). However, if I were to output all residuals for, say, node x0, the parameters of that residual block would be A0, b0 and bj for j in N(0). Surely I can pass these in the constructor of the functor, but how will Ceres know what to differentiate? I can't really pass "all" these parameters as arguments to the functor, or is it possible to, e.g., send an array of 3-dimensional parameters?

Christopher

Sameer Agarwal

Feb 18, 2013, 10:29:02 AM
to ceres-...@googlegroups.com
Christopher,

My apologies, I misread the algebraic expression in your problem. The solution I suggested will not work. But there are two other ways to do this.

1. Have a look at DynamicAutoDiffCostFunction, which allows automatic differentiation with the number of parameter blocks determined at run time. It uses a slightly different functor definition; you can see example code in its test.

2. Since your residuals are linear, the best way is actually to bypass autodiff and implement a CostFunction object directly. You can see an example of how to do this in include/ceres/normal_prior.h, where we implement a simple |Ax - b|^2 term. CostFunctions are the objects Ceres actually deals in; numeric and automatic differentiation are layers on top. Since it's trivial for you to compute the derivatives yourself, you will get the best speed by filling out the Jacobian matrices for the residual block yourself.
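Since the derivatives of the smoothness term are constant, they can be written out by hand. A sketch in plain C++ (outside Ceres, names illustrative): it computes one term's residual and the Jacobian block with respect to Ai that an analytic CostFunction would fill in; the Jacobians with respect to bi and bj are simply I and -I.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<double, 9>;  // row-major 3x3

// Residual of one smoothness term:
//   r = A (xj - xi) + xi + bi - (xj + bj)
Vec3 SmoothResidual(const Mat3& A, const Vec3& bi, const Vec3& bj,
                    const Vec3& xi, const Vec3& xj) {
  Vec3 r{};
  for (int k = 0; k < 3; ++k) {
    r[k] = xi[k] + bi[k] - xj[k] - bj[k];
    for (int m = 0; m < 3; ++m) r[k] += A[3 * k + m] * (xj[m] - xi[m]);
  }
  return r;
}

// Jacobian with respect to A (3 rows x 9 columns, row-major):
//   dr_k / dA(k, m) = (xj - xi)_m, zero elsewhere.
// The Jacobians with respect to bi and bj are I and -I respectively.
void SmoothJacobianA(const Vec3& xi, const Vec3& xj, double* jac /* 3x9 */) {
  for (int i = 0; i < 27; ++i) jac[i] = 0.0;
  for (int k = 0; k < 3; ++k)
    for (int m = 0; m < 3; ++m) jac[9 * k + 3 * k + m] = xj[m] - xi[m];
}
```

All three Jacobian blocks are independent of the parameter values, which is exactly why an analytic CostFunction is cheap here.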

Also, since your problem is linear, you should only have to run the ceres solver for exactly one iteration. 

The following settings should give you the best performance.

Solver::Options options;
options.linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;
options.max_num_iterations = 1;
options.initial_trust_region_radius = options.max_trust_region_radius;

That should do the right thing for you.

Sameer

Christopher

Feb 19, 2013, 2:31:46 AM
to ceres-...@googlegroups.com
No apologies needed, getting this much help with a non-commercial API is not the usual case in my experience =) Thanks for pointing me to some nice examples. Unfortunately I have left a bit out of my problem. I have two other energy functions I am minimizing at the same time:

E_rigid = sum_{i in v} [ (ai1 . ai2)^2 + (ai1 . ai3)^2 + (ai2 . ai3)^2 + (1 - ||ai1||)^2 + (1 - ||ai2||)^2 + (1 - ||ai3||)^2 ]

where ai1, ai2, ai3 are the column vectors of Ai and v is the set of vertices/nodes,

and

E_fit = sum_{i in v} [ alpha_point * ||xi - ci||^2 + alpha_plane * (ni . (xi - ci))^2 ]

where xi are node positions, ci are correspondence points on a surface with normal ni which I'm trying to align with, and alpha_point and alpha_plane are known constants.

And the total energy function is E_tot = alpha_fit*E_fit + alpha_reg*(E_rigid + 0.1*E_smooth), and it is not linear. I am also doing relaxation: starting with alpha_fit = 0.1 and alpha_reg = 1000, each time the relative change of the cost falls below some threshold I divide alpha_reg by 10. I continue until alpha_reg falls below 0.1. I have taken this scheme from the papers I posted.
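For illustration, the relaxation schedule amounts to something like the following sketch (outside Ceres; each schedule entry would be one solve with that alpha_reg, and the division happens once the cost change stalls):

```cpp
#include <vector>

// Hypothetical helper: enumerate the alpha_reg values visited by the
// relaxation scheme (start at 1000, divide by 10, stop below 0.1).
std::vector<double> RelaxationSchedule() {
  double alpha_reg = 1000.0;
  std::vector<double> schedule;
  while (alpha_reg >= 0.1) {        // stop once alpha_reg falls below 0.1
    schedule.push_back(alpha_reg);  // solve E_tot with this alpha_reg here
    alpha_reg /= 10.0;              // relax after the cost change stalls
  }
  return schedule;                  // 1000, 100, 10, 1, 0.1
}
```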

Regarding using a CostFunction: honestly, I haven't spent any time trying to differentiate these since I saw there was autodiff functionality. But if that will speed up the solver, I will definitely take some time and implement a CostFunction instead. If I work out the differentiation, I guess I could do some nice optimization by calculating both the residuals and the Jacobian when I have full control.

Best, Christopher

Sameer Agarwal

Feb 19, 2013, 10:34:58 AM
to ceres-...@googlegroups.com
Christopher,



This is an easy enough term, or rather terms. You should ideally add six new terms per vertex here, which you can implement via autodiff or analytic differentiation. One thing to be careful of: the last three terms involve square roots of the squared norm of ai1 etc. If that norm gets close to zero, autodiff will have problems.

Also it looks like you are trying to make A into a rotation matrix. Why not actually parameterize your transformation using a rotation matrix itself?

where ai1, ai2, ai3 are the column vectors of Ai and v is the set of vertices/nodes

and

Again, this is a straightforward term to implement using autodiff.

where xi are node positions, ci are correspondence points on a surface with normal ni which I'm trying to align with, and alpha_point and alpha_plane are known constants.


And the total energy function is E_tot = alpha_fit*E_fit + alpha_reg*(E_rigid + 0.1*E_smooth), and it is not linear. I am also doing relaxation: starting with alpha_fit = 0.1 and alpha_reg = 1000, each time the relative change of the cost falls below some threshold I divide alpha_reg by 10. I continue until alpha_reg falls below 0.1. I have taken this scheme from the papers I posted.

You may want to look at ScaledLoss and LossFunctionWrapper in loss_function.h to help you with annealing the scalars around each of these terms so that you do not have to rebuild the entire problem from scratch every time you change these values.
 

Regarding using a CostFunction: honestly, I haven't spent any time trying to differentiate these since I saw there was autodiff functionality. But if that will speed up the solver, I will definitely take some time and implement a CostFunction instead. If I work out the differentiation, I guess I could do some nice optimization by calculating both the residuals and the Jacobian when I have full control.

Generally speaking you should use autodiff for terms, but in some cases, like your edge term, it may be better to use analytic differentiation. But before you do all that, it's worth figuring out whether your optimization is spending its time in Jacobian evaluation at all. Summary::FullReport will tell you that. You will need to be on the master branch for this, since time logging support was added just a week or so ago.

Sameer

Christopher

Feb 21, 2013, 4:21:18 AM
to ceres-...@googlegroups.com

This is an easy enough term, or rather terms. You should ideally add six new terms per vertex here, which you can implement via autodiff or analytic differentiation. One thing to be careful of: the last three terms involve square roots of the squared norm of ai1 etc. If that norm gets close to zero, autodiff will have problems.

Also it looks like you are trying to make A into a rotation matrix. Why not actually parameterize your transformation using a rotation matrix itself?


Yes, that is correct. A should be close to a rotation matrix. I'm unsure about the reasons for not modeling Ai as a rotation matrix, or perhaps using a quaternion and removing this term completely. Maybe they want Ai to have the freedom of being only "close" to a rotation matrix? I haven't read any paper yet that raises this question. Then again, it wouldn't be that hard to try out, and I will probably do this at a later stage.

"You should ideally add six new terms per vertex here": I have implemented this with one autodiff residual block per vertex that outputs 6 residuals. Is there any difference between this approach and using 6 residual blocks per vertex for this term?


You may want to look at ScaledLoss and LossFunctionWrapper in loss_function.h to help you with annealing the scalars around each of these terms so that you do not have to rebuild the entire problem from scratch every time you change these values.

I have actually cheated this by passing a reference to a global weight that is evaluated in each functor, so I can change it from outside during the optimization (I'm using a callback in which I do the relaxation based on the cost change). However, a loss function seems better and I will probably go this way.

 
Generally speaking you should use autodiff for terms, but in some cases, like your edge term, it may be better to use analytic differentiation. But before you do all that, it's worth figuring out whether your optimization is spending its time in Jacobian evaluation at all. Summary::FullReport will tell you that. You will need to be on the master branch for this, since time logging support was added just a week or so ago.


Yes, I agree. I'll have to get the master branch. Thanks.


Everything works now and I am getting the result I want. However, yesterday I noticed that my problem always begins with 8-10 "unsuccessful" iterations before it actually does anything. Is there a way to track down the cause of this?

//Christopher


 

Sameer Agarwal

Feb 21, 2013, 9:26:56 AM
to ceres-...@googlegroups.com
On Thu, Feb 21, 2013 at 1:21 AM, Christopher <chrb...@student.liu.se> wrote:

This is an easy enough term, or rather terms. You should ideally add six new terms per vertex here, which you can implement via autodiff or analytic differentiation. One thing to be careful of: the last three terms involve square roots of the squared norm of ai1 etc. If that norm gets close to zero, autodiff will have problems.

Also it looks like you are trying to make A into a rotation matrix. Why not actually parameterize your transformation using a rotation matrix itself?


Yes, that is correct. A should be close to a rotation matrix. I'm unsure about the reasons for not modeling Ai as a rotation matrix, or perhaps using a quaternion and removing this term completely. Maybe they want Ai to have the freedom of being only "close" to a rotation matrix? I haven't read any paper yet that raises this question. Then again, it wouldn't be that hard to try out, and I will probably do this at a later stage.

"You should ideally add six new terms per vertex here": I have implemented this with one autodiff residual block per vertex that outputs 6 residuals. Is there any difference between this approach and using 6 residual blocks per vertex for this term?

Since each matrix A_i is a single parameter block, it is better to have a single residual block that computes 6 residuals, instead of adding six terms with one residual each; that will be more efficient. Basically, if your residual block is hiding sparsity between parameter blocks, it should be split into separate residual blocks. E.g., suppose you have the following expression

(x1 + x2 + x3 + x4 - y)^2 + x1^2  + x2^2 + x3^2 + x4^2

where x1, x2, x3 and x4 are different parameter blocks. 
One way to implement this would be to create one residual block with five terms:

x1 + x2 + x3 + x4 - y
x1
x2
x3
x4

But notice that the bottom four rows of the residual block are actually very sparse. If you make them part of the same residual block, Ceres can't see the sparsity; it treats each row as dense in the parameter blocks it depends on.

In this case, it's better to have five different terms; it will save memory as well as expose more sparsity to the linear solver.
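Counting stored Jacobian entries makes the difference concrete (a toy count, treating each x_k as a scalar parameter block):

```cpp
// One residual block with 5 rows over 4 parameter blocks:
// every row is treated as dense in all 4 blocks.
constexpr int kCombinedEntries = 5 * 4;         // 20 Jacobian entries
// Five residual blocks: the sum row touches all 4 blocks,
// while each x_k row touches only its own block.
constexpr int kSplitEntries = 1 * 4 + 4 * 1;    // 8 Jacobian entries
static_assert(kCombinedEntries == 20, "dense count");
static_assert(kSplitEntries == 8, "split count");
```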

 


You may want to look at ScaledLoss and LossFunctionWrapper in loss_function.h to help you with annealing the scalars around each of these terms so that you do not have to rebuild the entire problem from scratch every time you change these values.

I have actually cheated this by passing a reference to a global weight that is evaluated in each functor, so I can change it from outside during the optimization (I'm using a callback in which I do the relaxation based on the cost change). However, a loss function seems better and I will probably go this way.

Having a global scalar that you pass is fine. Changing the weight during the optimization is a very bad idea. It changes the objective function during optimization and breaks Ceres' reasoning as to what is progress.  Please don't do this. During a Solve call, the mathematical form of your objective function should remain constant.
  

Everything works now and I am getting the result I want. However, yesterday I noticed that my problem always begins with 8-10 "unsuccessful" iterations before it actually does anything. Is there a way to track down the cause of this?

This means that your initial trust region size is too large. Try reducing Solver::Options::initial_trust_region_radius. Are you changing the default value when you start the optimization?

Sameer

 


Christopher

Feb 21, 2013, 9:39:27 AM
to ceres-...@googlegroups.com

This is an easy enough term, or rather terms. You should ideally add six new terms per vertex here, which you can implement via autodiff or analytic differentiation. One thing to be careful of: the last three terms involve square roots of the squared norm of ai1 etc. If that norm gets close to zero, autodiff will have problems.

Also it looks like you are trying to make A into a rotation matrix. Why not actually parameterize your transformation using a rotation matrix itself?


Yes, that is correct. A should be close to a rotation matrix. I'm unsure about the reasons for not modeling Ai as a rotation matrix, or perhaps using a quaternion and removing this term completely. Maybe they want Ai to have the freedom of being only "close" to a rotation matrix? I haven't read any paper yet that raises this question. Then again, it wouldn't be that hard to try out, and I will probably do this at a later stage.

"You should ideally add six new terms per vertex here": I have implemented this with one autodiff residual block per vertex that outputs 6 residuals. Is there any difference between this approach and using 6 residual blocks per vertex for this term?

Since each matrix A_i is a single parameter block, it is better to have a single residual block that computes 6 residuals, instead of adding six terms with one residual each; that will be more efficient. Basically, if your residual block is hiding sparsity between parameter blocks, it should be split into separate residual blocks. E.g., suppose you have the following expression

(x1 + x2 + x3 + x4 - y)^2 + x1^2  + x2^2 + x3^2 + x4^2

where x1, x2, x3 and x4 are different parameter blocks.
One way to implement this would be to create one residual block with five terms:

x1 + x2 + x3 + x4 - y
x1
x2
x3
x4

But notice that the bottom four rows of the residual block are actually very sparse. If you make them part of the same residual block, Ceres can't see the sparsity; it treats each row as dense in the parameter blocks it depends on.

In this case, it's better to have five different terms; it will save memory as well as expose more sparsity to the linear solver.

Thanks, good to know!!

 

 


You may want to look at ScaledLoss and LossFunctionWrapper in loss_function.h to help you with annealing the scalars around each of these terms so that you do not have to rebuild the entire problem from scratch every time you change these values.

I have actually cheated this by passing a reference to a global weight that is evaluated in each functor, so I can change it from outside during the optimization (I'm using a callback in which I do the relaxation based on the cost change). However, a loss function seems better and I will probably go this way.

Having a global scalar that you pass is fine. Changing the weight during the optimization is a very bad idea. It changes the objective function during optimization and breaks Ceres' reasoning as to what is progress.  Please don't do this. During a Solve call, the mathematical form of your objective function should remain constant.


Ok! I guess I have to terminate the solver, then change the weight and call Solve again? Does this completely reset the solver, or do I have to create a new Problem object and pass it to Solve?
 
  

Everything works now and I am getting the result I want. However, yesterday I noticed that my problem always begins with 8-10 "unsuccessful" iterations before it actually does anything. Is there a way to track down the cause of this?

This means that your initial trust region size is too large. Try reducing Solver::Options::initial_trust_region_radius. Are you changing the default value when you start the optimization?


Nope, it's on default! But I'll try lowering it.
 

Sameer Agarwal

Feb 21, 2013, 9:44:06 AM
to ceres-...@googlegroups.com
I have actually cheated this by passing a reference to a global weight that is evaluated in each functor, so I can change it from outside during the optimization (I'm using a callback in which I do the relaxation based on the cost change). However, a loss function seems better and I will probably go this way.

Having a global scalar that you pass is fine. Changing the weight during the optimization is a very bad idea. It changes the objective function during optimization and breaks Ceres' reasoning as to what is progress.  Please don't do this. During a Solve call, the mathematical form of your objective function should remain constant.


Ok! I guess I have to terminate the solver, then change the weight and call Solve again? Does this completely reset the solver, or do I have to create a new Problem object and pass it to Solve?

You can pass the same Problem in. Just change the global weight and tell the solver to start again; you do not need to create a new Problem.
 

Everything works now and I am getting the result I want. However, yesterday I noticed that my problem always begins with 8-10 "unsuccessful" iterations before it actually does anything. Is there a way to track down the cause of this?

This means that your initial trust region size is too large. Try reducing Solver::Options::initial_trust_region_radius. Are you changing the default value when you start the optimization?


Nope, it's on default! But I'll try lowering it.

Also, in such cases, a log generated by setting Solver::Options::minimizer_progress_to_stdout to true is useful (in addition to Solver::Summary::FullReport()).
 
Sameer

Bryan Sun

Jun 3, 2016, 11:07:49 AM
to Ceres Solver
Have you finally implemented the paper?
Thank you.

On Thursday, February 14, 2013 at 6:58:31 PM UTC+8, Christopher wrote: