Using Ceres Solver for Systems of Nonlinear Equations

Matt Stinson

Mar 29, 2016, 1:02:04 PM
to Ceres Solver
I am considering using the Ceres solver for solving a system of nonlinear equations (where the number of equations equals the number of unknowns). I note that many solvers make simplifications and optimizations for this case: for example, to compute a Newton step I can solve delx = -J^-1 * F instead of delx = -(J'*J)^-1 * (J'*F). Ideally this should not make a difference, but for poorly conditioned systems there typically is a numerical difference. The systems I am solving have this issue of a large condition number: for example, the condition number of J might be 1e15 and that of J'*J 1e30. As far as I understand, Ceres does not check for the case where the number of equations equals the number of unknowns in order to make any such simplifications; is this correct? If not, are there any plans for such a feature? Any suggestions for my situation, or other pieces of software that are optimized for systems of equations?

Sameer Agarwal

Mar 29, 2016, 3:22:52 PM
to Ceres Solver
You can use the DENSE_QR linear solver and not worry about the condition number problem.
The squaring of the condition number is only a problem for linear solvers that work on the normal equations, such as Cholesky factorization.
Broader discussion around these optimizations only makes sense once you have a concrete example. For example, if you are using the LM algorithm, then the system is not square anymore.

So go ahead and use Ceres; I hope you will be pleasantly surprised by how numerically robust it is.
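
For concreteness, here is a minimal sketch of selecting DENSE_QR (the toy two-equation system is just a placeholder, not from this thread):

#include <iostream>
#include "ceres/ceres.h"

// Toy square system (2 equations, 2 unknowns), for illustration only.
struct Residual {
  template <typename T>
  bool operator()(const T* const x, T* r) const {
    r[0] = x[0] * x[0] + x[1] - T(3.0);
    r[1] = x[0] - x[1] * x[1] + T(1.0);
    return true;
  }
};

int main() {
  double x[2] = {1.0, 1.0};
  ceres::Problem problem;
  problem.AddResidualBlock(
      new ceres::AutoDiffCostFunction<Residual, 2, 2>(new Residual),
      nullptr, x);

  ceres::Solver::Options options;
  // Factorize J directly instead of forming the normal equations J'J,
  // so the condition number is not squared.
  options.linear_solver_type = ceres::DENSE_QR;

  ceres::Solver::Summary summary;
  ceres::Solve(options, &problem, &summary);
  std::cout << summary.BriefReport() << "\n";
  return 0;
}
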
Sameer



Matt Stinson

Mar 29, 2016, 4:22:49 PM
to Ceres Solver
Thanks, I appreciate the response. For your reference, the systems I have tend to be sparse (1% density or less, with 250-1000 equations). Let me know if that changes your recommendation of dense QR. Also, to clarify, what exactly is meant by the "normal equations"?

A couple more questions:

In addition to the large condition number, the system of equations tends to be poorly scaled in both the solution variables x and the residual variables F. Are there any specific techniques or tips for dealing with this within Ceres? Would a loss function even make sense for systems of equations where all the residuals should be zero at the final solution? (I don't think the concept of an outlier really applies to systems of equations.) That said, when I am far from the solution, certain residual terms will tend to dominate, so if there is a robust recommendation for handling that, please let me know. Also, is there an option for x scaling that I am not seeing? I note that the LM algorithm has automatic x scaling built in, so there is that option. Ideally, any x or F scaling would be automatic rather than manual, given the range of systems I need to use this on.

And can you explain your comment, "if you are using the LM algorithm, then the system is not square anymore"? I'm not sure I understand why the system would not be square in my case.

Sameer Agarwal

Mar 29, 2016, 9:18:04 PM
to ceres-...@googlegroups.com
Comments inline.

On Tue, Mar 29, 2016 at 1:22 PM Matt Stinson <mrmat...@gmail.com> wrote:
Thanks, I appreciate the response. For your reference, the systems I have tend to be sparse (1% density or less, with 250-1000 equations). Let me know if that changes your recommendation of dense QR. Also, to clarify, what exactly is meant by the "normal equations"?

There are two ways of solving unsymmetric (possibly rectangular) linear systems

Jx = f

One is to symmetrize them as

J'J x = J'f

and then apply Cholesky (sparse or dense) factorization. This linear system is also known as the "normal equations". It has the square of the condition number of the original linear system.

The other is to compute a QR factorization of J, which gives us

QR x = f
Q'QR x = Q'f

Since Q is an orthonormal matrix, Q'Q is the identity, so we get

R x = Q'f

where R is an upper triangular matrix and easily inverted. DENSE_QR refers to using this method.
Your linear system size should be fine for solving with DENSE_QR.
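
To see the two approaches side by side, here is a small standalone Eigen sketch (illustrative only, not Ceres internals):

#include <iostream>
#include <Eigen/Dense>

int main() {
  const int m = 6, n = 4;  // works for m == n as well
  Eigen::MatrixXd J = Eigen::MatrixXd::Random(m, n);
  Eigen::VectorXd f = Eigen::VectorXd::Random(m);

  // Normal equations: forming J'J squares the condition number.
  Eigen::VectorXd x_normal =
      (J.transpose() * J).ldlt().solve(J.transpose() * f);

  // QR: factorize J directly; the conditioning of J is preserved.
  Eigen::VectorXd x_qr = J.colPivHouseholderQr().solve(f);

  // For a well conditioned J the two answers agree; as cond(J) grows,
  // the normal equations solution degrades first.
  std::cout << "difference: " << (x_normal - x_qr).norm() << "\n";
  return 0;
}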
 

A couple more questions:

In addition to the large condition number, the system of equations tends to be poorly scaled in both the solution variables x and the residual variables F. Are there any specific techniques or tips for dealing with this within Ceres?

Ceres by default uses Jacobi scaling to scale the linear system before solving it. That said, it is always a good idea to scale your system to be as well conditioned as possible before giving it to a solver.
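
In outline, Jacobi scaling divides each column of the Jacobian by its norm before factorization. A sketch of the idea in Eigen (not the actual Ceres code):

#include <Eigen/Dense>

// Scale each column of J to unit norm. The computed step must be
// rescaled by the same factors afterwards.
void JacobiScale(Eigen::MatrixXd& J, Eigen::VectorXd& scale) {
  scale = J.colwise().norm().transpose();
  for (int j = 0; j < scale.size(); ++j) {
    if (scale[j] == 0.0) scale[j] = 1.0;  // guard against zero columns
  }
  J = J * scale.cwiseInverse().asDiagonal();
}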
 
Would a loss function even make sense for systems of equations where all the residuals should be zero at the final solution? (I don't think the concept of an outlier really applies to systems of equations.)

I do not think a loss function makes much sense here.
 
That said, when I am far from the solution, certain residual terms will tend to dominate, so if there is a robust recommendation for handling that, please let me know.

I do not have anything to recommend here. 
 
Also, is there an option for x scaling that I am not seeing?

 There is no option in the API for scaling x. That is the user's responsibility.

I note that the LM algorithm has automatic x scaling built in, so there is that option. Ideally, any x or F scaling would be automatic rather than manual, given the range of systems I need to use this on.

Jacobi scaling is performed automatically for all solvers.
 
And can you explain your comment, "if you are using the LM algorithm, then the system is not square anymore"? I'm not sure I understand why the system would not be square in my case.

When you are using Levenberg-Marquardt, you are not solving

Jx = f

Instead, you are solving the least squares problem

min_x |Jx - f|^2 + |Dx|^2

where D is a diagonal matrix determined by the Levenberg-Marquardt algorithm. This can be re-written as the least squares system

|J|     |f|
|D| x = |0|

which, as you can see, is not square even if J is.
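
An Eigen sketch of forming and solving this stacked system (illustrative only; here D is simply sqrt(mu) * I, whereas the actual algorithm uses a scaled diagonal):

#include <cmath>
#include <Eigen/Dense>

// Solve the augmented LM system [J; D] x = [f; 0] by QR.
Eigen::VectorXd LMStep(const Eigen::MatrixXd& J,
                       const Eigen::VectorXd& f,
                       double mu) {
  const Eigen::Index m = J.rows(), n = J.cols();
  Eigen::MatrixXd A(m + n, n);
  A.topRows(m) = J;
  A.bottomRows(n) = std::sqrt(mu) * Eigen::MatrixXd::Identity(n, n);
  Eigen::VectorXd b = Eigen::VectorXd::Zero(m + n);
  b.head(m) = f;
  // A is (m + n) x n: rectangular even when J itself is square.
  return A.colPivHouseholderQr().solve(b);
}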

Sameer

Matt Stinson

Mar 30, 2016, 10:28:20 PM
to Ceres Solver
Thanks for the info. I definitely learned something from the orthogonal transformation technique that is used for LM. In my MATLAB tests of LM I had been solving delx via (J'*J+gamma*I)\(J'*f) and ran into the large condition number issue. Curious about this issue for larger systems: is there no sparse QR method that could be applied in the same way as your dense QR?

Also, can you point me to the details of the automatic Jacobi scaling, either some documentation or the code directly? Is this equivalent to the automatic Jacobi scaling that Minpack does? I believe that in Minpack the scale factors are basically the norms of the Jacobian column vectors, with the scale factors not allowed to decrease during the solve...

I have mostly been focusing on Powell dogleg methods for systems of equations. For these systems I have been solving the Newton step (-J\f) with LU factorization, since my system is square, and calculating the steepest descent direction with -J'*f. I'm curious: are there still some solver features I would theoretically be missing out on by not using a QR solver in my specific case? In other words, would Ceres benefit from adding an LU factorization solver (dense and/or sparse) for systems of equations, for Powell dogleg and for your Gauss-Newton line search methods?

Also, I saw your post from several years back regarding the very good solution quality of the Ceres solver compared to Minpack, HBN, Gnuplot, etc. These results are very encouraging; however, I'm wondering if there is something you can primarily attribute the success of this solver to? As far as I can tell, the basic methodologies in Minpack, for example, and likely some of the others, are very similar, but apparently their implementations don't come close to Ceres's. Again, very encouraging; I'm just trying to understand the root cause(s) of Ceres's increased success.

Finally, it appears there is a strong recommendation/preference in Ceres (based on your comments and the documentation) for LM as opposed to other trust region methods (the dogleg family). Meanwhile, in the literature, there seems to be a recommendation and preference for Powell dogleg for systems of equations. For example, even the reference that you seem to use heavily (K. Madsen, H.B. Nielsen, O. Tingleff, 2004) plainly states that Powell dogleg is better than LM for systems of nonlinear equations and is considered the best solver approach for systems of equations. Any insight or comments here?

Sameer Agarwal

Mar 31, 2016, 12:08:44 AM
to ceres-...@googlegroups.com
On Wed, Mar 30, 2016 at 7:28 PM Matt Stinson <mrmat...@gmail.com> wrote:
Thanks for the info. I definitely learned something from the orthogonal transformation technique that is used for LM. In my MATLAB tests of LM I had been solving delx via (J'*J+gamma*I)\(J'*f) and ran into the large condition number issue. Curious about this issue for larger systems: is there no sparse QR method that could be applied in the same way as your dense QR?

There is SuiteSparseQR, which is a pretty good sparse QR factorization method, but in my experience, for large systems, state of the art supernodal sparse Cholesky solvers like CHOLMOD are still better, both in runtime and in memory consumption.
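
For what it's worth, the corresponding Ceres settings for the sparse Cholesky path look like this (assuming Ceres was built with SuiteSparse):

ceres::Solver::Options options;
options.linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;
options.sparse_linear_algebra_library_type = ceres::SUITE_SPARSE;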
 
Also, can you point me to the details of the automatic Jacobi scaling, either some documentation or the code directly? Is this equivalent to the automatic Jacobi scaling that Minpack does? I believe that in Minpack the scale factors are basically the norms of the Jacobian column vectors, with the scale factors not allowed to decrease during the solve...

Yes, it is the same.
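
A sketch of that monotone update rule, as described above (not the actual Minpack or Ceres source):

#include <algorithm>
#include <Eigen/Dense>

// Scale factors are the column norms of the Jacobian, and are never
// allowed to decrease from one iteration to the next.
void UpdateScale(const Eigen::MatrixXd& J, Eigen::VectorXd& scale) {
  if (scale.size() != J.cols()) {
    scale = Eigen::VectorXd::Zero(J.cols());
  }
  for (Eigen::Index j = 0; j < J.cols(); ++j) {
    scale[j] = std::max(scale[j], J.col(j).norm());
  }
}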

 
I have mostly been focusing on Powell dogleg methods for systems of equations. For these systems I have been solving the Newton step (-J\f) with LU factorization, since my system is square, and calculating the steepest descent direction with -J'*f. I'm curious: are there still some solver features I would theoretically be missing out on by not using a QR solver in my specific case? In other words, would Ceres benefit from adding an LU factorization solver (dense and/or sparse) for systems of equations, for Powell dogleg and for your Gauss-Newton line search methods?

No. We do not use a line search with our Gauss-Newton either. Also, as I suggested earlier, before we worry about all these conditioning issues and theorize about solutions, it's a good idea to actually use Ceres and see how it does.
 
Also, I saw your post from several years back regarding the very good solution quality of the Ceres solver compared to Minpack, HBN, Gnuplot, etc. These results are very encouraging; however, I'm wondering if there is something you can primarily attribute the success of this solver to? As far as I can tell, the basic methodologies in Minpack, for example, and likely some of the others, are very similar, but apparently their implementations don't come close to Ceres's. Again, very encouraging; I'm just trying to understand the root cause(s) of Ceres's increased success.

I have no idea why Minpack performs the way it does. The LM loop we use is the same as HBN's, and it's quite good; the step size scaling logic there is very good. Beyond that, just a careful implementation and a ton of bug fixes accumulated by observing the solver work in a production environment.

Unlike the authors of other solvers, I have the luxury of watching people across Google use it, reporting bugs in live problems that I am able to reproduce (because I have access to what they are doing). This makes improving the quality of the solver in various corner cases much easier than relying on bug reports on the mailing list alone.

 
Finally, it appears there is a strong recommendation/preference in Ceres (based on your comments and the documentation) for LM as opposed to other trust region methods (the dogleg family). Meanwhile, in the literature, there seems to be a recommendation and preference for Powell dogleg for systems of equations. For example, even the reference that you seem to use heavily (K. Madsen, H.B. Nielsen, O. Tingleff, 2004) plainly states that Powell dogleg is better than LM for systems of nonlinear equations and is considered the best solver approach for systems of equations. Any insight or comments here?

I have no experience with solving equations; I mostly work on nonlinear least squares, and there LM has generally worked better. The reason people recommend dogleg over LM is that LM has to factorize/solve for the step every iteration, so the cost of an unsuccessful iteration is high, whereas dogleg methods can reuse the factorization until a successful step is made. The problem is that the quality of that step may not be as good as the one LM computes. But this, I think, is fairly problem dependent. I recommend trying out the various solvers to see what best fits your needs.

Maybe there is a bug in our dogleg solver, but I have never found it to dominate LM.
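
Switching strategies in Ceres is a one-line change, so the comparison is cheap to run:

ceres::Solver::Options options;
options.trust_region_strategy_type = ceres::DOGLEG;
// TRADITIONAL_DOGLEG is Powell's method; SUBSPACE_DOGLEG minimizes over
// the two-dimensional subspace spanned by the Gauss-Newton and steepest
// descent directions.
options.dogleg_type = ceres::TRADITIONAL_DOGLEG;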

HTH,
Sameer



 