Initialization of Levenberg-Marquardt


Hariprasad Kannan

Jun 20, 2015, 7:36:00 AM
to ceres-...@googlegroups.com
For the Levenberg-Marquardt method, a "damping" diagonal matrix is added to the Hessian (or Hessian approximation). This diagonal matrix is usually the diagonal of the Hessian itself. How do I set the initial value of the factor that multiplies this diagonal matrix?

Sameer Agarwal

Jun 20, 2015, 12:33:10 PM
to ceres-...@googlegroups.com

Hari,
Why do you want to fix this value? Ceres already takes care of this for you. There is a multiplier that you can control:

Solver::Options::initial_trust_region_radius, whose inverse determines the strength of the starting diagonal.
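A minimal sketch of setting this option (assuming a ceres::Problem has already been built; everything except the option names is illustrative):

```cpp
#include <ceres/ceres.h>

// Sketch: the initial LM damping strength is controlled through the initial
// trust region radius; its inverse scales the starting diagonal term.
void SolveWithCustomInitialRadius(ceres::Problem* problem) {
  ceres::Solver::Options options;
  options.initial_trust_region_radius = 1e4;  // default; larger => weaker initial damping
  ceres::Solver::Summary summary;
  ceres::Solve(options, problem, &summary);
}
```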



--
You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ceres-solver/77a3d32e-ce40-4f6c-9687-7b337490ae98%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hariprasad Kannan

Jun 21, 2015, 2:47:29 AM
to ceres-...@googlegroups.com
Hi
I am curious about the Levenberg-Marquardt implementation as such. For my work, I require an unconstrained optimization solver. A trust-region approach is preferable, since the Hessian is only PSD and never strictly PD, and a line search approach results in huge step lengths and poor search directions.

While implementing the LM approach, the algorithm seems very sensitive to this multiplying factor. Dogleg is not an option, because the problem size is half a million to a million variables. I am now trying to implement the Steihaug method, but it requires preconditioning. However, it does not work with a diagonal preconditioner for the first 90% of the iterations; only towards the last iterations does the diagonal preconditioner become effective and necessary.

Regards
Hari


Sameer Agarwal

Jun 21, 2015, 2:50:02 AM
to ceres-...@googlegroups.com

Have a look at the code in trust_region_minimizer.cc and levenberg_marquardt_strategy.cc.

Also, for unconstrained optimization, have you looked into L-BFGS?
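For reference, selecting L-BFGS in Ceres is a couple of options (a fragment, assuming a problem is already set up):

```cpp
// Sketch: switching Ceres to its line search minimizer with L-BFGS.
ceres::Solver::Options options;
options.minimizer_type = ceres::LINE_SEARCH;
options.line_search_direction_type = ceres::LBFGS;
// Memory and per-iteration cost grow with the history size (default 20):
options.max_lbfgs_rank = 20;
```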

Sameer


Hariprasad Kannan

Jun 21, 2015, 2:57:37 AM
to ceres-...@googlegroups.com
Thanks for the pointers. I will look into them.

I have not seriously looked into L-BFGS so far. I will have to figure out the extra computational cost and memory requirements.

Hari

Hariprasad Kannan

Jun 21, 2015, 3:30:02 AM
to ceres-...@googlegroups.com
It would be nice to know the practical advantages of L-BFGS.

Hari

Hariprasad Kannan

Jul 8, 2015, 1:45:51 PM
to ceres-...@googlegroups.com
Is L-BFGS a good idea if the Hessian of the problem has a huge (> 10^6) condition number? Will it require a trust-region kind of safeguard?

Hari

Sameer Agarwal

Jul 8, 2015, 1:49:02 PM
to ceres-...@googlegroups.com
I do not have any particular experience comparing TR with L-BFGS on ill-conditioned problems, but if your condition number is huge, then you have a scaling/conditioning problem and you should change your model. Different optimization algorithms will work to varying degrees, but nothing will work well.

Sameer



Hariprasad Kannan

Jul 8, 2015, 4:10:43 PM
to ceres-...@googlegroups.com
Thanks for your inputs.

Hari

Hariprasad Kannan

Jul 29, 2015, 3:39:37 PM
to Ceres Solver, sameer...@google.com
Hi Sameer
Can you throw some light on what you mean by changing my model?

Thanks.

Sameer Agarwal

Jul 29, 2015, 10:51:37 PM
to Hariprasad Kannan, Ceres Solver
I mean that you should scale your variables so that the Hessian of your problem has an eigenvalue distribution concentrated around a single value. Imagine your Hessian defines an ellipse (let's assume it is PSD for now); then you are trying to reduce its eccentricity.

Nocedal & Wright, or pretty much any text on numerical optimization, has a discussion of conditioning.

Sameer

Hariprasad Kannan

Jul 30, 2015, 1:11:10 AM
to Sameer Agarwal, Ceres Solver
Yes, I have read about preconditioning from Nocedal and Wright. I was wondering whether I am missing something.

Thanks.

Hariprasad Kannan

Jul 30, 2015, 1:51:16 AM
to Sameer Agarwal, Ceres Solver
Preconditioning can influence the algorithm in two ways: fewer iterations for the linear system, and a more suitable direction for the problem.

Preconditioning modifies the trust region shape to an ellipsoid in the Steihaug method and hence helps in getting a more suitable direction. Otherwise, in an L-M kind of approach, I am not able to see how it can help with finding a better direction.

Sameer Agarwal

Jul 30, 2015, 4:11:14 AM
to Hariprasad Kannan, Ceres Solver
Let us assume that you are using a direct solver so the number of linear solver iterations is not an issue. Then the only issue is the quality of trust region step.

If the problem is poorly conditioned, it basically means that different variables require vastly varying magnitudes of change to decrease the objective function value. 

Suppose you are using a scaled identity matrix to dampen the Gauss-Newton matrix. In that case the scale is inversely related to the trust region size, and given the wildly varying scales of the variables, it is constrained to be rather large so that the most sensitive variable does not move too much. In doing so, you may either decrease the convergence rate considerably, because you are taking very small steps, or get stuck somewhere entirely.

Changing the damping matrix to something better than the identity, like the diagonal of J'J, improves things, but the basic idea remains the same.

Sameer

Hariprasad Kannan

Jul 30, 2015, 6:38:46 AM
to Sameer Agarwal, Ceres Solver
Thanks for the points. I have been thinking about what you said.

I see that having the diagonal of the Hessian, instead of the identity, can influence the obtained search direction. I am going to try this and see.

Otherwise, I am not able to fully understand how preconditioning can help with the quality of the trust region step. If we apply preconditioning after adding the damping matrix to the Hessian, we are already in a modified landscape; how then can we constrain the trust region shape according to the conditioning of the Hessian?

Is it true that without preconditioning, we may get a less accurate answer to the linear system and hence a poorer direction?

Sameer Agarwal

Jul 30, 2015, 10:39:39 AM
to Hariprasad Kannan, Ceres Solver

I am not sure I understand your first question but the answer to your second question is yes.

Hariprasad Kannan

Jul 30, 2015, 11:34:11 AM
to Sameer Agarwal, Ceres Solver
If you take the general formulation of the trust-region method and apply preconditioning, it is equivalent to solving the subproblem subject to a modified (ellipsoidal) trust region (as shown on pp. 95 and 174 of Nocedal and Wright).

I can see a similar modification of the trust region shape in L-M when the damping matrix is the diagonal of the Hessian.

However, I am not able to see a similar modification of the trust region shape in L-M when you apply preconditioning to the linear system.
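For reference, the correspondence being discussed can be written out; choosing the damping matrix amounts to choosing the norm that defines the trust region:

```latex
% Trust-region subproblem with an ellipsoidal (scaled) region:
\min_{p} \; g^{T} p + \tfrac{1}{2}\, p^{T} H p
\quad \text{subject to} \quad \lVert D p \rVert \le \Delta .
% Its solution satisfies, for some \lambda \ge 0:
(H + \lambda\, D^{T} D)\, p = -g .
% With D = I this is classical LM damping; with D^{T}D = \operatorname{diag}(H)
% it is LM with Jacobi (diagonal) damping, i.e. an ellipsoidal trust region
% aligned with the per-variable curvature scales.
```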

Since you have answered yes to the second question, I see that preconditioning can help with a better estimate of the linear system solution. That can make a big difference.

Sameer Agarwal

Jul 30, 2015, 11:58:27 AM
to Hariprasad Kannan, Ceres Solver
When you choose a particular damping matrix, it is equivalent to choosing a particular distance metric in the domain of the optimization problem. This, when written out as a Newton step, results in the linear system being re-scaled, which changes the step you take in exact arithmetic.

If you start with the linear system itself and you are using a direct solver + exact arithmetic, then preconditioning has no effect on the solution. In floating point + direct solver it will improve the accuracy of the solution; in floating point + iterative solver, it will be the difference between getting a usable solution in a reasonable amount of time or not.

But these two things are different, even if they look the same on the surface.

If you are looking to learn how optimization algorithms work, I recommend starting with small, low-dimensional problems - there are plenty of them in the examples directory of Ceres and elsewhere - and visualizing the progress of your solver on them as you play with it.

Million-variable problems are a bad way to go about this.

Sameer

Hariprasad Kannan

Jul 30, 2015, 12:10:27 PM
to Sameer Agarwal, Ceres Solver
Thanks for the information that you have shared.

I have been developing my implementation and understanding with smaller-sized toy problems. But as the problem size increases, new issues come up: larger condition numbers, trust region becoming the right choice compared to line search, etc.

I understand the difference that you are pointing out between the two concepts.

Nocedal and Wright is pretty good. At the same time, discussing with someone like you is also helpful. Computational Science Stack Exchange is also pretty good.

Sameer Agarwal

Jul 30, 2015, 12:16:02 PM
to ceres-...@googlegroups.com
I would argue that, except for the runtime performance of the code, which does vary with problem size, memory locality, cache sizes, etc., everything about the numerical behavior of the algorithm can be studied with reasonably sized problems.

The benchmark problems in the examples directory test the conditioning performance of the solver pretty severely. They helped us debug a lot of issues that we were never even able to identify when working with large-scale problems.

Look at the problems in nist.cc as a great starting point for non-linear least squares problems, and more_garbow_hillstrom.cc for more general problems.

Sameer


Hariprasad Kannan

Jul 30, 2015, 12:51:18 PM
to Ceres Solver
It is an interesting point you are making, that one can test numerical issues with reasonably sized problems.

I will look into those resources for testing and understanding what's going on.
