Optim.jl line search problems

695 views
Skip to first unread message

Thomas Covert

unread,
Jul 30, 2014, 12:27:36 PM7/30/14
to julia...@googlegroups.com
Recently I've encountered line search errors when using Optim.jl with BFGS.  Here is an example error message

ERROR: assertion failed: lsr.slope[ib] < 0

 in bisect! at /pathtojulia/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:577

 in hz_linesearch! at /pathtojulia/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:273

 in hz_linesearch! at /pathtojulia/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:201

 in bfgs at /pathtojulia/.julia/v0.3/Optim/src/bfgs.jl:121

 in optimize at /pathtojulia/.julia/v0.3/Optim/src/optimize.jl:113

while loading /pathtocode/code.jl, in expression starting on line 229


I've seen this error message before, and its usually because I have a bug in my code that erroneously generates function values or gradients which are very large (i.e., 1e100).  However, in this case I can confirm that the "x" I've passed to the optimizer is totally reasonable (abs value of all points less than 100), the function value at that x is reasonable (on the order of 1e6), the gradients are  reasonable (between -100 and +100), and the entries in the approximate inverse Hessian are also reasonable (smallest abs value is about 1e-9, largest is about 7).  


This isn't a failure on the first or second iteration of BFGS - it happens on the 34th iteration.


Unfortunately its pretty hard for me to share my code or data at the moment, so I understand that it might be challenging to solve this problem but any advice you guys can offer is appreciated!


-Thom

jbeginner

unread,
Jul 30, 2014, 1:43:16 PM7/30/14
to julia...@googlegroups.com
This is not really a solution for this problem but have you tried the NLopt library? From my experience it produces much more stable results and because of problems like the one you describe I have switched to it. I think there is an L-BFGS option also. Although I did not get AD to work with it. The description for all algorithms can be seen here:

Thomas Covert

unread,
Jul 30, 2014, 1:45:43 PM7/30/14
to julia...@googlegroups.com
No, I haven't tried that yet - might someday, but I like the idea of running julia native code all the way...  

However, I did find that manually switching the line search routine to "backtracking_linesearch!" did the trick, so at least we know the problem isn't in Optim.jl's implementation of BFGS itself!

-thom

John Myles White

unread,
Jul 30, 2014, 4:44:51 PM7/30/14
to julia...@googlegroups.com
Would be useful to understand exactly what goes wrong if we want to fix this problem. I’m mostly used to errors caused by inaccurate gradients, so I don’t have an intuition for the cause of this problem.
 
— John

Thomas Covert

unread,
Jul 30, 2014, 6:17:41 PM7/30/14
to julia...@googlegroups.com
I've done some more sleuthing and have concluded that the problem was on my end (a bug in the gradient calculation, as you predicted). 

Is an inaccurate gradient the only way someone should encounter this assertion error?  I don't know enough about line search methods to have an intuition about that, but if it is the case, maybe the line search routine should throw a more informative error?

-Thom

John Myles White

unread,
Jul 30, 2014, 7:24:26 PM7/30/14
to julia...@googlegroups.com
I’ve never seen our line search methods produce an error that wasn’t caused by errors in the gradient. The line search methods generally only work with function values and gradients, so they’re either buggy (which they haven’t proven to be) or they’re brittle to errors in function definitions/gradient definitions.

Producing better error message would be great. I once started to do that, but realized that I needed to come back to fully understanding the line search code before I could insert useful errors. Would love to see improvements there.

 — John

Thomas Covert

unread,
Aug 19, 2014, 2:51:37 PM8/19/14
to julia...@googlegroups.com
I'm seeing this same error (ERROR: assertion failed: lsr.slope[ib] < 0) again, and this time my gradients (evaluated at "reasonable" input values) match the finite difference output generated by Calculus.jl's "gradient" function.  The function I am trying to minize is globally convex (its a multinomial logit log-likelihood).

I encounter this assertion error after a few successful iterations of BFGS and it is caused by NAN's in the gradient of the test point.  BFGS gets to this
test point because the step size it passes to hz_linesearch eventually gets to be large, and a big enough step can cause floating point errors in the calculation of the the derivatives.  For example, on a recent minimization attempt, the assertion error happens when "c" (the step size passed by bfgs to hz_linesearch) appears to be about 380.

I think this is happening because hz_linesearch (a) expands the step size by a factor of 5 (see line 280 in hz_linesearch) until it encounters upward movement and (b) passes this new value (or a moving average of it) back to the caller (i.e., bfgs).  So, the next time bfgs calls hz_linesearch, it starts out with a potentially large value for the first step.

I don't really know much about line search routines, but is this way things ought to be?  I would have thought that for each new call to a line search routine, the step size should reset to a default value.

By the way, is it possible to enable display of the internal values of "c" in the line search routines?  It looks like there is some debugging code in there but I'm not sure how to turn it on.

-thom

Thomas Covert

unread,
Aug 20, 2014, 10:16:41 AM8/20/14
to julia...@googlegroups.com
Ok after reading the paper which the hz_linesearch! routine is based on, I can see that I'm wrong about this.  Still puzzled, but definitely wrong!

John Myles White

unread,
Aug 20, 2014, 10:18:29 AM8/20/14
to julia...@googlegroups.com
I don’t think I’m going to have time to look into this soon, but why do you use BFGS? In my experience L-BFGS is almost always better.

Of course, we want our BFGS code to be better. But I use BFGS only quite rarely because of its O(N^2) complexity.

 — John

Thomas Covert

unread,
Aug 20, 2014, 10:23:19 AM8/20/14
to julia...@googlegroups.com
My (current) objective function has about 30 parameters, so N^2 complexity isn't a problem (storage-wise or matrix multiplication time wise).  Also, for my current work, the objective function is much slower than the optimization routine itself, so the overhead of a full inverse Hessian is relatively small.

In Optim.jl, L-BFGS seems to use the same line search routine as BFGS.  Is there a reason to think it should take substantively different search path?


-thom

John Myles White

unread,
Aug 20, 2014, 10:33:21 AM8/20/14
to julia...@googlegroups.com
It seems odd that your objective function takes less time than the optimization routine itself, unless you include the calls to your objective function in the cost of the optimization routine. The optimization routine does very little work: most of the line search’s cost is induced by repeatedly calling calling the objective function at trial points to decide a stepsize.

The search path for L-BFGS is likely to be somewhat different from BFGS because it uses a different approximation to the Hessian. Whether that difference is substantive is problem-dependent.

— John

Thomas Covert

unread,
Aug 20, 2014, 10:37:24 AM8/20/14
to julia...@googlegroups.com
sorry if I was unclear - what I meant to say is that a single call to the objective function and its gradients represents considerably more computational work than what goes in inside the optimizer.

I will see if L-BFGS does a better job later today.  Thanks for your help.

-thom

John Myles White

unread,
Aug 20, 2014, 11:03:02 AM8/20/14
to julia...@googlegroups.com
Let’s see my suggestion actually helps. If it does, then you should thank me. :)

 — John
Reply all
Reply to author
Forward
0 new messages