Fwd: LBFGS Accuracy

David Hall

unread,

Apr 16, 2013, 12:36:37 PM4/16/13

to scala-...@googlegroups.com

Sorry everyone, for some reason Google Groups decided to set a bunch of people (like 50) to disallow posting, including people I either know personally are know through the Scala community. I've reset everyone who was banned from posting to moderated, and I'll set you to allowed next time you post. Sorry.

As for Martin's question, could you please send me the code for both python and scala if it's not too much trouble? I have checked the implementation against others in the past, but this will be good for me to make sure I get it 100% right...

-- David

---------- Forwarded message ----------
From: Martin Mauch <ma...@crealytics.de>
Date: Tue, Apr 16, 2013 at 9:24 AM
Subject: Fwd: LBFGS Accuracy
To: dl...@berkeley.edu

Hi David, I just wanted to send the following email to the breeze Google group, but it seems that I'm not allowed to do that although I am a member.

Maybe you can grant me access? :)

---------- Forwarded message ----------
From: Martin Mauch <martin...@gmail.com>
Date: Tue, Apr 16, 2013 at 6:20 PM
Subject: LBFGS Accuracy
To: scala-...@googlegroups.com

Hi all,

we are comparing the LBFGS optimizer from breeze with SciPy's Python wrapper around the original Fortran code.

Unfortunately, it seems that the breeze version almost always yields worse results, e.g.

breeze:
Minimum: -172358.40291857603
Evaluations: 2137

SciPy:

Minimum: -172636.61570534675
Evaluations: 568

I've tried small values for the tolerance and improvementTol (forked and added as constructor value to LBFGS) and this helps to get as close as in the example above

but with a lot more function evaluations and still not all the same.

In other cases setting small values for tolerance and improvementTol can't force breeze to do more evaluations:

breeze:
Minimum: -1700794.6637494287,
Evaluations: 108

SciPy:

Minimum: -1708001.0462022428
Evaluations: 521

Does anyone have a idea what the differences between the breeze and the Fortran versions are?

I guess/hope, that it's only a parameter that needs to be tweaked for breeze to get the same performance.

Thanks for developing such a great library!

Martin

--

Martin Mauch
Innovationsmanager

Tel: +49 851-213728-0Fax: +49 851-213728-88E-Mail: ma...@crealytics.deWeb: http://www.crealytics.de

crealytics GmbH - Profit Driven Search Marketing
Brunngasse 1
94032 Passau

Oranienstraße 185
10999 Berlin

-----------------------------------------------------------------
Geschäftsführer: Andreas Reiffen, Christof König, Daniel Trost
Sitz der Gesellschaft: Passau
Registergericht: Amtsgericht Passau, HRB 7466
-----------------------------------------------------------------
Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte
Informationen. Wenn Sie nicht der richtige Adressat sind oder
diese E-Mail irrtümlich erhalten haben, informieren Sie bitte
sofort den Absender und vernichten Sie diese Mail. Das unerlaubte
Kopieren sowie die unbefugte Weitergabe dieser Mail ist nicht
gestattet.

This e-mail may contain confidential and/or privileged
Informations. If you are not the intended recipient, please
immediately inform the sender and delete this mail. Any
unauthorized copying, disclosure or distribution of this Mail
is not allowed.
-----------------------------------------------------------------

Aaron Defazio

unread,

Apr 16, 2013, 7:32:45 PM4/16/13

to scala-...@googlegroups.com

I can confirm also that the current breeze implementation takes many more iterations to converge than the standard FORTRAN implementation. The behaviour occurs on simple examples like the rosenbrock function.

David Hall

unread,

Apr 16, 2013, 7:57:49 PM4/16/13

to scala-...@googlegroups.com

ok, I'll compare on rosenbrock and figure it out

--
You received this message because you are subscribed to the Google Groups "Scala Breeze" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-breeze...@googlegroups.com.
To post to this group, send email to scala-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msg/scala-breeze/-/d3hSLSIBMgQJ.
For more options, visit https://groups.google.com/groups/opt_out.

David Hall

unread,

Apr 17, 2013, 11:57:39 AM4/17/13

to scala-...@googlegroups.com

Ok, almost done. There are two problems, one where I reversed the multiplication order of the limited memory (which I swear I fixed a year ago) and another where the line search isn't as awesome as it could be. Both are important for making rosenbrock work. I'll hopefully finish the latter today.

David Hall

unread,

Apr 17, 2013, 8:11:20 PM4/17/13

to scala-...@googlegroups.com

Pushed a fix. It now produces the same sequence of iterates on rosenbrock as liblibfgs when using backtracking search. It would probably be good to also implement their super-fancy line search that reduces the number of iterations on rosenbrock by ~10%.

Aaron Defazio

unread,

Apr 17, 2013, 8:14:42 PM4/17/13

to scala-...@googlegroups.com

I have scala code for a strong-wolfe line search that I could contribute. It uses cubic interpolation for choosing search points.

David Hall

unread,

Apr 17, 2013, 8:16:16 PM4/17/13

to scala-...@googlegroups.com

That'd be great!

I can either handle the port or you can see if it fits into my new LineSearch interface... whichever is fine.

-- David

To view this discussion on the web visit https://groups.google.com/d/msg/scala-breeze/-/VRpMtdjjWNQJ.

martin.mauch

unread,

Apr 22, 2013, 5:29:06 AM4/22/13

to scala-...@googlegroups.com

Hi David, Aaron,

posting works now :)

Thanks for improving this so fast! Now the tolerance parameter allows for results that are similar to the SciPy version although Breeze uses more function evaluations for the same accuracy:

breeze:
Minimum: -1707268.2321252606,
Evaluations: 900

SciPy:

Minimum: -1708001.0462022428
Evaluations: 521

@Aaron: I'll happily jump in to port your strong-wolfe line search to breeze in case you don't have the time :)

Thanks again for replying to fast!

Martin

David Hall

unread,

Apr 22, 2013, 12:09:02 PM4/22/13

to scala-...@googlegroups.com

Ok, that's an improvement at least.

One thing, you would happen to be using the CachedDiffFunction wrapper would you? It will greatly reduce the number of function evaluations, if not. I mean to make LBFGS auto-wrap the passed in function, but haven't yet. If you aren't using it, I'd suspect it makes up 80% of the difference.

lbfgs.minimize(new CachedDiffFunction(myFunc))

-- David

To view this discussion on the web visit https://groups.google.com/d/msg/scala-breeze/-/4gi7k93UjiIJ.

Aaron Defazio

unread,

Apr 22, 2013, 8:20:55 PM4/22/13

to scala-...@googlegroups.com

I've put the code in a gist:
https://gist.github.com/anonymous/5439754
It uses a different logging library and properties object then breeze, but otherwise should be easy to integrate.

Martin Mauch

unread,

Apr 23, 2013, 4:49:36 AM4/23/13

to scala-...@googlegroups.com

Hi David, Aaron,

the CachedDiffFunction indeed brings the number of function evaluations down a lot. It is now about the same as the SciPy version.

I'm just adapting Aaron's StrongWolfe implementation to fit breeze's interfaces.

I'll issue a pull request once I'm done.

Martin

--
You received this message because you are subscribed to a topic in the Google Groups "Scala Breeze" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/scala-breeze/CeW8fd_cfNA/unsubscribe?hl=en.
To unsubscribe from this group and all its topics, send an email to scala-breeze...@googlegroups.com.

To post to this group, send email to scala-...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msg/scala-breeze/-/1pV6N4dPDS0J.

Martin Mauch

unread,

Apr 23, 2013, 10:49:46 AM4/23/13

to scala-...@googlegroups.com

Hi Aaron,

I gave the breezefication of your Strong Wolfe line search a first shot and tried to change as little as possible.

You can find the result here: https://github.com/crealytics/breeze/commit/90ee6e9e3920ff8ee1a82a45e3c6affa5128d967

The tests are all green, but the line search throws a "Line search zoom failed" in our more complicated function.

Unfortunately this is a proprietary algorithm, so I can't post it here...

It seems that in our case there is a invocation of zoom where linit == rinit which leads to low == Bracket(NaN, NaN, NaN) in the next step and the algorithm not being able to find a valid zoom subsequently.

Is the problem ill-posed if the algorithm doesn't work? Or is that a special case that the algorithm should be able to deal with?

Best regards

Martin

David Hall

unread,

Apr 23, 2013, 12:30:55 PM4/23/13

to scala-...@googlegroups.com

sometimes line searches just fail, in my experience. (often there's a bug in the objective, but not always.) There's a LineSearchFailed exception that LBFGS catches, making it reset the history. If two searches in a row fail, it aborts because it can't get any search direction. If you switch to throwing that rather than calling error, it might make things ok.

-- David

Martin Mauch

unread,

Apr 23, 2013, 12:45:27 PM4/23/13

to scala-...@googlegroups.com

Hi David,

I just converted the exceptions in the line search to FirstOrderExceptions:

https://github.com/crealytics/breeze/commit/45fbf96c1fc2567430e16cd3917eb47a3aa040db

Now LBFGS terminates but with this result:

18:35:18.501 [main] ERROR breeze.optimize.LBFGS - Failure! Resetting history: breeze.optimize.FirstOrderException: Line search zoom failed for
low: Bracket(NaN,NaN,NaN)
hi: Bracket(1.0,-5.951212887255733E8,4.4210734062078905E8)
linit: Bracket(1.0,-5.951212887255733E8,4.4210734062078905E8)
rinit: Bracket(1.0,-5.951212887255733E8,4.4210734062078905E8)
18:35:18.509 [main] ERROR breeze.optimize.LBFGS - Failure again! Giving up and returning. Maybe the objective is just poorly behaved?

Minimum = 1.1740027861369298E9

Using BacktrackingLineSearch the algorithm does one history reset as well but calculates the correct result:

18:39:51.826 [main] ERROR breeze.optimize.LBFGS - Failure! Resetting history: breeze.optimize.NaNHistory:
Minimum = -172681.50132468613

So I probably made some mistake while porting or the problem posed somehow results in an edge-case situation that the StrongWolfeLineSearch does not handle fully yet.

David Hall

unread,

Apr 23, 2013, 1:25:54 PM4/23/13

to scala-...@googlegroups.com

Ok, probably looks like a bug then. :-)

Aaron Defazio

unread,

Apr 23, 2013, 8:04:04 PM4/23/13

to scala-...@googlegroups.com

The line setting var low = phi(init) looks wrong to me. Should it be a change to the line above it, t=init?, it should never occur that the two brackets are at the same t.
My implementation is based off of Nocedal & Wright's book, which is the same source as the implementation in scipy (the non-limited memory version).

Martin Mauch

unread,

Apr 24, 2013, 4:54:38 AM4/24/13

to scala-...@googlegroups.com

Hi Aaron, you're right! I had tried that, but that was before I converted all Exceptions to FirstOrderExceptions.

The fix is pushed and your StrongWolfe indeed seems to reduce the number of function evaluations with only minor changes in accuracy :)

BacktrackingLineSearch

Minimum = -172640.0758045734,

timeNeeded = 9.137

evaluations = 722

StrongWolfeLineSearch

Minimum = -172635.31212677842,

timeNeeded = 6.328

evaluations = 547

I've issue the pull request here: https://github.com/scalanlp/breeze/pull/59

As mentioned I could try to make StrongWolfeLineSearch implement ApproximateLineSearch and use the State case class instead of the Bracket case class.

To view this discussion on the web visit https://groups.google.com/d/msg/scala-breeze/-/rp4TEW_rl4YJ.

David Hall

unread,

Apr 24, 2013, 11:09:03 AM4/24/13

to scala-...@googlegroups.com

Great! I don't care much about the State class. I'll get this PR integrated today.

martin.mauch

unread,

Apr 25, 2013, 2:06:13 PM4/25/13

to scala-...@googlegroups.com

Hi David,

it seems I didn't pay enough attention, the minimization problem we have is actually constrained...

I've seen that there is an algorithm called LBFGS-B (Bounded) for optimizing with constraints.

Do you know by chance how this algorithm works?

I might give an implementation a shot, but I'm not very versed in that field and my brain overheats when I read to many formulas or Fortran Code ;)

Best regards

Martin

David Hall

unread,

Apr 25, 2013, 2:54:11 PM4/25/13

to scala-...@googlegroups.com

I think I'm kind of in the same boat as you. I haven't tried to really understand the LBFGS-B algorithm, and it looks pretty tough. I rarely have bounded optimization problems (at least ones that aren't just "positive variables", which is easy to deal with by just replacing x with exp(y)), so I haven't bothered.

I might suggest looking into any easier algorithm, or at least one that comes with Matlab code:

http://jmlr.csail.mit.edu/proceedings/papers/v5/schmidt09a/schmidt09a.pdf http://www.di.ens.fr/~mschmidt/Software/PQN.html

It's strictly more powerful than LBFGS-B, but is probably slower for just box-constraint problems. Looks easier to implement though!

This method looks comparatively easy to implement, as well: http://www.cs.utexas.edu/users/inderjit/public_papers/pqnj_sisc10.pdf no code though.

Probably the 09 paper is the easiest.

-- David

To view this discussion on the web visit https://groups.google.com/d/msg/scala-breeze/-/hsAEUW6uLrEJ.

Anthony Di Franco

unread,

Apr 25, 2013, 4:08:09 PM4/25/13

to scala-...@googlegroups.com

While we're talking about optimization algorithms that are more general and easier to implement, let me air the cross entropy method, which is state-of-the-art on even hard combinatorial optimization problems and is intuitive and straightforward to implement. I've got a rough old scala 2.8 implementation a friend gave me a few years ago and I'm revisiting it now as part of gearing back up to revive the project that called for it.
I'll eventually get it done but it's part of a long list of tasks that I'm fitting into a small slice of spare time so I could definitely use some help.

Here are two of the better references:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.116.1828
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.57.8174

Martin Mauch

unread,

Apr 25, 2013, 4:13:42 PM4/25/13

to scala-...@googlegroups.com

Hi Anthony,

if you send me the code I can try to integrate it into breeze. If it's already written in Scala it shouldn't be that hard.

Best regards
Martin

David Hall

unread,

Apr 25, 2013, 4:19:52 PM4/25/13

to scala-...@googlegroups.com

Thanks Anthony.

Martin, it's probably not what you want right now. It's a randomized 0th order method. Useful sometimes, but I doubt here.

-- David

David Hall

unread,

Apr 25, 2013, 4:46:28 PM4/25/13

to scala-...@googlegroups.com

(I'm not opposed to having an implementation, just know that it's a very different class of optimization algorithm.)

Anthony Di Franco

unread,

Apr 25, 2013, 4:47:15 PM4/25/13

to scala-...@googlegroups.com

While it is true that I am looking at it for purposes far removed from the usual numerical optimization problems and pretty firmly in the POMDP planning domain, and it comes out of rare-event simulation, and most of the published examples are about of those classes of hard, I'd still be quite curious to see how well it does on a problem where L-BFGS is a typical first guess, such as, apparently, yours. Both in terms of raw performance and ease of implementation and use.
From my position of mostly a priori information it seems like it has potential to be a sensible default of sorts, scaling from simple to hard problems fairly gracefully, the scala of optimization if you will, and this kind of information would help to test that idea.

I am also intrigued by perceived connections to proof theory via information theory that vaguely suggest to me a rationale for expecting such general utility.

Reply all

Reply to author

Forward