How to use LBFGS.ApproximateHessian()

149 views
Skip to first unread message

xuepeng sun

unread,
Jan 8, 2015, 9:11:10 AM1/8/15
to scala-...@googlegroups.com
Hi,

   I'd like to use LBFGS.ApproximateHessian to compute the Hessian of an objective function with large sparse samples.
Does anyone have a use case of this class?  It seems that no test case are available in the testSuite.  Thanks a lot. 

Best,

Xuepeng


David Hall

unread,
Jan 13, 2015, 12:27:31 PM1/13/15
to scala-...@googlegroups.com
Sorry for the delay.

It's really not meant to be used outside of the LBFGS framework. You need a sequence of gradient steps, recording \delta x and \delta \nabla f(x) by calling "updated."

--
You received this message because you are subscribed to the Google Groups "Scala Breeze" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-breeze...@googlegroups.com.
To post to this group, send email to scala-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/scala-breeze/475f55e1-2e71-4c70-ab55-91a7ac2f0b96%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Debasish Das

unread,
Feb 11, 2015, 3:09:46 AM2/11/15
to scala-...@googlegroups.com
Hi,

Please look in the following PR:


LBFGS.minEigen will generate the minimum eigen value using power iterations

LBFGS.maxEigen is tricky since I am building CompactHessian from ApproximateInverseHessian. The code fails right now but I asked David's help in fixing it...If CompactHessian generation works, power iterations will extract maxEigen value...

Thanks.
Deb

Soila Pertet Kavulya

unread,
Jun 13, 2015, 4:05:24 AM6/13/15
to scala-...@googlegroups.com
Hi David,

Would SecondOrderFunction.empirical() yield the correct approximation of the Hessian for L-BFGS?

Thanks,

Soila


On Tuesday, January 13, 2015 at 9:27:31 AM UTC-8, David Hall wrote:

David Hall

unread,
Jun 13, 2015, 4:08:47 AM6/13/15
to scala-...@googlegroups.com
No, LBFGS builds its own hessian that is a relatively low rank (2 * "memory", I think) approximation to the inverse hessian by tracking the history of steps it takes. The empirical hessian is a numeric approximation to the hessian at the current point using standard multivariate calculus results.

Soila Pertet Kavulya

unread,
Jun 15, 2015, 2:54:06 PM6/15/15
to scala-...@googlegroups.com
Thanks David,

Sorry, my question was a bit unclear. I would like to approximate the Hessian after the L-BFGS algorithm completes so that I can compute the standard error of the coefficients. The approximate Hessian generated by L-BFGS would be out-of-date because the most recent approximation of the Hessian is for time (t-1). I was looking at R source code, and it looks like they are using numeric approximation to generate the Hessian after the optimizer converges if the flag "hessian=TRUE" is set in the optim function. I was wondering if I could use SecondOrderFunction.empirical() in Breeze to generate the numeric approximation.

Soila

David Hall

unread,
Jun 15, 2015, 6:11:17 PM6/15/15
to scala-...@googlegroups.com
Ah. SecondOrderFunction only computes Hessian vector products H * x. .empirical() computes H * x \approx (f'(theta + x * epsilon) - f'(theta))/epsilon.

I guess it's possible to do the work to extract the relevant bits of the inverse hessian given that and some kind of conjugate gradient algorithm?

What does R do, exactly?

-- David

Soila Pertet Kavulya

unread,
Jun 16, 2015, 12:37:17 PM6/16/15
to scala-...@googlegroups.com
Hi David,

Thank you so much for looking into this. I didn't realize that empirical hessian was computing the hessian vector product. optim.c in the R stats package is using central difference to approximate the hessian. They calculate the first order difference as  (f(theta + epsilon) - f(theta - epsilon))/ 2*epsilon. They also scale the parameters if the user input a scaling function. Next, they compute the second order derivatives based on the first order derivatives. It looks like the best option might be for me to implement central difference.

Soila

David Hall

unread,
Jun 16, 2015, 12:44:33 PM6/16/15
to scala-...@googlegroups.com
great, let me know if you want to contribute it back when you do!

Soila Pertet Kavulya

unread,
Jun 16, 2015, 1:37:57 PM6/16/15
to scala-...@googlegroups.com
Thanks! Will do.

Soila Pertet Kavulya

unread,
Jun 18, 2015, 6:32:45 PM6/18/15
to scala-...@googlegroups.com
Hi David,

I created a pull request for the approximate hessian. https://github.com/scalanlp/breeze/pull/413 

This is a bare bones implementation. I'm new to breeze and I was not sure what conventions and data structures to use. Could you take a look at it and let me know what I should change? 

I noticed that breeze.optimize.ApproximateGradientFunction is using forward differences instead of central differences. Would it be ok if I changed it to use central differences?

Thanks,

Soila
Reply all
Reply to author
Forward
0 new messages