For a better usage of BFGS for MLE

Skip to first unread message

Axel Breuer

Oct 5, 2022, 5:20:05 AM10/5/22
to pystatsmodels

While going through the code of statsmodels\base\, I saw that in the method, you have the following lines:

       if cov_params_func:
            Hinv = cov_params_func(self, xopt, retvals)
        elif method == 'newton' and full_output:
            Hinv = np.linalg.inv(-retvals['Hessian']) / nobs
        elif method == 'newton' and full_output:
            Hinv = np.linalg.inv(-retvals['Hessian']) / nobs
        elif not skip_hessian:
            H = -1 * self.hessian(xopt)
            invertible = False
            if np.all(np.isfinite(H)):
                eigvals, eigvecs = np.linalg.eigh(H)
                if np.min(eigvals) > 0:
                    invertible = True

            if invertible:
                Hinv = / eigvals)).dot(eigvecs.T)
                Hinv = np.asfortranarray((Hinv + Hinv.T) / 2.0)
                warnings.warn('Inverting hessian failed, no bse or cov_params '
                              'available', HessianInversionWarning)
                Hinv = None

Actually, when method=='bfgs', Hinv is already in the optimizer's outputs: retvals['Hinv'].

I would encourage to use retvals['Hinv'] instead of calculating the inverse of H, not only for saving a bit of CPU (by avoiding a re-calculation of inverse and eigenvectors), but also for stability (as bfgs calculates some sort of 'robust' estimator of the inverse of the actual Hessian).

Furthermore, in many other stat packages doing MLE use bfgs, they do use Hinv returned by bfgs: Statsmodels would then comply to other existing statistical codes (which is useful for result comparison between packages).

I thought that this could maybe be added to a forthcoming stasmodels release ?

Kind regards,

Oct 5, 2022, 8:25:07 AM10/5/22

I'm not sure whether the following applies to bfgs, I looked more at lbfgbs.

In lbfgbs, the hessian is build up during the iterations and is only an approximation to the hessian.
if there are only a few iterations needed in the optimization, or if we start at the optimum, then the hessian approximation is pretty bad.

I remember other cases, where I was using our simple newton methods to improve the estimate after bfgs optimization.

But I never looked specifically at the bfgs inverse hessian, at least I don't remember any details

Also in most standard models we have an analytical expression for the hessian which is most likely more accurate than a numerical approximation.

This would need a lot of testing before we change this. Our unit tests are all based on our current hessian computation.
Replacing it would be risky, although changing the hessian to bfgs would only be a relatively small code change and the unit test suite would check it for many cases.

Which other packages use the bfgs inverse hessian?


You received this message because you are subscribed to the Google Groups "pystatsmodels" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
To view this discussion on the web visit

Oct 5, 2022, 8:45:10 AM10/5/22

> but also for stability (as bfgs calculates some sort of 'robust' estimator of the inverse of the actual Hessian).

We might not want to have some sort of robust hessian.
If we want a regularized hessian, then it has to be a known approximation and a separate decision whether to return a regularized hessian to the user.

For example, I have added a tiny ridge penalty to our newton method for better optimization with badly behaved hessian away from the optimum.
However, the returned hessian is still without ridge penalty.
I recently ran into a stackoverflow question with singular exog, where returning a regularized hessian would have been useful.
Also we have an old issue using pinv in newton

Returning by default a regularized hessian is a big decision. We made it for pinv in linear models, but it is still unclear whether and how to add an option for it to models with nonlinear optimization.


Axel Breuer

Oct 7, 2022, 4:55:31 AM10/7/22
Hi Josef,

Thank you very much for your quick response:
your arguments are clear and perfectly understandable.

I realize that my first message was a bit unclear, so I would like to rephrase it.

Indeed, if you have an analytical hessian you should:
* During optimization iterations: pass on that hessian to a 2nd order method such as Newton method
* Once optimization is finished: calculate the inverse of that hessian

However, if you don't have access to the analytical hessian, gradients and/or hessians are estimated via finite differences.
(My original mail was concerned with that specific situation.)
When using bfgs, you have 2 alternatives for calculating the inverse of the hessian at the end of the optimization:
1) calculate the inverse of the finite difference approximation of the hessian
2) use the finite difference approximation of the inverse of the hessian (which is calculated at each bfgs iteration)
When I mentioned 'robustness' in my original mail, I meant there are situation where 1) returns non invertible (or non SDP)  hessians but 2) returns a (finite) inverse of hessian.

Concerning your question on packages, I am deeply sorry but I cannot remember where I read this trick for the first time, all I know, is that I am using it in all my own MLE codes when I compute standard errors. Since then, I must have convinced myself that the trick is used everywhere.

Kind regards,

Reply all
Reply to author
0 new messages