Calculation of Hessian for Scipy minimization


sam mahdi

unread,
Apr 13, 2023, 10:07:04 AM4/13/23
to pystatsmodels
Hello,

I am trying to understand/learn how to get errors for my minimized results, and was referred to statsmodels for this. 

I'm new to statsmodels and just wanted to confirm my setup, since there are no examples for Hessians in the docs. 

I have a script that is doing some minimization. 

import numpy as np
from scipy.optimize import basinhopping
from statsmodels.tools.numdiff import approx_hess2

solution = basinhopping(fun, x0=np.array([1.0, 1.0]),
                        minimizer_kwargs={"args": (input1, input2), "method": "Powell",
                                          "bounds": ((0, np.inf),) * 2})

What I wanted to see is whether or not the setup I have for the Hessian is correct. This is how I have set it up:

hessian = approx_hess2(solution.x, fun, args=(input1, input2))

But the output doesn't make much sense to me. If the solution for basinhopping is this: 

                    message: ['requested number of basinhopping iterations completed successfully']
                    success: True
                        fun: 2.9025799548976257
                          x: [ 8.341e+04  4.603e-03]
                        nit: 100
      minimization_failures: 0
                       nfev: 11311
 lowest_optimization_result: message: Optimization terminated successfully.
                             success: True
                              status: 0
                                 fun: 2.9025799548976257
                                   x: [ 8.341e+04  4.603e-03]
                                 nit: 2
                               direc: [[ 1.000e+00  0.000e+00]
                                       [ 0.000e+00  1.000e+00]]
                                nfev: 84

Then the Hessian I get is this:
[[1.66708814e-11 3.04307003e-04]
 [3.04307003e-04 5.82690162e+03]]

With the diagonal of the inverse Hessian being [1.28435430e+12 3.67456319e-03].

Considering the values are 8e4 and 4e-3, the inverse hessian diagonal for these values is...really high? 

Furthermore, I am a bit confused as to what is truly being calculated here. From my understanding, error propagation uses a covariance matrix like this:

[df/dx df/dy][covariance matrix][df/dx df/dy]^T

I know the script calculates the derivative using finite differences (so that would give df/dx), but the Hessian is the 2nd derivative. From what I understand, the inverse Hessian is therefore the covariance matrix, and its diagonal the variances?
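
To make it concrete, this is the calculation I have in mind (just a sketch, assuming the inverse Hessian can in fact be read as the covariance of the parameters):

import numpy as np

# hessian is the approx_hess2 result from above
cov = np.linalg.inv(hessian)      # inverse Hessian as the (assumed) covariance matrix
variances = np.diag(cov)          # diagonal: variances of the estimates
std_errors = np.sqrt(variances)   # standard errors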

I apologize for the confusion here, I am just trying to understand/learn how to get errors for my minimized results, and am relatively new to the process. 

Any help would greatly be appreciated!

josef...@gmail.com

unread,
Apr 13, 2023, 10:39:50 AM4/13/23
to pystat...@googlegroups.com
This is correct if the objective function of the minimization is a negative log-likelihood function or least squares.
The inverse hessian is not the covariance of the parameter estimates for arbitrary objective functions.
In those cases cov_params has the sandwich form of general M-estimators.
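
Roughly, the sandwich covariance looks like this (only a sketch; theta_hat, total_objective, and objective_terms are hypothetical names for your estimate, the summed objective, and a function returning the per-observation terms; check the orientation of the score array in your version):

import numpy as np
from statsmodels.tools.numdiff import approx_hess2, approx_fprime

# H: Hessian of the summed objective at the estimate
H = approx_hess2(theta_hat, total_objective)

# scores: numerical gradient of the per-observation terms, expected shape (nobs, nparams)
scores = approx_fprime(theta_hat, objective_terms)
S = scores.T @ scores

Hinv = np.linalg.inv(H)
cov_sandwich = Hinv @ S @ Hinv            # "sandwich": H^{-1} S H^{-1}
std_err = np.sqrt(np.diag(cov_sandwich))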

Your case might have scaling issues when computing numerical derivatives. Your parameters have very different scales.
It could also be that your first parameter is not well identified and therefore has a large variance.

You can also check the hessian with other Python packages like numdifftools. 
Our numerical derivatives are not adaptive; they give up robustness in difficult cases in favor of speed and vectorization.
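
For example, something like this (just a sketch, assuming numdifftools is installed; the lambda pins your extra arguments):

import numdifftools as nd

# adaptive-stepsize Hessian as a cross-check of approx_hess2
H_check = nd.Hessian(lambda p: fun(p, input1, input2))(solution.x)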

Can you explain or show what your objective function for the minimization is?

Josef
 


sam mahdi

unread,
Apr 13, 2023, 11:01:47 AM4/13/23
to pystatsmodels
"This is correct if the objective function of the minimization is a negative log-likelihood function or least squares.
The inverse hessian is not the covariance of the parameter estimates for arbitrary objective functions.
In those cases the cov_params has a sandwich form of general M-estimators."

Could you please break this down? I'm not super familiar with the terminology, so I'm completely confused as to what any of this means. The inverse hessian is not the covariance? Then are the errors I'm calculating not correct? 

Briefly put, my "fun" works like this: the input parameters determine the "weights" of some given constants, which are used to generate a "model". The input parameters are then adjusted in the minimization so that the model fits the experimental data:

def fun(k, k1):
    pF = (np.sqrt((k*k1)**2 + (8*io*k*k1) + (8*io*k*k1**2)) - (k*k1)) / (4*(1 + k1))
    pC = (((4*io) + (4*io*k1)) / (4*(1 + k1)**2)) - (pF / (1 + k1))
    pO = k1 * pC
    model = pF*F + pO*O + pC*C  # where F, O, C are given constants
    return np.sum((((experimental_data - (scale*model)) / data_error**2)**2) / model.size)  # this is just a chi2

There is also a scale factor: the model needs to be scaled up to the data, so usually this is solved alongside the other 2 inputs. However, for some reason Powell cannot minimize the scale factor at all, though it minimizes the other 2 inputs quite well. On the other hand, Nelder-Mead does a great job minimizing the scale, but not the other 2 inputs. Finally, while Powell is good, due to the rough landscape Powell alone gives inconsistent, varied answers, so I have to use it in conjunction with basinhopping to get consistent answers across the board. Putting all that together, my script is something like this:

scale_solution = minimize(fun2, x0=np.array([input1, input2, scale]), method='Nelder-Mead',
                          bounds=((0, np.inf),) * 3)  # fun2 is just fun, but with the scale adjustable
basin_solution = basinhopping(fun, x0=np.array([input1, input2]),
                              minimizer_kwargs={'args': (scale_solution.x[2],), 'method': 'Powell',
                                                'bounds': ((0, np.inf),) * 2})
hessian = approx_hess1(basin_solution.x, fun, args=(scale_solution.x[2],))

josef...@gmail.com

unread,
Apr 13, 2023, 11:56:25 AM4/13/23
to pystat...@googlegroups.com
On Thu, Apr 13, 2023 at 11:01 AM sam mahdi <sammah...@gmail.com> wrote:
"This is correct if the objective function of the minimization is a negative log-likelihood function or least squares.
The inverse hessian is not the covariance of the parameter estimates for arbitrary objective functions.
In those cases the cov_params has a sandwich form of general M-estimators."

Could you please break this down? I'm not super familiar with the terminology, so I'm completely confused as to what any of this means. The inverse hessian is not the covariance? Then are the errors I'm calculating not correct? 

What you have below is least squares, so no problem using hessian or jacobian product for standard errors.
Nonlinear least squares is also handled by scipy curve_fit and the lmfit package.
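
As a rough sketch with curve_fit, using the names from your function (scaled_model is just a hypothetical wrapper around your model; xdata is a dummy since your model does not depend on it):

import numpy as np
from scipy.optimize import curve_fit

def scaled_model(_x, k, k1, scale):
    # same weights as in your fun(); F, O, C, io are your given constants
    pF = (np.sqrt((k*k1)**2 + 8*io*k*k1 + 8*io*k*k1**2) - k*k1) / (4*(1 + k1))
    pC = (4*io + 4*io*k1) / (4*(1 + k1)**2) - pF/(1 + k1)
    pO = k1*pC
    return scale * (pF*F + pO*O + pC*C)

xdata = np.arange(experimental_data.size)   # dummy, unused by the model
popt, pcov = curve_fit(scaled_model, xdata, experimental_data,
                       p0=[1.0, 1.0, 1.0], sigma=data_error,
                       absolute_sigma=True, bounds=(0, np.inf))
std_err = np.sqrt(np.diag(pcov))            # standard errors from the parameter covariance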
 

Briefly put, my "fun" is as such, input parameters calculate the "weights" of some given constants, which is used to generate a "model". The input parameters are then minimized as such so the model fits the experimental data:

def fun(k, k1):
    pF = (np.sqrt((k*k1)**2 + (8*io*k*k1) + (8*io*k*k1**2)) - (k*k1)) / (4*(1 + k1))
    pC = (((4*io) + (4*io*k1)) / (4*(1 + k1)**2)) - (pF / (1 + k1))
    pO = k1 * pC
    model = pF*F + pO*O + pC*C  # where F, O, C are given constants
    return np.sum((((experimental_data - (scale*model)) / data_error**2)**2) / model.size)  # this is just a chi2

Is data_error**2 correct? Usually we standardize by standard deviation.
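
I.e. what I mean by standardizing by the standard deviation (a sketch, with your names):

# usual chi-square: residual divided by the standard deviation, then squared
chi2 = np.sum(((experimental_data - scale*model) / data_error)**2)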
  

There is also a scale factor, the model needs to scale up to the data, so usually this is solved alongside the other 2 inputs. However, for some reason "Powell" cannot minimize the scalar factor at all, but it can minimize the other 2 inputs quite well. On the other hand, Nelder-Mead does a great job minimizing the scale, but not the other 2 inputs. Finally, while "Powell" is good, due to the rough landscape, "Powell" alone gives inconsistent varied answers, so I have to use it in conjunction with basinhopping to get consistent answers across the board. So all that put together, my script is something like this

powell is not a good optimizer. It often has problems with converging to a proper solution.
Did you try basinhopping plus lbfgsb?  The function looks locally smooth enough.
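
E.g. something along these lines, with your names (just a sketch):

from scipy.optimize import basinhopping

basin_solution = basinhopping(
    fun,
    x0=np.array([input1, input2]),
    minimizer_kwargs={
        "method": "L-BFGS-B",
        "args": (scale_solution.x[2],),   # trailing comma: args must be a tuple
        "bounds": ((0, np.inf),) * 2,
    },
)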
 

sam mahdi

unread,
Apr 13, 2023, 12:50:45 PM4/13/23
to pystatsmodels
  
"Is data_error**2 correct? Usually we standardize by standard deviation."

The standard deviation (which I call the error here) is squared, so yes. 

"powell is not a good optimizer. It often has problems with converging to a proper solution. Did you try basinhopping plus lbfgsb? The function looks locally smooth enough."

Powell with basinhopping is the only combination that gives consistent results with different inputs. L-BFGS-B fails to even find a solution, same with Newton-CG and trust-constr. Nelder-Mead does succeed in finding a solution, but is highly dependent on the initial starting conditions. This is why I am attempting to get errors: with so much variability I want to ensure my values are accurate (i.e. a solution with a variance bigger than the solution itself is useless, no matter how consistently I arrive at it). 

The function is highly dependent on both the solver and the initial starting conditions, arriving at completely different solutions with identical chi2. The only resolution I've found, as stated, is Powell with basinhopping. I am curious though, why is Powell a bad solver?

josef...@gmail.com

unread,
Apr 13, 2023, 1:03:18 PM4/13/23
to pystat...@googlegroups.com
We had problems in statsmodels with powell convergence a long time ago and then stopped using it.

Recent issue

scipy might get replacements for it with newer "Powell" versions.


sam mahdi

unread,
Apr 13, 2023, 1:09:56 PM4/13/23
to pystatsmodels
Well, this is concerning, since the Powell plus basinhopping pair is unfortunately the only setup that works. And while the solver might not find the minimum the best, I can attest the fun value is the minimum, so the solver is indeed working (at least as well as all the other solvers). Basinhopping with any other solver just gives random answers depending on the initial starting conditions (could this be why my error is so high? There are just infinite solutions, so the variance for any one solution is incredibly high?). 

josef...@gmail.com

unread,
Apr 13, 2023, 2:12:31 PM4/13/23
to pystat...@googlegroups.com
On Thu, Apr 13, 2023 at 1:09 PM sam mahdi <sammah...@gmail.com> wrote:
Well this is concerning, unfortunately Powell with Basin Hopping pair is the only model that works. And while the solver might not find the minimum the best, I can attest the fun value is the minimum, so the solver is indeed working (at least as well as all the other solvers). Basin Hopping with any other solver just gives random answers depending on initial starting conditions (could this be why my error is so high? There are just infinite solutions so the variance for any one solution is incredibly high?). 

The hessian (co)variance only depends on local curvature. High standard errors mean that the objective function is almost flat in some directions in the neighborhood of the minimum.
With numerical derivatives this depends on the stepsize, which might not be very good if there are scaling problems.
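
If scaling is the problem, you can also pass an explicit step size to the numdiff functions (a sketch; a scalar epsilon is accepted, and I believe an array of per-parameter steps works as well):

import numpy as np
from statsmodels.tools.numdiff import approx_hess2

x_hat = basin_solution.x
# step proportional to each parameter's magnitude instead of one default step
step = 1e-4 * np.maximum(np.abs(x_hat), 1e-8)
hessian = approx_hess2(x_hat, fun, epsilon=step, args=(scale_solution.x[2],))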

Another approach that I often use in problem cases is to run a derivative-based optimizer like bfgs starting at the values that a robust optimizer found (for me that robust optimizer is usually nelder-mead).
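
In your setup that would look roughly like this (a sketch; I use L-BFGS-B for the polish so your bounds are kept):

from scipy.optimize import minimize

# robust, derivative-free first pass
rough = minimize(fun, x0=np.array([input1, input2]), args=(scale_solution.x[2],),
                 method="Nelder-Mead")

# polish with a derivative-based optimizer, starting from the rough solution
polished = minimize(fun, x0=rough.x, args=(scale_solution.x[2],),
                    method="L-BFGS-B", bounds=((0, np.inf),) * 2)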



 

sam mahdi

unread,
Apr 13, 2023, 5:25:42 PM4/13/23
to pystatsmodels
So you're saying to use the bfgs Hessian calculation instead of statsmodels?

josef...@gmail.com

unread,
Apr 13, 2023, 5:38:15 PM4/13/23
to pystat...@googlegroups.com
On Thu, Apr 13, 2023 at 5:25 PM sam mahdi <sammah...@gmail.com> wrote:
So you're saying to use bfgs Hessian calculation instead of statsmodels?

No, that's worse.
I use bfgs (or a newton method) to improve the minimization, but we use our statsmodels hessians for the standard errors.
Only numerical derivatives with an adaptive stepsize, like in the numdifftools package, can provide a better hessian for badly scaled cases.



 

sam mahdi

unread,
Apr 13, 2023, 7:41:55 PM4/13/23
to pystatsmodels
Ah, I see. I was curious: I had noticed in another post that you also multiply the inverse Hessian by the termination value.
In my scenario, I am just looking at the inverse Hessian diagonal (for variances), but should I also be multiplying it by the termination value (i.e. 1e-9, which would change the results massively)?

Additionally, if I'm reporting chi and not chi2 (i.e. I take the square root in my minimization), I presume I should also be taking the square root of the inverse Hessian that is calculated?