Return chi2 fit statistic for linear regression models


Sean Mulcahy

Jun 20, 2014, 5:35:20 PM
to pystat...@googlegroups.com
I'm trying to figure out how to return chi2 as a measure of fit for linear regression models (OLS, WLS, etc.). Ultimately I'll use chi2 to calculate reduced chi-squared, which is the accepted goodness-of-fit measure in my field. Here's some example code with the actual data I'm using.


import numpy as np
import statsmodels.api as sm
from matplotlib import pyplot as plt

# measured data
x = np.array([0.514282,  1.963679,  2.174223,  2.110413,  0.152505,  0.023114])
y = np.array([0.284664,  0.289194,  0.289830,  0.289639,  0.283551,  0.283176])
# stdev of each y measurement
s = np.array([0.000029, 0.000029, 0.000030, 0.000029, 0.000029, 0.000029])
# convert stdev to inverse-variance weights
w = 1/(s*s)

# add constant value to calculate intercept
X = sm.add_constant(x)

# weighted least squares regression
res_wls = sm.WLS(y, X, weights=w).fit()
print(res_wls.summary())

# plot the data and model fit
fig, ax = plt.subplots(figsize=(10, 7))
ax.plot(x, y, 'o', label='data')
ax.plot(x, res_wls.fittedvalues, label='WLS fit')
ax.legend()
plt.show()

josef...@gmail.com

Jun 20, 2014, 6:43:15 PM
to pystatsmodels
Given my discussions on the scipy mailing list and github issues, I
assume I know roughly what you want.

My guess is that it is (res_wls.wresid**2).sum() / res_wls.df_resid
However, how we internally scale the residuals has changed in master
compared to the 0.5 release.

Is your background in astronomy?

Josef

josef...@gmail.com

Jun 20, 2014, 6:59:30 PM
to pystatsmodels
We don't divide by df_resid when we use the chi-square distribution,
so I guess it should be

chi2 = (res_wls.wresid**2).sum(), which should be chi-square distributed
with degrees of freedom equal to res_wls.df_resid

Josef

josef...@gmail.com

Jun 21, 2014, 8:11:13 AM
to pystatsmodels
It's already precalculated in ssr and mse_resid, IIUC:

chi2_gof = res_wls.ssr
chi2reduced_gof = res_wls.mse_resid
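
For readers who want to see what those two attributes stand for, here is a minimal NumPy-only sketch that computes the same quantities by hand from the data in the first post. It assumes, as described above, that ssr is the sum of squared weighted residuals and mse_resid is that sum divided by the residual degrees of freedom; the variable names are illustrative and not part of statsmodels.

```python
import numpy as np

# data from the original post
x = np.array([0.514282, 1.963679, 2.174223, 2.110413, 0.152505, 0.023114])
y = np.array([0.284664, 0.289194, 0.289830, 0.289639, 0.283551, 0.283176])
s = np.array([0.000029, 0.000029, 0.000030, 0.000029, 0.000029, 0.000029])

# design matrix with an intercept column, like sm.add_constant(x)
X = np.column_stack([np.ones_like(x), x])

# weighted least squares: divide each row by its stdev, then solve plain least squares
params, *_ = np.linalg.lstsq(X / s[:, None], y / s, rcond=None)

# weighted residuals: (observed - fitted) / stdev
wresid = (y - X @ params) / s

chi2 = np.sum(wresid**2)            # assumed to correspond to res_wls.ssr
df_resid = len(y) - X.shape[1]      # observations minus fitted parameters
chi2_reduced = chi2 / df_resid      # assumed to correspond to res_wls.mse_resid

print(chi2, chi2_reduced)
```

If the values printed here differ from res_wls.ssr and res_wls.mse_resid, the installed statsmodels version may scale residuals differently, as mentioned earlier in the thread.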

Josef

Sean Mulcahy

Jun 21, 2014, 11:05:57 AM
to pystat...@googlegroups.com
That's great, thanks!

Sean Mulcahy

Jun 21, 2014, 11:07:28 AM
to pystat...@googlegroups.com
I'm a geologist and reduced chi2 is commonly used to measure the goodness of fit for applications in geochronology.

josef...@gmail.com

Jun 21, 2014, 11:26:36 AM
to pystatsmodels
On Sat, Jun 21, 2014 at 11:07 AM, Sean Mulcahy <srmu...@gmail.com> wrote:
> I'm a geologist and reduced chi2 is commonly used to measure the goodness of fit for applications in geochronology.

Interpreting reduced chi2 as an informative statistic works under the hypothesis that the weights reflect the actual variance of the observations (noise, error term).
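
A small simulation can make that hypothesis concrete. The sketch below uses synthetic data (the line, the noise level, and the helper name reduced_chi2 are all made up for illustration, not taken from this thread): when the stated standard deviations match the real noise, reduced chi2 should come out near 1, and when they are understated by a factor of 2 it should come out about 4 times larger.

```python
import numpy as np

rng = np.random.default_rng(0)


def reduced_chi2(x, y, s):
    """Weighted straight-line fit; returns chi2 / (n - number of parameters)."""
    X = np.column_stack([np.ones_like(x), x])
    params, *_ = np.linalg.lstsq(X / s[:, None], y / s, rcond=None)
    wresid = (y - X @ params) / s
    return np.sum(wresid**2) / (len(y) - X.shape[1])


# synthetic data: hypothetical line and noise level, for illustration only
n = 2000
x = rng.uniform(0.0, 2.0, n)
sigma_true = np.full(n, 0.05)
y = 0.28 + 0.003 * x + rng.normal(0.0, sigma_true)

# stated errors equal to the real noise
red_correct = reduced_chi2(x, y, sigma_true)
# stated errors half the real noise
red_understated = reduced_chi2(x, y, sigma_true / 2)

print(red_correct, red_understated)
```

Because halving every stated standard deviation leaves the fitted line unchanged and doubles every weighted residual, the second value is 4 times the first, which is why reduced chi2 is only informative when the stated errors can be trusted.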

Under this assumption, we would also have to adjust the reported standard errors of the parameters, bse, and the confidence intervals, see my separate thread about VWLS.

I don't know if you only want chi2_gof and the reduced chi2 for interpretation, or the full results under the assumption that the variance is not scaled.

Different fields work with different assumptions on this.
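
As a rough sketch of the difference between the two conventions, the NumPy-only code below computes parameter standard errors both ways for the data in the first post: once with the variance taken as given by the weights (scale fixed at 1), and once rescaled by reduced chi2. My understanding is that the second version is what the default WLS results report as bse, but treat that as an assumption to check against your statsmodels version; the variable names are illustrative.

```python
import numpy as np

# data from the original post
x = np.array([0.514282, 1.963679, 2.174223, 2.110413, 0.152505, 0.023114])
y = np.array([0.284664, 0.289194, 0.289830, 0.289639, 0.283551, 0.283176])
s = np.array([0.000029, 0.000029, 0.000030, 0.000029, 0.000029, 0.000029])

X = np.column_stack([np.ones_like(x), x])
w = 1.0 / s**2

# (X' W X)^-1, the parameter covariance when the weights are exact inverse variances
xtwx_inv = np.linalg.inv(X.T @ (w[:, None] * X))
params = xtwx_inv @ (X.T @ (w * y))

# reduced chi2 from the weighted residuals
wresid = (y - X @ params) / s
chi2_reduced = np.sum(wresid**2) / (len(y) - X.shape[1])

# standard errors assuming s are the true standard deviations (scale fixed at 1)
bse_fixed = np.sqrt(np.diag(xtwx_inv))
# standard errors rescaled by the estimated scale (reduced chi2)
bse_scaled = bse_fixed * np.sqrt(chi2_reduced)

print(bse_fixed, bse_scaled)
```

In statsmodels terms, this should correspond to comparing np.sqrt(np.diag(res_wls.normalized_cov_params)) with res_wls.bse, the two differing by the factor np.sqrt(res_wls.scale).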

Josef