anova_wald convenience function


josef...@gmail.com

Dec 7, 2014, 12:36:19 AM
to pystatsmodels

Not an ANOVA table, but a Wald table that computes a Wald (F or chi-square) test for each term, in particular for terms that span more than one column.

This corresponds to a type 3 ANOVA, but without ssr, because that doesn't work with robust covariance matrices.

Options:
`skip_single`: ignore single-column terms, which are already covered by the summary table
`combine_terms`: jointly test several terms whose names contain the same substring;
    useful if we want to know whether a variable that is involved in an interaction has no effect at all

It's just a convenience loop over `wald_test` that creates the constraint matrices from the formula term information.
It will go into the top-level LikelihoodModel and will work with all models that have formula information.
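
The loop described above can be sketched roughly as follows. This is only an illustration, not the code from the PR: the helper names `selection_rows` and `build_constraints` are made up, and the column slices are read off the OLS summary in this thread rather than taken from the formula's design information. Each restriction matrix would then be handed to `wald_test`.

```python
# Column positions of each term, read off the OLS summary in this thread.
# (Assumption: in statsmodels these would come from the formula term information.)
term_slices = {
    "Intercept": slice(0, 1),
    "C(Duration, Sum)": slice(1, 2),
    "C(Weight, Sum)": slice(2, 4),
    "C(Duration, Sum):C(Weight, Sum)": slice(4, 6),
}
N_PARAMS = 6


def selection_rows(columns, n_params):
    """One 0/1 row per column; each row restricts a single coefficient to zero."""
    return [[1.0 if j == col else 0.0 for j in range(n_params)] for col in columns]


def build_constraints(term_slices, n_params, skip_single=False, combine_terms=None):
    """Hypothetical helper: map each term (or combined term) to its restriction matrix."""
    constraints = {}
    for name, sl in term_slices.items():
        cols = range(n_params)[sl]
        if skip_single and len(cols) == 1:
            continue  # single-column terms are already tested in the summary table
        constraints[name] = selection_rows(cols, n_params)
    for sub in combine_terms or []:
        # joint test of every term whose name contains the substring
        cols = [c for name, sl in term_slices.items() if sub in name
                for c in range(n_params)[sl]]
        constraints[sub] = selection_rows(cols, n_params)
    return constraints


constraints = build_constraints(term_slices, N_PARAMS,
                                combine_terms=["Duration", "Weight"])
# The number of rows is the df of each test; each matrix R would be
# passed on as res.wald_test(R).
df_by_term = {name: len(rows) for name, rows in constraints.items()}
print(df_by_term)
```

The row counts line up with the `df` column in the tables below: 3 for the combined `Duration` test (1 main-effect column plus 2 interaction columns) and 4 for `Weight`.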

>>> print(anova_wald(res_ols))
                                       chi2          PR(>chi2)  df
Intercept                        279.754525  8.49355252808e-63   1
C(Duration, Sum)                   5.367071    0.0205204120227   1
C(Weight, Sum)                    24.864890  3.98710515153e-06   2
C(Duration, Sum):C(Weight, Sum)    0.352005     0.838615921347   2
>>> 
>>> 
>>> print(anova_wald(res_ols, skip_single=True))
                                      chi2          PR(>chi2)  df
C(Weight, Sum)                   24.864890  3.98710515153e-06   2
C(Duration, Sum):C(Weight, Sum)   0.352005     0.838615921347   2
>>> 
>>> 
>>> aw = anova_wald(res_glm, skip_single=False, 
...                     combine_terms=['Duration', 'Weight'])
>>> print(aw)
                                       chi2          PR(>chi2)  df
Intercept                        279.754525  8.49355252808e-63   1
C(Duration, Sum)                   5.367071    0.0205204120227   1
C(Weight, Sum)                    24.864890  3.98710515153e-06   2
C(Duration, Sum):C(Weight, Sum)    0.352005     0.838615921347   2
Duration                           6.019038     0.110687659046   3
Weight                            24.874835  5.33106659939e-05   4
>>> 
>>> 
>>> #for reference
... 
>>> 
>>> print(res_ols.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:       np.log(Days + 1)   R-squared:                       0.387
Model:                            OLS   Adj. R-squared:                  0.327
Method:                 Least Squares   F-statistic:                     6.449
Date:                Sun, 07 Dec 2014   Prob (F-statistic):           0.000103
Time:                        00:21:04   Log-Likelihood:                -60.212
No. Observations:                  57   AIC:                             132.4
Df Residuals:                      51   BIC:                             144.7
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
=============================================================================================================
                                                coef    std err          z      P>|z|      [95.0% Conf. Int.]
-------------------------------------------------------------------------------------------------------------
Intercept                                     1.6443      0.098     16.726      0.000         1.452     1.837
C(Duration, Sum)[S.1]                         0.2277      0.098      2.317      0.021         0.035     0.420
C(Weight, Sum)[S.1]                          -0.5844      0.144     -4.070      0.000        -0.866    -0.303
C(Weight, Sum)[S.2]                          -0.0429      0.137     -0.314      0.754        -0.311     0.225
C(Duration, Sum)[S.1]:C(Weight, Sum)[S.1]    -0.0848      0.144     -0.591      0.555        -0.366     0.197
C(Duration, Sum)[S.1]:C(Weight, Sum)[S.2]     0.0359      0.137      0.263      0.793        -0.232     0.304
==============================================================================
Omnibus:                        6.457   Durbin-Watson:                   1.873
Prob(Omnibus):                  0.040   Jarque-Bera (JB):                2.876
Skew:                          -0.258   Prob(JB):                        0.237
Kurtosis:                       2.028   Cond. No.                         1.92
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
>>> 
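
For a term that occupies a single column, the Wald chi-square statistic is just the square of the z-statistic in the summary table, which is why `skip_single` can drop those rows without losing information. A quick check using the rounded values printed above (the small differences come from the three-decimal rounding in the summary):

```python
import math

# (z from the summary table, chi2 from the Wald table) for the single-column terms
single_column_terms = {
    "Intercept": (16.726, 279.754525),
    "C(Duration, Sum)": (2.317, 5.367071),
}

for name, (z, chi2) in single_column_terms.items():
    # z is printed with three decimals only, so allow a small relative tolerance
    assert math.isclose(z ** 2, chi2, rel_tol=1e-3), name
    print(name, round(z ** 2, 3), chi2)
```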

Same result with GLM:

>>> print(anova_wald(res_glm, skip_single=False, combine_terms=['Duration', 'Weight']))
                                       chi2          PR(>chi2)  df
Intercept                        279.754525  8.49355252808e-63   1
C(Duration, Sum)                   5.367071    0.0205204120227   1
C(Weight, Sum)                    24.864890  3.98710515153e-06   2
C(Duration, Sum):C(Weight, Sum)    0.352005     0.838615921347   2
Duration                           6.019038     0.110687659046   3
Weight                            24.874835  5.33106659939e-05   4
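
The `PR(>chi2)` column is the upper tail probability of a chi-square distribution with `df` degrees of freedom, evaluated at `chi2`. A minimal standard-library sketch that reproduces a few of the p-values in the table; the helper name `chi2_sf` is made up for this illustration, and in practice `scipy.stats.chi2.sf` does the same job:

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df >= 1."""
    y = x / 2.0
    # closed-form starting values: df=1 (odd case) or df=2 (even case)
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    # recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while 2 * a < df:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


# (chi2, df, PR(>chi2)) rows copied from the table above
rows = [
    (5.367071, 1, 0.0205204120227),
    (24.864890, 2, 3.98710515153e-06),
    (6.019038, 3, 0.110687659046),
    (24.874835, 4, 5.33106659939e-05),
]
for chi2, df, pvalue in rows:
    # chi2 is printed with six decimals, so compare with a loose tolerance
    print(df, chi2_sf(chi2, df), pvalue)
```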



josef...@gmail.com

Dec 15, 2014, 2:18:50 PM
to pystatsmodels
I opened a PR to add it as a method of the LikelihoodModelResults class (it will work for most models outside of TSA).



I'm not sure about the name of the method.
Currently it's `wald_anova`; an alternative is `wald_test_terms`.
It's not really an ANOVA, it's just a Wald test for terms. It provides essentially the same information as a type 3 ANOVA, but without requiring any of the assumptions that make sums of squares meaningful.
It is similar to `anova_lm` type 3 with a heteroscedasticity-robust covariance.

The problem is that `wald_test_terms` is a pretty precise descriptive name, but it might not sound familiar enough for users to find it.

Any suggestions?

(`wald_test` is and will be part of the basic terminology/vocabulary of statsmodels)


Usage is the same as above, but as a method:

import numpy as np  # np is referenced inside the formula strings
from statsmodels.formula.api import ols, glm, poisson
from statsmodels.discrete.discrete_model import Poisson

# `data` holds the columns Days, Duration and Weight used in the formulas

res_ols = ols("np.log(Days+1) ~ C(Duration, Sum)*C(Weight, Sum)", data).fit(use_t=False)
res_glm = glm("np.log(Days+1) ~ C(Duration, Sum)*C(Weight, Sum)", data).fit()
res_poi = Poisson.from_formula("Days ~ C(Weight) * C(Duration)", data).fit(cov_type='HC0')
res_poi_2 = poisson("Days ~ C(Weight) + C(Duration)", data).fit(cov_type='HC0')

print('\nOLS')
print(res_ols.wald_anova())

print('\nGLM')
print(res_glm.wald_anova(skip_single=False, combine_terms=['Duration', 'Weight']))

print('\nPoisson 1')
print(res_poi.wald_anova(skip_single=False, combine_terms=['Duration', 'Weight']))

print('\nPoisson 2')
print(res_poi_2.wald_anova(skip_single=False))
