I don't like ANOVA tables

josef...@gmail.com

Feb 2, 2016, 11:59:39 AM
to pystatsmodels
or why we don't have as many ANOVA tables as other packages.


One reason I never wrote an ANOVA table function, and essentially never use one after a regression, is that I don't like them.
Skipper wrote `anova_lm`, and nobody wrote any other `anova` functions.

See, for example, factorial ANOVA across R, SAS and Stata in http://www.ats.ucla.edu/stat/mult_pkg/whatstat/

To me, type 2 ANOVA sounds like stepwise regression.

The main reasons I don't like them:

- ANOVA tables don't specify the null and alternative hypotheses. What is taken as given (which other parameters are in the regression of the hypothesis test)? Is it subject to marginality restrictions or not? (type 1, 2, 3, 4, ... ANOVAs)

- They don't say which test has been used: F-test, Wald test, score test or likelihood ratio test. Just because these are the same in the linear model under correct specification assumptions doesn't mean they generalize.

- F-tests and likelihood ratio tests don't generalize to misspecified variance/correlation, for example in Quasi-MLE (the only exception being overdispersion in GLM).


What I'm trying to get is something like http://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLSResults.wald_test_terms.html which is what I wrote the last time I looked into this. This is available for all models because all models have Wald tests. (The working title was `wald_anova`.)
The LR and score/LM versions are in the works. (The LR test version needs a decision on which dispersion to use. The score test version needs an extension to robust covariance, for which I didn't find anything to write unit tests against.)
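A small usage sketch of `wald_test_terms`, assuming a formula-based model so that the term information is available. The data are simulated and the column names are hypothetical; the robust `cov_type="HC1"` is just one possible choice to show that the Wald tests follow whatever covariance the results instance uses.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data, purely illustrative.
rng = np.random.default_rng(1)
n = 300
data = pd.DataFrame({
    "x": rng.normal(size=n),
    "group": rng.choice(["a", "b", "c"], size=n),
})
data["y"] = (1.0 + 0.5 * data["x"] + 0.4 * (data["group"] == "c")
             + rng.normal(size=n))

# Fit with a heteroscedasticity-robust covariance; the Wald tests below use it.
res = smf.ols("y ~ x + C(group)", data=data).fit(cov_type="HC1")

# One joint Wald test per term in the formula (Intercept, x, C(group)).
# skip_single=False also keeps terms that have only a single parameter.
wald_terms = res.wald_test_terms(skip_single=False)
print(wald_terms.table)
```

Each row tests that all parameters of one term are zero, with all other terms kept in the model, so the hypothesis is the same for every row and does not depend on the order of terms in the formula.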

ANOVA after GLM based on deviance (i.e. the LR test) is just two or three lines with the right textbook, which for me was Agresti's Categorical Data Analysis a few months ago, and that's when I finally looked at the small print of the description of the anova function in R. (With Nathaniel's and patsy's help it is also possible to check for marginality with categorical regressors.)


BTW:
GLM is not a "linear" model for me; it's a convenient way of getting the QMLE in the linear exponential family. (Now that I understand what that is.)
I thought the "linear" part in GLM was a historical artifact, and it took me a long time to figure it out. (I still think it's just a local linear, or quadratic, approximation that we can get for most models.)

GLM is great and has been my favorite model in the last year or two; it combines a lot of features in a single model.


Contributions welcome, 
especially for things that won't get high on my personal priority list, which might be because I don't have the required background.

Josef


josef...@gmail.com

Feb 2, 2016, 12:17:45 PM
to pystatsmodels
Only afterwards did I think to check Julia, since they also started from scratch

:)

Our `compare_lrtest` method, which does an LR test for nested models, is stalled for general/base models because it requires more evaluation of when and how it applies. It will most likely be added on a model-specific basis until the general applicability is tested.


Josef

 

josef...@gmail.com

Feb 2, 2016, 12:30:27 PM
to pystatsmodels
And a clarification:
I'm not opposed to adding functions like anova_lm or anova_glm. If users want them, they can add and use them.

The only restriction is for the defaults and what we show automatically e.g. in summary.

Josef


 



josef...@gmail.com

May 17, 2017, 11:29:51 AM
to pystatsmodels
Related asides:
MANOVA was merged a while ago and will be new in 0.9.
Repeated measures (within-subject) ANOVA is close to merging.

Josef

josef...@gmail.com

May 17, 2017, 11:48:03 AM
to pystatsmodels
Reading their discussion, it sounds like they don't so much want an (R-style) ANOVA as a likelihood ratio equivalent of what we have in `wald_test_terms` for Wald tests.
We don't have it for LR or score tests, but that would be a good idea to add.

That is, like a type 3.x ANOVA and not type II ANOVA tables.

Josef
 