or why we don't have as many ANOVA tables as other packages.
One reason I never wrote an anova table function, and essentially never use one after a regression, is that I don't like them.
Skipper wrote anova_lm, and nobody wrote any other `anova` functions.
See for example factorial ANOVA across R, SAS and Stata in
http://www.ats.ucla.edu/stat/mult_pkg/whatstat/
To me type 2 ANOVA sounds like stepwise regression.
The main reasons I don't like them:
- Anova tables don't specify the null and alternative hypotheses. What is taken as given (which other parameters are in the regression underlying the hypothesis test)? Is it subject to marginality restrictions or not? (type 1, 2, 3, 4, ... ANOVAs)
- They don't say which test has been used: F-test, Wald test, score test or likelihood ratio test. Just because these are the same in the linear model under correct specification assumptions doesn't mean they generalize.
- F-tests and likelihood ratio tests don't generalize to misspecified variance/correlation, for example Quasi-MLE (with the only exception of overdispersion in GLM).
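To make the first point concrete, here is a minimal sketch showing that "the ANOVA table" for one fitted model depends on the type requested. The data are simulated and the column names `a`, `b` and `y` are made up for illustration. It uses the existing `anova_lm` with an unbalanced two-factor design, where the tables are not expected to agree.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Simulated, deliberately unbalanced two-factor data (illustration only).
rng = np.random.default_rng(1)
n = 120
df = pd.DataFrame({
    "a": rng.choice(["a1", "a2"], size=n, p=[0.7, 0.3]),
    "b": rng.choice(["b1", "b2", "b3"], size=n),
})
df["y"] = (df["a"] == "a2") * 1.0 + (df["b"] == "b3") * 0.5 + rng.normal(size=n)

# One fitted model with main effects and interaction ...
res = smf.ols("y ~ C(a) * C(b)", data=df).fit()

# ... but three different tables, each testing different hypotheses.
tables = {typ: anova_lm(res, typ=typ) for typ in (1, 2, 3)}
for typ, table in tables.items():
    print(f"Type {typ} ANOVA")
    print(table)
```

Nothing in the printed tables states which restricted model each row is compared against; that information lives only in the `typ` argument.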
What I'm trying to get is something like
http://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLSResults.wald_test_terms.html which is what I wrote the last time I looked into this. This is available for all models because all models have Wald tests. (The working title was `wald_anova`.)
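A minimal sketch of how `wald_test_terms` can be used, on simulated data with made-up column names `g`, `x` and `y`. The second fit assumes one wants heteroscedasticity-robust standard errors (`cov_type="HC3"`), which is just one possible choice; the point is that the same call works with a robust covariance.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data (illustration only): one categorical and one continuous regressor.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "g": rng.choice(["a", "b", "c"], size=n),
    "x": rng.normal(size=n),
})
df["y"] = 1.0 + 0.5 * df["x"] + 0.8 * (df["g"] == "c") + rng.normal(size=n)

# Joint Wald test for each term of the formula; C(g) is tested with 2 constraints.
res = smf.ols("y ~ C(g) + x", data=df).fit()
wt = res.wald_test_terms()
print(wt)

# Same tests, but based on a heteroscedasticity-robust covariance matrix.
res_hc = smf.ols("y ~ C(g) + x", data=df).fit(cov_type="HC3")
wt_hc = res_hc.wald_test_terms()
print(wt_hc)
```

Each row corresponds to one term of the formula, and the hypothesis is always "all coefficients of this term are zero, given the full model".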
The LR and score/LM versions are in the works. (The LR test version needs a decision on which dispersion to use. The score test version needs an extension to robust covariance, for which I didn't find anything to write unit tests against.)
ANOVA after GLM based on deviance (i.e. LR test) is just two or three lines with the right textbook, which for me was Agresti's Categorical Data a few months ago. That's when I finally looked at the small print of the description of the anova function in R. (With Nathaniel's and patsy's help it is also possible to check for marginality with categorical regressors.)
BTW:
GLM is not a "linear" model for me; it's a convenient way of getting the QMLE in the linear exponential family. (Now that I understand what that is.)
I thought the "linear" part in GLM was a historical artifact, and it took me a long time to figure that out. (I still think it's just a local linear, or quadratic, approximation that we can get for most models.)
GLM is great and has been my favorite model in the last year or two. It combines a lot of features in a single model.
Contributions welcome,
especially for things that won't get high on my personal priority list, which might be because I don't have the required background.
Josef