I don't like ANOVA tables

josef...@gmail.com

Feb 2, 2016, 11:59:39 AM

to pystatsmodels

or why we don't have as many ANOVA tables as other packages.

One reason I never wrote an ANOVA table function, and essentially never use one after a regression, is that I don't like them.
Skipper wrote anova_lm, and nobody has written any other `anova` functions.

See, for example, factorial ANOVA in R, SAS and Stata at http://www.ats.ucla.edu/stat/mult_pkg/whatstat/

To me, type 2 ANOVA sounds like stepwise regression.
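To see how much the "type" matters, here is a minimal sketch that fits a single OLS model with two unbalanced categorical factors and asks `statsmodels.stats.anova.anova_lm` for type 1, 2 and 3 tables. The data, column names and seed are made up for illustration. With unbalanced cells the rows for the same term generally differ across the three tables, because each type tests a different null hypothesis with different parameters taken as given.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(12345)
n = 90
# Unbalanced design: factor levels are drawn at random with unequal probabilities.
df = pd.DataFrame({
    "a": rng.choice(["a1", "a2", "a3"], size=n, p=[0.5, 0.3, 0.2]),
    "b": rng.choice(["b1", "b2"], size=n, p=[0.7, 0.3]),
})
effect_a = df["a"].map({"a1": 0.0, "a2": 0.5, "a3": 1.0})
effect_b = df["b"].map({"b1": 0.0, "b2": 0.8})
df["y"] = effect_a + effect_b + rng.normal(size=n)

# Sum-to-zero coding, so that the type 3 table tests interpretable hypotheses.
res = smf.ols("y ~ C(a, Sum) * C(b, Sum)", data=df).fit()

# Same fitted model, three different sets of hypotheses.
tables = {typ: anova_lm(res, typ=typ) for typ in (1, 2, 3)}
for typ, table in tables.items():
    print(f"type {typ}:\n{table}\n")
```

Nothing in the printed tables themselves says which hypothesis each row tests; that information lives only in the `typ` argument.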

The main reasons I don't like them:

- ANOVA tables don't specify the null and alternative hypotheses. What is taken as given (which other parameters are in the regression under the hypothesis test)? Is the test subject to marginality restrictions or not? (type 1, 2, 3, 4, ... ANOVAs)

- They don't say which test has been used: F-test, Wald test, score test or likelihood ratio test. Just because these are equivalent in the linear model under correct specification doesn't mean they generalize.

- F-tests and likelihood ratio tests don't generalize to a misspecified variance or correlation structure, for example in quasi-MLE (the only exception being overdispersion in GLM).
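In the Gaussian linear model with correctly specified homoskedastic errors, the F, Wald, LR and LM (score) statistics for two nested models are all functions of the restricted and unrestricted residual sums of squares, which is why they are interchangeable there. The numpy-only sketch below uses the textbook formulas with the ML variance estimate RSS/n; the simulated data and the helper names `rss` and `nested_test_stats` are illustrative assumptions, not statsmodels API.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares of an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def nested_test_stats(y, X_full, X_restr):
    """F, Wald, LR and LM statistics for nested Gaussian linear models."""
    n, k = X_full.shape
    q = k - X_restr.shape[1]  # number of restrictions
    rss_u, rss_r = rss(y, X_full), rss(y, X_restr)
    return {
        "F": ((rss_r - rss_u) / q) / (rss_u / (n - k)),
        "Wald": n * (rss_r - rss_u) / rss_u,
        "LR": n * np.log(rss_r / rss_u),
        "LM": n * (rss_r - rss_u) / rss_r,
        "q": q, "n": n, "k": k,
    }

rng = np.random.default_rng(0)
n = 50
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 0.5 * x1 + 0.2 * x2 + rng.normal(size=n)
X_full = np.column_stack([np.ones(n), x1, x2, x3])
X_restr = X_full[:, :2]  # H0: coefficients on x2 and x3 are zero

stats = nested_test_stats(y, X_full, X_restr)
print(stats)
```

Here Wald = q * F * n / (n - k) exactly, and Wald >= LR >= LM always holds; all of them rely on the RSS-based variance estimate, which is exactly what breaks under a misspecified variance.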

What I'm aiming for is something like http://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLSResults.wald_test_terms.html, which I wrote the last time I looked into this. It is available for all models, because all models have Wald tests. (The working title was `wald_anova`.)
The LR and score/LM versions are in the works. (The LR test version needs a decision on which dispersion estimate to use. The score test version needs an extension to robust covariances, for which I haven't found anything to write unit tests against.)
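A minimal usage sketch of `wald_test_terms`, assuming a formula-based OLS fit on simulated data (column names and data are made up). Because a Wald test only needs the parameter estimates and their covariance, the same call works unchanged after fitting with a heteroskedasticity-robust covariance (`cov_type="HC1"`), which is the generalization the F and LR versions lack.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "g": rng.choice(["ctrl", "trt1", "trt2"], size=n),
    "x": rng.normal(size=n),
})
# The error scale grows with |x|, so the nonrobust covariance is misspecified.
noise = rng.normal(size=n) * (1 + np.abs(df["x"]))
df["y"] = 0.3 * (df["g"] == "trt2") + 0.5 * df["x"] + noise

model = smf.ols("y ~ C(g) + x", data=df)
# One joint Wald test per formula term, with the usual covariance ...
wald_nonrobust = model.fit().wald_test_terms()
# ... and the same terms tested with an HC1 robust covariance.
wald_hc1 = model.fit(cov_type="HC1").wald_test_terms()

print(wald_nonrobust.summary_frame())
print(wald_hc1.summary_frame())
```

Each row states which term is restricted to zero, with all other terms in the model; with the default fit it uses an F distribution, and with the robust fit (no `use_t`) a chi-squared distribution.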

ANOVA after GLM based on the deviance (i.e. an LR test) is just two or three lines of code with the right textbook, which for me, a few months ago, was Agresti's Categorical Data. That's when I finally read the small print in the description of R's anova function. (With Nathaniel's and patsy's help it is also possible to check marginality with categorical regressors.)

BTW:
For me, GLM is not a "linear" model; it's a convenient way of getting the QMLE in the linear exponential family (now that I understand what that is).
I thought the "linear" part in GLM was a historical artifact, and it took me a long time to figure it out. (I still think it's just a local linear, or quadratic, approximation that we can get for most models.)

GLM is great and has been my favorite model over the last year or two; it combines a lot of features in a single model.

Contributions are welcome, especially for things that won't get high on my personal priority list, possibly because I don't have the required background.

Josef

josef...@gmail.com

Feb 2, 2016, 12:17:45 PM

to pystatsmodels

Only afterwards did I think to check Julia, since they are also starting from scratch:

https://github.com/JuliaStats/GLM.jl/pull/65#issuecomment-39572141

:)

Our `compare_lrtest` method, which does an LR test for nested models, is stalled for general/base models because it needs more evaluation of when and how it applies. It will most likely be added on a model-specific basis until its general applicability has been tested.

Josef

josef...@gmail.com

Feb 2, 2016, 12:30:27 PM

to pystatsmodels

And a clarification:

I'm not opposed to adding functions like anova_lm or anova_glm. If users want them, they can add and use them.

The only restriction is on the defaults and on what we show automatically, e.g. in summary.

Josef

josef...@gmail.com

May 17, 2017, 11:29:51 AM

to pystatsmodels

update on the "Julians":
https://discourse.julialang.org/t/poll-do-we-julians-want-anovas/3757

Related asides:

MANOVA was merged a while ago and will be new in 0.9.
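For reference, a minimal sketch of the merged MANOVA, assuming `statsmodels.multivariate.manova.MANOVA.from_formula` and its `mv_test()` method on made-up data with two responses and one grouping factor.

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(4)
n = 60
df = pd.DataFrame({"group": np.repeat(["a", "b", "c"], n // 3)})
shift = df["group"].map({"a": 0.0, "b": 0.5, "c": 1.0})
df["y1"] = shift + rng.normal(size=n)
df["y2"] = -shift + rng.normal(size=n)

# Both responses on the left-hand side; one multivariate test per term.
mv = MANOVA.from_formula("y1 + y2 ~ group", data=df).mv_test()
print(mv)
```

The result reports Wilks' lambda, Pillai's trace, Hotelling-Lawley trace and Roy's greatest root for each term, so at least the test statistics are named explicitly.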

Repeated measures (within-subject) ANOVA is close to being merged.
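A minimal sketch of the repeated measures ANOVA, assuming the `AnovaRM` class in `statsmodels.stats.anova` as found in later releases. It expects balanced data, i.e. exactly one observation per subject and within-subject condition (other layouts need its `aggregate_func` argument). The data below are simulated.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(5)
subjects = np.arange(12)
conditions = ["c1", "c2", "c3"]
# Long format: one row per (subject, condition) pair.
df = pd.DataFrame(
    [(s, c) for s in subjects for c in conditions],
    columns=["subject", "cond"],
)
subject_effect = rng.normal(size=len(subjects))[df["subject"].to_numpy()]
cond_effect = df["cond"].map({"c1": 0.0, "c2": 0.3, "c3": 0.6}).to_numpy()
df["rt"] = 1.0 + subject_effect + cond_effect + rng.normal(scale=0.5, size=len(df))

res = AnovaRM(df, depvar="rt", subject="subject", within=["cond"]).fit()
print(res)
```

With 3 conditions and 12 subjects the within-subject F test has 2 and 22 degrees of freedom.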

Josef

josef...@gmail.com

May 17, 2017, 11:48:03 AM

to pystatsmodels

Reading their discussion, it sounds like they don't want an (R-style) ANOVA so much as a likelihood ratio equivalent of what we have in `wald_test_terms` for Wald tests.

We don't have it for LR or score tests, but that would be a good idea to add.

That is, something closer to a type 3 ANOVA than to a type 2 ANOVA table.

Josef
