Q re: anova on model deviance

48 views

Skip to first unread message

Shelley Sianta

unread,

Feb 11, 2019, 5:11:51 PM2/11/19

to Aster Analysis User Group

Hello,

I am running reaster models on a dataset and have two sets of questions for you - one regarding model comparison, and one regarding ideas on how to incorporate a phylogenetic correction.

Dataset - This is a comparative study, testing effects of specialist/generalist species on fitness trade-offs. I have 17 species, 8 of which are edaphic specialists and 9 of which are edaphic generalists. I grew seeds from each species (collected from one population/species) in the species's home vs foreign soil type. I want to know if the specialists have larger fitness trade-offs in their foreign soils than the generalists. My plan is to compare nested models that use species as a random effect, and test for a specialist/generalist effect * soil interaction term. More details about the effects are in the Rscript.

My impression of LRTs (and anova of model deviance by extension) is that the null hypothesis is the simpler model is a better fit, and a significant chi-square value indicates the more complex model is a better fit. I also read that high model deviance means the model is a worse fit to the data.

What seems strange to me is that in the anova of model deviances table I get when I compare all the nested reaster models, more complex models have a higher model deviance than simpler models (making me think they're a worse fit) but they also have a significant p-value. Perhaps this is because my impression of the test interpretation is wrong. If you get a significant value from the chi-square test, does that mean whichever model has the lowest model deviance is the better model, and not the more complex model by default? The analysis is in the Rscript.

Second, I'm curious what your thoughts are on incorporating some sort of phylogenetic correction into aster models. Part of the structure of my dataset is species - and they span a wide phylogenetic breadth. In past analyses, having a phylogenetic correction has been important to the conclusions because I have closely related species that are doing really different things. In the past, I've been using either a PGLS, or a bayesian framework when I have multiple data points per species (as I do in this dataset). I'm curious if there's a way to include (or if it even makes sense to include) a phylogenetic variance-covariance matrix to model the residual variance in an aster model. Alternatively, I'm curious if there'd be a way to extract some sort of derived parameter that reflects, per species, the difference in fitness between soil environments, and then be able to input that into a PGLS.

Thank you for the insights,

Shelley

reaster_anova_ex.csv

reaster_anova.R

geyer

unread,

Feb 14, 2019, 11:19:38 PM2/14/19

to Aster Analysis User Group

I will answer the first question. The other about phylogenetic is an open research question that I will answer via e-mail (outside fo the google group).

If you run the last example on the help page for R function anova.asterOrReaster in R package aster you get

Model 1: resp ~ varb + fit:(Site * Region)
Model 2: resp ~ varb + fit:(Site * Region), ~0 + fit:Block
Model 3: resp ~ varb + fit:(Site * Region), ~0 + fit:Block, ~0 + fit:Pop
Mod Df Fix Mod Df Rand Mod Dev Df Fix Df Rand Deviance   P-value
1          6           0 1167586
2          6           1 1175913      0       1   8327.4 0.00e+00
3          6           2 1176704      0       1    791.5 1.94e-174

and there are two "deviance" = twice log likelihood columns. The first labeled Mod Dev for model deviance is twice the log likelihood, which is automatically larger for larger models (the supremum over a larger set of parameter values is larger). This number is, by itself, of no interest whatsoever. Only differences of log likelihoods for pairs of models are interesting. Wilks's theorem says they are approximately chi-square with the degrees of freedom being the difference of number of parameters for the two models. Those differences

are in the second column labeled Deviance and the degrees of freedom are the sum of the two preceding columns. In this case there is one df for 1 parameter each of which is one variance component. Here model 1 has no random effects, model 2 has one variance component, and model 3 has 2 variance components. For this example both are highly statistically significantly different from zero. (There are some examples in the first paper about reaster in AOAS cited on the aster web page http://www.stat.umn.edu/geyer/aster/ where some variance components are not statistically significant (even one example where a variance component is estimated to be zero).

Does this help? Only the last 4 columns of the table are worth reading. But in R it is traditional to have this full table. I don't know why.

BTW although I said "log likelihood" above, that was a lie. The actual log likelihood is impossible to calculate for random effects aster models (or for any non-normal response models, such as general linear mixed models). So we are using an approximation due to Breslow and Clayton as described in the AOAS paper cited on the web page linked above and on the help page for R function reaster. So don't say I lied to you. As the paper says if you are worried about the validity of large sample approximation or Breslow-Clayton approximation (or both) the only resort is to a parametric bootstrap, which is very time consuming but illustrated in the technical report for that AOAS paper. I do and don't recommend that. Unless you are really worried or are dealing with obstreperous referees, you probably don't want to bootstrap. But the bootstrap is the only recourse to get more defensible p-values.

Reply all

Reply to author

Forward

0 new messages