Seemingly inappropriate non-sig contrast from glht() on a glm.nb model

99 views
Skip to first unread message

Kara Moore

unread,
Dec 17, 2014, 1:53:57 PM12/17/14
to davi...@googlegroups.com, Karen Tanner
Greetings D-RUG Hive Mind!

We are perplexed by a seemingly simple contrast that we are running via glht on a glm.nb model to determine differences between levels of an interaction term on a negative binomial glm. The differences are significant as we would think they would be (see barplot) except that there are no significant differences between the other levels and a level that is all zero, 2012.  

Certainly if there is a difference between the 2013 - Broad habitat and all others there should be similar or identical differences with 2012!

We do get a common error when we run the model because of the non-integer response, but I think the analysis is still appropriate because these are in fact count data. Our methods necessitate that we scale them slightly to conduct the comparisons between site types.

Thanks in advance for your feedback. A brief script version and dataset are attached. It would be excellent if you could reply to all to continue to include Karen (at UCSC) in the conversation. 

Thanks!  

Kara & Karen


Inline image 1

--
Kara Moore
Assistant Project Scientist
Center for Population Biology 
Department of Evolution and Ecology
University of California, Davis
minimal_script.R
dataset.csv

Jaime Ashander

unread,
Dec 17, 2014, 6:00:42 PM12/17/14
to davi...@googlegroups.com, Karen Tanner
Hi Kara and Karen, 

I believe the strange behavior of glht is due to zero variance within the 2012 level of factor Season. This blows up the SE on the test being used for the contrast, resulting in non-signifcance. 

A larger issue is this data doesn't seem negative binomial at all the combinations of factor levels that the model attempts to predict, leading to poor performance of the model (see some diagnostic plots below). Dropping 2012 from the analysis helps a bit, but issues remain. I would look into a zero-inflated model with at least Season as a predictor of the probability of getting a zero!

Hope this helps,
Jaime

## this data isn't negative binomial within levels of the predictors!
ggplot(all_habitat, aes(Season, Density_100)) + geom_jitter()

## resid v fitted -- model understantably struggles to get those 2012 zeroes
plot(t2, which=1)

## qqplot is ugly - should be somewhat linear
plot(t2, which=2)

library(ggplot2)

all_habitat$resid <- resid(t2)
all_habitat$fitted <- predict(t2, type='response')

## residuals by year -- systematic underprediction within the non-2012 years
ggplot(all_habitat, aes(Season, resid)) + geom_jitter()
ggplot(all_habitat, aes(fitted, resid)) + geom_jitter() + geom_smooth(method='lm') +
    geom_hline(xintercept=0) + facet_wrap(~Season)

## try without 2012
ahb2 <- all_habitat[all_habitat$Season != 2012, ]
ahb2$Season <- factor(ahb2$Season)
ahb2$HabitatSeason <- interaction(ahb2$Habitat, ahb2$Season, drop=T) # put interaction into a column in the data frame
t3<-glm.nb(Density_100 ~HabitatSeason, data=ahb2) # run another model

## qqplot -- still trouble
plot(t3, which=2)

## resid v fitted -- doesn't seem to help with underprediction
plot(t3, which=1)

ggplot(ahb2, aes(Season, resid)) + geom_jitter()
(g <- ggplot(ahb2, aes(fitted, resid)) + geom_jitter() + geom_smooth(method='gam') + geom_hline(xintercept=0))
g + facet_wrap(~Season)


--
Check out our R resources at http://www.noamross.net/davis-r-users-group.html
---
You received this message because you are subscribed to the Google Groups "Davis R Users' Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to davis-rug+...@googlegroups.com.
Visit this group at http://groups.google.com/group/davis-rug.
For more options, visit https://groups.google.com/d/optout.

Kara Moore

unread,
Dec 17, 2014, 6:21:20 PM12/17/14
to davi...@googlegroups.com
Thanks Jamie!

Ah ha! I see now why I should take the time to dig into ggplot, these plots are way more helpful than what I have been running. 

Do you run zeroinflated models using zeroinfl()?  If so, we would need to make the response an integer than run something like this:

ahb2$Den<-round(ahb2$Density_100*100)
m1 <- zeroinfl(Den ~HabitatSeason,
               data = ahb2, dist = "negbin", EM = TRUE)

Look right? Then what diagnostics do we used to assess the fit? plot(m1) doesn't work and there is no AIC for comparison. Can you point me in the right direction?

Thanks!

Jaime Ashander

unread,
Dec 17, 2014, 6:34:55 PM12/17/14
to davi...@googlegroups.com
No problem! I haven't run much zeroinflation outside a mixed models context. zeroinfl seems like a good place to start through, and this resource looks super helpful:
http://www.ats.ucla.edu/stat/r/dae/zinbreg.htm

After looking there I think the model I was suggesting (after rounding Den as you suggest) is something like

m_zi <- zeroinfl(Den ~HabitatSeason | Season,
               data = all_habitat, dist = "negbin", EM = TRUE)

which should use Season as a predictor for zero/non-zero. Note it's worth trying first on the whole data (with 2012).  As for comparison, does 
stats::AIC 
work on zeroinfl objects? The ucla page uses a vuong test . . .

Cheers, 
- Jaime

Kara Moore

unread,
Dec 17, 2014, 7:55:15 PM12/17/14
to davi...@googlegroups.com
Cool! Right you are Jaime!

ahb2$Den<-round(ahb2$Density_100*100)
m_zi <- zeroinfl(Den ~HabitatSeason | Season, data = ahb2, dist = "negbin", EM = TRUE)
m_nb <- glm.nb(Den~ HabitatSeason, data=ahb2)
vuong(m_zi, m_nb)

Shows that the zero inflated model is a bit better.

The next piece of the puzzle is to work out how to do something like a contrast between levels of within HabitatSeason. I've been exploring that a bit and see two viable options. 1) going back to glmmADMB, which I'm more familiar with, using AIC to compare models, then glht() to get a tukey's test, or 2) comparing CI around the coefficients using 
(est <- cbind(Estimate = coef(m1), confint(m1)))
and/or
exp(est) to look at the CI around incidence rate ratios as suggest by the ucla page: http://www.ats.ucla.edu/stat/r/dae/nbreg.htm

Thanks for the discussion and the excellent pointers!


Jaime Ashander

unread,
Dec 18, 2014, 1:55:08 PM12/18/14
to davi...@googlegroups.com
Glad this was helpful. Too bad zerofinfl models aren't easier to work with as it seems like a nice interface for running these models. 

On your next puzzles: 

1. unfortunately at this time glmmadmb can't do predictors for the zeriofinflation component -- it only estimates one zero inflation parameter

2. that means looking at the ci is the best way to go, I guess. Although I'd be surprised if someone hasn't dealt with getting contrasts from the output of zeroinfl 
Reply all
Reply to author
Forward
0 new messages