| From: | Suresh Andrew Sethi <sas...@gmail.com> |
| To: | unmarked <unma...@googlegroups.com> |
| Date: | 04/16/2012 02:57 PM |
| Subject: | [unmarked] overdispersion in pcount models: QAIC? |
| Sent by: | unma...@googlegroups.com |
Hello,
I’m in a similar position trying to get at overdispersion. I’m working with a pcount Poisson model for abundance and detection probability of juvenile coho salmon in pools of Oregon streams. Estimates from the negative binomial models never settled down as I increased the upper bound K, so I’ve stuck with Poisson. My models don’t seem to fit well, and I’d like to be able to detect and account for overdispersion. To do this, I calculated the deviance residuals from the observed and expected (fitted) counts, then took the sum of the squared deviance residuals and divided by the degrees of freedom to get a dispersion parameter with which to adjust my AIC and SE values. I’ve included the code below; the equations come from stats courses and books.
After running this code, I do get a dispersion parameter that’s roughly similar to dispersion from a quasipoisson glm on just the abundance model, and also similar to Suresh’s Chi-squared-derived dispersion parameter. (Unfortunately, the dispersion parameters are all disconcertingly high, ranging from around 4.5-6.5 for models in three different streams.) In any case, everything in the code seems to function just fine, but I’d very much like a second, more experienced opinion on the appropriateness of using this method for pcount Poisson models in unmarked.
Your thoughts and advice are much appreciated! Please let me know if you have any questions for me.
Kristen
Code:
full.model <- pcount(~diver.h+areap.h+depthp.h+woody.h ~depth.h+area.h+ddf.h+cd.h, K=350, h.umf)
k <- 10                                        # number of estimated parameters
observed <- getY(full.model@data)
expected <- fitted(full.model)
n <- length(observed) - sum(is.na(observed))   # effective sample size
df <- n - k                                    # degrees of freedom

# Individual Poisson deviance residuals; y*log(y/mu) is taken as 0 when y = 0
dev.i <- sign(observed - expected) *
  sqrt(2*(ifelse(observed > 0, observed*log(observed/expected), 0) - (observed - expected)))
dev.i.sq <- dev.i^2
sumsqdev <- sum(dev.i.sq, na.rm=TRUE)          # sum of squared deviance residuals
dispersion <- sumsqdev/df
c.hat <- dispersion                            # c.hat / dispersion parameter

# Quasi-AICc; note the 2*k term, which the standard AICc correction includes
q.aic.m1 <- (-2*logLik(full.model)/c.hat) + 2*k + (2*k*(k+1)/(n-k-1))

# Adjusted SEs and p-values (SEs inflated by sqrt(c.hat))
se.full.model <- SE(full.model)
se.adj <- sqrt(c.hat)*se.full.model
z.adj <- coef(full.model)/se.adj
p.adj <- 2*pnorm(-abs(z.adj))
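For comparison, the Chi-squared-derived dispersion I mention above can be computed the same way from observed and expected counts, using the Pearson statistic instead of the deviance. A toy sketch with made-up numbers (stand-ins for getY(full.model@data) and fitted(full.model)):

```r
# Toy illustration: Pearson Chi-square dispersion on made-up counts.
# 'observed' and 'expected' here are stand-ins for the real matrices.
observed <- c(4, 0, 7, 2, 5)
expected <- c(3.1, 1.2, 5.8, 2.4, 4.5)
k <- 2                                  # pretend number of parameters
df <- length(observed) - k              # residual degrees of freedom

chi.sq <- sum((observed - expected)^2 / expected)   # Pearson Chi-square
c.hat.chisq <- chi.sq / df                          # Chi-square-based c.hat
```

With real data, this value should be roughly comparable to the deviance-based c.hat above; a large gap between the two would itself be worth investigating.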
Hi Unmarked users --
I'm using distsamp and I've had some issues with models failing the GOF test. It seems like this will be an issue for quite a few people and it would be great to have some expert opinion on how to proceed.
--
Two of my four species fail the GOF test on a highly saturated model -- I thought one way to deal with this might be to adjust AIC and SEs via c.hat (QAIC à la MacKenzie and Bailey), as Suresh suggested previously.
I was a little perplexed by Richard's suggestions in response to that question, but I tried to follow them :)
1.) In particular, if I have "left the realm of parametric statistics" & parboot is no longer appropriate -- was it OK to use parboot() in the first place to generate c.hat?
2.) Also, in attempting to follow Richard's suggestions, I tried to use nonparboot to generate bootstrapped SEs, but for some reason calling linearComb with method="nonparboot" did not work for me (I got an "unused argument" error).
So, I returned to QAIC adjustments.
It is easy enough to estimate a c.hat from the Chi-square statistics generated by parboot() and to adjust AIC by this value. MacKenzie and Bailey recommend then adjusting the SEs of the betas by sqrt(c.hat), and that is a little bit trickier for me.
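For concreteness, here is roughly what I mean (a sketch, not something I claim is correct: `fm` stands for a fitted distsamp model, and `fitstats` is just a helper I wrote, not part of unmarked):

```r
library(unmarked)

# Hypothetical helper: Pearson Chi-square statistic for a fitted model
fitstats <- function(fm) {
  obs <- getY(fm@data)
  expect <- fitted(fm)
  c(chisq = sum((obs - expect)^2 / expect, na.rm = TRUE))
}

# Parametric bootstrap of the fit statistic
pb <- parboot(fm, statistic = fitstats, nsim = 100)
c.hat <- pb@t0 / mean(pb@t.star)        # observed stat / mean bootstrap stat

# QAIC and MacKenzie-Bailey-style SE inflation
k <- length(coef(fm))
qaic <- -2 * logLik(fm) / c.hat + 2 * k
se.adj <- sqrt(c.hat) * SE(fm)
```

The `pb@t0 / mean(pb@t.star)` ratio is the part I'd most like checked -- I've seen c.hat defined that way, but I may be misusing the parboot output.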
3.) This statement will fully give away how much I am *not* a statistician...
I see this c.hat adjustment to SEs as a way to show that you are being cautious with your conclusions, inflating your SEs according to the degree of lack of fit. I personally do not know how to adjust the parameter SEs from within predict() (which is the method I am using to generate estimated abundance and detection values), so I am tempted to simply use predict() as usual and then inflate the resulting SEs around the predicted abundance/detection estimates. I suspect that is a major violation of some statistical rule... but it sure would be easy.
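In code, the "easy" version I'm describing would look something like this (toy numbers standing in for predict() output; whether this is defensible is exactly my question):

```r
# Toy stand-in for a predict() result: in unmarked, predict() returns a
# data frame with columns Predicted, SE, lower, upper.
pr <- data.frame(Predicted = c(3.2, 5.1), SE = c(0.6, 0.9))
c.hat <- 4.5                              # e.g. a dispersion estimate from GOF

pr$SE.adj <- sqrt(c.hat) * pr$SE          # inflate SEs by sqrt(c.hat)
pr$lower  <- pr$Predicted - 1.96 * pr$SE.adj   # rebuild normal-theory CI
pr$upper  <- pr$Predicted + 1.96 * pr$SE.adj
```

That is, leave the point estimates alone and just widen the intervals after the fact.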
Anyways, if anyone else is dealing with this or has sage advice, I'd love to hear it!
4.) Oh, and one other thing: I occasionally get estimates of detection rates greater than 100% (particularly for the upper CI). Does that indicate I am doing something wrong, or is it pretty common?
Sarah