variance inflation with c.hat if GOF is significant (thread had some other name)


Kery Marc

Feb 27, 2015, 6:54:32 AM2/27/15
to unma...@googlegroups.com
Dear all,

I am actually a little disappointed that nobody commented on my post about variance inflation by c.hat in case of a significant GOF test result.

I wrote that comment partly in an inquiring mode and would really be interested in what others think about this. I am certain some people must think that this is totally inadmissible, and I would be happy to hear so.

Kind regards   --  Marc



From: unma...@googlegroups.com [unma...@googlegroups.com] on behalf of Kery Marc [marc...@vogelwarte.ch]
Sent: 24 February 2015 20:47
To: unma...@googlegroups.com
Subject: RE: [unmarked] Site-level detection probability and SE

Dear Hemanta et al.,

I have a comment on the GOF testing using the parboot in unmarked.

In parts of the capture-recapture world, e.g., when people fit Cormack-Jolly-Seber models for survival estimation, and also in some parts of the occupancy universe (I think the part where MARK and PRESENCE live), it is standard practice to do some GOF test and from that compute a c.hat index, which should be 1 for a fitting model. Values greater than 1 indicate EITHER a breakdown of the model OR simply some additional noise (or both). Extra noise is called "overdispersion" and is very common with all kinds of count data. Unless c.hat is greater than about 3 or so, people typically assume that c.hat > 1 indicates simply extra noise, rather than a systematic breakdown of the model, and they adjust uncertainty measures (e.g., inflate SEs by sqrt(c.hat)) and AIC model selection indices (which become QAIC). For values of c.hat in the range of 1-2, this is done almost as a rule. In MARK, c.hat is computed from a parametric bootstrap of a fit statistic as the ratio of the observed value for the actual data set to the mean of the bootstrap distribution of the fit statistic under the H0 of a fitting model.
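
In unmarked, the same quantity can be computed directly from a parboot result. A minimal sketch, assuming fm is a fitted unmarked model and fitstats returns a chi-square statistic named "Chisq":

pb <- parboot(fm, fitstats, nsim = 1000)              # bootstrap the fit statistic under H0
c.hat <- pb@t0["Chisq"] / mean(pb@t.star[, "Chisq"])  # observed / mean of bootstrap distribution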

Almost every time I have done a parboot GOF test for some model in unmarked and the model did not fit (p < 0.05 or even p << 0.01), the value of c.hat was less than 2 and often far less (e.g., 1.15). So I have started to think that we should probably often do what the CJS people do and simply inflate our SEs and AIC scores accordingly. The former can easily be done using functions in the "sister package" AICcmodavg. When this is done, we also see that the inflation of the SEs with c.hat values of around 1.5 is often pretty minimal.
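
For illustration, the adjustment itself might look like this (m1 and m2 are placeholder model fits, and c.hat comes from the bootstrap sketched above):

sqrt(c.hat) * SE(m1)    # variance-inflated standard errors

library(AICcmodavg)     # QAIC model selection using the same c.hat
aictab(cand.set = list(m1, m2), modnames = c("m1", "m2"), c.hat = c.hat)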

Note: I am not saying that this is necessarily *always* a good thing to do, only that I don't see why this approach should be adequate for a CJS model or an occupancy model fit in PRESENCE, but not for the hierarchical models fit in unmarked.

Note 2: I think that we should do more to assess the fit of our models. Model diagnostics such as residuals, which show you *where* a model fits particularly badly, seem especially important, really more important than a simple black-and-white answer. Residuals can easily be computed for all models in unmarked.
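
For example, a minimal sketch, where fm stands for any fitted unmarked model:

r <- residuals(fm)                 # sites x occasions matrix of raw residuals
plot(fitted(fm), r, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)             # look for structure around zero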

I think that there remains a lot to be done to develop methods and accumulate experience about GOF in HMs such as the ones fit using unmarked.

Comments welcome!

Kind regards  --  Marc

From: unma...@googlegroups.com [unma...@googlegroups.com] on behalf of Hemanta [hemant...@gmail.com]
Sent: 24 February 2015 19:20
To: unma...@googlegroups.com
Subject: Re: [unmarked] Site-level detection probability and SE

Hi Andy,
Thank you so much for your prompt response. As I wanted to be sure whether the model is adequate for inferring abundance, I attach the output of my parboot below.
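
(The fitstats function in the call below is not shown in this message; presumably it is the standard fit-statistics function from the unmarked examples, along these lines:)

fitstats <- function(fm) {
    observed <- getY(fm@data)
    expected <- fitted(fm)
    resids <- residuals(fm)
    sse <- sum(resids^2)
    chisq <- sum((observed - expected)^2 / expected)
    freeTuke <- sum((sqrt(observed) - sqrt(expected))^2)
    c(SSE = sse, Chisq = chisq, freemanTukey = freeTuke)
}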

1) It would be great to have your expert opinion on model fit.

> pb25

Call: parboot(object = fm2.5, statistic = fitstats, nsim = 100, report = 1)

Parametric Bootstrap Statistics:
                t0 mean(t0 - t_B) StdDev(t0 - t_B) Pr(t_B > t0)
SSE           3611        -1055.7           1768.5        0.723
Chisq        10130         4182.0            951.3        0.000
freemanTukey   665          -74.4             81.9        0.832

t_B quantiles:
               0% 2.5%  25%  50%  75% 97.5%  100%
SSE          1995 2466 3548 4407 5101  9104 13580
Chisq        4129 4351 5380 5904 6327  7842  9475
freemanTukey  544  585  685  741  784   907   977

t0 = Original statistic computed from data
t_B = Vector of bootstrap samples

2) I used the parboot function as you suggested for estimating N, with the top models (lowest AIC) for the Poisson and negative binomial mixtures; results are pasted below:

Negative Binomial mixture:

Nhat <- function(fm2.5) {
    sum(bup(ranef(fm2.5)))
}
(pb.N <- parboot(fm2.5, Nhat, nsim = 100, report = 5))

Call: parboot(object = fm2.5, statistic = Nhat, nsim = 100, report = 5)

Parametric Bootstrap Statistics:
    t0 mean(t0 - t_B) StdDev(t0 - t_B) Pr(t_B > t0)
1 3460           1772              357            0

t_B quantiles:
     0% 2.5%  25%  50%  75% 97.5% 100%
t*1 680 1044 1514 1665 1865  2360 2639

t0 = Original statistic computed from data
t_B = Vector of bootstrap samples


> colSums(confint(ranef(fm2.5)))
 2.5% 97.5% 
 1650  6836 

Poisson Mixture:
Call: parboot(object = fm0.5, statistic = Poi.Nhat, nsim = 24, report = 5)

Parametric Bootstrap Statistics:
   t0 mean(t0 - t_B) StdDev(t0 - t_B) Pr(t_B > t0)
1 808            -21             97.3         0.48

t_B quantiles:
     0% 2.5% 25% 50% 75% 97.5% 100%
t*1 677  690 769 808 875  1026 1076

t0 = Original statistic computed from data
t_B = Vector of bootstrap samples

Does the above output suggest the Poisson model is better for modeling abundance, based on the Pr(t_B > t0) values from the two models? A naive question again: does t0 correspond to Nhat, and StdDev(t0 - t_B) to its SE?

Sorry for bugging you again; I am a novice at using R and unmarked.

Your help is much appreciated

Thank You

Hemanta



On Tuesday, February 24, 2015 at 7:05:14 AM UTC-6, Jeffrey Royle wrote:
hi Hemanta,
 1. If the model fits, then it might be OK, but it sounds like it does not. Most people would say your model needs to fit in order to do inference.
 2. This is the right idea conceptually, but I think probably not quite right if you have to estimate N with a model, especially one that doesn't fit, because there is uncertainty about N in that case.
 It might be better to use occupancy models to estimate site-level p.

 3. I'm not sure what you mean by this.
 4. You are correct about predict and ranef. You can sum the site-level N's, but the interpretation of that is not straightforward unless the sample units have a precise area, which is usually not the case. To get an SE for total N you can use the parboot function; see the sketch below.
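
Something along these lines (a minimal sketch; fm stands for your fitted model, and nsim is just a placeholder):

Nhat <- function(fm) sum(bup(ranef(fm)))   # total N summed across sites
pb <- parboot(fm, Nhat, nsim = 100)
sd(pb@t.star)                              # bootstrap SE for total N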

regards
andy


On Mon, Feb 23, 2015 at 11:55 PM, Hemanta <hemant...@gmail.com> wrote:

Hello,

I am new to unmarked and seek some advice on analyses that I have been doing with pcount in unmarked.

Below are the issues on which I seek your help.

1) I have 361 sites where herds of a prey species were counted 4 times each. There are lots of zeros in the data. None of the Poisson and ZIP models fit the data. The NB model seems like a better fit when I look at the graph of SSE, but the chi-square p-value always appears to be 0. In this context, is it legitimate to use NB models for modeling abundance?

2) I don't have covariates explaining detectability. When I use plogis() on the detection coefficient, I believe the resulting p is the individual herd-level p. I am interested in the site-level pi. Does the following give the site-level pi (see also the sketch after this list)?

pi = 1 - (1 - p)^Ni

where p = plogis(detection coefficient) and Ni = the N estimated by the 'ranef' function.

3) If #2 is used to calculate the site-level pi, is it possible to calculate the overall detectability of that particular species in general? If yes, how can that detectability and its SE be calculated?

4) For N: I believe that the values given by the 'predict' function are basically site-level lambdas, and 'ranef' gives the N for each site. How can I obtain total N, then? I could sum the values given by the 'predict' function, but I don't think that yields N, and I could not sum the values obtained from the 'ranef' function. Can you suggest ways to find total N and the associated SE?
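
Here is a sketch of what I mean for #2 (fm stands for my fitted pcount model):

p <- plogis(coef(fm, type = "det"))   # herd-level detection probability
Ni <- bup(ranef(fm))                  # posterior mean N for each site
pi.site <- 1 - (1 - p)^Ni             # site-level detection probability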

Thanks for your help

Hemanta



Dan Linden

Feb 27, 2015, 8:14:11 AM2/27/15
to unma...@googlegroups.com
If you want my gut reaction, you might not like it!  But I am definitely open to being schooled by someone more knowledgeable.

The c.hat adjustments are all so ad hoc and minimal that I don't see the point. If c.hat suggests something is off with your model, an adjustment will not fix that. And that is just fine, given the model is always an approximation to begin with. If you are worried about the SEs being underestimated, that is reasonable, but the SEs are probably always underestimated for all kinds of reasons. Some may find such adjustments critical because of implications regarding p-values and AIC, but the inferences from those statistics will always be open to interpretation. I feel like the reporting of c.hat is most useful for recognizing that parts of the model do not fit well, which may lead to more critical thinking about the problem. If the c.hat adjustment really changes your interpretation, you were already dealing with a lot of model uncertainty in the first place.

I am in the camp of seeing out-of-sample prediction as what really matters, even though, like most ecologists, I am rarely able to do it myself. So while I respect the decision to use c.hat adjustments however you choose and would not criticize it, I also would not criticize the choice to completely ignore such adjustments. For me, reporting the c.hat is potentially important, but using it is less so.

Jeffrey Royle

Feb 27, 2015, 8:34:09 AM2/27/15
to unma...@googlegroups.com
Dear Marc,
 I think that the response you had was perhaps so definitive and authoritative that it appears to require no comment. I believe it will be a fine reference point for all future inquiries to the unmarked listserv regarding model fit!
regards
andy

Kery Marc

Feb 27, 2015, 9:26:15 AM2/27/15
to unma...@googlegroups.com
Dear Andy,

I agree that Dan sounds very authoritative (THANKS, Dan, for your valuable and insightful comments).

Kind regards  --  Marc



Dan Linden

Feb 27, 2015, 11:04:46 AM2/27/15
to unma...@googlegroups.com
I'm not sure authoritative is accurate.  Some might call me c.hat-ignorant. :)