variance inflation with c.hat if GOF is significant (thread had some other name)


Kery Marc

Feb 27, 2015, 6:54:32 AM2/27/15
to unma...@googlegroups.com
Dear all,

I am actually a little disappointed that nobody commented on my post about variance inflation by c.hat in case of a significant GOF test result.

I wrote that comment partly in an inquiring mode and would really be interested in what others think about this. I am certain some people must think that this is totally inadmissible, and I would be happy to hear so.

Kind regards   --  Marc



From: unma...@googlegroups.com [unma...@googlegroups.com] on behalf of Kery Marc [marc...@vogelwarte.ch]
Sent: 24 February 2015 20:47
To: unma...@googlegroups.com
Subject: RE: [unmarked] Site-level detection probability and SE

Dear Hemanta et al.,

I have a comment on the GOF testing using the parboot in unmarked.

In parts of the capture-recapture world, e.g., when people fit Cormack-Jolly-Seber models for survival estimation, and also in some parts of the occupancy universe (I think the part where MARK and PRESENCE live), it is standard practice to do some GOF test and from that compute a c.hat index, which should be 1 for a fitting model. Values greater than 1 indicate EITHER a breakdown of the model OR simply some additional noise (or both). Extra noise is called "overdispersion" and is very common with all kinds of count data. Unless c.hat is greater than about 3 or so, people typically assume that c.hat > 1 indicates simply extra noise, rather than a systematic breakdown of the model, and they adjust uncertainty measures (e.g., inflate SEs by sqrt(c.hat)) and AIC model selection indices (which become QAIC). For values of c.hat in the range of 1-2, this is done almost as a rule. In MARK, c.hat is computed from a parametric bootstrap of a fit statistic as the ratio of the observed value for the actual data set to the mean of the bootstrap distribution of the fit statistic under the H0 of a fitting model.
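
In unmarked, the same quantity can be computed directly from a parboot result. A minimal sketch, assuming fm is a fitted unmarked model and fitstats returns a chi-square statistic named "Chisq":

pb <- parboot(fm, fitstats, nsim = 1000)              # bootstrap the fit statistic under H0
c.hat <- pb@t0["Chisq"] / mean(pb@t.star[, "Chisq"])  # observed / mean of bootstrap distribution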

Almost every time I have done a parboot GOF test for some model in unmarked and the model did not fit (p < 0.05 or even p << 0.01), the value of c.hat was less than 2 and often far less (e.g., 1.15). So I have started to think that we should probably often do what the CJS people do and simply inflate our SEs and AIC scores accordingly. The former can easily be done using functions in the "sister package" AICcmodavg. When this is done, we also see that the inflation of the SEs with c.hat values of around 1.5 is often pretty minimal.
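
For illustration, the adjustment itself might look like this (m1 and m2 are placeholder model fits, and c.hat comes from the bootstrap sketched above):

sqrt(c.hat) * SE(m1)    # variance-inflated standard errors

library(AICcmodavg)     # QAIC model selection using the same c.hat
aictab(cand.set = list(m1, m2), modnames = c("m1", "m2"), c.hat = c.hat)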

Note: I am not saying that this is necessarily *always* a good thing to do, only that I don't see why this approach should be adequate for a CJS model or an occupancy model fit in PRESENCE, but not for the hierarchical models fit in unmarked.

Note 2: I think that we should do more to assess the fit of our models. Model diagnostics such as residuals, which show you *where* a model fits particularly badly, seem especially important, really more important than a simple black-and-white answer. Residuals can easily be computed for all models in unmarked.
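
For example, a minimal sketch, where fm stands for any fitted unmarked model:

r <- residuals(fm)                 # sites x occasions matrix of raw residuals
plot(fitted(fm), r, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)             # look for structure around zero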

I think that there remains a lot to be done to develop methods and accumulate experience about GOF in HMs such as the ones fit using unmarked.

Comments welcome!

Kind regards  --  Marc

From: unma...@googlegroups.com [unma...@googlegroups.com] on behalf of Hemanta [hemant...@gmail.com]
Sent: 24 February 2015 19:20
To: unma...@googlegroups.com
Subject: Re: [unmarked] Site-level detection probability and SE

Hi Andy,
Thank you so much for your prompt response. As I wanted to be sure whether the model is adequate for inferring abundance, I attach the output of my parboot below.
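
(The fitstats function in the call below is not shown in this message; presumably it is the standard fit-statistics function from the unmarked examples, along these lines:)

fitstats <- function(fm) {
    observed <- getY(fm@data)
    expected <- fitted(fm)
    resids <- residuals(fm)
    sse <- sum(resids^2)
    chisq <- sum((observed - expected)^2 / expected)
    freeTuke <- sum((sqrt(observed) - sqrt(expected))^2)
    c(SSE = sse, Chisq = chisq, freemanTukey = freeTuke)
}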

1) It would be great to have your expert opinion on model fit.

> pb25

Call: parboot(object = fm2.5, statistic = fitstats, nsim = 100, report = 1)

Parametric Bootstrap Statistics:
                t0 mean(t0 - t_B) StdDev(t0 - t_B) Pr(t_B > t0)
SSE           3611        -1055.7           1768.5        0.723
Chisq        10130         4182.0            951.3        0.000
freemanTukey   665          -74.4             81.9        0.832

t_B quantiles:
               0% 2.5%  25%  50%  75% 97.5%  100%
SSE          1995 2466 3548 4407 5101  9104 13580
Chisq        4129 4351 5380 5904 6327  7842  9475
freemanTukey  544  585  685  741  784   907   977

t0 = Original statistic computed from data
t_B = Vector of bootstrap samples

2) I used the parboot function as you suggested for estimating N, with the top models (lowest AIC) for the Poisson and negative binomial mixtures; results are pasted below:

Negative Binomial mixture:

Nhat <- function(fm2.5) {
    sum(bup(ranef(fm2.5)))
}
(pb.N <- parboot(fm2.5, Nhat, nsim = 100, report = 5))

Call: parboot(object = fm2.5, statistic = Nhat, nsim = 100, report = 5)

Parametric Bootstrap Statistics:
    t0 mean(t0 - t_B) StdDev(t0 - t_B) Pr(t_B > t0)
1 3460           1772              357            0

t_B quantiles:
     0% 2.5%  25%  50%  75% 97.5% 100%
t*1 680 1044 1514 1665 1865  2360 2639

t0 = Original statistic computed from data
t_B = Vector of bootstrap samples


> colSums(confint(ranef(fm2.5)))
 2.5% 97.5% 
 1650  6836 

Poisson Mixture:
Call: parboot(object = fm0.5, statistic = Poi.Nhat, nsim = 24, report = 5)

Parametric Bootstrap Statistics:
   t0 mean(t0 - t_B) StdDev(t0 - t_B) Pr(t_B > t0)
1 808            -21             97.3         0.48

t_B quantiles:
     0% 2.5% 25% 50% 75% 97.5% 100%
t*1 677  690 769 808 875  1026 1076

t0 = Original statistic computed from data
t_B = Vector of bootstrap samples

Does the above output suggest the Poisson model is better for modeling abundance, based on the Pr(t_B > t0) values from the two models? A naive question again: does t0 correspond to Nhat, and StdDev(t0 - t_B) to its SE?

Sorry for bugging you again; I am a novice at using R and unmarked.

Your help is much appreciated

Thank You

Hemanta



On Tuesday, February 24, 2015 at 7:05:14 AM UTC-6, Jeffrey Royle wrote:
hi Hemanta,
 1. If the model fits, then it might be OK, but it sounds like it does not. Most people would say your model needs to fit in order to do inference.
 2. This is the right idea conceptually, but I think probably not quite right if you have to estimate N with a model, especially one that doesn't fit, because there is uncertainty about N in that case.
 It might be better to use occupancy models to estimate site-level p.

 3. I'm not sure what you mean by this.
 4. You are correct about predict and ranef. You can sum the site-level N's, but the interpretation of that is not straightforward unless the sample units have a precise area, which is usually not the case. To get an SE for total N you can use the parboot function; see the sketch below.
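
Something along these lines (a minimal sketch; fm stands for your fitted model, and nsim is just a placeholder):

Nhat <- function(fm) sum(bup(ranef(fm)))   # total N summed across sites
pb <- parboot(fm, Nhat, nsim = 100)
sd(pb@t.star)                              # bootstrap SE for total N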

regards
andy


On Mon, Feb 23, 2015 at 11:55 PM, Hemanta <hemant...@gmail.com> wrote:

Hello,

I am new to unmarked and seek some advice on analyses that I have been doing with pcount in unmarked.

Below are the issues on which I seek your help.

1) I have 361 sites where herds of a prey species were counted 4 times each. There are lots of zeros in the data. None of the Poisson and ZIP models fit the data. The NB model seems like a better fit when I look at the graph of SSE, but the chi-square p-value always appears to be 0. In this context, is it legitimate to use NB models for modeling abundance?

2) I don't have covariates explaining detectability. When I use plogis() on the detection coefficient, I believe the resulting p is the individual herd-level p. I am interested in the site-level pi. Does the following give the site-level pi (see also the sketch after this list)?

pi = 1 - (1 - p)^Ni

where p = plogis(detection coefficient) and Ni = the N estimated by the 'ranef' function.

3) If #2 is used to calculate the site-level pi, is it possible to calculate the overall detectability of that particular species in general? If yes, how can that detectability and its SE be calculated?

4) For N: I believe that the values given by the 'predict' function are basically site-level lambdas, and 'ranef' gives the N for each site. How can I obtain total N, then? I could sum the values given by the 'predict' function, but I don't think that yields N, and I could not sum the values obtained from the 'ranef' function. Can you suggest ways to find total N and the associated SE?
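
Here is a sketch of what I mean for #2 (fm stands for my fitted pcount model):

p <- plogis(coef(fm, type = "det"))   # herd-level detection probability
Ni <- bup(ranef(fm))                  # posterior mean N for each site
pi.site <- 1 - (1 - p)^Ni             # site-level detection probability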

Thanks for your help

Hemanta



Dan Linden

Feb 27, 2015, 8:14:11 AM2/27/15
to unma...@googlegroups.com
If you want my gut reaction, you might not like it!  But I am definitely open to being schooled by someone more knowledgeable.

The c.hat adjustments are all so ad hoc and minimal that I don't see the point. If c.hat suggests something is off with your model, an adjustment will not fix that. And that is just fine, given the model is always an approximation to begin with. If you are worried about the SEs being underestimated, that is reasonable, but the SEs are probably always underestimated for all kinds of reasons. Some may find such adjustments critical because of implications regarding p-values and AIC, but the inferences from those statistics will always be open to interpretation. I feel like the reporting of c.hat is most useful for recognizing that parts of the model do not fit well, which may lead to more critical thinking about the problem. If the c.hat adjustment really changes your interpretation, you were already dealing with a lot of model uncertainty in the first place.

I am in the camp of seeing out-of-sample prediction as what really matters, even though, like most ecologists, I am rarely able to do it myself. So while I respect the decision to use c.hat adjustments however you choose and would not criticize it, I also would not criticize the choice to completely ignore such adjustments. For me, reporting the c.hat is potentially important, but using it is less so.

Jeffrey Royle

Feb 27, 2015, 8:34:09 AM2/27/15
to unma...@googlegroups.com
Dear Marc,
 I think that the response you had was perhaps so definitive and authoritative that it appears to require no comment. I believe it will be a fine reference point for all future inquiries to the unmarked listserv regarding model fit!
regards
andy

Kery Marc

Feb 27, 2015, 9:26:15 AM2/27/15
to unma...@googlegroups.com
Dear Andy,

I agree that Dan sounds very authoritative (THANKS, Dan, for your valuable and insightful comments).

Kind regards  --  Marc



Dan Linden

Feb 27, 2015, 11:04:46 AM2/27/15
to unma...@googlegroups.com
I'm not sure authoritative is accurate.  Some might call me c.hat-ignorant. :)