Question about MacKenzie-Bailey GOF Test (single-species, single-season models)

Scott Buchanan

May 11, 2017, 9:06:47 AM
to unmarked
Hi All,

I have a general question about the estimation and interpretation of MacKenzie-Bailey goodness-of-fit (GOF) tests used to characterize the fit of single-species, single-season occupancy models. It's not necessarily an issue with 'unmarked', but I figured this was the best place to pose the question, as I've found this board very helpful in the past.

To put it simply: do very high occupancy rates prevent models from achieving satisfactory MB GOF results (i.e., p-values > 0.05 and c-hat around 1)? For example, if a species is detected in, say, 85-95% of sites, is there just no power to model occupancy with covariates? I suspect this is what is going on, but I would like to be able to characterize the issue in greater depth for manuscripts/presentations...

I have carried out a model selection procedure to identify top models based on a number of covariates, and estimated detection and occupancy parameters for four different species using single-species, single-season models (from up to four trapping replicates at a total of 87 sites). The MB GOF statistics suggest solid evidence of model fit for three of these species, but for the species with the highest occupancy and detection rates, they suggest evidence of lack of fit. The species in question was found in 74/87 (~85%) of sites.

I'm wondering if anyone has encountered a similar situation before and/or if you can point me in the direction of appropriate reading material for this issue, specifically. I've put the top model and MB GOF code and output (all from unmarked and AICcmodavg) for the species in question below.

Here is the formulation of the top model. The parameter estimates, standard errors, and p-values seem OK...

Call:
occu(formula = ~Julian ~ woody + hectares, data = CHPI, method = "SANN")

Occupancy:
            Estimate    SE     z  P(>|z|)
(Intercept)     2.66 0.638  4.16 3.17e-05
woody          -1.24 0.405 -3.07 2.15e-03
hectares        1.74 0.817  2.13 3.28e-02

Detection:
            Estimate    SE     z  P(>|z|)
(Intercept)    1.769 0.195  9.08 1.04e-19
Julian        -0.826 0.188 -4.40 1.10e-05

AIC: 304.5661  

Here is the MB GOF output (with 1000 simulations to generate the bootstrap distribution). It has a P-value less than 0.05 (bad) and a c-hat that exceeds 1 (also bad). Notice all the different detection histories and cohorts, which arise from differences in trappability among sites (some of the wetlands sampled went dry, producing missing visits). Also notice that two detection histories in cohort 1 (about halfway down) have much higher Chi-square scores than the others, yet they come from only a couple of sites. Could these be disproportionately throwing off the whole test?

#MacKenzie-Bailey goodness-of-fit test (functions from AICcmodavg)
> library(AICcmodavg)
> #compute the chi-square table for the observed detection histories
> CHPI.mb.chi = mb.chisq(CHPI.top)
> #parametric bootstrap of the chi-square statistic (1000 simulations)
> CHPI.boot = mb.gof.test(CHPI.top, nsim = 1000)
> print(CHPI.boot, digits.vals = 4, digits.chisq = 4)

MacKenzie and Bailey goodness-of-fit for single-season occupancy model

Pearson chi-square table:

History Cohort Observed Expected Chi-square
0000         0        6     7.31       0.24
0010         0        1     0.13       5.82
0110         0        2     0.89       1.40
0111         0        1     1.64       0.25
1000         0        2     0.51       4.31
1010         0        1     1.87       0.40
1011         0        2     3.44       0.60
1100         0        8     3.54       5.63
1101         0        1     6.46       4.61
1110         0        7    13.52       3.15
1111         0       36    25.82       4.01
000NA        1        2     1.38       0.27
010NA        1        1     0.08      10.65
100NA        1        2     0.16      20.93
110NA        1        1     1.07       0.00
111NA        1        2     4.32       1.25
11NA0        2        1     0.51       0.48
11NA1        2        1     0.84       0.03
00NANA       3        2     1.81       0.02
10NANA       3        1     0.46       0.65
11NANA       3        3     3.51       0.07
0NA0NA       4        1     0.42       0.80
0NANANA      5        3     1.79       0.82

Chi-square statistic = 71.921 
Number of bootstrap samples = 1000
P-value = 0.021

Quantiles of bootstrapped statistics:
  0%  25%  50%  75% 100% 
   8   22   29   38  133 

Estimate of c-hat = 2.25 

Thanks ahead of time for your consideration of the issue and please let me know if I can provide any additional information. -Scott B.

Darryl MacKenzie

May 15, 2017, 8:09:33 AM
to unmarked
Hi Scott,
The issue probably isn't related to high occupancy rates, as the M&B test has no power to detect problems with the occupancy component, only the detection component. If anything, high occupancy is helpful because it means there's more data about detection.

Looking at your results, I suspect you've got some unmodelled detection heterogeneity going on. Did you try fitting any models that would allow detection to differ among sampling units beyond using Julian date (e.g., woody, hectares, etc.)?
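
In unmarked syntax that would look something like the sketch below (just a sketch, reusing the CHPI object and covariate names from your post; CHPI.het is an illustrative name):

library(unmarked)
# let detection vary with a site attribute as well as Julian date
CHPI.het <- occu(~ Julian + woody ~ woody + hectares, data = CHPI)
summary(CHPI.het)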

Cheers
Darryl

markr...@gmail.com

May 16, 2017, 5:27:04 PM
to unmarked
Darryl,

Coincidentally, I am dealing with a similar issue right now. I'm curious: would you consider it appropriate to use % habitat cover for modeling detection probability? I always thought it was bad to use any measure that might directly increase abundance of the species for modeling detection. Shouldn't those measures be used exclusively for Psi?

best,
Mark

Dan Linden

May 17, 2017, 9:07:27 AM
to unmarked
Hi Mark,

If you have justification for including habitat cover as a detection covariate, then you should include it.  The explicit modeling of detection probability is arguably most important when you have a site attribute that is hypothesized to influence both detection and the state variable (e.g., occupancy).

Scott Buchanan

May 22, 2017, 12:51:41 PM
to unmarked
Hi Darryl,

I really appreciate you taking the time to answer my question.

I reread MacKenzie and Bailey (2004) and found the following statement: "By considering the contribution of each observed detection history to the test-statistic, it appears the poor fit is caused by an unusually large number of sites where the salamanders were detected on each sampling occasion. This may be due to an unmeasured site characteristic that also affects detection probabilities, or possibly caused by the species occurring at higher densities at those sites (probability of detecting at least one member of the species could be higher at sites where the species is more abundant). We suggest that this should be kept in mind when drawing conclusions about the effects of the available covariates from this analysis."

I think this is what you're referring to, yes? I do indeed appear to have the same issue, with 36/87 sites having a capture history of all detections. I noticed that in the 2004 paper, in the initial salamander example, you perform an all-subsets analysis and use some of the same covariates (D, E, V, and S) to model both the occupancy and detection parameters. I did not do this; I stuck to the rule that site covariates (which vary by site) go on occupancy and visit covariates (which vary by visit) go on detection. I included Julian date, air temperature, precipitation, and visit number (i.e., 1, 2, 3, or 4) as detection covariates. When comparing models with different combinations of these detection covariates (and keeping the occupancy structure constant), Julian date emerged as the most parsimonious, so I used it as the detection component for all subsequent models.
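
For concreteness, that comparison was along the following lines (a sketch; airTemp, precip, and visit are stand-ins for the actual column names in my data, and bictab() is the BIC analogue of aictab() in AICcmodavg):

library(AICcmodavg)
# detection candidates, occupancy structure held constant
p.jul <- occu(~ Julian ~ woody + hectares, data = CHPI)
p.tmp <- occu(~ airTemp ~ woody + hectares, data = CHPI)  # placeholder name
p.prc <- occu(~ precip ~ woody + hectares, data = CHPI)   # placeholder name
p.vis <- occu(~ visit ~ woody + hectares, data = CHPI)    # visit number (1-4)
bictab(cand.set = list(p.jul, p.tmp, p.prc, p.vis),
       modnames = c("Julian", "airTemp", "precip", "visit"))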

On your recommendation (and on a hunch), I just tried using wetland size (i.e., "hectares") as a detection covariate, under the logic that painted turtles occur at greater density in larger wetlands. It improved things a little, but there is still evidence of lack of fit (p-value = 0.044; c-hat = 1.79). Given my methods (trapping), I'm not really sure what else would be driving detection. So I suppose I still have a couple of questions about how to proceed...
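
In code, that check was essentially the following (a sketch, assuming Julian was retained alongside hectares):

CHPI.size <- occu(~ Julian + hectares ~ woody + hectares, data = CHPI,
                  method = "SANN")
mb.gof.test(CHPI.size, nsim = 1000)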

- Should this be interpreted as general evidence of lack of fit for this species, and should I therefore use QAIC for model selection? I've used BIC for all other model selection procedures for this and the other species considered to this point. Would I have to change them all for ease of interpretation, or can I just use QAIC for painted turtles?

- Is it adequate to simply inflate the SEs by a factor based on c-hat and then consider the top model supported? Or should the top model simply be ignored, with the conclusion that there is little inference on which to advance?

- Should I try modelling the detection parameter with additional site-level covariates before concluding that there is evidence of lack of fit?

Can't thank you enough for the help. Much appreciated.

-Scott B.


Scott Buchanan

Jun 9, 2017, 5:12:39 PM
to unmarked
Anyone else willing to take a stab?


Dan Linden

Jun 9, 2017, 10:10:29 PM
to unmarked
Hi Scott, a few things:

1) To be clear, occupancy can only vary by site, while detection can vary by site, visit, or both.  So there is no rule about only having visit covariates on detection.  Having site covariates in the detection model is a common approach if there's good reason to think they matter; otherwise, site-level variation in detection can get absorbed into the occupancy component and bias it.  This is one of the primary motivations for modeling detection explicitly in the first place.

2) GOF is a tricky topic, and just because a model shows signs of lack of fit doesn't mean it is useless or can't be used for inference.  That c-hat value (1.79) isn't so bad.  You could conclude that hectares serves as a decent approximation to density and be done.  Or you could try a Royle-Nichols model (occuRN) if you think density varies widely enough to violate occupancy assumptions.
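
For the record, that would look something like this (a sketch; in occuRN the second formula models site-level abundance, and K caps the abundance summation, so it may need raising for abundant species):

CHPI.rn <- occuRN(~ Julian ~ woody + hectares, data = CHPI, K = 25)
summary(CHPI.rn)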

3) AIC and BIC favor different ends of the spectrum with regard to model complexity, so you could argue they address different objectives in model selection.  A discussion of that is beyond what I can type here, but searching the literature on it might be helpful (or might leave you asking more questions!).  So I don't think you should mix AIC and BIC across species if you want to be logically consistent.  That said, QAIC will give you the same answers as AIC if your c-hat is 1.  So if the other models fit well, you could conceivably just use QAIC for all, or use AIC for the other species if c-hat is close to 1.  The model selection inferences won't change.
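
In AICcmodavg this is just the c.hat argument to aictab(), which switches the table to QAIC(c) (a sketch; the candidate list here is purely illustrative):

aictab(cand.set = list(CHPI.top, CHPI.size),
       modnames = c("p(Julian)", "p(Julian+hectares)"),
       c.hat = 1.79)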

4) Again, I think inflating the SEs and going forward with inference at a c-hat of 1.79 is probably fine.
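
To be precise, each SE gets multiplied by sqrt(c-hat), not c-hat itself. A sketch (summaryOD() is the AICcmodavg helper for this, assuming a package version that includes it):

0.405 * sqrt(1.79)                  # e.g., adjusting the woody SE from the top model
summaryOD(CHPI.top, c.hat = 1.79)   # adjusts the whole model summary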

5) You should avoid fishing for covariates that can improve fit without good biological justification.  As I said earlier, I would go with what you've got or try a Royle-Nichols model if you're curious.

These are my opinions, of course, and some are probably more widely agreed upon than others.

Scott Buchanan

Jun 11, 2017, 11:15:53 AM
to unmarked
Dan, thanks so much for the great reply. Yes, I've considered including site covariates in the detection component of my models, but have kept them separated along by-site/by-visit lines because it's so darn tidy, and something about that appeals to me. But you're absolutely right that something like vegetation structure (e.g., lots of emergent shrubs vs. open water) could influence the effectiveness of my traps. I'll think more about this and perhaps incorporate some site covariates into detection.

Your insight into GOF is really helpful, as I've had a hard time finding opinions on the interpretation of the test. I'll also definitely look into RN models. Thanks again; this is enormously helpful in moving the ball forward...

Paulo Fernandes Paulo

Aug 28, 2017, 5:04:10 PM
to unmarked
I'm facing some trouble interpreting GOF and c-hat results, so I think this is the best place to ask my question.

I'm using single-season occupancy models to estimate detection and occupancy parameters for 5 species, so I used the methodology proposed by MacKenzie & Bailey (2004) to test the fit of my global model.

Here is what I obtained:

Species X: p=0.3 and c-hat=1.3
Species Y: p=0.001 and c-hat=3.6
Species W: p=0.7 and c-hat=0.58

1) Should I correct for overdispersion only if my global model presents evidence of lack of fit? In this case, I would conclude that I should correct only for species Y. Is that correct?

2) Also, as you can see, the c-hat for species Y seems pretty high. Is there a critical value above which we should no longer rely on the global model? Or does this mean the model really does not fit the data?

3) Finally, for species W, I obtained no evidence of lack of fit, but the c-hat is below 1 (0.58). Should I be concerned about this? Should I correct for over- (or under-?) dispersion?

Thanks in advance.