Hi All,
I have a general question about the estimation and interpretation of MacKenzie-Bailey goodness-of-fit (MB GOF) tests for characterizing the fit of single-species, single-season occupancy models. It's not necessarily an issue with 'unmarked', but I figured this was the best place to pose the question, as I've found this board very helpful in the past.
To put it simply: do very high occupancy rates prevent models from achieving satisfactory MB GOF results (i.e., p-values > 0.05 and c-hat around 1)? For example, if a species is detected at, say, 85-95% of sites, is there simply no power to model occupancy with covariates? I suspect this is what is going on, but I would like to be able to characterize the issue in greater depth for manuscripts and presentations.
I carried out a model selection procedure to identify top models based on a number of covariates, and estimated detection and occupancy parameters for four species using single-species, single-season models (from up to four trapping replicates at a total of 87 sites). The MB GOF statistics suggest adequate fit for three of these species, but for the species with the highest occupancy and detection rates, they suggest lack of fit. The species in question was found at 74/87 (~85%) of sites.
I'm wondering if anyone has encountered a similar situation and/or can point me toward appropriate reading material on this specific issue. I've pasted the top model and the MB GOF code and output (all from unmarked and AICcmodavg) for the species in question below.
Here is the formulation of the top model; the parameter estimates, standard errors, and p-values seem reasonable:
Call:
occu(formula = ~Julian ~ woody + hectares, data = CHPI, method = "SANN")

Occupancy:
            Estimate    SE     z  P(>|z|)
(Intercept)     2.66 0.638  4.16 3.17e-05
woody          -1.24 0.405 -3.07 2.15e-03
hectares        1.74 0.817  2.13 3.28e-02

Detection:
            Estimate    SE     z  P(>|z|)
(Intercept)    1.769 0.195  9.08 1.04e-19
Julian        -0.826 0.188 -4.40 1.10e-05

AIC: 304.5661
Here is the MB GOF output (with 1000 simulations to generate the bootstrap distribution). It has a P-value less than 0.05 (bad) and a c-hat that exceeds 1 (also bad). Note all the different detection histories and cohorts, which arise from differences in trappability among sites (some of the sampled wetlands went dry). Also note that two detection histories in cohort 1 (010NA and 100NA, about halfway down) have much higher chi-square contributions than the others, yet each represents only a couple of sites. Could these be disproportionately throwing off the whole test?
> #MacKenzie-Bailey goodness-of-fit test
> #compute the chi-square table
> CHPI.mb.chi <- mb.chisq(CHPI.top)
>
> #parametric bootstrap with 1000 simulations
> CHPI.boot <- mb.gof.test(CHPI.top, nsim = 1000)
> print(CHPI.boot, digits.vals = 4, digits.chisq = 4)
MacKenzie and Bailey goodness-of-fit for single-season occupancy model
Pearson chi-square table:

        Cohort Observed Expected Chi-square
0000         0        6     7.31       0.24
0010         0        1     0.13       5.82
0110         0        2     0.89       1.40
0111         0        1     1.64       0.25
1000         0        2     0.51       4.31
1010         0        1     1.87       0.40
1011         0        2     3.44       0.60
1100         0        8     3.54       5.63
1101         0        1     6.46       4.61
1110         0        7    13.52       3.15
1111         0       36    25.82       4.01
000NA        1        2     1.38       0.27
010NA        1        1     0.08      10.65
100NA        1        2     0.16      20.93
110NA        1        1     1.07       0.00
111NA        1        2     4.32       1.25
11NA0        2        1     0.51       0.48
11NA1        2        1     0.84       0.03
00NANA       3        2     1.81       0.02
10NANA       3        1     0.46       0.65
11NANA       3        3     3.51       0.07
0NA0NA       4        1     0.42       0.80
0NANANA      5        3     1.79       0.82
Chi-square statistic = 71.921
Number of bootstrap samples = 1000
P-value = 0.021
Quantiles of bootstrapped statistics:
  0%  25%  50%  75% 100%
   8   22   29   38  133
Estimate of c-hat = 2.25
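In case it helps frame the question, here is roughly how I could quantify what those two detection histories contribute to the total statistic. This is just a sketch using the numbers from the table above; I'm assuming the object returned by mb.chisq() stores the table in an element called chisq.table with a Chi-square column, so the element/column names may differ in your version of AICcmodavg.

```r
#sketch: rank detection histories by their contribution to the
#overall chi-square statistic (element/column names assumed)
chisq.tab <- CHPI.mb.chi$chisq.table

#sort histories by contribution, largest first
ord <- chisq.tab[order(chisq.tab$"Chi-square", decreasing = TRUE), ]
head(ord)

#share of the total statistic coming from the top two histories
#(here 010NA and 100NA: 10.65 + 20.93 = 31.58 of 71.92, i.e. ~44%)
sum(ord$"Chi-square"[1:2]) / sum(chisq.tab$"Chi-square")
```

If those two rows really do account for nearly half of the 71.92, that would at least tell me where the lack of fit is concentrated, even if it doesn't explain why.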
Thanks ahead of time for your consideration of the issue, and please let me know if I can provide any additional information. -Scott B.