Confusingly High Values of c-hat


D.J. McNeil, Jr.

Jul 9, 2021, 11:39:49 AM
to unmarked
Hi all,

I see there are a lot of posts in the group about mb.gof.test(), but I can't find anything that addresses my problem.

I have been using occupancy models for a decade now - a nice chunk of that through unmarked. I am now having an issue with single-season occupancy models that I have never had before: I cannot achieve a reasonable c-hat, no matter how I build my models.

The data:
> 1/0 data from a variety of forest structures (young, old, etc.) across an entire state
> Two visits per site/year
> Two years - year is included as a covariate - most sites visited in only one year
> Large sample size (~1,500)
> I have good detection covariates that explain p nicely (as best I can tell)
> I also have great habitat data that explain species occupancy patterns

The analyses:
> Single-season occu
> Linear variables are scaled
> Nothing else stands out as unusual to me about this analysis
> Focusing on one bird species for now, though the dataset has a bunch of birds I could model if I wanted
> detection estimates are reasonable (0.65ish)
> Naive occupancy is in the neighborhood of 0.50

The issue:
No matter how I build my models, c-hat is through the roof -- like 10-40. I tried several other species and consistently get this issue, with one exception: that species had c-hat = 0.20.

I'm at a complete loss - I've tried scaling variables, NOT scaling them, subsetting the data to focus on particular regions, including ALL detection covariates or none, running super simple models with few covariates and really complex models with many - it doesn't seem to matter. And species doesn't matter THAT much either. Here's the kind of output I am getting:

####################################

MacKenzie and Bailey goodness-of-fit for single-season occupancy model
Pearson chi-square table:
   Cohort Observed Expected Chi-square
00      0      710   706.36       0.02
01      0       73    88.32       2.66
10      0      134   126.39       0.46
11      0      158   153.94       0.11

Chi-square statistic = 3.241
Number of bootstrap samples = 1000
P-value = 0
Quantiles of bootstrapped statistics:
0% 25% 50% 75% 100%
0.00046 0.07730 0.18452 0.36060 2.90067
Estimate of c-hat = 11.95

######################################
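
For reference, the calls behind that output look roughly like this (a sketch only - the covariate names below are placeholders, not my actual variables):

```r
library(unmarked)
library(AICcmodavg)  # provides mb.gof.test()

## ~1,500 sites x 2 visits of 1/0 data; year enters as a site covariate
umf <- unmarkedFrameOccu(y        = det_hist,   # detection-history matrix
                         siteCovs = site_covs,  # habitat covariates + year
                         obsCovs  = obs_covs)   # detection covariates

## occu() takes the detection formula first, then the occupancy formula
m1 <- occu(~ scale(date) + year ~ scale(forest_age) + year, data = umf)

## MacKenzie-Bailey parametric-bootstrap goodness-of-fit test
mb.gof.test(m1, nsim = 1000)
```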

Any tips? Or maybe I ought to cut my losses and work with GLMs for this project?

Thank you in advance - happy to provide more detail if it helps...

Carl S

Jul 20, 2021, 9:16:43 AM
to unmarked
Hi Darin,

I just wanted to chime in because I have recently had a very similar issue with the mb.gof.test() function, where I am getting c-hat values of ~14. I also have not yet found a way to address the problem, but would be interested to hear any suggestions. I thought it might be because I have a lot of zeros in my data set, but from what I've read that is pretty normal for occupancy data. Also not sure if you've seen this thread, but it may be worth a read: https://groups.google.com/g/unmarked/c/3wvnDyLlxok

Have you tried looking at the residual plots to see if there were any patterns that might suggest the need for including a quadratic term? This is one thing that I tried, but unfortunately there were no systematic patterns in the residuals that would suggest a missing covariate. 

e.g.: plot(rowMeans(residuals(m1), na.rm = TRUE) ~ M_covs$x1), where m1 is your model and M_covs$x1 is one of your predictors.
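
For completeness, the same residual idea can be applied by occasion rather than by covariate (a sketch, assuming m1 is a fitted occu() model) - useful for checking whether one visit in particular fits badly:

```r
## residuals() on an occu fit returns a sites x occasions matrix
r <- residuals(m1)

## a notably large mean residual for one occasion flags a poor fit there
colMeans(r, na.rm = TRUE)
```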


As far as my own issues go, for some context, I have run a single season occupancy model with the occu() function:

Data:
> 0/1 detection/non-detection data from pitfall traps for a ground beetle species
> between 6 and 9 visits per site per year (some NAs in detection history)
> two years of data
> Large data set (795 sites * 9 visits = 7155 data points)

Model:
> single season occu
> scaled continuous variables
> model selection performed via Bayesian lasso
> model includes 1 detection and 4 occupancy covariates
> effect plots for predictors seem ecologically plausible

and here’s the output from mb.gof.test()

#####################################
MacKenzie and Bailey goodness-of-fit for single-season occupancy model

Pearson chi-square table:

          Cohort Observed Expected Chi-square
000000000      0      616   551.59       7.52
000000001      0       16     7.39      10.05
000000010      0        5     7.39       0.77
000000011      0        2     1.35       0.31
000000100      0        3     7.39       2.60
000000110      0        2     1.35       0.31
000001000      0        6     7.39       0.26
000001101      0        1     0.26       2.17
000001110      0        1     0.26       2.17
000001111      0        1     0.05      17.84
000010000      0        2     7.39       3.93
000010001      0        1     1.35       0.09
000010100      0        1     1.35       0.09
000010101      0        1     0.26       2.17
000011000      0        2     1.35       0.31
000100010      0        2     1.35       0.31
000111111      0        1     0.00     414.93
001000000      0       10     7.39       0.93
001000100      0        2     1.35       0.31
001010000      0        2     1.35       0.31
001010001      0        1     0.26       2.17
010000000      0        1     7.39       5.52
010000001      0        1     1.35       0.09
010001011      0        1     0.05      17.84
011100100      0        1     0.05      17.84
100000000      0        5     7.39       0.77
101000000      0        1     1.35       0.09
101000100      0        1     0.26       2.17
101000110      0        1     0.05      17.84
101001001      0        1     0.05      17.84
101001110      0        2     0.01     372.42
101010001      0        1     0.05      17.84
111011111      0        1     0.00    6541.79
111100111      0        1     0.00    1710.73
00000001.      1        8     0.19     312.53
0000001..      2       13     0.29     548.50
000000...      3        1    55.21      53.23
000001...      3       16     2.42      76.27
000010...      3        9     2.42      17.91
000011...      3        3     0.43      15.25
000100...      3        7     2.42       8.68
000101...      3        1     0.43       0.75
001000...      3        6     2.42       5.30
001011...      3        1     0.08      10.89
010000...      3        7     2.42       8.68
010001...      3        3     0.43      15.25
010101...      3        1     0.08      10.89
010110...      3        1     0.08      10.89
011000...      3        2     0.43       5.69
100000...      3       11     2.42      30.45
100001...      3        2     0.43       5.69
100010...      3        1     0.43       0.75
100110...      3        1     0.08      10.89
101010...      3        1     0.08      10.89
101100...      3        1     0.08      10.89
110111...      3        1     0.00     377.39
111000...      3        1     0.08      10.89
111111...      3        1     0.00    2022.54

Chi-square statistic = 12869.32 
Number of bootstrap samples = 1000
P-value = 0.003
Quantiles of bootstrapped statistics:
    0%    25%    50%    75%   100% 
   301    535    662    946 104225 
Estimate of c-hat = 12.77 
#####################################

Hope you don’t mind me piggybacking on your post, but it seems that we are having very similar issues! I don't have a ton of experience with occupancy models, and would also be grateful for any insights into these high c-hat values.

Cheers,
Carl

John C

Jul 20, 2021, 12:14:21 PM
to unmarked
Hello, 

Carl, your case looks pretty consistent with un-modeled heterogeneity to me. The biggest issue seems to be the sites with 6-8 detections, which appear almost impossibly unlikely under the fitted model. My guess is that considering more detection covariates (site- or observation-level) based on the characteristics of those locations, random effects (see the thread at https://groups.google.com/g/unmarked/c/u0P__59pU_k/m/Z6z_zgZLAAAJ for a patch that enables this), or both might help. There is certainly room to consider several detection covariates given the sample size here. If I'm reading correctly, the fact that 0000001.. is the observed detection history at all 13 of the sites sampled 7 times is also posing problems. It seems like some important factor at the 7th occasion at those sites is missing from the model?
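
To make that concrete, here is roughly what I mean (a sketch - covariate names are made up, and the random-intercept line assumes the lme4-style bar syntax from the patched version in the thread above):

```r
## Site-level covariates are allowed on the detection side of occu(),
## alongside observation-level ones:
m2 <- occu(~ scale(effort) + habitat_type    # detection: obs + site covariates
           ~ scale(elev) + scale(canopy),    # occupancy
           data = umf)

## With the patch from the linked thread applied, a grouping-level random
## intercept on detection would look something like:
m3 <- occu(~ (1 | region) + scale(effort) ~ scale(elev), data = umf)
```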

In Darin's case, the chief issue seems to be the 01 and 10 histories. Not sure what to suggest beyond the generic "try new ObsCovs or interactions". Perhaps random slopes by site for observation covariates could help. I haven't used v1.1.1, so I'm not sure which random-effects structures are presently supported (although if unmarked itself doesn't support them, ubms does). I'm also not sure this would help much with only 2 occasions. Either way, it seems reasonable to think that if very complex models do not improve fit, the issue lies with the terms being considered or some other aspect of the model structure.
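
If you go the ubms route, the same kind of model with a random detection term fits via Stan along these lines (a sketch with placeholder covariate names):

```r
library(ubms)

## stan_occu() uses the same double-right-hand-side formula style as occu(),
## but accepts lme4-style random-effect terms directly
fit <- stan_occu(~ scale(date) + (1 | region) ~ scale(forest_age) + year,
                 data = umf, chains = 3, iter = 2000)

## ubms also has its own posterior predictive MacKenzie-Bailey-style check
gof(fit)
```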

John