c-hat close to zero


Joseph Miller

Sep 8, 2021, 11:03:40 PM
to unmarked
Hello everybody,

After running a MacKenzie and Bailey goodness-of-fit test on my global and best single-season models, I get a reasonable p-value in both instances, but the estimated c-hat is around 0.002. I am wondering what might cause this underdispersion, and whether there are implications for interpreting the model estimates.

The study examines a rare minnow species that, prior to this study, had been detected only twice in the past 20 years. 49 sites were sampled and the species was detected at 9 sites. Surveys were spatial replicates of seine hauls at each site. The data contain many zeroes and only 40 detections.

> summary(umf)
unmarkedFrame Object

47 sites
Maximum number of observations per site: 45
Mean number of observations per site: 27.06
Sites with at least one detection: 9

Tabulation of y observations:
   0    1 <NA>
1232   40  843

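library(unmarked)    # occu()
library(AICcmodavg)  # mb.gof.test()

# Global model: detection covariates first, then occupancy (site) covariates
sitemodel <- occu(formula = ~ depthavg + velocity + sand + gravel + pebble +
                              cobbel + lux + Macrophytes
                            ~ Saline + Avgdepth + Conductivity,
                  data = umf)

# MacKenzie and Bailey GoF test with 1000 bootstrap samples
obs.boot <- mb.gof.test(sitemodel, nsim = 1000)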

> print(obs.boot, digits.vals = 2, digits.chisq = 2)

MacKenzie and Bailey goodness-of-fit for single-season occupancy model

Pearson chi-square table:

                                              Cohort Observed Expected   Chi-square
010100110100011100100000000000000000000101000      0        1     0.00 641793093.97
000000000000000000000000000000000000000......      1        1     1.00         0.00
000000000000000000000000000000000000.........      2        1     1.00         0.00
000010000000000000000101100000000000.........      2        1     0.00     32740.68
00000000000000000000000000000000000..........      3        1     1.00         0.00
00000000010101000010001100000000000..........      3        1     0.00     46120.00
0000000000000000000000000000000000...........      4        2     2.00         0.00
011111101011000001000000000000000............      5        1     0.00  18542915.68
0000000000000000000000000000000..............      6        1     1.00         0.00
000000000000000000000000000000...............      7        5     5.00         0.00
00000000000000000000000000000................      8        3     3.00         0.00
0000000000000000000000000000.................      9        3     3.00         0.00
000000000000000000000000000..................     10        1     1.00         0.00
00000000000000000000000000...................     11        2     2.00         0.00
0000000000000000000000000....................     12        3     3.00         0.00
000000000000000000000000.....................     13        3     3.47         0.06
000000000001000000000000.....................     13        1     0.06        14.81
000000000100100000000000.....................     13        1     0.00       256.63
000001000000000010000000.....................     13        1     0.00       612.58
001000000000000010000000.....................     13        1     0.00       355.47
00000000000000000000000......................     14        3     2.75         0.02
0000000000000000000000.......................     15        2     1.59         0.10
000000000000000000000........................     16        3     3.00         0.00
00000000000000000000.........................     17        3     3.00         0.00
100100000000000000...........................     18        1     0.00      2273.61
00000000000000000............................     19        1     1.00         0.00

Chi-square statistic = 660418393
Number of bootstrap samples = 1000
P-value = 0.652

Quantiles of bootstrapped statistics:
     0%     25%     50%     75%    100%
9.1e+05 3.3e+08 2.1e+09 1.9e+10 3.3e+13

Estimate of c-hat = 0


Any thoughts would be greatly appreciated.

Thanks,

- Joe



Marc J. Mazerolle

Sep 9, 2021, 9:19:20 AM
to unma...@googlegroups.com
Hi Joe,

My first observation, based on the information you provided, is that
with a total of 47 sites, only 9 of which had at least one detection,
your model is too complex for the data at hand. Do you have any
estimates that are unusually large with large SEs, which would indicate
problems, or any warnings from unmarked?

Given that you have so many visits (> 27 per site on average), you end
up with a very large number of unique detection histories. As a result,
some of the observed detection histories have extremely low expected
frequencies (i.e., 0.00 when rounded to 2 decimals). The chi-square
contribution of detection history i is computed as
chi[i] <- ((observed.freq[i] - expected.freq[i])^2) / expected.freq[i]

If you have very low expected frequencies, then dividing by these low
expected frequencies will yield very large chi-square values.
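
To make the effect of the denominator concrete, here is a minimal R sketch of the formula above. The function name chisq_term and the expected frequencies are illustrative assumptions, not values extracted from the fitted model; the only link to the thread is that an expected frequency of 0.06 with one observed site gives a term close to the 14.81 printed in the table (the printed 0.06 is itself rounded, so the match is approximate).

```r
# Pearson contribution of a single detection history:
# (observed - expected)^2 / expected
chisq_term <- function(observed, expected) {
  (observed - expected)^2 / expected
}

# One site observed with a history whose expected frequency shrinks
# toward zero (illustrative values, not taken from the fitted model).
expected <- c(1, 0.06, 1e-3, 1e-6, 1e-9)
terms <- chisq_term(observed = 1, expected = expected)

print(data.frame(expected = expected, chisq = terms))
```

With the observed count fixed at 1, the term grows roughly like 1 / expected, so a single history with an expected frequency near 1e-9 is enough to push the overall statistic into the hundreds of millions.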

Thus, the very high chi-squares you are getting are the result of
having a relatively small number of sites and a very large number of
visits. I'd start by simplifying the model and maybe consider
truncating the visits to a smaller period during which changes in
occupancy are less likely. Otherwise, you might want to pool some
visits together (i.e., grouping seine hauls) to reduce the number of
columns - this will also increase the detection probability of your
rare species.
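
One possible way to pool seine hauls, sketched in R under stated assumptions: the detection data are a sites x hauls matrix of 0/1/NA, consecutive hauls are grouped into blocks of k, and a block counts as a detection if any haul in it detected the species. The helper pool_visits and the toy matrix are hypothetical; they are not part of unmarked.

```r
# Pool consecutive columns of a 0/1/NA detection matrix into blocks of k.
# A pooled cell is 1 if any haul in the block detected the species,
# 0 if none did, and NA only if every haul in the block is missing.
pool_visits <- function(y, k) {
  blocks <- ceiling(seq_len(ncol(y)) / k)  # block index of each column
  pooled <- sapply(unique(blocks), function(b) {
    sub <- y[, blocks == b, drop = FALSE]
    apply(sub, 1, function(h) {
      if (all(is.na(h))) NA_integer_ else as.integer(any(h == 1, na.rm = TRUE))
    })
  })
  matrix(pooled, nrow = nrow(y))
}

# Toy data: 3 sites, up to 8 hauls, trailing NAs for unsampled hauls
y <- rbind(
  c(0, 1, 0, 0, 0, 0, NA, NA),
  c(0, 0, 0, 0, NA, NA, NA, NA),
  c(0, 0, 0, 1, 1, 0, 0, NA)
)
y_pooled <- pool_visits(y, k = 3)
print(y_pooled)
```

The pooled matrix could then be passed as the y argument of unmarkedFrameOccu(). Note that a block with some missing hauls is still scored 0 or 1 here, so blocks can differ in effort, and any observation-level covariates would need to be summarized per block in the same way.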

Best,

Marc
--
____________________________________
Marc J. Mazerolle
Associate Professor and Director of the bachelor's program in natural and managed environments
Département des sciences du bois et de la forêt
2405 rue de la Terrasse
Université Laval
Québec, Québec G1V 0A6, Canada
Tel: (418) 656-2131 ext. 407120
Email: marc.ma...@sbf.ulaval.ca

Please note that I am working remotely. The best way to reach me is by email.


Joseph Miller

Sep 9, 2021, 11:29:27 AM
to unmarked
Marc,

Thank you for your input.

That's a good point about the model being too complex. The estimates and SEs aren't extremely high, but in that overparameterized model the site covariate SEs are relatively high.

My better models include only 2-3 detection covariates and 1-2 site covariates, all of which seem to have reasonable estimates and SEs.

I was thinking the number of unique detection histories might be a source of the problem. I like the idea of pooling visits together, and I'm definitely going to do that.

To aid my understanding: if my better (and less complex) models provide reasonable estimates and SEs and have a passing GoF p-value, but have high chi-squares and a c-hat close to zero, can there still be some confidence in these models?

Thank you,

- Joe

Marc J. Mazerolle

Sep 9, 2021, 12:35:48 PM
to unma...@googlegroups.com
Hi Joe,

I would still be wary of using the model if c-hat is << 1.
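
For context on how a passing p-value and a c-hat near zero can coexist: as far as I understand the AICcmodavg documentation, mb.gof.test computes c-hat as the observed chi-square divided by the mean of the bootstrapped statistics, whereas the p-value depends only on how many bootstrapped statistics exceed the observed one. The sketch below uses only the figures printed in the output earlier in the thread; the variable names are illustrative, and the result is a bound, not the exact estimate.

```r
chisq_obs <- 660418393  # "Chi-square statistic" in the printed output
nsim      <- 1000       # number of bootstrap samples
boot_max  <- 3.3e13     # 100% quantile of the bootstrapped statistics
boot_med  <- 2.1e9      # 50% quantile of the bootstrapped statistics

# The statistics are non-negative, so the single largest draw alone
# puts a floor under the bootstrap mean.
mean_floor <- boot_max / nsim

# c-hat = observed / mean(bootstrap), so the floor on the mean
# gives a ceiling on c-hat.
chat_ceiling <- chisq_obs / mean_floor

# For contrast: the observed statistic relative to the bootstrap median
ratio_to_median <- chisq_obs / boot_med

print(c(chat_ceiling = chat_ceiling, ratio_to_median = ratio_to_median))
```

Under these assumptions c-hat cannot exceed about 0.02, even though the observed statistic sits well inside the bulk of the bootstrap distribution. That pattern suggests the tiny c-hat reflects a bootstrap mean dominated by a few extreme draws rather than genuine underdispersion.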

Marc

Joseph Miller

Sep 9, 2021, 6:22:57 PM
to unmarked
Thanks Marc, I appreciate it.

- Joe
