Detectability random effects for data collector with no replicate surveys

Aaron Skinner

Mar 26, 2026, 12:28:58 AM

to spOccupancy and spAbundance users

Hi Jeff & all,

I have 10 data collectors contributing to a multi-species occupancy model, and I’ve included data collector as a random intercept in the detection model. One data collector did not conduct repeat surveys at each site (all replicates are NA after the first visit), so there is effectively no within-site replication to inform detectability for that observer. My expectation was that the detection intercepts for this data collector would shrink strongly toward the hypermean, but that is not what I see empirically. I extracted species × data collector random effects from Occ_mod$alpha.star.samples using the function below:
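# Function to create a table of random-effect posterior means
library(stringr)  # str_split_i()
library(tibble)   # tibble()

create.re.tbl <- function(re.samples) {
  # Posterior mean of each random effect (one column per effect)
  Intercept <- colMeans(re.samples)
  # Column names are assumed to follow "Group-Id-Species"
  Group <- str_split_i(names(Intercept), "-", 1)
  Id <- str_split_i(names(Intercept), "-", 2)
  Species <- str_split_i(names(Intercept), "-", 3)
  # Join together as a tibble
  tibble(Group, Id, Species, Intercept)
}
Detection_re <- create.re.tbl(Occ_mod$alpha.star.samples)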

This produces a tibble with 1,000 rows (100 species × 10 data collectors). When I plot these values, the data collector with no replicate surveys (collector_team_1) shows a clear shift to lower detectability (peak left of zero), rather than shrinking toward the global mean. In contrast, summary(Occ_mod) reports a global mean of 0.144 (SD = 0.022).
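One way to put numbers on what the plot shows is to summarise the posterior means by collector. The sketch below uses simulated values as a stand-in for Detection_re (the real table comes from the fitted model). The column names Id and Intercept follow the function above; the object name re_tbl, the simulated spread, and the downward shift for collector_team_1 are all invented for illustration.

```r
# Simulated stand-in for Detection_re: 100 species x 10 collectors.
# All values are invented for illustration; replace with the real table.
set.seed(1)
n_species <- 100
collectors <- paste0("collector_team_", 1:10)
re_tbl <- data.frame(
  Id = rep(collectors, each = n_species),
  Species = rep(paste0("sp", seq_len(n_species)), times = length(collectors)),
  Intercept = rnorm(n_species * length(collectors), mean = 0, sd = 0.3)
)

# Hypothetical downward shift for the first collector
is_team1 <- re_tbl$Id == "collector_team_1"
re_tbl$Intercept[is_team1] <- re_tbl$Intercept[is_team1] - 0.5

# Mean and SD of the species-level posterior means within each collector
coll_mean <- tapply(re_tbl$Intercept, re_tbl$Id, mean)
coll_sd <- tapply(re_tbl$Intercept, re_tbl$Id, sd)
by_collector <- data.frame(
  Id = names(coll_mean),
  mean = as.numeric(coll_mean),
  sd = as.numeric(coll_sd)
)

# Collectors sorted from lowest to highest average deviation
print(by_collector[order(by_collector$mean), ])
```

A collector whose mean sits well below zero corresponds to a peak left of zero in the density plot, and a small SD corresponds to a narrow peak.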

A few questions:

How should I expect the detection model to behave for a data collector with no replicate surveys? Is a systematic deviation (rather than shrinkage to the mean) expected in this case?
Would you recommend excluding this data collector? They also used a different protocol (25 m vs 50 m radius, and no survey duration recorded).
The global mean and standard deviation (0.144, 0.022) from summary(Occ_mod) don't match the plot produced by my post-hoc attempt using Occ_mod$alpha.star.samples. I assume this is because the global mean reported by summary(Occ_mod) is informed by differing amounts of data from each species × data collector combination (i.e. partial pooling), whereas my post-hoc summary treats all species × data collector combinations equally. Am I on the right track here?
I also see very narrow peaks around ~0 for some collectors (e.g., 3, 9, 10). Do you have any thoughts on what might be causing this?

Thanks very much for any thoughts,
Aaron

[Attachment: screenshot, Captura de pantalla 2026-03-25 a la(s) 20.17.34.png]

gilles colling

Mar 26, 2026, 4:40:37 AM

to Aaron Skinner, spOccupancy and spAbundance users

Hi Aaron,

I’ve been working with something very similar recently, so just my 2 cents. I think the leftward shift for collector_team_1 comes from the model not being able to properly partition the variance. Because that observer used a 25 m radius instead of 50 m (~4× area difference), the systematic detection difference gets absorbed into the observer random intercept. With no repeat visits there’s basically no information to estimate detectability for that collector, so it doesn’t really get pulled back toward the mean.

Given the protocol difference and lack of replication, I’d probably consider dropping that collector since detection is effectively unidentifiable there.

On the global mean: summary() reports the community-level hyperparameter, while the alpha.star values are zero-centered deviations within the grouping factor, so they’re not the same thing. Your post-hoc summary also treats all species × collector combinations equally, which the hierarchical model doesn’t.

The narrow peaks for collectors 3, 9, 10 are likely just standard shrinkage: weak data with no obvious bias letting the prior dominate.
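The shrinkage described here can be illustrated with a generic normal-normal random-intercept model. This is a simplification and not the spOccupancy sampler: it assumes a zero-centred prior with SD tau and observation noise with SD sigma, and every number below is invented. Under those assumptions, the posterior mean of a group effect is its raw mean multiplied by a weight that moves toward zero as the amount of data for the group shrinks.

```r
# Posterior mean of a zero-centred random intercept in a normal-normal model:
#   effect_k ~ Normal(0, tau^2)
#   ybar_k | effect_k ~ Normal(effect_k, sigma^2 / n_k)
# which gives E[effect_k | ybar_k] = w_k * ybar_k,
# with w_k = tau^2 / (tau^2 + sigma^2 / n_k).
shrink_weight <- function(n, tau, sigma) {
  tau^2 / (tau^2 + sigma^2 / n)
}

tau <- 0.3                 # assumed SD of the random intercepts
sigma <- 1.5               # assumed SD of a single observation
n <- c(2, 10, 100, 1000)   # hypothetical amounts of data per group
ybar <- 0.4                # same raw group mean in every case, for comparison

w <- shrink_weight(n, tau, sigma)
post_mean <- w * ybar
print(data.frame(n, weight = round(w, 3), post_mean = round(post_mean, 3)))
```

Groups with little data get a weight near zero, so their estimates sit close to zero with little spread, which would be consistent with narrow peaks around 0. The sketch only captures the pull of the prior; it says nothing about a systematic protocol difference.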

Cheers, Gilles


Aaron Skinner

Mar 27, 2026, 12:59:51 PM

to gilles colling, spOccupancy and spAbundance users

Hi Gilles,

Thank you for your thoughts! I sort of see what you're getting at, but I'll ask a few follow-ups. My intuition is that the 25 vs 50 m radius would increase detection for this observer, not decrease it. But more importantly, without any replicates it seems that occupancy and detectability are completely unidentifiable, and thus I would've guessed that the model would assume the global mean for this data collector. I agree that dropping this collector probably makes the most sense.
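The identifiability intuition can be checked for the simplest case: constant occupancy psi, constant detection p, and no covariates. That setup is an assumption made for this sketch and is not the fitted model; the function name and all simulated values are hypothetical. With one visit per site, the data depend on psi and p only through their product, so different pairs with the same product give identical likelihoods.

```r
# Single-visit likelihood with constant occupancy (psi) and detection (p):
# a site yields a detection with probability psi * p.
loglik_single_visit <- function(psi, p, y) {
  sum(dbinom(y, size = 1, prob = psi * p, log = TRUE))
}

# Simulated single-visit detections (values invented for illustration)
set.seed(42)
n_sites <- 500
z <- rbinom(n_sites, 1, 0.8)       # latent occupancy state
y <- rbinom(n_sites, 1, z * 0.5)   # one visit per site

# Three (psi, p) pairs that share the product 0.4
ll_a <- loglik_single_visit(0.8, 0.5, y)
ll_b <- loglik_single_visit(0.5, 0.8, y)
ll_c <- loglik_single_visit(0.4, 1.0, y)
print(c(ll_a, ll_b, ll_c))  # the three log-likelihoods coincide
```

Covariates that enter only the occupancy part or only the detection part can break this symmetry, but only under additional assumptions.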

If alpha.star values are zero-centered deviations then I think this should be the correct way to generate the plot I had in mind, where the black curve is the global mean + SD, although you're correct that this still treats all species × collector combinations equally.

[Attachment: screenshot of the plot, Captura de pantalla 2026-03-27 a la(s) 09.57.17.png]

Thanks!

Aaron

He/him/his

Jeffrey Doser

Apr 1, 2026, 10:03:28 AM

to Aaron Skinner, gilles colling, spOccupancy and spAbundance users

Hi Aaron and all,

I'll just add a couple of thoughts here as well.

I agree that you should drop the data from this observer. The different area being sampled (and the lack of information on sampling effort) complicates comparisons between that observer and the others, and this is compounded by the fact that there is only one replicate for that observer.
Assuming you also have continuous covariates in the detection model that are distinct from those in the occupancy model, detection probability can be identified from single-visit data in an occupancy model; it just requires some fairly strict assumptions. See this paper here, and some of Sara Stoudt's recent work. This is likely why there is some information to inform the estimate for this observer. Whether it's reliable or not is a whole different story.
It is correct that the black line will not give the same values as those reported in the community-level summary, given the differences in sample sizes.

Jeff

--

Jeffrey W. Doser, Ph.D.

Assistant Professor

Department of Forestry and Environmental Resources

North Carolina State University

Statistical Ecology and Forest Science Lab

Pronouns: he/him/his

gilles colling

Apr 1, 2026, 12:02:20 PM

to Jeffrey Doser, Aaron Skinner, spocc-spa...@googlegroups.com

Hi Jeff, Aaron,

I just realized my wording in the previous email was a bit sloppy. When I said there was “basically no information” for that collector, I didn’t mean that the model has zero information in the statistical sense. Rather, there is essentially no direct information to estimate detectability for that observer in a reliable way, since there are no repeat visits and the protocol differs from the others. So any estimate for that collector is likely being informed mostly through covariates and hierarchical borrowing rather than observer-specific replication.

Aaron, regarding the plot you have in mind, I think the logic is mostly right but with one small clarification. My understanding is that alpha.star corresponds to the species-specific deviations for the collector random effect. In other words, the detection intercept can be written as

logit(p_{i j s}) = X_{i j} β + α_{s,k}

with

α_{s,k} = μ_α + α*_{s,k}

where μ_α is the community-level mean random effect and α*_{s,k} (alpha.star) are the zero-centered deviations for each species × collector combination.

So when you plot the distribution of the alpha.star values, you’re effectively looking at the distribution of those deviations around zero. If you want the curve corresponding to the global detection intercept, it would be μ_α + α*_{s,k}. In that sense, the black curve you’re describing would correspond to shifting the deviation distribution by the community-level mean.

As I see it, the main difference from the summary() output is that the hyperparameters reported there are estimated within the hierarchical model and are informed by different amounts of data across species × collector combinations, whereas the post-hoc plot treats each combination equally.
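The shift described above can be applied draw by draw rather than to posterior means, following the decomposition as written in this message. The sketch below uses simulated matrices as stand-ins: mu_draws plays the role of the community-level mean draws and star_draws the role of Occ_mod$alpha.star.samples. Which element of the fitted object holds the community-level draws, and how its columns are ordered, are assumptions to check against the spOccupancy documentation. The mean of 0.144 and SD of 0.022 simply echo the summary() figures quoted in the thread and are used only as placeholders.

```r
# Simulated posterior draws (values invented for illustration)
set.seed(7)
n_draws <- 2000
n_effects <- 6  # e.g. 2 species x 3 collectors in this toy example
mu_draws <- rnorm(n_draws, mean = 0.144, sd = 0.022)  # community-level mean
star_draws <- matrix(rnorm(n_draws * n_effects, mean = 0, sd = 0.3),
                     nrow = n_draws, ncol = n_effects)
colnames(star_draws) <- paste0("effect_", seq_len(n_effects))

# Add the community-level mean to every deviation within the same draw,
# so uncertainty in the mean and in the deviations is carried through together.
total_draws <- sweep(star_draws, 1, mu_draws, FUN = "+")

# Posterior mean and 95% interval of each shifted effect
total_summary <- t(apply(total_draws, 2, function(x) {
  c(mean = mean(x), quantile(x, c(0.025, 0.975)))
}))
print(round(total_summary, 3))
```

Working with whole draws keeps the interval widths honest; adding a single point estimate of the mean to the posterior means of the deviations would ignore uncertainty in the mean.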

Cheers,

Gilles
