I have 10 data collectors contributing to a multi-species occupancy model, and I’ve included data collector as a random intercept in the detection model. One data collector did not conduct repeat surveys at each site (all replicates are NA after the first visit), so there is effectively no within-site replication to inform detectability for that observer. My expectation was that the detection intercepts for this data collector would shrink strongly toward the hypermean, but that is not what I see empirically. I extracted species × data collector random effects from Occ_mod$alpha.star.samples using the function below:
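# Function to create random effect table
library(stringr)  # str_split_i()
library(tibble)   # tibble()

create.re.tbl <- function(re.samples){
  # Posterior mean of each random effect (one column per species x data collector)
  Intercept <- colMeans(re.samples)
  # Split the column names on "-" into group, collector id and species
  Group <- str_split_i(names(Intercept), "-", 1)
  Id <- str_split_i(names(Intercept), "-", 2)
  Species <- str_split_i(names(Intercept), "-", 3)
  # Join together as tibble
  tibble(Group, Id, Species, Intercept)
}
Detection_re <- create.re.tbl(Occ_mod$alpha.star.samples)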
This produces a tibble with 1,000 rows (100 species × 10 data collectors). When I plot these values, the data collector with no replicate surveys (collector_team_1) shows a clear shift to lower detectability (peak left of zero), rather than shrinking toward the global mean. In contrast, summary(Occ_mod) reports a global mean of 0.144 (SD = 0.022).
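For reference, here is a minimal sketch of the per-collector summary behind that comparison. It uses simulated values in place of Occ_mod$alpha.star.samples, and the "group-id-species" column-name pattern is an assumption based on how the function above splits the names. The object names (fake.samples, re.tbl, coll.mean, coll.sd) are made up for illustration.

```r
# Stand-in for Occ_mod$alpha.star.samples: 200 draws x (3 species x 2 collectors).
# All values are simulated; column names follow the assumed "group-id-species" pattern.
set.seed(1)
ids <- c("team_1", "team_2")
species <- c("sp1", "sp2", "sp3")
cols <- as.vector(outer(ids, species,
                        function(i, s) paste("collector", i, s, sep = "-")))
fake.samples <- matrix(rnorm(200 * length(cols)), nrow = 200,
                       dimnames = list(NULL, cols))

# Posterior mean per species x collector, then split the names into their parts
post.mean <- colMeans(fake.samples)
parts <- do.call(rbind, strsplit(names(post.mean), "-", fixed = TRUE))
re.tbl <- data.frame(Group = parts[, 1], Id = parts[, 2], Species = parts[, 3],
                     Intercept = unname(post.mean))

# Centre and spread of the species-level posterior means within each collector
coll.mean <- tapply(re.tbl$Intercept, re.tbl$Id, mean)
coll.sd <- tapply(re.tbl$Intercept, re.tbl$Id, sd)
print(data.frame(Id = names(coll.mean), mean = as.vector(coll.mean),
                 sd = as.vector(coll.sd)))
```

Note that this gives every species × collector combination equal weight, regardless of how much data informs it.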
A few questions:
How should I expect the detection model to behave for a data collector with no replicate surveys? Is a systematic deviation (rather than shrinkage to the mean) expected in this case?
Would you recommend excluding this data collector? They also used a different protocol (25 m vs 50 m radius, and no survey duration recorded).
The global mean and standard deviation (0.144, 0.022) from summary(Occ_mod) don't match the plot produced by my post-hoc attempt using Occ_mod$alpha.star.samples. I assume this is because the global mean reported by summary(Occ_mod) is informed by differing amounts of data from each species × data collector combination (i.e. partial pooling), whereas my post-hoc summary treats all species × data collector combinations equally. Am I on the right track here?
I also see very narrow peaks around ~0 for some collectors (e.g., 3, 9, 10). Do you have any thoughts on what might be causing this?
Thanks very much for any thoughts,
Aaron
--
You received this message because you are subscribed to the Google Groups "spOccupancy and spAbundance users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spocc-spabund-u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/spocc-spabund-users/306ca9a1-21ce-44ed-a576-8d5eca2c5535n%40googlegroups.com.
Hi Jeff, Aaron,
I just realized my wording in the previous email was a bit sloppy. When I said there was “basically no information” for that collector, I didn’t mean that the model has zero information in the statistical sense. Rather, there is essentially no direct information to estimate detectability for that observer in a reliable way, since there are no repeat visits and the protocol differs from the others. So any estimate for that collector is likely being informed mostly through covariates and hierarchical borrowing rather than observer-specific replication.
Aaron, regarding the plot you have in mind, I think the logic is mostly right but with one small clarification. My understanding is that alpha.star corresponds to the species-specific deviations for the collector random effect. In other words, the detection intercept can be written as
logit(p_{i j s}) = X_{i j} β + α_{s,k}
with
α_{s,k} = μ_α + α*_{s,k}
where μ_α is the community-level mean random effect and α*_{s,k} (alpha.star) are the zero-centered deviations for each species × collector combination.
So when you plot the distribution of the alpha.star values, you’re effectively looking at the distribution of those deviations around zero. If you want the curve corresponding to the global detection intercept, it would be μ_α + α*_{s,k}. In that sense, the black curve you’re describing would correspond to shifting the deviation distribution by the community-level mean.
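As a rough sketch of that shift, following the notation above: the values of mu.alpha and alpha.star below are made up for illustration and are not taken from Occ_mod.

```r
# Hypothetical values, chosen only for illustration (not output from Occ_mod)
set.seed(42)
mu.alpha <- 0.2                                 # assumed community-level mean (logit scale)
alpha.star <- rnorm(1000, mean = 0, sd = 0.5)   # simulated zero-centered deviations

# Shift the deviations by the community-level mean
alpha <- mu.alpha + alpha.star

# The shift moves the centre of the distribution but leaves its spread unchanged
print(c(mean.dev = mean(alpha.star), mean.shifted = mean(alpha)))
print(c(sd.dev = sd(alpha.star), sd.shifted = sd(alpha)))

# Back-transform to the probability scale if that is easier to interpret
p <- plogis(alpha)
```

With real output, the same shift would be applied to the posterior draws rather than to simulated values.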
As I see it, the main difference from the summary() output is that the hyperparameters reported there are estimated within the hierarchical model and are informed by different amounts of data across species × collector combinations, whereas the post-hoc plot treats each combination equally.
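To illustrate the point about unequal amounts of information, here is a small sketch using a simple normal-normal model as an analogue. It is not the occupancy model itself, and all numbers are made up. In that simpler setting the posterior mean of a zero-centered group effect is the raw estimate multiplied by a shrinkage weight that grows with the amount of data.

```r
# Simple normal-normal analogue (NOT the occupancy model); all values are made up
tau2   <- 0.25                 # variance of the group effects
sigma2 <- 1                    # residual variance of a single observation
n      <- c(1, 5, 50, 500)     # observations available for each group
raw    <- rep(0.8, length(n))  # same raw (unpooled) estimate for every group

# Shrinkage weight: share of the raw estimate retained in the posterior mean
w <- tau2 / (tau2 + sigma2 / n)

# Posterior mean of each zero-centered group effect
post.mean <- w * raw
print(data.frame(n, weight = round(w, 3), post.mean = round(post.mean, 3)))
```

In this toy setting the poorly informed groups end up close to zero while the well-informed ones stay near their raw estimate, so an equally weighted average of the posterior means need not match the hyperparameters estimated inside the model.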
Cheers,
Gilles
On 01.04.2026, at 16:03, Jeffrey Doser <jwd...@ncsu.edu> wrote:
Hi Aaron and all,
Just adding a couple of my own thoughts here as well.
- I agree that you should drop the data from this observer. The different sampling area (and the lack of information on sampling effort) makes it hard to compare data from that observer with data from the others, and this is compounded by the fact that there is only one replicate for that observer.
- Assuming you also have continuous covariates in the detection model that are distinct from those in the occupancy model, you are able to identify detection probability from occupancy models with single-visit data; it just requires some fairly strict assumptions. See this paper here, and some of Sara Stoudt's recent work. This is likely why there is some information to inform the estimate for this observer. Whether it's reliable or not is a whole different story.
- It is correct that the black line will not give the same values as those reported in the community-level summary given the differences in sample sizes.
Jeff
On Fri, Mar 27, 2026 at 12:59 PM Aaron Skinner <skinnera...@gmail.com> wrote:
Hi Gilles,
Thank you for your thoughts! I sort of see what you're getting at, but I'll ask a few follow-ups. My intuition is that the 25 vs 50 m radius would increase detection for this observer, not decrease it. But more importantly, without any replicates it seems that occupancy and detectability are completely unidentifiable, and thus I would've guessed that the model would assume the global mean for this data collector. I agree that dropping this collector probably makes the most sense.
If alpha.star values are zero-centered deviations, then I think this should be the correct way to generate the plot I had in mind, where the black curve is the global mean + SD, although you're correct that this still treats all species × collector combinations equally.