Occupancy models with latent variables in both occupancy and detection submodels

Carlos

Jun 18, 2025, 7:12:18 PM

to hmecology: Hierarchical Modeling in Ecology

In multi-species occupancy models, it is common to include latent variables in the occupancy submodel to account for heterogeneity not explained by observed environmental covariates — such as spatial autocorrelation or unknown ecological factors, including residual correlations between species. This approach helps improve occupancy estimates and reduce bias in complex environments, and is currently implemented in R packages such as HMSC and spOccupancy.
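For reference, the latent-factor occupancy submodel is usually written roughly as below. This is a generic sketch: the symbols are chosen here for illustration, and HMSC and spOccupancy differ in link functions, priors, and spatial structure. Here i indexes species, j sites, w_jk are site-level latent factors (spatially structured in spatial factor models), and λ_ik are species-specific loadings.

```latex
\begin{aligned}
z_{ij} &\sim \mathrm{Bernoulli}(\psi_{ij}), \\
\operatorname{logit}(\psi_{ij}) &= \mathbf{x}_j^\top \boldsymbol{\beta}_i + \sum_{k=1}^{K} \lambda_{ik}\, w_{jk}, \qquad w_{jk} \sim \mathrm{Normal}(0, 1).
\end{aligned}
```

Under this form, the residual covariance between species i and i' on the link scale is the sum over k of λ_ik λ_i'k, which is how the factors capture residual correlation between species.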

However, we can also expect detection to be influenced by unobserved environmental heterogeneity. For example, forest height might affect the probability that a bird flies at mist-net height, but forest height is not always available as a covariate. If the effect is strong enough, it could bias occupancy estimates indirectly through the detection process. Thus, including latent variables in the detection submodel could also be useful.
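A small simulation can illustrate the detection-bias concern. This is a single-species sketch with hypothetical values (3 surveys, true ψ = 0.6, site-level detection varying on the logit scale with SD 2), and the function names are made up for illustration. Detection varies among sites, but the fitted model assumes constant ψ and p and is fitted by maximum likelihood on a grid. Occupied sites with low p are mostly missed, so the constant-p fit tends to return ψ well below the truth.

```python
import numpy as np
from math import comb


def simulate_counts(n_sites, n_surveys, psi, logit_p_mean, logit_p_sd, rng):
    """Detection counts when detection varies among sites (hypothetical values)."""
    z = rng.random(n_sites) < psi                       # true occupancy state
    logit_p = rng.normal(logit_p_mean, logit_p_sd, n_sites)
    p = 1.0 / (1.0 + np.exp(-logit_p))                  # site-specific detection
    return rng.binomial(n_surveys, p * z)               # detections out of n_surveys


def fit_constant_model(y, n_surveys, grid):
    """Grid-search MLE for a model with constant psi and constant p."""
    counts = np.bincount(y, minlength=n_surveys + 1)    # number of sites with 0..J detections
    psi = grid[:, None]
    p = grid[None, :]
    loglik = np.zeros((grid.size, grid.size))
    for k in range(n_surveys + 1):
        binom_k = comb(n_surveys, k) * p**k * (1 - p) ** (n_surveys - k)
        prob_k = psi * binom_k + (k == 0) * (1 - psi)   # zero counts also come from unoccupied sites
        loglik += counts[k] * np.log(prob_k)
    i, j = np.unravel_index(np.argmax(loglik), loglik.shape)
    return grid[i], grid[j]


rng = np.random.default_rng(1)
true_psi = 0.6
y = simulate_counts(5000, 3, true_psi, 0.0, 2.0, rng)
grid = np.linspace(0.01, 0.99, 99)
psi_hat, p_hat = fit_constant_model(y, 3, grid)
naive = np.mean(y > 0)
print(f"true psi = {true_psi}, naive = {naive:.3f}, constant-p estimate = {psi_hat:.2f}")
```

The size of the bias depends on how heterogeneous detection is; with mild heterogeneity the constant-p estimate can be close to the truth.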

I noticed that something similar was suggested by Thomas Riecke et al. (2021), who proposed including community-level site random effects in the detection submodel. However, their approach assumes that all species respond identically to the unobserved heterogeneity, which may not be appropriate in some contexts.
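The difference between the two detection structures can be written compactly. This is a sketch in notation chosen here (t indexes surveys, v_jt are detection covariates), and it only restates the contrast described above, not Riecke et al.'s exact parameterization.

```latex
\begin{aligned}
&\text{shared site effect:} && \operatorname{logit}(p_{ijt}) = \mathbf{v}_{jt}^\top \boldsymbol{\alpha}_i + \varepsilon_j, && \varepsilon_j \sim \mathrm{Normal}(0, \sigma^2), \\
&\text{species-specific loadings:} && \operatorname{logit}(p_{ijt}) = \mathbf{v}_{jt}^\top \boldsymbol{\alpha}_i + \textstyle\sum_{k=1}^{K} \gamma_{ik}\, u_{jk}, && u_{jk} \sim \mathrm{Normal}(0, 1).
\end{aligned}
```

The shared effect is the special case K = 1 with γ_i1 = σ for every species, so at a given site all species shift by the same amount in the same direction. Free loadings let species respond with different magnitudes, or even opposite signs.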

So, my questions are:

Would it make sense, statistically, to include latent variables in both the occupancy and detection submodels?
Has this already been done in the literature?
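Statistically, the proposed structure amounts to a second set of factors and loadings. A sketch of the data-generating process might look like this. It is not the HMSC or spOccupancy implementation, and the function name, K, and all parameter values are hypothetical. When fitting, packages typically constrain the loading matrix (e.g., zeros in its upper triangle) to remove rotational and sign ambiguity. The simulation does not need that.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def simulate_two_part_lfm(n_species, n_sites, n_surveys, n_factors, rng):
    """Multi-species occupancy data with latent factors in BOTH submodels
    (hypothetical parameter values, for simulation studies only)."""
    # Occupancy: species intercepts + site factors w (sites x K) with loadings lam (species x K)
    beta0 = rng.normal(0.0, 1.0, n_species)
    w = rng.normal(0.0, 1.0, (n_sites, n_factors))
    lam = rng.normal(0.0, 0.8, (n_species, n_factors))
    psi = expit(beta0[:, None] + lam @ w.T)             # species x sites
    z = rng.binomial(1, psi)
    # Detection: separate site factors u with their own loadings gam
    alpha0 = rng.normal(-0.5, 1.0, n_species)
    u = rng.normal(0.0, 1.0, (n_sites, n_factors))
    gam = rng.normal(0.0, 0.8, (n_species, n_factors))
    p = expit(alpha0[:, None] + gam @ u.T)              # species x sites, constant across surveys
    # Detections are possible only where the species is present
    y = (rng.random((n_species, n_sites, n_surveys)) < (z * p)[:, :, None]).astype(int)
    return {"psi": psi, "z": z, "p": p, "y": y}


sim = simulate_two_part_lfm(10, 200, 4, 2, np.random.default_rng(3))
print("true occupancy:", sim["z"].mean(), "naive occupancy:", sim["y"].max(axis=2).mean())
```

Data simulated this way can be passed to whatever fitting tool is under evaluation, for example to check recovery of ψ and z when detection factors are included versus omitted.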

Best regards,
Carlos

Matthijs Hollanders

Jun 18, 2025, 7:15:47 PM

to Carlos, hmecology: Hierarchical Modeling in Ecology

Hey Carlos,

I don’t think there’s any issue including such effects in both the occupancy and detection models. I’ve simulated such models and found no issue with parameter recovery using simulation-based calibration. Including random effects in the observation model makes a lot of sense, especially since detection is often influenced by local abundance, which is highly variable between sites.
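Simulation-based calibration works like this:

1. Draw parameters from the prior.
2. Simulate data.
3. Fit the model.
4. Record the rank of the true value among the posterior draws.

If the fitting procedure is calibrated, the ranks are uniform. The sketch below uses a deliberately simple conjugate model (one detection probability with a Beta prior and binomial repeat-survey data) so the exact posterior can be sampled directly. In a real occupancy model, step 3 would be an MCMC fit. The function name and settings are illustrative.

```python
import numpy as np


def sbc_ranks(n_reps, n_draws, n_surveys, a, b, rng):
    """SBC ranks for p ~ Beta(a, b), y ~ Binomial(n_surveys, p), using the exact posterior."""
    ranks = np.empty(n_reps, dtype=int)
    for r in range(n_reps):
        p_true = rng.beta(a, b)                                 # 1. draw from the prior
        y = rng.binomial(n_surveys, p_true)                     # 2. simulate repeat-survey data
        post = rng.beta(a + y, b + n_surveys - y, n_draws)      # 3. "fit": exact conjugate posterior
        ranks[r] = np.sum(post < p_true)                        # 4. rank of the truth, 0..n_draws
    return ranks


rng = np.random.default_rng(42)
ranks = sbc_ranks(n_reps=2000, n_draws=99, n_surveys=5, a=1.0, b=1.0, rng=rng)
# With a correct posterior, ranks are uniform on 0..99; bin them and compare counts
hist, _ = np.histogram(ranks, bins=10, range=(0, 100))
print(hist)
```

A miscalibrated fit shows up as a non-flat histogram, for example a U shape when posteriors are too narrow.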

Cheers,

Matt

--
*** Three hierarchical modeling email lists ***
(1) unmarked: for questions specific to the R package unmarked
(2) SCR: for design and Bayesian or non-bayesian analysis of spatial capture-recapture
(3) HMecology (this list): for everything else, especially material covered in the books by Royle & Dorazio (2008), Kéry & Schaub (2012), Kéry & Royle (2016, 2021) and Schaub & Kéry (2022)
---
You received this message because you are subscribed to the Google Groups "hmecology: Hierarchical Modeling in Ecology" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hmecology+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/hmecology/4069307d-5ceb-4485-9db0-6c8fd8b989b6n%40googlegroups.com.

John Clare

Jun 18, 2025, 8:49:19 PM

to Carlos, hmecology: Hierarchical Modeling in Ecology

Hey Carlos,

We did that (albeit kinda buried in a broader focus on preferential sampling) here (https://doi.org/10.1111/ddi.70034). I agree with Matt that it is identifiable (at least in some circumstances) and think maybe the detection latent factors/loadings might often be better informed than those for the occupancy component. I don't have a good sense of the trade-offs that might be informed by a thorough simulation study or some of the other considerations. For example, other collaborators have mentioned the possibility that latent factors/loadings on the occupancy component alone could absorb some residual detection heterogeneity (or vice versa). This makes sense to me, but the degree of the risk is not obvious and there are surely some cases for which a simpler approach might be better.

John

Matthijs Hollanders

Jun 18, 2025, 8:57:51 PM

to John Clare, Carlos, hmecology: Hierarchical Modeling in Ecology

Hi all,

I think latent variables on occupancy are poorly identified because (in single-season models) there are no “repeat measures”, which are generally what make random effects identifiable. Spatial effects help somewhat because the spatial correlation carries some information, but even then the variance and length-scale parameters are weakly identified. If anything, random detection effects are much better identified because the repeat surveys directly inform the site-level detection probabilities. In multi-species models, you could have site-level occupancy effects shared across species, where the multiple species inform the random effect.
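The repeat-measures point can be checked numerically. Suppose each site contributes one binary outcome, such as a single z in a single-season model. Then the likelihood depends on a logit-normal site effect only through its marginal mean E[expit(μ + σX)], so (μ, σ) pairs with the same mean are indistinguishable. Now suppose J outcomes share the same site-level probability, as with repeat surveys. Then moments such as P(all J outcomes = 1) = E[p^J] enter the likelihood, and those differ. This sketch assumes a logit-normal site effect and uses Gauss-Hermite quadrature; the parameter values are arbitrary.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

NODES, WEIGHTS = hermegauss(60)
WEIGHTS = WEIGHTS / WEIGHTS.sum()          # normalized so sums give E[f(X)] for X ~ N(0, 1)


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def moment(mu, sigma, power):
    """E[expit(mu + sigma * X) ** power] for X ~ N(0, 1), by quadrature."""
    return np.sum(WEIGHTS * expit(mu + sigma * NODES) ** power)


def matching_mu(target, sigma, lo=-20.0, hi=20.0):
    """Bisection for mu such that the marginal probability equals target."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if moment(mid, sigma, 1) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


target = moment(1.0, 0.5, 1)               # small site-effect SD
mu_big = matching_mu(target, 2.0)          # much larger SD, same marginal probability
for J in (1, 2, 3):
    # J = 1 matches by construction; J > 1 separates the two (mu, sigma) pairs
    print(J, round(moment(1.0, 0.5, J), 4), round(moment(mu_big, 2.0, J), 4))
```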

Cheers,

Matt

Carlos

Jun 18, 2025, 10:23:11 PM

to hmecology: Hierarchical Modeling in Ecology

Hi Matt and John.

It's good to know that there are already simulations and publications exploring this idea. I took a quick look at the paper and will read it carefully later.

It makes sense to me that latent variables would be identifiable in multispecies models, since observations of different species at the same site effectively provide replication. This identifiability would likely be weaker in single-species models.
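The replication argument can be illustrated by simulation. When species share a site-level occupancy effect, their occupancy states become correlated across sites. That co-occurrence pattern, which single-species, single-season data lack, is what informs the variance of the shared effect. The sketch below assumes one shared logit-normal site effect and hypothetical species intercepts. It treats z as known; with real data z is only partly observed, so the sketch overstates the available information.

```python
import numpy as np


def mean_pairwise_corr(n_species, n_sites, sigma, rng):
    """Mean between-species correlation of occupancy states across sites,
    when species share one site effect with SD sigma (hypothetical setup)."""
    site_effect = rng.normal(0.0, sigma, n_sites)
    logit_psi = rng.normal(0.0, 1.0, (n_species, 1)) + site_effect[None, :]
    psi = 1.0 / (1.0 + np.exp(-logit_psi))
    z = (rng.random((n_species, n_sites)) < psi).astype(float)
    c = np.corrcoef(z)                                   # species x species
    return c[~np.eye(n_species, dtype=bool)].mean()      # average off-diagonal entry


rng = np.random.default_rng(7)
print("no shared effect:", round(mean_pairwise_corr(8, 2000, 0.0, rng), 3))
print("shared SD = 1.5: ", round(mean_pairwise_corr(8, 2000, 1.5, rng), 3))
```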

The potential confounding between the occupancy and detection processes, where latent occupancy effects absorb unmodeled detection heterogeneity (or vice versa), is worth exploring with simulations. I believe that estimates of true occupancy (Z) may be biased in that case.
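A minimal, deterministic illustration shows why this confounding is plausible and why repeat surveys help. It uses a single site with hypothetical values. An effect that raises ψ and one that raises p can give exactly the same probability of detecting the species at least once, so detected/not-detected data alone cannot separate them. The distribution of detection counts, given at least one detection, does differ. That is the information a fitted model must rely on, and whether it suffices for a given design is what a simulation study would need to show.

```python
from math import comb


def site_probs(psi, p, n_surveys):
    """P(detected at least once) and the count distribution given >= 1 detection."""
    detect_any = psi * (1 - (1 - p) ** n_surveys)
    cond = [psi * comb(n_surveys, k) * p**k * (1 - p) ** (n_surveys - k) / detect_any
            for k in range(1, n_surveys + 1)]
    return detect_any, cond


J = 3
a_any, a_cond = site_probs(0.9, 0.3, J)           # high occupancy, low detection
psi_b = a_any / (1 - (1 - 0.5) ** J)              # lower occupancy, higher detection, same P(detected)
b_any, b_cond = site_probs(psi_b, 0.5, J)
print(round(a_any, 4), round(b_any, 4))           # identical by construction
print([round(x, 3) for x in a_cond])              # counts 1..J given detection: these differ
print([round(x, 3) for x in b_cond])
```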

Best regards,
Carlos