Modeling plan and spOccupancy capabilities

64 views
Skip to first unread message

Aaron Skinner

unread,
Jan 22, 2026, 3:32:57 PMJan 22
to spOccupancy and spAbundance users

Hi Jeff, Marc, and all

I wanted to see if you have any general recommendations for the modeling adventure I am embarking on here, including whether you’d recommend I approach certain questions outside of spOccupancy ! I’ve included the functions I’ve determined to be appropriate for the data / questions of interest, but I’d appreciate any other suggestions.

Data structure:
Data was collected in 5 ecoregions across Colombia from 2013 - 2024 by 5 different data collectors (with similar but not standardized methodologies). Some point counts were revisited several times (eg, 2013, 2017, 2024). Within each ecoregion point counts were distributed across farms so I would guess there is high autocorrelation within farms (centroids of many point counts are only 50m apart). I’ve attached a map of our study region within Colombia and point counts on an example farm.

Overall modeling plan:
Ultimately, I want to fit a multi-species model, depending on the capabilities of spOccupancy:
-Full community: We’ve observed ~600 species, although many extremely rare, but it might be possible to get estimates of species richness?
Function: sfMsPGOcc
-Minimum cutoff: If the full community is too challenging to model, could select a few hundred species that were observed above some minimum threshold (e.g., there are about 300 species observed at least 5x)
Function: sfMsPGOcc
-Reduced community: I will likely start by trying to model 10-30 specific species that are thought to provide specific ecosystem services in cattle ranching systems
Function: spMsPGOcc

With that in mind, I  wanted to ask a few specific questions regarding the capabilities of SpOccupancy:

Multi-species spatio-temporal occupancy model -
stPGOcc conducts single-species multi-season spatio-temporal occupancy model, are there any multi-species versions of this?

Biogeographic clipping -
The idea here is to separate absences from structural zeros using species range maps to delimit which survey locations are outside of the species elevational or distributional range. However, I don’t think you can have different size matrices (a ragged array) for the different species?  

Functional traits -
There are certain species-level functional traits that will influence the relationship of interest. For example, a species habitat preference (i.e., forest, grassland, etc) will influence how it responds to silvopasture, so it would make sense to include an interaction between a species’ habitat preference * silvopasture. I know this can be examined with postHocLM() , but it seems like the species traits might actually help the model fit better?

Thanks so much for any thoughts,
Aaron
PhD Candidate, University of British Columbia

Spatial_summary.pdf

Jeffrey Doser

unread,
Jan 27, 2026, 9:04:22 AMJan 27
to Aaron Skinner, spOccupancy and spAbundance users
Hi Aaron, 

Thanks for the questions, sounds like a cool project. Here are some answers to your questions and your overall modeling plan: 
  • Fitting these models with ~600 species is possible but from what you describe would present substantial computational challenges. First off, even with the smaller community of ~300 species, having only 5 (or similar) detections for many species is a very small number. Multi-species models can certainly handle rare species, but the reliability of those estimates is dependent on a variety of characteristics, and the ability of the model to fit will also depend on how many common species there are, the spatial (and temporal) pattern of the detections, and how similar those are across species. These challenges are all of course compounded when fitting a spatial model, which adds in additional parameters that the model needs to try and identify. All that to say, I would certainly suggest starting with the smaller, targeted community before scaling up to larger sets of species. 
  • For the reduced community, I would suggest still using the sfMsPGOcc function as opposed to spMsPGOcc. The spMsPGOcc function will be substantially slower, and simulations from Doser et al. 2023 showed that even under conditions that favored spMsPGOcc, the sfMsPGOcc version provided nearly identical inference and predictive performance. So, sfMsPGOcc will be faster and likely easier to estimate if any of those species are very rare. 
  • A multi-species spatio-temporal occupancy model can be fit with the function stMsPGOcc
  • A basic form of biogeographic clipping as defined in some of Jacob Socolar's work can be implemented in the svcMsPGOcc function. If you set svc.cols = 1 in that function, it is equivalent to the sfMsPGOcc function. This allows you to specify an additional matrix "range.ind" that indicates if each site is within the species range for each individual species. This can of course vary by species. If the value of range.ind is 1, that site will be used for the given species. If it is 0, it is assumed the species cannot occupy that site. You could define this matrix based on either elevational or distributional ranges. 
  • There is no way in spOccupancy to include species functional traits directly in the model. You are certainly correct that this could improve model fit, but it would require a substantial overhaul of the machinery in spOccupancy that I don't foresee happening in the near future. So, in terms of spOccupancy capabilities, the only option for assessing those patterns would be with postHocLM().
Hope that helps. 

Jeff

--
You received this message because you are subscribed to the Google Groups "spOccupancy and spAbundance users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spocc-spabund-u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/spocc-spabund-users/31977a2a-377a-495b-a095-fd529f91bcbbn%40googlegroups.com.


--
Jeffrey W. Doser, Ph.D.
Assistant Professor
Department of Forestry and Environmental Resources
North Carolina State University
Pronouns: he/him/his

Aaron Skinner

unread,
Feb 7, 2026, 1:12:32 PMFeb 7
to Jeffrey Doser, spOccupancy and spAbundance users
Hi Jeff,

Thank you for your response. I wanted to start by saying that the biogeographic clipping was a big boost in terms of both convergence and model fit! Given the longitudinal structure of the data (some point counts were revisited in several distinct time periods; e.g., 2013, 2017, 2024), I’m now considering the function svcTMsPGOcc(). I wanted to provide a bit more detail on the data structure and ask a few follow-up questions.

Additional background:

In my dataset there are 496 unique site locations and 663 site × time period combinations (depending on how a “distinct” temporal period is defined). Thus far, I have been organizing the data such that each site × time period combination is its own row, and including point count cluster as a random effect (1 | point count cluster). My thinking was that this random effect is doing a bit of double duty: there are several points sampled within a cluster (helping to account for fine-scale spatial autocorrelation within a time period), and the same locations are also revisited through time, so it allows those repeated observations to share information. I also include an explicit spatial random effect, but restrict the prior such that it accounts for autocorrelation at larger spatial scales. My primary goal is to estimate the causal impact of silvopasture implementation on the bird community. One model I am planning to run that may be relevant here includes an interaction between land cover (e.g., silvopasture, forest) and year, to evaluate whether occupancy trends through time differ by local land cover.

Questions:
1. Current approach vs explicit temporal modeling:
How would you expect this current approach (treating each site × time period as its own unit with a random effect for point count cluster) to compare with more explicitly modeling temporal structure using a spatio-temporal occupancy model?
2. Uneven temporal sampling:
Sampling is quite uneven, such that the time between “distinct” periods varies widely (some revisits occur ~4 months apart, while others are 7–8 years apart). Ideally, I would like a temporal random effect where correlation depends on how much time has passed between visits. If AR1 = TRUE, does the temporal dependence only operate between consecutive time steps (t and t−1), or does it effectively behave as a smooth function of temporal distance? In a case like this, would an unstructured temporal random effect be more appropriate?
3. Short-term revisits (~4 months):
I’m also unsure how best to treat revisits that are only a few months apart. These seem too far apart to treat as within-period replicates (which could violate closure and affect detection estimation), but also very different from revisits that occur many years later. Do you have recommendations for handling this type of intermediate temporal spacing?

Thanks very much for any thoughts or suggestions,

Aaron
He/him/his

Jeffrey Doser

unread,
Feb 20, 2026, 9:10:30 PMFeb 20
to Aaron Skinner, spOccupancy and spAbundance users
Hi Aaron, 

Responses to each of your questions below: 

1. Given the pretty small amount of temporal replication you have based on your numbers (i.e., 496 unique sites and 663 site x time period combinations) I would bet you would see extremely similar results between using your current approach and the spatio-temporal approach that is implemented in stMsPGOcc. In my opinion, it is probably not worth going through the trouble of reformatting things for stMsPGOcc and would stick with your current approach, where your random effect structure is well-defined and is likely doing an adequate job accounting for any temporal autocorrelation in the system. 
2. Setting ar1 = TRUE does not fit any sort of smooth function over temporal distance. It simply works between consecutive time steps. So, if the difference in time steps between sites is different, then using ar1 = TRUE doesn't make all too much sense. The unstructured random effect would likely be the approach to take here, particularly if you don't have a long time series at each individual site. 
3. How you define what is a period of closure and what is a separate replicate is highly dependent on the study species, so I'm not sure I have too much advice to give here. You might take a look at this paper here (and perhaps also this one) for some ideas related to the impact of closure violation and how it may change the use/interpretation of occupancy. 

Jeff

Aaron Skinner

unread,
Feb 27, 2026, 12:08:49 AM (12 days ago) Feb 27
to Jeffrey Doser, spOccupancy and spAbundance users

Hi Jeff,

 

Given that I’m working with highly mobile organisms (birds) I’ve split the samples that are 4 months apart into different sampling periods (increasing the difference between the number of sites vs number of site x time period combinations). I’ve further split the dataset by habitat: a forest dataset across 5 ecoregions & a ‘land-use gradient’ dataset just in 1 region, to try to improve identifiability.

  • Forest dataset: 263 sites vs 482 site x time period combinations
  • Land-use gradient dataset: 312 sites vs 480 site x time period combinations

You mentioned that the random effect (point count cluster) is well-defined, but I am having some issues with convergence (Rhat > 1.2).

Captura de pantalla 2026-02-26 a la(s) 20.09.51.png

Let me know if this changes your mind regarding reformatting for svcTMsPGOcc().

Thanks,

Aaron

He/him/his

Jeffrey Doser

unread,
Mar 9, 2026, 9:52:18 AM (2 days ago) Mar 9
to Aaron Skinner, spOccupancy and spAbundance users
Hi Aaron, 

This does not suggest to me that there is a major problem with this formulation of the model. Rather, you may just need to run the model longer or save more samples. If you look at the traceplot and the traceplot shows some very odd behavior (atypical of what you would expect just for a model that has yet to converge), thenperhaps the random effect could be difficult to identify for some reason. You mentioned this could be because your random effect is correlated with continuous variables you have in the model. That could certainly be the case, so it would be worthwhile to check how closely any correlations between those variables are. If they are extremely highly correlated, you may not be able to include both in the model. 

Jeff
Reply all
Reply to author
Forward
0 new messages