Spatial modeling questions

Aaron Skinner

Jan 22, 2026, 3:34:47 PM

to spOccupancy and spAbundance users

Hi Jeff, Marc, and all

I recently posted in a separate thread that has some background information about my goals and data structure, including a map of our study region in Colombia and the point count locations on an example farm.

I think accounting for spatial structure will be important given the short distances between points, and because of its benefits for prediction. There could also be spatially structured unmeasured confounders that are correlated with silvopasture implementation (or tree survival), which could bias my estimate of interest. Thus, I'm looking for thoughts on how to improve the performance of these spatial models.

So far I've only fit models to try to understand the spatial process. I've experimented with models ranging from very basic occupancy models (intercept, ecoregion, and a random effect of point count cluster) to full models. I've included 3 detection covariates in all cases. The variables I've been working with for occupancy are: Ecoregion + Elevation + Precipitation + Total_edge_300m + landcover_300m + Local habitat + Survey_year + (1 | point count cluster)
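For reference, the covariate list above could be written as the occ.formula and det.formula objects passed to spPGOcc(), using the lme4-style (1 | group) syntax that spOccupancy accepts for random intercepts. This is only a sketch: the variable names are hypothetical stand-ins (spaces replaced by underscores), and the three detection covariate names are placeholders because the post does not name them.

```r
# Hypothetical occupancy formula mirroring the covariates listed above;
# (1 | point_count_cluster) is a random intercept for each point count cluster.
occ.formula <- ~ Ecoregion + Elevation + Precipitation + Total_edge_300m +
  landcover_300m + Local_habitat + Survey_year + (1 | point_count_cluster)

# Placeholder names for the three detection covariates (not given in the post).
det.formula <- ~ det_cov_1 + det_cov_2 + det_cov_3

# List the variables each formula will look up in the data.
print(all.vars(occ.formula))
print(all.vars(det.formula))
```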

Generally, I've found that the priors on phi and sigma.sq have had the biggest impact on the performance of these models. Originally, I couldn't get convergence or decent chain mixing for the spatial parameters until I changed the sigma.sq prior to a uniform distribution that puts no probability near zero (I set the minimum to 0.5). I have also experimented with the priors on phi to target autocorrelation (i.e., the effective spatial range) at scales from 500 m to 5 km (within farms) up to regional scales (e.g., 5 km to 25 km).
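The phi bounds described above follow from the effective-range relationship for the exponential covariance (effective range = 3/phi), so a longer range corresponds to a smaller phi. A minimal base-R sketch, assuming coordinates in metres: the helper phi_bounds_from_range() is hypothetical, the sigma.sq.unif upper bound of 5 is a placeholder, and only the phi.unif and sigma.sq.unif tags are taken from the priors argument of spPGOcc().

```r
# Hypothetical helper: convert a band of plausible effective ranges (metres)
# into lower/upper bounds for phi, using effective range = 3 / phi.
phi_bounds_from_range <- function(min_range, max_range) {
  stopifnot(min_range > 0, max_range > min_range)
  # The longest range gives the smallest phi, so the order flips.
  c(lower = 3 / max_range, upper = 3 / min_range)
}

# Within-farm scale (500 m to 5 km) and regional scale (5 km to 25 km)
phi_farm     <- phi_bounds_from_range(500, 5e3)   # 0.0006 to 0.006
phi_regional <- phi_bounds_from_range(5e3, 25e3)  # 0.00012 to 0.0006

# Priors list in the shape spPGOcc() expects: phi.unif = c(lower, upper).
# The lower sigma.sq bound of 0.5 is from the post; the upper bound is a placeholder.
priors <- list(
  phi.unif      = unname(phi_farm),
  sigma.sq.unif = c(0.5, 5)
)

# Sanity check: implied effective ranges in metres
print(3 / phi_farm)
```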

These changes produced much better chain mixing, with reasonable R-hat values and effective sample sizes. However, the goodness-of-fit (GOF) tests aren't particularly encouraging: roughly 1/3 of the 120 GOF tests (30 species x 4 GOF tests) have p-values under 0.1.

Some specific questions:
- I'm struggling to interpret the sigma.sq term. Phi has a nice biological interpretation via the effective spatial range (3/phi for the exponential covariance), but sigma.sq isn't as clear to me. Should sigma.sq scale with the effective spatial range? Would tighter, more regularizing priors help?
- I've noticed that phi tends toward the lower limit of its prior, so the effective spatial range approaches its upper limit. For example, if I set lower <- 3 / 50e3, the effective spatial range tends to approach 50 km for all species. Similarly, sigma.sq tends to approach the lower limit of whatever prior I use. Is this behavior expected?
- Occasionally the model seems to get stuck and just shows ‘Chain 1 Sampling…’. No error is produced, but spOccupancy never actually starts sampling, so eventually I have to shut down R and start again. What is the model telling me in this case?
- I'm not that interested in a full model selection workflow (e.g., dredging all possible submodels), but I would like to expand or pare down models to achieve better fit. Do you have suggestions for this process?
- Overall, what would you recommend as next steps?
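For reference on the first two questions: with cov.model = "exponential", the covariance between the spatial random effects at sites i and j, separated by distance d_ij, is shown below (w_i and d_ij are notation introduced here). sigma.sq sets the overall variance of the spatial effect on the logit scale of occupancy, while phi alone controls how fast correlation decays with distance. In the model definition they are separate parameters, although in practice they trade off during estimation. Setting lower <- 3 / 50e3 for phi therefore caps the effective range at 50 km.

```latex
\begin{aligned}
C(d_{ij}) &= \sigma^2 \exp(-\phi \, d_{ij}) && \text{covariance of } w_i \text{ and } w_j \\
\rho(d_{ij}) &= \frac{C(d_{ij})}{\sigma^2} = \exp(-\phi \, d_{ij}) && \text{correlation depends on } \phi \text{ only} \\
\rho\!\left(\tfrac{3}{\phi}\right) &= e^{-3} \approx 0.05 && \text{so the effective range is } 3/\phi
\end{aligned}
```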

Model specification:
spPGOcc(
  occ.formula = occ.formula,
  det.formula = det.formula,
  priors = priors,
  # exponential covariance with a nearest-neighbor Gaussian process (15 neighbors)
  cov.model = "exponential", NNGP = TRUE, n.neighbors = 15,
  # 400 batches x 25 iterations = 10,000 iterations per chain
  data = spOcc[1:4], n.burn = 2000, n.batch = 400, batch.length = 25,
  n.thin = 20, n.chains = 3, n.report = 100, verbose = TRUE
)
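To make the MCMC settings above concrete, the snippet below counts how many posterior draws the call keeps. It is a sketch assuming spPGOcc() runs n.batch x batch.length iterations per chain and retains iterations n.burn + 1, n.burn + 1 + n.thin, and so on.

```r
# MCMC bookkeeping for the settings in the spPGOcc() call above
n_batch      <- 400
batch_length <- 25
n_burn       <- 2000
n_thin       <- 20
n_chains     <- 3

n_samples <- n_batch * batch_length  # total iterations per chain
# Iterations assumed to be retained after burn-in and thinning
kept_iter <- seq(from = n_burn + 1, to = n_samples, by = n_thin)
n_kept_per_chain <- length(kept_iter)
n_kept_total     <- n_kept_per_chain * n_chains

cat("Iterations per chain:", n_samples, "\n")
cat("Posterior samples per chain:", n_kept_per_chain, "\n")
cat("Posterior samples across chains:", n_kept_total, "\n")
```

Under these assumptions the run keeps 400 draws per chain (1,200 in total). For slowly mixing parameters such as phi and sigma.sq, that may leave effective sample sizes on the small side, in which case more batches is one option.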

Thanks so much for any thoughts,
Aaron
PhD Candidate, University of British Columbia

Jeffrey Doser

Jan 29, 2026, 10:55:15 AM

to Aaron Skinner, spOccupancy and spAbundance users

Hi Aaron,

Here are some thoughts regarding your current approach.

1. Including the random effect of point count cluster is likely complicating the estimation of the spatial random effects. The (1 | point_count_cluster) random effect accounts for correlation among points within an individual cluster, as well as for differences between one point count cluster and the others. This is a simple form of accounting for spatial autocorrelation and likely contributes to the convergence problems you are having with the spatial parameters. I don't know the spatial scale of the "point count clusters," but if you want to include that random effect in a spatial occupancy model, you would likely need to restrict the prior on phi so that the spatial random effect explains autocorrelation at a different scale. Depending on the form of the point count clusters, you could also set the lower bound on the effective spatial range to be larger than the distance across a point count cluster, which would effectively force some amount of correlation between points within a cluster. I'd suggest taking a look at this paper by Bajcz et al., which discusses accounting for spatial autocorrelation at multiple scales.
2. sigma.sq is just like any other random effect variance parameter: the larger the variance, the more the random effect varies across its "levels" (which, in the spatial context, are sites). That said, I personally would not put too much weight on interpreting the values of sigma.sq and phi, as they are not fully identifiable. Theoretical work shows that only the product of phi and sigma.sq is identifiable in one statistical sense (it can be estimated consistently as sampling locations become denser within a fixed study region, while phi and sigma.sq individually cannot), which is often why estimating sigma.sq and phi separately can be challenging and require more informative priors.
3. The behavior of phi and sigma.sq you are noticing is tied to both of my previous points. The point count cluster random effect may already account for a large amount of the spatial autocorrelation, so the spatial random effect may be difficult to estimate (and/or not needed), which can exacerbate the identifiability challenges I mentioned in my second point.
4. Regarding the model getting stuck: this could be caused by a variety of things. The model could be too complex for the data at hand, so a numerical problem could occur early and stall the sampler. This can also happen if there is a formatting problem with the data, or if invalid initial values are being used (the package's default initial values will be valid, so this would only happen if you specify them manually).
5. There are no automated model selection tools in the package akin to dredge(), so if you want to compare multiple models, I would suggest determining a set of candidate models to test (ideally based on explicit hypotheses) and then comparing them via WAIC (with waicOcc()) and potentially via cross-validation as well.
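The first two points above, bounding phi by the width of a point count cluster and the trade-off between sigma.sq and phi, can be sketched in base R. The cluster width and maximum range are hypothetical placeholder values, and the semivariogram comparison only illustrates the trade-off; it is not the formal identifiability result.

```r
# (1) Require the effective range (3 / phi) to be at least the cluster width,
# so the spatial effect does not simply duplicate the cluster random effect.
cluster_width_m <- 1000   # hypothetical distance across a point count cluster
max_range_m     <- 25e3   # hypothetical largest effective range considered
priors <- list(
  phi.unif = c(3 / max_range_m,      # smallest phi = longest range
               3 / cluster_width_m)  # largest phi = shortest allowed range
)

# (2) For the exponential covariance, the semivariogram
# sigma.sq * (1 - exp(-phi * d)) is close to sigma.sq * phi * d at short
# distances, so (sigma.sq, phi) pairs with the same product look alike there.
semivariogram <- function(d, sigma_sq, phi) sigma_sq * (1 - exp(-phi * d))

d <- c(50, 100, 250)                                   # short distances (m)
gamma_a <- semivariogram(d, sigma_sq = 1, phi = 6e-4)  # product = 6e-4
gamma_b <- semivariogram(d, sigma_sq = 2, phi = 3e-4)  # same product
print(round(rbind(gamma_a, gamma_b), 4))
```

At these short distances the two curves differ by only a few percent. At long distances they separate, levelling off at sigma.sq (1 versus 2), so widely spaced points carry more of the information needed to tell the two parameters apart.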

Hope that helps,

Jeff

Jeffrey W. Doser, Ph.D.

Assistant Professor

Department of Forestry and Environmental Resources

North Carolina State University

Statistical Ecology and Forest Science Lab

Pronouns: he/him/his
