Spatial modeling questions

73 views
Skip to first unread message

Aaron Skinner

unread,
Jan 22, 2026, 3:34:47 PMJan 22
to spOccupancy and spAbundance users
Hi Jeff, Marc, and all

I recently posted in a separate thread which has some background information about my goals and data structure, including map of our study region within Colombia and point counts on an example farm.

I think accounting for spatial structure will be important given the small distance between points, and the benefits for prediction. There could also be spatially structured unmeasured confounds that are correlated with silvopasture implementation (or survival of trees), which could bias my estimate of interest. Thus, I’m looking for thoughts on how I can improve performance on these spatial models.

So far Ive just fit models to try to understand the spatial process. I’ve experimented with fitting very basic occupancy models (intercept, ecoregion, a random effect of point count cluster) to full models. Ive fit 3 detection covs in all cases. The variables I’ve been playing with for occupancy are: 
Ecoregion + Elevation + Precipitation + Total_edge_300m + landcover_300m + Local habitat + Survey_year + (1 | point count cluster)

Generally, I’ve found that the priors on phi and sigma.sq have had the biggest impact on the performance of these models. Originally, I couldn’t get model convergence or decent chain mixing on the spatial parameters until I changed the sigma.sq prior to a uniform distribution and didn’t allow any probability on zero (I set minimum to .5). I have also played with the priors on phi to control for autocorrelation (i.e. the effective spatial range) at scales of 500 meters - 5 km (within farms) to regional (e.g. 5 km - 25 km).

These have produced much more reasonable chain-mixing and reasonable rhats and effective sample sizes. However the goodness of fit tests aren’t particularly inspiring, as maybe 1/3 of the 120 (30 species x 4 GOF tests) GOF tests are under 0.1.

Some specific questions:
-I’m struggling to interpret the sigma.sq term. Phi has a nice biological interpretation as the effective spatial range, but the sigma.sq term isn’t as clear to me. Should sigma.sq scale with the effective spatial range? Perhaps tighter more regularizing priors would be helpful?
-I’ve noticed that the effective spatial range goes to the lower limit of the prior on phi. For example, if I set lower <- 3 / 50e3 the effective spatial range tends to approach 50km for all species. Similarly with sigma.sq, it tends to approach the lower limit of whatever the prior is. Is this behavior expected?
-Occasionally the model seems to get stuck and just shows ‘Chain 1 Sampling…’. There is no error produced, but spOccupancy doesn’t actually start to sample and so eventually I just have to shut down R and start again. What is the model telling me in this case?
-I am not that interested in a full model selection workflow (e.g. dredging all possible submodels), but would like to beef up or pair down models to achieve better model fit. Do you have suggestions for this process?
-Overall, what would you recommend for next steps?

Model specifications:
spPGOcc(
    occ.formula = occ.formula,
    det.formula = det.formula,
    priors = priors,
    cov.model = "exponential", NNGP = TRUE, n.neighbors = 15,
    data = spOcc[1:4], n.burn = 2000, n.batch = 400, batch.length = 25,
    n.thin = 20, n.chains = 3, n.report = 100, verbose = TRUE
  )

Thanks so much for any thoughts,
Aaron
PhD Candidate, University of British Columbia

Jeffrey Doser

unread,
Jan 29, 2026, 10:55:15 AMJan 29
to Aaron Skinner, spOccupancy and spAbundance users
Hi Aaron, 

Here are some thoughts regarding your current approach.

  • Including the random effect of point count cluster is likely complicating the estimation of the spatial random effects. The 1 | point_count_cluster random effect will account for correlation among points within an individual cluster, and account for differences that occur in one point count cluster compared to others. This is a simple form of accounting for spatial autocorrelation and likely contributes to why you are having problem with convergence for the spatial parameters. I don't know what the scale of the "point count clusters" is, but if you want to include that in a spatial occupancy model, you would likely need to restrict the prior on phi to attempt to explain spatial autocorrelation at a different scale. Depending on the form of point count cluster, you could also just set the lower bound on the effective spatial range to be larger than the distance across a point count cluster, which would effectively force there to be some amount of correlation between points within a cluster. I might suggest taking a look at the this paper by Bajcz et al, which discusses the concept of trying to account for spatial autocorrelation at multiple scales. 
  • sigma.sq is just like any other random effect variance parameter. The larger the variance is, the larger variation there is in the random effect across the different "levels" (which in the spatial context are sites). With that said, I personally would not put too much weight on interpreting the values of sigma.sq and phi, as they are not fully identifiable. There is theoretical work that shows that it is only the product of phi and sigma.sq that is identifiable in one statistical sense, which is often why estimating sigma.sq and phi can present substantial challenges and require more informative priors. 
  • The behavior of phi and sigma.sq you are noticing is tied to both of my previous points. The point_cluster random effect may already be accounting for a large amount of spatial autocorrelation, and as a result the spatial random effect may be difficult to estimate (and/or not needed), which can exacerbate the identifiability challenges I mentioned in my second point. 
  • Regarding the model getting stuck: this could be a variety of things. The model could be too complex given the data at hand, so there could be some numerical problem that quickly occurs leading to the model getting stuck. This can also happen if there is some formatting problem with the data, or if there are invalid initial values being used (the default initial values by the package will be valid, so this would only happen if you manually specify them). 
  • There are no automated model selection tools available in the package akin to dredge(), so if you want to compare multiple models, I would suggest determining a set of models you seek to test (ideally based on explicit hypotheses), and then compare those with via WAIC (with waicOcc) and potentially via cross-validation as well. 
Hope that helps, 

Jeff

--
You received this message because you are subscribed to the Google Groups "spOccupancy and spAbundance users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spocc-spabund-u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/spocc-spabund-users/5e7cbd6c-fb50-4cce-a0da-7012cab66966n%40googlegroups.com.


--
Jeffrey W. Doser, Ph.D.
Assistant Professor
Department of Forestry and Environmental Resources
North Carolina State University
Pronouns: he/him/his

Aaron Skinner

unread,
Apr 29, 2026, 11:26:22 PM (11 days ago) Apr 29
to Jeffrey Doser, spOccupancy and spAbundance users

Hi Jeff and all,

I wanted to follow-up about convergence of spatial parameters and setting the priors on the spatial decay (phi) and variance (sigma.sq) parameters. As a reminder, I’m modeling 271 species in a specific region of Colombia (map attached). My research question is to understand the impact of silvopasture on the bird community. To do this, I’ve included a categorical variable corresponding to the predominant landcover within the 50 m point count radius, as well as a continuous variable representing the amount of silvopasture within 300 m (these are only slightly correlated).

The point counts are clustered in space (e.g., groups of 6–10 points separated by 100–200 m), so I’ve included a random effect of (1 | point count cluster). The points are also repeated through time, which should help account for temporal autocorrelation.

In many model runs, at least a few spatial factor parameters have Rhat > 1.1. I’ve fit an 11- factor model using svcMsPGOcc() with:
n.burn = 40000, n.thin = 10, n.batch = 4000, batch.length = 25, n.chains = 3.

I recently read this Makinen et al (2022) article that Jeff suggested and it raised some questions for me. One recommendation they suggested was doing a sensitivity analysis to see whether results are robust to changes in the priors for the spatial range / variance parameters. In performing this sensitivity analysis, I’ve set priors on φ such that the effective spatial range is between 1–3 km, 3-18km, or 18-66km where 66 km is the largest distance between sites. I've set priors to ensure the effective spatial range is at least 1 km so that it doesn't interfere with the random effect (1 | point count cluster). I have not changed the priors for σ².

General results of sensitivity analysis:

  • Across runs, the model tends to estimate φ such that the effective spatial range is near the upper limit (i.e., near ~3km, ~18km, or ~66km).
  • Generally the models with priors limiting the effective spatial range between 3-18km performed best in terms of convergence diagnostics, but had higher WAICs
  • The priors have a large impact on convergence diagnostics - Some priors produce spatial and community parameters where all rhats < 1.1 and ESS >> 100 (see attached document for example), whereas others show very poor convergence diagnostics or do not even converge (returning the error 'c++ error: dpotrf var failed')
  • Generally, changes to the priors do not change the posterior of the variables of interest (categorical habitat). 

Some specific questions:

  • In the models that show poor convergence or fail entirely (e.g., dpotrf error), does this suggest issues with identifiability between the spatial random effect and covariates at that spatial scale (i.e., spatial confounding)?
  • In the spatial model output, the reported SD for all 11 spatial factors is 0 (Table 1 in attached document). Am I correct in interpreting this as implying near-zero spatial variance (σ² ≈ 0), or are these SD values not directly equivalent to the σ² parameter in the Gaussian process? If σ² is effectively zero, I would expect the model to behave similarly to a non-spatial model, but this does not seem to be the case.
    • I fit the equivalent non-spatial model using lfMsPGOcc(), which showed very poor mixing (e.g., 8 community-level parameters with ESS < 100 or Rhat > 1.1). 
    • Does this suggest that the spatial model is still helping stabilize estimation even if the estimated spatial variance is near zero?
  • In your vignette you suggest choosing φ priors so that the effective spatial range aligns with the scale of inference. For example, if I want to make inference about fine-scale processes (e.g., effects of silvopasture), I might restrict the spatial range to ~1–5 km. However, I consistently see poor convergence of spatial (and community) parameters when restricting φ to small ranges, and this seems to go against the recommendations of Makinen et al (2022). How should I balance the goal of targeting a specific spatial scale of inference with the need for stable model fitting?
    • Relatedly, I assume that in selecting priors for final models (i.e., those that I will make inference from), I should give priority to convergence diagnostics (for both spatial and community parameters)? 
  • Finally, there is one model formulation (including an interaction between habitat and an environmental variable) that does not converge under any of the spatial range priors I’ve tried. Would you recommend abandoning this model, or are there additional strategies to improve convergence (e.g., adjusting σ² priors, reducing the number of factors)?

Thanks very much for any guidance,
Aaron
He/him/his

Spatial_factor_convergence.docx

Jeffrey Doser

unread,
Apr 30, 2026, 5:25:15 AM (11 days ago) Apr 30
to Aaron Skinner, spOccupancy and spAbundance users
Hi Aaron, 

Thoughts on your questions below: 

  • The models could fail to converge for a variety of reasons. 271 species is a ton of species, and from what I remember many of these are extremely rare. In theory spatial confounding could contribute to this, but it would not be appropriate to conclude the lack of convergence and/or ability for the model to run is due to spatial confounding. Particularly for the dpotrf error, this is likely just because the model is to extremely challenging to estimate and may in fact be over fit, such that depending on the random starting value it gets stuck somewhere and causes the error. 
  • This is not correct. sigma.sq is not estimated in spatial factor models (see the Doser et al. 2023 paper in ecology). sigma.sq is the spatial variance parameter of a GP and it is fixed at 1 for all the spatial factors. The factor loadings (lambda) in a way serve a similar role to what sigma.sq does in a single-species model, so the assumption of fixing sigma.sq at 1 is not particularly consequential. The SD is the posterior standard deviation of the phi parameters. It has nothing to do with sigma.sq, and is just a measure of uncertainty in the estimated phi parameter. It is reported at 0 since the values of phi are so small. Based on the ESS values, it does not seem like this is actually a value of 0 (which would suggest a problem with the model in that it was not mixing). 
  • I don't know if I have a great answer to your third question. You're asking a lot of your data to try and estimate all of these spatial factors, particularly with so many rare species. It is likely that many of the models you are trying simply won't converge, and there may not be super clear patterns as far as why this is happening. It is crucial for the final model you have to successfully converge, so that needs to be a priority. 
  • Again, it sounds like this model is just too complex for the data to inform all the parameter estimates, so I don't believe any fine tuning of priors or anything else of that nature would help. There could be a variety of reasons why the interaction is not able to be estimated. If you really wanted to dig into that, I would explore that first with a very simple non-spatial model and if that doesn't work then there is a problem with the interaction itself. Also just a note: sigma.sq is not a relevant parameter for multi-species factor models. It is forced to be fixed at 1, and so setting the prior won't do anything (since it is not a parameter in the model). 
Hope that's useful, 

Jeff

Aaron Skinner

unread,
May 8, 2026, 12:36:06 PM (3 days ago) May 8
to spOccupancy and spAbundance users

Hi Jeff,

Thanks for your thoughts. A few followup points - 

  • Point 1: Just to clarify, everything about the model and data is the same in the sensitivity analysis. The only difference between runs is the priors on φ, such that the effective spatial range is between 1–3 km, 3-18km, or 18-66km where 66 km is the largest distance between sites. Given the extreme differences in convergence in the sensitivity analysis — some priors produce spatial and community parameters where all rhats < 1.1 and ESS >> 100, whereas others show very poor convergence diagnostics or do not even converge (returning the error 'c++ error: dpotrf var failed') — it seems like spatial confounding is a likely culprit? 
  • Point 2: The information about sigma.sq is important and you’re right that all the info is there in the Doser et al. 2023 paper in ecology. Thanks for the reminder!
  • Point 4: I think the issue was twofold, first that this environmental variable was spatially correlated with habitat type, and that the reference habitat level had a small sample size, so the intercept had very high standard errors. The models are converging well now that I have reordered the levels of the factor.
Best, 
Aaron
Reply all
Reply to author
Forward
0 new messages