Spatial Confounding

Garrett Erickson-Harris

Jun 1, 2026, 2:43:38 PM

to spOccupancy and spAbundance users

Hi list,

I am seeking some advice on a spatial occupancy analysis that I am working on. In short, our survey was designed to maximize the spatial coverage of targeted bioacoustics surveys for red-headed woodpeckers across Minnesota using minimal equipment. Thus, we surveyed ~750 sites with 3-day deployments over two field seasons. Due to permitting limitations, our surveys are clustered on public lands (i.e., a single refuge/management area had multiple ARUs at least 250 m and at most ~10 km apart, and the refuges themselves are scattered across the state). A map of our survey locations is attached. I am interested in analyzing these data with a single-season spatial occupancy model.

My current modeling challenge is spatial confounding. In fitting a non-spatial model, there are a few covariate effects that seem to be important. However, in fitting a spatial model, all covariate effects are pushed towards zero and the spatial random effect seems to be accounting for all of the variation in occupancy. Additionally, the results seem to be quite sensitive to the prior distribution of phi; setting the effective spatial range to 10km (as suggested by your vignette) results in confounding, while restricting it to 1.5km leaves some covariate effects present. I think part of this issue is that I am using remotely sensed predictors that are also spatially autocorrelated and occupancy probability is quite low (< 0.10).

I am wondering if you have any advice on dealing with spatial confounding in this scenario.

1. Would it be useful to adjust the sigma.sq prior in addition to phi?
2. Would acknowledging that there is no best answer and presenting multiple models with their associated prior distributions be best?
3. Would it be useful to seek an alternative way to model spatial structure (e.g. orthogonality)?

We are interested in estimating these covariate relationships and creating a map of occupancy probability for our study area.

Thank you so much!

Garrett

Screenshot 2026-06-01 121748.png

Jeffrey Doser

Jun 8, 2026, 12:27:44 PM

to Garrett Erickson-Harris, spOccupancy and spAbundance users

Hi Garrett,

Sorry for the delay, and thanks for the question. Spatial confounding is indeed a challenging topic, and it's worth pointing out that the "best" way to deal with spatial confounding is a very active area of research in the statistical literature, so there is not a one-size-fits-all remedy and what one person tells you may differ from another. I'll try to give some of my general suggestions here before getting to your specific question, since I've received a few queries on spatial confounding in spOccupancy/spAbundance models before. Four particularly relevant papers from the spatial statistics literature on the topic are Khan and Berrett (2026), Zimmerman and Ver Hoef (2021), Dupont et al. (2021), and Dupont et al. (2025).

First, as your third question hints, a common approach for "dealing" with spatial confounding is restricted spatial regression (RSR). The idea of RSR is that if a covariate is correlated with the spatial random effect, one can remedy the confounding by forcing the spatial random effect to be orthogonal to the covariates it is confounded with. The reasoning is that correlation between the two is bad, so we should try to get rid of it. However, many recent papers (including Khan and Berrett (2026) and Zimmerman and Ver Hoef (2021)) show that this reasoning does not hold under many common circumstances, and that a standard spatial GLMM often outperforms RSR in accurately estimating regression coefficients and performing inference, even when there is confounding. This has led many to question whether adjusting for spatial confounding is appropriate at all, with Zimmerman and Ver Hoef perhaps making the strongest statement: "deconfounding a spatial linear model is bad statistical practice and should be avoided".

Second, the papers led by Dupont do a great job of emphasizing that regression coefficients changing when a spatial random effect is added is not in and of itself a bad thing. I am drastically oversimplifying here, but part of their argument is that the regression coefficients from the nonspatial model are not necessarily correct if there is in fact residual spatial autocorrelation in the system. We should therefore avoid treating the nonspatial model's coefficients as the "true effects", since they can also be biased in the presence of spatial autocorrelation (again, see the great Dupont papers for a much more elegant explanation of this, particularly the 2025 paper).

So that gets to the question of what to do in a situation like yours, where regression coefficients differ substantially between a spatial and a nonspatial occupancy model. This is particularly challenging when, as in your case, you're interested in both inference and prediction. My first recommendation is to test whether a spatial model is even necessary. In other words, is there residual spatial autocorrelation that is not explained by the covariates in your model? You can check this by exploring the residuals of the nonspatial occupancy model using the residuals() function, which calculates residuals following the approach of Wilson et al. (2019); these can indicate whether there is evidence of residual spatial autocorrelation. This function is currently only available for single-species, single-season occupancy models, which seems suitable for your case. I hope to eventually implement it for other model types. If this does not reveal any spatial autocorrelation, there is no benefit to fitting a spatial model and a nonspatial model is likely adequate. The residuals() function is not super well-documented, so let me know if you have questions.
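One common way to summarize such a residual check is Moran's I with a permutation test. To be clear, this is an illustration of the general idea and not something residuals() does for you. Below is a minimal Python sketch, assuming the per-site occupancy residuals (e.g., averaged over posterior samples) and projected site coordinates have already been exported as arrays. The function names, the 10 km neighbour distance, and the simulated placeholder data are all hypothetical.

```python
import numpy as np


def binary_weights(coords, max_dist):
    """Neighbour matrix: 1 if two distinct sites are within max_dist, else 0."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    w = (dist <= max_dist).astype(float)
    np.fill_diagonal(w, 0.0)  # a site is not its own neighbour
    return w


def morans_i(values, w):
    """Moran's I: positive when neighbouring sites have similar values."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    return len(z) / w.sum() * (z @ w @ z) / (z @ z)


def permutation_test(values, w, n_perm=999, seed=0):
    """One-sided p-value for positive autocorrelation via random relabelling."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, w)
    perms = np.array(
        [morans_i(rng.permutation(values), w) for _ in range(n_perm)]
    )
    p_value = (np.sum(perms >= observed) + 1) / (n_perm + 1)
    return observed, p_value


# Placeholder data only: replace with real residuals and projected coordinates.
rng = np.random.default_rng(1)
coords_km = rng.uniform(0, 100, size=(200, 2))  # hypothetical site locations
resids = rng.normal(size=200)                   # hypothetical residuals
w = binary_weights(coords_km, max_dist=10.0)
obs, p = permutation_test(resids, w)
print(f"Moran's I = {obs:.3f}, permutation p = {p:.3f}")
```

A small p-value would suggest residual spatial structure at the chosen neighbour distance. With clustered sites like yours, it is probably worth trying a few distances, since the answer can depend on that choice.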

If that does reveal that there is residual spatial autocorrelation, there are a couple of options you could pursue:

1. Further restrict the prior distribution based on survey design, species characteristics, or some other ecologically/design motivated reason. The most intuitive way to do this is with the prior on phi, as you have attempted, but you could also restrict the prior on sigma.sq. Note that this is of course somewhat subjective, and you should certainly avoid simply tuning the prior to get covariate effects that show up as significant, or that most closely resemble the nonspatial model. Instead, motivate the prior distributions based on data characteristics. In your case, if the main goal of accounting for spatial autocorrelation is to handle the non-independence of points that are clustered within the same management unit, then a good way to restrict the prior would be to set the effective range of the spatial autocorrelation to be no larger than the average distance across a management unit, or something like that. You could also explore the prior recommendations of Makinen et al. (2022). Something along these lines would likely be my preferred approach, because I think it minimizes the subjectivity involved in trying to find a prior that gets you the "right" regression coefficient estimates, and it also acknowledges some of your a priori expectations regarding autocorrelation based on the survey design.
2. You could try the Spatial+ approach that is outlined in the two Dupont papers. This is a fairly new approach proposed for situations in which there is confounding that needs to be addressed for accurate inference. I have not tried this myself, but it has been shown to perform quite well across a broad range of simulation studies. The basic idea is that for the confounded covariate, you would fit a spatial regression model with the covariate as the response variable (this could be done with the spAbund function in spAbundance). You would then extract the residuals from this model and use those residuals as the predictor variable in the spatial occupancy model. Again, I have not tried this, but it is something that I think is feasible with spOccupancy/spAbundance.
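For the first option, it may help to see how a design-based effective range translates into bounds on phi. This is a minimal Python sketch, assuming an exponential correlation function (for which the effective range is roughly 3 / phi) and a uniform prior on phi. The function name is hypothetical, and the 0.25 km and 1.5 km values are illustrative, taken from the minimum ARU spacing and the restricted range mentioned earlier in the thread.

```python
def phi_bounds(min_range, max_range, k=3.0):
    """Bounds for a uniform prior on phi, given limits on the effective range.

    Assumes an exponential correlation exp(-phi * d), for which the
    correlation falls to about 0.05 at distance k / phi with k = 3.
    Ranges must be in the same units as the site coordinates.
    """
    if not 0 < min_range < max_range:
        raise ValueError("need 0 < min_range < max_range")
    # A larger phi means faster decay, so the largest range gives the lower bound.
    return k / max_range, k / min_range


# Effective range restricted to between 0.25 km and 1.5 km.
lower, upper = phi_bounds(min_range=0.25, max_range=1.5)
print(lower, upper)  # prints: 2.0 12.0

# Allowing ranges up to 10 km only lowers the lower bound on phi.
wide_lower, wide_upper = phi_bounds(min_range=0.25, max_range=10.0)
```

The main thing to double check is units: if the coordinates are in meters rather than kilometers, the ranges passed in need to be in meters too. The maximum range itself should come from something like the typical distance across a management unit, not from whichever value gives the most appealing covariate effects.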

Anyways, sorry for the extremely long message and the somewhat "it depends" answer, but I hope this is at least somewhat helpful!

Jeff

Jeffrey W. Doser, Ph.D.

Assistant Professor

Department of Forestry and Environmental Resources

North Carolina State University

Statistical Ecology and Forest Science Lab