Accounting for Spatial Autocorrelation (SAC) in SDM

Dimitris Poursanidis

Oct 13, 2018, 4:01:40 PM
to Maxent
Hi folks

An important, yet often overlooked, issue in SDM is the control of SAC.
However, dealing with it does not seem trivial or easy.

Several tools have been designed for spatial "filtering", such as the spThin R package (also incorporated into the Wallace R package), the spatial filtering tool in SDMtoolbox, and the SSDM R package.

But none of them provides a statistical explanation of why, e.g., a 10 km filtering radius is better than 5 km, and so on.

I have searched a lot to understand how, e.g., Moran's I or LISA can be used efficiently in SDM to reduce SAC, but without success...

How do you do it? How do you take SAC into account?

Looking forward

Dimitris

Jamie M. Kass

Oct 21, 2018, 12:29:32 PM
to Maxent
Spatial thinning is a relatively easy way to reduce SAC in the model because it doesn't require any changes to the model specification (you simply change the input data). However, you do lose some data this way, and those records may or may not have held extra environmental variance that is important to your model. For example, in high-elevation areas a 5 km difference may mean a large change in temperature. Therefore, you need to be the judge of what an appropriate distance is for your system.

As for which distance is “best”, there is no statistical rule, because it depends on your species and predictor variables, including their resolution. I suggest you experiment with different distances that are ecologically informed (if possible), for example to reflect dispersal distance, and figure out which works best for you. This may involve examining what the model predictions and response curves look like.
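Jamie's suggestion to compare several ecologically informed distances can be made concrete with a small script. This is a minimal sketch, not the spThin implementation: it uses a simple greedy random thinning repeated over a few seeds, and it assumes projected coordinates in km (latitude/longitude would need great-circle distances instead). The function names, the number of runs and the occurrence coordinates are hypothetical.

```python
import math
import random


def thin_points(points, min_dist, seed=0):
    """Greedy random thinning: visit points in a random order and keep one
    only if it is at least `min_dist` away from every point already kept."""
    rng = random.Random(seed)
    order = list(points)
    rng.shuffle(order)
    kept = []
    for p in order:
        if all(math.dist(p, q) >= min_dist for q in kept):
            kept.append(p)
    return kept


def best_of_runs(points, min_dist, runs=20):
    """Repeat the random thinning with different seeds and return the
    largest retained set (greedy thinning depends on visiting order)."""
    best = []
    for seed in range(runs):
        kept = thin_points(points, min_dist, seed=seed)
        if len(kept) > len(best):
            best = kept
    return best


# Hypothetical occurrences in a projected CRS, coordinates in km.
occ = [(0, 0), (1, 0), (2, 0), (10, 10), (10.5, 10), (30, 5), (31, 6), (60, 40)]

# Compare how many records survive at several candidate distances.
for d in (1, 5, 10):
    print(f"{d} km: {len(best_of_runs(occ, d))} of {len(occ)} records kept")
```

With these made-up points, 1 km keeps 7 of the 8 records while 5 km and 10 km both keep 4. On real data, this count of retained records is what you would weigh against dispersal distance and predictor resolution.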

Jamie Kass
PhD Candidate
City College of NY

Dimitris Poursanidis

Oct 21, 2018, 2:25:32 PM
to Maxent
And what do you answer to a potential reviewer who asks "how did you choose this distance and not another one?" and "where is the correlogram showing from which distance onward you have no issue?"

Trial-and-error tests can be endless, and again you need a statistical test (which one?) to prove that you chose the "best" distance.
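A correlogram like the one Dimitris mentions can be sketched as Moran's I computed over successive distance bands. This is a minimal sketch assuming planar coordinates in km and a numeric value at each point (for example an environmental predictor sampled at the occurrences, or residuals where they can be defined); the band width, function names and example data are hypothetical. It reports only the statistic, not its significance.

```python
import math


def morans_i(values, coords, lo, hi):
    """Global Moran's I using binary weights w_ij = 1 when lo < d_ij <= hi
    (i != j), else 0. Returns NaN if no pairs fall in the band."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    denom = sum(zi * zi for zi in z)
    num = 0.0
    s0 = 0  # number of ordered neighbour pairs = sum of weights
    for i in range(n):
        for j in range(n):
            if i != j and lo < math.dist(coords[i], coords[j]) <= hi:
                num += z[i] * z[j]
                s0 += 1
    if s0 == 0 or denom == 0:
        return float("nan")
    return (n / s0) * (num / denom)


def correlogram(values, coords, band_width, n_bands):
    """Moran's I for consecutive distance bands (0, w], (w, 2w], ..."""
    rows = []
    for k in range(n_bands):
        lo, hi = k * band_width, (k + 1) * band_width
        rows.append((lo, hi, morans_i(values, coords, lo, hi)))
    return rows


# Hypothetical data: two tight pairs of points with contrasting values (km).
pts = [(0, 0), (1, 0), (10, 0), (11, 0)]
vals = [1.0, 1.0, -1.0, -1.0]
for lo, hi, i_val in correlogram(vals, pts, 4, 3):
    print(f"({lo}, {hi}] km: I = {i_val:.2f}")
```

For this made-up transect, I is 1.00 in the first band, undefined (no pairs) in the 4 to 8 km band, and -1.00 in the 8 to 12 km band. The distance beyond which I stays close to its null expectation of -1/(n - 1) is what the reviewer's question is after, but deciding that requires a significance test on top of this.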

Jamie M. Kass

Oct 25, 2018, 12:51:09 PM
to max...@googlegroups.com
SAC is a problem when it is detected in the model residuals, not in the predictor variables (which should be correlated to some extent), right? However, because background points are not absences (see Lobo et al. 2008) and the predicted quantity is not the probability of presence, the definition of residuals here may be problematic. This is the same reason AUC cannot be judged as an independent metric of model accuracy, although it can be used as a relative one between models with the same inputs but different settings.

Sometimes we need to make expert-driven choices as ecologists, and I think deciding on a thinning distance that is reasonable based on the species’ biology should be enough for most reviewers.

Jamie

Jamie M. Kass
PhD Candidate, Dept. of Biology
City College of New York, CUNY Graduate Center

Samuel Veloz

Oct 25, 2018, 1:59:15 PM
to max...@googlegroups.com
Jamie gives a good answer to the issue. Another way to think about this, though, is to imagine that you are using multiple occurrence points from the same environmental grid cell. The model will interpret them as a high level of occurrence under the environmental conditions of that cell (the model doesn't know they come from the same cell; it only deals with environmental space). If that isn't really the case (for example, one observer recorded the same individual multiple times within the grid cell), the model's parameter estimates will be poor.

The other place SAC becomes an issue is when you are trying to assess the accuracy of the model. Again, partitioning points from the same grid cell into training and testing bins is clearly not a good idea: these samples are in no way independent, which is a basic requirement of most statistical tests. There is likely to be some threshold distance (environmental or geographical?) at which points become truly independent. However, as Jamie points out, this is really hard to assess in the presence-only situation, because you don't have true residuals (or deviance) to guide you.
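Sam's two points, duplicate records within one grid cell and training/testing splits that place neighbouring points on both sides, can be sketched as follows. This assumes projected coordinates and a grid whose origin and resolution match your predictor rasters; the helper names and data are hypothetical, and the two-fold checkerboard is just one simple spatial partitioning scheme among several.

```python
import math


def cell_index(x, y, origin_x, origin_y, res):
    """(column, row) of the square grid cell containing (x, y)."""
    return (math.floor((x - origin_x) / res), math.floor((y - origin_y) / res))


def dedupe_by_cell(points, origin_x, origin_y, res):
    """Keep only the first occurrence in each grid cell, so several records
    from one cell are not treated as independent observations."""
    seen = set()
    kept = []
    for x, y in points:
        cell = cell_index(x, y, origin_x, origin_y, res)
        if cell not in seen:
            seen.add(cell)
            kept.append((x, y))
    return kept


def checkerboard_fold(x, y, origin_x, origin_y, block_size):
    """Assign a point to fold 0 or 1 using a checkerboard of square blocks,
    so points inside the same block always land in the same fold."""
    col, row = cell_index(x, y, origin_x, origin_y, block_size)
    return (col + row) % 2


# Hypothetical occurrences (km), a 1 km predictor grid and 2 km blocks.
occ = [(0.2, 0.3), (0.7, 0.9), (1.5, 0.2), (2.5, 0.4), (3.1, 3.9)]
unique = dedupe_by_cell(occ, 0.0, 0.0, 1.0)
folds = [checkerboard_fold(x, y, 0.0, 0.0, 2.0) for x, y in unique]
print(unique)
print(folds)
```

The block size is where Sam's "threshold of distance" question comes back: if blocks are much smaller than the distance over which points remain dependent, dependent points can still end up on either side of a block edge.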

The spatial thinning approach is really a blunt tool for dealing with the spatial clustering of occurrence data. To my knowledge, though, there isn't really a more sophisticated approach to incorporating SAC into presence-only models. It is probably best practice to look at your occurrence data and assess whether any spatial clustering reflects true patterns of the species or the sampling that produced the data. That may give you a sense of how important it is to deal with SAC in your modeling exercise.
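One way to put a number on how clustered the occurrences are is the Clark-Evans nearest-neighbour ratio. This is a sketch under stated assumptions (planar coordinates, a study-area size you choose, no edge correction), and the example records are hypothetical. It only says whether the records are more clustered than a random pattern over that area; it cannot tell whether the clustering comes from the species or from the sampling, and R depends strongly on the area you supply.

```python
import math


def clark_evans(points, area):
    """Clark-Evans nearest-neighbour ratio R and its z-score.

    R < 1 suggests clustering, R near 1 a random (Poisson) pattern, and
    R > 1 regular spacing. `area` is the study-area size in the squared
    units of the coordinates. No edge correction is applied.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    nn = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nn) / n
    expected = 0.5 / math.sqrt(n / area)    # mean NN distance under randomness
    se = 0.26136 / math.sqrt(n * n / area)  # standard error of that mean
    return observed / expected, (observed - expected) / se


# Hypothetical records: four points in a tight cluster inside 100 km^2.
clustered = [(0.0, 0.0), (0.01, 0.0), (0.0, 0.01), (0.01, 0.01)]
r, z = clark_evans(clustered, area=100.0)
print(f"R = {r:.3f}, z = {z:.2f}")
```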
Sam


Jamie M. Kass

Oct 30, 2018, 11:01:16 PM
to max...@googlegroups.com
Thanks for the informative response in language that is easy to understand, Sam. Yes, grid-cell duplicates are particularly dangerous, but points that are very close to each other are too, and spatial thinning is probably the easiest way to deal with them, as you point out.

J

Dimitris Poursanidis

Oct 31, 2018, 4:07:00 AM
to max...@googlegroups.com
Thank you both for the analytical and informative responses.

Duplicates within the same cell are the first thing one MUST deal with.
Then comes spatial thinning, and this seems to need a trial-and-error approach.

D.

>->->->->->->->->->->->->->->->-

Dimitris Poursanidis, Ph.D


The latest papers !!! 
1) http://www.mdpi.com/2072-4292/10/8/1227 - Towards Global-Scale Seagrass Mapping and Monitoring 
2) http://www.mdpi.com/2072-4292/10/6/859 - Sentinel 2 and Satellite derived bathymetry !
3) https://www.tandfonline.com/eprint/HrdVjksy5Xmar57aThZ3/full  - Deep limits of seagrass meadows using Earth Observation
4) https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.13018 - Integration of satellite remote sensing data in ecosystem modelling at local scales: Practices and trends
5) https://rdcu.be/36k4 - Testing the robustness of a coastal biodiversity data protocol in the Mediterranean: insights from the molluskan assemblages from the sublittoral macroalgae communities

Johannes Sörensen

Oct 31, 2018, 11:21:26 AM
to Maxent
Hi, I currently have the exact same problem dealing with SAC.
@Jamie: I think what you say might make sense (I have to admit I still haven't fully understood what SAC really is), but on the other hand there are a lot of methods that look for spatial autocorrelation in the predictors:
It does not mention predictors explicitly, but the example at the bottom uses a dataset with predictors. However, in support of your point: I tried removing all the points for one species that showed significant autocorrelation, and my AUC dropped to 0.6 for one model.
I analyzed my data with GeoDa and built queen contiguity weights. Then I used a multivariate Geary (999 permutations) and flagged all points with highly significant SAC (even after thinning I still detect SAC). But maybe I got it all wrong; I couldn't find a tutorial for the Geary part.

That said, I can really recommend GeoDa for analyzing and visualizing your data if you do not use R or want something with a GUI.
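Johannes's GeoDa workflow (queen contiguity, multivariate Geary, 999 permutations) is not reproduced here. As a simplified stand-in, this sketch computes a univariate global Geary's C with distance-band weights and a 999-permutation pseudo p-value. The distance-band weights, the one-sided test for positive SAC, the function names and the example data are assumptions, and GeoDa's multivariate Geary statistic differs from this.

```python
import math
import random


def band_neighbours(coords, max_dist):
    """Binary distance-band neighbours: j neighbours i if i != j and d_ij <= max_dist."""
    n = len(coords)
    return [
        [j for j in range(n) if j != i and math.dist(coords[i], coords[j]) <= max_dist]
        for i in range(n)
    ]


def gearys_c(values, neighbours):
    """Global Geary's C with binary weights. C < 1 suggests positive SAC,
    C > 1 negative SAC; its expected value without SAC is 1."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    s0 = sum(len(nb) for nb in neighbours)
    if s0 == 0 or denom == 0:
        raise ValueError("no neighbour pairs, or constant values")
    num = sum((values[i] - values[j]) ** 2 for i, nb in enumerate(neighbours) for j in nb)
    return (n - 1) * num / (2.0 * s0 * denom)


def gearys_c_test(values, neighbours, permutations=999, seed=1):
    """Pseudo p-value for positive SAC: share of random relabellings of the
    values whose C is as small as, or smaller than, the observed C."""
    rng = random.Random(seed)
    observed = gearys_c(values, neighbours)
    shuffled = list(values)
    as_extreme = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if gearys_c(shuffled, neighbours) <= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (permutations + 1)


# Hypothetical transect: 10 points 1 km apart, values forming two blocks.
coords = [(float(i), 0.0) for i in range(10)]
vals = [1.0] * 5 + [10.0] * 5
c, p = gearys_c_test(vals, band_neighbours(coords, 1.0))
print(f"C = {c:.2f}, pseudo p = {p:.3f}")
```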

Dimitris Poursanidis

Oct 31, 2018, 2:29:08 PM
to max...@googlegroups.com
Thanks for the email Johannes.
It seems that SAC is something we have to live with...

LISA (local indicators of spatial association) seems to be one way to examine it: https://rdrr.io/cran/usdm/man/lisa.html
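To show what a LISA statistic computes, here is a sketch of local Moran's I for point data. It assumes row-standardised binary distance-band weights, planar coordinates and a numeric value per point (all hypothetical choices, and not how usdm's lisa is implemented). It labels each point by quadrant (HH, LL, HL, LH) but omits the significance test, which is commonly done by conditional permutation.

```python
import math


def local_morans_i(values, coords, max_dist):
    """Local Moran's I_i for each point, using row-standardised binary
    distance-band weights (neighbours: j != i and d_ij <= max_dist).

    Returns one entry per point: (I_i, quadrant), where the quadrant comes
    from the sign of the point's deviation and of its spatial lag
    ('HH', 'LL', 'HL', 'LH'), or None if the point has no neighbours.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n
    if m2 == 0:
        raise ValueError("values are constant")
    results = []
    for i in range(n):
        nb = [j for j in range(n) if j != i and math.dist(coords[i], coords[j]) <= max_dist]
        if not nb:
            results.append(None)
            continue
        lag = sum(z[j] for j in nb) / len(nb)  # mean deviation of neighbours
        local_i = z[i] * lag / m2
        quadrant = ("H" if z[i] >= 0 else "L") + ("H" if lag >= 0 else "L")
        results.append((local_i, quadrant))
    return results


# Hypothetical points (km): a high pair, a low pair and an isolated point.
pts = [(0, 0), (1, 0), (10, 0), (11, 0), (50, 0)]
vals = [5.0, 5.0, 1.0, 1.0, 3.0]
for p, res in zip(pts, local_morans_i(vals, pts, 2.0)):
    print(p, res)
```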


Mohan Joshi

Dec 23, 2018, 6:27:47 PM
to Maxent
 "how you choose this distance and not an other one". This question has to be answered through z-score at 95% confidence interval which shows if your occurrence points at a given threshold (say 1 or 5 km) are autocorrelated or not. If you are easy with arcmap, i suggest using spatial autocorrelation global moran's I (a toolbox under spatial autocorrelation). A nice video tutorial is here.