Hi All,
I need some advice on choosing a threshold for maxent modelling.
I am working within the PEW project “A rapid assessment project for mid-scale biogeographic pattern analysis for the development of a MPA network plan in the Chilean Patagonian Fjord Region and action for its implementation” of Dr. Vreni Häussermann.
The background of this project is that the marine life of Chilean Patagonia is very poorly known, and the data from our research over the last 10 years represent an important percentage of the existing species distribution data. Within the PEW (and other) projects, we have dived down to 30 m depth at hundreds of sites throughout Chilean Patagonia between 1998 and now, and filled out presence lists of about 70 species which we can recognize while diving (for many of these species, the lists could most probably even be considered presence-absence data for these sites), plus added other species we have sampled or identified at these sites. Nevertheless, the existing information is still poor considering that the coastline of Chilean Patagonia is estimated to be more than 90,000 km long. Thus many planning units (PUs; we have a total of 29235 hexagonal PUs) are still without information even for the most common species. To prevent MARXAN from treating these PUs as low-diversity areas (when they are merely unsampled or poorly studied), we started to apply species distribution models (SDMs) for the region, based on the (unfortunately also poor) data on abiotic factors we could gather. We are modelling (marine benthic) species distributions within the fjords of Patagonia using presence-only data. We apply a threshold to produce a binary output, which is then fed into a MARXAN analysis: the binary SDM output is overlaid on the conservation planning units, each of which gets a binary yes/no for whether the species is found in it.
Since the abiotic data were also poor, we had to interpolate them to cover the whole area. But this means that our SDMs using the usual thresholds give us quite extended distribution ranges. We checked the SDMs of the species we know well (e.g. sea anemones, Dr. Häussermann's specialty) and found that they always overestimate the distribution. We see this as problematic considering that we will suggest MPAs to the government to protect the mentioned species.
The Marxan analysis is being done for approximately 220 benthic species and 30-40 mammals and birds; the SDMs are for the benthic species only. Thus we are aiming for a very conservative threshold, so that a species is counted as present for conservation purposes only where its predicted probability is very high. So far we have modelled using the maximum sensitivity plus specificity threshold, which has given us threshold values ranging from 35% to 45%. However, this threshold seems very low, considering that a species is listed as present in a PU when a probability of only 35-45% is calculated for it being present in the area. It seems problematic to suggest MPAs in areas where we only have such a low probability of the species occurring, and where we even know that they DO NOT occur in many cases.
Because we are looking for a very conservative threshold, we are interested in your opinion on using a threshold we define ourselves, e.g. one giving 70% sensitivity (the proportion of presences that the model correctly predicts). What might be the implications of this? Can it be used?
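To make the idea of a user-defined sensitivity threshold concrete, here is a minimal Python sketch. It assumes you have already extracted the model output at each presence record; the function name `threshold_at_sensitivity` and the example scores are hypothetical and not part of Maxent. By definition, a threshold chosen for 70% sensitivity leaves about 30% of the known presence records below the cut, i.e. classified as absent.

```python
import numpy as np

def threshold_at_sensitivity(presence_scores, sensitivity):
    """Return the highest cut-off that still classifies at least
    `sensitivity` (a fraction, e.g. 0.7) of the presence records as present.

    presence_scores: model output extracted at the known presence records
    (hypothetical input; how you extract it depends on your workflow).
    """
    scores = np.sort(np.asarray(presence_scores, dtype=float))[::-1]  # descending
    n = scores.size
    # Number of presence records that must stay at or above the cut-off.
    # The small epsilon guards against floating-point noise in sensitivity * n.
    k = max(int(np.ceil(sensitivity * n - 1e-9)), 1)
    return scores[k - 1]

# Made-up scores for ten presence records
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
t = threshold_at_sensitivity(scores, 0.7)
retained = float(np.mean(np.asarray(scores) >= t))  # achieved sensitivity
```

Note the trade-off this makes visible: asking for a lower sensitivity raises the cut-off and shrinks the predicted range, at the cost of omitting more of the known presences.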
Is using a threshold set by the MaxEnt algorithm more statistically sound? The most conservative threshold we noted was the Equal Sensitivity and Specificity.
Any thoughts on how to deal with our specific situation?
Sincerely,
Stacy Ballyram (and Dr. Vreni Häussermann)
1) Are you accounting in some way for spatial autocorrelation (i.e. are you spatially filtering points so that none are too close together, by filtering by a grid cell size bigger than your predictor variables or using a tool like spThin?)
2) Are you restricting the background extent to only areas each species is able to disperse to (i.e. are you using species-specific backgrounds?)
3) Are you exploring model complexity (i.e. are you using some tool like SDMtoolbox or ENMeval to figure out optimal settings per species? If not, you may have overly complex models for many species that predict poorly).
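As a rough illustration of the grid-based filtering mentioned in point 1, here is a minimal Python sketch that keeps one occurrence record per grid cell. The function name `thin_by_grid`, the coordinates, and the cell size are all made up for the example. This is not what spThin does (spThin is an R package that thins by distance between points), just the simpler grid-cell alternative.

```python
import numpy as np

def thin_by_grid(lon, lat, cell_size):
    """Return the indices of the records to keep: the first record
    encountered in each grid cell of width `cell_size` (same units
    as the coordinates, e.g. decimal degrees)."""
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    # Integer row/column index of the grid cell each record falls in
    col = np.floor(lon / cell_size).astype(int)
    row = np.floor(lat / cell_size).astype(int)
    cells = np.stack([row, col], axis=1)
    # np.unique with return_index gives the first occurrence of each cell
    _, first_idx = np.unique(cells, axis=0, return_index=True)
    return np.sort(first_idx)

# Made-up coordinates: the first two records share a 0.1-degree cell
lon = [-73.01, -73.02, -72.50]
lat = [-42.01, -42.02, -42.50]
keep = thin_by_grid(lon, lat, cell_size=0.1)
```

Choosing `cell_size` larger than the resolution of the predictor layers, as suggested above, means no two retained records share the same predictor values simply because they sit in the same cell.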
Also, please be aware that the Maxent logistic output is NOT probability of presence, but something akin to suitability of those conditions for your species. In other words, think of the output as a "potential distribution" given ideal conditions for everything else (biotic factors, movement factors, etc.). For example, it is a very real possibility that many of your species affect each other's distributions. Consider including the model predictions of some species as predictor variables for other species (e.g. if they compete) to integrate the biotic effect in some way. Start here and see if any of this helps.
Jamie Kass
PhD Candidate
City College of NYC