Dear all
I have a question about outputs of ENMeval.
In Muscarella et al. 2014 (Methods in Ecology and Evolution), the authors proposed four criteria (i.e., lowest AUC.DIFF, lowest ORmin, highest mean AUC, and lowest ΔAICc), but in another paper (Muscarella & Uriarte 2016), only the test AUC value and the test point omission rate were used.
How should I choose the optimal feature class–regularization multiplier (RM) combination from the ENMeval outputs, especially when the various criteria suggest very different combinations?
Best regards
Iman
Great question. This will entirely depend on your study system and the questions you are asking. I wish there was a straightforward answer to this. It's kind of like asking "which predictor variables should I use in my model?". The answer will always vary across studies.
To start to answer why one might choose any of these evaluation stats over another, think first about whether or not you want to evaluate your models with cross-validation (CV). If you are not transferring the model to a new time or place, and are not concerned with how accurately the model can predict to new environments (i.e., how stable the predictions are for new data), then relying on something like AIC, which does not consider testing data, could be sufficient for choosing a model. Further, AIC directly penalizes the number of model parameters, so Maxent models with a low regularization multiplier (e.g., the default setting of 1) and hinge features (which can result in many parameters) generally have extremely inflated AIC values (low AIC is good).
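To make the parameter penalty concrete, here is a minimal sketch (not ENMeval's internal code; the function name is mine) of the small-sample-corrected AIC used for comparing Maxent models, following Warren & Seifert 2011: AICc = 2k − 2ln(L) + 2k(k+1)/(n−k−1), where k is the number of non-zero model parameters and n is the number of occurrence localities.

```python
def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC (Warren & Seifert 2011).

    log_likelihood: log-likelihood of the model at the occurrence points
    k: number of non-zero model parameters (lambda values in Maxent)
    n: number of occurrence localities
    """
    if n - k - 1 <= 0:
        # Correction term is undefined when parameters approach sample size;
        # such over-parameterized models are effectively disqualified
        return float("inf")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)

# Hypothetical comparison with n = 40 occurrences: the more complex model
# needs a much better likelihood to overcome its parameter penalty
print(aicc(-250.0, 10, 40))  # simpler model (10 parameters)
print(aicc(-245.0, 25, 40))  # complex model (25 parameters) pays heavily
```

Note how quickly the correction term blows up as k approaches n; this is why low-regularization hinge-feature models, which can carry dozens of parameters, tend to score so poorly on AICc.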
However, if the inverse is true, you will likely be interested in spatial CV results for test AUC, AUC.DIFF, or omission rates (see Roberts et al. 2017 for a great discussion of spatial CV). You could also use non-spatial CV if you don't think there is spatial structure to your occurrences, or if you have very few occurrences and must partition by jackknife (see Shcheglovitova & Anderson 2013). Further, some stats more directly reflect how overfit the model is (like AUC.DIFF; see Warren & Seifert 2011). As for which CV stats to choose, there is no agreed-upon "best" stat -- they all have problems (e.g., see Lobo et al. 2008 and Peterson et al. 2008 for issues with AUC).
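To illustrate two of those overfitting-related stats, here is a hedged sketch (function names are mine, not ENMeval's API): AUC.DIFF is simply train AUC minus test AUC, where large positive values suggest overfitting, and the minimum-training-presence omission rate is the fraction of withheld test presences whose predicted suitability falls below the lowest suitability assigned to any training presence.

```python
def auc_diff(train_auc, test_auc):
    # Large positive values mean the model fits training data much
    # better than withheld data, i.e., it is likely overfit
    return train_auc - test_auc

def min_training_presence_omission(train_scores, test_scores):
    # Threshold = lowest predicted suitability among training presences;
    # omission rate = proportion of test presences scored below it.
    # In an unoverfit model this should be near zero.
    threshold = min(train_scores)
    omitted = sum(1 for s in test_scores if s < threshold)
    return omitted / len(test_scores)

# Hypothetical suitability scores for one CV fold
train = [0.62, 0.71, 0.55, 0.80]
test = [0.50, 0.58, 0.66, 0.73]
print(min_training_presence_omission(train, test))  # 0.25: one of four below 0.55
```

ENMeval reports these per CV fold and averaged across folds; the sketch above just shows what a single fold's numbers mean.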
Also, if you happen to have independent occurrence data, you probably want to skip CV and just evaluate on that dataset. CV evaluations are never truly independent because they work on subsets of the same dataset, but with the right partitions they can be good approximations of independent groups (again, see Roberts et al. 2017).
My lab (Anderson lab at CCNY) usually uses some combination of AUC (threshold independent) and omission rate (threshold dependent), but we've also experimented with AICc and found that optimizing by either sometimes returns the same or similar models, and sometimes quite different ones. Bottom line -- you need to make this choice as the researcher. Hope I helped people understand the differences between these stats a little better. I personally think that testing on withheld data, whether by CV or with independent data, is preferable for most questions we ask when we run niche/distribution models.
Jamie Kass
PhD Candidate
City College of NY