Is the AUC value reliable for evaluating Maxent models? -- I was stumped by a question from a paper reviewer


1648...@qq.com

Sep 29, 2019, 5:26:49 PM
to Maxent
I recently submitted a manuscript to a journal, and the reviewer raised doubts about the AUC statistic.
He cited two papers from 2008 that question the use of AUC with Maxent models: Lobo, J.M., Jiménez-Valverde, A., and Real, R. 2008. AUC: A misleading measure of the performance of predictive distribution models. Global Ecology and Biogeography 17:145-151; and Peterson, A.T., Papeş, M., and Soberón, J. 2008. Rethinking receiver operating characteristic analysis applications in ecological niche modeling. Ecological Modelling 213:63-72.
May I ask: is there any recent research that supports the use of AUC in Maxent,
or has the AUC calculation in the latest Maxent 3.4.1 been improved?

Surajit Hazra

Sep 30, 2019, 11:19:08 AM
to max...@googlegroups.com
Hi, there are lots of already-published papers that use AUC with Maxent.

Try the Akpan (2019) paper; it is based on Maxent and AUC values.
You can find it in PLOS ONE.
--
You received this message because you are subscribed to the Google Groups "Maxent" group.
To unsubscribe from this group and stop receiving emails from it, send an email to maxent+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/maxent/6bcd49a3-f162-40b9-825d-be760f684de0%40googlegroups.com.

Surajit Hazra

Sep 30, 2019, 11:27:13 AM
to max...@googlegroups.com
Can you send me your reviewer's comments and a PDF copy of your paper?

Adam Smith

Sep 30, 2019, 12:16:59 PM
to Maxent
Hi,

The calculation of AUC in the Maxent program is not flawed, to my knowledge.  However, the issue that Lobo, Peterson, and others raise is that AUC needs to be interpreted with caution when it is calculated with presences and background sites (as in the Maxent program) instead of presences and absences.  For a well-tuned model the maximum AUC is 1 - a/2, where "a" is the prevalence of the species (the proportion of the study area it occupies). For example, if a species occupies 60% of the study region, the maximum AUC will be 1 - 0.6/2 = 0.7.  Since we rarely know "a", we rarely know the maximum AUC a good model could obtain.  In other words, if you get an AUC of 0.7, you can't tell whether the model is good or bad.  If "a" is 0.6, as in the previous example, this is a great model!  However, if "a" is 0.1, then this is a really bad model!  But you don't know "a", so you can't tell.
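To make the 1 - a/2 ceiling concrete, here is a small sketch (standard library only; the "perfect model" that scores every suitable cell 1 and every unsuitable cell 0 is a deliberate simplification) that verifies the formula with a rank-based AUC of presences against background sites:

```python
import random

def max_pb_auc(a):
    """Ceiling on presence-background AUC for a perfect model,
    where a = prevalence (proportion of the study area occupied)."""
    return 1 - a / 2

def rank_auc(pres, back):
    """AUC as the Mann-Whitney statistic: the probability that a random
    presence outscores a random background site (ties count as 0.5)."""
    wins = sum(1.0 if p > b else 0.5 if p == b else 0.0
               for p in pres for b in back)
    return wins / (len(pres) * len(back))

random.seed(0)
a = 0.6                                   # species occupies 60% of the region
pres = [1.0] * 500                        # a perfect model scores all presences 1
back = [1.0 if random.random() < a else 0.0 for _ in range(5000)]
                                          # background mixes suitable/unsuitable

print(max_pb_auc(a))                      # 0.7
print(round(rank_auc(pres, back), 2))     # close to 0.7: the ceiling, not a flaw
```

The simulated AUC sits near the ceiling even though the model is perfect, which is exactly why a "low" presence-background AUC can still mean a good model.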

But what if you get a high AUC, say 0.95?  Even if "a" is 0.2 (so maximum AUC is 0.99), 0.95 is close to 0.99. So isn't this a good model? Maybe. I don't think this is common with Maxent, but it is possible to get AUC > 1 - a/2. In that case, AUC calculated with presences and background sites has a *negative* relationship with AUC calculated with presences and absences, which is what we'd really like to know, since it tells us how well our model truly performs.  So doing things to increase AUC (with presences/background), like adjusting settings, selecting among predictors, or changing the study-region extent, can actually make the model worse even though it looks better. (The True Skill Statistic has a similar problem.)

I don't know exactly what your reviewer said, but I would suggest also calculating some other measures of model performance, such as COR (see Elith et al. 2006) or the Continuous Boyce Index (Boyce et al. 2002, Hirzel et al. 2006). You could also use threshold-based statistics like sensitivity or omission rate.
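For anyone wanting to try the Boyce index, here is a minimal sketch of the continuous version (not the canonical implementation; the `ecospat` R package provides one): compute the predicted-to-expected (P/E) ratio of presences over sliding suitability windows, then take the Spearman correlation between suitability and P/E. The data below are synthetic and purely illustrative.

```python
import numpy as np

def boyce_index(pres_suit, back_suit, n_windows=10, frac=0.1):
    """Continuous Boyce index sketch (Boyce et al. 2002; Hirzel et al. 2006):
    Spearman correlation between habitat suitability and the
    predicted-to-expected (P/E) ratio over sliding suitability windows."""
    lo, hi = float(np.min(back_suit)), float(np.max(back_suit))
    width = frac * (hi - lo)
    mids, pe = [], []
    for start in np.linspace(lo, hi - width, n_windows):
        end = start + width
        p = np.mean((pres_suit >= start) & (pres_suit <= end))  # predicted
        e = np.mean((back_suit >= start) & (back_suit <= end))  # expected
        if e > 0:
            mids.append((start + end) / 2)
            pe.append(p / e)
    # Spearman rho = Pearson correlation of ranks (no tie correction here)
    ranks_m = np.argsort(np.argsort(mids))
    ranks_pe = np.argsort(np.argsort(pe))
    return float(np.corrcoef(ranks_m, ranks_pe)[0, 1])

rng = np.random.default_rng(0)
back = rng.uniform(0, 1, 5000)      # suitability across the whole study area
pres = rng.beta(4, 1.5, 300)        # presences concentrated at high suitability
print(round(boyce_index(pres, back), 2))  # positive, approaching 1, for a good model
```

Unlike AUC, the Boyce index only asks whether presences fall disproportionately in cells predicted to be more suitable, so it does not depend on knowing prevalence.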

Good luck!
Adam

Boyce, M.S., Vernier, P.R., Nielsen, S.E., and Schmiegelow, F.K.A.  2002.  Evaluating resource selection functions.  Ecological Modelling 157:281-300.

Elith, J., C.H. Graham, R.P. Anderson, M. Dudík, S. Ferrier, A. Guisan, R.J. Hijmans, F. Huettmann, J.R. Leathwick, A. Lehmann, J. Li, L.G. Lohmann, B.A. Loiselle, G. Manion, C. Moritz, M. Nakamura, Y. Nakazawa, J.McC. Overton, A.T. Peterson, S.J. Phillips, K. Richardson, R. Scachetti-Pereira, R.E. Schapire, J. Soberón, S. Williams, M.S. Wisz, and N.E. Zimmermann.  2006.  Novel methods improve prediction of species' distributions from occurrence data.  Ecography 29:129-151.

Hirzel, A.H., Le Lay, G., Helfer, V., Randin, C., and Guisan, A.  2006.  Evaluating the ability of habitat suitability models to predict species presences.  Ecological Modelling 199:142-152.

Smith, A.B.  2013.  On evaluating species distribution models with random background sites in place of absences when test presences disproportionately sample suitable habitat.  Diversity and Distributions 19:867-872.




Samuel Veloz

Sep 30, 2019, 12:39:53 PM
to Maxent
There are lots of studies on this question, but as far as I can tell there still isn't a great answer for how best to evaluate predictions from a presence-only model. I often see people advocating for this or that statistic, but in many of the comparisons I have read (and maybe I haven't kept up enough here), most of the proposed statistics are correlated with each other and none seems to consistently perform better than the others. So ideally you have some independent presence-absence data to work with, but even then you are getting an estimate of the realized niche, so how do you characterize an over-prediction (high suitability where the species is not observed)?

Warren et al. just published an open-access paper making the case that discrimination statistics may not be best for evaluating many applications of Maxent models (and other presence-only algorithms). It might be worth a read: https://onlinelibrary.wiley.com/doi/full/10.1111/jbi.13705

Sam


Neftalí Sillero

Sep 30, 2019, 1:37:15 PM
to max...@googlegroups.com
Hi,

I think the best current method to validate these models is to run null models following the Raes and ter Steege (2007) methodology and show that the empirical AUC values (or any other metric) are always higher than the values obtained from the null models.
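As a rough illustration of the null-model idea (simplified: random points are scored against a fixed hypothetical suitability surface, whereas Raes and ter Steege refit the model to each random occurrence sample; the surface and the empirical AUC below are made up):

```python
import random

def rank_auc(pres, back):
    """AUC as the probability a presence outscores a background site
    (ties count as 0.5)."""
    wins = sum(1.0 if p > b else 0.5 if p == b else 0.0
               for p in pres for b in back)
    return wins / (len(pres) * len(back))

random.seed(42)
surface = [random.random() for _ in range(10000)]  # hypothetical suitability raster
background = random.sample(surface, 1000)
n_occ = 50                                 # same sample size as the real occurrences

# Null distribution: AUCs obtained from randomly placed "occurrences"
null_aucs = sorted(rank_auc(random.sample(surface, n_occ), background)
                   for _ in range(99))
critical = null_aucs[94]                   # one-sided 95% critical value of 99 nulls

empirical_auc = 0.85                       # hypothetical AUC from the real model
print(empirical_auc > critical)            # significant (better than chance) if True
```

The point of the test is that the AUC value itself is not interpreted in absolute terms; it only has to beat what random occurrences of the same sample size achieve.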


Best,

Neftalí

1648...@qq.com

Oct 1, 2019, 1:00:16 PM
to Maxent
In my manuscript:

[screenshot of the relevant manuscript paragraph; image not recoverable]


The comment from the reviewer: "143-149: I suggest including more references presenting the opposing view about the use of AUC as a measure of model accuracy. The references used in this paragraph do not address this statistical issue; those papers only mention that AUC was used for model evaluation. Suggested references are:"

Lobo, J.M., Jiménez-Valverde, A., and Real, R. 2008. AUC: A misleading measure of the performance of predictive distribution models. Global Ecology and Biogeography 17:145-151.

Peterson, A.T., Papeş, M., and Soberón, J. 2008. Rethinking receiver operating characteristic analysis applications in ecological niche modeling. Ecological Modelling 213:63-72.



1648...@qq.com

Oct 1, 2019, 1:01:19 PM
to Maxent
Thank you very much!


Adam Smith

Oct 1, 2019, 5:48:05 PM
to Maxent
I think the reviewer is asking you to note that AUC isn't always a reliable measure of model performance.  In this case (for the reasons I explained above), you can't know whether 0.9 is a good or bad AUC value because it is calculated with background sites instead of absences.  It's *probably* indicative of a good model, but you can't be sure.  Also, I haven't read Rong et al. (2019), but I am guessing they are ultimately citing Swets (1988.  Measuring the accuracy of diagnostic systems.  Science 240:1285-1293).  That article is commonly cited because it presumably gives classes for "excellent", "good", "mediocre", and "poor" AUC, but it actually doesn't.  Instead, Swets seems to say that different fields of medicine (which is what he was reviewing) have different thresholds for what constitutes a good model.

So I would suggest you can compare AUC values and assume the higher one indicates a better model, although there is a chance that if you optimize too much, you will actually make the model predict presence/absence *worse*!

Good luck,
Adam

MaxentNoob

Oct 3, 2019, 6:03:12 AM
to Maxent
AUC alone is problematic because the integral includes nonsensical regions where sensitivity (sen) or specificity (spe) approaches zero (among many other problems).

Aside from partial ROC (pROC), I would suggest thresholding at the value that maximizes the sum of sensitivity and specificity, then evaluating with SEDI, the symmetric extremal dependence index. Happy to provide links to my code and article, plus the reference for the original SEDI paper. Hope it helps.
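For reference, a minimal sketch of SEDI from the Ferro and Stephenson (2011) formula, where H is the hit rate (sensitivity) and F the false alarm rate (1 - specificity) at the chosen threshold:

```python
import math

def sedi(hit_rate, false_alarm_rate):
    """Symmetric Extremal Dependence Index (Ferro & Stephenson 2011).

    hit_rate H = sensitivity; false_alarm_rate F = 1 - specificity.
    Ranges from -1 to 1: 0 for a no-skill forecast, 1 for a perfect one.
    Undefined when H or F is exactly 0 or 1 (log of 0 in the formula).
    """
    H, F = hit_rate, false_alarm_rate
    num = math.log(F) - math.log(H) - math.log(1 - F) + math.log(1 - H)
    den = math.log(F) + math.log(H) + math.log(1 - F) + math.log(1 - H)
    return num / den

# Example: a threshold giving sensitivity 0.9 and specificity 0.9 (F = 0.1):
print(round(sedi(0.9, 0.1), 3))  # 0.912
```

Unlike AUC, SEDI is computed at a single threshold and is designed to behave well for rare events, which is often the situation in presence-only modeling.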

Rainer

https://natureconservation.pensoft.net/article/33918/

https://github.com/RWunderlich/SEDI

Ferro CAT, Stephenson DB (2011) Extremal dependence indices: Improved evaluation measures for deterministic forecasts of rare binary events. Weather and Forecasting 26(5): 699–713. https://doi.org/10.1175/WAF-D-10-05030.1

1648...@qq.com

Oct 4, 2019, 2:51:32 AM
to Maxent
Thank you very much!


Jamie M. Kass

Oct 9, 2019, 8:10:24 PM
to Maxent
I would like to add to Sam's and Adam's great explanations. First, when evaluating presence-background models with AUC, you should probably look at the test AUC, not the training AUC: the latter does not force your model to make predictions for occurrences withheld from training, so it is not representative of how well your model predicts new data, places, or times. The new Warren et al. paper does seem to go against this, so it's worth reading through, but because that study is so new, there is not yet a consensus in the field.

Second, it is important to understand that comparing AUC (ideally test AUC) in a relative way between models built with different settings (i.e., feature classes and regularization multipliers for Maxent) but with the same data and extent does not violate any of the points made in the Peterson et al. or Lobo et al. critiques of AUC. Their point was that you cannot use it as a definitive measure of model performance, for the reasons Adam pointed out. However, you can use it, and any other metric, to compare models relatively when they are built with the same data and for the same extent. See Radosavljevic & Anderson (2014) for an example of this logic, and Muscarella et al. (2014) for ENMeval, an R package based on this idea.
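This tuning-by-relative-test-AUC logic can be sketched as follows. Here scikit-learn's penalized logistic regression stands in for Maxent, `C` plays the role of the regularization multiplier, and the data are synthetic; only the comparison between settings, not the absolute AUC values, is meaningful:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))                 # five environmental predictors
y = (X[:, 0] + 0.5 * X[:, 1]                  # true suitability signal
     + rng.normal(scale=0.5, size=300) > 0).astype(int)

scores = {}
for c in [0.01, 0.1, 1.0, 10.0]:              # candidate regularization settings
    model = LogisticRegression(C=c, max_iter=1000)
    # mean AUC on withheld folds = "test AUC", the relative yardstick
    scores[c] = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

best = max(scores, key=scores.get)
print({c: round(s, 3) for c, s in scores.items()}, "-> best C:", best)
```

The same data and extent are used for every candidate, so the ranking of settings is valid even though the absolute AUC values inherit all the presence-background caveats discussed above.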

Jamie

Adam Smith

Oct 10, 2019, 10:55:16 PM
to Maxent
Hi!

Jamie and I might have to disagree on this, although I do agree with him that you can compare the AUC of models built for the same species with exactly the same data.  However, where we probably disagree is that at some point higher doesn't mean better (above 1 - a/2); it's actually worse.  As I said, I am guessing this isn't much of a problem with Maxent or most other SDMs.  I have only ever found one case where I was pretty sure AUC was above that threshold (https://www.earth-syst-sci-data-discuss.net/essd-2016-65/). They got an AUC of 0.98 for a bird with a nearly pan-global distribution, meaning "a" is relatively high, so the "optimal" AUC should be lower than 0.98.

In any case, I don't think there's a magic measure of model performance.  They all suffer in some way, and the few that don't are really hard to interpret and explain.

Good luck!
Adam

Jamie M. Kass

Oct 20, 2019, 1:15:17 AM
to Maxent
Just to clarify: I don't disagree with Adam at all. The problem is that we don't usually know the prevalence of the focal species, so it's hard to know whether there is a limit to AUC well below 1 above which we should stop interpreting higher values as better discrimination. When we suspect we are modeling a very prevalent species, it is probably better to use evaluation statistics that do not rely on prevalence (like the Boyce index). But for rare species we probably don't need to worry too much about this feature of AUC and can interpret higher values as better (though we should all be skeptical of very high values).

Jamie
