interpretation of ENMeval results - omission rate metrics & overfitting

Amber

Sep 9, 2018, 7:34:50 PM
to Maxent
Hi all,

This is an example of what my ENMeval results look like:

Model 63 — settings: H_5.5 (features = H, rm = 5.5)

full.AUC       0.9571
Mean.AUC       0.9544917
Var.AUC        0.001254027
Mean.AUC.DIFF  0.01014694
Var.AUC.DIFF   0.0004757842
Mean.OR10      0.1025801
Var.OR10       0.01243401
Mean.ORmin     0.001470588
Var.ORmin      8.650519e-06
AICc           10634.26
delta.AICc     0
w.AIC          0.9980756
nparam         47

The ENMeval publication (Muscarella et al., 2014) describes two omission rate evaluation metrics: ORMTP ('minimum training presence' omission rate) and OR10 (10% training omission rate). I'm guessing that the 'Mean.OR10' from my results is the OR10 from the paper and the 'Mean.ORmin' is the ORMTP? I'd like more clarity on how to interpret these two values in particular.

For OR10, the paper says 'Omission rates greater than the expectation of 10% typically indicate model overfitting'. Is it right to say, then, that if the Mean.OR10 value is over 0.10 the model is overfit? Many of my models have a Mean.OR10 of ~0.20, so are these overfit because it's over 10%?

For ORMTP, the paper says 'Omission rates greater than the expectation of zero typically indicate model overfitting'. My Mean.ORmin values are often very low, as they are here, but are never exactly 0.
Is there a cut-off for this value that would indicate overfitting, or is it anything above 0? Would this example be considered overfit based on this value?

I have the same question about the AUCDIFF metric. The paper says 'Value of AUCDIFF is expected to be positively associated with the degree of model overfitting', but at what point do you call the model overfit based on this value?

Thanks

Jamie M. Kass

Sep 11, 2018, 7:20:32 AM
to Maxent
OR10 is the omission rate calculated after excluding the points whose suitability falls in the lowest 10% among all training points. So the baseline expectation is 10%, and anything above that indicates omission of points with higher suitability than that bottom fraction. If you have a few outliers, OR10 will likely exclude them when setting the threshold.

ORmin is different: it calculates the threshold from all your training data, so it is sensitive to any outlier. If an outlier consistently has low suitability, the threshold used to calculate the omission rate will also be low, and the omission rate will be deceptively low. If you remove that point beforehand, the threshold will match the rest of your data and better represent the suitability of the non-outliers.
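To make the two thresholds concrete, here is a minimal sketch of how ORmin and OR10 can be computed for a single evaluation fold. This is written in Python with NumPy for illustration, not ENMeval's actual R code; the function name and inputs are my own.

```python
import numpy as np

def omission_rates(train_suit, test_suit):
    """Illustrative ORmin and OR10 for one evaluation fold.

    train_suit: predicted suitability at training presence points
    test_suit:  predicted suitability at withheld (test) presence points
    Both thresholds come from the training data; each omission rate is
    the fraction of test presences falling below its threshold.
    """
    train_suit = np.asarray(train_suit, dtype=float)
    test_suit = np.asarray(test_suit, dtype=float)

    # ORmin: threshold at the minimum training suitability.
    # Expected omission is 0, since no training point lies below its own minimum.
    thr_min = train_suit.min()
    or_min = float(np.mean(test_suit < thr_min))

    # OR10: threshold that excludes the lowest 10% of training suitabilities.
    # Expected omission is ~0.10 by construction.
    thr_10 = np.percentile(train_suit, 10)
    or_10 = float(np.mean(test_suit < thr_10))

    return or_min, or_10
```

With a well-behaved training sample, ORmin comes out near 0 and OR10 near 0.10 by construction; a single low-suitability outlier in the training set drags both thresholds down, which is why ORmin in particular can look deceptively low.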

Your Mean.OR10 is barely above 0.1, so it looks very good. It is important to ask which partitioning method you used, though. Random partitioning can lead to problems with spatial autocorrelation, while block or checkerboard partitioning is less affected by it and gives you more honest (but likely higher) omission rates. Also, if you plan to transfer the model to other times/places, block is a good option. Check out these papers:

https://onlinelibrary.wiley.com/doi/abs/10.1111/ecog.02881
https://onlinelibrary.wiley.com/doi/abs/10.1111/geb.12684

Jamie Kass
PhD Candidate
City College of NY
