Thank you, Steven. I have been hesitant to use that "Balance..."
metric at all, given that I couldn't explain to an end user what those
factors mean or where they come from. It would be interesting to see
how good a job it does with regard to prevalence on a larger number of
species, which is something I may do for my own edification down the
road.
I have a follow-up question, or maybe just something to ponder...
Freeman and Moisen (2008; see reference below) show that selecting the
threshold that maximizes kappa might be the best approach for producing
a binary map whose predicted prevalence is unbiased (and, by
definition, whose kappa is highest). Obviously, we cannot calculate a
true kappa value with Maxent, given that we are not using absence data.
However, another metric that has been suggested for model accuracy
assessment--True Skill Statistic (TSS; Allouche et al. 2006)--can be
calculated using sensitivity and the pseudo-specificity produced by
Maxent. This metric is suggested as being preferable to kappa, since
it's less affected by differences in prevalence. Because TSS =
sensitivity + specificity - 1, maximizing TSS selects (unless I'm
mistaken) the same threshold as "Maximum sensitivity plus
specificity," so one might expect that this metric
would have been shown in Freeman & Moisen to be among the least
biased. This is not the case; in fact, it appears that MaxSens+Spec
is among the worst thresholding metrics with regard to minimizing bias
in predicted prevalence and maximizing kappa. Granted, they were
using a different algorithm, and were using true specificity, so that
surely plays a role.
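To make the TSS point concrete, here is a minimal Python sketch (not Maxent's own code). It assumes that pseudo-specificity is the share of background points scored below the threshold, i.e. 1 minus the fractional predicted area; the function names and the example numbers in the usage note are made up for illustration.

```python
import numpy as np


def sens_and_pseudo_spec(presence_scores, background_scores, thresholds):
    """Return (sensitivity, pseudo-specificity) arrays, one value per threshold."""
    presence = np.asarray(presence_scores, dtype=float)
    background = np.asarray(background_scores, dtype=float)
    cuts = np.asarray(thresholds, dtype=float)[:, None]
    # Sensitivity: share of presence records scored at or above the threshold.
    sensitivity = (presence[None, :] >= cuts).mean(axis=1)
    # Pseudo-specificity (assumption): share of background points scored below
    # the threshold, i.e. 1 - fractional predicted area.
    pseudo_specificity = (background[None, :] < cuts).mean(axis=1)
    return sensitivity, pseudo_specificity


def max_tss_threshold(presence_scores, background_scores, thresholds):
    """Return (threshold, TSS) for the candidate threshold with the largest TSS."""
    sensitivity, pseudo_specificity = sens_and_pseudo_spec(
        presence_scores, background_scores, thresholds
    )
    tss = sensitivity + pseudo_specificity - 1.0
    best = int(np.argmax(tss))  # first maximum if several thresholds tie
    return float(np.asarray(thresholds, dtype=float)[best]), float(tss[best])
```

For example, `max_tss_threshold([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.5, 0.6, 0.85], [0.25, 0.45, 0.65, 0.75])` picks the 0.65 threshold. Since TSS differs from sensitivity + specificity only by the constant 1, the same candidate maximizes both.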
Additionally, Freeman and Moisen show that the only thresholding
metrics that do a good job of producing maps not biased with regard to
prevalence are "Maximize Kappa" and "Set Predicted Prevalence =
Observed Prevalence," neither of which we can really calculate with
Maxent output. For most of my modeling, producing a map that's
unbiased with regard to prevalence is a high priority, given that we
have to balance between uses like clearance surveys (where you want to
be sure you're not missing potential areas of distribution) and
genetics work (where you don't want to waste time and budget trapping
in areas that may not be occupied). We have talked about a multi-
class model that uses specificity and sensitivity cutoffs (0.9, for
example), along with the MaxSens+Spec cutoff, to produce a map with 4
categories that could be interpreted as follows:
(a) Predicted Absent (<10% Estimated Omission)
(b) Predicted Absent (Moderate Probability)
(c) Predicted Present (Moderate Probability)
(d) Predicted Present (<10% Estimated Commission)
This may work for many of our end users, since it would carry the
information needed for both of the uses mentioned above while still
reading as a binary presence/absence map (classes a-b vs. c-d). I want
to be sure, though, that the threshold I use to split (b) from (c)
produces a map that is as unbiased as possible with regard to
prevalence.
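The four-class scheme described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the 0.9 sensitivity cutoff is taken as the 10th percentile of presence scores, the 0.9 pseudo-specificity cutoff as the 90th percentile of background scores, and the (b)/(c) split is passed in as `mid_threshold` (for example, a MaxSens+Spec value). The function and variable names are hypothetical.

```python
import numpy as np

# Class labels in order of increasing score, matching (a) through (d) above.
CLASS_LABELS = (
    "(a) Predicted Absent (<10% Estimated Omission)",
    "(b) Predicted Absent (Moderate Probability)",
    "(c) Predicted Present (Moderate Probability)",
    "(d) Predicted Present (<10% Estimated Commission)",
)


def four_class_map(scores, presence_scores, background_scores,
                   mid_threshold, cutoff=0.9):
    """Return integer class codes 0..3 (indexes into CLASS_LABELS) per score."""
    presence = np.asarray(presence_scores, dtype=float)
    background = np.asarray(background_scores, dtype=float)
    # Sensitivity cutoff: about `cutoff` of presence records score at or above it.
    low = float(np.quantile(presence, 1.0 - cutoff))
    # Pseudo-specificity cutoff: about `cutoff` of background points score below it.
    high = float(np.quantile(background, cutoff))
    # The scheme only makes sense if the three cut points are ordered;
    # nothing guarantees that for a given model, so check explicitly.
    if not (low <= mid_threshold <= high):
        raise ValueError("thresholds are not ordered low <= mid <= high")
    # digitize gives 0 below `low`, 1 in [low, mid), 2 in [mid, high),
    # and 3 at or above `high`.
    return np.digitize(np.asarray(scores, dtype=float),
                       [low, mid_threshold, high])
```

The explicit ordering check matters because, for a weak model, the 0.9 sensitivity cutoff can land above the MaxSens+Spec threshold, in which case class (b) would be empty or ill-defined.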
Therefore, I am trying to understand a number of things:
1) Why would maximizing TSS (via MaxSens+Spec threshold) not produce a
less prevalence-biased map?
2) Is there a mathematical reason why using the pseudo-specificity vs.
a true specificity in MaxSens+Spec might lead to a biased prediction?
3) Given what we know about "Maximize Kappa" and "Set Predicted
Prevalence = Observed Prevalence" and their ability to predict good,
unbiased thresholds, is there some other analog we might be able to
derive from Maxent output?
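For concreteness, here is a small Python sketch of the quantities behind the two criteria named in question 3, using the standard Cohen's kappa formula. It is meant only to show why both need true absences: the `fn` and `tn` counts come from surveyed absence sites, which presence-only Maxent runs do not have. The function name is hypothetical.

```python
def kappa_and_prevalence(tp, fp, fn, tn):
    """Kappa plus predicted and observed prevalence from a 2x2 confusion matrix.

    tp/fn are counted over true presences, fp/tn over true absences.
    """
    n = tp + fp + fn + tn
    observed_agreement = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected_agreement = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    if expected_agreement < 1.0:
        kappa = (observed_agreement - expected_agreement) / (1.0 - expected_agreement)
    else:
        kappa = float("nan")  # degenerate table: every site in a single cell
    return {
        "kappa": kappa,
        "predicted_prevalence": (tp + fp) / n,  # share of sites predicted present
        "observed_prevalence": (tp + fn) / n,   # share of sites truly present
    }
```

With `tp=40, fp=10, fn=10, tn=40`, predicted and observed prevalence are both 0.5 and kappa is 0.6. Both prevalence terms, and the chance-agreement term in kappa, depend on how many absence sites were sampled, which is the piece missing from background-only data.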
This is probably one of those riddles with no real answer (or, at
least, no simple answer), but I would welcome any thoughts or
suggestions.
Refs:
Allouche, Omri, Tsoar, Asaf & Kadmon, Ronen (2006). Assessing the
accuracy of species distribution models: prevalence, kappa and the
true skill statistic (TSS). Journal of Applied Ecology, 43, 1223-1232.
Freeman, Elizabeth A. & Moisen, Gretchen G. (2008). A comparison of the
performance of threshold criteria for binary classification in terms
of predicted prevalence and kappa. Ecological Modelling, 217, 48-58.