threshold rule


Maryam Bordkhani

May 19, 2014, 12:33:31 PM
to max...@googlegroups.com
Dear Mr Phillips

Hi. I have a question regarding the threshold rule option in the Maxent software. I have read the posts about threshold rules in the Maxent group but was unable to find an answer.
There are ten threshold rules in the software. What does each of them mean (Fixed cumulative value 1, Fixed cumulative value 5, Fixed cumulative value 10, Maximum training presence, 10 percentile training presence, Equal training sensitivity and specificity, Maximum training sensitivity plus specificity, Equal test sensitivity and specificity, Maximum test sensitivity plus specificity, Equal entropy of threshold)? I need a brief description of each. I would like to understand each of these thresholds: in what situations and for what purpose is it used? For example, if I choose "10 percentile training presence", what justification is there for using it?
Considering that there are a lot of unanswered posts on this topic, your answer could be helpful for many users.
I'd be very grateful if you could answer these questions, because I greatly need this to finish my dissertation.

Yours truly

Maryam Bordkhani

M.Sc.  Student of Environmental Sciences

Department of Environment, Faculty of Natural Resources,

Isfahan University of Technology, Isfahan, Iran

Email: m.bor...@gmail.com

            m.bor...@na.iut.ac.ir

Martin Damus

May 23, 2014, 6:08:16 AM
to max...@googlegroups.com
I'm not Mr. Phillips, but I sense you need this information sooner rather than later and I've not seen an answer come across since you asked. Maybe this will help move the conversation along, so I'll attempt an answer, and *anyone / everyone* please point out where I've gone wrong, fill in blanks, make suggestions etc.

The fixed value thresholds you can probably ignore if you are investigating a biological phenomenon such as distribution. They may be interesting from a statistical point of view but do not reflect any biological meaning.

The minimum training presence (you wrote maximum by accident) means that the threshold is set so that no training sample will be excluded. Use this only if you have confidence in the validity of your entire training dataset, especially the points at the edge of the range. It is often used in settings where being conservative is the 'preferred approach', such as in invasive species modelling, where being wrong on the side of caution is less bad than the alternative. But -- not suitable if you are trying to identify native suitable habitat -- it probably over-estimates the range by a good margin with any realistic dataset.

Ten percentile training presence means that the threshold retains the 90% of your training samples with the highest predicted values and excludes the lowest 10% -- a better choice if you have less than full confidence in your training set. Some will be missed -- it's up to you to decide whether that's okay or not. Probably better suited to native habitat estimation.
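
As a rough illustration of the two rules above, here is a minimal Python sketch that derives both thresholds from the model values at the training presence points. The scores are made up, the function names are hypothetical, and the rounding rule (drop the lowest 10% of points, rounded down) is an assumption; Maxent's own handling of ties and rounding may differ.

```python
import math


def minimum_training_presence(scores):
    """Lowest predicted value at any training presence point."""
    return min(scores)


def percentile_training_presence(scores, percent=10):
    """Value below which roughly `percent` % of training presences fall."""
    ordered = sorted(scores)
    # Number of presences allowed to fall below the threshold (rounded down).
    n_omitted = math.floor(len(ordered) * percent / 100)
    return ordered[n_omitted]


# Hypothetical logistic outputs at ten training presence points.
training_scores = [0.02, 0.15, 0.31, 0.40, 0.47, 0.55, 0.62, 0.70, 0.81, 0.93]

mtp = minimum_training_presence(training_scores)
p10 = percentile_training_presence(training_scores)

print("minimum training presence:", mtp)
print("10 percentile training presence:", p10)
```

With these ten made-up points, the minimum training presence keeps every record, while the 10 percentile rule leaves out the single lowest-scoring one.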

Equal training sensitivity and specificity -- at this threshold the chance of missing suitable distribution and assigning unsuitable distribution is the same. Gives a decent 'average' perhaps. Are there specific situations where this is optimal?

Maximum training sensitivity plus specificity -- here the threshold maximises the sum of sensitivity and specificity, i.e. it minimises the combined rate of erroneously assigning unsuitable distribution and missing suitable distribution -- not quite the same as the previous because the two error rates are not necessarily equal; only their sum is minimised. Better than the previous??? No idea.

Equal test ... etc. -- these are the same as the previous two, but refer to the test samples you used, not the training samples.
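
To make the difference between the "equal" and "maximum ... plus" rules concrete, here is a small Python sketch. It assumes background points can stand in for absences when computing specificity, which is only an approximation with presence-only data, and all scores and function names are hypothetical. Passing test presences instead of training presences would give the "test" versions of the same rules.

```python
def sensitivity_specificity(threshold, presence_scores, background_scores):
    """Return (sensitivity, specificity) when values >= threshold count as suitable."""
    sensitivity = sum(s >= threshold for s in presence_scores) / len(presence_scores)
    specificity = sum(s < threshold for s in background_scores) / len(background_scores)
    return sensitivity, specificity


def candidate_thresholds(presence_scores, background_scores):
    """Every observed score is a possible cut-off."""
    return sorted(set(presence_scores) | set(background_scores))


def equal_sens_spec_threshold(presence_scores, background_scores):
    """Cut-off where sensitivity and specificity are closest to each other."""
    def gap(t):
        sens, spec = sensitivity_specificity(t, presence_scores, background_scores)
        return abs(sens - spec)
    return min(candidate_thresholds(presence_scores, background_scores), key=gap)


def max_sens_plus_spec_threshold(presence_scores, background_scores):
    """Cut-off where sensitivity + specificity is largest."""
    def total(t):
        return sum(sensitivity_specificity(t, presence_scores, background_scores))
    return max(candidate_thresholds(presence_scores, background_scores), key=total)


# Hypothetical model outputs at presence and background points.
presence = [0.35, 0.45, 0.60, 0.75, 0.90]
background = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.55, 0.65]

equal_t = equal_sens_spec_threshold(presence, background)
max_sum_t = max_sens_plus_spec_threshold(presence, background)

print("equal sensitivity and specificity:", equal_t)
print("maximum sensitivity plus specificity:", max_sum_t)
```

On this toy data the two rules pick different cut-offs (0.5 and 0.35), which shows that balancing the two error rates is not the same as minimising their sum.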

Equal entropy of threshold -- I could guess, but would rather not -- anyone else????

I generally use a threshold somewhere between 0% and 10% of the training dataset assigned incorrectly. It all depends on my confidence in the points used in training, but my application is to answer the question: "Does it have a chance of survival in Canada?", not "Where specifically in Canada could it survive?" -- in quarantine plant pest modelling, where keeping something harmful out by accident is preferable to letting it in by accident.

Martin



Husam El Alqamy

May 23, 2014, 7:37:34 AM
to max...@googlegroups.com
Dear Maryam
You can watch this video on YouTube; it is called "BITC/ ENM - 41- Maxent Outputs 1".

--
Husam El Alqamy, B.Sc., M.Phil.
Sr. Biodiversity GIS Analyst ,
Environmental Information Sector, EIS
Environmental Agency Abu Dhabi,UAE
Antelope Specialist Group, ASG - IUCN

Samuel Veloz

May 23, 2014, 11:43:22 AM
to max...@googlegroups.com
Justifying a threshold to use for presence-only modelling is probably one of the most challenging parts of the whole exercise. Although there are some recent papers on the subject, I am unaware of any general recommendations on which thresholds to use with presence-only data, and I think, as Martin points out, the choice may depend on your question. If you don't have a strong justification for selecting a particular threshold, which I would argue is the case most of the time, one approach might be to try several thresholds and see how they change the results of your study. If your results are not that sensitive to the threshold, then the choice of threshold isn't a big deal. However, if the results are sensitive, you really need to justify why you select the threshold you use; again, Martin had some reasonable explanations for some of the thresholds above. Another point to consider, if your study results are sensitive to the threshold, is whether you need to select a threshold in the first place. Can you get by just using the continuous output?
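
A minimal Python sketch of the "try several thresholds" idea follows. It assumes the study result of interest is the fraction of map cells predicted suitable; the cell values, the threshold values and the function name are all hypothetical.

```python
def suitable_fraction(cell_values, threshold):
    """Share of map cells whose model output reaches the threshold."""
    return sum(v >= threshold for v in cell_values) / len(cell_values)


# Hypothetical continuous model output for ten map cells.
cell_values = [0.01, 0.03, 0.08, 0.12, 0.20, 0.33, 0.41, 0.58, 0.72, 0.88]

# Hypothetical threshold values, as they might be read from a model's results.
candidates = {
    "minimum training presence": 0.02,
    "10 percentile training presence": 0.15,
    "equal training sensitivity and specificity": 0.40,
}

fractions = {name: suitable_fraction(cell_values, t) for name, t in candidates.items()}
spread = max(fractions.values()) - min(fractions.values())

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.0%} of cells predicted suitable")
print(f"spread across thresholds: {spread:.0%}")
```

A large spread, as in this toy case, signals that the threshold choice needs an explicit justification; a small one suggests the conclusions are robust to it.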

Cheers,
Sam

CliMond

May 25, 2014, 5:24:53 PM
to max...@googlegroups.com, Martin Damus
Hiya Martin,

I'm interested in your comment that Minimum Training Presence (AKA Minimum Presence Threshold) "probably over-estimates the range by a good margin with any realistic dataset".  Can you please elaborate?  It seems like you are taking your input location set as given, and applying a universal mildly cynical assessment of them.  I agree that generally, location data aren't to be trusted - without verification. What I find interesting is that you'd advocate introducing an unquantified bias into your model to address the unidentified errors, rather than trying to address them directly.  I would have thought that MTP would have been the default threshold in MaxEnt, given the MaxEnt maxim about the model "agreeing with all that is known..."  Selecting any threshold other than MTP involves explicitly throwing away data (post facto).  I do understand that this may be prudent in some cases, depending on the research question at hand, where the costs of different errors are balanced or opposite those in most invasive species situations.

Can anyone please give me an example of where the equal sensitivity/specificity option was used to address a question?

Cheers
Darren

Martin Damus

May 25, 2014, 7:02:41 PM
to CliMond, max...@googlegroups.com
Hi Darren,

Yeah, in my experience, I've noted that applying the MTP threshold can give wonky results, given that many species' records include collection sites in urban areas where one might question whether they could survive were the urban environment not present -- either due to heat-island effects or heated foundations to snuggle against in winter, or because these areas are probably sinks rather than sources and the sampling of the species there is due to human carriage in plus human discovery, both magnified by larger human populations. I only model those species for my work purposes where there is a question whether they might survive in Canada -- a fairly cold place on average. So the species that I model also live in cold temperate regions where urban effects could be important. Also, I model forest and agricultural pests, for which there might be records in urban areas due to their carriage into them, rather than due to their establishment there. I have no hard evidence (i.e. studies done to prove this), but sometimes the model output doesn't jibe with the general description of the biology of the species. Also, the model doesn't differentiate between a threshold on the cold vs the hot edge of the range -- the number's just a number at that point. The MTP-defining point might be a hot location, and using that same MTP *value* in defining the cold end threshold doesn't always jibe with the organism's biology. I might choose a threshold that is higher than the 'real' MTP because it represents the 'MTP' at the cold end of the range. Remember -- I'm using the model to answer the question "Could this organism survive in Canada?", not "What is its potential range in the Americas?". My models are meant to be 'fit for purpose', not subjects of research papers.

If I can verify the veracity of a location then I do so, always. If the model output suggests that a given presence location is perhaps not to be trusted, then I have to go back to the biology of the organism and assess for myself whether or not to drop that location.

My comment may well have been an over-generalisation of something I note in my rather specific application -- thanks for bringing attention to it.

BTW -- anyone know of a North America-located Maxent training course happening this year?

Cheers,

Martin


Samuel Veloz

May 25, 2014, 11:06:06 PM
to max...@googlegroups.com, CliMond
I agree with Martin's comments above, just to add on. Remember that omission error (falsely predicting an occurrence location as absent) is just one side of the confusion matrix. You can use a threshold that includes all of your occurrence locations but chances are, this threshold will also predict as suitable, locations that aren't really suitable (high commission error) for your species. The MTP may get you the lowest omission error, but you may pay a price with higher commission error. The problem here with presence only data is that you can't really evaluate the commission error so MTP can seem like an attractive choice. The other thresholds are more conservative in that you will probably find that you accept some omission error but will see a reduction in commission error. So if for your question it is important to minimize both types of error, the MTP is probably not the best choice.

Cheers,
Sam


CliMond

May 26, 2014, 1:03:20 AM
to max...@googlegroups.com, CliMond, Martin Damus
Thanks Martin,

I don't doubt that applying the MTP threshold to an unqualified dataset could give wonky results, but the same could probably be said of almost any threshold.  I typically have to spend a day or so with each species distribution dataset extracted from GBIF to clean it up.  As you know, I'm a great fan of the iterative modelling process, letting the model results and an understanding of the species ecology help indicate which points need closer scrutiny.

I understand your heat island issue.  Have you considered creating a composite temperature data set, combining a heat island scenario in urban areas with natural temperature variables elsewhere?

I'm not sure I understand the implication of your point regarding the lack of differentiation between a cold and hot threshold.  The model is fitted to all of the training data - across all gradients.  Towards the edges of the occupied n-dimensional space, the probability surface decreases in magnitude non-linearly as the probability of encountering location points diminishes.  I can imagine that the rate of decline in location point density in covariate space could be different across each of these dimensions, but would MTP really push things out that differently in different ends of the range?  If the MTP at the cold end of the range doesn't jibe with the species biology, that's really a problem with the under-representation of the training data at the cold end of its range, rather than with the threshold, isn't it?  If you adjust the threshold to get your model portrayal to fit the cold end, don't you end up distorting it at the hot end?

BTW, I'm pursuing these questions mostly out of academic interest, so no criticism of the fitness of your modelling for its intended purpose should be implied!

Cheerio
Darren

CliMond

May 26, 2014, 1:39:43 AM
to max...@googlegroups.com, CliMond, Samuel Veloz
Hi Sam,

I understand all of the issues you raise.  For the record, in my original post, I didn't disagree with Martin, I just wanted to know more about his underlying thinking.

Martin and I each work predominantly with invasive species, so we are more naturally concerned with minimising omission errors, and within that constraint, achieving satisfactory model plausibility and specificity.  We also have to deal with transferability in time and space.  For the questions we mostly face, trading off omission and commission errors isn't an option.  The tradeoffs you mention arise from model imperfections, whose negative effects are typically magnified during transference.  To step outside the tradeoff, you have to try to eliminate or minimise input data and covariate errors, and then dig into the model structure and function to increase specificity without sacrificing sensitivity (e.g., in MX, selecting the "don't extrapolate" option is probably a good start).  The artful challenge is increasing specificity without over-fitting the model.

Cheers
Darren

Colin Driscoll

May 26, 2014, 4:14:56 AM
to max...@googlegroups.com
Hi there

I model rare plant distribution and, having trialled most of the options, have generally settled on equal training sensitivity and specificity. I have found that ETSS is better at discriminating between suitable habitat and habitat that the species would definitely not occur in, e.g. a dry land species never occurring in swamps. Also, I expect that some occurrences will not be contained in the model, as described in niche theory, where not all occurrences will result in a long-term established population. But I guess it is easier to model plants compared with highly mobile fauna.

Nice discussion.

Colin

Martin Damus

May 26, 2014, 7:31:20 AM
to CliMond, max...@googlegroups.com
Well, I'm glad at least my initial intent -- to get some discussion going -- has worked! Being in a non-academic position within a relatively small regulatory agency means I have few internal opportunities for learning, so thanks for the questions and further discussion.

This is likely the problem of a 'non-modeller' taking a system, trying to learn it and applying the reasoning he's got available to it and maybe messing up. When I look at the final output I'm not thinking in n-dimensional spaces and the linearity of probability surfaces. My mind doesn't work that way. I see a geographic spread predicted by a model that is bounded by a single value, be it 0.015, 0.08 or whatever the minimum training presence is. When I go to find which data point provided that minimum training presence threshold value I'm looking in 2 or 3-dimensional geographic space and thinking of single climate predictor layers. I may find the point that provided that threshold number in the northern part of the species' range, I may find it in the southern part of the species' range (talking northern hemisphere here). If I find it in the southern end and it's the northern predicted edge that doesn't seem to make sense to me, given what I know about the species' biology, I question whether using the threshold value as indicated by that southern-most point can reflect the threshold value that should be applied in the north. Often I will look for a northern point (if I have good confidence in the distribution of my presence locations) and select its logistic value as the threshold of occurrence. Because cold is the defining feature of Canada, I may select that presence point that experiences the most cold. This is an explanation only -- not an argument for or against the application of my logic. I expect it works more often than not when the question I am asking is whether the species has a chance of surviving in Canada. I'm usually not looking for a detailed predictive presence map, just an indication that it could survive. If it just ekes into Canada the presence map will likely indicate the west coast, the east coast, and maybe southern Ontario (or mildest areas). This makes biological sense and mirrors the distribution of many pests we already have. 
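
The coldest-point approach described above can be sketched in a few lines of Python. This is only an illustration: the record structure, the field names and the temperature values are hypothetical, and the two logistic values are simply the example numbers mentioned above.

```python
from typing import NamedTuple


class Presence(NamedTuple):
    site: str
    logistic_value: float  # model output at the presence point
    min_temp_c: float      # hypothetical coldest-month minimum temperature


def coldest_point_threshold(presences):
    """Use the model value at the coldest trusted presence point as the threshold."""
    coldest = min(presences, key=lambda p: p.min_temp_c)
    return coldest.logistic_value


# Hypothetical presence records; trust in their validity is assumed.
presences = [
    Presence("southern edge", 0.015, 12.0),
    Presence("range core", 0.62, 1.5),
    Presence("northern edge", 0.08, -14.0),
]

mtp = min(p.logistic_value for p in presences)
cold_threshold = coldest_point_threshold(presences)

print("minimum training presence:", mtp)
print("coldest-point threshold:", cold_threshold)
```

Here the minimum training presence comes from the southern point, while the coldest-point rule yields a higher threshold and therefore a tighter predicted northern edge.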

If I distort the hot end I may not care -- it tends to lie south of the Canadian border anyway and I'm not using the model to create policies that affect the United States or Mexico.

The composite scenario ... I'm waiting for it to appear on the Climond website ; )  Glibness aside, no, I haven't done that. My position is not research and my time to do anything 'extra' is limited. When we have research funds they normally go to support external research that advances our needs in some way.

Criticism is fine and happily accepted, so long as it intends to be helpful and points out a better way !!! : ) (which I would say it so far has been, thanks)

Cheers,
Martin



Maryam Bordkhani

May 26, 2014, 4:40:48 PM
to max...@googlegroups.com
Dear Martin, Husam, Sam, Darren and Colin

Hi all. Many thanks to Martin for getting the discussion going, and I apologize for the delay in responding to your comments.
In general I think selecting a threshold depends on the aim of the study, as Martin has pointed out, so I agree with Martin about the "biological meaning of the threshold".
Your comments are very interesting, but I need to study and think more about this issue before commenting.
Anyway, if this discussion continues in future it will be interesting and useful for people like me. Thanks to all.
 
Yours truly
------------------------------------------------------------------------
Maryam Bordkhani
M.Sc.  Student of Environmental Sciences
Department of Environment, Faculty of Natural Resources,
Isfahan University of Technology, Isfahan, Iran
          m.bor...@na.iut.ac.ir
