Threshold questions for binary maps


Camille

Jul 30, 2012, 3:09:10 AM
to max...@googlegroups.com
Dear all,

I have 3 questions regarding the thresholds:

1- I ran a model (10-fold cross-validation, for a small sample of 8 occurrence points) and got exactly the same threshold values for the 'minimum training presence logistic' and the '10 percentile training presence'. Any idea why? I have not seen that before when I ran the model with the subsample replication type. I want to make binary maps and I usually use the minimum threshold, but this time it shrank the suitable habitat of my species way too much. Any thoughts?

2- I am not sure what the difference is between the 'minimum training presence logistic' and the 'minimum presence area' within the maxentResults.csv file. Can someone explain? Is it just that the minimum presence area includes every single area of the distribution that is >0, while the minimum training presence just reduces it a little according to the threshold?

3- One thing I have been wondering is what difference it makes when we set the threshold rule BEFORE running the model. What I usually do is not set a threshold rule BEFORE, but get my threshold value from the maxentResults.csv file AFTER to create my binary maps, because I always want to make binary maps and always use the minimum training presence one (for the study I am working on at the moment). So should I set the threshold rule before? What difference does it make?

Any advice, comments, etc. would be much appreciated!

Thanks to the community;

Best,
Camille.

John Baumgartner

Jul 30, 2012, 9:24:48 PM
to max...@googlegroups.com
Hi Camille,

1. Minimum Training Presence uses the suitability associated with your least suitable training presence record as the threshold. 10 Percentile Training Presence uses the suitability associated with the presence record that occurs at the 10th percentile of presence records (i.e. the suitability of the presence record below which 10% of presence records' suitabilities fall). In your case, it looks like these should be equivalent, because you are using 10-fold cross-validation but have fewer than 10 records, so the record at the 10th percentile is also the least suitable one (although I'm not sure what Maxent does here... does it stop at 8 folds?).

2. For each threshold type, the maxentResults.csv file gives (1) the cumulative value associated with that threshold (the cumulative value for a particular location indicates the proportion of pixels or sample points that have a "probability of occurrence" equal to or less than that for the focal location), (2) the logistic value associated with that threshold, (3) the omission rate on the training sample (i.e. the proportion of training/test presence sites incorrectly predicted unsuitable, given the choice of threshold), and (4) the proportion of points/pixels predicted suitable, given that threshold. So to answer your question, using Minimum Training Presence Logistic as a threshold means that you are considering as suitable all sites that are at least as suitable as the least suitable site in your training set. The Minimum Training Presence Area indicates the proportion of training (combined with test? not sure) locations that are predicted as suitable, given the MTP threshold.

3. Setting Apply threshold rule in Advanced settings (or by using the command line arg 'applythresholdrule=Minimum training presence') will create a binary grid of suitability calculated at your specified threshold, in addition to the regular outputs. (Help states: "Apply a threshold rule, generating a binary output grid in addition to the regular prediction grid. Use the full name of the threshold rule in Maxent's html output as the argument. For example, 'applyThresholdRule=Fixed cumulative value 1'.")
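
The quantities in point 2 can be illustrated with a minimal base R sketch. The suitability values below are made up, and `train_suit` and `bg_suit` are hypothetical names rather than Maxent output; the sketch only shows how a threshold, an omission rate and a predicted-suitable proportion relate to each other.

```r
# Hypothetical logistic suitabilities at 8 training presences (made-up values)
train_suit <- c(0.21, 0.35, 0.48, 0.52, 0.60, 0.66, 0.73, 0.81)
# Hypothetical logistic suitabilities at 10 background pixels (made-up values)
bg_suit <- c(0.02, 0.05, 0.10, 0.18, 0.21, 0.30, 0.44, 0.55, 0.70, 0.90)

# Minimum training presence threshold: lowest suitability at a training presence
mtp <- min(train_suit)

# Training omission rate: share of training presences predicted unsuitable
omission <- mean(train_suit < mtp)

# Share of pixels predicted suitable at this threshold
area <- mean(bg_suit >= mtp)

print(c(threshold = mtp, omission = omission, area = area))
```

By construction the omission rate at the minimum training presence threshold is zero, since no training record falls below the least suitable one.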

Hope that helps.

John

Camille

Jul 31, 2012, 6:38:32 AM
to max...@googlegroups.com
Thanks for the reply!

I am still puzzled, though, by the fact that the 'minimum training presence' and '10 percentile training presence' threshold values are the same for just that species (I am modelling 4 different species, and these two thresholds are the same for only one of them, no matter what run type I use). I ran the model again using subsampling (25:75) and again the 2 thresholds were the same. Then I ran it again with cross-validation (8 folds, since yes, Maxent stops after 8 anyway as I only have 8 occurrence points), and it again gave the same value for the two thresholds. I don't really know what to do because my binary map doesn't seem right. There seems to be something wrong here, because it means that no matter which threshold I decide to use for that species, I get the same binary map...

I also tried to find the binary grid that is supposed to be produced when setting a threshold before the run, but I don't know where to find it...??



Camille.

John Baumgartner

Jul 31, 2012, 9:02:50 AM
to max...@googlegroups.com
Hi again,

Do your other species have more than 10 presence records? If so, that would explain why the other species do not show an identical value for the minimum training presence and the 10 percentile training thresholds. Regardless of whether/how you split up your 8 records into training and test sets, you will never have more than 8 records in the training set. If, as in your case, you are training the model on 75% of the data (i.e. 6 records), and testing on the remaining 25% (2 records), then your minimum training presence threshold will be the suitability value associated with the least suitable of those 6 training sites. However, because you have fewer than ten training records, the record that occurs at the 10th percentile will also be the least suitable of these 6 sites.
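
The argument above can be checked with a small base R sketch. It assumes one plausible percentile rule (take the sorted value at position floor(0.10 * n) + 1, i.e. the largest record-based threshold that omits at most 10% of training records); this is an assumption for illustration, not Maxent's documented implementation, and all suitability values are made up.

```r
# Suitability at the record marking the lowest `pct` share of training presences.
# Assumption: threshold = sorted value at position floor(pct * n) + 1.
percentile_threshold <- function(suit, pct = 0.10) {
  s <- sort(suit)
  s[floor(pct * length(s)) + 1]
}

six <- c(0.35, 0.21, 0.60, 0.48, 0.73, 0.52)  # hypothetical, 6 training records
twenty <- seq(0.05, 1.00, by = 0.05)          # hypothetical, 20 training records

# With fewer than 10 records the position is always 1, i.e. the minimum
print(percentile_threshold(six) == min(six))

# With 20 records the percentile threshold sits above the minimum
print(percentile_threshold(twenty) > min(twenty))
```

Under this rule, any training set with fewer than 10 records gives floor(0.10 * n) = 0, so the two thresholds coincide however the records are split.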

As for your thresholded ascii (or whatever format you've specified)... The file will be called something like species_thresholded.asc, and can be found in the output directory. It will only be made if you are projecting your model to grids of your predictors. The predict function for projecting Maxent models with dismo doesn't seem to notice the 'applythresholdrule' argument. But if you're running from R, then you can easily threshold with something along the lines of:

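# Cells at least as suitable as the least suitable training presence count as suitable
mtp <- me.fit@results['Minimum.training.presence.logistic.threshold', 1]
me.pred.thresholded <- me.pred >= mtp
plot(me.pred.thresholded)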

where me.fit is the trained Maxent model (using maxent() in the dismo package), and me.pred is the projection from predicting me.fit.

That's the easiest way in my opinion, but you could also call Maxent from R using system(), which I believe should listen to the applythresholdrule argument.
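
A rough sketch of that second route follows. It uses `system2()` rather than `system()` for easier argument handling. The jar location, layer directory, samples file and output directory are all hypothetical, and the flag names other than `applythresholdrule` are as I recall them from the Maxent tutorial, so check them against Maxent's own help before relying on this.

```r
# Hypothetical location of the Maxent jar; replace with your own path
jar <- "maxent.jar"

# Command-line arguments; all paths are hypothetical placeholders
args <- c(
  "-jar", jar,
  "environmentallayers=layers",
  "samplesfile=samples/species.csv",
  "outputdirectory=outputs",
  # Quoted because the rule name contains spaces
  shQuote("applythresholdrule=Minimum training presence"),
  "autorun"
)

# Show the command that would be run
cat("java", args, "\n")

# Only attempt the run when the jar is actually present
if (file.exists(jar)) {
  system2("java", args)
}
```

If the run succeeds, the thresholded grid should appear in the output directory alongside the regular prediction grid, as described above.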

Cheers,
John

Camille

Jul 31, 2012, 11:18:21 AM
to max...@googlegroups.com
Thanks so much for your time to respond!
Very useful information...
I am learning a little more about Maxent every day!

Yes, my 3 other species have occurrence data >10...so now I understand better...

One last thing: can the 'Balance training omission, predicted area and threshold value' threshold (in maxentResults.csv) be used to make the binary maps?
I checked all my outputs and it seems that this one would best represent the logistic output pictures provided by Maxent, and would include most of the probability distribution, which is kind of what I am after, since I am interested in observing the ecological niche differences between the 4 species rather than predicting a conservative distribution intended for species conservation or management.
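
Mechanically, any threshold listed in maxentResults.csv can be applied the same way as the minimum training presence one. The base R sketch below makes two assumptions: that dismo turns the CSV names into row names the way `make.names()` does (as the row name used earlier in this thread suggests), and that the grid and threshold values shown are purely hypothetical.

```r
# Threshold names as they appear in maxentResults.csv
rule_names <- c(
  "Minimum training presence logistic threshold",
  "Balance training omission, predicted area and threshold value logistic threshold"
)

# Assumption: the results row names follow make.names() conventions
row_names <- make.names(rule_names)
print(row_names)

# Hypothetical suitability grid and hypothetical threshold values
suit <- matrix(c(0.05, 0.12, 0.30, 0.45, 0.08, 0.22, 0.51, 0.77, 0.65), nrow = 3)
thresholds <- c(mtp = 0.21, balance = 0.10)

# Share of cells marked suitable under each threshold; lower thresholds mark more
suitable_share <- sapply(thresholds, function(t) mean(suit >= t))
print(suitable_share)
```

Whether a given rule is appropriate is a modelling decision; the sketch only shows that a lower threshold value produces a larger suitable area on the binary map.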

Thank you,
Best,
Camille.