Sub-sample, cross-validation and bootstrapping, when to do which one?


Hank

Feb 7, 2013, 4:29:30 AM
to max...@googlegroups.com
Hi all

Can anybody tell me how to choose the replicated run type (subsample, cross-validation or bootstrap)? I know the advantages of each, but none of them can be ruled out, and each one gives me a different prediction map. On what basis should I pick the replicated run type: AUC, or the nature of the data?

Cheers

David Galbraith

Feb 8, 2013, 1:06:10 PM
to max...@googlegroups.com
Most folks would probably say the nature of the data, but I think it is kind of fun to try them all just for kicks.



tjsch...@gmail.com

Feb 9, 2013, 11:29:23 AM
to max...@googlegroups.com
Hello Hank,

Here is my understanding of the three different replicated run types:

Cross-validation: Maxent splits your occurrence data into k folds and, in each run, trains on k-1 folds and tests on the remaining one. Here you cannot choose the percentage of occurrence data withheld for model validation (test occurrences), because each fold serves as the test set exactly once; as far as I know, the number of replicates you enter is used as the number of folds. I believe this is optimal if you have a large number of species occurrences.

Subsample: Here you can set both the number of replicates and the percentage of occurrences withheld for testing in each replicated run. This method suits modelers who want to control the number of reps and the percentage of withheld test occurrences, and who have a moderate to large number of occurrences for their species of interest.

Bootstrapping: This method is optimal for modelers with few occurrences, because Maxent is allowed to test the model with occurrences that may also have been used to train it. Two potential problems with this method are that you lose statistical independence between your test and training data, and that your AUC values will end up slightly inflated; however, if you are limited in occurrence data, this may be your best option. I would use multiple approaches to assess model discrimination rather than rely on Maxent's AUC values alone.
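
To make the difference between the three schemes concrete, here is a minimal Python sketch of how each one picks occurrence records. It is only an illustration of the general resampling techniques, not Maxent's internal code; the function names and the seed parameter are made up for this example, and records are represented by their index numbers 0..n-1.

```python
import random

def kfold_splits(n, k, seed=0):
    """Cross-validation: every record is a test record in exactly one run."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    return [(sorted(set(idx) - set(fold)), sorted(fold)) for fold in folds]

def subsample_splits(n, reps, test_fraction, seed=0):
    """Subsample: repeated random hold-out, drawn without replacement."""
    rng = random.Random(seed)
    n_test = round(n * test_fraction)
    splits = []
    for _ in range(reps):
        test = set(rng.sample(range(n), n_test))
        train = [i for i in range(n) if i not in test]
        splits.append((train, sorted(test)))
    return splits

def bootstrap_samples(n, reps, seed=0):
    """Bootstrap: n training records drawn WITH replacement,
    so some records repeat and others are left out of a given run."""
    rng = random.Random(seed)
    return [[rng.randrange(n) for _ in range(n)] for _ in range(reps)]
```

With 100 records, kfold_splits(100, 10) gives 10 runs of 90 training and 10 test records each, subsample_splits(100, 5, 0.2) gives 5 runs of 80 training and 20 test records, and bootstrap_samples(100, 3) gives 3 training samples of 100 draws that contain duplicates.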

Hope this is helpful.

Best,
Tom

hakhandan

Feb 10, 2013, 1:06:04 AM
to max...@googlegroups.com
Hi Tom

Thanks a lot, but how many occurrences would be considered limited, and how many would be considered a lot? I am trying to model at a global scale and I have 100 presence points, which I would say are well distributed. I tried all three methods, but which one should I choose, and on what basis?
Secondly, since the study area is large relative to the limited number of points I have, I know a high AUC is to be expected, and that is what I get (0.992!). What other criteria can I look at to judge model performance?

Cheers




David Galbraith

Feb 11, 2013, 7:53:50 AM
to max...@googlegroups.com
Could you get away with removing 20% of the points for an independent test, then using cross-validation on the rest for model creation and tuning, and finally testing on the withheld set? It might not be worth the loss of data for model creation though, since models are just approximations of reality anyway.
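
A rough Python sketch of that idea, in case it helps: set 20% of the records aside once, then build cross-validation folds from the remaining 80% only. The function name, the default of 5 folds and the seed are assumptions for illustration, and the records are again just index numbers; nothing here calls Maxent itself.

```python
import random

def holdout_then_kfold(n, test_fraction=0.2, k=5, seed=0):
    """Split n records into a final test set plus k-fold splits of the rest."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)

    n_test = round(n * test_fraction)
    final_test = idx[:n_test]    # never used for fitting or tuning
    development = idx[n_test:]   # used for cross-validation only

    folds = [development[i::k] for i in range(k)]
    cv = []
    for i, fold in enumerate(folds):
        # Train on every development fold except the current one.
        train = [j for m, f in enumerate(folds) if m != i for j in f]
        cv.append((train, fold))
    return final_test, cv

final_test, cv = holdout_then_kfold(100)
```

With 100 presence points this leaves 20 points for the final test and 80 for cross-validation, so each of the 5 runs trains on 64 points and validates on 16.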

Hope that helps,
Dave

Hank

Jun 16, 2013, 5:19:44 PM
to max...@googlegroups.com
Hi Tom
Thanks for the previous reply. Is there any rule of thumb for sub-sampling? I mean, what is the usual split (I have read 80 percent for training and 20 percent for testing)? Is there any reference for finding the appropriate split percentage?

Cheers

Kris

Aug 14, 2013, 5:41:56 PM
to max...@googlegroups.com
Is subsample the jackknife approach that Pearson referred to in his 2007 paper? If it is not, has anyone done this, and does it mean that I need to make multiple occurrence samples?

sebastian...@gmail.com

Jan 21, 2018, 10:09:22 PM
to Maxent
Hi Tom,

Thanks for this explanation. Do you have a citation for this, such as an article or a book?

Best,
Sebastian