Cross-validation: how to set it up

Genevieve

May 6, 2011, 5:50:57 AM

to Maxent

Dear All,

I am a relatively new user of Maxent and would like to know how K-fold
cross-validation is set up, where K is equal to 10. I am running 10
replicate models on relatively small datasets (sample sizes range from
30 to 280), using a 25% random test percentage. Do I physically need to
create 10 SWD subsets from the full dataset, specifying a different
training and test dataset each time I run the model? Or will Maxent
automatically separate the full dataset (input as a samples SWD file)
into 10 subsample datasets?

Any input would really be appreciated.

Many thanks,
Gen

Heather Peacock

May 6, 2011, 3:07:25 PM

to max...@googlegroups.com

Hi Genevieve,

You don't need to create the random subsets; Maxent does it for you.
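
For reference, here is a minimal sketch of how such a run might be launched from a script so that Maxent does the partitioning itself. It assumes the `replicates` and `replicatetype` options of Maxent's command-line interface; the jar location, the file names, and the `build_maxent_command` helper are hypothetical, so check the flags against the help for your Maxent version.

```python
def build_maxent_command(samples_swd, background_swd, out_dir, k=10,
                         jar="maxent.jar"):
    """Return an argument list asking Maxent for k-fold cross-validation.

    All paths are placeholders (assumptions for illustration only).
    """
    return [
        "java", "-jar", jar,
        f"samplesfile={samples_swd}",             # presence records in SWD format
        f"environmentallayers={background_swd}",  # background points in SWD format
        f"outputdirectory={out_dir}",
        f"replicates={k}",                        # number of replicate runs
        "replicatetype=crossvalidate",            # let Maxent split the records into folds
        "autorun",                                # start without waiting for the GUI
    ]


cmd = build_maxent_command("species_swd.csv", "background_swd.csv", "out")
print(" ".join(cmd))
# To actually launch the run: subprocess.run(cmd, check=True)
```

The same two settings are available in the GUI under the settings dialog, so no manually created subsets are needed either way.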

I would caution you about using 10 subsets with such small data sets. For example, with 30 occurrence records and 10-fold cross-validation, only 3 points are held out for testing in each replicate run, which may not be enough to evaluate the model adequately. Some authors suggest that 5 replicates are acceptable for smaller data sets. I had small datasets myself and used 4-fold cross-validation, plus bootstrapping for those with fewer than 25 records.
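
The fold arithmetic can be checked with a short sketch. This is a generic k-fold partition written for illustration, not Maxent's internal code; the `kfold_splits` helper and the fixed seed are assumptions. With 30 records and 10 folds, every replicate trains on 27 records and tests on 3; fewer folds leave more test points per replicate.

```python
import random


def kfold_splits(n_records, k, seed=0):
    """Randomly partition record indices into k folds.

    Returns a list of (train_indices, test_indices), one pair per fold.
    Every record lands in exactly one test fold.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds, round-robin.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# Compare training/test sizes for 30 records under 10, 5 and 4 folds.
for k in (10, 5, 4):
    sizes = sorted({(len(train), len(test)) for train, test in kfold_splits(30, k)})
    print(f"k={k}: (train, test) sizes per replicate = {sizes}")
```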

Hope this helps!

Heather

--
You received this message because you are subscribed to the Google Groups "Maxent" group.
To post to this group, send email to max...@googlegroups.com.
To unsubscribe from this group, send email to maxent+un...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/maxent?hl=en.

Genevieve

May 10, 2011, 5:12:49 AM

to Maxent

Hi Heather,

Thanks for the advice!

Hmm, I am working with a species complex, so would it not be best to
use the same approach for all datasets within the complex? I just
looked at the Maxent results file for the smallest dataset: Maxent
used 30 samples to train and 9 or 10 samples to test. Should I still
be using bootstrapping?

Thanks again,
Genevieve

Heather Peacock

May 10, 2011, 2:55:41 PM

to max...@googlegroups.com

Yes, you should be consistent with your methods. With 30+ samples, bootstrapping is not necessary and cross-validation is best. I was simply suggesting using fewer than 10 folds with the smaller datasets. Hopefully it works out for you.
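
To make the contrast with cross-validation concrete, here is a rough sketch of one bootstrap replicate. It is generic resampling with replacement, written under the assumption that each replicate draws as many records as the dataset contains; the `bootstrap_replicate` helper is hypothetical and is not taken from Maxent.

```python
import random


def bootstrap_replicate(n_records, seed=0):
    """Draw one bootstrap training sample of size n_records, with replacement.

    Returns (train_indices, left_out_indices). Some records appear more than
    once in the training sample; the records never drawn are left out.
    """
    rng = random.Random(seed)
    train = [rng.randrange(n_records) for _ in range(n_records)]
    left_out = sorted(set(range(n_records)) - set(train))
    return train, left_out


train, left_out = bootstrap_replicate(20, seed=1)
print(f"{len(train)} training draws, {len(set(train))} distinct records, "
      f"{len(left_out)} records never drawn")
```

Unlike k-fold cross-validation, the training sample keeps the full size of the dataset, which is why bootstrapping tends to be suggested only for the very smallest datasets.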
