fewer occurrences, higher AUC

Allison Howard

Jan 27, 2014, 3:43:03 PM
to max...@googlegroups.com
I'm curious whether anyone has run into AUC values increasing when the number of occurrence points used to build and test the model is reduced. I started with a data set of 8600 presence points and subsequently thinned that sample to look at the effect of clustering on my model. What I saw was that AUC increases as the number of presence points decreases. I reduced the points by taking subsets of my data set with minimum buffers of 30, 25, 20, 10, and 5 m between points.
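A minimal sketch of the kind of distance-based thinning described above, assuming the presences are stored as (x, y) coordinates in a projected system measured in metres. `thin_points` is a hypothetical helper, not necessarily the procedure used here: it visits the points in a shuffled order and keeps a point only if it lies at least `min_dist` from every point already kept, so the retained subset depends on that order.

```python
import math
import random


def thin_points(points, min_dist, seed=None):
    """Greedy distance-based thinning (hypothetical helper).

    points   -- iterable of (x, y) tuples, assumed to be projected metres
    min_dist -- minimum distance, in metres, allowed between retained points
    seed     -- optional seed for the shuffled visiting order
    """
    order = list(points)
    random.Random(seed).shuffle(order)  # the result depends on this order
    kept = []
    for x, y in order:
        # retain the point only if it is at least min_dist from every kept point
        if all(math.hypot(x - kx, y - ky) >= min_dist for kx, ky in kept):
            kept.append((x, y))
    return kept


if __name__ == "__main__":
    rng = random.Random(1)
    # synthetic clustered presences: 20 clusters of 50 points each
    centres = [(rng.uniform(0, 5000), rng.uniform(0, 5000)) for _ in range(20)]
    presences = [(cx + rng.gauss(0, 15), cy + rng.gauss(0, 15))
                 for cx, cy in centres for _ in range(50)]
    for buffer_m in (30, 25, 20, 10, 5):
        kept = thin_points(presences, buffer_m, seed=42)
        print(f"{buffer_m} m buffer: {len(kept)} of {len(presences)} points kept")
```

The pairwise check costs roughly n·k distance computations for n points and k retained points, which is workable for a few thousand presences; much larger sets would call for a spatial index.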

I haven't found any explanation of why this might work this way, but perhaps I'm just missing something. 

thanks

Francisco Rodriguez Sanchez

Jan 28, 2014, 5:16:35 AM
to max...@googlegroups.com
Hi Allison,

I guess that may be related to the fact that as you reduce the number of
occurrences the model is more able to discriminate between presences and
absences, which is what AUC conveys. It has long been known that AUC is
higher with restricted species (i.e. fewer occurrences). If you have
many presences, the species is almost everywhere, and the model can
hardly discriminate between presences and absences. See related thread here:
https://groups.google.com/forum/#!topic/maxent/LfAJjkc_2Ts
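The effect described above can be illustrated with a toy presence-background simulation. Everything in it is an assumption for illustration: a single environmental variable uniform on [0, 1], a species occupying the cells within `niche_width / 2` of an optimum at 0.5, and a "model" that is simply the true suitability, so any drop in AUC comes from range breadth rather than model error. AUC is computed as the probability that a random presence outscores a random background point, the Mann-Whitney form of the ROC area.

```python
import random


def auc(presence_scores, background_scores):
    """Presence-background AUC: the probability that a randomly chosen presence
    scores higher than a randomly chosen background point (ties count half)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def suitability(e):
    # toy "perfect" model: suitability falls off linearly from the optimum at 0.5
    return -abs(e - 0.5)


def simulate_auc(niche_width, n_pres=300, n_bg=1000, seed=0):
    """AUC of the perfect model for a species occupying e in 0.5 +/- niche_width/2."""
    rng = random.Random(seed)
    half = niche_width / 2
    presences = [rng.uniform(0.5 - half, 0.5 + half) for _ in range(n_pres)]
    # background points are drawn from the whole landscape, occupied or not
    background = [rng.uniform(0.0, 1.0) for _ in range(n_bg)]
    return auc([suitability(e) for e in presences],
               [suitability(e) for e in background])


if __name__ == "__main__":
    for width in (0.1, 0.5, 1.0):
        print(f"niche width {width}: AUC {simulate_auc(width):.3f}")
```

For this set-up the expected AUC works out to 1 - niche_width / 2: about 0.95 for a narrowly distributed species (width 0.1) and 0.5 for one found everywhere (width 1.0), even though the model is perfect in every case.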

Hope it helps

Cheers

Paco
> --
> You received this message because you are subscribed to the Google
> Groups "Maxent" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to maxent+un...@googlegroups.com.
> To post to this group, send email to max...@googlegroups.com.
> Visit this group at http://groups.google.com/group/maxent.
> For more options, visit https://groups.google.com/groups/opt_out.

--
Dr Francisco Rodriguez-Sanchez
Forest Ecology and Conservation Group
Department of Plant Sciences
University of Cambridge
Downing Street
Cambridge CB2 3EA
United Kingdom
http://sites.google.com/site/rodriguezsanchezf

Hypolite Bayor

Jan 28, 2014, 10:12:56 AM
to max...@googlegroups.com
This will occur if the remaining points are located in more homogeneous environments than the original points. It does not necessarily mean a better model, and in fact it may result from biased sampling. A lower AUC obtained from a fairly well-distributed sample representing the natural distribution of the organism modelled is better than a higher AUC from a model built on locations that do not represent the distribution of the organism.
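A toy illustration of this point under a made-up set-up: one environmental variable uniform on [0, 1], a species that truly occupies 0.2 to 0.8, and a crude hypothetical envelope model (`fit_envelope`, a stand-in and not Maxent) that ranks sites by distance from the mean environment of its training presences. The biased sample is assumed to be confined to a narrow, homogeneous band (0.65 to 0.75) of the range.

```python
import random


def auc(presence_scores, background_scores):
    """Presence-background AUC with ties counted as half."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def fit_envelope(train_env):
    """Hypothetical stand-in for a fitted model (not Maxent): sites score higher
    the closer they are to the mean environment of the training presences."""
    centre = sum(train_env) / len(train_env)
    return lambda e: -abs(e - centre)


def evaluate(model, presences, background):
    return auc([model(e) for e in presences], [model(e) for e in background])


rng = random.Random(0)
background = [rng.uniform(0.0, 1.0) for _ in range(1000)]
# representative sample spanning the species' true range, 0.2 to 0.8
representative = [rng.uniform(0.2, 0.8) for _ in range(300)]
# biased sample confined to a narrow, homogeneous band of that range
biased = [rng.uniform(0.65, 0.75) for _ in range(300)]

rep_model = fit_envelope(representative)
biased_model = fit_envelope(biased)

auc_rep = evaluate(rep_model, representative, background)
auc_biased_own = evaluate(biased_model, biased, background)
auc_biased_on_rep = evaluate(biased_model, representative, background)

print(f"representative model, own presences: {auc_rep:.3f}")
print(f"biased model, own presences:         {auc_biased_own:.3f}")
print(f"biased model, representative data:   {auc_biased_on_rep:.3f}")
```

With these settings the expected values are roughly 0.70, 0.95 and 0.60 respectively: the biased sample earns the higher AUC on its own presences while describing the species' distribution worse.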

Dr. Hypolite Bayor

Marcelo Lima

Feb 5, 2014, 5:57:45 AM
to maxent
Hi Allison, I am curious why you are using Maxent in this case, i.e. with 8600 points.
Cheers
Marcelo

--
Dr. Marcelo Gonçalves de Lima
Senior Programme Officer - Protected Areas Effectiveness

Protected Areas Programme
United Nations Environment Program, World Conservation Monitoring Centre
219 Huntingdon Road, Cambridge, CB3 0DL, UK

http://www.unep-wcmc.org http://www.protectedplanet.net


IUCN - CEM member
Biologist, PhD in Ecology
http://lattes.cnpq.br/1539538568877382

John Baumgartner

Feb 5, 2014, 7:08:56 AM
to maxent

Now I'm curious, Marcelo. Do you think Maxent is inappropriate for presence-only modeling with large datasets?

John Baumgartner

Feb 5, 2014, 7:14:50 AM
to maxent

One thing you haven't told us, Allison, is whether you're referring to AUC calculated on training data or test data, and if the latter, which data? I would be more surprised if this pattern arose with AUC calculated on your huge withheld dataset.
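To see why the training/test distinction matters, here is a sketch with a deliberately over-flexible toy model (`fit_memoriser`, hypothetical and not Maxent) that scores a site by its distance to the nearest training presence. It assumes a made-up landscape with one environmental variable uniform on [0, 1] and a species occupying 0.2 to 0.8; one third of the presences are withheld for testing.

```python
import random


def auc(presence_scores, background_scores):
    """Presence-background AUC with ties counted as half."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def fit_memoriser(train_env):
    """Deliberately over-flexible toy model (hypothetical, not Maxent): a site's
    score is minus its distance to the nearest training presence."""
    pts = list(train_env)
    return lambda e: -min(abs(e - t) for t in pts)


rng = random.Random(1)
presences = [rng.uniform(0.2, 0.8) for _ in range(300)]  # true range 0.2 to 0.8
background = [rng.uniform(0.0, 1.0) for _ in range(1000)]
rng.shuffle(presences)
train, test = presences[:200], presences[200:]  # withhold one third for testing

model = fit_memoriser(train)
bg_scores = [model(e) for e in background]
train_auc = auc([model(e) for e in train], bg_scores)
test_auc = auc([model(e) for e in test], bg_scores)
print(f"training AUC {train_auc:.3f}, test AUC {test_auc:.3f}")
```

Because every training presence sits exactly on a point the model memorised, training AUC is essentially 1.0, while test AUC on the withheld presences should fall to around 0.7, which in this set-up is also the best any model can achieve in expectation. Training AUC alone therefore says little about how a model built from a thinned subsample would perform on independent data.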

Marcelo Lima

Feb 5, 2014, 8:09:58 AM
to maxent
Hi John, I had the impression that it was actually more useful with much smaller datasets, like 20 or so points? 8600 is an impressive dataset!
Best
Marcelo

Megan S

Feb 7, 2014, 7:45:26 AM
to max...@googlegroups.com
Hi Marcelo,

Are there any modeling programs that you would recommend for a large dataset?

Thanks,

Megan

John Baumgartner

Feb 7, 2014, 4:38:50 PM
to maxent

While Maxent can be effective with relatively small presence-only datasets, it's no less suitable for large ones. All else being equal, a large dataset provides more information with which to characterise the species-environment relationship. It still pays to check response curves to ensure they're sensible and, if there are signs of overfitting, to adjust feature types and regularisation accordingly.

I'm still interested in whether people have recommendations for alternative approaches that might better exploit large sample size.
