Maxent results

Martin Damus

Apr 23, 2008, 10:35:28 AM
to max...@googlegroups.com
Hello,
 
Some questions in a long email from a newbie.
 
I am using maxent as a tool, and am not familiar enough with the mathematical theory to understand it from a technical viewpoint. I need layman's jargon, and I'm hoping someone can give it to me.
 
The descriptors in the output table leave me mostly baffled. Going line by line we have:
 
1) Fixed cumulative value of 1: in my current results this is a logistic threshold of 0.027, and a fractional predicted area of 0.255, training omission rate of 0.002 and test omission rate of 0.086. P value is 0E0.
 
To me this means that in the output map, if I look at all the area "scored" 0.027 and above, this is 0.255 of the total area predicted by the model, and 0.002 of the training data points lie outside this area, while 0.086 of the test points lie outside of it. If I choose this as my "area of climatic suitability" I can say that it is highly likely (P 0E0) that the species will be found within this area. IS THIS THE CORRECT INTERPRETATION?
 
This continues with significant P-values through the next two, which I interpret similarly.
 
The fourth row is described as "minimum training presence" -- I presume this is now the area, threshold etc. where none of the training samples are excluded, hence the large predicted area (0.815), low logistic threshold (0.002) and training omission rate of 0. IS THIS CORRECT?
 
The fifth row is the "10 percentile training presence", which is interpreted similarly, that is, it is the threshold, fractional area etc. that includes 90% of the training points. CORRECT? In my model, the p-value for this row is 0.3, signifying that if I were to designate this logistic threshold level (0.431) as the area in which my species would find suitable habitat, I cannot support that statement with these results -- the area is too small and I would miss a significant area of suitable habitat. CORRECT?
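
For readers following along, the quantities in these rows can be reproduced from raw scores. The sketch below is only an illustration: the scores are invented, the function names are hypothetical, and Maxent's own handling of ties and rounding may differ. It assumes the fractional predicted area is the share of background cells scoring at or above the threshold.

```python
# Hypothetical logistic scores, invented for illustration only.
train_scores = [0.05, 0.20, 0.35, 0.45, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95]
background_scores = [0.01, 0.02, 0.03, 0.10, 0.15, 0.25, 0.40, 0.55, 0.75, 0.85]


def omission_rate(scores, threshold):
    """Fraction of presence records that score below the threshold."""
    return sum(1 for s in scores if s < threshold) / len(scores)


def fractional_predicted_area(scores, threshold):
    """Fraction of background cells that score at or above the threshold."""
    return sum(1 for s in scores if s >= threshold) / len(scores)


def minimum_training_presence(scores):
    """Lowest score at any training record, so no training record is omitted."""
    return min(scores)


def percentile_training_presence(scores, percent=10):
    """Threshold that omits at most `percent` percent of the training records."""
    ordered = sorted(scores)
    n_omit = len(ordered) * percent // 100  # number of records allowed to drop
    return ordered[n_omit]


for name, threshold in [
    ("minimum training presence", minimum_training_presence(train_scores)),
    ("10 percentile training presence", percentile_training_presence(train_scores)),
]:
    print(
        name,
        "threshold:", threshold,
        "training omission:", omission_rate(train_scores, threshold),
        "fractional area:", fractional_predicted_area(background_scores, threshold),
    )
```

Raising the threshold from the first rule to the second trades a smaller predicted area for a higher omission rate, which is the trade-off each row of the table reports.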
 
This is the end of the part I think I understand.
 
The next row is described as "Equal training sensitivity and specificity" and I do not understand the meaning of that. Ditto for the next row "Maximum training sensitivity plus specificity". If I understood these two I think the next two, which are the same for the test samples, I could figure out. COULD ANYONE PLEASE EXPLAIN THESE TO ME?
 
Then comes "balance training omission, predicted area and threshold value". Again, I'm lost.
 
Ditto for the last one. ANYONE WANT TO TRY TO EXPLAIN THESE TO THIS LAYMAN?
 
How do others use these results? What level of logistic threshold do you use to signify suitable area, especially if you are projecting into a geographic area your organism of interest does not already occupy?
 
Finally, I read in this group's archive that using a mask variable or otherwise restricting the environmental layers to only those areas (provinces, countries, etc.)  in which the organism is known to occur improves the quality of the model and its predictive value. In my modelling, I have data for where an insect is native, and where it has been introduced (more than 100 years ago). When I cut down the training environmental layers to circumscribe the known native range and then project onto the environmental layers of the known introduced range, the visual fit with the introduced range is better than when my environmental layers include the entire earth. BUT, I cannot get any of the test-training statistics because it seems Maxent wants all the data (test and training) to be within the geographical bounds of the training environmental layers. Is there any way around this?
 
That's all for now, thanks!
Martin Damus
Canadian Food Inspection Agency
Ottawa, Canada


Adam

Apr 29, 2008, 4:23:13 PM
to Maxent
Hello,

Don't know if you got any help on this...however, I am currently
taking a class on Maxent at CUNY and everyone in it is wishing for the
same exact thing. The best thing to do is to go to www.scholar.google.com
and look for articles by Phillips, Pearson, Anderson, Peterson, or
just Maxent articles in general....I am currently working on my final
project and am wishing for a cheat-sheet of defined terms. This field
has its own lingo. I am a primatologist studying this as a
supplemental skill, and it is killing my classmates and me. If anyone
sent you a definitions cheat sheet, we would love to have it! Hope
this helps a bit...

Adam

atmc...@gmail.com

Martin Damus

Apr 30, 2008, 6:57:38 AM
to Max...@googlegroups.com
Hi Adam,

I got one very useful reply, but that is all. I have read all the papers I've been able to find, and have garnered from them what I can (which means anything that is non-mathematical). Good luck with your class!

Anyways, here's what I was sent as a reply to my question. The suggested paper was very useful, and the rest of the email helped clarify things:

The p value is the result of a chi-square test in which the null
hypothesis is that the omission rate you got in the model, with the
given proportional area predicted as present (so both values are
sensitive to your threshold choice), is no different than that of a
random prediction with the same predicted area present. So with a
significant p value you, theoretically, have a prediction better than
random. With lower thresholds, you are predicting more area as present,
and thus it is harder to be better than random. The threshold choices
you are confused about are based on the ROC curve. They are thresholds
which try to balance omission and commission errors. See this citation
for a review:

Liu, C., P. M. Berry, T. P. Dawson, and R. G. Pearson. 2005. Selecting
thresholds of occurrence in the prediction of species distributions.
Ecography 28:385-393.
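
To make the null hypothesis above concrete, here is a minimal sketch. It uses an exact one-sided binomial test in place of the chi-square test named above; that swap is an assumption made for illustration and is not a statement of what Maxent computes internally. The function name and the example numbers are hypothetical.

```python
from math import comb


def random_prediction_p_value(n_test, n_omitted, predicted_area):
    """One-sided exact binomial p-value.

    Null hypothesis: each test point lands inside the predicted area with
    probability equal to the fractional predicted area, as it would under a
    random prediction covering the same share of the study region.
    Returns the probability of capturing at least as many test points as
    were actually captured.
    """
    n_hit = n_test - n_omitted
    return sum(
        comb(n_test, k) * predicted_area**k * (1 - predicted_area) ** (n_test - k)
        for k in range(n_hit, n_test + 1)
    )


# Hypothetical numbers: 20 test points, 2 of them omitted.
print(random_prediction_p_value(20, 2, 0.25))  # small predicted area
print(random_prediction_p_value(20, 2, 0.90))  # large predicted area
```

With the same omission, the p-value is far larger when 90% of the region is predicted present than when 25% is, since a random prediction that covers almost everything also captures almost every test point.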

The problem (perhaps?) with these methods is that they are based on
presence/absence data and *may* not be appropriate for presence-only
data. There is no (to my knowledge) good work evaluating the best
method for presence-only data, so you will have to use your judgement.
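
As a rough illustration of how the sensitivity/specificity thresholds work, and of why presence-only data is a concern, the sketch below treats background cells as stand-ins for absences. That substitution is an assumption of this example, the scores are invented, and the function names are hypothetical.

```python
# Hypothetical logistic scores, invented for illustration only.
presence = [0.30, 0.55, 0.56, 0.60, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95]
background = [0.05, 0.10, 0.15, 0.20, 0.25, 0.35, 0.40, 0.50, 0.65, 0.88]


def sensitivity(presence_scores, threshold):
    """Share of presence records predicted present (1 - omission rate)."""
    return sum(s >= threshold for s in presence_scores) / len(presence_scores)


def specificity(background_scores, threshold):
    """Share of background cells predicted absent (1 - fractional area).

    With true absences this would be 1 - commission rate; background cells
    are only a proxy, which is the caveat raised above.
    """
    return sum(s < threshold for s in background_scores) / len(background_scores)


def roc_thresholds(presence_scores, background_scores):
    """Return (equal sens/spec threshold, max sens+spec threshold)."""
    candidates = sorted(set(presence_scores) | set(background_scores))
    equal = min(
        candidates,
        key=lambda t: abs(
            sensitivity(presence_scores, t) - specificity(background_scores, t)
        ),
    )
    max_sum = max(
        candidates,
        key=lambda t: sensitivity(presence_scores, t)
        + specificity(background_scores, t),
    )
    return equal, max_sum


equal_t, max_sum_t = roc_thresholds(presence, background)
print("equal sensitivity and specificity:", equal_t)
print("maximum sensitivity plus specificity:", max_sum_t)
```

Both rules scan candidate thresholds along the ROC curve; the first picks the point where the two error rates match, the second the point where their combined accuracy is highest.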

I think it is better to limit your training area as you have done.
However, as you note, you now have to get your evaluation statistics
yourself for the projected area. The new tutorial for Maxent gives an
example of how to do this in R. I think there are functions in DIVA-GIS
as well to get AUC, but I haven't tried this.
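
For anyone without R at hand, AUC for the projected area can be sketched in a few lines. This is not the tutorial's example; it is a plain pairwise (Mann-Whitney style) calculation, and it assumes you have already extracted the projected scores at the test presences and at a sample of background cells in the projected region. The function name is hypothetical.

```python
def auc(presence_scores, background_scores):
    """Probability that a random presence outscores a random background cell.

    Ties count as half a win. The pairwise loop is quadratic, which is fine
    for a few thousand points but slow for very large samples.
    """
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical scores read from the projected grid.
print(auc([0.8, 0.4], [0.6, 0.2]))
```

A value near 0.5 means the projection ranks presences no better than random; values closer to 1 mean presences tend to outscore background cells.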
Cheers,

Martin


Adam

May 4, 2008, 3:22:21 PM
to Maxent
Thanks, that is kind of helpful. For presence-only data a very useful
paper is Phillips and Dudík 2008. They recommend a logistic setting
with linear and hinge features if I am not mistaken (but check the
paper). In my case I have very few occurrence records and this gave me
a good result. Hope this helps, Adam