Another question about a run


Martin Damus

Apr 3, 2008, 10:46:02 AM
to max...@googlegroups.com
Hello,
 
I have run Maxent on the same data with two environmental data layer sets. The first set is simply the second cut down to the area of interest (where the modeled organism is native and where it might be invasive), and the second covers the whole earth. The graphical results are quite similar, but the weighting of the layers is not. The table below shows what I mean: some contributions differ strikingly depending on the geographical extent of the environmental layers. Any comments?
 
Percent contribution

Layer                   Whole Earth   Subset Earth
PrecipColdQt                   38.2           14.3
TempSeason                     15.1            0
MnTempColdMth                  14.8           11
MeanDiurRnge                   10.2            6.5
Soil Moisture                   8.5           10
PrecipWarmQt                    7.1            0.1
PrecipDryMth                    2.7            2
MeanTempWarmQt                  1.3            5.3
MeanTempWetQt                   0.6            0
MxTempWarmMth                   0.4            0.8
Isothermal                      0.4            2.9
PrecipWetMth                    0.4            0.4
AnnMeanTemp                     0.2            0
MeanTempDryQt                   0.1            0
PrecipDryQt                     0              0
PrecipWetQt                     0              0
MeanTempColdQt                  0              0
TempAnnRnge                     0              8.6
PrecipSeasonal                  0              0.2
AnnuPrecip                      0              0
SoilType(Categorical)           0             37.9
 
 
Thanks,
 
Martin Damus
Entomologist, Canadian Food Inspection Agency



Tereza

May 7, 2008, 9:34:28 PM
to Maxent
Hey Martin,

Have you ever gotten any response on this? I am puzzled by a similar
issue. In my case I get strikingly different predictions depending on
whether I use layers clipped to the area of interest or the entire
world. The latter predicts a ***much*** larger area. Any ideas what's
causing that?

Thanks, Tereza




Martin Damus

May 8, 2008, 9:43:17 AM
to Max...@googlegroups.com
Hi Tereza,
 
In short, no, I haven't. It may be that no one knows, or that it is supposed to be obvious (which it isn't to me).
 
But I seem to get better results (defining "better" as "fitting more closely to my a priori gut feeling") when I cut down the layers. My explanation is that the background pseudo-absence points then exclude irrelevant locations in climes where the organism has no hope of survival, and instead come from areas that resemble its range in at least some of the variables. Hence the final projected distribution is more realistic, and the variable contributions change because Maxent isn't selecting, say, areas on Greenland's ice sheets as pseudo-absence locations. Maybe I'm right, maybe I'm way off.
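To make the idea concrete, here is a rough numpy sketch of what "cutting down the layers" does to the background sample. Everything here is made up for illustration (the 1-degree grid, the extent, the layer values); in practice Maxent draws its background from whatever clipped ASCII grids you hand it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-degree global grid of one environmental layer
# (rows = latitude 90..-90, cols = longitude -180..180).
global_layer = rng.normal(size=(180, 360))

def clip_to_extent(layer, lat_min, lat_max, lon_min, lon_max):
    """Return a same-shaped grid with everything outside the study
    extent set to NaN, so it can never be drawn as background."""
    clipped = np.full_like(layer, np.nan)
    # Convert degrees to row/col indices (row 0 = northernmost band).
    r0, r1 = int(90 - lat_max), int(90 - lat_min)
    c0, c1 = int(lon_min + 180), int(lon_max + 180)
    clipped[r0:r1, c0:c1] = layer[r0:r1, c0:c1]
    return clipped

def sample_background(layer, n):
    """Draw n background (pseudo-absence) cells from the non-NaN area."""
    rows, cols = np.where(~np.isnan(layer))
    idx = rng.choice(rows.size, size=n, replace=False)
    return rows[idx], cols[idx]

# Clip to a made-up study extent and sample background only there.
clipped = clip_to_extent(global_layer, 25, 60, -130, -60)
bg_rows, bg_cols = sample_background(clipped, 1000)
```

With the whole-earth grid, the same `sample_background` call would happily return cells on the Greenland ice sheet; after clipping, every background cell sits inside the extent where survival is at least plausible.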
 
Cheers!
 
Martin


Steven Phillips

May 8, 2008, 10:06:01 AM
to Max...@googlegroups.com
Martin and Tereza,

The basic Maxent theory depends on the presence records being drawn
randomly from the species distribution in your study area. To avoid
violating that assumption, you should generally exclude from your
study area any regions where there's a chance that the species is
present, but where you know you haven't done any surveys. For
example, if you only have presence records collected in one country,
it's best not to use a whole continent or the whole world for
background.

-- Steven



Tereza

May 9, 2008, 1:52:49 PM
to Maxent
Martin and Steven,

Thanks for your replies. They make sense. The reason I ran two
parallel runs (one with the layers clipped to the area of interest and
one with the entire world) is that I project my models onto the climatic
conditions of the last glacial maximum (LGM). If I project from the
clipped area to the clipped area, my clamping values are approximately
ten times higher than when I project from the entire world to the
clipped area, since the range of values in the latter case is larger.
Also, in the latter case, the projected (LGM) models don't have patches
of weirdly high probabilities in areas where they should not be (areas
which usually have high clamping values). On the other hand, the
clipped-area model gives me (as you mentioned) a more reasonable
current prediction.
So which approach would you recommend I use?

Thanks again, Tereza

Sam Veloz

May 9, 2008, 8:04:25 PM
to Max...@googlegroups.com
Tereza,
If your occurrence records are incomplete (which is usually the case,
especially if you consider fundamental niche space vs. realized niche
space), then I think you should limit your training area to some region
around your occurrence points in which you feel confident in the
completeness of surveys. So the lower clamping values you are getting
when training with the whole world are really misleading. I think, as
Steven stated below, that you are really violating the assumptions of
the model, and your predictions are then suspect. Unfortunately, with
incomplete sampling of the potential niche space of a species, you
have to live with lower confidence in your predictions when predicting
to environmental conditions that differ from the training area. I would
say that if your predictions seem strange in areas with high clamping,
there is a good chance they are a bad fit. This is generally true when
extrapolating with any technique, regression for example.

I am curious how folks are dealing with high clamping values. Do you
exclude predictions once clamping values are too high? If so, how do
you determine a threshold?
Sam

Tereza

May 13, 2008, 12:24:22 AM
to Maxent
Thanks, Sam.

So what about using a bias file with high values (e.g. 10) within my
study area and small values (e.g. 1) throughout the rest of the world?
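For what it's worth, a bias grid like that is just a layer of relative sampling weights. A minimal sketch (the grid shape and study-area rectangle are made up, and the result would still have to be written out as an ASCII grid matching the other layers before Maxent could use it):

```python
import numpy as np

# Hypothetical global grid matching the environmental layers' shape;
# weight 1 everywhere outside the study area.
bias = np.ones((180, 360))

# Up-weight the (made-up) study-area rectangle so most of the
# background is drawn there; values are relative sampling weights.
study = (slice(30, 65), slice(50, 120))
bias[study] = 10.0

# Fraction of the total background weight that falls inside the
# study area under this weighting:
inside_weight = bias[study].sum() / bias.sum()
```

Note that with a 10:1 weighting over a small rectangle, most background points would still fall *outside* the study area here, which is part of why this approach only softens, rather than removes, the whole-world background problem.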

Otherwise, I am still struggling with how to incorporate the clamping
values into my models. It looks like I would have to exclude areas with
clamping values higher than about 0.1, since those areas seem to have
unrealistically high probability values. I am also thinking about
simply subtracting the clamping values from the logistic
probabilities, since the clamping values are defined as "the absolute
change in logistic output value due to clamping". Surprisingly, this
approach gives me a very reasonable model ***without*** those weird
clumps of high probabilities far out that also have the highest
clamping values.
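The subtraction you describe is essentially a one-liner over the output grids. A small sketch with made-up numbers (flooring at zero keeps the corrected values on a valid logistic scale):

```python
import numpy as np

# Hypothetical logistic-prediction and clamping grids from a Maxent
# projection, flattened over the same cells.
logistic = np.array([0.85, 0.40, 0.92, 0.10])
clamping = np.array([0.05, 0.00, 0.60, 0.02])

# Subtract the clamping value (the "absolute change in logistic output
# due to clamping") from each cell, clipping into [0, 1].
corrected = np.clip(logistic - clamping, 0.0, 1.0)
```

Cells where clamping is heavy (like the third one here, 0.92 - 0.60 = 0.32) get knocked down hard, which is exactly the behavior that suppresses the weird high-probability clumps.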

I will appreciate any further comments or suggestions.

Thanks, Tereza

Steven Phillips

May 13, 2008, 4:37:01 PM
to Max...@googlegroups.com
Tereza and Sam,

I agree with Sam's assessment: if you're projecting into areas with
environmental conditions that differ from your training area, your
predictions will be suspect.

I don't recommend the bias file approach that Tereza describes. I
prefer your other proposal: subtract off the clamping values from the
logistic predictions. That's a reasonably conservative approach to
limit predictions when environmental conditions are outside the
training range.

-- Steven



Sam Veloz

May 13, 2008, 4:44:02 PM
to Max...@googlegroups.com
Anyone have any suggestions for dealing with high clamping values when
you are using the raw data format?
Thanks,
Sam

RC

Aug 2, 2012, 1:23:40 AM
to max...@googlegroups.com
The effect of the extent of the "buffer" around your presence points has now been explicitly tested. See VanDerWal J, Shoo LP, Graham C & Williams SE (2009) Selecting pseudo-absence data for presence-only distribution modelling: How far should you stray from what you know? Ecological Modelling 220(4): 589-594.

Tropica

Aug 23, 2012, 1:28:56 AM
to max...@googlegroups.com
Hi Renee,

Thanks for dragging this old post up!  What I found most interesting about it was the statement from Steven Phillips (8 May 2008) about how to choose your background:

"To avoid violating that assumption, you should generally exclude from your study area any regions where there's a chance that the species is present, but where you know you haven't done any surveys."

Obviously thinking about the background has come a long way since 2008, but the statement above certainly doesn't fit with how I see many backgrounds being chosen these days.  I think we have also come a long way since the VanDerWal paper of 2009, and there now exists a range of more informative ways to approach the question.  As highlighted by Rodda et al. (2011, PLoS One), the choice of background is extremely important.

One of the biggest problems I see with background choice is the conflict between geographical and covariate space, and how these two are applied to background selection.  Backgrounds are always ultimately defined in geographic space, but if we consider how backgrounds are used in the model, it makes a lot of sense to use covariate space to provide the 'right amount of buffer' around the range occupied by the distribution points (i.e. to strike a balance between the model being driven by too few covariates and not having enough covariate contrast to discriminate usefully).  For example, there are regions of the world where walking 200 km in a straight line would provide vast contrasts in temperature, while in other areas there would be almost no change.  Actively considering the spatial nature of these covariate gradients was the main thinking behind the approach of using Koppen-Geiger zones to define backgrounds (see Kriticos et al. 2012, Meth Ecol Evol).

Some would argue that dispersal ability etc might also be relevant, but that's a story for another day...




Fabio Berzaghi

Aug 23, 2012, 5:08:43 AM
to max...@googlegroups.com
I am glad this thread came up because I was wondering about this as well. I initially modeled two species in an area spanning four East African countries, then cut down to a smaller area; one species remained constant, but the other, with fewer samples (around 30), showed suitable conditions in a much larger area than before. There is a big difference between the small and big areas. I know 30 samples aren't that many, but I was hoping for more consistency. I also wonder whether the 10,000 background points are enough when modeling an area with 4,000,000 cells.
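On the 10,000-vs-4,000,000 question, one quick back-of-envelope check (purely illustrative numbers, and a made-up gamma-distributed layer standing in for a real one) is to compare quantiles of a layer over all cells against the same quantiles over a background-sized random sample; if they diverge badly, the default background is too sparse for that layer:

```python
import numpy as np

rng = np.random.default_rng(0)

n_cells = 4_000_000      # cells in the large study area
n_background = 10_000    # Maxent's default background sample size

# Each background point has to stand in for this many cells:
cells_per_point = n_cells // n_background   # 400

# Compare quantiles of a (made-up) environmental layer over all cells
# with the same quantiles over a background-sized random sample.
layer = rng.gamma(shape=2.0, scale=50.0, size=n_cells)
sample = rng.choice(layer, size=n_background, replace=False)
full_q = np.quantile(layer, [0.05, 0.5, 0.95])
samp_q = np.quantile(sample, [0.05, 0.5, 0.95])
```

For a smooth layer like this one, 10,000 points reproduce the quantiles closely; trouble is more likely when the area contains small but distinctive climatic pockets that a sparse sample can miss entirely.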