Questions regarding MaxEnt/ENMeval (running replicates, customizing response curve plots)


ltbe...@gmail.com

Oct 5, 2018, 9:32:28 AM
to Maxent


Dear all, 

 

I have some questions regarding MaxEnt/ENMeval and I hope that some of you guys are able/willing to help me with some of them…

 

For context: I am trying to model a species’ seasonal habitat suitability. We have 4 years of hourly GPS data on several animals, split up into 4 seasons per year. I thinned the data to 3-hourly positions. It’s a rather small study area (ca. 7000 km²), and we are not aiming to predict to different areas or to different climate change scenarios (for now) – we just want to look at differences in habitat suitability between seasons and years. From the GPS tracks we can see that the animals are roaming the entire study area. I did a preliminary comparison of MaxEnt, GLM and BRT, and MaxEnt performed best. (Seeing as we want to compare seasons/years, we don’t want to also compare different algorithms…)

 

We tested different settings of regularization multipliers (RM, 0.5 – 4.0 in increments of 0.5) and all possible feature class combinations for each of the 16 seasonal models using the ENMeval package. We applied the block partitioning method in ENMeval for spatially independent CV.

 

Now we were thinking of running multiple replicates of all seasonal models with either the best settings according to AUC or AIC to get standard errors/confidence intervals for variable importance and the response curves. But I just read Jamie’s answer in this thread and if I understand him correctly, running replicates does not make sense if you sample enough background points?

 

To find the number of background points that gives stable model performance/predictive ability (while keeping computation time low), we ran one seasonal model using 1-10 times the number of presence occurrences as background points, as well as a total of 100,000 background localities. Map predictions remained the same and AUC values started to stabilize from 5 times the occurrence points on, hence we generated 5 times as many background localities as occurrence points for each seasonal model. Would you consider this sufficient or is it necessary to use the full representation of environments available by including all pixels within the study area, as recommended in this new paper? And either way: does running replicates make sense or not? Even if the SE/CI would be very small, that might still be nice to show, no?

 

However, I am also wondering if running replicates is even implemented in ENMeval? I can’t find it in the package documentation… if not: is there any other package where using the spatial block partitioning for CV is implemented in the same way and replicates are possible? 

 

In the end, I would like to extract the response curves for each variable for each seasonal model and customize the plot so that all seasonal response curves (potentially with CIs) for one variable are in one plot, not plotted separately. Is there a smart way of doing this in ENMeval?

 

And is there an efficient way of looking at model residuals in ENMeval?

 

Sorry for so many questions – I appreciate any help and input!

 

Thank you so much!

Larissa



Jamie M. Kass

Oct 9, 2018, 8:47:36 AM
to Maxent
Okay, there are a lot of questions here, and I’ll try to answer them in turn.

1. How you choose the optimal model from a set of candidate models is important. Using just test AUC is usually not recommended, because of the multiple issues with interpreting it for presence-only SDMs (see Lobo 2008). It’s best to examine a couple of metrics together, including some measures of overfitting, like AUCdiff or omission rate. Relying on AIC alone is how many modelers approach model selection in other fields, but the model chosen may not perform well on withheld data. It’s complicated.
2. Replicates are useful when doing random cross validation, because there is variation in the points used for evaluation with each model run, and generating multiple replicates can help you either quantify the variation or do model averaging, etc. However, when the partitioning method is not random (as with blocks), the evaluations will always be on the same points. As Maxent is a deterministic algorithm (it always converges to the same answer given the same inputs), the only other random element in the model is the background selection. If you sampled the background well, there will be little variation between runs, hence replicates may be unnecessary.
3. To make sure you have a representative background sample, you can examine the variation of the environmental signal for multiple random background draws (e.g. via histogram or PCA) and see if it changes much each time. If there are extreme and unique environmental values missing from the model, your response curves may be truncated (Guevara et al. 2018).
4. You can get the model object out like this, assuming “e” is the ENMevaluation object and you want the first model in the results table: e@models[[1]]. Plug that into dismo::response() for response curves. If you do x <- response(...), x will be a data frame with the predicted values used to make the plot, and you can customize your own plot that way.
5. The concept of model residuals in Maxent output is weird. Because we don’t know the probability of presence, as the output is more like “suitability”, we can’t get “real” residuals. Unless someone else thinks differently? I’ve seen papers that take 1 - suitability as the model residuals, but it’s not clear to me how statistically sound that is (as suitability = 1 does not mean predicted presence).
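
For point 3, here is a minimal base-R sketch of checking how much the environmental signal changes between random background draws. Everything in it is an assumption made for illustration: `env` is simulated and stands in for the predictor values of all pixels in the study area, the variable names `elevation` and `ndvi` are hypothetical, and the quantiles are just one possible summary (histograms or a PCA per draw would serve the same purpose).

```r
# Hypothetical stand-in for the predictor values of every pixel in the study
# area; in practice these would be extracted from the predictor rasters.
set.seed(1)
n_pixels <- 50000
env <- data.frame(
  elevation = rgamma(n_pixels, shape = 2, scale = 300),
  ndvi      = rbeta(n_pixels, 2, 5)
)

# Draw several random background samples of the same size and summarise the
# environmental values in each draw with a few quantiles.
n_background <- 5000
n_draws <- 20
draw_summary <- function(i) {
  idx <- sample(n_pixels, n_background)
  sapply(env[idx, ], quantile, probs = c(0.01, 0.5, 0.99))
}
summaries <- lapply(seq_len(n_draws), draw_summary)

# Stack the per-draw summaries into a quantile x variable x draw array and
# compute the coefficient of variation of each quantile across draws.
summary_array <- simplify2array(summaries)
cv <- apply(summary_array, c(1, 2), sd) / apply(summary_array, c(1, 2), mean)
print(round(cv, 3))

# Share of each variable's full range that a single draw covers; values well
# below 1 suggest that extreme environments are missing from the sample.
one_draw <- env[sample(n_pixels, n_background), ]
coverage <- sapply(names(env), function(v) {
  diff(range(one_draw[[v]])) / diff(range(env[[v]]))
})
print(round(coverage, 3))
```

If the summaries barely move between draws, the background sample is probably large enough that replicates would add little; low range coverage would be a hint to sample more points or use all pixels.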

Hope this helps.

Jamie Kass
PhD Candidate
City College of NY

ltbe...@gmail.com

Oct 14, 2018, 5:14:10 PM
to Maxent

Thanks a lot for taking the time to answer my questions Jamie!!! 


Colin Goodman

Jun 2, 2020, 12:00:14 PM
to Maxent
Hi Jamie,

I am attempting to generate better looking response curves through a model generated in ENMeval. To that end, I have tried your solution below (i.e. using the dismo::response() command). However, when I attempt to assign this to a variable as suggested, it simply says "NULL" when I recall said variable. Is there anything that I am missing?

x <- dismo::response(model_NR_only_check20_added@models[[3]])
x
NULL

Jamie M. Kass

Jun 5, 2020, 12:21:05 PM
to max...@googlegroups.com
Colin,

Not sure -- maybe a newer version of dismo removed this capability? It hasn't been updated recently, so that's strange. Two options: get the code off GitHub and customize it, or simply make your own ranges and use predict() to get the curves (the values of the other variables should be held at their means).
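
A minimal sketch of the second option, under stated assumptions: the data are simulated, the variable names (`elevation`, `ndvi`) and season labels are hypothetical, and a binomial GLM stands in for the Maxent model so that the example runs with base R alone. The helper `response_curve()` is written for this illustration and is not part of dismo or ENMeval. With a real model you would pass the model object (e.g. e@models[[1]]) and whatever extra arguments its predict() method needs.

```r
# Simulated predictors and presences for two hypothetical seasons; the
# elevation optimum shifts between seasons so the curves differ.
set.seed(42)
n <- 2000
env <- data.frame(
  elevation = runif(n, 0, 2000),
  ndvi      = runif(n, 0, 1)
)
season <- sample(c("spring", "autumn"), n, replace = TRUE)
optimum <- ifelse(season == "spring", 800, 1400)
p <- plogis(-1 - ((env$elevation - optimum) / 400)^2 + 1.5 * env$ndvi)
train <- cbind(env, presence = rbinom(n, 1, p))

# Stand-in models (assumption): one binomial GLM per season.
fit_season <- function(s) {
  glm(presence ~ elevation + I(elevation^2) + ndvi,
      family = binomial, data = train[season == s, ])
}
models <- list(spring = fit_season("spring"), autumn = fit_season("autumn"))

# Vary one predictor over its observed range, hold all others at their
# means, and predict. Extra arguments in ... are passed on to predict().
response_curve <- function(model, predictors, variable, n_points = 100, ...) {
  newdata <- as.data.frame(lapply(predictors, function(x) rep(mean(x), n_points)))
  newdata[[variable]] <- seq(min(predictors[[variable]]),
                             max(predictors[[variable]]),
                             length.out = n_points)
  data.frame(value = newdata[[variable]],
             prediction = as.numeric(predict(model, newdata, ...)))
}

# One curve per season on a shared grid, then all curves in a single plot.
curves <- lapply(models, response_curve, predictors = env,
                 variable = "elevation", type = "response")
y_range <- range(sapply(curves, function(cv) range(cv$prediction)))
plot(NULL, xlim = range(env$elevation), ylim = y_range,
     xlab = "elevation", ylab = "predicted value")
for (i in seq_along(curves)) {
  lines(curves[[i]]$value, curves[[i]]$prediction, col = i, lwd = 2)
}
legend("topright", legend = names(curves), col = seq_along(curves), lwd = 2)
```

Because each curve is an ordinary data frame, the same objects can be passed to any plotting package, and curves from several seasonal models can share one set of axes as above.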

Jamie Kass
Postdoctoral Scholar
Okinawa Institute of Science and Technology Graduate School
