**ENMeval v 0.2.0 now on CRAN - Featuring Parallel Processing!!!**


Bob Muscarella

Sep 11, 2015, 12:34:51 PM
to Maxent
A new version (v. 0.2.0) of the ENMeval R package is now available on CRAN (https://cran.r-project.org/web/packages/ENMeval/index.html). 

The main change (led by Jamie Kass) is the implementation of a parallel processing option that can significantly speed up run times for big jobs. See the updated documentation for ENMevaluate for more details. Additional significant changes include:
  • Added a "models" slot in ENMevaluation object class to hold Maxent model objects. This allows the user to access the lambda values and original results table generated by Maxent, and use the dismo::predict() function to create logistic predictions and project the model to new areas and/or time periods.
  • Fixed a bug that allowed only a single categorical variable; now multiple categorical variables work.
  • Added an argument in the ENMevaluate function to turn off raster prediction generation to save time (default is rasterPreds=TRUE).
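
For example, here is a minimal sketch of the new options in use ('occs', 'preds', and 'backg.pts' are placeholder objects):

library(ENMeval)
# run the tuning in parallel and skip the raster predictions to save time
ev <- ENMevaluate(occs, preds, bg.coords=backg.pts,
                  RMvalues=seq(0.5, 4, 0.5), fc=c('L', 'LQ', 'LQH'),
                  method='block', parallel=TRUE, rasterPreds=FALSE)
ev@models[[1]]   # the new "models" slot holds one Maxent model object per settings combination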
As always, please let Bob and Jamie know if you encounter any bugs or have particular requests for future functionality (our email addresses are in the help documentation).

Madeline Steele

Oct 9, 2015, 4:53:50 PM
to Maxent
Excellent improvements, thank you Jamie!

Is there any chance you could post a few lines of example code that demonstrate how to go from an enmeval_results@models object to a logistic prediction map using dismo?

Madeline Steele

Oct 9, 2015, 5:25:48 PM
to Maxent
Hello again Bob and Jamie, 

I thought I should let you know that I've gotten the following error a few times now:

...
Of 8 total cores using 8
Running in parallel...
Calculating niche overlap
Error in txtProgressBar(0, nlayers(predictive.maps) - 1, style = 3) : 
  must have 'max' > 'min'

This only seems to happen when I use rasterPreds=FALSE.  If it's set to TRUE, there is no issue and it runs fine.

Thanks,

Madeline
 


Jamie M. Kass

Oct 11, 2015, 3:01:00 AM
to Maxent
Madeline,
I'm not sure what version you're using, because I can't find that line anywhere. Line 110 should read:

pb <- txtProgressBar(0, length(maxent.args), style = 3)

Did you modify the script? I'm not really sure where that line came from.

-Jamie

Jamie M. Kass

Oct 11, 2015, 3:13:24 AM
to Maxent
As for a demonstration of how to make a new prediction via a model object from the ENMeval output, here's a simple one:

eval.out <- ENMevaluate(occs, preds, bg.coords=backg.pts, RMvalues=seq(0.5, 5, 0.5),
                        fc=c('L', 'H', 'LQ', 'LQH', 'LQHP'), method='block', parallel=TRUE)
# here's the model object for the first model
eval.out@models[[1]]
# here I pull out the lambda file info
eval.out@models[[1]]@lambdas
# here I pull out the results info
eval.out@models[[1]]@results
# here I make a new prediction for a different region, assuming the predictors in preds.newRegion are the same as in preds
new.prediction <- predict(eval.out@models[[1]], preds.newRegion)
# here I make a logistic prediction using the same preds.newRegion
new.prediction.log <- predict(eval.out@models[[1]], preds.newRegion, args=c("outputformat=logistic"))
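
If you then want to inspect or save the logistic map, something like this should work (the file name is just an example):

library(raster)
plot(new.prediction.log)   # quick visual check of the logistic prediction
writeRaster(new.prediction.log, filename="new_prediction_logistic.tif", overwrite=TRUE)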

Madeline Steele

Oct 13, 2015, 5:21:56 PM
to Maxent
Hi Jamie,

Thanks very much for the code sample! I'm still pretty new to R and find this very helpful.

Here are the details on the ENMeval version I'm using:

Package: ENMeval
Type: Package
Version: 0.2.0
Date: 2015-09-11

I grabbed the tar.gz package from CRAN. I did not change the ENMeval code. For what it's worth, I got the same error on both my home Mac and my work PC.

Thanks again for your work on this project, and please let me know if further details would be helpful.

-Madeline

ndimhypervol

Oct 13, 2015, 7:48:40 PM
to max...@googlegroups.com
Hm, could you possibly email me the code you're using? It would help if I could reproduce the issue.

Jamie Kass
PhD Student, CCNY

Bob Muscarella

Oct 22, 2015, 3:41:38 AM
to Maxent
Hi Madeline -
This error arises because the function used to calculate niche overlap statistics needs the raster predictions (those stats are based on niche overlap in geographic space).  That's why you get an error when you combine overlap=TRUE with rasterPreds=FALSE.  I've added a warning message about this to the development version.
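
In code terms ('occs' and 'preds' are placeholders here):

# the niche overlap statistics are computed from the prediction rasters,
# so overlap=TRUE only works together with rasterPreds=TRUE (the default)
ev <- ENMevaluate(occs, preds, method='block', overlap=TRUE, rasterPreds=TRUE)
# whereas this combination triggers the error you saw in v 0.2.0:
# ENMevaluate(occs, preds, method='block', overlap=TRUE, rasterPreds=FALSE)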


Hope that helps clear it up and thanks for pointing this out.  All the best,
Bob

Madeline Steele

Oct 22, 2015, 12:36:06 PM
to Maxent
Hi Bob,

Ah yes, that makes perfect sense! Thank you for clearing that up for me.

Also, could you please confirm that the rasterPreds parameter must be set to TRUE if the user wishes to calculate AICc?

Thanks again,

Madeline

Bob Muscarella

Oct 23, 2015, 2:28:11 AM
to max...@googlegroups.com
Hi Madeline -
Yes, it is true that the raster predictions must be generated to compute AICc values.
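
For instance ('occs' and 'preds' are placeholders):

ev <- ENMevaluate(occs, preds, method='block', rasterPreds=TRUE)
ev@results$aicc   # populated; with rasterPreds=FALSE this column is left empty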
Bob


Madeline Steele

Nov 5, 2015, 2:25:42 PM
to Maxent
Hi Bob,

Thanks for clearing that up, and thanks again for your work on ENMeval.

For the record, I think it would be useful to be able to use a few more of the MaxEnt parameters in ENMeval. Specifically, I'm interested in the Extrapolate parameter. I may try to make my own GitHub branch with this change, but progress would likely be slow at my level.

I also think it would be nice to be able to view the HTML files; I'm most interested in visually inspecting the variable response curves for complexity.

I just wanted to put that out there so you'd know these enhancements have at least one user vote if you're ever considering either of them.

Best regards,

Madeline

Bob Muscarella

Nov 6, 2015, 3:50:03 AM
to max...@googlegroups.com
Hi Madeline -

Thanks for the input.  We are working on making it possible to pass most of the Maxent arguments (e.g., extrapolate, prevalence, output format, ... ) through ENMeval.  

Saving the HTML files presents a couple of challenges... first, it means that the user has to store ALL of the output from the Maxent run on their machine (currently ENMeval deletes those files as it goes along).  Depending on the analysis, this could quickly eat up a lot of disk space and become a file-management nightmare...  There's probably a creative alternative, though, like perhaps using the 'writeplotdata' argument and then creating custom plots of the response curves... or maybe saving the HTML file as a PDF or something before deleting the intermediate files...  

Anyways, thanks for the thoughts and feel free to fork ENMeval on Github to try and make it work!  We'll try and get a lot of this implemented in the next version...

Bob

Madeline Steele

Nov 6, 2015, 2:55:24 PM
to Maxent
Thanks Bob!

One more quick question for you. Are the following interpretations of field names from the results table (enmeval_results@results) correct?

full.AUC - Average of Training and Test AUC
Mean.AUC - Average Test AUC
Var.AUC - Variance of Test AUC

Just wanted to be sure!

Thanks again,

Madeline

Bob Muscarella

Nov 6, 2015, 3:52:35 PM
to max...@googlegroups.com
Hi Madeline -

Almost...   First, there is a small bug in ENMeval v 0.2.0 (currently on CRAN) whereby columns in the @results slot get mislabeled if you choose bin.output=TRUE.  This popped up when adding the option for parallel computing and was not an issue in previous versions.  We have since fixed this (in version 0.2.1, currently on GitHub) and these changes should get to CRAN soon.

Besides that, the 'full.AUC' column refers to the AUC value based on the model built with the full dataset (i.e., all presences, not partitioned; this is the same model used to calculate AICc).

Your interpretations of Mean.AUC and Var.AUC are correct.  Do note, however, that Var.AUC is calculated with a correction for the non-independence of k-fold iterations (not plain old variance).  See ?corrected.var for more details...
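
To see these columns side by side, sorted by test performance (a quick sketch using your object name from before):

res <- enmeval_results@results
res[order(res$Mean.AUC, decreasing=TRUE), c("full.AUC", "Mean.AUC", "Var.AUC")]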

Let me know if you have other questions and I hope that helps,

Bob




Bob Muscarella

Nov 19, 2015, 2:05:35 AM
to Maxent
Hi again Madeline,

Another quick update: Jamie pointed out that you can also use the 'response' function of dismo to visualize the response curves of a Maxent model object in R.  For instance, plot the response curves for the first model in your ENMevaluation object 'enmeval_results' like this:

> response(enmeval_results@models[[1]])

For options on the response function, check its documentation.

> ?response

Bob

Jamie M. Kass

Nov 23, 2015, 10:31:02 AM
to Maxent
Madeline,

Sorry about the delay; soon after you sent me the code, I saw that Bob had responded with an answer. Did you still need any more help with this?

-Jamie

Madeline Steele

Nov 25, 2015, 3:51:16 PM
to Maxent
Hi Bob and Jamie,

Thanks very much for the tip about using the dismo response function. That will do the trick, and I have no further questions at this point.

Thanks again, and have a great holiday!

Madeline

Madeline Steele

Dec 15, 2015, 9:18:03 PM
to Maxent
Hello, Bob and Jamie, 

Quick question for you, I hope. Before I started using ENMeval, I was running MaxEnt with dismo and generating five replicates of every model. When I used the MaxEnt output object to create a map using the predict function, it returned a raster stack with 5 layers corresponding to the five replicates. I could then take the mean or the standard deviation of this raster stack. 

When using ENMeval with kfolds set to 5, the output of the predict function is a single layer. Is this already a mean of the five iterations, or is it the best of the five? Am I missing something?

Thanks very much,

Madeline

Bob Muscarella

Dec 19, 2015, 9:47:37 AM
to max...@googlegroups.com
Hi Madeline -

First off, I had to go back to the Maxent help file to understand what the replicates argument really does.  You probably know this but maybe it will be helpful for others to include...

The help file says, "the number of replicates tells Maxent how many of replicate runs to do when cross-validating, bootstrapping or doing sampling with replacement runs."  Then you should also be aware that the argument "replicatetype" is used if the number of replicates > 1.  Maxent does multiple ("replicate") runs of the type you specify ("crossvalidate: samples divided into replicates folds; each fold in turn used for test data.  Bootstrap: replicate sample sets chosen by sampling with replacement.  Subsample: replicate sample sets chosen by removing random test percentage without replacement to be used for evaluation.").  Note that the default for "replicatetype" is crossvalidate.  So, when you get the results from dismo using replicates > 1 and using the 'crossvalidate' method, it seems that you get the results of each of the k-folds.
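
In code, the dismo workflow you describe looks roughly like this ('occs' and 'preds' are placeholders):

library(dismo)
library(raster)
xm <- maxent(preds, p=occs, args=c("replicates=5", "replicatetype=crossvalidate"))
pm <- predict(xm, preds)     # a RasterStack with one layer per replicate run
mean.map <- calc(pm, mean)   # average across the 5 cross-validation models
sd.map <- calc(pm, sd)       # and their spread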

I think that the random k-fold partitioning method in ENMeval is equivalent, but the results are summarized across the k folds for the models run with each particular combination of settings (regularization multiplier and feature class options).  For example, the "Mean.AUC" and "Var.AUC" columns in the results table provide the mean and variance of the AUC values across the k folds.  Note it will also provide the evaluation stats for each bin if you set "bin.output" to TRUE.  ENMeval does not save the training models built with each data partition, so generating predictions from these is not (immediately) possible.  However, ENMeval does give you the bin that each point got assigned to (see the @occ.grp and @bg.grp slots of the ENMevaluation object), so you could recreate these models if you wanted.
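
If you wanted to recreate the training model for one fold yourself, a rough sketch would be something like the following (using Jamie's object names from earlier; this assumes a partitioning method that bins the background points too, and the dismo::maxent() arguments shown only approximate what ENMeval passes internally):

library(dismo)
ev <- enmeval_results                     # your ENMevaluation object
k <- 1                                    # recreate the model that withholds fold 1
occ.train <- occs[ev@occ.grp != k, ]      # presences outside fold k
bg.train <- backg.pts[ev@bg.grp != k, ]   # background outside fold k
fold.mod <- maxent(preds, p=occ.train, a=bg.train,
                   args=c("betamultiplier=1"))   # set this to match your chosen settings
fold.pred <- predict(fold.mod, preds)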

Let me know if that isn't clear or if you think I got something wrong.  All the best,

Bob


Madeline Steele

Feb 7, 2016, 2:59:56 PM
to Maxent
Hello again, Bob and Jamie,

My question this time relates to model selection. I know that there is a lack of consensus about best practices for SDM model selection, and that in your paper you suggest that using AICc is a good option. However, I have a very large study area with high-resolution rasters, and because of time and space constraints I am running ENMeval with rasterPreds=FALSE. Thus, I don't have access to AICc scores. I've been experimenting with alternative methods of model selection using the other metrics that ENMeval provides; the code and comments for my current favored approach are below. I was hoping that, if you have time, you might let me know what you think of this method, especially if you have any concerns with it. I would of course prefer a well-established, citable method if you know of one that works given my lack of the AICc metric.


      library(dplyr)

      min.acceptable.AUC <- 0.7   # the threshold mentioned below

      ## Rank the models. Start by filtering out any models with a Mean.AUC below 0.7.
      ## Then, penalize models with high overfitting by making a binary column
      ## that flags whether a model is in the worst quartile for any of the
      ## three overfitting metrics. Sort by this column, then by Mean.AUC (desc.).
      ## The top models are the ones with the best Mean.AUC that are not excessively overfit.
      ranked.results <- enmeval.results@results %>%
         add_rownames("row.names") %>%   # keep the model settings as a column before ranking
         select(-aicc) %>%               # remove the empty aicc column
         filter(Mean.AUC >= min.acceptable.AUC) %>%
         mutate(q.Mean.AUC.DIFF = ntile(Mean.AUC.DIFF, 4),
                q.Mean.OR10 = ntile(Mean.OR10, 4),
                q.Mean.ORmin = ntile(Mean.ORmin, 4)) %>%
         mutate(bottom_quartile_overfitting =
                   ifelse(q.Mean.AUC.DIFF == 4 | q.Mean.OR10 == 4 | q.Mean.ORmin == 4, 1, 0)) %>%
         arrange(bottom_quartile_overfitting, -Mean.AUC)


Note that this code uses dplyr syntax. It filters out models with Mean.AUC below a threshold (0.7 in my case) and then makes a new column that contains a 1 if the model is in the bottom quartile for any of Mean.OR10, Mean.ORmin, or Mean.AUC.DIFF. It then sorts these to the bottom of the table and does a secondary sort by Mean.AUC. The hope is that this helps me pick a set of models with high goodness of fit that are not overfit. I also thought about using nparam to help eliminate overfit models, but it seems redundant with this approach.

Once I have this sorted list, I save out the logistic results for the top three, as well as their response curves and lambdas, as shown in the sketch below. The ecologists I'm working with and I can then see if one of them is clearly better based on their expert knowledge.
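
In case it's useful to see, here is roughly how I do that last step (it assumes the order of @models matches the rows of @results, and the file names are just examples):

library(raster)   # for writeRaster
library(dismo)    # for response()
top3 <- head(ranked.results$row.names, 3)
for (nm in top3) {
  i <- which(rownames(enmeval.results@results) == nm)   # map the settings name back to a model index
  m <- enmeval.results@models[[i]]
  pred.log <- predict(m, preds, args=c("outputformat=logistic"))
  writeRaster(pred.log, filename=paste0(nm, "_logistic.tif"), overwrite=TRUE)
  pdf(paste0(nm, "_response_curves.pdf")); response(m); dev.off()
  writeLines(m@lambdas, paste0(nm, "_lambdas.txt"))
}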

Thanks again for all your help and for developing and sharing ENMeval,

Madeline

Jamie M. Kass

Feb 8, 2016, 9:52:00 PM
to Maxent
Hey Madeline,

I think your general method of sequentially filtering out poorly performing models is a great way to limit yourself to a couple of good candidates, which you can then examine as an ecologist to see which one makes the most sense via expert opinion. Too many people, if they do comprehensive evaluation at all, blindly pick either the model with the highest AUC or the one with the lowest AIC. As many papers have shown (Lobo et al. 2007, Jimenez-Valverde et al. 2012, Yackulic et al. 2013, etc.), judging presence-only models by AUC alone is not a good idea for a number of reasons -- most importantly that this metric was built for evaluating presence-absence models, and background points are not absence points. That said, using test AUC along with other metrics like omission rate can be effective (see Shcheglovitova & Anderson 2013). Some researchers also like the True Skill Statistic (TSS), but like AUC, this was originally developed for presence-absence models and so has some shortcomings. The most important takeaway is that, after narrowing down your choices, all candidate models should be inspected for ecological validity, and your decision should be backed up by informed ecological knowledge of the species and its range.

Jamie Kass
PhD Student
City College, NYC