Hello again, Bob and Jamie,
My question this time relates to model selection. I know that there is a lack of consensus about best practices for SDM model selection, and that in your paper, you suggest that using AICc is a good option. However, I have a very large study area with high-resolution rasters, and because of time and space constraints I am running ENMeval with rasterPreds=FALSE. Thus, I don't have access to AICc scores. I've been experimenting with alternative methods of model selection using the other metrics that ENMeval provides, and the code and comments for my current favored approach are below. I was hoping that if you have time, you might let me know what you think of this method, especially if you have any concerns with it. I would of course prefer to use a well-established, citable method if you know of one that works without the AICc metric.
library(dplyr)

min.acceptable.AUC <- 0.7  # models with a Mean.AUC below this are dropped

## Rank the models. Start by filtering out any models with a Mean.AUC below 0.7.
## Then penalize the models that have high overfitting by making a binary
## column that shows if a model is in the worst quartile for any of the three
## overfitting metrics. Sort by this column, then by Mean.AUC (descending).
## The top models are the ones with the best Mean.AUC that do not have
## excessive overfitting.
ranked.results <- enmeval.results@results %>%
  tibble::rownames_to_column("row.names") %>%  # add row.names before ranking
  select(-matches("^aicc$")) %>%  # remove the empty aicc column; matches() ignores case
  filter(Mean.AUC >= min.acceptable.AUC) %>%
  mutate(q.Mean.AUC.DIFF = ntile(Mean.AUC.DIFF, 4),
         q.Mean.OR10     = ntile(Mean.OR10, 4),
         q.Mean.ORmin    = ntile(Mean.ORmin, 4),
         # quartile 4 holds the highest (worst) values of each metric
         bottom_quartile_overfitting =
           ifelse(q.Mean.AUC.DIFF == 4 | q.Mean.OR10 == 4 | q.Mean.ORmin == 4, 1, 0)) %>%
  arrange(bottom_quartile_overfitting, desc(Mean.AUC))
Note that this code uses dplyr syntax. It filters out models with a Mean.AUC below a threshold (0.7 in my case) and then makes a new column that is 1 if the model is in the worst quartile (the highest 25% of values) for any of Mean.OR10, Mean.ORmin, or Mean.AUC.DIFF. It then sorts these flagged models to the bottom of the table, with a secondary sort by descending Mean.AUC. The hope is that this helps me pick a set of models with high goodness of fit that are not overfit. I also thought about using nparam to help eliminate overfit models, but it seems redundant with this approach.
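To make the ranking logic easy to check without a full ENMeval run, here is a minimal sketch on a made-up table. The data values, the table name toy.results, and the helper rank_models() are all invented for illustration; only the metric column names follow the ones used in this post.

```r
library(dplyr)

# Hypothetical stand-in for enmeval.results@results; every value is invented.
toy.results <- data.frame(
  settings      = c("L_1", "LQ_1", "H_1", "LQH_1", "L_2", "LQ_2", "H_2", "LQH_2", "L_3"),
  Mean.AUC      = c(0.72, 0.78, 0.82, 0.90, 0.74, 0.80, 0.85, 0.88, 0.65),
  Mean.AUC.DIFF = c(0.01, 0.03, 0.05, 0.12, 0.02, 0.04, 0.06, 0.10, 0.02),
  Mean.OR10     = c(0.09, 0.12, 0.15, 0.30, 0.10, 0.13, 0.28, 0.25, 0.11),
  Mean.ORmin    = c(0.000, 0.010, 0.020, 0.080, 0.005, 0.015, 0.030, 0.060, 0.010),
  stringsAsFactors = FALSE
)

# Hypothetical helper wrapping the filter / flag / sort steps.
rank_models <- function(results, min.auc = 0.7) {
  results %>%
    filter(Mean.AUC >= min.auc) %>%
    mutate(
      # 1 if the model is in the highest (worst) quartile of any overfitting metric
      bottom_quartile_overfitting = as.integer(
        ntile(Mean.AUC.DIFF, 4) == 4 |
          ntile(Mean.OR10, 4) == 4 |
          ntile(Mean.ORmin, 4) == 4
      )
    ) %>%
    # unflagged models first, then highest Mean.AUC first within each group
    arrange(bottom_quartile_overfitting, desc(Mean.AUC))
}

toy.ranked <- rank_models(toy.results)
print(toy.ranked[, c("settings", "Mean.AUC", "bottom_quartile_overfitting")])
```

On this toy table the top three rows are H_1, LQ_2, and LQ_1. The three models with the highest Mean.AUC (LQH_1, LQH_2, H_2) are each flagged on at least one overfitting metric and so sort to the bottom, which is the behavior I am after.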
Once I have this sorted list, I save the logarithmic results for the top three models, as well as their response curves and lambdas. The ecologists I'm working with and I can then use their expert knowledge to judge whether one of the three is clearly better.
Thanks again for all your help and for developing and sharing ENMeval,
Madeline