Choice of "current" bioclimate variables


Martin Lowry

Jan 15, 2025, 5:51:33 PM
to Maxent
Hi All,

There are so many sources of bioclimatic variables now available that it is difficult to know which is most appropriate for a particular problem.
For the particular Andean plant species I'm working with, my occurrences date from 1996 to 2024, and I have three sets of variables: the "original" World Climate dataset and two I made myself from CHELSA annual climatologies.

World Climate: 1970-2000, observations from global weather stations, interpolated and downscaled
CHELSA V1: 1979-2013, simulated climate, downscaled
CHELSA V2: 1995-2015, simulated climate, downscaled

Each dataset produces a highly accurate (perhaps overfitted) suitability model for my species, but when each model is projected onto the other two datasets the results are very poor: the predicted suitability regions barely overlap. So what is going on here?

Given the lifespan of my species (~30 yr), I've considered that current distributions are a consequence of past climatic conditions, which would suggest that the World Climate data are still appropriate. However, the source observations for that dataset are quite sparsely and unevenly distributed, particularly in the Andes.

I was hoping to assess the impact of past and future climate change on this species, but until I'm confident my current model is not spurious it seems foolish to proceed.

I'd appreciate reading other people's thoughts on this subject.

Cheers

Bede-Fazekas Ákos

Jan 16, 2025, 2:51:54 AM
to max...@googlegroups.com
Hi Martin,

There are several alternative ways of calculating bioclimatic variables. Have you tried calculating the BCVs from the raw monthly Tmean, Tmin, Tmax and Prec obtained from WorldClim? (I guess "World Climate" means the WorldClim database...) The BCVs downloaded from WorldClim and the BCVs you calculate from the monthly climate data should not differ.
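One way to do that recalculation is dismo::biovars(); here is a minimal sketch, assuming the twelve monthly layers have been downloaded from WorldClim (the directory names are placeholders):

library(raster)   # biovars() works on Raster* objects (or plain matrices)
library(dismo)

tmin <- stack(list.files("wc_tmin", full.names = TRUE))   # 12 monthly layers, Jan-Dec order
tmax <- stack(list.files("wc_tmax", full.names = TRUE))
prec <- stack(list.files("wc_prec", full.names = TRUE))

bcv <- biovars(prec, tmin, tmax)   # the 19 standard bioclim variables
names(bcv)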
Also, you should check whether the ranges and distributions of the BCVs derived from the three databases mostly overlap (see the sketch below). So, before starting SDM training and prediction, you should check the predictors. If there are large differences, they might be caused by:
- unit of measurement
- rescaling
- calculation method.
If there are no differences, then the predictors are OK and your trained model might be overfit. Maybe you should split the environmental dataset into training and evaluation parts using spatial blocks. This will result in a seemingly worse model (the evaluation AUC/maxTSS/Boyce will be lower), but the model will be less overfit and more transferable, i.e. more suitable for spatial or temporal extrapolation, such as future prediction.
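For that range/distribution check, something along these lines would do (a sketch only; the file names and the "bio1" layer name are placeholders for your three BCV stacks):

library(terra)

wc  <- rast("worldclim_bio.tif")
ch1 <- rast("chelsa_v1_bio.tif")
ch2 <- rast("chelsa_v2_bio.tif")

# summary statistics of e.g. bio1 in each dataset
lapply(list(worldclim = wc, chelsa1 = ch1, chelsa2 = ch2),
       function(r) global(r[["bio1"]], c("min", "max", "mean", "sd"), na.rm = TRUE))

# and a visual check of the distributions
par(mfrow = c(1, 3))
hist(wc[["bio1"]]); hist(ch1[["bio1"]]); hist(ch2[["bio1"]])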

Hope this helps,
Ákos
_____________
Ákos Bede-Fazekas
Centre for Ecological Research, Hungary

Martin Lowry

Jan 17, 2025, 8:21:56 AM
to Maxent
Hello Ákos,

Thank you for your response and suggestions.

I've now regenerated my three sets of current BCVs from the appropriate multi-year climate normals using the same calculation procedure for each, and have confirmed they are equivalent in spatial extent, unit of measurement and scaling. They give the same result: all three BCV sets produce well-fitting models with test AUC = train AUC > 0.98, and barely overlapping suitability regions when projected. I've tried regularisation multiplier values between 3 and 20 with little effect; AUCs remain > 0.95. I have only 63 occurrence records and I'm currently using 5-fold cross-validation as my blocking scheme.
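(For reference, the same regularisation sweep can also be scripted from R through dismo's wrapper around maxent.jar. A sketch only, assuming rJava and maxent.jar are set up and that `env` is a predictor stack and `occ` a two-column matrix of presence coordinates:)

library(dismo)   # needs rJava and maxent.jar in dismo's java folder

m <- maxent(
  x = env,                                  # predictor stack (Raster*)
  p = occ,                                  # presence coordinates (lon, lat)
  args = c("betamultiplier=5",              # regularisation multiplier
           "replicates=5",                  # 5-fold ...
           "replicatetype=crossvalidate")   # ... cross-validation
)
m                                           # browse the per-replicate results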

It seems the models are grossly overfitted, but using maxent.jar there seems little else I can do. Move to R? :)

Regards,
Martin

Bede-Fazekas Ákos

Jan 17, 2025, 8:42:10 AM
to max...@googlegroups.com
Hello Martin,

In R, you can use the package "blockCV". I'm not familiar with the standalone maxent.jar, but I guess spatial blocks are not available in that software, so I suggest moving to R.
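With blockCV version 3, that could look roughly like this (a sketch; `pts_sf` would be an sf object of your presence and background points with a 0/1 "presence" column, `env` a terra SpatRaster of the predictors, and the block size is only illustrative; ideally it comes from cv_spatial_autocor()):

library(blockCV)

folds <- cv_spatial(
  x = pts_sf,            # presence + background points (sf object)
  column = "presence",   # 0/1 column, so folds stay balanced
  r = env,               # raster defining the blocking area
  k = 5,                 # number of spatial folds
  size = 200000,         # block size in metres (illustrative)
  selection = "random"
)

folds$folds_ids          # fold membership of each point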
Anyway, the extremely large training AUC and the hardly interpretable predictions prompt one more question: have you filtered the 19 bioclimatic variables based on the correlation/multicollinearity of the variable set, or did you use all of them? If the latter, the SDM is almost certain to be overfit...
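A common way to do that filtering is pairwise correlation on a sample of cell values (a sketch; the |r| < 0.7 cutoff is only a convention, and `env` is again a placeholder for the predictor stack):

library(terra)
library(caret)

vals <- spatSample(env, 10000, na.rm = TRUE)       # sample of cell values (data frame)
cm   <- cor(vals, method = "spearman")
drop <- names(vals)[findCorrelation(cm, cutoff = 0.7)]   # one of each highly correlated pair
env_reduced <- env[[setdiff(names(vals), drop)]]   # keep the less correlated set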

Have a nice weekend,

Ákos
_____________
Ákos Bede-Fazekas
Centre for Ecological Research, Hungary

Martin Lowry

Jan 20, 2025, 11:00:22 AM
to Maxent
Hello,

My original experiments were conducted using all 19 BCVs, but I have since re-run them using only 6 and the outcome is the same, with only a tiny decrease in AUC.

Okay, I've now migrated to R and am using Adam Smith's enmSdmX package downloaded from GitHub. I produced four data folds for presences and background using geoFold() and geoFoldContrast() respectively, and then trained MaxEnt with trainByCrossValid() over the default range of master regularization multipliers to give me four "best" models, one for each fold. AUC for each is now more reasonable (0.85-0.95). So is there a way to combine these models? Or perhaps the only way is to predict with each separately, as MaxEnt does, and average the results?

Thanks for your help.

Martin

Bede-Fazekas Ákos

Jan 20, 2025, 11:44:01 AM
to max...@googlegroups.com
Hello Martin,
I think that averaging the ensemble is the only reasonable way of combining these models, but I'm not an expert on this. Others on this list might have better suggestions.
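A rough sketch of that averaging (assuming `models` is the list of per-fold models returned by trainByCrossValid(), `env` is the SpatRaster you want to project onto, and predictEnmSdm() is used as enmSdmX's general prediction wrapper):

library(terra)
library(enmSdmX)

preds <- lapply(models, function(m) predictEnmSdm(m, env))   # one prediction per fold
ens   <- mean(rast(preds))                                    # unweighted ensemble mean
plot(ens)

If the folds differ a lot in quality, a weighted mean (e.g. weighting each fold by its evaluation AUC or Boyce index) is a common alternative.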
Have a nice week,

Ákos
_____________
Ákos Bede-Fazekas
Centre for Ecological Research, Hungary
