Permutation Importance & Removing Variables

Pluvials

Feb 5, 2019, 10:26:23 AM
to Maxent
Hello everyone,

To work out which variables are most influential to the model when using a large data set (17,628 occurrence points), is it more appropriate to use percent contribution or permutation importance? After reading several tutorials (Phillips, 2006) and seeing the NIMBioS tutorial, I got the impression that permutation importance is more reliable for larger data sets, yet in the literature I have found that most authors use percent contribution to select the most influential variables. Does anyone have advice? My goal is to select variables to remove in order to improve model performance.

Also, this might be a silly question but also one that I have failed to find an answer to...
When you do remove variables and you are projecting to future climate scenarios - should you remove/reduce your variables in both the current climate and the projection data? 

Thanks in advance for your consideration!

Adam Smith

Feb 6, 2019, 11:38:57 AM
to Maxent
Hi Pluvials,

Your first question is a research question. That is, you should simply try it and see what happens. I just submitted a manuscript on variable importance in SDMs and can say that there are no systematic assessments of how well SDMs assess variable importance, or of how certain conditions (prevalence, correlation among predictors, etc.) affect their ability to identify the most important variables. So you will have to try and see! Importantly, the measures of importance you mention are relative (because they're standardized to 100%), so they won't tell you in an absolute sense how important the variables are to the species, just how important they are relative to one another.
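
To make the "relative, standardized to 100%" point concrete, here is a minimal sketch of a permutation-importance calculation. It follows the general recipe the Maxent tutorial describes (shuffle one variable, re-evaluate the model, record the drop in AUC, normalize the drops to percentages), but it is not Maxent's own code: the data are synthetic, `toy_model` is a hypothetical stand-in for a fitted model, and the labels are simple 1/0 values rather than presence/background points.

```python
import numpy as np

def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney); assumes scores contain no ties."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

def permutation_importance(predict, X, y, rng):
    """Percent share of the AUC drop caused by shuffling each column."""
    base = auc(predict(X), y)
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        # Negative drops (shuffling helped by chance) are clipped to zero.
        drops[j] = max(base - auc(predict(X_perm), y), 0.0)
    total = drops.sum()
    return 100 * drops / total if total > 0 else drops

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))  # synthetic predictors (assumption)

def toy_model(X):
    # Hypothetical fitted model: strong on column 0, weak on 1, ignores 2.
    return 2.0 * X[:, 0] + 0.5 * X[:, 1]

y = (toy_model(X) + rng.normal(size=500) > 0).astype(int)
importance = permutation_importance(toy_model, X, y, rng)
print(importance.round(1))
```

Because the drops are divided by their sum, the values always add up to 100 whenever at least one drop is positive, whether the underlying model is strong or weak. That is why they rank variables against each other but say nothing about absolute importance.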

Regarding question #2, if you do not give a variable to the model, then it won't matter if you include it in future rasters or not since the model will never use it.

Best,
Adam

Adam B. Smith
Assistant Scientist in Global Change
Missouri Botanical Garden
St Louis, Missouri USA

Gafarou AGOUNDE

Feb 6, 2019, 12:21:56 PM
to max...@googlegroups.com
Hi! As for me, variable selection is not easy, but with a little effort we can reach a good result. For example, if you are running your data with the Maxent software, it selects some variables for you based on AUC, contribution, the jackknife test... by default, I would say.
However, it is not quite right to treat all the variables selected by the software as important. You should eliminate those that do not suit the study area. For example, if you are working on West African regions, cold- or moisture-related variables are not suited to those regions. You should also take into account the ecology of the species to eliminate more variables or add others. For example, if you know beforehand that the distribution of your species is influenced by soil but the results from the software show the opposite, then you should still keep the soil variable among the variables retained.
For more information, you can read the tutorial below or reach me by inbox.
All the best...


Faculty of Agronomic Sciences (FSA), University of Abomey-Calavi, Benin republic. Phone number:(+229)96692934

Maxent_tutorial2017.pdf

Gafarou AGOUNDE

Feb 6, 2019, 12:30:27 PM
to max...@googlegroups.com
I forgot something...
I think you should use the same variables you retained from the current data to build future models. For example, if you retained bio1, bio2, bio3, and bio12 as important for the current model, you should use those same variables for scenario 2.6 or 4.5.
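
As a rough illustration of keeping the variable set identical between the current and future data, the sketch below stacks only the retained layers, in the same order, for both. Everything here is an assumption for illustration: the layers are random NumPy arrays rather than real rasters, the dictionary keys bio1 to bio19 are placeholder names, and `stack_layers` is a hypothetical helper, not part of Maxent.

```python
import numpy as np

def stack_layers(layers, retained):
    """Stack the retained layers, in the given order, into one array."""
    missing = [name for name in retained if name not in layers]
    if missing:
        raise KeyError(f"layers missing from this scenario: {missing}")
    return np.stack([layers[name] for name in retained], axis=-1)

# Variables kept after selection on the current-climate model.
retained = ["bio1", "bio2", "bio3", "bio12"]

# Placeholder 4 x 5 grids standing in for real climate rasters.
rng = np.random.default_rng(0)
all_names = [f"bio{i}" for i in range(1, 20)]
current = {name: rng.normal(size=(4, 5)) for name in all_names}
future = {name: rng.normal(size=(4, 5)) for name in all_names}

# The same names in the same order are used for training and projection.
current_stack = stack_layers(current, retained)
future_stack = stack_layers(future, retained)
print(current_stack.shape, future_stack.shape)
```

Extra layers left in the future set are simply never read, while a retained layer that is missing from a scenario raises an error instead of being silently skipped.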
Best....



Sanjo Jose

Feb 7, 2019, 5:03:44 AM
to max...@googlegroups.com
Hi, Pluvials
What I did for my research was to run a Maxent model to obtain the percent contribution and permutation importance. After that, I removed correlated variables based on the correlation coefficient, percent contribution, and permutation importance. Among correlated variables, the ones to remove are those with a low contribution and no ecological importance for the species you are working on. Then, with the remaining variables, I ran the model for current and future conditions.
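
The filtering step described above could be sketched roughly as follows. This is one possible reading of the workflow, not the exact procedure used: the 0.7 correlation threshold, the importance scores, the variable names, and the synthetic data are all assumptions, and `drop_correlated` is a hypothetical helper.

```python
import numpy as np

def drop_correlated(X, names, importance, threshold=0.7):
    """Keep the more important variable of each highly correlated pair."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    # Visit variables from most to least important; keep one only if it is
    # not strongly correlated with any variable already kept.
    order = sorted(range(len(names)),
                   key=lambda i: importance[names[i]], reverse=True)
    kept = []
    for i in order:
        if all(corr[i, k] < threshold for k in kept):
            kept.append(i)
    return [names[i] for i in sorted(kept)]

rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = a + 0.05 * rng.normal(size=200)  # nearly a copy of a
c = rng.normal(size=200)             # independent of a and b
X = np.column_stack([a, b, c])

names = ["bio1", "bio5", "bio12"]
# Made-up importance scores, e.g. taken from a first Maxent run.
importance = {"bio1": 40.0, "bio5": 10.0, "bio12": 50.0}

kept = drop_correlated(X, names, importance)
print(kept)
```

Here bio5 is dropped because it is almost perfectly correlated with the higher-scoring bio1. A rule like this only handles the statistical side; whether a variable matters ecologically for the species still has to be judged by hand.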
Regards, 

Sanjo Jose V,
PhD Scholar,
Climate Change and Forest Influence,
Forest Research Institute,
Dehradun - 248006



Pluvials

Feb 7, 2019, 10:27:29 AM
to Maxent
Hello Adam, Gafarou and Sanjo,

Thank you for each of your responses, as they have all helped clear up some foggy areas in my understanding of Maxent outputs. I continued my runs, reducing variables based on permutation importance and collinearity while incorporating what I have learned about the mountain pine beetle, and I have more confidence in my ability to interpret the results. Thank you for your time, input and experience!
All the best,
Nathalie

Husam El Alqamy

Feb 7, 2019, 12:39:53 PM
to max...@googlegroups.com
Just a comment on the workflow suggested by Sanjo. It makes no sense to put correlation and ecological importance after percent contribution and permutation importance. These should be the first criteria that drive the research. SDM is ecological research before it is a computational process. The ecological aspects of the species at hand are the main driver of the investigation, not a result that follows a computational investigation. If the researcher caves in to the technical aspects of the modeling process and forgets the main reason why modeling was adopted, the results become ecologically invalid: just a sound mathematical product that has nothing to do with the species under study.
Regards


Hossameldin ELALKAMY, MPhill., PhD., RPBio.

GIS Analyst 

BC Timber Sales|  Prince George

Ministry of Forests, Lands and Natural Resource Operations

P. 250.614.7521 C. 778.896.3229|2000 Ospika Blvd.

Prince George, BC,V8W 9M1
