Reducing number of variables

Derek Corcoran Barrios

Dec 18, 2012, 12:17:58 PM
to max...@googlegroups.com
Hello Everyone:

I'm modelling the niche of an invasive species in Patagonia, and I'm working with the WorldClim database. I want to reduce the number of variables by checking whether they are correlated, since I've read that if two variables are highly correlated I should use just one of them. How should I do that? Should I do regression analysis or correlations? Is it possible to do this analysis with DIVA-GIS or ArcGIS? And how high do my correlation coefficients have to be for me to use just one of those variables?

thanks

Derek

Lukas Rinnhofer

Dec 18, 2012, 2:08:33 PM
to max...@googlegroups.com
Hey Derek,
Please keep in mind that even if two variables are correlated, both might be important for a species. Always include knowledge about the species' ecology in the variable selection process.
To check whether two variables are correlated you could use ENMTools by Dan Warren (http://enmtools.blogspot.co.at/). It includes a function to compare two or more WorldClim files. But there are other ways too.
Normally a correlation above 0.75 is used as the cutoff, but some studies use >0.8 or >0.85.
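
For anyone who wants to see what such a cutoff check looks like in practice, here is a minimal sketch in Python with NumPy (not ENMTools itself). It assumes the variable values have already been extracted into a table with one row per sampled location and one column per variable; the function name, the variable names and the synthetic data are made up for illustration.

```python
import numpy as np

def correlated_pairs(values, names, threshold=0.75):
    """Return (name_a, name_b, r) for every pair with |Pearson r| above threshold.

    values: 2-D array, one row per sampled location, one column per variable.
    """
    r = np.corrcoef(values, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                pairs.append((names[i], names[j], float(r[i, j])))
    return pairs

# Synthetic stand-in data (hypothetical, not real WorldClim values)
rng = np.random.default_rng(0)
temp = rng.normal(size=200)
values = np.column_stack([
    temp,
    temp + rng.normal(scale=0.1, size=200),  # nearly a copy of the first column
    rng.normal(size=200),                    # unrelated variable
])
names = ["bio1", "bio10", "bio12"]

pairs = correlated_pairs(values, names, threshold=0.75)
for a, b, r in pairs:
    print(f"{a} vs {b}: r = {r:.2f}")
```

The threshold is a plain parameter, so the same check can be rerun with 0.8 or 0.85 to see how sensitive the flagged pairs are to the cutoff.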

We tried to explain our steps of variable selection in our publication: 
Iterative species distribution modelling and ground validation in endemism research: an Alpine jumping bristletail example
http://link.springer.com/article/10.1007/s10531-012-0341-z

But there are always other ways, and variable selection always depends on the species, the study area, the study itself and other aspects.

Hope this helps.

Regards,

Lukas
PS: Let me know if you can't download the paper and I'll send you a copy.

2012/12/18 Derek Corcoran Barrios <gaiterodel...@gmail.com>
> --
> You received this message because you are subscribed to the Google Groups "Maxent" group.
> To post to this group, send email to max...@googlegroups.com.
> To unsubscribe from this group, send email to maxent+un...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/maxent?hl=en.

Derek Corcoran Barrios

Dec 24, 2012, 9:51:39 AM
to max...@googlegroups.com
Thanks a lot for your help, Lukas. I couldn't answer earlier since I was in the field. I was able to download your paper, and I'll read it carefully.

Derek


David Galbraith

Feb 7, 2013, 5:56:46 PM
to max...@googlegroups.com
Bisbrian,

Since Maxent uses several 'features' when constructing its models, the effective number of explanatory variables (k) in its final model can be thought of as a multiple of the number of variables you supply to the model (e.g. Bioclim variables). So basically, I think that if you construct a model with more explanatory variables than presence observations, it is possible to entirely overfit the model and come up with unreliable results.
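
To put rough numbers on this: Maxent's feature classes include linear, quadratic and product features (plus threshold and hinge features, whose number depends on the data). The sketch below, in Python, counts only the first three classes for a set of continuous predictors. The function name is made up for illustration, and the count is of candidate features, not of the features that end up with non-zero weight after regularization.

```python
from math import comb

def count_basic_features(n_vars):
    """Count candidate linear, quadratic and product features for n_vars predictors."""
    linear = n_vars              # one per variable
    quadratic = n_vars           # one squared term per variable
    product = comb(n_vars, 2)    # one per unordered pair of variables
    return {
        "linear": linear,
        "quadratic": quadratic,
        "product": product,
        "total": linear + quadratic + product,
    }

counts = count_basic_features(19)  # e.g. the 19 Bioclim variables
print(counts)
```

With 19 variables that is already 209 candidate features before any threshold or hinge features are added, which is why a small set of presence records can be outnumbered quickly.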

Dave

On Thu, Feb 7, 2013 at 12:04 PM, Bis Nava <ban...@gmail.com> wrote:
Hello Lukas,
My name is Bisbrian Nava. I'm also working with SDMs and BIOCLIM variables, and I have a few doubts about reducing the number of variables. Someone told me that it is best not to use more variables than presence records, to avoid statistical redundancy. Would you be so kind as to help me with this matter?
Thanks in advance.

Bisbrian

Natalia Trujillo Arias

Feb 7, 2013, 8:12:42 PM
to max...@googlegroups.com
Hello,
I need this paper. Can someone help me, please?

http://link.springer.com/article/10.1007/s10531-012-0341-z

thanks!

--
Natalia Trujillo Arias
Bióloga-Universidad del Quindio, Colombia
Becaria Doctoral,
Division de Ornitología
Museo Argentino de Ciencias Naturales,
Buenos Aires, Argentina

Teddy Siles

Feb 8, 2013, 8:14:54 AM
to max...@googlegroups.com
Hey Natalia here is the paper...

good luck..

Kind regards

T.

Machi Siles


Iterative_species_distribution_modelling_and_ground_validation_in_endemism_research_an_Alpine_jumping_bristletail_example_Rinnhofer_2012.pdf

Natalia Trujillo Arias

Feb 8, 2013, 9:00:25 AM
to max...@googlegroups.com
Hii
Thanks!!

A hug!


Jordan Golubov

Feb 8, 2013, 9:38:21 AM
to max...@googlegroups.com
There it goes,
Dr. Jordan Golubov
Lab. Ecologia, Sistematica y FisiologiaVegetal
Departamento El Hombre y Su Ambiente
Universidad Autonoma Metropolitana Xochimilco
Calz. del Hueso 1100, Col. Villa Quietud, Coyoacán
04960, México D. F. México


"Change happens by listening and then starting a dialogue with the
people who are doing something you don't believe is right".
>Jane Goodall
art%253A10.1007%252Fs10531-012-0341-z.pdf

Alaa Eldeen

Feb 8, 2013, 1:48:13 PM
to max...@googlegroups.com
Hi Derek,
In addition to ENMTools as suggested by Lukas, I would also suggest regression analysis as an alternative tool to check for correlation among your variables. The Variance Inflation Factor (VIF) works well for this: you need to get rid of all variables that have a VIF value larger than 10. You can use linear regression in SPSS 18 and request the VIF values for all continuous variables. Remove the variable with the highest VIF, one at a time, until all the remaining variables have a VIF value less than 10.
Take the ecology of your species into account when you remove any variable.
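
For readers without SPSS, the same stepwise procedure can be sketched in Python with NumPy. This is only an illustration under assumptions: the data are a table with one row per location and one column per variable, the VIF of a variable is computed as 1 / (1 - R^2) from an ordinary least-squares regression of that variable on all the others, and the function names and synthetic data are made up.

```python
import numpy as np

def vif(values):
    """VIF of each column: 1 / (1 - R^2) from regressing it on all other columns."""
    n_rows, n_cols = values.shape
    out = np.empty(n_cols)
    for j in range(n_cols):
        y = values[:, j]
        others = np.delete(values, j, axis=1)
        design = np.column_stack([np.ones(n_rows), others])  # add an intercept
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / max(1.0 - r2, 1e-12)  # guard against division by zero
    return out

def stepwise_vif(values, names, threshold=10.0):
    """Drop the variable with the highest VIF until no VIF exceeds the threshold."""
    names = list(names)
    while len(names) > 1:
        scores = vif(values)
        worst = int(np.argmax(scores))
        if scores[worst] <= threshold:
            break
        values = np.delete(values, worst, axis=1)
        del names[worst]
    return names

# Synthetic stand-in data (hypothetical variable names)
rng = np.random.default_rng(1)
a = rng.normal(size=300)
values = np.column_stack([
    a,
    a + rng.normal(scale=0.05, size=300),  # almost collinear with the first column
    rng.normal(size=300),                  # unrelated variable
])
kept = stepwise_vif(values, ["bio5", "bio10", "bio12"], threshold=10.0)
print(kept)
```

Note that the loop removes purely on the numbers. If ecology says the variable about to be dropped matters more than its partner, it is reasonable to drop the partner by hand instead.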

best of luck
Alaa Eldin Soultan,  B.Sc, MSc.
Wildlife Conservationist,
St Katherine Protectorate,
National parks of Egypt,
Nature Conservation Sector,
Egyptian Environmental Affairs Agency, EEAA
Tel-Fax: + 2069- 3470033
Mobil:+201225773922

Bis Nava

Feb 12, 2013, 7:20:44 PM
to max...@googlegroups.com
Hello Dave,

Thank you for your reply.

I completely agree with you, and that's exactly why I'm looking for papers where people describe this matter. My thesis advisors have also recommended that I explore the whole correlation process in depth. So if by chance you have some papers that could help me with correlation and the use of variables to avoid overfitting, I'd be very thankful if you could share them with me.

Nati Trujillo

Feb 12, 2013, 8:14:11 PM
to max...@googlegroups.com
Hello Everyone
I want to know which variables are correlated, so I am trying to use ENMTools, but I have problems with the .csv file and I don't know why. I have used this .csv before to run Maxent, and it worked fine. Can someone help me? Maybe someone has a manual or paper that explains how to use it, because the manual on the ENMTools webpage doesn't explain everything.
thanks

Dan.L....@gmail.com

Feb 13, 2013, 4:34:26 AM
to max...@googlegroups.com
At present the correlation tool in ENMTools is only designed to work with .asc raster files, not .csv files.  If you want to look at correlations between variables stored in a .csv file, it's probably easiest right now to just do it in R or Excel.

Natalia Trujillo Arias

Feb 13, 2013, 6:15:13 AM
to max...@googlegroups.com
thanks Dan


Nati Trujillo

Feb 28, 2013, 12:49:50 PM
to max...@googlegroups.com
Hello everyone,

I have a question. I computed the correlations between 19 variables using SPSS (Pearson coefficient) and ENMTools. With both analyses I could see which variables are correlated, and for pairs with a correlation coefficient > 0.9 I selected one variable. But I still have 12 variables. I want to know: which of the jackknife plots produced by Maxent should I use to select the variables that contribute most to the model?
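
The pairwise step described above (for each pair with a correlation above 0.9, keep only one variable) can be sketched as a greedy filter in Python with NumPy. Which member of a pair to keep is an ecological judgement, so the sketch takes a priority order as input; that order, the function name and the synthetic data are all hypothetical. It does not address the jackknife question.

```python
import numpy as np

def prune_correlated(values, names, priority, threshold=0.9):
    """Walk through variables in priority order and keep one only if its
    |Pearson r| with every variable already kept is at or below threshold."""
    r = np.abs(np.corrcoef(values, rowvar=False))
    index = {name: i for i, name in enumerate(names)}
    kept = []
    for name in priority:
        if all(r[index[name], index[k]] <= threshold for k in kept):
            kept.append(name)
    return kept

# Synthetic stand-in data: two temperature-like and two rainfall-like columns
rng = np.random.default_rng(3)
t = rng.normal(size=250)
p = rng.normal(size=250)
values = np.column_stack([
    t,
    t + rng.normal(scale=0.1, size=250),
    p,
    p + rng.normal(scale=0.1, size=250),
])
names = ["bio1", "bio10", "bio12", "bio16"]

# Hypothetical priority: variables judged most relevant to the species come first
kept = prune_correlated(values, names, priority=["bio12", "bio1", "bio10", "bio16"])
print(kept)
```

Changing the priority order changes which variable of each correlated pair survives, which makes the ecological choice explicit instead of leaving it to column order.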

I would be very thankful for any help




Lays

Aug 26, 2018, 7:06:42 PM
to Maxent
Hi, guys. I was searching for a solution to my problem and I found this post. Maybe someone could read my question and help me here. I'm working with two different areas (2 populations of the same species) and I need to choose the variables to use for both populations. I used VIF and it gave me two different groups of variables. How do I decide which variables to use to develop my models in Maxent?

Thank you in advance,

jiang...@gmail.com

Aug 30, 2018, 6:33:09 AM
to Maxent
Hello everyone

I want to know: what format should my input data be in when I use SPSS to handle the collinearity of environmental variables?

Thanks.


Malcolm McCallum

Aug 30, 2018, 6:33:09 AM
to max...@googlegroups.com
Use best subsets regression.
Then use variance inflation factors, Mallows' Cp statistic, the standard error of the regression, and AIC to choose variables to include. Run individual correlations between variables to identify multicollinearity. Consider interactions by multiplying those columns together to create a new predictor; in some cases the interaction, but not the individual predictors, will be used. Alternatively, you can use factor analysis or principal components to weed out variables.
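
Two of the ideas above are easy to show in a few lines of Python with NumPy: building an interaction predictor by multiplying columns, and running a principal component analysis on standardized predictors. This is only a sketch on synthetic data with made-up function names; it does not cover best subsets regression, Mallows' Cp or AIC.

```python
import numpy as np

def standardize(values):
    """Center each column and scale it to unit standard deviation."""
    return (values - values.mean(axis=0)) / values.std(axis=0)

def pca(values):
    """PCA via SVD of the standardized data.

    Returns component scores, loadings (one component per row) and the
    proportion of variance explained by each component.
    """
    z = standardize(values)
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = z @ vt.T
    return scores, vt, explained

# Synthetic stand-in data: two strongly related columns and one unrelated column
rng = np.random.default_rng(2)
temp = rng.normal(size=300)
values = np.column_stack([
    temp,
    temp + rng.normal(scale=0.2, size=300),
    rng.normal(size=300),
])

scores, loadings, explained = pca(values)
print(np.round(explained, 3))

# Interaction predictor: element-wise product of two columns
interaction = values[:, 0] * values[:, 2]
```

The component scores are uncorrelated with each other by construction, so they can replace the original variables, at the cost of being harder to interpret ecologically than a single Bioclim layer.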


Ataollah Ebrahimi

Sep 2, 2018, 1:40:11 PM
to max...@googlegroups.com
As far as I remember, SPSS reads .csv files directly, but you can also read the .csv file (text format) into Excel, save it in Excel format, and then open the Excel worksheet in SPSS. After that, you can do the correlation analysis.
With kind regards
Ataollah Ebrahimi
Dep. of Range and Watershed Management
Fac. of Natural Resources and Earth Science
Shahrekord University
Shahrekord Po. Box 115
Iran
Alternative e-mail:
 Ataollah...@NRES.SKU.AC.IR
Webpage:
 http://www.sku.ac.ir/fa/faculty/scienceearth/index.htm
Tel:0098 38 32324423
Fax:0098 38 32324423




Lays Viturino de Freitas

Sep 10, 2018, 4:35:17 PM
to max...@googlegroups.com
Do you have a suggestion for what software to use to run all these analyses?



--

Lays Viturino de Freitas
___________________________________________________________________
Bióloga/UFPE
Msc. em Biologia Animal - Universidade Federal de Pernambuco
Laboratório de Ecologia, Sistemática e Evolução de Aves/Ornitolab UFPE (http://ornitolab.wix.com/ufpe)
Observadores de Aves de Pernambuco

Jamie M. Kass

Sep 11, 2018, 6:59:05 AM
to Maxent
Check out the vifstep() function in the R package usdm. It suggests variables for removal from a dataset based on the variance inflation factor, and works in a stepwise fashion.

Jamie Kass
PhD Candidate
City College of NY

Lays

Sep 11, 2018, 2:47:22 PM
to Maxent
I used VIF, but since I have two different areas of distribution for my populations and I need to run my niche models for both areas using the same variables, I don't know how to decide which variables to choose, since I'm running VIF for each area separately. (I need to choose among the WorldClim variables and one aridity layer.) My first idea was to run the analyses with a bigger buffer grouping both areas (each variable layer, etc.), but the areas are too far from each other and the populations are allopatric...

Francisco Amorim

Sep 12, 2018, 5:25:38 AM
to max...@googlegroups.com

Hi Lays,

Why not join the data on the two areas and run vif on all the data? That would give you only one set of variables based on vif for both areas. You can compare this set of variables and the sets for each area to check if it makes sense.

Cheers

-- 
Francisco Amorim


Lays

Sep 12, 2018, 1:39:12 PM
to Maxent
Hmmm. Indeed, I tried that a few days ago before asking here, but the areas are separated by over 400.000 kilometers, so I'm not sure whether it was a good idea. Also, I used the environmental information from the occurrence points of my species to run it (all the species and populations that I work with together).
My results when trying to do that:

#Area 1
#[1] "Bio_02"   "Bio_19"   "Bio_18"   "Wind"     "Bio_08"   "Radiação" "Bio_15"

#Area 2
#[1] "Bio_03"     "Bio_02"     "Bio_11"     "Bio_01"     "Bio_10"     "Bio_19"     "Bio_09"     "Wind"      
#[9] "Bio_08"     "Bio_07"     "Radiação" "Bio_06"     "Aridez"     "Bio_05"     "Bio_14"     "Bio_13"    
#[17] "alt_1"     

#Both
#[1] "Bio_03" "Bio_02" "Bio_11" "Bio_01" "Bio_10" "Bio_19" "Bio_09" "Bio_18" "Wind"
#[10] "Bio_08" "Bio_17" "Bio_07" "Bio_16" "Bio_15" "Aridez" "Bio_14" "Bio_04" "Bio_13"
#[19] "alt_1"
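
One way to compare the three lists is with plain set operations. A small Python sketch follows, with the variable names copied from the output above. The post does not say whether these are the variables kept or the variables flagged for removal, so the comparison only shows where the runs agree and is not a recommendation.

```python
# Variable lists copied from the VIF output above
area1 = {"Bio_02", "Bio_19", "Bio_18", "Wind", "Bio_08", "Radiação", "Bio_15"}
area2 = {
    "Bio_03", "Bio_02", "Bio_11", "Bio_01", "Bio_10", "Bio_19", "Bio_09",
    "Wind", "Bio_08", "Bio_07", "Radiação", "Bio_06", "Aridez", "Bio_05",
    "Bio_14", "Bio_13", "alt_1",
}
both = {
    "Bio_03", "Bio_02", "Bio_11", "Bio_01", "Bio_10", "Bio_19", "Bio_09",
    "Bio_18", "Wind", "Bio_08", "Bio_17", "Bio_07", "Bio_16", "Bio_15",
    "Aridez", "Bio_14", "Bio_04", "Bio_13", "alt_1",
}

shared = area1 & area2        # listed for both areas when run separately
only_area1 = area1 - area2    # listed for Area 1 only
in_all = shared & both        # also listed in the pooled run

print("Shared by Area 1 and Area 2:", sorted(shared))
print("Area 1 only:", sorted(only_area1))
print("In all three runs:", sorted(in_all))
```

Variables that appear in all three lists are the ones on which the separate and pooled runs agree; the rest are where the choice of extent changes the answer.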

Thank you for the help so far, everyone. <3

Champ

Sep 17, 2018, 2:32:01 AM
to Maxent
Hi everyone,
Since my issue is related to this thread, I thought I would bring it to your attention. I would appreciate it if you could help me resolve it.

I am running Maxent models for some invasive plants. I use all available global GBIF data and 30 arc-second (30s) Bioclim environmental data. I tried to remove correlated variables from my 30s global .asc layers using two methods, but they didn't work because of memory problems. Then I used the 10 arc-minute (10m) global Bioclim layers to identify correlated variables, which worked nicely. I selected the variables and went back to my 30s layers. Is this method all right? Or does anybody have a better suggestion, please?
Thanks a lot
Champ