Again: should I drop variables based on correlations or VIF, or do nothing?

epiphyte

May 29, 2012, 4:51:45 AM
to Maxent
Dear Maxenters,

I understand this question has been discussed a thousand times in this
group, but I am going to ask it again.
How do you decide which variables to eliminate to avoid multi-collinearity?

For my last study I used Pearson's correlation test to drop variables
with values > 0.75. Some people also suggest variance inflation factors
(VIF). However, I just learned that multi-collinearity doesn't actually
reduce the predictive power or reliability of a model. High values of
VIF sometimes don't discount the results of regression analyses
(O'Brien, 2007), and variables may be correlated as highly as 0.8
without causing such issues.
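
For readers who want to try both screens mentioned above, here is a minimal sketch using NumPy. It assumes the predictor values have already been sampled into a table (one row per site, one column per variable). The variable names and the synthetic data are made up for illustration, and the 0.75 cutoff is simply the one quoted in this post, not a recommended value.

```python
import numpy as np

def correlated_pairs(X, names, threshold=0.75):
    """Return (name_i, name_j, r) for every pair with |Pearson r| > threshold."""
    r = np.corrcoef(X, rowvar=False)  # columns are variables
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                pairs.append((names[i], names[j], float(r[i, j])))
    return pairs

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on all the other columns (with an intercept)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic, hypothetical data: min_temp is built to track mean_temp closely.
rng = np.random.default_rng(0)
mean_temp = rng.normal(20, 3, 200)
min_temp = mean_temp - 8 + rng.normal(0, 1, 200)
precip = rng.normal(2000, 400, 200)
names = ["mean_temp", "min_temp", "precip"]
X = np.column_stack([mean_temp, min_temp, precip])

pairs = correlated_pairs(X, names)
vifs = vif(X)
print(pairs)
print(dict(zip(names, vifs.round(2))))
```

Both screens only flag the correlated pair; which variable to keep, or whether to keep both, is still an ecological decision.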

Dropping variables means losing information for modelling
species. For example, my epiphytic plants are sensitive to two
correlated variables, minimum and annual mean temperature. I have to
drop one of them to avoid multi-collinearity, which probably won't
influence the predictive power anyway. Should I just keep these
correlated variables in my model? Plus, if I use all occurrence data for
model building and null models for testing model significance (Raes
and ter Steege, 2007), there will be no changes in the occurrence data
and thus none in the coefficient estimates. Am I just too naive about
all this? Any response is appreciated!

Rebecca


Ed

May 30, 2012, 12:25:40 PM
to Maxent
Hi Rebecca -

To help you out of your dilemma, I think you need to ask a different
question. Asking "how to avoid multi-collinearity" in your predictor
variables has a straightforward, statistical answer which you have
found, more or less. But you have also found that this is rather
unsatisfying from an ecological or process perspective.

If you instead ask "what are the most appropriate variables to use in
my model?", then multi-collinearity becomes only part of the question,
and you are free to weigh other criteria (e.g., ecological relevance)
as you see fit. I don't think statistical independence of predictors
is a requirement for Maxent, and even many of the advanced regression
methods are robust to correlated predictors. Certainly the question of
spatial autocorrelation among grid cells is at least as big an issue.

So my advice would be to not dwell too much on the statistics, do what
makes sense ecologically, and most importantly, examine your response
curves because, in my experience, Maxent has a tendency to overfit the
data, leading to pretty unrealistic (ecologically) response curves
using the default settings.

And how you use your dependent data depends on your question. Are your
goals to predict or to understand? It's not hard to get our models to
outperform 'null' models, so I find this is often a rather pointless
exercise. You are often better off splitting your data and using some
kind of cross-validation approach.
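
As a rough illustration of the splitting idea, the sketch below partitions occurrence records into k folds so that each record is held out for testing exactly once. The function name and the record count are hypothetical, and the model fitting and evaluation inside each fold are deliberately left out.

```python
import numpy as np

def k_fold_indices(n_records, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each record is tested exactly once."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_records)  # shuffle the record indices once
    folds = np.array_split(order, k)    # k folds of nearly equal size
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# Hypothetical example: 30 occurrence records, 5 folds.
splits = list(k_fold_indices(n_records=30, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), "training records,", len(test_idx), "test records")
```

Setting k equal to the number of records gives leave-one-out splits, which may be the only practical option for very small samples.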

You are not being naive. This is not exactly easy, especially if you
stop to consider what you are doing instead of just pushing buttons.

good luck!
ed.

David Le Maitre

May 31, 2012, 3:54:16 AM
to Maxent
Hi Rebecca
 
Excellent advice, Ed. Rebecca, you ask about using both mean and minimum temperature. Your interest is in what determines these species' distributions, so you need to ask: knowing what I do about the sensitivity of such species to temperature (e.g. based on the literature), is minimum temperature likely to be more limiting than mean temperature? It gets more complicated when you have, for example, temperature and rainfall correlated (which can happen where rainfall and temperature gradients are aligned). Then you have to ask how rainfall and temperature could interact, and whether the responses to these variables correspond with such an interaction. If they do, you could use Maxent's product features.
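
To make the idea of a product feature concrete: it is the product of two continuous predictors, which lets a model respond to their interaction rather than to each variable alone. The sketch below only illustrates that idea on a made-up table. It is not Maxent's internal implementation, and the function and variable names are hypothetical.

```python
import numpy as np
from itertools import combinations

def product_features(X, names):
    """Return one column per pair of predictors, holding their product."""
    cols, labels = [], []
    for i, j in combinations(range(X.shape[1]), 2):
        cols.append(X[:, i] * X[:, j])
        labels.append(f"{names[i]}*{names[j]}")
    return np.column_stack(cols), labels

# Hypothetical values for two sites and three predictors.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
names = ["tmin", "tmean", "rain"]
P, labels = product_features(X, names)
print(labels)
print(P)
```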
 
Regards
 
David

epiphyte

May 31, 2012, 9:50:25 PM
to Maxent
Thanks, Ed and David!

I think Ed's advice is very inspiring, and it fits our Taiwanese
philosophy quite well :).

Yes, I was thinking of keeping both mean temperature and minimum
temperature, since in some areas there are few vascular epiphytes,
probably because of occasional frost. A similar situation arises with
Annual Precipitation vs Precipitation of Driest Quarter. I also
found that the response curves were hard to explain in many cases in my
study. Another problem is that most of my studied species are endemic or
rare, with fewer than ten occurrence records. If I use a
cross-validation approach, do you suggest selecting the model based on
AIC or BIC values? BTW, I notice that by default Maxent uses only linear
features for samples with fewer than ten records. Will I violate any
statistical rules if I choose product features?

Kind regards,

Rebecca