reducing the number of predictors

Bruce McCune

May 12, 2011, 11:14:08 AM

to HyperNiche and NPMR

Just to prime the pump a bit, I thought I would share a couple of
questions that have come my way, along with my answer to them. Feel
free to supplement with your own ideas!

-------------
Q. What is the best way to narrow down my potential list of predictor
variables? In Irvine et al (2009) they use an NPMR analysis, but first
narrow down the potential predictor variables using an AIC analysis of
a-priori hypotheses. In Thompson and Spies (2009), Thompson narrows
down his list of candidate variables using randomForest, and then
conducts his analysis of the relationships between responses and
important predictors using random tree analysis.

I considered using randomForest to narrow down my list of potential
explanatory variables and then use the pruned set in an NPMR
analysis, but I wasn't sure whether that was a useful way to narrow
things down, since the two analyses analyze interactions in different
ways. (However, for the analysis that I have tried, the "top"
predictors were similar regardless of whether I used the "pruned" or
full set of variables.)
-----------
My answer: I'm not sure I would recommend either of those approaches for
narrowing down the variables. The regression/AIC method only sees
linear relationships and does not see interactions (unless you
explicitly build those into the model, which is difficult if you have a
large number of predictors). Random forests might make more sense, but
they too can suffer when given too large a pool of predictors. Lintz et
al. (2011; Quantifying ecological thresholds from response surfaces.
Ecological Modelling 222: 427-436) show NPMR to be superior to
Random Forests for many simulated response surfaces.

I would probably start by eliminating any predictors that either (1)
are unlikely to be relevant to the response or (2) are strongly correlated
with other predictors. You may have done this already. If so, there is
not too much else you can do (short of getting more data), other than
relying on the cross-validation built into NPMR to reduce overfitting.
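
For step (2), here is a minimal sketch in Python of one way to screen out
strongly correlated predictors before running NPMR. It is not part of
HyperNiche. The function names, the 0.8 cutoff, and the site values are
all assumptions made up for illustration; list your predictors in order
of priority, because the first member of a correlated pair is the one
that is kept.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences.

    Assumes neither sequence is constant (a constant column would
    give a zero denominator).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def drop_correlated(predictors, threshold=0.8):
    """Greedy screen: keep a predictor only if |r| with every
    already-kept predictor is below the threshold.

    predictors: dict of name -> list of values (one per site),
    given in priority order. The 0.8 threshold is an arbitrary
    illustration, not a recommended value.
    """
    kept = []
    for name, values in predictors.items():
        if all(abs(pearson(values, predictors[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical values for six sites (invented for this example).
sites = {
    "elevation": [200, 450, 700, 950, 1200, 1400],
    "temperature": [14.1, 12.8, 11.0, 9.9, 8.2, 7.0],
    "dnbr": [310, 120, 560, 90, 400, 250],
}

kept = drop_correlated(sites)
print(kept)
```

With these made-up numbers, temperature is dropped because it is almost
perfectly (negatively) correlated with elevation, while dNBR is retained.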
-----------
Other ideas?

Peter Nelson

May 12, 2011, 12:06:12 PM

to hyper...@googlegroups.com, Bruce McCune, HyperNiche and NPMR

Hi all,

You could also do a PCA (or some other form of data reduction) that
distills linearly related predictors. Perhaps this is akin to the AIC
test (I don't remember the details of that one). PCAs of environmental
variables tend to work well in reducing the number of variables
because linearly related variables make biological sense (e.g.,
elevation and temperature, plus sometimes other topographic features,
distilled into one axis). Whether combining variables is a good idea
depends on whether you're interested in those variables' individual
contributions to explaining seedling density. You could try Bruce's
heat load calculation, too. There are other indices like this (e.g.,
ruggedness) that you could look for (like NBR for spectral data) and
that may be of interest. This could both reduce the number of predictors
and serve as clean hypotheses for AIC-style testing (unless AIC needs
nested models ... can't remember) between models with different
predictors. Perhaps you could approach the seedling density using some
point-pattern analysis to get at spatial patterns? Are the data
collected extensively (e.g., census) or sampled? I know that just
including UTMs in your model as predictors seems to do similar things to
point-pattern analyses, and is probably easier, too.
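
To make the PCA suggestion above concrete, here is a rough sketch in
plain Python that distills a few linearly related predictors into a
single axis. It extracts only the first principal component of the
correlation matrix, using power iteration rather than a full
eigendecomposition. The function names, the iteration count, and the
site values are assumptions invented for the example; a statistics
package would normally do this for you.

```python
from math import sqrt

def standardize(col):
    """Center a column to mean 0 and scale to sample SD 1."""
    n = len(col)
    mean = sum(col) / n
    sd = sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]

def first_pc(columns, iterations=200):
    """First principal component of the correlation matrix.

    columns: list of predictor columns (one list of site values each).
    Returns (loadings, site scores, fraction of variance explained).
    Power iteration starts from the first predictor's axis, so it
    assumes that predictor has a nonzero loading on the first component.
    """
    z = [standardize(c) for c in columns]
    p, n = len(z), len(z[0])
    # Correlation matrix of the standardized predictors.
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1)
             for j in range(p)] for i in range(p)]
    v = [1.0] + [0.0] * (p - 1)
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * corr[i][j] * v[j]
                     for i in range(p) for j in range(p))
    scores = [sum(v[j] * z[j][s] for j in range(p)) for s in range(n)]
    return v, scores, eigenvalue / p

# Hypothetical values for six sites (invented for this example).
elevation = [200, 450, 700, 950, 1200, 1400]
temperature = [14.1, 12.8, 11.0, 9.9, 8.2, 7.0]
slope = [12, 30, 8, 25, 18, 22]

loadings, scores, explained = first_pc([elevation, temperature, slope])
print(loadings, explained)
```

The site scores could then stand in for elevation and temperature as one
combined predictor, at the cost of no longer seeing each variable's
individual contribution.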

Just some ideas...

Peter Nelson
PhD student
BPP GSA vice president
Department of Botany and Plant Pathology
Cordley Hall 2082
Oregon State University
Corvallis, Oregon 97331-2902
Phone: 541-737-1742

Quoting Bruce McCune <br...@salal.us>:

> I would like to look at what are the most important predictor
> variables for seedling density in the Biscuit Fire. I would like to
> know if burn severity (as measured by dNBR) is an important factor
> in explaining seedling density, and what other factors are important
> regardless of whether burn severity is important. I have a
> relatively high number of potential predictor variables (17) in
> comparison to my number of sites (78). Additionally, for some of my
> species, presence data is sparse. (I've decided not to look at an
> individual species if it is present on less than 10 sites.)
