How many predictor variables are appropriate

Chris Bowman-Prideaux

Apr 2, 2019, 10:20:19 PM
to HyperNiche and NPMR
My collaborator is concerned that reviewers will scoff at the number of predictor variables I have used in our Local Mean Gaussian NPMR model. I have 80 candidate predictor variables, which include:

1) species cover and density data (many species)
2) climate variables (monthly, seasonal, and annual averages)
3) soil data
4) fire history
5) rehabilitation history

The best fit models have 4 variables. My questions are:

1) How many possible predictor variables is too many?

2) If a reviewer does say there are too many, what is the best defense for the number of predictor variables?

Thanks

Bruce McCune

Apr 2, 2019, 10:29:13 PM
to hyper...@googlegroups.com
Well, that is a boatload of predictors, so it makes sense to be concerned about that. Your best defense is a randomization test, which might be too computationally expensive if you have a large data set. If the pool of predictors is too large, you will never beat the randomization test. If your model beats the randomization test, then it would be hard to argue with.

From the HyperNiche Help system:

Randomization Test

You can select a Randomization (Monte Carlo) test in the Evaluate Selected Model dialog. A checkbox on the Output Options provides for this test. By default it is turned off because it can be a very lengthy procedure. If you select this option, you will be asked in a subsequent dialog for the number of randomizations (runs) and whether the random number generator should pick its own starting point (seed) or whether you would like to specify the seed (see below). A randomization test is available for your model only via Fit Model | Evaluate Selected Model.

The randomization test evaluates the null hypothesis that the fit of the selected model is no better than could be obtained by chance alone, given an equal number of predictor variables. HyperNiche shuffles the response variable, destroying the observed relationship with the predictors, then attempts to fit the best model possible, using a free search. Whatever variables are in the predictor matrix are considered as predictors, so it is important to include only those variables that were in the pool of available predictors when the original model was derived.

Note that you can use randomization tests as nonparametric alternatives to standard parametric ways to evaluate statistical significance via a t-test, one-way ANOVA, simple linear regression, multiple linear regression, as well as likelihood ratio and chi-square tests. See Simple Randomization Tests.

During the randomization test, a free search attempts to fit no more predictors than the number included in the selected model. For example, if you have 10 predictors available, but only 2 predictors in the selected model, the randomization runs look only for the best 1- or 2-predictor models.
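To get a feel for how the size of the search grows with the predictor pool, the number of distinct predictor subsets can be counted directly. This is only a rough sketch: the function name is hypothetical, it counts subsets and nothing else, and it assumes nothing about how HyperNiche's free search actually traverses them or how much work each subset costs.

```python
from math import comb

def count_predictor_subsets(pool_size, max_predictors):
    """Number of distinct subsets of 1..max_predictors variables drawn from the pool."""
    return sum(comb(pool_size, k) for k in range(1, max_predictors + 1))

# The Help example: 10 available predictors, selected model has 2.
print(count_predictor_subsets(10, 2))   # 10 singles + 45 pairs = 55

# The case from this thread: 80 available predictors, best models have 4.
print(count_predictor_subsets(80, 4))   # 1666980
```

The jump from 55 to well over a million subsets illustrates why a large pool makes it both slower and harder for the real model to stand out from the shuffled runs.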

The procedure of randomization followed by free search for a model is repeated some number of times that you select in the Random numbers dialog. The proportion of randomization runs that result in an equal or better fit is used as the p-value for the test.
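The shuffle, search, and compare loop described above can be sketched in a few lines. This is not HyperNiche's implementation. As loudly labeled assumptions: ordinary least-squares R² stands in for NPMR's fit measure, the "free search" is an exhaustive search over predictor subsets, the real data are counted as the first of the `n_runs` runs so that p never reaches zero, and all function names are hypothetical.

```python
import itertools
import numpy as np

def r_squared(X, y):
    """R^2 of a least-squares fit with intercept (stand-in for the NPMR fit statistic)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def free_search(X, y, max_predictors):
    """Best fit over every subset of 1..max_predictors columns (exhaustive, an assumption)."""
    best = -np.inf
    for k in range(1, max_predictors + 1):
        for cols in itertools.combinations(range(X.shape[1]), k):
            best = max(best, r_squared(X[:, list(cols)], y))
    return best

def randomization_test(X, y, max_predictors, n_runs, seed=None):
    """Return (observed fit, p-value). The real data count as one of the n_runs runs."""
    rng = np.random.default_rng(seed)
    observed = free_search(X, y, max_predictors)
    as_good = 1  # the unshuffled data always match their own fit
    for _ in range(n_runs - 1):
        shuffled = rng.permutation(y)  # breaks the link between response and predictors
        if free_search(X, shuffled, max_predictors) >= observed:
            as_good += 1
    return observed, as_good / n_runs

if __name__ == "__main__":
    # Synthetic data for illustration only: the response depends on column 1.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 5))
    y = 2.0 * X[:, 1] + rng.normal(scale=0.5, size=40)
    fit, p = randomization_test(X, y, max_predictors=2, n_runs=50, seed=1)
    print(f"observed fit = {fit:.3f}, p = {p:.3f}")
```

Note that each shuffled run repeats the whole search over the same predictor pool, which is why the pool passed to the test should match the pool used to derive the original model.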

A large number of runs is generally desirable, but an acceptable value will depend on the speed of your computer, the size of your data set, and the desired precision of the resulting p-value. Note that the p-value for a randomization test can be no smaller than 1/N where N is the total number of runs.
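As a quick illustration of the 1/N floor, the smallest p-value a test can report can be tabulated for a few run counts. The function name is hypothetical, and the run counts are arbitrary examples rather than recommendations.

```python
def smallest_p_value(n_runs):
    """Smallest p-value reportable with n_runs total runs, per the 1/N floor."""
    return 1 / n_runs

for n_runs in (10, 100, 1000):
    print(f"{n_runs:>5} runs -> smallest possible p = {smallest_p_value(n_runs)}")
```

So a test meant to demonstrate p < 0.01, for instance, needs more than 100 runs.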

Randomization tests can be very slow. Randomization tests use multiple runs with free searches. Even a single free search is very slow if you have many predictor variables. A large sample size will further slow down model fitting. You can estimate the time it will take for a randomization test by multiplying the time required for a free search of your real data by the number of runs that you plan to request.
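The time estimate in the last sentence is a single multiplication. A small sketch, with a hypothetical function name and made-up timings, makes the scale concrete:

```python
from datetime import timedelta

def estimated_test_time(free_search_seconds, n_runs):
    """Estimated duration: time for one free search on the real data times the number of runs."""
    return timedelta(seconds=free_search_seconds * n_runs)

# Hypothetical numbers: a 90-second free search and 1000 requested runs.
print(estimated_test_time(90, 1000))  # 1 day, 1:00:00
```

Timing one free search on the real data first, then plugging that number in, gives a rough idea of whether the test is feasible before committing to it.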

 

