In another thread Andy Liaw, whom CRAN lists as the locfit maintainer, said:
<quote>
From: "Liaw, Andy" <andy...@merck.com>
To: "Guy Green" <guyg...@netvigator.com>; <r-h...@r-project.org>
Subject: Re: Alternatives to linear regression with multiple variables
Date: 22 February 2010 17:50
You can try the locfit package, which I believe can handle up to 5
variables. E.g.,
</quote>
Looking in the locfit documentation (e.g.
http://www.stats.bris.ac.uk/R/web/packages/locfit/locfit.pdf) I can't see an
upper limit on the number of predictors; if it is 5 I'm getting close in one
of my applications.
Can anyone confirm or deny the existence of a 'crisp' upper limit on the
number of predictors in locfit?
If it is 5, or thereabouts, can anyone suggest an alternative which can
handle a few more? (I'm using it for multidimensional interpolation).
Best regards,
Keith Jewell
n <- 5e3   # number of observations
m <- 20    # number of candidate predictors
x <- matrix(runif(n * m), nrow=n)
y <- rnorm(n)
library(locfit)
fit <- locfit.raw(x[, 1:10], y)   # fit using the first 10 predictors
The code above took a while on my laptop, and ended up giving an error
I don't understand. I'm not sure whether the error was caused by
insufficient sample size or by some inherent limitation. At least it
didn't choke on five variables. However, if all 20 columns of x are used,
locfit.raw() will choke because it can't compute the dimension of some
variable that it needs to allocate memory for.
I had a vague recollection of reading somewhere that "5" is the limit.
Unfortunately my copy of Local Regression and Likelihood has been MIA
for a few years, so I can't check there. In any case, it doesn't seem
like the number of data points and/or computing power is the bigger issue.
Andy
I've investigated a little more using...
y <- rowSums(x) + runif(n)
... just so I had some correlation to play with.
The error I get when it fails is "Invalid what in exvval", which I don't
understand either!
With n=5e3 it worked with 6 variables but not with 7.
I wasn't sure the error was caused by number of variables rather than
something else, so I tried with...
n <- 100
I also tried locfit rather than locfit.raw using...
xd <- lapply(1:10, function(i) runif(n))
xd <- as.data.frame(xd)
names(xd) <- paste("x", 1:10, sep="")
y <- rowSums(xd)
xd$y <- y
aF <- formula(paste("y ~ lp(", paste(names(xd)[1:6], collapse=","), ")"))
locfit(aF, xd)
Both of these gave the same results, success with 6 variables but not with
7.
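The trial-and-error above could be automated along these lines. This is only a sketch: probe_max_predictors and toy_fit are hypothetical names made up for illustration, and the probe assumes that a failed fit raises an ordinary R error that tryCatch() can intercept, which I have not verified for every way locfit can fail.

```r
# Hypothetical helper: returns the largest number of leading columns of x
# for which fit_fun(x, y) completes without raising an R error.
probe_max_predictors <- function(fit_fun, x, y, p_range = seq_len(ncol(x))) {
  ok <- vapply(p_range, function(p) {
    tryCatch({
      fit_fun(x[, seq_len(p), drop = FALSE], y)
      TRUE
    }, error = function(e) FALSE)
  }, logical(1))
  if (!any(ok)) return(NA_integer_)
  max(p_range[ok])
}

set.seed(1)
n <- 100
x <- matrix(runif(n * 10), nrow = n)
y <- rowSums(x) + runif(n)

# Stand-in fitter (an assumption, NOT locfit): fails beyond 6 columns so the
# probe can be demonstrated without any add-on package installed.
toy_fit <- function(x, y) {
  if (ncol(x) > 6) stop("too many predictors")
  lm.fit(cbind(1, x), y)
}

max_p <- probe_max_predictors(toy_fit, x, y)
print(max_p)

# With locfit installed, the same probe could be pointed at locfit.raw:
# probe_max_predictors(locfit::locfit.raw, x, y)
```

Passing the fitting function as an argument keeps the probe independent of locfit, so the same loop could be reused on any alternative that gets suggested.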
IT APPEARS, the maximum number of predictors is 6, but I don't know locfit
well, and it may be that other settings would allow more variables.
CAN anyone give a more DEFINITIVE ANSWER?
My data sets currently reach 5 predictors, and I expect this to
increase.
In S-Plus (v6.2.1) I used loess in which "Locally quadratic models may have
at most 4 predictor variables; locally linear models may have at most 15".
In R stats::loess allows only "one to four numeric predictors".
I'd assumed (foolishly) that because locfit didn't mention limits, the only
limits were practical (memory, time,...) - it seems not :-(
I guess I could write something myself; I only need rough interpolation,
and even "straight line" interpolation between nearest neighbours would be OK.
But at first glance it seems non-trivial with a substantial non-fixed number
of dimensions (nnclust::nnfind to identify neighbours??), and I don't want
to re-invent wheels.
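For what it's worth, a rough base-R version of the "write something myself" route might look like the sketch below. Note that it is inverse-distance weighting over the k nearest neighbours, not true straight-line interpolation, and it finds neighbours by brute force rather than with nnclust::nnfind. The name knn_idw and its arguments are made up for illustration, and it assumes the predictors are on comparable scales (otherwise they would need rescaling first).

```r
# Hypothetical helper: inverse-distance-weighted average of the k nearest
# training points. Works for any number of predictor columns.
knn_idw <- function(x_train, y_train, x_new, k = 5, eps = 1e-12) {
  x_train <- as.matrix(x_train)
  # Treat a plain vector as a single query point
  if (is.null(dim(x_new))) x_new <- matrix(x_new, nrow = 1)
  x_new <- as.matrix(x_new)
  stopifnot(ncol(x_new) == ncol(x_train), k <= nrow(x_train))
  xt <- t(x_train)                      # d x n, so a query recycles per column
  apply(x_new, 1, function(q) {
    d <- sqrt(colSums((xt - q)^2))      # Euclidean distance to every point
    nn <- order(d)[seq_len(k)]          # indices of the k nearest neighbours
    if (d[nn[1]] < eps) return(y_train[nn[1]])  # query coincides with a point
    w <- 1 / d[nn]^2                    # inverse squared distance weights
    sum(w * y_train[nn]) / sum(w)
  })
}

set.seed(42)
n <- 500
d <- 7                                  # more predictors than locfit accepted
x <- matrix(runif(n * d), nrow = n)
y <- rowSums(x)
x_new <- matrix(runif(3 * d), nrow = 3)
pred <- knn_idw(x, y, x_new)
print(pred)
```

The brute-force distance calculation costs O(n) per query point, which should be tolerable for a few thousand observations; for larger data a proper nearest-neighbour search would be the obvious thing to swap in.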
Can anyone suggest an ALTERNATIVE route for INTERPOLATION in 5-10
DIMENSIONS?
Best...
(apologies for capitals, not shouting, just highlighting key points for
those skimming quickly)
Keith Jewell