[R] define subset argument for function lm as variable?


Rainer M Krug

Aug 21, 2012, 10:44:44 AM
to R-help
Hi

I want to fit a series of linear models, and would like to define the input arguments for lm() as
variables. I easily managed to store the formula arguments in a variable, but I would also like to
have the "subset" in a variable. My reasoning is that I have the subset in the results object.

So I would like to add a line like:

subs <- dead==FALSE & recTreat==FALSE

which obviously does not work, as the expression is evaluated immediately. Is it possible to do
what I want here, or do I have to go back to using

dat <- subset(dat, dead==FALSE & recTreat==FALSE)

?



dat <- loadSPECIES(SPECIES)
feff <- height~pHarv*year # fixed effect in the model
reff <- ~year|plant # random effect in the model, where year is the
dat.lme <- lme(
fixed = feff, # fixed effect in the model
data = dat,
random = reff, # random effect in the model
correlation = corAR1(form=~year|plant), #
subset = dead==FALSE & recTreat==FALSE, #
na.action = na.omit
)
dat.lm <- lm(
formula = feff, # fixed effect in the model
data = dat,
subset = dead==FALSE & recTreat==FALSE,
na.action = na.omit
)

Thanks,

Rainer

--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel : +33 - (0)9 53 10 27 44
Cell: +33 - (0)6 85 62 59 98
Fax : +33 - (0)9 58 10 27 44

Fax (D): +49 - (0)3 21 21 25 22 44

email: Rai...@krugs.de

Skype: RMkrug

______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Joshua Wiley

Aug 21, 2012, 10:51:41 AM
to Rai...@krugs.de, R-help
Hi Rainer,

You could try:

subs <- expression(dead==FALSE & recTreat==FALSE)

lme(formula, subset = eval(subs))

Not tested, but something along those lines should work.
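
For reference, here is a minimal, self-contained sketch of this suggestion applied to lm(). The toy data frame and its column names (weight, group, dead) are made up for illustration and are not the original poster's data.

```r
# Hypothetical toy data; the columns are assumptions for this sketch only.
dat <- data.frame(
  weight = c(4.2, 5.6, 5.2, 6.1, 4.5, 4.8, 4.2, 4.4, 3.6, 5.9),
  group  = gl(2, 5, labels = c("Ctl", "Trt")),
  dead   = c(FALSE, FALSE, TRUE, FALSE, FALSE,
             FALSE, TRUE, FALSE, FALSE, FALSE)
)

# Store the condition unevaluated; no object called 'dead' is needed
# in the workspace at this point.
subs <- expression(dead == FALSE)

# The argument eval(subs) is only evaluated when the model frame is built,
# where 'dead' is looked up among the columns of 'dat'.
fit_stored <- lm(weight ~ group, data = dat, subset = eval(subs))

# The same model with the condition written out directly, for comparison.
fit_direct <- lm(weight ~ group, data = dat, subset = dead == FALSE)
```

Both calls should fit the same model on the same rows; only the way the condition is supplied differs.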

Cheers,

Josh
--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

Bert Gunter

Aug 21, 2012, 10:57:59 AM
to Rai...@krugs.de, R-help
?? I do not grok what you mean. ... subset = subs would work fine in
your lm call. So unless someone else does get it, you may need to
elaborate.

In general, ?substitute, ?bquote, and ?quote are useful to avoid
immediate evaluation of calls, but I don't know if that's relevant to
what you want here.
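
As one possible illustration of how these functions could apply here, the sketch below stores the condition with quote() and splices it into an lm() call with bquote(). The toy data and column names are made up, and this is only one of several ways to combine these tools.

```r
# Hypothetical toy data; the columns are assumptions for this sketch only.
dat <- data.frame(
  weight = c(4.2, 5.6, 5.2, 6.1, 4.5, 4.8, 4.2, 4.4, 3.6, 5.9),
  group  = gl(2, 5, labels = c("Ctl", "Trt")),
  dead   = c(FALSE, FALSE, TRUE, FALSE, FALSE,
             FALSE, TRUE, FALSE, FALSE, FALSE)
)

# quote() returns the condition as an unevaluated call object.
cond <- quote(dead == FALSE)

# bquote() builds the lm() call and replaces .(cond) with the stored call;
# nothing is fitted until eval() runs the finished call.
fit_call <- bquote(lm(weight ~ group, data = dat, subset = .(cond)))
fit <- eval(fit_call)

# The fitted object records the condition itself, not the name of a variable.
fit$call$subset
```

A side effect of building the call this way is that the stored call in the result shows the actual subset condition, which may be useful when the subset should be documented in the results object.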

-- Bert

On Tue, Aug 21, 2012 at 7:44 AM, Rainer M Krug <r.m....@gmail.com> wrote:
--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

Rainer M Krug

Aug 21, 2012, 11:11:14 AM
to Bert Gunter, R-help
On 21/08/12 16:57, Bert Gunter wrote:
> ?? I do not grok what you mean. ... subset = subs would work fine in
> your lm call. So unless someone else does get it, you may need to
> elaborate.

OK - here is an example:

dat <- data.frame(
ctl = c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14),
trt = c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69),
group = gl(2,10,20, labels=c("Ctl","Trt")),
weight = c(ctl, trt)
)
lm(weight ~ group, data=dat, subset=trt>0)

subst <- trt>0 ### here I get the obvious error: Error: object 'trt' not found

# I want to use:

lm(weight ~ group, data=dat, subset=subst)


>
> In general, ?substitute, ?bquote, and ?quote are useful to avoid
> immediate evaluation of calls, but I don't know if that's relevant to
> what you want here.

Looks promising from the help, but I don't get it to work.

Rainer

Eik Vettorazzi

Aug 21, 2012, 11:35:10 AM
to Rai...@krugs.de, R-help
Hi Rainer,
I got an error while replicating your data.frame construction.

But this worked for me

ctl = c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt = c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
dat <- data.frame( group = gl(2,10,20, labels=c("Ctl","Trt")),
weight = c(ctl, trt)
)
lm(weight ~ group, data=dat, subset=trt>0)

Cheers
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790

Rainer M Krug

Aug 21, 2012, 11:41:20 AM
to Eik Vettorazzi, R-help, Rai...@krugs.de
On 21/08/12 17:35, Eik Vettorazzi wrote:
> Hi Rainer,
> I got an error while replicating your data.frame construction.
>
> But this worked for me
>
> ctl = c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
> trt = c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
> dat <- data.frame( group = gl(2,10,20, labels=c("Ctl","Trt")),
> weight = c(ctl, trt)
> )

Sorry - too much garbage left in my workspace.

Just add:

rm(ctl)
rm(trt)

after the definition

> lm(weight ~ group, data=dat, subset=trt>0)

this obviously works, but I would like to have:

subst <- trt>0
lm(weight ~ group, data=dat, subset=subst)

Sorry about this,

Rainer

Joshua Wiley

Aug 21, 2012, 11:54:43 AM
to Rai...@krugs.de, R-help
What is wrong with what I suggested initially?

subst <- expression(trt > 0)
lm(weight ~ group, data=dat, subset=eval(subst))


??

Rainer M Krug

Aug 21, 2012, 12:12:03 PM
to Joshua Wiley, R-help
On 21/08/12 17:54, Joshua Wiley wrote:
> What is wrong with what I suggested initially?
>
> subst <- expression(trt > 0)
> lm(weight ~ group, data=dat, subset=eval(subst))

That it does not work?

ctl = c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt = c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
dat <- data.frame( group = gl(2,10,20, labels=c("Ctl","Trt")),
weight = c(ctl, trt)
)
rm(ctl)
rm(trt)
subst <- expression(trt > 0)
lm(weight ~ group, data=dat, subset=eval(subst))
# output: Error in eval(expr, envir, enclos) : object 'trt' not found


and

lm(weight ~ group, data=dat, subset=subst)
# output: Error in xj[i] : invalid subscript type 'expression'
>

also does not work.

Rainer

Eik Vettorazzi

Aug 21, 2012, 12:14:07 PM
to Rai...@krugs.de, R-help
Josh's solution should be fine.

just one more note,
trt>0 may not work as intended since trt is a factor in your example.
Just check
subset(dat,trt>0)
which is just 'dat'.



Rainer M Krug

Aug 21, 2012, 12:20:00 PM
to Rai...@krugs.de, R-help
Sorry - it is working as suggested by Joshua.

Thanks a lot and sorry for the horrible confusion and examples,

Rainer

Milan Bouchet-Valat

Aug 29, 2012, 6:56:35 AM
to Joshua Wiley, R-help, Rai...@krugs.de
On Tuesday, 21 August 2012 at 07:51 -0700, Joshua Wiley wrote:
> Hi Rainer,
>
> You could try:
>
> subs <- expression(dead==FALSE & recTreat==FALSE)
>
> lme(formula, subset = eval(subs))
>
> Not tested, but something along those lines should work.
Out of curiosity, why isn't "subset" (and "weights", which is very
similar in that regard) evaluated in the "data" environment, just like
the formula? Is this for historical reasons, or are there drawbacks to
such a feature?

It seems very common to pass a data frame via the "data" argument, and
use variables from it for subsetting and/or weighting.


Regards

Joshua Wiley

Aug 29, 2012, 7:01:30 AM
to Milan Bouchet-Valat, R-help
On Wed, Aug 29, 2012 at 3:56 AM, Milan Bouchet-Valat <nali...@club.fr> wrote:
> On Tuesday, 21 August 2012 at 07:51 -0700, Joshua Wiley wrote:
>> Hi Rainer,
>>
>> You could try:
>>
>> subs <- expression(dead==FALSE & recTreat==FALSE)
>>
>> lme(formula, subset = eval(subs))
>>
>> Not tested, but something along those lines should work.
> Out of curiosity, why isn't "subset" (and "weights", which is very
> similar in that regard) evaluated in the "data" environment, just like
> the formula? Is this for historical reasons, or are there drawbacks to
> such a feature?

I am not sure about weights offhand, but subset is evaluated in the
data environment, which is why that solution works. The original
question was how to set up the expression as an object that could be
passed to subset. The trick is to avoid having the logical expression
evaluated when the object is created, which I did by using
expression(), and then to force the evaluation of the object inside lme().
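
The two steps described above (store unevaluated, evaluate inside the model call) can be sketched for a series of models as follows. This uses lm() rather than lme(), and the data frame and the list of formulas are invented for illustration; only the column names dead and recTreat are taken from the original question.

```r
# Hypothetical toy data; the values are assumptions for this sketch only.
dat <- data.frame(
  height   = c(10, 12, 15, 11, 14, 18, 9, 13),
  year     = rep(1:4, times = 2),
  pHarv    = rep(c(0, 0.5), each = 4),
  dead     = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE),
  recTreat = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)
)

# Step 1: store the condition without evaluating it.
subs <- expression(dead == FALSE & recTreat == FALSE)

# The stored condition can be checked by evaluating it explicitly
# against the data frame.
keep <- eval(subs[[1]], dat)

# Step 2: reuse the same stored condition across several models;
# eval(subs) is evaluated when each model frame is built.
formulas <- list(height ~ year, height ~ pHarv + year)
fits <- lapply(formulas, function(f) lm(f, data = dat, subset = eval(subs)))
```

Here keep marks the rows that satisfy the condition, and each element of fits should be fitted on exactly those rows.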


Milan Bouchet-Valat

Aug 29, 2012, 8:26:56 AM
to Joshua Wiley, R-help
On Wednesday, 29 August 2012 at 04:01 -0700, Joshua Wiley wrote:
> On Wed, Aug 29, 2012 at 3:56 AM, Milan Bouchet-Valat <nali...@club.fr> wrote:
> > On Tuesday, 21 August 2012 at 07:51 -0700, Joshua Wiley wrote:
> >> Hi Rainer,
> >>
> >> You could try:
> >>
> >> subs <- expression(dead==FALSE & recTreat==FALSE)
> >>
> >> lme(formula, subset = eval(subs))
> >>
> >> Not tested, but something along those lines should work.
> > Out of curiosity, why isn't "subset" (and "weights", which is very
> > similar in that regard) evaluated in the "data" environment, just like
> > the formula? Is this for historical reasons, or are there drawbacks to
> > such a feature?
>
> I am not sure about weights offhand, but subset is evaluated in the
> data environment, which is why that solution works. The original
> question was how to set up the expression as an object that could be
> passed to subset. The trick is to avoid having the logical expression
> evaluated when the object is created, which I did by using
> expression(), and then to force the evaluation of the object inside lme().
OK, my phrasing was not really correct. What I meant (and what triggered
the OP's question) was: why doesn't the "subset" argument behave the same
in lm() and in subset.data.frame()? Is there any advantage to evaluating
the argument at the object creation?

AFAICS, subset.data.frame() merely uses this trick:
e <- substitute(subset)
r <- eval(e, x, parent.frame())
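
To make the trick concrete, here is a small sketch of a function built on the same two lines. The helper name rows_where and the toy data are hypothetical; this is not the actual source of subset.data.frame(), only an illustration of the substitute()/eval() pattern.

```r
# Hypothetical helper illustrating the substitute()/eval() pattern.
rows_where <- function(x, cond) {
  e <- substitute(cond)            # capture the condition unevaluated
  r <- eval(e, x, parent.frame())  # columns of x first, then the caller's variables
  x[r & !is.na(r), , drop = FALSE] # keep rows where the condition is TRUE, drop NA
}

# Made-up data for the sketch.
dat <- data.frame(
  weight = c(4.2, 5.6, 3.9, 6.1),
  dead   = c(FALSE, TRUE, FALSE, NA)
)
cutoff <- 4  # an ordinary variable in the calling environment

# 'dead' and 'weight' resolve to columns of dat, 'cutoff' to the variable above.
kept <- rows_where(dat, dead == FALSE & weight > cutoff)
```

Because the condition is captured with substitute() inside the function, the caller can write it directly in terms of column names, which is the behaviour the "subset" argument of lm() also provides.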


I'm probably missing something... ;-)

Milan Bouchet-Valat

Aug 29, 2012, 9:50:49 AM
to Joshua Wiley, R-help
On Wednesday, 29 August 2012 at 14:26 +0200, Milan Bouchet-Valat wrote:
> On Wednesday, 29 August 2012 at 04:01 -0700, Joshua Wiley wrote:
> > On Wed, Aug 29, 2012 at 3:56 AM, Milan Bouchet-Valat <nali...@club.fr> wrote:
> > > On Tuesday, 21 August 2012 at 07:51 -0700, Joshua Wiley wrote:
> > >> Hi Rainer,
> > >>
> > >> You could try:
> > >>
> > >> subs <- expression(dead==FALSE & recTreat==FALSE)
> > >>
> > >> lme(formula, subset = eval(subs))
> > >>
> > >> Not tested, but something along those lines should work.
> > > Out of curiosity, why isn't "subset" (and "weights", which is very
> > > similar in that regard) evaluated in the "data" environment, just like
> > > the formula? Is this for historical reasons, or are there drawbacks to
> > > such a feature?
> >
> > I am not sure about weights offhand, but subset is evaluated in the
> > data environment, which is why that solution works. The original
> > question was how to set up the expression as an object that could be
> > passed to subset. The trick is to avoid having the logical expression
> > evaluated when the object is created, which I did by using
> > expression(), and then to force the evaluation of the object inside lme().
> OK, my phrasing was not really correct. What I meant (and what triggered
> the OP's question) was: why doesn't the "subset" argument behave the same
> in lm() and in subset.data.frame()? Is there any advantage to evaluating
> the argument at the object creation?
Never mind, forget this silly question. This works exactly as I described;
I just did not get the OP's problem right, and for some unexplained
reason it did not work in my testing. But now I realize, as you said,
that the problem is just that the OP wanted to store the subset
in an object first.

Sorry for the noise - at least I learned I can specify "weights" the
easy way... ;-)
