[R] How to run lm for each subset of the data frame, and then aggregate the result?


CHEN, Cheng

May 19, 2013, 8:31:04 AM
to R-h...@r-project.org
Hi gurus,

I have a big data frame df, with columns named:

age, income, country

What I want to do is actually very simple: run

fitFunc<-function(thisCountry){
subframe<-df[which(country==thisCountry),];
fit<-lm(income~0+age, data=subframe);
return(coef(fit));}

for each individual country, then aggregate the results into a new data
frame that looks like:

countryname, coeffname1 USA 1.22 GB
1.03 France 1.1

I tried:
do.call("rbind", lapply(countries, fitFunc))

but this only gives something like:

          age
[1,] 2.540879
[2,] 2.428830
[3,] 2.369560
How should I proceed?

Can anyone help?


--
*CHEN*, Cheng

[[alternative HTML version deleted]]

______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

arun

May 19, 2013, 12:10:25 PM
to CHEN, Cheng, R help
Hi,
Maybe this helps:

set.seed(24)
dat1 <- data.frame(age = sample(30:70, 120, replace = TRUE),
                   income = sample(40000:80000, 120, replace = FALSE),
                   country = rep(c("USA", "GB", "France"), each = 40),
                   stringsAsFactors = FALSE)
library(plyr)
ldply(dlply(dat1, .(country), lm, formula = income ~ 0 + age), function(x) coef(x))
#  country      age
#1  France 1127.192
#2      GB 1194.586
#3     USA 1161.795
#or
do.call(rbind, lapply(split(dat1, dat1$country),
                      function(x) coef(with(x, lm(income ~ 0 + age)))))
#            age
#France 1127.192
#GB     1194.586
#USA    1161.795
#or
do.call(rbind, lapply(unique(dat1$country), function(x) {
  subframe <- dat1[which(dat1$country == x), ]
  fit <- lm(income ~ 0 + age, data = subframe)
  Coef1 <- data.frame(age = coef(fit))
  row.names(Coef1) <- x
  Coef1
}))
#            age
#USA    1161.795
#GB     1194.586
#France 1127.192
A.K.
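
The approaches above return the countries as row names. If an explicit country column is wanted, as in the layout the original post asks for, one possible base-R sketch is below. The data frame toy and the names by_country, slopes, and result are made up for illustration and do not come from the thread; the toy incomes are constructed so that the slope is 2 for USA and 1 for GB.

```r
# Toy data, invented for illustration only.
toy <- data.frame(
  age     = c(30, 40, 50, 35, 45, 55),
  income  = c(60, 80, 100, 35, 45, 55),
  country = rep(c("USA", "GB"), each = 3),
  stringsAsFactors = FALSE
)

# One sub-data-frame per country; the list is named by country.
by_country <- split(toy, toy$country)

# Fit the no-intercept model on each piece and keep only the age slope.
slopes <- sapply(by_country, function(x) {
  coef(lm(income ~ 0 + age, data = x))[["age"]]
})

# Move the names into a proper column instead of row names.
result <- data.frame(countryname = names(slopes),
                     age = unname(slopes),
                     stringsAsFactors = FALSE)
result
```

Because split() orders the pieces by the sorted country values, the rows of result come out in that order rather than in the order the countries first appear in the data.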

David Winsemius

May 19, 2013, 12:19:01 PM
to CHEN, Cheng, R-h...@r-project.org

On May 19, 2013, at 5:31 AM, CHEN, Cheng wrote:

> Hi gurus,
>
> I have a big data frame df, with columns named:
>
> age, income, country
>
> What I want to do is actually very simple: run
>
> fitFunc<-function(thisCountry){
> subframe<-df[which(country==thisCountry),];
> fit<-lm(income~0+age, data=subframe);
> return(coef(fit));}
>
> for each individual country, then aggregate the results into a new data
> frame that looks like:
>
> countryname, coeffname1 USA 1.22 GB
> 1.03 France 1.1
>
> I tried:
> do.call("rbind", lapply(countries, fitFunc))
>
This suggests you have used 'attach' on df. Not a safe practice.
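
As a rough sketch of what avoiding 'attach' could look like, the function below receives the data frame as an argument and refers to its country column explicitly. The name fitFunc2, the extra data argument, and the toy data are assumptions made for illustration, not something from the original post; the toy incomes are constructed so that the slope is 2 for USA and 1 for GB.

```r
# Hypothetical rewrite of fitFunc: the data frame is passed in explicitly,
# so nothing depends on attach() or on a global object named df.
fitFunc2 <- function(thisCountry, data) {
  subframe <- data[data$country == thisCountry, , drop = FALSE]
  fit <- lm(income ~ 0 + age, data = subframe)
  coef(fit)
}

# Toy data, invented for illustration only.
toy <- data.frame(
  age     = c(30, 40, 50, 35, 45, 55),
  income  = c(60, 80, 100, 35, 45, 55),
  country = rep(c("USA", "GB"), each = 3),
  stringsAsFactors = FALSE
)

countries <- unique(toy$country)
res <- do.call(rbind, lapply(countries, fitFunc2, data = toy))
rownames(res) <- countries   # label each row with its country
res
```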

> but this only gives something like:
>
> age
> [1,] 2.540879
> [2,] 2.428830
> [3,] 2.369560
> How should I proceed?

That is exactly the sort of result I would have expected from your procedure. We cannot tell what you want that is different. For one thing, you are posting in HTML, so the "aggregate" result above is mangled. I'm guessing it might have been:

countryname, coeffname1
USA 1.22
GB 1.03
France 1.1

So perhaps the only thing missing is the row names?

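# assuming 'countries' is meant to hold the distinct values of df$country
countries <- unique(as.character(df$country))
res <- do.call("rbind", lapply(countries, fitFunc))
rownames(res) <- countries
res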

If you had wanted a data frame to be returned, you could do this with the 'by' function, or have your 'fitFunc' calls return a list that includes the country instead of a bare numeric vector. rbind-ing a list of lists may give you something that can easily be coerced to a data.frame. (But there is no data to test these theories.)
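
One way the 'by' suggestion might be sketched is shown below, with each call returning a one-row data frame that carries the country along with the coefficient. The toy data and the names pieces and res_df are assumptions for illustration only; the toy incomes are constructed so that the slope is 2 for USA and 1 for GB.

```r
# Toy data, invented for illustration only.
toy <- data.frame(
  age     = c(30, 40, 50, 35, 45, 55),
  income  = c(60, 80, 100, 35, 45, 55),
  country = rep(c("USA", "GB"), each = 3),
  stringsAsFactors = FALSE
)

# by() hands each country's rows to the function as a data frame.
pieces <- by(toy, toy$country, function(sub) {
  fit <- lm(income ~ 0 + age, data = sub)
  data.frame(country = sub$country[1],
             age = unname(coef(fit)),
             stringsAsFactors = FALSE)
})

# Stack the one-row data frames and drop the automatic row names.
res_df <- do.call(rbind, pieces)
rownames(res_df) <- NULL
res_df
```

Returning a small data frame from each group keeps the country as an ordinary column, so no separate row-name step is needed afterwards.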


>
> [[alternative HTML version deleted]]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> and provide commented, minimal, self-contained, reproducible code.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
--
David Winsemius
Alameda, CA, USA