I have a data frame with about 23000 observations. The first variable is
the group to which the observation belongs (about 500 different groups). The
second variable is a response for each observation that is a 1, 2, 3, 4 or 5. I
want to be able to calculate the percentage of each group that chose each
response. For example, I want to know what percentage of group 1 (which may
have a value of 34456) chose response 1, and so on.
Here is some code I wrote that generates a data frame like the one I have.
pop <- matrix(1:100000)
groupIDs <- sample(pop,500)
groupVar <- sample(groupIDs,23000,replace=TRUE)
responseVar <- sample(1:5,23000,replace=TRUE)
example.data <- data.frame(groupVar,responseVar)
Is there a fast way to calculate these percentages beyond writing loops to
manually count the responses for each of the groups?
Thanks,
EG
[[alternative HTML version deleted]]
______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Using:
table(example.data)
will give you a cross tabulation of the counts of your responseVar by
each groupVar.
prop.table(table(example.data), 1)
will give you the row-wise proportions (0 - 1) of responseVar within
each groupVar. If you want percentages (0 - 100):
prop.table(table(example.data), 1) * 100
See ?table and ?prop.table for more information.
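To make the row-wise normalisation concrete, here is a minimal sketch on a tiny made-up data frame. The object name toy and its values are assumptions for illustration only; the column names follow the original post.

```r
# Toy data: two groups, responses on a 1-5 scale (values are made up).
toy <- data.frame(groupVar    = c(10, 10, 10, 10, 20, 20),
                  responseVar = c(1, 1, 2, 5, 3, 3))

counts <- table(toy)                    # rows = groupVar, columns = responseVar
pct    <- prop.table(counts, 1) * 100   # margin 1 = normalise within each row

print(pct)

# Each group is normalised on its own, so every row adds up to 100.
print(rowSums(pct))
```

Note that a response nobody chose at all (4 in this toy data) gets no column, because table() only reports values it has seen.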
HTH,
Marc Schwartz
Michael Conklin
Chief Methodologist - Advanced Analytics
MarketTools, Inc.
6465 Wayzata Blvd. Suite 170
Minneapolis, MN 55426
Tel: 952.417.4719 | Mobile:612.201.8978
MarketTools(r) http://www.markettools.com
This e-mail and any attachments may contain privileged, confidential or
proprietary information. If you are not the intended recipient, be aware
that any review, copying, or distribution of this e-mail or any
attachment is strictly prohibited. If you have received this e-mail in
error, please return it to the sender immediately, and permanently
delete the original and any copies from your system. Thank you for your
cooperation.
Now I am having a problem getting the results into a format I can use.
prop.table generates a contingency table, which I tried to turn into a data
frame I could graph from with as.data.frame(). However, the result is not a
true data frame: the rows have commas between some of the elements, and
trying to look at a column displays the original contingency table
instead.
This is what I have:
pop <- matrix(1:100000)
groupIDs <- sample(pop,500)
groupVar <- sample(groupIDs,23000,replace=TRUE)
responseVar <- sample(1:5,23000,replace=TRUE)
example.data <- data.frame(groupVar,responseVar)
data.table <-
tapply(example.data$responseVar,example.data$groupVar,function(x){prop.table(table(x))})
example.data.frame <- as.data.frame(data.table)
But the example.data.frame object is not a true data frame, and I am not
sure how to get it into a format I can graph.
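One possible way around this, sketched under the assumption that the two-way table from prop.table(table(example.data), 1) suggested earlier in the thread is acceptable in place of the tapply() result: as.data.frame() has a method for tables that returns one row per group/response pair. The smaller sample size, the three group IDs, the seed, and the names pct, long and percent are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
groupVar    <- sample(c(101, 202, 303), 60, replace = TRUE)
responseVar <- sample(1:5, 60, replace = TRUE)
example.data <- data.frame(groupVar, responseVar)

# Two-way table of row percentages, then one row per (group, response) pair.
pct  <- prop.table(table(example.data), 1) * 100
long <- as.data.frame(pct, responseName = "percent")

str(long)    # groupVar and responseVar come back as factors, percent is numeric
print(head(long))
```

The long layout is usually the easier one to hand to plotting functions, since each percentage sits in its own row next to its group and response labels.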
On Fri, May 23, 2008 at 11:33 AM, Michael Conklin <
michael...@markettools.com> wrote:
> tapply(example.data$responseVar,example.data$groupVar,function(x){prop.table(table(x))})
>
> Michael Conklin
>
> Chief Methodologist - Advanced Analytics
do.call(rbind,data.table)
HTH,
Jorge
On Fri, May 23, 2008 at 1:31 PM, Economics Guy <econom...@gmail.com>
wrote:
Warning message:
In function (..., deparse.level = 1) :
number of columns of result is not a multiple of vector length (arg 8)
Is there a way to tell rbind() or do.call() to treat missing values as zero
or make prop.table() include the zero proportions?
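One way to get the zero proportions, as a sketch rather than the only option: convert the response to a factor that declares all five levels before tabulating, so that table() reports a count of zero for any response a group never chose. The toy data and the names toy, props and prop.matrix are made up for illustration.

```r
# Toy data in which group 20 never chooses responses 1, 2 or 4.
toy <- data.frame(groupVar    = c(10, 10, 10, 10, 10, 20, 20),
                  responseVar = c(1, 2, 3, 4, 5, 3, 5))

# Declaring all five levels makes table() report zero counts as well.
props <- tapply(factor(toy$responseVar, levels = 1:5), toy$groupVar,
                function(x) prop.table(table(x)))

# Every element now has length 5, so rbind() can line the columns up.
prop.matrix <- do.call(rbind, props)
print(prop.matrix)
```

Because every per-group table now has the same five entries, the rbind() warning about vector lengths should no longer appear.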
On Fri, May 23, 2008 at 1:59 PM, Phil Spector <spe...@stat.berkeley.edu>
wrote:
> EG -
> Thanks for the reproducible example!
>
> When I run your code, and check the class of the result from tapply(), I
> see that it is an "array", and using dim(), I see it's an array
> of length 500. How big is each element?
>
> table(sapply(res,length))
>
>   5
> 500
>
> So each piece is the same length. That means we could
> make a 500x5 matrix as follows:
>
> do.call(rbind,res)
> - Phil Spector
> Statistical Computing Facility
> Department of Statistics
> UC Berkeley
> spe...@stat.berkeley.edu
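Since the example data are drawn at random, whether every group happens to use all five responses can differ from one run to the next. The same length check can be sketched on a small made-up data set in which one group uses fewer responses; the names toy and lens are hypothetical, and res follows the name used in the quoted reply.

```r
# Toy data: group 10 uses three distinct responses, group 20 only one.
toy <- data.frame(groupVar    = c(10, 10, 10, 20, 20),
                  responseVar = c(1, 2, 3, 3, 3))

res <- tapply(toy$responseVar, toy$groupVar,
              function(x) prop.table(table(x)))

# Number of distinct responses seen in each group.
lens <- sapply(res, length)
print(table(lens))

# More than one distinct length means rbind() cannot align the pieces as is.
print(length(unique(lens)) > 1)
```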
-----Original Message-----
From: r-help-...@r-project.org [mailto:r-help-...@r-project.org]
On Behalf Of Economics Guy