[R] Percentages for categorical data by group


Economics Guy

May 23, 2008, 10:51:39 AM
to r-h...@stat.math.ethz.ch
I can think of several ways to brute-force hard-code what I want, but I
imagine there is a command or two that can be easily combined to do this:

I have a data frame with about 23000 observations. The first variable is
the group to which the observation belongs (about 500 different groups). The
second variable is a response for each observation that is 1, 2, 3, 4, or 5. I
want to calculate the percentage of each group that chose each
response. For example, I want to know what percentage of group 1 (which may
have an ID value of 34456) chose response 1, and so on.

Here is some code I wrote that generates a data frame like the one I have.

pop <- matrix(1:100000)
groupIDs <- sample(pop,500)
groupVar <- sample(groupIDs,23000,replace=TRUE)
responseVar <- sample(1:5,23000,replace=TRUE)

example.data <- data.frame(groupVar,responseVar)

Is there a fast way to calculate these percentages beyond writing loops to
manually count the responses for each of the groups?

Thanks,

EG


______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Marc Schwartz

May 23, 2008, 11:05:20 AM
to Economics Guy, r-h...@stat.math.ethz.ch

Using:

table(example.data)

will give you a cross-tabulation of the counts of your responseVar by
each groupVar.

prop.table(table(example.data), 1)

will give you row-wise proportions (0 - 1) of the counts of responseVar
for each groupVar. If you want percentages (0 - 100):

prop.table(table(example.data), 1) * 100


See ?table and ?prop.table for more information.
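A minimal sketch of Marc's suggestion on a small, hypothetical data frame (the group IDs 10 and 20 and the responses are made up for illustration). It shows that with margin 1 each group's row of proportions sums to 1:

```r
# Hypothetical toy data: two groups, responses drawn from 1-3
example.data <- data.frame(
  groupVar    = c(10, 10, 10, 20, 20, 20, 20),
  responseVar = c(1, 2, 2, 1, 3, 3, 3)
)

counts <- table(example.data)      # rows = groupVar, columns = responseVar
props  <- prop.table(counts, 1)    # margin 1: divide each row by its row total
pct    <- props * 100              # same proportions on a 0-100 scale

print(counts)
print(round(pct, 1))
print(rowSums(props))              # every group sums to 1
```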

HTH,

Marc Schwartz

Michael Conklin

May 23, 2008, 11:33:25 AM
to Economics Guy, r-h...@stat.math.ethz.ch
tapply(example.data$responseVar, example.data$groupVar, function(x) { prop.table(table(x)) })
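To see what this one-liner returns, here is a sketch on data simulated as in the original post. The set.seed() call is an addition so the run is repeatable, and `res` is just a local name. Because each group's result is a length-5 table rather than a scalar, tapply() cannot simplify it and instead returns a one-dimensional array whose cells are list elements, one table per group:

```r
set.seed(1)  # added for repeatability; not in the original post
groupIDs     <- sample(1:100000, 500)
groupVar     <- sample(groupIDs, 23000, replace = TRUE)
responseVar  <- sample(1:5, 23000, replace = TRUE)
example.data <- data.frame(groupVar, responseVar)

res <- tapply(example.data$responseVar, example.data$groupVar,
              function(x) prop.table(table(x)))

is.list(res)          # TRUE: each cell holds a whole table
length(res)           # one cell per distinct group ID
names(res)[1:3]       # group IDs, taken from the array's dimnames
res[[1]]              # response proportions for the first group
```

This list-of-tables shape is why the result does not behave like an ordinary data frame.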

Michael Conklin

Chief Methodologist - Advanced Analytics

MarketTools, Inc.

6465 Wayzata Blvd. Suite 170

Minneapolis, MN 55426

Tel: 952.417.4719 | Mobile:612.201.8978

Michael...@markettools.com

MarketTools(r) http://www.markettools.com

This e-mail and any attachments may contain privileged, confidential or
proprietary information. If you are not the intended recipient, be aware
that any review, copying, or distribution of this e-mail or any
attachment is strictly prohibited. If you have received this e-mail in
error, please return it to the sender immediately, and permanently
delete the original and any copies from your system. Thank you for your
cooperation.

Economics Guy

May 23, 2008, 1:31:45 PM
to r-h...@stat.math.ethz.ch
Thanks, Michael, that works!

Now I am having a problem getting the results into a format I can use.

prop.table generates a contingency table that I tried to turn into a data
frame I could graph from with as.data.frame(); however, the result is not a
true data.frame. The rows have commas between some of the elements, and
trying to look at a column displays the original contingency table.

This is what I have:

pop <- matrix(1:100000)
groupIDs <- sample(pop,500)
groupVar <- sample(groupIDs,23000,replace=TRUE)
responseVar <- sample(1:5,23000,replace=TRUE)

example.data <- data.frame(groupVar,responseVar)

data.table <- tapply(example.data$responseVar, example.data$groupVar,
                     function(x) { prop.table(table(x)) })

example.data.frame <- as.data.frame(data.table)

But the example.data.frame object is not a true data frame, and I am not
sure how to get it into a format I can graph.
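One possible route, sketched here rather than taken from the thread: skip tapply() and call as.data.frame() on the two-way prop.table() result from Marc Schwartz's reply. For a table, as.data.frame() returns long format, one row per group/response cell with the value in a column named Freq, which is a shape most plotting functions accept. The seed and the rename of Freq to `prop` are illustrative choices:

```r
set.seed(42)  # added for repeatability; not part of the original post
groupIDs     <- sample(1:100000, 500)
example.data <- data.frame(
  groupVar    = sample(groupIDs, 23000, replace = TRUE),
  responseVar = sample(1:5, 23000, replace = TRUE)
)

# Row-wise proportions of the full group x response cross-tabulation
props <- prop.table(table(example.data), 1)

# as.data.frame() on a table gives long format: one row per
# (groupVar, responseVar) cell, with the cell value in a column named Freq
long <- as.data.frame(props)
names(long)[names(long) == "Freq"] <- "prop"   # illustrative rename

str(long)
head(long)
```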



Jorge Ivan Velez

May 23, 2008, 1:53:20 PM
to Economics Guy, r-h...@stat.math.ethz.ch
Hi there,
Try this:


do.call(rbind,data.table)


HTH,

Jorge
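A sketch of the do.call(rbind, ...) step followed by a conversion to an ordinary data frame. It uses a small hypothetical data set in which every group chose every response, because rbind() only lines rows up correctly when all per-group tables have the same length and names. The `resp` column prefix is made up for illustration:

```r
# Hypothetical toy data: three groups, each of which used responses 1-3
example.data <- data.frame(
  groupVar    = rep(c(101, 202, 303), each = 6),
  responseVar = c(1, 2, 3, 1, 2, 3,
                  1, 1, 1, 1, 2, 3,
                  3, 3, 3, 1, 2, 1)
)

data.table <- tapply(example.data$responseVar, example.data$groupVar,
                     function(x) prop.table(table(x)))

# Stack the per-group tables into a groups x responses matrix
m <- do.call(rbind, data.table)
rownames(m) <- names(data.table)                        # group IDs
colnames(m) <- paste0("resp", names(data.table[[1]]))   # resp1, resp2, resp3

# Plain data frame: one row per group, one proportion column per response
df <- data.frame(groupVar = as.numeric(rownames(m)), m, row.names = NULL)
df
```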


Economics Guy

May 23, 2008, 2:35:52 PM
to r-h...@stat.math.ethz.ch
I appreciate all the help. The trouble is that in my real data set, not
every group has an observation for each response. As a result, some of the
"rows" returned from prop.table() are shorter than others, so I get:

Warning message:
In function (..., deparse.level = 1) :
number of columns of result is not a multiple of vector length (arg 8)

Is there a way to tell rbind() or do.call() to treat missing values as zero
or make prop.table() include the zero proportions?

On Fri, May 23, 2008 at 1:59 PM, Phil Spector <spe...@stat.berkeley.edu>
wrote:

> EG -
> Thanks for the reproducible example!
>
> When I run your code, and check the class of the result from tapply(), I
> see that it is an
> "array", and using dim(), I see it's an array
> of length 500. How big is each element?
>
> table(sapply(res, length))
>
>   5
> 500
>
> So each piece is the same length. That means we could
> make a 500x5 matrix as follows:
>
> do.call(rbind,res)
> - Phil Spector
> Statistical Computing Facility
> Department of Statistics
> UC Berkeley
> spe...@stat.berkeley.edu
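Phil's length check also exposes the situation described at the top of this message. In this small hypothetical example, group 202 never chose response 3, so its table has only two cells. Passing such ragged pieces to do.call(rbind, ...) makes rbind() recycle the shorter rows, which is what produces the "not a multiple of vector length" warning quoted above:

```r
# Hypothetical toy data: group 202 never chose response 3
example.data <- data.frame(
  groupVar    = rep(c(101, 202, 303), each = 4),
  responseVar = c(1, 2, 3, 3,
                  1, 1, 2, 2,
                  3, 2, 1, 1)
)

res <- tapply(example.data$responseVar, example.data$groupVar,
              function(x) prop.table(table(x)))

len <- sapply(res, length)   # number of response categories seen per group
table(len)                   # more than one distinct length means ragged pieces
names(res)[len < max(len)]   # groups whose table is missing a response
```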

Michael Conklin

May 23, 2008, 4:59:03 PM
to Economics Guy, r-h...@stat.math.ethz.ch

prop.table(table(factor(x,levels=1:5)))
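This line appears intended as a drop-in replacement for the anonymous function inside the earlier tapply() call. The sketch below shows the complete pipeline on hypothetical data where group 202 never chose 3 and nobody chose 4 or 5. Fixing the factor levels at 1:5 makes every per-group table length 5, with explicit zeros, so do.call(rbind, ...) produces an aligned matrix without a warning:

```r
# Hypothetical toy data: group 202 never chose 3, and no one chose 4 or 5
example.data <- data.frame(
  groupVar    = rep(c(101, 202, 303), each = 4),
  responseVar = c(1, 2, 3, 3,
                  1, 1, 2, 2,
                  3, 2, 1, 1)
)

# factor(..., levels = 1:5) makes table() report all five responses,
# including the ones a group never chose (count 0)
res <- tapply(example.data$responseVar, example.data$groupVar,
              function(x) prop.table(table(factor(x, levels = 1:5))))

m <- do.call(rbind, res)     # every piece has length 5, so rows line up
rownames(m) <- names(res)    # group IDs
round(m * 100, 1)            # percentages
```

Marc Schwartz's prop.table(table(example.data), 1) avoids the problem in a different way. The cross-tabulation already has one column for every response value that occurs anywhere in the data, so a group that never chose a given response simply gets a 0 in that column.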


Michael Conklin
