Add column with frequencies grouped by year

DrunkenPhD

Apr 21, 2015, 3:16:18 PM

to manip...@googlegroups.com

I have data:

name sex year
A    M   2011
B    F   2010
A    F   2012
B    M   2010
C    M   2011
C    M   2010
C    F   2012
A    F   2010
A    F   2011

My output should be in the format:

name sex year n prop
A    F   2010 4 0.2
B    M   2010 2 0.01
A    F   2011 3 0.01

where n is the frequency of a specific name (A, B, C) in a specific year (2010, 2011, 2012), and prop is the ratio of that frequency n to the total number of names for that year.

For example, if 2010 has about 100 names and 4 of them are A, then prop for A is 4/100.

Regards

Doug Mitarotonda

Apr 21, 2015, 4:17:43 PM

to DrunkenPhD, manip...@googlegroups.com

What have you tried doing so far? Is there a specific step you are blocked on? I could see a few ways of doing this with plyr/dplyr.


DrunkenPhD

Apr 21, 2015, 4:19:38 PM

to manip...@googlegroups.com

I tried using dplyr and tidyr:

Something like:

data %>% arrange(data,year) %>% mutate(data, n =count(name))

Not successful though
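
One likely reason the attempt above fails: `%>%` already passes its left-hand side as the first argument, so `arrange(data, year)` receives the data frame twice, and dplyr's `count()` expects a data frame rather than a single column. A minimal sketch of a version that adds n as a column while keeping every row, assuming `data` holds the nine rows from the first post (the name `with_n` is only illustrative):

```r
library(dplyr)

# Assumption: `data` is the nine-row example from the first post
data <- data.frame(
  name = c("A", "B", "A", "B", "C", "C", "C", "A", "A"),
  sex  = c("M", "F", "F", "M", "M", "M", "F", "F", "F"),
  year = c(2011L, 2010L, 2012L, 2010L, 2011L, 2010L, 2012L, 2010L, 2011L),
  stringsAsFactors = FALSE
)

with_n <- data %>%
  group_by(year, name) %>%   # one group per year/name pair
  mutate(n = n()) %>%        # n() is the size of the current group
  ungroup() %>%
  arrange(year, name)        # no `data` here: the pipe supplies it
```

Because `mutate()` is used instead of `summarise()`, the result keeps all nine rows and the sex column.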

Brandon Hurr

Apr 21, 2015, 4:41:50 PM

to DrunkenPhD, manipulatr

Do you have a better example dataset? This one doesn't really have enough combinations...

df <- structure(list(name = c("A", "B", "A", "B", "C", "C", "C", "A",
"A"), sex = c("M", "F", "F", "M", "M", "M", "F", "F", "F"), year = c(2011L,
2010L, 2012L, 2010L, 2011L, 2010L, 2012L, 2010L, 2011L)), .Names = c("name", "sex", "year"), class = "data.frame", row.names = c(NA, -9L))

library(dplyr)

df %>%
  group_by(year, name, sex) %>%
  summarise(count = n())

Source: local data frame [9 x 4]
Groups: year, name

  year name sex count
1 2010    A   F     1
2 2010    B   F     1
3 2010    B   M     1
4 2010    C   M     1
5 2011    A   F     1
6 2011    A   M     1
7 2011    C   M     1
8 2012    A   F     1
9 2012    C   F     1


Endri Raco

Apr 21, 2015, 4:48:37 PM

to Brandon Hurr, manipulatr

Thank you Brandon,

Your code seems to work, but count shouldn't be 1 for every name, because there is more than one A for 2011.

What am I doing wrong?

--

Endri Raço PhD
_________________________________________________
Polytechnic University of Tirana
Faculty of Mathematical Engineering and Physics Engineering
Department of Mathematical Engineering
Address: Sheshi Nene Tereza, Nr. 4, Tirana - ALBANIA
Mobile: ++ 355 682061988

Brandon Hurr

Apr 21, 2015, 4:52:33 PM

to Endri Raco, manipulatr

From your example it looked like you were summarizing across all three variables. In your data, 2011 has two A's, but one is M and the other is F, so each lands in its own group.

If you want to summarize ignoring sex, take sex out of the group_by().
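
A minimal sketch of that suggestion, extended to the prop column asked for in the first post. It assumes the same nine-row `df` posted earlier in the thread; the names `counts`, `n`, and `prop` are only illustrative. The explicit second `group_by(year)` is there so that `sum(n)` is the per-year total regardless of how a given dplyr version leaves the grouping after `summarise()`.

```r
library(dplyr)

# Assumption: same nine rows as the df posted earlier in the thread
df <- data.frame(
  name = c("A", "B", "A", "B", "C", "C", "C", "A", "A"),
  sex  = c("M", "F", "F", "M", "M", "M", "F", "F", "F"),
  year = c(2011L, 2010L, 2012L, 2010L, 2011L, 2010L, 2012L, 2010L, 2011L),
  stringsAsFactors = FALSE
)

counts <- df %>%
  group_by(year, name) %>%       # sex left out, so M and F rows are pooled
  summarise(n = n()) %>%         # one row per year/name pair
  ungroup() %>%
  group_by(year) %>%             # regroup so sum(n) is the total for that year
  mutate(prop = n / sum(n)) %>%  # share of each name within its year
  ungroup()
```

With this data, A in 2011 gets n = 2 and prop = 2/3, since 2011 has three rows in total.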
