Add column with frequencies grouped by year


DrunkenPhD

Apr 21, 2015, 3:16:18 PM
to manip...@googlegroups.com
I have data:

name  sex  year
A     M    2011
B     F    2010
A     F    2012
B     M    2010
C     M    2011
C     M    2010
C     F    2012
A     F    2010
A     F    2011

My output should be in the format:


name  sex  year  n  prop
A     F    2010  4  0.2
B     M    2010  2  0.01
A     F    2011  3  0.01


where n is the frequency of a specific name (A, B, C) in a specific year (2011, 2012), and prop is the proportion of that name's frequency n to the total number of names for that year.
For example, if there are about 100 names for 2010 and 4 of them are A, then prop for A is 4/100.

Regards

Doug Mitarotonda

Apr 21, 2015, 4:17:43 PM
to DrunkenPhD, manip...@googlegroups.com
What have you tried doing so far? Is there a specific step you are blocked on? I could see a few ways of doing this with plyr/dplyr. 


DrunkenPhD

Apr 21, 2015, 4:19:38 PM
to manip...@googlegroups.com
I tried using dplyr and tidyr:

Something like:

data %>% arrange(data,year) %>% mutate(data, n =count(name))

Not successful though 

Brandon Hurr

Apr 21, 2015, 4:41:50 PM
to DrunkenPhD, manipulatr
Do you have a better example dataset? This one doesn't really have enough combinations...


df <- structure(list(
  name = c("A", "B", "A", "B", "C", "C", "C", "A", "A"),
  sex = c("M", "F", "F", "M", "M", "M", "F", "F", "F"),
  year = c(2011L, 2010L, 2012L, 2010L, 2011L, 2010L, 2012L, 2010L, 2011L)),
  .Names = c("name", "sex", "year"), class = "data.frame", row.names = c(NA, -9L))

library(dplyr)

df %>%
  group_by(year, name, sex) %>%
  summarise(count = n())

Source: local data frame [9 x 4]
Groups: year, name

  year name sex count
1 2010    A   F     1
2 2010    B   F     1
3 2010    B   M     1
4 2010    C   M     1
5 2011    A   F     1
6 2011    A   M     1
7 2011    C   M     1
8 2012    A   F     1
9 2012    C   F     1

Endri Raco

Apr 21, 2015, 4:48:37 PM
to Brandon Hurr, manipulatr
Thank you Brandon,

Your code seems to work, but the count shouldn't be 1 for every name, because there is more than one A for 2011.
What am I doing wrong?

Brandon Hurr

Apr 21, 2015, 4:52:33 PM
to Endri Raco, manipulatr
From your example it looked like you were summarizing across all three variables. In your example data there are two A's in 2011, but one is M and the other is F, so each name/sex/year combination occurs only once.

If you want to summarize ignoring sex, take sex out of the group_by().
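
A minimal sketch of that, assuming the same nine-row example data from earlier in the thread (rebuilt here with data.frame() so the snippet runs on its own). The names by_name and with_counts are only illustrative, and the prop step is one possible reading of "proportion of name frequency to total of names for that year", not something confirmed in the thread:

```r
library(dplyr)

# Same nine rows as the example data earlier in the thread
df <- data.frame(
  name = c("A", "B", "A", "B", "C", "C", "C", "A", "A"),
  sex  = c("M", "F", "F", "M", "M", "M", "F", "F", "F"),
  year = c(2011L, 2010L, 2012L, 2010L, 2011L, 2010L, 2012L, 2010L, 2011L),
  stringsAsFactors = FALSE
)

# One row per year/name: n counts the rows for that name in that year,
# prop divides n by the total number of rows in the same year
by_name <- df %>%
  group_by(year, name) %>%
  summarise(n = n()) %>%
  group_by(year) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# Variant that keeps every original row (including sex) and only adds columns
with_counts <- df %>%
  group_by(year, name) %>%
  mutate(n = n()) %>%
  group_by(year) %>%
  mutate(prop = n / n()) %>%   # column n divided by the year's row count n()
  ungroup()
```

With this data, A in 2011 gets n = 2 and prop = 2/3, since 2011 has three rows in total. The first form collapses to one row per name and year; the second is closer to "adding a column" because sex and the original rows are kept.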