Cumulative frequency of unique values in a data frame with dplyr?

Andrew Mercer

Jun 18, 2014, 4:59:35 PM
to manip...@googlegroups.com
I'm not sure if this is possible, but I'm working with some long-format time series data and I'd like to have a cumulative tally of the number of distinct values that have appeared up through time t.

Using the example dataset:
> d = data.frame(t = c(1:7), X=c("A", "A", "B", "C", "B", "D", "E"))
> d
  t X
1 1 A
2 2 A
3 3 B
4 4 C
5 5 B
6 6 D
7 7 E

What I would like is a third column with the values (1, 1, 2, 3, 3, 4, 5) that counts the number of distinct values of X that have appeared up through row t.

I tried:

d %.% mutate(cum_num_items = length(unique(X[1:t])))

But that just gives a column of 1s. I'd like to be able to do this within mutate because I have to do it for sequences belonging to many different subjects. Does anyone know of a way to do this?
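A likely reason the attempt returns all 1s: inside mutate(), `t` refers to the whole column, so `1:t` uses only its first element (1). That makes `X[1:t]` just `X[1]`, and the single result is recycled down the column. Below is a minimal base-R sketch that spells out the intended definition (for each row i, count the distinct values of X in rows 1 through i). It rescans every prefix, so it is quadratic in the number of rows. It is mainly useful as a check against faster approaches such as the reply below.

```r
d <- data.frame(t = 1:7, X = c("A", "A", "B", "C", "B", "D", "E"))

# For each row i, count the distinct values of X in rows 1..i.
# seq_len(i) builds the prefix index explicitly, which is what X[1:t]
# was meant to do row by row.
d$cum_num_items <- sapply(seq_len(nrow(d)), function(i) {
  length(unique(d$X[seq_len(i)]))
})

d$cum_num_items
# [1] 1 1 2 3 3 4 5
```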

Many thanks,
Andrew

jim holtman

Jun 19, 2014, 8:30:33 AM
to Andrew Mercer, manipulatr
Try this:

> d = data.frame(t = c(1:7), X=c("A", "A", "B", "C", "B", "D", "E"))
> d$cum_num <- cumsum(!duplicated(d$X))
> d
  t X cum_num
1 1 A       1
2 2 A       1
3 3 B       2
4 4 C       3
5 5 B       3
6 6 D       4
7 7 E       5
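The question also asks for this per subject inside mutate(). The same `cumsum(!duplicated(X))` idea should carry over to grouped data, because mutate() evaluates the expression separately within each group. The sketch below assumes a hypothetical `subject` column (not in the original data) and rows that can be ordered by `t` within each subject. It uses `%>%`, the pipe that replaced dplyr's early `%.%` chaining operator.

```r
library(dplyr)

# Hypothetical long-format data: two subjects, each with its own sequence.
d2 <- data.frame(
  subject = rep(c("s1", "s2"), each = 4),
  t = c(1:4, 1:4),
  X = c("A", "A", "B", "A", "C", "D", "C", "E")
)

d2 <- d2 %>%
  group_by(subject) %>%
  # Make sure rows are in time order within each subject.
  arrange(t, .by_group = TRUE) %>%
  # !duplicated(X) is TRUE the first time a value appears in the group;
  # the running sum of those TRUEs is the number of distinct values so far.
  mutate(cum_num = cumsum(!duplicated(X))) %>%
  ungroup()

d2$cum_num
# [1] 1 1 2 2 1 2 2 3
```

If the rows are already in time order within each subject, the arrange() step can be dropped.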



Jim Holtman
Data Munger Guru
 
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

