I'm not sure if this is possible, but I'm working with some long-format time series data and I'd like a cumulative tally of the number of distinct values that have appeared up through time t.
Using the example dataset:
> d = data.frame(t = c(1:7), X=c("A", "A", "B", "C", "B", "D", "E"))
> d
  t X
1 1 A
2 2 A
3 3 B
4 4 C
5 5 B
6 6 D
7 7 E
What I would like is a third column with the values (1, 1, 2, 3, 3, 4, 5), counting the number of distinct values of X that have appeared up through row t.
I tried:
d %.% mutate(cum_num_items = length(unique(X[1:t])))
But that just gets a column of 1's. I'd like to be able to do this within mutate because I have to do it for sequences belonging to many different subjects. Anyone know of a way to do this?
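For reference, here is a minimal sketch of one way the tally could be computed, not necessarily the best one. The column of 1's most likely arises because mutate evaluates the expression once on the whole column, so 1:t uses only the first element of t. The sketch relies on base R's duplicated() and cumsum() instead; the second data frame, d2, and its subject column are hypothetical and are there only to illustrate the per-subject case.

```r
# Example data from the question
d <- data.frame(t = 1:7, X = c("A", "A", "B", "C", "B", "D", "E"))

# duplicated() is FALSE the first time a value appears, so the running
# sum of its negation counts the distinct values seen so far.
d$cum_num_items <- cumsum(!duplicated(d$X))

# Hypothetical per-subject data: `subject` is an assumed grouping column,
# not part of the original example.
d2 <- data.frame(subject = rep(c("s1", "s2"), each = 3),
                 X = c("A", "A", "B", "C", "C", "C"))

# ave() applies the function to the row indices of each subject separately,
# so the tally restarts for every subject.
d2$cum_num_items <- ave(seq_along(d2$X), d2$subject,
                        FUN = function(i) cumsum(!duplicated(d2$X[i])))
```

With dplyr loaded, the same expression should also work inside mutate after grouping, along the lines of group_by(subject) followed by mutate(cum_num_items = cumsum(!duplicated(X))).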
Many thanks,
Andrew