bin/aggregate by date range

Jeroen Ooms

Sep 30, 2011, 8:18:27 PM
to ggplot2
I have some time series data for which I would like to plot the min,
mean and max over time. To do that I need to 'bin' the data into time
ranges, so that each period contains multiple observations from which
min, mean and max can be calculated. Suppose this is my data:

#simulate some data
dates <- as.Date(Sys.time()) + runif(1000,-180,0);
values <- rnorm(1000);

Now I am looking for an easy way to calculate min, mean and max for
arbitrary time periods (say, per day, week or month). Below is a very
hacky way to show the kind of plot I am interested in, but it only
aggregates per date. What I need is something similar to stat_bin,
but summarising values instead of counting them.

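#bin data per date
dates <- round(dates);
#use character labels so that factor levels and row names are valid dates
days <- seq(min(dates), max(dates), by=1);
dates <- factor(as.character(dates), levels=as.character(days));
#note: probs=0.5 gives the median, which is labelled "Mean" below
quantiles <- sapply(split(values,dates), quantile, probs=c(0, 0.5, 1),
na.rm=TRUE)
myData <- as.data.frame(t(quantiles));
myData <- na.omit(myData)
names(myData) <- c("Min", "Mean", "Max");
myData$dates <- as.Date(row.names(myData));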

#create plot
myplot <- ggplot(aes(x=dates, y=Mean, ymin=Min, ymax=Max),
data=myData) +
geom_ribbon(alpha=0.3) +
geom_line(size=1, color="blue") +
geom_point(size=3, color="red");
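
For the "per day, week or month" part of the question, one possible approach is base R's cut() method for dates. The following is only a sketch under stated assumptions: the fixed end date, the seed and the names `bin` and `binned` are illustrative and not part of the original post, and it computes a true mean rather than the median used above.

```r
# Sketch: bin by calendar week with base R's cut(); use breaks = "month"
# or "day" for other periods. Variable names here are illustrative.
set.seed(1)
dates <- as.Date("2011-09-30") + floor(runif(1000, -180, 0))
values <- rnorm(1000)

# cut() on a Date returns a factor whose levels are the bin start dates
bin <- cut(dates, breaks = "week")

# one row per bin; bins without observations come out as NA
binned <- data.frame(
  Date = as.Date(levels(bin)),
  Min  = as.vector(tapply(values, bin, min)),
  Mean = as.vector(tapply(values, bin, mean)),
  Max  = as.vector(tapply(values, bin, max))
)
head(binned)
```

The resulting data frame has the same shape as myData above (a date column plus Min, Mean and Max), so it could be passed to the same ggplot call with x=Date.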


Kohske Takahashi

Oct 1, 2011, 9:52:56 PM
to Jeroen Ooms, ggplot2
Hi,

Not sure if this is the best way, but you can use zoo and xts:

dates <- as.Date(Sys.time()) + runif(1000,-180,0);
values <- rnorm(1000);

library(zoo)
library(xts)
library(plyr)    # provides llply
library(ggplot2)

# convert data into zoo object, and extract min, mean, max of each week
# see ?apply.weekly
r <- llply(c(min = min, mean = mean, max = max), function(f)
apply.weekly(zoo(values, dates), f))

# convert xts data into data.frame so that ggplot2 can handle them.
df <- as.data.frame(do.call("cbind", r))
df$weeks <- time(r[[1]])

# your plot
myplot <- ggplot(aes(x=weeks, y=mean, ymin=min, ymax=max), data=df) +
geom_ribbon(alpha=0.3) +
geom_line(size=1, color="blue") +
geom_point(size=3, color="red")

myplot

kohske
--
Kohske Takahashi <takahash...@gmail.com>

Research Center for Advanced Science and Technology,
The University of  Tokyo, Japan.
http://www.fennel.rcast.u-tokyo.ac.jp/profilee_ktakahashi.html

Jeroen Ooms

Oct 2, 2011, 10:36:40 PM
to Kohske Takahashi, ggplot2
Hmm, I needed something more flexible, so I ended up simply using integer division (%/%):

#bin per aggregate
#aggregate is the bin width in days
bin.by.date <- function(dates, values, aggregate=7, FN=quantile, ...){
#integer-divide the day number to get the bin, then shift to the bin midpoint
bindates <- structure((unclass(dates) %/% aggregate  + 0.5) * aggregate, class="Date")
#apply FN per bin; extra arguments such as probs are passed on via ...
stats <- sapply(split(values, bindates), FN, ..., na.rm=TRUE)
myData <- as.data.frame(t(stats));
myData <- na.omit(myData)
myData <- cbind(Date=as.Date(row.names(myData)), myData);
row.names(myData) <- NULL;
return(myData);
}

#simulate some data
dates <- as.Date(Sys.time()) + runif(1000,-180,0);
values <- rnorm(1000);
myData <- bin.by.date(dates, values, aggregate = 30, probs=c(0,.5,1));
names(myData) <- c("Date", "Min", "Mean", "Max");

#create plot
myplot <- ggplot(aes(x=Date, y=Mean, ymin=Min, ymax=Max),
data=myData) +
geom_ribbon(alpha=0.3) +
geom_line(size=1, color="blue") +
geom_point(size=3, color="red");

print(myplot);
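
To make the %/% step concrete, here is a small worked sketch on four hand-picked dates. The names `d`, `width`, `bin_index`, `bin_start` and `bin_mid` are illustrative only. It also shows a consequence worth knowing: the bins are aligned to day 0 of R's Date scale (1970-01-01, a Thursday), not to calendar weeks or months.

```r
# Sketch: what the integer-division binning does for a 7-day bin width
d <- as.Date(c("2011-09-01", "2011-09-07", "2011-09-08", "2011-09-30"))
width <- 7

# days since 1970-01-01, integer-divided by the bin width
bin_index <- unclass(d) %/% width

# first day and midpoint of each bin, converted back to dates
bin_start <- structure(bin_index * width, class = "Date")
bin_mid   <- structure((bin_index + 0.5) * width, class = "Date")

data.frame(date = d, bin_index, bin_start, bin_mid)
```

The first two dates share a bin and the third falls into the next one. With a width of 30, the bins are likewise fixed 30-day blocks counted from 1970-01-01 rather than calendar months.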

Hadley Wickham

Oct 4, 2011, 11:16:59 AM
to Jeroen Ooms, Kohske Takahashi, ggplot2
You might also want to look at lubridate - www.jstatsoft.org/v40/i03/

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/