Summary (count) of time series events by date


paulcody

Mar 8, 2011, 2:52:47 AM
to ggplot2
I am trying to rewrite a plot in ggplot2 and can't figure it out,
although I'm sure it's easy for someone out there. I have a list of
operative cases and am interested in creating a summary of case
totals per month. The data and the current code are:

'data.frame': 554 obs. of 5 variables:
 $ Date       : chr "05/05/2009" "05/06/2009" "05/06/2009" "05/07/2009" ...
 $ PatientName: chr "FOO" "BAR" "BAZ" "BUZ" ...
 $ Room       : chr "ML10" "ML26" "ML19" "ML11" ...

hist_volume <- function(data) {
  str(data)
  filename <- 'tmp/cases/hist-volume.pdf'
  pdf(filename, width=7, height=5, pointsize=10, colormodel="rgb")
  # Parse the mm/dd/yyyy strings into Date objects
  dos <- as.Date(data$Date, format="%m/%d/%Y")
  # hist.Date() bins by calendar month when breaks = "months"
  hist(dos, breaks="months", freq=TRUE, plot=TRUE, format="%m/%y",
       col="grey", las=3, main="Case volume by month")
  dev.off()
  print(paste("Wrote", filename))
}
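For comparison, here is a minimal ggplot2 sketch of the same monthly case-volume plot. It assumes a recent ggplot2 (the `date_labels`/`date_breaks` arguments of `scale_x_date()` and `layer_data()`), and the function name `hist_volume_gg` and its `filename` argument are hypothetical. Rather than binning, it snaps every case to the first day of its month so that each calendar month becomes one bar, and it lets `geom_bar()` do the counting because no `y` aesthetic is given.

```r
library(ggplot2)

# Hypothetical ggplot2 counterpart of hist_volume(); the name and the
# filename argument are illustrative, not part of the original code
hist_volume_gg <- function(data, filename = file.path(tempdir(), "hist-volume.pdf")) {
  dos <- as.Date(data$Date, format = "%m/%d/%Y")
  # Snap each case to the first day of its month, so every calendar
  # month (28-31 days) collapses into a single bar
  month <- as.Date(format(dos, "%Y-%m-01"))
  p <- ggplot(data.frame(month = month), aes(month)) +
    geom_bar(fill = "grey") +   # no y aesthetic: ggplot2 counts the rows
    scale_x_date(date_labels = "%m/%y", date_breaks = "1 month") +
    labs(x = NULL, y = "Cases", title = "Case volume by month") +
    # Vertical tick labels, roughly what las=3 does in base graphics
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
  ggsave(filename, p, width = 7, height = 5)
  message("Wrote ", filename)
  invisible(p)
}

# Usage with the four dates visible in the str() output above
cases <- data.frame(Date = c("05/05/2009", "05/06/2009", "05/06/2009", "05/07/2009"),
                    stringsAsFactors = FALSE)
p <- hist_volume_gg(cases)
```

All four sample cases fall in May 2009, so the plot has a single bar of height 4; with the full 554-row data frame each month gets its own bar.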



Some time ago I wrote a similar plot that turns git log output into
commits-by-month plots, but in that case I used shell tools to
summarize the data. What is the (or a) correct way to do this in
ggplot2?


$ git log --pretty=format:'%ai' | cut -c1-10 | uniq -c | sort -k2 > count.log
$ head count.log
16 2010-06-22
4 2010-07-11
16 2010-07-12
2 2010-07-15
4 2010-07-21
9 2010-07-22
3 2010-07-31
2 2010-08-10



library(ggplot2)

d <- read.table("count.log", sep="")
colnames(d) <- c("count", "date")
d$date <- as.Date(d$date)

ggplot(d, aes(date, count)) +
  geom_bar(stat="identity", fill="blue") +
  scale_x_date(format = "%b") +
  xlab("") +
  ylab("") +
  opts(axis.text.x=theme_text(size=3),
       axis.text.y=theme_text(size=3),
       axis.line=theme_blank(),
       panel.background=theme_blank(),
       panel.grid.major=theme_line(linetype="dotted", size=0.1,
                                   colour="gray"),
       panel.grid.minor=theme_blank(),
       plot.title=theme_text(size=3),
       title="Commit activity by date"
  )

ggsave("tmp/commits.png", height=2, width=4, scale=0.8)
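The script above uses the ggplot2 API of early 2011 (`opts()`, `theme_text()`, `theme_blank()`, `scale_x_date(format = ...)`), which later releases removed. A sketch of the same plot in current syntax, assuming ggplot2 3.4.0 or later (for `linewidth` in `element_line()`): `geom_col()` replaces `geom_bar(stat = "identity")`, `theme()`/`element_*()` replace `opts()`/`theme_*()`, and `date_labels` replaces `format`. The data are the count.log rows shown above, entered inline, and the PDF output path in the temporary directory is an assumption made so the sketch runs anywhere.

```r
library(ggplot2)

# The count.log rows shown above, entered inline instead of read.table()
d <- data.frame(
  count = c(16, 4, 16, 2, 4, 9, 3, 2),
  date  = as.Date(c("2010-06-22", "2010-07-11", "2010-07-12", "2010-07-15",
                    "2010-07-21", "2010-07-22", "2010-07-31", "2010-08-10"))
)

p <- ggplot(d, aes(date, count)) +
  geom_col(fill = "blue") +              # same as geom_bar(stat = "identity")
  scale_x_date(date_labels = "%b") +     # replaces scale_x_date(format = "%b")
  labs(x = NULL, y = NULL, title = "Commit activity by date") +  # drop axis titles
  theme(
    axis.text.x      = element_text(size = 3),
    axis.text.y      = element_text(size = 3),
    axis.line        = element_blank(),
    panel.background = element_blank(),
    panel.grid.major = element_line(linetype = "dotted", linewidth = 0.1,
                                    colour = "gray"),
    panel.grid.minor = element_blank(),
    plot.title       = element_text(size = 3)
  )

# Saved as PDF to a temporary directory (assumption) rather than tmp/commits.png
out_file <- file.path(tempdir(), "commits.pdf")
ggsave(out_file, p, height = 2, width = 4, scale = 0.8)
```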


Thanks in advance,
Paul

James Howison

Mar 8, 2011, 8:14:38 AM
to ggplot2
Sorry, I'm confused: which plot did you want help with? The second seems to be working for you, no? So you want to adapt the syntax you used for the second plot (which uses 'pre-summarized' data) to the first (which is not summarized/counted)?

It would help a lot if you provided a minimal dataset that we could play with (not just the metadata of that set), ideally using dput; see https://gist.github.com/270442

Having said that, I think you will figure it out from

http://had.co.nz/ggplot2/geom_bar.html

The trick, if I understand you right, is realizing that with geom_bar you don't need a 'y' variable in the aes (i.e. ggplot2 generates the count).

data$Date <- as.Date(data$Date, format = "%m/%d/%Y")  # Date is a chr column; convert it first
ggplot(data, aes(Date)) + geom_bar() + scale_x_date(format = "%b")

With that in place, the more interesting question is probably: how do I group the counts by month or week, i.e. set the binwidth appropriately? That I'm not as sure about. I had thought that it would be expressed in seconds, but messing around with

data <- data.frame( date = seq(Sys.Date(), len=100, by="1 day")[sample(100, 50)], price = runif(50) )
ggplot(data, aes(date)) + geom_bar(binwidth=30) + scale_x_date(format = "%b")

suggests that binwidth is in days there... perhaps that's because this is scale_x_date, not scale_x_datetime? How do scale_x_date (e.g. major=months) and binwidth in geom_bar interact? To get counts by month (an irregular time period), would it be best to normalize the date of each event to the middle of the month?
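The days-versus-seconds observation can be checked directly in base R: a `Date` is stored as the number of days since 1970-01-01 and a `POSIXct` as seconds, so a numeric binwidth is read in days on a date axis and in seconds on a datetime axis. For calendar months, one option (swapped in here, not something the thread settled on) is to let `cut()` assign each date to its month instead of using a fixed 30-day bin, which drifts against month boundaries. The event dates are hypothetical, generated the same way as in the message above but from a fixed start date; where the bars are then drawn (first or middle of the month) is a presentation choice.

```r
# Date counts days since 1970-01-01 and POSIXct counts seconds, so a
# numeric binwidth means days on a Date axis and seconds on a datetime axis
as.numeric(as.Date("1970-01-02"))                 # 1
as.numeric(as.POSIXct("1970-01-02", tz = "UTC"))  # 86400

# Hypothetical event dates, generated as in the message above
set.seed(42)
dates <- seq(as.Date("2011-01-01"), length.out = 100, by = "1 day")[sample(100, 50)]

# cut() bins each date into the calendar month it falls in, whatever the
# month's length; levels are labelled by the month's first day ("2011-01-01", ...)
by_month <- table(cut(dates, "month"))
by_month
```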

--J
