Time Series graph with vertical lines marking events?

James Howison

Nov 26, 2008, 12:17:41 PM
to ggp...@googlegroups.com
I'm trying to produce a graph which is a time-series graph, with
vertical lines marking events which occur on a particular day.

Sample data frame (csv):

date,download_count,release
2005-11-04,430
2005-11-05,393
2005-11-06,438
2005-11-07,458,
2005-11-08,545,
2005-11-09,520,
2005-11-10,507,
2005-11-11,449,
2005-11-12,417,
2005-11-13,3117,1
2005-11-14,3521,
2005-11-15,2062,
2005-11-16,1750,
2005-11-17,1606,
2005-11-18,1348,
2005-11-19,1180,
2005-11-20,1018,
2005-11-21,1280,
2005-11-22,1245,
2005-11-23,900
2005-11-24,756

I'm having two difficulties: 1) getting a line graph to work, and 2)
getting a vertical line to show up.

This is what I've tried so far (where the file has the data above)

downloads <- read.csv("/Users/james/Desktop/test-data.csv")

p <- ggplot(data=downloads, aes(x=date,y=download_count))

Then

p + geom_line()

produces the error,

"Error in dim(data) <- dim : attempt to set an attribute on NULL"

but

p + geom_point() + scale_x_date()

produces the points in the right place and the dates nicely
formatted. So the first question is: why does geom_line not work?
I've compared my data frame to the example ones and I don't see any
differences.

Then I'd like to use the "1" in the release column to add a vertical
line on the appropriate date. I tried:

p + geom_point() + scale_x_date() + geom_vline(aes(x=releases))

which produced a vertical line near the Y axis (presumably at point
number 1 on the x axis). I changed the "1" to be "2005-11-13" (i.e. the
date of the release) but ended up with the same result. Perhaps since
that's the only date in that column it is being treated as day "1"?

So the second question is: what's the best way to get a vertical line
at the right position? I could convert all the dates and releases to
day_count (and have day_count values in the release column only on
the day of a release), but that's not very convenient and I'd like to
use the Date formatting features.

Thanks,
James

ps. ggplot 0.7, R 2.7.1, Mac OS X.

hadley wickham

Nov 26, 2008, 12:37:02 PM
to James Howison, ggp...@googlegroups.com
Hi James,

You might want to have a look at the presidents example in section .9
of the toolbox chapter - http://had.co.nz/ggplot2/book/toolbox.pdf -
it's doing exactly what you want. There are a few tricks:

* upgrade to ggplot2 0.8 ;)

* make sure date is a date variable - downloads$date <- as.Date(downloads$date)

* use a subset of the full data in your geom_vline call:
geom_vline(aes(intercept=date), data = subset(data, release == 1))

The reason that geom_line() didn't work in your example is because by
default R treats the date column as a factor, and the default
behaviour of ggplot is to draw one line for each group defined by the
combination of categorical variables used in the plot. This means
that you get one line per day, and obviously you need more than one
point to define a line, so you don't see anything.
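
A minimal base-R sketch of the factor problem and the fix described above. The three-row csv_text is just a cut-down copy of the sample data, and stringsAsFactors = TRUE is passed explicitly as an assumption: it reproduces the old default, whereas R 4.0.0 and later read strings as character unless told otherwise.

```r
# Cut-down copy of the sample data (illustrative only)
csv_text <- "date,download_count,release
2005-11-12,417,
2005-11-13,3117,1
2005-11-14,3521,"

# stringsAsFactors = TRUE mimics the pre-4.0.0 default of read.csv
downloads <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# The date column is categorical at this point: one level per day,
# which is why each day ends up in its own group
class(downloads$date)
nlevels(downloads$date)

# Convert to a real Date so the x axis is continuous
downloads$date <- as.Date(downloads$date)
class(downloads$date)

# Rows that mark a release; empty release cells are read as NA and dropped
releases <- subset(downloads, release == 1)
releases
```

After the conversion all points belong to a single group, so a line can be drawn through them, and releases holds only the row for the release day.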

Hadley
--
http://had.co.nz/

James Howison

Nov 27, 2008, 1:56:50 PM
to ggp...@googlegroups.com
Thanks Hadley,

On 26 Nov 2008, at 12:37 PM, hadley wickham wrote:

>
> Hi James,
>
> You might want to have a look at the presidents example in section .9
> of the toolbox chapter - http://had.co.nz/ggplot2/book/toolbox.pdf -
> it's doing exactly what you want. There are a few tricks:
>
> * upgrade to ggplot2 0.8 ;)

Done (btw, it seemed to have difficulty automatically pulling in plyr?)

> * make sure date is a date variable - downloads$date <-
> as.Date(downloads$date)

ah, right, yes, that helped :)

> * use a subset of the full data in your geom_vline call:
> geom_vline(aes(intercept=date), data = subset(data, release == 1))

This was basically spot on, except it needed as.numeric (as in the
helpful presidents example)

p + geom_line() + geom_vline(aes(intercept=as.numeric(date)), data =
subset(downloads, release == 1))

Given that date is a Date, not entirely sure that needing to cast it
to numeric is super-intuitive.
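
For context on what the cast does: a Date in R is stored as a count of days since 1970-01-01, and as.numeric exposes that count. A small base-R sketch (the variable names are illustrative only):

```r
release_day <- as.Date("2005-11-13")

# Number of days since 1970-01-01
day_number <- as.numeric(release_day)
day_number  # 13100

# Converting back with the same origin recovers the date
round_trip <- as.Date(day_number, origin = "1970-01-01")
round_trip
```

So the as.numeric version of the call is handing geom_vline a position on the same day-count scale that the date axis is drawn on.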

> The reason that geom_line() didn't work in your example is because by
> default R treats the date column as a factor, and the default
> behaviour of ggplot is to draw one line for each group defined by the
> combination of categorical variables used in the plot. This means
> that you get one line per day, and obviously you need more than one
> point to define a line, so you don't see anything.

Righto.

hadley wickham

Nov 27, 2008, 2:03:00 PM
to James Howison, ggp...@googlegroups.com
> This was basically spot on, except it needed as.numeric (as in the
> helpful presidents example)
>
> p + geom_line() + geom_vline(aes(intercept=as.numeric(date)), data =
> subset(downloads, release == 1))
>
> Given that date is a Date, not entirely sure that needing to cast it
> to numeric is super-intuitive.

It's not, and that is a bug, but I haven't been able to figure out how
to fix it without major structural changes. I might change the
aesthetics from intercept to x and y for geom_vline and geom_hline
respectively. That way the correct axis transformations can be
applied.
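
For readers on a current ggplot2 release, a hedged sketch of the same plot: the aesthetic is now called xintercept rather than the intercept used in this thread, and it should accept a Date column mapped inside aes() without the as.numeric cast. The small data frame is rebuilt inline from a few rows of the sample data, and the dashed linetype is only an illustrative choice.

```r
library(ggplot2)

# A few rows of the sample data, with date already stored as a Date
downloads <- data.frame(
  date = as.Date("2005-11-10") + 0:6,
  download_count = c(507, 449, 417, 3117, 3521, 2062, 1750),
  release = c(NA, NA, NA, 1, NA, NA, NA)
)

# Only the rows that mark a release
releases <- subset(downloads, release == 1)

# Line for the series, vertical line at each release date
p <- ggplot(downloads, aes(x = date, y = download_count)) +
  geom_line() +
  geom_vline(data = releases, aes(xintercept = date), linetype = "dashed")

p
```

Passing the subset as the layer's data keeps the release dates on the same date scale as the main series, which is the approach suggested earlier in the thread.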

Hadley

--
http://had.co.nz/
