binwidth with freqpoly/histogram and Date or DateTime variable


hywelm

Nov 8, 2012, 5:49:54 AM
to ggp...@googlegroups.com
I can't find anything explaining binwidth beyond that it defaults to range/30. In the case of a Date or DateTime variable I don't know how to interpret that. In the case of a Date variable I guess binwidth would refer to days. Is that right? In the case of a DateTime variable would it be seconds?

Dennis Murphy

Nov 8, 2012, 4:39:24 PM
to hywelm, ggp...@googlegroups.com
Hi:

I'd suggest experimenting with examples that are different enough to expose the default behavior in ggplot2. Here's a small toy example to get you started, using geom_bar() on data that are meant to be interpreted as pre-summarized counts per month:

library('ggplot2')
DF <- data.frame(date = seq(as.Date('2012-01-01'), as.Date('2012-11-01'),
                            by = "months"),
                 val = rpois(11, 50))
ggplot(DF, aes(x = date, y = val)) +
    geom_bar(stat = "identity", fill = "orange")

You'd use scale_x_date() to modify the format of the dates in the plot if necessary.

The default binwidth in geom_histogram() is range/30, but you can always define your own in a ggplot() or qplot() call by passing a value to the binwidth argument; in fact, it's encouraged. There are numerous examples on the online help page for geom_histogram(): http://docs.ggplot2.org/current/geom_histogram.html. The binwidth argument also exists in geom_bar(), which I'm assuming you're using with date as the x-variable; see the help page for geom_bar() at the same site for examples.
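As a rough check on what range/30 means for a Date variable, here is a small base R sketch (no ggplot2 involved). It assumes the x-variable is treated as the number R stores internally for a Date, i.e. days since 1970-01-01; the object names are made up for illustration.

```r
# A Date is stored as a count of days, so consecutive days differ by 1.
d <- as.Date(c("2012-01-01", "2012-01-02"))
one_day <- diff(as.numeric(d))

# The same monthly dates as in the toy example above.
dates <- seq(as.Date("2012-01-01"), as.Date("2012-11-01"), by = "months")

# Range of the data in days, and what range/30 would be in those units.
span_days <- diff(range(as.numeric(dates)))
default_width <- span_days / 30

one_day
span_days
default_width
```

Under that assumption the default bin width for these data would be roughly ten days, which is one reason to set binwidth yourself when the data are monthly.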

Dennis





hywelm

Nov 9, 2012, 6:46:59 AM
to ggp...@googlegroups.com, hywelm
Dennis,

Thanks for looking at this. I did in fact use geom_bar, not geom_histogram. Using your example, I convinced myself a bit more that binwidth uses days in the case of a Date variable:

library('ggplot2')
DF <- data.frame(date = seq(as.Date('2012-01-01'), as.Date('2012-11-01'),
                            by = "months"),
                 val = rpois(11, 50))

library(scales)
ggplot(DF, aes(x = date)) +
    geom_bar(aes(weight = val), stat = "bin", binwidth = 60, fill = "orange") +
    scale_x_date(breaks = date_breaks("2 month"), labels = date_format("%b"))

This seems to be putting Jan in bin 1, Feb-Mar in bin 2, etc, though this is not clear from the x-scale.
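One way to see which months would share a bin is to do the arithmetic by hand in base R. The sketch below assumes binwidth is in days and that bin edges sit at multiples of 60 days counted from 1970-01-01; that edge placement is an assumption chosen because it matches the grouping described above, and the edges ggplot2 actually uses may differ between versions.

```r
# The same monthly dates as in the example.
dates <- seq(as.Date("2012-01-01"), as.Date("2012-11-01"), by = "months")

binwidth <- 60  # assumed to be in days for a Date variable

# Assumed rule: bin index = which 60-day block (since 1970-01-01) a date falls in.
bin <- floor(as.numeric(dates) / binwidth)
bin <- bin - min(bin) + 1  # renumber so the first occupied bin is 1

# One row per month, showing the bin each month lands in.
data.frame(month = format(dates, "%b"), bin = bin)
```

With these dates the rule puts the first month alone in bin 1 and then pairs up the remaining months, two per bin.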

Maybe I'd be better using something like the following to show the trend, rather than binning as a rough form of smoothing.
ggplot(DF, aes(x = date, y = val)) +
    geom_point() + geom_path() + geom_smooth()

It's not what I wanted though and I'd really like to understand binwidth better. Maybe I'd have to look at the source code. None of the examples seem to use a Date or DateTime variable.
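For the DateTime case, a similar base R sketch suggests the unit would be seconds, since a POSIXct value is stored as seconds since 1970-01-01 00:00:00 UTC. The hourly timestamps and the six-hour bin width below are made up for illustration, and the binning rule (edges at multiples of the bin width) is again an assumption rather than a statement of what ggplot2 does internally.

```r
# 48 hourly timestamps covering two days, in UTC to avoid daylight-saving shifts.
times <- seq(as.POSIXct("2012-01-01 00:00:00", tz = "UTC"),
             by = "hour", length.out = 48)

# One hour apart means 3600 numeric units apart, i.e. seconds.
step <- unique(diff(as.numeric(times)))

# A six-hour bin width therefore has to be written as 6 * 3600 seconds.
binwidth <- 6 * 3600
counts <- table(floor(as.numeric(times) / binwidth))

step
counts
```

If binwidth really is read in seconds for a DateTime variable, a value such as 60 would mean one-minute bins, which is probably not what anyone plotting daily or monthly data wants.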

Hywel
@hywelm