histogram and density ..count..

baptiste auguie

Oct 25, 2010, 4:25:04 AM
to ggplot2
Dear list,

I'd like to superimpose a histogram and a density plot, both using
..count.. as the y variable:

library(ggplot2)

d <- data.frame(x=rnorm(1e4, 10, 1))

# n = number of bins spanning the range of d$x
foo <- function(n)
  ggplot(d, aes(x=x)) +
  geom_histogram(aes(y = ..count..), binwidth=diff(range(d$x))/n,
                 fill="grey50", colour="grey40") +
  geom_line(aes(y = ..count..), stat="density",
            size = 1, colour="red", linetype=2) +
  scale_x_continuous("x") +
  scale_y_continuous("Count") +
  theme_bw()

foo(10) # density and histogram roughly of same height
foo(40) # histogram much below

My very naive understanding of histograms and density plots using
..count.. would have been that no matter the binwidth, the area under
the curve and the histogram should be similar. This isn't the case
here; should I adjust some parameters, or is this a bad idea
altogether?

Best regards,

baptiste

Hadley Wickham

Oct 25, 2010, 7:48:14 AM
to baptiste auguie, ggplot2
> My very naive understanding of histograms and density plots using
> ..count.. would have been that no matter the binwidth, the area under
> the curve and the histogram should be similar. This isn't the case
> here; should I adjust some parameters, or is this a bad idea
> altogether?

That's true when you plot density, not count.
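A minimal base-R check of this point. It is a sketch under assumptions: `hist()` with hand-picked equal-width breaks stands in for `geom_histogram()`, and `bar_areas()` is a hypothetical helper, not part of ggplot2. On the density scale the bars enclose a total area of 1 whatever the bin width w, while on the count scale they enclose n * w.

```r
set.seed(1)                       # reproducible sample
obs <- rnorm(1e4, 10, 1)          # same distribution as d$x in the question
n <- length(obs)

# Hypothetical helper: total bar area (height * width) of an equal-width
# histogram of v with bins of width w, on the density and the count scale
bar_areas <- function(v, w) {
  brks <- seq(floor(min(v)), ceiling(max(v)) + w, by = w)  # spans the data
  h <- hist(v, breaks = brks, plot = FALSE)
  c(density = sum(h$density * w), count = sum(h$counts * w))
}

areas <- sapply(c(0.1, 0.5, 1), function(w) bar_areas(obs, w))
colnames(areas) <- c("w=0.1", "w=0.5", "w=1")
print(areas)
```

The density row stays at 1, while the count row equals n * w (1000, 5000 and 10000 here), which is why count-scale bars shrink as the bins get narrower.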

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

Dennis Murphy

Oct 25, 2010, 8:05:16 AM
to baptiste auguie, ggplot2
Hi Baptiste:

If you plot a histogram on the density scale, the bar heights are adjusted so that the total area of the bars is 1. That puts it on the same scale as a density estimate, whose total area under the curve is also 1.

If you use the count scale, multiplying the densities by the total count should work in principle, but the density doesn't adjust for the way the histogram is binned, and I think that's what you're seeing. Finer partitions of the x-axis reduce the count in each bin, so the bars get shorter as the bins get narrower. AFAICT, though, there is no way to pass that information on to the density, so the curve looks the same no matter how the histogram is partitioned: all you've done is take the density estimate and blow it up by a factor of n, the number of observations.
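The scaling argument can be made explicit with a rough derivation. It assumes n observations, equal-width bins of width w, a smooth underlying density f, and a midpoint approximation of the bin integral; these are our assumptions, not statements from the thread. The ggplot2 documentation describes the `count` variable computed by `stat = "density"` as the density times the number of points, written here as n times the estimate of f.

```latex
% Expected count in bin k = [a_k, a_k + w), with midpoint m_k
\mathbb{E}[c_k] = n \int_{a_k}^{a_k + w} f(x)\,\mathrm{d}x \approx n\,w\,f(m_k)
% The density layer plots ..count.. = n \hat{f}(x), so at the bin midpoints
\frac{c_k}{n\,\hat{f}(m_k)} \approx w
```

So the bars and the count-scale curve should roughly coincide only when w = 1.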

Example:

x <- data.frame(x = rnorm(1000))
g <- ggplot(x, aes(x = x))

# Both histogram and density plot on density scale:

g + geom_histogram(aes(y = ..density..)) + geom_density()
g + geom_histogram(aes(y = ..density..), binwidth = 0.05) + geom_density()
g + geom_histogram(aes(y = ..density..), binwidth = 0.1) + geom_density()

# Both histogram and density plot on count scale:

g + geom_histogram(aes(y = ..count..)) +
  geom_line(aes(y = ..count..), stat="density", size = 1, colour="red", linetype=2)
g + geom_histogram(aes(y = ..count..), binwidth = 0.5) +
  geom_line(aes(y = ..count..), stat="density", size = 1, colour="red", linetype=2)
g + geom_histogram(aes(y = ..count..), binwidth = 1.0) +
  geom_line(aes(y = ..count..), stat="density", size = 1, colour="red", linetype=2)
g + geom_histogram(aes(y = ..count..), binwidth = 0.1) +
  geom_line(aes(y = ..count..), stat="density", size = 1, colour="red", linetype=2)
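The binwidth dependence can also be checked numerically. The sketch below uses base R only; `count_ratio()` is a hypothetical helper, and comparing at bin midpoints while ignoring bins with 20 or fewer points are our own choices. It divides the histogram counts by n times the kernel density at each bin midpoint, and the ratio should come out close to the bin width w. In ggplot2 terms this suggests scaling the density layer by the histogram's binwidth, for example `geom_line(aes(y = ..count.. * 0.5), stat = "density")` alongside `geom_histogram(aes(y = ..count..), binwidth = 0.5)`; treat that as a suggestion we have not checked across ggplot2 versions.

```r
set.seed(42)                      # reproducible sample
obs <- rnorm(1000)
n <- length(obs)
dens <- density(obs)              # kernel density estimate (area ~ 1)

# Hypothetical helper: total histogram count divided by total n * density,
# both taken over bins of width w that hold more than 20 points
count_ratio <- function(w) {
  brks <- seq(floor(min(obs)) - w, ceiling(max(obs)) + w, by = w)
  h <- hist(obs, breaks = brks, plot = FALSE)
  f_mid <- approx(dens$x, dens$y, xout = h$mids)$y  # density at bin midpoints
  keep <- h$counts > 20 & !is.na(f_mid)             # skip sparse tail bins
  sum(h$counts[keep]) / sum(n * f_mid[keep])
}

widths <- c(0.1, 0.5, 1)
ratios <- sapply(widths, count_ratio)
print(round(ratios, 2))           # each value should sit close to its width
```

Only the w = 1 case gives a ratio near 1, which is the panel where the unscaled ..count.. curve lines up with the bars.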
 
HTH,
Dennis


baptiste auguie

Oct 25, 2010, 10:24:28 AM
to Hadley Wickham, ggplot2
OK, thanks Hadley and Dennis.

It also works with binwidth=1, but in the end I just removed the
density lines, since they didn't add any information.

Thanks,

baptiste
