geom_histogram, number of counts per bin

Sav Tan

Oct 17, 2011, 4:15:11 AM
to ggplot2
Hello,

Here is what I'm doing:

ggplot(TotCalc, aes(x=CO3,y=100*(..count../sum(..count..)))) +
ylab(ylabel) +
geom_histogram(colour = "darkblue", fill = "white", binwidth=20)

Is it possible to get the count for each bin?

I don't understand how binwidth works. I have 3123 values of CO3
(1697 of them unique), and when I plot with ggplot I get 25 bars.

Please help me understand.

Thank you

Dennis Murphy

Oct 17, 2011, 5:27:30 AM
to Sav Tan, ggplot2
Hi:

(1) Save the plot to an object.
(2) Use ggplot_build() to extract the counts. An example below shows how.
(3) The default binwidth is range/30, but that doesn't mean there will be
exactly 30 bars every time; the range of the data and where the bin
boundaries fall determine how many bins are created. You can always set
your own binwidth as an argument to geom_histogram().
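With a fixed binwidth, the number of bins is driven by the range of the data (roughly range / binwidth, give or take a bin for where the edges fall), not by how many values or unique values there are. So 25 bars at binwidth = 20 simply reflects a CO3 range of the order of 500. A minimal sketch, assuming a current ggplot2 release (layer_data() did not exist in 2011) and simulated data standing in for CO3:

```r
library(ggplot2)

set.seed(42)
# Simulated stand-in for CO3 (assumption): 3123 values, nearly all unique
x <- rnorm(3123, mean = 100, sd = 35)
binwidth <- 20

# Rough expectation: the data range divided by the binwidth
approx_bins <- ceiling(diff(range(x)) / binwidth)

# Number of bins ggplot2 actually builds for this binwidth
p <- ggplot(data.frame(x = x), aes(x)) + geom_histogram(binwidth = binwidth)
n_bins <- nrow(layer_data(p))

# n_values and n_unique do not enter into the bin count at all
c(n_values = length(x), n_unique = length(unique(x)),
  approx_bins = approx_bins, n_bins = n_bins)
```

Doubling the number of observations while keeping the same spread leaves n_bins about the same; changing binwidth or the range of the data is what moves it.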

Example:

library(ggplot2)
d <- data.frame(x = rnorm(3000))
g <- ggplot(d, aes(x, y = ..count../sum(..count..))) + geom_histogram()
> str(ggplot_build(g))
stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
List of 6
$ data :List of 1
..$ 1:List of 1
.. ..$ :'data.frame': 33 obs. of 11 variables:
.. .. ..$ y : num [1:33] 0 0.000667 0.001 0.001667 0.005 ...
.. .. ..$ ymin : num [1:33] 0 0 0 0 0 0 0 0 0 0 ...
.. .. ..$ ymax : num [1:33] 0 0.000667 0.001 0.001667 0.005 ...
.. .. ..$ x : num [1:33] -3.46 -3.24 -3.02 -2.79 -2.57 ...
.. .. ..$ xmin : num [1:33] -3.57 -3.35 -3.13 -2.9 -2.68 ...
.. .. ..$ xmax : num [1:33] -3.35 -3.13 -2.9 -2.68 -2.46 ...
.. .. ..$ count : num [1:33] 0 2 3 5 15 19 30 43 61 103 ...
.. .. ..$ ndensity: num [1:33] 0 0.00707 0.0106 0.01767 0.053 ...
.. .. ..$ ncount : num [1:33] 0 0.00707 0.0106 0.01767 0.053 ...
.. .. ..$ density : num [1:33] 0 0.00298 0.00448 0.00746 0.02239 ...
.. .. ..$ group : num [1:33] 1 1 1 1 1 1 1 1 1 1 ...
.. ..- attr(*, "dim")= int [1:2] 1 1
<snip>

# The key element in this case is the count component, but it's nested
# four layers deep. To access it, use

> ggplot_build(g)$data[[1]][[1]]$count
stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
[1] 0 2 3 5 15 19 30 43 61 103 130 164 210 197 243 283 236 252 232
[20] 197 188 122 100 61 33 31 18 7 6 5 2 2 0

# There are 33 bins in this histogram. Now specify the binwidth:

h <- ggplot(d, aes(x, y = ..count../sum(..count..))) +
geom_histogram(color = 'orange', binwidth = 0.5)
h
ggplot_build(h)$data[[1]][[1]]$count
[1] 0 3 17 56 130 286 438 578 549 452 300 115 55 15 6 0

Specifying binwidth = 0.5 in this example reduces the histogram to 16 bins.
Different binwidths will obviously change the number of bins and the shape of
the histogram.

BTW, if you get an unevenly rendered histogram, the issue is known to
the developers. Also, the ggplot_build() function may disappear in
some future version, so caveat emptor.
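For readers on a recent ggplot2: ggplot_build() is still exported, but its data component is now a list with one data frame per layer, so the counts sit one level shallower than in the 2011 output above, and layer_data() is a shortcut for the same thing. The ..count.. notation has also been superseded by after_stat(). A minimal sketch, assuming ggplot2 3.3.0 or later:

```r
library(ggplot2)

set.seed(1)
d <- data.frame(x = rnorm(3000))

# Proportion of observations per bin; after_stat() replaces ..count..
g <- ggplot(d, aes(x)) +
  geom_histogram(aes(y = after_stat(count / sum(count))), binwidth = 0.5)

built <- ggplot_build(g)
counts <- built$data[[1]]$count     # one data frame per layer
counts_too <- layer_data(g)$count   # shortcut for the same column

counts
```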

HTH,
Dennis

> --
> You received this message because you are subscribed to the ggplot2 mailing list.
> Please provide a reproducible example: http://gist.github.com/270442
>
> To post: email ggp...@googlegroups.com
> To unsubscribe: email ggplot2+u...@googlegroups.com
> More options: http://groups.google.com/group/ggplot2
>

Brandon Hurr

Oct 17, 2011, 6:22:12 AM
to Sav Tan, ggplot2
Hard to say for certain what you are after and even harder when we don't have your data or a sample dataset to play with. You could use bin() to figure it out yourself. Small example follows.

> x<-rnorm(3123, mean=100, sd=35)
> head(x)
[1]  31.22613 113.72626 197.61921 100.80194  36.18743  85.19606
> bin(x, binwidth=20)
   count   x width    ndensity      ncount      density
1      0 -30    20 0.000000000 0.000000000 0.000000e+00
2      6 -10    20 0.009009009 0.009009009 9.606148e-05
3     26  10    20 0.039039039 0.039039039 4.162664e-04
4     84  30    20 0.126126126 0.126126126 1.344861e-03
5    236  50    20 0.354354354 0.354354354 3.778418e-03
6    519  70    20 0.779279279 0.779279279 8.309318e-03
7    666  90    20 1.000000000 1.000000000 1.066282e-02
8    666 110    20 1.000000000 1.000000000 1.066282e-02
9    523 130    20 0.785285285 0.785285285 8.373359e-03
10   259 150    20 0.388888889 0.388888889 4.146654e-03
11    98 170    20 0.147147147 0.147147147 1.569004e-03
12    30 190    20 0.045045045 0.045045045 4.803074e-04
13    10 210    20 0.015015015 0.015015015 1.601025e-04
14     0 230    20 0.000000000 0.000000000 0.000000e+00
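Because bin() is internal to ggplot2 (see Winston's reply further down), base R's hist() with plot = FALSE is a way to get a similar per-bin table without relying on package internals. A minimal sketch, assuming bin edges on multiples of 20, which matches the bin centres (-30, -10, 10, ...) in the output above; ggplot2 itself may place its edges differently unless you tell it where they go:

```r
set.seed(1)
x <- rnorm(3123, mean = 100, sd = 35)
binwidth <- 20

# Bin edges on multiples of binwidth that cover the whole range of x
breaks <- seq(floor(min(x) / binwidth) * binwidth,
              ceiling(max(x) / binwidth) * binwidth,
              by = binwidth)

# plot = FALSE returns the binning without drawing anything
h <- hist(x, breaks = breaks, plot = FALSE)

# percent mirrors the 100*(..count../sum(..count..)) scaling in the question
bins <- data.frame(x = h$mids, count = h$counts,
                   percent = 100 * h$counts / sum(h$counts))
bins
```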

Sav Tan

Oct 17, 2011, 8:15:38 AM
to ggplot2
Thank you for your answer

Sav Tan

Oct 17, 2011, 8:18:05 AM
to ggplot2
It is a good idea, thank you

Sav Tan

Oct 20, 2011, 11:52:43 AM
to ggplot2
Hi,

I have another question. To get the counts I can do as you said,
ggplot_build(g)$data[[1]][[1]]$count, which gives:
0   2   3   5  15  19  30  43  61 103 130 164 210 197 243 283 236 252
232 197 188 122 100  61  33  31  18   7   6   5   2   2   0
How can I plot these as labels on my graph? I first calculate the
frequency 100*(..count../sum(..count..)), and I want to show this
frequency above each bar when it is not 0.

Is it possible to do this?
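The thread does not contain a reply to this follow-up. One way to do it, sketched under the assumption of ggplot2 3.3.0 or later (for after_stat()) and with simulated data standing in for the poster's TotCalc: add a second binning layer that draws text instead of bars, give it the same binwidth so both layers compute identical bins, and blank the label when a bin is empty.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for the poster's TotCalc data frame
TotCalc <- data.frame(CO3 = rnorm(3123, mean = 100, sd = 35))

p <- ggplot(TotCalc, aes(x = CO3)) +
  # Bar heights as a percentage of all observations
  geom_histogram(aes(y = after_stat(100 * count / sum(count))),
                 colour = "darkblue", fill = "white", binwidth = 20) +
  # Same binwidth, so the text layer bins the data exactly like the bars;
  # empty bins get an empty label instead of "0.0"
  stat_bin(aes(y = after_stat(100 * count / sum(count)),
               label = after_stat(ifelse(count > 0,
                                         sprintf("%.1f", 100 * count / sum(count)),
                                         ""))),
           geom = "text", binwidth = 20, vjust = -0.5) +
  ylab("Percent")

p
```

vjust = -0.5 nudges each label just above the top of its bar; if the tallest label is clipped, widening the y scale (for example with expand_limits()) gives it room.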

Brandon Hurr

Apr 26, 2012, 12:26:06 PM
to Geoffrey Stoel, ggplot2
Funny enough... I have no idea. But I remember this working so I must have had something weird loaded. When I load all the ggplot2 stuff it doesn't work. ??bin doesn't find it either so either I'm remembering wrong or I had something strange loaded. 

Anyone else know where I got bin() from? 
I'm at a loss. 

Brandon

On Thu, Apr 26, 2012 at 17:05, Geoffrey Stoel <g.s...@gmail.com> wrote:
Hello Brandon,

Sorry for the probably dumb question, but which package does the bin function belong to?
I cannot run it in my R version...

tnx,

G.

Winston Chang

Apr 26, 2012, 12:49:51 PM
to Brandon Hurr, Geoffrey Stoel, ggplot2
bin() is an internal function in ggplot2. Now you have to run it like this:
  ggplot2:::bin()

Geoffrey Stoel

Apr 26, 2012, 12:58:35 PM
to ggp...@googlegroups.com, Brandon Hurr, Geoffrey Stoel
tnx...that helps :)