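Suppose we have a factor (a nominal variable) and we want to plot its distribution with a barplot. The code for plotting the counts (or frequencies) is straightforward in ggplot2. But how does one relabel the quantitative axis (the "continuous scale") for proportions or percents? That is, how does one make a barplot of proportions rather than counts?
In this email I show one way. But it requires the user to hard-code two things:
(1) a formula for the percent or proportion
(2) a non-default name for the y axis.
Is there a simpler or more elegant (and less error-prone) approach?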
set.seed(1)
NN<-12
categories<- c("dog", "flea", "human", "rat", NA)
DAAT<-data.frame(
species=factor(sample(categories, prob=c(1.5, 3, 1, 8, 1), size=NN, replace=TRUE), levels=categories)
)
print(DAAT)
# Order the bars by frequency:
DAAT$species<- factor( DAAT$species, levels=rev(names(sort(table( DAAT$species)))))
print(summary(DAAT))
library(ggplot2)  # library() stops with an error if ggplot2 is missing; require() only returns FALSE
# A barplot of frequencies or counts is straightforward:
print(
ggplot(data=DAAT, aes(x=species)) + geom_bar()
)
# Here is a hard-coded solution to put proportion on the y axis.
print(
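# Each row contributes 1/nrow(DAAT) to its bar, so bar heights are proportions of all rows (NA rows included).
ggplot(data=DAAT, aes(x=species, weight=1/nrow(DAAT))) + geom_bar() +
# Without the following, the y axis will be incorrectly labeled "count".
scale_y_continuous(name="Proportion")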
)
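A minimal sketch of one way to avoid hard-coding the denominator, assuming a recent ggplot2 (3.3.0 or later, where after_stat() exists; this postdates the 2011 thread). The proportion is computed from the output of the count statistic, so the denominator is always the total over the bars that stat_count() actually produced, NA bar or not. The axis name is still set by hand; scales::percent is used here for percent tick labels (drop the labels argument to keep plain proportions).

```r
# Assumes ggplot2 >= 3.3.0 (after_stat()) and the scales package it depends on
library(ggplot2)

set.seed(1)
categories <- c("dog", "flea", "human", "rat", NA)
DAAT <- data.frame(
  species = factor(sample(categories, prob = c(1.5, 3, 1, 8, 1),
                          size = 12, replace = TRUE),
                   levels = categories)
)

# count / sum(count) is evaluated on stat_count()'s output for the whole layer,
# so no nrow() or is.na() formula has to be written by hand
p <- ggplot(DAAT, aes(x = species, y = after_stat(count / sum(count)))) +
  geom_bar() +
  # Without a name, the axis title would be derived from the expression itself
  scale_y_continuous(name = "Percent", labels = scales::percent)
print(p)
```

The design choice is to let ggplot2 supply the denominator rather than the user, which removes formula (1) from the list above; naming the axis (2) is still needed.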
Thanks for any comments
Jacob A. Wegelin
Assistant Professor
Department of Biostatistics
Virginia Commonwealth University
730 East Broad Street Room 3006
P. O. Box 980032
Richmond VA 23298-0032
--
You received this message because you are subscribed to the ggplot2 mailing list.
Please provide a reproducible example: http://gist.github.com/270442
To post: email ggp...@googlegroups.com
To unsubscribe: email ggplot2+u...@googlegroups.com
More options: http://groups.google.com/group/ggplot2
Your numbers are different because I divided by the number of rows, whereas with your syntax, ggplot2 does not count the NA row(s) in the sample size. The following produces a plot identical to yours:
set.seed(1)
NN<-12
categories<- c("dog", "flea", "human", "rat", NA)
DAAT<-data.frame(
species=factor(sample(categories
, prob=c( 1.5 ,3, 1,8,1)
, size=NN, replace=TRUE)
, levels=categories)
)
# Order the bars by frequency:
DAAT$species<- factor( DAAT$species, levels=rev(names(sort(table( DAAT$species)))))
library(ggplot2)
print(
# Divide by the number of non-missing species rather than by nrow(DAAT)
ggplot(data=DAAT, aes(x=species, weight=1/sum(!is.na(DAAT$species)))) +
geom_bar() +
scale_y_continuous(name="Proportion")
)
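The two denominators can be compared directly in base R. In this sketch on a small made-up vector (not the thread's data), prop.table() ignores the missing value, which mirrors the NA-excluding denominator described above, while dividing by length(x) corresponds to the original weight of 1/nrow(DAAT):

```r
# Illustrative data only: 2 rats, 1 flea, 1 missing value
x <- factor(c("rat", "rat", "flea", NA))

# Denominator excludes NA (3 values): flea 1/3, rat 2/3
p_non_missing <- prop.table(table(x))
# Denominator is every row, like weight = 1/nrow(DAAT) (4 values): flea 1/4, rat 1/2
p_all_rows <- table(x) / length(x)
# Same as the corrected weight = 1/sum(!is.na(...))
p_weighted <- table(x) / sum(!is.na(x))

print(p_non_missing)
print(p_all_rows)
print(p_weighted)
```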
Your syntax, translated into the intermediate syntax (halfway between the black box of qplot and the explicit syntax of layer), with bar instead of histogram as you suggested, is:
ggplot(data=DAAT, aes(x=species, y=..density.., group=1)) + geom_bar() + scale_y_continuous(name="Proportion")
and the full layer syntax is:
ggplot() + layer(data=DAAT, mapping=aes(x=species, y=..density.., group=1), geom="bar") + scale_y_continuous(name="Proportion")
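In current ggplot2 releases geom_bar() uses stat_count(), which computes count and prop rather than density, so the ..density.. lines above are unlikely to run unchanged. A sketch of the same two plots, assuming ggplot2 3.3.0 or later: after_stat(prop) takes the place of ..density.., and layer() now needs stat and position spelled out.

```r
# Assumes ggplot2 >= 3.3.0; after_stat(prop) replaces the 2011-era ..density..
library(ggplot2)

set.seed(1)
categories <- c("dog", "flea", "human", "rat", NA)
DAAT <- data.frame(
  species = factor(sample(categories, prob = c(1.5, 3, 1, 8, 1),
                          size = 12, replace = TRUE),
                   levels = categories)
)

# Intermediate syntax: prop is computed within each group, so group = 1
# puts every bar in a single group and the proportions sum to 1
p_geom <- ggplot(DAAT, aes(x = species, y = after_stat(prop), group = 1)) +
  geom_bar() +
  scale_y_continuous(name = "Proportion")

# Full layer syntax: geom, stat and position all have to be given explicitly
p_layer <- ggplot() +
  layer(data = DAAT,
        mapping = aes(x = species, y = after_stat(prop), group = 1),
        geom = "bar", stat = "count", position = "stack") +
  scale_y_continuous(name = "Proportion")

print(p_geom)
print(p_layer)
```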
The crux is the mysterious "group" term. Without that term, all bars are of equal height. The help page online for geom_bar says:
“Layers are divided into groups by the group aesthetic. By default this is set to the interaction of all categorical variables present in the plot.”
Has anyone found an explanation of what it means for layers to be divided into groups in this context?
Jake
On Tue, 3 May 2011, Scott Chamberlain wrote:
> Numbers seem slightly different from your final solution at the bottom, but this is close:
>
> qplot(x=species, y=..density.., data=DAAT, geom="histogram", group=1) + ylab("Proportion")
>
> Replacing histogram with bar would give the same result.
>
> Scott
>
> On Monday, May 2, 2011 at 8:44 PM, Jacob Wegelin wrote:
>> Suppose we have a factor (a nominal variable) and we want to plot its distribution with a barplot. The code for plotting the counts (or frequencies) is straightforward in ggplot2. But how does one relabel the quantitative axis (the "continuous scale") for proportions or percents? That is, how does one make a barplot of proportions rather than counts?
Each group receives its own geom - one bar per group, one line per group, etc.
Hadley
--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
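Hadley's point explains the equal-height bars, and it can be checked numerically. In this sketch (assuming ggplot2 3.3.0 or later, with after_stat(prop) in place of ..density.., on a small made-up data set), layer_data() returns the computed bar heights. With the default grouping every species is its own group, so each bar's within-group proportion is count / count = 1; with group = 1 the whole layer is one group and the heights become count / total.

```r
# Assumes ggplot2 >= 3.3.0 for after_stat(); the data are illustrative only
library(ggplot2)

# 3 rats, 2 fleas, 1 dog
DAAT <- data.frame(species = factor(c("rat", "rat", "rat", "flea", "flea", "dog")))

# Default grouping: one group per species, so every prop equals 1
default_groups <- layer_data(
  ggplot(DAAT, aes(x = species, y = after_stat(prop))) + geom_bar()
)

# group = 1: a single group for the whole layer, so prop = count / 6
one_group <- layer_data(
  ggplot(DAAT, aes(x = species, y = after_stat(prop), group = 1)) + geom_bar()
)

print(default_groups[, c("x", "y", "group")])
print(one_group[, c("x", "y", "group")])
```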