labeling boxplots

jacktanner

Sep 30, 2010, 2:13:30 PM
to ggplot2
I'd like to label each boxplot with the number of elements in the
boxplot. I figure this will probably involve geom_text(), but
beyond that I'm stuck.

Brandon Hurr

Sep 30, 2010, 2:25:50 PM
to jacktanner, ggplot2
Jack,

What do your boxplot and data look like?

B

> --
> You received this message because you are subscribed to the ggplot2 mailing list.
> Please provide a reproducible example: http://gist.github.com/270442
>
> To post: email ggp...@googlegroups.com
> To unsubscribe: email ggplot2+u...@googlegroups.com
> More options: http://groups.google.com/group/ggplot2
>

Ista Zahn

Sep 30, 2010, 4:16:42 PM
to jacktanner, ggplot2
Hi,
This should get you started:

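library(plyr); library(ggplot2)
# n comes first: current plyr evaluates these in order, so length(mpg) after mpg=mean(mpg) is 1
mtcars.size <- ddply(mtcars, .(cyl), summarize, n=length(mpg), mpg=mean(mpg))
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_boxplot() + geom_text(aes(label=paste("n =", n)), data=mtcars.size)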

-Ista

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

jacktanner

Sep 30, 2010, 11:38:11 PM
to ggplot2
Ista, that's a great suggestion. That method works for the mtcars
data, but it doesn't work in a more complicated case:

mydf=data.frame(list(
f1=factor(rep(c("a","b"), each=15)),
f2=factor(rep(c("x","y","z"), each=5)),
y=as.vector(sapply(1:6, function(x) { rnorm(5, x, 1) }))
))

> mydf.size <- ddply(mydf, .(f1, f2), summarise, n=length(y), y=mean(y))
> p = ggplot(mydf) + geom_boxplot(aes(x=f1, y=y, fill=f2))
> p + geom_text(aes(label=paste("n =", n)), data=mydf.size)

Error in which(cat) : argument to 'which' is not logical


Dennis Murphy

Oct 1, 2010, 6:29:10 AM
to jacktanner, ggplot2
Hi:

Here's an example from a post by Dr. Bryan Hanson earlier this summer; perhaps it can help you in your problem. It doesn't provide a direct solution, but elements of the code may give you some avenues to pursue.

HTH,
Dennis

# test data
res = c(rnorm(10, 5, 1.5), rnorm(10, 8, 2), rnorm(10, 14, 2.0),
        rnorm(10, 10, 1.5), rnorm(10, 15, 2), rnorm(10, 12,2.5))
fac1 <- c(rep("L", 20), rep("M", 20), rep("H", 20))
fac1 <- factor(fac1, levels = c("L", "M", "H"))
fac2 <- c(rep("WT", 10), rep("GM", 10))
fac2 <- as.factor(rep(fac2, 3))
td <- data.frame(r = res, f1 = fac1, f2 = fac2)
td[c(13,32:33, 55:58), 1] <- NA
td <- na.omit(td)


ggplot(td, aes(x = f1, y = r, colour = f2, group = f2)) +
   stat_summary(fun.y = "mean", geom = "line") +
   geom_point(position = position_jitter(width = 0.05)) +
   stat_bin(aes(x = f1, y = c(-0.5, 0.8)[as.numeric(f2)],
     label=paste("n =", ..count..)), geom = "text", legend = FALSE) +
   scale_colour_manual(name = "", values = c('red', 'blue'))
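
A note for readers on current ggplot2: the example above is written against the 2010 API, and `fun.y`, `legend`, and `..count..` have since been deprecated or renamed. Below is a minimal sketch of the same idea (let a stat compute the counts rather than precomputing them) applied to dodged boxplots. It assumes a recent ggplot2 (3.3.0 or later). `n_label` is a hypothetical helper name, the mtcars grouping is only for illustration, and the offset of 1 above each group's maximum is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical helper: given the y values of one group, return the
# label text and the height at which to draw it.
n_label <- function(y) {
  data.frame(y = max(y) + 1, label = paste("n =", length(y)))
}

# stat_summary() calls n_label() once per box; position_dodge() lines
# the labels up with the dodged boxes.
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(am))) +
  geom_boxplot() +
  stat_summary(fun.data = n_label, geom = "text",
               position = position_dodge(width = 0.75))
print(p)
```

Because the counts are computed inside the plot, no separate summary data frame has to be kept in sync with the data.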

Ista Zahn

Oct 1, 2010, 9:19:14 AM
to jacktanner, ggplot2
It just requires a little playing around. See below.

On Thu, Sep 30, 2010 at 11:38 PM, jacktanner <ih...@hotmail.com> wrote:
> Ista, that's a great suggestion. That method works for the mtcars
> data, but it doesn't work in a more complicated case:
>
> mydf=data.frame(list(
>  f1=factor(rep(c("a","b"), each=15)),
>  f2=factor(rep(c("x","y","z"), each=5)),
>  y=as.vector(sapply(1:6, function(x) { rnorm(5, x, 1) }))
> ))
>
>> mydf.size <- ddply(mydf, .(f1, f2), summarise, n=length(y), y=mean(y))
>> p = ggplot(mydf) + geom_boxplot(aes(x=f1, y=y, fill=f2))
>> p + geom_text(aes(label=paste("n =", n)), data=mydf.size)
>
> Error in which(cat) : argument to 'which' is not logical
>

One thing I notice about this is that geom_text might not know what
any of its aesthetics are, because none are specified in the ggplot
call or in the geom_text call. So move the x, y, and fill
specifications out of the geom_boxplot call and into the ggplot call:

p <- ggplot(mydf, aes(x=f1, y=y, fill=f2)) + geom_boxplot()
p + geom_text(aes(label=paste("n =", n), group = f2), data=mydf.size)

That almost worked, except that the text labels were stacked instead of
dodged. So I tried

p + geom_text(aes(label=paste("n =", n), group = f2), data=mydf.size,
position=position_dodge(width=.75))

and that seems to work (the width=.75 was arrived at by trial and error).
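
For anyone who wants to run the whole thing in one go, here is a self-contained sketch of the approach above that uses base R `aggregate()` in place of ddply, so only ggplot2 is needed. The `set.seed()` call and the explicit `times = 2` in `f2` are my additions for repeatability and clarity, not part of the original data definition. As far as I can tell, 0.75 works because it matches the default width that boxplots are dodged by, but treat that as an assumption rather than documented behaviour.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the random data is repeatable
mydf <- data.frame(
  f1 = factor(rep(c("a", "b"), each = 15)),
  f2 = factor(rep(c("x", "y", "z"), each = 5, times = 2)),
  y  = as.vector(sapply(1:6, function(m) rnorm(5, m, 1)))
)

# One row per (f1, f2) cell: y holds the label height (the cell mean),
# n holds the number of observations in the cell.
mydf.size <- aggregate(y ~ f1 + f2, data = mydf, FUN = mean)
mydf.size$n <- aggregate(y ~ f1 + f2, data = mydf, FUN = length)$y

# x, y and fill are set in ggplot() so that geom_text() inherits them;
# group = f2 plus position_dodge() puts each label over its own box.
p <- ggplot(mydf, aes(x = f1, y = y, fill = f2)) +
  geom_boxplot() +
  geom_text(aes(label = paste("n =", n), group = f2), data = mydf.size,
            position = position_dodge(width = 0.75))
print(p)
```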

Best,
Ista

jacktanner

Oct 1, 2010, 12:46:35 PM
to ggplot2
I could kiss you right now.

In a perfect world, this kind of thing would be the default display
for all boxplots.