How to get proper means for boxplot with fill aesthetics?

Mikhail Titov

Apr 9, 2012, 7:56:28 PM
to ggp...@googlegroups.com
Hello!

I am a lattice user, but from what I understand there is an attitude against group-wise boxplots, so I decided to switch to ggplot2 for them. I saw an example that adds means to a boxplot, but I can't figure out how to use it properly when there are groups, i.e. when I use the "fill" aesthetic (am I using the proper term?). So far I have the following quick example:

library(ggplot2)
df <- data.frame(
      x = factor(month.abb[rep(1:12, each=30)], month.abb),
      y = runif(12*30, max=rep(1:12, each=30))*rep(3:1,each=12*10),
      z = rep(factor(c("a","b","c")), 12*10)
      )
ggplot(df, aes(x=x,y=y, fill=z)) +
    geom_boxplot() +
    stat_summary(fun.y=mean, geom="point", aes(x=x,group=z), shape=5, size=2)
#    stat_summary(fun.y=mean, geom="point", aes(x=x,fill=z), shape=5, size=2)

which apparently doesn't work properly. How can I make means appear properly along the x axis?

Also, is there an easy way to rename the factor levels in the legend only, e.g. aaaaa, bbb, cccccc instead of a, b, c? I use some ids internally because that is convenient for me, but I'd like to use full descriptions in plots.

Thank you,
Mikhail
out.png

Jean-Olivier Irisson

Apr 10, 2012, 2:01:43 AM
to ggplot2
Forgot to reply to the list.

On 2012-Apr-10, at 01:56 , Mikhail Titov wrote:
>
> I am a lattice user, but from what I understand there is an attitude against group-wise boxplots, so I decided to switch to ggplot2 for them. I saw an example that adds means to a boxplot, but I can't figure out how to use it properly when there are groups, i.e. when I use the "fill" aesthetic (am I using the proper term?). So far I have the following quick example:
>
> library(ggplot2)
> df <- data.frame(
> x = factor(month.abb[rep(1:12, each=30)], month.abb),
> y = runif(12*30, max=rep(1:12, each=30))*rep(3:1,each=12*10),
> z = rep(factor(c("a","b","c")), 12*10)
> )
> ggplot(df, aes(x=x,y=y, fill=z)) +
> geom_boxplot() +
> stat_summary(fun.y=mean, geom="point", aes(x=x,group=z), shape=5, size=2)
> # stat_summary(fun.y=mean, geom="point", aes(x=x,fill=z), shape=5, size=2)
>
> which apparently doesn't work properly. How can I make means appear properly along the x axis?

The boxplots are automatically "dodged" so that they do not overlap. You also need to dodge the points for the means. By default points are dodged along the y axis though. If you do:

ggplot(df, aes(x=x,y=y, fill=z)) +
    geom_boxplot() +
    stat_summary(fun.y=mean, geom="point", aes(x=x,group=z), shape=5, size=2, position=position_dodge(width=0.75, height=0))

it issues a warning but works. Someone more knowledgeable than me about ggplot's innards might be able to explain why.
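
For readers on a current ggplot2 release, the same dodging idea can be sketched as follows. This is a minimal sketch and not the code posted above: it assumes ggplot2 3.3.0 or later, where stat_summary() takes `fun` in place of the deprecated `fun.y`, and where position_dodge() has no `height` argument, so only `width` is passed. The data frame is the one from the question; the name `p` is introduced here only for illustration.

```r
library(ggplot2)

# Toy data from the original question: 12 months, 3 groups, 10 values each.
df <- data.frame(
  x = factor(month.abb[rep(1:12, each = 30)], month.abb),
  y = runif(12 * 30, max = rep(1:12, each = 30)) * rep(3:1, each = 12 * 10),
  z = rep(factor(c("a", "b", "c")), 12 * 10)
)

# Assumption: ggplot2 >= 3.3.0 (`fun` instead of `fun.y`, no `height`
# argument in position_dodge()).
p <- ggplot(df, aes(x = x, y = y, fill = z)) +
  geom_boxplot() +
  stat_summary(
    fun = mean, geom = "point", aes(group = z),
    shape = 5, size = 2,
    # Shift each group's mean sideways by the same amount as its box.
    position = position_dodge(width = 0.75)
  )
p
```

The width of 0.75 is taken from the call above. As far as I understand, it matches the default width that the boxplot stat assigns to each set of boxes, which is why the diamonds end up centred on their own boxes rather than stacked at the month's tick mark.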

> Also, is there an easy way to rename the factor levels in the legend only, e.g. aaaaa, bbb, cccccc instead of a, b, c? I use some ids internally because that is convenient for me, but I'd like to use full descriptions in plots.

Just rename the factor levels in the data and then plot

df$z = factor(df$z, levels=c("a", "b", "c"), labels=c("aaa", "bbb", "ccc"))

or change the scale in ggplot only

last_plot() + scale_fill_discrete(breaks=levels(df$z), labels=c("aaa", "bbb", "ccc"))
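
A self-contained sketch of the scale-only option may help show that the data stay untouched. It assumes a reasonably recent ggplot2 and reuses the data frame from the question; the names `fill_scale` and `p` are hypothetical and exist only for this illustration.

```r
library(ggplot2)

# Toy data from the original question; z holds the short internal ids.
df <- data.frame(
  x = factor(month.abb[rep(1:12, each = 30)], month.abb),
  y = runif(12 * 30, max = rep(1:12, each = 30)) * rep(3:1, each = 12 * 10),
  z = rep(factor(c("a", "b", "c")), 12 * 10)
)

# Hypothetical name: a fill scale that maps each internal id (break)
# to the description shown in the legend (label), in the same order.
fill_scale <- scale_fill_discrete(
  breaks = c("a", "b", "c"),
  labels = c("aaa", "bbb", "ccc")
)

p <- ggplot(df, aes(x = x, y = y, fill = z)) +
  geom_boxplot() +
  fill_scale
p

# The factor itself is unchanged: only the legend text differs.
levels(df$z)
```

One thing to watch: the breaks have to be the ids as they exist in the data at plotting time. If the factor was already relabelled with the first approach, levels(df$z) no longer returns a, b, c, so the two approaches should not be combined as written.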

Jean-Olivier Irisson

Mikhail Titov

Apr 10, 2012, 1:33:22 PM
to ggp...@googlegroups.com
On Tuesday, April 10, 2012 1:01:43 AM UTC-5, Jean-Olivier Irisson wrote:

> The boxplots are automatically "dodged" so that they do not overlap. You also need to dodge the points for the means. By default points are dodged along the y axis though. If you do:
>
> ggplot(df, aes(x=x,y=y, fill=z)) +
>    geom_boxplot() +
>    stat_summary(fun.y=mean, geom="point", aes(x=x,group=z), shape=5, size=2, position=position_dodge(width=0.75, height=0))
>
> it issues a warning but works. Someone more knowledgeable than me about ggplot's innards might be able to explain why.


I see, so "dodging" was the keyword I was missing. The warning does puzzle me, though.

> Also, is there an easy way to rename the factor levels in the legend only, e.g. aaaaa, bbb, cccccc instead of a, b, c? I use some ids internally because that is convenient for me, but I'd like to use full descriptions in plots.

> Just rename the factor levels in the data and then plot
>
> df$z = factor(df$z, levels=c("a", "b", "c"), labels=c("aaa", "bbb", "ccc"))
>
> or change the scale in ggplot only
>
> last_plot() + scale_fill_discrete(breaks=levels(df$z), labels=c("aaa", "bbb", "ccc"))

I prefer the latter, as I don't want to create another data.frame or change the levels.

Thank you,
M.