How to add weighted means to a boxplot using ggplot2


Greg Blevins

Apr 24, 2013, 3:29:15 PM
to ggplot2
Greetings,

After considerable time searching and fiddling, I am reaching out for help in my attempt to display weighted means on a boxplot.

I provide a toy example below.

#data

value <- c(5, 7, 8, 6, 7, 9, 10, 6, 7, 10)
category <- c("one", "one", "one", "two", "two", "two","three", "three", "three","three")
weight <- c(1, 1.2, 2, 3, 2.2, 2.5, 1.8, 1.9, 2.2, 1.5)
df <- data.frame(value, category, weight)

#unweighted means by category
ddply(df, .(category), summarize, mean=round(mean(value, na.rm=TRUE), 2))

  category mean
1      one 6.67
2    three 8.25
3      two 7.33

#weighted means by category
ddply(df, .(category), summarize, wmean=round(wtd.mean(value, weight, na.rm=TRUE), 2))

  category wmean
1      one  7.00
2    three  8.08
3      two  7.26

#boxplot with unweighted means added to boxplot (which works fine)
ggplot(df, aes(x = category, y = value, weight = weight)) +
  geom_boxplot(width = 0.6, colour = I("#3366FF")) +
  stat_summary(fun.y = "mean", geom = "point", shape = 23, size = 3, fill = "white")

My question is, how do I set up the syntax to add weighted means rather than unweighted means to the boxplot?

Guidance would be much appreciated.

Thank you,

Greg

--
Gregory L. Blevins
Blevins-O'Meara Insights
Office 952 944-5743
Cell 612 251 0232
greg...@gmail.com

Dennis Murphy

Apr 25, 2013, 5:15:27 PM
to Greg Blevins, ggplot2
You need to mention where wtd.mean() comes from (a search with package
sos came up empty, but it did mention the Hmisc package, which luckily
worked). After loading Hmisc, there is a potential conflict between
Hmisc::summarize and plyr::summarize: whichever one is higher in the
search path will be used in ddply(). This is why it's safer in plyr to
use the Commonwealth spelling summarise as a matter of course.

That said, your stat_summary() code uses fun.y = "mean", so it
computes the unweighted mean. The path of least resistance is to put
the weighted means in their own data frame (wm below) and use it as
the input data for a geom_point layer, given df:

library(Hmisc)
library(ggplot2)
library(plyr)

wm <- ddply(df, "category", summarise,
            wmean = round(wtd.mean(value, weight, na.rm = TRUE), 2))

ggplot(df, aes(x = category, y = value)) +
  geom_boxplot(width = 0.6, colour = "#3366FF") +
  geom_point(data = wm, aes(x = category, y = wmean),
             shape = 23, size = 3, fill = "white")
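
If Hmisc and plyr are not at hand, a summary table like wm can probably be built in base R as well. The sketch below is only an illustration: it assumes stats::weighted.mean() is an acceptable stand-in for wtd.mean() on this toy data, and the names pieces and wm_base are made up for the example.

```r
# Toy data from the original post
value <- c(5, 7, 8, 6, 7, 9, 10, 6, 7, 10)
category <- c("one", "one", "one", "two", "two", "two",
              "three", "three", "three", "three")
weight <- c(1, 1.2, 2, 3, 2.2, 2.5, 1.8, 1.9, 2.2, 1.5)
df <- data.frame(value, category, weight)

# Split the rows by category, then take the weighted mean within each piece
pieces <- split(df, df$category)
wm_base <- data.frame(
  category = names(pieces),
  wmean = round(sapply(pieces, function(d) weighted.mean(d$value, d$weight)), 2),
  row.names = NULL
)
print(wm_base)
```

wm_base has the same two columns as wm, so it should drop into the geom_point(data = ...) layer unchanged.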

# Another way is to add the weight variable as an aesthetic in stat_summary():

ggplot(df, aes(x = category, y = value)) +
  geom_boxplot(width = 0.6, colour = "#3366FF") +
  stat_summary(fun.y = "wtd.mean", aes(weight = weight), geom = "point",
               shape = 23, size = 3, fill = "white")

# Using your code, you would have to have done this to get it to work:

ggplot(df, aes(x = category, y = value, weight = weight)) +
  geom_boxplot(aes(weight = NULL), width = 0.6, colour = "#3366FF") +
  stat_summary(fun.y = "wtd.mean", geom = "point",
               shape = 23, size = 3, fill = "white")
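
Whichever stat_summary() variant is used, it is worth checking that the plotted diamonds match wm rather than the unweighted means. My understanding, which is an assumption to verify against the installed ggplot2 version, is that stat_summary() may hand only the y values to the summary function. If so, a weighted-mean function falls back to equal weights without any warning. The base R sketch below uses stats::weighted.mean() as a stand-in for wtd.mean() to show what that fallback looks like for category "one"; the variable names are illustrative.

```r
# Category "one" from the toy data
value <- c(5, 7, 8)
weight <- c(1, 1.2, 2)

unweighted <- mean(value)
weighted <- weighted.mean(value, weight)

# A weighted-mean function called with the values only uses equal weights,
# so it reproduces the unweighted mean
values_only <- weighted.mean(value)

print(c(unweighted = unweighted, weighted = weighted, values_only = values_only))
```

If the diamonds on the plot sit at the unweighted values, the precomputed geom_point() approach is the safer route.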

Assuming you want the ordering of your boxplots to be one, two, three,
redefine the levels of df$category before running any of the above
code.
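
A minimal sketch of that reordering, assuming the desired order is one, two, three and that category is stored as in the toy data:

```r
# Toy data from the original post (weights omitted for brevity)
value <- c(5, 7, 8, 6, 7, 9, 10, 6, 7, 10)
category <- c("one", "one", "one", "two", "two", "two",
              "three", "three", "three", "three")
df <- data.frame(value, category)

# Set the level order explicitly; the default order is alphabetical
# (one, three, two), which is what the x axis would otherwise follow
df$category <- factor(df$category, levels = c("one", "two", "three"))
print(levels(df$category))
```

Any summary table such as wm should be recomputed after this step so that its category column carries the same level order.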

Dennis