Using stat_summary to generate error bars for grouped data.


dha 2001

Jun 30, 2010, 12:47:37 PM
to ggplot2
Hi,

I'm trying to generate y-error bars on the fly using stat_summary and
seem to be missing something. What I would like is for the error bar
calculation to be applied to each group. The following example, using
the "mtcars" data is close but seems to be generating the error bars
based on the entire column and applying them to each point.

cars <- ggplot(mtcars, aes(x=gear, y=mpg, group=as.factor(cyl),
                           colour=as.factor(cyl),
                           ymin=(mpg-sd(mpg)), ymax=(mpg+sd(mpg)))) +
  stat_summary(fun.y=mean, geom="point") +
  stat_summary(fun.y=mean, geom="line") +
  geom_errorbar(width=0.1)
print(cars)

Thanks for any clues!

-David

Hadley Wickham

Jun 30, 2010, 12:51:59 PM
to dha 2001, ggplot2
Hi David,

The summaries are fine:

ggplot(mtcars, aes(x=gear, y=mpg, group=as.factor(cyl),
                   colour=as.factor(cyl),
                   ymin=(mpg-sd(mpg)), ymax=(mpg+sd(mpg)))) +
  stat_summary(fun.y=mean, geom="point") +
  stat_summary(fun.y=mean, geom="line")

it's the error bar that's being created for every point. What are you
trying to do?

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
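
To see why the original plot appears to draw the same-sized bar at every point: `sd(mpg)` written inside `aes()` is evaluated over the whole `mpg` column, so every row gets the same offset. A small base-R sketch contrasting that single value with the per-group values; the names `overall_sd` and `group_sd` are only illustrative.

```r
# sd(mpg) evaluated on the whole column: one number shared by all rows
overall_sd <- sd(mtcars$mpg)

# Standard deviation computed separately for each gear/cyl combination.
# Groups containing a single car get NA, because sd() needs two values.
group_sd <- aggregate(mpg ~ gear + cyl, data = mtcars, FUN = sd)
names(group_sd)[3] <- "sd_mpg"

print(overall_sd)
print(group_sd)
```

The per-group table is what the error bars should be built from; the single `overall_sd` value is what the `aes()` expression supplies instead.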

dha 2001

Jun 30, 2010, 1:03:31 PM
to ggplot2
I'd like the error bar for each summarized point to represent the
standard deviation of the points that were summarized. For example,
for the point at 3 gears in the group of 8-cylinder cars, I'd like the
error bar to be +/- sd(mpg of cars with 3 gears and 8 cylinders),
plotted at mean(mpg of cars with 3 gears and 8 cylinders).
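
One possible way to get this, sketched under the assumption that `stat_summary()` is given a `fun.data` function returning `y`, `ymin`, and `ymax` for each group. The helper name `mean_sd_range` is hypothetical and not part of ggplot2.

```r
library(ggplot2)

# Hypothetical helper: mean and mean +/- one SD for the values of one group.
mean_sd_range <- function(x) {
  m <- mean(x)
  s <- sd(x)  # NA when the group holds a single observation
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

p <- ggplot(mtcars, aes(x = gear, y = mpg,
                        group = factor(cyl), colour = factor(cyl))) +
  stat_summary(fun.data = mean_sd_range, geom = "errorbar", width = 0.1) +
  stat_summary(fun.data = mean_sd_range, geom = "point") +
  stat_summary(fun.data = mean_sd_range, geom = "line")
print(p)
```

Because the summary runs once per x value within each group, every bar is based only on the cars that share that gear and cylinder count. Groups with one car have no defined SD, so their bars are dropped with a warning.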

Joe

Jul 1, 2010, 12:07:49 PM
to ggplot2
Unless there's an easier approach within ggplot, what I'd do is write
functions to calculate the upper and lower ends of the error bar.
Here are two that calculate the bounds of a 95% confidence interval
(mean +/- 1.96 standard errors).

errorUpper <- function(x){
  x.mean <- mean(x)
  x.sd <- sd(x)

  SEM <- x.sd / (sqrt(length(x)))

  return(x.mean + (SEM*1.96))
}

errorLower <- function(x){
  x.mean <- mean(x)
  x.sd <- sd(x)

  SEM <- x.sd / (sqrt(length(x)))

  return(x.mean - (SEM*1.96))
}
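
The fixed 1.96 multiplier is the normal-approximation quantile; for small groups a Student-t quantile gives a wider interval. A sketch of a single helper returning both bounds, with the name `ci_bounds` and its arguments `conf` and `use_t` being hypothetical choices made for illustration:

```r
# Hypothetical helper returning both bounds at once.
# use_t = TRUE uses a Student-t quantile with n - 1 degrees of freedom;
# use_t = FALSE uses the normal quantile (about 1.96 when conf = 0.95).
ci_bounds <- function(x, conf = 0.95, use_t = TRUE) {
  n <- length(x)
  sem <- sd(x) / sqrt(n)
  p <- (1 + conf) / 2
  mult <- if (use_t) qt(p, df = n - 1) else qnorm(p)
  c(lower = mean(x) - mult * sem, upper = mean(x) + mult * sem)
}

# A group with only two cars: the two intervals differ substantially
x <- mtcars$mpg[mtcars$cyl == 8 & mtcars$gear == 5]
print(ci_bounds(x, use_t = FALSE))
print(ci_bounds(x, use_t = TRUE))
```

Which multiplier is appropriate depends on the group sizes; with groups this small the normal approximation understates the uncertainty.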

Now, the ggplot code would look like this:

ggplot(mtcars, aes(x=gear, y=mpg, group=as.factor(cyl),
                   colour=as.factor(cyl))) +
  stat_summary(fun.y=mean, geom="point") +
  stat_summary(fun.y=mean, geom="line") +
  stat_summary(fun.ymax = errorUpper, fun.ymin = errorLower,
               geom = "errorbar")

A ribbon might look nicer:

ggplot(mtcars, aes(x=gear, y=mpg, group=as.factor(cyl),
                   colour=as.factor(cyl))) +
  stat_summary(fun.ymax = errorUpper, fun.ymin = errorLower,
               geom = "ribbon", alpha = 0.6) +
  stat_summary(fun.y=mean, geom="point") +
  stat_summary(fun.y=mean, geom="line")
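
In ggplot2 3.3.0 and later, the `fun.y`, `fun.ymin`, and `fun.ymax` arguments are deprecated in favor of `fun`, `fun.min`, and `fun.max`. A sketch of the error-bar plot with the newer names, assuming a recent ggplot2; the `sem` helper is introduced here only to shorten the two bound functions.

```r
library(ggplot2)

# Same 95% bounds as above, written compactly (normal approximation)
sem <- function(x) sd(x) / sqrt(length(x))
errorUpper <- function(x) mean(x) + 1.96 * sem(x)
errorLower <- function(x) mean(x) - 1.96 * sem(x)

p <- ggplot(mtcars, aes(x = gear, y = mpg,
                        group = factor(cyl), colour = factor(cyl))) +
  stat_summary(fun = mean, geom = "point") +
  stat_summary(fun = mean, geom = "line") +
  stat_summary(fun = mean, fun.min = errorLower, fun.max = errorUpper,
               geom = "errorbar", width = 0.1)
print(p)
```

Swapping `geom = "errorbar"` for `geom = "ribbon"` (and dropping `width`) should give the ribbon variant in the same way.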

-Joe

Luciano Selzer

Jul 1, 2010, 12:31:54 PM
to Joe, ggplot2
Have a look at smean.cl.normal() in Hmisc, it does what you want.
Luciano



dha 2001

Jul 1, 2010, 9:36:15 PM
to ggplot2
Thanks for the help. That did the right thing.

bb

Jul 13, 2010, 3:33:42 PM
to ggplot2
Can you please post a snippet demonstrating this? I'm not sure how to
get the info out of the vector returned by smean.cl.normal()

Thanks
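
For reference, a minimal sketch of pulling values out of that vector, assuming Hmisc is installed. `smean.cl.normal()` returns a named numeric vector whose elements are named Mean, Lower, and Upper, so they can be indexed by name; the variable name `ci` is only illustrative.

```r
library(Hmisc)

# Named numeric vector holding the mean and the confidence limits
ci <- smean.cl.normal(mtcars$mpg)
print(names(ci))

# Extract single elements by name
print(ci[["Mean"]])
print(ci[["Lower"]])
print(ci[["Upper"]])

# Convert to a one-row data frame with columns Mean, Lower, Upper
ci_df <- as.data.frame(as.list(ci))
print(ci_df)
```

The one-row data frame form is convenient when the summary is computed per group and the rows are then stacked for plotting.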

Luciano Selzer

Jul 13, 2010, 4:09:41 PM
to bb, ggplot2
Hi, here's a snippet. I hope it helps ;)

library(plyr)  # provides ddply()
library(Hmisc); library(ggplot2)
data(diamonds)

#Mean plus 95% confidence interval (default)
price.by.cut <- ddply(diamonds, .(cut), function(df) smean.cl.normal(df$price))
qplot(cut, Mean, ymin = Lower, ymax = Upper, data = price.by.cut,
    geom = c("bar", "errorbar"))

#Mean +- 2 * SD
price.by.cut <- ddply(diamonds, .(cut), function(df) smean.sdl(df$price))
qplot(cut, Mean, ymin = Lower, ymax = Upper, data = price.by.cut,
    geom = c("bar", "errorbar"))

#Mean +- SD
price.by.cut <- ddply(diamonds, .(cut), function(df) smean.sdl(df$price, mult = 1))
qplot(cut, Mean, ymin = Lower, ymax = Upper, data = price.by.cut,
    geom = c("bar", "errorbar"))
    
#Mean +- SE
price.by.cut <- ddply(diamonds, .(cut), function(df) smean.sdl(df$price,
      mult = sqrt(length(df$price))^-1))
qplot(cut, Mean, ymin = Lower, ymax = Upper, data = price.by.cut,
    geom = c("bar", "errorbar"))
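
As an alternative to pre-computing the summary table, the mean +/- SE case can be sketched with `stat_summary()` and ggplot2's own `mean_se()` helper, which returns `y`, `ymin`, and `ymax` for a vector. This assumes a reasonably recent ggplot2 and is not the approach used in the snippet above.

```r
library(ggplot2)

# Check that mean_se() matches mean + sd / sqrt(n) on the price column
x <- diamonds$price
se <- sd(x) / sqrt(length(x))
print(all.equal(mean_se(x)$ymax, mean(x) + se))

# Bars at the group means, error bars at mean +/- one standard error
p <- ggplot(diamonds, aes(x = cut, y = price)) +
  stat_summary(fun.data = mean_se, geom = "bar") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2)
print(p)
```

Here the grouping by `cut` happens inside `stat_summary()`, so no intermediate `price.by.cut` data frame is needed.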

Luciano

