http://had.co.nz/ggplot2/stat_summary.html
Best,
Ista
--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
library(ggplot2)
library(plyr)

d <- data.frame(date = rep(1:7, each = 50),
                score = sample(1:7, 350, replace = TRUE),
                tree = rep(1:10, 35))

## stat_summary can actually take any function, so we define one to
## return the values we want.
se <- function(x) {
  m <- mean(x)
  s <- sd(x) / sqrt(length(x))
  c(ymin = m - s, ymax = m + s)
}

g <- ggplot(d, aes(x = date, y = score))
g + stat_summary(fun.data = 'se', geom = 'errorbar', width = 0.2, size = 1) +
  stat_summary(fun.y = mean, geom = 'point', size = 3, color = 'red') +
  stat_summary(fun.y = mean, geom = 'line', size = 1, color = 'red') +
  facet_wrap(~ tree)

# Or we can do the summarizing separately.
ds <- ddply(d, .(tree, date), summarise,
            mean = mean(score),
            se = sd(score) / sqrt(length(score)))

g + geom_point(position = position_jitter(width = 0.2, height = 0.2),
               alpha = 0.2) +
  geom_point(data = ds, aes(y = mean), size = 3, colour = 'red') +
  geom_line(data = ds, aes(y = mean), size = 1, colour = 'red') +
  geom_errorbar(data = ds, aes(y = mean, ymin = mean - se, ymax = mean + se),
                width = 0.3, size = 1) +
  facet_wrap(~ tree)
Best,
Ista
--
Thanks again Ista!!
I forgot to mention that I also have NAs in my data, and I get different standard errors depending on how I handle them.
So I used your example dataset and introduced some NAs:
library(plyr)

d <- data.frame(date = rep(1:7, each = 50),
                score = sample(1:7, 350, replace = TRUE),
                tree = rep(1:10, 35))
d$score <- ifelse(d$score == 6, NA, d$score)

# First approach: removing the NAs within the ddply call
d1 <- ddply(d, .(tree, date), summarise,
            mean = mean(score, na.rm = TRUE),
            se = sd(score, na.rm = TRUE) / sqrt(length(score)))

# Second approach: removing the NAs before calling ddply
d0 <- subset(d, !is.na(score))
d2 <- ddply(d0, .(tree, date), summarise,
            mean = mean(score),
            se = sd(score) / sqrt(length(score)))
Now, looking at these two datasets, some of the standard errors are not the same.
I assume that when the NAs are removed within ddply, sqrt(length(score)) still counts the rows that have NAs.
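A quick check on a made-up vector seems consistent with this. The names x, se_all and se_obs below are purely illustrative and not part of the dataset above:

```r
# Toy vector, made up for illustration: five values, two of them missing.
x <- c(2, 4, NA, 6, NA)

n_all <- length(x)        # counts every element, NAs included
n_obs <- sum(!is.na(x))   # counts only the non-missing values

# Standard error using the NA-inclusive count (what the first approach does)
se_all <- sd(x, na.rm = TRUE) / sqrt(n_all)

# Standard error using the non-missing count (what the second approach does)
se_obs <- sd(x, na.rm = TRUE) / sqrt(n_obs)
```

If that is right, the first approach divides by too large an n and so understates the standard error whenever a group contains NAs. Replacing length(score) with sum(!is.na(score)) in the first ddply call should presumably bring the two approaches into agreement.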
I think in this special case it would be better to go for the second approach, or what would you suggest?
Also, given the NAs, does that mean I cannot really do the calculation within ggplot2 using the stat_summary() function?
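One possibility might be an NA-aware variant of the se function from the earlier example. The sketch below is only that: se_na is a hypothetical name, not something provided by ggplot2 or plyr. It drops the NAs itself, so it does not rely on how stat_summary treats missing values.

```r
# Hypothetical helper (not part of ggplot2 or plyr): mean +/- one standard
# error, computed from the non-missing values only.
se_na <- function(x) {
  x <- x[!is.na(x)]               # drop NAs so length() counts real observations
  m <- mean(x)
  s <- sd(x) / sqrt(length(x))    # sd() is NA when fewer than 2 values remain
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

# Same toy vector idea as before: three observed values, two NAs.
res <- se_na(c(2, 4, NA, 6, NA))
```

It could then be tried in place of se, for example stat_summary(fun.data = se_na, geom = 'errorbar', width = 0.2). Groups with fewer than two non-missing scores would get NA error bars.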
Thanks for your help!
Stefan
Thanks Dennis!
Now it's all clear :))
With this '7 lines in one plot' approach I was just trying to increase the information-to-ink ratio. I was thinking of providing 4 average standard errors (one for the 7 trees per day) in the figure caption and no standard errors in the actual figure. I am not sure whether this is appropriate; I was just scoping out the options for presenting the data.
Thanks again!
Stefan