Hi All,
I am sure I am missing something very obvious, but I've been going in circles trying to figure out why this does not work. I have some (discrete) data that were measured at two time points, T1 and T2. I want to plot all the individual lines (a la a spaghetti plot), and then add an overall line of best fit. This sample data illustrates my problem and is fairly representative of my actual data.
library(ggplot2)
# An example of my data
set.seed(1716)
sampdat <- data.frame(ids = rep(1:50, each = 2),
                      variable = factor(rep(c(0:1), 50)),
                      value = sample(0:4, 100, replace = TRUE))
# individual lines, works fine
ggplot(data = sampdat, aes(x = variable, y = value, group = ids)) +
  geom_line()
# How I *thought* I should add the overall trend line
ggplot(data = sampdat, aes(x = variable, y = value, group = ids)) +
  geom_line() +
  stat_smooth(aes(group = 1), method = "lm", size = 2, colour = "blue")
Thanks for your help,
Josh
FWIW:
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: i486-pc-linux-gnu (32-bit)
other attached packages:
[1] ggplot2_0.8.8 proto_0.3-8 reshape_0.8.3 plyr_1.2.1
--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
--
You received this message because you are subscribed to the ggplot2 mailing list.
Please provide a reproducible example: http://gist.github.com/270442
To post: email ggp...@googlegroups.com
To unsubscribe: email ggplot2+u...@googlegroups.com
More options: http://groups.google.com/group/ggplot2
Hi Josh:
I think the problem is that group = 1 is being asked to serve a double purpose here. In the book example of a spaghetti plot (sec. 4.9.3), geom_smooth(aes(group = 1)) was applied to a continuous x variable and rendered an overall mean + SE envelope of the individual profiles. When the x-variable is a factor, one uses geom_line(aes(group = 1)) to plot a line across the levels of that factor. You're apparently trying to do both in one call, and evidently ggplot() is balking :)
On Wed, Oct 27, 2010 at 5:39 PM, Joshua Wiley <jwiley...@gmail.com> wrote:
> # How I *thought* I should add the overall trend line
> ggplot(data = sampdat, aes(x = variable, y = value, group = ids)) +
> geom_line() + stat_smooth(aes(group = 1), method = "lm", size = 2, colour = "blue")
How about something like this?
library(plyr)  # provides ddply() and summarise()
# Get mean/SD summaries from each level of variable:
dsumm <- ddply(sampdat, .(variable), summarise, m = mean(value), s = sd(value))
# Spaghetti plots
g <- ggplot(data = sampdat, aes(x = variable)) +
  geom_line(aes(y = value, group = ids))
# Mean line with geom_ribbon to add SDs
g + geom_line(data = dsumm, aes(y = m, group = 1), color = 'blue', size = 1) +
  geom_ribbon(data = dsumm, aes(ymin = m - s, ymax = m + s, group = 1),
              colour = 'gray80', alpha = I(0.2))
Not perfect, but perhaps enough to get you started.
HTH,
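As a cross-check on the mean line above: with only two levels of variable, an ordinary least-squares line through all the points passes exactly through the two level means, so connecting the means should give the same line that method = "lm" would draw. A minimal base R sketch, reusing the sampdat example from this thread (the names means, fit, pred and the numeric column x are introduced here purely for illustration):

```r
set.seed(1716)
sampdat <- data.frame(ids = rep(1:50, each = 2),
                      variable = factor(rep(0:1, 50)),
                      value = sample(0:4, 100, replace = TRUE))

# Mean of value at each level of variable (base R, no plyr needed)
means <- tapply(sampdat$value, sampdat$variable, mean)

# Least-squares fit with variable treated as a 0/1 numeric predictor
# (x is an illustrative helper column, not part of the original data)
sampdat$x <- as.numeric(as.character(sampdat$variable))
fit <- lm(value ~ x, data = sampdat)
pred <- predict(fit, newdata = data.frame(x = c(0, 1)))

# The two columns agree up to floating-point error
print(cbind(mean = as.vector(means), lm = unname(pred)))
```

So for two time points, the geom_line() over the summary means is already the line of best fit; only the uncertainty band needs a separate choice (SD, SE, or a confidence interval).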
This may technically be a bug, but stat_smooth assumes that if you
want to fit a smooth line then you have at least 3 x values.
Hadley
--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
Thanks Hadley; that seems reasonable. Perhaps there could be a
warning (there may be a very good reason you did not add one)?
http://github.com/hadley/ggplot2/blob/master/R/stat-smooth.r
lines 16 - 19
############################
if (length(unique(data$x)) <= 2) {
  # Not enough data to perform fit
  message("geom_smooth: There must be at least 3 unique x values. ",
          "No smooth will be added.")
  return(data.frame())
}
############################
I'm a bit torn on this issue - there's a similar problem with
geom_line (which needs at least two points). The problem is that it
often crops up for a single group in a single panel, and if you
already know about it, then it's a bit of a pain. If I'm going to do
it in one place, I really should do it everywhere to be consistent
(and have some argument to turn it off?)
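One way the "argument to turn it off" idea could look is sketched below. This is only an illustration: has_enough_x(), min_unique, and quiet are hypothetical names made up for this example and are not part of ggplot2.

```r
# Hypothetical helper, NOT part of ggplot2: reports whether x has enough
# unique values for a smooth, with an argument to silence the message.
has_enough_x <- function(x, min_unique = 3, quiet = FALSE) {
  enough <- length(unique(x)) >= min_unique
  if (!enough && !quiet) {
    message("There must be at least ", min_unique, " unique x values. ",
            "No smooth will be added.")
  }
  enough
}

has_enough_x(c(0, 1, 0, 1))                # FALSE, with a message
has_enough_x(c(0, 1, 0, 1), quiet = TRUE)  # FALSE, silently
has_enough_x(1:10)                         # TRUE
```

Because the note is emitted with message() rather than warning(), a caller who already knows about the situation could also wrap the call in suppressMessages() instead of passing an argument.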
As long as it is just a message, to me it does not seem any more of a
pain than the warning when missing values are removed.
> it in one place, I really should do it everywhere to be consistent
> (and have some argument to turn it off?)
I appreciate why you are torn; warning that a line requires two points
is rather obvious. I was initially thrown because lm() will fit a
line to two values, but the default for stat_smooth is se = TRUE,
which obviously requires at least 3 values, suggesting that it
actually followed the expected behavior. Lacking a clearly "better"
or "right" choice, staying with the status quo seems most efficient.
Josh
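The point about lm() and standard errors can be checked in base R. The sketch below uses made-up data (two_pts and many are illustrative names) and only calls lm() directly; it says nothing about what stat_smooth does internally. With exactly two observations the line fits perfectly and no residual degrees of freedom remain for a standard error; with many observations spread over the same two x values, lm() does have residual degrees of freedom, so the check in stat_smooth is a restriction on unique x values rather than on the number of observations.

```r
# Two observations, two unique x values: the line fits exactly,
# leaving no residual degrees of freedom to estimate a standard error.
two_pts <- data.frame(x = c(0, 1), y = c(1, 3))
fit2 <- lm(y ~ x, data = two_pts)
coef(fit2)          # intercept 1, slope 2
df.residual(fit2)   # 0

# Many observations at the same two x values: residual degrees of
# freedom are positive, so lm() can estimate standard errors.
set.seed(1716)
many <- data.frame(x = rep(0:1, 50),
                   y = sample(0:4, 100, replace = TRUE))
fit100 <- lm(y ~ x, data = many)
df.residual(fit100) # 98
se100 <- coef(summary(fit100))[, "Std. Error"]
se100
```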