As far as the plot goes, here's one way to get the pooled fitted line
(I'm assuming you still want the points to be distinguished by group):
ggplot(mid_lift_force.df, aes(x=B, y=D)) +
    theme_bw() +
    stat_smooth(method="lm", size=0.75) +
    geom_point(size=3, aes(colour=A, shape=A))
Some points to mention:
(1) In this case, grouping is overkill - mapping the levels of A to
colors and shapes is enough. [In fact, for one factor, only one
aesthetic is necessary.] There are situations where grouping is
required, but this isn't one of them.
(2) One of the key things to learn in ggplot2 is that there is an
important distinction between plot aesthetics that are *mapped* and
plot aesthetics that are *set*. By aesthetic, we mean things like
color, shape, size or linetype. Mapping variables to aesthetics means
that different values of a variable are assigned to unique values of
an aesthetic; to effect this, the mapping is done inside an aes()
call. All mapped aesthetics generate a legend by default,
although it is possible to suppress them. Assignment of a constant
value to an aesthetic is done outside the aes() call.
For example, in the code above, the size aesthetic is set to 0.75 in
stat_smooth() and the point size to 3 in geom_point(). No size legend
is created because these are defined outside aes(). For geom_point(),
color and shape are mapped to the levels of the factor A so a legend
is generated. Since both are mapped to the same variable, only one
legend is generated. (It's not always this easy, though: legends are
merged only when they have the same title, breaks and labels.)
(3) ggplot2 is a graphics package; although it is possible to extract
certain values from a ggplot, it's usually less of a headache to get
numerical information outside of ggplot2. Since the fitted model in
this example is linear, it's particularly easy to get what you want:
> m1 <- lm(D ~ B, data = mid_lift_force.df)
> summary(m1)[['coefficients']]   ## or equivalently, summary(m1)$coefficients
            Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 1.983870 0.26595544  7.459408 2.993534e-04
B           2.799899 0.07757535 36.092636 3.016846e-08
> preds <- predict(m1, interval = 'confidence')
> pred.df <- data.frame(B = mid_lift_force.df$B, preds)
> pred.df
         B       fit       lwr       upr
1 0.000000  1.983870  1.333101  2.634640
2 4.200000 13.743446 13.282107 14.204785
3 5.000000 15.983365 15.420486 16.546244
4 0.000000  1.983870  1.333101  2.634640
5 1.322774  5.687503  5.218441  6.156564
6 2.645547  9.391135  9.013475  9.768795
7 4.200000 13.743446 13.282107 14.204785
8 5.000000 15.983365 15.420486 16.546244
See below for more comments.
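To make the mapped-versus-set distinction from point (2) concrete, here is a minimal sketch. The data frame `toy` is a made-up stand-in for mid_lift_force.df (its values are invented for illustration), and the object names are arbitrary.

```r
library(ggplot2)

# Hypothetical stand-in for mid_lift_force.df: a two-level factor A,
# a numeric predictor B and a response D (values invented).
toy <- data.frame(
  A = factor(rep(c("T20", "T60"), each = 4)),
  B = c(0, 1, 3, 5, 0, 1, 3, 5),
  D = c(2.1, 4.6, 10.5, 16.2, 1.8, 5.0, 10.1, 15.8)
)

# Mapped: colour varies with A inside aes(), so a legend is drawn.
p_mapped <- ggplot(toy, aes(x = B, y = D)) +
  geom_point(aes(colour = A), size = 3)

# Set: colour is a constant outside aes(), so no legend is drawn.
p_set <- ggplot(toy, aes(x = B, y = D)) +
  geom_point(colour = "red", size = 3)

# Mapped, but with the legend suppressed for this layer.
p_hidden <- ggplot(toy, aes(x = B, y = D)) +
  geom_point(aes(colour = A), size = 3, show.legend = FALSE)
```

Printing p_mapped, p_set and p_hidden in turn should show a colour legend only for the first one. In all three, size = 3 is set rather than mapped, so it never produces a legend.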
With respect to the last sentence, it isn't necessary. By defining B
and D as aesthetics in ggplot(), all subsequent layers are expected to
use B as the x-variable and D as the y-variable. (Each geom defines a
separate layer.) This is another subtlety of ggplot2 that bites people
sometimes (been there).
> 2. How would you define the legend so that the geom_point and the
> stat_smooth show up on the same key (point and line)?
Assuming this is a follow-up to the code where you *did* fit separate
lines by A groups, it appears to me that ggplot() did what you asked
for. There is one legend which has different colors, shapes and
linetypes to represent two groups. What were you expecting that was
different from what was rendered?
>
> 3. How do I get the information (fit coefficients, confidence intervals etc)
> calculated by stat_smooth?
I showed how above, but if you insist on doing it the hard way :), try
p <- ggplot(mid_lift_force.df, aes(x=B, y=D)) +
    theme_bw() +
    stat_smooth(method="lm", size=0.75) +
    geom_point(size=3, aes(colour=A, group=A, shape=A))
str(ggplot_build(p))
Look in the $data$`1` list component at the top to find where the
predicted values and confidence limits used in the plot reside. To
access it, try
u <- ggplot_build(p)
predictDF <- u$data[[1]]   # data frame built for the first layer, stat_smooth()
head(predictDF)
You'll notice this data frame has 80 observations instead of 8. The
set of x's at which prediction takes place in geom_smooth() is equally
spaced at reasonably small intervals so that the plot renders
smoothly. The point of showing you this is to underline that it is
often easier to extract numeric information outside of
ggplot2 rather than inside it. There are many functions in R to do
things like groupwise summary statistics, model fits, etc. more
quickly and more transparently than through output from ggplot2. What
you *should* learn is which functions are used in the code to render
the plot. For example, there are several fitting methods one can
pass to geom_smooth(), but the default for small data sets is loess(). Knowing these
things makes it easier to extract information from the underlying R
functions themselves rather than rely on ggplot2 to save the
intermediate results for you. It wasn't designed for that purpose. In
fact, the output of ggplot_build() corresponds to the information
required to render the ggplot object (here, p).
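As a rough illustration of doing the same thing outside ggplot2, the 80-point band can be rebuilt directly from lm() and predict(). This is only a sketch: `toy` is a made-up stand-in for mid_lift_force.df with invented values, and it assumes the default stat_smooth() settings of n = 80 grid points and a 95% confidence level.

```r
# Hypothetical stand-in for mid_lift_force.df (values invented for illustration).
toy <- data.frame(
  B = c(0, 4.2, 5, 0, 1.3, 2.6, 4.2, 5),
  D = c(2.0, 13.9, 16.1, 1.9, 5.5, 9.6, 13.6, 15.9)
)

m1 <- lm(D ~ B, data = toy)

# 80 equally spaced x values across the observed range, mirroring the
# default grid size used by stat_smooth() (n = 80).
grid <- data.frame(B = seq(min(toy$B), max(toy$B), length.out = 80))

# Fitted values plus 95% confidence limits at each grid point.
band <- cbind(grid, predict(m1, newdata = grid, interval = "confidence"))
head(band)
```

The columns fit, lwr and upr in band play the same role as the fitted line and ribbon limits stored in the built plot, but here the model object m1 is also available for summary(), confint() and so on.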
HTH,
Dennis
>
> Thanks!
Here are a couple of ways you could do it. The first extends your
'working' example, which produces warnings about using ColorBrewer
palettes with fewer than three levels; the second uses the more
'hands-on' manual scale for color.
ggplot(mid_lift_force.df, aes(x=B, y=D)) +
    theme_bw() +
    stat_smooth(method="lm", size=0.75, colour = 'blue') +
    geom_point(size=3, aes(colour=A, shape=A)) +
    scale_colour_brewer(palette="Set1",
                        name="Temperature",
                        breaks = levels(mid_lift_force.df$A),
                        labels = c('20', '60')) +
    scale_shape_discrete(name="Temperature",
                         breaks = levels(mid_lift_force.df$A),
                         labels = c('20', '60'))
ggplot(mid_lift_force.df, aes(x=B, y=D)) +
    theme_bw() +
    stat_smooth(method="lm", size=0.75) +
    geom_point(size=3, aes(colour=A, shape=A)) +
    scale_colour_manual('Temperature (C)',
                        breaks = levels(mid_lift_force.df$A),
                        values = c('red', 'blue'),
                        labels = c('20', '60')) +
    scale_shape_discrete('Temperature (C)',
                         breaks = levels(mid_lift_force.df$A),
                         labels = c('20', '60'))
You could also use scale_shape_manual() and use the values = argument
to set specific plotting characters in a similar fashion to what I did
with scale_colour_manual(). The key is that the titles, breaks and
labels are the same in both scale_* calls - that's what allows the
legends to merge.
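A sketch of that scale_shape_manual() variant follows. The data frame `toy` is a hypothetical stand-in for mid_lift_force.df (values invented), and the shape codes 16 and 17 (filled circle and filled triangle) are just one possible choice.

```r
library(ggplot2)

# Hypothetical stand-in for mid_lift_force.df (values invented).
toy <- data.frame(
  A = factor(rep(c("T20", "T60"), each = 4)),
  B = c(0, 1, 3, 5, 0, 1, 3, 5),
  D = c(2.1, 4.6, 10.5, 16.2, 1.8, 5.0, 10.1, 15.8)
)

# Define title, breaks and labels once so both scales are guaranteed to agree.
temp_title  <- "Temperature (C)"
temp_breaks <- levels(toy$A)
temp_labels <- c("20", "60")

p <- ggplot(toy, aes(x = B, y = D)) +
  theme_bw() +
  stat_smooth(method = "lm", formula = y ~ x) +
  geom_point(size = 3, aes(colour = A, shape = A)) +
  scale_colour_manual(name = temp_title, breaks = temp_breaks,
                      values = c("red", "blue"), labels = temp_labels) +
  scale_shape_manual(name = temp_title, breaks = temp_breaks,
                     values = c(16, 17), labels = temp_labels)
```

Storing the title, breaks and labels in variables is only a convenience; it keeps the two scale_* calls from drifting apart, which is the usual reason two legends appear instead of one.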
Dennis