Avoid fitting stat_smooth to data groups

pacificprince

Dec 2, 2011, 4:31:27 PM

to ggp...@googlegroups.com

Dear R users,
I am new to ggplot2 and R and have been scrounging around the internet to get ggplot2 to plot what I want. For the most part I have figured everything out, but this one has me stumped. Here is my data:

A      B             C     D       E
20C    0             1     2       3
20C    4.2           13    14      15
20C    5             15    16.5    18
60C    0             1     2       3
60C    1.32277357    5     6       7
60C    2.64554715    8     9       10
60C    4.2           12    13      14
60C    5             15    16      17
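
For anyone who wants to reproduce the plots in this thread, the table above can be entered as a data frame. This is only a minimal sketch: the name mid_lift_force.df is taken from the code further down, while treating A as a factor and the other columns as numeric is an assumption about how the original data were read in.

```r
# Sketch of the data shown above; column types are assumed, not confirmed.
mid_lift_force.df <- data.frame(
  A = factor(c("20C", "20C", "20C", "60C", "60C", "60C", "60C", "60C")),
  B = c(0, 4.2, 5, 0, 1.32277357, 2.64554715, 4.2, 5),
  C = c(1, 13, 15, 1, 5, 8, 12, 15),
  D = c(2, 14, 16.5, 2, 6, 9, 13, 16),
  E = c(3, 15, 18, 3, 7, 10, 14, 17)
)

# Inspect the structure: 8 observations of 5 variables.
str(mid_lift_force.df)
```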

I have three issues:

1. I wish to plot B vs D using stat_smooth, but I DO NOT want two separate fits for the 20C and 60C cases. Here is my code, which gives two separate fits:
ggplot(mid_lift_force.df, aes(x=B, y=D)) +
    theme_bw() +
    stat_smooth(method="lm",
                aes(linetype=A, color=A, shape=A, fill=A), size=0.75) +
    geom_point(size=3, aes(colour=A, group=A, shape=A))

I tried adding x=B and y=D in the aes for stat_smooth, but this does not work. Maybe I am doing something wrong.

2. How would you define the legend so that the geom_point and the stat_smooth show up on the same key (point and line)?

3. How do I get the information (fit coefficients, confidence intervals etc) calculated by stat_smooth?

Thanks!

Dennis Murphy

Dec 2, 2011, 6:30:31 PM

to ggp...@googlegroups.com

Hi:

As far as the plot goes, here's one way to get the pooled fitted line
(I'm assuming you still want the points to be distinguished by group):

ggplot(mid_lift_force.df, aes(x=B, y=D)) +
    theme_bw() +
    stat_smooth(method="lm", size=0.75) +
    geom_point(size=3, aes(colour=A, shape=A))

Some points to mention:
(1) In this case, grouping is overkill - mapping the levels of A to
colors and shapes is enough. [In fact, for one factor, only one
aesthetic is necessary.] There are situations where grouping is
required, but this isn't one of them.

(2) One of the key things to learn in ggplot2 is that there is an
important distinction between plot aesthetics that are *mapped* and
plot aesthetics that are *set*. By aesthetic, we mean things like
color, shape, size or linetype. Mapping variables to aesthetics means
that different values of a variable are assigned to unique values of
an aesthetic; to effect this, the mapping is done inside an aes()
call. All mapped aesthetics generate a legend by default,
although it is possible to suppress them. Assignment of a constant
value to an aesthetic is done outside the aes() call.

For example, in the code above, the size aesthetic is set to 0.75 in
stat_smooth() and the point size to 3 in geom_point(). No size legend
is created because these are defined outside aes(). For geom_point(),
color and shape are mapped to the levels of the factor A so a legend
is generated. Since both aesthetics are mapped to the same variable,
only one legend is generated. (It's not always this easy, though:
legends are merged when they have the same title, breaks and labels.)

(3) ggplot2 is a graphics package; although it is possible to extract
certain values from a ggplot, it's usually less of a headache to get
numerical information outside of ggplot2. Since the fitted model in
this example is linear, it's particularly easy to get what you want:

> m1 <- lm(D ~ B, data = mid_lift_force.df)
> summary(m1)[['coefficients']]  ## or equivalently, summary(m1)$coefficients
            Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 1.983870 0.26595544  7.459408 2.993534e-04
B           2.799899 0.07757535 36.092636 3.016846e-08
> preds <- predict(m1, interval = 'confidence')
> pred.df <- data.frame(B = mid_lift_force.df$B, preds)
> pred.df
         B       fit       lwr       upr
1 0.000000  1.983870  1.333101  2.634640
2 4.200000 13.743446 13.282107 14.204785
3 5.000000 15.983365 15.420486 16.546244
4 0.000000  1.983870  1.333101  2.634640
5 1.322774  5.687503  5.218441  6.156564
6 2.645547  9.391135  9.013475  9.768795
7 4.200000 13.743446 13.282107 14.204785
8 5.000000 15.983365 15.420486 16.546244

See below for more comments.

Regarding your attempt to add x=B and y=D to the aes() of
stat_smooth(): it isn't necessary. By defining B and D as aesthetics
in ggplot(), all subsequent layers are expected to use B as the
x-variable and D as the y-variable. (Each geom defines a separate
layer.) This is another subtlety of ggplot2 that bites people
sometimes (been there).
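
The grouping and inheritance behaviour described above can be checked directly. The following is a minimal sketch, not part of the original exchange: it assumes a recent ggplot2 release in which layer_data() is available, rebuilds the data under the short name df (with A assumed to be a factor), and passes formula = y ~ x explicitly only to silence the informational message.

```r
library(ggplot2)

# Data from the original question (A assumed to be a factor).
df <- data.frame(
  A = factor(rep(c("20C", "60C"), times = c(3, 5))),
  B = c(0, 4.2, 5, 0, 1.32277357, 2.64554715, 4.2, 5),
  D = c(2, 14, 16.5, 2, 6, 9, 13, 16)
)

# colour = A is mapped inside the smooth layer, so the fit is done per group.
p_grouped <- ggplot(df, aes(x = B, y = D)) +
  stat_smooth(method = "lm", formula = y ~ x, aes(colour = A)) +
  geom_point(aes(colour = A, shape = A))

# A is mapped only in the point layer, so the smooth layer sees one group.
# x and y are not repeated in either layer; both inherit them from ggplot().
p_pooled <- ggplot(df, aes(x = B, y = D)) +
  stat_smooth(method = "lm", formula = y ~ x) +
  geom_point(aes(colour = A, shape = A))

# layer_data() returns the computed data for one layer (1 = smooth, 2 = points).
n_groups_grouped <- length(unique(layer_data(p_grouped, 1)$group))
n_groups_pooled  <- length(unique(layer_data(p_pooled, 1)$group))
n_point_colours  <- length(unique(layer_data(p_pooled, 2)$colour))

c(grouped = n_groups_grouped, pooled = n_groups_pooled,
  point_colours = n_point_colours)
```

The pooled plot keeps two point colours while fitting a single line, because the mapping to A lives only in the geom_point() layer.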

> 2. How would you define the legend so that the geom_point and the
> stat_smooth show up on the same key (point and line)?

Assuming this is a follow-up to the code where you *did* fit separate
lines by the levels of A, it appears to me that ggplot() did what you asked
for. There is one legend which has different colors, shapes and
linetypes to represent two groups. What were you expecting that was
different from what was rendered?

>
> 3. How do I get the information (fit coefficients, confidence intervals etc)
> calculated by stat_smooth?

I showed how above, but if you insist on doing it the hard way :), try

p <- ggplot(mid_lift_force.df, aes(x=B, y=D)) +
    theme_bw() +
    stat_smooth(method="lm", size=0.75) +
    geom_point(size=3, aes(colour=A, group=A, shape=A))
str(ggplot_build(p))

Look in the $data$`1` list component at the top to find where the
predicted values and confidence limits used in the plot reside. To
access it, try

u <- ggplot_build(p)
predictDF <- u$data[[1]][[1]]
head(predictDF)

You'll notice this data frame has 80 observations instead of 8. The
set of x's at which prediction takes place in geom_smooth() is equally
spaced at reasonably small intervals so that the plot renders
smoothly. The point of showing you this is to underline that it is
often easier to extract numeric information outside of
ggplot2 rather than inside it. There are many functions in R to do
things like groupwise summary statistics, model fits, etc. more
quickly and more transparently than through output from ggplot2. What
you *should* learn is which functions are used in the code to render
the plot. For example, there are several types of functions one can
pass through geom_smooth(), but the default is loess(). Knowing these
things makes it easier to extract information from the underlying R
functions themselves rather than rely on ggplot2 to save the
intermediate results for you. It wasn't designed for that purpose. In
fact, the output of ggplot_build() corresponds to the information
required to render the ggplot object (here, p).
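
To make the "80 observations" remark concrete, the same predictions can be reproduced outside ggplot2 by predicting on an equally spaced grid. This is a sketch under stated assumptions: it uses layer_data() from a recent ggplot2 release to pull the smooth layer's data (the list layout returned by ggplot_build() may differ between versions), assumes the grid size of 80 matches the default n of stat_smooth(), and rebuilds the data as df with A assumed to be a factor.

```r
library(ggplot2)

# Data from the original question (A assumed to be a factor).
df <- data.frame(
  A = factor(rep(c("20C", "60C"), times = c(3, 5))),
  B = c(0, 4.2, 5, 0, 1.32277357, 2.64554715, 4.2, 5),
  D = c(2, 14, 16.5, 2, 6, 9, 13, 16)
)

# Pooled linear fit, as in the lm() call earlier in this message.
m1 <- lm(D ~ B, data = df)

# 80 equally spaced x values over the observed range of B.
grid <- data.frame(B = seq(min(df$B), max(df$B), length.out = 80))
grid_pred <- cbind(grid, predict(m1, newdata = grid, interval = "confidence"))

# Data computed by ggplot2 for the smooth layer (layer 1 of the plot).
p <- ggplot(df, aes(x = B, y = D)) +
  stat_smooth(method = "lm", formula = y ~ x) +
  geom_point(aes(colour = A, shape = A))
smooth_df <- layer_data(p, 1)

nrow(smooth_df)                        # number of prediction points in the layer
max(abs(smooth_df$y - grid_pred$fit))  # discrepancy between the two sets of fits
```

The columns fit, lwr and upr in grid_pred play the same role as y, ymin and ymax in the layer data, which is why working with lm() and predict() directly is usually the more transparent route.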

HTH,
Dennis


pacificprince

Dec 2, 2011, 8:54:08 PM

to ggp...@googlegroups.com

Hi Dennis,
Thanks a ton for your prompt and clear response! You saved me many hours of playing around with this. Consider me schooled! I am planning to read the ggplot2 book by Hadley, but I guess that will have to wait till Christmas break :). In any case, I now better understand the philosophy behind the way ggplot2 works and will hopefully understand what I am doing.

As regards the legend, I was trying to customize it through the following code:

    + scale_colour_brewer(palette="Set1",
                          name="Temperature (C)",
                          breaks=c(20, 60),
                          labels=c("20", "60")) +
      scale_shape_discrete(name="Temperature (C)",
                           breaks=c(20, 60),
                           labels=c("20", "60"));

which does not produce the symbols next to the text, but this works:

    + scale_colour_brewer(palette="Set1",
                          name="Temperature") +
      scale_shape_discrete(name="Temperature");

I suspect this has something to do with the subtleties you mentioned in the 2nd point, but I will take what I get for now.

I have attached my plot (there are many other options I have customized (copy-pasted from places all over the interwebs...))

plot-mid-lift-force.png

Dennis Murphy

Dec 2, 2011, 9:28:46 PM

to ggp...@googlegroups.com

Hi:

Here are a couple ways you could do it; the first extends your
'working' example, which produces warnings about using ColorBrewer
palettes with fewer than three levels, while the second uses the more
'hands-on' manual scale for color.

ggplot(mid_lift_force.df, aes(x=B, y=D)) +
    theme_bw() +
    stat_smooth(method="lm", size=0.75, colour = 'blue') +
    geom_point(size=3, aes(colour=A, shape=A)) +
    scale_colour_brewer(palette="Set1",
                        name="Temperature",
                        breaks = levels(mid_lift_force.df$A),
                        labels = c('20', '60')) +
    scale_shape_discrete(name="Temperature",
                         breaks = levels(mid_lift_force.df$A),
                         labels = c('20', '60'))

ggplot(mid_lift_force.df, aes(x=B, y=D)) +
    theme_bw() +
    stat_smooth(method="lm", size=0.75) +
    geom_point(size=3, aes(colour=A, shape=A)) +
    scale_colour_manual('Temperature (C)',
                        breaks = levels(mid_lift_force.df$A),
                        values = c('red', 'blue'),
                        labels = c('20', '60')) +
    scale_shape_discrete('Temperature (C)',
                         breaks = levels(mid_lift_force.df$A),
                         labels = c('20', '60'))

You could also use scale_shape_manual() and use the values = argument
to set specific plotting characters in a similar fashion to what I did
with scale_colour_manual(). The key is that the scale names (titles),
breaks and labels are the same in both scale_* calls - that's what
allows them to merge.
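
The scale_shape_manual() variant mentioned above might look like the following. This is a hedged sketch rather than code from the thread: the shape codes 16 and 17 (filled circle and filled triangle) are illustrative choices, the data are rebuilt as df with A assumed to be a factor, and layer_data() from a recent ggplot2 release is used only to inspect the result.

```r
library(ggplot2)

# Data from the original question (A assumed to be a factor).
df <- data.frame(
  A = factor(rep(c("20C", "60C"), times = c(3, 5))),
  B = c(0, 4.2, 5, 0, 1.32277357, 2.64554715, 4.2, 5),
  D = c(2, 14, 16.5, 2, 6, 9, 13, 16)
)

lev <- levels(df$A)  # breaks must match the factor levels, i.e. "20C" and "60C"

# Both scales share the same name, breaks and labels so the legends can merge.
p <- ggplot(df, aes(x = B, y = D)) +
  theme_bw() +
  stat_smooth(method = "lm", formula = y ~ x) +
  geom_point(size = 3, aes(colour = A, shape = A)) +
  scale_colour_manual(name = "Temperature (C)", breaks = lev,
                      values = c("red", "blue"), labels = c("20", "60")) +
  scale_shape_manual(name = "Temperature (C)", breaks = lev,
                     values = c(16, 17), labels = c("20", "60"))

# Inspect the colour/shape pairs actually assigned to the points (layer 2).
pts <- layer_data(p, 2)
unique(pts[, c("colour", "shape")])
```

Note that the breaks here are the factor levels themselves, not the numbers 20 and 60; breaks that do not match any level of A leave the legend keys without symbols.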

Dennis

pacificprince

Dec 2, 2011, 10:03:05 PM

to ggp...@googlegroups.com

Perfect, worked like a charm! I chose scale_colour_brewer since I use it for my other plots and it gives a consistent appearance across all my graphics. I did realize that the labels and breaks needed to be identical across scale_* calls but was struggling with the exact syntax. Saved me another couple of hours... Thanks again!
