ggplot2 and weights


jebyrnes

Jan 25, 2011, 5:52:00 PM
to ggplot2
I'm trying to plot a regression in ggplot2 for which I have weighted
data. I've tried something like

qplot(hp, mpg, size=cyl, data=mtcars)+stat_smooth(method="lm",
weights=cyl)

However, this merely gives an error that cyl is not found or that
weights is matched by multiple arguments. Can weights be used with
stat_smooth?

Dennis Murphy

Jan 26, 2011, 4:48:47 AM
to jebyrnes, ggplot2
Hi:

I don't know if you can do weighted linear regression inside of ggplot2, but I wouldn't expect it to be so. Perhaps someone else can show how it's done, if at all. Looking at the help page of stat_smooth(), there is no input argument for weights, so it would seem to be safer to fit a weighted model outside of ggplot2. Below is some code for weighted simple linear regression outside of ggplot2, where I created a data frame of predictions and then generated the equivalent of geom_smooth() manually.

library(ggplot2)

# Weighted fit; predict() with an interval returns columns fit, lwr and upr
wtdlm <- lm(mpg ~ hp, data = mtcars, weights = cyl)
mp <- as.data.frame(cbind(hp = mtcars$hp,
                          predict(wtdlm, interval = 'confidence')))
# Base plot
p1 <- ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point(aes(size = cyl))
# Add the fitted model + confidence envelope
p1 + geom_line(data = mp, aes(x = hp, y = fit), size = 1, color = 'blue') +
    geom_line(data = mp, aes(x = hp, y = lwr), color = 'gray80') +
    geom_line(data = mp, aes(x = hp, y = upr), color = 'gray80') +
    geom_ribbon(data = mp, aes(x = hp, ymin = lwr, ymax = upr), alpha = 0.1)

It's not neatly packaged, but neither is it overly hard to produce. If you don't want the confidence band, eliminate the interval argument in predict() and name the result explicitly (fit = predict(wtdlm)) so that the column is still called fit; then, the single line of p1 + geom_line(...) without the plus sign at the end should be enough.

If you were not already aware, one of the nice things about ggplot2 is that you can add _compatible_ layers using data in different data frames. In this case, 'compatible' means the same x-variable and the same y-scale.

You can see that the fitted line above is not the same as the fitted LS line from geom_smooth() with

last_plot() + geom_smooth(data = mtcars, method = 'lm', size = 1, color = 'red', se = FALSE)

HTH,
Dennis


--
You received this message because you are subscribed to the ggplot2 mailing list.
Please provide a reproducible example: http://gist.github.com/270442

To post: email ggp...@googlegroups.com
To unsubscribe: email ggplot2+u...@googlegroups.com
More options: http://groups.google.com/group/ggplot2

Hadley Wickham

Jan 26, 2011, 9:28:59 AM
to Dennis Murphy, jebyrnes, ggplot2
> I don't know if you can do weighted linear regression inside of ggplot2, but
> I wouldn't expect it to be so. Perhaps someone else can show how it's done,
> if at all. Looking at the help page of stat_smooth(), there is no input
> argument for weights, so it would seem to be safer to fit a weighted model
> outside of ggplot2. Below is some code for weighted simple linear regression
> outside of ggplot2, where I created a data frame of predictions and then
> generated the equivalent of geom_smooth() manually.

Such doubt and skepticism! ;)

You just need to use weight as an aesthetic:

qplot(hp, mpg, size=cyl, data=mtcars)+
stat_smooth(method="lm", aes(weight=cyl))

But I totally agree that when you're doing a more complicated model
it's much better to do it outside of ggplot so that you can check that
it's doing what you think it's doing.

Hadley


--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

Dennis Murphy

Jan 26, 2011, 10:02:27 AM
to Hadley Wickham, jebyrnes, ggplot2
On Wed, Jan 26, 2011 at 6:28 AM, Hadley Wickham <had...@rice.edu> wrote:
>> I don't know if you can do weighted linear regression inside of ggplot2, but
>> I wouldn't expect it to be so. Perhaps someone else can show how it's done,
>> if at all. Looking at the help page of stat_smooth(), there is no input
>> argument for weights, so it would seem to be safer to fit a weighted model
>> outside of ggplot2. Below is some code for weighted simple linear regression
>> outside of ggplot2, where I created a data frame of predictions and then
>> generated the equivalent of geom_smooth() manually.
>
> Such doubt and skepticism! ;)

It's how I learn from people like you, Kohske, Brian and Baptiste, among others. What's more fun than proving a skeptic wrong? :)

> You just need to use weight as an aesthetic:
>
> qplot(hp, mpg, size=cyl, data=mtcars)+
>  stat_smooth(method="lm", aes(weight=cyl))

Aaah, thanks! That's the same as what I got from my code...

Cheers,
Dennis
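
For anyone who wants to check the "same as what I got" claim numerically rather than by eye, here is a minimal sketch. It assumes a ggplot2 version that provides layer_data(), and it uses ggplot() in place of qplot(); the names p, sm, direct and same_fit are just illustrative. It pulls the fitted values that stat_smooth() draws and compares them with predictions from the lm() fit at the same hp values.

```r
library(ggplot2)

# Weighted fit done directly with lm(), as in the first reply
wtdlm <- lm(mpg ~ hp, data = mtcars, weights = cyl)

# The same model requested through the weight aesthetic
p <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(aes(size = cyl)) +
  stat_smooth(method = "lm", formula = y ~ x, aes(weight = cyl))

# Layer 2 is the smoother; x is its evaluation grid, y the fitted values
sm <- layer_data(p, 2)

# Predict from the lm() fit on the same grid and compare
direct <- predict(wtdlm, newdata = data.frame(hp = sm$x))
same_fit <- isTRUE(all.equal(unname(direct), sm$y))
same_fit
```

If same_fit comes back FALSE, the two approaches are not fitting the same model, which is a good reason to inspect the model outside of ggplot2 as suggested above.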