Enhancing the discoverability of the variables computed by stats and,
particularly, the variables extracted by fortify would be nice.
However, I don't have a good solution (a message would probably be
annoying). Any suggestion is welcome.
The name "fortify" itself is a bit cryptic too, even though I
understand the overall intention. as.data.frame() is a function that
most people already know and is a generic so why not simply define
methods for it: fortify.lm = as.data.frame.lm, fortify.map =
as.data.frame.map etc. ? Those do not seem to exist already. Does
anyone see a conflict?
JiHO
---
http://maururu.net
Because, in general, fortify takes two arguments: an object and a data
frame. as.data.frame only takes one.
Hadley
--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
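To make the two-argument point concrete, here is a minimal base R sketch. The helper add_fit() is hypothetical (it is not part of ggplot2); it only mirrors the (model, data) shape discussed above, and the mtcars data set is used purely for illustration.

```r
# Hypothetical helper with the same two-argument shape as fortify(model, data).
# By default it falls back on the model frame stored inside the lm object.
add_fit <- function(model, data = model$model) {
  data$.fitted <- predict(model)  # fitted values for the rows used in the fit
  data$.resid  <- resid(model)    # residuals for the same rows
  data
}

m <- lm(mpg ~ wt, data = mtcars)

# One argument: only the variables used in the formula are available.
names(add_fit(m))

# Two arguments: every column of the original data frame is kept,
# which an as.data.frame(model) method with a single argument could not offer.
names(add_fit(m, mtcars))
```

The second form is what makes it possible to map aesthetics to variables that were not part of the model formula.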
I'd rather make it easier to do the opposite:
mod <- gam(y ~ s(x) ,data = dat)
p <- ggplot(dat, aes(x=x, y=y)) + geom_point() + geom_smooth(model = mod)
I generally think it's a bad idea to rely on a graphics package to do
your statistics for you. If you're doing anything moderately
complicated, you should do it outside of ggplot2 and then add the
results on to a plot. I agree that the tools for doing this could be
better, and they will get better as the statistical functionality of
ggplot2 is extracted out into separate, stand-alone, well-documented
functions.
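As a sketch of the "do the statistics outside ggplot2, then add the results to the plot" workflow with tools that already exist: the example below swaps in lm() for gam() so that it needs no extra package, and uses mtcars as stand-in data. The grid size of 50 is an arbitrary choice.

```r
library(ggplot2)

# Fit the model outside of ggplot2.
fit <- lm(mpg ~ wt, data = mtcars)

# Predict on a regular grid of the predictor, with a confidence interval.
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50))
ci <- predict(fit, newdata = grid, interval = "confidence")
grid$mpg <- ci[, "fit"]
grid$lwr <- ci[, "lwr"]
grid$upr <- ci[, "upr"]

# Add the precomputed results to the plot as ordinary layers.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_ribbon(data = grid, aes(ymin = lwr, ymax = upr), alpha = 0.2) +
  geom_line(data = grid)
```

Nothing here is refitted by the plotting code; the layers only draw what predict() returned.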
Uh, OK. I never realized it: since most well-formed modeling functions
in R keep the data in the model object, I always extract it from there.
I thought the goal of fortify was to be able to do
ggplot(someModel) + geom_***
In this case, you can't specify both the model and the data, can you?
JiHO
---
http://maururu.net
I agree. And to continue on the teacher's point of view, I always
teach my students to compute the model manually and extract necessary
data from it (fitted values, residuals etc.) to make the plots.
Otherwise, they don't understand how the display of the model and the
actual modeling process are connected.
I think that ggplot can help with this data extraction part (cf
fortify) but shouldn't try to do the whole thing. A plot of the
results of the model is not enough anyway.
As for the syntax, what about
mod <- gam(y ~ s(x) ,data = dat)
ggplot(mod) + geom_point(aes(x=x, y=y)) + geom_line(aes(x=x, y=.fitted.values.))
with fortify handling the data extraction from mod behind the scenes.
This format would allow fine control and to produce any plot
imaginable. And this
autoplot(mod)
autoplot(mod, type="resid")
to produce automatically some commonly used plots (fit, residuals vs
fitted, qqplot of residuals etc.). This is the direction I took based
on your advice for factorial analysis plots. The
geom_smooth(model=mod) strikes me as kind of in-between: neither fully
automated nor giving complete control and I am not sure how it would
scale/generalize.
JiHO
---
http://maururu.net
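A rough sketch of what such an autoplot method could look like for lm objects. This is an assumption-laden illustration, not ggplot2's implementation: the method autoplot.lm() and its type values ("fit", "resid") are hypothetical, and the "fit" branch assumes a model with a single predictor.

```r
library(ggplot2)

# Hypothetical S3 method for the autoplot generic exported by ggplot2.
autoplot.lm <- function(object, type = c("fit", "resid"), ...) {
  type <- match.arg(type)

  # Extract the data from the model and add dotted variables to it.
  df <- object$model
  df$.fitted <- fitted(object)
  df$.resid  <- resid(object)

  if (type == "resid") {
    # Residuals versus fitted values.
    ggplot(df, aes(x = .fitted, y = .resid)) +
      geom_hline(yintercept = 0, linetype = 2) +
      geom_point()
  } else {
    # In a model frame the response comes first, then the predictor(s).
    vars <- names(object$model)
    ggplot(df, aes(x = .data[[vars[2]]], y = .data[[vars[1]]])) +
      geom_point() +
      geom_line(aes(y = .fitted))
  }
}

m <- lm(mpg ~ wt, data = mtcars)
p_fit   <- autoplot(m)
p_resid <- autoplot(m, type = "resid")
```

The extraction step at the top is the part fortify would handle; the rest is only a choice of default layers, which is why the fully automated and the fully manual routes can share the same data.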
Doesn't that work already? (modulo a missing fortify method for gam objects)
> This format would allow fine control and to produce any plot
> imaginable. And this
>
> autoplot(mod)
> autoplot(mod, type="resid")
>
> to produce automatically some commonly used plots (fit, residuals vs
> fitted, qqplot of residuals etc.). This is the direction I took based
> on your advice for factorial analysis plots. The
> geom_smooth(model=mod) strikes me as kind of in-between: neither fully
> automated nor giving complete control and I am not sure how it would
> scale/generalize.
The devel version has an autoplot generic for just this reason.
I agree with Hadley here. I often want to pass fitted model objects to
geom_smooth: if I've already fitted a model, why make ggplot fit it
again? I'm not sure how difficult this would be to implement, but a
wish of mine is to be able to pass any fitted model object whose class
has a predict method (e.g. lme):
mod <- lme(y ~ x, data = dat, random = ~ x | z)
p <- ggplot(dat) + facet_wrap( ~ z) + geom_point(aes(x=x, y=y)) +
  geom_smooth(model = mod)
I can already foresee some potential implementation issues, though:
lme predictions require the specification of the level in the
hierarchical structure of the model at which predictions are to be
made, and interval estimates of lme predictions are notoriously
difficult to make (so maybe force 'se = FALSE' in geom_smooth?). In any
case, it's exciting to me that you're thinking along these lines,
Hadley. I'll give it some more thought myself, in terms of specific
implementation ideas.
Steve
--
----------------------------------------------------
Steven C Walker
Postdoctoral researcher
Département de Sciences Biologiques
Université de Montréal
https://sites.google.com/site/stevencarlislewalker/
514-343-1233
----------------------------------------------------
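The level issue can be illustrated with what nlme already provides. The sketch below assumes the Orthodont data set shipped with nlme as a stand-in for dat, and it computes the predictions by hand rather than through geom_smooth; the column names .population and .subject are made up for this example.

```r
library(nlme)
library(ggplot2)

# Random intercept and slope per subject, as in the lme call above.
mod <- lme(distance ~ age, data = Orthodont, random = ~ age | Subject)

pred <- as.data.frame(Orthodont)
# level = 0: population-level predictions (fixed effects only).
pred$.population <- as.numeric(predict(mod, level = 0))
# level = 1: predictions including each subject's random effects.
pred$.subject <- as.numeric(predict(mod, level = 1))

# One panel per subject; no standard-error band is drawn.
p <- ggplot(pred, aes(x = age, y = distance)) +
  facet_wrap(~ Subject) +
  geom_point() +
  geom_line(aes(y = .subject)) +
  geom_line(aes(y = .population), linetype = 2)
```

The dashed line is identical in every panel while the solid line differs per subject, which is exactly the choice a model-aware smoother would have to expose.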
---------- Forwarded message ----------
From: JiHO <jo.l...@gmail.com>
Date: Fri, Jan 20, 2012 at 22:57
Subject: Re: Suggestion: enhancing the discovery of newly created variables
To: Hadley Wickham <had...@rice.edu>
On Fri, Jan 20, 2012 at 14:46, Hadley Wickham <had...@rice.edu> wrote:
>> mod <- gam(y ~ s(x) ,data = dat)
>> ggplot(mod) + geom_point(aes(x=x, y=y)) + geom_line(aes(x=x, y=.fitted.values.))
>>
>> with fortify handling the data extraction from mod behind the scenes.
>
> Doesn't that work already? (modulo a missing fortify method for gam objects)
It does. I was just pointing out that passing the model object to
geom_smooth is both:
- not giving as much power to the user as fortify, where the fortified
data can then be used with any geom
- not as automated as having autoplot() do a complete plot for you
I understand the conceptual appeal of it. I'm just not sure how well
it would generalize (what about multivariate models, lme as already
pointed out etc.). But maybe that's also because I more often use
techniques that don't easily sum up to a line +/- se.
>> This format would allow fine control and to produce any plot
>> imaginable. And this
>>
>> autoplot(mod)
>> autoplot(mod, type="resid")
>
> The devel version has an autoplot generic for just this reason.
Great. What's your take on this then: put the autoplot functions in
ggplot for now before moving them to a different package?
JiHO
---
http://maururu.net
On Fri, Jan 20, 2012 at 17:55, Winston Chang <winsto...@gmail.com> wrote:
> I have some code that takes a model object and generates a data frame that
> you then use with geom_line(). Here's roughly how it's used:
> pline <- predictdf( lm(y ~ x, data=dat) )
> ggplot(dat, aes(x=x, y=y)) + geom_point() + geom_line(data=pline)
>
> It probably wouldn't be hard to adapt geom_smooth/stat_smooth to do this...
> Maybe I'll give it a shot over the weekend. (I'll probably post progress to
> the ggplot2-dev list instead of this one.)
That seems like exactly what fortify is meant for. You should probably
wrap this into a method for fortify and follow its convention for
naming the variables (prepend a dot "." to extracted variables). See
fortify.lm() as an example.
> One thing I'm not sure about is how to deal with aesthetic mappings. Say you
> have a factor mapped to colour and want to add model lines for each one. You
> could add each one separately and manually set the color, but this seems
> clunky. I guess maybe you put the model objects in a list somehow, but you'd
> also have to preserve information about the factor(s) levels that each model
> object corresponds to. I don't see a really good way to do this. Thoughts?
I think that the "correct" way to do this would be to have the factor
in the model in a way that affects both the intercept and slope (i.e.
fit one separate model per factor level, but do this in one model
call) and then use this in ggplot. Example:
# simulated data: both slope and intercept depend on the factor a
set.seed(123)
x <- runif(20, 0, 10)
a <- factor(rep(c("foo", "bar"), each=10))
y <- x*(0.5*as.numeric(a)) + (3+as.numeric(a)) + runif(20, 0, 3)
d <- data.frame(x, y, a)
library("ggplot2")
ggplot(d) + geom_point(aes(x=x, y=y, colour=a))
# a shifts the intercept per level, a:x lets the slope vary per level
m <- lm(y ~ x + a + a:x, data=d)
# fortify() returns the data with .fitted and other dotted columns added
df <- fortify(m)
ggplot(df, aes(x=x, y=y, colour=a)) + geom_point() +
  geom_line(aes(y=.fitted))
Is that what you meant?
JiHO
---
http://maururu.net
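The claim that a single call with the interaction amounts to one separate fit per factor level can be checked directly in base R. This sketch reuses the simulated data from the example above; the object names joint and separate are only illustrative.

```r
# Same simulated data as in the example above.
set.seed(123)
x <- runif(20, 0, 10)
a <- factor(rep(c("foo", "bar"), each = 10))
y <- x * (0.5 * as.numeric(a)) + (3 + as.numeric(a)) + runif(20, 0, 3)
d <- data.frame(x, y, a)

# One model call with level-specific intercepts and slopes
# (y ~ x * a is shorthand for y ~ x + a + a:x).
joint <- lm(y ~ x * a, data = d)

# One separate regression per factor level.
separate <- lapply(split(d, d$a), function(part) lm(y ~ x, data = part))

# Put the per-level fitted values back into the original row order.
fitted_separate <- unsplit(lapply(separate, fitted), d$a)

# The two approaches give the same fitted lines (up to numerical tolerance).
same_fit <- isTRUE(all.equal(unname(fitted(joint)), unname(fitted_separate)))
```

What the joint model adds over the list of separate fits is a single object that keeps the factor alongside the fitted values, so the colour mapping needs no extra bookkeeping.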
Indeed fortify requires a method for each type of model, but that's
probably because it extracts more than just the fit. fortify.lm uses
predict though:
fortify.lm
function (model, data = model$model, ...)
{
    infl <- influence(model, do.coef = FALSE)
    data$.hat <- infl$hat
    data$.sigma <- infl$sigma
    data$.cooksd <- cooks.distance(model, infl)
    data$.fitted <- predict(model)
    data$.resid <- resid(model)
    data$.stdresid <- rstandard(model, infl)
    data
}
Maybe the code above is valid for more than just lm. In which case
maybe it should be called something like fortify.regression and
fortify.lm, fortify.glm, fortify.loess would just be aliases. From the
documentation, lm and glm would be OK but loess probably not.
The default fortify method is currently just a fallback which gives an
error message:
function (model, data, ...)
{
    stop("ggplot2 doesn't know how to deal with data of class ",
        class(model), call. = FALSE)
}
so I don't think it would be appropriate to have default actions
there. In addition, fortify can be applied to stuff other than the
result of a regression model (maps etc.) and predict does not make
sense in these other contexts.
JiHO
---
http://maururu.net
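On the alias question, S3 inheritance may already cover part of it: glm objects carry the class vector c("glm", "lm"), so a method written for lm is also dispatched for glm unless a more specific method exists. The sketch below uses a hypothetical generic, extract_vars(), so as not to touch ggplot2's own fortify; only the dispatch pattern is the point.

```r
# Hypothetical generic standing in for fortify().
extract_vars <- function(model, data, ...) UseMethod("extract_vars")

# Fallback: refuse objects of unknown class, as the default fortify method does.
extract_vars.default <- function(model, data, ...) {
  stop("don't know how to deal with data of class ",
       class(model)[1], call. = FALSE)
}

# Method for lm; glm objects reach it through inheritance.
extract_vars.lm <- function(model, data = model$model, ...) {
  data$.fitted <- predict(model)  # for a glm this is on the link scale by default
  data$.resid  <- resid(model)
  data
}

g <- glm(am ~ wt, data = mtcars, family = binomial)
class(g)  # the class vector includes "lm" after "glm"
out <- extract_vars(g)
```

A loess object has the single class "loess", so it would fall through to the default method and would still need a method of its own.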