two questions about stat_smooth and glm

394 views

Skip to first unread message

Jarrett Byrnes

unread,

Jul 20, 2009, 8:11:19 PM7/20/09

to ggp...@googlegroups.com

I have two quick questions about plotting the results of a glm
analysis using ggplot2.

Let us assume I have data that the x value varies from 0 to 1, has
some noise, and then is exponentiated. This leads to a gamma type of
error. Assume there is also some error in measurement, leading to the
occasional negative measurement and such. I wish to then plot both
the data and a curve fitted using a generalized linear model with a
Gamma error and a log link.

Now, two questions

1) Is there a way to plot the curve of interest, but ONLY plot it for
groups (and I'm using facets here) for which you get a result that
meets some criterion (e.g. where the coefficient p value < 0.05 in my
case).

2)Due to negative values from measurement error, one needs to add a
slight offset to the y values for the fit. How can one do this and
still get a properly plotted line. Currently I'm getting "Error in
eval(expr, envir, enclos) :
non-positive values not allowed for the gamma family" errors
regardless of different attempts to add offsets with the formula in
stat_smooth.

Here is some sample code I have been playing with for this problem.

#create data - x is a predictor, g is a group, y is the response
x<-seq(0,1,.01)

g<-as.factor(c("A", "B", "C", "D"))
z<-data.frame(expand.grid(x=x,g=g))

#add a little error in to our results
set.seed(1001)
z$y<- rnorm(length(z$x), z$x*c(.3,1.5,2.9, 5.0)[unclass(z$g)], sd=0.1)
z$y=rnorm(length(z$x), exp(z$y), sd=2)

#this first bit just sets up the basic plotting, with colors and panels
a<-qplot(x,y,group=g, colour=g, data=z)+facet_wrap( ~ g, scales="free")

#now, add a fitted line with the error around it shown.
a+stat_smooth(method="glm", se=T, group=g, fill="black",
family=Gamma(link="log"), formula=(y+1)~x)

Thanks!

-Jarrett

hadley wickham

unread,

Jul 21, 2009, 3:16:35 PM7/21/09

to Jarrett Byrnes, ggp...@googlegroups.com

Hi Jarrett,

It sounds like you're trying to do something that's a bit too
complicated for stat_smooth. I'd suggest fitting the models
separately, and predicting on a grid of x values and then plotting the
predictions in a separate layer:

# Fit model to each group. I'm just using a linear model here because I
# know how to extract the overall model p-value.
models <- dlply(z, "g", function(df) lm(y ~ x, data = df))

# Calculate pvalue
pvalues <- laply(models, function(mod) {
fstat <- summary(mod)$fstatistic
pf(fstat[1L], fstat[2L], fstat[3L], lower.tail = FALSE)
})

# Only look at significant models
signif <- models[pvalues < 0.05]

# Create grid and add predictions
grid <- data.frame(x = seq(0, 1, length = 20))
predictions <- ldply(signif, function(model) {
grid$y <- predict(model, newdata = grid)
grid
})
# Shouldn't have to do this, but it looks like there's a bug in plyr
names(predictions)[1] <- "g"

qplot(x, y, colour = g, data = z) +
facet_wrap(~ g, scales="free") +
geom_line(data = predictions)

However, off the top of my head, I don't know how to resolve your
problem with negative values. Simply adding a offset is unlikely to
be the right solution.

Hadley