line plot to visualize three-way lm interaction

Keith

May 12, 2009, 9:56:50 AM

to ggplot2

Hello again, thanks for your help on my earlier problem, I now have
another one! Sorry to not include reproducible code this time.

I keep running into the need to interpret three-way regression
interactions graphically. I am interested in your thoughts on the
best/easiest way to use ggplot() for the scenario below:

With one continuous dependent variable Y and three continuous
predictors x1, x2, x3, assume the lm call (Y ~ x1*x2*x3) is run and
results in a significant three-way interaction. I am interested in a
line plot that evaluates the regression model at arbitrary "low" (-1
SD) and "high" (+1 SD) values of each predictor variable, to create a
somewhat quick-and-dirty method for exploring such interactions
graphically.

Ideally, this would involve plotting low vs. high on x3 on separate
plots; low vs. high on x2 on separate (color) lines within each plot;
and low vs. high on x1 as left vs. right side of the x-axis.
Obviously, Y on y-axis for everything.

I imagine that some combination of interaction.plot() calls from base
R would work (although the help file seems to focus on two-way
interactions), but I would like to take advantage of the far superior
display of your package.

I know that facet_grid() can be used easily for the separate plots; I
am just struggling with the best way to code the other aspects of the
plot, which I know are not very complex. Part of the issue is that
for this purpose I am NOT interested in plotting any raw data (nor the
full set of predicted values), just lines for each arbitrary set of
predictor values described above. I'm not sure whether this would be
easiest by creating an intermediate summary-level dataset from the
regression coefficients, or whether it's something the summary
functions in ggplot() can do in one swoop.

As always, thanks in advance for any help or advice you can provide!

hadley wickham

May 12, 2009, 10:06:54 AM

to Keith, ggplot2

Hi Keith,

The basic strategy for this type of problem is as follows.

# 1. Create a grid of the predictor values you're interested in
grid <- expand.grid(
  # give one variable a finer resolution than the others,
  # because it goes on the x axis
  x1 = seq(min(df$x1), max(df$x1), length.out = 20),
  # use quartiles for low, medium and high, or pick them yourself
  x2 = quantile(df$x2, c(0.25, 0.5, 0.75)),
  x3 = quantile(df$x3, c(0.25, 0.5, 0.75))
)

# 2. Compute predicted values
grid$y <- predict(mymodel, grid)

# 3. Plot
qplot(x1, y, data = grid, geom = "line") + facet_grid(x2 ~ x3)
# factor() makes x2 discrete, so each x2 level gets its own line and colour
qplot(x1, y, data = grid, geom = "line", colour = factor(x2)) + facet_grid(~ x3)

# You'll need to play around with where each of the three variables go
# to get the best view for your data/model.

# You should probably also repeat the same plots with the raw data
# just to check that you're not missing anything in the model.

Hadley
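
For the specific layout Keith describes (low vs. high at -1 SD and +1 SD on every predictor), the same grid / predict / plot strategy might look roughly like the sketch below. It is only an illustration under stated assumptions: the data are simulated because the thread has no reproducible example, the helper `low_high()` and the columns `x2_level` and `x3_level` are made-up names, and it uses ggplot() rather than qplot().

```r
library(ggplot2)

# Simulated data, purely for illustration (not from the thread)
set.seed(1)
n  <- 200
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
df$Y <- with(df, x1 + x2 + x3 + 0.5 * x1 * x2 * x3 + rnorm(n))

mymodel <- lm(Y ~ x1 * x2 * x3, data = df)

# Hypothetical helper: returns mean - 1 SD and mean + 1 SD of a variable
low_high <- function(x) mean(x) + c(-1, 1) * sd(x)

# 1. Grid of predictor values: 2 x 2 x 2 = 8 combinations
grid <- expand.grid(
  x1 = low_high(df$x1),
  x2 = low_high(df$x2),
  x3 = low_high(df$x3)
)

# 2. Predicted values from the fitted model
grid$Y <- predict(mymodel, newdata = grid)

# 3. Turn x2 and x3 into labelled factors (levels sort low, then high)
grid$x2_level <- factor(grid$x2, labels = c("x2: -1 SD", "x2: +1 SD"))
grid$x3_level <- factor(grid$x3, labels = c("x3: -1 SD", "x3: +1 SD"))

# 4. One panel per x3 level, one coloured line per x2 level,
#    low vs. high x1 at the left and right ends of the x axis
p <- ggplot(grid, aes(x = x1, y = Y, colour = x2_level, group = x2_level)) +
  geom_line() +
  facet_grid(. ~ x3_level) +
  scale_x_continuous(breaks = low_high(df$x1), labels = c("-1 SD", "+1 SD")) +
  labs(x = "x1", y = "Predicted Y", colour = NULL)
print(p)
```

Because the model is linear in x1 for fixed x2 and x3, two x1 values are enough to draw each line. Swapping `low_high()` for quantiles, or adding a third "medium" value, only changes step 1.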