ggplot density plots overlay points


Timothy Lau

Oct 12, 2015, 5:12:28 PM
to ggplot2

Any suggestions for how to go about plotting something like this in ggplot?


Dennis Murphy

Oct 13, 2015, 12:30:35 AM
to Timothy Lau, ggplot2
Hi:

I can get you part of the way there, but I got stuck on trying to replicate the normal distribution grob. Since, as usual, there is no reproducible example with which to work, I had to create one based on your plot.

library(grid)
library(ggplot2)

## Step 1: Create a rotated standard normal distribution

# Create a sequence of x-values at which to apply the dnorm() function
DF0 <- data.frame(x = seq(-4, 4, by = 0.05))

# Create the normal distribution plot, stripping everything except the graph
p <- ggplot(DF0, aes(x = x)) +
   theme_minimal() +
   stat_function(fun = dnorm, color = "red", size = 1) +
   geom_hline(yintercept = 0, color = "red", size = 1) +
   coord_flip() +
   theme(panel.grid.major = element_blank(),
         axis.ticks = element_blank(),
         axis.text = element_blank(),
         panel.background = element_rect(fill = "transparent",
                                         colour = "transparent")) +
   labs(x = NULL, y = NULL) +
   theme(plot.margin = grid::unit(c(0, 0, 0, 0), "lines"))

# Convert the above into a grob, which we'll need below.
pr <- ggplotGrob(p)


# Attempt to replicate your input data
DF <- data.frame(age = seq(6.5, 16.5), 
                 observed = c(18, 22, 27, 30, 31, 35, 38, 41, 40, 46, 43))

# Since you didn't provide a function to estimate the predicted values,
# I produced a loess smooth and fit a spline function through it 
# as a hack solution. The point of the exercise is to produce 
# predicted values at the observed ages.

f <- with(DF, splinefun(loess.smooth(age, observed)))
DF$predicted <- f(DF$age)

# Produce the scatterplot. A factor for the observed and predicted
# means is generated on the fly to allow for a legend. I chose to create
# separate legends for point color and linetype. Although both colors are
# set to black in scale_color_manual(), one is modified in the guides() call.
pp <- ggplot(DF, aes(x = age)) +
    theme_bw() +
    geom_point(aes(y = observed, color = "Observed mean",
                                 linetype = "Observed mean")) +
    geom_line(aes(y = predicted, color = "Predicted mean",
                                 linetype = "Predicted mean")) +
    theme(legend.position = c(1, 0), 
          legend.justification = c("right", "bottom")) +
    scale_color_manual(values = c("black", "black")) +
    scale_linetype_manual(values = c("blank", "solid")) +
    scale_x_continuous(breaks = seq(6.5, 16.5)) +
    guides(color = guide_legend(override.aes = list(shape = 21,
                                         fill = c("black", "transparent")))) +
    labs(x = "Age, in years", y = "Raw score", color = "", linetype = "") +
    ylim(0, 50)


# This is my attempt to map the distribution grob at each predicted value.
# It works the first time but fails afterward, so there's probably something
# obvious I'm missing here...
for (i in seq_len(nrow(DF))) {  # also tried seq_along(DF$age) with same results
  pp <- pp + annotation_custom(grob = pr,
                               xmin = DF$age[i] - 0.2, xmax = DF$age[i] + 0.4,
                               ymin = DF$predicted[i] - 5, 
                               ymax = DF$predicted[i] + 5)
}

# You really shouldn't be using R^2 for nonlinear models (Hint: what is the
# null model against which you're comparing the present fit?), but if you think
# of it as the squared correlation between observed and predicted values, I
# guess this will suffice to add its annotation:

r2 <- with(DF, cor(observed, predicted))^2
pp + annotate("text", x = 6.5, y = 45, hjust = 0,
              label = paste("R^2 ==", round(r2, 3)), parse = TRUE)


Someone else will have to figure out how to get the loop to "work". This requires Baptiste's magic, but I don't know if he still monitors this group.
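
One way to sidestep the grob loop entirely is to draw the rotated normal curves as ordinary data with geom_path(). This is only a sketch of that alternative, not a fix for annotation_custom() itself. The names curve_sd, curve_width, curves and p_curves are made up for illustration, and the spread and width values are arbitrary assumptions, since the original plot does not say what standard deviation the curves represent.

```r
library(ggplot2)

# Same stand-in data as above, rebuilt so this block runs on its own.
DF <- data.frame(age = seq(6.5, 16.5),
                 observed = c(18, 22, 27, 30, 31, 35, 38, 41, 40, 46, 43))
f <- with(DF, splinefun(loess.smooth(age, observed)))
DF$predicted <- f(DF$age)

# ASSUMPTIONS (not taken from the original plot): both values are arbitrary.
curve_sd    <- 2.5   # hypothetical SD of scores around each predicted mean
curve_width <- 0.6   # hypothetical width of each curve's peak, in years of age

z <- seq(-3, 3, by = 0.05)   # standardized grid at which dnorm() is evaluated

# One rotated curve per age: the density runs along x, the score along y.
# Dividing by dnorm(0) scales the peak to exactly curve_width.
curves <- do.call(rbind, lapply(seq_len(nrow(DF)), function(i) {
  data.frame(age = DF$age[i],
             x = DF$age[i] + curve_width * dnorm(z) / dnorm(0),
             y = DF$predicted[i] + curve_sd * z)
}))

# Vertical baseline of each curve, spanning +/- 3 SD around the prediction.
DF$lower <- DF$predicted - 3 * curve_sd
DF$upper <- DF$predicted + 3 * curve_sd

p_curves <- ggplot(DF, aes(x = age)) +
  theme_bw() +
  geom_path(data = curves, aes(x = x, y = y, group = age), colour = "red") +
  geom_segment(aes(xend = age, y = lower, yend = upper), colour = "red") +
  geom_line(aes(y = predicted)) +
  geom_point(aes(y = observed)) +
  scale_x_continuous(breaks = seq(6.5, 16.5)) +
  labs(x = "Age, in years", y = "Raw score")
```

Printing p_curves draws the result. Because the curves live in the same data coordinates as the points, no grobs or per-age annotations are needed, and the legend and R^2 annotation from the code above could be added to p_curves in the same way as to pp.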

Re the comment about R^2 above, if you don't know the answer to the hint, you shouldn't be using it. Seriously. Moreover, what is the point of "adjusted" R^2 when you only have one covariate in the model? 
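
To make the hint concrete: in the usual definition, R^2 compares the residual sum of squares of the fit with that of a null model, which I am assuming here to be the intercept-only (mean) model. The sketch below computes that version next to the squared correlation used above, and checks that the two agree for an ordinary least-squares line. The names ss_res, ss_tot, r2_null, r2_cor, r2_adj and fit_lm are made up for illustration, and using p = 1 for a smooth is itself an assumption, since a loess fit does not have a single parameter.

```r
# Same stand-in data and predictions as above, rebuilt so this runs on its own.
DF <- data.frame(age = seq(6.5, 16.5),
                 observed = c(18, 22, 27, 30, 31, 35, 38, 41, 40, 46, 43))
f <- with(DF, splinefun(loess.smooth(age, observed)))
DF$predicted <- f(DF$age)

ss_res <- sum((DF$observed - DF$predicted)^2)       # residual SS of the fit
ss_tot <- sum((DF$observed - mean(DF$observed))^2)  # residual SS of the mean-only model

r2_null <- 1 - ss_res / ss_tot                 # fit versus the null model
r2_cor  <- cor(DF$observed, DF$predicted)^2    # squared correlation, as above

# Adjusted R^2 for n observations and p covariates.
# ASSUMPTION: p = 1 treats age as a single covariate, which is not really
# true of a smooth; it only shows what the adjustment does.
n <- nrow(DF)
p <- 1
r2_adj <- 1 - (1 - r2_null) * (n - 1) / (n - p - 1)

# For a least-squares line with an intercept, the two definitions coincide.
fit_lm    <- lm(observed ~ age, data = DF)
r2_lm_ss  <- 1 - sum(resid(fit_lm)^2) / ss_tot
r2_lm_cor <- cor(DF$observed, fitted(fit_lm))^2

c(r2_null = r2_null, r2_cor = r2_cor, r2_adj = r2_adj)
all.equal(r2_lm_ss, r2_lm_cor)
```

For other kinds of fit the two versions need not match, which is why the choice of null model matters. With p fixed at 1, the adjustment is just a fixed rescaling of 1 - R^2 by (n - 1) / (n - 2), so it adds no information when comparing fits on the same data.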

Dennis



