Hi:
ggplot2 has some facilities for fitting mean functions in GLMs, but
sometimes it's safer to do the fitting yourself and pass the results
you trust into ggplot2. This may be one of those cases.
Firstly, I want to mention that your ggplot2 example did not return an
error - it returned a warning. The plot was rendered, but you were
warned that the y values were not on the 0-1 scale the routine
expected. Secondly, your ggplot2 fit was of a logistic regression
model, not a probit, since the default link function in glm() is
'logit'. Unfortunately, if you try either
(a) method.args = list(family = "binomial", link = "probit"), or
(b) method.args = list(family = "binomial(link = 'probit')"),
inside of geom_smooth(), you get an error message. I don't work with
GLMs enough any more to know how to get ggplot2 to recognize
non-canonical links in method.args(). If anyone knows the right
mantra, I'm happy to learn it.
Thirdly, I have to make a comment about the way you entered your data.
First of all, you entered the variables in one at a time and then
composed a data frame out of them. Then, you attached the data frame
to the session. This is not good practice in R - when you created the
individual variables, you already did the equivalent of attach(),
whose purpose is to make the individual variables in a data frame
exposed in the global environment. Not only is attach() overkill in
this context, if you had to edit the values of one or more of your
variables, you would have had a real mess on your hands trying to
figure out in which objects the changes were actually made. Advice re
attach(): don't do that! The with() function is your friend - it is
much safer than attach().
All that said, here's my approach to your problem - note that the
fitted curve at the end is on the probit scale rather than the logit
scale, which is why it looks a bit different from your example.
## Read the data into a data frame
library(ggplot2)
library(MASS)
dose_data <- data.frame(
conc = c(0.02, 0.45, 0.46, 0.50, 0.78, 0.80, 0.80, 0.92, 0.93, 1.00,
1.16, 1.17, 1.17, 1.48, 1.51, 1.55, 1.88, 1.90, 2.02),
dead = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 7, 11, 4, 14, 14, 12,
12, 18, 17),
survive = c(15, 16, 15, 15, 15, 16, 14, 14, 10, 15, 12, 5, 12, 0,
1, 3, 0, 0, 0),
total = c(15, 16, 15, 15, 15, 16, 14, 14, 10, 16, 19, 16, 16, 14, 15,
15, 12, 18, 17))
## Add a variable to be the ratio of survivors to the number at risk...
dose_data$prop <- with(dose_data, round(survive/total, 4))
## Fit a probit model - note the order of variables in the LHS matrix
## This will come into play below with respect to MASS::dose.p()
model_results <- glm(cbind(survive, dead) ~ conc,
data = dose_data,
family = binomial(link = "probit"))
## Generate a data frame of x values at which to make predictions
new_conc <- data.frame(conc = seq(0, 2.1, by = 0.1))
## Predict the response probabilties at various doses in new_conc
model_preds <- predict(model_results, newdata = new_conc,
type = "response", se.fit = FALSE)
## Collect the concentrations and predicted probabllities into a data frame
predDF <- data.frame(new_conc, phat = model_preds$fit)
## Add LDp information
## Use 1 - p since we used survive as the success event in the GLM
u <- MASS::dose.p(model_results, p= 1 - c(0.1, 0.25, 0.5, 0.99))
## Construct a data frame out of u, which has class 'glm.dose'
doseDF <- data.frame(conc = as.vector(u),
labl = paste0("LD", 100 * (1 - attr(u, "p"))),
stringsAsFactors = FALSE)
## Produce the plot
## Want nothing in the base layer as the y-variable names will change from
## one layer to another
ggplot() +
geom_point(data = dose_data, aes(x = conc, y = pct), size = 3) +
geom_line(data = predDF, aes(x = conc, y = phat), size = 1,
color = "blue") +
geom_vline(data = doseDF, aes(xintercept = conc), color = "darkorange") +
geom_text(data = doseDF, aes(x = conc,
y = c(0.6, 0.5, 0.4, 0.3), label = labl),
size = 5, color = "firebrick")
I want to point out that the above code shows that you can use
different data frames to produce different layers. This is a major
advantage of the ggplot2 system. My solution isn't particularly
elegant, but it contains most of the necessary elements.
I didn't plot confidence bands for the predicted probabilities because
that takes a bit more work and my exploration time is limited these
days. See Gavin Simpson's response in the following post to get a
sense of the issues:
https://stackoverflow.com/questions/14423325/confidence-intervals-for-predictions-from-logistic-regression
HTH,
Dennis
> --
> --
> You received this message because you are subscribed to the ggplot2 mailing
> list.
> Please provide a reproducible example:
>
https://github.com/hadley/devtools/wiki/Reproducibility
>
> To post: email
ggp...@googlegroups.com
> To unsubscribe: email
ggplot2+u...@googlegroups.com
> More options:
http://groups.google.com/group/ggplot2
>
> ---
> You received this message because you are subscribed to the Google Groups
> "ggplot2" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to
ggplot2+u...@googlegroups.com.
> For more options, visit
https://groups.google.com/d/optout.