creating a dose-response curve (probit anlysis) using ggplot2

989 views

Skip to first unread message

benjami...@gmail.com

unread,

Jun 21, 2017, 5:01:43 PM6/21/17

to ggplot2

Hi everyone,

For those of you unfamiliar with dose-response curves, a dose response curve uses a chemical concentrations and percentage of survival to predict the concentration needed to be lethal at 10 %, 50%, or 99% or whatever percentage you are interested in (LCxx). I am able to use a glm to determine what my LC50 and LC99 are and I am able to create a plot using R's basic plotting functions. However, I'd like to create dose response curve using ggplot but I run into some issues. The main issues is that I am unable to apply the probit function to the geom_smooth without producing an error. Below is the following code that causes issues.

It is noticeable that the curve produce geom_smooth is not the correct curve, additionally the confidence errors are not correct either. Please let me know if you have any issues with the code or any suggestions. Thank you

#load packages

library(ggplot2)

library(Hmisc)

library(plyr)

library(MASS)

#create dataframe:

#1) column is exposure concentration

#2) column is the total number of organism died over 12 h of exposure to the corresponding concentration

#3) column is the total number that survived over 12 h to the corresponding concentration

#4) column is the total number of organism exposed to the corresponding concentration

#5) fifth is the percentage of organism that survived exposure at the corresponding concentration

conc <- c(0.02, 0.45, 0.46, 0.50, 0.78, 0.80, 0.80, 0.92, 0.93, 1.00, 1.16, 1.17, 1.17, 1.48,

1.51, 1.55, 1.88, 1.90, 2.02)

dead <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 7, 11, 4, 14, 14, 12, 12, 18, 17)

survive <- c(15, 16, 15, 15, 15, 16, 14, 14, 10, 15, 12, 5, 12, 0, 1, 3, 0, 0, 0)

total <- c(15, 16, 15, 15, 15, 16, 14, 14, 10, 16, 19, 16, 16, 14, 15, 15, 12, 18, 17)

perc <- c(1.00, 1.00, 1.00, 1.00, 1.00,1.00, 1.00, 1.00, 1.00, 0.94,0.63, 0.31,0.75,0.00,

0.07, 0.20, 0.00, 0.00,0.00)

data<-data.frame(conc,dead,survive,total,perc)

head(data)

attach(data)

#create matrix of dead and survival

y = cbind(dead,survive)

#create bionomial glm (probit model)

model.results = glm(data = data, y ~ conc,binomial(link="probit"))

summary(model.results)

#use function from MASS to calculate LC

dose.p(model.results,p=0.5)

dose.p(model.results,p=c(0.1,0.25,0.5,0.99))

#plot curve

plot(conc,(survive/(survive+dead)), ylab = "Percent Survival",

xlab="Concentration ")

#To make function use the esEstimate prarameters from the bionomial glm used above

logisticline <- function(z) {eta = -6.7421 + 5.4468 * z;1 / (1 + exp(eta))}

x <- seq(0,200.02,0.01)

lines(x,logisticline(x),new = TRUE)

#plot using ggplot

ggplot(data, aes(x = conc, y = perc)) +

geom_point() +

geom_smooth(method="glm",method.args = list(family = "binomial"))

Dennis Murphy

unread,

Jun 24, 2017, 9:03:53 PM6/24/17

to benjami...@gmail.com, ggplot2

Hi:

ggplot2 has some facilities for fitting mean functions in GLMs, but
sometimes it's safer to do the fitting yourself and pass the results
you trust into ggplot2. This may be one of those cases.

Firstly, I want to mention that your ggplot2 example did not return an
error - it returned a warning. The plot was rendered, but you were
warned that the y values were not on the 0-1 scale the routine
expected. Secondly, your ggplot2 fit was of a logistic regression
model, not a probit, since the default link function in glm() is
'logit'. Unfortunately, if you try either

(a) method.args = list(family = "binomial", link = "probit"), or
(b) method.args = list(family = "binomial(link = 'probit')"),

inside of geom_smooth(), you get an error message. I don't work with
GLMs enough any more to know how to get ggplot2 to recognize
non-canonical links in method.args(). If anyone knows the right
mantra, I'm happy to learn it.

Thirdly, I have to make a comment about the way you entered your data.
First of all, you entered the variables in one at a time and then
composed a data frame out of them. Then, you attached the data frame
to the session. This is not good practice in R - when you created the
individual variables, you already did the equivalent of attach(),
whose purpose is to make the individual variables in a data frame
exposed in the global environment. Not only is attach() overkill in
this context, if you had to edit the values of one or more of your
variables, you would have had a real mess on your hands trying to
figure out in which objects the changes were actually made. Advice re
attach(): don't do that! The with() function is your friend - it is
much safer than attach().

All that said, here's my approach to your problem - note that the
fitted curve at the end is on the probit scale rather than the logit
scale, which is why it looks a bit different from your example.

## Read the data into a data frame

library(ggplot2)
library(MASS)

dose_data <- data.frame(
conc = c(0.02, 0.45, 0.46, 0.50, 0.78, 0.80, 0.80, 0.92, 0.93, 1.00,
1.16, 1.17, 1.17, 1.48, 1.51, 1.55, 1.88, 1.90, 2.02),
dead = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 7, 11, 4, 14, 14, 12,
12, 18, 17),
survive = c(15, 16, 15, 15, 15, 16, 14, 14, 10, 15, 12, 5, 12, 0,
1, 3, 0, 0, 0),
total = c(15, 16, 15, 15, 15, 16, 14, 14, 10, 16, 19, 16, 16, 14, 15,
15, 12, 18, 17))

## Add a variable to be the ratio of survivors to the number at risk...
dose_data$prop <- with(dose_data, round(survive/total, 4))

## Fit a probit model - note the order of variables in the LHS matrix
## This will come into play below with respect to MASS::dose.p()

model_results <- glm(cbind(survive, dead) ~ conc,
data = dose_data,
family = binomial(link = "probit"))

## Generate a data frame of x values at which to make predictions
new_conc <- data.frame(conc = seq(0, 2.1, by = 0.1))

## Predict the response probabilties at various doses in new_conc
model_preds <- predict(model_results, newdata = new_conc,
type = "response", se.fit = FALSE)

## Collect the concentrations and predicted probabllities into a data frame
predDF <- data.frame(new_conc, phat = model_preds$fit)

## Add LDp information

## Use 1 - p since we used survive as the success event in the GLM
u <- MASS::dose.p(model_results, p= 1 - c(0.1, 0.25, 0.5, 0.99))

## Construct a data frame out of u, which has class 'glm.dose'
doseDF <- data.frame(conc = as.vector(u),
labl = paste0("LD", 100 * (1 - attr(u, "p"))),
stringsAsFactors = FALSE)

## Produce the plot
## Want nothing in the base layer as the y-variable names will change from
## one layer to another
ggplot() +
geom_point(data = dose_data, aes(x = conc, y = pct), size = 3) +
geom_line(data = predDF, aes(x = conc, y = phat), size = 1,
color = "blue") +
geom_vline(data = doseDF, aes(xintercept = conc), color = "darkorange") +
geom_text(data = doseDF, aes(x = conc,
y = c(0.6, 0.5, 0.4, 0.3), label = labl),
size = 5, color = "firebrick")

I want to point out that the above code shows that you can use
different data frames to produce different layers. This is a major
advantage of the ggplot2 system. My solution isn't particularly
elegant, but it contains most of the necessary elements.

I didn't plot confidence bands for the predicted probabilities because
that takes a bit more work and my exploration time is limited these
days. See Gavin Simpson's response in the following post to get a
sense of the issues:
https://stackoverflow.com/questions/14423325/confidence-intervals-for-predictions-from-logistic-regression

HTH,
Dennis

> --
> --
> You received this message because you are subscribed to the ggplot2 mailing
> list.
> Please provide a reproducible example:
> https://github.com/hadley/devtools/wiki/Reproducibility
>
> To post: email ggp...@googlegroups.com
> To unsubscribe: email ggplot2+u...@googlegroups.com
> More options: http://groups.google.com/group/ggplot2
>
> ---
> You received this message because you are subscribed to the Google Groups
> "ggplot2" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to ggplot2+u...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Reply all

Reply to author

Forward

0 new messages