coord transform


adam.l...@pnc.com

Jul 17, 2015, 8:42:04 AM
to ggp...@googlegroups.com
Hi,

Is it possible to use the coord_trans function to transform the y-axis of the following plot so that it's on the log-odds scale?

library(ggplot2)
data(iris)
iris$virginica <- ifelse(iris$Species=="virginica",1,0)
# ggplot2 >= 2.0.0 takes the family through method.args
ggplot(iris,aes( Petal.Width, virginica )) + stat_smooth(method="glm", method.args=list(family="binomial"))







Adam Loveland



Dennis Murphy

Jul 17, 2015, 4:19:56 PM
to adam.l...@pnc.com, ggplot2
Hi Adam:

Not the way you have defined it. The problem is that a coordinate
transform works on the _data_ *after* the x- and y-scales have been
trained, not on model predictions derived from the data. In your
example, virginica is a numeric y-variable with values 0 and 1, which,
on the logit scale, map to -Inf and Inf, respectively. An axis scale,
or a coordinate transformation thereof, is meant to be applied to the
y-variable in the ggplot()/qplot() call, whereas you evidently want to
apply the logit transformation to the model predictions. I believe
you'll have to pass the predictions as the y-variable: either
logit transform them in advance before passing them to ggplot(), or
use ggplot_build() to get the data that produced the fit and then
plot/coordinate transform that.
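
A quick base-R check of the point above, as a minimal sketch: qlogis() is the logit function in the stats package, and the probabilities in p are made-up values for illustration only.

```r
# Logit of the raw 0/1 response: both values fall off the axis.
raw_logit <- qlogis(c(0, 1))
raw_logit

# Logit of probabilities strictly inside (0, 1) stays finite,
# which is why the predictions, not the 0/1 data, can be plotted.
p <- c(0.01, 0.5, 0.99)   # illustrative values, not from the model
logit_p <- qlogis(p)
logit_p
```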

This is probably easier to do in ggvis because it has a
compute_model_prediction() function that can handle the model fit as a
data transformation. If you want to do this in ggplot2, I think it
would be easier to fit the model and compute the predictions and
confidence limits in advance, save the results to a data frame, and
then plot them with the y-axis on the logit scale. You could
also use ggplot_build() to grab the output from ggplot(), which I'll
do at the end.

Here's a fairly simple way to do this, which you probably already
know, but in case this topic comes up again...

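library(ggplot2)

# iris2 is not defined in the thread; assumed to be iris plus the 0/1 indicator
iris2 <- transform(iris, virginica = ifelse(Species == "virginica", 1, 0))

# Generate a data frame of appropriate elements from a model object
# using predict()
m <- glm(virginica ~ Petal.Width, data = iris2, family = "binomial")
p <- predict(m, se.fit = TRUE)   # default predictions are on the logit scale

# phat holds the predicted log odds (link scale), not probabilities
DF <- data.frame(iris2, phat = p$fit,
                 lcl = p$fit - 1.96 * p$se.fit,   # Wald-based CIs
                 ucl = p$fit + 1.96 * p$se.fit)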

ggplot(DF, aes(x = Petal.Width, y = phat)) +
    geom_ribbon(aes(ymin = lcl, ymax = ucl), fill = "gray80") +
    geom_line(color = "blue", size = 1) +
    ylab("Log odds")


The CIs were produced using the Wald procedure (large-sample
normality). If you have a better CI method, then apply it when
building the data frame.
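
As a sketch of how the same Wald limits look on the probability scale: the interval is built on the link (logit) scale and then mapped back with plogis(), so the limits cannot leave [0, 1]. The names iris2 and DF_prob are assumptions made for this illustration, and the block uses base R only.

```r
# Assumed setup: iris plus a 0/1 indicator for virginica
iris2 <- transform(iris, virginica = as.integer(Species == "virginica"))

m <- glm(virginica ~ Petal.Width, data = iris2, family = binomial)
p <- predict(m, se.fit = TRUE)   # type = "link" is the default
z <- qnorm(0.975)                # about 1.96

# Build the interval on the logit scale, then back-transform
DF_prob <- data.frame(
    Petal.Width = iris2$Petal.Width,
    phat = plogis(p$fit),
    lcl  = plogis(p$fit - z * p$se.fit),
    ucl  = plogis(p$fit + z * p$se.fit)
)
head(DF_prob)
```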

To get what I think you want based on the ggplot fit, here's one way to do it:

# Define a coordinate transformation function, which requires
# definition of a transformation and its inverse. The new scale
# is defined by scales::trans_new()
logodds_trans <- function()
{
    trans <- function(x) log(x/(1 - x))
    inv <- function(x) exp(x)/(1 + exp(x))
    # log_breaks() takes n first, so the base has to be named
    breaks <- function(x) scales::log_breaks(base = exp(1))(x)

    # name the breaks argument; its position differs across scales versions
    scales::trans_new("logodds", trans, inv, breaks = breaks)
}
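
A small sanity check on the transformation pair, as a sketch in base R: the hand-written trans and inv should round-trip and agree with qlogis()/plogis(). The test values in p are arbitrary. The hand-written inverse overflows for large inputs, which plogis() avoids. I believe scales also ships a ready-made logit_trans(), but I have not compared its default breaks with the ones above.

```r
trans <- function(x) log(x / (1 - x))
inv   <- function(x) exp(x) / (1 + exp(x))

p <- c(0.001, 0.1, 0.5, 0.9, 0.999)   # arbitrary probabilities

# inv() undoes trans(), and both match the base-R versions
roundtrip_ok <- isTRUE(all.equal(inv(trans(p)), p))
matches_base <- isTRUE(all.equal(trans(p), qlogis(p)))

# exp(800) is Inf, so the hand-written inverse gives Inf/Inf = NaN,
# while plogis() returns 1
overflow_hand <- inv(800)
overflow_base <- plogis(800)
```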

# Use ggplot_build() to get the data used in the smooth
# (ggplot2 >= 2.0.0 takes the family through method.args)
q <- ggplot(iris2, aes(Petal.Width, virginica)) +
    stat_smooth(method = "glm", method.args = list(family = "binomial"))

qq <- ggplot_build(q)
qqq <- qq$data[[1]]

# This is the default appearance
# (ggplot2 >= 2.0.0 names the argument y; older versions used ytrans)
ggplot(qqq, aes(x = x, y = y)) +
    geom_ribbon(aes(ymin = ymin, ymax = ymax), fill = "gray80") +
    geom_line(color = "blue", size = 1) +
    coord_trans(y = "logodds")

# Two escape routes:

# (i) Manually add scale breaks using scale_y_continuous()

# (ggplot2 >= 2.0.0 names the argument y; older versions used ytrans)
ggplot(qqq, aes(x = x, y = y)) +
    geom_ribbon(aes(ymin = ymin, ymax = ymax), fill = "gray80") +
    geom_line(color = "blue", size = 1) +
    scale_y_continuous(breaks = c(0.0001, 0.001, 0.01, 0.1, 0.5,
                                  0.9, 0.99, 0.999, 0.9999)) +
    coord_trans(y = "logodds")

## As you can see when plotting this, it's not an ideal approach.

(ii) Another approach is to redefine the function used in the breaks object
of logodds_trans(), but this may or may not be productive. When
spreading out a scale from (0, 1) to (-Inf, Inf), the original breaks
are going to be pretty tightly packed in the transformed scale unless
you define a suitable set of breaks on the probability scale. This is
why I mentioned the idea of plotting on the logit scale up front.
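
One way to get "a suitable set of breaks on the probability scale" might look like the sketch below. prob_breaks() is a hypothetical helper, not part of ggplot2 or scales, and the candidate list is just one plausible choice: it keeps the round-number probabilities that fall inside the range of the values being plotted.

```r
# Hypothetical helper: keep the candidate probabilities inside the data range
prob_breaks <- function(p,
                        candidates = c(0.001, 0.01, 0.05, 0.1, 0.25, 0.5,
                                       0.75, 0.9, 0.95, 0.99, 0.999)) {
    rng <- range(p, na.rm = TRUE)
    candidates[candidates >= rng[1] & candidates <= rng[2]]
}

# Example with made-up limits of 0.02 and 0.97
brks <- prob_breaks(c(0.02, 0.97))
brks
```

The result could then be passed along the lines of scale_y_continuous(breaks = prob_breaks(c(qqq$ymin, qqq$ymax))), assuming the qqq data frame from above.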

If you have designs on doing this kind of thing routinely, I'd suggest
wrapping it up into a function. There are likely better ways to do
this, so if others have ideas, feel free to chime in.
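
A minimal sketch of what such a wrapper might look like, using base R only. logit_fit_df() and its argument names are hypothetical, and it assumes the data have no missing values in the model variables (otherwise predict() returns fewer rows than the data).

```r
# Hypothetical wrapper: fit a binomial GLM and return the data plus
# link-scale (log odds) predictions with Wald confidence limits
logit_fit_df <- function(formula, data, level = 0.95) {
    m <- glm(formula, data = data, family = binomial)
    p <- predict(m, se.fit = TRUE)        # link scale by default
    z <- qnorm(1 - (1 - level) / 2)
    data.frame(data,
               logodds = p$fit,
               lcl = p$fit - z * p$se.fit,
               ucl = p$fit + z * p$se.fit)
}

# Assumed setup: iris plus a 0/1 indicator for virginica
iris2 <- transform(iris, virginica = as.integer(Species == "virginica"))
fit_df <- logit_fit_df(virginica ~ Petal.Width, iris2)
head(fit_df)
```

The returned data frame can be plotted with the same geom_ribbon()/geom_line() calls used earlier, with y = logodds.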

HTH,
Dennis

adam.l...@pnc.com

Jul 21, 2015, 3:29:47 PM
to Dennis Murphy, ggplot2
Thanks Dennis. This was very helpful.



Adam Loveland