two types of points in one plot, abline and log-scale

Johannes Radinger

unread,

Sep 6, 2011, 6:06:04 AM9/6/11

to ggplot2

Hello,

I've kind of following setup: 3 variables where variable b and c are measured in similar units. For example:

a <- c(1,3,5,2,5,3,1,6,7,2,3,2,6)
b <- c(12,15,18,10,18,22,9,7,9,23,12,17,13)
c <- c(22,26,32,33,32,28,29,37,34,29,30,32,29)
data <- data.frame(a,b,c)

I'd like to create a simple scatterplot where a is on
the x axis and b and c (as the are measured in similar units) share
the y axis. Additionally I'd like to add to each variable
a regression line from the lm-function. I am very new
to this topic and my ggplot2-book hasn't arrived yet so I don't know
really how to do it. What I tried so far is:

coef1 <- coef(lm(a~b))
coef2 <- coef(lm(a~c))

ggplot()+
geom_point(aes(a,b,data=data))
geom_abline(aes(x=coef1)
geom_point(aes(a,c,data=data))
geom_abline(aes(x=coef2)

But that isn't working. I think it is a very common question so probably it is quite easy for you to help me.

An addtional question came up: How do you display log-transformed data and its regression line. Assuming I had to log-log transform my data to perform a regression (acutally the were log(x+1) transformed). I'd like to display the original data but with log scales and I want to add the regression line from the transformed data. How do I have to do this (especially is there any special issue because they were log(x+1) transformed and not log(x)?)?

Thank you very much

Johannes
--
NEU: FreePhone - 0ct/min Handyspartarif mit Geld-zurück-Garantie!
Jetzt informieren: http://www.gmx.net/de/go/freephone

Dennis Murphy

unread,

Sep 6, 2011, 11:35:14 AM9/6/11

to Johannes Radinger, ggplot2

Hi:

Here's one way to do the 'easy part':

a <- c(1,3,5,2,5,3,1,6,7,2,3,2,6)
b <- c(12,15,18,10,18,22,9,7,9,23,12,17,13)
c <- c(22,26,32,33,32,28,29,37,34,29,30,32,29)

dat <- data.frame(a,b,c)

md <- melt(dat, id = 'a')
ggplot(md, aes(x = a)) +
geom_point(aes(y = value, colour = variable), size = 3) +
geom_smooth(aes(y = value, colour = variable), method = 'lm', se =
FALSE, size = 1)

On Tue, Sep 6, 2011 at 3:06 AM, Johannes Radinger <JRad...@gmx.at> wrote:
> Hello,
>
> I've kind of following setup: 3 variables where variable b and c are measured in similar units. For example:
>
> a <- c(1,3,5,2,5,3,1,6,7,2,3,2,6)
> b <- c(12,15,18,10,18,22,9,7,9,23,12,17,13)
> c <- c(22,26,32,33,32,28,29,37,34,29,30,32,29)
> data <- data.frame(a,b,c)
>
>
> I'd like to create a simple scatterplot where a is on
> the x axis and b and c (as the are measured in similar units) share
> the y axis. Additionally I'd like to add to each variable
> a regression line from the lm-function. I am very new
> to this topic and my ggplot2-book hasn't arrived yet so I don't know
> really how to do it. What I tried so far is:
>
> coef1 <- coef(lm(a~b))
> coef2 <- coef(lm(a~c))
>
> ggplot()+
> geom_point(aes(a,b,data=data))
> geom_abline(aes(x=coef1)
> geom_point(aes(a,c,data=data))
> geom_abline(aes(x=coef2)
>
>
> But that isn't working. I think it is a very common question so probably it is quite easy for you to help me.
>

As for this part, see scale_continuous() to find out how to plot on a
log scale; as for an offset log-log scale, I don't believe there is
anything built-in but you may be able to construct one. One option may
be to play with the new scales package in conjunction with the
development version of ggplot2. Alternatively, you may be able to do
it by defining a formatting function and using the formatter =
argument of scale_continuous(). I'd try that first before venturing
into the development version to find a solution. There should be a few
examples of user-written formats in the ggplot2 list archives to give
you an idea of how they work.

HTH,
Dennis

> An addtional question came up: How do you display log-transformed data and its regression line. Assuming I had to log-log transform my data to perform a regression (acutally the were log(x+1) transformed). I'd like to display the original data but with log scales and I want to add the regression line from the transformed data. How do I have to do this (especially is there any special issue because they were log(x+1) transformed and not log(x)?)?
>
> Thank you very much
>
> Johannes
> --
> NEU: FreePhone - 0ct/min Handyspartarif mit Geld-zurück-Garantie!
> Jetzt informieren: http://www.gmx.net/de/go/freephone
>

> --
> You received this message because you are subscribed to the ggplot2 mailing list.
> Please provide a reproducible example: http://gist.github.com/270442
>
> To post: email ggp...@googlegroups.com
> To unsubscribe: email ggplot2+u...@googlegroups.com
> More options: http://groups.google.com/group/ggplot2
>

Johannes Radinger

unread,

Sep 7, 2011, 5:07:13 AM9/7/11

to Dennis Murphy, ggp...@googlegroups.com

Hi again,

I just came up with a slightly different approach using cbind to
calculate regressions for multiple Ys, and to concatenate vectors and
use grouping variables. But there is still a problem... Somehow my regression lines are at a wrong position. I think it's a very simple thing but I can't find the reason for the problem :(

a <- c(1,3,5,2,5,3,1,6,7,2,3,2,6)
b <- c(12,15,18,10,18,22,9,7,9,23,12,17,13)
c <- c(22,26,32,33,32,28,29,37,34,29,30,32,29)

X <-c(a,a)
Y <-c(b,c)
groupcode <-rep(c("b","c"),c(length(b),length(c)))

data <- data.frame(X,Y,groupcode)

model <-lm(cbind(b,c)~a,data=data)
coef <- data.frame(coef(model))

ggplot()+
geom_point(aes(X,Y,data=data,colour=groupcode))+
geom_abline(aes(intercept=coef[1,],slope=coef[2,],colour=groupcode),data=coef)

I think this idea can work quite well (accept for the stupid problem;)) Maybe you know what to do...

Concerning the log-problem I think I'll create an example and write an extra-post after this here is solved.

cheers

Johannes

-------- Original-Nachricht --------
> Datum: Tue, 6 Sep 2011 08:35:14 -0700
> Von: Dennis Murphy <djm...@gmail.com>
> An: Johannes Radinger <JRad...@gmx.at>
> CC: ggplot2 <ggp...@googlegroups.com>
> Betreff: Re: two types of points in one plot, abline and log-scale

Johannes Radinger

unread,

Sep 7, 2011, 6:17:58 AM9/7/11

to r-h...@r-project.org, ggp...@googlegroups.com

Hello,

I've some questions concerning log-transformations and plotting of the regression lines. So far as I know is it a problem to log-transform values smaller than 1 (0-1). In my statistics lecture I was told to do a log(x+1) transformation in such cases. So I provide here a small example to explain my questions:

# Some example data for testing
a1 <-c(0.2,1.9,0.1,0.2,0.8,22,111.3,19.9,23.9,138,42.3,54.2,0.9)
b1 <-c(1.8,28.2,0.3,12.4,3.2,81.1,122.1,2.9,37.2,98.9,21,28.7,1.8)
data1 <- data.frame(a1,b1)

model <- lm(log(a1+1)~log(b1+1))

because of values less then one I did the log(x+1) transformation for running the lm. Is that correct so far? (Just to mention: These are example data so I haven't checked if the need a transformation at all)

Then some questions arise when it comes to plot the data. As usual I'd like to plot the original data (not log transformed) but in a log-scale.

I tried two approaches the standard plot function and ggplot.

# Plot with ggplot
ggplot()+
geom_point(aes(b1,a1,data=data1))+
geom_abline(aes(intercept=coef(model)[1],slope=coef(model)[2]))+
scale_y_log()+
scale_x_log()

# Plot with standard plot
plot(b1,a1,log="xy")
abline(model,untf=T)
abline(model,untf=F)

1) The regression lines are different for plot vs. ggplot(transformed or untransformed). So what is actually the correct line?

2) The regression line was calculated on basis of log(x+1), but the log scale on my axis is just simple log (without +1). So how are such cases usually treated? I thought about subtracting the value 1 from the intercept?

So my simple question: What is the best way to display such data with a regression line?

Thank you
/Johannes

James Waters

unread,

Sep 11, 2011, 7:47:04 PM9/11/11

to Johannes Radinger, ggp...@googlegroups.com

Johannes,

The only reason I can imagine to add +1 to the log analysis is in case there are values that equal 0, in which case the logarithm isn't defined. Fractional values should be fine. But be careful with artificially manipulating the intercept for the data (it can affect the estimated exponent). A better approach may be to omit those values or fit a nonlinear model. See the cautions described in this paper:

Model Selection and Logarithmic Transformation in Allometric Analysis
Gary C. Packard and Thomas J. Boardman
Physiological and Biochemical Zoology
Vol. 81, No. 4 (July/August 2008), pp. 496-507

Back to your sample data, here's my take on working with it:

# example data for testing

a1 <-c(0.2,1.9,0.1,0.2,0.8,22,111.3,19.9,23.9,138,42.3,54.2,0.9)
b1 <-c(1.8,28.2,0.3,12.4,3.2,81.1,122.1,2.9,37.2,98.9,21,28.7,1.8)
data1<- data.frame(a1,b1)

model1<- lm(log(a1+1)~log(b1+1))
model2<- lm(log(a1)~log(b1))
summary(model1)
summary(model2)

# Visualizing the data

# Quick and simple
qplot(data=data1, x=b1, y=a1, log="xy")

# With a linear regression fit to log (base 10) transformed data
p <- ggplot(data=data1, aes(x=log10(b1), y=log10(a1)))
p+geom_point()+stat_smooth(method="lm")

# Using the original units and log scales
p2 <- ggplot(data=data1, aes(x=b1, y=a1))
p2+geom_point()+stat_smooth(method="lm")+scale_y_log10()+scale_x_log10()

# The best way to analyze these data may not actually involve
# log transformation, but rather a straight up nonlinear
# method (with thanks to Dennis Murphy for an example of how
# to plot 95% confidence bands for this model in ggplot2)

nonlinear.model <- nls(a1 ~ A * b1^B, data=data1, start=list(A=1, B=1))
summary(nonlinear.model)

#    Formula: a1 ~ A * b1^B
#    Parameters:
#    Estimate Std. Error t value Pr(>|t|)
#    A   0.5779     1.0061   0.574   0.5772
#    B   1.1052     0.3808   2.902   0.0144 *

#    Note that the exponent (B=1.1052) is the same
#    parameter estimated by the model2 above (cool!)

# Plotting confidence bands for a 'simple' nonlinear model in ggplot2
# Add predictions and margin of error (moe) to the original data frame dd

data1$yhat <- predict(nonlinear.model)   # the easy part...

# In this example, f(x; A, B) = A * x^B
# The partial derivatives of f with respect to the parameters are
# df/dA = x^B, df/dB = A * x^B * log(x)
# The derivative matrix required has the derivatives as column functions,
# => each row consists of the two derivatives evaluated at the x value
# and the estimated values of A and B

# Estimated derivative matrix at the LS estimates (note that u = a * x^b,
# where a and b are the LSEs) [needed for the estimated SE below]

v0 <- with(data1, cbind(yhat/coef(nonlinear.model)[1], yhat * log(b1)))

# Estimated vector of margins of error for the estimated expectation
# function f-hat:
# note: supply correct degrees of freedom in qt()
data1$moe <- qt(.975, 11) * sqrt(diag(v0 %*% vcov(nonlinear.model) %*% t(v0)))

g <- ggplot(data1)
g + geom_point(aes(x = b1, y = a1), size = 2.5) +
    geom_ribbon(aes(x = b1, y = yhat, ymin = yhat - moe, ymax = yhat + moe),
                fill = alpha('gray20', 0.1)) +
    geom_line(aes(x = b1, y = yhat), color = 'red', size = 1) +
    geom_line(aes(x = b1, y = yhat - moe), color = 'blue', linetype = 'dashed', size = 0.5) +
    geom_line(aes(x = b1, y = yhat + moe), color = 'blue', linetype = 'dashed', size = 0.5)

James

PS: The last part of the code above is based on an answer by Dennis last year:
http://groups.google.com/group/ggplot2/msg/446b89687448bbc7

--
You received this message because you are subscribed to the ggplot2 mailing list.
Please provide a reproducible example: http://gist.github.com/270442

To post: email ggp...@googlegroups.com
To unsubscribe: email ggplot2+u...@googlegroups.com
More options: http://groups.google.com/group/ggplot2

--
________________________
School of Life Sciences
Arizona State University
PO Box 874601
Tempe, AZ 85287-4601
cell: (480) 388-0728
http://solsgrads.asu.edu/waters

Reply all

Reply to author

Forward