Scatter plot (two Y variables and one X variable) with regression line and equation


Lidwina Bertrand

Jul 30, 2015, 3:52:45 PM
to ggplot2
Dear all,
I need help making a scatter plot with ggplot2.
My data look like this:

S P SP
0,05146604 20,0546 20,11
0,03857084 20,0546 20,09
0,03121054 12,2288 12,26
0,30647028 12,5126 12,82
0,28482765 14,7220 15,01
0,50051247 45,0246 45,53
0,19338897 59,2978 59,49
1,2906287 86,3796 87,67
0,43907263 79,2731 79,71
0,53855983 77,6721 78,21
0,17065523 39,8464 40,02
0,22226842 10,8266 11,05
0,28976245 14,1776 14,47
0,82201466 13,1247 13,95
0,26347576 21,1123 21,38
0,59649658 27,6495 28,25
0,22606365 12,2622 12,49
0,32395152 24,2651 24,59
0,08511838 24,2651 24,35
0,02545399 15,4228 15,45
0,36901547 51,4263 51,80
0,47847889 37,6949 38,17
3,01828548 70,2802 73,30
0,27841585 58,9307 59,21
0,24509299 31,4420 31,69
0,51293635 6,3886 6,90
0,43961007 20,1171 20,56
0,3532001 10,2026 10,56
1,80320357 40,0416 41,84
0,08934678 15,9644 16,05
0,23344384 119,5648 119,80
0,88881239 86,8860 87,77
0,49942548 115,9956 116,49
0,80560236 38,8773 39,68
1,0598936 47,3107 48,37
6,4852884 166,4927 172,98
0,20561425 81,9291 82,13
1,64810016 197,9828 199,63
1,27276508 17,2545 18,53

I need a scatter plot with S (Y1) vs SP (X) in one dot color and P (Y2) vs SP (X) in another dot color, both in the same plot, with a regression line for each set of data and, finally, the equation of each line with its R².

Could you help me?

Thank you,

Lidwina

Dennis Murphy

Jul 30, 2015, 6:31:03 PM
to Lidwina Bertrand, ggplot2
Hi:

The general process is this:

1. Fit both of your models outside of ggplot2 and extract the coefficients from each.
2. Create a data frame that contains the x and y positions where you want to place the fitted equations, along with a third column that contains a text string representation of the fitted line using plotmath code. You can either write a function to generate this string or produce it manually; for only two lines, the manual approach probably takes less time. There should be one row in this data frame for each model.
3. Melt the data such that your two response variables (S and P) are stacked. This makes it easier to plot lines with different colors or line types, for example.
4. Create a ggplot using the result of the melt operation in step 3 as the input data. Use geom_smooth(method = "lm") to generate the two lines and geom_text() to insert the fitted line equations. You'll need to use parse = TRUE as an argument to geom_text() in order to convert the text string to a mathematical equation.

ggplot2 can generate the fitted lines internally, but it does not "remember" the model coefficients, so you can't use ggplot2 directly to get at the fitted coefficients. There is a way to get them after the fact, but it is not any less complicated than fitting the models beforehand and generating a data frame to create and position the fitted lines by hand.
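As a rough sketch of steps 1 and 2, something along these lines may work. It uses only the first six rows of the posted data for illustration, and the helper name lm_eqn, the choice of label positions and the use of plotmath's list() to place the equation and R² side by side are all assumptions of this sketch, not part of ggplot2 or of the advice above.

```r
# Illustration only: the first six rows of the posted data.
# dec = "," tells read.table() that the values use decimal commas.
DF <- read.table(header = TRUE, dec = ",", text = "
S P SP
0,05146604 20,0546 20,11
0,03857084 20,0546 20,09
0,03121054 12,2288 12,26
0,30647028 12,5126 12,82
0,28482765 14,7220 15,01
0,50051247 45,0246 45,53
")

# Hypothetical helper (not part of ggplot2): turn a fitted lm into a
# plotmath string of the form "list(hat(y) == a +b * x, R^2 == r2)".
lm_eqn <- function(fit) {
  b  <- coef(fit)
  r2 <- summary(fit)$r.squared
  # %+.3g prints the slope with an explicit sign, so a negative slope
  # reads as "a -b * x" rather than "a + -b * x".
  sprintf("list(hat(y) == %.3g %+.3g * x, R^2 == %.3g)", b[1], b[2], r2)
}

# Step 1: fit both models outside ggplot2.
fit_S <- lm(S ~ SP, data = DF)
fit_P <- lm(P ~ SP, data = DF)

# Step 2: one row per model. x and y are label positions in data units,
# chosen here (arbitrarily) as the left edge of the x range and the
# largest observed value of each response.
coefDF <- data.frame(
  variable = factor(c("S", "P"), levels = c("S", "P")),
  x   = min(DF$SP),
  y   = c(max(DF$S), max(DF$P)),
  eqn = c(lm_eqn(fit_S), lm_eqn(fit_P)),
  stringsAsFactors = FALSE
)
```

Because the labels are anchored at the left edge of the data, adding hjust = 0 and vjust = 1 to geom_text() should keep them inside the panel; adjust the positions to taste.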

Since your data are unreadable as is (it is never a good idea to copy/paste data from the R console or a spreadsheet into a message to a text-based list), you'll have to figure out several of these steps on your own, but here are a few lines of code to get you started, taking the name of your data frame as DF:

# Get the model coefficients
coef1 <- coef(lm(S ~ SP, data = DF))
coef2 <- coef(lm(P ~ SP, data = DF))

# Generate the data frame for the equations on your own
# See ?plotmath - you'll need at least hat(y), ~, == and +
# The examples of ?plotmath are helpful - there are also
# examples of how to do this with ggplot2 in the list archives.
# How you want to input the coefficients is up to you.
# For the first call to ggplot() below, assume this data frame is named
# coefDF and has columns x, y and eqn for the x-position, y-position and
# equation as text string, respectively.

# Melt the data
library(reshape2)
DFm <- melt(DF, id.vars = "SP")    # generates two new vars named variable and value

# Do the plot - puts both lines in the same panel
library(ggplot2)
ggplot(DFm, aes(x = SP, y = value, colour = variable)) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE, size = 1) +
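    # inherit.aes = FALSE: do not inherit colour = variable, which coefDF need not contain here
    geom_text(data = coefDF, aes(x = x, y = y, label = eqn), parse = TRUE, inherit.aes = FALSE) +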
    scale_colour_manual(values = c("darkorange", "blue"))

This will give an inane plot because the range of P is much wider than the range of S, so you're better off faceting the two plots into separate panels (no, ggplot2 does not produce multiple y-scales, by design). In this case, you need to add another column to the data frame containing the fitted equations: variable = factor(c("S", "P")), which identifies the y-variable in each row so that each equation is drawn in the matching panel. Then,

# separate plot for each line
ggplot(DFm, aes(x = SP, y = value, colour = variable)) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE, size = 1) +
    geom_text(data = coefDF, aes(x = x, y = y, label = eqn), parse = TRUE) +
    scale_colour_manual(values = c("darkorange", "blue"))  +
    facet_wrap(~ variable, ncol = 1, scales = "free_y")

Getting the data frame for the fitted equations is the hard part: you'll have to decide where to position each equation and you have to remember that ggplot2 expects correct plotmath code encased in a text string. The argument parse = TRUE parses the string into a plotmath expression before plotting the result.
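Since a malformed label only fails once the plot is drawn, it can help to check the strings beforehand. The following is a minimal sketch; parses_ok is a made-up helper name, and it only checks that the string is syntactically valid R, not that it renders the way you intend.

```r
# Hypothetical helper: TRUE if the string can be parsed as an R
# expression, which is the first thing parse = TRUE needs from a label.
parses_ok <- function(s) {
  !inherits(try(parse(text = s), silent = TRUE), "try-error")
}

parses_ok("hat(y) == 0.05 + 0.01 * x")               # TRUE
parses_ok("hat(y) == 0.05 + 0.01 x")                 # FALSE: no operator before x
parses_ok("list(hat(y) == 0.05 + 0.01 * x, R^2 == 0.9)")  # TRUE
```

A string can pass this check and still look wrong: hat[y], for example, parses fine but is drawn as the word "hat" with a subscript y rather than as y with a hat, so it is still worth looking at the rendered plot.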

Dennis


Vivek Patil

Jul 31, 2015, 7:35:06 AM
to Dennis Murphy, Lidwina Bertrand, ggplot2