Two scatterplots with groups and a trendline


Ace

Apr 8, 2012, 12:27:44 AM
to ggplot2
Okay, I have four CSV files. Two have 3 columns (name, control,
experiment value (cat/dog)) and two have 2 columns (name, group).

My goal is two scatterplots arranged next to each other, each with one
trendline and with the points colored by group. Here is my code.


info <- read.csv(file.choose())
info2<-read.csv(file.choose())
responseCATS<-read.csv(file.choose())
responseDOGS<-read.csv(file.choose())

library(ggplot2)
library(grid)

infomerge <- merge(info, responseCATS, by="name")
infomerge2 <- merge(info2, responseDOGS, by="name")

df <- data.frame(siControl = infomerge$value_siCONTROL,
                 siCATS = infomerge$value_siCATS,
                 namelab = infomerge$name, Group = infomerge$Group)

df2 <- data.frame(siControl = infomerge2$value_siCONTROL,
                  siDOGS = infomerge2$value_siDOGS,
                  namelab = infomerge2$name, Group = infomerge2$Group)


plot1 <- ggplot(df, aes(x=siControl, y=siCATS, colour=Group)) +
geom_point()+ geom_smooth(method="lm")

plot2 <- ggplot(df2, aes(x=siControl, y=siDOGS, colour=Group)) +
geom_point()+ geom_smooth(method="lm")



vp.layout <- function(x, y) viewport(layout.pos.row = x, layout.pos.col = y)

arrange <- function(..., nrow = NULL, ncol = NULL, as.table = FALSE) {
  dots <- list(...)
  n <- length(dots)
  if (is.null(nrow) & is.null(ncol)) {
    nrow <- floor(n / 2)
    ncol <- ceiling(n / nrow)
  }
  if (is.null(nrow)) { nrow <- ceiling(n / ncol) }
  if (is.null(ncol)) { ncol <- ceiling(n / nrow) }
  ## NOTE see n2mfrow in grDevices for possible alternative
  grid.newpage()
  pushViewport(viewport(layout = grid.layout(nrow, ncol)))
  ii.p <- 1
  for (ii.row in seq(1, nrow)) {
    ii.table.row <- ii.row
    if (as.table) { ii.table.row <- nrow - ii.table.row + 1 }
    for (ii.col in seq(1, ncol)) {
      ii.table <- ii.p
      if (ii.p > n) break
      print(dots[[ii.table]], vp = vp.layout(ii.table.row, ii.col))
      ii.p <- ii.p + 1
    }
  }
}
arrange(plot1,plot2,ncol=1)

The problem is that instead of one trendline per graph I'm getting
several. Also, it seems that only a few of the points are showing up on
the graph instead of all of them. Is there anything else wrong with the
code that you can see? Thanks

Brandon Hurr

Apr 8, 2012, 4:59:55 AM
to Ace, ggplot2
You're doing a lot of stuff here and some of it looks unnecessary, but it's hard to tell without the original data. Could you please provide sample datasets or dput(df) and dput(df2)? 

It seems to me that you could rbind() df and df2 together and use faceting to get your side-by-side plots so you don't need to use the viewports. 
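A minimal sketch of the rbind() plus faceting idea, assuming df and df2 look like the data frames built in the first post. The CATS values are Ace's sample points from later in the thread; the DOGS values are invented purely for illustration, and the shared column name `response` and the `experiment` label are hypothetical. facet_wrap(nrow = 1) puts the two panels side by side without any viewport code.

```r
library(ggplot2)

# Stand-ins for df and df2 from the first post. The CATS values are Ace's
# sample points; the DOGS values are invented for illustration only.
df <- data.frame(siControl = 0:3, siCATS = c(3, 1, 3, 2),
                 Group = factor(c(1, 1, 2, 2)))
df2 <- data.frame(siControl = 0:3, siDOGS = c(1, 2, 2, 4),
                  Group = factor(c(1, 2, 1, 2)))

# rbind() needs matching column names, so give the response a common
# (hypothetical) name and label each row with its experiment.
cats <- data.frame(siControl = df$siControl, response = df$siCATS,
                   Group = df$Group, experiment = "CATS")
dogs <- data.frame(siControl = df2$siControl, response = df2$siDOGS,
                   Group = df2$Group, experiment = "DOGS")
both <- rbind(cats, dogs)

# One panel per experiment in a single row. colour is mapped only in
# geom_point(), so each panel gets one regression line.
p <- ggplot(both, aes(x = siControl, y = response)) +
  geom_point(aes(colour = Group)) +
  geom_smooth(method = "lm", formula = y ~ x) +
  facet_wrap(~ experiment, nrow = 1)

print(p)
```

Because both panels come from one data frame, they share the legend and axis scales, which the viewport approach has to manage separately.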

The reason why you're getting multiple lines per plot is that your geom_smooth() is inheriting colour=Group from your ggplot() call. At least that is my theory in the absence of data. A way around this would be... 

plot1 <- ggplot() +
geom_point(df, aes(x=siControl, y=siCATS, colour=Group)) +
geom_smooth(df, aes(x=siControl, y=siCATS), method="lm")

And the same for plot2... 
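To check the inheritance theory, here is a small sketch using Ace's four sample points (posted later in the thread) with Group stored as a factor. It builds the plot both ways and counts the fitted lines in the smooth layer with layer_data() from current ggplot2. se = FALSE is used only because each group has just two points, which is too few for a confidence band.

```r
library(ggplot2)

# Ace's four sample points, with Group stored as a factor.
df <- data.frame(siControl = 0:3, siCATS = c(3, 1, 3, 2),
                 Group = factor(c(1, 1, 2, 2)))

# colour = Group in ggplot() is inherited by every layer, including
# geom_smooth(), which then fits one line per group.
p_global <- ggplot(df, aes(x = siControl, y = siCATS, colour = Group)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Mapping colour only in geom_point() leaves geom_smooth() ungrouped,
# so it fits a single line through all the points.
p_local <- ggplot(df, aes(x = siControl, y = siCATS)) +
  geom_point(aes(colour = Group)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Count the distinct fitted lines in the smooth layer (layer 2).
n_lines <- function(p) length(unique(layer_data(p, 2)$group))
n_lines(p_global)  # 2
n_lines(p_local)   # 1
```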

Brandon


--
You received this message because you are subscribed to the ggplot2 mailing list.
Please provide a reproducible example: http://gist.github.com/270442

To post: email ggp...@googlegroups.com
To unsubscribe: email ggplot2+u...@googlegroups.com
More options: http://groups.google.com/group/ggplot2

Ace

Apr 8, 2012, 9:35:15 PM
to ggplot2


On Apr 8, 1:59 am, Brandon Hurr <brandon.h...@gmail.com> wrote:
> You're doing a lot of stuff here and some of it looks unnecessary, but it's
> hard to tell without the original data. Could you please provide sample
> datasets or dput(df) and dput(df2)?
>
> It seems to me that you could rbind() df and df2 together and use faceting
> to get your side-by-side plots so you don't need to use the viewports.
>
> The reason why you're getting multiple lines per plot is that your
> geom_smooth() is inheriting colour=Group from your ggplot() call. At least
> that is my theory in the absence of data. A way around this would be...
>
> plot1 <- ggplot() +
> geom_point(df, aes(x=siControl, y=siCATS, colour=Group)) +
> geom_smooth(df, aes(x=siControl, y=siCATS), method="lm")
>
> And the same for plot2...
>
> Brandon
>


When I switch to that syntax I get 'ggplot2 doesn't know how to deal
with data of class uneval'

I'm also trying to add point labels to the graph but it isn't working
either

plot1 <- ggplot() +
  geom_point(df, aes(x=siControl, y=siCATS, colour=Group)) +
  geom_text(size=2, aes(x=siControl, y=siCATS2, label=name)) +
  geom_smooth(df, aes(x=siControl, y=siCATS), method="lm")
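A likely cause of the uneval error: in ggplot2 the first argument of geom_point(), geom_text() and geom_smooth() is mapping, not data, so geom_point(df, aes(...)) hands df to mapping and the aes() object (class "uneval" in ggplot2 of that era) to data. The exact error text differs between ggplot2 versions. Below is a sketch of the same plot with the arguments named, assuming df has the columns namelab, siControl, siCATS and Group as built in the first post; the values are Ace's sample points. The label layer also needs data = df and an existing column (namelab rather than name).

```r
library(ggplot2)

# Ace's sample points, with the column names used in the first post.
df <- data.frame(namelab = c("Point1", "Point2", "Point3", "Point4"),
                 siControl = 0:3, siCATS = c(3, 1, 3, 2),
                 Group = factor(c(1, 1, 2, 2)))

# Name the arguments: the first positional argument of a geom_*() is
# `mapping`, so data must be passed as data = df. Because ggplot() itself
# has no data here, every layer needs its own data = df.
p <- ggplot() +
  geom_point(data = df, mapping = aes(x = siControl, y = siCATS, colour = Group)) +
  geom_text(data = df, mapping = aes(x = siControl, y = siCATS, label = namelab),
            size = 2) +
  geom_smooth(data = df, mapping = aes(x = siControl, y = siCATS),
              method = "lm", formula = y ~ x)

print(p)
```

Passing data = df once in ggplot() and keeping only colour inside geom_point() would be shorter. The explicit per-layer form is shown here because it mirrors the syntax that raised the error.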



Sample data:

info (3 columns: Name, siControl, siCATS)
Name    siControl  siCATS
Point1  0          3
Point2  1          1
Point3  2          3
Point4  3          2

ResponseCATS (2 columns: Name, Group)
Name    Group
Point1  1
Point2  1
Point3  2
Point4  2


Brandon Hurr

Apr 9, 2012, 6:02:36 AM
to Ace, ggplot2
The easier it is for us to replicate what you're doing on your machine, the easier it is for us to help you. Replicating what you're doing wasn't easy with the information given; siCATS2 didn't exist, for example. If you use dput() on your data, we can replicate your situation exactly. 

I'm not sure if I got there in the end or not, but this is what I got to. 

infomerge<-structure(list(Name = structure(1:4, .Label = c("Point1", "Point2", "Point3", "Point4"), class = "factor"), siControl = 0:3, siCATS = c(3L, 1L, 3L, 2L), group = c(1L, 1L, 2L, 2L)), .Names = c("Name", "siControl", "siCATS", "group"), row.names = c(NA, -4L), class = "data.frame")

ggplot(data=infomerge, aes(x=siControl, y=siCATS)) +
  geom_point(aes(colour=as.factor(group))) +
  geom_text(aes(label=Name), size=2) +
  geom_smooth(method="lm")

Previously it was treating "Group" as numeric and giving it a continuous colour scale, but if the groups are categories (and I think they are), then as.factor() is what you need. You'll have to clean up the legend, but I think that's what you're after: one smooth line for all the data, different colors for the points, and labels for the points. 
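One possible way to tidy the legend, sketched with the same four rows: convert group to a factor once in the data frame rather than inside aes(). The colour scale is then discrete and the legend title is just the column name. The new column name Group is an assumption; labs(colour = "Group") on the original call would be another option.

```r
library(ggplot2)

# The four rows rebuilt with dput() above, written out with data.frame().
infomerge <- data.frame(Name = factor(c("Point1", "Point2", "Point3", "Point4")),
                        siControl = 0:3, siCATS = c(3L, 1L, 3L, 2L),
                        group = c(1L, 1L, 2L, 2L))

# group holds integers, so colour = group would get a continuous gradient
# scale. Converting it once in the data gives a discrete scale and a legend
# titled "Group" instead of "as.factor(group)".
infomerge$Group <- factor(infomerge$group)

p <- ggplot(infomerge, aes(x = siControl, y = siCATS)) +
  geom_point(aes(colour = Group)) +
  geom_text(aes(label = Name), size = 2) +
  geom_smooth(method = "lm", formula = y ~ x)

print(p)
```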

[attached image: output.png]

A D

Apr 10, 2012, 3:22:37 AM
to Brandon Hurr, ggplot2
Sorry, siCATS2 was a typo. Also, I wasn't able to get dput() to work
properly, but I've attached a larger alternative dataset.


>
> I'm not sure if I got there in the end or not, but this is what I got to.
>
> infomerge<-structure(list(Name = structure(1:4, .Label = c("Point1",
> "Point2", "Point3", "Point4"), class = "factor"), siControl = 0:3, siCATS =
> c(3L, 1L, 3L, 2L), group = c(1L, 1L, 2L, 2L)), .Names = c("Name",
> "siControl", "siCATS", "group"), row.names = c(NA, -4L), class =
> "data.frame")

Yes, you have the right idea of what I'm aiming for, but I don't
understand what the above is for. I simply used the second part of
your solution.
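On the structure(...) question: that block is the kind of output dput() prints for a data frame. It is R code that rebuilds the exact object, which is why it is useful in a mailing-list post. Below is a small base-R sketch with a made-up two-row data frame. For a large dataset, something like dput(head(df, 20)) keeps the output short.

```r
# A made-up two-row data frame standing in for the merged data.
df <- data.frame(Name = c("Point1", "Point2"), siControl = c(0, 1),
                 siCATS = c(3, 1))

# dput() prints R code that recreates the object; that printed code is
# what gets pasted into a post.
dput(df)

# Evaluating the printed code rebuilds an identical copy of the data frame.
code <- paste(capture.output(dput(df)), collapse = "\n")
rebuilt <- eval(parse(text = code))
identical(rebuilt, df)  # TRUE
```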

geneexp <- read.csv(file.choose())
responseTCF7L2<-read.csv(file.choose())

geneexpmerge <- merge(geneexp, responseTCF7L2, by="gene")

df <- data.frame(siControl = geneexpmerge$value_siCONTROL,
                 siTCF7L2 = geneexpmerge$value_siTCF7L2,
                 genelab = geneexpmerge$gene, Group = geneexpmerge$Group)

ggplot(data=df, aes(x=siControl, y=siTCF7L2)) +
geom_smooth(method="lm" ) + coord_trans(x = "log", y = "log") +
geom_point(aes(colour=as.factor(Group))) +
geom_text(aes(label=genelab), size=2)

and it got me most of what I want, except that the trendline bends and
doesn't look like it's going through the points. I've tried different
things, but it's either use the regular scale and have the points crammed
together, or use a log scale and have the bendy trendline.
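A likely explanation for the bending line: coord_trans() transforms the axes after stat_smooth() has fitted lm() on the untransformed data, so a straight fit is drawn as a curve. scale_x_log10() and scale_y_log10() instead transform the data before the statistic is computed, so lm() is fitted to the log10 values and stays straight. Below is a sketch with hypothetical randomly generated data standing in for the attached CSV files.

```r
library(ggplot2)

# Hypothetical positive data spanning three orders of magnitude, standing in
# for the attached CSV files (which are not reproduced here).
set.seed(1)
df <- data.frame(siControl = 10^runif(30, 0, 3))
df$siTCF7L2 <- df$siControl * 10^rnorm(30, sd = 0.2)

# scale_*_log10() transforms x and y before stat_smooth() runs, so lm() is
# fitted to log10(siTCF7L2) ~ log10(siControl) and is drawn straight.
# coord_trans(x = "log", y = "log") transforms only after the fit, which is
# why a straight line fitted on the raw scale appears bent.
p <- ggplot(df, aes(x = siControl, y = siTCF7L2)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  scale_x_log10() +
  scale_y_log10()

print(p)

# The smooth layer's coordinates are on the log10 scale and lie on a line.
fitted_line <- layer_data(p, 2)
coef(lm(y ~ x, data = fitted_line))
```

The axes still show the original units with log spacing, so there is no need to relabel them by hand.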

[attachments: TCF7L2Sig.csv, ResponseTCF7L2.csv, trendline.png]

Brandon Hurr

Apr 10, 2012, 6:16:06 AM
to A D, ggplot2
You could log-transform your data before plotting it, and then your "lm" should fit that data pretty well. You'll have to play with the axes to get them to make sense. Either that, or fit a log model to your data and plot that on the log scale. Either way, you'll have to do this outside of ggplot. 
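A sketch of the first suggestion: log the columns outside ggplot and fit lm() on the logged values. Hypothetical random data stands in for the attached files, and the column names logControl and logTCF7L2 are made up. The last line checks that geom_smooth() draws the same line as the model fitted outside the plot.

```r
library(ggplot2)

# Hypothetical positive data (the attached CSV files are not reproduced here).
set.seed(1)
df <- data.frame(siControl = 10^runif(30, 0, 3))
df$siTCF7L2 <- df$siControl * 10^rnorm(30, sd = 0.2)

# Log-transform the columns first, outside ggplot.
df$logControl <- log10(df$siControl)
df$logTCF7L2 <- log10(df$siTCF7L2)

# A straight-line fit on the logged data, i.e. a power law on the raw scale.
fit <- lm(logTCF7L2 ~ logControl, data = df)
coef(fit)

# Plot the logged columns. The axes now read in log10 units, which is the
# "play with the axes" part.
p <- ggplot(df, aes(x = logControl, y = logTCF7L2)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "log10(siControl)", y = "log10(siTCF7L2)")

print(p)

# geom_smooth() draws the same line as the fit computed outside ggplot.
smooth_pts <- layer_data(p, 2)
all.equal(smooth_pts$y,
          unname(predict(fit, data.frame(logControl = smooth_pts$x))))
```

Fitting the model outside ggplot also gives direct access to the coefficients via coef(fit) and summary(fit).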