Plotting lines between 2 data points from different columns


Sean

Jan 26, 2011, 11:23:36 AM
to ggplot2
Hi ggplot users,

I've searched within this group on this topic and couldn't find
something close.
I'm trying to plot lines between data points from different columns to
see how drug treatment affects the pre- and post-treatment values. A
sample data frame is below along with the ggplot commands I'm using:

library(reshape)  # melt() comes from reshape (reshape2 also provides it)
library(ggplot2)

df <- data.frame(rep(c("male", "female"), 5), rep(c("Drug1", "Drug2"), 5),
                 pre_D1=c(10:1), post_D1=c(1:10), pre_D2=c(25:16),
                 post_D2=c(16:25))
colnames(df)[1:2] <- c("Sex", "Drug")
dfm <- melt(df, id=1:2)
ggplot(dfm, aes(variable, value)) + geom_point() + facet_grid(Sex ~ Drug)

I'm trying to draw specific lines just between the pre_drug and
post_drug values for different category. I've tried using
+ geom_line(aes(group =1))
but it only resulted in a single line drawn from beginning to end.

Is there a way that I can work this out with ggplot?

Thanks for your help.

Sean Ma
Univ of Michigan

Scott Chamberlain

Jan 26, 2011, 11:49:54 AM
to Sean, ggplot2
I found this example on the ggplot2 Google group:

myData <- structure(list(treatment = c(0L, 0L, 100L, 100L, 200L, 200L, 
200L, 600L, 0L, 100L, 600L), time = structure(c(1L, 3L, 2L, 3L, 
1L, 2L, 3L, 2L, 2L, 1L, 1L), .Label = c("1a", "1b", "1c"), class = "factor"), 
    mean = c(8.351316, 8.200078, 10.355362, 8.709843, 10.336664, 
    10.404017, 8.984964, 10.519717, 7.955357, 10.301795, 10.35637 
    ), uci = c(8.592008, 8.454073, 10.65889, 9.105403, 10.62428, 
    10.754237, 9.425563, 10.710556, 8.402852, 10.301795, 10.530051 
    ), lci = c(8.110625, 7.946084, 10.051834, 8.314283, 10.049048, 
    10.053798, 8.544365, 10.328879, 7.507862, 10.301795, 10.182688 
    )), .Names = c("treatment", "time", "mean", "uci", "lci"), class = 
"data.frame", row.names = c(NA, 
-11L)) 

q <- ggplot(myData) 
q + geom_pointrange(aes(x= factor(time), y=mean, ymin=lci, ymax=uci, 
colour=factor(treatment)), width = 1.2, size = 1.2) + 
geom_line(aes(x = factor(time), y = mean, group=1), colour = 'blue') + 
facet_grid(. ~ treatment) + 
scale_colour_discrete('Treatment') 



Ista Zahn

Jan 26, 2011, 12:04:28 PM
to Sean, ggplot2
Here is one way:

dfm$D <- "D1"
dfm$D[grep("D2", dfm$variable)] <- "D2"


ggplot(dfm, aes(variable, value)) + geom_point() +
  facet_grid(Sex ~ Drug) +
  stat_summary(fun.y = mean, geom="line", aes(group=D))
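
A rough sketch of what that summary line connects: the mean of each variable within each facet, joined separately for the D1 and D2 columns. It uses base R only, and the long data frame is built by hand as a stand-in for melt(), so treat the construction as an assumption about what dfm looks like rather than the code from the post.

```r
# Sample data from the original post.
df <- data.frame(Sex = rep(c("male", "female"), 5),
                 Drug = rep(c("Drug1", "Drug2"), 5),
                 pre_D1 = 10:1, post_D1 = 1:10,
                 pre_D2 = 25:16, post_D2 = 16:25)

# Hand-rolled long format (stand-in for melt(df, id = 1:2)).
measures <- c("pre_D1", "post_D1", "pre_D2", "post_D2")
dfm <- data.frame(Sex = rep(df$Sex, times = length(measures)),
                  Drug = rep(df$Drug, times = length(measures)),
                  variable = factor(rep(measures, each = nrow(df)),
                                    levels = measures),
                  value = unlist(df[measures], use.names = FALSE))

# Same D1/D2 flag as in the reply above.
dfm$D <- ifelse(grepl("D2", dfm$variable), "D2", "D1")

# One mean per facet (Sex x Drug) and variable: these are the points that
# the summary line joins, two per D group in each panel.
means <- aggregate(value ~ Sex + Drug + D + variable, data = dfm, FUN = mean)
print(means)
```

In this sample data Sex and Drug always occur together (male with Drug1, female with Drug2), so only two of the four facet panels contain points.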

Best,
Ista

> --
> You received this message because you are subscribed to the ggplot2 mailing list.
> Please provide a reproducible example: http://gist.github.com/270442
>
> To post: email ggp...@googlegroups.com
> To unsubscribe: email ggplot2+u...@googlegroups.com
> More options: http://groups.google.com/group/ggplot2
>

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

Brian Diggs

Jan 26, 2011, 2:36:11 PM
to ggplot2


On Jan 26, 8:23 am, Sean <tehsh...@gmail.com> wrote:
> Hi ggplot users,
>
> I've searched within this group on this topic and couldn't find
> something close.
> I'm trying to plot lines between data points from different columns to
> see how drug treatment effects pre and post condition. A sample data
> frame is below along with the ggplot commands I'm using:
>
> df <- data.frame(rep(c("male", "female"), 5), rep(c("Drug1", "Drug2"),
> 5), pre_D1=c(10:1), post_D1=c(1:10), pre_D2=c(25:16),
> post_D2=c(16:25))
> colnames(df)[1:2] <- c("Sex", "Drug")
> dfm <- melt(df, id=1:2)
> ggplot(dfm, aes(variable, value)) + geom_point() + facet_grid(Sex ~
> Drug)
>
> I'm trying to draw specific lines just between the pre_drug and
> post_drug values for different category. I've trying using
> + geom_line(aes(group =1))
> but it only resulted drawing a single line from beginning to end.

I am not sure what you mean by "for different category". Should there
be 10 lines, each one corresponding to a row in your initial df data?
If so, you lost that information when you melted because there is not
a variable that identifies the row. In this example, I've created an
additional column ID which is unique across rows in df, and kept it
when melting into dfm. This is then the grouping aesthetic for the
lines. Is this what you wanted?

df <- data.frame(ID=1:10,
                 Sex=rep(c("male", "female"), 5),
                 Drug=rep(c("Drug1", "Drug2"), 5),
                 pre_D1=c(10:1),
                 post_D1=c(1:10),
                 pre_D2=c(25:16),
                 post_D2=c(16:25))
dfm <- melt(df, id=c("ID", "Sex", "Drug"))
ggplot(dfm, aes(variable, value)) +
  geom_point() +
  geom_line(aes(group=ID)) +
  facet_grid(Sex ~ Drug)
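
A quick base R check, offered only as a sketch, of what grouping by ID alone implies: every ID owns four points (pre_D1, post_D1, pre_D2, post_D2), so each line runs across all four x positions instead of joining just one pre/post pair. The long data frame is built by hand here as an assumed equivalent of the melt() call above.

```r
# Sample data from the reply above, with the ID column kept.
df <- data.frame(ID = 1:10,
                 Sex = rep(c("male", "female"), 5),
                 Drug = rep(c("Drug1", "Drug2"), 5),
                 pre_D1 = 10:1, post_D1 = 1:10,
                 pre_D2 = 25:16, post_D2 = 16:25)

# Hand-rolled long format (assumed equivalent of
# melt(df, id = c("ID", "Sex", "Drug"))).
measures <- c("pre_D1", "post_D1", "pre_D2", "post_D2")
dfm <- data.frame(ID = rep(df$ID, times = length(measures)),
                  Sex = rep(df$Sex, times = length(measures)),
                  Drug = rep(df$Drug, times = length(measures)),
                  variable = factor(rep(measures, each = nrow(df)),
                                    levels = measures),
                  value = unlist(df[measures], use.names = FALSE))

# Number of points each group = ID line has to pass through.
points_per_line <- table(dfm$ID)
print(points_per_line)
```

If only the pre/post pairs within D1 and within D2 should be joined, the grouping variable needs to distinguish D1 from D2 as well as the ID.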

> Is there a way that I can work this out with ggplot?
>
> Thanks for your help.
>
> Sean Ma
> Univ of Michigan

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University

Sean T Ma

Jan 26, 2011, 1:32:00 PM
to ggplot2
Thanks for your quick reply. 

Drawing lines between means is actually not what I need but I thank you for pointing that out. What I actually need are lines connecting the pre and post treatment for each individual subject in that specific drug group rather than the mean point for that drug group.  

Inspired by Ista's reply, I thought recoding each individual into a new group by Sex + Drug would be a good idea, but it didn't work out. 

df$code <- paste(df$Sex, df$Drug, sep = "_")
dfmm <- melt(df, id = c(1:2, 7))
ggplot(dfmm, aes(variable, value)) + geom_point() +
  facet_grid(Sex ~ Drug) + geom_line(aes(group=code))

Any other suggestions?



Best,

Sean T. Ma
Postdoc Fellow
Univ of Michigan

Ista Zahn

Jan 26, 2011, 3:57:17 PM
to Sean T Ma, ggplot2
Hi again Sean,
As Brian suggested, you're going to want to keep an ID column so you
can tell ggplot which pairs of points to connect with lines. I'm also
still not sure exactly what you want -- either Brian's solution, or
perhaps something like

df <- data.frame(rep(c("male", "female"), 5), rep(c("Drug1", "Drug2"),
5), pre_D1=c(10:1), post_D1=c(1:10), pre_D2=c(25:16),
post_D2=c(16:25))
colnames(df)[1:2] <- c("Sex", "Drug")

df$ID <- factor(1:10)
dfm <- melt(df, id=c(1:2, 7))

dfm$d <- "D1"
dfm$d[grep("D2", dfm$variable)] <- "D2"
dfm$code <- paste(dfm$ID, dfm$d, sep="_")

ggplot(dfm, aes(variable, value)) + geom_point() +
  facet_grid(Sex~Drug) + geom_line(aes(group=code))

Best,
Ista

Brian Diggs

Jan 26, 2011, 5:02:26 PM
to Sean T Ma, ggplot2
On 1/26/2011 12:33 PM, Sean T Ma wrote:
> Hi all,
>
> I've uploaded a mock drawing of what I meant by connecting lines
> between the pre-treatment and post-treatment.
> https://docs.google.com/leaf?id=0B7APsK8xS4QtZGMzMmY2MTAtMjU3ZS00YTQ5LWJmMzgtYWQ1MWI0MzU4ZDJl&sort=name&layout=list&num=50
>
> Hopefully this will help better describe what is needed. Much
> thanks!

I'm not sure your clarification made it to the whole group (didn't see
it just now), but Ista's (second) solution does do what you want. Here
is a slightly different approach that gives the exact same plot that
Ista's solution does.

df <- data.frame(ID=factor(1:10),
                 Sex=rep(c("male", "female"), 5),
                 Drug=rep(c("Drug1", "Drug2"), 5),
                 pre_D1=c(10:1),
                 post_D1=c(1:10),
                 pre_D2=c(25:16),
                 post_D2=c(16:25))
dfm <- melt(df, id=c("ID", "Sex", "Drug"))

dfm <- cbind(dfm,
             colsplit(dfm$variable, "_", names=c("timing","D")))

ggplot(dfm, aes(variable, value)) +
  geom_point() +
  geom_line(aes(group=interaction(ID, D))) +
  facet_grid(Sex ~ Drug)


The only real difference between our solutions (aside from trivial
indenting/formatting) is how the D1/D2 information is extracted into its
own column (assignments with grep versus using colsplit) and how the
ultimate line grouping is specified (a separate variable combining ID
and D versus using interaction in the group specification avoiding
creating a new variable)
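
For anyone comparing the two approaches, here is a hedged base R sketch of the same idea: split the variable name into its timing and D parts, then group by ID and D together so that every line has exactly two points. strsplit() stands in for colsplit(), the long data frame is built by hand instead of with melt(), and the column name pair is made up for illustration.

```r
df <- data.frame(ID = factor(1:10),
                 Sex = rep(c("male", "female"), 5),
                 Drug = rep(c("Drug1", "Drug2"), 5),
                 pre_D1 = 10:1, post_D1 = 1:10,
                 pre_D2 = 25:16, post_D2 = 16:25)

# Hand-rolled long format (assumed equivalent of the melt() call above).
measures <- c("pre_D1", "post_D1", "pre_D2", "post_D2")
dfm <- data.frame(ID = rep(df$ID, times = length(measures)),
                  Sex = rep(df$Sex, times = length(measures)),
                  Drug = rep(df$Drug, times = length(measures)),
                  variable = factor(rep(measures, each = nrow(df)),
                                    levels = measures),
                  value = unlist(df[measures], use.names = FALSE))

# Split "pre_D1" into timing = "pre" and D = "D1".
parts <- strsplit(as.character(dfm$variable), "_", fixed = TRUE)
dfm$timing <- vapply(parts, `[`, character(1), 1)
dfm$D <- vapply(parts, `[`, character(1), 2)

# One group per subject and D: each should hold one pre and one post point.
dfm$pair <- interaction(dfm$ID, dfm$D)
stopifnot(all(table(dfm$pair) == 2))

# Plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dfm, aes(variable, value)) +
    geom_point() +
    geom_line(aes(group = pair)) +
    facet_grid(Sex ~ Drug)
  print(p)
}
```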


Sean T Ma

Jan 27, 2011, 12:40:19 AM
to Brian Diggs, ggplot2
Thank you so much for the help!! I love this community!!



Best,

Sean T. Ma
Univ of Michigan



Sean T Ma

Jan 26, 2011, 3:33:08 PM
to Brian Diggs, ggplot2
Hi all, 

I've uploaded a mock drawing of what I meant by connecting lines between the pre-treatment and post-treatment. 
Hopefully this will help better describe what is needed. 
Much thanks!



Sean T. Ma
Univ of Michigan




skyjo

Feb 25, 2011, 4:34:50 PM
to ggplot2
Hi all-
I wanted to do something very similar to this, and this thread helped
me get most of the way there. Thanks. Here's my plot:

subj<-c(1:50)
trials1<-floor(runif(50,1,13))
p1<-rbinom(50,trials1,.30)/trials1
trials2<-floor(runif(50,1,13))
p2<-rbinom(50,trials2,.70)/trials2

id<-rep(subj,2)
time<-c(rep(1,50),rep(2,50))
trials<-c(trials1,trials2)
p<-c(p1,p2)

w<-data.frame(id,time,trials,p)

ggplot(w, aes(w[,2], w[,4], size=w[,3])) + geom_point() +
geom_line(aes(group = w[,1]), size=0.25)


However, I want to add one final wrinkle. I'd like to color each
individual line based on the slope of that line. If p1 is the
proportion of successes at time point 1, and p2 is the proportion of
successes at time point 2, then a person with (p1, p2) = (0, 1) would
have, say, a dark blue line. Whereas someone with (p1, p2) = (1, 0)
would have a dark red line. Any line with a slope in between these two
extremes would have a color that falls somewhere in the middle of the
blue:red spectrum. Doable? I used a 'long' dataset to create what I
have. If I used 'wide' dataset, I could add a difference variable,
DIFF=P2-P1, then specify 'Color=DIFF'. But then my 'group = w[,1]'
argument in geom_line() would no longer be valid... Suggestions?

Thanks,
Skyler




Ista Zahn

Feb 25, 2011, 5:38:05 PM
to skyjo, ggplot2
At first I thought this would be a one-liner, but it is a little more
complicated than that. Here is one solution:

library(plyr)  # provides ddply()

# One row per id holding the change in p from time 1 to time 2
d <- ddply(w, .(id), function(df) {
  r <- df[df$time==2, "p"] - df[df$time==1, "p"]
  d <- data.frame(id = unique(df$id), diff=r)
  return(d)})
w <- merge(w, d)
ggplot(w, aes(time, p, size=trials)) + geom_point() +
  geom_line(size=0.25, aes(group=id, color=diff))
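
An alternative sketch that skips the separate summary table: base R's ave() can write the per-id difference straight onto every row. The fixed seed and the red/grey/blue colour choices are assumptions made for illustration; scale_colour_gradient2() is one way to get a two-ended colour ramp centred on zero.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

subj <- 1:50
trials1 <- floor(runif(50, 1, 13))
p1 <- rbinom(50, trials1, 0.30) / trials1
trials2 <- floor(runif(50, 1, 13))
p2 <- rbinom(50, trials2, 0.70) / trials2

w <- data.frame(id = rep(subj, 2),
                time = rep(1:2, each = 50),
                trials = c(trials1, trials2),
                p = c(p1, p2))

# Signed sum within each id: +p at time 2 and -p at time 1, i.e. p2 - p1.
# The result is repeated on both rows of the id, so no merge is needed.
w$diff <- ave(ifelse(w$time == 2, w$p, -w$p), w$id, FUN = sum)

# Plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  plt <- ggplot(w, aes(time, p)) +
    geom_point(aes(size = trials)) +
    geom_line(aes(group = id, colour = diff)) +
    # Falling lines (diff < 0) shade to red, rising lines to blue.
    scale_colour_gradient2(low = "red", mid = "grey80", high = "blue",
                           midpoint = 0)
  print(plt)
}
```

Because the assignment overwrites w$diff in place, running it again leaves the number of rows unchanged.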

Best,
Ista

skyjo

Feb 28, 2011, 11:05:40 AM
to ggplot2
Yes this gives the results I was wanting. Thank you much. However:

> dim(w)
[1] 1638400 5
> dim(d)
[1] 6400 2


Why are the dimensions so large?

Thanks,
Skyler

Ista Zahn

Feb 28, 2011, 11:28:27 AM
to skyjo, ggplot2
I don't know, because you are doing something different from what I
posted, and you didn't tell me what it was. Here is the complete
example:

subj<-c(1:50)
trials1<-floor(runif(50,1,13))
p1<-rbinom(50,trials1,.30)/trials1
trials2<-floor(runif(50,1,13))
p2<-rbinom(50,trials2,.70)/trials2

id<-rep(subj,2)


time<-c(rep(1,50),rep(2,50))
trials<-c(trials1,trials2)
p<-c(p1,p2)

w<-data.frame(id,time,trials,p)

library(ggplot2)
library(plyr)  # provides ddply()

d <- ddply(w, .(id), function(df) {
  r <- df[df$time==2, "p"] - df[df$time==1, "p"]
  d <- data.frame(id = unique(df$id), diff=r)
  return(d)})
w <- merge(w, d)

ggplot(w, aes(time, p, size=trials)) + geom_point() +
  geom_line(size=0.25, aes(group=id, color=diff))

dim(w)
[1] 100 5

Best,
Ista

skyjo

Feb 28, 2011, 11:27:59 AM
to ggplot2
NEVERMIND! Disregard that last comment. The large dimensions were from
rerunning the code several times. Each run increased the size of the
datasets.

Thanks again, Ista.

Thanks,
Skyler

skyjo

Feb 28, 2011, 3:49:55 PM
to ggplot2
Hmmm..... Actually rerunning the code repeatedly doesn't increase the
dataset size. I'm not sure what I did to get such large dimensions. Oh
well.
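
For what it is worth, one general way a merge can inflate a data frame is a lookup table with repeated keys, since merge() returns a row for every matching pair. The sketch below uses small made-up tables and a hypothetical helper, safe_merge(), to show a guard against that. It is not a diagnosis of what happened in this case.

```r
# Made-up long data: three ids measured at two time points.
w <- data.frame(id = rep(1:3, 2),
                time = rep(1:2, each = 3),
                p = c(0.1, 0.5, 0.9, 0.4, 0.5, 0.2))

# Lookup with one row per id, and a faulty copy with every id twice.
d_ok <- data.frame(id = 1:3, diff = c(0.3, 0.0, -0.7))
d_dup <- rbind(d_ok, d_ok)

print(nrow(merge(w, d_ok)))   # row count is preserved
print(nrow(merge(w, d_dup)))  # row count doubles

# Hypothetical helper: refuse to merge when the lookup keys are not unique
# or when the merge changes the number of rows.
safe_merge <- function(x, lookup, key) {
  stopifnot(anyDuplicated(lookup[[key]]) == 0)
  out <- merge(x, lookup, by = key)
  stopifnot(nrow(out) == nrow(x))
  out
}

w_checked <- safe_merge(w, d_ok, "id")
```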