Legend of mixed line and point plots

Keith

Jul 13, 2011, 11:24:19 AM

to ggplot2

Dear all,

I would like to plot a mixture of lines and points in one plot. It
works, but I think the legend is too smart: it merges the points
and lines into points+lines for all data sets :-)

Here is a simple example:
The data contains 4 columns. After reading the data into a data frame
named data, I would like to plot F1 as points, F2 as a line and F3 as
points+line.

Index,F1,F2,F3
1,1.034,0.035,2
2,1.069,0.036,1.984
3,1.104,0.037,1.969
4,1.139,0.038,1.954
5,1.173,0.040,1.939
6,1.207,0.041,1.925
7,1.241,0.042,1.910
8,1.275,0.043,1.896
9,1.309,0.044,1.881
10,1.342,0.045,1.867

I used the following code to generate the plot:
ggplot(data, aes(Index)) +
  geom_point(aes(y=F1, colour="F1")) +
  geom_line(aes(y=F2, colour="F2")) +
  geom_point(aes(y=F3, colour="F3")) +
  geom_line(aes(y=F3, colour="F3"))

ggplot2 generates the plot I want, but the legend shows all the
data as "points+line". Is there any way to just show F1 in points, F2
in line and F3 in points+line respectively? Besides, I would like to
represent the line in different linetypes and tried to add linetype
into aes of geom_line, e.g. geom_line(aes(y=F2, colour="F2",
linetype="F2")), but it didn't work. I also tried to merge the data
first, however I didn't know how to separate the data into different
kind of plots (line/points).

Does anyone have any idea?

regards,
Keith

Brian Diggs

Jul 13, 2011, 1:09:10 PM

to Keith, ggplot2

> in line and F3 in points+line respectively? Besides, I would like to
> represent the line in different linetypes and tried to add linetype
> into aes of geom_line, e.g. geom_line(aes(y=F2, colour="F2",
> linetype="F2")), but it didn't work.

You would need to specify the linetype in both line calls. Otherwise
there is just one specified, so it is the default one.

ggplot(data, aes(Index)) +
  geom_point(aes(y=F1, colour="F1")) +
  geom_line(aes(y=F2, colour="F2", linetype="F2")) +
  geom_point(aes(y=F3, colour="F3")) +
  geom_line(aes(y=F3, colour="F3", linetype="F3"))

But see below to get what you want.

> I also tried to merge the data
> first, however I didn't know how to separate the data into different
> kind of plots (line/points).
>
> Does anyone have any idea?

Here is the melted approach:

# melt() comes from the reshape2 package (or its predecessor, reshape)
data.m <- melt(data, id.vars="Index")

ggplot(data.m, aes(x=Index, y=value, colour=variable)) +
  geom_point(aes(shape=variable)) +
  geom_line(aes(linetype=variable)) +
  scale_linetype_manual(breaks=c("F1","F2","F3"),
                        values=c("blank","solid","dashed")) +
  scale_shape_manual(breaks=c("F1","F2","F3"),
                     values=c(16,NA,16))

Basically the approach is to add a shape aesthetic (which I use to
say there/not there) for the points. Then the scales are set up
manually to suppress the line for F1 and the point shape for F2.

> regards,
> Keith
>

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University

Keith

Jul 13, 2011, 3:23:28 PM

to dig...@ohsu.edu, ggp...@googlegroups.com

Hello Brian,

Thanks for your kind and quick reply. Unfortunately, the first snippet
doesn't work well on my computer, but the second one does. The first
snippet gives me 2 legends:
* The legend titled "F1" shows all three data sets with a "line+point"
key in different colours, which is wrong because some data sets should
show only points or only a line.
* The legend titled "F2" shows F2 and F3 as black solid and dashed
lines respectively.

Actually I'm implementing a general script for post-processing tasks in
my project, so I would like to keep the number of parameters as small as
possible to reduce the hassle on the user side. For the same reason, I
would like to use the default options of ggplot2 for the selection of
line types, colours and so on. That's why I would like the 1st
code to work. Any ideas?

with regards,
Keith

Brian Diggs

Jul 13, 2011, 5:36:11 PM

to Keith, ggp...@googlegroups.com

Yes, that is what I got as well. That is because the two scales don't
have the same entries and labels: colour has three (F1, F2, and F3),
linetype has two (F2 and F3). ggplot doesn't know that they can be
combined. That might be an interesting feature request, though: combine
scales if one is a strict subset of the other, setting the missing
values to the equivalent null value (NA, "blank", 0). That would still
leave the case of partially overlapping entries in scales (say F1 and F2
in one; F2 and F3 in the other), where I don't know how or whether they
could or should be combined.

> Actually I'm implementing a general script for post-processing tasks in
> my project, so I would like to keep the number of parameters as small as
> possible to reduce the hassle on the user side. For the same reason, I
> would like to use the default options of ggplot2 for the selection of
> line types, colours and so on. That's why I would like the 1st
> code to work. Any ideas?

Closest thing I can think of is to wrap the call in a function. Will
there always be 3 levels named F1, F2, and F3? If not, how would you
decide which get points and which get lines? If it is just the 3, you
could do something like:

plot.my.data <- function(data) {
  ggplot(melt(data, id.vars="Index"),
         aes(x=Index, y=value, colour=variable)) +
    geom_point(aes(shape=variable)) +
    geom_line(aes(linetype=variable)) +
    scale_linetype_manual(breaks=c("F1","F2","F3"),
                          values=c("blank","solid","dashed")) +
    scale_shape_manual(breaks=c("F1","F2","F3"),
                       values=c(16,NA,16))
}

plot.my.data(data)

You can still do more ggplot-ish things with what plot.my.data returns:

plot.my.data(data) + theme_bw()

If you really want to get fancy, you can give your data a class and
write a specialized ggplot method for it:

class(data) <- c("mydata", class(data))

ggplot.mydata <- function(data = NULL, ...) {
  ggplot(melt(data, id.vars="Index"),
         aes(x=Index, y=value, colour=variable)) +
    geom_point(aes(shape=variable)) +
    geom_line(aes(linetype=variable)) +
    scale_linetype_manual(breaks=c("F1","F2","F3"),
                          values=c("blank","solid","dashed")) +
    scale_shape_manual(breaks=c("F1","F2","F3"),
                       values=c(16,NA,16))
}

ggplot(data)

This last makes the most sense if the structure of your "data"
data.frame is very strictly defined (especially if it is the result of
some computations you make, which means you can also assign the class to
it) and you have very specific ways of working with and displaying this
particular type of data.

> with regards,

Brian Diggs

Jul 13, 2011, 7:35:27 PM

to Keith, ggp...@googlegroups.com

> Thanks a lot, Brian. Your explanation is really clear and now I know why
> the 1st script doesn't work well. Earlier I was struggling to get the
> idea why it doesn't go well and now everything is clear. But, it might
> be an interesting feature to request :-p

>
>
>>> Actually I'm implementing a general script for post-processing tasks in
>>> my project, so I would like to keep the number of parameters as small as
>>> possible to reduce the hassle on the user side. For the same reason, I
>>> would like to use the default options of ggplot2 for the selection of
>>> line types, colours and so on. That's why I would like the 1st
>>> code to work. Any ideas?
>>
>> Closest thing I can think of is to wrap the call in a function. Will
>> there always be 3 levels named F1, F2, and F3? If not, how would you
>> decide which get points and which get lines? If it is just the 3, you
>> could do something like:
>>
>

> Just as I've mentioned earlier I'm trying to have a general script for
> the post-processing purposes of my project. In this project, some parts
> of the code are based on Java. Hence, the data frame will be constructed
> in Java and will be passed to R to generate the plot. In this case, the
> number of columns and their names are really dependent on the results
> passed from Java and the user could choose which one gets points and
> which gets lines. Of course, there will be some default settings for users.
>
> Due to these reasons, I was trying to hide the settings like
> "value=c("blank","solid","dashed"))" from users to make it as
> simple/automatic as possible. It seems somehow I have to figure out how
> to deal with those "value" settings automatically.

How are the users specifying which should be points and which should be
lines? Does that come from the Java part as well? If so, that
information can be passed to R with the dataframe (in a list, say), and
parsed out on the R side.

Maybe you pass a list which has the data and two vectors of names saying
which variables should be points and which should be lines.

# Make a mock data structure that represents what would be passed from
# the Java part:
# "data" is as before
fromJava <- list(data=data, points=c("F1", "F3"), lines=c("F2", "F3"))
class(fromJava) <- "KeithClass"

> dput(fromJava)
structure(list(data = structure(list(Index = 1:10, F1 = c(1.034,
1.069, 1.104, 1.139, 1.173, 1.207, 1.241, 1.275, 1.309, 1.342
), F2 = c(0.035, 0.036, 0.037, 0.038, 0.04, 0.041, 0.042, 0.043,
0.044, 0.045), F3 = c(2, 1.984, 1.969, 1.954, 1.939, 1.925, 1.91,
1.896, 1.881, 1.867)), .Names = c("Index", "F1", "F2", "F3"), class =
"data.frame", row.names = c(NA,
-10L)), points = c("F1", "F3"), lines = c("F2", "F3")), .Names = c("data",
"points", "lines"), class = "KeithClass")

ggplot.KeithClass <- function(data = NULL, ...) {
  DF <- data$data
  cols <- setdiff(names(DF),"Index")
  points.idx <- cols %in% data$points
  lines.idx <- cols %in% data$lines
  shapes <- rep(NA, length(cols))
  shapes[points.idx] <- 16
  linetypes <- rep("blank", length(cols))
  suppressWarnings(linetypes[lines.idx] <- c("solid", "dashed",
    "dotdash", "longdash", "todash"))
  eval(bquote(ggplot(melt(DF, id.vars="Index"),
                     aes(x=Index, y=value, colour=variable)) +
    geom_point(aes(shape=variable)) +
    geom_line(aes(linetype=variable)) +
    scale_shape_manual(breaks=.(cols), values=.(shapes)) +
    scale_linetype_manual(breaks=.(cols), values=.(linetypes))))
}

ggplot(fromJava)

There is some weird eval(bquote()) stuff going on to get around
where the arguments are evaluated (otherwise the breaks and values aren't
defined when the plot is actually made, after the function returns). Even
if you don't wrap all the info into a single list, this at least gives
you an idea of how you can go from a list of variables to the
appropriate arguments for the values.
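
As a hedged aside, the step of turning the two name vectors into scale values can be sketched on its own, without ggplot2. The names scale_values_for and line_palette are hypothetical and are not part of ggplot2 or of the function above. The sketch uses rep_len() to recycle the palette to the number of line series, so the length-mismatch warning that suppressWarnings() hides above never arises.

```r
# Hypothetical helper (an illustration, not from the thread or ggplot2):
# derive shape and linetype values for every column from the names of the
# columns that should be drawn as points and/or lines.
scale_values_for <- function(cols, points, lines,
                             line_palette = c("solid", "dashed", "dotted",
                                              "dotdash", "longdash", "twodash")) {
  # NA means "draw no point" for that column
  shapes <- rep(NA_real_, length(cols))
  shapes[cols %in% points] <- 16

  # "blank" means "draw no line" for that column
  linetypes <- rep("blank", length(cols))
  is.line <- cols %in% lines
  # recycle the palette to exactly as many entries as there are line series
  linetypes[is.line] <- rep_len(line_palette, sum(is.line))

  list(shapes = setNames(shapes, cols),
       linetypes = setNames(linetypes, cols))
}

vals <- scale_values_for(c("F1", "F2", "F3"),
                         points = c("F1", "F3"),
                         lines = c("F2", "F3"))
print(vals)
```

The two elements of the result could then be handed to scale_shape_manual() and scale_linetype_manual() as their values. With more line series than palette entries, the linetypes repeat, which may or may not be acceptable for a given plot.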

Brian Diggs

Jul 14, 2011, 11:42:29 AM

to Keith, ggp...@googlegroups.com

> Yes, all the options are from the Java part and I only use R to plot the
> results in this case.

> Thanks a lot Brian. This is almost what I would like to have. Here I
> only have 2 questions left:
> * I've seen you still have to specify the type of lines manually. Is
> there any way I could get the default line type settings from ggplot2?
> In this case, I don't have to worry about how many lines the user is
> going to plot.

I don't know of an easy way (it wouldn't be hard with the new scales
package that ggplot2 is moving to, but the current approach does not
lend itself to doing so). Digging in the source, the default set of
line types is:

c("solid", "22", "42", "44", "13", "1343", "73", "2262", "12223242",
"F282", "F4448444", "224282F2", "F1")

Note that I misspelled "twodash" in my previous version. You can just
substitute this set for the one I had before.

The list has to be somewhere, and here it is buried inside your
function, so the end users don't have to see it.

For a description of what these mean in terms of the line shown, see
?"par", especially the entry for "lty" and the "Line Type Specification"
section.
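
For readers who do not want to look up ?par: each character in such a string is a hexadecimal digit giving the length of a segment, alternating between drawn and skipped. A minimal sketch that decodes one string follows; decode_lty is a hypothetical helper name, and it only handles the hex-string form, not names such as "solid".

```r
# Hypothetical helper (illustration only): split a hex linetype string into
# its alternating on/off segment lengths.
decode_lty <- function(lty) {
  # only the hex-digit form is handled; named types like "solid" are rejected
  if (!grepl("^[1-9A-Fa-f]+$", lty)) {
    stop("not a hex linetype string: ", lty)
  }
  strtoi(strsplit(lty, "")[[1]], base = 16L)
}

decode_lty("22")    # 2 2       -> 2 units on, 2 units off
decode_lty("F282")  # 15 2 8 2  -> 15 on, 2 off, 8 on, 2 off
```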

> * I saw you use "scale_shape_manual(breaks=.(cols), value=.(shapes))".
> May I ask why you use dot + bracket, say .(cols), to assign parameters?
> Does this mean anything in R? sorry for this newbie question.

Ah, this is not a newbie question at all. I was getting ready to launch
into a discussion about the finer points of scoping and evaluating
expressions in specific environments when I changed my code back to
reproduce the error I had been getting. And the error wasn't there. So
the short answer is: here, you don't need it.

The longer answer is that the . notation is related to the fact that the
ggplot call (was) inside a bquote function call which causes arguments
inside .() to be replaced with their value as found in a specified
environment. I thought I needed to do this to get around scoping
problems, but that wasn't really the problem I was having then.
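
A small base-R illustration of what .() does inside bquote(), independent of ggplot2 (the variable names n, expr and result are only for this sketch): the value is copied into the expression when bquote() is called, so the expression can still be evaluated after the original variable is gone.

```r
n <- 3

# .(n) is replaced by the current value of n; the rest stays unevaluated
expr <- bquote(rep("blank", .(n)))
print(expr)  # rep("blank", 3)

# the expression no longer depends on n
rm(n)
result <- eval(expr)
print(result)
```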

Here is an updated version:

ggplot.KeithClass <- function(data = NULL, ...) {
  DF <- data$data
  cols <- setdiff(names(DF),"Index")
  points.idx <- cols %in% data$points
  lines.idx <- cols %in% data$lines
  shapes <- rep(NA, length(cols))
  shapes[points.idx] <- 16
  linetypes <- rep("blank", length(cols))
  # the right-hand side is usually longer than the number of lines; the
  # warning about that length mismatch is deliberately suppressed
  suppressWarnings(linetypes[lines.idx] <-
    c("solid", "22", "42", "44", "13", "1343", "73",
      "2262", "12223242", "F282", "F4448444", "224282F2",
      "F1"))
  ggplot(melt(DF, id.vars="Index"),
         aes(x=Index, y=value, colour=variable)) +
    geom_point(aes(shape=variable)) +
    geom_line(aes(linetype=variable)) +
    scale_shape_manual(breaks=cols, values=shapes) +
    scale_linetype_manual(breaks=cols, values=linetypes)
}

> with regards,
> Keith

Keith

Jul 14, 2011, 9:55:45 AM

to dig...@ohsu.edu, ggp...@googlegroups.com

Yes, all the options are from the Java part and I only use R to plot the
results in this case.

> Maybe you pass a list which has the data and two vectors of names which

Thanks a lot Brian. This is almost what I would like to have. Here I
only have 2 questions left:
* I've seen you still have to specify the type of lines manually. Is
there any way I could get the default line type settings from ggplot2?
In this case, I don't have to worry about how many lines the user is
going to plot.

* I saw you use "scale_shape_manual(breaks=.(cols), value=.(shapes))".
May I ask why you use dot + bracket, say .(cols), to assign parameters?
Does this mean anything in R? sorry for this newbie question.

with regards,
Keith

Keith

Jul 13, 2011, 6:19:47 PM

to dig...@ohsu.edu, ggp...@googlegroups.com

Thanks a lot, Brian. Your explanation is really clear and now I know why
the 1st script doesn't work well. Earlier I was struggling to get the
idea why it doesn't go well and now everything is clear. But, it might
be an interesting feature to request :-p

>> Actually I'm implementing a general script for post-processing tasks in
>> my project, so I would like to keep the number of parameters as small as
>> possible to reduce the hassle on the user side. For the same reason, I
>> would like to use the default options of ggplot2 for the selection of
>> line types, colours and so on. That's why I would like the 1st
>> code to work. Any ideas?
>
> Closest thing I can think of is to wrap the call in a function. Will
> there always be 3 levels named F1, F2, and F3? If not, how would you
> decide which get points and which get lines? If it is just the 3, you
> could do something like:
>

Just as I've mentioned earlier I'm trying to have a general script for
the post-processing purposes of my project. In this project, some parts
of the code are based on Java. Hence, the data frame will be constructed
in Java and will be passed to R to generate the plot. In this case, the
number of columns and their names are really dependent on the results
passed from Java and the user could choose which one gets points and
which gets lines. Of course, there will be some default settings for users.

Due to these reasons, I was trying to hide the settings like
"value=c("blank","solid","dashed"))" from users to make it as
simple/automatic as possible. It seems somehow I have to figure out how
to deal with those "value" settings automatically.

> plot.my.data <- function(data) {
