Problem with the group aesthetic in geom_line and scale_colour_manual

Herbert Jägle

Jun 25, 2012, 7:00:23 PM
to ggp...@googlegroups.com

I am trying to create a plot of time series data. I specify the colours manually and use a group aesthetic to group data points into lines.

The code at the end of this message reproduces my problem:

If I use the code for plot p, I get the correct colours, but the grouping does not work: the lines are drawn through data points from both groups. Did I miss something?

If I split the data and plot the lines for the two groups separately (as in plot q), the lines are correct, but the colours are wrong. Since the factors are the same, I would expect to still get black for 'WT' and red for 'KO'.

Any explanation or solution?

Thanks,

Herbert

------------------

df.d <- data.frame(pgroup=c(rep('WT', 100), rep('KO', 100)),
                   step=rep(c(rep(1, 50), rep(2, 50)), 2),
                   timep=rep(c(1:50), 4),
                   amplitude=c(sin(1:50), 2*sin(1:50), 3*sin(1:50), 4*sin(1:50)),
                   se=rep(c(0.18, 0.2, 0.22, 0.25), 50))
df.d$fact_step <- factor(df.d$step, levels=unique(df.d$step), labels=unique(df.d$step), ordered=TRUE)
df.d$fact_colour <- factor(df.d$pgroup, levels=groups, labels=groups, ordered=TRUE)

p <- ggplot(df.d) +
  geom_line(aes(timep, amplitude, colour=fact_colour, group=fact_step)) +
  scale_colour_manual(name='', values=c('black', 'red'), guide='legend',
                      breaks=c('WT', 'KO'), labels=c('WT', 'KO')) +
  scale_fill_manual(name='', values=c('black', 'red'), guide='none')
print(p)

q <- ggplot() +
  geom_line(data=subset(df.d, pgroup=='WT'), aes(timep, amplitude, colour=fact_colour, group=fact_step)) +
  geom_line(data=subset(df.d, pgroup=='KO'), aes(timep, amplitude, colour=fact_colour, group=fact_step)) +
  scale_colour_manual(name='', values=c('black', 'red'), guide='legend',
                      breaks=c('WT', 'KO'), labels=c('WT', 'KO')) +
  scale_fill_manual(name='', values=c('black', 'red'), guide='none')
print(q)

Brandon Hurr

Jun 26, 2012, 3:02:49 AM
to Herbert Jägle, ggp...@googlegroups.com
I'm sorry, but your example isn't reproducible. 

> df.d$fact_colour <- factor(df.d$pgroup, levels=groups, labels=groups, ordered=TRUE)
Error in factor(df.d$pgroup, levels = groups, labels = groups, ordered = TRUE) : 
  object 'groups' not found

In all honesty though, I'm not sure why you're making a new column of data to replicate another as an ordered factor. I'll assume this is oversimplified from your real dataset.
You also don't need scale_fill for geom_line(). 

ggplot(df.d) +
  geom_line(aes(timep, amplitude, colour=pgroup, group=step)) +
  scale_colour_manual(name='', values=c('black', 'red'), guide='legend',
                      breaks=c('WT', 'KO'), labels=c('WT', 'KO'))

If I do this, I get a plot that doesn't look strange. 

Could you clarify the issue?

B
file.png

Herbert Jägle

Jun 26, 2012, 4:22:16 AM
to ggp...@googlegroups.com, Herbert Jägle
Thanks. The data is similar to a huge data set, and the plot is just one of a series of plots created from the data. In the original data the factors are created from real numbers and are used to control faceting.

Attached are a corrected code example to reproduce my issue and three figures. "correct.png" illustrates what I am trying to achieve; "wrong_line.png" looks pretty similar to your figure except for the colours. As you can see, the lines do not represent the waveforms I created.
"wrong_colour.png" shows the correct waveforms, but for some reason I do not understand, the colours are exchanged ("Control" is coloured red and "Gene" is black). I thought that with an ordered factor the colours are always assigned according to the order of the factor (in my example "black" to "WT" and "red" to "KO"). If this is not the case, how do I control the assignment of the colours?

Thanks,
Herbert

------------

df.d <- data.frame(pgroup=c(rep('WT', 100), rep('KO', 100)),
                   step=rep(c(rep(1, 50), rep(2, 50)), 2),
                   timep=rep(c(1:50), 4),
                   amplitude=c(sin(1:50), 2*sin(1:50), 3*sin(1:50), 4*sin(1:50)),
                   se=rep(c(0.18, 0.2, 0.22, 0.25), 50))
df.d$fact_step <- factor(df.d$step, levels=unique(df.d$step), ordered=TRUE)
df.d$fact_colour <- factor(df.d$pgroup, levels=c('WT', 'KO'), ordered=TRUE)

p <- ggplot(df.d) +
  geom_line(aes(timep, amplitude, colour=fact_colour, group=fact_step)) +
  scale_colour_manual(name='', values=c('black', 'red'), guide='legend',
                      breaks=c('WT', 'KO'), labels=c('Control', 'Gene'))
print(p)

q <- ggplot() +
  geom_line(data=subset(df.d, pgroup=='WT'), aes(timep, amplitude, colour=fact_colour, group=fact_step)) +
  geom_line(data=subset(df.d, pgroup=='KO'), aes(timep, amplitude, colour=fact_colour, group=fact_step)) +
  scale_colour_manual(name='', values=c('black', 'red'), guide='legend',
                      breaks=c('WT', 'KO'), labels=c('Control', 'Gene'))
print(q)

-----------

correct.png
wrong_colour.png
wrong_line.png

Brandon Hurr

Jun 26, 2012, 4:55:34 AM
to Herbert Jägle, ggp...@googlegroups.com
OK, that is more clear. 

ggplot has no way of knowing that it needs to separate WT step 1 from KO step 1 when plotting. I think you're expecting this to work like faceting, where the data would be split by a factor and then plotted, but that isn't happening here.

A simple solution would be to create a factor that separates the two.

df.d$fact_step <- paste(df.d$pgroup, df.d$step, sep="")

ggplot(df.d) +
  geom_line(aes(timep, amplitude, colour=fact_colour, group=fact_step)) +
  scale_colour_manual(name='', values=c('black', 'red'), guide='legend',
                      breaks=c('WT', 'KO'), labels=c('Control', 'Gene'))
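
To see why a lone group=fact_step merges the two genotypes: as far as I understand the ggplot2 documentation, when group is not mapped the default grouping is the interaction of all discrete variables in the plot, and an explicit group replaces that default rather than adding to it. Here is a rough sketch in base R that only counts the groups each choice would produce. The data frame `d` is a shortened, made-up stand-in for df.d, and the variable names are for illustration only.

```r
# Shortened stand-in for df.d: 2 genotypes x 2 steps x 5 time points
d <- expand.grid(timep = 1:5, step = 1:2, pgroup = c('WT', 'KO'),
                 stringsAsFactors = FALSE)

# group = step: only two groups, so each line mixes WT and KO points
n_by_step <- length(unique(d$step))

# group = paste(pgroup, step): one group per genotype/step combination
d$fact_step <- paste(d$pgroup, d$step, sep = "")
n_by_both <- length(unique(d$fact_step))

# how often a time point repeats inside a step-only group;
# a value above 1 means the line has to jump between genotypes
max_dupes <- max(table(d$step, d$timep))

print(c(by_step = n_by_step, by_both = n_by_both, dupes = max_dupes))
```

With the step-only grouping every time point occurs more than once per group, which is what produces the zigzag lines in wrong_line.png.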



isthisright.png

Herbert Jägle

Jun 26, 2012, 5:25:11 AM
to ggp...@googlegroups.com, Herbert Jägle
Yes, this is what I want. Thanks.

There are still two questions left:

1. Why does "colour=fact_colour" not separate WT step 1 from KO step 1, as it does if I reduce the data set to step 1 only (without group):
-----

p <- ggplot(subset(df.d, step==1)) +
  geom_line(aes(timep, amplitude, colour=fact_colour)) +
  scale_colour_manual(name='', values=c('black', 'red'), guide='legend',
                      breaks=c('WT', 'KO'), labels=c('Control', 'Gene'))
print(p)

------

I would expect this separation to remain active when I add the second separation, "group=fact_step".


2. I still have no idea why the colours in my previous example with two separate geom_line statements get exchanged. It would be very helpful to know what I am missing here.

Thanks,

Herbert

one_step_only.png

Brandon Hurr

Jun 26, 2012, 5:38:23 AM
to Herbert Jägle, Dennis Murphy, ggp...@googlegroups.com
Technically it doesn't... If you add the group=fact_step you get the bad lines again...

ggplot(subset(df.d, step==1)) +
  geom_line(aes(timep, amplitude, colour=fact_colour, group=fact_step)) +
  scale_colour_manual(name='', values=c('black', 'red'), guide='legend',
                      breaks=c('WT', 'KO'), labels=c('Control', 'Gene'))

RE: color... I'm not sure. I think when you define the color in independent geom_line() calls you need to specify it directly, outside of aes() (e.g. geom_line(aes(x=x, y=y, group=group), color="red")). When I did that the legend disappeared, though.
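
On controlling which colour goes to which group while keeping the legend: one option that should not depend on factor order, or on which layer is drawn first, is to name the entries of `values`. This is only a sketch, assuming a ggplot2 version in which scale_colour_manual() matches a named `values` vector to the data values by name. The name `group_colours` and the small data frame `d` are made up for illustration, and this does not explain why the colours were swapped in plot q.

```r
library(ggplot2)

# Illustrative data only: two genotypes, two steps each
d <- data.frame(pgroup = rep(c('WT', 'KO'), each = 20),
                step   = rep(rep(1:2, each = 10), 2),
                timep  = rep(1:10, 4))
d$amplitude <- sin(d$timep) * (d$step + 2 * (d$pgroup == 'KO'))

# Named vector: colours are looked up by value name, not by position
group_colours <- c(WT = 'black', KO = 'red')

p <- ggplot(d, aes(timep, amplitude, colour = pgroup,
                   group = interaction(pgroup, step))) +
  geom_line() +
  scale_colour_manual(name = '', values = group_colours,
                      breaks = c('WT', 'KO'),
                      labels = c('Control', 'Gene'))
print(p)
```

Because colour stays inside aes(), the legend is kept; only the lookup from group to colour changes.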

Dennis is very good with these things, perhaps he can chime in whenever the sun is up where he lives. 

B

Dennis Murphy

Jun 26, 2012, 9:23:11 AM
to Brandon Hurr, Herbert Jägle, ggp...@googlegroups.com
Hi:

The only thing I can see differently is that you can use the
interaction() function to get combinations of fact_colour and
fact_step and then use that as the grouping factor.

df.d <- data.frame(pgroup=c(rep('WT', 100), rep('KO', 100)),
                   step=rep(c(rep(1, 50), rep(2, 50)), 2),
                   timep=rep(c(1:50), 4),
                   amplitude=c(sin(1:50), 2*sin(1:50), 3*sin(1:50), 4*sin(1:50)),
                   se=rep(c(0.18, 0.2, 0.22, 0.25), 50))
df.d$fact_step <- factor(df.d$step, levels=unique(df.d$step),
                         labels=unique(df.d$step), ordered=TRUE)
df.d$fact_colour <- factor(df.d$pgroup, levels=c('WT', 'KO'), ordered=TRUE)
df.d$fact_colstep <- with(df.d, interaction(fact_step, fact_colour))

ggplot(df.d, aes(x = timep, y = amplitude, colour = fact_colour)) +
  geom_line(aes(group = fact_colstep), size = 1) +
  scale_colour_manual(name='', guide='legend',
                      breaks=c('WT', 'KO'), values=c('black', 'red'),
                      labels=c('Control', 'Gene'))
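
For reference, a small sketch of what interaction() returns, using toy factors rather than the full data frame: it builds one level per combination of its inputs, so each genotype/step pair becomes its own group.

```r
# Toy factors with one observation per genotype/step combination
fact_step    <- factor(c(1, 1, 2, 2), levels = c(1, 2), ordered = TRUE)
fact_colour  <- factor(c('WT', 'KO', 'WT', 'KO'),
                       levels = c('WT', 'KO'), ordered = TRUE)

# One level per combination of step and genotype
fact_colstep <- interaction(fact_step, fact_colour)

print(nlevels(fact_colstep))
print(as.character(fact_colstep))
```

The same grouping can also be written inline as group = interaction(fact_step, fact_colour) inside aes(), which avoids the extra column.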

Does that help?

Dennis

Herbert Jägle

Jun 26, 2012, 1:56:15 PM
to ggp...@googlegroups.com, Brandon Hurr, Herbert Jägle
Thanks, this solves part of my problem. I still have problems getting the correct colours. Here is another example using the attached data.
The colours of the plotted data and the legend do not correspond: the legend shows the correct colours, but the data is plotted with the wrong ones.

Is this reproducible for you, and what is going wrong here?

Thanks,
Herbert

---------

load('df.RData')

# I want the Control data to be plotted in black and Gene data in red
colours <- c('black', 'red')
labels <- c('Control', 'Gene')

# now some control plots
p <- ggplot(subset(df.d, pgroup==1), aes(timep, amplitude)) + geom_point()
print(p)
# -> 100 ms plot corresponding to Control data

q <- ggplot(subset(df.d, pgroup==2), aes(timep, amplitude)) + geom_point()
print(q)
# -> 80 ms plot corresponding to Gene data

s <- ggplot(df.d) +
  geom_ribbon(aes(timep, ymin = amplitude-se, ymax = amplitude+se,
                  fill=fact_colour, group=fact_colint), alpha = 0.5) +
  geom_line(aes(timep, amplitude, colour=fact_colour, group=fact_colint)) +
  scale_colour_manual(name='', values=colours, guide='legend',
                      labels=labels) +
  scale_fill_manual(name='', values=colours, guide='none')
print(s)
# -> the Control data (100 ms) is shown in red, but the legend indicates black
# -> the Gene data (80 ms) is shown in black, but the legend indicates red

------------

df.RData
example2_out.png

Brandon Hurr

Jun 26, 2012, 2:33:05 PM
to Herbert Jägle, ggp...@googlegroups.com
It's because you've mislabeled your color factor... 
# -> the Gene data (80 ms) is shown in black, but the legend indicates red
> range(subset(df.d, fact_colour == "Control")[["timep"]])
[1] 0.00000000 0.07953032

# -> the Control data (100 ms) is shown in red, but the legend indicates black
> range(subset(df.d, fact_colour == "\"Tgfbr2\"^\"-/-\"")[["timep"]])
[1] 0.00000000 0.09956946

Herbert Jägle

Jun 26, 2012, 3:05:02 PM
to ggp...@googlegroups.com, Herbert Jägle
Great!!
The problem is solved if I add a sort() to the factor statement.
Many thanks.
Herbert
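
A sketch of how this kind of mislabelling can arise. The actual factor statement used to build df.RData is not shown in the thread, so the construction below is an assumption: numeric group codes whose order of first appearance differs from the order of the labels.

```r
# Assumed coding: 1 = Control, 2 = Gene; group 2 happens to appear first
pgroup <- c(2, 2, 1, 1)
labels <- c('Control', 'Gene')

# Levels in order of appearance (2, 1): 'Control' ends up on code 2
bad  <- factor(pgroup, levels = unique(pgroup), labels = labels)

# Sorted levels (1, 2): the labels line up with the codes again
good <- factor(pgroup, levels = sort(unique(pgroup)), labels = labels)

print(data.frame(pgroup, bad, good))
```

In the `bad` column the legend text is attached to the wrong group, so the legend colours look right while the curves appear swapped, which matches the symptom described above.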