overlaying geom_boxplot() with geom_point

a11_msp
Apr 27, 2009, 12:16:44 PM
to ggplot2
Hi everyone,

I have the following table:

head(res)

  ID TF TIME   VAL  PAR
1  1  X    2 -0.52 1.02
2  1  X    4  0.75 2.23
3  1  Y    2  1.45 1.32
4  1  Y    4  0.84 1.88
5  2  X    2  1.66 1.02
6  2  X    4  3.07 2.23

First, I'm plotting a set of grouped (side-by-side) boxplots of VAL for
each TF/TIME combination, and this works fine:

p = ggplot(data=res)+
            geom_boxplot(aes(x=factor(TF), y=VAL, fill=factor(TIME)))

Now I'm trying to overlay each boxplot with a set of points
corresponding to another parameter, PAR, so I do:

p + geom_point(aes(x=factor(TF),y=MEDIAN, colour=factor(TIME)))

Ideally, I would like the X coordinate of the points to coincide with
the centres of the boxplots.
With geom_boxplot()'s default position adjustment, setting
fill=factor(TIME) automatically shifts (dodges) the boxplots so they are
positioned next to each other. The code above does not take
this shift into account, so the X coordinate of the points is the same
for both levels of factor(TIME).
Is there any way to tell geom_point to shift the X coordinate
accordingly?
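In current ggplot2 releases the points can be given the same shift directly. A minimal sketch, assuming a ggplot2 version in which `geom_boxplot()` draws boxes 0.75 units wide on a discrete axis by default (worth checking against your installed version; if you pass `width =` to the boxplot, reuse that value): it rebuilds the data from the `dput()` output posted later in the thread and dodges the points with `position_dodge(width = 0.75)`, so each column of points lands on the centre of its box.

```r
library(ggplot2)

# Compact rebuild of the data posted with dput() in this thread:
# 10 IDs x 5 TFs x 2 TIMEs = 100 rows; PAR repeats for every ID.
res <- data.frame(
  ID   = rep(c(6, 19, 31, 32, 33, 34, 35, 36, 38, 47), each = 10),
  TF   = rep(rep(c("X", "Y", "Z", "A", "B"), each = 2), times = 10),
  TIME = rep(c(2, 4), times = 50),
  VAL  = c(-0.52, 0.75, 1.45, 0.84, 1.66, 3.07, 3.53, 4.1, 3.58, 2.5,
           -0.2, 1.34, 0.58, 1.58, 0.73, 3, 3.84, 4.35, 2.67, 1.75,
           1.27, 0.81, 0.28, 0.9, -0.53, 1.95, 3.95, 3.92, 2.95, 1.59,
           0.57, 0.86, 1.01, 1.27, 0.38, 0.8, 1.91, 2.43, 2.99, 1.01,
           1.52, 1.94, -0.1, 1.03, -0.01, 2.64, 4.19, 3.99, 3.74, 3.14,
           0.36, 1.56, 0.26, 0.53, 1.13, 2.4, 4.84, 5.21, 3.4, 0.76,
           -0.33, -0.47, 1.2, 1.69, 0.92, 0.7, -0.47, -0.01, 6.48, 3.44,
           2.29, 0.58, 1.03, 1.53, 2.23, 0.74, 0.56, 0.2, 7.68, 1.51,
           1.95, 1.64, 2.91, 3.14, 2.99, 3.01, 4.04, 3.91, 4.55, 1.74,
           2.15, 2.65, -0.09, 1.8, 1.84, 1.73, 0.99, 1.26, 4.28, 4.1),
  PAR  = rep(c(0.349974351871954, 0.314372599616707, 0.366158192067888,
               0.286869398822942, 0.367861169367239, 0.347178403222464,
               0.355099310185781, 0.324241722657399, 0.0766259660968184,
               0.292224064036734), times = 10)
)

# Both layers are grouped by TF x TIME (discrete x plus fill/colour).
# Assumption: 0.75 matches geom_boxplot()'s default box width, so dodging
# the points across the same width puts them on the box centre lines.
p <- ggplot(res, aes(x = factor(TF))) +
  geom_boxplot(aes(y = VAL, fill = factor(TIME))) +
  geom_point(aes(y = PAR, colour = factor(TIME)),
             position = position_dodge(width = 0.75))

print(p)
```

Because `factor(TF)` and `factor(TIME)` are both discrete mappings, each layer is already split into one group per box, so no explicit `group` aesthetic is needed here.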

Actually, perhaps geom_point() isn't even the best way of plotting
this, because (as you may have noticed above) there's actually only a
single value of PAR for each TF/TIME combination, i.e., one per
boxplot. What I currently do is just plot loads of identical PAR points
on top of each other. So if there's a different function better suited
to this purpose, one that handles the shift in position, that would
work just as well.
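Because PAR takes a single value per TF/TIME combination, one option is to reduce it to one row per box and hand that smaller data frame to `geom_point()` through its `data` argument. A sketch under the same assumption about the 0.75 default box width; `par_df` is a hypothetical name, and the data is rebuilt from the `dput()` output posted later in the thread.

```r
library(ggplot2)

# Compact rebuild of the data posted with dput() in this thread.
res <- data.frame(
  ID   = rep(c(6, 19, 31, 32, 33, 34, 35, 36, 38, 47), each = 10),
  TF   = rep(rep(c("X", "Y", "Z", "A", "B"), each = 2), times = 10),
  TIME = rep(c(2, 4), times = 50),
  VAL  = c(-0.52, 0.75, 1.45, 0.84, 1.66, 3.07, 3.53, 4.1, 3.58, 2.5,
           -0.2, 1.34, 0.58, 1.58, 0.73, 3, 3.84, 4.35, 2.67, 1.75,
           1.27, 0.81, 0.28, 0.9, -0.53, 1.95, 3.95, 3.92, 2.95, 1.59,
           0.57, 0.86, 1.01, 1.27, 0.38, 0.8, 1.91, 2.43, 2.99, 1.01,
           1.52, 1.94, -0.1, 1.03, -0.01, 2.64, 4.19, 3.99, 3.74, 3.14,
           0.36, 1.56, 0.26, 0.53, 1.13, 2.4, 4.84, 5.21, 3.4, 0.76,
           -0.33, -0.47, 1.2, 1.69, 0.92, 0.7, -0.47, -0.01, 6.48, 3.44,
           2.29, 0.58, 1.03, 1.53, 2.23, 0.74, 0.56, 0.2, 7.68, 1.51,
           1.95, 1.64, 2.91, 3.14, 2.99, 3.01, 4.04, 3.91, 4.55, 1.74,
           2.15, 2.65, -0.09, 1.8, 1.84, 1.73, 0.99, 1.26, 4.28, 4.1),
  PAR  = rep(c(0.349974351871954, 0.314372599616707, 0.366158192067888,
               0.286869398822942, 0.367861169367239, 0.347178403222464,
               0.355099310185781, 0.324241722657399, 0.0766259660968184,
               0.292224064036734), times = 10)
)

# Hypothetical helper: one row per box (TF/TIME combination).
par_df <- unique(res[, c("TF", "TIME", "PAR")])
# Sanity check: stops if PAR ever varies within a TF/TIME combination.
stopifnot(!anyDuplicated(par_df[, c("TF", "TIME")]))

p2 <- ggplot(res, aes(x = factor(TF))) +
  geom_boxplot(aes(y = VAL, fill = factor(TIME))) +
  # The layer-level data replaces res for this layer only; the inherited
  # x = factor(TF) is evaluated in par_df. Assumed width: 0.75, as above.
  geom_point(data = par_df, aes(y = PAR, colour = factor(TIME)),
             position = position_dodge(width = 0.75), size = 3)

print(p2)
```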

Many thanks,
Mikhail

--
Mikhail Spivakov, PhD
European Molecular Biology Laboratory

a11_msp
Apr 27, 2009, 1:20:35 PM
to ggplot2
Sorry, there was a typo in the code example I gave:

p + geom_point(aes(x=factor(TF),y=MEDIAN, colour=factor(TIME)))

should read:

p + geom_point(aes(x=factor(TF),y=PAR, colour=factor(TIME)))

hadley wickham
Apr 28, 2009, 2:36:07 PM
to a11_msp, ggplot2
Hi Mikhail,

Could you also please provide your data in a format that's easy for us
to load into R? The simplest approach is just to paste the output of
dput(res) (or some sample thereof) into your email.
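As an illustration, using the rows from the `head(res)` output at the top of the thread: `dput()` prints R code that recreates an object, which is what gets pasted into the message, and evaluating that text on the other end rebuilds the data frame. The names `txt` and `res_copy` are just for illustration.

```r
# The six rows shown by head(res) at the top of the thread, typed in.
res <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2),
  TF   = c("X", "X", "Y", "Y", "X", "X"),
  TIME = c(2, 4, 2, 4, 2, 4),
  VAL  = c(-0.52, 0.75, 1.45, 0.84, 1.66, 3.07),
  PAR  = c(1.02, 2.23, 1.32, 1.88, 1.02, 2.23)
)

# Prints R code that recreates res; this is the text to paste into the email.
# For a large data frame, a sample such as dput(head(res, 20)) is usually enough.
dput(res)

# On the receiving end, evaluating the pasted text rebuilds the data frame.
txt      <- paste(capture.output(dput(res)), collapse = "\n")
res_copy <- eval(parse(text = txt))
```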

Hadley
--
http://had.co.nz/

Mikhail Spivakov
Apr 28, 2009, 4:31:09 PM
to hadley wickham, ggplot2
Hi Hadley,

Here's the output of dput(res):

structure(list(ID = c(6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 19, 19, 19,
19, 19, 19, 19, 19, 19, 19, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 35,
35, 35, 35, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36, 36, 36,
36, 36, 36, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 47, 47, 47,
47, 47, 47, 47, 47, 47, 47), TF = c("X", "X", "Y", "Y", "Z",
"Z", "A", "A", "B", "B", "X", "X", "Y", "Y", "Z", "Z", "A", "A",
"B", "B", "X", "X", "Y", "Y", "Z", "Z", "A", "A", "B", "B", "X",
"X", "Y", "Y", "Z", "Z", "A", "A", "B", "B", "X", "X", "Y", "Y",
"Z", "Z", "A", "A", "B", "B", "X", "X", "Y", "Y", "Z", "Z", "A",
"A", "B", "B", "X", "X", "Y", "Y", "Z", "Z", "A", "A", "B", "B",
"X", "X", "Y", "Y", "Z", "Z", "A", "A", "B", "B", "X", "X", "Y",
"Y", "Z", "Z", "A", "A", "B", "B", "X", "X", "Y", "Y", "Z", "Z",
"A", "A", "B", "B"), TIME = c(2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2,
4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4,
2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2,
4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4,
2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2,
4, 2, 4, 2, 4), VAL = c(-0.52, 0.75, 1.45, 0.84, 1.66, 3.07,
3.53, 4.1, 3.58, 2.5, -0.2, 1.34, 0.58, 1.58, 0.73, 3, 3.84,
4.35, 2.67, 1.75, 1.27, 0.81, 0.28, 0.9, -0.53, 1.95, 3.95, 3.92,
2.95, 1.59, 0.57, 0.86, 1.01, 1.27, 0.38, 0.8, 1.91, 2.43, 2.99,
1.01, 1.52, 1.94, -0.1, 1.03, -0.01, 2.64, 4.19, 3.99, 3.74,
3.14, 0.36, 1.56, 0.26, 0.53, 1.13, 2.4, 4.84, 5.21, 3.4, 0.76,
-0.33, -0.47, 1.2, 1.69, 0.92, 0.7, -0.47, -0.01, 6.48, 3.44,
2.29, 0.58, 1.03, 1.53, 2.23, 0.74, 0.56, 0.2, 7.68, 1.51, 1.95,
1.64, 2.91, 3.14, 2.99, 3.01, 4.04, 3.91, 4.55, 1.74, 2.15, 2.65,
-0.09, 1.8, 1.84, 1.73, 0.99, 1.26, 4.28, 4.1), PAR = c(0.349974351871954,
0.314372599616707, 0.366158192067888, 0.286869398822942, 0.367861169367239,
0.347178403222464, 0.355099310185781, 0.324241722657399, 0.0766259660968184,
0.292224064036734, 0.349974351871954, 0.314372599616707, 0.366158192067888,
0.286869398822942, 0.367861169367239, 0.347178403222464, 0.355099310185781,
0.324241722657399, 0.0766259660968184, 0.292224064036734, 0.349974351871954,
0.314372599616707, 0.366158192067888, 0.286869398822942, 0.367861169367239,
0.347178403222464, 0.355099310185781, 0.324241722657399, 0.0766259660968184,
0.292224064036734, 0.349974351871954, 0.314372599616707, 0.366158192067888,
0.286869398822942, 0.367861169367239, 0.347178403222464, 0.355099310185781,
0.324241722657399, 0.0766259660968184, 0.292224064036734, 0.349974351871954,
0.314372599616707, 0.366158192067888, 0.286869398822942, 0.367861169367239,
0.347178403222464, 0.355099310185781, 0.324241722657399, 0.0766259660968184,
0.292224064036734, 0.349974351871954, 0.314372599616707, 0.366158192067888,
0.286869398822942, 0.367861169367239, 0.347178403222464, 0.355099310185781,
0.324241722657399, 0.0766259660968184, 0.292224064036734, 0.349974351871954,
0.314372599616707, 0.366158192067888, 0.286869398822942, 0.367861169367239,
0.347178403222464, 0.355099310185781, 0.324241722657399, 0.0766259660968184,
0.292224064036734, 0.349974351871954, 0.314372599616707, 0.366158192067888,
0.286869398822942, 0.367861169367239, 0.347178403222464, 0.355099310185781,
0.324241722657399, 0.0766259660968184, 0.292224064036734, 0.349974351871954,
0.314372599616707, 0.366158192067888, 0.286869398822942, 0.367861169367239,
0.347178403222464, 0.355099310185781, 0.324241722657399, 0.0766259660968184,
0.292224064036734, 0.349974351871954, 0.314372599616707, 0.366158192067888,
0.286869398822942, 0.367861169367239, 0.347178403222464, 0.355099310185781,
0.324241722657399, 0.0766259660968184, 0.292224064036734)), .Names = c("ID",
"TF", "TIME", "VAL", "PAR"), row.names = c(NA, 100L), class = "data.frame")

And the code to generate the plot is:

p = ggplot(data=res)+
            geom_boxplot(aes(x=factor(TF), y=VAL, fill=factor(TIME)))+
            geom_point(aes(x=factor(TF),y=PAR, colour=factor(TIME)))

As I said, what I'm trying to achieve is for the X coordinate of the
points to coincide with the middle of the respective boxplots.

Many thanks,
Mikhail

hadley wickham
Apr 28, 2009, 4:46:51 PM
to Mikhail Spivakov, ggplot2
> And the code to generate the plot is:
>
> p = ggplot(data=res)+
>             geom_boxplot(aes(x=factor(TF), y=VAL, fill=factor(TIME)))+
>             geom_point(aes(x=factor(TF),y=PAR, colour=factor(TIME)))
>
> As I said, what I'm trying to achieve is for the X coordinate of the
> points to coincide with the middle of the respective boxplots.

Well, the easiest way is to switch to using facetting:

ggplot(res, aes(factor(TIME))) +
  geom_boxplot(aes(y = VAL, fill = factor(TIME))) +
  geom_point(aes(y = PAR, colour = factor(TIME))) +
  facet_wrap(~ TF, nrow = 1)
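The facetted layout also fits the one-point-per-box idea raised earlier in the thread. Since TIME is now the x variable inside each TF panel, there is only one box per x position and nothing is dodged, so the points sit on the box centres with no position adjustment. A sketch, assuming current ggplot2; `par_df` is a hypothetical helper holding one PAR value per TF/TIME combination, and the data is rebuilt from the `dput()` output above.

```r
library(ggplot2)

# Compact rebuild of the data posted with dput() in this thread.
res <- data.frame(
  ID   = rep(c(6, 19, 31, 32, 33, 34, 35, 36, 38, 47), each = 10),
  TF   = rep(rep(c("X", "Y", "Z", "A", "B"), each = 2), times = 10),
  TIME = rep(c(2, 4), times = 50),
  VAL  = c(-0.52, 0.75, 1.45, 0.84, 1.66, 3.07, 3.53, 4.1, 3.58, 2.5,
           -0.2, 1.34, 0.58, 1.58, 0.73, 3, 3.84, 4.35, 2.67, 1.75,
           1.27, 0.81, 0.28, 0.9, -0.53, 1.95, 3.95, 3.92, 2.95, 1.59,
           0.57, 0.86, 1.01, 1.27, 0.38, 0.8, 1.91, 2.43, 2.99, 1.01,
           1.52, 1.94, -0.1, 1.03, -0.01, 2.64, 4.19, 3.99, 3.74, 3.14,
           0.36, 1.56, 0.26, 0.53, 1.13, 2.4, 4.84, 5.21, 3.4, 0.76,
           -0.33, -0.47, 1.2, 1.69, 0.92, 0.7, -0.47, -0.01, 6.48, 3.44,
           2.29, 0.58, 1.03, 1.53, 2.23, 0.74, 0.56, 0.2, 7.68, 1.51,
           1.95, 1.64, 2.91, 3.14, 2.99, 3.01, 4.04, 3.91, 4.55, 1.74,
           2.15, 2.65, -0.09, 1.8, 1.84, 1.73, 0.99, 1.26, 4.28, 4.1),
  PAR  = rep(c(0.349974351871954, 0.314372599616707, 0.366158192067888,
               0.286869398822942, 0.367861169367239, 0.347178403222464,
               0.355099310185781, 0.324241722657399, 0.0766259660968184,
               0.292224064036734), times = 10)
)

# Hypothetical helper: one PAR row per TF/TIME combination.
par_df <- unique(res[, c("TF", "TIME", "PAR")])

# Facetted layout: TIME on the x axis inside one panel per TF. With a
# single box per x position there is nothing to dodge, so the points
# already line up with the box centres.
p3 <- ggplot(res, aes(factor(TIME))) +
  geom_boxplot(aes(y = VAL, fill = factor(TIME))) +
  geom_point(data = par_df, aes(y = PAR, colour = factor(TIME)), size = 3) +
  facet_wrap(~ TF, nrow = 1)

print(p3)
```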

Hadley

--
http://had.co.nz/

Mikhail Spivakov
Apr 28, 2009, 4:59:18 PM
to hadley wickham, ggplot2
Yes, this works! Thanks very much!
Mikhail

On 4/28/09, hadley wickham <h.wi...@gmail.com> wrote:
