Using position_dodge on point geoms

Karl Ove Hufthammer

Jan 19, 2013, 12:49:35 PM
to ggp...@googlegroups.com
Dear ggplot2 list members,

I have some data with identical (x,y) values. Is it possible to use
position_dodge() to avoid the points overlapping? I have tried to do
so, but only get strange warnings and results.

Example:

d=subset(iris, Petal.Length < 2)
sunflowerplot(d[, 3:4]) # The ‘petals’ indicate duplicate points

# Ordinary plotting hides these duplicates
ggplot(d, aes(x=Petal.Length, y=Petal.Width)) + geom_point(size=3)

# One solution is to use jittering, but even then some points
# may by chance happen to lie directly on top of each other
ggplot(d, aes(x=Petal.Length, y=Petal.Width)) +
geom_jitter(size=3, position=position_jitter(w=.01,h=.01))

# So I want to use dodging. But just setting position="dodge"
# doesn’t work:
ggplot(d, aes(x=Petal.Length, y=Petal.Width)) +
geom_point(size=3, position="dodge")

# Setting ‘width’ and ‘height’ only seems to have
# the effect of compressing the plot horizontally
# (only ‘width’ seems to matter – ‘height’ has no effect)
ggplot(d, aes(x=Petal.Length, y=Petal.Width)) +
geom_point(size=3, position=position_dodge(width=.1, height=.1))

I also get a strange warning, ‘ymax not defined: adjusting position
using y instead’, but neither ‘position_dodge’ nor ‘geom_point’ takes
a ‘ymax’ argument, AFAICS.

And I don’t really understand the documentation at
http://docs.ggplot2.org/0.9.3/position_dodge.html
Looking at plots 3–7, it seems like ‘position_dodge’
doesn’t have an effect at all?

--
Karl Ove Hufthammer

Ista Zahn

Jan 19, 2013, 1:58:50 PM
to Karl Ove Hufthammer, ggp...@googlegroups.com
Hi Karl,

See inline below.

On Sat, Jan 19, 2013 at 12:49 PM, Karl Ove Hufthammer
<Karl.Hu...@math.uib.no> wrote:
> Dear ggplot2 list members,
>
> I have some data with identical (x,y) values. Is it possible to use
> position_dodge() to avoid the points overlapping? I have tried to do so, but
> only get strange warnings and results.
>
> Example:
>
> d=subset(iris, Petal.Length < 2)
> sunflowerplot(d[, 3:4]) # The ‘petals’ indicate duplicate points
>
> # Ordinary plotting hides these duplicates
> ggplot(d, aes(x=Petal.Length, y=Petal.Width)) + geom_point(size=3)
>
> # One solution is to use jittering, but even then some points
> # may by chance happen to lie directly on top of each other
> ggplot(d, aes(x=Petal.Length, y=Petal.Width)) +
> geom_jitter(size=3, position=position_jitter(w=.01,h=.01))
>
> # So I want to use dodging. But just setting position="dodge"
> # doesn’t work:
> ggplot(d, aes(x=Petal.Length, y=Petal.Width)) +
> geom_point(size=3, position="dodge")
>
> # Setting ‘width’ and ‘height’ only seems to have
> # the effect of compressing the plot horizontally
> # (only ‘width’ seems to matter – ‘height’ has no effect)
> ggplot(d, aes(x=Petal.Length, y=Petal.Width)) +
> geom_point(size=3, position=position_dodge(width=.1, height=.1))

position_dodge is for dodging groups. Since you have no groups defined
dodging has no effect. You can create groups as I've done below (note
that there is probably a better way to do this...)

library(plyr)  # provides ddply(), used below
tmp <- unique(d[c("Petal.Length", "Petal.Width")])
tmp$group <- interaction(tmp)
d <- merge(d, tmp)
d <- ddply(d, "group", transform, gn = factor(1:length(group)))

Now you can dodge the groups

ggplot(d, aes(x=Petal.Length, y=Petal.Width, group = gn)) +
geom_point(size=3, position=position_dodge(width=.08))
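
For what it's worth, the within-cluster index can also be built in base R, without plyr or the merge step. This is only a minimal sketch, assuming that ave() with seq_along is an acceptable stand-in for the merge/ddply code above. The column name gn follows that code; the object name p is just for illustration.

```r
# Keep only the short-petal rows, as in the original example
d <- subset(iris, Petal.Length < 2)

# Number the rows within each (Petal.Length, Petal.Width) cluster: 1, 2, 3, ...
# ave() applies seq_along() separately to each combination of the two columns
d$gn <- factor(ave(seq_len(nrow(d)), d$Petal.Length, d$Petal.Width,
                   FUN = seq_along))

# Build the dodged plot only if ggplot2 is installed; type p to draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d, aes(x = Petal.Length, y = Petal.Width, group = gn)) +
    geom_point(size = 3, position = position_dodge(width = 0.08))
}
```

Each duplicate within a cluster ends up in a different group, which is what position_dodge needs in order to separate them.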

Personally I don't like this approach very well. Jitter works better
in my opinion. Or you could size by the number of points:

d <- ddply(d, "group", transform, gsize = length(gn))

ggplot(d, aes(x=Petal.Length, y=Petal.Width, size = gsize)) +
geom_point()
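
The same "size by count" idea can be sketched without plyr by counting duplicates with base aggregate(). As far as I know, later ggplot2 releases (2.0.0 and up) also provide geom_count(), which does the counting internally. The names counts, n, p_manual and p_count are made up for this illustration.

```r
d <- subset(iris, Petal.Length < 2)

# One row per distinct (Petal.Length, Petal.Width) pair, with the number of
# observations at that location in column n
counts <- aggregate(n ~ Petal.Length + Petal.Width,
                    data = transform(d, n = 1), FUN = sum)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Point size is mapped to the number of overlapping observations
  p_manual <- ggplot(counts, aes(x = Petal.Length, y = Petal.Width, size = n)) +
    geom_point()
  # geom_count() computes equivalent counts itself from the raw data
  p_count <- ggplot(d, aes(x = Petal.Length, y = Petal.Width)) +
    geom_count()
}
```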


>
> I also get strange warning, ‘ymax not defined: adjusting position using y
> instead’, but neither ‘position_dodge’ nor ‘geom_point’ takes a ‘ymax’
> argument, AFAICS.

I see that a lot, and just ignore it.

>
> And I don’t really understand the documentation at
> http://docs.ggplot2.org/0.9.3/position_dodge.html
> Looking at plots 3–7, it seems like ‘position_dodge’
> doesn’t have an effect at all?

I think that is a documentation error. There is no dodging there for
the same reason there was no dodging in your original effort. The
example should be set up as

df <- data.frame(x=c("a","a","b","b"), y=1:4, g = rep(1:2, 2))
(p <- qplot(x, y, data=df, group=g, position="dodge", geom="bar",
stat="identity"))

then the remaining examples should work.

Best,
Ista

>
> --
> Karl Ove Hufthammer
>
> --
> You received this message because you are subscribed to the ggplot2 mailing
> list.
> Please provide a reproducible example:
> https://github.com/hadley/devtools/wiki/Reproducibility
>
> To post: email ggp...@googlegroups.com
> To unsubscribe: email ggplot2+u...@googlegroups.com
> More options: http://groups.google.com/group/ggplot2

Karl Ove Hufthammer

Jan 20, 2013, 4:09:44 AM
to ggp...@googlegroups.com, Ista Zahn
>> # Setting 'width' and 'height' only seems to have
>> # the effect of compressing the plot horizontally
>> # (only 'width' seems to matter - 'height' has no effect)
>> ggplot(d, aes(x=Petal.Length, y=Petal.Width)) +
>> geom_point(size=3, position=position_dodge(width=.1, height=.1))
>
> position_dodge is for dodging groups. Since you have no groups defined
> dodging has no effect. You can create groups as I've done below (note
> that there is probably a better way to do this...)

Thanks! That did the trick (but see comment below). And an easier way
to create the groups is simply:

d$gn=paste(d$Sepal.Length, d$Sepal.Width)

> ggplot(d, aes(x=Petal.Length, y=Petal.Width, group = gn)) +
> geom_point(size=3, position=position_dodge(width=.08))
>
> Personally I don't like this approach very well. Jiiter works better
> in my opinion. Or you could size by the number of points:

I too think jittering works better here. But I was hoping for
position_dodge to do *two-dimensional* dodging, just like jittering
can do two-dimensional jittering, so for example for four points, I
would get

**
**

centred on their common (x,y) coordinate. For three points, a

*
* *

group (but with the bottom points close together), etc.
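
Until something like this exists in ggplot2, the offsets can be computed by hand before plotting. The following is a rough sketch only: dodge2d is a hypothetical helper, not a ggplot2 function, and it places coincident points evenly on a small circle rather than in the exact patterns drawn above (four points come out as a diamond, not a 2x2 block). The radius is in data units and is an arbitrary choice.

```r
# Hypothetical helper (not part of ggplot2): spread coincident points evenly
# on a small circle around their shared (x, y) location.
dodge2d <- function(x, y, radius = 0.01) {
  # Position of each point within its cluster, and the cluster size
  idx <- ave(seq_along(x), x, y, FUN = seq_along)
  n   <- ave(seq_along(x), x, y, FUN = length)
  angle <- 2 * pi * (idx - 1) / n
  # A point that is alone (n == 1) gets a zero offset and stays put
  r <- ifelse(n > 1, radius, 0)
  data.frame(x = x + r * cos(angle), y = y + r * sin(angle))
}

d <- subset(iris, Petal.Length < 2)
dodged <- dodge2d(d$Petal.Length, d$Petal.Width)
d$x2 <- dodged$x
d$y2 <- dodged$y

# Plot the shifted coordinates with an ordinary geom_point(), if available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p2d <- ggplot(d, aes(x = x2, y = y2)) + geom_point(size = 3)
}
```

Because the offsets are evenly spaced around the circle, each cluster stays centred on its common (x, y) coordinate, and a single point is not moved at all.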

--
Karl Ove Hufthammer

Karl Ove Hufthammer

Jan 20, 2013, 4:50:31 AM
to ggp...@googlegroups.com, Ista Zahn
> Thanks! That did the trick (but see comment below). And an easier
> way to create the groups is simply:
>
> d$gn=paste(d$Sepal.Length, d$Sepal.Width)

Sorry, that’s incorrect. I misunderstood how ‘group’ was supposed to
work. I thought each cluster of points should be in the same group.
But actually they should be in *different* groups, so we need your
‘merge’ solution. But in theory, I guess

d$gn=seq(nrow(d))

should also work. But it gives a somewhat different result than your
‘merge’ solution (which should allow parts of clusters to overlap).

But interestingly, neither of the two group specifications gives the
*correct* result. See for example the top-most point (1.6, 0.6). This
is a single point, with no points nearby, so ‘position_dodge’ should
have no effect. But when using ‘position_dodge’, this point is moved
horizontally. A bug in ‘position_dodge’?
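
One possible explanation, offered as an assumption rather than a confirmed diagnosis: position_dodge works along x only, so every point sharing x = 1.6 would be treated as part of the same set to be dodged, whatever its y value. The check below only confirms that the point is alone at its (x, y) location but not alone at its x value. The names same_x and same_xy are made up for the illustration.

```r
d <- subset(iris, Petal.Length < 2)

# Rows sharing the x value of the top-most point
same_x <- d[d$Petal.Length == 1.6, c("Petal.Length", "Petal.Width")]
# Rows sharing both coordinates with it
same_xy <- same_x[same_x$Petal.Width == 0.6, ]

nrow(same_xy)              # number of points exactly at (1.6, 0.6)
nrow(same_x)               # number of points anywhere at x = 1.6
table(same_x$Petal.Width)  # how those points are spread over y
```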

--
Karl Ove Hufthammer

Ista Zahn

Jan 20, 2013, 9:10:13 AM
to Karl Ove Hufthammer, ggp...@googlegroups.com
Hi Karl,

Ah, I see. I don't think dodging in both dimensions is supported. That
would be nice!

Ista Zahn

Jan 20, 2013, 9:49:28 AM
to Karl Ove Hufthammer, ggp...@googlegroups.com
On Sun, Jan 20, 2013 at 4:50 AM, Karl Ove Hufthammer
<Karl.Hu...@math.uib.no> wrote:
>> Thanks! That did the trick (but see comment below). And an easier way to
>> create the groups is simply:
>>
>> d$gn=paste(d$Sepal.Length, d$Sepal.Width)
>
>
> Sorry, that’s incorrect. I misunderstood how ‘group’ was supposed to work. I
> thought each cluster of points should be in the same group. But actually
> they should be in *different* groups, so we need your ‘merge’ solution. But
> in theory, I guess
>
> d$gn=seq(nrow(d))
>
> should also work.

Yeah, I guess that works. I don't really like it though, because we're
putting every row in a different group. I would still go with

d$group <- interaction(d[c("Petal.Width", "Petal.Length")])
d <- ddply(d, "group", transform, gn = factor(1:length(group)))

if I were to take this approach at all.

> But it gives a somewhat different result then your ‘merge’
> solution (which should allow parts of clusters to overlap).
>
> But interestingly, none of the two group specifications gives the *correct*
> result. See for example the top-most point (1.6, 0.6). This is a single
> point, with no points nearby, so ‘position_dodge’ should have no effect. But
> when using ‘position_dodge’, this point is moved horizontally. A bug in
> ‘position_dodge’?

Maybe, but I think we are using it for something it was not designed
to do. My suggestion is to choose one of:

#jitter
ggplot(d, aes(x=Petal.Length, y=Petal.Width)) +
geom_jitter()

#size
d <- ddply(d, "group", transform, gsize = length(group))
ggplot(d, aes(x=Petal.Length, y=Petal.Width)) +
geom_point(aes(size = gsize))

# alpha
ggplot(d, aes(x=Petal.Length, y=Petal.Width)) +
geom_point(aes(alpha = gsize), size=4)

Aaron Mackey

Jan 21, 2013, 9:51:52 AM
to Karl Ove Hufthammer, ggplot2, Ista Zahn
You're looking for geom_dotplot, with position="center", etc., but it's still not dodging.  See also "beeswarm" package and related.  These techniques only work well when the data is relatively sparse/minimal; consider stat_binhex to convey density distributions of larger datasets. 
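
A rough sketch of what these two suggestions might look like on the thread's data. The binwidth and bins values are arbitrary guesses, p_dot and p_hex are made-up names, and the hexagon plot assumes the hexbin package is installed (ggplot2 needs it to draw hexagonal bins). Treating Petal.Length as a factor on the x axis is my own choice so that the dots can be stacked sideways.

```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  d <- subset(iris, Petal.Length < 2)

  # Dot plot: identical Petal.Width values are stacked sideways,
  # centred on each Petal.Length value
  p_dot <- ggplot(d, aes(x = factor(Petal.Length), y = Petal.Width)) +
    geom_dotplot(binaxis = "y", stackdir = "center", binwidth = 0.02)

  # Hexagonal binning for a larger data set: fill colour shows the count
  if (requireNamespace("hexbin", quietly = TRUE)) {
    p_hex <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width)) +
      stat_binhex(bins = 20)
  }
}
```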

-Aaron

--
Aaron J. Mackey, PhD
Assistant Professor
Center for Public Health Genomics
University of Virginia
ama...@virginia.edu
http://www.cphg.virginia.edu/mackey





Karl Ove Hufthammer

Jan 29, 2013, 4:08:49 PM
to Aaron Mackey, ggplot2, Ista Zahn
> You're looking for geom_dotplot, with position="center", etc., but it's
> still not dodging. See also "beeswarm" package and related. These
> techniques only work well when the data is relatively sparse/minimal;
> consider stat_binhex to convey density distributions of larger datasets.

Yes, binhex is nice for larger datasets. I guess one solution for
simple two-dimensional dodging for smaller datasets in ggplot2 would
be to create a new geom taking an ndots (‘number of dots’) aesthetic
and drawing a special geom based on this, e.g.

http://en.wikipedia.org/wiki/Circle_packing_in_a_circle
http://en.wikipedia.org/wiki/File:Circles_packed_in_square_5.svg
or the glyphs used in ‘sunflowerplot’.

(I have no plans to create such a geom, but if I ever find the time to
learn how to create ggplot2 geoms, this might be a nice ‘first
project’.)

Thanks for the ‘beeswarm’ suggestion. It looks like a very nice package
(using base graphics). I very much like the example images at
http://www.cbs.dtu.dk/~eklund/beeswarm/

These types of stripcharts are an excellent alternative to the
horrible dynamite plots that are often used:
http://biostat.mc.vanderbilt.edu/twiki/pub/Main/TatsukiRcode/Poster3.pdf

--
Karl Ove Hufthammer