I do not believe there is currently a way to do this. I also have
some interest in this and one of my side projects has been working on
this, but it has been slow. There are basically two steps:
1) determine which points are overlapping
2) calculate new values in whichever dimension you are using (in the
picture you referenced, the x direction) to avoid any overlap
Depending on what assumptions you can make, 1) becomes more or less
difficult. For example, if you have roughly discrete values that will
not overlap vertically (unless they are exactly the same and directly
on top of each other), then you only need to displace the points when
they have equal values. However, if they could overlap, then you need
to take into account the size of your plotting symbol when determining
if they overlap and the amount of displacement required to alleviate
this.
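A minimal sketch of those two steps in base R, assuming circular symbols whose diameter is already expressed in data units. The function names find_overlaps() and displace_x() are made up for illustration, and step 2 only handles the simple case of exactly tied y values; this is not what beeswarm does.

```r
# Step 1: logical n x n matrix, TRUE where two distinct points overlap.
# Assumes circular symbols with `diameter` given in data units.
find_overlaps <- function(x, y, diameter) {
  d <- as.matrix(dist(cbind(x, y)))   # pairwise Euclidean distances
  overlap <- d < diameter
  diag(overlap) <- FALSE              # a point does not overlap itself
  unname(overlap)
}

# Step 2 (simple case): points only collide when their y values are equal,
# so spread each group of tied y values symmetrically around its x.
displace_x <- function(x, y, diameter) {
  rank_in_group <- ave(seq_along(y), y, FUN = seq_along)
  group_size    <- ave(seq_along(y), y, FUN = length)
  x + (rank_in_group - (group_size + 1) / 2) * diameter
}

y <- c(1, 1, 1, 2, 3, 3)
x <- rep(0, length(y))
new_x <- displace_x(x, y, diameter = 0.1)
cbind(x = new_x, y = y)
```

When points can also overlap vertically, find_overlaps() still applies, but the displacement has to be worked out from the pairwise distances rather than from ties alone.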
You can check out the beeswarm package:
http://www.cbs.dtu.dk/~eklund/beeswarm/
It does not work directly with ggplot2, but you may be able to do some
manual work with the swarmx() or swarmy() functions to get a new set of
coordinates for your points that you can plot directly.
Assuming you have a recent version of R with internet access:
source("http://joshuawiley.com/R/Jmisc_devel_installer.R")
require(Jmisc)
Jmisc:::StackAlgorithm
will show you how I attempted to implement an algorithm by Wilkinson
to avoid overplotting. It seems (to me) somewhat more precise than
what is done in the beeswarm package. Essentially, for n points, it
creates a logical n x n matrix of all neighbors. It first picks the
point with the most neighbors, makes that the first 'stack', adjusts
the coordinates of all those k points so they do not overlap, removes
the points, and proceeds likewise on the (n - k) x (n - k) matrix until
there are no more points. It sort of works but has quite a few
troubles, and my attempts to get even the troubled version working
with grid, so that it could eventually become a grob used in ggplot2,
could fill a novel of short failure stories. The only recent progress
(since the last time I discussed this) is that I know a lot more C++
than I did before, and sometime next year I should have the algorithm
itself (which works just fine) implemented in C++ using the Rcpp
package by Dirk and Romain. This should at least take care of the
algorithm's performance issues. Now if only my poor brain could be
rewritten in C++....
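For readers without access to Jmisc, here is a rough base-R sketch of the stacking idea as described above, for one-dimensional data. It is a reading of the prose description, not the Jmisc:::StackAlgorithm code; the function name stack_points(), the `diameter` argument (symbol width in data units), and the choice to centre each stack on the mean of its members are all assumptions.

```r
# Sketch of the stacking idea for one-dimensional data.
# `diameter` is the symbol width in data units (an assumption).
stack_points <- function(x, diameter) {
  n <- length(x)
  neighbors <- abs(outer(x, x, "-")) < diameter   # n x n; diagonal is TRUE
  remaining <- rep(TRUE, n)
  out <- data.frame(x = x, stack_x = NA_real_, stack_y = NA_real_)

  while (any(remaining)) {
    # Neighbor counts among the points that have not been stacked yet
    counts <- rowSums(neighbors[, remaining, drop = FALSE])
    counts[!remaining] <- -1
    seed <- which.max(counts)
    members <- which(neighbors[seed, ] & remaining)  # includes the seed
    # Centre the stack on its members and pile them up without overlap
    out$stack_x[members] <- mean(x[members])
    out$stack_y[members] <- (seq_along(members) - 0.5) * diameter
    remaining[members] <- FALSE
  }
  out
}

stack_points(c(1.00, 1.02, 1.05, 2.00, 2.01, 3.50), diameter = 0.1)
```

One known gap in this sketch: after centring, two adjacent stacks can end up closer than one diameter, so a complete version would need a further pass to separate stacks.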
Cheers,
Josh
> --
> You received this message because you are subscribed to the ggplot2 mailing list.
> Please provide a reproducible example: http://gist.github.com/270442
>
> To post: email ggp...@googlegroups.com
> To unsubscribe: email ggplot2+u...@googlegroups.com
> More options: http://groups.google.com/group/ggplot2
>
--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
Note to developers: it would be great if there were a geom or stat that made these dot plots -- I wish I could write it myself, but I don't know enough about the inner workings of ggplot2 to do it. If someone wants to do this, I can send the code I have.
I hope this is helpful!
--Winston
ben
--
On Mon, Nov 14, 2011 at 9:44 AM, Hadley Wickham <had...@rice.edu> wrote:
> I'd love to see a version of this code that used a grid grob to base the
> binning on the size of the points.
>
Not sure if this helps here, but in a previous discussion with JiHo
I proposed this dummy grob to stack points dynamically in the
available space:
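library(grid)

# Dummy grob that stores only the number of points to draw
pointlessGrob <- function(n = 3) {
  grob(n = n, cl = "pointless")
}

# Drawing method, evaluated at draw time so it sees the current viewport
drawDetails.pointless <- function(x, recording = FALSE) {
  # Available height in inches, split evenly among the n points
  y.space <- convertY(unit(1, "npc"), "in", valueOnly = TRUE)
  y.quantum <- y.space / x$n
  grid.points(x = unit(rep(0.5, x$n), "npc"),
              y = unit(seq(y.quantum / 2, length.out = x$n, by = y.quantum), "in"),
              size = unit(y.quantum * 4/3, "in"), pch = 18)
}

grid.pointless <- function(...) grid.draw(pointlessGrob(...))

grid.pointless()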
This is adjusting the point size to fit in a given space; perhaps
that's a good option here.
The reverse problem -- deciding how many points to have for a given
point size and space -- should be quite similar (possibly moving the
calculations into a preDrawDetails method).
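As a rough illustration of that reverse calculation, the arithmetic could look like the sketch below. The helper name fit_points() and its arguments are hypothetical, and both lengths are assumed to be in inches; inside a drawing method the available space could be obtained with convertY(unit(1, "npc"), "in", valueOnly = TRUE), as in the grob above.

```r
# Reverse of the grob above: the point size is fixed, and we ask how many
# points fit in the available height. Both lengths are in inches.
fit_points <- function(space_in, point_in) {
  stopifnot(space_in >= 0, point_in > 0)
  n <- floor(space_in / point_in)
  # Centres of the n points, stacked from the bottom without overlap
  list(n = n, y = (seq_len(n) - 0.5) * point_in)
}

fit_points(space_in = 3, point_in = 0.25)$n   # 12
```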
Cheers,
baptiste
I'd love to see a version of this code that used a grid grob to base the binning on the size of the points.
Hadley
Both may be nice, but I think the bin size being determined by the
point size is more important. If you set fixed bins, all you have
done is duplicate a histogram. The whole point of Wilkinson's paper
was to provide something closer to the raw data that avoids (to the
extent possible) arbitrary bin sizes and shifting of points from their
true values. Based on the point size, bins are sized to be the bare
minimum. Also, bins are not applied to the range of the data---the
data are iteratively binned to avoid a left to right or right to left
bias.
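To make the link between point size and bin size concrete, here is a small base-R sketch. It assumes the point diameter and axis length are known in inches; the function and argument names are illustrative only. Note that the binning shown is a plain left-to-right pass, used here as a simpler stand-in: it anchors bins on the data rather than on a fixed grid, but it does have the directional bias mentioned above.

```r
# Bin width implied by the symbol size: the smallest width at which
# neighbouring stacks do not overlap. All argument names are illustrative.
binwidth_from_point_size <- function(point_in, axis_in, data_range) {
  point_in / axis_in * diff(data_range)
}

# Simple left-to-right binning: each bin starts at the first unbinned
# data value instead of on a fixed grid over the range of the data.
dot_density_bins <- function(x, binwidth) {
  x <- sort(x)
  bin <- integer(length(x))
  current <- 0L
  start <- -Inf
  for (i in seq_along(x)) {
    if (x[i] >= start + binwidth) {   # open a new bin at this value
      current <- current + 1L
      start <- x[i]
    }
    bin[i] <- current
  }
  # Each stack is drawn at the mean of the values it contains
  data.frame(x = x, bin = bin, center = ave(x, bin))
}

bw <- binwidth_from_point_size(point_in = 0.1, axis_in = 5, data_range = c(0, 10))
dot_density_bins(c(1.0, 1.1, 1.25, 2.0, 2.15), bw)
```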
>
> In other words, this is the reverse: instead of the bins being based on the
> size of points, set the size of points based on the bins. I think this is
> like what Baptiste suggested in his code. I don't yet understand grid
> graphics well enough to do this myself, but based on Baptiste's code, it
> doesn't look too complicated.
>
> -Winston
--
On Sun, Nov 13, 2011 at 2:19 PM, Winston Chang <winsto...@gmail.com> wrote:
> On Sun, Nov 13, 2011 at 2:44 PM, Hadley Wickham <had...@rice.edu> wrote:
>>
>> I'd love to see a version of this code that used a grid grob to base the
>> binning on the size of the points.
>>
>> Hadley
>
> I think that one drawback to doing it this way is that you're not
> controlling the bin size directly, so you'd have a hard time ending up with
> a nice round number for the bin size. Also, if you have a stat or geom do
> the binning for you, you won't have easy access to find out how big the bins
> are (I think), which is probably very important for a graph like this.
> It would be nice to be able to say, "I want my bins and points to be exactly
> 1.5 units wide. Size my points to the appropriate diameter."