Mike
Nov 30, 2009, 4:14:03 PM
to ggplot2
All,
I know that there are several different ways to deal with
overplotting in ggplot2, but none of them quite captures what I am
looking for when "the rubber hits the road". My problem is as
follows:
1) I have data across 2 dimensions, x & y, with one legend already
consumed by a categorical factor
2) I have data that will span several orders of magnitude, sometimes
with sparse data, and sometimes with dense data.
What I am normally used to doing is making an x-y point plot with
different colors for the different categories, then placing a
histogram below the x-axis and another to the left of the y-axis
(each broken down by factor). I cannot seem to find a nice, easy way
to create a histogram inside the plot. It can be done, but not at all
easily at my knowledge / skill level.
I have tried the following methods:
o point plot, making the points smaller, turning on a little
jitter, and adding some alpha. I don't like this because (a) I REALLY
have to make the points tiny to keep them from piling on top of each
other, which makes the sparse points hard to see, (b) jitter distorts
the data, and (c) alpha similarly makes certain points disappear.
o bin2d plot, with the factor as the colour and ..count.. as the
fill. I don't like this because a particular bin is EITHER one factor
or the other, so a lot of data is lost. Plus, when I put two "aes"
parameters in, ..count.. and my factor, the legend for count comes up
empty. I could probably figure out what I am doing wrong there,
though.
o point plot like the one above, but with an overlay of
"stat_density2d", using "tile" and filling by density, to create an
effect similar to what Hadley has in his example. I then put a low
alpha on this layer so that it is mostly see-through and I can see the
points behind it. But again, the graph ends up just not quite right.
The density layer really just darkens things, and it is hard to see
the variation in density.
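For concreteness, the three attempts above might look roughly like the sketch below. The data frame `df` and its columns `x`, `y` and `grp` are made up for illustration, and the sizes, alpha values and bin counts are placeholders, not the values actually used. It uses the current ggplot2 spellings (`geom_bin_2d`, `stat_density_2d`, and `after_stat(count)` in place of `..count..`), which assumes ggplot2 3.3.0 or later.

```r
# Made-up data spanning several orders of magnitude.
# `df`, `x`, `y` and `grp` are illustrative names only.
set.seed(42)
n <- 2000
df <- data.frame(
  x   = 10^rnorm(n, mean = 2, sd = 1),
  y   = 10^rnorm(n, mean = 2, sd = 1),
  grp = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Shared base: log axes, since the data spans orders of magnitude.
  base <- ggplot(df, aes(x = x, y = y)) + scale_x_log10() + scale_y_log10()

  # Attempt 1: tiny, slightly jittered, semi-transparent points.
  p_points <- base +
    geom_point(aes(colour = grp), size = 0.5, alpha = 0.3,
               position = position_jitter(width = 0.02, height = 0.02))

  # Attempt 2: 2d bins, outline colour by factor, fill by bin count.
  p_bins <- base +
    geom_bin_2d(aes(colour = grp, fill = after_stat(count)), bins = 30)

  # Attempt 3: points with a mostly transparent density layer on top.
  p_density <- base +
    geom_point(aes(colour = grp), size = 0.5) +
    stat_density_2d(aes(fill = after_stat(density)), geom = "tile",
                    contour = FALSE, alpha = 0.3)
}
```

Printing `p_points`, `p_bins` or `p_density` draws the corresponding plot.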
Barring a relatively easy way to create the histograms, which would
be ideal, the easiest other solution that I see is to plot the points
in a different ORDER. E.g., if my factor has 3 levels, A, B, and C, I
would like to plot so that A is drawn first, then B, then C, and then
draw the plot again so that C is drawn first, then A, etc. With
multiple plots of the same data set, I can see how many points of the
other levels are hidden behind the level that is drawn last. I am
hoping this is feasible in ggplot2. Otherwise, I'll have to go back to
non-ggplot plots and draw the data one factor level at a time.
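To make the request concrete, here is a rough sketch of the idea. It assumes that points within a single layer are drawn in the row order of the data frame, so moving one level's rows to the end puts that level on top. The helper `draw_last` and the data frame `df` (columns `x`, `y`, `grp`) are hypothetical names made up for this example.

```r
# Hypothetical helper: move the rows whose `col` equals `last` to the end
# of the data frame, so that (assuming row-order drawing) they are drawn last.
draw_last <- function(df, last, col = "grp") {
  is_last <- df[[col]] == last
  rbind(df[!is_last, , drop = FALSE], df[is_last, , drop = FALSE])
}

# Made-up data, for illustration only.
set.seed(1)
df <- data.frame(
  x   = 10^runif(300, 0, 4),
  y   = 10^runif(300, 0, 4),
  grp = factor(sample(c("A", "B", "C"), 300, replace = TRUE))
)

df_A_on_top <- draw_last(df, "A")
df_C_on_top <- draw_last(df, "C")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Same data and aesthetics in both plots; only the row order differs.
  p_A_on_top <- ggplot(df_A_on_top, aes(x = x, y = y, colour = grp)) +
    geom_point() + scale_x_log10() + scale_y_log10()
  p_C_on_top <- ggplot(df_C_on_top, aes(x = x, y = y, colour = grp)) +
    geom_point() + scale_x_log10() + scale_y_log10()
}
```

Comparing `p_A_on_top` with `p_C_on_top` side by side would then show which points of the other levels were hidden in each ordering.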
Thanks!
Mike Williamson