Dot plots

Winston Chang

Dec 9, 2011, 7:24:46 PM

to ggplo...@googlegroups.com

Hi all -

I've written an implementation of dot plots. It required creating a new grob, which I called dotcluster. Because of the way the dots have to be stacked, the dotcluster grob acts somewhat differently from other objects used in ggplot2.

There are two binning algorithms: "histodot", which is the same as "stat_bin" but with modifications to bin along x or y, and "dotdensity" (the default), which is taken from Wilkinson (1999). To use geom_dotplot, you set the bin width, and it scales the dots so that the diameter is the same as the bin width. (With the dotdensity algorithm, the bin width is really the maximum bin width.)
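To illustrate how the two methods differ, here is a minimal base-R sketch. It makes a few assumptions. Dot density is taken to be a single greedy left-to-right pass: each bin starts at the smallest unbinned value and absorbs every later value less than binwidth away, and the stack is centred midway between the bin's smallest and largest values. Wilkinson's final smoothing pass is left out. Histodot is taken to use fixed-width intervals starting at min(x). The function names dotdensity_bins() and histodot_bins() are hypothetical and are not the code in the branch.

```r
# Hypothetical sketch, not the feature/wdotplot code.

# Dot density binning (assumed greedy scan, no smoothing pass).
# 'binwidth' is only an upper bound on each bin's actual width.
dotdensity_bins <- function(x, binwidth) {
  x <- sort(x[!is.na(x)])
  bin <- integer(length(x))
  binend <- -Inf
  current <- 0L
  for (i in seq_along(x)) {
    if (x[i] >= binend) {            # past the current bin: open a new one here
      current <- current + 1L
      binend <- x[i] + binwidth
    }
    bin[i] <- current
  }
  # Each stack sits midway between its smallest and largest member
  centers <- vapply(split(x, bin), function(v) (min(v) + max(v)) / 2, numeric(1))
  data.frame(center = unname(centers), count = tabulate(bin))
}

# Histodot binning: fixed-width, left-closed intervals starting at 'origin'
# (assumed default: min(x)); stacks sit at the interval midpoints.
histodot_bins <- function(x, binwidth, origin = min(x, na.rm = TRUE)) {
  x <- x[!is.na(x)]
  idx <- floor((x - origin) / binwidth) + 1    # 1-based interval index
  counts <- tabulate(idx)
  keep <- counts > 0                           # drop empty intervals
  centers <- origin + (seq_along(counts) - 0.5) * binwidth
  data.frame(center = centers[keep], count = counts[keep])
}

x <- c(0, 0.25, 0.5, 1.75, 2)
dotdensity_bins(x, binwidth = 1)   # stacks at 0.25 (3 dots) and 1.875 (2 dots)
histodot_bins(x, binwidth = 1)     # stacks at 0.5, 1.5 and 2.5 (3, 1 and 1 dots)
```

On this small example, dot density keeps 1.75 and 2 together in one stack, while the fixed-width intervals split them at the boundary at 2. Centring each stack on its own members also means that neighbouring stacks can end up as little as half a bin width apart, which is where the overlap of up to 50% mentioned in the notes below comes from.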

It's still a work in progress. If you feel like testing it out, please let me know if you run into problems or have ideas for improving it.

https://github.com/wch/ggplot2/tree/feature/wdotplot

Some notes:

- With dot density binning, dot stacks can overlap with each other, by up to 50% of the dot width.

- Stacking is by dot diameter, not by y coordinate. This means that when you resize the window, the dots stay visually stacked, while the y axis might change. Circles have the same x and y dimensions in visual space, but not in the data space.

- When binning along the x axis, each dot has a y coordinate (and ymin and ymax) so that the range of the y axis is set. However, the dots aren't actually placed at that y coordinate -- they're stacked visually. Each stack of dots can be thought of as a single object which, like a text object, doesn't change dimensions when you change the y scaling.

- When binning along the y axis (and stacking along x), each stack of dots is assigned a total width of 'width', usually 0.9. This is to allow for dodging. I tried adding an option so that each dot in a stack gets an incremented x value (1, 2, 3), but the scale system didn't seem to like me changing x values like that. It also didn't like it when I used just aes(y=y), without setting x or mapping something to x. I don't know if there's a way to deal with this.
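The notes about stacking by dot diameter can be made concrete with a small sketch. The hypothetical helpers below first place the dots of one stack in units of the dot diameter, following the stackdir descriptions in the examples further down, and only then convert those offsets to y data units using the panel's physical size. The helper names, the conversion, and the assumption that stackratio simply scales the spacing between dot centres are illustrative assumptions, not taken from the branch.

```r
# Hypothetical helpers, not the feature/wdotplot code.

# Offsets of the dot centres in one stack of n dots, measured from the
# stack's baseline in units of dot diameter.
# Assumption: stackratio scales the distance between neighbouring centres
# (1 = dots just touching).
stack_offsets <- function(n, stackdir = c("up", "down", "center",
                                          "centerwhole", "centerwholedown"),
                          stackratio = 1) {
  stackdir <- match.arg(stackdir)
  i <- seq_len(n) - 1                                   # 0, 1, ..., n - 1
  switch(stackdir,
    up              = 0.5 + i * stackratio,             # first dot rests on the baseline
    down            = -(0.5 + i * stackratio),
    center          = (i - (n - 1) / 2) * stackratio,   # symmetric about the baseline
    centerwhole     = (i - floor((n - 1) / 2)) * stackratio,   # even n: extra dot goes up
    centerwholedown = (i - ceiling((n - 1) / 2)) * stackratio  # even n: extra dot goes down
  )
}

# Convert offsets (in diameters) to y data units. The diameter equals
# binwidth in x data units, so its physical size depends on the panel width,
# and its size in y data units depends on the panel height.
stack_y <- function(offsets, binwidth, xrange, yrange, panel_w, panel_h) {
  dia_phys <- binwidth / diff(xrange) * panel_w    # diameter in physical units (e.g. mm)
  dia_y <- dia_phys * diff(yrange) / panel_h       # same diameter in y data units
  offsets * dia_y
}

stack_offsets(4, "centerwhole")                    # -1 0 1 2
off <- stack_offsets(3, "up")                      # 0.5 1.5 2.5
stack_y(off, binwidth = 0.5, xrange = c(-2, 2), yrange = c(0, 5),
        panel_w = 80, panel_h = 50)                # 0.5 1.5 2.5
stack_y(off, binwidth = 0.5, xrange = c(-2, 2), yrange = c(0, 5),
        panel_w = 80, panel_h = 100)               # 0.25 0.75 1.25
```

Doubling the panel height halves the stack's extent in y data units while the dots keep the same physical size. This matches the behaviour described above, where the dots stay visually stacked when the window is resized and only their relation to the y axis changes.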

Issues I haven't figured out yet:

- Should it be called something other than "dotplot", to distinguish it from Cleveland dot plots?

- When stacking on y, instead of setting the y values of each dot to 1, 2, 3, should I just treat the whole stack as an object with height=1?

- I think the dot density binning algorithm is right, but there may be trouble cases that I haven't encountered yet. Wilkinson mentions a smoothing algorithm that takes a final pass over the bins, but I haven't implemented it.

- Coord transforms don't work. At this point I don't know if they even make sense conceptually for these objects.

Problems:

- Right now, if you want to bin along the y axis (instead of the default x), you have to set binaxis="y" and binstataxis="y". This is because I can't figure out how to give one parameter to both stat_bindot and GeomDotplot. It looks like, the way things are coded, stats "eat up" parameters so that they're not available to the geom.

In Wilkinson's book The Grammar of Graphics, he gives specifications for these dot plots. However, they have some characteristics that seem to violate the GoG, at least if you treat each dot as an object with x and y coordinates. If you treat each stack as a single object, then it might make more sense, but then position adjustments like dodging are problematic. I'll have to think about this some more. If someone wants to enlighten me, I'd appreciate it!

At any rate, on to the fun stuff, the examples and pictures. Here are some graphs I've been using for testing.

library(ggplot2)  # must be the feature/wdotplot branch build
set.seed(122)
dat <- data.frame(x=rnorm(20), y=rnorm(20))

# Stack vertically, sitting on 0 as baseline.
dp1 <- ggplot(dat, aes(x)) + geom_rug() + scale_x_continuous(breaks=seq(-4,4,.4))
dp1 + geom_dotplot(binwidth=.4, alpha=.2, colour="red")

# Notice each dot stack is centered over a set of observations. The binning is done with
# Wilkinson's (1999) dot density algorithm. 'binwidth' sets the maximum bin width.
# The y range is correctly set from 0 to 5, but the y axis scale actually has nothing
# to do with the y positioning of the dots. The dot diameter is the same as the maximum
# bin width and they're stacked visually; if you resize the window to make it taller
# or shorter, they stay visually stacked. You could resize the window so that the dots
# align with the tick marks.

# Use histodot binning
dp1 + geom_dotplot(binwidth=.4, alpha=.2, colour="red", binmethod="histodot")
# This uses the same algorithm as stat_bin, with fixed-width intervals.

# Squish together vertically with smaller stackratio
dp1 + geom_dotplot(binwidth=.4, stackratio=.8)

# Stacking methods (stackdir="up" is the default)

# stack down
dp1 + geom_dotplot(binwidth=.4, alpha=1, stackdir="down")

# stack center
dp1 + geom_dotplot(binwidth=.4, alpha=1, stackdir="center")

# stack centerwhole:
# keep dots aligned and add one dot up, then one down, then one up, etc.
dp1 + geom_dotplot(binwidth=.4, alpha=1, stackdir="centerwhole")

# stack centerwholedown - reverse of centerwhole
dp1 + geom_dotplot(binwidth=.4, alpha=1, stackdir="centerwholedown")

# Dot diameter expanded to 1.4 * max binwidth.
# Stacking is adjusted so that the dots are still just touching vertically.
dp1 + geom_dotplot(binwidth=.4, alpha=1, colour="black", dotsize=1.4)

# Bin along Y
dp1y <- ggplot(dat, aes(x=0, y=y)) + geom_rug() + scale_y_continuous(breaks=seq(-4,4,.4))
dp1y + geom_dotplot(binwidth=.4, binaxis="y", binstataxis="y", stackdir="center")
# Notice that both 'binaxis' and 'binstataxis' need to be set.

# Y direction, stack centerwhole
dp1y + geom_dotplot(binwidth=.4, binaxis="y", binstataxis="y", stackdir="centerwhole")

# Data with x and g as factors
dat2 <- data.frame(x=LETTERS[1:3], y=round(rnorm(90),2), g=LETTERS[1:2])

# Groups on x axis
dp2 <- ggplot(dat2, aes(x=x, y=y)) + scale_y_continuous(breaks=seq(-4,4,.4))
dp2 + geom_dotplot(binwidth=.25, colour="black", binaxis="y", binstataxis="y",
                   stackdir="centerwhole")

# Groups on x axis with violins (also smaller bin size)
dp2 + geom_violin() +
  geom_dotplot(binwidth=.15, position="dodge", binaxis="y", binstataxis="y",
               stackdir="center")

# With boxplots and violins; the violin widths are also scaled relative to each other
dp2 + geom_violin(fullwidth=FALSE) +
  geom_boxplot(position="dodge", width=.2, outlier.size=0) +
  geom_dotplot(alpha=.3, binwidth=.15, position="dodge", binaxis="y", binstataxis="y",
               stackdir="center")

# Dodging, mapping "x" to fill instead of x
ggplot(dat2, aes(x="foo", y=y, fill=x)) + scale_y_continuous(breaks=seq(-4,4,.4)) +
  geom_dotplot(binwidth=.25, alpha=.4, position="dodge", binaxis="y", binstataxis="y",
               stackdir="centerwhole")

# Grouping on x and g, dodging
ggplot(dat2, aes(x=x, y=y, fill=g)) + scale_y_continuous(breaks=seq(-4,4,.4)) +
  geom_dotplot(binwidth=.2, alpha=.2, position="dodge", binaxis="y", binstataxis="y",
               stackdir="centerwhole")

# These clusters don't have a "real" x width, so dodging is a bit weird. In this case
# the clusters are too close together, but if you just make the window wider, the clusters
# will move apart (within each cluster the dots will stay together).

# Vertical, with grouping on x and g, with boxplots and violins
ggplot(dat2, aes(x=x, y=y)) + scale_y_continuous(breaks=seq(-4,4,.4)) +
  geom_violin(aes(colour=g), fill="white") +
  geom_boxplot(aes(colour=g), position=position_dodge(0.9),
               width=.3, outlier.size=0) +
  geom_dotplot(aes(fill=g), binwidth=.15, alpha=.3, position="dodge",
               binaxis="y", binstataxis="y", stackdir="center")
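In the dodged examples above, each group's stack gets a share of the 'width' slot (usually 0.9) mentioned in the notes. Here is a minimal sketch of that arithmetic. It assumes the usual dodge rule of splitting the slot into n equal sub-slots and centring each group in its own sub-slot. dodge_centers() is a hypothetical helper, not ggplot2's position_dodge code.

```r
# Hypothetical helper: x centres of n dodged groups sharing one x position.
# Assumption: the 'width' slot is split into n equal sub-slots.
dodge_centers <- function(x, n, width = 0.9) {
  j <- seq_len(n)                      # group index within this x position
  x + width * ((j - 0.5) / n - 0.5)    # centre of the j-th width/n sub-slot
}

dodge_centers(1, n = 2)   # 0.775 1.225
dodge_centers(2, n = 3)   # 1.7 2.0 2.3
```

Only the stack centres are computed in x data units. The dots themselves are sized from binwidth along the binning axis. That is consistent with the comment above: widening the window moves the cluster centres apart without widening the clusters.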

[Attached images: dotplot-1.png to dotplot-3.png, dotplot-5.png to dotplot-9.png, dotplot-12.png to dotplot-19.png]

Hadley Wickham

Dec 11, 2011, 9:58:38 AM

to Winston Chang, ggplo...@googlegroups.com

This looks really cool! You may want to also post to the main ggplot2 mailing list with brief instructions on how to try it out:

library(devtools)
dev_mode()
install_github("ggplot2", "wch", "...")

etc

If you start submitting these as pull requests, I can start giving more formal feedback on the code.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


Winston Chang

Dec 11, 2011, 12:52:29 PM

to Hadley Wickham, ggplo...@googlegroups.com

> This looks really cool! You may want to also post to the main ggplot2 mailing list with brief instructions on how to try it out:
>
> library(devtools)
> dev_mode()
> install_github("ggplot2", "wch", "...")
>
> etc

Sounds like a good idea, I'll do that.

For those of you who are already using a local git repository, you can do the following to test out my branch without actually installing it (https://gist.github.com/1150934):

# ====== Do this once =======
# Add the remote git repository (in a shell, from your ggplot2 directory):
git remote add wch https://github.com/wch/ggplot2.git
git fetch wch

# Install devtools (in R):
install.packages("devtools")

# ====== Do this each time you want to try the code =======
# Check out the desired branch (in a shell):
git checkout wch/feature/wdotplot
# Git will say something about being in a "detached HEAD" state. This is OK.

# Load the package in R, without installing it:
library(devtools)
load_all("/path/to/ggplot2/")

[test things out]

# ====== Other misc stuff =======
# To get your git repository back to the main branch:
git checkout master

# To get new changes from the repository:
git fetch wch

# To remove the remote repository from your list:
git remote rm wch

> If you start submitting these as pull requests, I can start giving more formal feedback on the code.

Should I submit a pull request even when I think the code isn't yet ready? The other things I've written are pretty clean now, but the dot plot stuff is still messy.

-Winston

Winston Chang

Dec 11, 2011, 1:19:58 PM

to Hadley Wickham, ggplo...@googlegroups.com

One thing that I need help with: I can't seem to figure out how to make a single parameter get passed to both the geom and stat.

I want to use a single parameter to set the binning axis, like binaxis="y". This parameter needs to go to both geom_dotplot and stat_bindot, but if I add that parameter to stat_bindot, it doesn't get passed along to geom_dotplot. Is there a way to make this happen?

Right now I've been using two parameters, binaxis and binstataxis. One of them goes to the geom and one goes to the stat, but this is obviously not an ideal solution.

Thanks!

-Winston

Hadley Wickham

Dec 12, 2011, 7:49:54 AM

to Winston Chang, ggplo...@googlegroups.com

> One thing that I need help with: I can't seem to figure out how to make a
> single parameter get passed to both the geom and stat.
>
> I want to use a single parameter to set the binning axis, like
> binaxis="y". This parameter needs to go to both geom_dotplot and the
> stat_bindot, but if I add that parameter to stat_bindot, it doesn't get
> passed along to geom_dotplot. Is there a way to make this happen?

You might need to explicitly supply geom_params and stat_params in the
new layer call.

Hadley
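Hadley's suggestion comes down to building the two parameter lists explicitly, so that a shared argument such as binaxis lands in both instead of being consumed by the stat. The sketch below shows only that routing step, in plain R. route_params() and the name vectors are hypothetical, and how the resulting lists are handed to the layer constructor in the branch is not shown.

```r
# Hypothetical helper: split one list of user arguments into stat and geom
# parameter lists. Names listed for both (here binaxis) are copied to both.
route_params <- function(params, stat_names, geom_names) {
  list(
    stat_params = params[intersect(names(params), stat_names)],
    geom_params = params[intersect(names(params), geom_names)]
  )
}

# Assumed parameter names, for illustration only
args <- list(binaxis = "y", binwidth = 0.15, stackdir = "center")
routed <- route_params(args,
                       stat_names = c("binaxis", "binwidth"),
                       geom_names = c("binaxis", "stackdir"))
str(routed)   # binaxis appears in both stat_params and geom_params
```

With this approach a single binaxis argument would be enough, and the separate binstataxis parameter would no longer be needed.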

Hadley Wickham

Dec 12, 2011, 7:50:21 AM

to Winston Chang, ggplo...@googlegroups.com

> Should I submit a pull request even when I think the code isn't yet ready?
> The other things I've written are pretty clean now, but the dot plot stuff
> is still messy.

Sure - if you want comments, that's the easiest way to get them.

Hadley

Winston Chang

Dec 12, 2011, 11:52:49 AM

to Hadley Wickham, ggplo...@googlegroups.com

Thanks - that did the trick.

-Winston
