Alternative to Dynamite Plots

Sam

Dec 13, 2010, 4:36:15 PM
to ggplot2
Hello all,

I stumbled on to this thread: http://groups.google.com/group/ggplot2/browse_thread/thread/e21c20dc1fe455b8

This led me to this rant: http://biostat.mc.vanderbilt.edu/wiki/pub/Main/TatsukiKoyama/Poster3.pdf

Now I am a little wary of using "Dynamite Plots". The points in that
poster are well made and I tend to agree.

Normally I produce a Dynamite Plot with something like this:

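# Dynamite plot

library(ggplot2)
library(plyr)  # provides ddply()

# Standard error of the mean
se <- function(x) sqrt(var(x) / length(x))

x <- runif(27, 0, 125)
data <- as.data.frame(x)
# The shorter factor vectors are recycled to fill all 27 rows
data$z <- factor(c("A","A","A","B","B","B","C","C","C"))
data$yy <- factor(c("a","b","c"))

# Mean and standard error for each yy-by-z cell
x.data.avg <- ddply(data, c("yy", "z"), function(df)
  c(avg = mean(df$x), avg.se = se(df$x)))

# Dodged bars of the cell means
avg.plot <- qplot(z, avg,
                  fill = factor(yy),
                  data = x.data.avg,
                  geom = "bar",
                  position = "dodge")

# Add +/- 1 SE line ranges, dodged to line up with the bars
dodge <- position_dodge(width = 0.9)
pg <- avg.plot +
  geom_linerange(aes(ymax = avg + avg.se, ymin = avg - avg.se),
                 position = dodge) +
  scale_fill_grey(start = 0, end = 0.8, "Section") +
  theme_bw()
pg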

##
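
As a point of comparison, the per-cell mean and standard error that ddply() computes above can also be sketched in base R. This is only a minimal sketch and not part of the original post: the names `dat` and `cell_stats` are illustrative, the factor recycling is written out explicitly with rep(), and set.seed() is added purely so the random data are reproducible.

```r
# Standard error of the mean, same formula as in the post above
se <- function(x) sqrt(var(x) / length(x))

set.seed(42)  # assumption: fixed seed, only for reproducibility
dat <- data.frame(
  x  = runif(27, 0, 125),
  z  = factor(rep(c("A", "B", "C"), each = 3, times = 3)),
  yy = factor(rep(c("a", "b", "c"), times = 9))
)

# One row per yy-by-z cell; FUN returns two values, so aggregate()
# stores them in a matrix column that do.call() then unpacks
cell_stats <- aggregate(x ~ yy + z, data = dat,
                        FUN = function(v) c(avg = mean(v), avg.se = se(v)))
cell_stats <- do.call(data.frame, cell_stats)
names(cell_stats) <- c("yy", "z", "avg", "avg.se")
cell_stats
```

The result has one row for each of the nine yy-by-z cells, which is the shape the plotting code expects from x.data.avg.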

My question is, can someone either a) identify how I might make the
plot found in the poster or b) suggest another type of plot to present
similar data (all in ggplot2 of course). The desire to avoid hiding
any data is fairly strong so I would love to develop a new way of
presenting this type of data. I am sure someone here on this list
might have some useful ideas.

Thanks in advance!

Sam

Zack Weinberg

Dec 13, 2010, 5:42:36 PM
to Sam, ggplot2
Well, you could start with simply

ggplot(data, aes(x=z, y=x, colour=yy)) + geom_boxplot(position="dodge")

Unfortunately, position="dodge" doesn't do anything useful in
geom_point. I'd love to know if there's a way to apply the effect that
it has on geom_boxplot to geom_point.

zw

> --
> You received this message because you are subscribed to the ggplot2 mailing list.
> Please provide a reproducible example: http://gist.github.com/270442
>
> To post: email ggp...@googlegroups.com
> To unsubscribe: email ggplot2+u...@googlegroups.com
> More options: http://groups.google.com/group/ggplot2
>

Jonathan Christensen

Dec 13, 2010, 5:55:35 PM
to Sam, ggplot2
Sam,

I would probably use faceting:

##
library(ggplot2)
se <-function(x) sqrt(var(x)/length(x))
x <-runif(270, 0, 125)
dat <- as.data.frame(x)
dat$z <- factor(c("A","A","A","B","B","B","C","C","C"))
dat$yy <- factor(c("a","b","c"))


ggplot(data=dat, aes(x=yy, y=x)) + geom_boxplot() +
  geom_jitter(position=position_jitter(width=0.1)) +
  facet_wrap(~z,nrow=1)+theme_bw()
##

This gets around the lack of dodging that Zack noted.

Jonathan

Hadley Wickham

Dec 13, 2010, 6:21:17 PM
to Zack Weinberg, Sam, ggplot2
On Mon, Dec 13, 2010 at 4:42 PM, Zack Weinberg <za...@panix.com> wrote:
> Well, you could start with simply
>
> ggplot(data, aes(x=z, y=x, colour=yy)) + geom_boxplot(position="dodge")
>
> Unfortunately, position="dodge" doesn't do anything useful in
> geom_point, I'd love to know if there's a way to apply the effect that
> it has on geom_boxplot to geom_point.

If anyone is interested, I could dig up the references that discuss
how to create the dodged points in that plot. I haven't had time to
implement the algorithm, but it isn't a huge job.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

Joshua Wiley

Dec 13, 2010, 6:23:22 PM
to Sam, ggplot2
Hi Sam,

I agree with Jonathan that faceting is probably the easiest way around
the dodge issue, and I think it also makes the plot very easy to read.
Here is another option with jittered points, a bigger point for the
mean, and a line for +/- se. Rather than type that all out, I just
edited your se function and then passed it to stat_summary().

Cheers,

Josh


library(ggplot2)

dat <- data.frame(x = runif(270, 0, 125), z = rep(LETTERS[1:3], each = 3),
                  yy = letters[1:3], stringsAsFactors = TRUE)

## define summary function for mean +/- SE
smry <- function(x) {
  mu <- mean(x)
  se <- sqrt(var(x)/length(x))
  return(data.frame(y = mu, ymin = mu - se, ymax = mu + se))
}

## small data points, big mean point + line
ggplot(data = dat, aes(x = yy, y = x)) +
  geom_jitter(position = position_jitter(width = 0.1), size = 1) +
  stat_summary(fun.data = smry, size = .9) +
  facet_wrap(~ z, nrow = 1) +
  theme_bw()

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

Brian Diggs

Dec 13, 2010, 6:34:33 PM
to ggplot2
Several people have already given you ideas; I'll throw in a few more. I
opted not to use faceting (though I think it would probably be best).
Also, I thought I'd show you more compact ways to create the datasets
(though they are very similar to what Joshua did):

data <- data.frame(x=runif(270, 0, 125),
                   z = factor(rep(c("A","B","C"),each=3,times=30)),
                   yy = factor(rep(c("a","b","c"),times=90)))

x.data.avg <- ddply(data, .(yy, z), summarize, avg=mean(x),
                    avg.se=sqrt(var(x)/length(x)))

ggplot(data) +
  geom_boxplot(aes(x=z, y=x, group=interaction(z,yy)), fill=NA,
               outlier.size=0, outlier.colour="white") +
  geom_point(aes(x=z, y=x, group=yy),
             position=position_dodge(width=0.9)) +
  theme_bw()

ggplot(data) +
  geom_boxplot(aes(x=interaction(z,yy), y=x), fill=NA, outlier.size=0,
               outlier.colour="white") +
  geom_point(aes(x=interaction(z,yy), y=x),
             position=position_dodge(width=0.4)) +
  theme_bw()

ggplot(data) +
  geom_point(aes(x=z, y=x, group=yy),
             position=position_dodge(width=0.2)) +
  geom_pointrange(data=x.data.avg,
                  mapping=aes(x=z, y=avg, ymin=avg-avg.se, ymax=avg+avg.se,
                              group=yy),
                  position=position_dodge(width=0.2), shape=3) +
  theme_bw()

ggplot(data) +
  geom_point(aes(x=interaction(z,yy), y=x),
             position=position_dodge(width=0.2)) +
  geom_pointrange(data=x.data.avg,
                  mapping=aes(x=interaction(z,yy), y=avg,
                              ymin=avg-avg.se, ymax=avg+avg.se),
                  shape=3) +
  theme_bw()

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University


On Dec 13, 1:36 pm, Sam <tonightstheni...@gmail.com> wrote:
> Hello all,
>
> I stumbled on to this thread:http://groups.google.com/group/ggplot2/browse_thread/thread/e21c20dc1...

Sam Albers

Dec 13, 2010, 7:21:27 PM
to Joshua Wiley, ggplot2
Thanks all! A much better way of presenting data!

Sam
--
*****************************************************
Sam Albers
Geography Program
University of Northern British Columbia
3333 University Way
Prince George, British Columbia
Canada, V2N 4Z9
phone: 250 960-6777
*****************************************************

Ben Bolker

Dec 13, 2010, 9:36:44 PM
to ggp...@googlegroups.com

For the record, here (below) are the ggplot versions of the (base
plotting) examples I have up at
<http://emdbolker.wikidot.com/blog:dynamite>.

On my wish list for new geoms etc.:

* the 'point dodging' stuff from
<http://biostat.mc.vanderbilt.edu/wiki/Main/TatsukiRcode>
* a dedicated 'geom_pie' (as opposed to the coord_polar() transform),
for superimposing pie charts on maps [I do believe this is actually
sometimes a good idea -- little mini barplots might be marginally
better, but population geneticists are used to the pies]
* a dedicated geom_violin rather than the geom_ribbon hack below
* a geom_beanplot?

Is there thought to be any point to boxplots-with-notches any more, or
is that too 1980s?

I guess I should start studying the ggExtra package to figure out how
to write these myself ...

===========
library(ggplot2)
library(MASS)

theme_set(theme_bw())
g0 <- ggplot(OrchardSprays,aes(x=treatment,y=decrease))+
  scale_y_log10()
g_dyn <- g0 +
  stat_summary(fun.data=mean_cl_normal,geom="bar",colour="gray")+
  stat_summary(fun.data=mean_cl_normal,geom="errorbar",width=0.5)
g_errbar <- g0 + stat_summary(fun.data=mean_cl_normal,geom="pointrange")
mm1 <- function(...) {
  mean_cl_normal(...,mult=1)
}
mm2 <- function(...) {
  mean_cl_normal(...,mult=2)
}
g_errbar2 <- g0 +
  stat_summary(fun.data=mm1,geom="linerange",lwd=1)+
  stat_summary(fun.data=mm2,geom="linerange")+
  stat_summary(fun.data=mm1,geom="point")
g_point <- g0 +geom_point()
g_boxplot <- g0 + geom_boxplot()
g_violin <- ggplot(OrchardSprays,
                   aes(x=log10(decrease)))+
  geom_ribbon(aes(ymax = ..density.., ymin = -..density..),
              stat = "density")+
  facet_grid(. ~ treatment, as.table = FALSE,
             scales = "free_y")+
  opts(panel.margin=unit(0 , "lines"))+
  coord_flip()+opts(axis.text.x=theme_blank())


library(gridExtra)
grid.arrange(g_dyn,g_errbar,g_errbar2,g_point,g_boxplot,g_violin,
nrow=2)
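
For readers on a later ggplot2 release: a dedicated geom_violin() was eventually added, and opts() and theme_blank() were superseded by theme() and element_blank(). The sketch below is an assumption about how the violin panel might be written with that later API; it is not code from this thread, and the object name `g_violin2` is made up for illustration. It also overlays the raw points so that no data are hidden.

```r
library(ggplot2)

# OrchardSprays ships with R's built-in datasets package
g_violin2 <- ggplot(OrchardSprays, aes(x = treatment, y = decrease)) +
  geom_violin(fill = "grey80") +                    # density outline per treatment
  geom_jitter(width = 0.1, height = 0, size = 1) +  # raw data, jittered sideways only
  scale_y_log10() +
  theme_bw()
g_violin2
```

Setting height = 0 keeps the jitter horizontal, so the plotted y values are still the observed ones.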

Joshua Wiley

Dec 13, 2010, 9:42:29 PM
to Hadley Wickham, ggplot2
On Mon, Dec 13, 2010 at 3:21 PM, Hadley Wickham <had...@rice.edu> wrote:
> On Mon, Dec 13, 2010 at 4:42 PM, Zack Weinberg <za...@panix.com> wrote:
>> Well, you could start with simply
>>
>> ggplot(data, aes(x=z, y=x, colour=yy)) + geom_boxplot(position="dodge")
>>
>> Unfortunately, position="dodge" doesn't do anything useful in
>> geom_point, I'd love to know if there's a way to apply the effect that
>> it has on geom_boxplot to geom_point.
>
> If any one was interested, I could dig up the references that discuss
> how to create the dodged points in that plot.  I haven't had time to
> implement the algorithm, but it isn't a huge job.

I've been looking for a way to contribute, and it looks like several
people are interested, so I'll bite. Any other references/tips you
think would help in general with understanding how to work with ggplot
would also be appreciated.

Thanks,

Josh

>
> Hadley

Hadley Wickham

Dec 14, 2010, 9:11:04 AM
to Ben Bolker, ggp...@googlegroups.com
>  For the record, here (below) are the ggplot versions of the (base
> plotting) examples I have up at
> <http://emdbolker.wikidot.com/blog:dynamite>.
>
>  On my wish list for new geoms etc.:
>
> * the 'point dodging' stuff from
> <http://biostat.mc.vanderbilt.edu/wiki/Main/TatsukiRcode>
> * a dedicated 'geom_pie' (as opposed to the coord_polar() transform),
> for superimposing pie charts on maps [I do believe this is actually
> sometimes a good idea -- little mini barplots might be marginally
> better, but population geneticists are used to the pies]
> * a dedicated geom_violin rather than the geom_ribbon hack below
> * a geom_beanplot?
>
>  is there thought to be any point to boxplots-with-notches any more, or
> is that too 1980s?
>
>  I guess I should start studying the ggExtra package to figure out how
> to write these myself ...

If you wanted to write your own, the best place to start would be the
new layers package - https://github.com/hadley/layers. This will
provide the implementation of geoms, stats and position adjustments
from ggplot2 0.9, and has been heavily rewritten to be much simpler
and to use much better development practices (e.g. S3 instead of proto,
roxygen, namespaces etc). Start with
https://github.com/hadley/layers/blob/master/R/geom.r for
documentation about the methods.

Instead of geom_pie, I'd suggest geom_wedge, and look at the
parameters protovis uses
(http://vis.stanford.edu/protovis/docs/wedge.html). You could write a
new position adjustment for the special case of pies that rotates each
wedge so they don't overlap.

For density estimation, I'd check out https://github.com/hadley/bin
(probably soon to be renamed), where I've been trying to provide a common
interface to density estimation across all R packages that do it.
This is even more under development, but hopefully you can see the
general idea.

Hadley Wickham

Dec 14, 2010, 9:15:26 AM
to Joshua Wiley, ggplot2
>> If any one was interested, I could dig up the references that discuss
>> how to create the dodged points in that plot.  I haven't had time to
>> implement the algorithm, but it isn't a huge job.
>
> I've been looking for a way to contribute, and it looks like several
> people are interested, so I'll bite.  Any other references/tips you
> think would help in general with understanding how to work with ggplot
> would also be appreciated.

Awesome! I'd start with these two papers from Lee Wilkinson:


@article{wilkinson:1999a,
Author = {Wilkinson, Leland},
Journal = {The American Statistician},
Title = {Dot plots},
Year = {1999}}


@article{dang:2010,
Author = {Dang, Tuan Nhon and Wilkinson, L and Anand, A.},
Journal = {IEEE Transactions on Visualization and Computer Graphics},
Number = {6},
Pages = {1044-1052},
Title = {Stacking Graphic Elements to Avoid Over-plotting},
Volume = {16},
Year = {2010}}


I would start by getting the basic algorithm working and testing it
with base graphics. Integration with ggplot2 should happen once
everything else is working - but if you are interested, I'd again
recommend looking at https://github.com/hadley/layers - position is
the least developed of the three, but again it should give you some
idea where I'm going.
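
To give a feel for the kind of stacking those papers deal with, here is a deliberately simplified base-R sketch that can be tested with base graphics. It is an illustration under stated assumptions, not Wilkinson's dot-density algorithm: values are rounded into fixed-width bins, and the points sharing a bin are spread symmetrically about the group's x position. The function name `stack_offsets` and its `binwidth` and `dotsize` parameters are hypothetical.

```r
# Simplified stacking: fixed-width bins, symmetric horizontal offsets.
# (Assumption: this stands in for the published algorithms, which choose
# bin positions from the data rather than from a fixed grid.)
stack_offsets <- function(y, binwidth, dotsize = 0.05) {
  bin <- round(y / binwidth)
  offset <- numeric(length(y))
  for (b in unique(bin)) {
    idx <- which(bin == b)
    k <- length(idx)
    # centre the k points of this bin on zero, one dotsize apart
    offset[idx] <- (seq_len(k) - (k + 1) / 2) * dotsize
  }
  data.frame(y_binned = bin * binwidth, offset = offset)
}

set.seed(1)
y <- rnorm(50)
pos <- stack_offsets(y, binwidth = 0.25)

# Check the layout with base graphics for a single group centred at x = 1
plot(1 + pos$offset, pos$y_binned, xlim = c(0, 2), pch = 16,
     xlab = "group", ylab = "y (binned)")
```

Because the function only returns positions, the data transformation stays separate from the drawing, which should make a later move to a ggplot2 position adjustment easier.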

Dennis Murphy

Dec 14, 2010, 1:50:56 PM
to Sam, ggplot2
Hi:

Since I've ranted on this particular subject several times in this forum, I feel I must weigh in on this thread :)

I've never found a dynamite plot that couldn't be easily replaced by an error bar plot without losing an iota of information. The bars themselves are distracting enough and dominate the graphical landscape; moreover, the presence of the bars forces the origin of the y-axis to zero, which is not always wise or necessary. Center points in an error bar plot render the bar chart aspect moot.

As for alternatives, there have already been several nice suggestions, but I have a couple more :) I went back into my stash of ggplot2 code to resurrect an example that I posted some time last summer. The first plot is one with dodged boxplots and superimposed points, similar in spirit to some of the suggestions made earlier in this thread. I think I got lucky with this particular choice of settings because I haven't always been able to replicate the results reliably when I've tried this code in other examples.

d2 <- data.frame(junk = rnorm(3000, mean = rep(1:3, each = 1000)),
                  x = factor(rep(1:3, each = 1000)),
                  z = factor(rep(rep(c('A', 'B', 'C', 'D'), each = 250), 3)))

# Dodged boxplots with jittered points

g <- ggplot(d2, aes(x = x, y = junk))
g + geom_boxplot(aes(fill = z), position = position_dodge(width = 0.8)) +
     geom_point(aes(fill = z), size = 1, color = 'blue',
                position = position_dodge(width = 0.8))

To me, this plot is OK, but I would have preferred to have more control over the jitter of points while being located in the positions specified by geom_boxplot(). AFAIK there is no way at present to condition the center line of the point jitter relative to the dodged positions of the boxplots. If someone is going to undertake the task of writing a new geom that combines boxplots and points, this is something to keep in mind. For me to get this code to 'work', the width has to be the same in both position_dodge() calls - change the latter in the above code to see what I mean.
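
For what it is worth, ggplot2 releases published after this thread include position_jitterdodge(), which jitters points around the dodged group positions. The sketch below assumes such a release is installed; it uses a smaller version of d2 with a fixed seed, and the jitter width of 0.15 is an arbitrary choice for illustration.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed, only for reproducibility
d2 <- data.frame(junk = rnorm(300, mean = rep(1:3, each = 100)),
                 x = factor(rep(1:3, each = 100)),
                 z = factor(rep(rep(c("A", "B", "C", "D"), each = 25), 3)))

p <- ggplot(d2, aes(x = x, y = junk, colour = z)) +
  # hide the boxplot outliers, since every point is drawn anyway
  geom_boxplot(position = position_dodge(width = 0.8), outlier.shape = NA) +
  # jitter each z group around its own dodged centre
  geom_point(position = position_jitterdodge(jitter.width = 0.15,
                                             dodge.width = 0.8),
             size = 1, alpha = 0.5)
p
```

As in the code above, the dodge width is kept the same in both layers so the point clouds sit on top of their boxes.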

Another possibility is to include jittered points with a set of dodged error bars. To keep this simple, I created a summary data frame and call it to generate the error bars. The original points are in the background, jittered and made more transparent so that they don't dominate the graphic. Again, the widths specified in position_dodge() have to be consistent for the plot to make sense. In this case, the width of the bars on the error bar plot provides a reference for which set of jittered points belongs to each subgroup.

# se() as defined in Sam's original post
se <- function(x) sqrt(var(x)/length(x))
summ <- ddply(d2, .(x, z), summarise, m = mean(junk), s = se(junk))
summ

# Error bar plots
h <- ggplot(summ, aes(x = x, y = m))
h + geom_point(data = d2, aes(x = x, y = junk), colour = 'gold', alpha = 0.4,
                 position = position_dodge(width = 0.8)) +
    geom_errorbar(aes(ymin = m - 1.96 * s, ymax = m + 1.96 * s, colour = z),
            width = 0.8, size = 1, position = position_dodge(width = 0.8)) +
    geom_point(aes(colour = z), size = 2.5,
               position = position_dodge(width = 0.8))

Either of these is, at least IMO, superior to dynamite plots because they both show the original data and the summary in the same graphic.

More generally, an automated way to combine geom_point() with several of the summary geoms such as boxplot, (h)errorbar and crossbar would be a useful contribution. (I've excluded pointrange and linerange because in the presence of jittered points, they provide no reference element to distinguish jittered data in one group from another without an external feature to separate the point clouds.)

An aside to Hadley and other developers: a recent package called beeswarm contains a very nice approach to point jittering that would be useful to include in ggplot2. It's much more compact (not to mention versatile) than the current implementation in base graphics but still faithfully shows the point density in a localized region.
http://www.cbs.dtu.dk/~eklund/beeswarm/

Hope this is of some help - I'm not a great function writer, but I'd be willing to contribute to this effort in some capacity.

Dennis


On Mon, Dec 13, 2010 at 1:36 PM, Sam <tonights...@gmail.com> wrote:

Hadley Wickham

Dec 14, 2010, 9:21:54 PM
to Dennis Murphy, Sam, ggplot2
> An aside to Hadley and other developers: a recent package called beeswarm
> contains a very nice approach to point jittering that would be useful to
> include in ggplot2. It's much more compact (not to mention versatile) than
> the current implementation in base graphics but still faithfully shows the
> point density in a localized region.
> http://www.cbs.dtu.dk/~eklund/beeswarm/

It looks like an interesting approach, but it's not usable with
ggplot2 because it uses base graphics. It's also frustratingly
written - the data transformation is all tangled up with the plotting
which makes it very difficult to see what's going on. Do you know if
this method has been published anywhere?

LeeBranum-Martin

Dec 15, 2010, 11:42:30 AM
to ggplot2
I was trying to reproduce this and came across a problem using
facet_grid.
My thought was that if gaps, outliers, and distribution are not
serious problems, then density plots would work as nice summaries. My
first attempt was:

# density of x within panels of z
dp <- ggplot(d2, aes(x = junk, colour = x))
dp + geom_density(fill = NA) + facet_grid(z ~ .)

But that did not replicate the boxplots with jittered points approach,
so I tried to swap z for x:

dz <- ggplot(d2, aes(x = junk, colour = z))
dz + geom_density(fill = NA) + facet_grid(x ~ .)

However, this generates an error which I do not understand:
Error in names(df) <- output :
'names' attribute [2] must be the same length as the vector [1]

Any ideas on how to achieve the swap of z for x?
I think one strength of this approach is that it shows differences in
distributions which are missed by boxes and difficult to discern in
jittered points.

Kohske Takahashi

Dec 16, 2010, 2:11:13 AM
to LeeBranum-Martin, ggplot2
Hi, the name of the variable is probably what induces the error,
although I have not yet found where the error occurs.
Try:

d2$x2 <- d2$x


dz <- ggplot(d2, aes(x = junk, colour = z))

dz + geom_density(fill = NA) + facet_grid(x2 ~ .)

--
Kohske Takahashi <takahash...@gmail.com>

Research Center for Advanced Science and Technology,
The University of  Tokyo, Japan.
http://www.fennel.rcast.u-tokyo.ac.jp/profilee_ktakahashi.html

Hadley Wickham

Dec 16, 2010, 10:14:42 AM
to Kohske Takahashi, ggplot2
On Thu, Dec 16, 2010 at 1:11 AM, Kohske Takahashi
<takahash...@gmail.com> wrote:
> Hi, probably the name of variable induce the error, although I have
> not yet found the place inducing the error.
> try:

The problem is that facetting columns are added on to the data frame
containing the aesthetics - this needs a complete redesign to fix (in
progress in the main branch), where instead of adding on all the
facetting variables, we just add on a column that gives the panel that
the data belongs to.
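
The name clash can be illustrated in base R. This is only a sketch of the idea, not ggplot2's actual internals: the data frames `aes_df`, `old_style` and `new_style` and the column name `PANEL` are assumptions made for the example.

```r
d2 <- data.frame(junk = rnorm(6),
                 x = factor(rep(1:3, each = 2)),
                 z = factor(rep(c("A", "B"), 3)))

# aes(x = junk, colour = z) produces aesthetic columns named x and colour
aes_df <- data.frame(x = d2$junk, colour = d2$z)

# Adding the facetting variable, which is also called x, duplicates a name
old_style <- cbind(aes_df, d2["x"])
names(old_style)

# Adding a single column that only records the panel avoids the clash
new_style <- cbind(aes_df, PANEL = as.integer(d2$x))
names(new_style)
```

With the facetting variable renamed to x2, as Kohske suggests, the first approach no longer collides either.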

Aaron Mackey

May 10, 2011, 12:14:11 PM
to Hadley Wickham, ggplot2
FYI, looks like you can do this to get transformed data (for use with
ggplot2), without plotting: df <- beeswarm(..., do.plot=F)

-Aaron
