Dotplot, position="stack"


Matthew Shun-Shin

Feb 27, 2012, 10:26:35 AM
to ggplo...@googlegroups.com
Hi,

The new geom_dotplot() is great and makes some wonderful graphs.

However, shouldn't the position argument behave as it does in geom_histogram()?

library(ggplot2)

x <- rnorm(100)
group <- c(rep("A", 60), rep("B", 40))
data <- data.frame(x=x, group=group)

plot.hist_identity <- ggplot(aes(x=x, fill=group), data=data) + geom_histogram(position="identity", alpha=0.5)
plot.hist_stack <- ggplot(aes(x=x, fill=group), data=data) + geom_histogram(position="stack", alpha=0.5)
plot.dotplot_identity <- ggplot(aes(x=x, fill=group), data=data) + geom_dotplot(method="histodot", position="identity", alpha=0.5)
plot.dotplot_stack <- ggplot(aes(x=x, fill=group), data=data) + geom_dotplot(method="histodot", position="stack", alpha=0.5)

Currently, position="stack" behaves like position="identity" (alpha is set to 0.5 so that you can see the overplotting).

Best wishes and many thanks for such a great library,

Matthew



Winston Chang

Feb 27, 2012, 10:52:41 AM
to Matthew Shun-Shin, ggplo...@googlegroups.com
Hi Matthew -

geom_dotplot is implemented differently from geom_histogram, and position="stack" won't work with it. Here's an attempt at explaining the root of the problem:

With geom_histogram, the bars are stacked by moving their y position so that the starting y position of one bar is the ending y position of the one below it. When you resize a graph to be taller, the bars will scale accordingly.

With geom_dotplot, the dots are always circular, so when you resize the graph to be taller, they can't be stretched vertically the way that histogram bars are.

For reasons related to this, each stack of dots is drawn as a single grob, which has a y position for its baseline. Unlike the bars in geom_histogram, the dot stacks don't have a real y height, since the y height of each stack can't be determined (at least, not with the way that ggplot2 currently works). And because a stack has no y height, position="stack" won't work.
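To make the contrast concrete: for bars, stacking is plain arithmetic in data units. The sketch below is a minimal base-R illustration under stated assumptions, not ggplot2's actual position_stack() code. It uses made-up per-bin counts, a hypothetical helper stack_bars(), and stacks groups in alphabetical order. Within each bin, a group's bar starts (ymin) where the previous group's bar ends (ymax). Every bar has a known height in y units, which is exactly what a stack of circular dots lacks.

```r
# Made-up per-bin counts for two groups, for illustration only
counts <- data.frame(
  bin   = c(1, 1, 2, 2, 3, 3),
  group = c("A", "B", "A", "B", "A", "B"),
  count = c(2, 1, 4, 3, 1, 2)
)

# Hypothetical helper: histogram-style stacking in data units.
# Within each bin, a group's bar starts (ymin) where the previous
# group's bar ends (ymax). Alphabetical group order is an assumption.
stack_bars <- function(df) {
  df <- df[order(df$bin, df$group), ]
  df$ymax <- ave(df$count, df$bin, FUN = cumsum)  # running total per bin
  df$ymin <- df$ymax - df$count                   # bottom edge of each bar
  df
}

stacked <- stack_bars(counts)
print(stacked)
```

In bin 2, for example, group A occupies 0 to 4 and group B sits on top of it from 4 to 7.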


All that said, it may be possible to add a hack to support position="stack". You're not the only person who has asked about it... if it's something that a lot of people need, I can take a look into it.

-Winston

Matthew Shun-Shin

Feb 27, 2012, 11:03:05 AM
to ggplo...@googlegroups.com, Matthew Shun-Shin
Thanks for the quick reply.

If there were a vote for a hack to make it work, I'd vote for it...

Currently I am working around it using either overplotting or the beeswarm package...

Best wishes,

Matt

Winston Chang

Feb 29, 2012, 3:48:33 AM
to Matthew Shun-Shin, ggplo...@googlegroups.com
Well, it turns out it wasn't as ugly as I thought it would be. I haven't really tested it thoroughly, but it works with these examples:

library(ggplot2)

set.seed(113)
x <- rnorm(100)
group <- c(rep("A", 40), rep("B", 60))
data <- data.frame(x=x, group=group)

# histodot, no stack
ggplot(aes(x=x, fill=group), data=data) + 
  geom_dotplot(binwidth=.2, method="histodot", alpha=0.5)

# histodot with stack
ggplot(aes(x=x, fill=group), data=data) + 
  geom_dotplot(binwidth=.2, method="histodot", position="stack", alpha=0.5)

# dotdensity with bins aligned across groups (binpositions="all"), no stack
ggplot(aes(x=x, fill=group), data=data) +
  geom_dotplot(binwidth=.2, binpositions="all", alpha=0.5)

# dotdensity with bins aligned across groups (binpositions="all"), and stack
ggplot(aes(x=x, fill=group), data=data) +
  geom_dotplot(binwidth=.2, binpositions="all", position="stack", alpha=0.5)


Also, it doesn't work for binaxis="y". This is because stacking normally goes up, but when binaxis="y", it should go right. This could be remedied by using a new argument instead of position="stack".

Making the geom capture the position="stack" argument was probably the least elegant part of the implementation, so I'd be happy to get rid of that code. The new argument could be "stackgroups". The drawback to adding a new parameter is that it adds nonstandard user-facing code for an operation that already exists (at least when binaxis="x").


Installing via install_github may not work for you because of a bug in some versions of devtools. Normally you can do this:
  library(devtools)
  dev_mode()
  install_github('ggplot2', 'wch', 'experimental')
But there is a problem with some versions of devtools where it doesn't actually use the branch.


-Winston
Attachments: dotplot_stack-1.png, dotplot_stack-2.png, dotplot_stack-3.png, dotplot_stack-4.png

Winston Chang

Mar 1, 2012, 12:43:20 AM
to Matthew Shun-Shin, ggplot2-dev
I decided to change it from using position="stack" to stackgroups=TRUE. 
- This makes it clear that it's not using the normal position adjustment methods.
- Getting it to work on the y-axis is simple.
- It prints a message when you try to use position="stack", saying that you probably want to use stackgroups=TRUE instead.
- The code is cleaner. Using position="stack" was messy and circumvented some normal behavior.
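The idea behind stackgroups=TRUE can be illustrated with a small base-R sketch. This is an assumption about the general approach, not the code on the experimental branch, and the data frame and column names (dots, idx_identity, idx_stacked) are hypothetical. Once bins are shared across groups (method="histodot", or binpositions="all"), each dot gets a stacking index counted across all groups in its bin instead of restarting at 1 for every group. Because that index is measured in dots rather than y data units, the same number can become a vertical offset (binaxis="x") or a horizontal one (binaxis="y"), which fits the point above that the y-axis case becomes simple.

```r
# Hypothetical observations, each already assigned to a bin that is shared
# across groups (as with method="histodot" or binpositions="all")
dots <- data.frame(
  bin   = c(1, 1, 1, 2, 2, 3),
  group = c("A", "A", "B", "A", "B", "B")
)
dots <- dots[order(dots$bin, dots$group), ]

# Per-group index: restarts at 1 for every group within a bin, so dots
# from different groups land on top of each other (the overplotting case).
dots$idx_identity <- ave(dots$bin, dots$bin, dots$group, FUN = seq_along)

# Cross-group index: counts every dot in the bin, so group B's dots pick up
# where group A's stop. The index is in dot units, not y data units, so it
# can be turned into a vertical or a horizontal offset at draw time.
dots$idx_stacked <- ave(dots$bin, dots$bin, FUN = seq_along)

print(dots)
```

In bin 1, the single B dot gets index 1 without group stacking (overlapping the first A dot) and index 3 with it (sitting above both A dots).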

I've attached a bunch of examples. The last two show how to plot two groups along a center line, and they don't require the new stackgroups parameter -- they can be done without the new modifications.


library(ggplot2)

set.seed(113)
data <- data.frame(x = rnorm(100),
                   y = rnorm(100),
                   group = c(rep("A", 40), rep("B", 60)))


# histodot, no stackgroups
ggplot(data, aes(x=x, fill=group)) + 
  geom_dotplot(binwidth=.2, method="histodot", alpha=0.5)

# histodot with stackgroups
ggplot(data, aes(x=x, fill=group)) + 
  geom_dotplot(binwidth=.2, method="histodot", stackgroups=TRUE, alpha=0.5)

# dotdensity with bins aligned across groups (binpositions="all"), no stackgroups
ggplot(data, aes(x=x, fill=group)) + 
  geom_dotplot(binwidth=.2, binpositions="all", alpha=0.5)

# dotdensity with bins aligned across groups (binpositions="all"), and stackgroups
ggplot(data, aes(x=x, fill=group)) + 
  geom_dotplot(binwidth=.2, binpositions="all", stackgroups=TRUE, alpha=0.5)

# Vertical, no stackgroups
ggplot(aes(x=1, y=x, fill=group), data=data) + 
  geom_dotplot(binaxis="y", binwidth=.2, method="histodot", alpha=0.5)

# Vertical, with stackgroups
ggplot(aes(x=1, y=x, fill=group), data=data) + 
  geom_dotplot(binaxis="y", binwidth=.2, method="histodot", stackgroups=TRUE, 
               alpha=0.5)

# stackgroups with stackdir="center"
ggplot(aes(x=x, fill=group), data=data) + 
  geom_dotplot(binwidth=.2, method="histodot", stackgroups=TRUE, stackdir="center")

# stackgroups with stackdir="center", vertical
ggplot(aes(x=1, y=x, fill=group), data=data) + 
  geom_dotplot(binaxis="y", binwidth=.2, method="histodot",
               stackgroups=TRUE, stackdir="center")

# Stacking along centerline (doesn't require stackgroups!)
ggplot(aes(x=x, fill=group), data=data) + 
  geom_dotplot(data = subset(data, group=="A"), binwidth=.2,
               method="histodot", stackdir="up") +
  geom_dotplot(data = subset(data, group=="B"), binwidth=.2,
               method="histodot", stackdir="down")

# Stacking along centerline, vertical  (doesn't require stackgroups!)
ggplot(aes(x=1, y=x, fill=group), data=data) + 
  geom_dotplot(data = subset(data, group=="A"), binaxis="y", binwidth=.2,
               method="histodot", stackdir="up") +
  geom_dotplot(data = subset(data, group=="B"), binaxis="y", binwidth=.2,
               method="histodot", stackdir="down")

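For the stackdir="center" examples, the dots in a bin are laid out symmetrically around the bin's centre line. A minimal sketch of that arithmetic, assuming offsets are measured in dot diameters and using a hypothetical helper center_offsets() (not ggplot2's internal function): dot i of n is shifted by (i - 1) - (n - 1)/2, so the offsets always sum to zero. Combined with group stacking, n would be the total count in the bin across groups, with each dot keeping its cross-group index.

```r
# Hypothetical helper: offset (in dot diameters) of each dot in a stack
# of n dots centred on the bin line; offsets are symmetric around 0.
center_offsets <- function(n) {
  i <- seq_len(n)
  (i - 1) - (n - 1) / 2
}

center_offsets(3)  # -1 0 1
center_offsets(4)  # -1.5 -0.5 0.5 1.5
```

With an odd count one dot sits exactly on the centre line; with an even count the line falls between two dots.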

-Winston

Attachments: dotplot_stack-1.png, dotplot_stack-2.png, dotplot_stack-3.png, dotplot_stack-4.png, dotplot_stack-5.png, dotplot_stack-6.png, dotplot_stack-7.png, dotplot_stack-8.png, dotplot_stack-9.png, dotplot_stack-10.png

Jean-Olivier Irisson

Mar 1, 2012, 2:58:01 AM
to Winston Chang, Matthew Shun-Shin, ggplot2-dev
On 2012-Mar-01, at 06:43 , Winston Chang wrote:
>
> I decided to change it from using position="stack" to stackgroups=TRUE.
> - This makes it clear that it's not using the normal position adjustment methods.
> - Getting it to work on the y-axis is simple.
> - It prints a message when you try to use position="stack", saying that you probably want to use stackgroups=TRUE instead.
> - The code is cleaner. Using position="stack" was messy and circumvented some normal behavior.
>
> I've attached a bunch of examples. The last two show how to plot two groups along a center line, and they don't require the new stackgroups parameter -- they can be done without the new modifications.

Even if the actual code to get this done is different from the one usually used for position="stack", this is only an implementation detail. IMHO, the "intent" is the same for the user so the user interface should also stay the same. This means that:
- I would personally prefer position="stack" over stackgroups=TRUE
- position="stack" should be the default, and the current behaviour when aligning from the x or y axis should be what happens when position="identity"
- with position="identity", when aligning from the center, the dots should be laid out independently for each level of the factor (maybe they already are but there's no example here).

In addition, while I think that the behaviour of your geom regarding colour and fill is the correct one, it is different from that of geom_point, which is what people will associate it with the most. So maybe you could consider making an exception to map colour to the fill+stroke of your circles when the default shape is used. If, at some point, other shapes become available, then this might be reconsidered.

Thanks again for your hard work, I love dotplots ;)

Jean-Olivier Irisson
---
Observatoire Océanologique
Station Zoologique, B.P. 28, Chemin du Lazaret
06230 Villefranche-sur-Mer
Tel: +33 04 93 76 38 04
Mob: +33 06 21 05 19 90
http://jo.irisson.com/

Winston Chang

Mar 1, 2012, 11:21:24 AM
to Jean-Olivier Irisson, ggplot2-dev

> Even if the actual code to get this done is different from the one usually used for position="stack", this is only an implementation detail. IMHO, the "intent" is the same for the user so the user interface should also stay the same. This means that:
> - I would personally prefer position="stack" over stackgroups=TRUE

I agree that there is an advantage of consistency, but the biggest problem is that the normal use of position="stack" only stacks geoms vertically. So it doesn't really make sense to use position="stack" when binning along the y axis, where you would want to stack horizontally.

> - position="stack" should be the default, and the current behaviour when aligning from the x or y axis should be what happens when position="identity"
> - with position="identity", when aligning from the center, the dots should be laid out independently for each level of the factor (maybe they already are but there's no example here).

Setting aside the issue I mentioned above, I think that makes sense.

> In addition, while I think that the behaviour of your geom regarding colour and fill is the correct one, it is different from that of geom_point, which is what people will associate it with the most. So maybe you could consider making an exception to map colour to the fill+stroke of your circles when the default shape is used. If, at some point, other shapes become available, then this might be reconsidered.

Well, the geom_point behavior is already inconsistent, and adding a similar but slightly different special case on top of that ('colour' controls fill unless 'fill' is also specified) could make it even more confusing, so I think it should stay the way it is. At least with geom_point, the change in behavior of colour/fill happens only when you specifically ask for a different point shape. Maybe in the future geom_point will be fixed. :)

-Winston

Jean-Olivier Irisson

Mar 5, 2012, 2:12:40 PM
to Winston Chang, ggplot2-dev
On 2012-Mar-01, at 17:21 , Winston Chang wrote:
>
> Even if the actual code to get this done is different from the one usually used for position="stack", this is only an implementation detail. IMHO, the "intent" is the same for the user so the user interface should also stay the same. This means that:
> - I would personally prefer position="stack" over stackgroups=TRUE
>
> I agree that there is an advantage of consistency, but the biggest problem is that the normal use of position="stack" only stacks geoms vertically. So it doesn't really make sense to use position="stack" when binning along the y axis, where you would want to stack horizontally.

I'm not saying that you should reuse any of the code behind position_stack(), I'm just suggesting that the *interface* should stay the same, even if it calls completely custom code (if that's possible of course). From the point of view of a user, when one wants to stack stuff, one uses position="stack" and that's the only thing one should have to worry about IMHO.

> In addition, while I think that the behaviour of your geom regarding colour and fill is the correct one, it is different from that of geom_point, which is what people will associate it with the most. So maybe you could consider making an exception to map colour to the fill+stroke of your circles when the default shape is used. If, at some point, other shapes become available, then this might be reconsidered.
>
> Well, the geom_point behavior is already inconsistent, and adding a similar but slightly different special case on top of that ('colour' controls fill unless 'fill' is also specified) could make it even more confusing, so I think it should stay the way it is. At least with geom_point, the change in behavior of colour/fill happens only when you specifically ask for a different point shape. Maybe in the future geom_point will be fixed. :)

OK. Now we need to convince Hadley to change geom_point ;)
