Grouped boxplots with different width


thomas

Jun 17, 2011, 6:21:18 PM
to ggplot2
Hi,

I have a plot with 8 boxplots. I want the width of each boxplot to
correspond to the number of samples in its group. Normally all 8
boxplots have the same width.

If I do something simple like:
ggplot(data, aes(x = 1:64, y = y, group = Section)) + geom_boxplot()
the width of each boxplot is perfect, but the spacing between the
boxplots isn't.

figure: http://amstetten-falcons.at/charts/a.jpg

Does anybody know of a way to pull this off?

Thomas

thomas

Jun 18, 2011, 12:53:57 PM
to ggplot2
Hi,

here is an example.
code: https://gist.github.com/1033268
result: http://bit.ly/mxVkZQ

thanks,
Thomas

thomas

Jun 23, 2011, 10:21:10 AM
to ggplot2
*bump*

Nobody?

Dennis Murphy

Jun 23, 2011, 12:29:30 PM
to thomas, ggplot2
Hi:

See
http://had.co.nz/ggplot2/geom_boxplot.html

As there is no parameter for specifying box width, it is apparently
not implemented in ggplot2 at present. It can be done with the
boxplot() function in the base package, however. Apparently, it can
also be done using panel.bwplot() in the lattice package, from a brief
scan of the bwplot() help page.
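
A minimal sketch of the base-graphics route, assuming made-up data with four unequal groups (the data frame dat and its columns are hypothetical). boxplot() has a varwidth argument that draws each box with a width proportional to the square root of the group size; its width argument takes explicit relative widths instead. Later ggplot2 releases also accept varwidth = TRUE in geom_boxplot(), which was not available when this thread was written.

```r
# Hypothetical data: four groups with very different sample sizes.
set.seed(1)
dat <- data.frame(
  y = runif(600, 50, 200),
  Section = factor(rep(1:4, times = c(75, 25, 400, 100)))
)

# varwidth = TRUE makes each box width proportional to sqrt(n).
bp <- boxplot(y ~ Section, data = dat, varwidth = TRUE,
              xlab = "Section", ylab = "y")

# boxplot() returns its statistics invisibly; n holds the group sizes.
bp$n
```

Passing width = c(75, 25, 400, 100) instead of varwidth = TRUE would make the widths proportional to n itself rather than to its square root.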

Dennis

> --
> You received this message because you are subscribed to the ggplot2 mailing list.
> Please provide a reproducible example: http://gist.github.com/270442
>
> To post: email ggp...@googlegroups.com
> To unsubscribe: email ggplot2+u...@googlegroups.com
> More options: http://groups.google.com/group/ggplot2
>

Thomas Kern

Jun 23, 2011, 12:37:30 PM
to Dennis Murphy, ggplot2
Hi,

no, but I could still use a continuous variable or some other 'workaround'.

@James:
That is what I am doing now. And I think I am almost done too.

Thanks for the reply.

Best,
Thomas


James McCreight

Jun 23, 2011, 12:48:01 PM
to Thomas Kern, Dennis Murphy, ggplot2
My original thought was that I could call ggb <- ggplot_build(), edit the returned data that specifies the widths (etc.), and then replot that. But I don't see how to make it plot what I've edited. Maybe I missed something that would have made this trick work.
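
For what it's worth, here is a sketch of that trick as it can be done in recent ggplot2 releases, where ggplot_gtable() turns an edited build back into something drawable. It assumes a current ggplot2, made-up data, and a hypothetical rescaling rule (half-width proportional to sqrt(n)); it also relies on the built boxplot data carrying x, xmin and xmax columns, which may differ between versions.

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(
  y = runif(600, 50, 200),
  Section = factor(rep(1:4, times = c(75, 25, 400, 100)))
)

p <- ggplot(dat, aes(x = Section, y = y)) + geom_boxplot()

# Compute the layer data without drawing anything.
ggb <- ggplot_build(p)
box <- ggb$data[[1]]          # one row per box, ordered by group

# Hypothetical rule: half-width proportional to sqrt(n), at most 0.45.
n <- as.numeric(table(dat$Section))
half <- 0.45 * sqrt(n / max(n))
box$xmin <- box$x - half
box$xmax <- box$x + half
ggb$data[[1]] <- box

# Convert the edited build to a gtable and draw it.
gt <- ggplot_gtable(ggb)
grid::grid.newpage()
grid::grid.draw(gt)
```

Only the box and median line follow xmin and xmax here; the whiskers stay at x, so the boxes remain centred on their original positions.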

anyway, it could be useful to have a width aes for boxplot. praise the git branch.

J

--
James McCreight                               
cell: (831) 261-5149
VoIP (to cell): (720) 897-7546

Thomas Kern

Jun 23, 2011, 3:19:13 PM
to James McCreight, Dennis Murphy, ggplot2
I solved it by creating my own proto and overriding the draw function.
Unfortunately my R skills are almost nonexistent, so no 'patch' for the
git branch...

Actually, a width aes would be useful, but it is almost implemented
anyway. If you use a continuous variable for the x-axis, the
width works. It is just the spacing between the boxes that is calculated
incorrectly.
Either I am using it wrong or it is a bug, to be honest.

Thomas

Ista Zahn

Jun 23, 2011, 4:11:43 PM
to Thomas Kern, James McCreight, Dennis Murphy, ggplot2
Hi Thomas,

On Thu, Jun 23, 2011 at 3:19 PM, Thomas Kern <thk....@gmail.com> wrote:
> I solved it by creating my own proto and overriding the draw function.
> Unfortunately my R skills are almost nonexistent, so no 'patch' for the git
> branch...
>
> Actually, a width aes would be useful, but it is almost implemented anyway.
> If you use a continuous variable for the x-axis, the width works. It
> is just the spacing between the boxes that is calculated incorrectly.
> Either I am using it wrong or it is a bug, to be honest.

I think it is a bug. I played around with your problem this morning,
but didn't post anything because I never solved it. I think I
did manage to clarify the issue though.

Basically, when using a continuous variable on the x-axis with a
boxplot geom, the boxplot should span the width of the x-axis values
that go into it. So, using your example data (reproduced here for
convenience), let's take a look at what those spans should be:

library(plyr)  # provides ddply() and .()

dat <- data.frame(x = 1:600,
                  y = runif(600, 50, 200),
                  Section = factor(c(rep(1, 75),
                                     rep(2, 25),
                                     rep(3, 400),
                                     rep(4, 100))))

# x-range covered by each section
spans <- ddply(dat, .(Section), summarize,
               min = min(x),
               max = max(x),
               span = max(x) - min(x))

spans

  Section min max span
1       1   1  75   74
2       2  76 100   24
3       3 101 500  399
4       4 501 600   99

OK, so the first boxplot should span 1-75, the second 76-100, the
third 101-500, and the fourth 501-600.

Now take a look at

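library(ggplot2)

ggplot(dat, aes(x = x, y = y, fill = Section)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(120, 170)) +
  scale_x_continuous(breaks = seq(0, 600, 10)) +
  # opts()/theme_text() in the 2011 API; theme()/element_text() in current ggplot2
  theme(axis.text.x = element_text(size = 8, angle = 90))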

(You might want to maximize the graphics window so you can see the
labels without overlap.) To me it just looks wrong. Specifically, each
boxplot is too narrow. Boxplot 1 should span 1-75 (see above), but it
only spans about 4-71. Boxplot 2 should span 76-100, but it
only spans about 77-99. Boxplot 3 is the worst: it should span
101-500, but only spans about 121-480. Finally, boxplot 4 should
span 501-600, but only spans about 505-596. OK, so it looks like there is a
method to this madness:

with(spans, (span - actual.span)/span)
[1] 0.09459459 0.08333333 0.10025063 0.08080808

So each span is about 9% too narrow. I thought this might be a problem
in position_dodge (which geom_boxplot defaults to) but setting the
width there had no effect. So, this is all just a long-winded way of
agreeing that this is a bug :)

Best,
Ista

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

Joshua Wiley

Jun 23, 2011, 4:45:30 PM
to Ista Zahn, ggplot2, Thomas Kern

Hmm, I wonder if it has to do with the .9 multiplier of resolution()
(which does get called in your example; look at line 5 of
https://github.com/hadley/ggplot2/blob/master/R/geom-boxplot.r). I do
not have a good explanation for why the difference is inconsistent,
though.

>>> my original thought was that I could ggb <- ggplot_build(), and then edit
>>> the returned data which specifies the widths (etc) and then replot that. But
>>> I dont see that I can make it plot what i've edited. Maybe I missed
>>> something that would have let me make this trick work.
>>>
>>> anyway, it could be useful to have a width aes for boxplot. praise the git
>>> branch.

Agreed that this should not be that difficult. width is already
available as a parameter for stat_boxplot; it would just have to be
upgraded to an aesthetic. Random side note, for the actual plot, what
about adding geom_jitter()? It shows the actual data, which I always
like with boxplots :)
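
A small sketch of the jitter idea, assuming made-up data and a recent ggplot2. The varwidth = TRUE argument shown here comes from later ggplot2 releases and was not available at the time of this thread; the jitter width and alpha values are arbitrary choices for illustration.

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(
  y = runif(600, 50, 200),
  Section = factor(rep(1:4, times = c(75, 25, 400, 100)))
)

p <- ggplot(dat, aes(x = Section, y = y)) +
  # varwidth = TRUE scales box widths with sqrt(n); outlier points are
  # hidden because the jitter layer already draws every observation.
  geom_boxplot(varwidth = TRUE, outlier.shape = NA) +
  # height = 0 keeps the y values exact; only x is jittered.
  geom_jitter(width = 0.15, height = 0, alpha = 0.3)

print(p)
```

Hiding the boxplot outliers avoids drawing those points twice once the raw data are overlaid.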

Josh

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

Thomas Kern

Jun 23, 2011, 4:58:01 PM
to Joshua Wiley, Ista Zahn, ggplot2
> Random side note, for the actual plot, what about adding
> geom_jitter()? It shows the actual data which I always like with
> boxplots :)

Great idea, I added that to the code I came up with today too ;)

Thomas

Ista Zahn

Jun 23, 2011, 5:31:29 PM
to Joshua Wiley, ggplot2, Thomas Kern
Hi Josh,
See inline.

On Thu, Jun 23, 2011 at 4:45 PM, Joshua Wiley <jwiley...@gmail.com> wrote:
> On Thu, Jun 23, 2011 at 1:11 PM, Ista Zahn <iz...@psych.rochester.edu> wrote:

> Hmm, I wonder if it has to do with the .9 multiplier of resolution()
> (which does get called in your example; look at line 5 of
> https://github.com/hadley/ggplot2/blob/master/R/geom-boxplot.r). I do
> not have a good explanation for why the difference is inconsistent,
> though.

It's (probably) not inconsistent, the difference is calculated based
on eyeballing the actual x-range the boxplots span, from the plot
itself. I notice now I actually forgot to paste the code showing how I
calculated the differences. It was just

a <- c(4, 77, 121, 505)
b <- c(71, 99, 480, 596)
spans <- data.frame(spans, actual.span = b - a)

where a and b were reconstructed from eyeballing the plot :). So that
.9 multiplier could well be where the problem comes from.
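
A rough check of the .9 idea, as a sketch: assume each box is centred on the midpoint of its group's x-range and is 0.9 times that range wide (an assumption for illustration, not something read from the ggplot2 source), then compare the predicted extents with the eyeballed ones. The column names pred.min, pred.max, seen.min and seen.max are made up for this example.

```r
# Section boundaries from the example data.
spans <- data.frame(Section = 1:4,
                    min = c(1, 76, 101, 501),
                    max = c(75, 100, 500, 600))
spans$span <- spans$max - spans$min

# Assumption: box centred on the midpoint of the x-range,
# with a total width of 0.9 * span.
mid <- (spans$min + spans$max) / 2
spans$pred.min <- mid - 0.45 * spans$span
spans$pred.max <- mid + 0.45 * spans$span

# Extents read off the plot by eye.
spans$seen.min <- c(4, 77, 121, 505)
spans$seen.max <- c(71, 99, 480, 596)

spans[, c("Section", "pred.min", "pred.max", "seen.min", "seen.max")]
```

Under that assumption the predicted extents (4.7 to 71.3 for the first section, for example) all land within one x unit of the eyeballed values, so shortfalls of 8 to 10% are consistent with a constant 10% plus reading error.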

Best,
Ista



Hadley Wickham

Jun 26, 2011, 7:33:50 PM
to Joshua Wiley, Ista Zahn, ggplot2, Thomas Kern
> Agreed that this should not be that difficult. width is already
> available as a parameter for stat_boxplot; it would just have to be
> upgraded to an aesthetic.

Yes, but it's not obvious to me how a scale_width would work.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
