Bug in facet_grid?


Ben

Mar 12, 2012, 3:15:09 PM
to ggplot2
Hi,

I think I may have discovered a bug in facet_grid that is not present
in facet_wrap. If you look very carefully, there are differences
between the following two plots:

ggplot(diamonds, aes(clarity, carat)) + geom_boxplot() +
  facet_grid(. ~ cut)
ggplot(diamonds, aes(clarity, carat)) + geom_boxplot() +
  facet_wrap(~ cut, ncol=5)

Here is how I am looking at these:
p <- ggplot(diamonds, aes(clarity, carat)) + geom_boxplot() +
  facet_grid(. ~ cut)
ggsave(p, file = "facet_grid.pdf")
p <- ggplot(diamonds, aes(clarity, carat)) + geom_boxplot() +
  facet_wrap(~ cut, ncol=5)
ggsave(p, file = "facet_wrap.pdf")

A few of the medians are shifted, and some outliers differ. It also
seems like the drawing order differs (the "Very Good" label extends
outside the panel with facet_wrap, but not facet_grid).

I am using version 0.9.0 from CRAN, and here is my session info:
> sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] C/en_US.UTF-8/C/C/C/C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] ggplot2_0.9.0

loaded via a namespace (and not attached):
 [1] MASS_7.3-17        RColorBrewer_1.0-5 colorspace_1.1-1   dichromat_1.2-4    digest_0.5.1       grid_2.14.2        memoise_0.1
 [8] munsell_0.3        plyr_1.7.1         proto_0.3-9.2      reshape2_1.2.1     scales_0.2.0       stringr_0.6        tools_2.14.2


I am working on a dataset where I think I am seeing a more extreme
version of this difference. Am I overlooking something, or could this
be a bug?

Right now, I am using facet_wrap as a workaround, since with my
dataset it produces a plot that looks identical to the standard
boxplot() function, and its medians look correct to me.

Cheers,
Ben

Winston Chang

Mar 12, 2012, 4:27:52 PM
to Ben, ggplot2
I'm seeing this difference too. This seems odd. I've attached pictures of each, including an image showing the differences in red.

The diff image was generated with 'compare' from ImageMagick:
  compare facet-grid.png facet-wrap.png facet-diff.png

-Winston



--
You received this message because you are subscribed to the ggplot2 mailing list.
Please provide a reproducible example: http://gist.github.com/270442

To post: email ggp...@googlegroups.com
To unsubscribe: email ggplot2+u...@googlegroups.com
More options: http://groups.google.com/group/ggplot2

facet-grid.png
facet-wrap.png
facet-diff.png

Brian Diggs

Mar 12, 2012, 5:20:09 PM
to Winston Chang, Ben, ggplot2
On 3/12/2012 1:27 PM, Winston Chang wrote:
> I'm seeing this difference too. This seems odd. I've attached pictures of
> each, including an image showing the differences in red.
>
> The diff image was generated with 'compare' from ImageMagick:
> compare facet-grid.png facet-wrap.png facet-diff.png
>
> -Winston

facet_grid is deleting duplicate data before the stats are run (I
think). Consider these much simpler examples:

# 3 x 3 grid of factor levels; cbind() recycles the 9 combinations to 27 rows
DF <- expand.grid(alpha = letters[1:3],
                  beta = LETTERS[1:3])
DF <- cbind(DF, n = 1:27)

p <- ggplot(DF, aes(alpha, n)) + geom_boxplot()
pg <- p + facet_grid(.~beta)
pw <- p + facet_wrap(~beta, ncol=3)

# Same data, but with row 1 repeated ten times
DF2 <- DF[c(rep(1,10),2:27),]

p2 <- ggplot(DF2, aes(alpha, n)) + geom_boxplot()
p2g <- p2 + facet_grid(.~beta)
p2w <- p2 + facet_wrap(~beta, ncol=3)


pg and pw look the same; p2g looks like pg and pw; p2w does not. p2g
SHOULD look like p2w, not pg/pw.

I got a hint of what was happening by running all.equal on the
ggplot_build results for the original diamonds plots. The computed
stats were different, but what really tipped me off was that the lists
of outlier points had different lengths. Looking at the points, some
were listed multiple times with facet_wrap, while facet_grid had no
duplicates. From that I built this example, which shows the difference
dramatically.

I've not looked at the code base to see where this is happening.
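
The effect described above can be reproduced without any plotting. A minimal sketch in base R, assuming (as hypothesized in this post, not confirmed from the ggplot2 source) that facet_grid effectively drops duplicated rows before the stat runs; unique() here only stands in for that behaviour:

```r
# Rebuild the example data (base R only, no ggplot2 needed).
DF <- expand.grid(alpha = letters[1:3], beta = LETTERS[1:3])
DF <- cbind(DF, n = 1:27)          # the 9 factor combinations are recycled to 27 rows
DF2 <- DF[c(rep(1, 10), 2:27), ]   # row 1 appears ten times

# Values in the alpha == "a", beta == "A" cell.
cell <- subset(DF2, alpha == "a" & beta == "A")

# Median over all rows: what a stat should see.
median_all <- median(cell$n)

# Median after dropping duplicated rows; unique() is a stand-in for the
# suspected de-duplication, not the actual ggplot2 code path.
median_dedup <- median(unique(cell)$n)

c(all_rows = median_all, deduplicated = median_dedup)
```

With this data the cell holds the value 1 ten times plus 10 and 19, so the two medians are 1 and 10, which is the kind of shift visible between p2w and p2g.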

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University

Benjamin Lang

Mar 12, 2012, 4:59:11 PM
to Winston Chang, ggplot2
Thanks for the images Winston!

I've zoomed in on one of the facets where the boxplot median is different between facet_grid and facet_wrap here:

p <- ggplot(subset(diamonds, cut=="Premium" & clarity=="VVS2"), aes(clarity, carat)) + geom_boxplot() + facet_grid(. ~ cut)
ggsave(p, file = "facet_zoom_grid.pdf")
p <- ggplot(subset(diamonds, cut=="Premium" & clarity=="VVS2"), aes(clarity, carat)) + geom_boxplot() + facet_wrap(~ cut, ncol=5)
ggsave(p, file = "facet_zoom_wrap.pdf")

median(subset(diamonds, cut=="Premium" & clarity=="VVS2")$carat)

This last line gives a median of 0.455. If I open the PDF in Illustrator and look at the coordinates, facet_wrap's median is correct (0.455), while facet_grid draws it at 0.45.

Cheers,
Ben

Brian Diggs

Mar 12, 2012, 5:48:22 PM
to ggplot2, Winston Chang, Ben
On 3/12/2012 2:20 PM, Brian Diggs wrote:
> facet_grid is deleting duplicate data before the stats are run (I
> think). Consider these much simpler examples:
>
> DF <- expand.grid(alpha = letters[1:3],
> beta = LETTERS[1:3])
> DF <- cbind(DF, n = 1:27)
>
> p <- ggplot(DF, aes(alpha, n)) + geom_boxplot()
> pg <- p + facet_grid(.~beta)
> pw <- p + facet_wrap(~beta, ncol=3)
>
> DF2 <- DF[c(rep(1,10),2:27),]
>
> p2 <- ggplot(DF2, aes(alpha, n)) + geom_boxplot()
> p2g <- p2 + facet_grid(.~beta)
> p2w <- p2 + facet_wrap(~beta, ncol=3)
>
>
> pg and pw look the same; p2g looks like pg and pw; p2w does not. p2g
> SHOULD look like p2w, not pg/pw.

More evidence that it is in the facet and not something in boxplot:

# fun.data="mean_cl_normal" needs the Hmisc package installed
ggplot(DF, aes(alpha, n)) +
  stat_summary(aes(colour=beta), fun.data="mean_cl_normal",
               position=position_dodge(width=0.2))
ggplot(DF2, aes(alpha, n)) +
  stat_summary(aes(colour=beta), fun.data="mean_cl_normal",
               position=position_dodge(width=0.2))
ggplot(DF, aes(alpha, n)) + stat_summary(fun.data="mean_cl_normal") +
  facet_grid(.~beta)
ggplot(DF, aes(alpha, n)) + stat_summary(fun.data="mean_cl_normal") +
  facet_wrap(~beta, ncol=3)
ggplot(DF2, aes(alpha, n)) + stat_summary(fun.data="mean_cl_normal") +
  facet_grid(.~beta)
ggplot(DF2, aes(alpha, n)) + stat_summary(fun.data="mean_cl_normal") +
  facet_wrap(~beta, ncol=3)

facet_grid with data with duplicated rows (the DF2 sets) looks just like
those without, while facet_wrap with the duplicated data has the correct
summary (which agrees with what is drawn when differentiation is just by
colour and not facet).


Brian Diggs

Mar 12, 2012, 5:58:51 PM
to ggplot2, Benjamin Lang, Winston Chang
On 3/12/2012 1:59 PM, Benjamin Lang wrote:
> Thanks for the images Winston!
>
> I've zoomed in on one of the facets where the boxplot median is different
> between facet_grid and facet_wrap here:
>
> p<- ggplot(subset(diamonds, cut=="Premium"& clarity=="VVS2"),
> aes(clarity, carat)) + geom_boxplot() + facet_grid(. ~ cut)
> ggsave(p, file = "facet_zoom_grid.pdf")
> p<- ggplot(subset(diamonds, cut=="Premium"& clarity=="VVS2"),
> aes(clarity, carat)) + geom_boxplot() + facet_wrap(~ cut, ncol=5)
> ggsave(p, file = "facet_zoom_wrap.pdf")
>
> median(subset(diamonds, cut=="Premium"& clarity=="VVS2")$carat)
>
> This last line gives a median of 0.455. If I open the PDF in Illustrator
> and look at the coordinates, facet_wrap's median is correct (0.455), while
> facet_grid draws it at 0.45.

This is consistent with what I've found (see my other posts in this thread):

> median(subset(diamonds[!duplicated(diamonds),], cut=="Premium" &
+ clarity=="VVS2")$carat)
[1] 0.45
> median(subset(diamonds, cut=="Premium" & clarity=="VVS2")$carat)
[1] 0.455

Also, you can get these numbers out of the plot itself (not just by
back-reading from the PDF):

> ggplot_build(ggplot(diamonds, aes(clarity, carat)) + geom_boxplot() +
+ facet_grid(.~cut))[[1]][[1]][30,3]
[1] 0.45
> ggplot_build(ggplot(diamonds, aes(clarity, carat)) + geom_boxplot() +
+ facet_wrap(~cut, ncol=3))[[1]][[1]][30,3]
[1] 0.455

(I found row 30, column 3 by experimenting; the whole data set is too
large to print out and I didn't want to flood the list. You can
explore it further yourself.)
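
Rather than locating the value by row and column index, the same number can be looked up by name. A sketch, assuming a ggplot2 release that provides layer_data() and whose boxplot layer data has PANEL, x, and middle columns (true of recent releases; the layout in 0.9.0 may differ). The names panel_id, x_id, and stats_row are made up for this illustration:

```r
library(ggplot2)

p <- ggplot(diamonds, aes(clarity, carat)) +
  geom_boxplot() +
  facet_grid(. ~ cut)

# Computed boxplot statistics for the first (and only) layer.
stats <- layer_data(p, 1)

# Panels and x positions follow the factor level order, so derive the
# indices from the levels instead of hard-coding row 30.
panel_id <- which(levels(diamonds$cut) == "Premium")
x_id <- which(levels(diamonds$clarity) == "VVS2")

stats_row <- stats[as.integer(stats$PANEL) == panel_id &
                     as.numeric(stats$x) == x_id, ]

# The median that is drawn for the Premium / VVS2 box.
stats_row$middle
```

Swapping facet_grid(. ~ cut) for facet_wrap(~ cut, ncol=5) in p and re-running gives the value to compare against.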

Benjamin Lang

Mar 12, 2012, 8:03:20 PM
to Brian Diggs, ggplot2, Winston Chang
Thank you Brian, that explains my problem perfectly! The data I am working with does have a lot of ties in it.

It looks to me like this is a bug then, and I can't think of any workaround other than using facet_wrap for the moment.

Cheers,
Ben

--
Benjamin Lang

MRC Laboratory of Molecular Biology
Regulatory Genomics & Systems Biology (Dr. M. Madan Babu)
Herchel Smith Research Student, University of Cambridge

bl...@mrc-lmb.cam.ac.uk

Kohske Takahashi

Mar 12, 2012, 10:06:49 PM
to Benjamin Lang, Brian Diggs, ggplot2, Winston Chang
Hi,

IIRC, this is related to a previous discussion about facet_grid
removing duplicates. I cannot find a pointer to it, though...

Here is a workaround:

DF2 <- DF[c(rep(1,10),2:27),]

DF2$id <- 1:nrow(DF2)

p2 <- ggplot(DF2, aes(alpha, n)) + geom_boxplot()
p2g <- p2 + facet_grid(.~beta)
p2w <- p2 + facet_wrap(~beta, ncol=3)
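
One reading of why this helps, assuming the duplicate removal compares whole rows of the data: once every row carries a distinct id, no two rows are identical, so nothing is left to drop. A small base R check of that premise (it does not call ggplot2; DF and DF2 are rebuilt from the earlier example in this thread):

```r
DF <- expand.grid(alpha = letters[1:3], beta = LETTERS[1:3])
DF <- cbind(DF, n = 1:27)
DF2 <- DF[c(rep(1, 10), 2:27), ]

# Before: count rows that are exact copies of an earlier row.
dups_before <- sum(duplicated(DF2))

# After adding a per-row id, every row is distinct.
DF2$id <- seq_len(nrow(DF2))
dups_after <- sum(duplicated(DF2))

c(before = dups_before, after = dups_after)
```

Here nine of the 36 rows are duplicates before the id column is added and none afterwards.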

kohske

--
Kohske Takahashi <takahash...@gmail.com>

Research Center for Advanced Science and Technology,
The University of Tokyo, Japan.
http://www.fennel.rcast.u-tokyo.ac.jp/profilee_ktakahashi.html

Kohske Takahashi

Mar 12, 2012, 10:11:42 PM
to Benjamin Lang, Brian Diggs, ggplot2, Winston Chang
Here is the thread:

https://groups.google.com/group/ggplot2/browse_thread/thread/3728b94f783d5963?pli=1

kohske


Kohske Takahashi

Mar 12, 2012, 10:21:35 PM
to Benjamin Lang, Brian Diggs, ggplot2, Winston Chang
And here is the discussion:
https://github.com/hadley/ggplot2/commit/9fb2a973166c2e60ea057c8e072804de8369f3a8#L2R14


Benjamin Lang

Mar 13, 2012, 2:28:46 AM
to Kohske Takahashi, Brian Diggs, ggplot2, Winston Chang
That's great, thank you Kohske! So it looks like it's being taken care of. Thanks everybody for the great responses.

Cheers,
Ben

Winston Chang

Mar 13, 2012, 8:04:38 PM
to Kohske Takahashi, Benjamin Lang, Brian Diggs, ggplot2, Winston Chang
Did the changes from the previous discussion fix this issue?

I tried Benjamin's original boxplot with the diamonds data set, using the latest version of ggplot2 installed with install_github('ggplot2'), and facet_grid still looks different from facet_wrap.

-Winston

Winston Chang

Mar 15, 2012, 2:54:40 AM
to ggplot2, Benjamin Lang, Brian Diggs, Kohske Takahashi
I think this problem is still present in the current version on github -- facet_grid is dropping duplicates. This illustrates it pretty clearly:

dat <- data.frame(x=rep(1:4, 10), g=rep(c("A","B"),each=20))
p <- ggplot(dat, aes(x,x)) + geom_point(position="jitter")

# OK: 20 points in each facet
p + facet_wrap(~g)

# Not OK: 4 points in each facet
p + facet_grid(.~g)
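
The point counts in the comments above match what row de-duplication would predict. A quick base R check, assuming nothing about ggplot2 internals and using unique() only as a stand-in for the dropped duplicates (rows_all and rows_unique are names made up for this sketch):

```r
dat <- data.frame(x = rep(1:4, 10), g = rep(c("A", "B"), each = 20))

# Rows per facet in the full data.
rows_all <- table(dat$g)

# Rows per facet once identical (x, g) rows are collapsed.
rows_unique <- table(unique(dat)$g)

rows_all
rows_unique
```

The full data has 20 rows per group, while only 4 distinct (x, g) rows remain per group after collapsing.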


-Winston
wrap.png
grid.png