0.8.3 upgrade broke plot with stat_summary

Harlan Harris

May 4, 2009, 5:38:47 PM
to ggp...@googlegroups.com
Hi,

I'm enjoying working with ggplot2, but the most recent upgrade of all my packages, including ggplot2, seems to have broken something. Here's the code that worked in the previous version but now doesn't:

> summary(ef4.LC2)
    Subject     CategoryStructure Instantiation     value           Index     
 1      :  16   FFdiff :592       Object:576    Min.   :0.100   Min.   : 1.00 
 10     :  16   FFequal:576       Parts :592    1st Qu.:0.600   1st Qu.: 4.75 
 11     :  16                                   Median :0.727   Median : 8.50 
 12     :  16                                   Mean   :0.732   Mean   : 8.50 
 13     :  16                                   3rd Qu.:0.900   3rd Qu.:12.25 
 14     :  16                                   Max.   :1.000   Max.   :16.00 
 (Other):1072                                                                 
          Condition 
 FFdiff.Object :288 
 FFdiff.Parts  :304 
 FFequal.Object:288 
 FFequal.Parts :288 

theme_set(theme_bw())
p <- ggplot(ef4.LC2, aes(Index, value, colour=Condition))
p <- p + stat_summary(fun.data="mean_cl_boot", geom="smooth", size=2, alpha=.1)
p <- p + scale_colour_manual(value=c("darkred", "red", "darkgreen", "green"))
p <- p + scale_x_continuous("Block", breaks=c(1,6,11,16))
p <- p + scale_y_continuous("Mean Accuracy", limits=c(.5,1))
p <- p + opts(plot.margin=unit(c(1,5,2,2), "lines"))
p

When I run the above, I get the following warning:
WARNING: Warning: Removed 116 rows containing missing values (stat_summary).

(There are no missing values in the data.) And the plot is broken, as if the Index was not sorted or something. (See attachment.)
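
One thing that may be worth ruling out here: the summary above shows a minimum `value` of 0.100, while the y scale is limited to c(.5, 1). In ggplot2, observations outside a continuous scale's limits are typically treated as missing before the stat is computed, which could explain a "removed rows" warning even when the data contain no NAs. A minimal base R sketch of that check, using a hypothetical `value` vector in place of `ef4.LC2$value`:

```r
# Hypothetical values standing in for ef4.LC2$value (not the real data).
value <- c(0.10, 0.45, 0.60, 0.727, 0.90, 1.00)

# The limits passed to scale_y_continuous() in the plot above.
limits <- c(0.5, 1)

# Genuine missing values in the data.
n_missing <- sum(is.na(value))

# Values that fall outside the scale limits.
n_outside <- sum(value < limits[1] | value > limits[2], na.rm = TRUE)

cat("missing:", n_missing, " outside limits:", n_outside, "\n")
```

If the count outside the limits on the real data matches the number in the warning, the warning is probably unrelated to the broken line ordering.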

This seems to be an issue with stat_summary or mean_cl_boot, or maybe with Hmisc, I'm not sure. Simpler plots work fine. Any ideas? Thanks,

 -Harlan
broken.png

hadley wickham

May 5, 2009, 10:20:53 AM
to Harlan Harris, ggp...@googlegroups.com
Hi Harlan,

It's difficult to diagnose without a reproducible example
(http://ggplot2.wik.is/Creating_a_reproducible_example) - could you
please provide one?

Thanks,

Hadley
--
http://had.co.nz/

Harlan Harris

May 5, 2009, 2:23:45 PM
to hadley wickham, ggp...@googlegroups.com
OK. This is strange. Maybe it has something to do with numeric vs. ASCII sorting? I'm attaching an .rda file with a subset of my data (8 subjects' worth). To reproduce the problem:

> library("ggplot2")
> load("test-8Ss.rda")
> ggplot(head(ef4.LC2,72), aes(Index, value, colour=Condition)) + stat_summary(fun.data="mean_cl_normal", geom="smooth", size=2, alpha=.1)
> ggplot(head(ef4.LC2,80), aes(Index, value, colour=Condition)) + stat_summary(fun.data="mean_cl_normal", geom="smooth", size=2, alpha=.1)

The first ggplot looks fine (error bars are big, of course). Notice the x axis goes to 9. The second ggplot, where the x axis goes to 10, is garbage. There's no obvious problem with the data between rows 72 and 80. The data was put into this format by cast/melt calls. (I had to fill in some missing data.)

The data types seem reasonable -- everything that should be a factor is a factor, and everything that should be numeric is numeric. The order of the rows in the data frame shouldn't matter (right??), but in any case they're in the order of the Index (x) variable.
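
The type and ordering checks described above can be sketched in base R as follows. The data frame `dat` is a hypothetical stand-in for `ef4.LC2`, and the last two lines only illustrate why a plot might change character once Index reaches 10 if values were ever ordered as text; this is not a confirmed diagnosis of the bug.

```r
# Hypothetical stand-in for ef4.LC2 (same column names, made-up values).
dat <- data.frame(
  Index = rep(1:10, each = 2),
  Condition = factor(rep(c("FFdiff.Object", "FFequal.Parts"), times = 10)),
  value = seq(0.5, 0.975, length.out = 20)
)

# First class of each column: factors should be "factor", x/y numeric.
col_classes <- sapply(dat, function(col) class(col)[1])
print(col_classes)

# TRUE when the rows are already in non-decreasing Index order.
index_sorted <- !is.unsorted(dat$Index)

# Numeric ordering versus text ordering of the same x values.
numeric_order <- sort(unique(dat$Index))
text_order <- sort(as.character(unique(dat$Index)))
print(numeric_order)
print(text_order)
```

With single-digit x values the two orderings agree, so a text-ordering problem would only show up once a two-digit value such as 10 is included.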

Thank you for any suggestions!

 -Harlan
test-8Ss.rda

Harlan Harris

May 7, 2009, 8:58:05 AM
to ggplot2
I've replicated this issue on a separate (Windows) installation of R.
Would someone please help?

Thank you,

-Harlan

Harlan Harris

May 7, 2009, 12:03:18 PM
to ggplot2
OK, it has something to do with geom="smooth". If you replace that in
the second (broken) plot with geom="line", the sorting problem goes
away. Does this suggest anything?
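
Another way to sidestep the problem while it is being investigated is to compute the summaries outside ggplot2 and plot the precomputed rows with a line and a ribbon layer. The sketch below is base R only and makes two assumptions: `dat` is made-up data standing in for `ef4.LC2`, and the helper `mean_ci` (a hypothetical name) uses a t-based interval, which is my understanding of what "mean_cl_normal" computes via Hmisc.

```r
# Hypothetical stand-in for ef4.LC2: 4 observations per Index x Condition cell.
set.seed(1)
dat <- expand.grid(rep = 1:4, Index = 1:10, Condition = c("A", "B"))
dat$value <- runif(nrow(dat), min = 0.5, max = 1)

# Mean and t-based confidence interval for one group of values.
mean_ci <- function(x, conf = 0.95) {
  n <- length(x)
  half <- qt((1 + conf) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(y = mean(x), ymin = mean(x) - half, ymax = mean(x) + half)
}

# One summary row per Index x Condition; aggregate() returns the three
# statistics as a matrix column, which cbind() flattens into y, ymin, ymax.
summ <- aggregate(value ~ Index + Condition, data = dat, FUN = mean_ci)
summ <- cbind(summ[c("Index", "Condition")], summ$value)

# Explicitly order by group and then by x, so the plotting order is known.
summ <- summ[order(summ$Condition, summ$Index), ]
print(head(summ))
```

The resulting `summ` has columns Index, Condition, y, ymin and ymax, which could then be drawn with a line layer for `y` and a ribbon layer for `ymin`/`ymax`.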

-Harlan

Harlan Harris

May 8, 2009, 11:46:10 AM
to hadley wickham, ggp...@googlegroups.com
I'm still having trouble with this. I sent several clarifying messages over the past couple of days via the Google Groups web page, but haven't heard anything back. Do messages sent via the web page not get sent out? I'm happy to resend those messages via email if it would help... Thank you!

 -Harlan

hadley wickham

May 8, 2009, 12:07:46 PM
to Harlan Harris, ggp...@googlegroups.com
Harlan,

I currently get over 300 personally addressed emails every week. It
is just not possible for me to reply to every email within a couple of
days, but I will get to it eventually.

Hadley
--
http://had.co.nz/

hadley wickham

May 11, 2009, 5:31:50 PM
to Harlan Harris, ggp...@googlegroups.com
Sorry it took me so long to look into this. It's a bug in the latest
version of ggplot2, but I've fixed it in the development version. If
you let me know your operating system, I can send you a fixed version
to try out.

Hadley

--
http://had.co.nz/

Harlan Harris

May 15, 2009, 3:19:20 PM
to hadley wickham, ggp...@googlegroups.com
For the benefit of posterity, Hadley found a bug and fixed this problem. Thank you!

Another minor issue, though. When I use the smooth geom with grouping by color, the colors of the lines in the legend look desaturated. I think the alpha parameter, which is supposed to affect only the ribbon, is also affecting the line in the legend.

Demo:
m <- ggplot(movies, aes(x=round(rating), y=votes, color=mpaa))
m2 <- m + stat_summary(fun.data = "mean_cl_normal", geom = "smooth", size=2)
m2 + scale_y_log10()

At least on my screen (see attachment) I get faded-looking colors in the legend.

Now do:
m2 <- m + stat_summary(fun.data = "mean_cl_normal", geom = "smooth", size=2, alpha=.1)
m2 + scale_y_log10()

The grey overlaid ribbons look nicer, but the legend is even worse. I'd blame it on my error somehow in the second case, but the fact that it's a problem in the first case suggests a bug somewhere...

R 2.8.1, development version of ggplot2, Linux. The problem is present on the screen, and in both pdf and png exports via ggsave.

(Incidentally, there have been threads about this, but I have to add "dpi=72" to the ggsave command to get a good .png file...)

 -Harlan
test1.png

hadley wickham

May 18, 2009, 9:31:25 PM
to Harlan Harris, ggp...@googlegroups.com
Thanks for the bug report. It's fixed in the development version.

Hadley

--
http://had.co.nz/
