dplyr: cannot subset within summarise?


Jonathan Dobres

Jan 23, 2014, 6:04:50 PM
to manip...@googlegroups.com
Hi there!

I'm a big plyr fan who's trying to make the switch to dplyr, but I've run into a deal-breaker issue. plyr has the ability to subset within summarise(), but it looks like dplyr doesn't. For example:

library(plyr)
data(diamonds, package = "ggplot2")
diamond.plyr <- ddply(diamonds, .(cut), summarise,
                      all = mean(price),
                      E   = mean(price[color == 'E']))

Produces:

        cut      all        E
1      Fair 4358.758 3682.312
2      Good 3928.864 3423.644
3 Very Good 3981.760 3214.652
4   Premium 4584.258 3538.914
5     Ideal 3457.542 2597.550

But the equivalent with dplyr:

diamond.dplyr <- diamonds %.%
    group_by(cut) %.%
    dplyr::summarise(all = mean(price),
                     E   = mean(price[color == 'E']))

Gives:

        cut      all   E
1      Fair 4358.758 NaN
2 Very Good 3981.760 NaN
3      Good 3928.864 NaN
4   Premium 4584.258 NaN
5     Ideal 3457.542 NaN
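A note on the symptom: in base R, `mean()` of a zero-length numeric vector returns `NaN`, so a column of `NaN` is what you would expect to see if `price[color == 'E']` were coming back empty inside every group. That is only a reading of the symptom, not a diagnosis of what the CRAN build of dplyr was doing internally. The minimal sketch below just demonstrates the base-R behaviour, using arbitrary example prices.

```r
# Arbitrary example prices (not taken from the diamonds data)
price <- c(10, 20, 30)

# A logical index that selects nothing, as if no row matched color == 'E'
no_match <- c(FALSE, FALSE, FALSE)

empty <- price[no_match]   # a zero-length numeric vector
print(length(empty))       # 0
print(mean(empty))         # NaN for an empty vector
print(is.nan(mean(empty))) # TRUE
```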

Am I misunderstanding something about dplyr's syntax? Is this type of internal subsetting no longer valid, or is there another way to do it? It's a succinct shortcut that's saved me a lot of time and kept my code very readable, so despite dplyr's obvious speed improvements, I can't really switch until I figure out an equivalent.

-Jon
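For comparison, the same "subset inside the summary" calculation can be written in base R with no plyr or dplyr at all. This is a minimal sketch, not something proposed in the thread: it assumes a small hypothetical data frame `df` standing in for `diamonds`, with made-up prices chosen so the results are easy to check by hand.

```r
# Hypothetical toy data standing in for ggplot2::diamonds
df <- data.frame(
  cut   = c("Fair", "Fair", "Fair", "Good", "Good", "Good"),
  color = c("E", "F", "E", "E", "G", "G"),
  price = c(100, 200, 400, 300, 500, 700),
  stringsAsFactors = FALSE
)

# Split the rows by cut, then compute both summaries within each group:
# the overall mean price, and the mean price restricted to color "E"
# (the same price[color == 'E'] idiom used in the plyr call)
per_group <- lapply(split(df, df$cut), function(g) {
  data.frame(cut = g$cut[1],
             all = mean(g$price),
             E   = mean(g$price[g$color == "E"]),
             stringsAsFactors = FALSE)
})

# Stack the one-row summaries back into a single data frame
result <- do.call(rbind, per_group)
rownames(result) <- NULL
print(result)
```

With these toy values, `Fair` gets an `all` of about 233.33 and an `E` of 250, while `Good` gets an `all` of 500 and an `E` of 300.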

Hadley Wickham

Jan 23, 2014, 6:38:47 PM
to Jonathan Dobres, manipulatr
Hi Jonathan,

It works fine for me:

library(dplyr)
data(diamonds, package = "ggplot2")
diamonds %.%
  group_by(cut) %.%
  summarise(all = mean(price), E = mean(price[color == 'E']))


Source: local data frame [5 x 3]

        cut      all        E
1      Fair 4358.758 3682.312
2 Very Good 3981.760 3214.652
3      Good 3928.864 3423.644
4   Premium 4584.258 3538.914
5     Ideal 3457.542 2597.550

Maybe some plyr/dplyr interference?

Hadley


--
http://had.co.nz/

Jonathan Dobres

Jan 23, 2014, 9:18:01 PM
to Hadley Wickham, manipulatr
I've unloaded all the extra libraries I usually load, cleared my workspace, and copied/pasted your code directly, and I still get a column of NaNs where numbers should be. At this point I have no idea what's going on. I'm also running the latest versions of R and RStudio for Mac, on Mac OS X 10.7.5.

Winston Chang

Jan 23, 2014, 9:51:34 PM
to Jonathan Dobres, Hadley Wickham, manipulatr
I get the same NaN when I use the CRAN version of dplyr. With the development version from GitHub, I get the correct numeric values.

-Winston



Jonathan Dobres

Jan 23, 2014, 9:53:18 PM
to Winston Chang, Hadley Wickham, manipulatr
Interesting. I had been using the CRAN version. I'll try the GitHub version tomorrow.

Jonathan Dobres

Jan 24, 2014, 11:07:22 AM
to Winston Chang, Hadley Wickham, manipulatr
Installing the version from GitHub does seem to have resolved the problem. It looks like the issue is specific to the version on CRAN.
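Since the thread turned on two questions, which dplyr build was installed and whether plyr was loaded alongside it, a quick session check helps when comparing results across machines. This is a minimal sketch; `pkg_report()` is a hypothetical helper written for illustration, built only on base/utils functions (`requireNamespace()`, `utils::packageVersion()`, `search()`).

```r
# Hypothetical helper: report each package's installed version and
# whether it is currently attached to the search path
pkg_report <- function(pkgs = c("dplyr", "plyr")) {
  vapply(pkgs, function(p) {
    # requireNamespace() loads the namespace if available, without attaching it
    ver <- if (requireNamespace(p, quietly = TRUE)) {
      as.character(utils::packageVersion(p))
    } else {
      "not installed"
    }
    attached <- paste0("package:", p) %in% search()
    paste0(ver, if (attached) " (attached)" else " (not attached)")
  }, character(1))
}

print(pkg_report())
cat(R.version.string, "\n")
```

Running this on both machines and comparing the dplyr version strings would show whether they are on the same build.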