Hi there!
I'm a big plyr fan who's trying to make the switch to dplyr, but I've run into a deal-breaker. plyr lets me subset a column inside summarise(), but it looks like dplyr doesn't. For example:
    diamond.plyr <- ddply(diamonds, .(cut), summarise,
                          all = mean(price),
                          E = mean(price[color == 'E']))
Produces:
            cut      all        E
    1      Fair 4358.758 3682.312
    2      Good 3928.864 3423.644
    3 Very Good 3981.760 3214.652
    4   Premium 4584.258 3538.914
    5     Ideal 3457.542 2597.550
But, the equivalent with dplyr:
    diamond.dplyr <- diamonds %.%
      group_by(cut) %.%
      dplyr::summarise(all = mean(price),
                       E = mean(price[color == 'E']))
Gives:
            cut      all   E
    1      Fair 4358.758 NaN
    2 Very Good 3981.760 NaN
    3      Good 3928.864 NaN
    4   Premium 4584.258 NaN
    5     Ideal 3457.542 NaN
Am I misunderstanding something about dplyr's syntax? Is this type of internal subsetting no longer valid, or is there another way to do it? It's a succinct shortcut that's saved me a lot of time and kept my code very readable, so despite dplyr's obvious speed improvements, I can't really switch until I figure out an equivalent.
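To be explicit about the result I expect, here is a minimal base R sketch of the same computation: a per-group mean over all rows next to a per-group mean over only the rows where color is 'E'. It uses a made-up toy data frame rather than diamonds so that it runs without any packages; the names toy, all_mean, is_e, e_mean and expected, and all the values, are purely illustrative.

```r
# Toy stand-in for diamonds (illustrative values, not real data)
toy <- data.frame(
  cut   = factor(c("Fair", "Fair", "Fair", "Good", "Good")),
  color = c("E", "D", "E", "E", "D"),
  price = c(10, 50, 30, 40, 60)
)

# Mean price per cut over all rows
all_mean <- tapply(toy$price, toy$cut, mean)

# Mean price per cut over only the rows where color is "E"
is_e <- toy$color == "E"
e_mean <- tapply(toy$price[is_e], toy$cut[is_e], mean)

# Combine into the shape I expect from summarise()
expected <- data.frame(
  cut = names(all_mean),
  all = as.vector(all_mean),
  E   = as.vector(e_mean)
)
print(expected)
```

The E column here holds real per-group means, which is what the plyr call gives me and what I was hoping the dplyr call would give too.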
-Jon