Passing a function with dots to dplyr::summarise

Ben Bond-Lamberty

Nov 14, 2014, 10:28:49 PM
to manip...@googlegroups.com
I've been experimenting with dplyr, reading about NSE, and googling, but I don't understand why this simple example doesn't work. I'd like summarise to use an arbitrary function that might take extra parameters (e.g., weights). This works fine with aggregate, but not with dplyr, and I'd appreciate it if someone could explain why. Simple example:

library(dplyr)
d <- data.frame(x=c(1,1,2,2), y=1:4)  # sample data

f_dplyr <- function(d, FUN, ...) {
  group_by(d, x) %>% summarise(y = FUN(y, ...))
}

f_aggregate <- function(d, FUN, ...) {
  aggregate(y ~ x, d, FUN = FUN, ...)
}

print(f_aggregate(d, weighted.mean, c(1,2)))  # this works

print(f_dplyr(d, weighted.mean, c(1,2)))  # this does not--no output


Thanks,
Ben

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.3.0.9000

loaded via a namespace (and not attached):
[1] assertthat_0.1 DBI_0.3.1      lazyeval_0.1.9 magrittr_1.0.1 parallel_3.1.2 Rcpp_0.11.3    tools_3.1.2   


Brandon Hurr

Nov 15, 2014, 12:18:12 AM
to Ben Bond-Lamberty, manipulatr
Ben, 

It's just a hunch, based on other issues I've seen using dplyr with column names that contain ".", but I think this is failing for the same reason. Your function works fine with "mean" and "sd", but I got an error with "length", "min", and "max".
> f_dplyr(d, length)
Error in FUN(1:2, ) : argument 2 is empty

I believe it's the ".", but I also can't explain why length, min, and max think they need another argument.
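
One possible reading of the error message, offered as a sketch rather than a confirmed diagnosis: the call shown as FUN(1:2, ) has an empty second argument slot. In base R, builtins such as length, min, and max evaluate every argument up front and reject an empty slot, while closures such as sd treat it as a missing argument and fall back to the default. This does not explain why this dplyr version produced the empty slot in the first place. The helper name `center` is hypothetical and used only for illustration.

```r
# Builtins such as min() evaluate all arguments eagerly, so an empty
# argument slot raises an error instead of being ignored.
msg <- tryCatch(min(1:2, ), error = function(e) conditionMessage(e))
print(msg)

# A closure treats the empty slot as a missing argument and uses the
# default value. `center` is a hypothetical helper for illustration.
center <- function(x, na.rm = FALSE) mean(x, na.rm = na.rm)
print(center(1:2, ))
```

If that reading is right, the difference between mean/sd and length/min/max comes from how each function handles a missing argument, not from a "." in a name.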

B

Dennis Murphy

Nov 15, 2014, 12:23:07 AM
to Ben Bond-Lamberty, manipulatr
Hi Ben:

This appears to work in version 0.3 with R-3.1.2 on a Windows 7 box:

f_dplyr <- function(d, FUN, ...)
{
  require(dplyr, quietly = TRUE)
  d %>% group_by(x) %>% do(data.frame(newvar = FUN(.$y, ...)))
}

> f_dplyr(d, FUN = weighted.mean, w = c(1, 2)/3)
Source: local data frame [2 x 2]
Groups: x

  x   newvar
1 1 1.666667
2 2 3.666667


With do(), each unnamed expression is supposed to return a data frame
or tbl object for every group; weighted.mean() returns a numeric
vector, so you have to assign its result to a variable inside a data
frame or you'll get errors like

Error: Results are not data frames at positions: 1, 2
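
As a cross-check on the values above, the same per-group weighted means can be reproduced in base R without dplyr. This is a minimal sketch: `f_base` is a hypothetical helper that is not part of the thread, and it relies on sapply() forwarding `...` to FUN.

```r
d <- data.frame(x = c(1, 1, 2, 2), y = 1:4)  # sample data from the thread

# Hypothetical helper: split y by group, apply FUN to each piece with
# the extra arguments, and wrap the result in a data frame.
f_base <- function(d, FUN, ...) {
  res <- sapply(split(d$y, d$x), FUN, ...)
  data.frame(x = as.numeric(names(res)), newvar = unname(res))
}

out <- f_base(d, weighted.mean, w = c(1, 2) / 3)
print(out)
```

By hand, group 1 is (1*1 + 2*2)/3 = 5/3 and group 2 is (3*1 + 4*2)/3 = 11/3, which matches the 1.666667 and 3.666667 shown in the output.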

Dennis


Brandon Hurr

Nov 15, 2014, 11:00:24 AM
to Dennis Murphy, Ben Bond-Lamberty, manipulatr
Dennis, 

Thanks for explaining. I've only used do() once, in a similar situation, and didn't think to apply it here. It makes all of the functions I mentioned work.

Very cool. 
B

Ben Bond-Lamberty

Nov 15, 2014, 3:08:20 PM
to Brandon Hurr, Dennis Murphy, manipulatr
That's interesting--thanks to you both. Much appreciated.
Ben