Using dplyr::summarise within a function


Sam Albers

Jan 8, 2016, 3:31:07 PM
to manipulatr
Hello all,

I am trying to write a function that summarises some correlation
statistics using dplyr. Using the iris dataset as an example, I am
getting an error that I am having difficulty diagnosing. Basically,
I'd like to pass two variables and one or more levels of the Species
factor to a function and get output that looks like this (depending
on the specific factor level(s) specified):

library(dplyr)

irisL <- iris %>%
  filter(Species=="setosa")

irisL %>%
  group_by(Species) %>%
  summarise(rho=cor.test(Petal.Length, Sepal.Width, method="spearman")$estimate,
            pval=cor.test(Petal.Length, Sepal.Width, method="spearman")$p.value,
            x=mean(Petal.Length, na.rm=TRUE),
            y=mean(Sepal.Width, na.rm=TRUE)
  )

But if I try to generalize this with a function like so:

corSummary<- function(Data, x, y,lakename){
DataL <- Data %>%
filter(Species %in% lakename)

cordf<-DataL %>%
group_by(Species) %>%
summarise(rho=substitute(round(cor.test(x, y,
method="spearman"$estimate),3)),
pval=substitute(round(cor.test(x, y,
method="spearman"$p.value),3)),
x=substitute(round(mean(x, na.rm=TRUE),3)),
y=substitute(round(mean(y, na.rm=TRUE),3))
)
}


corSummary(iris, Sepal.Width, Petal.Length,'setosa')


I get this error:

Error in summarise_impl(.data, dots) : object 'Sepal.Width' not found

Can anyone recommend a way around this issue such that I can use all
the power of summarise within a function?

Thanks in advance,

Sam

jim holtman

Jan 8, 2016, 5:00:35 PM
to Sam Albers, manipulatr
I think you have to qualify where your data is coming from:

corSummary(iris, iris$Sepal.Width, iris$Petal.Length, 'setosa')


Jim Holtman
Data Munger Guru
 
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


--
You received this message because you are subscribed to the Google Groups "manipulatr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to manipulatr+...@googlegroups.com.
To post to this group, send email to manip...@googlegroups.com.
Visit this group at https://groups.google.com/group/manipulatr.
For more options, visit https://groups.google.com/d/optout.

Sam Albers

Jan 8, 2016, 5:07:41 PM
to jim holtman, manipulatr
Hi Jim,

Thanks for the response. Unfortunately this gives me a variant of the
error, which suggests that the root problem is still present:

Error: not a vector
Called from: summarise_impl(.data, dots)


Thank you though.

Sam

Hadley Wickham

Jan 8, 2016, 5:40:57 PM
to Sam Albers, manipulatr
Have you read https://cran.r-project.org/web/packages/dplyr/vignettes/nse.html ?
Hadley



--
http://had.co.nz/

Sam Albers

Jan 8, 2016, 6:24:28 PM
to Hadley Wickham, manipulatr
Thanks for the response Hadley. I had not yet read that vignette but
now I have.

Unfortunately this has not turned the brain light on for me yet. I
tried to generate a working example using what I learned from your
link but have so far failed to generate anything other than the same
error message I've been getting. I feel like I am close here but just
can't quite wrap my brain around why this isn't working:

testfunc<-function(Data, x,y){
Data %>%
summarise_(interp(~ f(x, y, method="spearman")$estimate, f = quote(cor.test)))
}

testfunc(mtcars, mpg, disp)

Error in summarise_impl(.data, dots) : object 'disp' not found

I've clearly not made it too far as the NSE version of summarise gives
me the same error message:

testfunc<-function(Data, x,y){
Data %>%
summarise(cor.test(x,y)$estimate)
}

testfunc(mtcars, mpg, disp)

I've minimized my example to hopefully generate a response and because
it narrows the problem.

Thanks in advance!

Sam

Hadley Wickham

Jan 8, 2016, 7:02:00 PM
to Sam Albers, manipulatr
I'm on my phone so I can't answer fully but you need to handle x and y. You can just pass them along. I have a good example in the ggplot2 book that I should pull out into this vignette. 
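
One way to read "handle x and y" is that the unquoted column names must be captured as expressions and then evaluated with the data frame in scope, because `mpg` and `disp` are not objects in the function's own environment. The sketch below uses only base R (`substitute()` and `eval()`) rather than dplyr or lazyeval, and `cor_in` is a hypothetical helper name, so treat it as an illustration of the idea rather than the approach recommended in this thread.

```r
# Hypothetical helper (not part of dplyr): correlate two columns that the
# caller names without quotes.
cor_in <- function(data, x, y) {
  x_expr <- substitute(x)  # the symbol the caller typed, e.g. mpg
  y_expr <- substitute(y)  # the symbol the caller typed, e.g. disp

  # Evaluate each captured symbol with the data frame's columns in scope;
  # anything not found in the data is looked up in the caller's environment.
  x_val <- eval(x_expr, data, parent.frame())
  y_val <- eval(y_expr, data, parent.frame())

  stats::cor(x_val, y_val)
}

res <- cor_in(mtcars, mpg, disp)
```

Calling `cor(mpg, disp)` directly inside the function would fail for the same reason as the original `summarise()` call: nothing named `mpg` exists outside the data frame.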

Hadley


--
http://had.co.nz/

Alain Content

Jan 9, 2016, 5:33:32 AM
to Hadley Wickham, Sam Albers, manipulatr
Hi, 

This seems to work : 

testfunc<-function(Data, x,y){
Data %>%
mutate_(Vx=x, Vy=y) %>% summarise(cor(Vx, Vy))
}

but I *really* would like to understand why :)

Alain 

Hadley Wickham

Jan 9, 2016, 9:02:24 AM
to Alain Content, Sam Albers, manipulatr
I'd say that's just luck. Start by requiring x and y in testfunc to be formulas. 

Hadley


--
http://had.co.nz/

Alain Content

Jan 9, 2016, 10:17:27 AM
to Hadley Wickham, Sam Albers, manipulatr
Hadley,

I have read the vignette mentioned above and the examples in the ggplot2 book, but I still don't understand what you mean.

I tried this, which works, and seems to me to make sense : 

testfunc1 <-function(data, x, y){
     summarise_(data, M = substitute(mean(x)))
}
 
testfunc1(mtcars, mpg, cyl)
         M
1 20.09062

However, it fails when the function code includes a pipe : 

 
testfunc2 <-function(data, x, y){
    data %>% summarise_(M = substitute(mean(x)))
}
 
testfunc2(mtcars, mpg, cyl)

Error: object 'x' not found 
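
A possible explanation, offered as a guess rather than something confirmed in this thread: `substitute()` only replaces symbols that are bound in the frame where it is called. If the pipe evaluates its right-hand side in a frame of its own (an assumption about the pipe implementation of the time), `x` is not bound there, so `mean(x)` is passed along unchanged and `x` is later not found in the data. The base R sketch below reproduces that frame effect without any pipe; `outer_fun` and `inner_fun` are hypothetical names.

```r
outer_fun <- function(x) {
  # Called directly in outer_fun's frame: x is a promise here, so the
  # caller's expression is substituted in.
  direct <- substitute(mean(x))

  # Called one frame deeper: inner_fun has no binding for x in its own
  # frame, so substitute() leaves the symbol x untouched.
  inner_fun <- function() substitute(mean(x))

  list(direct = direct, nested = inner_fun())
}

res <- outer_fun(mpg)
direct_text <- deparse(res$direct)  # "mean(mpg)"
nested_text <- deparse(res$nested)  # "mean(x)"
```

Under that assumption, capturing the expression on its own line before the pipe, and passing the captured value along, would avoid the problem.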

And it also does not recognize cor() (which is in the stats package - is that why ?)

testfunc3 <-function(data, x, y){
    summarise_(data, r = substitute(cor(x, y)))
}

testfunc3(mtcars, mpg, cyl)

Error: could not find function "cor"

Sam Albers

Jan 11, 2016, 11:39:34 AM
to Alain Content, Hadley Wickham, manipulatr
Hello all,

I've spent part of the weekend trying to figure this issue out as I'd
really love to get my function working. However, running through
Hadley's and Alain's comments, I am still running into the function
not calling the object or at least not recognizing it properly.
Variations on this error message:

Error in as.lazy_dots(list(...)) : object 'mpg' not found
Called from: as.lazy_dots(list(...))

I have also read the vignette and I thought I understood the process of
requiring x and y to be formulas. But applying that (incorrectly
obviously) like so:

testfunc3 <-function(data, x, y){
summarise_(data, r = substitute(~cor(x, y)))
}

testfunc3(mtcars, mpg, disp)

Gets me an error message like so:

Error: expecting result of length one, got : 2
Called from: summarise_impl(.data, dots)

That led me to this thread on github:
https://github.com/hadley/dplyr/issues/300

That led me to try this:

testfunc3 <-function(data, x, y){
group_by_(substitute(cyl)) %>%
summarise_(data, r = substitute(~cor(x, y)))
}

testfunc3(mtcars, mpg, disp)

With this error message:

Error in UseMethod("group_by_") :
no applicable method for 'group_by_' applied to an object of class "name"
Called from: group_by_(substitute(cyl))

So at this point I am thoroughly confused. Any assistance here would
be much appreciated and I do think that it would be broadly applicable
and other folks would have a similar question.

Thanks in advance!

Sam

Alain Content

Jan 11, 2016, 1:46:20 PM
to Sam Albers, Hadley Wickham, manipulatr
Hello Sam, 

This works for me - does it help ?

library(dplyr)
testfun <- function(data,x,y){
summarise_(data, substitute(stats::cor(x, y)))
}
testfun(mtcars, mpg, disp)

There seems to be some namespace issue that causes cor() not to be recognized within summarise_(). Haven’t got time to try and investigate further. I hope Hadley will find the time to explain.

Best, 

Alain 

ALAIN CONTENT
Laboratory Cognition Language & Development  – http://crcn.ulb.ac.be/lcld
ULB Neuroscience Institute - Centre for Research in Cognition & Neuroscience

Sam Albers

Jan 11, 2016, 2:14:12 PM
to Alain Content, Hadley Wickham, manipulatr
Hi Alain,

It does help a bit. This spurred me on enough to figure out that the
pipe really seems to be the issue. My main goal is to use a function
with a grouping variable. I'm able to accomplish that like this:

testfunGroupNoPipe <- function(data,x,y){
summarise_(group_by(data, cyl), substitute(stats::cor(x, y)))
}
testfunGroupNoPipe(mtcars, mpg, disp)

However, if at any stage I want to use a pipe then I run into problems. See:

testfunPipe <- function(data,x,y){
data %>%
summarise_(substitute(stats::cor(x, y)))
}
testfunPipe(mtcars, mpg, disp)

Or with a grouping variable:

testfunGroupPipe <- function(data,x,y){
data %>%
group_by_(substitute(cyl)) %>%
summarise_(substitute(stats::cor(x, y)))
}
testfunGroupPipe(mtcars, mpg, disp)

I've come to really enjoy using pipes, mostly for readability and to
eliminate interim data frames. This would be especially useful in
functions.

Thank you for your efforts here. I too am very curious what I am doing wrong.

Sam

Hadley Wickham

Jan 14, 2016, 6:25:56 PM
to Sam Albers, Alain Content, manipulatr
I'd highly recommend avoiding the combination of piping and NSE. And
in fact, I'd avoid trying to do multiple things on one line in the
first place.

Generally, it's easiest to start with an SE version of the function,
i.e. something where the user has to explicitly quote the inputs:


```{r}
grouped_cor_ <- function(data, x, y){
  x <- lazyeval::as.lazy(x)
  y <- lazyeval::as.lazy(y)
  cor <- lazyeval::interp(~ cor(x, y), x = x, y = y)

  summarise_(group_by(data, cyl), cor)
}
grouped_cor_(mtcars, ~mpg, ~disp)
```

Then make a version that uses NSE:

```{r}
grouped_cor <- function(data, x, y) {
  x <- lazyeval::lazy(x)
  y <- lazyeval::lazy(y)

  grouped_cor_(data, x, y)
}
grouped_cor(mtcars, mpg, disp)
```
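
The same two-layer pattern (an SE function that takes quoted inputs, wrapped by a thin NSE function that does the quoting) can be sketched in base R alone, which may help separate the idea from the lazyeval and dplyr machinery. This is an analogue under stated assumptions, not the code above: `grouped_cor_se` and `grouped_cor_nse` are hypothetical names, `bquote()` stands in for `lazyeval::interp()`, and `split()` plus `vapply()` stand in for `group_by()` plus `summarise_()`.

```r
# SE version: x and y arrive already quoted (symbols or calls), and the
# grouping column is given as a string.
grouped_cor_se <- function(data, x, y, group) {
  # Interpolate the quoted inputs into a template call, e.g. stats::cor(mpg, disp)
  cor_call <- bquote(stats::cor(.(x), .(y)))

  # Evaluate the call once per group, with that group's columns in scope
  pieces <- split(data, data[[group]])
  vapply(pieces, function(d) eval(cor_call, d), numeric(1))
}

# NSE version: quote the caller's expressions, then delegate to the SE version.
grouped_cor_nse <- function(data, x, y, group) {
  grouped_cor_se(data, substitute(x), substitute(y), group)
}

res_se  <- grouped_cor_se(mtcars, quote(mpg), quote(disp), "cyl")
res_nse <- grouped_cor_nse(mtcars, mpg, disp, "cyl")
```

Keeping the quoting in one small wrapper means all the real work lives in a function that can be called programmatically with ordinary values.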

I really need to explain this in more detail somewhere but
unfortunately I don't have the time right now.

Hadley
--
http://had.co.nz/

Sam Albers

Jan 15, 2016, 4:36:17 PM
to Hadley Wickham, Alain Content, manipulatr
Thank you Hadley. This helped me get my function working. Once
I've tidied it up I'll post back here for the record.

I'm amazed Hadley that you find time to comment here. Thank you for all you do.

Sam

Sam Albers

Jan 21, 2016, 12:08:52 PM
to Hadley Wickham, Alain Content, manipulatr
Just for posterity, I thought I would post the result of everyone's
help and my efforts to create this (probably only useful to me)
function. Hopefully the thought process is useful to someone in the
future:

## Function ####
library(dplyr)
library(ggplot2)

grouped_cor_ <- function(data, x, y, form){
  x <- lazyeval::as.lazy(x)
  y <- lazyeval::as.lazy(y)
  # Pull the grouping variable out of a one-sided formula such as ~Species
  fac <- as.name(as.character(form)[2])
  cor1 <- lazyeval::interp(~ cor.test(x, y, method="spearman",
                                      na.action = "na.exclude")$estimate, x = x, y = y)
  corp <- lazyeval::interp(~ cor.test(x, y, method="spearman",
                                      na.action = "na.exclude")$p.value, x = x, y = y)
  mnx <- lazyeval::interp(~ mean(x, na.rm=TRUE), x = x, y = y)
  mny <- lazyeval::interp(~ mean(y, na.rm=TRUE), x = x, y = y)

  summarise_(group_by_(data, fac), rho=cor1, pval=corp, xcoord=mnx, ycoord=mny)
}


corHighlight <- function(Data, x, y, form){
  cordf <- grouped_cor_(Data, x = substitute(x), y = substitute(y),
                        form = substitute(form))
  # Facet label: rho on the first line, p-value on the second
  cordf$prho <- paste("rho=", round(cordf$rho,3),
                      "\np-value=", round(cordf$pval,3), sep=" ")
  plt <- ggplot(Data, aes_q(x = substitute(x), y = substitute(y))) +
    geom_text(data=cordf, aes_q(x=substitute(xcoord),
                                y=substitute(ycoord),
                                label=substitute(prho)), colour='red') +
    geom_point(size=2, alpha=0.3) +
    facet_wrap(form)
  print(plt)
}

##################################

Generalized to work with any dataset (I hope):

corHighlight(Data=iris,
x=Petal.Width,
y=Petal.Length, form = ~Species)


corHighlight(Data=mtcars,
x=mpg,
y=hp, form = ~cyl)

Thank you for everyone's help.

Sam