passing variable names to dplyr


Roger Bos

Jan 27, 2014, 2:50:59 PM
to manip...@googlegroups.com

All,

I would like to figure out how to pass variable names to the dplyr function mutate.  For example, this works because hp is one of the variable names on mtcars:

mutate(mtcars, scale(hp))

Let's say I want to pass in the target variable instead of hard-coding the name, as follows:

target <- "hp"

mutate(mtcars, scale(target))

That doesn't work.  I read somewhere about using lapply, but that suggestion didn't work for me either:

target <- lapply("hp", as.symbol)

mutate(mtcars, scale(target))

Does anyone know how to do this?

P.S.  I originally posted this on R-help.  I normally would not cross-post, but someone suggested that this was a better place for this question.

Thanks,

Roger

Hadley Wickham

Jan 27, 2014, 2:55:44 PM
to Roger Bos, manipulatr
Hi Roger,

There's currently no built-in support, although you can use existing R
tools to generate a call:

call <- substitute(
  mutate(mtcars, scale(target)),
  list(target = as.name(target))
)
eval(call)

(more info at http://adv-r.had.co.nz/Computing-on-the-language.html)
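
A minimal base-R sketch of what the substitute() step is doing, for readers who want to see it in isolation. The helper name scale_column and the placeholder x are hypothetical and not part of dplyr; the point is only that as.name() turns the string "hp" into the symbol hp, which is then spliced into the expression before it is evaluated against the data frame.

```r
# Hypothetical helper: scale one column of a data frame, chosen by name.
scale_column <- function(df, target) {
  # as.name() converts the string into a symbol; substitute() replaces
  # the placeholder x in the unevaluated call with that symbol,
  # giving scale(hp) when target is "hp".
  expr <- substitute(scale(x), list(x = as.name(target)))
  # Evaluate the call with the data frame's columns in scope.
  eval(expr, df)
}

scaled_hp <- scale_column(mtcars, "hp")
head(scaled_hp[, 1])
```

Passing the string itself does not work because scale("hp") tries to scale a character value rather than the column.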

Figuring out how to express problems like this elegantly (and
efficiently) is on my long-term to-do list for dplyr.

Hadley
> --
> You received this message because you are subscribed to the Google Groups
> "manipulatr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to manipulatr+...@googlegroups.com.
> To post to this group, send email to manip...@googlegroups.com.
> Visit this group at http://groups.google.com/group/manipulatr.
> For more options, visit https://groups.google.com/groups/opt_out.



--
http://had.co.nz/

Roger Bos

Jan 27, 2014, 3:17:37 PM
to manip...@googlegroups.com, Roger Bos
Hadley,

Thanks for the quick reply and the suggestion about substitute.  It works fine for my needs.

Thanks so much for the dplyr package.

Roger

Ryan Kelly

Nov 22, 2014, 9:17:08 AM
to manip...@googlegroups.com, roge...@gmail.com
Thank you! This works great for now.



Christopher Wright

Jul 30, 2015, 12:08:15 AM
to manipulatr, roge...@gmail.com, h.wi...@gmail.com
I wonder if this is the same issue (I'm still coming to grips with the scoping / eval rules in R, and am starting to read The-Book!).

I have a dataframe (main_schools), with one column named "school", and a number of other columns. This dataframe is defined at the top level.

I want to summarise by school, and a number of other columns, so I thought I would write a function:

test <- function(col_name) {
    summarise(group_by(main_schools, school, col_name), freq = n())
}

and then pass in many <col_name> values

but that doesn't work:

> test(facilities)
 Error: unknown column 'col_name'

I expected (wrongly, and I'm trying to educate myself) that <col_name> would be evaluated in the environment of the function, and the value <safe> would be found.

So is my alternative to just copy/paste
    summarise(group_by(main_schools, school, col_name), freq = n())

with a different value for "col_name" each time, like:

    summarise(group_by(main_schools, school, facilities), freq = n())
    summarise(group_by(main_schools, school, transport), freq = n())

etc etc?

dplyr is a wonderful package, and I'm very grateful!

with thanks

Chris

Doug Mitarotonda

Jul 30, 2015, 1:52:08 AM
to Christopher Wright, manipulatr, roge...@gmail.com, Hadley Wickham
Read up on the “_” versions of all of the dplyr functions to understand how to program on the language.

For example (note how most people who use dplyr use the %>% function instead of the nesting you used):

dft <- data_frame(A = sample(1:3, 10, TRUE), B = sample(1:10, 10, TRUE))
col_name <- "A"
dft %>% group_by_(col_name) %>% summarize(freq = n())
Source: local data frame [3 x 2]

  A freq
1 1    2
2 2    4
3 3    4
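
As for why the original function reported an unknown column 'col_name' rather than looking up the value that was passed in, here is a small base-R sketch. The function show_arg is a hypothetical stand-in for a verb that captures its argument unevaluated, as group_by() does; it is not dplyr code.

```r
# Hypothetical stand-in for a verb that captures its argument unevaluated.
show_arg <- function(x) deparse(substitute(x))

# The wrapper passes its parameter straight through, as in the thread.
test <- function(col_name) show_arg(col_name)

# facilities is never evaluated; the inner function only sees the
# symbol col_name, which is the name it then tries to use.
captured <- test(facilities)
print(captured)
```

The captured text is "col_name", which matches the column name in the error message.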


Christopher Wright

Jul 30, 2015, 2:31:03 AM
to manipulatr, roge...@gmail.com, h.wi...@gmail.com, dougmit...@gmail.com
Thanks Doug,

So now we seem to have another scoping / laziness issue:

> test <- function(col_name) {
+     summarise(group_by_(main_schools, school, col_name), freq = n())
+ }
> test("site")

 Error in as.lazy_dots(list(...)) : object 'school' not found

I'm off to read about evaluation (lazy evaluation!) and scoping in R!

Chris

Doug Mitarotonda

Jul 30, 2015, 2:43:57 AM
to Christopher Wright, manipulatr, roge...@gmail.com, Hadley Wickham
Consider what the _ does: it switches *all arguments* to standard evaluation. You can’t mix and match standard and non-standard evaluation (how would R know which to do for each argument?). If you pass “school” as a string, your code should work.

> dft <- data_frame(A = sample(1:3, 10, TRUE), B = sample(1:10, 10, TRUE), C = sample(4:6, 10, TRUE))
> col_name <- "A"
> dft %>% group_by_(col_name, "B") %>% summarize(freq = n())
Source: local data frame [9 x 3]
Groups: A

  A  B freq
1 1  2    1
2 1  6    1
3 1  9    1
4 2  1    1
5 2  2    1
6 2  7    1
7 2 10    1
8 3  4    1
9 3  5    2
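
For anyone reading this thread with a newer dplyr: a sketch of the same idea wrapped in a function, assuming dplyr 1.0 or later, where the underscore verbs such as group_by_() are deprecated. It swaps in group_by(across(all_of(...))) in place of group_by_(). The data frame below is an invented stand-in for main_schools, which the thread never shows, and count_by is a hypothetical name.

```r
library(dplyr)

# Invented stand-in for main_schools; the real data is not in the thread.
main_schools <- data.frame(
  school     = c("a", "a", "b", "b", "b"),
  facilities = c("gym", "gym", "gym", "pool", "pool"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count rows per school and per one further column,
# where that column is named by a string.
count_by <- function(df, col_name) {
  df %>%
    # all_of() selects columns from a character vector of names.
    group_by(across(all_of(c("school", col_name)))) %>%
    summarise(freq = n(), .groups = "drop")
}

counts <- count_by(main_schools, "facilities")
print(counts)
```

Calling count_by(main_schools, "transport") and so on then avoids the copy/paste, provided those columns exist in the data.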