group_by multiple columns more elegantly with dplyr

Daniel Falbel

Apr 28, 2016, 10:35:26 AM

to manipulatr

Suppose I have the following data.frame:

df <- data.frame(
  a.1 = rep(1, 10),
  a.2 = rep(c(1,2), 5),
  a.3 = rep(c(1,2), 5),
  a.4 = rep(c(1,2), 5),
  a.5 = rep(c(1,2), 5),
  b.1 = runif(10),
  b.2 = runif(10),
  b.3 = runif(10),
  c.1 = runif(10)
)

and I want to aggregate it by many columns. I can do this:

library(dplyr)
df %>% group_by(a.1, a.2, a.3, a.4, a.5) %>%
  summarise_each(funs(sum), starts_with("b"))

Or, if I don't want to hard-code all the column names, I can do:

library(stringr)  # str_sub() comes from stringr, not dplyr
grp_cols <- names(df)[names(df) %>% str_sub(1,1) == "a"]
dots <- lapply(grp_cols, as.symbol)
df %>%
  group_by_(.dots=dots) %>%
  summarise_each(funs(sum), starts_with("b"))

But I think it would be great to be able to select the columns to group_by using the same helper functions we can use in select(), like starts_with(), contains(), ends_with(), etc.
So we could use a syntax like this:

df %>% group_by(starts_with("a")) %>%
  summarise_each(funs(sum), starts_with("b"))
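
For comparison, here is a rough sketch of how this kind of helper-based grouping can be written, assuming dplyr 1.0.0 or later, where across() accepts the same select helpers. The smaller data frame and the set.seed() call are only there to keep the example short and reproducible; they are not part of the original problem.

```r
library(dplyr)

set.seed(1)  # assumption: only used to make the example reproducible

# Cut-down version of the data.frame above (fewer columns, same idea)
df <- data.frame(
  a.1 = rep(1, 10),
  a.2 = rep(c(1, 2), 5),
  b.1 = runif(10),
  b.2 = runif(10),
  c.1 = runif(10)
)

# Assumes dplyr >= 1.0.0: across() takes select helpers such as starts_with()
res <- df %>%
  group_by(across(starts_with("a"))) %>%                      # group by every "a" column
  summarise(across(starts_with("b"), sum), .groups = "drop")  # sum every "b" column

res
```

The result should have one row per combination of the "a" columns and one summed column per "b" column; c.1 is dropped because it is neither grouped by nor summarised.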