group_by multiple columns more elegantly with dplyr

Daniel Falbel

Apr 28, 2016, 10:35:26 AM
to manipulatr
Suppose I have the following data.frame:

df <- data.frame(
  a.1 = rep(1, 10),
  a.2 = rep(c(1, 2), 5),
  a.3 = rep(c(1, 2), 5),
  a.4 = rep(c(1, 2), 5),
  a.5 = rep(c(1, 2), 5),
  b.1 = runif(10),
  b.2 = runif(10),
  b.3 = runif(10),
  c.1 = runif(10)
)

and I want to aggregate it by many columns. I can do this:

library(dplyr)
df %>%
  group_by(a.1, a.2, a.3, a.4, a.5) %>%
  summarise_each(funs(sum), starts_with("b"))

Or, if I don't want to hard-code all the column names, I can do:

library(stringr)  # provides str_sub()
grp_cols <- names(df)[names(df) %>% str_sub(1, 1) == "a"]
dots <- lapply(grp_cols, as.symbol)
df %>%
  group_by_(.dots = dots) %>%
  summarise_each(funs(sum), starts_with("b"))

But I think it would be great to be able to select the columns to group_by using the same helper functions we can use in select(), like starts_with(), contains(), ends_with(), etc.
So we could use a syntax like this:

df %>%
  group_by(starts_with("a")) %>%
  summarise_each(funs(sum), starts_with("b"))

What do you think?

