filter multiple columns by the same condition using dplyr

4,258 views
Skip to first unread message

Jane

unread,
Nov 13, 2014, 1:50:13 PM11/13/14
to davi...@googlegroups.com
Dear all,

I want to filter multiple columns in a data.frame by the same condition using dplyr. The column names follow the pattern of X1, X2, X3...  I tried using regular expression, which I'm not familiar with, to solve this problem. My code is awkward and does not work. 

# Here is a sample data.frame
df.names <- c("ID", paste0("x", 1:10))
df.names
df <- data.frame(matrix(sample(1:110), nrow = 10, ncol = 11))
names(df) <- df.names

library(dplyr)
df.clean <- df %>%
  filter(ID > 5) %>%
# x2 - x10 > 10
  filter(grep("^Recipient[2-6]", names(df), value = TRUE) > 10)

It returned an error "Error: incorrect length (0), expecting: 9"

Does anyone have any idea on how to do this? I don't have to use regular expressions. All approaches are helpful.

Thanks,
Jane

Patrick Kilduff

unread,
Nov 13, 2014, 2:29:29 PM11/13/14
to davi...@googlegroups.com
Hi Jane,

I think the code below will do what you want (sans dplyr and pipes...), which is to return the rows where the "x" columns have values > 10:

set.seed(11)

df_names <- c("ID", paste0("x", 1:10))
df_names

df <- data.frame(matrix(sample(1:110), nrow=10, ncol=11))

names(df)<-df_names

# return only the rows for the "x" columns that have values > 10
df[which(apply(df[-1], 1, function(x) min(x) > 10)),]

# return only the rows for "the_cols" columns that have values > 10
the_cols <- c("x1", "x2", "x3")
df[which(apply(df[,the_cols], 1, function(x) min(x) > 10)),]

I hope that helps and that I understood your question.

Thanks,
Patrick

--
Check out our R resources at http://www.noamross.net/davis-r-users-group.html
---
You received this message because you are subscribed to the Google Groups "Davis R Users' Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to davis-rug+...@googlegroups.com.
Visit this group at http://groups.google.com/group/davis-rug.
For more options, visit https://groups.google.com/d/optout.



--
Patrick Kilduff

Department of Wildlife, Fish and Conservation Biology
University of California Davis

Vince S. Buffalo

unread,
Nov 13, 2014, 2:39:04 PM11/13/14
to davi...@googlegroups.com
Hi Jane,

Because of the non-standard way dplyr evaluates arguments, this won't work. This is a very important thing to remember about dplyr — because of this non-standard evaluation, unfortunately you can't count on R code working the way you think it might.

Patrick's approach works, but from the structure of your question I wonder if maybe the underlying issue might be your data is in a wide format now, and it might be better if it were in a long format. For example:

library(reshape2)
> head(melt(df, id.vars='ID'))
  ID variable value
1 42       x1    33
2 48       x1     2
3 12       x1    91
4 35       x1   105
5 13       x1    88
6 85       x1    90

library(dplyr)
> head(filter(melt(df, id.vars='ID'), ID > 5, value > 10))
  ID variable value
1 42       x1    33
2 12       x1    91
3 35       x1   105
4 13       x1    88
5 85       x1    90
6 47       x1   104

You'd want to assign the melted dataframe to a different variable in your own code. If this fits your particular problem, it might be the cleanest solution. Column names like x1, x2, x3, ... are usually an indication your data needs to be restructured.

HTH,
Vince


Brandon Hurr

unread,
Nov 13, 2014, 2:41:13 PM11/13/14
to davi...@googlegroups.com
I thought I could be sneaky and build up the call using paste, parse and eval, but I think it doesn't work due to lazy evaluation, which I'm less certain how to work with. 

This gives you the thing you want in filter()
paste(names(df)[2:6], "> 10", collapse = ", ")
# "x1 > 10, x2 > 10, x3 > 10, x4 > 10, x5 > 10"

But, Parse doesn't like the commas:
df.clean <- df %>%
filter(ID > 5) %>%
filter(eval(parse(text = paste(names(df)[2:6], ">10", collapse = ", "))))
Error in parse(text = paste(names(list(ID = c(67L, 84L, 20L, 13L, 102L,  : 
  <text>:1:7: unexpected ','
1: x1 >10,
          ^

If you build the entire filter() statement and parse that... 
df.clean <- df %>%
filter(ID > 5) %>%
eval(parse(text = paste("filter(", paste(names(df)[2:6], "> 10", collapse = ", "), ")", sep = "")))
Error in eval(`df %>% filter(ID > 5)`, parse(text = paste("filter(", paste(names(df)[2:6],  : 
  invalid 'envir' argument of type 'expression'

It looks like it's not being evaluated in time. 

You could build what you want by pasting it and then copying it into your filter statement:
paste(names(df)[2:6], "> 10", collapse = ", ")
# "x1 > 10, x2 > 10, x3 > 10, x4 > 10, x5 > 10"
df.clean <- df %>%
filter(ID > 5, x1 > 10, x2 > 10, x3 > 10, x4 > 10, x5 > 10)

This sort of thing could be a nice addition to the syntax though. You can select multiple columns using this sort of syntax:

df.clean <- df %>%
filter(ID > 5) %>%
select(2:6)

Why couldn't this be extended to filter?
#doesn't work
df.clean <- df %>%
filter(ID > 5, 2:6 > 10)

B

Brandon Hurr

unread,
Nov 13, 2014, 2:44:22 PM11/13/14
to davi...@googlegroups.com
Addendum:

If you build the entire statement it does work, but this severely cramps your ability to read the code...
eval(parse(text = paste("df.clean <- df %>% filter(ID > 5) %>% filter(", paste(names(df)[2:6], "> 10", collapse = ", "), ")", sep = "")))
Reply all
Reply to author
Forward
0 new messages