Subsetting a data frame by criteria: exclude rows or columns whose sum equals zero; select rows whose names contain certain characters


Kousei Perales

Dec 10, 2014, 8:48:44 PM
to davi...@googlegroups.com
Hi gang,

New to the R User's Group and have heard only good things about the community.

So here is my issue. I would like to subset the rows and columns that meet a certain condition into a new data frame. If the sum of a row or column equals 0, it should not be included in the new data frame. Additionally, I would like to subset the data based on a criterion applied to the rownames (or I guess it could be just a site name in a column). This seems like a fairly straightforward problem, but I am pretty new to R... Any help would be greatly appreciated. I hope the code below makes it clear what I am asking.

My data looks something like the data frame produced below.

col1 <- c(1,0,0,2,5,1,0)
col2 <- c(0,0,1,1,1,0,0)
col3 <- c(0,0,0,0,0,0,0)
col4 <- c(0,0,1,1,0,1,0)
testdataframe <- data.frame(col1, col2, col3, col4)
rownames(testdataframe) <- c("CA1","CA2","CA3","CA4","LN1","LN2","LN3")

testdataframe_subset <- testdataframe[-(if sum of row = 0, exclude),-(if sum of column = 0, exclude)]

So, testdataframe_subset would look like testdataframe[c(1,3:6),c(1,2,4)]

testdataframe_subset_ca <- testdataframe_subset[(if first two letters in rowname = "CA")]

So, testdataframe_subset_ca would look like testdataframe_subset[c(1:3),]

Perhaps the solution is something like all[which(condition)] but I am not sure.

Noam Ross

Dec 10, 2014, 11:32:56 PM
to davi...@googlegroups.com

Kousei,

Your instincts are very close. Here’s a command for excluding zero-sum rows or columns.

testdataframe_subset = testdataframe[!(rowSums(testdataframe) == 0), !(colSums(testdataframe) == 0)]

The colSums and rowSums do what you’d expect. Note that you don’t need which here. which translates the vector of TRUE/FALSE results returned by a comparison such as x == 0 into the positions of the TRUE values, but you can subset with TRUE/FALSE values as well as with numbers. Note that ! reverses the TRUE/FALSE values.
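
To make the TRUE/FALSE point concrete, here is a small sketch that rebuilds the data frame from the original post and checks that subsetting with logical vectors and with which() picks the same rows and columns. The names keep_rows, keep_cols, by_logical and by_index are only illustrative. drop = FALSE is an extra precaution that is not part of the command above; it keeps the result a data frame even if only one column survives.

```r
# Rebuild the example data from the original post
testdataframe <- data.frame(
  col1 = c(1, 0, 0, 2, 5, 1, 0),
  col2 = c(0, 0, 1, 1, 1, 0, 0),
  col3 = c(0, 0, 0, 0, 0, 0, 0),
  col4 = c(0, 0, 1, 1, 0, 1, 0)
)
rownames(testdataframe) <- c("CA1", "CA2", "CA3", "CA4", "LN1", "LN2", "LN3")

# Logical vectors: TRUE where the row/column sum is not zero
keep_rows <- rowSums(testdataframe) != 0
keep_cols <- colSums(testdataframe) != 0

# Subset directly with the logical vectors
by_logical <- testdataframe[keep_rows, keep_cols, drop = FALSE]

# Same subset via which(), which turns the TRUE positions into integer indices
by_index <- testdataframe[which(keep_rows), which(keep_cols), drop = FALSE]

rownames(by_logical)  # "CA1" "CA3" "CA4" "LN1" "LN2"
colnames(by_logical)  # "col1" "col2" "col4"

# Both approaches select the same rows and columns
identical(rownames(by_logical), rownames(by_index))
identical(colnames(by_logical), colnames(by_index))
```

Note that both CA2 and LN3 sum to zero in this example, so both are dropped.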

To subset by rowname:

testdataframe_subset_ca = testdataframe_subset[grepl("^CA", rownames(testdataframe_subset)),]

grepl searches for its first argument (the pattern) in each value of the second argument and returns TRUE/FALSE values. Note that "^CA" is a regular expression meaning “starts with CA”. See ?regex for more details.
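
As a rough illustration of what the ^ anchor changes, the sketch below runs grepl on a made-up vector of site names ("XCA9" is a hypothetical name added only to show the difference). It also shows a comparison with substr as one possible alternative that avoids regular expressions; that alternative is a suggestion, not something the command above requires.

```r
# Hypothetical site names, similar to the row names in the original post
sites <- c("CA1", "CA3", "CA4", "LN1", "LN2", "XCA9")

# "^CA" anchors the match at the start of the string, so "XCA9" does not match
starts_with_ca <- grepl("^CA", sites)

# Without the anchor, "CA" is matched anywhere in the string
contains_ca <- grepl("CA", sites)

# Alternative without a regular expression: compare the first two characters
first_two_ca <- substr(sites, 1, 2) == "CA"

sites[starts_with_ca]  # "CA1" "CA3" "CA4"
sites[contains_ca]     # "CA1" "CA3" "CA4" "XCA9"
identical(starts_with_ca, first_two_ca)  # TRUE
```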

Noam



Kousei Perales

Dec 11, 2014, 12:07:50 AM
to davi...@googlegroups.com
Beautiful! Thank you for the solution! Additionally, thank you for providing some background and recommendations for help pages. 