using a for-loop to reassign selected values within a data frame

Bonnie Dixon

Jun 21, 2014, 11:38:56 PM
to davi...@googlegroups.com
This seems like it should be simple, so I'm not sure why I am having trouble.  

I have a data frame of character strings, like this one, in which all values should say "ab", but some of the b's are missing, so I need to correct those values.  

df1 <- data.frame(A=c("ab", "a", "ab"), B=c("a", "ab", "a"))

Luckily, I have a list of the row indices where these errors occur in each column of the data frame.  

l1 <- list(A=2, B=c(1,3))
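If the erroneous values are always exactly "a" (an assumption; the real data may contain other malformed strings), a list like l1 does not have to be typed by hand. A minimal sketch that builds it with which() on each column (l1_auto is a hypothetical name):

```r
# Example data from the post; stringsAsFactors = FALSE keeps the columns as
# character vectors rather than factors (the default before R 4.0.0).
df1 <- data.frame(A = c("ab", "a", "ab"), B = c("a", "ab", "a"),
                  stringsAsFactors = FALSE)

# Assumption: a value needs fixing exactly when it equals "a".
# lapply() walks over the columns; which() returns the matching row indices.
l1_auto <- lapply(df1, function(x) which(x == "a"))
l1_auto
```

The result has the same shape as l1 (a list named by column), except that which() returns integer indices rather than doubles.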

I tried to write a for-loop to iterate through the columns and paste in the necessary "b" at each of the rows indicated by the list:

for(i in c(1:2)) {
  df1$i[l1$i] <- paste0(df1$i[l1$i], "b")
}

But my for-loop is returning an error:

Error in `$<-.data.frame`(`*tmp*`, "i", value = character(0)) : 
  replacement has 0 rows, data has 3

What am I doing wrong?

Bonnie

Bonnie Dixon

Jun 21, 2014, 11:41:20 PM
to davi...@googlegroups.com
Oops. That for-loop should be:

for(i in names(l1)) {
  df1$i[l1$i] <- paste0(df1$i[l1$i], "b")
}

Bonnie


--
Check out our R resources at http://www.noamross.net/davis-r-users-group.html
---
You received this message because you are subscribed to the Google Groups "Davis R Users' Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to davis-rug+...@googlegroups.com.
Visit this group at http://groups.google.com/group/davis-rug.
For more options, visit https://groups.google.com/d/optout.

Vince S. Buffalo

Jun 22, 2014, 12:29:23 AM
to davi...@googlegroups.com
Hi Bonnie,

I might just use ifelse here:

> d <- data.frame(A=c("a", "ab", "a", "a", "c"), stringsAsFactors=FALSE)
> d$Afixed <- ifelse(d$A == "a", "ab", d$A)
> d
   A Afixed
1  a     ab
2 ab     ab
3  a     ab
4  a     ab
5  c      c
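For a single column, ifelse() gives the same result as overwriting only the elements picked out by a logical index. A small sketch comparing the two on the example data above (the column name Afixed2 is just for illustration):

```r
d <- data.frame(A = c("a", "ab", "a", "a", "c"), stringsAsFactors = FALSE)

# ifelse() version: builds a complete new vector.
d$Afixed <- ifelse(d$A == "a", "ab", d$A)

# Logical-index version: copy the column, then overwrite only rows equal to "a".
d$Afixed2 <- d$A
d$Afixed2[d$A == "a"] <- "ab"

d
```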

Hope this helps,
Vince


--
Vince Buffalo
Ross-Ibarra Lab (www.rilab.org)
Plant Sciences, UC Davis

Bonnie Dixon

Jun 22, 2014, 3:21:32 PM
to davi...@googlegroups.com
That approach still returned an error when I tried to automate it within a for-loop. (I am trying to automate this process because the real data frame I need to manipulate has many columns that need to be changed, as well as some columns that I don't want to change.)

df1 <- data.frame(A=c("ab", "a", "ab"), B=c("a", "ab", "a"),
                  stringsAsFactors=FALSE)
l1 <- list(A=2, B=c(1,3))

for(i in names(l1)) {
  df1$i <- ifelse(df1$i=="a", "ab", df1$i)
}

Error in `$<-.data.frame`(`*tmp*`, "i", value = logical(0)) : 
  replacement has 0 rows, data has 3

But I have figured out what was wrong. Apparently $ indexing was the problem: when I switch to bracket indexing within the for-loop, either this approach or my original approach works fine.

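# Approach 1: ifelse() on every column named in l1
for(i in names(l1)) {
  df1[ , i] <- ifelse(df1[ , i]=="a", "ab", df1[ , i])
}

# Approach 2, an alternative to approach 1 (running both would give "abb")
for(i in names(l1)) {
  df1[l1[[i]], i] <- paste0(df1[l1[[i]], i], "b")
}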

The same seems to be true for the apply() statements I had tried: they work with bracket indexing, but not with the dollar-sign operator. If anyone can explain why $ doesn't work within a for-loop or apply() statement, I'd love to know.
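For reference, an apply-style version of the original paste0() approach works once columns are looked up by name with [ ] instead of $. This sketch uses Map() (a wrapper around mapply() with SIMPLIFY = FALSE) to walk over each selected column and its vector of bad row indices in parallel; fix_rows is a hypothetical helper name:

```r
df1 <- data.frame(A = c("ab", "a", "ab"), B = c("a", "ab", "a"),
                  stringsAsFactors = FALSE)
l1 <- list(A = 2, B = c(1, 3))

# Hypothetical helper: append "b" to the listed rows of one column vector.
fix_rows <- function(col, rows) {
  col[rows] <- paste0(col[rows], "b")
  col
}

# Map() pairs the columns selected by names(l1) with the index vectors in l1
# by position, and the resulting list replaces those columns.
df1[names(l1)] <- Map(fix_rows, df1[names(l1)], l1)
df1
```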

Bonnie

Vince S. Buffalo

Jun 22, 2014, 3:31:47 PM
to davi...@googlegroups.com
Hi Bonnie,

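Sorry, I missed that you needed to apply this to many columns. Yes, the dollar-sign accessor will not work for this, because it interprets i literally: it looks for a column named "i", not the column whose name is stored in i. Since there is no such column, df1$i is NULL, so the value that ends up being assigned has length zero (the character(0) and logical(0) in your error messages), which is why R complains that the replacement has 0 rows. You do need to use square brackets (df1[ , i] or df1[[i]]) when i is a variable holding a column name as a character string.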
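A short sketch of the difference, using the data from Bonnie's second message. $ does not evaluate the name after it, so df1$i asks for a column literally called "i"; [[ ]] evaluates i first:

```r
df1 <- data.frame(A = c("ab", "a", "ab"), B = c("a", "ab", "a"),
                  stringsAsFactors = FALSE)
l1 <- list(A = 2, B = c(1, 3))

i <- "A"
is.null(df1$i)              # TRUE: $ looks for a column named "i"
identical(df1[[i]], df1$A)  # TRUE: [[ ]] uses the value of i, i.e. "A"

# The original loop, rewritten with [[ ]] for both the column and the index list.
for (i in names(l1)) {
  df1[[i]][l1[[i]]] <- paste0(df1[[i]][l1[[i]]], "b")
}
df1
```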

Vince

Vince S. Buffalo

Jun 22, 2014, 3:40:33 PM
to davi...@googlegroups.com
Also, I think this might be the cleanest way to do it:

>  d <- data.frame(A=c("a", "ab", "a", "a", "c"), B=c("a", "x", "ab", "a", "a"), C=runif(5), stringsAsFactors=FALSE)
> bad_cols <- c("A", "B")
> dcopy <- d
> dcopy[bad_cols] <- lapply(dcopy[bad_cols], function(x) ifelse(x == "a", "ab", x))
> dcopy
   A  B          C
1 ab ab 0.44395754
2 ab  x 0.04420908
3 ab ab 0.25378545
4 ab ab 0.99977372
5  c ab 0.06727521

I've used a copy of the original data frame so it can be compared with the original (since this overwrites the existing columns).

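This solution exploits the fact that data.frames are lists under the hood: a list of equal-length column vectors with a "data.frame" class attribute. That is why lapply() over dcopy[bad_cols] visits one column at a time, and why assigning the resulting list back to dcopy[bad_cols] replaces those columns. They need to be lists because data.frames can hold columns of different data types, and lists are R's primary structure for storing heterogeneous data.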

> is.list(d)
[1] TRUE
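A few more quick checks that illustrate the list structure: the length of a data frame is its number of columns, [[ ]] returns a column vector just like $, single brackets with column names return a smaller data frame, and sapply()/lapply() iterate over columns. This sketch uses fixed numbers in column C instead of runif() so the results are reproducible:

```r
# Fixed values in column C (instead of runif()) so the output is reproducible.
d <- data.frame(A = c("a", "ab", "a", "a", "c"),
                B = c("a", "x", "ab", "a", "a"),
                C = c(0.1, 0.2, 0.3, 0.4, 0.5),
                stringsAsFactors = FALSE)

typeof(d)                 # "list": the underlying storage is a list
length(d) == ncol(d)      # TRUE: one list element per column
identical(d[["A"]], d$A)  # TRUE: [[ ]] and $ return the same column vector
class(d[c("A", "B")])     # "data.frame": [ ] with names keeps the structure
sapply(d, class)          # sapply() visits the columns one at a time
```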

HTH,
Vince

Bonnie Dixon

Jun 22, 2014, 5:04:26 PM
to davi...@googlegroups.com
Yes, that is a great way to do it. For my real task I needed to replace the ifelse() statement with gsub() and a regular expression, but I got it to work, and that's what counts!
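The actual regular expression isn't shown in the thread, so the pattern below is only a hypothetical example of how gsub() slots into Vince's lapply() pattern. Anchoring with ^ and $ means only values that are exactly "a" become "ab", while values such as "ab" or "ca" are left alone:

```r
d <- data.frame(A = c("a", "ab", "a", "a", "c"),
                B = c("a", "x", "ab", "a", "ca"),
                C = c(0.1, 0.2, 0.3, 0.4, 0.5),
                stringsAsFactors = FALSE)
bad_cols <- c("A", "B")

# Hypothetical pattern: "^a$" matches a value consisting of exactly "a".
# Column C is not in bad_cols, so it is left unchanged.
d[bad_cols] <- lapply(d[bad_cols], function(x) gsub("^a$", "ab", x))
d
```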

Thanks also for the explanation regarding the $ accessor, Vince.  That will be easier to remember now that I know the reason why it didn't work.

Bonnie