Hello Folks,

I'm trying to vectorize a loop that processes rows of a dataframe. It
involves lots of conditionals, such as "If column 10 == 3, and if column
3 is True, and both column 5 and 6 are False, then set column 4 to True".

So, for example, any ideas about vectorizing the following?

df = data.frame( list(a=c(1,2,3,4), b=c("a","b","c","d"), c=c(T,F,T,F),
                      d=NA, e=c(F,F,T,T)) )

for (i in 1:nrow(df)) {
    if (df[i,3] %in% c(FALSE,NA) & (df[i,1] > 2 | df[i,5]) ) {
        df[i,4] = 1
    }
    if (df[i,5] %in% c(TRUE, NA) & df[i,2] == "b") {
        df[i,4] = 2
        df[i,5] = T
    }
}

Thanks,
Allie
______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Your code attempts to do some things with NA that won't behave the way
you expect them to. Specifically, you cannot use %in% to test for NA,
and you cannot give the "if" function an NA. It only appears to work
because you don't actually give it a complete set of test values
consistent with your tests in the loop. My guess at your intent is:

df <- data.frame( list( a=c(1,2,3,4,5)
                      , b=c("a","b","c","d","e")
                      , c=c(TRUE,FALSE,TRUE,FALSE,NA)
                      , d=NA
                      , e=c(FALSE,FALSE,TRUE,TRUE,NA)
                      ) )
tmpdf <- df   # keep a copy of the input so both versions start from the same data

# loop version
for (i in 1:nrow(df)) {
    if ( ( is.na(df[i,3]) || !df[i,3] )
         && ( df[i,1] > 2 || is.na(df[i,5]) || df[i,5] ) ) {
        df[i,4] <- 1
    }
    if ( ( is.na(df[i,5]) || df[i,5] ) && df[i,2] == "b" ) {
        df[i,4] <- 2
        df[i,5] <- TRUE
    }
}
df2 <- df     # result of the loop version
df <- tmpdf   # restore the input

# vectorized version, with intermediate logical vectors for clarity
# (is.na(df[[5]]) is included in tmp so that it matches the loop test
# and never contains NA, which subscripted assignment would reject)
tmp <- ( is.na(df[[3]]) | !df[[3]] ) & ( df[[1]] > 2 | is.na(df[[5]]) | df[[5]] )
tmp2 <- ( is.na(df[[5]]) | df[[5]] ) & df[[2]] == "b"
df[ tmp, "d" ] <- 1
df[ tmp2, "d" ] <- 2
df[ tmp2, "e" ] <- TRUE

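
One way to gain confidence in a vectorized rewrite is to run it and the loop on the same input and compare the results. The following is only a sketch of that check: the helper names make_df, by_loop and by_vector are made up for illustration, and it assumes columns a and b never contain NA.

```r
# Hypothetical helpers for illustration; not part of the original code.
make_df <- function() {
  data.frame(
    a = c(1, 2, 3, 4, 5),
    b = c("a", "b", "c", "d", "e"),
    c = c(TRUE, FALSE, TRUE, FALSE, NA),
    d = NA,
    e = c(FALSE, FALSE, TRUE, TRUE, NA),
    stringsAsFactors = FALSE
  )
}

# Row-by-row version: every test is reduced to a single TRUE/FALSE
# before it reaches if().
by_loop <- function(df) {
  for (i in seq_len(nrow(df))) {
    c_na_or_false <- is.na(df[i, "c"]) || !df[i, "c"]
    e_na_or_true  <- is.na(df[i, "e"]) || df[i, "e"]
    if (c_na_or_false && (df[i, "a"] > 2 || e_na_or_true)) {
      df[i, "d"] <- 1
    }
    if (e_na_or_true && df[i, "b"] == "b") {
      df[i, "d"] <- 2
      df[i, "e"] <- TRUE
    }
  }
  df
}

# Whole-column version: the same tests written with | and &,
# applied in the same order as in the loop.
by_vector <- function(df) {
  c_na_or_false <- is.na(df$c) | !df$c
  e_na_or_true  <- is.na(df$e) | df$e
  cond1 <- c_na_or_false & (df$a > 2 | e_na_or_true)
  cond2 <- e_na_or_true & df$b == "b"
  df[cond1, "d"] <- 1
  df[cond2, "d"] <- 2
  df[cond2, "e"] <- TRUE
  df
}

res_loop <- by_loop(make_df())
res_vec  <- by_vector(make_df())
same     <- isTRUE(all.equal(res_loop, res_vec))
```

If same is FALSE, all.equal(res_loop, res_vec) on its own reports which columns differ.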
---------------------------------------------------------------------------
Jeff Newmiller
DCN:<jdne...@dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
Hi.
Try the following.
cond1 <- (df[,3] %in% c(FALSE,NA)) & (df[,1] > 2 | df[,5])
df[,4] <- ifelse(cond1, 1, df[,4])
cond2 <- (df[,5] %in% c(TRUE, NA)) & (df[,2] == "b")
df[,4] <- ifelse(cond2, 2, df[,4])
df[,5] <- ifelse(cond2, TRUE, df[,5])
Hope this helps.
Petr Savicky.
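
A note on the ifelse() approach when the condition itself can be NA: ifelse() returns NA wherever its test is NA, so an existing value in the target column would be overwritten with NA there. The sketch below uses made-up vectors (d, cond) rather than the data frame from the thread, and shows one possible guard that treats an undecided test as FALSE.

```r
# Made-up illustration data, not taken from the thread.
d    <- c(10, 20, 30)
cond <- c(TRUE, FALSE, NA)

# ifelse() yields NA where the test is NA, so the old value 30 is lost.
naive <- ifelse(cond, 1, d)

# Treating an NA test as FALSE keeps the old value in that position.
guarded <- ifelse(!is.na(cond) & cond, 1, d)
```

Whether NA should count as "condition not met" depends on the data; the guard just makes that choice explicit.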
> On Tue, 7 Feb 2012, Alexander Shenkin wrote:
>
>> [...]
>
> Your code attempts to do some things with NA that won't behave the way
> you expect them to. Specifically, you cannot use %in% to test for NA,
Huh?
> NA %in% NA
[1] TRUE
> NA %in% c(5, NA)
[1] TRUE
> NA %in% c(5, 6)
[1] FALSE
--
David.
David Winsemius, MD
West Hartford, CT
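
For comparison with ==, here is a small sketch of why %in% never returns NA: it is built on match(), which looks NA up like any other value. The variable names are arbitrary.

```r
x <- c(TRUE, FALSE, NA)

# == propagates NA, so the last element of the result is NA.
eq_result <- x == FALSE

# %in% looks each element up in the table, NA included,
# so every element of the result is TRUE or FALSE.
in_result <- x %in% c(FALSE, NA)

# The same test spelled with match() directly.
match_result <- match(x, c(FALSE, NA), nomatch = 0L) > 0L
```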
Sorry, SQL rules bleeding through... %in% is clearly more forgiving in R
than IN is in SQL. However, the second if did check whether df[i,5] was
NA, yet the first if did not. Since a comparison with NA is neither FALSE
nor TRUE, that test fails whenever its result comes out as NA.
> NA | 1
[1] TRUE
> NA & 1
[1] NA
> NA > 1
[1] NA
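
To make the consequence for if() concrete, here is a minimal sketch (variable names are arbitrary): an NA condition is an error, and isTRUE() is one way to turn "unknown" into FALSE before the test.

```r
flag <- NA

# if() needs a single TRUE or FALSE; an NA condition raises an error,
# which is captured here as a message string instead of stopping the script.
msg <- tryCatch(
  {
    if (flag > 1) "yes" else "no"
  },
  error = function(e) conditionMessage(e)
)

# | and & only return NA when the answer really depends on the unknown value.
or_known  <- NA | TRUE    # TRUE whatever the NA stands for
and_known <- NA & FALSE   # FALSE whatever the NA stands for

# isTRUE() maps NA to FALSE, so if() gets something it can use.
safe <- if (isTRUE(flag > 1)) "yes" else "no"
```

An explicit is.na() check, as in the rewritten loop earlier in the thread, is the alternative when NA should count as TRUE rather than FALSE.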