Conditional replacement of values in a data.frame

Gouri Shankar Mishra

Jul 3, 2014, 12:31:22 AM
to davi...@googlegroups.com
Hi All

I want to conditionally replace the values of a vector (df$var2) based on the values of df$var1. Usually, I would write that as

df$var2 [df$var1 == "x"] <- df$var3 [df$var1 == "x"]

However, the above code looks at the value of var1 in the same row. I want to look at either a later row or a prior row, and which of the two is itself conditional. Which functions should I look at?

Specifically, in the attached Excel file, var2 gives the purpose of a trip, and var1 identifies whether a location is an "anchor" location.
  • Travel1 (Activity #4) is a home-to-office trip, hence the travel purpose (var2) is based on the activity at the office (var3, "work").
  • Similarly, Travel2 (Activity #6) is a trip to work.
  • In contrast, Travel3 (Activity #10) depends on a prior value of var2 because the destination location is Home. Hence the travel purpose is based on the activity at the prior anchor location (work in office).
DRUGJul2.xlsx

Noam Ross

Jul 3, 2014, 12:19:02 PM
to Davis R Users Group

Tricky! To clarify, I think the problem is the following:

TravelPurpose should be “Work” IF

  • The NEXT location which is an anchor location is “Office” OR
  • The PREVIOUS location which is an anchor location is “Office”

This seems like a good place to break out the old for loop, as each iteration depends on values before and after it. There may be a more elegant vectorized solution, but this works:

# Load data and create an empty var1. Read strings as characters, since factors
# will mess up the logic below
df = read.csv("DRUGJul2.csv", stringsAsFactors=FALSE)
df$var1 = character(nrow(df))   # one empty string per row, not a hard-coded 11

# Create vectors with the indices of anchor locations and travel activities
anchors = which(df$var2==1) 
travels = which(df$var3=="Travel") 

#Iterate through "travel" locations
for(i in travels) {
  last_anchor = df$location[max(anchors[anchors < i])]  #Get last anchor loc
  next_anchor = df$location[min(anchors[anchors > i])]  #Get next anchor loc
  if("Office" %in% c(last_anchor, next_anchor)) {       #Are either "Office"?
    df$var1[i] = "Work"
  }
}

You’ll get warnings from max() or min() if a “Travel” row has no anchor before it or after it (for example, “Travel” in the first or last row), but it should still work.
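
For anyone who wants to try the vectorized route, here is a minimal sketch of the same rule using base R's findInterval(). The toy data frame is made up for illustration, since the attachment isn't reproduced here; only the column names (location, var1, var2, var3) follow the loop above. It also assumes a "Travel" row with no anchor on one side should simply be judged on the other side.

```r
# Hypothetical toy data (values invented; column names follow the loop above)
df <- data.frame(
  location = c("Road", "Home", "Road", "Office", "Road",
               "Cafe", "Road", "Home", "Road", "Gym"),
  var2     = c(0, 1, 0, 1, 0, 0, 0, 1, 0, 1),          # 1 = anchor location
  var3     = c("Travel", "Stay", "Travel", "Work", "Travel",
               "Eat", "Travel", "Stay", "Travel", "Exercise"),
  stringsAsFactors = FALSE
)
df$var1 <- character(nrow(df))

anchors <- which(df$var2 == 1)
travels <- which(df$var3 == "Travel")

# For each travel row: position within `anchors` of the last anchor strictly
# before it (0 if none) and of the first anchor strictly after it.
prev_idx <- findInterval(travels, anchors, left.open = TRUE)
next_idx <- findInterval(travels, anchors) + 1L

# Mark "no such anchor" as NA so the lookups below return NA instead of failing
prev_idx[prev_idx == 0L] <- NA
next_idx[next_idx > length(anchors)] <- NA

prev_loc <- df$location[anchors[prev_idx]]
next_loc <- df$location[anchors[next_idx]]

# %in% treats NA as "not Office", so edge rows need no special handling
is_work <- (prev_loc %in% "Office") | (next_loc %in% "Office")
df$var1[travels[is_work]] <- "Work"

print(df)
```

On this toy data, rows 3, 5 and 7 are marked "Work"; rows 1 and 9 stay empty because neither neighbouring anchor is "Office". The left.open argument to findInterval() needs R 3.3.0 or later.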


Gouri Shankar Mishra

Jul 5, 2014, 9:45:02 PM
to davi...@googlegroups.com
Thanks Noam