On Aug 1, 2014, at 12:25 PM, Juan Manuel Truppia wrote:
> Hi, this sounds quite easy, but I couldn't find how to do it cleanly after searching the docs and the web.
>
> I'm used to updating (or mutating) a subset of a tbl using data.table quite easily like this dt[cond_to_change == TRUE, col_to_change := val]
The `==TRUE` looks superfluous.
If one were working with dataframes, wouldn't this be equivalent to:
df[ with(df, cond_to_change & !is.na(cond_to_change)), "col_to_change" ] <- val
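A minimal sketch of that dataframe idiom, using made-up data (the values in `df` and `val` are assumptions for illustration only). The `!is.na()` guard matters because a logical row index containing NA is rejected in a dataframe subset assignment:

```r
# Hypothetical data: one TRUE, one NA, one FALSE in the condition column
df <- data.frame(cond_to_change = c(TRUE, NA, FALSE),
                 col_to_change  = c(1, 2, 3))
val <- 0

# NA & FALSE is FALSE, so rows with a missing condition are left untouched
keep <- with(df, cond_to_change & !is.na(cond_to_change))
df[keep, "col_to_change"] <- val

df$col_to_change
```

Only the first row is changed; the row with the missing condition keeps its original value.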
> I tried doing something similar with dplyr, but had to resort to using ifelse, and it brings 2 problems
> • Sometimes the value can't be computed for that row (e.g., log of a neg number)
> • ifelse sometimes strips classes and is slow
> Is there a dplyr way to do this cleanly? Something like a subset argument to mutate?
I remember a contribution by G.Grothendieck to SO in a question about implementing self-reference outside of data.table:
stackoverflow.com/questions/7768686/r-self-reference/7769296
... that suggestion led me to target subsets in what seemed to be an elegant manner using an infix operator:
http://stackoverflow.com/questions/7768686/r-self-reference/7769296#7769296
So, playing around to see if this sort of thing worked with data.tables:
library(data.table)
sel.cond.set.val <- function(dt, cond, target, val) {
  eval.parent(substitute(dt[cond, target := val]))
}
DT = data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)
> DT
x y v
1: a 1 1
2: a 3 2
3: a 6 3
4: b 1 4
5: b 3 5
6: b 6 6
7: c 1 7
8: c 3 8
9: c 6 9
> sel.cond.set.val(DT, x=="a", v, 10)
x y v
1: a 1 10
2: a 3 10
3: a 6 10
4: b 1 4
5: b 3 5
6: b 6 6
7: c 1 7
8: c 3 8
9: c 6 9
And it supports partial vector assignment (with a warning if 'val'-data overruns):
> sel.cond.set.val(DT, x=="a", v, 10:1)
x y v
1: a 1 10
2: a 3 9
3: a 6 8
4: b 1 4
5: b 3 5
6: b 6 6
7: c 1 7
8: c 3 8
9: c 6 9
Warning message:
In `[.data.table`(DT, x == "a", `:=`(v, 10:1)) :
Supplied 10 items to be assigned to 3 items of column 'v' (7 unused)
Judging from the call echoed in that warning message, the function is merely rearranging a few symbols in the function call.
I don't think that violates any of the data.table conventions in a way that would make copies. My understanding of the conventions of dplyr is pretty weak. I had formed the apparently incorrect opinion that the goal of dplyr was to convert to a left-to-right passage of data through infix operators, but I guess there aren't filters that modify the data.tables or dataframes passed in as input. It's a bit unclear what sort of syntax would be desired. I guess this is further incentive to actually learn dplyr.
Reading the intro blog post at RStudio made me wonder whether dplyr is focused only on dataframes. (I thought it was being designed to work with data.tables, but the intro made me think not.)
I tried coming up with a version that could detect DT vs DF:
sel.cond.set.val <- function(dobj, cond, col, val) {
  if (is.data.table(dobj)) {
    eval.parent(substitute(dobj[cond, col := val]))
  } else {
    col <- deparse(substitute(col))  # dataframes need the column name as a string
    eval.parent(substitute(dobj[with(dobj, cond), col] <- val))
  }
}
But my understanding is that the use of `with` may pose hidden dangers that I don't fully understand.
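One commonly cited hazard of `with`, offered as an illustration rather than a full account of the risks: when a name in the expression is not a column of the dataframe, `with` silently falls back to a variable of that name in the calling environment. The data and the variable `flag` below are made up for the example.

```r
df <- data.frame(x = c("a", "b", "c"), v = 1:3)
flag <- c(TRUE, FALSE, TRUE)   # unrelated variable that happens to exist

# 'flag' is not a column of df, so with() quietly uses the variable above
sel <- with(df, flag)
df[sel, "v"] <- 0L

df$v
```

No error or warning is raised, so a misspelled column name can select the wrong rows without any sign of trouble.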
--
David Winsemius
Alameda, CA, USA