mutate a subset in dplyr

Juan Manuel Truppia

Aug 1, 2014, 3:25:48 PM

to manip...@googlegroups.com

Hi, this sounds quite easy, but I couldn't find how to do it cleanly after searching the docs and the web.

I'm used to updating (or mutating) a subset of a tbl using data.table quite easily like this dt[cond_to_change == TRUE, col_to_change := val]

I tried doing something similar with dplyr, but had to resort to using ifelse, and it brings 2 problems

• Sometimes the value can't be computed for that row (e.g., log of a neg number)
• ifelse sometimes strips classes and is slow

Is there a dplyr way to do this cleanly? Something like a subset argument to mutate?
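
To make the two complaints concrete, here is a minimal base R sketch. The vectors `x` and `d` are made up for illustration and do not come from the thread.

```r
x <- c(-1, 1, 100)
d <- as.Date("2014-08-01") + 0:2

# ifelse() evaluates log10(x) for every element, including the negative one,
# so a "NaNs produced" warning is raised even though that result is discarded.
logged <- suppressWarnings(ifelse(x > 0, log10(x), x))

# ifelse() drops attributes: the Date class is lost and plain numbers come back.
shifted <- ifelse(x > 0, d + 1, d)
class(shifted)

# Base R subset assignment only touches the selected elements and keeps the class.
shifted2 <- d
shifted2[x > 0] <- d[x > 0] + 1
class(shifted2)
```

The last three lines are the base R counterpart of the data.table idiom above: only the selected positions are computed and assigned.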

Hadley Wickham

Aug 1, 2014, 3:28:17 PM

to Juan Manuel Truppia, manipulatr

It's not currently possible.

It's more likely that I'll have a custom version of ifelse() that's
fast (and works how you expect) than to have mutate_partial() or
similar.

Hadley

--
http://had.co.nz/
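
For later readers: releases of dplyr after this thread did gain an `if_else()` function that is strict about types and preserves them. The sketch below assumes such a dplyr version is installed; the data frame is made up for illustration. Note that both forms still compute the replacement for the whole column rather than for the subset only, so they do not address the "value can't be computed for that row" concern.

```r
library(dplyr)

df <- data.frame(x = rep(c("a", "b", "c"), each = 3), v = 1:9)

# Type-strict conditional: both branches must be integer here.
out <- df %>%
  mutate(v = if_else(x == "a", 10L, v))

# Base R replace() inside mutate() is another common way to express the same update.
out2 <- df %>%
  mutate(v = replace(v, x == "a", 10L))

out
```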

David Winsemius

Aug 2, 2014, 5:02:15 PM

to Juan Manuel Truppia, manip...@googlegroups.com

On Aug 1, 2014, at 12:25 PM, Juan Manuel Truppia wrote:

> Hi, this sounds quite easy, but I couldn't find how to do it cleanly after searching the docs and the web.
>
> I'm used to updating (or mutating) a subset of a tbl using data.table quite easily like this dt[cond_to_change == TRUE, col_to_change := val]

The `==TRUE` looks superfluous.

If one were working with dataframes wouldn't this be equivalent to:

df[ with(df, cond_to_change & !is.na(cond_to_change)), "col_to_change" ] <- val
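
A small worked example of that base R idiom, as a sketch only: the data frame and its `flag` and `v` columns are invented for illustration. It uses `which()`, which drops `NA` values from the condition, as an alternative to the explicit `!is.na()` test.

```r
df <- data.frame(flag = c(TRUE, NA, FALSE, TRUE), v = c(1, 2, 3, 4))

# which() returns only the positions where the condition is TRUE,
# so the NA in row 2 is silently excluded.
rows <- which(df$flag)

# Assign to the selected rows of one column, named as a string.
df[rows, "v"] <- 0
df
```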

> I tried doing something similar with dplyr, but had to resort to using ifelse, and it brings 2 problems
> • Sometimes the value can't be computed for that row (e.g., log of a neg number)
> • ifelse sometimes strips classes and is slow
> Is there a dplyr way to do this cleanly? Something like a subset argument to mutate?

I remember a contribution by G.Grothendieck to SO in a question about implementing self-reference outside of data.table:

stackoverflow.com/questions/7768686/r-self-reference/7769296

... that suggestion led me to target subsets in what seemed to be an elegant manner using an infix operator:

http://stackoverflow.com/questions/7768686/r-self-reference/7769296#7769296

So playing around to see if this sort of thing worked with data.tables:

sel.cond.set.val <- function(dt, cond, target, val) {
  eval.parent(substitute(dt[cond, target := val]))
}

DT = data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)

> DT
   x y v
1: a 1 1
2: a 3 2
3: a 6 3
4: b 1 4
5: b 3 5
6: b 6 6
7: c 1 7
8: c 3 8
9: c 6 9

> sel.cond.set.val(DT, x=="a", v, 10)
   x y  v
1: a 1 10
2: a 3 10
3: a 6 10
4: b 1  4
5: b 3  5
6: b 6  6
7: c 1  7
8: c 3  8
9: c 6  9

And it supports partial vector assignment (with a warning if 'val'-data overruns):

> sel.cond.set.val(DT, x=="a", v, 10:1)
   x y  v
1: a 1 10
2: a 3  9
3: a 6  8
4: b 1  4
5: b 3  5
6: b 6  6
7: c 1  7
8: c 3  8
9: c 6  9
Warning message:
In `[.data.table`(DT, x == "a", `:=`(v, 10:1)) :
Supplied 10 items to be assigned to 3 items of column 'v' (7 unused)

That warning message made it look like I was merely rearranging a few symbols in the function call.
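
The "rearranging symbols" impression can be checked directly. This sketch uses a hypothetical helper, `show.call`, that returns the call built by `substitute()` instead of evaluating it. It needs only base R, because the call is constructed but never run, so `DT` does not even have to exist.

```r
# Hypothetical helper: same substitute() step as sel.cond.set.val,
# but the resulting call is returned rather than evaluated.
show.call <- function(dt, cond, target, val) {
  substitute(dt[cond, target := val])
}

built <- show.call(DT, x == "a", v, 10)

# The arguments were spliced in unevaluated, giving the same call
# one would have typed by hand.
identical(built, quote(DT[x == "a", v := 10]))
print(built)
```

`eval.parent()` then evaluates that call in the caller's frame, which is why data.table sees an ordinary `DT[...]` expression.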

I don't think that violates any of the data.table conventions that would make copies. My understanding of the conventions of dplyr is pretty weak. I've formed the apparently incorrect opinion that the goal of dplyr was to convert to a left-to-right passage of data through infix operators, but I guess there aren't filters that modify data.tables or dataframes through input. It's a bit unclear what sort of syntax would be desired. I guess this is further incentive to actually learn dplyr.

Reading the Intro blog at RStudio made me wonder if dplyr is focused only on dataframes. (I thought it was being designed to work with data.tables, but the intro made me think not.)

I tried coming up with a version that could detect DT vs DF

sel.cond.set.val <- function(dobj, cond, col, val) {
  if (is.data.table(dobj)) {
    eval.parent(substitute(dobj[cond, col := val]))
  } else {
    eval.parent(substitute(dobj[with(dobj, cond), col] <- val))
  }
}

But my understanding is that the use of `with` may pose hidden dangers that I don't fully understand.
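
One hedged alternative for the data frame case, as a sketch rather than a drop-in replacement: the hypothetical helper `set.where` below avoids `with()` by evaluating the condition against the data frame explicitly, takes the column name as a string (in the data frame branch above, a bare `col` symbol would be looked up as a variable by `[<-`), and returns a modified copy instead of assigning in the caller's frame. The condition is still evaluated non-standardly, so a name missing from the data frame will be looked up in the calling environment, which is one of the usual concerns with `with()`-style code.

```r
# Hypothetical helper, data frames only: returns a modified copy.
set.where <- function(df, cond, col, val) {
  # Evaluate the unevaluated condition with the data frame's columns in scope;
  # which() drops NA so missing conditions never select a row.
  rows <- which(eval(substitute(cond), df, parent.frame()))
  df[rows, col] <- val
  df
}

df <- data.frame(x = rep(c("a", "b", "c"), each = 3), v = 1:9)
df2 <- set.where(df, x == "a", "v", 10)
df2
```

Because the result is returned rather than assigned in place, the original `df` is left unchanged, which is closer to how dplyr verbs behave than to data.table's `:=`.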

--

David Winsemius
Alameda, CA, USA
