Re: dplyr: mutate with lag and group_by creates only one NA


Hadley Wickham

Jan 21, 2014, 5:56:57 PM
to Vincent, manipulatr
I get what I expect from dplyr:

Source: local data frame [12 x 3]
Groups: name

   name time lag_time
1  John    1       NA
2  John    2        1
3  John    3        2
4  Pete    1       NA
5  Pete    2        1
6  Pete    3        2
7  Pete    4        3
8   Rob    1       NA
9   Rob    2        1
10  Rob    3        2
11  Rob    4        3
12  Rob    5        4

But I modified your code slightly to make sure there was no clash
between plyr and dplyr:

library(dplyr)
name <- rep(c("John","Pete","Rob"), c(3,4,5))
time <- c(1:3,1:4,1:5)
dat <- data.frame(name = name, time = time)

# ddply
lag1 <- function(x) c(NA,x[-length(x)])
plyr::ddply(dat, "name", plyr::mutate, lag_time = lag1(time))

# dplyr
dat <- group_by(dat, name)
mutate(dat, lag_time = lag(time))

I'm also running the dev version of dplyr, so it's possible we fixed a bug.

Hadley

On Tue, Jan 21, 2014 at 4:44 PM, Vincent <vincen...@gmail.com> wrote:
> What I am looking for is the output that ddply creates below. When using
> dplyr values in lag_time 'spill' across names/groups. Is that the intended
> behavior?
>
> # data
> name <- rep(c("John","Pete","Rob"), c(3,4,5))
> time <- c(1:3,1:4,1:5)
> dat <- data.frame(name = name, time = time)
>
> # ddply
> lag1 <- function(x) c(NA,x[-length(x)])
> ddply(dat, .(name), mutate, lag_time = lag1(time))
>
> # dplyr
> dat <- group_by(dat, name)
> mutate(dat, lag_time = lag(time))



--
http://had.co.nz/

ahmadou dicko

Jan 21, 2014, 5:57:44 PM
to manip...@googlegroups.com
The dplyr documentation states that plyr and dplyr don't play well together, especially when two functions share the same name (mutate, arrange, etc.).
In this example you have both plyr::mutate and dplyr::mutate, so make sure to add the namespace prefix (::) whenever two functions have the same name.
Try this:

library(plyr)
library(dplyr)

## data
name <- rep(c("John","Pete","Rob"), c(3,4,5))
time <- c(1:3,1:4,1:5)
dat <- data.frame(name = name, time = time)

## ddply
lag1 <- function(x) c(NA,x[-length(x)])
ddply(dat, .(name), plyr::mutate, lag_time = lag1(time))

## dplyr
dat <- group_by(dat, name)
dplyr::mutate(dat, lag_time = lag(time))

Vincent

Jan 21, 2014, 6:01:07 PM
to manip...@googlegroups.com
My mistake. This happened because plyr was loaded after dplyr.
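
For readers who hit the same symptom, here is a minimal sketch of the masking effect described above. It assumes both plyr and dplyr are installed; the variable names (grouped, mutate_owner, masked, per_group) are introduced here for illustration only.

```r
library(dplyr)
library(plyr)   # attached last, so the bare name `mutate` now resolves to plyr

dat <- data.frame(
  name = rep(c("John", "Pete", "Rob"), c(3, 4, 5)),
  time = c(1:3, 1:4, 1:5)
)
grouped <- dplyr::group_by(dat, name)

# Which package does the bare name `mutate` come from right now?
mutate_owner <- environmentName(environment(mutate))
mutate_owner

# plyr::mutate ignores the grouping, so lag() runs over the whole column
masked <- plyr::mutate(grouped, lag_time = dplyr::lag(time))

# dplyr::mutate respects the grouping, so lag() restarts in every group
per_group <- dplyr::mutate(grouped, lag_time = dplyr::lag(time))

sum(is.na(masked$lag_time))     # one NA for the whole column
sum(is.na(per_group$lag_time))  # one NA per group
```

Attaching plyr first and dplyr second, or always writing the dplyr:: prefix, avoids the problem.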

Huiming Xia

Aug 27, 2015, 5:34:56 PM
to manipulatr, vincen...@gmail.com
I have a follow-up question about this. Here is my code. In the second block I changed the group sizes from 3, 4, 5 to 2 for each group.
I cannot get what I expected from the second block. The rows are not even grouped properly. What's wrong?

name <- rep(c("John","Pete","Rob"), c(3,4,5)) 
time <- c(1:3,1:4,1:5) 
dat <- data.frame(name = name, time = time) 
dat <- group_by(dat, name) 
mutate(dat, lag_time = lag(time)) 

name1 <- rep(c("John","Pete","Rob"), 2) 
time1 <- c(1,1.1,1.4,1.8,2,5) 
dat1 <- data.frame(name = name1, time = time1) 
dat1 <- group_by(dat1,name) 
mutate(dat1, lag_time = lag(time))

Hadley Wickham

Aug 30, 2015, 6:20:33 PM
to Huiming Xia, manipulatr, Vincent Nijs
Not sure what you expect, but it looks ok to me in the dev version:

data.frame(
  name = rep(c("John","Pete","Rob"), 2),
  time = c(1,1.1,1.4,1.8,2,5)
) %>%
  group_by(name) %>%
  mutate(lag_time = lag(time))
#> Source: local data frame [6 x 3]
#> Groups: name [3]
#>
#>     name  time lag_time
#>   (fctr) (dbl)    (dbl)
#> 1   John   1.0       NA
#> 2   Pete   1.1       NA
#> 3    Rob   1.4       NA
#> 4   John   1.8      1.0
#> 5   Pete   2.0      1.1
#> 6    Rob   5.0      1.4



--
http://had.co.nz/

Huiming Xia

Aug 31, 2015, 10:41:37 AM
to Hadley Wickham, manipulatr, Vincent Nijs
Thanks for the reply! My understanding is that
group_by(name) %>% mutate(lag_time = lag(time)) groups the data by name first and then calculates the lag,
e.g. the data is grouped by name first:
1  John    1       
2  John    2        
3  John    3        
-------------------------------
4  Pete    1      
5  Pete    2        
6  Pete    3        
7  Pete    4        
--------------------------
8   Rob    1      
9   Rob    2        
10  Rob    3        
11  Rob    4        
12  Rob    5        
and then the lag is calculated within each group.

However, in the n=2 case, I did not see this happen. I expected to see
John  1.0   NA
John  1.8   0.8
-------------------------
Pete  1.1   NA
Pete  2.0   0.9
--------------------------
Rob   1.4   NA
Rob    5    3.6

However, I get this:
  name time lag_time
1 John  1.0       NA
2 Pete  1.1       NA
3  Rob  1.4       NA
--------------------------------
4 John  1.8      1.0
5 Pete  2.0      1.1
6  Rob  5.0      1.4

really confusing.
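
A note on the two outputs above: group_by() only records which rows belong to which group and does not reorder them, and lag() returns the previous value within the group rather than the difference from it. The sketch below, assuming a current dplyr, produces the layout expected in this message; the column name time_diff and the object res are introduced here for illustration.

```r
library(dplyr)

dat1 <- data.frame(
  name = rep(c("John", "Pete", "Rob"), 2),
  time = c(1, 1.1, 1.4, 1.8, 2, 5)
)

res <- dat1 %>%
  group_by(name) %>%
  mutate(
    lag_time  = lag(time),         # previous value within the same name
    time_diff = time - lag(time)   # change since the previous value
  ) %>%
  arrange(name, time) %>%          # sorting only changes the display order
  ungroup()

res
```

The lag_time values are the same as in the unsorted output; arrange() just places each person's rows next to each other.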



