Expanding each row across a range

Andrew Joyner

Mar 28, 2015, 1:41:57 AM
to manip...@googlegroups.com
I have row data that includes a range start and end, and I want to distribute a numeric value uniformly across the range by creating additional rows, one for each element in the range.

For example, the following approach gives me what I need...

df <- data.frame(x=c("a", "b", "c"), start=1:3, end=4:6, val=c(100, 200, 300))
plyr::ddply(df, plyr::.(x), function (row) data.frame(n=row$start:row$end, val=row$val/length(row$start:row$end)))

The function creates a new data frame for each row and ddply bolts them all together. It seems ok, but is there a more natural way to do this with tidyr and dplyr that removes the $ syntax and the anonymous function?
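For comparison, here is a base-R sketch (not one of the replies in this thread) that drops both the per-row anonymous function and the `$` inside it. It computes the whole expansion in one vectorised pass with rep() and sequence(). It assumes integer ranges with start <= end in every row.

```r
# Base-R sketch: one output row per integer in start:end, val spread evenly
df <- data.frame(x = c("a", "b", "c"), start = 1:3, end = 4:6,
                 val = c(100, 200, 300), stringsAsFactors = FALSE)

len <- df$end - df$start + 1            # number of elements in each range
idx <- rep(seq_len(nrow(df)), len)      # repeat each row index len times
out <- data.frame(
  x   = df$x[idx],
  n   = df$start[idx] + sequence(len) - 1,  # sequence(4) is 1 2 3 4
  val = (df$val / len)[idx]                 # uniform share of val per element
)
print(out)
```

Because nothing is evaluated row by row, this version scales to large inputs, at the cost of being less obvious to read than the per-row function.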


Alain Content

Mar 28, 2015, 3:10:39 AM
to Andrew Joyner, manip...@googlegroups.com
Hi, 
Distributing the val across the range can be done with mutate, but generating the rows is a bit trickier. Here is a proposal:

d <- df %>% 
  group_by(x) %>% 
  mutate(val = val/(end + 1 - start)) %>%
  do(., df = data_frame(x = .$x, n = .$start:.$end, val = .$val)) %>%  
  select(df)

which produces a data frame of data frames (its df column is a list of data frames):


> d$df[[1]]
Source: local data frame [4 x 3]

  x n val
1 a 1  25
2 a 2  25
3 a 3  25
4 a 4  25

alain 


--
You received this message because you are subscribed to the Google Groups "manipulatr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to manipulatr+...@googlegroups.com.
To post to this group, send email to manip...@googlegroups.com.
Visit this group at http://groups.google.com/group/manipulatr.
For more options, visit https://groups.google.com/d/optout.
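Alain's d keeps one data frame per group in a list column, so a final step is needed to stack them. The base-R sketch below mimics that shape with split() and lapply() standing in for group_by() and do() (an analogy, not the dplyr code itself), then flattens it with do.call(rbind, ...). In dplyr, calling bind_rows() on such a list should do the same job.

```r
df <- data.frame(x = c("a", "b", "c"), start = 1:3, end = 4:6,
                 val = c(100, 200, 300), stringsAsFactors = FALSE)

# A list holding one data frame per group, like the df list column
pieces <- lapply(split(df, df$x), function(g) {
  data.frame(x = g$x, n = g$start:g$end, val = g$val / (g$end + 1 - g$start))
})

# Stack the per-group pieces back into a single data frame
flat <- do.call(rbind, pieces)
rownames(flat) <- NULL   # drop the "a.1", "a.2", ... names rbind creates
print(flat)
```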

Andrew MacDonald

Mar 28, 2015, 10:25:40 AM
to Alain Content, Andrew Joyner, manip...@googlegroups.com

This is something I’d really like to find a simple way to do! Here is a modification of Alain’s answer:


df %>% 
  group_by(x) %>% 
  mutate(val = val/(end + 1 - start)) %>%
  do(with(., data_frame(x, n = start:end, val = val)))

If you don’t name the result of data_frame within do(), the per-group data frames are bound together into one data frame instead of being stored in a list column.

What do we think about using with() inside do()? It certainly cleans up the syntax a bit in cases like this, where every variable would otherwise be prefixed with .$
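To see what with() contributes, here is a base-R sketch on a hypothetical one-row data frame, one_row. with() evaluates its second argument with the data frame's columns in scope as plain names, so the `$` prefixes disappear and the result is unchanged.

```r
# Hypothetical single row, standing in for what do() or rowwise() hands over
one_row <- data.frame(x = "a", start = 1L, end = 4L, val = 100,
                      stringsAsFactors = FALSE)

# Without with(): every column is reached through one_row$...
a <- data.frame(x = one_row$x, n = one_row$start:one_row$end,
                val = one_row$val / (one_row$end + 1 - one_row$start))

# With with(): the columns are visible as bare names
b <- with(one_row, data.frame(x = x, n = start:end, val = val / (end + 1 - start)))

print(identical(a, b))
```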

D Holmes

Mar 28, 2015, 10:54:39 AM
to manip...@googlegroups.com
It's expensive, but there's always a full outer join on a common dummy variable (effectively a cross join).

> df <- data.frame(x=c("a", "b", "c"), start=1:3, end=4:6, val=c(100, 200, 300))
> test1=plyr::ddply(df, .(x), function (row) data.frame(n=row$start:row$end, val=row$val/length(row$start:row$end)))
> test2=full_join(mutate(df,com=1),data.frame(n=1:max(df$end),com=1)) %>% filter(n>=start & n<=end) %>% transmute(x,n,val=val/(end-start+1))
Joining by: "com"
> identical(test1,test2)
[1] TRUE

Interestingly, filter(between(n, start, end)) didn't work; it seems to want scalars for the start and end arguments. Vector support might be a small enhancement for a future dplyr.
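The same join-then-filter idea can be sketched in base R. Here merge() with by = NULL builds the Cartesian product directly, with no dummy com column. The plain comparison n >= start & n <= end works elementwise, so start and end can be whole columns. The cost D Holmes mentions is visible: nrow(df) * max(end) rows are built before filtering.

```r
df <- data.frame(x = c("a", "b", "c"), start = 1:3, end = 4:6,
                 val = c(100, 200, 300), stringsAsFactors = FALSE)

# Cartesian product of every row with every candidate n (by = NULL: no key)
grid <- merge(df, data.frame(n = seq_len(max(df$end))), by = NULL)

# Keep only each row's own range; the comparison is vectorised over columns
kept <- grid[grid$n >= grid$start & grid$n <= grid$end, ]
res <- data.frame(x = kept$x, n = kept$n,
                  val = kept$val / (kept$end - kept$start + 1))
res <- res[order(res$x, res$n), ]
rownames(res) <- NULL
print(res)
```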

Dennis Murphy

Mar 28, 2015, 10:57:20 AM
to Andrew Joyner, manipulatr
Hi:

The rowwise() function can replace group_by() in this problem. The
following works for me:

library(dplyr)
df <- data.frame(x=c("a", "b", "c"), start=1:3, end=4:6, val=c(100, 200, 300))
df %>%
rowwise() %>%
do(with(., data.frame(x = x, n = seq(start, end), val = val / (end - start + 1))))

Dennis
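A base-R analogue of the rowwise() approach is sketched below; it is not from the thread. Map() walks the columns in parallel, so each call receives one row's values as ordinary arguments. The named helper (expand_row, a hypothetical name) therefore needs no `$` and is not anonymous.

```r
df <- data.frame(x = c("a", "b", "c"), start = 1:3, end = 4:6,
                 val = c(100, 200, 300), stringsAsFactors = FALSE)

# Hypothetical helper: expands one row's fields into a small data frame
expand_row <- function(x, start, end, val) {
  n <- seq(start, end)
  data.frame(x = x, n = n, val = val / length(n))  # spread val evenly
}

# Map() calls expand_row once per row with matching elements of each column
rows <- Map(expand_row, df$x, df$start, df$end, df$val)
out <- do.call(rbind, rows)
rownames(out) <- NULL
print(out)
```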

Andrew Joyner

Mar 28, 2015, 7:00:39 PM
to manip...@googlegroups.com, and...@alphajuliet.com
I'm liking this answer. Thanks for the reminder about rowwise(). As its help entry says: "It is also useful to support arbitrary complex operations that need to be applied to each row." Sounds right.

In reality, the problem I'm solving is about dividing efforts into bins across time intervals, and then summing the efforts in each bin, e.g. project resource demand by month. This has definitely helped me optimise this part of the problem and learn some more, but I suspect there's a better overall solution that I'll keep iterating towards. Thanks, all.
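The summing step described here can be sketched in base R by expanding the rows and then aggregating by bin. The tasks, month and effort names are hypothetical, and the numbers reuse the example data from the start of the thread. The thread does not show how the bins were actually summed.

```r
# Hypothetical effort data: each task spreads its effort over months start..end
tasks <- data.frame(task = c("a", "b", "c"), start = 1:3, end = 4:6,
                    effort = c(100, 200, 300), stringsAsFactors = FALSE)

# Expand to one row per task-month, with an even share of the effort
len <- tasks$end - tasks$start + 1
idx <- rep(seq_len(nrow(tasks)), len)
monthly <- data.frame(month  = tasks$start[idx] + sequence(len) - 1,
                      effort = (tasks$effort / len)[idx])

# Total demand in each month bin
demand <- aggregate(effort ~ month, data = monthly, FUN = sum)
print(demand)
```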