merge() should do the trick. How best to use it depends on what you
want to do with the data afterwards.
The following is an example of what you could do. It performs best if
the missing rows are scattered at random and do not cluster, because
the loop needs one pass for each consecutive missing row.
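# Toy data: DF1 lacks the 0600 row, DF2 lacks the 0900 row.
# Note: the numeric literal 01052007 is stored as 1052007.
DF1 <- data.frame(X.DATE=rep(01052007, 7), X.TIME=c(2:5,7:9)*100,
                  VALUE=c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2=c(29, 24, 28, 27, 35, 32, 32))
DF2 <- data.frame(X.DATE=rep(01052007, 7), X.TIME=c(2:8)*100,
                  VALUE=c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2=c(29, 24, 28, 27, 35, 32, 32))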
DFm <- merge(DF1, DF2, by=c("X.DATE", "X.TIME"), all=TRUE)
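# Fill each NA with the value from the row above; repeat until none remain.
while (any(is.na(DFm))) {
  if (any(is.na(DFm[1,]))) stop("Complete first row required!")
  # (row, col) positions of all NAs
  ind <- which(is.na(DFm), arr.ind=TRUE)
  # same columns, one row up
  prind <- matrix(c(ind[,"row"]-1, ind[,"col"]), ncol=2)
  DFm[is.na(DFm)] <- DFm[prind]
}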
DFm
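If the missing rows do cluster, the while loop above needs one pass per
consecutive gap row. A loop-free alternative is sketched below on the
same toy data. Note that fill_forward is a hypothetical helper name
(it is not part of base R), and the sketch assumes, like the loop above,
that the first row is complete.

```r
# Same toy data as above (DF1 lacks the 0600 row, DF2 lacks the 0900 row).
DF1 <- data.frame(X.DATE=rep(01052007, 7), X.TIME=c(2:5,7:9)*100,
                  VALUE=c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2=c(29, 24, 28, 27, 35, 32, 32))
DF2 <- data.frame(X.DATE=rep(01052007, 7), X.TIME=c(2:8)*100,
                  VALUE=c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2=c(29, 24, 28, 27, 35, 32, 32))
DFm <- merge(DF1, DF2, by=c("X.DATE", "X.TIME"), all=TRUE)

# Hypothetical helper: carry the last non-missing value forward.
# Assumes the first element is not NA.
fill_forward <- function(x) {
  stopifnot(!is.na(x[1]))
  # For each position, the index of the most recent non-NA element.
  idx <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  x[idx]
}

# Apply the helper to every non-key column in one vectorised step.
value_cols <- setdiff(names(DFm), c("X.DATE", "X.TIME"))
DFm[value_cols] <- lapply(DFm[value_cols], fill_forward)
DFm
```

This fills a run of any length in a single pass per column, so the
run time should not depend on how the gaps are distributed.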
Best,
Nello
-----Original Message-----
From: r-help-...@r-project.org [mailto:r-help-...@r-project.org] On Behalf Of Adeel Amin
Sent: Thursday, 23 May 2013 07:01
To: r-h...@r-project.org
Subject: [R] adding rows without loops
I'm comparing a variety of datasets with over 4M rows. I've solved this
problem 5 different ways using a for/while loop, but the processing time
is murder (over 8 hours doing this row by row per data set). So I'm
trying to find out whether a solution is possible without a loop, or
with one whose processing time is much faster.
Each dataset is a time series as such:
DF1:
    X.DATE X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0700    45     35
6 01052007   0800    42     32
7 01052007   0900    45     32
...
...
...
n
DF2:
    X.DATE X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0600    45     35
6 01052007   0700    42     32
7 01052007   0800    45     32
...
...