greg.j...@weyerhaeuser.com
Mar 21, 2009, 6:26:13 PM
to Forest-R
I have often been stymied by aggregate() using large amounts of memory
when more than one or two index variables are needed. I have found a
hack that avoids this issue. If any of you have a more
elegant way to do this, let me know.
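The post does not say why aggregate() needs so much memory. One plausible explanation, which is my assumption and not something stated above, is that the grouping is done over every possible combination of index levels. This is how tapply() behaves when it is given several INDEX factors: it returns an array with one cell per combination. That count grows as the product of the number of levels of each variable, while the number of combinations actually present can never exceed the number of rows. The sketch below compares the two counts on made-up data. The data.frame `toy` and its sizes are hypothetical and chosen only for illustration.

```r
# Hypothetical toy data: 4 index variables plus a value column x
set.seed(1)
n <- 1000
toy <- data.frame(
  id      = sample(sprintf("id%03d", 1:200), n, replace = TRUE),
  plot    = sample(1:50, n, replace = TRUE),
  class   = sample(1:10, n, replace = TRUE),
  species = sample(c("DF", "WH", "RA", "WRC"), n, replace = TRUE),
  x       = runif(n),
  stringsAsFactors = FALSE
)
idx <- toy[c("id", "plot", "class", "species")]

# Cells in the full cross-product of levels (one per possible combination)
n.cells <- prod(sapply(idx, function(v) length(unique(v))))

# Combinations that actually occur in the data
n.observed <- nrow(unique(idx))

print(n.cells)     # at most 200 * 50 * 10 * 4 = 400000
print(n.observed)  # at most nrow(toy) = 1000
```

A single pasted key has exactly one level per combination that occurs, so grouping on it stays proportional to the data rather than to the cross-product.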
Let's say we have a large data.frame, df, that we want to aggregate
using 4 index variables. Here we want the mean of x for each unique
combination of id, plot, class, and species. The straightforward way
is:
result <- aggregate( list(a.mean=df$x), list(id=df$id, plot=df$plot,
                     class=df$class, species=df$species), mean )
Alas, we ran out of memory. Try this hack instead:
# create a vector of a character combination of all index variables
i <- paste(df$id, df$plot, df$class, df$species, sep=",")
# aggregate using the hybrid index "i"
result <- aggregate( list(a.mean=df$x), list(i=i), mean )
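# unpack the index (as.character() in case aggregate returned "i" as a
# factor, since strsplit() needs character input; assumes no value contains a comma)
i2 <- matrix(unlist(strsplit(as.character(result$i), ",") ), ncol=4, byrow=TRUE )
result$id <- i2[,1]                 # in this case "id" was a character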
result$plot <- as.numeric(i2[,2])
result$class <- as.numeric(i2[,3])
result$species <- i2[,4] # species was a character too
result$i <- NULL # clean up after ourselves
Now we have a data.frame "result" with id, plot, class, species, and
a.mean.
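A variation on the unpacking step, sketched here as an alternative and not the method from the post, avoids splitting strings altogether. It keeps the first original row for each combined key (via duplicated()) and attaches those index columns by matching on the key. Each index column then keeps its original type, so there is no as.numeric() step. Nothing is parsed back out of the string. The small data.frame `toy` is hypothetical example data.

```r
# Hypothetical toy data standing in for the large data.frame
toy <- data.frame(
  id      = c("A", "A", "A", "B", "B", "B"),
  plot    = c(1, 1, 2, 1, 1, 1),
  class   = c(3, 3, 3, 4, 4, 5),
  species = c("DF", "DF", "DF", "WH", "WH", "WH"),
  x       = c(10, 20, 30, 40, 50, 60),
  stringsAsFactors = FALSE
)
keys <- c("id", "plot", "class", "species")

# Same first two steps as the hack: build the combined key, aggregate on it
i <- paste(toy$id, toy$plot, toy$class, toy$species, sep = ",")
result <- aggregate(list(a.mean = toy$x), list(i = i), mean)

# Keep one original row per key so the index columns retain their types
first <- !duplicated(i)
lookup <- data.frame(i = i[first], toy[first, keys], stringsAsFactors = FALSE)

# Attach the index columns by matching on the combined key, then drop the key
result <- cbind(result, lookup[match(as.character(result$i), lookup$i), keys])
result$i <- NULL
rownames(result) <- NULL
print(result)
```

The separator still matters when the key is built. If a value could itself contain a comma, two different combinations might paste to the same string, so pick a separator that cannot occur in the data.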