But I didn't realise I accidentally took my discussion with Steve off-list.
Here's a quick example of why you can't guarantee the same order in general:
library(plyr)
df <- data.frame(x = c("a", "b", "b", "a"), y = 1:4)
# It would be possible if the aggregation function returned the
# same number of rows
ddply(df, "x", identity)
# But not if it returned fewer rows:
ddply(df, "x", head, 1)
# Or more rows
ddply(df, "x", function(df) df[c(1,2,1), ])
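To make the row counts in those three cases concrete, here is a rough sketch that avoids plyr. The helper `split_apply()` is a hypothetical base R stand-in for ddply() (it is not how plyr is implemented); it only shows that the output lines up one-to-one with the input in the first case alone.

```r
# Hypothetical base R stand-in for ddply(): split by key, apply, stack.
split_apply <- function(data, key, fn) {
  pieces <- split(data, data[[key]])
  out <- do.call(rbind, lapply(pieces, fn))
  rownames(out) <- NULL
  out
}

df <- data.frame(x = c("a", "b", "b", "a"), y = 1:4)

same  <- split_apply(df, "x", identity)
fewer <- split_apply(df, "x", function(piece) head(piece, 1))
more  <- split_apply(df, "x", function(piece) piece[c(1, 2, 1), ])

# Only `same` has exactly one output row per input row, so only there
# is "the original order" well defined.
c(input = nrow(df), same = nrow(same), fewer = nrow(fewer), more = nrow(more))
```

With fewer or more rows there is no single input row to send each output row back to, which is why a general guarantee is not possible.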
--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
I missed it...
got it.
I want something like this, but it does not look elegant...
ddply2 <- function(.data, .variables, .fun = NULL, ..., .progress = "none",
                   .drop = TRUE, .parallel = FALSE, .KEEP.ORDER = FALSE) {
  .variables <- as.quoted(.variables)
  # splitter_d() is internal to plyr, hence the ':::'
  pieces <- plyr:::splitter_d(.data, .variables, drop = .drop)
  r <- ldply(.data = pieces, .fun = .fun, ..., .progress = .progress,
             .parallel = .parallel)
  if (.KEEP.ORDER) {
    # Restoring the order only makes sense with one output row per input row
    stopifnot(nrow(r) == nrow(.data))
    # pieces$index is used here as the original row numbers of each piece
    r <- r[order(unlist(pieces$index)), ]
    rownames(r) <- NULL
    r
  } else {
    r
  }
}
> ddply2(df, "x", I)
  x y
1 a 1
2 a 4
3 b 2
4 b 3
> ddply2(df, "x", I, .KEEP.ORDER=TRUE)
  x y
1 a 1
2 b 2
3 b 3
4 a 4
> ddply2(df, "x", head, 1)
  x y
1 a 1
2 b 2
> ddply2(df, "x", head, 1, .KEEP.ORDER=TRUE)
Error: nrow(r) == nrow(.data) is not TRUE
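The reordering step relies on order() of the grouped row positions being the inverse permutation. A small base R sketch of just that step, assuming the group index is a list of original row numbers per group (built here with split() rather than taken from plyr):

```r
df <- data.frame(x = c("a", "b", "b", "a"), y = 1:4)

# Original row positions belonging to each group, in group order.
index <- split(seq_len(nrow(df)), df$x)
positions <- unlist(index, use.names = FALSE)  # 1 4 2 3

# Rows as a grouped result lists them: all of "a", then all of "b".
grouped <- df[positions, ]

# order(positions) is the inverse permutation, so it undoes the grouping.
restored <- grouped[order(positions), ]
rownames(restored) <- NULL
```

This only works when every input row appears exactly once in the result, which is what the stopifnot() check enforces.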
--
Kohske Takahashi <takahash...@gmail.com>
Research Center for Advanced Science and Technology,
The University of Tokyo, Japan.
http://www.fennel.rcast.u-tokyo.ac.jp/profilee_ktakahashi.html
How about turning your workaround into a separate function that can be
composed with ddply? An added advantage is that you could then use it
with other data frame functions that ordinarily disrupt order, like
merge().
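keeping.order <- function(data, fn, ...) {
  # gensym() is not in base R or plyr, so build a column name that
  # cannot clash with an existing column of data instead.
  col <- tail(make.unique(c(colnames(data), ".row.order")), 1)
  data[, col] <- seq_len(nrow(data))
  out <- fn(data, ...)
  if (!col %in% colnames(out)) stop("Ordering column not preserved by function")
  # Sort by the recorded row positions, then drop the helper column
  out <- out[order(out[, col]), ]
  out[, col] <- NULL
  out
}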
# Then,
d <- structure(list(g = c(2L, 2L, 1L, 1L, 2L, 2L), v = c(-1.90127112738315,
-1.20862680183042, -1.13913266070505, 0.14899803094742, -0.69427656843677,
0.872558638137971)), .Names = c("g", "v"), row.names = c(NA,
-6L), class = "data.frame")
ddply(d, .(g), mutate, v=scale(v)) #does not preserve order of d
keeping.order(d, ddply, .(g), mutate, v=scale(v)) #preserves order of d
# This has the advantage that you can apply it to other data frame
# functions like merge.
names <- data.frame(g=c(1, 2), name = c("Thomas", "Jim"))
merge(d, names) #does not preserve order of d
keeping.order(d, merge, names) #preserves order of d
# Peter
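For anyone who wants to see the tagging idea on its own, here is a minimal sketch using only base R and merge(). The column name `.row`, the `lookup` table name, and the values in `v` are illustrative assumptions, not taken from the posts above.

```r
# Toy data: same grouping pattern as d above, illustrative values for v.
d <- data.frame(g = c(2L, 2L, 1L, 1L, 2L, 2L),
                v = c(0.5, -1.2, 0.3, 1.1, -0.7, 0.9))
lookup <- data.frame(g = c(1L, 2L), name = c("Thomas", "Jim"))

# Tag each row with its position; ".row" is a hypothetical name that is
# assumed not to clash with an existing column.
tagged <- d
tagged[[".row"]] <- seq_len(nrow(d))

# merge() sorts on the key by default, so the row order of d is lost...
merged <- merge(tagged, lookup)

# ...and sorting on the tag, then dropping it, brings it back.
restored <- merged[order(merged[[".row"]]), ]
restored[[".row"]] <- NULL
rownames(restored) <- NULL
```

The tag survives here because merge() keeps non-key columns; a function that drops unknown columns would need a different approach.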