Hi,
I can't work out how "cast" handles variables/column names that are not found in the data.frame. Suppose I have a data.frame as follows:
library(reshape2) # provides dcast
set.seed(1)
DF <- data.frame(x = rep(1:5, each=5L), y=runif(25), z=1:25)
# first
dcast(DF, x ~ ., mean, value.var="y") # produces the right aggregation with column named "NA"
# second
dcast(DF, x ~ "mean.y", mean, value.var="y") # produces the right aggregation (same as "first"), but with column named "mean.y"
I'd have expected the "second" call to give an error, but it handles the quoted name intelligently. However, I don't see what else this is useful for. For example:
# third
dcast(DF, x ~ z+"mean", mean, value.var="y") # a weird result: only the first column name ends in "_mean", all the others end in "_NA"
So I tested whether "value.var" can also take a name that is not a column of the data.frame. The behaviour differs depending on whether an aggregation function is supplied, and also on the formula:
# four
dcast(DF, x ~ z, mean, value.var="bla") # generates 25 columns of NAs (and also gives warnings)
# five
dcast(DF, x ~ z, value.var="bla") # ends up in the error below
Error in structure(ordered, dim = ns) :
dims [product 125] do not match the length of object [0]
# six
dcast(DF, x ~ "z", value.var="bla") # note the quotes around z. This returns a result without any error; the aggregation defaults to length
# seven
dcast(DF, x ~ k, value.var="y") # "k" is not a column of DF; returns the error shown below
Error in eval(expr, envir, enclos): object 'k' not found
# eight
dcast(DF, x ~ q, value.var="y") # "q" is not a column of DF either, but this returns a different error
Error in unique.default(x) : unique() applies only to vectors
I'm sure I'm missing something about the rules here. Sometimes the result is right (and the column names are even handled intelligently), sometimes the result is weird (with warnings), and sometimes the call ends in an error. I can't find the rule that ties all of these together. It'd be great if someone could help me wrap my head around this! :)
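For reference, here is a minimal sketch of the up-front check I was half expecting, written in base R only. The helper name missing_cast_vars is hypothetical (it is not part of reshape2), and I'm assuming that "." and "..." are the only placeholders allowed in a casting formula. It simply lists the names used in the formula or passed as value.var that are not columns of the data.frame; quoted strings such as "mean.y" are constants rather than variable names, so they are not reported.

```r
# Hypothetical helper, not part of reshape2: report names used in a casting
# formula or as value.var that are not columns of the data.frame.
missing_cast_vars <- function(data, formula, value.var) {
  # all.vars() returns only the symbols in the formula, not quoted strings
  used <- c(all.vars(formula), value.var)
  # assumption: "." and "..." are formula placeholders, not column names
  used <- setdiff(used, c(".", "..."))
  setdiff(used, names(data))
}

set.seed(1)
DF <- data.frame(x = rep(1:5, each = 5L), y = runif(25), z = 1:25)

missing_cast_vars(DF, x ~ z, "bla")       # the unknown value.var from "five"
missing_cast_vars(DF, x ~ k, "y")         # the unknown formula variable from "seven"
missing_cast_vars(DF, x ~ "mean.y", "y")  # the quoted name from "second"
```

This only tells me which names are absent from DF; it doesn't explain why dcast reacts so differently to each of them.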
Thank you very much,
Arun.