Understanding how "cast" handles variable/column names that are not found in a data.frame

Arun

Nov 27, 2013, 5:29:28 AM

to manip...@googlegroups.com

Hi,

I don't understand how "cast" handles variable/column names that are not found in the data.frame. Suppose I have a data.frame as follows:

library(reshape2)

set.seed(1)

DF <- data.frame(x = rep(1:5, each=5L), y=runif(25), z=1:25)

# first

dcast(DF, x ~ ., mean, value.var="y") # produces the correct aggregation, in a column named "NA"

# second

dcast(DF, x ~ "mean.y", mean, value.var="y") # produces the correct aggregation (same as the first), but in a column named "mean.y"

I'd have expected this second call to give an error, but it handles the string sensibly. However, I don't see what else this could be used for. For example:

# third

dcast(DF, x ~ z+"mean", mean, value.var="y") # a weird result: only the first column's name ends in "_mean"; all the others end in "_NA"
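The first three calls differ only in what sits on the right-hand side of the formula: ".", a quoted string, or a variable plus a quoted string. In R a quoted string inside a formula is a character constant, not a variable name, so all.vars() does not list it, and evaluating it just returns the string. The base-R sketch below shows this; it assumes, without having checked reshape2's source, that dcast evaluates each formula term against the data in roughly this way. If so, "mean.y" becomes a length-1 value used as the single column label, while in the third call the length-1 "mean" sits next to the 25 values of z, which would be consistent with only the first column getting the "_mean" suffix.

```r
set.seed(1)
DF <- data.frame(x = rep(1:5, each = 5L), y = runif(25), z = 1:25)

f2 <- x ~ "mean.y"     # the "second" formula
f3 <- x ~ z + "mean"   # the "third" formula

# all.vars() returns only symbols; quoted strings are constants, not names.
vars2 <- all.vars(f2)  # "x"
vars3 <- all.vars(f3)  # "x" "z"

# Split the right-hand side of f3 into its terms: the symbol z and "mean".
rhs_terms <- as.list(f3[[3]])[-1]

# Evaluate each term in DF (falling back to the global environment) and
# record the length of each result.
term_lengths <- vapply(
  rhs_terms,
  function(e) length(eval(e, envir = DF, enclos = globalenv())),
  integer(1)
)
term_lengths           # 25 1
```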

So I tested whether "value.var" also accepts names that are not in the data.frame. The behaviour depends on whether an aggregation function is supplied, and also on the formula:

# fourth

dcast(DF, x ~ z, mean, value.var="y") # generates 25 columns with NAs (and also with warnings)

# fifth

dcast(DF, x ~ z, value.var="bla") # fails with the error below

Error in structure(ordered, dim = ns) :
  dims [product 125] do not match the length of object [0]

# sixth

dcast(DF, x ~ "z", value.var="bla") # note the quotes around z. This returns a result without any error, and the aggregation defaults to length

# seventh

dcast(DF, x ~ k, value.var="y") # returns the error shown below

Error in eval(expr, envir, enclos): object 'k' not found

# eighth

dcast(DF, x ~ q, value.var="y") # returns a different error

Error in unique.default(x) : unique() applies only to vectors
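Cases seven and eight may be two sides of the same lookup rule. Assuming (not verified against reshape2's code) that dcast evaluates each formula name inside the data.frame and falls back to the caller's environment for names the data.frame doesn't have, "k" exists nowhere and gives "object 'k' not found", whereas "q" is found on the search path as base R's q() (quit) function, and unique() on a function stops with exactly the error in case eight. The base-R sketch below reproduces both messages; lookup() and error_message() are hypothetical helpers, not reshape2 functions.

```r
set.seed(1)
DF <- data.frame(x = rep(1:5, each = 5L), y = runif(25), z = 1:25)

# Hypothetical helper: evaluate a bare name in DF, falling back to the
# global environment (and from there the search path, including base).
lookup <- function(name, data, env = globalenv()) {
  eval(as.name(name), envir = data, enclos = env)
}

# Hypothetical helper: return the error message of an expression,
# or NA if it evaluates without error.
error_message <- function(expr) {
  tryCatch({ expr; NA_character_ }, error = function(e) conditionMessage(e))
}

y_col <- lookup("y", DF)                 # a column of DF: numeric, length 25
q_obj <- lookup("q", DF)                 # not in DF: resolves to base::q()
msg_k <- error_message(lookup("k", DF))  # "object 'k' not found"
msg_q <- error_message(unique(q_obj))    # "unique() applies only to vectors"
```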

I'm sure I'm missing something about the rules here. Sometimes the results are correct (and column names are even handled sensibly), sometimes they are weird (with warnings), and sometimes the call ends in an error. I can't work out the rule that ties all of these together. It'd be great if someone could help me wrap my head around this! :)

Thank you very much,

Arun.

Arun

Nov 27, 2013, 7:45:31 AM

to manip...@googlegroups.com

Sorry, the fourth one should be:

# fourth

dcast(DF, x ~ z, mean, value.var="bla") # generates 25 columns with NAs (and also with warnings)

Hadley Wickham

Nov 27, 2013, 9:24:36 AM

to Arun, manipulatr

In my opinion, if the variable isn't in the data frame, dcast should
throw an error. The only reason for the current behaviour is that I
wasn't proactive about checking for problems when I wrote reshape2.
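The behaviour Hadley describes can be approximated from the caller's side with a thin wrapper. The sketch below is hypothetical (check_cast_vars() and checked_dcast() are not part of reshape2): it collects the symbols used in the formula with all.vars(), skips "." and "..." because dcast gives them special meaning, and stops if any remaining name, or value.var, is not a column of the data. Quoted constants such as "mean.y" are not symbols, so they pass through unchanged.

```r
# Hypothetical helper, not part of reshape2: fail early if the formula
# or value.var refers to columns that the data frame does not have.
check_cast_vars <- function(data, formula, value.var) {
  # all.vars() lists the symbols in the formula; quoted strings are not
  # symbols. "." and "..." are placeholders in dcast formulas, so skip them.
  used <- setdiff(all.vars(formula), c(".", "..."))
  missing_vars <- setdiff(c(used, value.var), names(data))
  if (length(missing_vars) > 0) {
    stop("not found in data: ", paste(missing_vars, collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Hypothetical wrapper: validate, then hand everything to reshape2::dcast().
checked_dcast <- function(data, formula, fun.aggregate = NULL, ..., value.var) {
  check_cast_vars(data, formula, value.var)
  reshape2::dcast(data, formula, fun.aggregate, ..., value.var = value.var)
}
```

With this in place, checked_dcast(DF, x ~ k, value.var="y") and checked_dcast(DF, x ~ z, mean, value.var="bla") both stop with a "not found in data" message before reshape2 is called, and x ~ q no longer picks up base R's q() function.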

Hadley

--
http://had.co.nz/
