reshape: Getting NAs to appear in cast

LeonB

May 28, 2010, 12:22:52 AM
to manipulatr
Hi all,

It's a while since I needed this behaviour, but cast now seems to do
things differently from what was documented in the JSS introductory
article.

Specifically, using the french_fries data set:

library(reshape)
# check no observations at time 10 for subject 3:
subset(french_fries, subject == 3 & time == 10)

ff_d <- melt(french_fries, id = 1:4, na.rm = TRUE)
# generates "0" instead of the NAs shown at top of p. 17
cast(ff_d, subject ~ time, length)

# BTW, above output not generated by command at bottom of p. 16
# but I guess you already knew that

# supposed to replace NAs with zeroes?
cast(ff_d, subject ~ time, length, fill = 0)

# now get "30" instead of NA
cast(ff_d, subject ~ time, function(x) 30 - length(x))

# supposed to replace NAs with "30"?
cast(ff_d, subject ~ time, function(x) 30 - length(x), fill = 30)

I'm pretty sure that originally (i.e. a few versions of R ago) cast()
did what it says on p. 17 of the JSS article, i.e. generate NAs where
there were "missing cells", and the "fill" argument could be used to
substitute a value for these NAs.

Has something changed in the meantime? Is there a way to get cast() to
behave in the way it did originally?

Cheers,

Leon

Dennis Murphy

May 28, 2010, 9:38:20 AM
to LeonB, manipulatr
Hi:

On Thu, May 27, 2010 at 9:22 PM, LeonB <Leon.B...@utas.edu.au> wrote:
> Hi all,
>
> It's a while since I needed this behaviour, but cast now seems to do
> things differently from what was documented in the JSS introductory
> article.
>
> Specifically, using the french_fries data set:
>
> library(reshape)
> # check no observations at time 10 for subject 3:
> subset(french_fries, subject == 3 & time == 10)
>
> ff_d <- melt(french_fries, id = 1:4, na.rm = TRUE)
> # generates "0" instead of the NAs shown at top of p. 17
> cast(ff_d, subject ~ time, length)

As default behavior, this makes perfect sense to me - if there are no
observations in a given cell, the length is 0, not NA.

> # BTW, above output not generated by command at bottom of p. 16
> # but I guess you already knew that
>
> # supposed to replace NAs with zeroes?
> cast(ff_d, subject ~ time, length, fill = 0)

If you want to replace the default zeros with NA, substitute fill = NA in
the above statement.

> # now get "30" instead of NA
> cast(ff_d, subject ~ time, function(x) 30 - length(x))

I can understand if you find this perplexing (I imagine you expected NA here,
too), but this statement is asking for the number of missing values per cell,
where 30 is the expected number of cell replicates to achieve full replication.
This also makes sense as default behavior - a value of 30 means that no
observations were taken in that cell (all missing), just as zero would represent no
missing values. Many users (present company included) would expect NA in
this context to mean 'can't be determined' or 'unobserved', which is not the
case here. An empty cell informs us that there are 0 observations in that cell,
or equivalently, that 30 are missing.

> # supposed to replace NAs with "30"?
> cast(ff_d, subject ~ time, function(x) 30 - length(x), fill = 30)

You can always try this:
 
> cc <- cast(ff_d, subject ~ time, function(x) 30 - length(x))
> cc[cc == 30L] <- NA
> cc
   subject 1 2 3 4 5 6 7 8  9 10
1        3 0 0 0 0 0 0 0 0  0 NA
2       10 0 0 0 0 0 0 0 0  0  0
3       15 0 0 0 0 5 0 0 0  0  0
4       16 0 0 0 0 0 0 0 1  0  0
5       19 0 0 0 0 0 0 0 0  0  0
6       31 0 0 0 0 0 0 0 0 NA  0
7       51 0 0 0 0 0 0 0 0  0  0
8       52 0 0 0 0 0 0 0 0  0  0
9       63 0 0 0 0 0 0 0 0  0  0
10      78 0 0 0 0 0 0 0 0  0  0
11      79 0 0 0 0 0 0 1 2  0 NA
12      86 0 0 0 0 0 0 0 0 NA  0
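
One caveat with the replacement above: the comparison cc == 30L is applied to every column, so a subject whose ID happened to be 30 would be blanked as well (not a problem here, since no subject is numbered 30). Below is a minimal base-R sketch of restricting the replacement to the time columns. It uses a made-up toy data frame in place of the real cast() output; the names toy and value_cols are invented for this illustration, and the same idea should carry over to cc, assuming it behaves like an ordinary data frame.

```r
# Toy stand-in for the cast() result; the values are made up for illustration.
toy <- data.frame(
  subject = c(3, 30, 31),
  `9`     = c(0, 0, 30),
  `10`    = c(30, 0, 0),
  check.names = FALSE
)

# Every column except the id column holds cell values.
value_cols <- setdiff(names(toy), "subject")

# Replace 30 ("all observations missing") with NA, column by column,
# leaving the subject column untouched even though one id equals 30.
toy[value_cols] <- lapply(toy[value_cols], function(col) replace(col, col == 30, NA))

print(toy)
```

Here subject 30 keeps its ID, while the two cells holding 30 in the time columns become NA.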

HTH,
Dennis




Hadley Wickham

May 28, 2010, 9:45:12 AM
to Dennis Murphy, LeonB, manipulatr
> I can understand if you find this perplexing (I imagine you expected NA
> here, too), but this statement is asking for the number of missing values
> per cell, where 30 is the expected number of cell replicates to achieve
> full replication. This also makes sense as default behavior - a value of
> 30 means that no observations were taken in that cell (all missing), just
> as zero would represent no missing values. Many users (present company
> included) would expect NA in this context to mean 'can't be determined'
> or 'unobserved', which is not the case here. An empty cell informs us
> that there are 0 observations in that cell, or equivalently, that 30 are
> missing.

Yes, this is the reasoning behind the new(er) behaviour. Instead of
filling with NAs by default, I fill with the results of calling the
function on a zero-length vector.
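
To make that concrete, here is a small base-R sketch (no reshape needed) of what the aggregation functions from this thread return for a zero-length vector, which by the explanation above is the value that ends up in a cell with no observations. The helper names count_obs and count_missing are invented for this illustration.

```r
# The two aggregation functions used in the thread, given names for clarity.
count_obs     <- function(x) length(x)
count_missing <- function(x) 30 - length(x)

# A cell with no observations corresponds to a zero-length vector.
empty <- numeric(0)

count_obs(empty)      # 0:  no observations in the cell
count_missing(empty)  # 30: all 30 expected replicates are missing
mean(empty)           # NaN: some functions have no sensible "empty" value
```

This is why length gives 0 and 30 - length(x) gives 30 for the empty cells, rather than NA.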

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
