Re: possible bug in reshape2::melt with numeric id.vars containing attributes.

Hadley Wickham

Jul 27, 2012, 2:26:28 PM
to Henrik Singmann, manip...@googlegroups.com
Hi Henrik,

I don't think it's the attributes that are a problem - it's that that
column is a matrix.
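
A minimal base-R sketch of that point, not taken from the original messages: it checks that scale() returns a one-column matrix and shows one possible way to flatten it back into a plain numeric vector with as.numeric(). The seed and the variable name `scaled` are illustrative assumptions.

```r
# Base R only; `scaled` and the seed are illustrative, not from the thread.
set.seed(1)
df1 <- data.frame(condition = c("A", "B"), num = rnorm(2),
                  x = rnorm(2), y = rnorm(2))

scaled <- scale(df1$num)
is.matrix(scaled)   # scale() returns a matrix, even for a single vector
dim(scaled)         # one row per observation, one column

# as.numeric() drops the dim and the "scaled:*" attributes,
# leaving a plain numeric vector in the data.frame column.
df2 <- within(df1, num <- as.numeric(scale(num)))
is.matrix(df2$num)
attributes(df2$num)
```

With the column stored as a plain numeric vector, it should behave like the `num` column of the unscaled data.frame when passed as an id.vars.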

Hadley

On Fri, Jul 27, 2012 at 11:20 AM, Henrik Singmann <sing...@gmail.com> wrote:
> Hi all,
>
> today I noticed a strange behavior in reshape2: melt() gives an
> error when melting a data.frame with a numeric id.vars that has an
> attribute. This does not happen with a factor that has attributes, and it
> works fine with reshape.
> The following code illustrates the problem:
>
> require(reshape2)
>
> # wide data.frame with a factor, a numeric covariate (num) and two responses:
> df1 <- data.frame(condition = c("A", "B"), num = rnorm(2), x = rnorm(2), y = rnorm(2))
> str(df1)
> # 'data.frame': 2 obs. of 4 variables:
> # $ condition: Factor w/ 2 levels "A","B": 1 2
> # $ num : num 0.661 -0.404
> # $ x : num -0.26 0.181
> # $ y : num -0.478 0.599
>
> melt(df1, id.vars = c("condition", "num")) # works
> # condition num variable value
> # 1 A 0.6611 x -0.2600
> # 2 B -0.4044 x 0.1812
> # 3 A 0.6611 y -0.4779
> # 4 B -0.4044 y 0.5987
>
> # add attributes to the covariate
> df2 <- within(df1, num <- scale(num))
> str(df2)
> # 'data.frame': 2 obs. of 4 variables:
> # $ condition: Factor w/ 2 levels "A","B": 1 2
> # $ num : num [1:2, 1] 0.707 -0.707
> # ..- attr(*, "scaled:center")= num 0.128
> # ..- attr(*, "scaled:scale")= num 0.753
> # $ x : num -0.26 0.181
> # $ y : num -0.478 0.599
>
> melt(df2, id.vars = c("condition", "num")) # gives an error
> # Error in data.frame(ids, variable, value, stringsAsFactors = FALSE) :
> # arguments imply differing number of rows: 2, 4
>
> # convert to a factor (which drops the matrix and its attributes), works again:
> df3 <- within(df2, num <- factor(num))
> str(df3)
> # 'data.frame': 2 obs. of 4 variables:
> # $ condition: Factor w/ 2 levels "A","B": 1 2
> # $ num : Factor w/ 2 levels "-0.707106781186548",..: 2 1
> # $ x : num -0.26 0.181
> # $ y : num -0.478 0.599
>
> melt(df3, id.vars = c("condition", "num")) #works
> # condition num variable value
> # 1 A 0.707106781186548 x -0.2600
> # 2 B -0.707106781186548 x 0.1812
> # 3 A 0.707106781186548 y -0.4779
> # 4 B -0.707106781186548 y 0.5987
>
> # add attributes to the factor:
> contrasts(df3$condition) <- contr.sum
> str(df3)
> #'data.frame': 2 obs. of 4 variables:
> # $ condition: Factor w/ 2 levels "A","B": 1 2
> # ..- attr(*, "contrasts")= num [1:2, 1] 1 -1
> # .. ..- attr(*, "dimnames")=List of 2
> # .. .. ..$ : chr "A" "B"
> # .. .. ..$ : NULL
> # $ num : Factor w/ 2 levels "-0.707106781186548",..: 2 1
> # $ x : num -0.26 0.181
> # $ y : num -0.478 0.599
>
> melt(df3, id.vars = c("condition", "num")) # works
> # condition num variable value
> # 1 A 0.707106781186548 x -0.2600
> # 2 B -0.707106781186548 x 0.1812
> # 3 A 0.707106781186548 y -0.4779
> # 4 B -0.707106781186548 y 0.5987
>
> As said, everything works with reshape and it does not seem to be intended
> behavior. Or am I wrong?
>
> Cheers,
> Henrik
>



--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

Henrik Singmann

Jul 27, 2012, 2:39:23 PM
to manip...@googlegroups.com, Henrik Singmann, had...@rice.edu
Hi Hadley,

I see, you are completely right. So it is actually the fault of scale() returning the matrix (and me not seeing that the variable is a matrix and not reading the scale documentation which explicitly says so).

Removing the dim attribute does work.

df2 <- within(df1, num <- scale(num))

dim(df2$num) <- NULL
str(df2)
#'data.frame':   2 obs. of  4 variables:
# $ condition: Factor w/ 2 levels "A","B": 1 2
# $ num      : atomic  0.707 -0.707
#  ..- attr(*, "scaled:center")= num -0.444
#  ..- attr(*, "scaled:scale")= num 0.427
# $ x        : num  -1.18 -1
# $ y        : num  -0.896 -2.018


melt(df2, id.vars = c("condition", "num"))

#   condition     num variable   value
# 1         A  0.7071        x -1.1831
# 2         B -0.7071        x -1.0041
# 3         A  0.7071        y -0.8964
# 4         B -0.7071        y -2.0183


Because reshape::melt works with the matrix, I assumed that this was the intended behavior.

Thanks,
Henrik

Henrik Singmann

Jul 27, 2012, 3:01:13 PM
to manip...@googlegroups.com
Hi Hadley,

please ignore my previous mail (I had reshape loaded instead of reshape2). It does not work if the variable is an atomic vector with attributes (as shown in the previous mail); it only works if the variable is a plain numeric vector.
Note furthermore that there is no problem if the matrix is one of the measure.vars instead of an id.vars.

require(reshape2)

df1 <- data.frame(condition = c("A", "B"), num = rnorm(2), x = rnorm(2), y = rnorm(2))
df2 <- within(df1, num <- (scale(num))) 
dim(df2$num) <- NULL

str(df2)
#'data.frame':   2 obs. of  4 variables:
# $ condition: Factor w/ 2 levels "A","B": 1 2
# $ num      : atomic  -0.707 0.707
#  ..- attr(*, "scaled:center")= num 0.172
#  ..- attr(*, "scaled:scale")= num 0.833
# $ x        : num  0.00485 -0.65703
# $ y        : num  -0.663 -0.567

melt(df2, id.vars = c("condition", "num"))  # gives error

attributes(df2$num) <- NULL

str(df2)
#'data.frame':   2 obs. of  4 variables:
# $ condition: Factor w/ 2 levels "A","B": 1 2
# $ num      : num  -0.707 0.707
# $ x        : num  0.00485 -0.65703
# $ y        : num  -0.663 -0.567

melt(df2, id.vars = c("condition", "num"))  # no error

df2 <- within(df1, num <- (scale(num))) #num is matrix again
melt(df2, id.vars = c("condition")) # no error
#   condition variable     value
# 1         A      num -0.707107
# 2         B      num  0.707107
# 3         A        x  0.004852
# 4         B        x -0.657027
# 5         A        y -0.663203
# 6         B        y -0.566991

I know it is not really a bug, but whatever, now at least you know.

Cheers,
Henrik

