dcast without an aggregation function


Stuart Luppescu

Feb 5, 2014, 1:19:28 PM
to manip...@googlegroups.com
Hello all,
I have a dataset of teacher ratings by evaluators on multiple occasions. In the dataset as shown below, tid is the teacher ID, eid is the evaluator ID, obsorder is the number of the rating occasion, and d2c1 is the actual rating.

 str(one.comp)
'data.frame': 2206 obs. of  4 variables:
 $ tid     : int  2116 2161 2223 4604 6115 6940 8050 8055 8089 8287 ...
 $ eid     : int  208138 4938 63430 211991 78628 100902 148061 19854 201788 54338 ...
 $ obsorder: int  1 2 1 1 1 1 1 1 1 1 ...
 $ d2c1    : int  4 3 2 3 2 4 3 3 3 4 ...

 head(one.comp, n=30)
     tid    eid obsorder d2c1
1   2116 208138        1    4
2   2161   4938        2    3
3   2223  63430        1    2
4   4604 211991        1    3
5   6115  78628        1    2
6   6940 100902        1    4
7   8050 148061        1    3
8   8055  19854        1    3
9   8089 201788        1    3
10  8287  54338        1    4
11  8397  49192        1    4
12  8431 121022        1    3
13  9350 111161        2    2
14  9928  54338        1    3
15 11322  48257        1    4
16 11453  50795        1    2
17 12109 132008        1    3
18 12906  21450        1    3
19 13112  50794        1    3
20 13135 119846        1    3
21 13143   5683        1    3
22 13261  52560        1    3
23 13491  47883        1    2
24 14647 143752        2    2
25 14734   4997        1    4
26 15075 221355        1    3
27 15613 144378        3    4
28 17044 103474        1    2
29 17058  24828        1    3
30 17070 136494        1    1

I am sure that there is only one rating for each combination of tid, obsorder, and eid. But when I run dcast on it like this:

 one.comp.w <- dcast(one.comp, tid + obsorder ~ eid, value.var="d2c1" )
Aggregation function missing: defaulting to length

it fills in the number of ratings rather than the actual rating, and the actual rating is what I want. Any help?

Also, I have about 1100 eids, but in each row, only two columns should have actual data in them. I want the other columns to be missing. I tried fill=NA, but that just gave me errors like this:
Error in vapply(indices, fun, .default) : values must be type 'logical',
 but FUN(X[[22]]) result is type 'integer'
Without fill=, the other columns have 0's in them. Is this because length is being used as the aggregation function? I would really like to have NAs. I know I can change the 0's to NAs post hoc, but it would be more convenient not to have to.

Thanks in advance.
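
A note on the fill=NA error, judging only from the message quoted above: vapply() insists that every result has the same type as its template value, and a bare NA is logical while length() returns an integer. The base-R sketch below reproduces that mismatch on made-up inputs; it is an illustration of the mechanism, not a trace of what dcast does internally. It also shows where 0's come from when length is applied to an empty group.

```r
# Two made-up "groups": one with three values, one empty.
groups <- list(1:3, integer(0))

# With an integer template, length() is accepted; the empty group yields 0.
counts <- vapply(groups, length, NA_integer_)

# With a logical template (plain NA), the integer result is rejected.
err_msg <- tryCatch(
  vapply(groups, length, NA),
  error = function(e) conditionMessage(e)
)

print(counts)
print(err_msg)
```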

Ista Zahn

Feb 5, 2014, 2:43:17 PM
to Stuart Luppescu, manip...@googlegroups.com
tid, obsorder and eid do not uniquely identify the observations. You
need to correct your id variables before casting.

best,
Ista
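
One way to check whether the id variables really are unique is to look for repeated combinations with base R's duplicated(). This is a minimal sketch on made-up toy data shaped like one.comp; the name toy and its values are assumptions for illustration only.

```r
# Toy data in the same shape as one.comp (values are invented).
toy <- data.frame(
  tid      = c(1L, 1L, 2L, 2L, 2L),
  eid      = c(10L, 11L, 10L, 12L, 12L),
  obsorder = c(1L, 1L, 1L, 2L, 2L),
  d2c1     = c(4L, 3L, 2L, 3L, 4L)
)

id_cols <- c("tid", "obsorder", "eid")

# Flag every row whose id combination occurs more than once.
# fromLast = TRUE also catches the first occurrence of each repeat.
is_dup <- duplicated(toy[id_cols]) | duplicated(toy[id_cols], fromLast = TRUE)

n_dup_rows <- sum(is_dup)   # number of rows involved in a duplicate
dup_rows   <- toy[is_dup, ] # the offending rows, for inspection

print(dup_rows)
```

On the real data, replacing toy with one.comp should list any rows that share a tid, obsorder, and eid.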

Stuart Luppescu

Feb 5, 2014, 5:05:49 PM
to manip...@googlegroups.com, Stuart Luppescu
Well, dang. I was sure they were unique, but about 0.1 percent of the observations had duplicated IDs. Thanks, Ista.
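
For anyone who lands here with the same message: once each id combination occurs only once, dcast should not need an aggregation function, and cells with no observation come out as NA. A minimal sketch, assuming reshape2 is installed and using made-up toy data; keeping the first row of each duplicated combination is an arbitrary choice for illustration, not a recommendation for the real ratings.

```r
library(reshape2)

# Toy data shaped like one.comp, with one duplicated id combination
# (tid 2, obsorder 2, eid 12). Values are invented.
toy <- data.frame(
  tid      = c(1L, 1L, 2L, 2L, 2L),
  eid      = c(10L, 11L, 10L, 12L, 12L),
  obsorder = c(1L, 1L, 1L, 2L, 2L),
  d2c1     = c(4L, 3L, 2L, 3L, 4L)
)

# Resolve duplicates first; here we simply keep the first row per combination.
toy_unique <- toy[!duplicated(toy[c("tid", "obsorder", "eid")]), ]

# With unique ids there is nothing to aggregate, so the ratings are kept
# as-is and evaluators who did not rate a teacher show up as NA.
wide <- dcast(toy_unique, tid + obsorder ~ eid, value.var = "d2c1")

print(wide)
```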