missing data issue?


doug

Jun 4, 2012, 2:53:03 PM
to ez...@googlegroups.com
I am seeing a behavior with ezANOVA that I don't understand.
It may be because I am using it wrong, please advise.

I am trying to identify active compounds that show a significant difference when compared to a negative control compound.
I treated cells with 4 different doses of each of these compounds and then measured the expression of a gene.
My data looks like this where DV is a gene expression value:

     WID PERT_ID     DV PERT_DOSE
1 X1_A09  CMPD_1 0.3979     10.00
2 X1_A10  CMPD_1 0.0806      2.50
3 X1_A19  CMPD_2 4.8671     10.00
4 X1_A20  CMPD_2 4.2285      2.50
5 X1_B09  CMPD_1 0.7117      0.63
6 X1_B10  CMPD_1 0.6214      0.16
...

ezPrecis gives:
             type missing values         min          max
WID        factor       0     96      X1_A09          X3_N16
PERT_ID    factor       0      2      CMPD_1          CMPD_2
DV        numeric       0     94       -1.9671        5.1095
PERT_DOSE numeric       0      4        0.16          10

ezDesign gives a plot (not preserved here) indicating a balanced design, but when I run ezANOVA, I get:
"Error in ezANOVA_main(data = data, dv = dv, wid = wid, within = within,  : 
One or more cells is missing data. Try using ezDesign() to check your data."

So, I don't understand why ezANOVA reports missing data while ezDesign indicates that all groups have the same number of measurements.


ezANOVA(
data = df, 
wid = .(WID), 
dv = .(DV), 
between = .(PERT_ID), 
within = .(PERT_DOSE) 
)
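
One way to see what ezANOVA may be objecting to is to count observations per WID and dose, rather than per compound and dose. The sketch below uses base R only, and the `toy` data frame is hypothetical (made-up wells and values, not the poster's data). The ezPrecis output above lists 96 distinct WID values, so if each well was measured at a single dose, every WID would be missing the other three doses, which a `within` factor cannot tolerate.

```r
# Hypothetical toy data: 8 wells, each treated with one compound at one dose
toy <- data.frame(
  WID       = factor(paste0("well", 1:8)),
  PERT_ID   = factor(rep(c("CMPD_1", "CMPD_2"), each = 4)),
  PERT_DOSE = factor(rep(c(0.16, 0.63, 2.5, 10), times = 2)),
  DV        = c(0.62, 0.71, 0.08, 0.40, 3.9, 4.1, 4.2, 4.9)
)

# Compound x dose looks perfectly balanced: one observation per cell
by_compound <- table(toy$PERT_ID, toy$PERT_DOSE)
by_compound

# A within factor needs every WID to appear at every dose
cells <- table(toy$WID, toy$PERT_DOSE)
n_empty <- sum(cells == 0)           # number of empty WID-by-dose cells
doses_per_wid <- rowSums(cells > 0)  # how many doses each WID was measured at
n_empty
doses_per_wid
```

If every WID turns out to appear at only one dose, treating PERT_DOSE as a second `between` factor may be closer to the actual design; that is a suggestion, not something confirmed in this thread.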



R version 2.15.0 (2012-03-30)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ez_3.0-1         stringr_0.6      scales_0.2.1     reshape2_1.2.1  
 [5] RCurl_1.91-1     bitops_1.0-4.1   plyr_1.7.1       memoise_0.1     
 [9] mgcv_1.7-16      lme4_0.999375-42 Matrix_1.0-6     lattice_0.20-6  
[13] ggplot2_0.9.1    car_2.0-12       nnet_7.3-1       MASS_7.3-17     

laurie

Sep 12, 2013, 7:24:43 AM
to ez...@googlegroups.com
did you find a solution to your problem ?
i am having the same problem here...

i get the "One or more cells is missing data. Try using ezDesign() to check your data." error, but i tried running ezDesign on all the data and on all the data with NaN responses removed, and ezPrecis as well, and i didn't see any missing data.

it is driving me nuts. i just want to do a mixed model n-way ANOVA and have been struggling for two days now with this dataset. first i tried running the mixed ANOVA in Matlab using anovan, which is my usual route for this test, but the p-values were all NaNs except for Subjects, which was 0. Degrees of freedom for the error were satisfactory, so i couldn't see what the "real" problem could be. I tried asking a question on the Mathworks Q&A but received no answer, so i assumed i must have made a mistake specifying the model and decided to give it a go in R. i tried using aov(), which returned weird results. because it is tricky to specify a model using aov() and i was reading a lot of conflicting information on how to do this on the forums, i once again assumed my formula was wrong and installed the ez package. This is where i got the "One or more cells is missing data. Try using ezDesign() to check your data." error. i tried aggregating the dataset myself before running ezANOVA and still got a "missing data" error even though there was no missing data.

Here is a sample of the dataset :

     Age   RTs      Race Sex Emotion Subjects
...
1122   5  2039 Caucasian   F       N     1450
1123   5  2306   Chinese   M       A     1450
1124   5  6690   Chinese   F       N     1450
1125   5   869   Chinese   F       A     1450
1126   5  1139   Chinese   F       A     1450
1127   5  2861   Chinese   M       A     1450
1128   5  1137   Chinese   F       N     1450
1129   5  1367   Chinese   M       A     1450
1130   5  1877   Chinese   F       N     1450
1131   5  2569   Chinese   F       A     1450
1132   5  1599   Chinese   F       H     1450
1133   5  1623   Chinese   M       H     1450
1134   5  1619   Chinese   M       N     1450
1135   5  7279   Chinese   F       H     1450
1136   5  1312   Chinese   F       H     1450
1137   5  2167   Chinese   M       H     1450
1138   7  1703   Chinese   F       N     1302
1139   7  2087   Chinese   F       H     1302
1140   7   930   Chinese   M       A     1302
1141   7  1044   Chinese   M       H     1302
1142   7   862   Chinese   F       H     1302
1143   7  1610   Chinese   M       H     1302
1144   7  1235   Chinese   M       H     1302
1145   7  3410   Chinese   F       H     1302
1146   7  2152   Chinese   M       A     1302
1147   7  1472   Chinese   F       N     1302
1148   7  1227   Chinese   M       H     1302
1149   7  1301   Chinese   F       H     1302
1150   7   801   Chinese   F       H     1302
1151   7   782   Chinese   M       N     1302

...
there is a total of 5770 rows in the dataset

here is how i specified the ANOVA :

model_full = ezANOVA(
    data = data # name of the above dataframe
    , dv = log(RTs) # log of column number 2
    , wid = Subjects # column number 6. there are 63 subjects in total (in 4 age groups)
    , within = .(Race,Sex,Emotion) # column number 3 (2 possible values) ,4 (2 possible values) and 5 (3 possible values). Those are properties of the stimulus, not the subject, and they are manipulated within subject.
    , between = Age # column number 1 : Subjects are from 4 different age groups labeled 5, 7, 9 and 11. the number of subjects is not the same for all age groups but i don't see why this could cause the error
    , detailed = TRUE
    , return_aov = TRUE
    , observed = Age # i tried without this option as well with no effect
)


i am now at a loss.... don't know what to try next... any help would be highly appreciated.

thank you.

laurie

laurie bayet

Sep 12, 2013, 10:19:03 AM
to ez...@googlegroups.com
Actually i ran ezDesign() with x = Subjects and did spot missing data (of course ><)! It did not show when using ezDesign without "Subjects" as a grouping factor! For some reason Matlab's anovan had no problem with that, which is not very reassuring, but okay...

Anyway unfortunately my ANOVA troubles are not over yet.

The "problem child" was removed from the dataset. However the ANOVA i ran in Matlab with anovan still returned NaN p-values for all main effects (except "Subjects") and for all interaction effects except those that include "Age" (the between-subjects variable). So i went back to R, aov() and ezANOVA()...

i ran both ezANOVA on non aggregated data, and aov on aggregated data :

model_full = ezANOVA( # same as before
    data = data
    , dv = log(RTs)
    , wid = Subjects
    , within = .(Race,Sex,Emotion)
    , between = Age

    , detailed = TRUE
    , return_aov = TRUE
    , observed = Age
)

and

data_ag<-aggregate(cbind(log(RTs), Age) ~ Emotion + Sex + Race + Subjects, data = data, FUN = "mean") # replaces RTs by log(RTs) and aggregate the resulting data
names(data_ag)[names(data_ag)=="log(RTs)"] <- "logRTs" # change column name to avoid confusion
model_full = aov(logRTs ~ Age * Race * Sex * Emotion  + Error(Subjects/(Race * Sex * Emotion)),data=data_ag)
# Age is the only between subjects factor


i get
"Estimated effects may be unbalanced" both in the $aov output from ezANOVA and in the aov() output, so I don't know if i can trust the results?

besides, the output from ezANOVA() on unaggregated data and aov() on aggregated data are different
(sometimes drastically different, like one factor switching from highly significant to not at all)

what exactly is different between the two computations? i know the functions have different names so they should be doing something different... but that doesn't tell me what they do differently, as both perform a mixed model ANOVA?

Is it from the "type" option used in ezANOVA? i know from the ezANOVA output that i should be careful about that option since group sizes are unequal, but it doesn't help because i don't know how to choose...


does it mean something is wrong with my data?
it seems there is since i still get NaNs in the Matlab's output... did i make a mistake in aggregating the dataset?

it is quite unsettling to not know which formula or output to trust, if any, and because i don't know much about stats i like being able to rely on trusted functions... any help would be highly appreciated.

thank you.

cheers,
laurie



Bill Altermatt

Sep 12, 2013, 10:52:28 AM
to ez...@googlegroups.com
This doesn't apply directly to Laurie's question, but I thought I'd share a function I wrote that fixes the special problem of missing data in repeated-measures designs.  The problem arises because you can't simply delete rows with NAs.  If a subject responded at time 1 and time 2 but not time 3, there will be no row for time 3 and so no row to delete.  The function below is based on a solution by Michael Lawrence that he wrote in a forum entry that I can't locate anymore.  For arguments, the function accepts a dataset, subject ID (wid), the dependent variable that you'll be using in ezANOVA, and the number of time points that you expect.  Any subjects who don't have data for that number of time points will be cut.  The function returns the trimmed dataset.  Still on my todo list is adding a warning that alerts the user about how many subjects have been cut.

repeated <- function(dataset, subjID, DV, timePoints) { # subjID and DV must be entered in quotes
  temp1 <- dataset[!is.na(dataset[[DV]]), ] # removes all rows that contain an NA for that DV
  counts <- table(temp1[[subjID]]) # counts how many times each participant shows up in the resulting data
  getRid <- names(counts)[counts < timePoints] # gives you the subjIDs of any participants without enough data
  newdata <- temp1[!(temp1[[subjID]] %in% getRid), ] # removes from temp1 any Ss in getRid
  return(newdata)
}




laurie bayet

Sep 12, 2013, 11:16:34 AM
to ez...@googlegroups.com
thanks :)

don't know if that would help diagnose my problem, but here are the outputs from ezANOVA and aov. They disagree on the significance of 3 effects (marked with comments below).

What bugs me is that this is not a case of one function simply being more conservative/liberal than the other, because the "order" of significance is changed from one function to the other.
Does it indicate i screwed something up?

#$ANOVA output from ezANOVA (with ges removed for clarity)

Effect                DFn DFd          SSn      SSd            F            p p<.05
(Intercept)             1  58 1.205175e+09 93346318 748.82576786 7.529975e-35     *
Age                     3  58 2.487183e+07 93346318   5.15130487 3.167467e-03     *
Race                    1  58 1.104389e+06  5224330  12.26081498 8.970748e-04     *
Sex                     1  58 1.172192e+06 13144614   5.17224219 2.666682e-02     *
Emotion                 2 116 5.925139e+05 11629696   2.95500460 5.601189e-02        # sign. with aov
Age:Race                3  58 6.069952e+05  5224330   2.24626718 9.249314e-02
Age:Sex                 3  58 1.529862e+06 13144614   2.25014822 9.206671e-02
Age:Emotion             6 116 8.177315e+05 11629696   1.35940568 2.368739e-01
Race:Sex                1  58 4.983205e+03  5303132   0.05450098 8.162305e-01
Race:Emotion            2 116 1.005683e+05  6190050   0.94231269 3.926880e-01
Sex:Emotion             2 116 1.673415e+06  6789450  14.29542272 2.820961e-06     *
Age:Race:Sex            3  58 2.587389e+05  5303132   0.94327003 4.257275e-01
Age:Race:Emotion        6 116 5.743518e+05  6190050   1.79386831 1.062807e-01        # sign. with aov
Age:Sex:Emotion         6 116 9.646228e+05  6789450   2.74681678 1.562913e-02     *  # not sig. with aov
Race:Sex:Emotion        2 116 6.753362e+04  9251012   0.42340770 6.558195e-01
Age:Race:Sex:Emotion    6 116 4.458569e+05  9251012   0.93177918 4.750299e-01


# output from aov (strata regrouped for clarity)

                     Df  Sum Sq  Mean Sq F value   Pr(>F)
Age                   1   12.43   12.432   20.68 2.69e-05 ***
Race                  1  0.6101   0.6101  16.400 0.000149 ***
Sex                   1   0.382   0.3819   6.262   0.0151 *
Emotion               2   0.255  0.12737   4.801  0.00987 **   # only marginal with ezANOVA
Age:Race              1  0.0144   0.0144   0.387 0.536095
Age:Sex               1   0.032   0.0320   0.524   0.4720
Age:Emotion           2   0.001  0.00045   0.017  0.98328
Race:Sex              1  0.0206 0.020557   0.782    0.380
Race:Emotion          2  0.0391  0.01953   1.113   0.3319
Sex:Emotion           2  0.6879   0.3439  15.882  7.6e-07 ***
Age:Race:Sex          1  0.0014 0.001408   0.054    0.818
Age:Race:Emotion      2  0.1548  0.07739   4.410   0.0142 *    # not even marginal with ezANOVA
Age:Sex:Emotion       2  0.0646   0.0323   1.492    0.229      # significant with ezANOVA
Race:Sex:Emotion      2  0.0999  0.04993   2.020    0.137
Age:Race:Sex:Emotion  2  0.0555  0.02777   1.123    0.329


Thank you !

Cheers,
laurie



Mike Lawrence

Sep 12, 2013, 11:49:58 AM
to ez...@googlegroups.com
Hi folks,

First an apology to Doug for completely failing somehow to respond to his query. Doug, are you still with us? If so, the link to the output of your call to ezDesign no longer works, can you resend?

Laurie: yup, you need to put all the variables (inc./esp. subject) in the call to ezDesign in order to detect missing data. As for the discrepant results between aov and ezANOVA, I suspect that you're not properly using aggregate to collapse your data. Here's what ezANOVA is doing internally, using ddply from the immensely useful plyr package:

    cell_means = ddply(
        .data = my_data
        , .variables = .(Subjects,Age,Race,Sex,Emotion)
        , .fun = function(x){
            to_return = data.frame(
                value = mean(x$LRT)
            )
            return(to_return)
        }
    )

(Note: I presume you would have defined my_data$LRT = log(my_data$RTs) first.)

I haven't used aggregate for quite a while, but I believe that if you look, you should be getting different means than those obtained by the call to ddply above. This is because the " Emotion + Sex + Race + Subjects" in your call to aggregate isn't what you want; I believe that you want " Emotion * Sex * Race * Subjects", but I'm not sure if aggregate can handle that.

Finally, while you didn't provide the code you're using to call aov, the proper code would be (after creating "cell_means" as above):

    my_aov = aov(
        data = cell_means
        , formula = value ~ Age * Race * Sex * Emotion + Error( Subjects / (Race * Sex * Emotion ) )
    )

With that call, and if your data are balanced with respect to the # of Ss in the Age groups (it looks like you're treating Age as a 2-level factor from your output), then the aov results should match the ezANOVA results. If your data are not balanced, then you shouldn't use aov as it is not computing the proper sums of squares. If this is the case, you can still confirm what I've said above by removing enough Ss to balance the groups then re-generate the aov and ezANOVA output, which should match up. Then go back to the full data and stick with ezANOVA for proper sums of squares computation :O)
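
Whether the age groups are balanced can be checked by counting subjects (not rows) per group. This is a minimal base-R sketch on a hypothetical `toy` data frame; only the column names `Subjects` and `Age` are taken from the thread.

```r
# Hypothetical long-format data: several rows per subject, Age constant within subject
toy <- data.frame(
  Subjects = rep(c("s1", "s2", "s3", "s4", "s5"), each = 3),
  Age      = rep(c("5", "5", "7", "7", "7"), each = 3)
)

# Keep one row per subject before counting, otherwise trials are counted instead of Ss
per_subject <- unique(toy[, c("Subjects", "Age")])
group_sizes <- table(per_subject$Age)
group_sizes

# Balanced only if every age group has the same number of subjects
balanced <- length(unique(as.vector(group_sizes))) == 1
balanced
```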

Mike




--
Mike Lawrence
Graduate Student
Department of Psychology & Neuroscience
Dalhousie University

~ Certainty is (possibly) folly ~

laurie bayet

Sep 12, 2013, 12:08:47 PM
to ez...@googlegroups.com
ok yes thank you the aggregation has to be what went wrong..

Age is a 4-level factor, but aov() sees it as a 2-level factor so something had to go wrong in the aggregate() call.. thank you very much !
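
One plausible explanation for Age getting a single degree of freedom in the aov output (an assumption, not confirmed in this thread) is that `cbind(log(RTs), Age)` on the left-hand side of the aggregate() formula replaces a factor by its integer codes, so Age comes back as a number and aov() fits it as a slope. The sketch below uses a hypothetical `toy` data frame and base R only. It also shows that `+` on the right-hand side of an aggregate() formula already defines one cell per combination of the grouping variables.

```r
# Hypothetical toy data: 4 subjects in 2 age groups, 2 emotions, 2 trials per cell
toy <- expand.grid(trial = 1:2, Emotion = c("A", "H"), Subjects = paste0("s", 1:4))
toy$Age <- factor(ifelse(toy$Subjects %in% c("s1", "s2"), "5", "7"))
toy$RTs <- seq(800, by = 50, length.out = nrow(toy))
toy$LRT <- log(toy$RTs)

# Age on the left-hand side: cbind() turns the factor into its integer codes
bad <- aggregate(cbind(LRT, Age) ~ Emotion + Subjects, data = toy, FUN = mean)
is.factor(bad$Age)   # Age is now numeric

# Age on the right-hand side, as a grouping variable, stays a factor
good <- aggregate(LRT ~ Emotion + Subjects + Age, data = toy, FUN = mean)
is.factor(good$Age)

# Both versions produce one row per Subjects x Emotion cell with the same cell means
nrow(bad)
nrow(good)
```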



Mike Lawrence

Sep 12, 2013, 12:25:00 PM
to ez...@googlegroups.com
If your operationalization of the 4 age groups comes from binning an otherwise continuous distribution of ages, you might consider leaving age as a number and either assume linearity and stick with ezANOVA, or skip the assumption of linearity and use ezMixed, which automatically uses generalized additive modelling to handle potentially non-linear effects of continuously distributed predictor variables. Your call to ezMixed would be:

    my_mix = ezMixed(
        data = my_data
        , dv = LRT
        , random = Subjects
        , fixed = .(Age,Race,Sex,Emotion)
    )
    print(my_mix$summary)

This yields log-base-2 likelihood ratios ("bits" of evidence) associated with each effect.

Visualization can be achieved by:

    preds = ezPredict(
        fit = my_mix$models$Age
        , numeric_res = 100
    )
    ezPlot2(
        preds = preds
        , x = Age
        , ribbon = T
    )

Or, to achieve a visualization that eliminates the often-inferentially-uninteresting variance associated with the intercept, do:

    age_mean = mean(my_data$Age)
    my_data$Age = my_data$Age - age_mean
    my_mix = ezMixed(
        data = my_data
        , dv = LRT
        , random = Subjects
        , fixed = .(Age,Race,Sex,Emotion)
    )
    preds = ezPredict(
        fit = my_mix$models$Age
        , numeric_res = 100
        , zero = T
    )
    preds$cells$Age = preds$cells$Age + age_mean
    preds$boots$Age = preds$boots$Age + age_mean
    ezPlot2(
        preds = preds
        , x = Age
        , ribbon = T
    )

laurie bayet

Sep 13, 2013, 5:47:07 AM
to ez...@googlegroups.com
Thanks a lot !

As you said, i did the aggregation again using ddply, removed some subjects to balance the groups, and tried aov() again. The results are still different from ezANOVA(), but at least this time the two extra significant effects reported by aov() are reported as marginal by ezANOVA(), so the disagreement is less worrisome (but maybe it shouldn't be?) and i feel i can reasonably trust the ezANOVA() output even if i still get the weird "Estimated effects may be unbalanced" message and still have no idea if the type of SS used (the default, type = 2) is correct in this case.

Mike, the 4 age groups come from the way the subjects were tested: 11-12 year olds were tested on day 1, 9-10 year olds on day 2, etc. I want to do a group analysis instead of entering their actual age as a continuous predictor, mainly because that's what's customary in the field and i don't have the weight or good-enough reasons to do it differently. But the underlying distribution of age should be continuous because the subjects were not a priori selected, they were simply allocated.

However, when you say that using ezANOVA  "assumes linearity" for Age, is it true even if Age was entered as a factor (as in is.factor(data$Age) = TRUE)?
I don't have any hypothesis on a linear (or even monotonic) trend in the effect of age (like a monotonic and linear improvement of performance, or a linear and monotonic increase or decrease of some effects across development: there are many reasons why an effect might be stronger or weaker in one of the "middle" age groups). But i have the feeling this might not be the "linearity" you are talking about?

Does it mean using ezANOVA is wrong in this case? Would it work if i coded age with letters instead, or would it be "cheating" because the underlying distribution is continuous and the factor is supposed to be ordered?

Sorry for being such a statistical illiterate, and thanks a lot for all the help you gave me :)

cheers,
laurie



Mike Lawrence

Sep 13, 2013, 10:23:44 AM
to ez...@googlegroups.com
When age is a factor, it is *not* treated as linear by any anova method (aov, ezANOVA, etc). 

I'd still suggest trying the ezMixed/gam approach, as it may be more powerful; binning and treating a continuous variable as a factor dramatically drops power. One way to think of it is that it's actually more true to the data than your current approach of binning ages together artificially.

I'm still concerned that you're getting different results between aov and ezANOVA. Can you make the groups equal in number in the uncollapsed data (i.e. what you're passing to ezANOVA), run ezANOVA, then collapse using ddply and run aov on the result, and send back the results of both if they continue to differ? Also, send the results of running ezANOVA on the collapsed-by-ddply data.

Mike
 




laurie bayet

Sep 18, 2013, 9:41:00 AM
to ez...@googlegroups.com
Hello,

Sorry about the delay in my response.

So I removed some subjects in the uncollapsed data and performed

1) ezANOVA on uncollapsed data
2) aov on collapsed-by-ddply & pre-transformed (logged) data => different from (1)
3) ezANOVA on collapsed-by-ddply & pre-transformed (logged) data => same as (2)
4) ezANOVA on collapsed-by-ddply but not pre-transformed (logged) data => same as (1)

It seems to me that the cause of the discrepancy between aov and ezANOVA results didn't stem from the data collapsing, but from the order in which the averaging/collapsing and transforming (log) steps were done.

It seems that when ezANOVA is asked to transform the data and do an ANOVA with the result, it does the averaging step BEFORE transforming the data. Mean(log(X)) isn't equal to Log(mean(X)), hence the discrepancy in the results? Does it make sense?

It seems to me that the correct way to do the ANOVA on transformed data then would be to transform the data beforehand and only then call ezANOVA on the pre-transformed data, because averaging the data before transforming it would totally ruin the effect of the transformation on approaching normality, wouldn't it ?
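
The point that mean(log(X)) and log(mean(X)) differ can be checked on a few numbers. This is a small base-R illustration with made-up reaction times (the `rt` vector is hypothetical); for positive values that are not all equal, the mean of the logs is always the smaller of the two.

```r
# Hypothetical reaction times (ms) from one subject in one cell, with one slow trial
rt <- c(600, 700, 800, 4000)

mean_of_logs <- mean(log(rt))  # transform each trial, then average
log_of_mean  <- log(mean(rt))  # average first, then transform

mean_of_logs
log_of_mean

# Back on the millisecond scale: geometric mean vs arithmetic mean
exp(mean_of_logs)
exp(log_of_mean)
```

The slow trial pulls the arithmetic mean up much more than the geometric mean, which is why the order of the two steps can change the cell means that enter the ANOVA.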

Also, are "type 2" Sums of Squares a good idea with unbalanced groups ?


(as you say it would be better to use the "age" variable as a continuous variable and not a categorical one, but i was specifically asked to perform the analysis this way and would be afraid about the way my attempt at analysis would deal with the non linear effects of age)

Thank you very much ! I am so relieved that i am finally able to see consistent results, thanks to your help.

Best,
laurie

-------------------------------
Here are the results:

1) ezANOVA on uncollapsed data

model specification :

> model_full = ezANOVA(
+     data = databal
+     , dv = log(RTs)
+     , wid = Subjects
+     , within = .(Race,Sex,Emotion)
+     , between = Age
+     , detailed = TRUE
+     , return_aov = TRUE
+     , observed = Age
+     , type = 2
+ )


results :
significant effects are in blue
marginal effects are in green
other effects are in black

   Effect                DFn DFd          SSn      SSd           F            p p<.05          ges
 1 (Intercept)             1  48 973918133.61 72465765 645.1055907 1.756313e-29     * 8.704773e-01
 2 Age                     3  48  23133114.97 72465765   5.1076510 3.789007e-03     * 1.596333e-01
 3 Race                    1  48    716813.32  4291987   8.0165751 6.752641e-03     * 4.922122e-03
 5 Sex                     1  48    509848.45 10224812   2.3934646 1.284118e-01       3.505945e-03
 7 Emotion                 2  96    489477.96  9911987   2.3703563 9.889590e-02       3.366340e-03
 4 Age:Race                3  48    350126.47  4291987   1.3052283 2.835636e-01       2.416096e-03
 6 Age:Sex                 3  48   1231382.37 10224812   1.9268929 1.377997e-01       8.497325e-03
 8 Age:Emotion             6  96   1107113.30  9911987   1.7871101 1.097807e-01       7.639789e-03
 9 Race:Sex                1  48     10433.25  4024122   0.1244486 7.258033e-01       7.199093e-05
11 Race:Emotion            2  96    167587.44  4173293   1.9275421 1.510951e-01       1.155124e-03
13 Sex:Emotion             2  96   1550225.49  5211693  14.2776684 3.729296e-06     * 1.058432e-02
10 Age:Race:Sex            3  48    278458.96  4024122   1.1071592 3.554309e-01       1.921545e-03
12 Age:Race:Emotion        6  96    488398.02  4173293   1.8724708 9.337131e-02       3.370258e-03
14 Age:Sex:Emotion         6  96   1163647.76  5211693   3.5724215 3.079480e-03     * 8.029913e-03
15 Race:Sex:Emotion        2  96    183841.34  6641502   1.3286729 2.696539e-01       1.267015e-03
16 Age:Race:Sex:Emotion    6  96    216723.71  6641502   0.5221077 7.902821e-01       1.495532e-03


2) aov on collapsed by ddply and pre-transformed (logged) data

collapsing data & model specification :

databal$LRT<-log(databal$RTs)
cell_means = ddply(
+   .data = databal
+ , .variables = .(Subjects,Age,Race,Sex,Emotion)
+ , .fun = function(x){
+   to_return = data.frame(
+   value = mean(x$LRT)
+   )
+  return(to_return)
+  }
+ )

model_full_aov <- aov(value ~ Age * Race * Sex * Emotion  + Error(Subjects/(Race * Sex * Emotion)),data=cell_means)


results :
significant effects are in blue
marginal effects are in green
other effects are in black


                     Df  Sum Sq  Mean Sq F value   Pr(>F)
Age                   3   12.04    4.015   6.836 0.000628 ***
Race                  1  0.3843   0.3843   12.05   0.0011 **
Sex                   1  0.1374   0.1374   2.658   0.1096
Emotion               2  0.1861  0.09306   3.501   0.0341 *
Age:Race              3  0.1052   0.0351    1.10   0.3582
Age:Sex               3  0.3537   0.1179   2.280   0.0912 .
Age:Emotion           6  0.2346  0.03910   1.471   0.1963
Race:Sex              1  0.0373  0.03726   1.518    0.224
Race:Emotion          2  0.0370  0.01852   1.152   0.3204
Sex:Emotion           2  0.5941  0.29706  16.254 8.33e-07 ***
Age:Race:Sex          3  0.0906  0.03020   1.231    0.309
Age:Race:Emotion      6  0.1802  0.03003   1.868   0.0942 .
Age:Sex:Emotion       6  0.3768  0.06280   3.436  0.00407 **
Race:Sex:Emotion      2  0.1255  0.06276   2.750    0.069 .
Age:Race:Sex:Emotion  6  0.1036  0.01727   0.757    0.606

=> results still differ

3) ezANOVA on collapsed by ddply and pre-transformed (logged) data

model specification :

model_full_col = ezANOVA(
+     data = cell_means
+     , dv = value
+     , wid = Subjects
+     , within = .(Race,Sex,Emotion)
+     , between = Age
+     , detailed = TRUE
+     , return_aov = TRUE
+     , observed = Age
+     , type = 2
+ )


results :
significant effects are in blue
marginal effects are in green
other effects are in black

   Effect                DFn DFd          SSn       SSd            F            p p<.05          ges
 1 (Intercept)             1  48 3.076777e+04 28.187777 5.239338e+04 1.370438e-74     * 0.9982186128
 2 Age                     3  48 1.204378e+01 28.187777 6.836314e+00 6.284850e-04     * 0.2193482314
 3 Race                    1  48 3.843375e-01  1.530401 1.205449e+01 1.103620e-03     * 0.0069511181
 5 Sex                     1  48 1.374260e-01  2.481724 2.658009e+00 1.095751e-01       0.0024966318
 7 Emotion                 2  96 1.861121e-01  2.551639 3.501036e+00 3.407321e-02     * 0.0033781298
 4 Age:Race                3  48 1.052453e-01  1.530401 1.100316e+00 3.581927e-01       0.0019167880
 6 Age:Sex                 3  48 3.536951e-01  2.481724 2.280318e+00 9.122751e-02       0.0064416974
 8 Age:Emotion             6  96 2.345816e-01  2.551639 1.470939e+00 1.963139e-01       0.0042723356
 9 Race:Sex                1  48 3.725716e-02  1.177743 1.518450e+00 2.238563e-01       0.0006780887
11 Race:Emotion            2  96 3.703916e-02  1.543446 1.151890e+00 3.203676e-01       0.0006741238
13 Sex:Emotion             2  96 5.941167e-01  1.754526 1.625374e+01 8.326384e-07     * 0.0107045666
10 Age:Race:Sex            3  48 9.059491e-02  1.177743 1.230760e+00 3.088160e-01       0.0016499664
12 Age:Race:Emotion        6  96 1.801749e-01  1.543446 1.867768e+00 9.421212e-02       0.0032814489
14 Age:Sex:Emotion         6  96 3.767782e-01  1.754526 3.435943e+00 4.070894e-03     * 0.0068621010
15 Race:Sex:Emotion        2  96 1.255278e-01  2.191401 2.749536e+00 6.899853e-02       0.0022809704
16 Age:Race:Sex:Emotion    6  96 1.036148e-01  2.191401 7.565193e-01 6.058088e-01       0.0018870930

=> gives the same results as the call to aov

4) ezANOVA on collapsed by ddply but not pre-transformed data

collapsing data & model specification :

cell_means_2 = ddply(
+   .data = databal
+ , .variables = .(Subjects,Age,Race,Sex,Emotion)
+ , .fun = function(x){
+   to_return = data.frame(
+   value = mean(x$RTs)
+   )
+  return(to_return)
+  }
+ )

model_full_col_2 = ezANOVA(
+     data = cell_means_2
+     , dv = log(value)
+     , wid = Subjects
+     , within = .(Race,Sex,Emotion)
+     , between = Age
+     , detailed = TRUE
+     , return_aov = TRUE
+     , observed = Age
+     , type = 2
+ )


results:
significant effects are in blue
marginal effects are in green
other effects are in black

   Effect                DFn DFd          SSn      SSd           F            p p<.05          ges
 1 (Intercept)             1  48 973918133.61 72465765 645.1055907 1.756313e-29     * 8.704773e-01
 2 Age                     3  48  23133114.97 72465765   5.1076510 3.789007e-03     * 1.596333e-01
 3 Race                    1  48    716813.32  4291987   8.0165751 6.752641e-03     * 4.922122e-03
 5 Sex                     1  48    509848.45 10224812   2.3934646 1.284118e-01       3.505945e-03
 7 Emotion                 2  96    489477.96  9911987   2.3703563 9.889590e-02       3.366340e-03
 4 Age:Race                3  48    350126.47  4291987   1.3052283 2.835636e-01       2.416096e-03
 6 Age:Sex                 3  48   1231382.37 10224812   1.9268929 1.377997e-01       8.497325e-03
 8 Age:Emotion             6  96   1107113.30  9911987   1.7871101 1.097807e-01       7.639789e-03
 9 Race:Sex                1  48     10433.25  4024122   0.1244486 7.258033e-01       7.199093e-05
11 Race:Emotion            2  96    167587.44  4173293   1.9275421 1.510951e-01       1.155124e-03
13 Sex:Emotion             2  96   1550225.49  5211693  14.2776684 3.729296e-06     * 1.058432e-02
10 Age:Race:Sex            3  48    278458.96  4024122   1.1071592 3.554309e-01       1.921545e-03
12 Age:Race:Emotion        6  96    488398.02  4173293   1.8724708 9.337131e-02       3.370258e-03
14 Age:Sex:Emotion         6  96   1163647.76  5211693   3.5724215 3.079480e-03     * 8.029913e-03
15 Race:Sex:Emotion        2  96    183841.34  6641502   1.3286729 2.696539e-01       1.267015e-03
16 Age:Race:Sex:Emotion    6  96    216723.71  6641502   0.5221077 7.902821e-01       1.495532e-03

=> gives the same result as the call to ezANOVA on uncollapsed data
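
A rough sanity check on which scale each analysis ran on, using only the intercept sums of squares printed in this post. It rests on two labeled assumptions: that there are 52 subjects (DFd of 48 plus 4 age groups) times 12 within-subject cells, and that the intercept SS is roughly the number of cell means times the squared grand mean. Treat the result as an order-of-magnitude hint, not as a statement about ezANOVA's internals.

```r
# Intercept sums of squares copied from the tables in this post
ss_int_1_and_4 <- 973918133.61  # analyses (1) and (4)
ss_int_3       <- 3.076777e+04  # analysis (3), log taken before calling ezANOVA

# Assumption: 52 subjects x (2 Race x 2 Sex x 3 Emotion) cells = 624 cell means,
# and intercept SS is roughly n_cells * grand_mean^2
n_cells <- 52 * 2 * 2 * 3

implied_mean_1_and_4 <- sqrt(ss_int_1_and_4 / n_cells)
implied_mean_3       <- sqrt(ss_int_3 / n_cells)

implied_mean_1_and_4  # compare with the raw RTs (ms) in the data sample
implied_mean_3        # compare with log() of those RTs
exp(implied_mean_3)   # back-transformed to ms
```

Under these assumptions the grand mean implied by (1) and (4) is on the scale of raw RTs in milliseconds, while the one implied by (3) is on the log scale. That would suggest the log() written inside `dv` may not have been applied at all in (1) and (4), rather than being applied after averaging; running ezANOVA with `dv = RTs` and comparing the output would confirm or rule this out.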


laurie bayet

Sep 18, 2013, 10:06:31 AM9/18/13
to ez...@googlegroups.com
I ran the ANOVA again with pre-transformed, uncollapsed data and the original number of subjects (the groups are unbalanced).

The results are generally consistent with what I expected (that is, they resemble the results of previous analyses (2)/(3)), but two significant effects "appear" (one was marginal and one was "marginally marginal"), while one marginal effect "disappears" (becoming "marginally marginal").

Do you think this is just the result of increased power, or should I be worried?

thanks for any help or insight on the matter.

best,
laurie

----------------------
data collapsing and model specification:

data$LRT <- log(data$RTs)
library(ez)

model_full = ezANOVA(
    data = data
    , dv = LRT
    , wid = Subjects
    , within = .(Race,Sex,Emotion)
    , between = Age
    , detailed = TRUE
    , return_aov = TRUE
    , observed = Age
    , type = 2
)


results:
significant effects are in blue
marginal effects are in green
other effects are in black

                 Effect DFn DFd          SSn       SSd            F            p p<.05          ges
1           (Intercept)   1  58 3.684236e+04 35.176185 6.074726e+04 2.652605e-89     * 0.9981809032
2                   Age   3  58 1.332038e+01 35.176185 7.321069e+00 3.035368e-04     * 0.1983912519
3                  Race   1  58 6.101449e-01  2.021189 1.750871e+01 9.813479e-05     * 0.0090055492
5                   Sex   1  58 3.818511e-01  3.108522 7.124726e+00 9.844421e-03     * 0.0056550590
7               Emotion   2 116 2.547410e-01  2.983719 4.951867e+00 8.636226e-03     * 0.0037797253
4              Age:Race   3  58 2.254751e-01  2.021189 2.156743e+00 1.028966e-01       0.0033581848
6               Age:Sex   3  58 5.824406e-01  3.108522 3.622467e+00 1.818792e-02     * 0.0086747630
8           Age:Emotion   6 116 2.008146e-01  2.983719 1.301200e+00 2.621386e-01       0.0029908959
9              Race:Sex   1  58 2.055729e-02  1.459353 8.170214e-01 3.697916e-01       0.0003060828
11         Race:Emotion   2 116 3.906440e-02  2.056988 1.101482e+00 3.358289e-01       0.0005814797
13          Sex:Emotion   2 116 6.878913e-01  2.357585 1.692312e+01 3.557157e-07     * 0.0101414242
10         Age:Race:Sex   3  58 1.183634e-01  1.459353 1.568064e+00 2.068603e-01       0.0017628830
12     Age:Race:Emotion   6 116 2.036068e-01  2.056988 1.913671e+00 8.429552e-02       0.0030324826
14      Age:Sex:Emotion   6 116 3.057392e-01  2.357585 2.507210e+00 2.562152e-02     * 0.0045536242
15     Race:Sex:Emotion   2 116 9.986895e-02  2.869528 2.018590e+00 1.374827e-01       0.0014852206
16 Age:Race:Sex:Emotion   6 116 1.520731e-01  2.869528 1.024586e+00 4.129185e-01       0.0022649484


Mike Lawrence

Sep 18, 2013, 4:51:31 PM9/18/13
to ez...@googlegroups.com
Ah, yes, I forgot there was a log transform in there. Frankly, I'm surprised that you were able to pass "log(RT)" to ezANOVA and have it work at all! Yes, you would want to log-transform at the level of the raw data (at least, this is what folks in the human RT literature do when doing log-transforms to normalize the typically positively skewed RT distributions).
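
To illustrate why the transform belongs at the level of the raw trials, here is a minimal sketch with made-up numbers (not the data from this thread; `raw`, `subject` and `rt` are hypothetical names). The mean of the logs is generally not the log of the mean, so transforming after collapsing gives a different per-subject value:

```r
# Hypothetical raw reaction times (ms) for two subjects; values are invented.
raw <- data.frame(
    subject = rep(c("s1", "s2"), each = 4)
    , rt = c(400, 450, 500, 1200, 380, 420, 460, 900)
)

# Option A: log-transform each trial, then collapse to one mean per subject.
raw$lrt <- log(raw$rt)
mean_of_logs <- aggregate(lrt ~ subject, data = raw, FUN = mean)

# Option B: collapse first, then log-transform the per-subject mean.
means <- aggregate(rt ~ subject, data = raw, FUN = mean)
log_of_means <- log(means$rt)

# Side-by-side comparison; by Jensen's inequality the log of the mean is
# never smaller than the mean of the logs, so the order of operations matters.
comparison <- data.frame(
    subject = mean_of_logs$subject
    , mean_of_logs = mean_of_logs$lrt
    , log_of_means = log_of_means
)
print(comparison)
```

Option A corresponds to creating the log column on the uncollapsed data before handing it to ezANOVA.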

You asked about sums of squares types. When data are imbalanced, your only (reasonable) options are type=2 and type=3, and it has been well argued on the r-help mailing list and elsewhere that while type=3 is what commercial stats packages (SPSS, SAS) use, type=2 makes more sense. This is why type=2 is the default.


--
Mike Lawrence
Graduate Student
Department of Psychology & Neuroscience
Dalhousie University

~ Certainty is (possibly) folly ~


Mike Lawrence

Sep 18, 2013, 5:00:10 PM9/18/13
to ez...@googlegroups.com
I'd tend to trust the analysis with more data, which in this case is the one with imbalanced groups. I certainly wouldn't worry about the "marginal" effect becoming "marginally marginal". Indeed, the concept of "marginal" effects is frankly philosophically uninformed. Attached is a presentation I gave recently that touches on this.
Bayes in brief.pdf

laurie bayet

Sep 19, 2013, 3:52:28 AM9/19/13
to ez...@googlegroups.com
Ok, thanks for the help and the slideshow :) i think i'm all set for now :)
Have a great day,
laurie

