include order factor for within-subjects design in ezANOVA


Steffen Gauglitz

Apr 2, 2014, 8:21:43 PM
to ez...@googlegroups.com
Hello,

Thanks for this package; it makes a few things much easier to read.
I'm wondering how to most appropriately account for the "order" effect for a within-subjects design in ezANOVA.

Our experiment uses a relatively straightforward within-subjects design: each user completes one task with each of three interfaces, and we measure task completion time.

We could use a one-way repeated measures ANOVA as follows:

 aovout = aov(task_time~interface+Error(user_id/interface), data=data)

And I get the same numbers from using ezANOVA as follows:
 
 ezANOVA(data=data, within=.(interface), wid=.(user_id), dv=.(task_time))

So far so good. However, the order of the interfaces is balanced -- user 1 used the interfaces in the order A,B,C; user 2 in the order B,C,A etc -- and it is quite apparent that there is a training effect, thus it appears that order should be included in the analysis. With aov, it appears that I can include order as a factor as follows:

 aovout = aov(task_time~interface+order+Error(user_id/interface), data=data)

(at least, the output looks like what I would expect)
How do I do this most appropriately using ezANOVA? I've played with several of the input arguments, but no luck so far. I'm not entirely sure what the most accurate terminology for the "order" factor is, either -- it's not 'within' subjects, it's not 'between', it's not 'nested' either...
The reason I'd like to use ezANOVA here is that it includes more information, like Mauchly's sphericity test, which the above aov command doesn't give me.

Any help appreciated. Please pardon inaccuracies in my terminology; if anything is unclear please let me know.

Thanks!
 Steffen

Mike Lawrence

Apr 2, 2014, 8:34:04 PM
to ez...@googlegroups.com
It would seem that order is a between-Ss variable, so this should work:

ezANOVA(
    data=data
    , within=.(interface)
    , between = .(order)
    , wid=.(user_id)
    , dv=.(task_time)
)


--
Mike Lawrence
Graduate Student
Department of Psychology & Neuroscience
Dalhousie University

~ Certainty is (possibly) folly ~


Steffen Gauglitz

Apr 3, 2014, 1:42:18 PM
to ez...@googlegroups.com
Hello Mike,

Thanks very much for your prompt help!
I've tried that version, but unfortunately it doesn't appear to work for me, or maybe I'm not getting it right:

> ezANOVA(data=data, within=.(interface), between=.(order), wid=.(user_id), dv=.(task_time))
Warning: The column supplied as the wid variable contains non-unique values across levels of the supplied between-Ss variables. Automatically fixing this by generating unique wid labels.
Error in ezANOVA_main(data = data, dv = dv, wid = wid, within = within,  :
  One or more cells is missing data. Try using ezDesign() to check your data.
Calls: ezANOVA -> ezANOVA_main
Execution halted

Maybe I should clarify the coding of my factors -- here are the first six rows of data:

user_id    interface    task_time    order
u25    A    483.424    p1
u25    B    331.017    p2
u25    C    347.248    p3
u26    A    496.941    p1
u26    B    342.496    p3
u26    C    443.072    p2
...

My understanding of a classic "between" factor would be a grouping (e.g., treatment 1 vs treatment 2 vs control), for example that user 25 is "treatment 1", user 26 is "treatment 2" etc, which isn't exactly the case here. It appears that that's what the error is telling me, if I understand it correctly.
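
One way to see why ezANOVA objects to order as a between-Ss variable is to count how many distinct order levels each user has. The following is a minimal sketch in base R that re-enters the six rows shown above (the data frame name dat and the helper variable levels_per_user are assumptions made for illustration). A true between-Ss factor would have exactly one level per user.

```r
# Re-enter the six rows quoted in the post (name `dat` is hypothetical).
dat <- data.frame(
  user_id   = rep(c("u25", "u26"), each = 3),
  interface = rep(c("A", "B", "C"), times = 2),
  task_time = c(483.424, 331.017, 347.248, 496.941, 342.496, 443.072),
  order     = c("p1", "p2", "p3", "p1", "p3", "p2"),
  stringsAsFactors = FALSE
)

# Count the distinct levels of `order` observed within each user.
levels_per_user <- tapply(dat$order, dat$user_id,
                          function(x) length(unique(x)))
print(levels_per_user)

# A between-Ss factor is constant within a user; this one is not.
is_between <- all(levels_per_user == 1)
print(is_between)
```

Because every user passes through all three positions, the position coding varies within users and cannot serve as a grouping variable.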

I guess I could code order as a grouping factor in the sense above as follows:

user_id    interface    task_time    order
u25    A    483.424    ABC
u25    B    331.017    ABC
u25    C    347.248    ABC
u26    A    496.941    ACB
u26    B    342.496    ACB
u26    C    443.072    ACB
...

However, that is not the same, semantically: In the first version, it retains the information that u25 and u26 used A first ('p1'), while in the second version, they are in completely separate groups...
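
The grouping code can be derived from the position code rather than typed by hand. This is a rough sketch in base R, assuming the six rows above are held in a data frame called dat (a hypothetical name) and that the position labels sort correctly as text (p1, p2, p3).

```r
# Six rows from the post, with the position coding (name `dat` is hypothetical).
dat <- data.frame(
  user_id   = rep(c("u25", "u26"), each = 3),
  interface = rep(c("A", "B", "C"), times = 2),
  task_time = c(483.424, 331.017, 347.248, 496.941, 342.496, 443.072),
  order     = c("p1", "p2", "p3", "p1", "p3", "p2"),
  stringsAsFactors = FALSE
)

# For each user, list the interfaces sorted by position and glue them together.
group_of <- sapply(split(dat, dat$user_id), function(d) {
  paste(d$interface[order(d$order)], collapse = "")
})

# Map the per-user label back onto every row of that user.
dat$order_group <- unname(group_of[dat$user_id])
print(dat)
```

The derived column is constant within each user, which is what makes it usable as a between-Ss factor, at the cost of no longer recording which session happened at which position.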

Any suggestions?

Thanks!
 Steffen

Mike Lawrence

Apr 3, 2014, 2:25:13 PM
to ez...@googlegroups.com
On Thu, Apr 3, 2014 at 2:42 PM, Steffen Gauglitz <sgau...@cs.ucsb.edu> wrote:
> user_id    interface    task_time    order
> u25    A    483.424    ABC
> u25    B    331.017    ABC
> u25    C    347.248    ABC
> u26    A    496.941    ACB
> u26    B    342.496    ACB
> u26    C    443.072    ACB
> ...
>
> However, that is not the same, semantically: In the first version, it retains the information that u25 and u26 used A first ('p1'), while in the second version, they are in completely separate groups...

I believe this is what you want. That, or add a variable called task_number that labels the order of the current task for each subject, eg:

user_id    interface    task_time    task_number
u25    A    483.424    1
u25    B    331.017    2
u25    C    347.248    3
u26    A    496.941    1
u26    B    342.496    3
u26    C    443.072    2


In which case you'd probably want to convert task_number to a factor, lest you assume linearity that may not hold.
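
To illustrate the linearity point, here is a small base R sketch using the six rows above (the data frame name dat and the plain lm() fits are assumptions chosen only to count coefficients, not a recommended analysis). A numeric task_number gets a single slope, while a factor gets one coefficient per non-reference level.

```r
# Six rows from the table above (name `dat` is hypothetical).
dat <- data.frame(
  user_id     = rep(c("u25", "u26"), each = 3),
  interface   = rep(c("A", "B", "C"), times = 2),
  task_time   = c(483.424, 331.017, 347.248, 496.941, 342.496, 443.072),
  task_number = c(1, 2, 3, 1, 3, 2),
  stringsAsFactors = FALSE
)

# Numeric coding: intercept + one linear slope.
n_numeric <- length(coef(lm(task_time ~ task_number, data = dat)))

# Factor coding: intercept + one coefficient each for positions 2 and 3.
dat$task_number <- factor(dat$task_number)
n_factor <- length(coef(lm(task_time ~ task_number, data = dat)))

print(c(numeric = n_numeric, factor = n_factor))
```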

Mike

Steffen Gauglitz

Apr 3, 2014, 2:30:46 PM
to ez...@googlegroups.com
hm... isn't the latter exactly what I currently have?
How would you integrate the 'task_number' in the ezANOVA call?

Thanks again,
 Steffen

Mike Lawrence

Apr 3, 2014, 2:34:34 PM
to ez...@googlegroups.com
Oh, sorry. "task_number" would be included as a within variable:

ezANOVA(
    data=data
    , within=.(interface,task_number)
    , wid=.(user_id)
    , dv=.(task_time)
)



Steffen Gauglitz

Apr 3, 2014, 3:51:25 PM
to ez...@googlegroups.com
Including it as a within variable results in a different error:

> ezANOVA(data=data, within=.(interface,task_number), wid=.(user_id), dv=.(task_time))
Warning: "task_number" will be treated as numeric.

Error in ezANOVA_main(data = data, dv = dv, wid = wid, within = within,  :
  One or more cells is missing data. Try using ezDesign() to check your data.
Calls: ezANOVA -> ezANOVA_main
Execution halted

(Exactly the same, without the treated-as-numeric warning, for my "order" factor.)
Doesn't this syntax assume that I have data for each combination of interface and task number, per user?
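
The missing cells can be made visible with a cross-tabulation. Below is a minimal base R sketch on the six rows quoted earlier in the thread (the names dat, cells and filled are hypothetical). A fully crossed within-subjects design with two three-level factors would need 3 x 3 = 9 filled cells per user; the error message's suggested ezDesign() gives a graphical view of the same thing.

```r
# Six rows from earlier in the thread (name `dat` is hypothetical).
dat <- data.frame(
  user_id     = rep(c("u25", "u26"), each = 3),
  interface   = rep(c("A", "B", "C"), times = 2),
  task_time   = c(483.424, 331.017, 347.248, 496.941, 342.496, 443.072),
  task_number = c(1, 2, 3, 1, 3, 2),
  stringsAsFactors = FALSE
)

# Count observations in every user x interface x task_number cell.
cells <- table(dat$user_id, dat$interface, dat$task_number)

# Number of non-empty interface x task_number cells per user.
filled <- apply(cells > 0, 1, sum)
needed <- prod(dim(cells)[2:3])

print(filled)
print(needed)
```

Each user contributes only one position per interface, so most of the cells a two-factor within-subjects analysis expects are empty.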

I've also tried the "grouping" version, i.e. order_group assumes one of six values of the kind "ABC" etc., then calling:
 ezANOVA(data=data, within=.(interface), between=.(order_group), wid=.(user_id), dv=.(task_time))

That works (as in, doesn't result in an error), and the numbers are the same when I use:
 aovout = aov(task_time~interface*order_group+Error(user_id/interface), data=data)

However, they are (obviously, I guess) not the same as from my original version:
 aovout = aov(task_time~interface*order+Error(user_id/interface), data=data)

I'm not sure if I can articulate this well, but it appears to me that using the "order" factor would be preferable -- because it retains the notion that "this session" was executed at "this position" (rather than "this user" was subject to treatment "ABC" for all three sessions).

For example, after the aov(...order...) call, a post-hoc analysis clearly shows the (not necessarily desirable, but expected) training effect (sessions executed last were significantly faster than those executed 2nd and 1st), while the post-hoc analysis after the aov(...order_group...) call gives me 15 pairwise comparisons between the possible orders, which is much harder to interpret (there is no significant effect between CAB and ACB...), and obviously, each of them is based on much less data...

Does that make sense? Is there any way to execute it this way? What kind of a factor is "order" then, if it's neither within nor between?
Or, maybe: can somebody explain to me why using the grouping factor is indeed the preferable way here, if it is?

Thanks,
 Steffen

Mike Lawrence

Apr 5, 2014, 11:15:24 AM
to ez...@googlegroups.com
Sorry, that was a major brain-freeze on my part.

Yes, my original suggestion of having order as a between-Ss variable is the best you can do with traditional ANOVA. With a mixed effects model, you could use task_number and permit it to interact with interface, as:

    library(lme4)
    fit = lmer(
        data = data
        , formula = task_time ~ (interface | user_id) + interface*task_number
    )
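
For anyone wanting to try the mixed-model route on a small example, the sketch below simulates a counterbalanced data set and fits it. Everything in it is an assumption made for illustration: the simulated task times, the names toy and orders, and in particular the random-intercept term (1 | user_id), which is a simplification of the formula above. With only one observation per user and interface, a per-user random slope for interface may not be estimable, so the simpler term is used here. The fit is skipped if lme4 is not installed.

```r
set.seed(1)  # reproducible simulated values (not real data)

# Twelve hypothetical users, two per possible ordering of A, B, C.
orders <- rep(c("ABC", "ACB", "BAC", "BCA", "CAB", "CBA"), times = 2)
toy <- do.call(rbind, lapply(seq_along(orders), function(i) {
  seq_used <- strsplit(orders[i], "")[[1]]  # interfaces in the order used
  data.frame(
    user_id     = paste0("u", i),
    interface   = c("A", "B", "C"),
    task_number = match(c("A", "B", "C"), seq_used),  # position of each interface
    stringsAsFactors = FALSE
  )
}))

# Simulated training effect: later sessions are faster, plus noise.
toy$task_time <- 450 - 40 * toy$task_number + rnorm(nrow(toy), sd = 30)

# Treat all design variables as factors, so no linearity is assumed.
toy$interface   <- factor(toy$interface)
toy$task_number <- factor(toy$task_number)
toy$user_id     <- factor(toy$user_id)

if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- lme4::lmer(task_time ~ interface * task_number + (1 | user_id),
                    data = toy)
  print(anova(fit))
}
```

Unlike the ezANOVA call, this model does not require every user to have data in every interface by task_number cell; it only needs each combination to occur somewhere in the data.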



Steffen Gauglitz

Apr 7, 2014, 2:31:25 PM
to ez...@googlegroups.com
Ok, thanks a lot for the clarification. I might look into lme4 for a mixed effects model (or be satisfied with the between-Ss solution).

Thanks again!