Hi Thomas and team,
Thank you for making and maintaining such a brilliant tool.
I've been looking through forum posts and published material but I've been unable to answer some questions, would you mind helping me out? I apologize if I have overlooked something simple.
I have a completely within-subjects design with factor 1 (Test, 2 levels: base, task), factor 2 (Site, 4 levels: site1, site2, site3, site4), and factor 3 (Timepoint, 4 levels: tpt1, tpt2, tpt3, tpt4). Factors 1 and 2 are fully nested within each other (i.e., base and task have all site levels, and each site has both base and task), but factor 3 of Timepoint codes for the order of experimental sessions which is different for each subject (i.e., for subj1, tpt1 is site1, tpt2 is site4, etc; for subj2, tpt1 is site3, tpt2 is site1, etc). So I've set up the following patsy model and HDDMRegressor model...
dmatrix("C(Test,Treatment('base'))*C(Site, Treatment('site1'))+C(Timepoint, Treatment('tpt1'))", data)
model = hddm.HDDMRegressor(data, {"v ~ C(Test, Treatment('base'))*C(Site, Treatment('site'))+C(Timepoint, Treatment('tpt1'))"}, keep_regressor_trace=True, p_outlier=.08, is_group_model=True)
This then gives me the following columns:
''Intercept''
"C(Test, Treatment('base'))[T.t]"
"C(Site, Treatment('site'))[T.site2]"
"C(Site, Treatment('site'))[T.site3]"
"C(Site, Treatment('site'))[T.site4]"
"C(Timepoint, Treatment('tpt1'))[T.tpt2]"
"C(Timepoint, Treatment('tpt1'))[T.tpt3]"
"C(Timepoint, Treatment('tpt1'))[T.tpt4]"
"C(Test, Treatment('base'))[T.t]:C(Site, Treatment('site'))[T.site2]"
"C(Test, Treatment('base'))[T.t]:C(Site, Treatment('site'))[T.site3]"
"C(Test, Treatment('base'))[T.t]:C(Site, Treatment('site'))[T.site4]"
So here my questions:
1). How do I interpret the different contrast vectors? Do I interpret all of them, even the interaction terms, relative to the single "Intercept" term (v_Intercept)? For example with the simple means reported in the print_stats option (although I know I will be doing Bayesian hypothesis testing on the full posteriors, but just for easy examples)...
If my v_Intercept = 1.24,
- when mean of v_C(Test, Treatment('base'))[T.task] = 0.20, then does that mean that the "task" condition = 1.44? If so, then how would I look at the "base" condition, would it be like this...v_C(Test, Treatment('base'))[T.base]?? (I'm suspicious of that, since not reported in stats output). Or is it actually the case that when v_C(Test, Treatment('base'))[T.task] = 0.20, this means that the "task" condition has drift rate that is 0.20 higher than the "base" condition drift rate? If so, how do I then figure out what is the mean (or full posterior distribution) of the "base" condition?
- similarly, for interaction terms, when the mean drift rate of C(Test, Treatment('base'))[T.t]:C(Site, Treatment('site'))[T.site2] = -0.35, is that relative to the v_Intercept term? If so, do I then interpret it as follows...the mean drift rate for "task" condition of "site1" = (1.24 + -0.35 = 0.89)?? But if it's not relative to v_Intercept, then I have no idea how to interpret this complex interaction term, since it would seem that the mean drift rate of the interaction term would be relative to two different intercepts, the intercept for the first contrast vector, C(Test, Treatment('base')) which would be "base" condition, and then the intercept for the second contrast vector, C(Site, Treatment('site')) which would be "site1".
2). When I add the Timepoint factor with its contrast vectors (e.g. v_C(Timepoint, Treatment('tpt1'))) as a "covariate" to my regression model, is it correct to interpret this inclusion as analogous to the "controlling for" aspect of standard regression? What I mean is, can I then say, the drift rates for the factor1xfactor2 interaction terms (e.g., C(Test, Treatment('base'))[T.t]:C(Site, Treatment('site'))[T.site2]) are meaningfully interpreted as "independent of" or "controlled for" the effects of timepoints? (much like an ANCOVA, where the effect of the Timepoint covariate needs to be removed)
3). Trying to understand that complex model in relation to an alternative simpler model with a single factor, e.g. Condition, with 8 levels (site1base, site1task, site2base...site4base, site4task), coded with patsy and hddm as:
dmatrix("C(Condition,Treatment('site1base'))", data)
model = hddm.HDDMRegressor(data, {"v ~ C(Condition, Treatment('site1base'))"}, keep_regressor_trace=True, p_outlier=.08, is_group_model=True)
Technically, for each individual subject, seems to me that the effect of timepoint would be inherent in this Condition coding, because I know the condition order for each subject...however, since the condition order is different across subjects yet my HDDM model analyses each Condition at a group level, I would lose the ability to interpret these Condition drift rates as "controlling for" the effects of timepoints, is that correct? In which case, these different models (model1: Test x Site + Timepoints...model2: Condition) would not produce equivalent drift rate posteriors for each basic condition (e.g., the "task" condition of "site1"), is that correct? So for my situation, do you recommend I stick with model1 since I want to "remove" or "control for" the effect of time?
My apologies for a lengthy post, I'm just very excited to figure this out so I can proceed. If you don't have much time to respond, my 1) and 2) questions are most important to me.
Thank you very much in advance,
Micah