Mixed effect model


J Martinez

Jan 7, 2015, 3:23:03 PM
to hddm-...@googlegroups.com
Hi Thomas,

Thanks for such an awesome package! I'm new to DDM, so I'm just learning the ropes and this forum and the documentation have been extremely helpful.

I have one question. I currently have a data set from a complex task-switching study with tons of variables. I am interested in the effect of both between- and within-subject variables on the parameters. The between-subject variables I have are age (Adult/Child) and memory (High/Low). The within-subject variable is condition (Shape/Pattern/InColor/OutColor). I also have a covariate called similarity.


I am trying to build my model from your example in the tutorial:

hddm.HDDMRegressor(data[data.dbs == 1], "a ~ theta:C(conf, Treatment('LC'))", depends_on={'v': 'stim'})

You split the data up by dbs on or off, but you also mention that the full model includes dbs in the interaction between theta and stim. I understand that a is being estimated as a within subject effect with LC as the reference level. But is v also being estimated within subject with depends_on = stim but just without a reference level?


How would I estimate both between and within effects on a parameter? Would I have to subset my data by each between group? Or can I add it in the model in the interaction?

I tried this:

v_reg = "v ~ similarity:C(condition, Treatment('SHAPE')):C(sub_age, Treatment('Adult')):C(memory, Treatment('low'))"
m_scm = hddm.HDDMRegressor(data, v_reg, include = ['t','a', 'v'])

but it gave me an error: NotImplementedError: Missing columns in design matrix. You need data for all conditions for all subjects.

I also tried this:

m_scm = hddm.HDDMRegressor(data, "v ~ similarity:C(condition, Treatment('SHAPE'))", include = ['t','a', 'v'], depends_on = {'v':['sub_age', 'memory']})

Which didn't give an error, but I am unsure if it is estimating v how I want it to.

Any advice would be greatly appreciated.

Joel

Thomas Wiecki

Jan 13, 2015, 7:12:59 AM
to hddm-...@googlegroups.com
Hi Joel,


On Wed, Jan 7, 2015 at 9:23 PM, J Martinez <joeledm...@gmail.com> wrote:
Hi Thomas,

Thanks for such an awesome package! I'm new to DDM, so I'm just learning the ropes and this forum and the documentation have been extremely helpful.

Glad it's useful to you.

I have one question. I currently have a data set from a complex task-switching study with tons of variables. I am interested in the effect of both between- and within-subject variables on the parameters. The between-subject variables I have are age (Adult/Child) and memory (High/Low). The within-subject variable is condition (Shape/Pattern/InColor/OutColor). I also have a covariate called similarity.

I am trying to build my model from your example in the tutorial:

hddm.HDDMRegressor(data[data.dbs == 1], "a ~ theta:C(conf, Treatment('LC'))", depends_on={'v': 'stim'})

You split the data up by dbs on or off, but you also mention that the full model includes dbs in the interaction between theta and stim. I understand that a is being estimated as a within subject effect with LC as the reference level. But is v also being estimated within subject with depends_on = stim but just without a reference level?

depends_on creates between-subject effects, not within. You could add dbs to theta as an interaction: "a ~ dbs*theta:C(conf, Treatment('LC'))"
 

How would I estimate both between and within effects on a parameter? Would I have to subset my data by each between group? Or can I add it in the model in the interaction?

Use depends_on for between subject and regressors for within subject effects.

I tried this:

v_reg = "v ~ similarity:C(condition, Treatment('SHAPE')):C(sub_age, Treatment('Adult')):C(memory, Treatment('low'))"
m_scm = hddm.HDDMRegressor(data, v_reg, include = ['t','a', 'v'])

but it gave me an error: NotImplementedError: Missing columns in design matrix. You need data for all conditions for all subjects.

Right, you're trying to do within subjects but don't have every subject tested in every condition.
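
A quick way to see where such gaps are is to list, per subject, the factor-level combinations that have no trials. The sketch below is only an illustration on hypothetical toy rows (plain dicts, so it runs without pandas or hddm); `missing_cells` is a made-up helper, not part of HDDM, and it assumes the error means what its message says.

```python
from itertools import product


def missing_cells(rows, factors, subject_key="subj_idx"):
    """Map each subject to the factor-level combinations it has no trials for."""
    # Levels of each factor as observed anywhere in the data set.
    levels = [sorted({row[f] for row in rows}) for f in factors]
    all_cells = set(product(*levels))

    # Cells actually observed for each subject.
    seen = {}
    for row in rows:
        cell = tuple(row[f] for f in factors)
        seen.setdefault(row[subject_key], set()).add(cell)

    # Keep only the subjects with at least one empty cell.
    return {
        subj: sorted(all_cells - cells)
        for subj, cells in seen.items()
        if all_cells - cells
    }


# Hypothetical toy data: subject 1 is an Adult, subject 2 is a Kid,
# and both were tested in both conditions.
rows = [
    {"subj_idx": 1, "sub_age": "Adult", "condition": "SHAPE"},
    {"subj_idx": 1, "sub_age": "Adult", "condition": "PATTERN"},
    {"subj_idx": 2, "sub_age": "Kid", "condition": "SHAPE"},
    {"subj_idx": 2, "sub_age": "Kid", "condition": "PATTERN"},
]

# Only the within-subject factor: every subject has every level.
within_only = missing_cells(rows, ["condition"])

# Adding the between-subject factor: every subject lacks the other age group.
with_between = missing_cells(rows, ["condition", "sub_age"])
```

With only `condition` the result is empty, but as soon as `sub_age` is crossed in, every subject is reported as missing the cells of the age group it does not belong to. No subject can ever fill those cells.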

HTH,
Thomas

--
You received this message because you are subscribed to the Google Groups "hddm-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hddm-users+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Thomas Wiecki, PhD
Data Science Lead, Quantopian Inc, Boston

J Martinez

Jan 13, 2015, 12:46:06 PM
to hddm-...@googlegroups.com
Thank you for your response, it cleared a lot up for me. I realize now why my between-subject variables weren't acting like dbs: dbs in your data set is a within-subject variable. For some reason I thought otherwise.

Just one more question. I'm hoping to supplement the study with this analysis as I think this is a very important piece. I ran the model with the age and memory in the depends_on call. The between subject variables only affect the intercept as it provides the following nodes:
v_Intercept(high.Adult)
v_Intercept(high.Kid)
v_Intercept(low.Adult)
v_Intercept(low.Kid)

The within subject variables affect the actual regression coefficients:
v_similarity:C(condition, Treatment('SHAPE'))[INNER_COLOR]
v_similarity:C(condition, Treatment('SHAPE'))[OUTER_COLOR]
v_similarity:C(condition, Treatment('SHAPE'))[PATTERN]
v_similarity:C(condition, Treatment('SHAPE'))[SHAPE]

What I would like to calculate is the betas dependent on age, memory, and condition. Is the only way to subset the data by each between group (Adult:Low, Adult:High, etc) and run a model on each group calculating the interaction?

Something like: hddm.HDDMRegressor(data[(data.sub_age == "Adult") & (data.memory == "high")], 'v ~ similarity: C(condition)')

I feel like maybe I'm missing an important aspect of the output or the functionality. I'm sorry if there is a simple answer to all of this, I'm just happy to learn what I can to get the most out of this package.

-Joel

Thomas Wiecki

Jan 14, 2015, 2:17:33 AM
to hddm-...@googlegroups.com
This is getting more into patsy functionality, so you might want to check here:
https://patsy.readthedocs.org/en/latest/

But perhaps instead of the ':' operator you want the '*' operator to get main effects + interaction?
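
For reference, in patsy's formula language `a:b` stands for the interaction term alone, while `a*b` is shorthand for `a + b + a:b`. The snippet below spells that rule out by hand for two predictors. It is an illustration only: `expand` is a hypothetical helper, not a patsy function, and it does not parse real formulas.

```python
def expand(left, op, right):
    """Return the model terms implied by `left op right` for two predictors."""
    interaction = f"{left}:{right}"
    if op == ":":
        # ':' keeps only the interaction term.
        return [interaction]
    if op == "*":
        # '*' is shorthand for both main effects plus the interaction.
        return [left, right, interaction]
    raise ValueError(f"unsupported operator: {op!r}")


cond = "C(condition, Treatment('SHAPE'))"

# Interaction only, as in the formula used earlier in the thread.
colon_terms = expand("similarity", ":", cond)

# Main effects of similarity and condition, plus their interaction.
star_terms = expand("similarity", "*", cond)
```

In both cases patsy also adds an intercept term unless the formula removes it explicitly.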

Thomas

J Martinez

Jan 14, 2015, 1:11:59 PM
to hddm-...@googlegroups.com
I tried this:
test = hddm.HDDMRegressor(data, 'v ~ similarity*sub_age', p_outlier = .005)

But it still spit out the incomplete design matrix error. I think I may know why: some subjects are missing one value of similarity, the covariate, making an incomplete cell. In R, I bypassed this problem by treating similarity as numeric rather than as a factor, so that the missing value would simply be predicted from the regression. I think the reason this model doesn't work is that the function is treating similarity as a factor/categorical rather than a numerical covariate. Is there any way to change that specification? I looked at the patsy documentation but found no mention of data types in the models.

similarity has a dtype of int64, but when I try to convert it to something else it gets mad: TypeError: cannot convert the series to <type 'float'>

Any ideas on how to get around this?

J Martinez

Jan 14, 2015, 1:19:14 PM
to hddm-...@googlegroups.com
Never mind,

It spits out the design matrix error even with a simple model that wouldn't have any missing cells:

test = hddm.HDDMRegressor(data, 'v ~ sub_age', p_outlier = .005)

It seems that this function doesn't allow between-subject variables within the model specification. I imagine this leaves just calculating the effect of similarity on v for each between-subject group and comparing their posterior distributions, then?

Sean Matthews

Jun 28, 2015, 3:27:20 PM
to hddm-...@googlegroups.com
Did you ever figure this out? I would like to use HDDM to analyze data from an experiment with a mixed within-between design, but I'm running into the same issues that you describe here.

I have two groups of subjects (controls and individuals with schizophrenia), and one categorical within-subjects manipulation that has 5 levels (let's just call it "relation"). I'd like to estimate a drift rate for each level of relation (my within-subjects variable), and compare these drift rates for each level of relation across the two groups. I tried specifying the model like this:

m_within_subj = hddm.HDDMRegressor(data, "v ~ C(relation, Treatment('unrelated'))", depends_on={'v': ['group']}, p_outlier = 0.05)

But it seems that from this I can only get main effects of group and relation (unless I'm missing something). I've tried looking at the patsy documentation but I don't see anything about how to specify mixed within-between designs.

-Sean

J Martinez

Jun 28, 2015, 5:12:44 PM
to hddm-...@googlegroups.com
Yup, that's exactly the issue I had! I don't think HDDM can interact between- and within-subject variables. None of the variations I tried ever led to a v for each combination of group and within-subject level, just main effects. I ended up just subsetting the data and running the within-subject model inside each group. I'm not sure whether this leads to similar results as interacting them would, or whether there is a faux pas in doing it this way, especially when you have a lot of groups and it's not guaranteed that that particular model is the best fit for all of them.
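
The subsetting workaround can be sketched roughly as follows. This is a minimal illustration on hypothetical toy rows held as plain dicts; `split_by_group` and `fit_each_group` are made-up helpers, and `fit` is a placeholder callable standing in for whatever builds and samples the within-subject model on one subset (for example a wrapper around `hddm.HDDMRegressor(subset, "v ~ similarity:C(condition)")`).

```python
from collections import defaultdict


def split_by_group(rows, group_keys):
    """Partition rows by their combination of between-subject levels."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in group_keys)].append(row)
    return dict(groups)


def fit_each_group(rows, group_keys, fit):
    """Call `fit` once per between-subject group and collect the results."""
    return {
        key: fit(subset)
        for key, subset in split_by_group(rows, group_keys).items()
    }


# Hypothetical toy data: one row per trial.
rows = [
    {"subj_idx": 1, "sub_age": "Adult", "memory": "high"},
    {"subj_idx": 2, "sub_age": "Adult", "memory": "low"},
    {"subj_idx": 3, "sub_age": "Kid", "memory": "high"},
    {"subj_idx": 4, "sub_age": "Adult", "memory": "high"},
]

# `len` stands in for a real model fit so the sketch runs on its own.
trials_per_group = fit_each_group(rows, ["sub_age", "memory"], fit=len)
```

With a pandas DataFrame the same split is available through `data.groupby(['sub_age', 'memory'])`. Note that models fitted this way share no parameters across groups, so any comparison is between independently fitted posteriors.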

Thomas Wiecki

Jun 29, 2015, 1:54:15 AM
to hddm-...@googlegroups.com
Would this work? By default, patsy adds a column of ones (the intercept) to the design matrix. If you don't want that, you have to explicitly add a 0 to the string:
m_within_subj = hddm.HDDMRegressor(data, "v ~ 0 + group + C(relation, Treatment('unrelated'))", p_outlier = 0.05)


John Clithero

Jun 29, 2015, 11:24:44 AM
to hddm-...@googlegroups.com
Hopefully I am not throwing a wrench in things by chiming in, but if I understood the questions, it seemed like they wanted something closer to "v ~ 0 + group*C(relation,Treatment('unrelated'))" for the patsy specification?

Thomas Wiecki

Jun 29, 2015, 2:27:34 PM
to hddm-...@googlegroups.com
Right, that would also give the interaction.

Sean Matthews

Jun 29, 2015, 8:52:09 PM
to hddm-...@googlegroups.com
I just tried the patsy specifications that each of you suggested and I still get the exact same error complaining about missing columns in the design matrix.

J Martinez

Jun 29, 2015, 9:05:10 PM
to hddm-...@googlegroups.com
I think the matrix error is from having the between subject variable inside of that specification.
Let's say your data looked like this:

subj_idx    group    within
1           A        D
1           A        E
2           B        D
2           B        E
3           A        D
3           A        E
4           B        D
4           B        E


Putting group inside the formula ("v ~ group") instead of in the depends_on argument makes the model look for both A and B for subject 1, because it thinks group is a repeated measure, and it errors out. That's why group has to go into depends_on. However, the depends_on specification doesn't allow an interaction between groups and any within-subject variable, which is why I think patsy/HDDM can't handle these kinds of designs.

At least that's what I think is going on. 
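
Under that reading, a column is between-subject when it takes a single value within every subject, and within-subject when it varies inside at least one subject. The sketch below applies that rule to the toy table from this post. `classify_columns` is a hypothetical helper written for illustration, not something HDDM or patsy provides.

```python
def classify_columns(rows, columns, subject_key="subj_idx"):
    """Label each column 'between' or 'within' from how it varies per subject."""
    # Collect the set of values each subject takes in each column.
    levels_per_subject = {}
    for row in rows:
        per_col = levels_per_subject.setdefault(row[subject_key], {})
        for col in columns:
            per_col.setdefault(col, set()).add(row[col])

    result = {}
    for col in columns:
        # A column is within-subject if any subject shows more than one value.
        varies = any(
            len(per_col[col]) > 1 for per_col in levels_per_subject.values()
        )
        result[col] = "within" if varies else "between"
    return result


# The toy table from this post: (subj_idx, group, within).
table = [
    (1, "A", "D"), (1, "A", "E"),
    (2, "B", "D"), (2, "B", "E"),
    (3, "A", "D"), (3, "A", "E"),
    (4, "B", "D"), (4, "B", "E"),
]
rows = [{"subj_idx": s, "group": g, "within": w} for s, g, w in table]

kinds = classify_columns(rows, ["group", "within"])
```

Here `group` comes out as between-subject and `within` as within-subject, which matches the split between depends_on and the regression formula described above.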

Thomas Wiecki

Jun 30, 2015, 3:20:01 AM
to hddm-...@googlegroups.com
v ~ 0 + group should work in that case.

ceyl...@gmail.com

Mar 2, 2016, 9:29:12 AM
to hddm-users
Hello, was anyone able to run a mixed model? I need to run one myself, just wanted to check if it was indeed possible.
Thank you all,
Ceyla

J Martinez

Nov 30, 2016, 11:11:46 AM
to hddm-users
Hey Ceyla,

I never got it to work. The addition of a "0" to the model did not remove the error. If anyone has figured out how to do this, I would love to know.

Best,
Joel