Difficulties in specifying model with different parameter values for within conditions

Maximilian Theisen

unread,
Mar 10, 2021, 8:19:53 AM3/10/21
to hddm-users
I have data where all subjects participated in two tasks. I would like to model different parameters (a, v, t0) for each task per subject. I also assume that parameters (and the effect of task type) differ between subjects.

Would it be correct to include these assumptions by using "depends_on" or do I have to use HDDMRegressor? This was not clear to me after following the demo.

How would I specify such a model using HDDMRegressor?


Blair Shevlin

unread,
Mar 10, 2021, 9:32:48 AM3/10/21
to hddm-...@googlegroups.com
By default, the model will estimate separate parameters for each subject, as long as you provide separate subject numbers. Typically, the column for subject IDs is "subj_idx." Though, since this is a hierarchical Bayesian model, you will also get group-level parameters.

The keyword argument "depends_on" is more appropriate for condition-level differences. For example, if subjects completed two types of trials ("condition") that you believe would affect boundary separation (a), you would pass depends_on={'a': 'condition'} when creating the model. HDDM will then automatically estimate a separate boundary separation parameter in each condition for each subject.
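As a minimal sketch of that call (the file name and the `rt`, `response`, `subj_idx`, `condition` column names are assumptions for illustration):

```python
import hddm

# Load trial-level data; column names here are assumptions for this sketch
data = hddm.load_csv('mydata.csv')

# Boundary separation 'a' gets a separate hierarchical estimate per condition,
# for each subject and at the group level
m = hddm.HDDM(data, depends_on={'a': 'condition'})
m.sample(2000, burn=200)  # short chain, for illustration only
m.print_stats()
```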

Hope that helps!

Maximilian Theisen

unread,
Mar 10, 2021, 1:10:01 PM3/10/21
to hddm-users
Thanks for your reply! So my intuition to use "depends_on" was correct in my case. Could you explain what modeling within-subject effects would add, and how I would do that?

Blair Shevlin

unread,
Mar 10, 2021, 1:46:23 PM3/10/21
to hddm-...@googlegroups.com
Glad my previous reply was helpful. And yes, your intuition is correct. Sorry for not making that clear.

Models that include within-subject effects are useful in that they allow your model to account for subject-level differences beyond condition-level effects. The logic here is somewhat analogous to mixed-effects regression models: by allowing for subject-level differences in a baseline condition, the model can more accurately estimate changes in the comparison condition. In other words, the model can better capture condition-level differences that would otherwise be obscured by individual variability among subjects.

As for implementation, I recommend checking out the tutorial if you haven't already (http://ski.clps.brown.edu/hddm_docs/tutorial_python.html#within-subject-effects). Using the same example as before, where different conditions affected boundary separation, you might do something like this: hddm.HDDMRegressor(data, "a ~ C(condition, Treatment('baseline'))"). Though, you'd need to define the 'baseline' condition in your data beforehand.
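Put together, a within-subject version of the boundary-separation example might look like the sketch below (the 'baseline' level and the column names are assumptions; see the linked tutorial for the canonical version):

```python
import hddm

# Assumed columns: rt, response, subj_idx, condition (with a 'baseline' level)
data = hddm.load_csv('mydata.csv')

# Treatment coding: the intercept is 'a' in the 'baseline' condition, and the
# second coefficient is each subject's shift from baseline in the other condition
m = hddm.HDDMRegressor(data, "a ~ C(condition, Treatment('baseline'))")
m.sample(2000, burn=200)  # use far more samples for a real analysis
m.print_stats()
```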

Maximilian Theisen

unread,
Mar 11, 2021, 3:39:12 AM3/11/21
to hddm-users
Thank you very much, that solved my problem :)

Maximilian Theisen

unread,
Mar 11, 2021, 3:40:34 AM3/11/21
to hddm-users
Just to make sure: if I implemented within-subject effects, would I still have to use "depends_on"?

Blair Shevlin

unread,
Mar 11, 2021, 9:05:22 AM3/11/21
to hddm-...@googlegroups.com
If you are using HDDMRegressor to estimate a parameter, you do not need to use the "depends_on" keyword argument for that parameter. However, if you want to estimate another parameter outside of the regression you would.

For example, let's assume we think that boundary separation (a) varies by condition like we discussed before but the starting point (z) varies with some other variable. We could code something up like this:
hddm.HDDMRegressor(data, "a ~ C(condition, Treatment('baseline'))", depends_on={'z': 'other_condition_variable'})

Maximilian Theisen

unread,
Mar 11, 2021, 10:31:07 AM3/11/21
to hddm-users
Thank you!!

Thomas Carsten

unread,
May 6, 2021, 8:57:53 AM5/6/21
to hddm-users
Dear Maximilian, dear Blair,

Thank you for the discussion so far; it has been really helpful to me. I was confused about the same thing, i.e. how to implement a model where all four parameters (v, a, z, t) are allowed to vary between two within-subject conditions. After your discussion I am close to a solution; however, one difficulty remains in adding the response bias "z" to the model properly.

About the task: it is a simple decision-making task (think random dot motion), where I compare blocks with speed instructions ("fast choice") and blocks with accuracy instructions ("accurate choice"). 60 participants completed 160 trials of each block. My data file is organized so that in a "response" column, leftward responses are coded as 0 and rightward responses as 1. If I understand correctly, this is "stimulus coding". I use stimulus coding because I want to add the response bias z to the model and allow it to vary between fast and accurate choice blocks.

So far, I have defined regressors for v, a and t as this: 

v_reg = {'model': "v ~ C(block, Treatment('accurate_decision'))", 'link_func': lambda x: x}
a_reg = {'model': "a ~ C(block, Treatment('accurate_decision'))", 'link_func': lambda x: x}
t_reg = {'model': "t ~ C(block, Treatment('accurate_decision'))", 'link_func': lambda x: x}
reg_descr = [v_reg, a_reg, t_reg]

I can then define the model without the response bias as 

m_vat = hddm.HDDMRegressor(df, reg_descr, group_only_regressors=False, keep_regressor_trace = True, p_outlier=0.05)

The problem that I am facing right now is adding the parameter z correctly to this within-subjects model. According to the tutorial, I have to define a link-function for the z-parameter first:

def z_link_func(x, data=df):
    # Code the stimulus as +1/-1 so the bias is mirrored across stimuli
    stim = np.asarray(dmatrix('0 + C(s, [[1], [-1]])',
                              {'s': data.stimulus.loc[x.index]}))
    # Inverse-logit keeps the starting point z in (0, 1)
    return 1 / (1 + np.exp(-(x * stim)))

Then I can define the regressor for the z-parameter as:

z_reg = {'model': "z ~ C(block,Treatment('slow_decision'))", 'link_func': z_link_func}

My questions are:

1) For the link function, do I have to use the stimulus column (which indicates whether a leftward or rightward dot motion was shown) or the response column (which indicates whether participants responded left or right)?
2) Which entries should be coded as 1 and which as -1 in the link function? Or is it arbitrary?
3) Is there any way to add the z parameter and allow it to vary between conditions without defining a link function?
4) How do I properly recover z parameter values for this within-subject design after model fitting?

These questions arise because I already tried to fit a full model with all four parameters (v, a, z, t) just as described above. However, I am unsure whether I did everything correctly or whether I should include the z parameter in a different way. I noticed that the traces for all parameters seemed fine, with the exception of the z parameter, whose traces looked messy and showed a lot of autocorrelation.

Thank you for any advice. Have a great day! 
Thomas 

Blair Shevlin

unread,
May 6, 2021, 10:11:52 AM5/6/21
to hddm-...@googlegroups.com
Hi Thomas,

Glad my previous email was helpful. Here are my thoughts on your subsequent questions:

1. Your link function should use the stimulus information, i.e. the data.stimulus column.
2. It's arbitrary which condition is coded [1] and [-1], as long as you are consistent.
3. You don't have to use this regression approach. You could instead use the "depends_on" mechanism, for example adding depends_on={'z': 'your_condition_column'} to your model call.
4. I'm not exactly certain what you mean by recovering the z parameter. When I think of parameter recovery, I typically think of simulating data with known parameters, fitting a model to that simulated data, and checking whether I recover the parameters that generated it. I think what you mean is parameter convergence (though correct me if I'm wrong!). If a parameter struggles to reach a stable posterior distribution, as indicated by the marginal posterior plots from the plot_posteriors() function, you might need to increase your samples/burn-in; I typically draw 20,000 samples with a burn-in of 15,000. Regarding autocorrelation, I'd recommend increasing thin to 2-5 in your sampling call, for example: m.sample(nsample, burn=nburn, thin=5).
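To see why the coding in point 2 is arbitrary, here is a small numpy-only sketch of the inverse-logit link (no HDDM needed; `z_link` is just an illustrative name): flipping the stimulus code simply mirrors the implied bias around 0.5.

```python
import numpy as np

def z_link(x, stim):
    # x: regression output on the logit scale; stim: +1 or -1 stimulus code
    return 1.0 / (1.0 + np.exp(-(x * stim)))

x = 0.4
z_plus = z_link(x, 1)    # bias implied for the stimulus coded +1
z_minus = z_link(x, -1)  # bias implied for the stimulus coded -1
# The two codings mirror each other around 0.5 (z_minus == 1 - z_plus),
# so swapping which stimulus gets +1 only relabels the bias direction.
```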

Hope this helps!

Thomas Carsten

unread,
May 21, 2021, 5:02:45 AM5/21/21
to hddm-users
Dear  Blair,

Thank you again; your feedback was very helpful. I have now managed to fit an accuracy-coded model that allows a and t to vary between conditions and includes an additional z parameter, which does not vary between conditions. Here's the code:

# ID of the model, in case I want to run multiple ones later
id = 1

# Accuracy code the data
df = df.rename(columns = {'accuracy_coding': 'response'}, inplace = False)

# Defining the link function for z
def z_link_func(x, data=df):
    # Code the stimulus as -1/+1 so the bias is mirrored across stimuli
    stim = np.asarray(dmatrix('0 + C(s, [[-1], [1]])',
                              {'s': data.stimulus.loc[x.index]}))
    # Inverse-logit keeps the starting point z in (0, 1)
    return 1 / (1 + np.exp(-(x * stim)))
 
# Defining the regressors
z_reg = {'model': "z ~ 1", 'link_func': z_link_func}
a_reg = {'model': "a ~ C(block,Treatment('slow_movement'))", 'link_func': lambda x: x}
t_reg = {'model': "t ~ C(block,Treatment('slow_movement'))", 'link_func': lambda x: x}
reg_descr = [z_reg, a_reg, t_reg]

m_zat = hddm.HDDMRegressor(df, reg_descr, include = 'z', group_only_regressors=False, keep_regressor_trace = True, p_outlier=0.05)
m_zat.find_starting_values()
m_zat.sample(11000, burn=5000, thin = 3, dbname="filepath\\movement_model_zat_traces%i"%id, db='pickle')
m_zat.save("filepath\\movement_zat_model%i"%id)

Here are the traces of the model: zat_model_traces.png
Any idea what's going on here? z is clearly not working, but I can't figure out what's going wrong. Regarding your tip to use the "depends_on" argument: will it still capture the z parameter as a within-subject effect, i.e. allow estimating the change in z from one condition to another within each subject? That would be my ultimate goal; as a first step I am just trying to get a model to work with z fixed between conditions. What I meant by parameter recovery (an unlucky expression) is how to reverse the link function so that I can interpret z properly. But I already found the solution:

z = 1 / (1 + np.exp(-z_intercept.trace()))
z.mean()

However, according to this, my z is around 0.7, but it should be slightly above 0.5 (~0.53 for example). 
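For reference, the inverse-logit transform can be checked with plain numpy (`inv_logit` is just an illustrative name, not an HDDM function):

```python
import numpy as np

def inv_logit(v):
    # Map a z intercept from the logit scale back into (0, 1)
    return 1.0 / (1.0 + np.exp(-np.asarray(v, dtype=float)))

# An intercept of 0 on the logit scale corresponds to an unbiased z = 0.5;
# a z of about 0.53 corresponds to an intercept of only about 0.12,
# whereas z around 0.7 implies an intercept near 0.85.
print(inv_logit(0.0))             # 0.5
print(np.log(0.53 / (1 - 0.53)))  # ~0.12
```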

Have a nice day,
Thomas

Blair Shevlin

unread,
May 21, 2021, 11:37:41 AM5/21/21
to hddm-...@googlegroups.com
Hi Thomas,

A couple of thoughts:

1) Your regression model for the z parameter doesn't make much sense to me, and I suspect it might be the reason why the z parameter is being poorly estimated. There's no reason to use a regression if you want to estimate a parameter that is fixed between conditions. Instead, I would run the model without z_reg (keeping include='z') and see if that parameter converges.

2) It's true that the "depends_on" approach would not estimate within-subject effects directly the way the regression does. Still, it might be a useful intermediate step before estimating the within-subjects model. You can also do a model comparison (comparing DIC values or using posterior predictive checks) to see which of the three models (fixed, condition-dependent, within-subjects) fits best. Just a thought.

Hope that helps,

Blair
