HDDMnnRegressor with non-identity link function

Anne Urai

Mar 2, 2022, 8:42:53 AM
to hddm-users
Hi HDDMnn'ers,

Perhaps I'm a bit late to the party, but today I started playing with the HDDMnn extensions and I'm very excited about all the new possibilities! Thanks for your great work on this. Hopefully switching to pyDDM just for collapsing bounds won't be necessary anymore.

My question: can HDDMnnRegressor accept non-identity link functions?

Explanation:
I started by implementing a very basic regression model, which I usually run in the standard HDDM:

v_reg = {'model': 'v ~ 1 + stimulus + prevresp', 'link_func': lambda x: x}
z_reg = {'model': 'z ~ 1 + prevresp', 'link_func': z_link_func}
m = hddm.HDDMRegressor(data, [v_reg, z_reg],
                       include=['z', 'sv'], group_only_nodes=['sv'],
                       group_only_regressors=False, keep_regressor_trace=False,
                       p_outlier=0.05)

Now, switching to the HDDMnnRegressor module, regressing only onto drift works beautifully:

regr_md = {'model': 'v ~ 1 + stimulus + prevresp', 'link_func': lambda x: x}
# keep things as similar as possible to the usual DDM for now
hddmnn_reg = hddm.HDDMnnRegressor(data,
                                  regr_md,
                                  include = ['z'], # 'sv' is not allowed here
                                  model = 'ddm',
                                  informative = False,
                                  is_group_model = True, # hierarchical model
                                  group_only_regressors = False, # fit one parameter for each subject
                                  p_outlier = 0.05)


But adding in a regression model for starting point doesn't work:

import numpy as np

# Inverse-logit link: maps the linear predictor onto (0, 1), the valid range for z
def z_link_func(x):
    return 1 / (1 + np.exp(-(x.values.ravel())))

# TODO: transform the z-param so it can find the right bounds?
regr_md = [{'model': 'v ~ 1 + stimulus', 'link_func': lambda x: x},
           {'model': 'z ~ 1', 'link_func': z_link_func}]

# keep things as similar as possible to the usual DDM for now
hddmnn_reg = hddm.HDDMnnRegressor(data,
                                  regr_md,
                                  include = ['z'], # 'sv' is not allowed here
                                  model = 'ddm',
                                  informative = False,
                                  is_group_model = True, # hierarchical model
                                  group_only_regressors = False, # fit one parameter for each subject
                                  p_outlier = 0.05)


returns

Reg Model:
{'outcome': 'v', 'model': ' 1 + stimulus', 'params': ['v_Intercept', 'v_stimulus'], 'link_func': <function <lambda> at 0x7f12f31aa320>}
Uses Identity Link
Reg Model:
{'outcome': 'z', 'model': ' 1', 'params': ['z_Intercept'], 'link_func': <function z_link_func at 0x7f12f33d70e0>}
Does not use Identity Link
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-24-542f6154eee0> in <module>()
     16                                   is_group_model = True, # hierarchical model
     17                                   group_only_regressors = False, # fit one parameter for each subject
---> 18                                   p_outlier = 0.05)

8 frames
/usr/local/lib/python3.7/dist-packages/hddm/models/hddm_regression.py in _create_stochastic_knodes(self, include)
    452                     param_lookup = param
    453
--> 454                 reg_parents[param] = reg_family["%s_bottom" % param_lookup]
    455                 if reg not in self.group_only_nodes:
    456                     reg_family["%s_subj_reg" % param] = reg_family.pop(

KeyError: 'z_Intercept_bottom'

The problem seems to be that in hddm_regression (https://github.com/hddm-devs/hddm/blob/master/hddm/models/hddm_regression.py#L404), it's not recognized that z should be transformed when it's entered through a regression model in the nn framework (at least, that's my interpretation of the issue with looking up z_Intercept_bottom).

Can I somehow indicate in the call to HDDMnnRegressor that the z-regression dict should be transformed?

Thanks a lot for any pointers!
Anne Urai, Leiden University

Alexander Fengler

Mar 7, 2022, 4:35:34 PM
to hddm-users
Hi Anne, 

This should now be fixed, thank you for the bug report and please let me know if you encounter further issues.

As a side note on whether to use a non-identity link function for 'z' to begin with, I would like to refer you to this discussion.
(foregone conclusion: probably best to go with the identity link for 'z' for now)
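
To make the comparison between the two links concrete, here is a minimal sketch with made-up coefficients. The coefficient values and the `identity_link` and `in_bounds` helpers are assumptions for illustration, not part of HDDM. With the inverse-logit link the coefficients live on the logit scale and z always lands in (0, 1); with the identity link the coefficients are in z units directly, which is easier to interpret but only valid while the linear predictor stays inside (0, 1).

```python
import numpy as np

def z_link_func(x):
    """Inverse-logit link from the thread, extended to accept plain arrays too."""
    values = np.asarray(getattr(x, "values", x), dtype=float).ravel()
    return 1 / (1 + np.exp(-values))

def identity_link(x):
    """Identity link: the linear predictor is used as z without transformation."""
    return np.asarray(getattr(x, "values", x), dtype=float).ravel()

def in_bounds(z):
    """True if every starting point lies strictly inside (0, 1)."""
    return bool(np.all((z > 0) & (z < 1)))

# Hypothetical covariate: previous response coded as -1 / 1
prevresp = np.array([-1.0, 1.0])

# Inverse-logit link: intercept 0.0 and slope 0.4 on the logit scale
z_logit = z_link_func(0.0 + 0.4 * prevresp)

# Identity link: intercept 0.5 and slope 0.1 directly in z units
z_identity = identity_link(0.5 + 0.1 * prevresp)

# Identity link with a slope that is too large: z leaves the valid range
z_invalid = identity_link(0.5 + 0.6 * prevresp)

print(z_logit, in_bounds(z_logit))
print(z_identity, in_bounds(z_identity))
print(z_invalid, in_bounds(z_invalid))
```

In this toy case both links give similar starting points for small effects, which is consistent with the identity link being a reasonable default as long as the estimates stay well away from 0 and 1.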

Best,
Alex

Anne Urai

Mar 9, 2022, 4:11:32 AM
to hddm-users
Hi Alex,

Thanks for your help! Fitting with the 'z' link function now works, and it looks like the identity link indeed works best.

I do have another question about generating datasets with regression models using hddm.simulators.hddm_dataset_generators.simulator_h_c. I'd like to input my own data with a certain covariate structure, and this seems to work fine when supplying the arguments depends_on and conditions. 

However, with something like

data, full_parameter_dict = hddm.simulators.hddm_dataset_generators.simulator_h_c(data = df,
                                                                                  model = 'ddm',
                                                                                  p_outlier = 0.00,
                                                                                  conditions = None, 
                                                                                  depends_on = None, 
                                                                                  regression_models = ['v ~ 1 + S + X'],
                                                                                  regression_covariates = ['S', 'X'],
                                                                                  # regression_covariates = {'S': {'type': 'categorical', 'range': (-1, 1)},
                                                                                  #                          'X': {'type': 'categorical', 'range': (-1, 1)}},
                                                                                  group_only = None,
                                                                                  group_only_regressors = False,
                                                                                  fixed_at_default = None)

I run into the error 'UnboundLocalError: local variable 'regressor_set' referenced before assignment' at https://github.com/hddm-devs/hddm/blob/4e8b4e2eb3136d4304b015fca576fa60c7578b26/hddm/simulators/hddm_dataset_generators.py#L849.

It indeed looks like regressor_set has not been assigned in the function make_single_sub_cond_df_from_gt. This may be a bug; so far I haven't understood the structure of those generators well enough to know where the assignment regressor_set = 0 should go.
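
For readers unfamiliar with this error, the pattern is easy to reproduce outside HDDM. The two functions below are hypothetical stand-ins, not the actual hddm source: the first binds `regressor_set` on only one branch, so the later read fails whenever the other branch is taken; the second binds the name before any branching.

```python
def collect_covariates_buggy(regression_covariates, data=None):
    """Hypothetical example: regressor_set is bound only when no data is supplied."""
    if data is None:
        regressor_set = set(regression_covariates)
    # When data is supplied, the name was never bound in this scope,
    # so the next line raises UnboundLocalError.
    if not regressor_set:
        return []
    return sorted(regressor_set)


def collect_covariates_fixed(regression_covariates, data=None):
    """Same logic, but the name is bound on every path before it is read."""
    regressor_set = set()
    if data is None:
        regressor_set = set(regression_covariates)
    else:
        # Keep only the covariates that are present as columns in the data
        regressor_set = {c for c in regression_covariates if c in data}
    if not regressor_set:
        return []
    return sorted(regressor_set)


# Toy "data": a dict of columns standing in for a DataFrame
toy_data = {"S": [-1, 1], "X": [0.2, -0.4], "rt": [0.5, 0.7]}

try:
    collect_covariates_buggy(["S", "X"], data=toy_data)
except UnboundLocalError as err:
    print("buggy version failed:", err)

print(collect_covariates_fixed(["S", "X"], data=toy_data))
```

Where the default should be set in the real generator, and what value it should have, depends on the intended behaviour of the surrounding code.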

Relatedly, is it correct that when data is supplied, regression_covariates only needs to be a list instead of a dict with type/range specifications?

Best,
Anne

Anne Urai

Mar 9, 2022, 4:41:01 AM
to hddm-users
Quick follow-up: here's a PR that defines the regressor_set, so that the function can run with both data + regression covariates as input. 

https://github.com/hddm-devs/hddm/pull/81

I'm not 100% sure that this is the intended use case, since the check `if not regressor_set` will now always be True. Still, hope it's helpful!

Alexander Fengler

Mar 11, 2022, 10:25:31 PM
to hddm-users
Thanks,

will look into this over the weekend and report back.

Best,
Alex

Anne Urai

Mar 14, 2022, 9:14:00 AM
to hddm-users
Hi Alex,

While playing with the HDDMnn simulator (https://github.com/anne-urai/ddm_mediation/blob/main/generate_data.py#L139), another question came up.
Namely, when generating group-level data, is there a way to give as input the mean and std of each group-level regression parameter without fixing them across participants?

The situation: I'm generating a bunch of different datasets which are all response-coded, with a regression model v ~ 1 + S to account for the identity of the stimulus (-1 or 1). I'd like to keep the v_Intercept and v_S consistent across datasets at the group-level (i.e. specify the mean and std), before drawing single-subject values. Is this possible to do with the current simulator code? 
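
For concreteness, the generative scheme described here can be sketched in plain NumPy. The parameter values, the choice of a normal distribution for subject-level values, and the `draw_subject_params` helper are all assumptions for illustration; none of this is part of simulator_h_c.

```python
import numpy as np

# Hypothetical group-level (mean, std) for each regression parameter,
# held constant across all generated datasets
group_params = {
    "v_Intercept": (0.0, 0.3),
    "v_S": (1.0, 0.2),
}

def draw_subject_params(group_params, n_subjects, rng):
    """Draw one value per subject for each parameter from Normal(mean, std)."""
    return {
        name: rng.normal(loc=mean, scale=std, size=n_subjects)
        for name, (mean, std) in group_params.items()
    }

rng = np.random.default_rng(seed=0)
subject_params = draw_subject_params(group_params, n_subjects=20, rng=rng)

# Trial-wise drift for the first subject under v ~ 1 + S, with S coded -1 / 1
S = np.array([-1, 1, 1, -1])
v_trial = subject_params["v_Intercept"][0] + subject_params["v_S"][0] * S
print(v_trial)
```

Using a different seed per dataset keeps the group-level distribution fixed while the single-subject values vary from dataset to dataset.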

Thanks!
Anne

Alexander Fengler

May 12, 2022, 12:46:57 PM
to hddm-users
Hi Anne, 

Sorry for the late reply; I operate on the forum in bursts and sometimes forget to check for a while...
You have probably found a way to do this by now, but for the benefit of other people, here is a response anyway:

To answer your question, this specific case is not included in the simulator_h_c() code yet. It allows a bunch of different scenarios to be generated, but there are limits.
You can always access the low level simulators directly (hddm.simulators.simulator()) and operate this way.
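
As a rough illustration of what operating at the trial level can look like, the sketch below simulates DDM trials with a regression on drift, v = v_Intercept + v_S * S. It deliberately uses a hand-rolled Euler-Maruyama loop as a stand-in rather than hddm.simulators.simulator(), whose signature is not shown in this thread. The `simulate_ddm_trial` function, the parameter values, and the step size are all assumptions.

```python
import numpy as np

def simulate_ddm_trial(v, a, z, t, rng, dt=0.001, max_t=20.0):
    """Simulate one DDM trial with unit noise; returns (rt, response).

    v: drift, a: boundary separation, z: relative starting point in (0, 1),
    t: non-decision time. Response is 1 for the upper bound and 0 otherwise
    (a trial that reaches max_t without crossing is also coded 0).
    """
    x = z * a                 # absolute starting position between 0 and a
    elapsed = 0.0
    sqrt_dt = np.sqrt(dt)
    while 0.0 < x < a and elapsed < max_t:
        x += v * dt + sqrt_dt * rng.standard_normal()
        elapsed += dt
    response = 1 if x >= a else 0
    return t + elapsed, response

rng = np.random.default_rng(seed=1)

# Stimulus identity per trial, coded -1 / 1
S = rng.choice([-1, 1], size=100)

# Hypothetical subject-level regression parameters: v_Intercept = 0.0, v_S = 1.5
v_trial = 0.0 + 1.5 * S

trials = [simulate_ddm_trial(v, a=1.5, z=0.5, t=0.3, rng=rng) for v in v_trial]
rts = np.array([rt for rt, _ in trials])
responses = np.array([resp for _, resp in trials])

print("mean RT:", rts.mean())
print("P(upper | S = 1):", responses[S == 1].mean())
print("P(upper | S = -1):", responses[S == -1].mean())
```

Subject-level parameters drawn from any group-level distribution can be plugged into `v_trial` in the same way, one subject at a time.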

I recently gave an HDDM workshop that includes a bunch of examples using these simulators (they are not the main focus, but they might help you get started).

Best,
Alex