Dear Paveen,
I am one of the contributors to [dockerHDDM](
https://github.com/hcp4715/dockerHDDM). I attempted to replicate your reported error using the latest dockerHDDM version, but unfortunately, I was unsuccessful in reproducing it.
My suspicion is that the issue might be data-related, as a small sample size or the presence of extreme values could potentially lead to MCMC failure, manifesting as an AssertionError: Step-out procedure failed.
Here are some key points for checking the error that I recommend:
1. To verify whether the model runs smoothly with simulated data. I have provided a sample code snippet you can experiment with as below.
2. Initially, run the model on data from 2-5 participants. If successful, proceed with fitting the entire dataset.
3. In case complex models fail to converge or run slowly, consider simplifying them. This will help confirm whether the model definition is correct.
Regarding specific aspects of your model definition that caught my attention and may benefit from discussion:
1. Conventionally, predictor variables (e.g., `"v ~ 1 + C(stim)"`) should not be the same "stimulus" used in the link function (`{'s':data.stim.loc[x.index]}`). You might want to try modifying this to "v ~ 1 + C(condition)" or simply "v ~ 1".
2. I noticed that you coded no-Go RTs as -999. Upon reviewing the source code, it seems only 999 is recognized as a non-response (`noresponse = x["rt"].abs() >= 999` in `likelihood.py#123`). While -999 appears to work in sampling, I'm unsure if this coding is accurate.
3. Given the consideration of non-responses for no-Go trials, I suggest encoding Errors of omission RT as 999 and excluding these directly from the data. Thus, fundamentally, the no-Go DDM should treat nonresponses as coming from a Bernoulli distribution, which is implemented as `logp_noresp = stats.binom.logpmf(k_upper, n_noresponse, p_upper)` in line `likelihood.py#148`.
Thank you for your feedback.
Best regards,
Wanke Pan
(
panwan...@gmail.com)
```python
# define model
def v_link_func(x, data=mydata):
stim = (np.asarray(dmatrix('0 + C(s,[[1],[-1]])', {'s':data.stimulus.loc[x.index]})))
return np.multiply(x.to_frame(), stim)
# define model - v
v_reg = {"model": "v ~ 1 + C(stimulus)", 'link_func': v_link_func}
a_reg = {"model": "a ~ 1 + C(stimulus)", 'link_func': lambda x: x}
z_reg = {"model": "z ~ 1 + C(stimulus)", 'link_func': lambda x: x}
t_reg = {"model": "t ~ 1 + C(stimulus)", 'link_func': lambda x: x}
reg_descr = [v_reg, a_reg, z_reg, t_reg]
# simulated data
n_subjects = 5
trials_per_level = 100 # and per stimulus
# only consider the stimulus effect on v
level1a = {"v": 0.3, "a": 2, "t": 0.3, "sv": 0, "z": 0.5, "sz": 0, "st": 0}
level2a = {"v": 0.4, "a": 2, "t": 0.3, "sv": 0, "z": 0.5, "sz": 0, "st": 0}
level3a = {"v": 0.5, "a": 2, "t": 0.3, "sv": 0, "z": 0.5, "sz": 0, "st": 0}
level1b = {"v": -0.3, "a": 2, "t": 0.3, "sv": 0, "z": 0.5, "sz": 0, "st": 0}
level2b = {"v": -0.4, "a": 2, "t": 0.3, "sv": 0, "z": 0.5, "sz": 0, "st": 0}
level3b = {"v": -0.5, "a": 2, "t": 0.3, "sv": 0, "z": 0.5, "sz": 0, "st": 0}
# combine dataframe
data_a, params_a = hddm.generate.gen_rand_data(
{"level1": level1a, "level2": level2a, "level3": level3a},
size=trials_per_level,
subjs=n_subjects,
)
data_b, params_b = hddm.generate.gen_rand_data(
{"level1": level1b, "level2": level2b, "level3": level3b},
size=trials_per_level,
subjs=n_subjects,
)
data_a["stimulus"] = Series(np.ones((len(data_a))), index=data_a.index)
data_b["stimulus"] = Series(np.ones((len(data_b))) * 2, index=data_a.index)
mydata = pd.concat([data_a,data_b])
mydata.reset_index(drop=True,inplace=True)
mydata.head()
# set missing value
n_random_rows = 100
all_indices = mydata.index.tolist()
# randomly select n_random_rows indices
random_indices = np.random.choice(all_indices, n_random_rows, replace=False)
# set rt to 999 or -999
random_values = np.where(np.random.randint(0, 2, n_random_rows) == 0, 999, -999)
mydata.loc[random_indices, 'rt'] = random_values
# try sampling
m_reg = hddm.HDDMRegressor(
mydata, reg_descr,
p_outlier=0.05, group_only_regressors=False, keep_regressor_trace=True,
include=['a', 'v', 't', 'z', 'sv', 'st', 'sz']
)
m_reg.sample(2000,
chains = 4,
save_name = 'm',
loglike = True, ppc = True,
return_infdata = True)
```