HDDMRegressor - Assertion Error: Step-out procedure failed

Paveen Phon-Amnuaisuk

unread,

Jan 8, 2024, 12:53:19 PM1/8/24

to hddm-users

Hi all,

I seem to be facing an issue with my HDDMRegressor model with this error

I am currently running stimulus coded hddmRegressor models on a Go/no-Go task with 500 trials across 45 participants.

Here is the whole error output

```

/opt/conda/lib/python3.8/site-packages/scipy/optimize/_optimize.py:2309: RuntimeWarning: invalid value encountered in double_scalars
tmp2 = (x - v) * (fx - fw)
/opt/conda/lib/python3.8/site-packages/scipy/optimize/_optimize.py:2308: RuntimeWarning: invalid value encountered in double_scalars
tmp1 = (x - w) * (fx - fv)
/opt/conda/lib/python3.8/site-packages/scipy/optimize/_optimize.py:3187: RuntimeWarning: invalid value encountered in double_scalars
if (fx2 - fval) > delta:
/opt/conda/lib/python3.8/site-packages/scipy/optimize/_optimize.py:2769: RuntimeWarning: invalid value encountered in double_scalars
w = xb - ((xb - xc) * tmp2 - (xb - xa) * tmp1) / denom
/opt/conda/lib/python3.8/site-packages/scipy/optimize/_optimize.py:2310: RuntimeWarning: invalid value encountered in double_scalars
p = (x - v) * tmp2 - (x - w) * tmp1
/opt/conda/lib/python3.8/site-packages/scipy/optimize/_optimize.py:2311: RuntimeWarning: invalid value encountered in double_scalars
tmp2 = 2.0 * (tmp2 - tmp1)

AssertionError Traceback (most recent call last)
File <timed exec>:20

File /opt/conda/lib/python3.8/site-packages/kabuki/hierarchical.py:794, in Hierarchical.sample(self, *args, **kwargs)
792 import psutil
793 n_jobs = min(psutil.cpu_count(), chains)
--> 794 ms = Parallel(n_jobs=n_jobs)(delayed(sample_single_chain)(hddm, dbn, db, *args, **kwargs) for dbn,hddm in hddms.items())
795 else:
796 ms = [sample_single_chain(hddm, dbn, db, *args, **kwargs) for dbn,hddm in hddms.items()]

File /opt/conda/lib/python3.8/site-packages/joblib/parallel.py:1098, in Parallel.__call__(self, iterable)
1095 self._iterating = False
1097 with self._backend.retrieval_context():
-> 1098 self.retrieve()
1099 # Make sure that we get a last message telling us we are done
1100 elapsed_time = time.time() - self._start_time

File /opt/conda/lib/python3.8/site-packages/joblib/parallel.py:975, in Parallel.retrieve(self)
973 try:
974 if getattr(self._backend, 'supports_timeout', False):
--> 975 self._output.extend(job.get(timeout=self.timeout))
976 else:
977 self._output.extend(job.get())

File /opt/conda/lib/python3.8/site-packages/joblib/_parallel_backends.py:567, in LokyBackend.wrap_future_result(future, timeout)
564 """Wrapper for Future.result to implement the same behaviour as
565 AsyncResults.get from multiprocessing."""
566 try:
--> 567 return future.result(timeout=timeout)
568 except CfTimeoutError as e:
569 raise TimeoutError from e

File /opt/conda/lib/python3.8/concurrent/futures/_base.py:444, in Future.result(self, timeout)
442 raise CancelledError()
443 elif self._state == FINISHED:
--> 444 return self.__get_result()
445 else:
446 raise TimeoutError()

File /opt/conda/lib/python3.8/concurrent/futures/_base.py:389, in Future.__get_result(self)
387 if self._exception:
388 try:
--> 389 raise self._exception
390 finally:
391 # Break a reference cycle with the exception in self._exception
392 self = None

AssertionError: Step-out procedure failed

```

Here is the relevant code

```

def v_link_func(x, data=comp_mod):
stim = (np.asarray(dmatrix('0 + C(s,[[1],[-1]])', {'s':data.stim.loc[x.index]})))
return np.multiply(x.to_frame(), stim)

# define model - v
v_reg = {"model": "v ~ 1 + C(stim)", 'link_func': v_link_func}
a_reg = {"model": "a ~ 1 + C(stim)", 'link_func': lambda x: x}
z_reg = {"model": "z ~ 1 + C(stim)", 'link_func': lambda x: x}
t_reg = {"model": "t ~ 1 + C(stim)", 'link_func': lambda x: x}
reg_model = [v_reg, a_reg, z_reg, t_reg]

mt = hddm.HDDMRegressor(comp_mod, reg_model,
bias = True, p_outlier=0.05, group_only_regressors=False, keep_regressor_trace=True,
include=['a', 'v', 't', 'z', 'sv', 'st', 'sz'])

# find a good starting point which helps with the convergence.
mt.find_starting_values()

# start drawing 10000 samples and discarding 2000 as burn-in, 4 chains
mt_infdata = mt.sample(10000, burn = 2000, chains = 4,
save_name = 'm', loglike = True, ppc = True, return_infdata = True)

```

As per previous threads regarding this error and Go/no-Go in hddm

my RT is in seconds
Errors of omission RT are coded as 999
Correct no-Go RT are coded as -999
Response coded for response column

I've also attempted to remove participants with less than 5 responses/non-responses (for each Go or no-Go), but I get the same error using hddmRegressor as well. However, I don't think this is the problem as the model with hddmStimCoding works perfectly fine even without removing those participants.

Any help or advice in resolving this would be greatly appreciated.

Thank you,

Paveen

Paveen Phon-Amnuaisuk

unread,

Jan 8, 2024, 2:36:00 PM1/8/24

to hddm-users

Forgot to mention

The current HDDM version is: 0.9.8RC
The current kabuki version is: 0.6.5RC3
The current PyMC version is: 2.3.8
The current ArviZ version is: 0.15.1

Unsure if this is relevant but running on a docker image

Wanke Pan (Epool)

unread,

Jan 9, 2024, 10:36:25 PM1/9/24

to hddm-users

Dear Paveen,

I am one of the contributors to [dockerHDDM](https://github.com/hcp4715/dockerHDDM). I attempted to replicate your reported error using the latest dockerHDDM version, but unfortunately, I was unsuccessful in reproducing it.

My suspicion is that the issue might be data-related, as a small sample size or the presence of extreme values could potentially lead to MCMC failure, manifesting as an AssertionError: Step-out procedure failed.

Here are some key points for checking the error that I recommend:

1. To verify whether the model runs smoothly with simulated data. I have provided a sample code snippet you can experiment with as below.
2. Initially, run the model on data from 2-5 participants. If successful, proceed with fitting the entire dataset.
3. In case complex models fail to converge or run slowly, consider simplifying them. This will help confirm whether the model definition is correct.

Regarding specific aspects of your model definition that caught my attention and may benefit from discussion:

1. Conventionally, predictor variables (e.g., `"v ~ 1 + C(stim)"`) should not be the same "stimulus" used in the link function (`{'s':data.stim.loc[x.index]}`). You might want to try modifying this to "v ~ 1 + C(condition)" or simply "v ~ 1".
2. I noticed that you coded no-Go RTs as -999. Upon reviewing the source code, it seems only 999 is recognized as a non-response (`noresponse = x["rt"].abs() >= 999` in `likelihood.py#123`). While -999 appears to work in sampling, I'm unsure if this coding is accurate.
3. Given the consideration of non-responses for no-Go trials, I suggest encoding Errors of omission RT as 999 and excluding these directly from the data. Thus, fundamentally, the no-Go DDM should treat nonresponses as coming from a Bernoulli distribution, which is implemented as `logp_noresp = stats.binom.logpmf(k_upper, n_noresponse, p_upper)` in line `likelihood.py#148`.

Thank you for your feedback.

Best regards,
Wanke Pan
(panwan...@gmail.com)

```python
# define model
def v_link_func(x, data=mydata):
stim = (np.asarray(dmatrix('0 + C(s,[[1],[-1]])', {'s':data.stimulus.loc[x.index]})))

return np.multiply(x.to_frame(), stim)

# define model - v

v_reg = {"model": "v ~ 1 + C(stimulus)", 'link_func': v_link_func}
a_reg = {"model": "a ~ 1 + C(stimulus)", 'link_func': lambda x: x}
z_reg = {"model": "z ~ 1 + C(stimulus)", 'link_func': lambda x: x}
t_reg = {"model": "t ~ 1 + C(stimulus)", 'link_func': lambda x: x}
reg_descr = [v_reg, a_reg, z_reg, t_reg]

# simulated data
n_subjects = 5
trials_per_level = 100 # and per stimulus
# only consider the stimulus effect on v
level1a = {"v": 0.3, "a": 2, "t": 0.3, "sv": 0, "z": 0.5, "sz": 0, "st": 0}
level2a = {"v": 0.4, "a": 2, "t": 0.3, "sv": 0, "z": 0.5, "sz": 0, "st": 0}
level3a = {"v": 0.5, "a": 2, "t": 0.3, "sv": 0, "z": 0.5, "sz": 0, "st": 0}
level1b = {"v": -0.3, "a": 2, "t": 0.3, "sv": 0, "z": 0.5, "sz": 0, "st": 0}
level2b = {"v": -0.4, "a": 2, "t": 0.3, "sv": 0, "z": 0.5, "sz": 0, "st": 0}
level3b = {"v": -0.5, "a": 2, "t": 0.3, "sv": 0, "z": 0.5, "sz": 0, "st": 0}

# combine dataframe
data_a, params_a = hddm.generate.gen_rand_data(
{"level1": level1a, "level2": level2a, "level3": level3a},
size=trials_per_level,
subjs=n_subjects,
)
data_b, params_b = hddm.generate.gen_rand_data(
{"level1": level1b, "level2": level2b, "level3": level3b},
size=trials_per_level,
subjs=n_subjects,
)
data_a["stimulus"] = Series(np.ones((len(data_a))), index=data_a.index)
data_b["stimulus"] = Series(np.ones((len(data_b))) * 2, index=data_a.index)
mydata = pd.concat([data_a,data_b])
mydata.reset_index(drop=True,inplace=True)
mydata.head()

# set missing value
n_random_rows = 100
all_indices = mydata.index.tolist()
# randomly select n_random_rows indices
random_indices = np.random.choice(all_indices, n_random_rows, replace=False)
# set rt to 999 or -999
random_values = np.where(np.random.randint(0, 2, n_random_rows) == 0, 999, -999)
mydata.loc[random_indices, 'rt'] = random_values

# try sampling
m_reg = hddm.HDDMRegressor(
mydata, reg_descr,

p_outlier=0.05, group_only_regressors=False, keep_regressor_trace=True,
include=['a', 'v', 't', 'z', 'sv', 'st', 'sz']
)

m_reg.sample(2000,

chains = 4,
save_name = 'm',
loglike = True, ppc = True,
return_infdata = True)
```

Paveen Phon-Amnuaisuk

unread,

Jan 11, 2024, 5:12:13 AM1/11/24

to hddm-users

Hi Wanke,

Thank you so much for the reply, and obviously for your contribution to the docker image - it has made installation really easy.

When I tried the simulation code you provided, it works just fine as you said. Upon closer inspection and testing, I found the link function in addition to my stimuli being string e.g. (Go and no-Go) instead of int to be the issue. If I remove model v entirely or exclude the link func it works fine; and if I change stimulus into int, model v with the link func works fine as well.

I suspect the link function cannot take string types under the stimulus column, though I could be mistaken.