Dear all:
I wish to confirm that the function `post_pred_gen()` can generate posterior predictive data for regression models. I'd like to get some feedback on this point, just in case I missed something important and misunderstood these functions.
This is important because I've seen two older posts suggesting that HDDM cannot generate posterior predictive checks for regression models (see
https://groups.google.com/g/hddm-users/c/l4SFjAxZHfc and
https://groups.google.com/g/hddm-users/c/zEVSSHdruuo/m/tzLJT3ZBAQAJ) and that we have to write our own code for that. However, after reading the source of the current version, I think `post_pred_gen()` can do the job now.
`post_pred_gen()` is defined at /kabuki/analyze.py#L287. Its core is the call to `random()` inside `_post_pred_generate()` (at /kabuki/analyze.py#L271):
```
sampled_data = bottom_node.random()
```
This means that if the `random()` function of a regression model's `bottom_node` generates random data correctly, then we can use `post_pred_gen()`.
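To convince myself of this control flow, here is a minimal, self-contained sketch of the loop; `FakeBottomNode` and `post_pred_sketch` are made-up stand-ins, not the real kabuki/HDDM objects. Per posterior draw, the node's parameters are updated and then its `random()` is called:

```python
# Minimal sketch of post_pred_gen()'s core loop, with hypothetical stand-ins
import random

class FakeBottomNode:
    def __init__(self):
        self.parents = {"v": 0.0}  # stand-in for the node's parameter dict

    def random(self):
        # stand-in for the real random(), which calls hddm.generate.gen_rts()
        return random.gauss(self.parents["v"], 1.0)

def post_pred_sketch(node, posterior_trace, samples=3):
    datasets = []
    for _ in range(samples):
        # like _parents_to_random_posterior_sample: draw one value from the trace
        node.parents["v"] = random.choice(posterior_trace)
        # like _post_pred_generate: sampled_data = bottom_node.random()
        datasets.append(node.random())
    return datasets

print(len(post_pred_sketch(FakeBottomNode(), [0.4, 0.5, 0.6])))  # 3
```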
The `random()` function of `bottom_node` depends on the input model. In regression models, this function is defined at hddm/models/hddm_regression.py#L39, where `hddm.generate.gen_rts` is used to generate random RT data from the node's current parents' values, row by row:
```
def random(self):
    param_dict = deepcopy(self.parents.value)
    del param_dict['reg_outcomes']
    sampled_rts = self.value.copy()
    for i in self.value.index:
        # get current params
        for p in self.parents['reg_outcomes']:
            param_dict[p] = np.asscalar(self.parents.value[p].loc[i])
        # sample
        samples = hddm.generate.gen_rts(method=sampling_method, size=1, dt=sampling_dt, **param_dict)
        sampled_rts.loc[i, 'rt'] = hddm.utils.flip_errors(samples).rt.iloc[0]
    return sampled_rts
```
So the next question is whether the parents' values change correctly in each iteration. Inside `post_pred_gen()`, the helper `_parents_to_random_posterior_sample` iterates over `extended_parents` and, in each iteration, draws values from the posterior trace to update each extended parent. So, does changing the values of `extended_parents` change the value of `parents` of that node?
The answer is yes. In each iteration, when we take values from the posterior of each extended parent of the node (these are the hyper-parameters, if I am right), the values of `parents` (the parameters used to generate RTs) of that node change accordingly.
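A toy illustration of this propagation in plain Python (not PyMC/HDDM code; the class is my own invention): if the per-trial parent value is computed deterministically from the hyper-level values, then updating the hypers automatically updates the parent on the next access:

```python
# Toy sketch: a node whose per-trial parent values are recomputed from its
# extended (hyper-level) parents on every access, mimicking how deterministic
# nodes propagate updated values in HDDM regression models.
class ToyRegressionParent:
    def __init__(self, intercept, beta, x):
        self.intercept = intercept  # "extended parent" (hyper-level value)
        self.beta = beta            # "extended parent" (hyper-level value)
        self.x = x                  # per-trial covariate column

    @property
    def value(self):
        # deterministic: always computed from the current hyper values
        return [self.intercept + self.beta * xi for xi in self.x]

node = ToyRegressionParent(intercept=2.0, beta=0.5, x=[0.0, 1.0])
print(node.value)     # [2.0, 2.5]
node.intercept = 2.2  # "draw" a new posterior value for the hyper-parameter
print(node.value)     # [2.2, 2.7]
```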
For example, suppose we use the following formula for parameter `a` (with the data from the original tutorial) when specifying our model:
```
"a ~ theta:C(conf, Treatment('LC')):C(dbs, Treatment('0'))"
```
we will have a design matrix like this for parameter `a`:

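For illustration, here is a hypothetical sketch of how treatment coding could expand that formula into one design-matrix row. The helper and the column order (Intercept, [HC]:[0], [LC]:[0], [HC]:[1], [LC]:[1]) are my own assumptions, chosen to match the five parameters listed below:

```python
# Hypothetical helper (not patsy/HDDM code): one treatment-coded design row.
def design_row(theta, conf, dbs):
    cols = [1.0]  # Intercept column
    for dbs_level in ("0", "1"):
        for conf_level in ("HC", "LC"):
            # the interaction column equals theta for the trial's own cell, else 0
            cols.append(theta if (conf, dbs) == (conf_level, dbs_level) else 0.0)
    return cols

# e.g. a trial with conf='HC', dbs='1' and theta as in row 3714 of this post:
print(design_row(theta=-0.482485, conf="HC", dbs="1"))
# [1.0, 0.0, 0.0, -0.482485, 0.0]
```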
If we change the values of the extended parents of subj.13's node by drawing from the posterior, as `_parents_to_random_posterior_sample` does in each iteration, so that:
```
a_Intercept_subj.13 = 2.2454617108606625
a_theta:C(conf, Treatment('LC'))[HC]:C(dbs, Treatment('0'))[0]_subj.13 = 0.004070453068099375
a_theta:C(conf, Treatment('LC'))[LC]:C(dbs, Treatment('0'))[0]_subj.13 = -0.019202136073496273
a_theta:C(conf, Treatment('LC'))[HC]:C(dbs, Treatment('0'))[1]_subj.13 = -0.031465416099146465
a_theta:C(conf, Treatment('LC'))[LC]:C(dbs, Treatment('0'))[1]_subj.13 = -0.007366480181982382
```
then the value of the `a` parameter corresponding to row 3714 should, based on the design matrix, be:
```
a_subj.13 = 1.0 * a_Intercept_subj.13
          + (-0.482485) * a_theta:C(conf, Treatment('LC'))[HC]:C(dbs, Treatment('0'))[1]_subj.13
          = 1.0 * 2.2454617108606625 + (-0.482485) * (-0.031465416099146465)
          = 2.260643302147259
```
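The same dot product can be checked in a couple of lines of Python (all numbers copied from this post):

```python
# Reproducing the dot product for row 3714 with the drawn posterior values
a_intercept = 2.2454617108606625
a_theta_hc_dbs1 = -0.031465416099146465  # the [HC]:[1] coefficient
theta_3714 = -0.482485                   # design-matrix entry for that trial

a_subj13 = 1.0 * a_intercept + theta_3714 * a_theta_hc_dbs1
print(a_subj13)  # approximately 2.260643302147259
```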
This is exactly the value of `a` in the node's current parents' value (see the number following `'a': 3714` in the screenshot):

If the parents' values are correct in each iteration, then the generated RTs should also be correct.
So I infer that we can use `post_pred_gen()` to generate posterior predictive data for regression models. Any suggestions/corrections would be appreciated!
PS: `post_pred_gen()` is different from `plot_posterior_predictive()`; the latter may not work for regression models. But once you have generated posterior predictive data with `post_pred_gen()`, you can plot them against your observations using Python's plotting functions.
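For instance, assuming (from my reading of the current kabuki source) that `post_pred_gen()` returns a pandas DataFrame with a (node, sample, trial) MultiIndex and an `rt` column, one could summarize simulated vs. observed RTs as below; here I build a fake frame of that shape rather than fitting a model, so the index names and values are assumptions for illustration only:

```python
# Sketch of summarizing posterior predictive data against observed RTs,
# using a fake DataFrame shaped like post_pred_gen() output (assumed shape).
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.MultiIndex.from_product(
    [["wfpt.0"], range(2), range(5)], names=["node", "sample", "trial"]
)
ppc = pd.DataFrame({"rt": rng.normal(1.0, 0.2, size=10)}, index=idx)
observed_rt = pd.Series(rng.normal(1.0, 0.2, size=5))

# one simulated mean RT per posterior sample, to compare against the data
print(ppc.groupby(level="sample")["rt"].mean())
print(observed_rt.mean())
```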
Best
Chuan-Peng