"post_pred_gen()" works for regression models?


hcp...@gmail.com

Aug 4, 2021, 8:27:29 PM8/4/21
to hddm-users
Dear all:

I wish to confirm that the function `post_pred_gen()` can generate posterior predictive data for regression models. I'd like to get some feedback on this point, in case I missed something important and misunderstood these functions.

This is important because I've seen two older posts suggesting that HDDM cannot generate posterior predictive checks for regression models (see https://groups.google.com/g/hddm-users/c/l4SFjAxZHfc and https://groups.google.com/g/hddm-users/c/zEVSSHdruuo/m/tzLJT3ZBAQAJ) and that we have to write our own code to do that. However, after reading the source of the current version, I think `post_pred_gen()` can do the job now.

`post_pred_gen()` is defined at /kabuki/analyze.py#L287. The core of `post_pred_gen()` is the `random()` call inside `_post_pred_generate()` (at /kabuki/analyze.py#L271):

```
sampled_data = bottom_node.random()
```
This means that if the `random()` function of the `bottom_node` in a regression model generates random data correctly, then we can use the function `post_pred_gen()`.
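For reference, here is a simplified sketch of that loop (the helper names are my own, not kabuki's API): for each posterior draw, the node's parents are set from the trace and `random()` is called to simulate one dataset.

```
# Simplified sketch of what _post_pred_generate does for one bottom
# node (assumed helper names, not the actual kabuki source).
def post_pred_generate_sketch(bottom_node, set_parents_from_trace, samples=500):
    datasets = []
    for sample_idx in range(samples):
        # update the node's parent values from posterior draw sample_idx
        set_parents_from_trace(bottom_node, sample_idx)
        # simulate one dataset under those parameter values
        datasets.append(bottom_node.random())
    return datasets
```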

The `random()` function of the `bottom_node` depends on the input model. For regression models it is defined at hddm/models/hddm_regression.py#L39, where `hddm.generate.gen_rts` is used to generate random RT data based on the node's current parents' values, row by row:

```
def random(self):
    param_dict = deepcopy(self.parents.value)
    del param_dict['reg_outcomes']
    sampled_rts = self.value.copy()

    for i in self.value.index:
        # get current params for this trial
        for p in self.parents['reg_outcomes']:
            param_dict[p] = np.asscalar(self.parents.value[p].loc[i])
        # sample one RT with these trial-level parameters
        samples = hddm.generate.gen_rts(method=sampling_method, size=1,
                                        dt=sampling_dt, **param_dict)
        sampled_rts.loc[i, 'rt'] = hddm.utils.flip_errors(samples).rt.iloc[0]
    return sampled_rts
```
So the next question is whether the parents' values change correctly in each iteration. Inside `post_pred_gen()`, the `_parents_to_random_posterior_sample` function iterates over the `extended_parents` and, in each iteration, draws values from the posterior trace to update them. So, does changing the values of the `extended_parents` also change the values of the `parents` of that node?

The answer is yes. In each iteration, when we take values from the posterior for each of the node's `extended_parents` (which are the hyper-parameters, if I am right), the values of the node's `parents` (the parameters used to generate RTs) change accordingly.

For example, using the following formula for parameter `a` (with the data from the original tutorial) when specifying our model:

```
"a ~ theta:C(conf, Treatment('LC')):C(dbs, Treatment('0'))"
```

we will have a design matrix like this for parameter `a`:
[attached screenshot: design matrix for parameter `a`]

If we change the values of the `extended_parents` of subj.13's node by drawing values from the posterior, as `_parents_to_random_posterior_sample` does in each iteration, so that:

a_Intercept_subj.13 = 2.2454617108606625

a_theta:C(conf, Treatment('LC'))[HC]:C(dbs, Treatment('0'))[0]_subj.13 = 0.004070453068099375

a_theta:C(conf, Treatment('LC'))[LC]:C(dbs, Treatment('0'))[0]_subj.13 = -0.019202136073496273

a_theta:C(conf, Treatment('LC'))[HC]:C(dbs, Treatment('0'))[1]_subj.13 = -0.031465416099146465

a_theta:C(conf, Treatment('LC'))[LC]:C(dbs, Treatment('0'))[1]_subj.13 = -0.007366480181982382

The value of the `a` parameter corresponding to row 3714, based on the design matrix, should be:
a_subj.13 = 1.0 * a_Intercept_subj.13 + (-0.482485) * a_theta:C(conf, Treatment('LC'))[HC]:C(dbs, Treatment('0'))[1]_subj.13 = 1.0 * 2.2454617108606625 + (-0.482485) * (-0.031465416099146465) = 2.260643302147259
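This arithmetic can be checked with plain Python, using the posterior draws and the design-matrix entry copied from above (no HDDM required):

```
# Sanity check of the design-matrix computation for row 3714, subj.13.
intercept = 2.2454617108606625        # a_Intercept_subj.13
slope_hc_dbs1 = -0.031465416099146465 # the [HC]:[1] regression weight
theta = -0.482485                     # design-matrix entry for row 3714

a_3714 = 1.0 * intercept + theta * slope_hc_dbs1
print(a_3714)  # close to 2.260643302147259, the node's current value of a
```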

This is exactly the value of `a` in the node's current parents' values (see the number following `'a': 3714` in the attached screenshot):
[attached screenshot: the node's current parents' values, showing `a` at row 3714]

If the parents' values are correct in each iteration, then the generated RTs should also be correct.

So I infer that we can use `post_pred_gen()` to generate posterior predictive data for regression models. Any suggestions or corrections will be appreciated!

PS: `post_pred_gen()` is different from `plot_posterior_predictive()`; the latter may not work for regression models. But once you have generated posterior predictive data with `post_pred_gen()`, you can plot it against your observations using Python's plotting libraries.
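For instance, a minimal plotting sketch along those lines (the variable names `ppc_data` and `data` are my assumptions; both are assumed to have an `rt` column, as returned by `post_pred_gen()` and used in the observed data):

```
# Overlay observed RTs with posterior predictive RTs (assumed inputs,
# not an HDDM API; any plotting approach would do).
import matplotlib
matplotlib.use("Agg")  # headless-safe backend
import matplotlib.pyplot as plt

def plot_ppc_rt(ppc_data, data, bins=50):
    fig, ax = plt.subplots()
    ax.hist(ppc_data["rt"], bins=bins, density=True, alpha=0.5, label="predicted")
    ax.hist(data["rt"], bins=bins, density=True, alpha=0.5, label="observed")
    ax.set_xlabel("RT (s)")
    ax.legend()
    return fig
```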

Best
Chuan-Peng

Alexander Fengler

Aug 27, 2021, 2:31:10 PM8/27/21
to hddm-users
Hi Chuan-Peng,

As far as I can tell, this is correct.
Using this, it shouldn't be too problematic to amend `plot_posterior_predictive()` to do the right thing for regression models too.

Best,
Alex

hcp...@gmail.com

Aug 27, 2021, 10:56:05 PM8/27/21
to hddm-users
Hi Alex,

Thanks a lot for confirming that my understanding is correct!

Best,
Chuan-Peng

Xiaoyu Zeng

Nov 26, 2021, 5:13:38 AM11/26/21
to hddm-users
Hi all.

Following Chuan-Peng's advice, I also applied `post_pred_gen()` to generate posterior predictive data for **regression models**. However, I found that the PPC data derived from `post_pred_gen` did not match my raw data **trial by trial** (although the PPC data should have been estimated at the trial level for a regression model). Despite this, the PPC RTs showed a good fit with the raw RTs **across trials**.

I hope this message serves as a reminder for those who want to apply `post_pred_gen()` to a regression model and intend to look at trial-level predictions. Any corrections or replications of this issue would be appreciated. You are also welcome to suggest adaptations of the `post_pred_gen` function in kabuki so that it takes trial order into account when generating PPC data for regression models.

Best,
Xiaoyu