DIC + Posterior Predictive Checks


Konstantinos Michmizos

Feb 24, 2014, 12:36:05 PM2/24/14
to hddm-...@googlegroups.com
Hello again,

I am trying to compare different DDMs by combining DIC and PPC. I would greatly appreciate your advice on how one can build confidence in selecting the best model.

1) If we can verify (visually) convergence and normality of the group parameter distributions, is it enough to use DIC alone for selecting the best model when the difference in DIC (between the best candidate model and all the others) is large enough (e.g., >15)?

2) If we introduce PPC into the best-model selection strategy, how can we use the mean RT for PPC, as proposed in the tutorial?

hddm.utils.post_pred_stats() gives the upper and lower boundaries (I believe this is a 95% interval?), which I do not understand how to use: should one search for the smallest difference between observed and simulated UB and LB among all models? In addition, I do not understand why the lower-boundary mean RT of both the observed and simulated data is negative. Does that mean that our confidence interval should be in a range like [-0.4, 0.4] s for the mean RT? If so, wouldn't that be counter-intuitive?

3) Is it possible for a model to have the smallest DIC while some descriptive PPC statistics, such as the MSE of the mean RT at the UB (or LB), are much worse than in models with larger DIC?

Thank you!
Konstantinos

Thomas Wiecki

Feb 27, 2014, 2:17:03 PM2/27/14
to hddm-...@googlegroups.com
Hi,


On Mon, Feb 24, 2014 at 12:36 PM, Konstantinos Michmizos <kon...@gmail.com> wrote:
Hello again,

I am trying to compare different DDMs by combining DIC and PPC. I would greatly appreciate your advice on how one can build confidence in selecting the best model.

1) If we can verify (visually) convergence and normality of the group parameter distributions, is it enough to use DIC alone for selecting the best model when the difference in DIC (between the best candidate model and all the others) is large enough (e.g., >15)?

DIC is far from fool-proof. My personal opinion is to use DIC as 'converging evidence'; it's not the only thing I would rely on when doing model comparison.
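To make that comparison concrete, here is a minimal sketch of tabulating DIC across candidate models; the model names and values below are invented, but in HDDM each fitted model exposes its score as `model.dic` after sampling:

```python
# Hypothetical DIC values for three candidate models (in HDDM you would
# read these off as m.dic after sampling; the numbers here are made up).
dics = {"a_v_t": -4162.8, "v_t": -4010.2, "no_effect": -3950.5}

# Lower DIC is better, even when the values are negative.
best = min(dics, key=dics.get)

# Differences relative to the best model; a gap under ~10 is often
# treated as inconclusive.
deltas = {name: round(d - dics[best], 1) for name, d in dics.items()}
print(best, deltas)
```

The point of the deltas is that only large gaps (like the >15 you mention) should carry much weight on their own.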

2) If we introduce PPC into the best-model selection strategy, how can we use the mean RT for PPC, as proposed in the tutorial?

You would compute PPC stats for all models and compare them that way; that shows which model best captures the patterns in your data. Ideally you'll have some quantity you really care about, but if not, mean RT etc. are good choices.
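A sketch of that comparison, with made-up data frames standing in for the output of hddm.utils.post_pred_stats() from two models (only the MSE column and two statistics shown):

```python
import pandas as pd

# Stand-ins for post_pred_stats() output from two candidate models;
# rows are summary statistics, MSE measures observed-vs-simulated error.
stats_a = pd.DataFrame({"MSE": [0.0009, 0.0032]}, index=["accuracy", "mean_ub"])
stats_b = pd.DataFrame({"MSE": [0.0021, 0.0028]}, index=["accuracy", "mean_ub"])

# Put the MSE columns side by side and ask which model wins per statistic.
mse = pd.concat({"model_a": stats_a["MSE"], "model_b": stats_b["MSE"]}, axis=1)
winners = mse.idxmin(axis=1)
print(winners)
```

With real output you would do the same concat over the full statistics index, focusing on the statistic you care about.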

hddm.utils.post_pred_stats() gives the upper and lower boundaries (I believe this is a 95% interval?), which I do not understand how to use: should one search for the smallest difference between observed and simulated UB and LB among all models? In addition, I do not understand why the lower-boundary mean RT of both the observed and simulated data is negative. Does that mean that our confidence interval should be in a range like [-0.4, 0.4] s for the mean RT? If so, wouldn't that be counter-intuitive?

I recommend reading up on PPC. You compute statistics over the simulated data sets drawn from your models' posteriors, then compare these statistics to the statistics of your real data. If you just want the fit, take the model with the lowest overall error (MSE). Upper and lower boundary (UB and LB) refer to responses at each boundary; the UB proportion would be comparable to accuracy if you use accuracy coding.
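On why the lower-boundary mean RT comes out negative: HDDM's posterior-predictive output codes lower-boundary responses as negative RTs, so LB statistics are computed over negative values by construction. A toy illustration with made-up signed RTs:

```python
import numpy as np

# Signed RTs: positive = upper-boundary response, negative = lower boundary.
rt = np.array([0.41, 0.55, -0.38, 0.47, -0.52])

p_ub = (rt > 0).mean()        # proportion of upper-boundary responses
mean_lb = rt[rt < 0].mean()   # mean lower-boundary RT: negative by coding
print(p_ub, round(mean_lb, 2))
```

So a negative mean_lb is a coding convention, not a claim that responses took negative time.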

3) Is it possible for a model to have the smallest DIC while some descriptive PPC statistics, such as the MSE of the mean RT at the UB (or LB), are much worse than in models with larger DIC?

Yes, that's tricky. I would personally trust the PPC, especially if you can define a summary statistic relevant to what you care about in your experiment.

One more question: if we use the depends_on={'a': 'direction', 'v': 'direction', 't': 'direction'} argument, are the variance parameters (sz, st, sv) estimated in the model or not? If yes, are they estimated for the "average" model?

No, those are separate. You would need to set include='all' to include all inter-trial variability parameters. They are estimated at the group level only.

HTH,
Thomas


--
You received this message because you are subscribed to the Google Groups "hddm-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hddm-users+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



--
Thomas Wiecki
PhD candidate, Brown University
Quantitative Researcher, Quantopian Inc, Boston

Konstantinos Michmizos

Mar 24, 2014, 5:01:13 PM3/24/14
to hddm-...@googlegroups.com
Thank you Thomas

One more question: it seems that HDDM does PPC between the group model and the average data.

Is there a way to do PPC on each subject, separately?

Thanks again,
Konstantinos 

Thomas Wiecki

Mar 24, 2014, 8:37:23 PM3/24/14
to hddm-...@googlegroups.com
If you have a hierarchical model, hddm.utils.post_pred_stats() should return a dataframe with a row for each subject (and condition).

Can you post what the return dataframe from post_pred_stats() and model specification looks like?

Does print_summary() show that you have a hierarchical model with nodes for each subject?

Do you have a subj_idx column in your data? It sounds as if you might not be running a hierarchical model.
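A quick sanity check (column names assumed to follow HDDM's conventions) that the data actually carry a per-subject index:

```python
import pandas as pd

# Toy data frame in the shape HDDM expects; 'subj_idx' is what makes
# the model hierarchical, with subject-level nodes.
data = pd.DataFrame({
    "subj_idx": [0, 0, 1, 1],
    "rt":       [0.43, 0.51, 0.38, 0.61],
    "response": [1, 1, 0, 1],
})
assert "subj_idx" in data.columns
n_subj = data["subj_idx"].nunique()
print(n_subj)
```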

The best suggested model is the {no-constraints} model, which has a DIC value larger by 200 than that of the {a,v,t} model.


The best-fitting model is the one with the lowest DIC score, yes?

One more thing: for models with the same number of free parameters, should I expect the pD values to be close to each other?


Not necessarily. DIC estimates the effective number of free parameters from the posterior, so this can vary: different parameters of the model can affect the degrees of freedom in different ways.
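For reference, DIC decomposes as DIC = D-bar + pD, where D-bar is the posterior mean deviance and pD = D-bar minus the deviance at the posterior mean. The numbers below are of the kind HDDM prints and are chosen to be self-consistent, not taken from a real fit:

```python
# DIC = mean deviance + effective number of parameters (pD).
deviance = -4176.260738   # posterior mean deviance, D-bar
pD = 13.410749            # pD = D-bar - deviance at the posterior mean
dic = deviance + pD
print(round(dic, 6))
```

This is why pD need not match the nominal parameter count: it is estimated from the posterior, not counted.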



Konstantinos Michmizos

Mar 24, 2014, 9:24:49 PM3/24/14
to hddm-...@googlegroups.com
Hi Thomas

once more, congratulations on your valuable work and invaluable support for it

Data: exp,subj_idx,direction,response,rt

We have:
- 2 experiments, so exp={1,2}
- 8 and 6 subjects for exp 1 and 2
- 2 directions per subject
Response =1 for correct, 0 for error
rt = the usual suspect in seconds

Due to high accuracy (>0.96) and low number of repetitions per subject, we do not even try to estimate 'sv' (so 'sv'=0)

After loading data, a typical procedure we do is:

m_t = hddm.HDDM(data, depends_on={'t': ['exp', 'direction']}, include=['st', 'sz'], p_outlier=0.01)
m_t.find_starting_values()
m_t.sample(17000, burn=2000, thin=3)
ppc_t = hddm.utils.post_pred_gen(m_t)
ppc_compare_t = hddm.utils.post_pred_stats(data, ppc_t)
print(ppc_compare_t)

               observed      mean       std       SEM       MSE
stat                                                      
Accuracy  0.950368  0.957384  0.029085  0.000049  0.000895
Q_05      0.200000  0.099809  0.253532  0.010038  0.074317
Q_10      0.275000  0.263692  0.105209  0.000128  0.011197
Q_15      0.295000  0.297852  0.049848  0.000008  0.002493
Q_20      0.310000  0.313125  0.043986  0.000010  0.001944
Q_25      0.320000  0.325528  0.044832  0.000031  0.002041
Q_30      0.330000  0.337019  0.045829  0.000049  0.002150
Q_35      0.340000  0.347776  0.047008  0.000060  0.002270
Q_40      0.355000  0.358122  0.048403  0.000010  0.002353
Q_45      0.360000  0.368279  0.050142  0.000069  0.002583
Q_50      0.370000  0.378380  0.052064  0.000070  0.002781
Q_55      0.380000  0.388660  0.054380  0.000075  0.003032
Q_60      0.390000  0.399326  0.056985  0.000087  0.003334
Q_65      0.405000  0.410704  0.059949  0.000033  0.003626
Q_70      0.415000  0.423167  0.063599  0.000067  0.004112
Q_75      0.430000  0.437172  0.067918  0.000051  0.004664
Q_80      0.450000  0.453519  0.073156  0.000012  0.005364
Q_85      0.475000  0.473461  0.079812  0.000002  0.006372
Q_90      0.515000  0.500062  0.089413  0.000223  0.008218
Q_95      0.600000  0.542633  0.106683  0.003291  0.014672
mean_ub   0.397222  0.393793  0.056673  0.000012  0.003224
std_ub    0.106131  0.085605  0.029847  0.000421  0.001312
10q_ub    0.295000  0.295952  0.041105  0.000001  0.001691
30q_ub    0.340000  0.343626  0.045402  0.000013  0.002074
50q_ub    0.375000  0.382642  0.052010  0.000058  0.002763
70q_ub    0.420000  0.426549  0.063845  0.000043  0.004119
90q_ub    0.520000  0.502840  0.089803  0.000294  0.008359
mean_lb  -0.348704 -0.344702  0.059711  0.000016  0.003581
std_lb    0.139867  0.040764  0.034131  0.009821  0.010986
10q_lb    0.228500  0.306153  0.060161  0.006030  0.009649
30q_lb    0.270500  0.323835  0.058770  0.002845  0.006299
50q_lb    0.312500  0.340755  0.059832  0.000798  0.004378
70q_lb    0.355000  0.360015  0.063739  0.000025  0.004088
90q_lb    0.499000  0.386310  0.077460  0.012699  0.018699

(Between Accuracy and Mean_UB, we have added the statistics for the mean quantiles)

In : m_t.print_stats()
                   mean       std      2.5q       25q       50q       75q     97.5q    mc err
a              1.032642  0.127689  0.824521  0.948166  1.018610  1.099780  1.336110  0.003390
a_std          0.308509  0.134249  0.145709  0.220208  0.280280  0.359179  0.651747  0.004988
a_subj.1       0.990361  0.056806  0.883734  0.951650  0.988845  1.025816  1.108167  0.001970
a_subj.2       0.809401  0.063683  0.693465  0.765786  0.805333  0.850235  0.943533  0.002586
a_subj.3       0.969141  0.067541  0.852300  0.921081  0.964426  1.010240  1.115244  0.002387
a_subj.4       0.806454  0.051044  0.712355  0.771292  0.804302  0.839960  0.913347  0.001551
a_subj.5       0.819844  0.066411  0.698920  0.773885  0.816425  0.865128  0.956576  0.001923
a_subj.6       1.602332  0.152676  1.291418  1.507251  1.606886  1.699638  1.905872  0.010508
a_subj.7       1.008833  0.084272  0.860462  0.949472  1.003223  1.063383  1.189927  0.002327
a_subj.8       0.948984  0.067851  0.823164  0.903010  0.945877  0.992517  1.086083  0.002207
v              4.714714  0.502965  3.733480  4.407427  4.712778  5.019056  5.732513  0.016011
v_std          1.214801  0.422593  0.635760  0.915997  1.128040  1.430299  2.244965  0.010739
v_subj.1       3.928489  0.301786  3.371822  3.722588  3.915173  4.120581  4.556741  0.011183
v_subj.2       6.003318  0.516067  5.065809  5.641532  5.975912  6.333699  7.082451  0.022518
v_subj.3       4.791555  0.421857  4.020674  4.499762  4.766987  5.058304  5.688362  0.016938
v_subj.4       5.058226  0.372824  4.390900  4.798664  5.034277  5.295641  5.845174  0.012705
v_subj.5       6.352554  0.545023  5.364790  5.979040  6.321541  6.692676  7.494782  0.020538
v_subj.6       3.827196  0.321974  3.252794  3.600050  3.817358  4.039728  4.512377  0.017195
v_subj.7       4.355106  0.413607  3.596050  4.074463  4.338487  4.622252  5.196153  0.012900
v_subj.8       3.873893  0.390663  3.159540  3.597838  3.861794  4.126158  4.694825  0.012872
t(0.1)         0.275380  0.013704  0.249568  0.266171  0.275204  0.284138  0.303593  0.000514
t(0.2)         0.274447  0.015130  0.245438  0.264520  0.274162  0.284083  0.305011  0.000296
t(1.1)         0.314519  0.013336  0.289018  0.305652  0.314256  0.322955  0.341992  0.000339
t(1.2)         0.300083  0.015312  0.270407  0.289858  0.299880  0.309830  0.330553  0.000347
t_std          0.035148  0.006650  0.024315  0.030480  0.034349  0.039012  0.050054  0.000283
t_subj(0.1).1  0.248112  0.008760  0.229747  0.242518  0.248549  0.254293  0.263577  0.000207
t_subj(0.1).2  0.270673  0.009773  0.254547  0.263600  0.267901  0.279509  0.289771  0.000351
t_subj(0.1).3  0.290117  0.011364  0.267608  0.281473  0.291709  0.298492  0.309739  0.000259
t_subj(0.1).4  0.296832  0.007807  0.281452  0.291404  0.297143  0.302121  0.312161  0.000165
t_subj(0.1).5  0.260671  0.007165  0.246820  0.255288  0.260665  0.266457  0.273125  0.000196
t_subj(0.1).6  0.268236  0.038288  0.233900  0.246690  0.252946  0.262183  0.358054  0.003711
t_subj(0.1).7  0.252958  0.009762  0.231366  0.247022  0.253625  0.260059  0.269597  0.000245
t_subj(0.1).8  0.299903  0.007868  0.283003  0.294819  0.300653  0.305419  0.313619  0.000227
t_subj(0.2).1  0.256980  0.007182  0.241750  0.252417  0.257584  0.261910  0.270021  0.000217
t_subj(0.2).2  0.279105  0.006139  0.266320  0.275291  0.279376  0.283189  0.290842  0.000234
t_subj(0.2).3  0.259725  0.006239  0.246808  0.255819  0.259958  0.263988  0.271358  0.000212
t_subj(0.2).4  0.267836  0.007093  0.254300  0.263046  0.267347  0.272825  0.281561  0.000185
t_subj(0.2).5  0.313849  0.013266  0.285745  0.305987  0.316211  0.323530  0.334524  0.000461
t_subj(0.2).6  0.254320  0.014032  0.226313  0.244978  0.253617  0.265007  0.281187  0.000715
t_subj(1.1).1  0.260888  0.008298  0.246491  0.255589  0.260288  0.265446  0.281274  0.000280
t_subj(1.1).2  0.281509  0.005358  0.271115  0.277666  0.281583  0.285323  0.292194  0.000203
t_subj(1.1).3  0.315620  0.006701  0.301834  0.311386  0.316139  0.320246  0.326843  0.000237
t_subj(1.1).4  0.361867  0.007676  0.346343  0.356406  0.362449  0.367601  0.375362  0.000180
t_subj(1.1).5  0.283690  0.008042  0.267329  0.278256  0.283749  0.289334  0.298430  0.000229
t_subj(1.1).6  0.379134  0.032680  0.336490  0.351678  0.364018  0.406473  0.446203  0.002460
t_subj(1.1).7  0.299002  0.008113  0.281934  0.293556  0.299651  0.305130  0.312436  0.000243
t_subj(1.1).8  0.347653  0.008351  0.329314  0.342209  0.348576  0.353762  0.361394  0.000287
t_subj(1.2).1  0.280512  0.007707  0.265587  0.275153  0.280499  0.285741  0.294904  0.000244
t_subj(1.2).2  0.334318  0.006451  0.321361  0.330108  0.334448  0.338767  0.346245  0.000226
t_subj(1.2).3  0.304907  0.006419  0.291071  0.300829  0.305618  0.309475  0.316045  0.000205
t_subj(1.2).4  0.274315  0.005915  0.262187  0.270396  0.274586  0.278539  0.285208  0.000220
t_subj(1.2).5  0.345572  0.006650  0.331976  0.341087  0.345907  0.350093  0.357847  0.000198
t_subj(1.2).6  0.261897  0.014895  0.230657  0.252398  0.262685  0.272238  0.288430  0.000679
sz             0.705535  0.051797  0.599227  0.670685  0.706871  0.741854  0.803352  0.002991
st             0.139792  0.006086  0.128237  0.135541  0.139637  0.143945  0.152157  0.000371
DIC: -4162.849989
deviance: -4176.260738
pD: 13.410749

As you can see, our DIC values are negative. So if only DIC is considered, the best-fitting model should be the one with the largest absolute value (i.e., the most negative DIC).

Thank you!
Konstantinos

Thomas Wiecki

Mar 24, 2014, 9:35:45 PM3/24/14
to hddm-...@googlegroups.com
All of this looks correct.

If you want the actual summary statistics for each individual subject, call:
ppc_compare_t = hddm.utils.post_pred_stats(data, ppc_t, call_compare=False)

instead.

But why not just compare the MSE column between the two models (on a summary statistic you care about; you could also take the mean, I think) to assess which model fits better in terms of PPC?

Konstantinos Michmizos

Mar 24, 2014, 10:51:34 PM3/24/14
to hddm-...@googlegroups.com
Thank you Thomas

I think the problem lies in trying to infer the best-fitting model by comparing the MSE of the PPC for the group models.

By looking at the MSE for the UB, LB, and mean quantiles of the group models, the PPC method did not strongly support any of the candidate models.

"Strongly" means a MSE that should remain the smallest across most of the quantiles, for a particular model (we had one model as an exemption: see below)

Also, the DIC values seem inconsistent with the PPC when compared across group models, for our data sets.

Specifically, the group model with the best (smallest) DIC was the worst in terms of MSE for the <15 and >85 quantiles, indicating that other models, with worse (larger) DIC values, captured the spread of the RTs somewhat better.

In addition, the "no effect" model had a DIC value larger by 150 units than the smallest DIC value. However, the "no effect" model had a better fit (lower MSE) in all quantiles for the average data / group model.

These results were quite confusing

I hope that on an individual basis, the MSE of the mean RT might actually give us a credible indication of the best-fitting model, one that is consistent with DIC.

Thank you once more,
Konstantinos

Thomas Wiecki

Mar 25, 2014, 8:11:48 AM3/25/14
to hddm-...@googlegroups.com
Hm, yeah, I've always worried about a case like this.

Are there significant differences in the posteriors for the different conditions though? That might be justification enough to use that model.

You could also check for significant differences between the conditions in mean RT or accuracy and say that a model should capture that.

Konstantinos Michmizos

Mar 26, 2014, 2:28:25 PM3/26/14
to hddm-...@googlegroups.com
Hi Thomas

There seem to be significant differences for all three components of the most complicated {a,v,t} model: p < 0.01 for v, p < 0.1 for t (posterior mean amplitude at 45%), and p < 0.1 for a (posterior mean amplitude at 3.5%).

Running the models with more samples might smooth the posteriors further and give us lower p-values for a and t.

We estimated the MSE for the quantiles per subject and per condition.

Interestingly, for condition-1, the smallest MSE are for {vt} and {v} models and for condition-2, the smallest MSE are for {at} and {a} models (see attached figure)


Do you think this is enough justification to use the "combined" model, {a,v,t}, to do our inferences?

Thank you

Konstantinos

Konstantinos Michmizos

Mar 27, 2014, 5:11:18 PM3/27/14
to hddm-...@googlegroups.com
Hi Thomas (and all)

it seems that what you suggested (doing PPC for each condition separately) worked in our case

There were systematic differences in the fits between the conditions that could not be captured by the group models

The median of the MSE across subjects for the 10, 50, and 90 quantiles, per condition, was enough to assess the best-fitting model
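That heuristic can be sketched with pandas on made-up per-subject MSEs (hypothetical model names and values, not real fits):

```python
import pandas as pd

# Hypothetical per-subject MSEs for the 10/50/90% RT quantiles under
# two candidate models; take the median across subjects per model.
mse = pd.DataFrame({
    "model": ["avt", "avt", "avt", "av", "av", "av"],
    "q10":   [0.004, 0.006, 0.005, 0.009, 0.008, 0.010],
    "q50":   [0.003, 0.002, 0.004, 0.003, 0.004, 0.005],
    "q90":   [0.012, 0.010, 0.011, 0.007, 0.009, 0.008],
})
medians = mse.groupby("model")[["q10", "q50", "q90"]].median()
print(medians)
```

The median (rather than the mean) keeps one badly fit subject from dominating the comparison.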

Perhaps in the future it would be helpful to introduce an R^2 value of the best gamma fit to the RT distribution, per condition/subject

Thanks again for your help

Konstantinos

Thomas Wiecki

Mar 27, 2014, 9:08:13 PM3/27/14
to hddm-...@googlegroups.com
Hi Konstantinos,

That sounds great. But just to clarify, I didn't mean to run separate PPCs. I unfortunately forgot to mention a critical feature: you can pass a groupby keyword argument to post_pred_gen. This way you can test how your model, estimated without any conditions, predicts the differences between the conditions. So you might do:

kabuki.analyze.post_pred_gen(no_cond_model, groupby=['cond1', 'cond2'])

and

kabuki.analyze.post_pred_gen(full_cond_model) # no groupby needed as it's extracted from the model

Both should produce RTs for all your different conditions, so you can check whether one model systematically misses some patterns.

Does that make sense?

Thomas

P.S.: I also added this part to the documentation now as it's indeed a critical feature that should be documented properly:
http://ski.cog.brown.edu/hddm_docs/tutorial_post_pred.html#using-ppc-for-model-comparison-with-the-groupby-argument

Konstantinos Michmizos

Mar 27, 2014, 11:18:21 PM3/27/14
to hddm-...@googlegroups.com
Hi Thomas

it makes sense, thank you

If I understand correctly, the groupby argument compares a "no-constraints" model to any other model

Is there a way to compare, e.g. the {a,v,t} vs. the {a,v} model, for each 'condition'?

Thanks once more

Konstantinos

Thomas Wiecki

Mar 27, 2014, 11:42:36 PM3/27/14
to hddm-...@googlegroups.com
Hi Konstantinos,

Well, not quite. groupby just splits the data generated by a no-condition model in the same way as a model where you used depends_on in the model spec. Maybe check out the tutorial, which has more information. The {a,v,t} and {a,v} models should already generate data for the different conditions.

You'll then want to look at the mean RT and accuracy in the generated summary stats for each condition and compare the MSE of your models. With groupby you now also get per-condition summary stats from the model that does not have different conditions.

Thomas

Chuan-Peng Hu

Jan 24, 2017, 10:32:31 AM1/24/17
to hddm-users
Hi, Thomas,

I have encountered a very difficult situation related to this topic; it would be great if you could give me some suggestions.

I have run 16 models, varying which parameters were free to vary. I hoped to use both DIC and the MSE from PPC. However, the MSE and DIC gave conflicting results (see below), and if I consider only the MSE as the model selection criterion, the best model is the one with all parameters fixed, which sounds weird to me.

Model   Free to vary   MSE        DIC
1       v, a, t, z     0.03199    -265.8
2       v, t, z        0.02966    -120.9
3       v, a, t        0.03252      61.7
4       a, t, z        0.01777     258.3
5       v, a, z        0.03137     505.4
6       v, t           0.03088     186.2
7       t, z           0.01829     285.1
8       v, z           0.03291     475.8
9       a, t           0.01580     732.6
10      v, a           0.03222     883.6
11      a, z           0.01852     978.3
12      a              0.01659    1645.1
13      t              0.01582     797.8
14      z              0.02069     925.3
15      v              0.033218    897.3
16      (all fixed)    0.01413    2015.9

Thomas Wiecki

Jan 27, 2017, 5:10:44 AM1/27/17
to hddm-...@googlegroups.com
That's a tricky issue; I'm not sure what to make of it. You can always run the full model and just do hypothesis testing on that.
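"Hypothesis testing on that" can be done directly on the posterior traces. Here is a sketch with synthetic stand-ins for the traces of a parameter in two conditions (in HDDM you would pull the real traces from the fitted model instead):

```python
import numpy as np

# Synthetic stand-ins for posterior traces of drift rate in two conditions;
# the means/sds here are invented for illustration.
rng = np.random.default_rng(42)
v_cond1 = rng.normal(4.7, 0.5, 20000)
v_cond2 = rng.normal(4.1, 0.5, 20000)

# Posterior probability that v is larger in condition 1.
p_greater = (v_cond1 > v_cond2).mean()
print(round(p_greater, 2))
```

The same one-liner comparison works on real traces, which is what makes testing on the full model attractive when DIC and PPC disagree.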

Is this with simulated data? 


Chuan-Peng Hu

Jan 29, 2017, 7:49:53 AM1/29/17
to hddm-users
Hi, Thomas, 

Thank you very much for your quick response (and sorry for my delayed reply). By "full model", do you mean the model with all four parameters free to vary?

These results are from real experimental data.

Thomas Wiecki

Jan 30, 2017, 4:05:56 AM1/30/17
to hddm-...@googlegroups.com
On Sun, Jan 29, 2017 at 1:49 PM, Chuan-Peng Hu <hcp...@gmail.com> wrote:
Hi, Thomas, 

Thank you very much for your quick response (and sorry for my delayed reply). By "full model", do you mean the model with all four parameters free to vary?

Yes exactly. 

These results are from real experimental data.

Might be worth testing with simulated data. 

Chuan-Peng Hu

Jan 31, 2017, 9:38:58 PM1/31/17
to hddm-users
Thank you very much, Thomas!