DIC + Posterior Predictive Checks


Konstantinos Michmizos

Feb 24, 2014, 12:36:05 PM2/24/14
to hddm-...@googlegroups.com
Hello again,

I am trying to compare different DDMs by combining DIC and PPC. I would greatly appreciate your advice on how one can build confidence in selecting the best model.

1) If we can verify (visually) convergence and normality of the group parameter distributions, is it enough to use DIC alone for selecting the best model when the difference in DIC (between the best candidate model and all the others) is large enough (e.g., >15)?

2) If we introduce PPC into the best-model selection strategy, how can we use the mean RT for PPC, as proposed in the tutorial?

hddm.utils.post_pred_stats() gives the upper and lower boundaries (I believe this is a 95% interval?), which I do not understand how to use: should one search for the smallest difference between observed and simulated UB and LB among all models? In addition, I do not understand why the lower-boundary mean RT of both the observed and simulated data is negative. Does that mean that our confidence interval should be in a range like [-0.4, 0.4] s for the mean RT? If so, wouldn't that be counter-intuitive?

3) Is it possible for a model to have the smallest DIC while some descriptive PPC statistics, such as the MSE of the mean RT at the UB (or LB), are much worse than in models with larger DIC?

Thank you!
Konstantinos

Thomas Wiecki

Feb 27, 2014, 2:17:03 PM2/27/14
to hddm-...@googlegroups.com
Hi,


On Mon, Feb 24, 2014 at 12:36 PM, Konstantinos Michmizos <kon...@gmail.com> wrote:
Hello again,

I am trying to compare different DDMs by combining DIC and PPC. I would greatly appreciate your advice on how one can build confidence in selecting the best model.

1) If we can verify (visually) convergence and normality of the group parameter distributions, is it enough to use DIC alone for selecting the best model when the difference in DIC (between the best candidate model and all the others) is large enough (e.g., >15)?

DIC is far from fool-proof. My personal opinion is to use DIC as 'converging evidence'; it's not the only thing I would rely on when doing model comparison.
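To make that comparison concrete, here is a minimal sketch of tabulating DIC across candidate models; the model names and values below are invented, but in HDDM each fitted model exposes its score as `model.dic` after sampling:

```python
# Hypothetical DIC values for three candidate models (in HDDM you would
# read these off as m.dic after sampling; the numbers here are made up).
dics = {"a_v_t": -4162.8, "v_t": -4010.2, "no_effect": -3950.5}

# Lower DIC is better, even when the values are negative.
best = min(dics, key=dics.get)

# Differences relative to the best model; a gap under ~10 is often
# treated as inconclusive.
deltas = {name: round(d - dics[best], 1) for name, d in dics.items()}
print(best, deltas)
```

The point of the deltas is that only large gaps (like the >15 you mention) should carry much weight on their own.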

2) If we introduce PPC into the best-model selection strategy, how can we use the mean RT for PPC, as proposed in the tutorial?

You would compute PPC stats for all models and compare them that way; that shows which model best captures the patterns in your data. Ideally you'll have some quantity you really care about, but if not, mean RT etc. are good choices.
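A sketch of that comparison, with made-up data frames standing in for the output of hddm.utils.post_pred_stats() from two models (only the MSE column and two statistics shown):

```python
import pandas as pd

# Stand-ins for post_pred_stats() output from two candidate models;
# rows are summary statistics, MSE measures observed-vs-simulated error.
stats_a = pd.DataFrame({"MSE": [0.0009, 0.0032]}, index=["accuracy", "mean_ub"])
stats_b = pd.DataFrame({"MSE": [0.0021, 0.0028]}, index=["accuracy", "mean_ub"])

# Put the MSE columns side by side and ask which model wins per statistic.
mse = pd.concat({"model_a": stats_a["MSE"], "model_b": stats_b["MSE"]}, axis=1)
winners = mse.idxmin(axis=1)
print(winners)
```

With real output you would do the same concat over the full statistics index, focusing on the statistic you care about.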

hddm.utils.post_pred_stats() gives the upper and lower boundaries (I believe this is a 95% interval?), which I do not understand how to use: should one search for the smallest difference between observed and simulated UB and LB among all models? In addition, I do not understand why the lower-boundary mean RT of both the observed and simulated data is negative. Does that mean that our confidence interval should be in a range like [-0.4, 0.4] s for the mean RT? If so, wouldn't that be counter-intuitive?

I recommend reading up on PPC. You compute statistics over the simulated data sets drawn from your models' posteriors, then compare these statistics to the statistics of your real data. If you just want the fit, take the model with the lowest overall error (MSE). Upper and lower boundary (UB and LB) refer to responses at each boundary; the UB proportion would be comparable to accuracy if you use accuracy coding.
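On why the lower-boundary mean RT comes out negative: HDDM's posterior-predictive output codes lower-boundary responses as negative RTs, so LB statistics are computed over negative values by construction. A toy illustration with made-up signed RTs:

```python
import numpy as np

# Signed RTs: positive = upper-boundary response, negative = lower boundary.
rt = np.array([0.41, 0.55, -0.38, 0.47, -0.52])

p_ub = (rt > 0).mean()        # proportion of upper-boundary responses
mean_lb = rt[rt < 0].mean()   # mean lower-boundary RT: negative by coding
print(p_ub, round(mean_lb, 2))
```

So a negative mean_lb is a coding convention, not a claim that responses took negative time.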

3) Is it possible for a model to have the smallest DIC while some descriptive PPC statistics, such as the MSE of the mean RT at the UB (or LB), are much worse than in models with larger DIC?

Yes, that's tricky. I would personally trust the PPC, especially if you can define a summary statistic relevant to what you care about in your experiment.

One more question: if we use the depends_on={'a': 'direction', 'v': 'direction', 't': 'direction'} argument, are the variance parameters (sz, st, sv) estimated in the model or not? If yes, are they estimated for the "average" model?

No, those are separate. You would need to set include='all' to include all inter-trial variability parameters. They are estimated at the group level only.

HTH,
Thomas


--
You received this message because you are subscribed to the Google Groups "hddm-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hddm-users+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



--
Thomas Wiecki
PhD candidate, Brown University
Quantitative Researcher, Quantopian Inc, Boston

Konstantinos Michmizos

Mar 24, 2014, 5:01:13 PM3/24/14
to hddm-...@googlegroups.com
Thank you Thomas

One more question: it seems that HDDM does PPC between the group model and the average data.

Is there a way to do PPC on each subject, separately?

Thanks again,
Konstantinos 

Thomas Wiecki

Mar 24, 2014, 8:37:23 PM3/24/14
to hddm-...@googlegroups.com
If you have a hierarchical model, hddm.utils.post_pred_stats() should return a dataframe with a row for each subject (and condition).

Can you post what the return dataframe from post_pred_stats() and model specification looks like?

Does print_summary() show that you have a hierarchical model with nodes for each subject?

Do you have a subj_idx column in your data? It sounds as if you might not be running a hierarchical model.
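A quick sanity check (column names assumed to follow HDDM's conventions) that the data actually carry a per-subject index:

```python
import pandas as pd

# Toy data frame in the shape HDDM expects; 'subj_idx' is what makes
# the model hierarchical, with subject-level nodes.
data = pd.DataFrame({
    "subj_idx": [0, 0, 1, 1],
    "rt":       [0.43, 0.51, 0.38, 0.61],
    "response": [1, 1, 0, 1],
})
assert "subj_idx" in data.columns
n_subj = data["subj_idx"].nunique()
print(n_subj)
```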

The best suggested model is the {no-constraints} model, which has a DIC value larger by 200 than that of the {a,v,t} model.


The best-fitting model is the one with the lowest DIC score, yes?

One more thing: for models with the same number of free parameters, should I expect the pD values to be close to each other?


Not necessarily. DIC estimates the effective number of free parameters from the posterior, so this can vary: different parameters of the model can affect the degrees of freedom in different ways.
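For reference, DIC decomposes as DIC = D-bar + pD, where D-bar is the posterior mean deviance and pD = D-bar minus the deviance at the posterior mean. The numbers below are of the kind HDDM prints and are chosen to be self-consistent, not taken from a real fit:

```python
# DIC = mean deviance + effective number of parameters (pD).
deviance = -4176.260738   # posterior mean deviance, D-bar
pD = 13.410749            # pD = D-bar - deviance at the posterior mean
dic = deviance + pD
print(round(dic, 6))
```

This is why pD need not match the nominal parameter count: it is estimated from the posterior, not counted.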



Konstantinos Michmizos

Mar 24, 2014, 9:24:49 PM3/24/14
to hddm-...@googlegroups.com
Hi Thomas

once more, congratulations on your valuable work and invaluable support for it

Data: exp,subj_idx,direction,response,rt

We have:
- 2 experiments, so exp={1,2}
- 8 and 6 subjects for exp 1 and 2
- 2 directions per subject
Response =1 for correct, 0 for error
rt = the usual suspect in seconds

Due to high accuracy (>0.96) and low number of repetitions per subject, we do not even try to estimate 'sv' (so 'sv'=0)

After loading data, a typical procedure we do is:

m_t = hddm.HDDM(data, depends_on={'t': ['exp', 'direction']}, include=['st', 'sz'], p_outlier=0.01)
m_t.find_starting_values()
m_t.sample(17000, burn=2000, thin=3)
ppc_t = hddm.utils.post_pred_gen(m_t)
ppc_compare_t = hddm.utils.post_pred_stats(data, ppc_t)
print(ppc_compare_t)

               observed      mean       std       SEM       MSE
stat                                                      
Accuracy  0.950368  0.957384  0.029085  0.000049  0.000895
Q_05      0.200000  0.099809  0.253532  0.010038  0.074317
Q_10      0.275000  0.263692  0.105209  0.000128  0.011197
Q_15      0.295000  0.297852  0.049848  0.000008  0.002493
Q_20      0.310000  0.313125  0.043986  0.000010  0.001944
Q_25      0.320000  0.325528  0.044832  0.000031  0.002041
Q_30      0.330000  0.337019  0.045829  0.000049  0.002150
Q_35      0.340000  0.347776  0.047008  0.000060  0.002270
Q_40      0.355000  0.358122  0.048403  0.000010  0.002353
Q_45      0.360000  0.368279  0.050142  0.000069  0.002583
Q_50      0.370000  0.378380  0.052064  0.000070  0.002781
Q_55      0.380000  0.388660  0.054380  0.000075  0.003032
Q_60      0.390000  0.399326  0.056985  0.000087  0.003334
Q_65      0.405000  0.410704  0.059949  0.000033  0.003626
Q_70      0.415000  0.423167  0.063599  0.000067  0.004112
Q_75      0.430000  0.437172  0.067918  0.000051  0.004664
Q_80      0.450000  0.453519  0.073156  0.000012  0.005364
Q_85      0.475000  0.473461  0.079812  0.000002  0.006372
Q_90      0.515000  0.500062  0.089413  0.000223  0.008218
Q_95      0.600000  0.542633  0.106683  0.003291  0.014672
mean_ub   0.397222  0.393793  0.056673  0.000012  0.003224
std_ub    0.106131  0.085605  0.029847  0.000421  0.001312
10q_ub    0.295000  0.295952  0.041105  0.000001  0.001691
30q_ub    0.340000  0.343626  0.045402  0.000013  0.002074
50q_ub    0.375000  0.382642  0.052010  0.000058  0.002763
70q_ub    0.420000  0.426549  0.063845  0.000043  0.004119
90q_ub    0.520000  0.502840  0.089803  0.000294  0.008359
mean_lb  -0.348704 -0.344702  0.059711  0.000016  0.003581
std_lb    0.139867  0.040764  0.034131  0.009821  0.010986
10q_lb    0.228500  0.306153  0.060161  0.006030  0.009649
30q_lb    0.270500  0.323835  0.058770  0.002845  0.006299
50q_lb    0.312500  0.340755  0.059832  0.000798  0.004378
70q_lb    0.355000  0.360015  0.063739  0.000025  0.004088
90q_lb    0.499000  0.386310  0.077460  0.012699  0.018699

(Between Accuracy and Mean_UB, we have added the statistics for the mean quantiles)

In : m_t.print_stats()
                   mean       std      2.5q       25q       50q       75q     97.5q    mc err
a              1.032642  0.127689  0.824521  0.948166  1.018610  1.099780  1.336110  0.003390
a_std          0.308509  0.134249  0.145709  0.220208  0.280280  0.359179  0.651747  0.004988
a_subj.1       0.990361  0.056806  0.883734  0.951650  0.988845  1.025816  1.108167  0.001970
a_subj.2       0.809401  0.063683  0.693465  0.765786  0.805333  0.850235  0.943533  0.002586
a_subj.3       0.969141  0.067541  0.852300  0.921081  0.964426  1.010240  1.115244  0.002387
a_subj.4       0.806454  0.051044  0.712355  0.771292  0.804302  0.839960  0.913347  0.001551
a_subj.5       0.819844  0.066411  0.698920  0.773885  0.816425  0.865128  0.956576  0.001923
a_subj.6       1.602332  0.152676  1.291418  1.507251  1.606886  1.699638  1.905872  0.010508
a_subj.7       1.008833  0.084272  0.860462  0.949472  1.003223  1.063383  1.189927  0.002327
a_subj.8       0.948984  0.067851  0.823164  0.903010  0.945877  0.992517  1.086083  0.002207
v              4.714714  0.502965  3.733480  4.407427  4.712778  5.019056  5.732513  0.016011
v_std          1.214801  0.422593  0.635760  0.915997  1.128040  1.430299  2.244965  0.010739
v_subj.1       3.928489  0.301786  3.371822  3.722588  3.915173  4.120581  4.556741  0.011183
v_subj.2       6.003318  0.516067  5.065809  5.641532  5.975912  6.333699  7.082451  0.022518
v_subj.3       4.791555  0.421857  4.020674  4.499762  4.766987  5.058304  5.688362  0.016938
v_subj.4       5.058226  0.372824  4.390900  4.798664  5.034277  5.295641  5.845174  0.012705
v_subj.5       6.352554  0.545023  5.364790  5.979040  6.321541  6.692676  7.494782  0.020538
v_subj.6       3.827196  0.321974  3.252794  3.600050  3.817358  4.039728  4.512377  0.017195
v_subj.7       4.355106  0.413607  3.596050  4.074463  4.338487  4.622252  5.196153  0.012900
v_subj.8       3.873893  0.390663  3.159540  3.597838  3.861794  4.126158  4.694825  0.012872
t(0.1)         0.275380  0.013704  0.249568  0.266171  0.275204  0.284138  0.303593  0.000514
t(0.2)         0.274447  0.015130  0.245438  0.264520  0.274162  0.284083  0.305011  0.000296
t(1.1)         0.314519  0.013336  0.289018  0.305652  0.314256  0.322955  0.341992  0.000339
t(1.2)         0.300083  0.015312  0.270407  0.289858  0.299880  0.309830  0.330553  0.000347
t_std          0.035148  0.006650  0.024315  0.030480  0.034349  0.039012  0.050054  0.000283
t_subj(0.1).1  0.248112  0.008760  0.229747  0.242518  0.248549  0.254293  0.263577  0.000207
t_subj(0.1).2  0.270673  0.009773  0.254547  0.263600  0.267901  0.279509  0.289771  0.000351
t_subj(0.1).3  0.290117  0.011364  0.267608  0.281473  0.291709  0.298492  0.309739  0.000259
t_subj(0.1).4  0.296832  0.007807  0.281452  0.291404  0.297143  0.302121  0.312161  0.000165
t_subj(0.1).5  0.260671  0.007165  0.246820  0.255288  0.260665  0.266457  0.273125  0.000196
t_subj(0.1).6  0.268236  0.038288  0.233900  0.246690  0.252946  0.262183  0.358054  0.003711
t_subj(0.1).7  0.252958  0.009762  0.231366  0.247022  0.253625  0.260059  0.269597  0.000245
t_subj(0.1).8  0.299903  0.007868  0.283003  0.294819  0.300653  0.305419  0.313619  0.000227
t_subj(0.2).1  0.256980  0.007182  0.241750  0.252417  0.257584  0.261910  0.270021  0.000217
t_subj(0.2).2  0.279105  0.006139  0.266320  0.275291  0.279376  0.283189  0.290842  0.000234
t_subj(0.2).3  0.259725  0.006239  0.246808  0.255819  0.259958  0.263988  0.271358  0.000212
t_subj(0.2).4  0.267836  0.007093  0.254300  0.263046  0.267347  0.272825  0.281561  0.000185
t_subj(0.2).5  0.313849  0.013266  0.285745  0.305987  0.316211  0.323530  0.334524  0.000461
t_subj(0.2).6  0.254320  0.014032  0.226313  0.244978  0.253617  0.265007  0.281187  0.000715
t_subj(1.1).1  0.260888  0.008298  0.246491  0.255589  0.260288  0.265446  0.281274  0.000280
t_subj(1.1).2  0.281509  0.005358  0.271115  0.277666  0.281583  0.285323  0.292194  0.000203
t_subj(1.1).3  0.315620  0.006701  0.301834  0.311386  0.316139  0.320246  0.326843  0.000237
t_subj(1.1).4  0.361867  0.007676  0.346343  0.356406  0.362449  0.367601  0.375362  0.000180
t_subj(1.1).5  0.283690  0.008042  0.267329  0.278256  0.283749  0.289334  0.298430  0.000229
t_subj(1.1).6  0.379134  0.032680  0.336490  0.351678  0.364018  0.406473  0.446203  0.002460
t_subj(1.1).7  0.299002  0.008113  0.281934  0.293556  0.299651  0.305130  0.312436  0.000243
t_subj(1.1).8  0.347653  0.008351  0.329314  0.342209  0.348576  0.353762  0.361394  0.000287
t_subj(1.2).1  0.280512  0.007707  0.265587  0.275153  0.280499  0.285741  0.294904  0.000244
t_subj(1.2).2  0.334318  0.006451  0.321361  0.330108  0.334448  0.338767  0.346245  0.000226
t_subj(1.2).3  0.304907  0.006419  0.291071  0.300829  0.305618  0.309475  0.316045  0.000205
t_subj(1.2).4  0.274315  0.005915  0.262187  0.270396  0.274586  0.278539  0.285208  0.000220
t_subj(1.2).5  0.345572  0.006650  0.331976  0.341087  0.345907  0.350093  0.357847  0.000198
t_subj(1.2).6  0.261897  0.014895  0.230657  0.252398  0.262685  0.272238  0.288430  0.000679
sz             0.705535  0.051797  0.599227  0.670685  0.706871  0.741854  0.803352  0.002991
st             0.139792  0.006086  0.128237  0.135541  0.139637  0.143945  0.152157  0.000371
DIC: -4162.849989
deviance: -4176.260738
pD: 13.410749

As you can see, our DIC values are negative. So if only DIC is considered, the best-fitting model should be the one with the largest absolute value (i.e., the most negative DIC).

Thank you!
Konstantinos

Thomas Wiecki

Mar 24, 2014, 9:35:45 PM3/24/14
to hddm-...@googlegroups.com
All of this looks correct.

If you want the actual summary statistics for each individual subject, call:
ppc_compare_t = hddm.utils.post_pred_stats(data, ppc_t, call_compare=False)

instead.

But why not just compare the MSE column between the two models (on a summary statistic you care about; you could also take the mean, I think) to assess which model fits better in terms of PPC?

Konstantinos Michmizos

Mar 24, 2014, 10:51:34 PM3/24/14
to hddm-...@googlegroups.com
Thank you Thomas

I think the problem lies in trying to infer the best-fitting model by comparing the MSE of the PPC for the group models.

By looking at the MSE for the UB, LB, and mean quantiles of the group models, the PPC method did not strongly support any of the candidate models.

"Strongly" means a MSE that should remain the smallest across most of the quantiles, for a particular model (we had one model as an exemption: see below)

Also, the DIC values seem inconsistent with the PPC when compared across group models, for our data sets.

Specifically, the group model with the best (smallest) DIC was the worst in terms of MSE for the <15 and >85 quantiles, indicating that other models, with worse (larger) DIC values, captured the spread of the RTs somewhat better.

In addition, the "no effect" model had a DIC value larger by 150 units than the smallest DIC value. However, the "no effect" model had a better fit (lower MSE) in all quantiles for the average data / group model.

These results were quite confusing

I hope that on an individual basis, the MSE of the mean RT might actually give us a credible indication of the best-fitting model, one that is consistent with DIC.

Thank you once more,
Konstantinos

Thomas Wiecki

Mar 25, 2014, 8:11:48 AM3/25/14
to hddm-...@googlegroups.com
Hm, yeah, I've always worried about a case like this.

Are there significant differences in the posteriors for the different conditions though? That might be justification enough to use that model.

You could also check for significant differences between the conditions in mean RT or accuracy and say that a model should capture that.

Konstantinos Michmizos

Mar 26, 2014, 2:28:25 PM3/26/14
to hddm-...@googlegroups.com
Hi Thomas

There seem to be significant differences for all three components of the most complicated {a,v,t} model: p < 0.01 for v, p < 0.1 for t (posterior mean amplitude at 45%), and p < 0.1 for a (posterior mean amplitude at 3.5%).

Running the models with more samples might smooth the posteriors further and give us lower p-values for a and t.

We estimated the MSE for the quantiles per subject and per condition.

Interestingly, for condition-1, the smallest MSE are for {vt} and {v} models and for condition-2, the smallest MSE are for {at} and {a} models (see attached figure)


Do you think this is enough justification to use the "combined" model, {a,v,t}, to do our inferences?

Thank you

Konstantinos

Konstantinos Michmizos

Mar 27, 2014, 5:11:18 PM3/27/14
to hddm-...@googlegroups.com
Hi Thomas (and all)

it seems that what you suggested (doing PPC for each condition separately) worked in our case

There were systematic differences in the fits between the conditions that could not be captured by the group models

The median of the MSE across subjects for the 10, 50, and 90 quantiles, per condition, was enough to assess the best-fitting model
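That heuristic can be sketched with pandas on made-up per-subject MSEs (hypothetical model names and values, not real fits):

```python
import pandas as pd

# Hypothetical per-subject MSEs for the 10/50/90% RT quantiles under
# two candidate models; take the median across subjects per model.
mse = pd.DataFrame({
    "model": ["avt", "avt", "avt", "av", "av", "av"],
    "q10":   [0.004, 0.006, 0.005, 0.009, 0.008, 0.010],
    "q50":   [0.003, 0.002, 0.004, 0.003, 0.004, 0.005],
    "q90":   [0.012, 0.010, 0.011, 0.007, 0.009, 0.008],
})
medians = mse.groupby("model")[["q10", "q50", "q90"]].median()
print(medians)
```

The median (rather than the mean) keeps one badly fit subject from dominating the comparison.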

Perhaps in the future it would be helpful to introduce an R^2 value of the best gamma fit to the RT distribution, per condition/subject

Thanks again for your help

Konstantinos

Thomas Wiecki

Mar 27, 2014, 9:08:13 PM3/27/14
to hddm-...@googlegroups.com
Hi Konstantinos,

That sounds great. But just to clarify, I didn't mean to run separate PPCs. I unfortunately forgot to mention a critical feature: you can pass a groupby keyword argument to post_pred_gen. This way you can test how your model, estimated without any conditions, predicts the differences between the conditions. So you might do:

kabuki.analyze.post_pred_gen(no_cond_model, groupby=['cond1', 'cond2'])

and

kabuki.analyze.post_pred_gen(full_cond_model) # no groupby needed as it's extracted from the model

Both should produce RTs for all your different conditions, so you can check whether one model systematically misses some patterns.

Does that make sense?

Thomas

P.S.: I also added this part to the documentation now as it's indeed a critical feature that should be documented properly:
http://ski.cog.brown.edu/hddm_docs/tutorial_post_pred.html#using-ppc-for-model-comparison-with-the-groupby-argument

Konstantinos Michmizos

Mar 27, 2014, 11:18:21 PM3/27/14
to hddm-...@googlegroups.com
Hi Thomas

it makes sense, thank you

If I understand correctly, the groupby argument compares a "no-constraints" model to any other model

Is there a way to compare, e.g. the {a,v,t} vs. the {a,v} model, for each 'condition'?

Thanks once more

Konstantinos

Thomas Wiecki

Mar 27, 2014, 11:42:36 PM3/27/14
to hddm-...@googlegroups.com
Hi Konstantinos,

Well, not quite. groupby just splits the data generated by a no-condition model in the same way as a model where you used depends_on in the model spec. Maybe check out the tutorial, which has more information. The {a,v,t} and {a,v} models should already generate data for the different conditions.

You'll then want to look at the mean RT and accuracy in the generated summary stats for each condition and compare the MSE of your models. With groupby you now also get per-condition summary stats from the model that does not have different conditions.

Thomas

Chuan-Peng Hu

Jan 24, 2017, 10:32:31 AM1/24/17
to hddm-users
Hi, Thomas,

I have encountered a very difficult situation related to this topic; it would be great if you could give me some suggestions.

I have run 16 models, varying which parameters were free to vary. I hoped to use both DIC and the MSE from PPC. However, the MSE and DIC gave conflicting results (see below), and if I consider only the MSE as the model selection criterion, the best model is the one with all parameters fixed, which sounds weird to me.

Model   Free to vary   MSE        DIC
1       v, a, t, z     0.03199    -265.8
2       v, t, z        0.02966    -120.9
3       v, a, t        0.03252      61.7
4       a, t, z        0.01777     258.3
5       v, a, z        0.03137     505.4
6       v, t           0.03088     186.2
7       t, z           0.01829     285.1
8       v, z           0.03291     475.8
9       a, t           0.01580     732.6
10      v, a           0.03222     883.6
11      a, z           0.01852     978.3
12      a              0.01659    1645.1
13      t              0.01582     797.8
14      z              0.02069     925.3
15      v              0.033218    897.3
16      (all fixed)    0.01413    2015.9

Thomas Wiecki

Jan 27, 2017, 5:10:44 AM1/27/17
to hddm-...@googlegroups.com
That's a tricky issue; I'm not sure what to make of it. You can always run the full model and just do hypothesis testing on that.
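"Hypothesis testing on that" can be done directly on the posterior traces. Here is a sketch with synthetic stand-ins for the traces of a parameter in two conditions (in HDDM you would pull the real traces from the fitted model instead):

```python
import numpy as np

# Synthetic stand-ins for posterior traces of drift rate in two conditions;
# the means/sds here are invented for illustration.
rng = np.random.default_rng(42)
v_cond1 = rng.normal(4.7, 0.5, 20000)
v_cond2 = rng.normal(4.1, 0.5, 20000)

# Posterior probability that v is larger in condition 1.
p_greater = (v_cond1 > v_cond2).mean()
print(round(p_greater, 2))
```

The same one-liner comparison works on real traces, which is what makes testing on the full model attractive when DIC and PPC disagree.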

Is this with simulated data? 


Chuan-Peng Hu

Jan 29, 2017, 7:49:53 AM1/29/17
to hddm-users
Hi, Thomas, 

Thank you very much for your quick response (and sorry for my delayed reply). By "full model", do you mean the model with all four parameters free to vary?

These results are from real experimental data.

Thomas Wiecki

Jan 30, 2017, 4:05:56 AM1/30/17
to hddm-...@googlegroups.com
On Sun, Jan 29, 2017 at 1:49 PM, Chuan-Peng Hu <hcp...@gmail.com> wrote:
Hi, Thomas, 

Thank you very much for your quick response (and sorry for my delayed reply). By "full model", do you mean the model with all four parameters free to vary?

Yes exactly. 

These results are from real experimental data.

Might be worth testing with simulated data. 

Chuan-Peng Hu

Jan 31, 2017, 9:38:58 PM1/31/17
to hddm-users
Thank you very much, Thomas!