Between- vs. within-subject design


stein...@gmail.com

Mar 25, 2015, 4:44:53 AM
to hddm-...@googlegroups.com
Hi everyone,

I have found that modeling my data with an intercept for individual subjects, as in the within-subject effects tutorial, leads to a substantially worse DIC value. The experiment was in fact a within-subject design, but apparently the between-subject variance is not large enough to justify how much the DIC penalises the additional factor. Or is there another explanation? Before deciding to just stick with a model that treats conditions as independent, I would like to check this explanation first.

The question now is how to do this. A colleague suggested that I compare the error variance of the parameters to the between-subject variance. Is there a way to obtain these variances?
 
Sorry for asking so many questions, but there's hardly a better way to learn to use this tool than to dive right in, and the answers here have been really helpful so far!

Thank you,
Benjamin

Michael J Frank

Mar 25, 2015, 9:16:56 AM
to hddm-...@googlegroups.com
Hi Benjamin, 

There shouldn't actually be an 'additional factor' making the model more complex in the within-subject analysis. The difference is either to estimate a parameter for each condition, with each one drawn from a separate group distribution, or to define one of the conditions as the intercept and the others relative to it (where both the intercepts and the within-subject differences are drawn from group distributions). So in each case there are the same number of parameters.
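For concreteness, a minimal sketch of the two specifications (the data file, the condition column 'cond', and its levels are hypothetical placeholders):

import hddm

# Hypothetical data with the standard 'rt', 'response', 'subj_idx' columns
# plus a condition column 'cond' with levels 'easy' and 'hard'.
data = hddm.load_csv('mydata.csv')

# Conditions as separate parameters, each drawn from its own group distribution.
m_split = hddm.HDDM(data, depends_on={'v': 'cond'})

# One condition as the intercept, the other as a within-subject offset;
# both the intercept and the offset get group distributions.
m_within = hddm.HDDMRegressor(data, "v ~ C(cond, Treatment('easy'))")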

The DIC is not always well behaved and is more appropriate when the posteriors are roughly normal. I would strongly encourage you to assess other features of model fit as a complement, for example posterior predictive checks. In this case you could look at behavior in the different conditions and at the within-subject difference in behavior (with a standard analysis, e.g. accuracy in cond 1, cond 2, and cond 2 - cond 1) and then see whether your model's posterior parameters generate data that look like what you see behaviorally. You can also look at the posterior for the within-subject parameters and see whether they are significantly different from zero; similarly, in the between-subject version, just subtract the group posterior for one condition from another. As shown in the demo, it should be easier to detect a within-subject difference between conditions with the within-subject model, especially if there is between-subject variability in overall performance.
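A rough sketch of that last comparison, assuming models like the ones above (the node names are placeholders that depend on how the models were specified):

# Within-subject model: does the group-level condition effect exclude zero?
v_effect = m_within.nodes_db.node["v_C(cond, Treatment('easy'))[T.hard]"]
print("P(effect > 0) =", (v_effect.trace() > 0).mean())

# Split model: subtract one group posterior from the other.
v_easy, v_hard = m_split.nodes_db.node[['v(easy)', 'v(hard)']]
print("P(v_hard > v_easy) =", (v_hard.trace() > v_easy.trace()).mean())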

One reason you might get different DICs is the priors: the priors for DDM parameters are informed by previous studies, whereas the priors for within-condition differences in your specific experiment are not. It thus might not be as straightforward as you'd like to compare DICs between these two model specifications.

Michael
Michael J Frank, PhD, Associate Professor
Laboratory for Neural Computation and Cognition
Brown University
http://ski.clps.brown.edu
(401)-863-6872


stein...@gmail.com

Mar 30, 2015, 1:32:57 PM
to hddm-...@googlegroups.com, michae...@brown.edu
Thanks a lot, Michael, for the clarification and the pointers.

Following the PPC procedure as documented, I get smaller MSEs for the within-subject model than for the between-subject model on all measures in the output below, except for the standard deviation of the upper boundary. But shouldn't the SEMs and the Mahalanobis distance also be smaller for the better-fitting model? These two mostly show smaller values for the between-subject model.
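For reference, tables like the ones below come from the documented PPC helpers, roughly along these lines (model variable names are placeholders):

import hddm

# Simulate data sets from the model's posterior and summarize them
# against the observed data.
ppc_data = hddm.utils.post_pred_gen(m_between)
ppc_stats = hddm.utils.post_pred_stats(m_between.data, ppc_data)
print(ppc_stats)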


Between-subject model
stat      observed   mean   std   SEM   MSE   credible   quantile   mahalanobis
accuracy 0.9433536 0.9660037 0.03557391 0.00051303 0.001778533 True 22.26667 0.6367075
mean_ub 0.4854672 0.4957795 0.06141767 0.000106345 0.003878475 True 44.61667 0.1679052
std_ub 0.1305849 0.1649427 0.04745605 0.001180459 0.003432536 True 24.05 0.7239923
10q_ub 0.3500026 0.3387868 0.04158229 0.000125795 0.001854882 True 57.04167 0.2697261
30q_ub 0.4164336 0.3926245 0.04690844 0.000566873 0.002767275 True 68.15417 0.5075653
50q_ub 0.459658 0.4516114 0.05508585 6.47E-05 0.003099199 True 56.50417 0.1460744
70q_ub 0.5229284 0.535392 0.0710092 0.000155341 0.005197647 True 44.6375 0.1755208
90q_ub 0.650119 0.7092383 0.1141997 0.003495096 0.01653666 True 31.01667 0.517684
mean_lb -0.4481721 -0.4933285 0.1126405 0.002039099 0.01472698 True 61.01346 0.4008895
std_lb 0.1330119 0.1194116 0.09171243 0.00018497 0.00859614 True 61.0186 0.1482935
10q_lb 0.300472 0.3840419 0.1061624 0.006983921 0.01825438 True 17.82814 0.7871887
30q_lb 0.3626525 0.4219183 0.1063242 0.003512433 0.01481726 True 30.1881 0.5574066
50q_lb 0.4210305 0.4654468 0.113744 0.001972808 0.0149105 True 39.99897 0.3904937
70q_lb 0.4875995 0.5257183 0.1315139 0.001453044 0.01874894 True 44.87614 0.2898463
90q_lb 0.630967 0.6279221 0.1849728 9.27E-06 0.03422423 True 57.72947 0.01646151

Within-subject model
stat      observed   mean   std   SEM   MSE   credible   quantile   mahalanobis
accuracy 0.9420198 0.9704948 0.03085291 0.000810826 0.001762728 True 19.05 0.9229276
mean_ub 0.4990737 0.5161628 0.04990107 0.000292038 0.002782155 True 39.05 0.3424599
std_ub 0.1507654 0.195317 0.04086438 0.001984848 0.003654745 True 20.35 1.090231
10q_ub 0.351571 0.3327659 0.03516861 0.000353632 0.001590463 True 65.05 0.5347129
30q_ub 0.418432 0.3959843 0.03987111 0.000503897 0.002093602 True 75.3 0.5630057
50q_ub 0.477889 0.4639448 0.04591399 0.000194441 0.002302536 True 61.41667 0.3037031
70q_ub 0.5431935 0.5606523 0.05733132 0.000304811 0.00359169 True 39.65 0.3045252
90q_ub 0.680197 0.7657917 0.09203341 0.007326446 0.01579659 True 16.2 0.930039
mean_lb -0.4601323 -0.5156403 0.08880484 0.003081135 0.01096743 True 75.12412 0.6250557
std_lb 0.1551207 0.1666337 0.08685637 0.00013255 0.007676578 True 44.11916 0.1325524
10q_lb 0.2993659 0.3614318 0.08268629 0.003852176 0.0106892 True 24.08834 0.7506191
30q_lb 0.3627853 0.4141317 0.08185624 0.002636453 0.009336896 True 29.36141 0.6272753
50q_lb 0.4250845 0.472975 0.08851271 0.002293496 0.010128 True 32.10067 0.5410574
70q_lb 0.492839 0.5565743 0.1081401 0.004062182 0.01575647 True 27.23849 0.5893764
90q_lb 0.6780694 0.713153 0.1683135 0.001230861 0.0295603 True 42.98921 0.2084421

Michael J Frank

Mar 31, 2015, 6:44:45 AM
to hddm-...@googlegroups.com
Hi Ben, 
 
Indeed, that is odd. Mahalanobis distance should not give a different answer in this 1D case. I just checked, and there is actually a bug in HDDM's reporting of these stats (see https://github.com/hddm-devs/kabuki/blob/master/kabuki/analyze.py ).

MSE is correct, but:

1. SEM is calculated as just the squared difference between the mean of the observed data and the mean of the simulated data sets, which is an unsigned measure of how far the overall means are from each other, but is not an SEM... (we'll open an issue on GitHub). In contrast, MSE gives you the mean of the point-by-point comparison: for each simulated data set from the posterior, how far off the predicted data are from the observed data.

2. Mahalanobis distance was also calculated incorrectly and should probably just be removed.
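To make the distinction in point 1 concrete, a toy illustration with made-up numbers (not the kabuki code itself):

import numpy as np

obs = 0.94                                 # observed summary stat, e.g. overall accuracy
sim = np.array([0.91, 0.95, 0.97, 0.93])   # the same stat in each posterior-predictive data set

sem_like = (sim.mean() - obs) ** 2         # squared distance between the means (the reported 'SEM')
mse = ((sim - obs) ** 2).mean()            # mean of the per-data-set squared errors (the 'MSE')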
 
Having said this, what seems clear in your case is that the mean of the simulated data sets is further from the mean of the observed data for the within-subjects model, but the individual simulated data sets are closer to the observed data. This is a bit puzzling, but before thinking about it further, I'm not sure why your 'observed' column differs between the two models: shouldn't this just be summary statistics of the observed data, independent of the model?

Also, you might consider using PPC summary stats that take the different within-subject conditions into account; from what I can tell, you are looking at overall accuracy.

(Thomas is currently traveling to Japan but might have something to add when he gets back online...)

stein...@gmail.com

Apr 1, 2015, 6:57:37 PM
to hddm-...@googlegroups.com, michae...@brown.edu
Thanks again, Michael.

The different observed results actually stem from playing with slightly different outlier-exclusion procedures; by mistake I posted one wrong table. Below are the results of the correct comparison, which do not change much and still show the odd pattern: a better fit according to MSE for the within-subject model, while the opposite is suggested when we look at the distance between the mean of the observed and the mean of the simulated data.
Also, about your last remark: am I correct that this cannot be implemented through the "groupby" parameter, but should instead be done by defining my own summary stats?

Between-subject model
stat      observed   mean   std   SEM   MSE   credible   quantile   mahalanobis
accuracy 0.9420198 0.9618769 0.03868918 0.000394306 0.001891159 True 24.5625 0.5132477
mean_ub 0.4990737 0.5031659 0.06148393 1.67E-05 0.003797019 True 48.85833 0.06655713
std_ub 0.1507654 0.1595664 0.04610429 7.75E-05 0.002203064 True 47.67917 0.1908948
10q_ub 0.351571 0.3514401 0.04493653 1.71E-08 0.002019309 True 46.0625 0.002912607
30q_ub 0.418432 0.4034474 0.0494219 2.25E-04 0.002667061 True 60.3375 0.3031971
50q_ub 0.477889 0.4604147 0.05624435 3.05E-04 0.003468778 True 62.67083 0.3106853
70q_ub 0.5431935 0.5412901 0.06999375 3.62E-06 0.004902748 True 53.42083 0.0271932
90q_ub 0.680197 0.7096399 0.1102168 8.67E-04 0.01301463 True 42.95417 0.267136
mean_lb -0.4601323 -0.5019116 0.1055379 0.001745513 0.01288376 True 60.94701 0.3958705
std_lb 0.1551207 0.1235704 0.0866004 0.00099542 0.008495048 True 68.9502 0.3643202
10q_lb 0.2993659 0.388149 0.09969723 0.007882448 0.01782199 True 13.60643 0.8905277
30q_lb 0.3627853 0.427042 0.09969966 0.004128927 0.01406895 True 25.13222 0.644503
50q_lb 0.4250845 0.4719084 0.1065959 0.002192478 0.01355517 True 37.16695 0.4392655
70q_lb 0.492839 0.5347944 0.1240898 0.001760255 0.01715852 True 42.46582 0.3381052
90q_lb 0.6780694 0.6436438 0.1755016 1.19E-03 0.03198593 True 64.41473 0.1961556

Michael J Frank

Apr 2, 2015, 3:50:41 AM
to hddm-...@googlegroups.com
Yes, I think you can just use groupby here, because the stats that you want to compare for DDM fits may still be accuracy and quantiles, just computed separately for your within-subject conditions. I suspect this will be more informative about which model fits better. In general one can always use overall measures of model fit (like DIC, BIC, BF, etc.), but one thing the PPC buys you is the ability to focus on the specific key aspects of the data that you are studying; here the question is whether the model can capture the effects of condition. It could be that one model can capture the overall summary statistics across conditions but not the changes due to condition. In your case both models should be able to capture the effects on average, since they both allow for condition effects (on drift, presumably), but what you want to check is whether the within-subject effects of condition are more precisely estimated by the within-subject model.

Actually, this might be seen more clearly if you define your observed stats as the within-subject differences (in accuracy and quantiles) and then look at the error of the between- vs. within-subject models in capturing these. But it might already be clearer if you just do the groupby as per above.
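A very rough sketch of the difference-based idea; the helper below is hypothetical, 'subj_idx' and 'response' are the standard HDDM data columns, and 'stim' with its levels follows the code later in this thread:

import pandas as pd

def within_subject_effect(df, cond_col='stim', c1='congruent', c2='incongruent'):
    # Mean within-subject accuracy difference between two conditions.
    acc = df.groupby(['subj_idx', cond_col])['response'].mean().unstack(cond_col)
    return (acc[c2] - acc[c1]).mean()

# Observed condition effect, computed directly from the raw data.
obs_effect = within_subject_effect(data)

# The same function would then be applied to each simulated data set returned by
# hddm.utils.post_pred_gen(model, append_data=True) (one per posterior sample,
# after deriving a response column from the simulated RTs if needed), and the
# squared deviations from obs_effect averaged into an MSE-style error for the
# condition effect under each model.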

As for the finding that (when not splitting by conditions) you get opposite conclusions from MSE vs. the mean of the simulated data: it seems to me that MSE is more appropriate in any case, because it reflects the error for each simulated data set generated with the same number of trials, subjects, etc. as in the observed data. If you use the mean of the simulated data sets, you could happen to do OK even if no single data set looks anything like the observed data. E.g. if accuracy is 0.5 in the data, simulated data sets like (0.4, 0.6, 0.55, ...) will have a much better MSE than simulated data sets like (0, 1, 0, 1, ...), which are all completely off, even though the mean of the latter looks just as good.
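A quick numeric version of that example (toy numbers):

import numpy as np

obs = 0.5
close_sims = np.array([0.4, 0.6, 0.55, 0.45])  # each simulated data set resembles the observed accuracy
off_sims = np.array([0.0, 1.0, 0.0, 1.0])      # none does, yet the mean is still 0.5

print(close_sims.mean(), ((close_sims - obs) ** 2).mean())  # 0.5, 0.00625
print(off_sims.mean(), ((off_sims - obs) ** 2).mean())      # 0.5, 0.25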

Michael

stein...@gmail.com

Apr 9, 2015, 12:00:55 PM
to hddm-...@googlegroups.com, michae...@brown.edu
Again very valuable input for me, Michael, and I think your argument is very plausible.
So I will concentrate on the MSE as the decisive measure.
However, I have been trying to use the "groupby" argument with the within-subject model and can't seem to get it to work. Has anybody done that successfully?
I have tried the code below without success:

### code
dmatrix("C(stim, Treatment('congruent'))", data.head(10))
### output
DesignMatrix with shape (10, 4)
  Columns:
    ['Intercept',
     "C(stim, Treatment('congruent'))[T.auditory]",
     "C(stim, Treatment('congruent'))[T.incongruent]",
     "C(stim, Treatment('congruent'))[T.visual]"]
  Terms:
    'Intercept' (column 0)
    "C(stim, Treatment('congruent'))" (columns 1:4)
  (to view full data, use np.asarray(this_obj))
### code
m_vat = hddm.HDDMRegressor(data, {"v ~ C(stim, Treatment('congruent'))","a ~ C(stim, Treatment('congruent'))","t ~ C(stim, Treatment('congruent'))"}, keep_regressor_trace=True, p_outlier=.05)
### output
Adding these covariates:
['t_Intercept', "t_C(stim, Treatment('congruent'))[T.auditory]", "t_C(stim, Treatment('congruent'))[T.incongruent]", "t_C(stim, Treatment('congruent'))[T.visual]"]
Adding these covariates:
['a_Intercept', "a_C(stim, Treatment('congruent'))[T.auditory]", "a_C(stim, Treatment('congruent'))[T.incongruent]", "a_C(stim, Treatment('congruent'))[T.visual]"]
Adding these covariates:
['v_Intercept', "v_C(stim, Treatment('congruent'))[T.auditory]", "v_C(stim, Treatment('congruent'))[T.incongruent]", "v_C(stim, Treatment('congruent'))[T.visual]"]
### code
m_vat.find_starting_values()
m_vat.sample(20500, burn=500, thin=10, dbname='traces.db', db='pickle')
### output
 [-----------------100%-----------------] 20501 of 20500 complete in 85312.6 sec
<pymc.MCMC.MCMC at 0x10b22a9d0>
### code
ppc_data = hddm.utils.post_pred_gen(m_vat,groupby='stim')
### output

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-65-1353dec930ed> in <module>()
----> 1 ppc_data = hddm.utils.post_pred_gen(m_vat,groupby='stim')

/Users/benjamin/anaconda/lib/python2.7/site-packages/kabuki/analyze.pyc in post_pred_gen(model, groupby, samples, append_data, progress_bar)
    327 
    328     for name, data in iter_data:
--> 329         node = model.get_data_nodes(data.index)
    330 
    331         if progress_bar:

/Users/benjamin/anaconda/lib/python2.7/site-packages/kabuki/hierarchical.pyc in get_data_nodes(self, idx)
    919 
    920         if len(data_nodes) != 1:
--> 921             raise NotImplementedError("Supply a grouping so that at most 1 observed node codes for each group.")
    922 
    923         return data_nodes[0]

NotImplementedError: Supply a grouping so that at most 1 observed node codes for each group.



Thomas Wiecki

Apr 10, 2015, 4:10:12 AM
to hddm-...@googlegroups.com, Michael J. Frank
The groupby feature of PPC is designed for a slightly different purpose.

Imagine you have two models, one where you split v by condition and one where you don't, and you want to compare them on their ability to capture accuracy differences between the two conditions. If you run PPC on the split model, it will automatically generate separate statistics for both conditions. However, PPC has no idea about the conditions in the combined model. That's where you use the groupby argument.

In your model you are already splitting by condition, so you don't need the groupby argument. But you can compare it to a combined model, which should provide a worse fit.
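A sketch of that comparison (model names are placeholders; 'stim' follows the code above):

import hddm

# Combined model: no condition effects, so tell the PPC how to split the data.
# Depending on how the observed nodes are structured, the grouping may need to
# be finer (e.g. ['subj_idx', 'stim']), as the error message above suggests.
m_combined = hddm.HDDM(data)
ppc_combined = hddm.utils.post_pred_gen(m_combined, groupby=['stim'])

# Split model: v depends on condition, so the PPC statistics already come out
# separately per condition without a groupby argument.
m_split = hddm.HDDM(data, depends_on={'v': 'stim'})
ppc_split = hddm.utils.post_pred_gen(m_split)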

Thomas
Thomas Wiecki, PhD
Data Science Lead, Quantopian Inc, Boston

stein...@gmail.com

Apr 10, 2015, 9:50:44 AM
to hddm-...@googlegroups.com, michae...@brown.edu
Hi Thomas, 

So, as the different conditions are already taken into account when defining within-subject models, I guess the best way to go is to use a combination of DIC and MSE for comparing different within-subject models (varying the number and combination of free parameters), and to stick to MSE alone for comparing within-subject and between-subject models that allow the same parameters to vary freely.

Thanks again you two!


...

Chuan-Peng Hu

Dec 14, 2016, 3:05:38 AM
to hddm-users, michae...@brown.edu
Hi, Stein,

I really appreciate this great discussion!
I am wondering: when using MSE to compare within-subject and between-subject models, what if the between-subject model fits better than the within-subject model even though the actual design was within-subject? I'm not sure whether that can happen; just out of curiosity.