Between- vs. within-subject design


stein...@gmail.com

Mar 25, 2015, 4:44:53 AM
to hddm-...@googlegroups.com
Hi everyone,

I have found that modeling my data with an intercept for individual subjects, as in the within-subject effects tutorial, leads to a substantially worse DIC value. The experiment was in fact a within-subject design, but apparently the between-subject variance is not large enough to justify how much the DIC penalises the additional factor. Or is there another explanation? Before deciding to just stick with a model that treats conditions as independent, I would like to check this explanation first.

The question now is how to do this. A colleague suggested that I compare the error variance of the parameters to the between-subject variance. Is there a way to obtain these variances?
 
Sorry for asking so many questions, but there's hardly a better way to learn to use this tool than to dive right in, and the answers here have been really helpful so far!

Thank you,
Benjamin

Michael J Frank

Mar 25, 2015, 9:16:56 AM
to hddm-...@googlegroups.com
Hi Benjamin, 

There shouldn't actually be an 'additional factor' making the model more complex in the within-subject analysis. The difference is either to estimate a parameter for each condition, with each one drawn from a separate group distribution, or to define one of the conditions as the intercept and the others relative to it (where both the intercepts and the within-subject differences are drawn from group distributions). So in each case there are the same number of parameters.
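For concreteness, a minimal sketch of the two specifications (the data file, the condition column 'cond', and its levels are hypothetical placeholders):

import hddm

# Hypothetical data with the standard 'rt', 'response', 'subj_idx' columns
# plus a condition column 'cond' with levels 'easy' and 'hard'.
data = hddm.load_csv('mydata.csv')

# Conditions as separate parameters, each drawn from its own group distribution.
m_split = hddm.HDDM(data, depends_on={'v': 'cond'})

# One condition as the intercept, the other as a within-subject offset;
# both the intercept and the offset get group distributions.
m_within = hddm.HDDMRegressor(data, "v ~ C(cond, Treatment('easy'))")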

The DIC is not always well behaved and is more appropriate when the posteriors are roughly normal. I would strongly encourage you to assess other features of model fit as a complement, for example posterior predictive checks. In this case you could look at behavior in the different conditions and at the within-subject difference in behavior (with a standard analysis, e.g. accuracy in cond 1, cond 2, and cond 2 - cond 1) and then see whether your model's posterior parameters generate data that look like what you see behaviorally. You can also look at the posterior for the within-subject parameters and see whether they are significantly different from zero; similarly, in the between-subject version, just subtract the group posterior for one condition from another. As shown in the demo, it should be easier to detect a within-subject difference between conditions with the within-subject model, especially if there is between-subject variability in overall performance.
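A rough sketch of that last comparison, assuming models like the ones above (the node names are placeholders that depend on how the models were specified):

# Within-subject model: does the group-level condition effect exclude zero?
v_effect = m_within.nodes_db.node["v_C(cond, Treatment('easy'))[T.hard]"]
print("P(effect > 0) =", (v_effect.trace() > 0).mean())

# Split model: subtract one group posterior from the other.
v_easy, v_hard = m_split.nodes_db.node[['v(easy)', 'v(hard)']]
print("P(v_hard > v_easy) =", (v_hard.trace() > v_easy.trace()).mean())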

One reason you might get different DICs is the priors: the priors for DDM parameters are informed by previous studies, whereas the priors for within-condition differences in your specific experiment are not. It thus might not be as straightforward as you'd like to compare DICs between these two model specifications.

Michael
Michael J Frank, PhD, Associate Professor
Laboratory for Neural Computation and Cognition
Brown University
http://ski.clps.brown.edu
(401)-863-6872


stein...@gmail.com

Mar 30, 2015, 1:32:57 PM
to hddm-...@googlegroups.com, michae...@brown.edu
Thanks a lot, Michael, for the clarification and the pointers.

Following the PPC procedure as documented, I get smaller MSEs for the within-subject model than for the between-subject model on all measures in the output below, except for the standard deviation of the upper boundary. But shouldn't the SEMs and the Mahalanobis distance also be smaller for the better-fitting model? These two mostly show smaller values for the between-subject model.
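For reference, tables like the ones below come from the documented PPC helpers, roughly along these lines (model variable names are placeholders):

import hddm

# Simulate data sets from the model's posterior and summarize them
# against the observed data.
ppc_data = hddm.utils.post_pred_gen(m_between)
ppc_stats = hddm.utils.post_pred_stats(m_between.data, ppc_data)
print(ppc_stats)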


Between-subject model
stat      observed   mean   std   SEM   MSE   credible   quantile   mahalanobis
accuracy 0.9433536 0.9660037 0.03557391 0.00051303 0.001778533 True 22.26667 0.6367075
mean_ub 0.4854672 0.4957795 0.06141767 0.000106345 0.003878475 True 44.61667 0.1679052
std_ub 0.1305849 0.1649427 0.04745605 0.001180459 0.003432536 True 24.05 0.7239923
10q_ub 0.3500026 0.3387868 0.04158229 0.000125795 0.001854882 True 57.04167 0.2697261
30q_ub 0.4164336 0.3926245 0.04690844 0.000566873 0.002767275 True 68.15417 0.5075653
50q_ub 0.459658 0.4516114 0.05508585 6.47E-05 0.003099199 True 56.50417 0.1460744
70q_ub 0.5229284 0.535392 0.0710092 0.000155341 0.005197647 True 44.6375 0.1755208
90q_ub 0.650119 0.7092383 0.1141997 0.003495096 0.01653666 True 31.01667 0.517684
mean_lb -0.4481721 -0.4933285 0.1126405 0.002039099 0.01472698 True 61.01346 0.4008895
std_lb 0.1330119 0.1194116 0.09171243 0.00018497 0.00859614 True 61.0186 0.1482935
10q_lb 0.300472 0.3840419 0.1061624 0.006983921 0.01825438 True 17.82814 0.7871887
30q_lb 0.3626525 0.4219183 0.1063242 0.003512433 0.01481726 True 30.1881 0.5574066
50q_lb 0.4210305 0.4654468 0.113744 0.001972808 0.0149105 True 39.99897 0.3904937
70q_lb 0.4875995 0.5257183 0.1315139 0.001453044 0.01874894 True 44.87614 0.2898463
90q_lb 0.630967 0.6279221 0.1849728 9.27E-06 0.03422423 True 57.72947 0.01646151

Within-subject model
stat      observed   mean   std   SEM   MSE   credible   quantile   mahalanobis
accuracy 0.9420198 0.9704948 0.03085291 0.000810826 0.001762728 True 19.05 0.9229276
mean_ub 0.4990737 0.5161628 0.04990107 0.000292038 0.002782155 True 39.05 0.3424599
std_ub 0.1507654 0.195317 0.04086438 0.001984848 0.003654745 True 20.35 1.090231
10q_ub 0.351571 0.3327659 0.03516861 0.000353632 0.001590463 True 65.05 0.5347129
30q_ub 0.418432 0.3959843 0.03987111 0.000503897 0.002093602 True 75.3 0.5630057
50q_ub 0.477889 0.4639448 0.04591399 0.000194441 0.002302536 True 61.41667 0.3037031
70q_ub 0.5431935 0.5606523 0.05733132 0.000304811 0.00359169 True 39.65 0.3045252
90q_ub 0.680197 0.7657917 0.09203341 0.007326446 0.01579659 True 16.2 0.930039
mean_lb -0.4601323 -0.5156403 0.08880484 0.003081135 0.01096743 True 75.12412 0.6250557
std_lb 0.1551207 0.1666337 0.08685637 0.00013255 0.007676578 True 44.11916 0.1325524
10q_lb 0.2993659 0.3614318 0.08268629 0.003852176 0.0106892 True 24.08834 0.7506191
30q_lb 0.3627853 0.4141317 0.08185624 0.002636453 0.009336896 True 29.36141 0.6272753
50q_lb 0.4250845 0.472975 0.08851271 0.002293496 0.010128 True 32.10067 0.5410574
70q_lb 0.492839 0.5565743 0.1081401 0.004062182 0.01575647 True 27.23849 0.5893764
90q_lb 0.6780694 0.713153 0.1683135 0.001230861 0.0295603 True 42.98921 0.2084421

Michael J Frank

Mar 31, 2015, 6:44:45 AM
to hddm-...@googlegroups.com
Hi Ben, 
 
Indeed, that is odd. Mahalanobis distance should not give a different answer in this 1D case. I just checked, and there is actually a bug in HDDM's reporting of these stats (see https://github.com/hddm-devs/kabuki/blob/master/kabuki/analyze.py ).

MSE is correct, but:

1. SEM is calculated as just the squared difference between the mean of the observed data and the mean of the simulated data sets, which is an unsigned measure of how far the overall means are from each other, but is not an SEM... (we'll open an issue on GitHub). In contrast, MSE gives you the mean of the point-by-point comparison: for each simulated data set from the posterior, how far off the predicted data are from the observed data.

2. Mahalanobis distance was also calculated incorrectly and should probably just be removed.
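To make the distinction in point 1 concrete, a toy illustration with made-up numbers (not the kabuki code itself):

import numpy as np

obs = 0.94                                 # observed summary stat, e.g. overall accuracy
sim = np.array([0.91, 0.95, 0.97, 0.93])   # the same stat in each posterior-predictive data set

sem_like = (sim.mean() - obs) ** 2         # squared distance between the means (the reported 'SEM')
mse = ((sim - obs) ** 2).mean()            # mean of the per-data-set squared errors (the 'MSE')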
 
Having said this, what seems clear in your case is that the mean of the simulated data sets is further from the mean of the observed data for the within-subjects model, but the individual simulated data sets are closer to the observed data. This is a bit puzzling, but before thinking about it further, I'm not sure why your 'observed' column differs between the two models: shouldn't this just be summary statistics of the observed data, independent of the model?

Also, you might consider using PPC summary stats that take the different within-subject conditions into account; from what I can tell, you are looking at overall accuracy.

(Thomas is currently traveling to Japan but might have something to add when he gets back online...)

stein...@gmail.com

Apr 1, 2015, 6:57:37 PM
to hddm-...@googlegroups.com, michae...@brown.edu
Thanks again, Michael.

The different observed results actually stem from playing with slightly different outlier-exclusion procedures; by mistake I posted one wrong table. Below are the results of the correct comparison, which do not change much and still show the odd pattern: a better fit according to MSE for the within-subject model, while the opposite is suggested when we look at the distance between the mean of the observed and the mean of the simulated data.
Also, about your last remark: am I correct that this cannot be implemented through the "groupby" parameter, but should instead be done by defining my own summary stats?

Between-subject model
stat      observed   mean   std   SEM   MSE   credible   quantile   mahalanobis
accuracy 0.9420198 0.9618769 0.03868918 0.000394306 0.001891159 True 24.5625 0.5132477
mean_ub 0.4990737 0.5031659 0.06148393 1.67E-05 0.003797019 True 48.85833 0.06655713
std_ub 0.1507654 0.1595664 0.04610429 7.75E-05 0.002203064 True 47.67917 0.1908948
10q_ub 0.351571 0.3514401 0.04493653 1.71E-08 0.002019309 True 46.0625 0.002912607
30q_ub 0.418432 0.4034474 0.0494219 2.25E-04 0.002667061 True 60.3375 0.3031971
50q_ub 0.477889 0.4604147 0.05624435 3.05E-04 0.003468778 True 62.67083 0.3106853
70q_ub 0.5431935 0.5412901 0.06999375 3.62E-06 0.004902748 True 53.42083 0.0271932
90q_ub 0.680197 0.7096399 0.1102168 8.67E-04 0.01301463 True 42.95417 0.267136
mean_lb -0.4601323 -0.5019116 0.1055379 0.001745513 0.01288376 True 60.94701 0.3958705
std_lb 0.1551207 0.1235704 0.0866004 0.00099542 0.008495048 True 68.9502 0.3643202
10q_lb 0.2993659 0.388149 0.09969723 0.007882448 0.01782199 True 13.60643 0.8905277
30q_lb 0.3627853 0.427042 0.09969966 0.004128927 0.01406895 True 25.13222 0.644503
50q_lb 0.4250845 0.4719084 0.1065959 0.002192478 0.01355517 True 37.16695 0.4392655
70q_lb 0.492839 0.5347944 0.1240898 0.001760255 0.01715852 True 42.46582 0.3381052
90q_lb 0.6780694 0.6436438 0.1755016 1.19E-03 0.03198593 True 64.41473 0.1961556

Michael J Frank

Apr 2, 2015, 3:50:41 AM
to hddm-...@googlegroups.com
Yes, I think you can just use groupby here, because the stats that you want to compare for DDM fits may still be accuracy and quantiles, just computed separately for your within-subject conditions. I suspect this will be more informative about which model fits better. In general one can always use overall measures of model fit (like DIC, BIC, BF, etc.), but one thing the PPC buys you is the ability to focus on the specific key aspects of the data that you are studying; here the question is whether the model can capture the effects of condition. It could be that one model can capture the overall summary statistics across conditions but not the changes due to condition. In your case both models should be able to capture the effects on average, since they both allow for condition effects (on drift, presumably), but what you want to check is whether the within-subject effects of condition are more precisely estimated by the within-subject model.

Actually, this might be seen more clearly if you define your observed stats as the within-subject differences (in accuracy and quantiles) and then look at the error of the between- vs. within-subject models in capturing these. But it might already be clearer if you just do the groupby as per above.
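A very rough sketch of the difference-based idea; the helper below is hypothetical, 'subj_idx' and 'response' are the standard HDDM data columns, and 'stim' with its levels follows the code later in this thread:

import pandas as pd

def within_subject_effect(df, cond_col='stim', c1='congruent', c2='incongruent'):
    # Mean within-subject accuracy difference between two conditions.
    acc = df.groupby(['subj_idx', cond_col])['response'].mean().unstack(cond_col)
    return (acc[c2] - acc[c1]).mean()

# Observed condition effect, computed directly from the raw data.
obs_effect = within_subject_effect(data)

# The same function would then be applied to each simulated data set returned by
# hddm.utils.post_pred_gen(model, append_data=True) (one per posterior sample,
# after deriving a response column from the simulated RTs if needed), and the
# squared deviations from obs_effect averaged into an MSE-style error for the
# condition effect under each model.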

As for the finding that (when not splitting by conditions) you get opposite conclusions from MSE vs. the mean of the simulated data: it seems to me that MSE is more appropriate in any case, because it reflects the error for each simulated data set generated with the same number of trials, subjects, etc. as in the observed data. If you use the mean of the simulated data sets, you could happen to do OK even if no single data set looks anything like the observed data. E.g. if accuracy is 0.5 in the data, simulated data sets like (0.4, 0.6, 0.55, ...) will have a much better MSE than simulated data sets like (0, 1, 0, 1, ...), which are all completely off, even though the mean of the latter looks just as good.
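A quick numeric version of that example (toy numbers):

import numpy as np

obs = 0.5
close_sims = np.array([0.4, 0.6, 0.55, 0.45])  # each simulated data set resembles the observed accuracy
off_sims = np.array([0.0, 1.0, 0.0, 1.0])      # none does, yet the mean is still 0.5

print(close_sims.mean(), ((close_sims - obs) ** 2).mean())  # 0.5, 0.00625
print(off_sims.mean(), ((off_sims - obs) ** 2).mean())      # 0.5, 0.25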

Michael

stein...@gmail.com

Apr 9, 2015, 12:00:55 PM
to hddm-...@googlegroups.com, michae...@brown.edu
Again very valuable input for me, Michael, and I think your argument is very plausible.
So I will concentrate on the MSE as the decisive measure.
However, I have been trying to use the "groupby" argument with the within-subject model and can't seem to get it to work. Has anybody done that successfully?
I have tried the code below without success:

### code
dmatrix("C(stim, Treatment('congruent'))", data.head(10))
### output
DesignMatrix with shape (10, 4)
  Columns:
    ['Intercept',
     "C(stim, Treatment('congruent'))[T.auditory]",
     "C(stim, Treatment('congruent'))[T.incongruent]",
     "C(stim, Treatment('congruent'))[T.visual]"]
  Terms:
    'Intercept' (column 0)
    "C(stim, Treatment('congruent'))" (columns 1:4)
  (to view full data, use np.asarray(this_obj))
### code
m_vat = hddm.HDDMRegressor(data, {"v ~ C(stim, Treatment('congruent'))","a ~ C(stim, Treatment('congruent'))","t ~ C(stim, Treatment('congruent'))"}, keep_regressor_trace=True, p_outlier=.05)
### output
Adding these covariates:
['t_Intercept', "t_C(stim, Treatment('congruent'))[T.auditory]", "t_C(stim, Treatment('congruent'))[T.incongruent]", "t_C(stim, Treatment('congruent'))[T.visual]"]
Adding these covariates:
['a_Intercept', "a_C(stim, Treatment('congruent'))[T.auditory]", "a_C(stim, Treatment('congruent'))[T.incongruent]", "a_C(stim, Treatment('congruent'))[T.visual]"]
Adding these covariates:
['v_Intercept', "v_C(stim, Treatment('congruent'))[T.auditory]", "v_C(stim, Treatment('congruent'))[T.incongruent]", "v_C(stim, Treatment('congruent'))[T.visual]"]
### code
m_vat.find_starting_values()
m_vat.sample(20500, burn=500, thin=10, dbname='traces.db', db='pickle')
### output
 [-----------------100%-----------------] 20501 of 20500 complete in 85312.6 sec
<pymc.MCMC.MCMC at 0x10b22a9d0>
### code
ppc_data = hddm.utils.post_pred_gen(m_vat,groupby='stim')
### output

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-65-1353dec930ed> in <module>()
----> 1 ppc_data = hddm.utils.post_pred_gen(m_vat,groupby='stim')

/Users/benjamin/anaconda/lib/python2.7/site-packages/kabuki/analyze.pyc in post_pred_gen(model, groupby, samples, append_data, progress_bar)
    327 
    328     for name, data in iter_data:
--> 329         node = model.get_data_nodes(data.index)
    330 
    331         if progress_bar:

/Users/benjamin/anaconda/lib/python2.7/site-packages/kabuki/hierarchical.pyc in get_data_nodes(self, idx)
    919 
    920         if len(data_nodes) != 1:
--> 921             raise NotImplementedError("Supply a grouping so that at most 1 observed node codes for each group.")
    922 
    923         return data_nodes[0]

NotImplementedError: Supply a grouping so that at most 1 observed node codes for each group.



Thomas Wiecki

Apr 10, 2015, 4:10:12 AM
to hddm-...@googlegroups.com, Michael J. Frank
The groupby feature of PPC is designed for a slightly different purpose.

Imagine you have two models, one where you split v by condition and one where you don't, and you want to compare them on their ability to capture accuracy differences between the two conditions. If you run PPC on the split model, it will automatically generate separate statistics for both conditions. However, PPC has no idea about the conditions in the combined model. That's where you use the groupby argument.

In your model you are already splitting by condition, so you don't need the groupby argument. But you can compare it to a combined model, which should provide a worse fit.
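A sketch of that comparison (model names are placeholders; 'stim' follows the code above):

import hddm

# Combined model: no condition effects, so tell the PPC how to split the data.
# Depending on how the observed nodes are structured, the grouping may need to
# be finer (e.g. ['subj_idx', 'stim']), as the error message above suggests.
m_combined = hddm.HDDM(data)
ppc_combined = hddm.utils.post_pred_gen(m_combined, groupby=['stim'])

# Split model: v depends on condition, so the PPC statistics already come out
# separately per condition without a groupby argument.
m_split = hddm.HDDM(data, depends_on={'v': 'stim'})
ppc_split = hddm.utils.post_pred_gen(m_split)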

Thomas
Thomas Wiecki, PhD
Data Science Lead, Quantopian Inc, Boston

stein...@gmail.com

Apr 10, 2015, 9:50:44 AM
to hddm-...@googlegroups.com, michae...@brown.edu
Hi Thomas, 

So, as the different conditions are already taken into account when defining within-subject models, I guess the best way to go is to use a combination of DIC and MSE for comparing different within-subject models (varying the number and combination of free parameters), and to stick to MSE alone for comparing within-subject and between-subject models that allow the same parameters to vary freely.

Thanks again you two!


...

Chuan-Peng Hu

Dec 14, 2016, 3:05:38 AM
to hddm-users, michae...@brown.edu
Hi, Stein,

I really appreciate this great discussion!
I am wondering: when using MSE to compare within-subject and between-subject models, what if the between-subject model fits better than the within-subject model even though the actual design was within-subject? I'm not sure whether that can happen; just out of curiosity.