Hi Thomas,
My last questions were answered.
ppc_data = post_pred_check(m, compute_stats=False, append_data=True)
makes a pandas dataframe containing both the observed data (rt) and the predicted data (rt_sampled).
Running the PPC for 50 samples with 10 bins took approx. an hour (for an experiment with ca. 25 participants and 900 trials per participant, on a 3.5 GHz processor).
So sampling is definitely time-consuming.
I'll run it with more samples now. In the quick run I just did, sampled RTs are nearly twice as slow as observed RTs, and accuracy is higher (85% compared to 80%). I hope that this is only an effect of my coarse sampling, as I know from running the same model with depends_on that observed and predicted data are very similar.
Some clarification questions:
- When I pass samples = 50, that means that 50 datasets will be generated, each from a different theta drawn from the posterior, correct?
- In your code for the posterior predictive plots, do you build the predicted RT distribution (including "confidence intervals") as the mean of the 50 histograms in each RT bin, together with their 2.5 and 97.5 percentiles? Or do you use another approach?
- Relatedly, when you calculate the percentiles of mean RTs and other statistics, do you calculate the statistic for each sample and then take the percentiles over those per-sample statistics?
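To make the last two questions concrete, here is a minimal sketch of the two procedures I have in mind, on made-up data rather than real HDDM output. The toy frame, the index level names "sample" and "trial", and the gamma-distributed RTs are all my own assumptions; I don't know whether your code works this way.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_samples, n_trials, n_bins = 50, 200, 10

# Toy stand-in for the ppc output: one row per (sample, trial).
# The level names "sample" and "trial" are my own labels, not HDDM's.
index = pd.MultiIndex.from_product(
    [range(n_samples), range(n_trials)], names=["sample", "trial"]
)
ppc_toy = pd.DataFrame(
    {"rt_sampled": rng.gamma(shape=4.0, scale=0.15, size=len(index))},
    index=index,
)

# Statistics: one mean RT per simulated dataset,
# then percentiles over those per-sample means.
mean_rt_per_sample = ppc_toy["rt_sampled"].groupby(level="sample").mean()
ci_low, ci_high = np.percentile(mean_rt_per_sample, [2.5, 97.5])

# Histograms: one histogram per simulated dataset on shared bin edges,
# then the mean and the 2.5/97.5 percentiles within each bin.
edges = np.linspace(0.0, ppc_toy["rt_sampled"].max(), n_bins + 1)
hists = np.array([
    np.histogram(group.to_numpy(), bins=edges, density=True)[0]
    for _, group in ppc_toy["rt_sampled"].groupby(level="sample")
])
hist_mean = hists.mean(axis=0)
hist_low, hist_high = np.percentile(hists, [2.5, 97.5], axis=0)
```

With only 50 samples, the 2.5 and 97.5 percentiles rest on just a few datasets in each tail, so I would expect these bands to be noisy.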
cheers - guido
PS: the pandas frame has a name only for the first of its three index levels. I would have understood the dataframe faster if the second and third were labeled "sample" and "trial" or similar.
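For anyone else who runs into this, the levels can be labeled after the fact. This is only a sketch on a toy frame: the first level name "node", the node labels, and the constant RT values are placeholders of mine, and I am assuming the second and third levels really are the sample and the trial.

```python
import pandas as pd

# Toy frame with three index levels where only the first is named,
# mimicking what I saw; "node" and the node labels are placeholders.
index = pd.MultiIndex.from_product(
    [["node_a", "node_b"], range(2), range(3)], names=["node", None, None]
)
ppc_toy = pd.DataFrame({"rt": 0.5, "rt_sampled": 0.6}, index=index)

# Label the second and third levels; the first name is kept as-is.
ppc_toy.index = ppc_toy.index.set_names(["sample", "trial"], level=[1, 2])
```

After that, groupby(level="sample") and similar calls can refer to the levels by name instead of by position.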