Hm, that's odd. If all (or most) subjects show opposite-sign drifts for the two conditions when fit individually, this should also hold at the group level in a hierarchical model.
A few thoughts:
- I assume you set "is_group_model" to True when you fit the whole group, and that your data frame has the appropriate column indicating which subject each trial belongs to (i.e. subj_idx)?
- Have you also confirmed that the hierarchical model converges (i.e. run multiple chains and look at the r-hat)?
- Do all of your subjects show the same sign, or do some of them have positive v's for the blue condition and others negative v's for that same condition? If so, you may have coded the response or stimulus differently per subject (e.g. if you counterbalanced response mappings), and lumping them together that way wouldn't work.
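On the convergence point: r-hat (the Gelman-Rubin statistic) compares how far apart the chain means are to how much each chain varies internally. HDDM's docs compute it with `hddm.analyze.gelman_rubin` on a list of independently sampled models. The sketch below is a plain-Python version of the classic (non-split) formula, just to show what the number measures. `rhat` is a hypothetical helper, and the "chains" are fake Gaussian draws, not real HDDM traces.

```python
import random
from statistics import mean, variance


def rhat(chains):
    """Classic (non-split) Gelman-Rubin R-hat for a single parameter.

    `chains` is a list of equal-length lists of posterior draws with
    burn-in already removed (hypothetical input format for this sketch).
    """
    m, n = len(chains), len(chains[0])
    if m < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length")
    chain_means = [mean(c) for c in chains]
    grand_mean = mean(chain_means)
    # B: between-chain variance (scaled by n); W: mean within-chain variance
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    w = mean([variance(c) for c in chains])
    # Pooled estimate of the posterior variance
    var_plus = (n - 1) / n * w + b / n
    return (var_plus / w) ** 0.5


rng = random.Random(1)
# Four fake chains that all explore the same posterior for v ...
mixed = [[rng.gauss(1.0, 1.0) for _ in range(1000)] for _ in range(4)]
# ... versus four where one chain settled on the opposite sign of v
stuck = [[rng.gauss(mu, 1.0) for _ in range(1000)] for mu in (1.0, 1.0, 1.0, -1.0)]
print(round(rhat(mixed), 3))  # close to 1
print(round(rhat(stuck), 3))  # clearly above 1.1
```

Values clearly above 1 mean the chains disagree. In that case the group estimates, including the sign of v, shouldn't be trusted yet. A common rule of thumb is to want r-hat below 1.1, or something stricter.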
More generally, I always recommend first confirming that you get interpretable parameters on simulated data (i.e. generate data with known parameters, then fit those data to see whether you get them back). This should work both for individual subjects (if there are enough trials to recover the parameters) and for groups.
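A minimal, self-contained version of that simulate-then-recover loop is sketched below. It uses two simplifications that are assumptions here and not HDDM:

- Data are simulated with a plain Euler-Maruyama random walk: start point a/2, noise scale s = 1, and no across-trial variability.
- Drift is recovered with the closed-form EZ-diffusion estimator (Wagenmakers et al., 2007) instead of an HDDM fit.

`simulate_ddm` and `ez_drift` are hypothetical helper names. The only point is that a known positive and a known negative drift should come back with the right sign and roughly the right size. In practice you'd fit your actual HDDM model to the simulated trials and check the same thing, per subject and for the group.

```python
import math
import random
from statistics import mean, variance


def simulate_ddm(v, a, t0, n_trials, rng, dt=0.001, s=1.0):
    """Euler-Maruyama simulation of a Wiener diffusion starting at a/2.

    Assumption: no across-trial variability, unbiased start point.
    Returns (choices, rts) with choice 1 = upper bound, 0 = lower bound.
    """
    step_sd = s * math.sqrt(dt)
    choices, rts = [], []
    for _ in range(n_trials):
        x, t = a / 2.0, 0.0
        while 0.0 < x < a:
            x += v * dt + rng.gauss(0.0, step_sd)
            t += dt
        choices.append(1 if x >= a else 0)
        rts.append(t + t0)
    return choices, rts


def ez_drift(choices, rts, s=1.0):
    """Signed EZ-diffusion drift estimate (stand-in for a real HDDM fit).

    The sign comes from which boundary is hit more often; accuracy and
    RT variance are computed relative to that majority boundary.
    """
    p_upper = mean(choices)
    sign = 1.0 if p_upper >= 0.5 else -1.0
    majority = 1 if sign > 0 else 0
    pc = max(p_upper, 1.0 - p_upper)
    if pc >= 1.0:
        raise ValueError("no errors at all; EZ needs an edge correction here")
    vrt = variance([rt for c, rt in zip(choices, rts) if c == majority])
    logit = math.log(pc / (1.0 - pc))
    x = logit * (logit * pc**2 - logit * pc + pc - 0.5) / vrt
    return sign * s * x ** 0.25


rng = random.Random(2024)
recovered = {}
# Two conditions with known, opposite-sign drifts
for true_v in (1.0, -1.0):
    choices, rts = simulate_ddm(true_v, a=1.5, t0=0.3, n_trials=1500, rng=rng)
    recovered[true_v] = ez_drift(choices, rts)
    print(f"true v = {true_v:+.1f}, recovered v = {recovered[true_v]:+.2f}")
```

If the signs come back flipped or collapse toward zero here, the problem is in the coding or the fitting pipeline rather than in the real data.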