HDDMRegressor options


Michael Stevens

Dec 4, 2022, 12:36:25 PM12/4/22
to hddm-users
Hi everyone,

I'm taking another step with HDDM modeling... this time looking to link single-trial fMRI estimates to the various HDDM parameters I estimated using the performance data from the task I've been working on.  I'd be grateful for some guidance with a few quick orientation questions:

1) I've got a 'test' regression model successfully worked up that lets me see how trial-by-trial brain function relates to the a, t, and v parameters.  It seems I can also successfully differentiate study groups just by adding a BOLDest:group interaction term.  But the guidance on the tutorial website and listserv is a bit mixed on HOW to include the z parameter in this effort.  After reviewing everything I could find, I didn't see anything that led me to believe it's impossible to have a single regression model that shows me the neural correlates of all 4 HDDM parameters at once... But I saw clear guidance in one prior post not to use a z link function because my data are stim-coded.  Is this correct?  I couldn't quite tell if that post from Michael Frank was specific to someone's question/paradigm, or a general rule.  Even if I've got this right... what was less clear was exactly which link function one needs for the v parameter.  (People have posted a few different example v link functions on this listserv, and I'm not quite following when/why they modify the one presented on the tutorial webpage.)  For instance, what is what some call an "identity" link actually doing for 'v' when using stimulus-coded data?  I'm just hoping to understand why one wouldn't simply use basic patsy notation without a link function, like I'm picturing doing for a, t, and z.  Or more specifically... why one DOESN'T need a link function for z with stimulus-coded data, but DOES presumably (?) need one for v with the same stimulus-coded data?

2) If I need to include a link function for just 1 of the 4 parameters, is it kosher to combine different types of patsy model syntaxes together?  In other words, for the intermediate modeling test step I'm at now, I'm specifying:

{"a ~ BOLDest:group",
 "v ~ BOLDest:group",
 "t ~ BOLDest:group"}

...and that model runs fine.  But if I need to use a v link function... examples I've seen suggest I'll need to use the longer patsy format, e.g., "v_reg = {'model': 'v ~ 1 + C(group)', 'link_func': lambda x: x}".  I'm still playing around with this, but I keep getting a lot of errors when trying to combine that statement with the first 3.  Can someone offer an example of how I can specify things in both these syntax formats together?  

And perhaps equally as important... Can anyone suggest what might be throwing "Mismatched contrast matrix for factor EvalFactor('C(s, [[1], [-1]])')" errors I'm getting when I try to implement this link function?  I'm consistently getting "After 7.00000 retries, still no good fit found" errors.  I'd appreciate a nudge towards whatever it is I could be missing.

3) One final, really quick question... I've gathered from older listserv posts that mixed-model applications weren't possible for HDDMRegressor in older HDDM versions.  I'm using HDDM 0.9.7 and I'm not running into any errors when testing out a few between:within designs with HDDMRegressor.  But I don't want to assume that means I'm doing it right.  Just to confirm... Are fully between or mixed between:within designs fully working in HDDM 0.9.7?

Many thanks everyone,
Mike

Michael Stevens

Dec 5, 2022, 6:41:24 PM12/5/22
to hddm-users
Oh... One more question:  What are the implications of running an HDDM model on the full dataset vs. data subsets?  For instance, in this dataset I have 3 separate "tasks" that people perform.  If I run comparable regression models, I get notably different associations with these single-trial BOLD signal estimates... but they're run on the same data.  Might this come down to the coding?  For instance, I'm using simple BOLDest:group or BOLDest:task:group model coding syntax.  Or is it a general property of having more information available during modeling?  I paused to do this careful check to make sure I'd get comparable results, to bolster my confidence that I was approaching this correctly.  So when they didn't agree, I was a bit concerned.  What should I be thinking here?

nadja...@gmail.com

Dec 6, 2022, 9:51:02 PM12/6/22
to hddm-...@googlegroups.com

Hi Michael,

To your various questions:

Whether it is meaningful to estimate a starting-point bias depends on what the two decision boundaries reflect. If they reflect the two response options (e.g., response A vs. B), then it is meaningful to estimate a starting-point bias z. If the two decision boundaries reflect "correct responses" and "incorrect responses", respectively, then it is not meaningful to estimate a starting-point bias, and the starting point should be equidistant from the two boundaries (because it seems unjustified to assume that a person develops a response bias towards correct responses before having seen the stimulus).

Regarding what link function to use for z: in the past it was suggested to use an inverse logit, but that transformation is now already incorporated in the prior. Therefore, one should instead just use the linear (identity) link function, lambda x: x — this just means that the sampler will directly estimate z instead of a transform of it. See this thread. For v, the link function is usually also identity/linear, because v is not constrained to a bounded range.
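To make the link-function distinction concrete, here is a small standalone sketch (plain Python, independent of HDDM) contrasting the identity ("linear") link with the inverse logit that older examples applied to z:

```python
import math

# Identity ("linear") link: the regression's linear predictor is used
# directly as the parameter value. This is the lambda x: x in HDDM examples.
def identity(x):
    return x

# Inverse logit: squashes any real number into (0, 1), the valid range
# for the starting point z. Older HDDM examples applied this to z; per
# the advice in this thread, that constraint now lives in the prior,
# so the identity link is used instead.
def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

print(identity(0.7))    # 0.7 -- passed through unchanged
print(inv_logit(0.0))   # 0.5 -- an unbiased starting point
print(inv_logit(3.0))   # ~0.953 -- large inputs still land inside (0, 1)
```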


Regarding the question of whether one can have a single regression with all parameters varying at once:

This is possible in general, but one needs to consider the complexity of such a model, which often leads to all sorts of problems (e.g., some parameters might trade off against each other; there may be more uncertainty in the posteriors; parameter recovery can suffer; estimation takes longer). As pointed out earlier, it is often better to have a specific theory/hypothesis about the parameters and then compare these hypotheses via separate models.


Regarding your question about regression syntax, you provided this example:

{"a ~ BOLDest:group",
 "v ~ BOLDest:group",
 "t ~ BOLDest:group"}

...and this one:

v_reg = {'model': 'v ~ 1 + C(group)', 'link_func': lambda x: x}

It looks to me like two different definitions of the drift rate are being combined in the same model. If the first code is working, then that is fine. You would not want to include both a v_reg AND the "v ~ BOLDest:group" syntax, as that would be redundant and may cause errors. Here is the link to a recent tutorial paper that has some good examples of regressors: https://hddm.readthedocs.io/en/latest/lan_tutorial.html#section-5-regressors
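For completeness, a sketch of how the two syntax styles can coexist in one call: HDDMRegressor accepts a list whose entries are either plain patsy strings (identity link assumed) or dicts with an explicit 'link_func'. 'BOLDest' and 'group' are assumed column names from Michael's data, and the hddm call itself is only indicated in comments:

```python
# Mixed descriptor list: plain patsy strings and an explicit-link dict
# can sit side by side in the list passed to HDDMRegressor.
# ('BOLDest' and 'group' are assumed column names in the trial data.)
reg_models = [
    "a ~ BOLDest:group",                # short form, identity link implied
    "t ~ BOLDest:group",                # short form, identity link implied
    {"model": "v ~ BOLDest:group",      # long form with an explicit
     "link_func": lambda x: x},         # identity link function
]

# The whole list then goes into a single model, e.g.:
# m = hddm.HDDMRegressor(data, reg_models)
# m.sample(2000, burn=500)

print(len(reg_models))  # 3
```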


Regarding your question about mixed-model applications, which were not possible with HDDMRegressor in older HDDM versions:

There is an issue with trying to estimate *both* between- and within-subject effects in the same model, which should be fixed in later versions. For now, we recommend fitting separate models for the between-subjects groups and then comparing the posteriors of the within-subject effects estimated in those models using the extracted traces. See here for an example.
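As an illustration of the trace-comparison step, here is a sketch using synthetic numpy arrays as stand-ins for the extracted traces; the group labels and effect sizes are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for posterior traces of the same within-subject
# effect (e.g., a BOLDest slope on 'v') from two separately fitted
# group models. Real traces would be extracted from each fitted model
# (in HDDM, via the model's nodes_db and the node's .trace() method).
trace_group1 = rng.normal(0.30, 0.05, size=5000)
trace_group2 = rng.normal(0.10, 0.05, size=5000)

# Posterior of the group difference, and the posterior probability
# that group 1's effect exceeds group 2's.
diff = trace_group1 - trace_group2
p_greater = float((diff > 0).mean())
print(f"P(group1 effect > group2 effect) = {p_greater:.3f}")
```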


Best,

Nadja


Nadja R. Ging-Jehli, PhD

Postdoctoral Research Associate in Computational Psychiatry & Cognitive Neuroscience

Brown University

Department of Cognitive, Linguistic & Psychological Sciences

190 Thayer St, Providence, RI 02912

Lab website: https://www.lnccbrown.com/home/

(614) 736-7755 | na...@gingjehli.com | www.gingjehli.com



Michael Stevens

Dec 7, 2022, 3:57:29 PM12/7/22
to hddm-users
Thanks so much!  That's incredibly helpful and much appreciated.  Yes, my task is stimulus-coded, so 'z' is interpretable and worth looking at... particularly in our topical context.  My next research question on these data involves describing group differences in how trial-to-trial BOLD signal changes relate to HDDM parameters.  Well, admittedly... first we want to know which brain regions are associated with the parameters in general (no one has ever looked before on this task).  But then, our real question is to test whether those associations differ between our diagnosis-based study groups.  We've already determined that there are interesting group differences in the parameters themselves... depending not just on task, but on which stimulus set is used in each task.  For one type of stimuli, we found all HDDM parameters to be abnormal in the clinical group.  For the other stimulus class, only 'v' drift rates are slower.  So we'd expect to see a very different picture of HDDM parameter neural correlates emerge when we assess the whole brain.  At minimum, I'd be happy putting a map of HDDM parameter correlates for one study group side by side with the other... just to compare visually.  But even better, I'll use your suggestion to run a model separately for each group and then compare the traces to contrast each HDDM parameter's association with BOLD activity.

But let me make sure I'm translating your guidance into the right practical modeling approach.  Essentially, I'm taking your caution against more complex models as "simpler is better" because it leads to clearer inference, involves fewer modeling unknowns, less uncertainty, etc.  That means avoiding larger, complex models with all the parameters thrown in together, and instead focusing on each HDDM parameter one at a time.  I assume that guidance would extend to looking at only one stimulus class per model too?  If so, that would translate into 4 models for one stimulus class, plus 4 models for the other stimulus class.  Then the two groups get modeled separately.  And I'd ultimately do this separately for all 3 task contexts.  (So to canvass the entire parameter set for both study groups, we'd run 48 HDDMRegressor models before we're done...)  Also, each of these models would be simple, e.g., "a ~ BOLDest", because we'll take care to avoid mixing between/within combinations (i.e., 'BOLDest' is within, 'group' is between) in favor of the trace-testing approach for groups.

Focusing on one HDDM parameter per regression model also means I no longer have to puzzle out why errors were thrown when I tried to run models that combined one notation format (e.g., 'a ~ BOLDest') with the longer patsy notation (e.g., "v_reg = {'model': 'v ~ 1 + C(group)', 'link_func': lambda x: x}").  I apologize for wording that question poorly.  I was only asking about the proper syntax to use both notation formats within the same model, not to put two 'v' parameter terms in.

So finally, when I go to run these models on the different parameters, I'd be specifying:

Model 1 for 'a' parameter:  "a ~ BOLDest"
Model 2 for 't' parameter:  "t ~ BOLDest"
Model 3 for 'v' parameter:  "v_reg = {'model': 'v ~ 1 + BOLDest', 'link_func': lambda x: x}"
Model 4 for 'z' parameter:  "z_reg = {'model': 'z ~ 1 + BOLDest', 'link_func': lambda x: x}"

Is that correct?
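Written out as valid Python (with 'BOLDest' as the assumed predictor column), those four one-parameter specifications might look like the following sketch; the plain string form and the dict form with an identity link express the same regression:

```python
# Four per-parameter HDDMRegressor specifications, one model each.
# 'BOLDest' is the assumed single-trial BOLD estimate column.
model_specs = {
    "a": "a ~ BOLDest",
    "t": "t ~ BOLDest",
    "v": {"model": "v ~ 1 + BOLDest", "link_func": lambda x: x},
    "z": {"model": "z ~ 1 + BOLDest", "link_func": lambda x: x},
}

# Each spec would be fit in its own model, e.g.:
# m_a = hddm.HDDMRegressor(data, model_specs["a"])
# m_z = hddm.HDDMRegressor(data, model_specs["z"])
```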

I could use just a bit more clarification on the link function guidance you offered.  First, how is the simpler notation different from the expanded one with the link functions?  That is, why can't I simply specify "v ~ BOLDest" and "z ~ BOLDest" in those models?  I guess I'm still not sure what the link function's actual purpose is... i.e., what it accomplishes in the model that requires it to be specified for z or v when the other parameters don't need it.  From all the reading I did, I guessed it had something to do with stimulus coding, but I couldn't quite figure out exactly what.  Second, when you refer to "linear" and "identity" link functions, are those different terms for the same "lambda x: x" function?  The way you worded your reply left me unsure whether they were two different things or not.

Thanks again...!  Unless I learn I'm totally wrong-footed here, I think I'm ready to code all this up on our HPC cluster and let it crunch numbers for a week or two.

Mike
