Design matrices: Conditions and extraregressors


Darren Yeo

May 20, 2020, 8:36:12 PM
to GLMdenoise
Hi Dr. Kay and colleagues,

I am planning to use GLMdenoise to derive the optimal set of regressors, which I will then use to model my fMRI data better within BrainVoyager.

In our experimental task, a run has 54 trials:
- 9 exemplars x 3 trials each (total of 27 trials) where participants respond that a letter is present among a string of digits (e.g., 8A961)
- 27 trials in total where participants respond that a letter is absent among a string of digits (e.g., 64713)
We have 2-4 runs per subject, most with 4 runs.

In our lab, we traditionally distinguish correct trials from incorrect trials (i.e., commission and omission errors) to better model our data.
For the 'design' input, I currently have for each run:
- 9 regressors for correct ‘present’ trials (1 for each exemplar; 3 trials each minus any incorrect trials)
- 1 regressor for correct ‘absent’ trials (27 trials minus any incorrect trials)
- 1 regressor for commission error trials
- 1 regressor for omission error trials

Here are the design matrices for two subjects (S1, with only 3 runs, and S2, with the full set of 4 runs); the conditions are ordered as the 12 regressors listed above:

[S1: S1_designmatrices.png]  [S2: S2_designmatrices.png]
Due to this coding scheme, some runs have no remaining correct trials for one or more exemplar conditions (so these condition regressors are vectors of zeros), and some runs have no error trials (so the error regressors are vectors of zeros). This seems to pose a problem for the cross-validation if I were to use these design matrices.

An alternative coding scheme that I am considering is a design matrix with the following to ensure that every run has the same set of conditions:
- 9 regressors for correct ‘present’ trials (1 for each exemplar; all 3 trials each regardless of response accuracy)
- 1 regressor for correct ‘absent’ trials (27 trials regardless of response accuracy)

Then include "extraregressors" to code for error trials:
- 1 regressor for commission error trials
- 1 regressor for omission error trials

For this alternative approach, I am worried about perfect collinearity between the exemplar condition regressor and one of the error regressors in any one run in which all the 'present' trials pertaining to an exemplar were incorrect trials.
Could you advise on whether this alternative coding scheme is more sound than the original one, and whether there is anything else I should be concerned about? I am also wondering whether I should omit extraregressors that are just vectors of zeros from a run, or leave them in, given the documentation: "The number of extra regressors does not have to be the same across runs, and each run can have zero or more extra regressors.  If [] or not supplied, we do not use extra regressors in the model."

I would also appreciate it greatly if you could provide an example of how extraregressors can be incorporated in the code (I don't suppose it matters, but I am using onset timings for the <design> input as the onsets do not exactly coincide with the TRs).

Thank you!

Best,
Darren

Kendrick Kay

May 21, 2020, 10:03:21 AM
to Darren Yeo, GLMdenoise
Hi Darren,

Thanks for the clear explanation of your paradigm.

You touch on a number of issues which are collectively a bit tricky to think about, and there are many choices one can make...

Regarding the correct/incorrect trials - I am wondering what the intention is for these trials. For example, if the intention is that you just want to effectively ignore the response from incorrect trials, then that leads you down one path. Or if the intention is that you are specifically interested in estimating the response to incorrect trials, that leads to other choices...

At least the way GLMdenoise is set up, it acts as if you don't care about the beta weight associated with any extraregressors you give it. This was a design choice a long time ago (and maybe it should be different).

Regarding the columns of all zeros -- I am not sure it poses a huge problem. I think it might all be gracefully handled (for example, a column of all zeros is by default given a beta estimate of zero).  Does the code run (without crashing)?  If so, things are likely fine.

Regarding the collinearity issue, my understanding is that it might be the case (in one of the scenarios you lay out) that an extraregressor might be identical to one of your other design matrix columns. I agree this might be a problem. The code works by fully estimating (in the least squares sense) the beta weights for the extra regressors by projecting out these extra regressors from both the data and the other design matrix columns. So, I think what it will do in this case is assign all of the variance to the extra regressor and leave an essentially random design matrix column. But I'm not entirely sure what will ensue after that (might be okay or might be catastrophic). We may need to do a dbstop to trace things in the code...
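
The projection described above can be illustrated with a short MATLAB sketch (this is not the toolbox's actual internal code; all variable names and sizes here are hypothetical):

```matlab
% Illustrative sketch of projecting extra regressors out of both the
% data and the remaining design matrix columns (least-squares sense).
n = 100;                  % hypothetical number of time points
E = randn(n,2);           % hypothetical extra regressors (time x regressors)
X = randn(n,10);          % hypothetical design matrix
y = randn(n,1);           % hypothetical voxel time series
P = eye(n) - E*pinv(E);   % projector onto the complement of span(E)
Xclean = P*X;             % design with extra-regressor variance removed
yclean = P*y;             % data with extra-regressor variance removed
% If a design column lies entirely within span(E) (the collinear case),
% its projected version is all zeros: E absorbs all of its variance.
```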

Regarding the extraregressors, it should be as simple as passing an options struct like...  struct('extraregressors',{{A B C}}) -- note the doubled braces, which make the field a single cell array -- where A is a 2D matrix of time points x regressors, B is a 2D matrix of time points x regressors, and so on. The number of time points in A should correspond to what you have for run 1, the number of time points in B to what you have for run 2, and so on.  The regressors in A, B, and C are treated distinctly (i.e. the code estimates separate weights for the columns in A, the columns in B, etc.)
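
As a concrete sketch (variable names like nTR1 and commErrTRs1 are hypothetical placeholders for your run lengths and error-trial volume indices, and the GLMdenoisedata call assumes its standard argument order):

```matlab
% Hypothetical sketch: one 0/1 indicator column per error type, per run.
A = zeros(nTR1,2); A(commErrTRs1,1) = 1; A(omitErrTRs1,2) = 1;  % run 1
B = zeros(nTR2,2); B(commErrTRs2,1) = 1; B(omitErrTRs2,2) = 1;  % run 2
C = zeros(nTR3,2); C(commErrTRs3,1) = 1; C(omitErrTRs3,2) = 1;  % run 3
% Nested braces make the field a single cell array of three matrices.
opt = struct('extraregressors',{{A B C}});
results = GLMdenoisedata(design,data,stimdur,tr,[],[],opt,'figures');
```

Whether the indicator columns should instead be convolved with an HRF depends on what you want the extra regressors to absorb.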

On a completely different note, perhaps a way to simplify a lot of these tricky issues is to simply estimate a separate beta weight (amplitude) for every trial in your experiment (and then deal with all the correct/incorrect stuff post hoc)? There is a new function, GLMestimatesingletrial.m, that we are developing. It actually does GLMdenoise plus several other pieces of major analysis magic, and it might be suitable for your needs. One thing, though, is that it requires the design matrix to be specified in lockstep with your TRs. If this is not currently the case, you could either round to the nearest TR or upsample the fMRI time series data to better match your experiment. I can give you more details or discuss further if you are curious.
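
If rounding to the nearest TR, the conversion from onset times to a TR-locked 0/1 design matrix could look like this sketch (onsets, tr, and nvol are hypothetical stand-ins for your actual timing variables):

```matlab
% Hypothetical sketch: onsets is a cell array, one vector of onset times
% (in seconds, relative to run start) per condition; nvol is the number
% of volumes in the run.
ncond = numel(onsets);
design = zeros(nvol,ncond);
for c = 1:ncond
  idx = round(onsets{c}/tr) + 1;   % nearest TR, converted to 1-based index
  design(idx,c) = 1;
end
```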

Kendrick



--
Kendrick Kay (k...@umn.edu)
Assistant Professor
Center for Magnetic Resonance Research
University of Minnesota


Darren Yeo

May 21, 2020, 8:41:35 PM
to GLMdenoise

Hi Dr. Kay,

Thanks so much for the prompt and thoughtful reply! This is such a great tool, and the posts here are all very informative. I was convinced by both the Kay et al. (2013) and Charest et al. (2018) papers, so I am inclined to invest time to make sure I can use it - and use it well - for the current study and future studies.

> Regarding the correct/incorrect trials - I am wondering what the intention is for these trials. For example, if the intention is that you just want to effectively ignore the response from incorrect trials, then that leads you down one path. Or if the intention is that you are specifically interested in estimating the response to incorrect trials, that leads to other choices...
>
> At least the way GLMdenoise is set up, it acts as if you don't care about the beta weight associated with any extraregressors you give it. This was a design choice a long time ago (and maybe it should be different).


My intention for distinguishing correct and incorrect trials is to effectively ignore the response from incorrect trials. I am interested in estimating exemplar-level neural representations from activation patterns in a priori regions of interest. As we can never be sure what caused the commission errors/omission of responses, I do not want to assume that correct and incorrect trials evoked the same neural representations even though the stimuli presented were identical. If I had assumed otherwise, I think the estimated neural representations would be noisier/"contaminated". Based on your explanation, it seems that going the extraregressor route makes more sense than including them in the main design matrix.
 
> Regarding the columns of all zeros -- I am not sure it poses a huge problem. I think it might all be gracefully handled (for example, a column of all zeros is by default given a beta estimate of zero).  Does the code run (without crashing)?  If so, things are likely fine.


The code ran without crashing, at least for S1, which is the only dataset I have tested thus far. There were just warning messages indicating that the beta weights for those regressors would be zero (which I appreciate greatly!). It got me thinking about what I could have done wrong for the first subject, because the noise pool I got comprises essentially the whole brain. One possibility was that I used the nearest TR in the design matrix instead of onset times, so the condition regressors could have been slightly off, but I doubt that was it (as you allude to below, rounding is a reasonable approach). My next step is to use onset times and include the commission and omission errors as extraregressors, and see how it goes.
 
> Regarding the collinearity issue, my understanding is that it might be the case (in one of the scenarios you lay out) that an extraregressor might be identical to one of your other design matrix columns. I agree this might be a problem. The code works by fully estimating (in the least squares sense) the beta weights for the extra regressors by projecting out these extra regressors from both the data and the other design matrix columns. So, I think what it will do in this case is assign all of the variance to the extra regressor and leave an essentially random design matrix column. But I'm not entirely sure what will ensue after that (might be okay or might be catastrophic). We may need to do a dbstop to trace things in the code...


I will try it and keep you posted. I have 5 subjects with this issue in at least 1 condition in 1-2 runs. I do not wish to exclude these subjects, but I would like to preprocess and denoise the data across subjects as similarly as possible, because one of our subsequent analyses will involve individual differences in behavioral measures, and we want to reduce other uninteresting sources of variation.
 
> Regarding the extraregressors, it should be as simple as passing an options struct like...  struct('extraregressors',{{A B C}}) where A is a 2D matrix of time points x regressors, B is a 2D matrix of time points x regressors, and so on. The number of time points in A should correspond to what you have for run 1, the number of time points in B to what you have for run 2, and so on.  The regressors in A, B, and C are treated distinctly (i.e. the code estimates separate weights for the columns in A, the columns in B, etc.)

This is very clear and helpful. Thank you!
 
> On a completely different note, perhaps a way to simplify a lot of these tricky issues is to simply estimate a separate beta weight (amplitude) for every trial in your experiment (and then deal with all the correct/incorrect stuff post hoc)? There is a new function, GLMestimatesingletrial.m, that we are developing. It actually does GLMdenoise plus several other pieces of major analysis magic, and it might be suitable for your needs. One thing, though, is that it requires the design matrix to be specified in lockstep with your TRs. If this is not currently the case, you could either round to the nearest TR or upsample the fMRI time series data to better match your experiment. I can give you more details or discuss further if you are curious.

This is potentially useful, especially for classification analyses. I will need to think more about whether it is suitable for my purpose, which is to perform RSA. The final GLM that I will implement in BrainVoyager (for consistency with the other analyses that led to our a priori ROIs) will be a single model with all 4 runs of the task, so that I can have more repetitions of each exemplar (up to 12 trials) to reliably estimate their betas. I'll definitely consider it in the future if classification analyses are on the table for this or other datasets. Thank you for pointing me to this new code function!

Kendrick Kay

May 22, 2020, 8:08:19 AM
to Darren Yeo, GLMdenoise
Hi Darren,

>> On a completely different note, perhaps a way to simplify a lot of these tricky issues is to simply estimate a separate beta weight (amplitude) for every trial in your experiment (and then deal with all the correct/incorrect stuff post hoc)? There is a new function, GLMestimatesingletrial.m, that we are developing. It actually does GLMdenoise plus several other pieces of major analysis magic, and it might be suitable for your needs. One thing, though, is that it requires the design matrix to be specified in lockstep with your TRs. If this is not currently the case, you could either round to the nearest TR or upsample the fMRI time series data to better match your experiment. I can give you more details or discuss further if you are curious.
>
> This is potentially useful, especially for classification analyses. I will need to think more about whether it is suitable for my purpose, which is to perform RSA. The final GLM that I will implement in BrainVoyager (for consistency with the other analyses that led to our a priori ROIs) will be a single model with all 4 runs of the task, so that I can have more repetitions of each exemplar (up to 12 trials) to reliably estimate their betas. I'll definitely consider it in the future if classification analyses are on the table for this or other datasets. Thank you for pointing me to this new code function!
 

Given the single-trial outputs, there is (at least in theory) plenty of flexibility to do RSA or whatnot.  For example, you could average across trials (associated with a given condition) and then do condition-level RSA.  Or you could exclude incorrect trials and then proceed with analyses.  Or you could split the single trials into training and test sets and then do some MVPA-style analyses.  Of course, the single-trial estimates are new territory, so one would want to proceed with caution and do sanity checks.
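
For instance, averaging single-trial betas within condition (while excluding incorrect trials) might be sketched as follows (betas, condlabels, and correct are hypothetical placeholders for the single-trial outputs and your trial bookkeeping):

```matlab
% Hypothetical sketch: betas is voxels x trials; condlabels is 1 x trials
% with entries 1..ncond; correct is a 1 x trials logical vector.
ncond = max(condlabels);
condbetas = zeros(size(betas,1),ncond);
for c = 1:ncond
  keep = (condlabels == c) & correct;     % e.g., keep only correct trials
  condbetas(:,c) = mean(betas(:,keep),2);
end
% condbetas (voxels x conditions) can then feed a condition-level RDM.
```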

See how it goes, and we can take it from there.

Kendrick




Darren Yeo

May 29, 2020, 5:50:18 PM
to GLMdenoise
Hi Dr. Kay,

Thanks so much for the insights on using single-trial estimates. I totally agree that it offers greater downstream flexibility, and I'll consider that.

I have since tried different variants of the design with and without extraregressors for a couple of subjects:
1) 12 main condition regressors [9 target-present regressors (exemplar level) + 1 target-absent regressor + 1 regressor for commission errors + 1 regressor for omission errors]
2) 10 main condition regressors [9 target-present regressors (exemplar level) + 1 target-absent regressor] + 2 extraregressors [1 for commission errors + 1 for omission errors]
3) 4 main condition regressors [1 target-present regressor (collapsed across exemplars) + 1 target-absent regressor + 1 regressor for commission errors + 1 regressor for omission errors]
4) 2 main condition regressors [1 target-present regressor (collapsed across exemplars) + 1 target-absent regressor] + 2 extraregressors [1 for commission errors + 1 for omission errors]
When the extraregressors are included, the main conditions disregard the correctness of the responses.
When the extraregressors are not included, the main conditions comprise only correct responses.

I included GLM Variants 3 and 4 because they were used for the univariate analyses, in which we did not care about exemplar-level information. I hypothesized that Variants 3 and 4 would result in a higher cross-validated R^2 overall, and thus smaller noise pools. This indeed turned out to be the case, which is a good sanity check.

My first query is whether the noise pool and PC voxels that I have been getting look reasonable. Here are the mean volume, noise pool, and PC voxels from one of the subjects:

[Figures: MeanVolume.png, NoisePool.png, PCvoxels.png]


In the attached PowerPoint file, I provide a comparison of the noisepool.png, PCvoxels.png, PCscatterXX.png, and SNRcomparebeforeandafter.png for GLM Variants 1-4 above for 3 subjects (S1-S3). Each subject has a set of digit-as-target runs and a set of letter-as-target runs, and they were analyzed separately. If you are interested in looking at the full set of output figures for the 3 subjects, they can be accessed here: https://drive.google.com/drive/folders/1kq7KIPmGYpxkJ02wFnVa3KxWlhLNKV-g?usp=sharing

Having the extraregressors seems to result in substantial changes in the cross-validated R^2 and SNR for some sets of runs, but not for others. In most cases, the commission and omission error trials are few, so I suppose that is why those regressors do not have much of an impact regardless of whether they are treated as regressors of interest or extraregressors. Overall, there isn't an obvious advantage, but it doesn't seem to hurt either, so I suppose it depends on my intention (as you noted in your previous response).

My second query is whether I can use the noise regressors from GLM Variants 3 and 4 (i.e., collapsed across exemplars) for a final GLM that focuses on exemplar-level regressors (i.e., similar to Variants 1 and 2). This option came to mind because the noise regressors from Variants 3 and 4 may be somewhat more reliable, but I wonder whether there are any conceptual concerns. Intuitively, I think it makes sense that if my final GLM is Variant 2, I should use the noise regressors extracted from Variant 2, because the noise regressors were estimated based on the task regressors (so they are non-independent).

Thank you in advance!

Best,
Darren
Comparison_differentialGLMcoding.pptx

Kendrick Kay

Jun 1, 2020, 2:15:41 PM
to Darren Yeo, GLMdenoise
Hi Darren,

Sorry for the delay. Perhaps we should take this detailed discussion offline -- I'll email you separately.

Kendrick

