Questions on Mediation Analysis (Second level Bootstrapping / Multiple comparison problem)


tieman...@gmail.com

unread,
Feb 22, 2017, 3:02:24 AM
to WagerlabTools
Dear Tor, dear Wani, dear Mediation toolbox-experts,

this is Laura writing - I am a postdoc in Markus Ploner's lab in Munich, and Tor and I have met before (e.g. at the CiNet Conference in Osaka, 2015). I am contacting you today with a question about a current project in which we make use of your Mediation Toolbox - which works impressively well. So, first of all, thank you very, very much for providing the toolbox, without which our project would not have happened at all!

In one part of the project, we aim to run a whole-brain multilevel mediation analysis (n = 51, ~60 trials per subject), investigating the role of EEG activity (100 frequencies x ~200 time samples x 64 channels) as a mediator of the relationship between stimulus intensity (low/medium/high) and verbal pain rating (0-100). We already obtain promising-looking results at the second level, but we obviously face a massive multiple-comparison problem and have not yet found a way to solve it. Thus, I would very much appreciate your thoughts on the following questions:

1. Bootstrapping on second level - do I understand it correctly?
In our analysis we run a mediation analysis with these settings:
[paths, stats] = mediation(X, Y, M, 'verbose', 'boot', 'bootsamples', 1000);
We looked into your code and tried to understand exactly when and how the bootstrapping is done. Is the following correct?:
As a result of the first-level mediation analyses (where no bootstrapping is applied yet), we get 51 coefficients (for every frequency/time/channel triplet). In preparation for the second-level mediation analysis, these are weighted. In the second-level bootstrapping, we randomly draw 51 coefficients plus their corresponding weights from our pool and calculate the group-level coefficients (or "mean" in the stats). We repeat this procedure a thousand times, resulting in a distribution of group-level coefficients, based on which we can determine the significance of our original group-level coefficient.
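If it helps to make my reading concrete, here is how I picture that resampling scheme, as a toy Python sketch with made-up numbers (not the toolbox's actual MATLAB code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical first-level coefficients (e.g. a*b) and precision weights, 51 subjects
n_subj = 51
coefs = rng.normal(0.3, 1.0, n_subj)
weights = rng.uniform(0.5, 1.5, n_subj)
weights /= weights.sum()                 # weights sum to 1 in the observed sample

observed = np.sum(weights * coefs)       # observed group-level coefficient

# Second level: resample subjects (coefficient + matching weight) with replacement
n_boot = 1000
boot = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, n_subj, n_subj)
    boot[b] = np.sum(weights[idx] * coefs[idx])   # weights are NOT renormalized

# Two-tailed bootstrap p-value: how often the bootstrap distribution crosses zero
p = 2 * min((boot <= 0).mean(), (boot >= 0).mean())
```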

2. Possible approach to solve multiple comparison problem?
To solve our massive multiple comparison problem, we thought of the following approach:
Instead of applying bootstrapping at the second level, we could permute the weighted single-subject coefficients in our baseline period and the weighted single-subject coefficients in our poststimulus period, and calculate a weighted mean coefficient after every permutation. That would give us a permutation distribution per frequency/time/channel triplet, against which we could test the significance of our original weighted mean coefficient at every triplet by means of cluster-based permutation statistics. So far so good...
However, we assumed it would be problematic to permute between the baseline and poststimulus periods without adjusting the weights for the "new" samples (in other words, the weights would no longer sum to 1 across subjects). Still, it appears that you also draw 51 random subjects from the pool without adjusting the weights for each new sample in your bootstrapping approach, suggesting that it might not be problematic after all - would you be so kind as to explain why you think this approach does not pose a problem?

Thank you already for your time and for any information on the above issues - I am very curious to hear your answers!
Kind regards from Munich,
Laura

Tor Wager

unread,
Feb 22, 2017, 10:18:04 AM
to tieman...@gmail.com, WagerlabTools
Hi Laura,

Glad you are finding the toolbox useful!  It sounds like permuting the baseline values relative to the post-stimulus values might violate exchangeability by not preserving the natural correlations in your dataset.  But I may be misunderstanding what you’re doing.  

A simple way to correct for multiple comparisons is to use the bootstrapped p-values to calculate an FDR threshold.  
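For example, a Benjamini-Hochberg threshold over a vector of bootstrapped p-values takes only a few lines; this is a generic sketch, not toolbox code:

```python
import numpy as np

def fdr_threshold(pvals, q=0.05):
    """Benjamini-Hochberg: largest p-value threshold at FDR level q (0 if none pass)."""
    p = np.sort(np.asarray(pvals).ravel())
    m = p.size
    crit = q * np.arange(1, m + 1) / m      # step-up critical values
    passing = p <= crit
    return p[passing].max() if passing.any() else 0.0

# e.g. one bootstrapped p-value per frequency/time/channel triplet
pvals = np.array([0.001, 0.02, 0.8, 0.04, 0.5])
thr = fdr_threshold(pvals, q=0.05)          # thr == 0.02 for this toy input
significant = pvals <= thr
```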

If you want FWER but don’t want to use Bonferroni, you might want to permute the labels on X, M, and Y, and re-calculate the max mediation coefficient across your set.  My understanding from recent conversations with Stephan Geuter and Martin Lindquist is that this is not quite correct, however, and it’s quite tricky to do a gold-standard permutation test for mediation.   Winkler and Nichols have written about this.  But if you permute M, it could give you a reasonable approximation of the maximum null-hypothesis a*b coefficient.  Bonferroni also seems safe.
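A max-statistic version of that idea might look like the following sketch. It is deliberately simplified (synthetic data; path b here ignores X as a covariate) and, per the caveats above, only an approximation of a proper mediation permutation test:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 51, 200                            # subjects x tests (e.g. triplets)
X = rng.normal(size=n)
M = 0.5 * X[:, None] + rng.normal(size=(n, k))
Y = 0.5 * M[:, 0] + rng.normal(size=n)

def ab(X, M, Y):
    """Crude a*b per test: a = slope of X->M, b = slope of M->Y (no covariates)."""
    Xc, Yc = X - X.mean(), Y - Y.mean()
    Mc = M - M.mean(axis=0)
    a = Xc @ Mc / (Xc @ Xc)
    b = Yc @ Mc / (Mc * Mc).sum(axis=0)
    return a * b

observed = np.abs(ab(X, M, Y))

# Null distribution of the maximum |a*b| across tests, permuting subject rows of M
n_perm = 500
max_null = np.array([np.abs(ab(X, M[rng.permutation(n)], Y)).max()
                     for _ in range(n_perm)])

fwer_thr = np.quantile(max_null, 0.95)    # ~5% FWER threshold across all k tests
```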

Best,
Tor

--
You received this message because you are subscribed to the Google Groups "WagerlabTools" group.
To unsubscribe from this group and stop receiving emails from it, send an email to wagerlabtool...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Stephan Geuter

unread,
Feb 22, 2017, 11:37:37 AM
to Laura Bok, WagerlabTools, Tor Wager
Hi Laura,

As Tor said, permutation testing in mediation is challenging because you need to ensure exchangeability in two inter-related regression models. When you permute M, you change the predictor set and covariance structure of the other regression (the one estimating path b from M to Y). Depending on the magnitude of the coefficients for paths a and b, the direct test for path ab might fail to control Type-I errors (see the attached paper by Taylor et al.). That paper also describes permutation tests based on confidence intervals, which are better at controlling Type-I errors.

FDR or Bonferroni-Holm would probably be the easiest and most straightforward way to correct for multiple comparisons.

I’m also attaching the papers by Winkler & Nichols describing permutation tests for (multi-level) regression models.

Cheers,
Stephan


Taylor_BehavRes_12_mediation_perm.pdf
Winkler_NI_14.pdf
Winkler_NI_15_MultilevelBlockPermutation.pdf

Tor Wager

unread,
Feb 23, 2017, 1:16:02 PM
to Stephan Geuter, Martin Lindquist, Laura Bok, WagerlabTools
Thanks, Stephan.  The Taylor and MacKinnon approach is straightforward, and they say it’s valid and performs well.  Do you think it’s right?  If so, that would be very useful.

Tor

Laura Bok

unread,
Feb 24, 2017, 5:20:19 AM
to Tor Wager, Stephan Geuter, Martin Lindquist, WagerlabTools
Dear Tor, dear Stephan,

thank you for your swift answers so far!

> It sounds like permuting the baseline values relative to the post-stimulus values might violate exchangeability by not preserving the natural correlations in your dataset.  But I may be misunderstanding what you’re doing.
The raw values in our dataset are the results of a moving-window FFT. Thus, it is indeed the case that successive datapoints are not independent of each other (and neither are the resulting weighted coefficients for every datapoint, which we would permute). Is this dependence of successive datapoints due to the moving-window FFT what you mean by "natural correlations" in the dataset? Or am I getting you wrong? I agree that we might have a problem here, as permutation between the baseline and poststimulus periods would destroy this inherent dependency of successive datapoints.

I would also like to come back once more to my earlier question about the second-level bootstrapping. I understand that, in the toolbox, you draw (e.g. 1000) random samples, each consisting of 51 weighted single-subject coefficients, calculate the respective group-level coefficient without adjusting the single-subject weights in each of those random samples, build a distribution of the resulting group-level coefficients, and determine the significance of our original group-level coefficient by comparing it against this distribution. I do not quite understand, however, why it isn't problematic not to adjust the weights of the single-subject coefficients in every new random sample? Maybe I am a bit lost here, but I would really appreciate it if you could help me out on this one! Thank you already for your help.
 
> A simple way to correct for multiple comparisons is to use the bootstrapped p-values to calculate an FDR threshold.
I agree with you both that FDR might be the most straightforward method, and I will try that as a next step - however, with far more than one million analyses (now, that's a multiple comparison problem!), I am afraid that nothing will survive the rather conservative correction. Of course, the same applies (even more so) to Bonferroni-Holm correction.

Thank you very much for thinking along once more,
all the best,
Laura






--
Dr. Laura Bok, Dipl.-Psych.
Klinikum rechts der Isar
Technische Universität München
Neurologie
Ismaninger Str. 22
81675 München

Tor Wager

unread,
Feb 24, 2017, 12:27:35 PM
to Laura Bok, Stephan Geuter, Martin Lindquist, WagerlabTools
Hi Laura,

The permutation method permutes 1st-level relationships in multilevel mediation, which might be an issue with dependent data, yes — Martin, what do you think?

The 2nd-level bootstrap actually selects subjects with replacement, not time points, in case that wasn’t clear.  The weights are based on within-person precision, which does not change, so they are not re-calculated.  We do not change the relative weights so that they sum to 1. This practice might increase the variance in the weighted sum  (group-level coefficients), reducing power.  It’s possible we could do it the other way, but we’d have to evaluate it carefully with true/false positive simulations, etc. The variation in the weighted sum is what is used to construct the bootstrap distribution, so the more variation, the less power.
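For what it's worth, the two weighting choices can be compared side by side in a toy simulation (hypothetical numbers, not the toolbox code); with a nonzero mean effect, keeping the raw weights tends to widen the bootstrap distribution slightly, which is the power cost described above:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_boot = 51, 5000
coefs = rng.normal(0.3, 1.0, n)          # hypothetical first-level coefficients
w = rng.uniform(0.5, 1.5, n)
w /= w.sum()                              # precision weights, summing to 1

fixed = np.empty(n_boot)                  # weights kept as-is
renorm = np.empty(n_boot)                 # weights re-scaled to sum to 1 per sample
for b in range(n_boot):
    idx = rng.integers(0, n, n)
    s = np.sum(w[idx] * coefs[idx])
    fixed[b] = s
    renorm[b] = s / w[idx].sum()

# Compare the spread of the two bootstrap distributions
print(fixed.std(), renorm.std())
```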

Cheers,
Tor

Laura Bok

unread,
Mar 2, 2017, 12:10:41 PM
to Tor Wager, Stephan Geuter, Martin Lindquist, WagerlabTools
Dear all,

> The 2nd-level bootstrap actually selects subjects with replacement, not time points, in case that wasn’t clear.  The weights are based on within-person precision, which does not change, so they are not re-calculated.  We do not change the relative weights so that they sum to 1. This practice might increase the variance in the weighted sum (group-level coefficients), reducing power.  It’s possible we could do it the other way, but we’d have to evaluate it carefully with true/false positive simulations, etc. The variation in the weighted sum is what is used to construct the bootstrap distribution, so the more variation, the less power.

Thanks for the in-depth explanation Tor, that made it a lot clearer to me! Just to be sure, am I correct in assuming the following: in your 2nd-level bootstrap approach you could choose either to adjust the single-subject weights or not to adjust them - both approaches have their pros and cons - but by choosing not to adjust them, the main disadvantage to expect is decreased power (which is not nice, but still better than an inflated Type-I error rate)?

Thank you again, Stephan, for the paper by Taylor & MacKinnon, which I have now read in detail - very interesting. We have already thought of the following approach for our data:
1. permute the labels of X (or Y)
2. run 1000 mediation analyses (for every time-frequency-channel-triplet, respectively) and get a permutation distribution of weighted multi-level coefficients at every time-frequency-channel-triplet
3. apply cluster-based permutation statistics to correct for multiple comparisons and determine the significance of the coefficients at every triplet.
However, I fear that this approach might suffer from the same problem as the "permutation test of ab" in Taylor & MacKinnon's paper? Besides the fact that it would be computationally very demanding...
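To make step 3 concrete, the cluster-forming and max-cluster machinery could be sketched as follows (1-D toy example; the random null maps here merely stand in for the permutation maps that step 2 would produce):

```python
import numpy as np

def cluster_sums(stat, thr):
    """Summed statistic for each contiguous supra-threshold cluster (1-D)."""
    sums, cur = [], 0.0
    for s in stat:
        if s > thr:
            cur += s
        elif cur:
            sums.append(cur)
            cur = 0.0
    if cur:
        sums.append(cur)
    return sums

rng = np.random.default_rng(0)
thr = 1.0

# Toy observed map (e.g. coefficients along time) with an effect in the middle
obs = rng.normal(size=300)
obs[100:140] += 1.5
obs_clusters = cluster_sums(obs, thr)

# Stand-ins for the permutation maps produced in step 2
null_maps = rng.normal(size=(500, 300))
max_null = np.array([max(cluster_sums(m, thr), default=0.0) for m in null_maps])

# Cluster-level p-value: fraction of permutations with a larger max cluster
p_cluster = [(max_null >= c).mean() for c in obs_clusters]
```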

Thank you, as always, a lot for thinking along!

All the best,
Laura