Edouard Machery --- Significance Testing in Neuroimagery


New Waves in Philosophy of Mind

Dec 3, 2012, 11:42:24 AM
Significance Testing in Neuroimagery

Edouard Machery


Contemporary philosophers of mind often appeal to findings obtained by brain imagery techniques, but they too rarely adopt an appropriate skeptical attitude toward the methods of cognitive neuroscience. In this chapter, I examine the most common way of testing a cognitive-neuroscientific hypothesis about the function of a brain area or network: derive a statistical hypothesis from it, and test it by means of null hypothesis significance testing. In particular, I will focus on Klein’s (2010) claim that, because of the reliance of neuroimagery on null hypothesis significance testing, fMRI data cannot provide evidence for or against functional hypotheses about brain areas and networks. I argue that Klein’s criticism fails because he misunderstands the way null hypothesis significance testing works in neuroimagery.

A PDF of the paper is ready to view and download in the attachment below. 
A direct link to the PDF: http://goo.gl/KjEj5
Edouard Machery.pdf

Liz Irvine

Dec 3, 2012, 11:12:02 AM
to new-waves-in-ph...@googlegroups.com
Hi Edouard (and all), 

First, just to say that I really enjoyed the paper. A question out of ignorance (and interest) - I get that null hypotheses are range hypotheses, but how do researchers establish what the range is? Clearly this depends on a bunch of factors (as you note in the paper), but are there any standard methods for doing it? If there are outstanding issues about how to establish appropriate range hypotheses, then a much watered down version of Klein's argument could still go through(?).

Liz

cvklein

Dec 4, 2012, 12:08:51 PM
to new-waves-in-ph...@googlegroups.com
Edouard, 

I was glad to finally read this! I've talked to you a bit about this already, but here are some initial thoughts. 

As I've mentioned, I think this is the best strategy for responding to my paper. To put your point another way, I think: it's not whether we can detect a signal under ideal cases that matters, but whether we can detect it using the particular, noisy setup that we have. And if we do, then we can be confident not just that there was a change, but that there was a comparatively large change. And it's the latter that counts for theory-building. 

If that's a fair setup: I think that's right as far as it goes. But I think a new problem appears. Or rather, a pair of related problems. On the one hand, we want some indication as to how large a change is theoretically significant: if effect size is what matters, it would be nice to have a principled story about what counts as a large enough effect size (rather than, say, defaulting to Cohen's table). On the other hand, we'd want some reassurance that large effects are the ones that really matter: that they are not just downstream effects of the real processing, that there are not functionally significant small effects that would change our minds about the interpretation of the large effects, and so on. (I was just reading today about single cells in the hippocampus which, by their inhibitory effects, can cause switches between very different patterns of firing. I don't see why such an organization shouldn't be repeated on larger spatial scales.)  

I think this is the deeper point in Meehl (obscured, to be fair, by Meehl's polemic): that sciences advance when they make quantitative predictions. Neuroimaging --- or any other part of psychology that deals with mostly qualitative predictions --- is then just working with an impoverished set of data. One solution (which Meehl mentions, and I dutifully mention somewhere) is at least looking for fit to a predicted functional form, rather than just for qualitative differences. That might be a good starting point, and you do see this in, say, neuroimaging data. I don't think that gets around all of my worries, but it is a start.
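To make the contrast concrete, here is a toy sketch (made-up data; assumes numpy and scipy) of fitting a predicted functional form rather than merely testing for a qualitative difference. The saturating form and all parameters are invented for illustration:

```python
# Toy sketch: assess fit to a predicted functional form (here a saturating
# response, y = a * (1 - exp(-b * x))) instead of a bare qualitative contrast.
# All data and parameters are made up for illustration.
import numpy as np
from scipy.optimize import curve_fit

def predicted_form(x, a, b):
    return a * (1.0 - np.exp(-b * x))

rng = np.random.default_rng(1)
stimulus = np.linspace(0.1, 5.0, 30)  # e.g. hypothetical stimulus intensities
response = predicted_form(stimulus, 2.0, 0.8) + rng.normal(0, 0.15, 30)

params, _ = curve_fit(predicted_form, stimulus, response, p0=[1.0, 1.0])
residuals = response - predicted_form(stimulus, *params)
r_squared = 1 - np.sum(residuals**2) / np.sum((response - response.mean())**2)
print(f"fitted a = {params[0]:.2f}, b = {params[1]:.2f}, R^2 = {r_squared:.3f}")
```

The fitted parameters and the goodness of fit carry quantitative information that a bare significant/non-significant verdict does not.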

cvklein

Dec 4, 2012, 12:39:28 PM
to new-waves-in-ph...@googlegroups.com
Oops--forgot to sign the preceding. I'm sure it's obvious to Edouard, but maybe not to anyone else! 
Colin Klein 

Bence

Dec 7, 2012, 2:51:29 PM
to new-waves-in-ph...@googlegroups.com

Hi Edouard, thanks – you convinced me entirely (I was worried when I first looked at the Klein paper). I wonder whether one could use a simpler argument – something like this:

Even if the brain is indeed a causally dense system, this shouldn't matter as long as we measure the statistical significance of the difference between differences: Take brain regions B1 and B2, where B1 is the one we suspect of activation. Now take the difference between the average BOLD signal during task and during control in B1. This is D1. And take the difference between the average BOLD signal during task and during control in B2. This is D2. If the difference between D1 and D2 is statistically significant, then we can conclude that B1 is activated and B2 is not – no matter how causally dense a system the brain may be.
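A toy sketch of the test I have in mind (all numbers are made up; I use Welch's t-test so that the two regions' variances need not be equal):

```python
# Sketch of the difference-of-differences idea with made-up numbers.
# d1[i] = (task - control) BOLD difference for subject i in region B1;
# d2[i] = the same difference in region B2.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
d1 = rng.normal(0.8, 0.5, 15)  # hypothetical task-minus-control differences, B1
d2 = rng.normal(0.1, 0.5, 15)  # hypothetical task-minus-control differences, B2

# Welch's t-test: is the mean of D1 significantly different from the mean of D2?
# (equal_var=False because BOLD noise need not be uniform across regions.)
t, p = stats.ttest_ind(d1, d2, equal_var=False)
print(f"t = {t:.2f}, p = {p:.4f}")
```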

I was also wondering why you were so quick to suppose, for the sake of argument, that the brain is a causally dense system. In fact (and I guess this question is also for Colin), I have trouble coming up with a concept of 'cause' that would make the claim that the brain is a causally dense system plausible. If by 'direct cause' we mean deterministic cause (if A is activated, B will be activated), then the brain is blatantly not a causally dense system (no matter which level of description we choose). But if we understand 'direct cause' in a weaker way (statistical cause, maybe?), then causally dense systems will pop up everywhere. One could make the point that, on this sense of 'direct cause', the population of the Earth is also a causally dense system (viz. that notorious friend-of-a-friend-in-four-steps claim). But I doubt that this bothers social scientists too much…

cvklein

Dec 7, 2012, 3:36:04 PM
to new-waves-in-ph...@googlegroups.com
Bence: I'm having a bit of trouble following your proposal. Why would we conclude that B2 is *not* activated, rather than simply being less activated than B1?  After all, if D2>0, that should be independent evidence that B2 is active, regardless of the comparison to B1.    You might want to take a look at Henson's "Forward inference using functional neuroimaging..." (TICS, 2006); I think he has a proposal similar to yours, and with a few more requirements to deal with the possibility of differing variance across regions. (In general, the noise in the BOLD signal is not uniform across regions, which makes cross-region comparison difficult). But I think Henson's proposal still suffers from the same problem---that is, it doesn't distinguish non-activation from less activation. 

The causal density argument has been offered in other realms, including economics, so I don't think it's odd to say that it ought to be a worry for social scientists. It seems to me obvious that the brain is a causally dense system, so perhaps we're talking past one another. Here's the basic claim: activity in one region of the brain will be affected by activity in any other region of the brain. Note that this doesn't say anything about how much it will be affected; I assume that in most cases, it will be relatively small. But to deny causal density is to claim that if you change activity in one region of the brain, there will be other regions of the brain that do not change their activity even one little bit. Given what we know about the brain, I think that's preposterous. 

Again, the most plausible thing to say is exactly what Edouard says in his paper: since these changes are mostly going to be small,  since small changes are mostly going to be insignificant, and since the statistical power of fMRI is relatively low to begin with, finding a change gives you some reason to believe that the change is big enough to be important and so the region in question is doing something important. Then I go back to saying what I said above! 

Bence

Dec 7, 2012, 4:42:44 PM
to new-waves-in-ph...@googlegroups.com
Colin, thanks - I was hoping you'd keep an eye on this thread.
What I had in mind was exactly a Henson 2006-like story. I agree that it does not show that there is NO activation in B2 - but that is (normally) not needed either, as long as we have all the comparative results (for all the relevant brain regions) showing that B1 differs from the control scenario way more than any other brain region does. There ARE fMRI experiments where the aim is to show that such and such a region is not active at all in such and such a task, but, frankly, one sees many more experiments where the aim is to show that such and such a region IS active (or that it is more active than some other region, or than it is in some other task). 

A bit more on causal density: I really have trouble keeping track of how it is defined. In the original paper, you say that causally dense systems are "systems in which there is a causal path between changes in any explanatory variable and most other variables". If this is the definition, then the brain is undoubtedly a causally dense system, but then again, if this is the definition then I'm not sure that there are causally sparse systems at all (that would still count as systems). Modular systems would, for example, also qualify. 
Now bringing in the 'direct or few steps of indirect causal links' talk does constrain the definition, but then I wonder whether it rules out the brain too...  

Edouard Machery

Dec 8, 2012, 8:52:19 AM
to new-waves-in-ph...@googlegroups.com
Liz,

Great question.

1. There is no formal procedure to determine what the range is, or, alternatively, what a trivial effect is. Too little is known about the BOLD signal and about the relation between processing and the BOLD signal for this.

2. And neuroimagists can do without: The only assumption they need is that functional changes in the BOLD signal are, or at least tend to be (see 3.), so much larger than non-functional changes (spreading activation) that the former, but not the latter, are detectable. This is, of course, an empirical assumption, but it strikes me as plausible. (The toy power calculation at the end of this message illustrates how sharply detectability depends on effect size.)

3. This empirical assumption is, however, bound to be false for some non-functional voxels - viz. those that are close to functional voxels - but that is not too much of a problem given the limited precision of our current localizations: we know that these localizations are fairly rough and approximate.
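The toy power calculation mentioned under 2. (illustrative numbers only; a two-sided one-sample t-test with n = 20 and alpha = .05): a sizable effect (d = 0.8) is detected most of the time, while a trivial effect (d = 0.05) is detected barely more often than the alpha level itself.

```python
# Toy power calculation: probability that a two-sided one-sample t-test
# (alpha = .05, n = 20) detects an effect of standardized size d.
# The sample size, alpha, and effect sizes are all illustrative.
import numpy as np
from scipy import stats

def power_one_sample_t(d, n, alpha=0.05):
    df = n - 1
    nc = d * np.sqrt(n)                      # noncentrality parameter
    tcrit = stats.t.ppf(1 - alpha / 2, df)   # two-sided critical value
    # Power = P(|T| > tcrit) when T follows a noncentral t distribution.
    return (1 - stats.nct.cdf(tcrit, df, nc)) + stats.nct.cdf(-tcrit, df, nc)

for d in [0.05, 0.2, 0.8]:                   # trivial, small, large effects
    print(f"d = {d:4.2f}: power = {power_one_sample_t(d, 20):.3f}")
```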

E

Edouard Machery

Dec 8, 2012, 9:18:47 AM
to new-waves-in-ph...@googlegroups.com
Colin

1. I do not think that the relevant distinction is between small and large effects (a la Cohen), that is, between effects that are and are not large enough to be likely to be detected given the typical power of our statistical tests. Rather, the distinction is between trivial effects - viz. effects that are so small that they are very unlikely to be detected (rather than "undetectable," as I put it in my response to Liz's question) with the power of our tests - and effects that are not trivial in that sense. The assumption is that, at least typically, if a change in the BOLD signal is non-trivial, then it reflects the involvement of the voxels in the task (for "typically," see 3. of my response to Liz).

2. Your second question seems to challenge, or at least to ask for more reason to accept, the empirical assumption. I think that this is a fair challenge. I try to give some plausibility considerations in the paper, but more should be said in its defense. As I see it, much depends on the correctness of this empirical assumption. Note, though, that you have not given any reason to doubt it.

3. It would indeed be nice to have some quantitative predictions about the changes in the BOLD signal due to cognitive processing, but we are very far from having such predictions, and it is precisely in this kind of epistemic condition that null hypothesis testing earns its keep. There is value in testing qualitative effects, and we should not be led to the view that the only choice is between testing quantitative predictions on the one hand and despair or pseudo-science on the other (see McCloskey's recent book for a caricatural defense of this alternative).

Edouard Machery

Dec 8, 2012, 9:33:57 AM
to new-waves-in-ph...@googlegroups.com
Bence,

1. About the first point: I think I discuss something along these lines somewhere in the paper. The method of subtraction is not going to undermine the logical point made by Colin since, first, it is very unlikely that, for a given voxel V, the spreading activation due to the task and the spreading activation due to the control will be equal, and since, second, if the null hypothesis is always false, the significance level should be set at 0.

That said, you are right that the reliance on the method of subtraction matters once one realizes that range nulls, not point nulls, are being tested. Because we are measuring the difference between the BOLD signal in two tasks, non-functional changes in the BOLD signal are likely to be small.
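For illustration, here is a sketch of what testing a range null might look like: a one-sided "minimum-effect" test of H0: mu <= delta against H1: mu > delta, where delta is a threshold below which a change counts as trivial. Testing against the least favorable point mu = delta is conservative for the whole range. The data and the threshold are hypothetical; choosing delta in practice is exactly the open issue Liz raised.

```python
# Sketch of a range-null ("minimum-effect") test: H0: mu <= delta vs H1: mu > delta,
# where delta marks the largest change we would count as trivial. Rejecting the
# point null mu = delta (the least favorable case) rejects the whole range.
# The data and delta are made up for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
diffs = rng.normal(0.6, 0.5, 25)   # hypothetical task-minus-control BOLD differences
delta = 0.2                        # hypothetical triviality threshold

n = len(diffs)
t = (diffs.mean() - delta) / (diffs.std(ddof=1) / np.sqrt(n))
p = stats.t.sf(t, df=n - 1)        # one-sided p-value against mu = delta
print(f"t = {t:.2f}, p = {p:.4f}")
```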

I'll turn to your second point next.

edouard

Edouard Machery

Dec 8, 2012, 9:36:48 AM
to new-waves-in-ph...@googlegroups.com
Bence

The second reading of 'cause' is the right one, and causally dense systems are everywhere. This should concern, and in fact has concerned, sociologists for a while. Meehl's paper highlighted the issue for non-experimental (observational) studies in psychology and sociology.

Sociologists are not too concerned, either because they simply ignore the issue or because they understand that they are testing range hypotheses to the effect that the value of the population parameter is zero or nearly so. In effect, they are embracing the response I give to Colin.

edouard

Edouard Machery

Dec 8, 2012, 9:41:52 AM
to new-waves-in-ph...@googlegroups.com
Bence

Causal density is of course graded. So systems can be more or less sparse or dense depending on the average path length between any two nodes in the system. Modularity decreases the density of the system by increasing this average path length.
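A toy illustration of that last point (this assumes the networkx library): two complete 10-node modules joined by a single bridge edge have a much longer average path length than a fully connected 20-node graph.

```python
# Toy illustration: modularity increases the average path length between nodes.
# Compare a fully connected 20-node graph with two complete 10-node modules
# joined by a single bridge edge (networkx's "barbell" graph).
import networkx as nx

dense = nx.complete_graph(20)      # every node directly wired to every other
modular = nx.barbell_graph(10, 0)  # two complete 10-node modules, one bridge edge

print("dense:  ", nx.average_shortest_path_length(dense))    # 1.0
print("modular:", round(nx.average_shortest_path_length(modular), 2))
```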

In any case, that part of the paper needs to be slightly revised as a result of a useful comment by Richard Scheines at the PSA. Richard made the correct point (which I was somewhat aware of) that causal density by itself does not entail that changes in a variable will result in changes in the other variables; it also depends on the parametrization of the network. I don't think that this point changes much in the dialectic, but it should be clarified.

Edouard

Robert O'Shaughnessy

Dec 11, 2012, 12:08:51 PM
to new-waves-in-ph...@googlegroups.com

Hi Edouard

I really like Colin's argument, but like Bence I am very relieved that you show it can be responded to! I was wondering whether what is going on is a kind of meta-question about what counts as good science, which might be worth briefly exploring (i.e. your reply is not just relevant to causally dense systems). I have the very vague idea that a similar issue arose in Shallice's 1988 book on neuropsychology in connection with double dissociations, which I know you also write about.

What I have in mind is that it is easy to imagine cases where voxels that are causally contributing do not show up as highly activated. As Colin says in his comment above, a single neuron firing in, say, voxel B might cause a switch in the task which is hugely causally significant, but because that voxel contributes nothing else it won't 'light up' as statistically significant. The same goes for cases where, say, voxel C activates just as much in the control task as it does in the test task.

We can also easily imagine the reverse: cases where a voxel always strongly activates in the test task despite not causally contributing. The most obvious would be where, say, voxel X is informed of the outcome of the task and so activates. The assumption will be that it causally contributes to the task when it does not.

On this basis we are in the epistemic position that, for any given voxel, we don't know just from NHT whether it does or does not causally contribute to the task. But once a warning to this effect has been issued, the power of NHT shows up. What your argument shows is that if a voxel 'lights up' it is (very?) *likely* that it causally contributes, and if it does not 'light up' it is (very) likely that it does not causally contribute.
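One way to put numbers on that 'likely' is a simple base-rate calculation: given the test's per-voxel false-positive rate, its power, and the prior proportion of voxels that truly contribute, Bayes' theorem gives the probability that a voxel that lights up really contributes (and that a quiet one really doesn't). All three numbers below are made up purely for illustration.

```python
# Hypothetical base-rate calculation: how likely is it that a voxel that
# "lights up" really contributes to the task? All inputs are made up.
alpha = 0.001  # per-voxel false-positive rate (after multiple-comparison correction)
power = 0.6    # probability that a truly contributing voxel lights up
prior = 0.02   # prior proportion of voxels that truly contribute

p_light = power * prior + alpha * (1 - prior)
ppv = power * prior / p_light                       # P(contributes | lights up)
p_quiet = (1 - power) * prior + (1 - alpha) * (1 - prior)
npv = (1 - alpha) * (1 - prior) / p_quiet           # P(no contribution | quiet)

print(f"P(contributes | lights up) = {ppv:.3f}")    # ~0.92 with these inputs
print(f"P(no contribution | quiet) = {npv:.3f}")    # ~0.99 with these inputs
```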

The meta-principle, then, is that it is well worth proceeding in science knowing that some of your claims will be false, so long as you know that many more of your claims will be true (even though you don't know which are the true ones and which are the false ones). In other words, we cannot be sure we are right about any single claim, but we can be sure that many more of our claims are right than wrong. As I say, I think the same point arises for double dissociations.

Maybe theorists like Meehl, and perhaps even Colin (though I suspect not), are questioning whether scientists ought to proceed on this basis. However, once statistical analysis is involved, it would appear inevitable that one doesn't often get to make entailment claims; yet scientists can surely make far better progress by proceeding on the basis of likely true claims rather than only on the basis of certainly true claims.

I’m sure this issue is obvious to philosophers of science but maybe it could be briefly brought up for those of us not so well versed on the topic?

Thanks for a great paper!

-Robert

Bence

Dec 12, 2012, 5:57:52 AM
to new-waves-in-ph...@googlegroups.com
Edouard, 

Now I'm even more bitter about the clash between your session and mine at the PSA. I'm just worried that, while there's something intuitively appealing about the concept of a causally dense system, it may be quite difficult to make it precise in a way that applies to the brain in a straightforward manner.

Edouard Machery

Dec 13, 2012, 6:47:41 AM
to new-waves-in-ph...@googlegroups.com
Robert

Thanks for these comments.

Yes, you are right that the two situations you describe are possible. The second one plays an important role in Colin's argument.

The meta-principle you describe is also very close to my views about the nature of statistical inference in a classical statistics framework. I defend it in the book I am writing, currently called Science Without Evidence. Stay tuned!

e
