Classification significance: Permutation test vs Wavestrapping


MS Al-Rawi

Feb 19, 2010, 10:18:08 AM
to MVPA
Hi

I was comparing wavestrapper_results.m (from Polyn et al. (2005), Science, which is based on Bullmore et al. (2004)) to permute_test.m (a function that implements the work of Golland and Fischl (2003), LNCS 2732, 330-341). After importing the subject, doing feature extraction, and for some classifier that gives results slightly above chance, the first function, wavestrapper_results.m, always gives p = 0 even if nshuffels is 1000, while for the same dataset and classifier, permute_test (number of permutations is 1000) gives reasonable p-values (e.g., p = 0.009). I am surprised by the wavestrapper_results.m output and would very much like to hear your comments on this issue.

Regards......Rawi

Francisco Pereira

Feb 19, 2010, 11:52:00 AM
to mvpa-t...@googlegroups.com
Without knowing the code, I suspect what you are seeing is a p-value
where the computation is

  # permutations where statistic >= observed
  --------------------------------------------
               # permutations

rather than

  1 + # permutations where statistic >= observed
  ------------------------------------------------
              1 + # permutations

the latter is the correct thing, in that the true labelling is one of
the possible permutations. For more details, see

"Nonparametric Permutation Tests for Functional Neuroimaging: A Primer
with Examples."
http://www.fil.ion.ucl.ac.uk/spm/doc/papers/NicholsHolmes.pdf
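
To make the difference concrete, here is a minimal MATLAB sketch of the
two computations (the variable names are made up for illustration; this
is not the code from wavestrapper_results.m or permute_test.m):

% hypothetical inputs
observed_acc = 0.62;                       % accuracy with the true labelling
null_accs    = rand(1000, 1) * 0.2 + 0.4;  % stand-in for 1000 permutation accuracies

% biased version: can return exactly 0
p_biased = sum(null_accs >= observed_acc) / numel(null_accs);

% corrected version: the true labelling counts as one of the permutations,
% so the smallest possible value is 1/(nperms + 1), never 0
p_correct = (1 + sum(null_accs >= observed_acc)) / (1 + numel(null_accs));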

f


Greg Detre

Feb 19, 2010, 12:00:30 PM
to mvpa-t...@googlegroups.com
it's hard to know from your description whether that's a bug.

a) do permute_test and wavestrapper_results perform the exact same
analysis? if not, then maybe it's not a surprise/problem that they differ.

b) 1000 iterations isn't that many. perhaps the results would stabilize
after more iterations?

g



--
Greg Detre
cell: 617 642 3902
email: gr...@gregdetre.co.uk
web: http://www.gregdetre.co.uk


MS Al-Rawi

Feb 19, 2010, 12:53:12 PM
to mvpa-t...@googlegroups.com
Well, I am not sure whether this is a bug or not. In fact, if we use log_reg, p is 0; but wavestrapper_results applied to the results of the same experimental settings using GNB gives p = 0.2, and permute_test gives a p-value close to 0.2 too. Still, with performance only slightly above chance, wavestrapper_results should not give p = 0 (or so I believe).

> a) do permute_test and wavestrapper_results perform the exact same
> analysis? if not, then maybe it's not a surprise/problem that they differ.

Of course they do not perform the same analysis, so p should differ too. In fact, due to the stochastic permutations, p might differ even across runs of the same function (e.g., running permute_test may give p = 0.009, then running it again may give p = 0.007).

> b) 1000 iterations isn't that many

Though this would take lots of processing time, I will try 10,000... we'll have to wait for the machine's answer.

Regards....Rawi


Francisco Pereira

Feb 19, 2010, 12:57:19 PM
to mvpa-t...@googlegroups.com
If you are getting p-values of 0 and ~0.001, consider that

  0
 ---- = 0
 1000

  1
 ---- ~ 0.001
 1001


f

MS Al-Rawi

Feb 19, 2010, 1:06:39 PM
to mvpa-t...@googlegroups.com
Hi Francisco

I need to have a look at the paper you suggested. 

> If you are getting p-values of 0 and ~0.001, consider that 0/1000 = 0 ... 1/1001 ~ 0.001

so, wavestrapper_results.m may contain the bug you just mentioned?

Rawi
 

Francisco Pereira

Feb 19, 2010, 1:32:45 PM
to mvpa-t...@googlegroups.com
On Fri, Feb 19, 2010 at 1:06 PM, MS Al-Rawi <raw...@yahoo.com> wrote:
>
> so, wavestrapper_results.m may contain the bug you just mentioned?

That was my thought. I had wondered about this till I read the paper
and asked a couple of other statisticians. But I'd be interested in
any sources that indicate otherwise.

f

Greg Detre

Feb 19, 2010, 2:54:32 PM
to mvpa-t...@googlegroups.com

if your real result scores better than all 1000 scrambled null results,
then p = 0 is the right answer for it to give, isn't it? can you explain
why that would be indicative of a bug?

g

Francisco Pereira

Feb 19, 2010, 3:00:13 PM
to mvpa-t...@googlegroups.com
Not a bug, just the incorrect permutation test procedure. I'm going to
cite the Nichols & Holmes paper to spare everyone having to look for the
passage :) (page 4)

"Given exchangeability under the null hypothesis, the observed data is
equally likely to have arisen from any possible labeling. Hence, the
statistics associated with each of the possible labeling are also
equally likely. Thus, we have the permutation (or randomization)
distribution of our statistic: the permutation distribution is the
sampling distribution of the statistic under the null hypothesis,
given the data observed. Under the null hypothesis, the observed
statistic is randomly chosen from the set of statistics corresponding
to all possible relabelings. This gives us a way to formalize our
"surprise" at an outcome: the probability of an outcome as or more
extreme than the one observed, the P-value, is the proportion of
statistic values in the permutation distribution greater or equal to
that observed.
***
The actual labeling used in the experiment is one of the possible
labelings, so if the observed statistic is the largest of the
permutation distribution, the P-value is 1/N, where N is the number of
possible labelings of the initial randomization scheme.
***
"
f

Emre Demiralp

Feb 19, 2010, 3:01:59 PM
to mvpa-t...@googlegroups.com
I believe you need to report p < 1/1000, not p = 0.

Emre

Francisco Pereira

Feb 19, 2010, 3:05:34 PM
to mvpa-t...@googlegroups.com
On Fri, Feb 19, 2010 at 3:01 PM, Emre Demiralp <emre.d...@gmail.com> wrote:
> I believe you need to report p < 1/1000, not p = 0.

Correct, my (1000+1) was in case you already had the value for 1000
permutations.

f

Yaroslav Halchenko

Feb 19, 2010, 4:03:26 PM
to mvpa-t...@googlegroups.com

On Fri, 19 Feb 2010, Francisco Pereira wrote:

> Not a bug, just the incorrect permutation test procedure. I'm going to
> cite the Nichols&Holmes paper to spare everyone having to look for the
> passage :) (page 4)

Let me insert my .1 cents of non-scientific rambling.

I think that I am placing a different meaning on the word 'likely' in
the following:

> "Given exchangeability under the null hypothesis, the OBSERVED DATA is
> equally LIKELY to have arisen from any possible labeling. Hence, the
> STATISTICS associated with each of the possible labeling are also
> equally LIKELY.

although indeed, under H0, the observed data is as likely as any other,
its statistic is not (therefore we usually have some non-uniform
distribution of statistics). Hence incorporating it into the random
sample test (+1 in the numerator) and accounting for it in the 'sample
size' (+1 in the denominator) might introduce a tiny but unnecessary
conservative bias (just consider the case where you don't realize that
the total # of permutations is 20, as in the authors' paper: you assess
on those 20 permutations and get p = (1+1)/(20+1) = 0.095 instead of the
true 1/20 = 0.05).

So far I see the +1 just as a safety bias to avoid 0s in the
p-statistics, instead of a simple mathematician-unfriendly max:

p = max(p_estimate, MC simulation "resolution")
  = max(n_above / n_permutations, 1 / n_permutations)
  = max(n_above, 1) / n_permutations

(where n_above is the number of permutation statistics >= the observed
one), instead of the suggested conservative

p = (n_above + 1) / (n_permutations + 1)

or am I very wrong?
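
In MATLAB terms, the two guards differ like this (a sketch using the
20-permutation example above; all names are hypothetical):

nperms = 20;   % exhaustive: all possible relabelings, true one included
count  = 1;    % only the true labelling's statistic is >= the observed one

p_max  = max(count, 1) / nperms;      % = 1/20  = 0.05
p_plus = (count + 1) / (nperms + 1);  % = 2/21 ~= 0.095; the true labelling
                                      %   is counted twice, hence conservative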

--
Keep in touch                          (yoh@|www.)onerussian.com
Yaroslav Halchenko                     ICQ#: 60653192
Linux User [175555]


MS Al-Rawi

Feb 22, 2010, 5:14:48 AM
to mvpa-t...@googlegroups.com
> So far I see +1 just as a safety bias to avoid 0s in
> p-statistics

As well as penalizing the use of a low # of permutations. Also, p
approaches 0 only as the # of permutations approaches infinity.
Rawi

Francisco Pereira

Feb 22, 2010, 10:01:28 AM
to mvpa-t...@googlegroups.com
Yarik,

I should have been clearer: the only thing that matters is the 1 in
the numerator (which is what the paper passage alludes to). I add one
in the denominator because I don't want to have to remember whether I
did 9999 or 10000 permutations :)

f

Yaroslav Halchenko

Feb 22, 2010, 10:07:05 AM
to mvpa-t...@googlegroups.com

On Mon, 22 Feb 2010, Francisco Pereira wrote:
> I should have been clearer: the only thing that matters is the 1 in
> the numerator (which is what the paper passage alludes to).
although they talk about that only for the case of undersampling, i.e.,
not exploring all permutations, which makes it imho somewhat ad-hoc. But
I can be (and probably am) wrong.

> I add one
> in the denominator because I don't want to have to remember whether I
> did 9999 or 10000 permutations :)

;-)

in any case, indeed, the +1 seems to be just a guarding term which doesn't
matter much whenever a reasonable number of permutations is used.

Francisco Pereira

Feb 22, 2010, 10:11:41 AM
to mvpa-t...@googlegroups.com
On Mon, Feb 22, 2010 at 10:07 AM, Yaroslav Halchenko
<yarik...@gmail.com> wrote:
>
> On Mon, 22 Feb 2010, Francisco Pereira wrote:
>
> although they talk about that only for the case of undersampling, i.e.,
> not exploring all permutations, which makes it imho somewhat ad-hoc. But
> I can be (and probably am) wrong.

Actually, that's a really good way of looking at it, and I think I now
understand what you were getting at. If you could try all the
permutations, the true order of examples would certainly be one of the
possibilities. I'm not sure always adding it in is necessarily ad-hoc,
but it could mean the results are a little bit conservative. I have
something under review comparing p-values obtained with permutation
and analytical tests as the number of permutations increases, and the
results seem to bear that out.
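
As a quick illustration (a sketch in MATLAB, not the analysis from that
paper): under a true null, the +1/+1 estimator gives
P(p <= alpha) <= alpha, with the gap shrinking as the number of
permutations grows.

nsim = 100000; nperms = 20; alpha = 0.05;
p = zeros(nsim, 1);
for i = 1:nsim
    obs      = randn;              % observed statistic under H0
    nullstat = randn(nperms, 1);   % permutation statistics under H0
    p(i) = (1 + sum(nullstat >= obs)) / (1 + nperms);
end
% prints ~0.048 (i.e., 1/21) rather than 0.05: slightly conservative
fprintf('P(p <= %.2f) = %.3f\n', alpha, mean(p <= alpha));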

f

Yaroslav Halchenko

Feb 22, 2010, 10:31:36 AM
to mvpa-t...@googlegroups.com

On Mon, 22 Feb 2010, Francisco Pereira wrote:
> permutations, the true order of examples would certainly be one of the
> possibilities. I'm not sure always adding it in is necessarily ad-hoc,
> but it could mean the results are a little bit conservative.
exactly -- that was what I was trying to point out!

> something under review comparing p-values obtained with permutation
> and analytical tests, as the number of permutations increases, and the
> results seem to bear that.

cool -- is a preprint available for the general (or not so general) public? ;)
