[SciPy-user] help with scipy.stats.mannwhitneyu

Wavy Davy

unread,

Feb 5, 2009, 6:37:47 AM2/5/09

to scipy...@scipy.org

Hi all

I am using the mannwhitneyu in the stats module, and I was looking the
code and I see this notice in the docstring.

"Use only when the n in each condition is < 20 and you have 2
independent samples of ranks. "

Am I reading it correctly that this test should only be used with
sample sizes less than 20?

I am not a statistican, more a python coder. I have been pointed and
this test as a more robust version of the t-test, so forgive my
ignorance.

Any help would be much appreciated.

--
Simon
_______________________________________________
SciPy-user mailing list
SciPy...@scipy.org
http://projects.scipy.org/mailman/listinfo/scipy-user

josef...@gmail.com

unread,

Feb 5, 2009, 9:33:05 AM2/5/09

to SciPy Users List

On Thu, Feb 5, 2009 at 6:37 AM, Wavy Davy <bloode...@gmail.com> wrote:
> Hi all
>
> I am using the mannwhitneyu in the stats module, and I was looking the
> code and I see this notice in the docstring.
>
> "Use only when the n in each condition is < 20 and you have 2
> independent samples of ranks. "
>
> Am I reading it correctly that this test should only be used with
> sample sizes less than 20?
>
> I am not a statistican, more a python coder. I have been pointed and
> this test as a more robust version of the t-test, so forgive my
> ignorance.
>
> Any help would be much appreciated.
>
> --
> Simon

I briefly looked at the test, the implementation of the test statistic
is mostly as described in
http://en.wikipedia.org/wiki/Mann-Whitney_U_test

It seems the test statistic is defined with the opposite sign from the
definition in wikipedia.

The doc string statement "Use only when the n in each condition is <
20", I think should be >20, since the pvalue is based on the
asymptotic distribution, which is only correct in larger samples.

I didn't see any unit tests for this test, but I will try to verify
the results later today.

wilcoxon is a similar test for paired instead of independent samples,
and there the recommendation in the docstring is for N>20.

Josef

Sturla Molden

unread,

Feb 5, 2009, 9:32:59 AM2/5/09

to SciPy Users List

On 2/5/2009 12:37 PM, Wavy Davy wrote:

> I am using the mannwhitneyu in the stats module, and I was looking the
> code and I see this notice in the docstring.
>
> "Use only when the n in each condition is < 20 and you have 2
> independent samples of ranks. "
>
> Am I reading it correctly that this test should only be used with
> sample sizes less than 20?

First of all, the Mann-Withney U-test should NEVER be used. It has
assumptions that are mathematically problematic, known as the
"Behrens-Fisher problem". What you probably want to use is the "Wilcoxon
rank-sum test". Despite common belief, Mann-Withney U and Wilcoxon
rank-sum are not the same test. The latter assumes equal variance, the
former do not. The Mann-Withney U has even been shown to fail when
distributions have unequal variance (Journal of Experimental Education,
Vol. 60, 1992), so its justification over the Wilcoxon rank-sum test is
questionable. Wikipedia says the Wilcoxon rank-sum test assumes equal
sample sizes; this is not correct.

I would vote for the immediate removal of Mann-Withney U-test from
SciPy. The only thing it should do is raise an exception and instruct
the user to apply a t-test or Wilcoxon rank-sum test instead.

As a side note, if you request a Mann-Withney test in MINITAB, you
actually get a Wilcoxon rank-sum test instead.

Then for your question:

If N > 20, you can just as well use a t-test. Its assumptions will be
asymptotically valid due to the central limit theorem, even though the
data are not normally distributed. If you are worried about outliers, as
opposed to systematic deviation from normality, use the Wilcoxon
rank-sum test instead: When the data is transformed to rank scale and
the two sample sizes are M and N respectively, the Mann-Withney
U-statistic has O(N*M) complexity whereas the Wilcoxon rank-sum
statistic only has O(N+M) complexity. O(N*M) behaviour makes the
Mann-Withney U-statistic intractable for large samples.

Sturla Molden

unread,

Feb 5, 2009, 9:38:31 AM2/5/09

to SciPy Users List

On 2/5/2009 3:33 PM, josef...@gmail.com wrote:

> wilcoxon is a similar test for paired instead of independent samples,
> and there the recommendation in the docstring is for N>20.

There are two Wilcoxon tests. The signed-rank test for paired samples
and the rank-sum test for independent samples.

S.M.

Bruce Southey

unread,

Feb 5, 2009, 9:56:08 AM2/5/09

to SciPy Users List

Wavy Davy wrote:
> Hi all
>
> I am using the mannwhitneyu in the stats module, and I was looking the
> code and I see this notice in the docstring.
>
> "Use only when the n in each condition is < 20 and you have 2
> independent samples of ranks. "
>
> Am I reading it correctly that this test should only be used with
> sample sizes less than 20?
>
> I am not a statistican, more a python coder. I have been pointed and
> this test as a more robust version of the t-test, so forgive my
> ignorance.
>
> Any help would be much appreciated.
>
>

I think the docstring is referring to the distribution of the actual
U-test. So for small samples typically the pvalue is directly computed
from the sampling distribution. However, Scipy is using the normal
approximation is which is not meant to be that great.

http://faculty.vassar.edu/lowry/utest.html
http://faculty.vassar.edu/lowry/ch11a.html
http://www.alglib.net/statistics/hypothesistesting/mannwhitneyu.php

Bruce

josef...@gmail.com

unread,

Feb 5, 2009, 10:36:08 AM2/5/09

to SciPy Users List

On Thu, Feb 5, 2009 at 9:38 AM, Sturla Molden <stu...@molden.no> wrote:
> On 2/5/2009 3:33 PM, josef...@gmail.com wrote:
>
>> wilcoxon is a similar test for paired instead of independent samples,
>> and there the recommendation in the docstring is for N>20.
>
> There are two Wilcoxon tests. The signed-rank test for paired samples
> and the rank-sum test for independent samples.
>

According to wikipedia, Mann-Whitney-U is the Wilcoxon rank-sum test
for independent samples, just a different name.

Josef

Sturla Molden

unread,

Feb 5, 2009, 10:59:04 AM2/5/09

to SciPy Users List

On 2/5/2009 4:36 PM, josef...@gmail.com wrote:

> According to wikipedia, Mann-Whitney-U is the Wilcoxon rank-sum test
> for independent samples, just a different name.

You should not trust Wikipedia.

Stéfan van der Walt

unread,

Feb 5, 2009, 11:09:43 AM2/5/09

to SciPy Users List

2009/2/5 Sturla Molden <stu...@molden.no>:

> On 2/5/2009 4:36 PM, josef...@gmail.com wrote:
>
>> According to wikipedia, Mann-Whitney-U is the Wilcoxon rank-sum test
>> for independent samples, just a different name.
>
> You should not trust Wikipedia.

Or, you can fix the entry on Wikipedia...

Stéfan

Wavy Davy

unread,

Feb 5, 2009, 11:24:45 AM2/5/09

to SciPy Users List

2009/2/5 Bruce Southey <bsou...@gmail.com>:

> I think the docstring is referring to the distribution of the actual
> U-test. So for small samples typically the pvalue is directly computed
> from the sampling distribution. However, Scipy is using the normal
> approximation is which is not meant to be that great.
>
> http://faculty.vassar.edu/lowry/utest.html
> http://faculty.vassar.edu/lowry/ch11a.html
> http://www.alglib.net/statistics/hypothesistesting/mannwhitneyu.php

OK - that makes more sense. Thanks.

I've ended up using the Kruskal-Wallis extension to Mann-Whitney
anyway, as I have multiple data samples. Which of course scipy
provides with the kurskal function. Confusing docstrings aside, its
been a pleasure to us :)

--
Simon

Bruce Southey

unread,

Feb 5, 2009, 11:32:51 AM2/5/09

to SciPy Users List

Or perhaps it is actually correct. My understanding (because I don't want to do it) is that these are equivalent and all major stats packages provide just one test. For example, Prof Brian Ripley's reply on the R list

https://stat.ethz.ch/pipermail/r-help/2005-May/071544.html

> I am hoping someone could shed some light into the Wilcoxon Rank Sum Test
> for me?  In looking through Stats references, the Mann-Whitney U-test and
> the Wilcoxon Rank Sum Test are statistically equivalent.

Yes, but not numerically: they differ by a constant (in the data, a 
function of the data size).

Bruce

josef...@gmail.com

unread,

Feb 5, 2009, 11:39:19 AM2/5/09

to SciPy Users List

On Thu, Feb 5, 2009 at 10:36 AM, <josef...@gmail.com> wrote:
> On Thu, Feb 5, 2009 at 9:38 AM, Sturla Molden <stu...@molden.no> wrote:
>> On 2/5/2009 3:33 PM, josef...@gmail.com wrote:
>>
>>> wilcoxon is a similar test for paired instead of independent samples,
>>> and there the recommendation in the docstring is for N>20.
>>
>> There are two Wilcoxon tests. The signed-rank test for paired samples
>> and the rank-sum test for independent samples.
>>
>
> According to wikipedia, Mann-Whitney-U is the Wilcoxon rank-sum test
> for independent samples, just a different name.
>
> Josef
>

So far:

According to R:
wilcox.test(x,y)
Performs one and two sample Wilcoxon tests on vectors of data; the
latter is also known as 'Mann-Whitney' test.

I tried a normal random variable example ( no ties): the test
statistic returned is exactly the same as the one returned by
stats.mannwhitneyu(x,y) however the p-values differ. the pvalue in
stats is half of the one in R (up to 1e-17) as stated in the
docstring: one-tailed p-value.

In R the test statistic is the same for the two sided and the one
sided tests, but the reported p-values differ.

I used sample size 100.

So there is an inconsistency in the reporting in stats.mannwhitneyu,
the test statistic is for the two-sided test, but the p-value is half
of the two sided p-value and should be multiplied by two.

I haven't checked the tie handling.

Sturla Molden

unread,

Feb 5, 2009, 11:56:27 AM2/5/09

to SciPy Users List

On 2/5/2009 5:39 PM, josef...@gmail.com wrote:

> According to R:
> wilcox.test(x,y)
> Performs one and two sample Wilcoxon tests on vectors of data; the
> latter is also known as 'Mann-Whitney' test.
>
> I tried a normal random variable example ( no ties): the test
> statistic returned is exactly the same as the one returned by
> stats.mannwhitneyu(x,y) however the p-values differ. the pvalue in
> stats is half of the one in R (up to 1e-17) as stated in the
> docstring: one-tailed p-value.

I believe there is a bug in SciPy:

def mannwhitneyu(x, y):
"""Calculates a Mann-Whitney U statistic on the provided scores and
returns the result. Use only when the n in each condition is < 20 and
you have 2 independent samples of ranks. REMEMBER: Mann-Whitney U is
significant if the u-obtained is LESS THAN or equal to the critical
value of U.

Returns: u-statistic, one-tailed p-value (i.e., p(z(U)))
"""
x = asarray(x)
y = asarray(y)
n1 = len(x)
n2 = len(y)
ranked = rankdata(np.concatenate((x,y)))
rankx = ranked[0:n1] # get the x-ranks
#ranky = ranked[n1:] # the rest are y-ranks
u1 = n1*n2 + (n1*(n1+1))/2.0 - np.sum(rankx,axis=0) # calc U for x
u2 = n1*n2 - u1 # remainder is U for y
bigu = max(u1,u2)
smallu = min(u1,u2)
T = np.sqrt(tiecorrect(ranked)) # correction factor for tied scores
if T == 0:
raise ValueError, 'All numbers are identical in amannwhitneyu'
sd = np.sqrt(T*n1*n2*(n1+n2+1)/12.0)
z = abs((bigu-n1*n2/2.0) / sd) # normal approximation for prob calc
return smallu, 1.0 - zprob(z)

Take a look at the last two lines? Do you see something peculiar?

Sturla Molden

josef...@gmail.com

unread,

Feb 5, 2009, 12:23:16 PM2/5/09

to SciPy Users List

you mean that it uses bigu for the p-value calculation but reports
smallu as the test-statistic?

I didn't try to figure out what the formula for the p-value actually
is, but I'm pretty happy that we get the same result as R, except for
the times 2.

I looked some more at the R implementation :

the main difference is that R uses by default a continuity correction
"correct a logical indicating whether to apply continuity correction
in the normal approximation for the p-value"

>>> rwilcox=rpy.r('wilcox.test')
>>> stats.mannwhitneyu(rvs1,rvs2)[1]*2 - rwilcox(rvs1,rvs2,correct = False)['p.value']
-1.5265566588595902e-016

The test statistic in R is not symmetric in its argument, although the
p-values are, stats.mannwhitneyu is symmetric in statistic and
p-value.

>>> rresult = rwilcox(rvs2,rvs1)
>>> rresult['statistic']
{'W': 5637.0}
>>> rresult['p.value']
0.11989439052971607

>>> rresult = rwilcox(rvs1,rvs2)
>>> rresult['statistic']
{'W': 4363.0}
>>> rresult['p.value']
0.11989439052971618

So overall stats.mannwhitney, I think, looks pretty good but it could
be expanded to include some of the options that R offers, and I also
think we should multiply the pvalue by 2, so that the reported p-value
actually corresponds to the test.

Josef

Sturla Molden

unread,

Feb 5, 2009, 12:29:43 PM2/5/09

to SciPy Users List

On 2/5/2009 6:23 PM, josef...@gmail.com wrote:

> you mean that it uses bigu for the p-value calculation but reports
> smallu as the test-statistic?

Yes

S.M.

Alan G Isaac

unread,

Feb 5, 2009, 12:35:46 PM2/5/09

to SciPy Users List

> On 2/5/2009 4:36 PM, josef...@gmail.com wrote:
>> According to wikipedia, Mann-Whitney-U is the Wilcoxon rank-sum test
>> for independent samples, just a different name.

On 2/5/2009 10:59 AM Sturla Molden apparently wrote:
> You should not trust Wikipedia.

Or any other encyclopedia.
But actually Wikipedia is usually pretty good
on technical matters, and can easily be fixed.

fwiw,
Alan Isaac

josef...@gmail.com

unread,

Feb 5, 2009, 12:46:48 PM2/5/09

to SciPy Users List

On Thu, Feb 5, 2009 at 12:29 PM, Sturla Molden <stu...@molden.no> wrote:
> On 2/5/2009 6:23 PM, josef...@gmail.com wrote:
>
>> you mean that it uses bigu for the p-value calculation but reports
>> smallu as the test-statistic?
>
> Yes
>

Given that it works, I didn't want to spend time on this, but wikipedia again:

"therefore, the absolute value of the z statistic calculated will
be same whichever value of U is used."

As I understand it, because the sum U1+U2 is fixed (given the sample
sizes), many properties are equivalent, i.e.
U1 - meanU = - (U2 - meanU)

so whether bigU or smallU is used in the calculation of z doesn't
matter, I have no idea why in this specific implementation both are
calculated if smallU would be enough.

Josef

Sturla Molden

unread,

Feb 5, 2009, 1:03:34 PM2/5/09

to SciPy Users List

On 2/5/2009 6:46 PM, josef...@gmail.com wrote:

> so whether bigU or smallU is used in the calculation of z doesn't
> matter, I have no idea why in this specific implementation both are
> calculated if smallU would be enough.

By the way, there is a fucntion scipy.stats.ranksums that does a
Wilcoxon rank-sum test. It seems to be using a large-sample
approximation, and has no correction for tied ranks.

S.M.

Pierre GM

unread,

Feb 5, 2009, 1:11:55 PM2/5/09

to SciPy Users List

On Feb 5, 2009, at 1:03 PM, Sturla Molden wrote:

> On 2/5/2009 6:46 PM, josef...@gmail.com wrote:
>
>> so whether bigU or smallU is used in the calculation of z doesn't
>> matter, I have no idea why in this specific implementation both are
>> calculated if smallU would be enough.
>
> By the way, there is a fucntion scipy.stats.ranksums that does a
> Wilcoxon rank-sum test. It seems to be using a large-sample
> approximation, and has no correction for tied ranks.

Please keep in mind that some of the tests have been reimplemented in
scipy.stats.mstats to support masked/missing values in scipy.mstats
and to take ties into accounts ...
I trust y'all to let me know of any inconsistencies between the masked/
unmasked versions, whether in terms of signatures or assumptions.
Thx a lot in advance...

josef...@gmail.com

unread,

Feb 5, 2009, 1:16:22 PM2/5/09

to SciPy Users List

On Thu, Feb 5, 2009 at 1:03 PM, Sturla Molden <stu...@molden.no> wrote:
> On 2/5/2009 6:46 PM, josef...@gmail.com wrote:
>
>> so whether bigU or smallU is used in the calculation of z doesn't
>> matter, I have no idea why in this specific implementation both are
>> calculated if smallU would be enough.
>
> By the way, there is a fucntion scipy.stats.ranksums that does a
> Wilcoxon rank-sum test. It seems to be using a large-sample
> approximation, and has no correction for tied ranks.
>
> S.M.
>

Also, in the explanation for kruskal it says it's an extension of
Mann-Whitney-U to more than 2 groups

for 2 groups (no ties):

>>> stats.kruskal(rvs1,rvs2)[1] - stats.mannwhitneyu(rvs1,rvs2)[1]*2
-4.8572257327350599e-016
>>> stats.kruskal(rvs1,rvs2)[1] - stats.ranksums(rvs1,rvs2)[1]
-4.8572257327350599e-016
>>> stats.ranksums(rvs1,rvs2)[1] - stats.mannwhitneyu(rvs1,rvs2)[1]*2
0.0

It looks like there are some redundancies or small variations in these
tests. A systematic list of all tests would be pretty useful.

Josef

josef...@gmail.com

unread,

Feb 5, 2009, 1:31:00 PM2/5/09

to SciPy Users List

a quick check looks pretty good (still example without ties)

>>> stats.mstats.kruskal(rvs1,rvs2)[1] - stats.ranksums(rvs1,rvs2)[1]
-4.8572257327350599e-016
>>> stats.mstats.kruskalwallis(rvs1,rvs2)[1] - stats.ranksums(rvs1,rvs2)[1]
-4.8572257327350599e-016

>>> stats.mstats.mannwhitneyu(rvs1,rvs2)[1] - stats.ranksums(rvs1,rvs2)[1]
0.00029058688269312238
>>> stats.mstats.mannwhitneyu(rvs1,rvs2)
(4363.0, 0.11989439052971618)
>>> stats.mstats.mannwhitneyu(rvs1,rvs2)[1] - rwilcox(rvs1,rvs2,correct = False)['p.value']
0.00029058688269296973
>>> stats.mstats.mannwhitneyu(rvs1,rvs2)[1] - rwilcox(rvs1,rvs2)['p.value']
0.0

stats.mstats.mannwhitneyu employs continuity correction by default as in R.

Just calling this, according to docstring, requires sequence, correct
usage is not clear:

>>> stats.mstats.compare_medians_ms(rvs1,rvs2)
Traceback (most recent call last):
File "<pyshell#127>", line 1, in <module>
stats.mstats.compare_medians_ms(rvs1,rvs2)
File "\Programs\Python25\Lib\site-packages\scipy\stats\mstats_extras.py",
line 332, in compare_medians_ms
(std_1, std_2) = (mstats.stde_median(group_1, axis=axis),
File "C:\Programs\Python25\lib\site-packages\scipy\stats\mstats_basic.py",
line 1511, in stde_median
return _stdemed_1D(data)
File "C:\Programs\Python25\lib\site-packages\scipy\stats\mstats_basic.py",
line 1504, in _stdemed_1D
n = len(sorted)
TypeError: object of type 'builtin_function_or_method' has no len()

Josef

Pierre GM

unread,

Feb 5, 2009, 1:35:35 PM2/5/09

to SciPy Users List

On Feb 5, 2009, at 1:31 PM, josef...@gmail.com wrote:

>
> Just calling this, according to docstring, requires sequence, correct
> usage is not clear:
>
>>>> stats.mstats.compare_medians_ms(rvs1,rvs2)

OK, can you send me a test sample (ie, the rvs1& rvs2 you used that
fail, and what we should have had)? I'll try to fix that this
afternoon...

josef...@gmail.com

unread,

Feb 5, 2009, 1:53:37 PM2/5/09

to SciPy Users List

On Thu, Feb 5, 2009 at 1:35 PM, Pierre GM <pgmde...@gmail.com> wrote:
>
> On Feb 5, 2009, at 1:31 PM, josef...@gmail.com wrote:
>
>>
>> Just calling this, according to docstring, requires sequence, correct
>> usage is not clear:
>>
>>>>> stats.mstats.compare_medians_ms(rvs1,rvs2)
>
> OK, can you send me a test sample (ie, the rvs1& rvs2 you used that
> fail, and what we should have had)? I'll try to fix that this
> afternoon...
>

I generated rvs1 and rvs2 (without fixed seed)

rvs1 = stats.norm.rvs(size = 100)
rvs2 = 0.25*stats.norm.rvs(size = 100)

I didn't look at stats.mstats.compare_medians_ms specifically, but it
sounded like it should do something similar to the other tests I was
trying out. So, I don't know what the expected answer should be, but I
would expect a p-values similar to the other non-parametric tests for
equality of location.

the problem is in stats.mstats_basic.stde_median.
Note: it is not exported (I'm not sure how or why the imports work)
and can be accessed only directly

>>> stats.mstats.stde_median(rvs1,axis=0)

Traceback (most recent call last):

File "<pyshell#143>", line 1, in <module>
stats.mstats.stde_median(rvs1,axis=0)
AttributeError: 'module' object has no attribute 'stde_median'

calling it directly returns this:

>>> stats.mstats_basic.stde_median(rvs1,axis=0)

Traceback (most recent call last):

File "<pyshell#142>", line 1, in <module>
stats.mstats_basic.stde_median(rvs1,axis=0)

File "C:\Programs\Python25\lib\site-packages\scipy\stats\mstats_basic.py",

line 1514, in stde_median
return ma.apply_along_axis(_stdemed_1D, axis, data)
File "C:\Programs\Python25\lib\site-packages\numpy\ma\extras.py",
line 185, in apply_along_axis
res = func1d(arr[tuple(i.tolist())], *args, **kwargs)

File "C:\Programs\Python25\lib\site-packages\scipy\stats\mstats_basic.py",
line 1504, in _stdemed_1D
n = len(sorted)
TypeError: object of type 'builtin_function_or_method' has no len()

Josef

josef...@gmail.com

unread,

Feb 5, 2009, 1:58:31 PM2/5/09

to SciPy Users List

doing : >>> np.source(stats.mstats_basic.stde_median)

shows it's a reference by wrong name: you assign the sorted data to
data, but then use "sorted" as a name

def _stdemed_1D(data):
data = np.sort(data.compressed())
n = len(sorted)

Sturla Molden

unread,

Feb 5, 2009, 2:51:02 PM2/5/09

to SciPy Users List

On 2/5/2009 7:03 PM, Sturla Molden wrote:

> By the way, there is a fucntion scipy.stats.ranksums that does a
> Wilcoxon rank-sum test. It seems to be using a large-sample
> approximation, and has no correction for tied ranks.

Here is a modification of SciPy's ranksums to allow small samples and
correct for tied ranks.

Sturla Molden

import numpy as np

import scipy
import scipy.special
zprob = scipy.special.ndtr

def ranksums(x, y):
"""
Wilcoxon rank sum test

Returns:
W-statistic
Z-statistic
one-tailed p-value, asymptotic approximation
one-tailed p-value, Monte Carlo approximation

Corrected for ties.
"""

x,y = map(np.asarray, (x, y))

n1 = len(x)
n2 = len(y)

alldata = np.concatenate((x,y))
ranked = rankdata(alldata)
x = ranked[:n1]
y = ranked[n1:]
w = np.sum(x,axis=0)

def montecarlo():
shuffle = np.random.shuffle
a = np.zeros(1000)
shuffle(ranked) # bug in numpy: the first shuffle doesn't work
for i in xrange(1000):
shuffle(ranked)
a[i] = np.sum(ranked[:n1],axis=0)
return np.sum(a >= w) / 1000.0

def aymptotic_p():
expected = n1*(n1+n2+1) / 2.0
z = (w - expected) / np.sqrt(n1*n2*(n1+n2+1)/12.0)
return 1.0 - zprob(z), z

def aymptotic_p_ties():
t = []
_t = 0
for r in ranked:
if r % 1:
_t += 1
else:
if _t:
t.append(_t)
_t = 0
if _t: t.append(_t)
t = np.asarray(t)
expected = n1*(n1+n2+1) / 2.0
tcorr = np.sum((t-1)*t*(t+1))/float((n1+n2)*(n1+n2-1))
z = (w - expected) / np.sqrt(n1*n2*(n1+n2+1-tcorr)/12.0)
return 1.0 - zprob(z), z

p_mc = montecarlo()
if np.any(ranked % 1):
p, z = aymptotic_p_ties()
else:
p, z = aymptotic_p()
return w, z, p, p_mc

def rankdata(a):
a = np.ravel(a)
n = len(a)
svec, ivec = fastsort(a)
sumranks = 0
dupcount = 0
newarray = np.zeros(n, float)
for i in xrange(n):
sumranks += i
dupcount += 1
if i==n-1 or svec[i] != svec[i+1]:
averank = sumranks / float(dupcount) + 1
for j in xrange(i-dupcount+1,i+1):
newarray[ivec[j]] = averank
sumranks = 0
dupcount = 0
return newarray

def fastsort(a):
it = np.argsort(a)
as_ = a[it]
return as_, it

josef...@gmail.com

unread,

Feb 5, 2009, 3:30:07 PM2/5/09

to SciPy Users List

On Thu, Feb 5, 2009 at 2:51 PM, Sturla Molden <stu...@molden.no> wrote:
> On 2/5/2009 7:03 PM, Sturla Molden wrote:
>
>> By the way, there is a fucntion scipy.stats.ranksums that does a
>> Wilcoxon rank-sum test. It seems to be using a large-sample
>> approximation, and has no correction for tied ranks.
>
>
> Here is a modification of SciPy's ranksums to allow small samples and
> correct for tied ranks.
>

there are absolute values missing abs(z-expected), I also prefer the
correction p*2 since it is a two-sided test

sample size 20, 9 ties
this is with R wilcox.exact, ranksums is your ranksum

>>> rwilcex(rvs1[:20],4*ind10+rvs2t[:20],exact=True)['p.value']
0.15716005595098306
>>> ranksums(rvs1[:20],4*ind10+rvs2t[:20]) #wrong tail because no abs()
(357.0, -1.4336547191212172, 0.9241645900073665, 0.92800000000000005)
>>> ranksums(4*ind10+rvs2t[:20],rvs1[:20])
(463.0, 1.4336547191212172, 0.075835409992633496, 0.068000000000000005)
>>> ranksums(4*ind10+rvs2t[:20],rvs1[:20])[3]*2
0.186
>>> ranksums(4*ind10+rvs2t[:20],rvs1[:20])[2]*2
0.15167081998526699
>>> stats.mannwhitneyu(rvs1[:20],4*ind10+rvs2t[:20])[1]*2
0.15167081998526699

With this correction, the normal distribution based p-value in
ranksums looks exactly the same as stats.mannwhitneyu. your Monte
Carlo p-value differs more from R's exact result than the normal
distribution based p-value.

Overall, the differences in p-values look pretty small in the examples
I tried out, so my guess is that a Monte-Carlo on the correct size and
power of the tests will show very similar rejection rates, at critical
values of 0.05 or 0.1.

But I don't have time for that now.

Josef

josef...@gmail.com

unread,

Feb 5, 2009, 3:54:32 PM2/5/09

to SciPy Users List

>
> sample size 20, 9 ties
> this is with R wilcox.exact, ranksums is your ranksum

...

>
> With this correction, the normal distribution based p-value in
> ranksums looks exactly the same as stats.mannwhitneyu.

this statement is not correct.

I mixed up my variables and didn't actually have ties, now with ties,
I still get essentially but not exactly the same results.

josef...@gmail.com

unread,

Feb 5, 2009, 7:03:34 PM2/5/09

to SciPy Users List

On Thu, Feb 5, 2009 at 3:54 PM, <josef...@gmail.com> wrote:
>>
>> sample size 20, 9 ties
>> this is with R wilcox.exact, ranksums is your ranksum
> ...
>>
>> With this correction, the normal distribution based p-value in
>> ranksums looks exactly the same as stats.mannwhitneyu.
>
> this statement is not correct.
>
> I mixed up my variables and didn't actually have ties, now with ties,
> I still get essentially but not exactly the same results.
>

I think there is a mistake in the tie handling of stats.mannwhitneyu
In the calculation of the standard error the sqrt is taken twice.

T = np.sqrt(tiecorrect(ranked)) # correction factor for tied scores
if T == 0:
raise ValueError, 'All numbers are identical in amannwhitneyu'
sd = np.sqrt(T*n1*n2*(n1+n2+1)/12.0)

I don't have the formulas for the tie correction, but from looking at
the tie correction
in Sturlas version of ranksums, it seems that the first sqrt shouldn't be there.

Can someone with access to the correct references verify this.

Karl Young

unread,

Feb 6, 2009, 5:08:01 PM2/6/09

to SciPy Users List

I know there are a number of array manipulation maestros on the list and
wanted to run a problem by the list that my feeble mind is foundering on.

I have three objects, 1) an array containing an "image" (could be any
dimension), 2) a mask for the image (of the same dimensions as the
image), and 3) a "template" which is just a list of offset coordinates
from any point in the image.

Say the template has n elements, the problem is to move the template
over the image and build an m x n array containing image values (at the
template positions) where m is the number of image indices such that the
template lies within the image and all mask values are true. I currently
do this using some raveling, compressing, and length comparison but I
still haven't been able to figure out how to do it without looping
through the image indices (and this is sloooowwww for big
multidimensional "images"). I keep feeling like there must be some
clever way to concatenate the template with the image and mask so as to
do this without looping but haven't been able to come up with it. I
could speed it up by doing the looping in C but that doesn't seem very
elegant. Any thoughts welcome.

--KY

Stéfan van der Walt

unread,

Feb 7, 2009, 12:36:46 PM2/7/09

to SciPy Users List

2009/2/7 Karl Young <karl....@ucsf.edu>:

> I have three objects, 1) an array containing an "image" (could be any
> dimension), 2) a mask for the image (of the same dimensions as the
> image), and 3) a "template" which is just a list of offset coordinates
> from any point in the image.

You can create a strided view of the image, so that the values around
each position where the filter can be applied becomes a row.
Thereafter, using the indexing tricks shown at

http://mentat.za.net/numpy/numpy_advanced_slides/

index the view to produce the templated values at each position.

Say your template has length n, then you'd have:

template of shape (1, n)
rows = np.arange(m)[:, None] with shape (m, 1)

When using template and rows in a fancy indexing operating, you should
get an output of shape (m, n).

Here is a simplified example:

# Strided view of your image
In [25]: data
Out[25]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])

In [26]: rows
Out[26]:
array([[0],
[1],
[2],
[3]])

In [27]: rows.shape
Out[27]: (4, 1)

In [28]: template
Out[28]: array([[0, 2]])

In [29]: template.shape
Out[29]: (1, 2)

In [30]: data[rows, template]
Out[30]:
array([[ 0, 2],
[ 3, 5],
[ 6, 8],
[ 9, 11]])

Hope that helps!

Cheers
Stéfan

Young, Karl

unread,

Feb 9, 2009, 12:03:45 PM2/9/09

to SciPy Users List

Thanks Stefan ! The index meister comes through again. I was sort of thinking along those lines but couldn't quite take the final step of understanding how to get the row coordinates for arbitrary filter locations. BTW, I should send you the latest version of my modification of glcom; the "profiling" that showed this part of my code to be the current bottleneck was the result of the generalized glcom (arbitrary number of co-registered images and arbitrary templates) being so fast at generating the co-occurrence matrices (for others on the list, Stefan wrote a very nice ctypes module, glcom, that generates co-occurrence matrices from an image for doing texture analysis).

Karl Young
Center for Imaging of Neurodegenerative Disease, UCSF
VA Medical Center, MRS Unit (114M)
Phone: (415) 221-4810 x3114
FAX: (415) 668-2864
Email: karl young at ucsf edu

Reply all

Reply to author

Forward