
Mann-Whitney-Test - meaning of Mean Ranks


Markus Quandt

Apr 29, 1998, 3:00:00 AM

Sorry for asking a very basic question, but my access to textbooks
is rather limited right now, and I'm helping out a friend with a
little medical research project without having much experience in
this field myself.

We want to test for differences in the effect of two drugs on
several variables, some of which have both non-normal
distributions and heterogeneous variances (the target variables
are waiting times, error rates on certain tasks and the like, so
there certainly are some useful transformations which might deal
with both non-normality and heterogeneity, but that is another
issue).

Neither the deviations from normality nor the heterogeneity are
very severe in most cases, so we have calculated both the t-test
(group sizes are equal, by the way) and the Mann-Whitney U-test,
the latter being a fallback for when the assumptions of the
t-test don't hold. Especially in cases where the two tests come out
differently, we think that further investigation and maybe
transformations seem worthwhile.

Now here is the problem: The mean differences for the t-test are
always in the expected direction, but the 'Mean Ranks' that SPSS
prints out with the U-test are inconsistent with the means. I.e.,
when the t-test shows a significant difference in the expected
direction, the U-test is also significant, BUT the group with the
higher mean also has the higher mean rank. For some very similar
variables (e.g., repetitions of the same tests after a fixed time
- maybe I'm going to ask about repeated measures soon ;-)) the
mean difference shows an almost constant difference between
the two drugs, but again the mean ranks switch places.

I'm not sure that I fully understand what the U-test does, but
could this be due to outliers which affect the means but which lose
their effect when the ranking is done? But if that were the reason
- why do both tests show very similar significance levels? Just
by chance in our data?

Please enlighten me! (References to introductory texts are also
welcome, of course. Textbook access will hopefully improve soon).

Many thanks, MQ


--
--------------------------------------------------------------------
Markus Quandt qua...@wiso.Uni-Koeln.DE
Universitaet zu Koeln
University of Cologne, Germany

Arnold Kester

Apr 29, 1998, 3:00:00 AM

In <dyvR14AN...@wiso.Uni-Koeln.DE>, on 04/29/98
at 12:10 PM, qua...@wiso.Uni-Koeln.DE (Markus Quandt) said:


>Sorry for asking a very basic question, but my access to textbooks is
>rather limited right now, and I'm helping out a friend with a little
>medical research project without having much experience in this field
>myself.

>We want to test for differences in the effect of two drugs on several
>variables, some of which have both non-normal
>distributions and heterogeneous variances (the target variables are
>waiting times, error rates with certain tasks and the like, so there
>certainly are some useful transformations which might deal both with
>non-normality and heterogeneity, but this is another issue).

>Neither the deviations from normality nor the heterogeneity are very
>severe in most cases, so we have calculated both the t-test (group sizes
>are equal, by the way) and the Mann-Whitney-U-test, the latter being a
>fallback line when the assumptions of the t-test don't hold. Especially
>in case both tests come out
>differently, we think that further investigation and maybe
>transformations seem worthwhile.

>Now here is the problem: The mean differences for the t-test are always
>in the expected direction, but the 'Mean Ranks' that SPSS prints out
>with the U-test are inconsistent with the means. I.e., when the t-test
>shows a significant difference in the expected direction, the U-test is
>also significant, BUT the group with the higher mean also has the higher
>mean rank. For some very similar variables (e.g., repetitions of the

I don't see a problem here; higher means usually have higher mean ranks. A
known error in SPSS Mann-Whitney is that the sign of the printed Z-test is
negative regardless of the direction of the difference between groups.
Maybe that is what you mean?

>same tests after a fixed time - maybe I'm going to ask about repeated
>measures soon ;-)) the mean difference shows an almost constant
>difference between the two drugs, but again the mean ranks switch place.

>I'm not sure that I fully understand what the U-test does, but could this

The Mann-Whitney U statistic counts the number of pairs X and Y (X in group
1, Y in group 2) with X < Y. It is equivalent to the sum of the ranks in
either group, hence the name 'rank-sum test'.
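That equivalence can be sketched in a few lines of Python (the numbers below are made up, and chosen with no ties so the ranking is straightforward):

```python
# Two equivalent ways to get the Mann-Whitney U statistic:
# (a) count the pairs, (b) go through rank sums over the pooled sample.

x = [1.2, 3.4, 2.2, 5.0, 4.1]   # group 1 (made-up data)
y = [0.9, 2.0, 1.5, 3.0, 2.8]   # group 2

# (a) direct pair count: pairs (xi, yj) with xi > yj
u_pairs = sum(xi > yj for xi in x for yj in y)

# (b) via rank sums: rank 1 = smallest value in the pooled sample
pooled = sorted(x + y)
rank = {v: i + 1 for i, v in enumerate(pooled)}   # no ties in this toy data
r1 = sum(rank[v] for v in x)                      # rank sum of group 1
u_ranks = r1 - len(x) * (len(x) + 1) // 2

print(u_pairs, u_ranks)   # both give the same U
```

Either form can be turned into the usual normal approximation for large samples; SPSS reports the rank sums directly as 'Mean Rank' and 'Sum of Ranks'.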

>be due to outliers which affect the means but which lose their effect when
>the ranking is done? But if this was the reason - why do both tests show
>very similar significance levels? Just by chance in our data?

When groups are fairly large, say >30, you would expect that. A somewhat
technical reference for rank tests is Lehmann: Nonparametrics. Wiley 1975.

>Please enlighten me! (References to introductory texts are also welcome,
>of course. Textbook access will hopefully improve soon).

>Many thanks, MQ

Hope this is helpful,

--
Arnold Kester
Arnold...@stat.unimaas.nl

Markus Quandt

Apr 30, 1998, 3:00:00 AM

(Arnold, thanks a lot for your response)

From Arnold's response I see that I didn't express the problem very
clearly, so I'll try to reformulate it at the critical places.


(Sorry, the quotation is quite messed up).


Arnold Kester wrote:
>
> In <dyvR14AN...@wiso.Uni-Koeln.DE>, on 04/29/98
> at 12:10 PM, qua...@wiso.Uni-Koeln.DE (Markus Quandt) said:

snip

>
> >Now here is the problem: The mean differences for the t-test are always
> >in the expected direction, but the 'Mean Ranks' that SPSS prints out
> >with the U-test are inconsistent with the means. I.e., when the t-test
> >shows a significant difference in the expected direction, the U-test is
> >also significant, BUT the group with the higher mean also has the higher
> >mean rank.
>

> I don't see a problem here; higher means usually have higher mean ranks.

I assumed that a descending ranking is applied: a high value of the
measurement would then have led to a *numerically* low rank, i.e. the
highest measurement value gets rank '1', the next '2', and so on.
Arnold, thanks for clearing this up, but this is exactly where my
problem is: sometimes the group with the higher mean has the higher
mean rank, sometimes it has the lower one. So, my question is: what are
the exceptions to 'usually'?


> the mean difference shows an almost constant
> difference between the two drugs, but again the mean ranks switch place.
>

[Could this...]


> >be due to outliers which affect the means but which lose their effect when
> >the ranking is done? But if this was the reason - why do both tests show
> >very similar significance levels? Just by chance in our data?

Since the above failed so badly in getting the problem straight, I'll
try again and give an example. (This is not the only case where the
phenomenon occurs).

Suppose you have repeated measurements of the effects of each of our two
drugs at four points in time. The means and mean ranks of the test come
out as follows:

meas.   mean     mean     t-test   mean rank   mean rank   U-test
        drug1    drug2    sign.    drug1       drug2       sign.
  1     -1.34    -6.84    yes       63.5        37.5       yes
  2      3.56    -0.20    yes       45.5        55.5       yes
  3      5.80     2.26    yes       49.1        51.9       no
  4      6.674    4.38    yes       44.1        56.9       yes

You see that the order of the mean ranks is switching between the two
drugs, while the order of the means is not. How does it happen, and what
does it imply for the interpretation?
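For what it's worth, a toy Python sketch (made-up numbers, not the drug data) shows how a few extreme values can produce exactly this pattern: one group wins on the mean while the other wins on mean ranks.

```python
# Made-up example: two outliers inflate drug1's mean, but after ranking
# those outliers only contribute the two top ranks, so drug2 ends up
# with the higher mean rank.

drug1 = [1, 2, 3, 4, 100, 110]   # two extreme values dominate the mean
drug2 = [5, 6, 7, 8, 9, 10]

mean1 = sum(drug1) / len(drug1)
mean2 = sum(drug2) / len(drug2)

pooled = sorted(drug1 + drug2)
rank = {v: i + 1 for i, v in enumerate(pooled)}   # rank 1 = smallest
mr1 = sum(rank[v] for v in drug1) / len(drug1)
mr2 = sum(rank[v] for v in drug2) / len(drug2)

print(mean1 > mean2, mr1 < mr2)   # True True
```

So whenever the orderings disagree, it is worth plotting the raw distributions before trusting either test.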


Regards, MQ

Richard F Ulrich

May 1, 1998, 3:00:00 AM

Markus Quandt (qua...@wiso.Uni-Koeln.DE) wrote:
: (Arnold, thanks a lot for your response)

: From Arnold's response I see that I didn't express the problem very
: clearly, so I'll try to reformulate it at the critical places.

<< snip >>
: Since the above failed so badly in getting the problem straight, I'll


: try again and give an example. (This is not the only case where the
: phenomenon occurs).

<< snip >>

I think you had better look at histograms of your groups, or
box-and-whisker plots.

The gross assumption of the rank-test is that the distributions
are the same general shape - if that were the case, then it
would be impossible for the t-test to reject in the OPPOSITE
direction, unless you are (improperly) ignoring some outliers.

If one group is better on the t-test and
the other has the rank-order superiority, then
it is ALMOST BOUND TO BE TRUE that NEITHER test
is legitimate, i.e., wholly believable. The assumptions
are not so very different. So you may have to report both tests,
or else make a choice: which assumptions do you want to throw away?


--
Rich Ulrich, biostatistician wpi...@pitt.edu
http://www.pitt.edu/~wpilib/index.html Univ. of Pittsburgh

fakaru...@gmail.com

Feb 11, 2014, 12:18:15 PM
On Wednesday, April 29, 1998 3:00:00 PM UTC+8, Arnold Kester wrote:
> [snip: Arnold Kester's reply, quoted in full above]



Hi all,

I also have a question on the t-test and the Mann-Whitney U-test.

Basically, the group with the higher mean in the t-test should also show the higher mean rank in the Mann-Whitney test.

E.g.: result from the t-test

Group Statistics
      DUMIB   N     Mean      Std. Deviation   Std. Error Mean
TE    1.00    89    0.76864   0.315850         0.033480
      0.00    369   0.80971   0.205751         0.010711
PTE   1.00    89    0.86397   0.234012         0.024805
      0.00    369   0.90146   0.104199         0.005424
SE    1.00    89    0.91733   0.210088         0.022269
      0.00    369   0.89729   0.165979         0.008640




t-test for Equality of Means
      t        Sig. (2-tailed)
TE    -1.505   0.033
PTE   -2.284   0.023
SE     0.967   0.334



*note:
1.00=public sector (PuS),
0.00=private sector (PrS),
TE=technical efficiency,
PTE=pure technical efficiency,
SE=scale efficiency

So we can see that the mean of TE (taking TE as an example) for PuS is lower than for PrS (0.76864 < 0.80971), and the difference is significant at the 5% level.



However, when I run the Mann-Whitney test,
the result is:

Ranks
      DUMIB   N     Mean Rank   Sum of Ranks
TE    0.00    369   227.44      83,924.00
      1.00    89    238.06      21,187.00
      Total   458
PTE   0.00    369   223.93      82,631.00
      1.00    89    252.58      22,480.00
      Total   458
SE    0.00    369   222.64      82,155.00
      1.00    89    257.93      22,956.00
      Total   458



Test Statistics(a)
                         TE           PTE          SE
Mann-Whitney U           15,659.000   14,366.000   13,890.000
Wilcoxon W               83,924.000   82,631.000   82,155.000
Z                        -0.707       -1.897       -2.695
Asymp. Sig. (2-tailed)   0.479        0.058        0.007


However, the order for TE has changed: the TE mean rank of PuS is higher than that of PrS (238.06 > 227.44), which contradicts the means from the t-test, and the difference is not significant.


So why could this occur? Because Arnold Kester said:

"I don't see a problem here; higher means usually have higher mean ranks. A
known error in SPSS Mann-Whitney is that the sign of the printed Z-test is
negative regardless of the direction of the difference between groups.
Maybe that is what you mean?"

Please assist me with this problem.

Thank you


regards
Fakarudin


Rich Ulrich

Feb 11, 2014, 4:38:56 PM
On Tue, 11 Feb 2014 09:18:15 -0800 (PST), fakaru...@gmail.com
wrote:
[snip, previous]
I'm not positive about what your "problem" is.

I do see that there are three t-tests that are, in order,
-1.5, -2.3, 0.97,

where the same variables yield MWW results listed as z's:
-0.71, -1.9, -2.7.

I don't know whether the z's have arbitrarily been given a
minus sign so that the p-value is the simple look-up value
(times 2, for 2-tailed). However, for all three variables, the
second group *does* have the higher mean rank, as
reported in your table.

Therefore, I think you are concerned about why the results
are so different between the t's and the z's, when that is
usually not the case.

Okay. Conover showed that the MWW for large samples
is effectively the same as simply performing a t-test on
the rank-transformed versions of the same scores -- Where
they differ is that your computerized MWW is apt to use
a (slightly) poorer estimate for the "exact variance" in order
to account for ties. In other words, carrying out a t-test
on the rank-transformed scores is apt to give *better*
results, in terms of precision of the p-value.

(Given the big difference between the results, I think I
would use RANK to create the ranks, and confirm that
there isn't something wrong in either program.)
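Conover's equivalence is easy to check numerically. A Python sketch with simulated skewed data (the exponential distributions, sample sizes, and seed are assumptions for illustration, not the poster's data):

```python
# Compare a t-test on rank-transformed scores with the Mann-Whitney test
# on the raw scores: for large samples the p-values should nearly agree.
import numpy as np
from scipy.stats import rankdata, ttest_ind, mannwhitneyu

rng = np.random.default_rng(0)
g1 = rng.exponential(1.0, 200)   # simulated skewed scores, group 1
g2 = rng.exponential(1.3, 200)   # group 2, with a larger scale

# t-test on the rank-transformed pooled scores ...
ranks = rankdata(np.concatenate([g1, g2]))
t_on_ranks = ttest_ind(ranks[:200], ranks[200:])

# ... versus the Mann-Whitney test on the raw scores
mww = mannwhitneyu(g1, g2, alternative='two-sided')

print(round(t_on_ranks.pvalue, 3), round(mww.pvalue, 3))
```

This mirrors the RANK-then-t-test check suggested above, so any large disagreement between the two p-values would point to something wrong in one of the programs.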

Thus, the question becomes: Why does the transformation
make so much difference? Usually, this is attributed to
"outliers", and the fact that extreme values will weaken
the t-test by increasing the variance. For these data:

If these "efficiencies" range from 0 to 100%, with means
up to 90%, then I expect that the transformation is
effectively spreading out the many scores from 95-100%,
while compressing the scattered scores from 0-80%.

If this is what is going on, then the analysis of ranks is
a better test than the analysis of raw scores.

If these were my data, I would examine a scatterplot
of all scores, marked by group, to see what odd thing
is going on. Isn't there some strange thing in a
distribution or two? The strangeness (a hole in
the scores?) might be the important thing to report.

If these were my data, I would examine the univariate
distributions, and see whether it makes sense (by symmetry, and
by logic) to look at a transformation of (100 - %efficiency),
where the transformation is square-root or log. The
best transformation for making a logical presentation
of "equal intervals" in efficiencies is what gives the most
appropriate test.
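A small Python sketch of those reflected transformations, applied to assumed efficiency values (the sample scores here are made up):

```python
# Reflect the efficiencies (pile-up near 100%) and then spread the small
# remainders with square-root or log, as suggested above.
import math

eff = [99.5, 98.0, 97.2, 95.0, 80.0, 60.0]           # assumed scores (%)
sqrt_t = [math.sqrt(100 - e) for e in eff]
log_t = [math.log(100 - e + 0.5) for e in eff]       # offset guards log(0)

print([round(v, 2) for v in sqrt_t])
print([round(v, 2) for v in log_t])
```

Both transformations stretch out the crowded high-efficiency end while compressing the scattered low scores, which is the behavior the rank transformation produced implicitly.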

If there are weird things in the distribution, it might be
useful to show the actual counts for the distribution
of scores in particular ranges.

--
Rich Ulrich
