ANOVA, Mann-Whitney, or t-test?

nimer...@gmail.com

Mar 19, 2016, 11:40:12 AM

I have two groups, one with 42 cases and one with 13. I want to compare the medians (7 vs 10.1) or the means (7.3 vs 12.7).

ANOVA, Mann-Whitney, and t-test each give different results.

Which one should I use?

Rich Ulrich

Mar 19, 2016, 1:19:20 PM

With two groups, the t-test is an ANOVA, so you are really
choosing between two choices. The (equal-variance) t statistic,
squared, is the F statistic. The t-test program does offer an
alternative for "unequal variances" that is only sometimes
provided by ANOVA programs more generally.
When the tests give "different results", that is something you
need to consider carefully, and probably warn your readers about,
whether you accept or reject the null in the end.
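
To make the "t squared is F" point concrete, here is a minimal sketch using only the Python standard library. The function names and the numbers are made up for illustration (they are not the poster's data), and only the test statistics are computed, not p-values.

```python
from statistics import mean, variance


def pooled_t(a, b):
    """Student's two-sample t statistic (equal variances assumed)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5


def welch_t(a, b):
    """Welch's t statistic (each group's variance estimated separately)."""
    na, nb = len(a), len(b)
    return (mean(a) - mean(b)) / (variance(a) / na + variance(b) / nb) ** 0.5


def oneway_f(*groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up numbers, for illustration only: unequal sizes, unequal spread.
group_a = [5.1, 6.0, 6.8, 7.0, 7.4, 8.2, 9.0]
group_b = [6.5, 9.8, 12.0, 21.5]

t = pooled_t(group_a, group_b)
f = oneway_f(group_a, group_b)
w = welch_t(group_a, group_b)
print(f"pooled t = {t:.4f}   t^2 = {t * t:.4f}   F = {f:.4f}")
print(f"Welch  t = {w:.4f}")
```

The pooled t squared matches F to rounding error, while the Welch statistic differs because it does not pool the two variances. A statistics package would additionally refer each statistic to its t or F distribution to get a p-value.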

The MW is a rank test. Thus, you are not comparing the
medians, as you may think, but, rather, the rank-superiority
(whether scores in one group tend to exceed those in the other).
Conover showed in the 1980s that the tests of ranks by
enumeration are asymptotically equivalent to performing
an ANOVA test on the rank-transformed scores. In the case
of extensive ties, the ANOVA sometimes performs better
(more accurate p-value) than using a computing formula
that tries to adjust for ties.
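
The link between the rank test and a t-test (ANOVA) on ranks can be sketched numerically. The code below is an illustration under stated assumptions, not Conover's own procedure: it uses midranks for ties, a normal approximation without continuity correction for Mann-Whitney, made-up data, and hypothetical function names. With z computed this way, the t statistic on the ranks satisfies t^2 = (N - 2) z^2 / (N - 1 - z^2), so the two statistics always order samples the same way and differ only in the reference distribution used.

```python
from statistics import NormalDist, mean, variance


def midranks(values):
    """Ranks 1..N, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_tests(a, b):
    """Return (U, z, t): Mann-Whitney U and z, and the pooled t on ranks."""
    na, nb = len(a), len(b)
    n = na + nb
    ranks = midranks(list(a) + list(b))
    ra, rb = ranks[:na], ranks[na:]
    u = sum(ra) - na * (na + 1) / 2
    # Variance of U under the null; using the variance of the actual
    # (mid)ranks builds in the adjustment for ties.
    var_u = na * nb / n * variance(ranks)
    z = (u - na * nb / 2) / var_u ** 0.5
    # Ordinary pooled-variance t statistic, computed on the ranks.
    sp2 = ((na - 1) * variance(ra) + (nb - 1) * variance(rb)) / (n - 2)
    t = (mean(ra) - mean(rb)) / (sp2 * (1 / na + 1 / nb)) ** 0.5
    return u, z, t


# Made-up scores with ties, for illustration only.
a = [5, 6, 7, 7, 8, 9, 10]
b = [7, 9, 12, 15]
n_total = len(a) + len(b)

u, z, t = rank_tests(a, b)
p_normal = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided, normal approx.
print(f"U = {u:.1f}  z = {z:.4f}  p(normal) = {p_normal:.4f}")
print(f"t on ranks = {t:.4f}")
print(f"(N-2) z^2 / (N-1-z^2) = {(n_total - 2) * z**2 / (n_total - 1 - z**2):.4f}")
print(f"t^2                   = {t**2:.4f}")
```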

Rank-scoring is useful, and especially justifiable, when there
are outliers that belie the assumption of homogeneity of variance.
And ANOVA is vulnerable to odd behavior when the group sizes
are different (like your 42 vs. 13).

So. Do you want to compare scores, or the ranks of scores?
If the average is what you would consider "very meaningful",
then you should resist the temptation to rank-transform.

Thus, the answer to your question should be referred back
to asking: Do you like the "spacing" or intervals provided by
the original scoring? - in which case, you may hesitate to
transform to ranks. How do you like the spacing provided for
ranks? And, Are there ties?

If a transformation other than rank-order is appropriate,
that may provide the most robust and rational test.

What I ask a client, after I look at the actual numbers, is:
Where do these numbers come from? That is because there
are natural transformations for data of some sorts.
1. Counts, often, are Poisson -- Use the square root.
2. Chemical concentrations (among other things) are often
lognormal -- Take the logs.
3. Some measures are readily and appropriately inverted,
like the "speed" versus the "time elapsed" for a race -- use
the inverted form or otherwise take the reciprocal.
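
The three cases in the list above can be written down as a small lookup. This is only a sketch: the dictionary, its keys, and the function name are hypothetical labels chosen for illustration.

```python
import math

# Hypothetical lookup: kind of data -> the transformation suggested above.
NATURAL_TRANSFORMS = {
    "count": math.sqrt,                 # Poisson-like counts
    "concentration": math.log,          # lognormal-like measurements
    "elapsed_time": lambda t: 1.0 / t,  # time over a fixed distance -> speed
}


def transform(values, source):
    """Apply the transformation associated with the data source."""
    f = NATURAL_TRANSFORMS[source]
    return [f(v) for v in values]


print(transform([4, 9, 16], "count"))
print(transform([1.0, 10.0, 100.0], "concentration"))
print(transform([10.0, 20.0, 40.0], "elapsed_time"))
```

Note that the log and the reciprocal are only defined for positive values, so zeros need separate handling before either is applied.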

Keep in mind that the /purpose/ of such transformations is
to improve homogeneity of variance ... which ought to be
matched by a subjective recognition of improvement in the
"interval" nature of the data. If the transformation is totally
arbitrary (not based on consideration of its source), then
rank-order is one that more audiences will find familiar.

--
Rich Ulrich

nimer...@gmail.com

Mar 19, 2016, 3:32:06 PM

On Saturday, March 19, 2016 at 12:19:20 PM UTC-5, Rich Ulrich wrote:

Thank you Rich for the extensive explanation.

What I am actually studying is the effect of tumor size on complete tumor resection.

Complete resection is the group with 42 and incomplete resection is the group with 13. These groups have medians of 7 vs 10.1 and means of 7.3 vs 12.7, respectively.

I use SPSS for the analysis.

Rich Ulrich

Mar 19, 2016, 4:37:49 PM

On Sat, 19 Mar 2016 12:32:02 -0700 (PDT), nimer...@gmail.com wrote:

>
>What am actually studying is the effect of tumor size on the complete tumor resection.
>
>Complete resection is the group with 42 and incomplete is the group with 13. These groups have a median of 7 vs 10.1 and mean of 7.3 vs 12.7, respectively.

Okay. For tumors, it does seem to me to be perfectly natural
and appropriate to say that tumor A is twice the size of tumor B.

The log-unit is much more appropriate than raw sizes. You would
worry much more about 2 mm of variation around 4 mm than you
would about 2 mm around 14 mm. If the logged values look fairly
normal, I would go with the ordinary t-test or ANOVA on the log
of the data.
If you need to justify the transformation, you can report that
this more-precise testing was recommended to you after the
disagreement between tests on the raw data.
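
A minimal sketch of what testing on the log scale looks like, using only the Python standard library. The sizes are made up for illustration (they are not the poster's data) and the variable names are hypothetical. A difference in mean logs back-transforms to a ratio of geometric means, which matches the "tumor A is twice the size of tumor B" way of talking.

```python
import math
from statistics import geometric_mean, mean, variance


def pooled_t(a, b):
    """Ordinary (equal-variance) two-sample t statistic."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


# Made-up sizes in mm, for illustration only.
complete = [3.0, 4.5, 5.0, 6.0, 7.0, 7.5, 9.0, 11.0]
incomplete = [6.0, 9.0, 10.0, 14.0, 25.0]

log_complete = [math.log(x) for x in complete]
log_incomplete = [math.log(x) for x in incomplete]

t_raw = pooled_t(incomplete, complete)
t_log = pooled_t(log_incomplete, log_complete)

# Difference in mean logs, back-transformed: ratio of geometric means.
ratio = math.exp(mean(log_incomplete) - mean(log_complete))

print(f"t on raw sizes = {t_raw:.3f}")
print(f"t on log sizes = {t_log:.3f}  (df = {len(complete) + len(incomplete) - 2})")
print(f"geometric means: {geometric_mean(complete):.2f} vs "
      f"{geometric_mean(incomplete):.2f}, ratio = {ratio:.2f}")

# The same 2 mm step is a much bigger change, in log units, near 4 mm.
print(f"log(6/4) = {math.log(6 / 4):.3f},  log(16/14) = {math.log(16 / 14):.3f}")
```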

The raw sizes would be skewed with a chance of a large interval
at the top end. Sometimes an outlier shows up at the small end
after a log-transformation -- and this can happen, in part, because
of measurement imprecision. For these sizes in mm (assuming
those units), notice that "2" is fully twice as large as "1", but
that 1 could have been properly recorded as the rounded-off value
for 1.45. That could justify a re-evaluation of such low scores,
if they exist and if re-measurement is possible.
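
The rounding point can be quantified with a little arithmetic. This sketch assumes sizes were rounded to the nearest whole mm (an assumption; the thread does not say how they were recorded), and the function name is hypothetical. It shows how wide, in log units, the range of true sizes behind a recorded value is.

```python
import math


def log_rounding_width(recorded, half_step=0.5):
    """Width, in natural-log units, of the true sizes that round to `recorded`.

    Assumes rounding to the nearest unit, so the true value lies in
    [recorded - half_step, recorded + half_step).
    """
    lo, hi = recorded - half_step, recorded + half_step
    return math.log(hi / lo)


for mm in (1, 2, 7, 14):
    print(f"recorded {mm:>2} mm: log-scale width = {log_rounding_width(mm):.3f}")
```

A recorded 1 mm covers true sizes from 0.5 to 1.5 mm, a factor of 3 (about 1.1 log units), while a recorded 14 mm covers less than a tenth of a log unit. That is why small values can turn into apparent outliers after taking logs.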

--
Rich Ulrich