I am performing both 1-tailed and 2-tailed Mann-Whitney U (i.e.
Wilcoxon rank-sum) tests, comparing two independent samples.
Whether I go with 1- or 2-tailed varies, though I want to put them all
in a related table. I had read that I should use the Asymp. Sig. value
provided in the SPSS output for 2-tailed, but I should use the Exact
Sig. divided by 2 for the 1-tailed. Is this true? Why? What is the
difference between the Asymp. and Exact values?
Thanks,
Hos
The asymptotic p-values come from the fact that the U distribution,
whose mean and variance under the null hypothesis are m*n/2 and
m*n*(m+n+1)/12 (m and n being the two sample sizes), becomes
approximately normal when both m and n are large.
The situation is analogous to, but somewhat more complicated than,
the familiar normal approximation to the exact binomial distribution,
whose mean and variance are n*p and n*p*q.
Unlike the binomial, the U distribution is always symmetric, so there
should be no need to consider whether the test is one- or two-tailed
when deciding whether to use exact or approximate p-values.
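As a rough check of the mean, variance, and symmetry mentioned above, the exact null distribution of U can be enumerated directly when the samples are small. The sketch below is only illustrative: the function name `u_null_distribution` and the sizes m = 5, n = 8 are assumptions made for the example (they are not part of SPSS), and it assumes there are no tied observations.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def u_null_distribution(m, n):
    """Exact null distribution of U for sample sizes m and n (assumes no ties)."""
    counts = Counter()
    # Under the null, every choice of m ranks out of 1..m+n is equally likely.
    for ranks in combinations(range(1, m + n + 1), m):
        u = sum(ranks) - m * (m + 1) // 2  # U from the rank sum of the first sample
        counts[u] += 1
    total = sum(counts.values())
    return {u: Fraction(c, total) for u, c in sorted(counts.items())}


m, n = 5, 8  # illustrative sizes only
dist = u_null_distribution(m, n)
mean = sum(u * p for u, p in dist.items())
var = sum((u - mean) ** 2 * p for u, p in dist.items())

print("mean matches m*n/2:", mean == Fraction(m * n, 2))
print("variance matches m*n*(m+n+1)/12:", var == Fraction(m * n * (m + n + 1), 12))
print("symmetric about m*n/2:", all(dist[u] == dist[m * n - u] for u in dist))
```

Exact fractions are used so that the comparisons are not affected by floating-point rounding.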
My sample sizes run between 5 and 8. I was always told that samples
of size under 20 shouldn't be assumed to be normal, which is why I am
using a non-parametric test in the first place. Does this mean I
should use the exact instead of the asymptotic p-value?
Also, I'm not sure why the fact that U is symmetric means it doesn't
matter if I use a 1- or 2-tailed test. If it is symmetric (is it called
asymptotic because it is not symmetric?), then I assume I can just
divide the p-value by 2, if I already knew that there was only one
direction that a difference in means could go?
Thanks again,
Hos
I fear you're confusing two different normalities. One refers to the
populations from which your samples were drawn. We're not talking
about that one. The other, which we are talking about, refers to the
sampling distribution of the U statistic. When the sample sizes are
small, the distribution of U is not close to normal, and you should
not use the normal approximation to it.
"Asymptotic" means "when the sample size is large", with "large"
being undefined. The usual normal approximations to the binomial and
U distributions are "asymptotic approximations". Most people call
them simply "large-sample approximations".
Whether you do a one- or two-tailed test depends on the research
question, not on the sample size or the numerical method you use
to calculate the p-value.
However, how you get the p-value always depends on the sample size,
and may also depend on whether the test is one- or two-tailed.
In the binomial case, the normal approximation is always symmetric,
but the exact distribution is skewed if the hypothesized probability
is not 1/2, with the skew being stronger the farther the hypothesized
value is from 1/2, and weaker the larger the sample size is. For two-
tailed tests, the normal-approximation errors in the two tails tend
to cancel, enabling the approximation to be used with relatively
smaller sample sizes. For one-tailed tests such cancellation cannot
occur, and the approximation requires relatively larger sample sizes
before it is sufficiently accurate.
However, the U distribution is always symmetric, so the accuracy of
the normal approximation does not depend on whether the test is one-
or two-tailed; only the sample size matters.
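To make the symmetry point concrete, here is a small sketch comparing an exact one-tailed p-value for U with its mirror-image tail and with the normal approximation. The helper names, the sizes m = 5, n = 8, and the observed value U = 6 are all hypothetical choices for illustration, and ties are assumed absent. Because the two exact tails are equal, the exact two-tailed p-value is twice the one-tailed one; halving a two-tailed value only gives the right one-tailed answer when the observed difference lies in the predicted direction.

```python
from itertools import combinations
from math import comb, erfc, sqrt


def exact_lower_tail(u_obs, m, n):
    """Exact P(U <= u_obs) under the null, by enumerating rank assignments (no ties)."""
    hits = 0
    for ranks in combinations(range(1, m + n + 1), m):
        u = sum(ranks) - m * (m + 1) // 2
        if u <= u_obs:
            hits += 1
    return hits / comb(m + n, m)


def normal_lower_tail(u_obs, m, n):
    """Normal approximation to P(U <= u_obs), using a continuity correction."""
    mean = m * n / 2
    sd = sqrt(m * n * (m + n + 1) / 12)
    z = (u_obs + 0.5 - mean) / sd
    return 0.5 * erfc(-z / sqrt(2))  # standard normal CDF evaluated at z


m, n = 5, 8  # illustrative sizes in the range mentioned in the thread
u_obs = 6    # hypothetical observed U, chosen only for the example

lower = exact_lower_tail(u_obs, m, n)
# Mirror-image tail: P(U >= m*n - u_obs) = 1 - P(U <= m*n - u_obs - 1)
upper = 1 - exact_lower_tail(m * n - u_obs - 1, m, n)

print("exact one-tailed (lower):", lower)
print("exact mirror-image tail: ", upper)
print("exact two-tailed:        ", 2 * lower)
print("normal approx one-tailed:", normal_lower_tail(u_obs, m, n))
```

Full enumeration is only practical for small samples like these; for larger ones the normal approximation is what one would normally fall back on.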
Now I understand. Thank you so much for taking the time to explain
this to me! Do the normal-approximation errors in the two tails tend
to cancel because the distribution is symmetric? Even if so, I'm
thinking that the conservative thing to do is use the exact sig
values.
Thanks again,
Hos
When the exact distribution is skewed, the normal approximation
overestimates the p-value in the short tail and underestimates it
in the long tail, and the two errors tend to cancel one another.
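The opposite-signed errors can be seen in a single worked case. The sketch below compares exact binomial tail probabilities with the continuity-corrected normal approximation; n = 20 and p = 0.2 are hypothetical values picked only because they give a clearly skewed distribution (long tail on the right), and the cut-offs 0 and 8 are equally far from the mean of 4. It illustrates one case and is not a general proof.

```python
from math import comb, erfc, sqrt


def binom_pmf(k, n, p):
    """Exact binomial probability P(X = k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * erfc(-z / sqrt(2))


n, p = 20, 0.2  # hypothetical values; p < 1/2 puts the long tail on the right
mean = n * p
sd = sqrt(n * p * (1 - p))

# Short (lower) tail: P(X <= 0)
exact_short = sum(binom_pmf(k, n, p) for k in range(0, 1))
approx_short = norm_cdf((0 + 0.5 - mean) / sd)

# Long (upper) tail: P(X >= 8)
exact_long = sum(binom_pmf(k, n, p) for k in range(8, n + 1))
approx_long = 1 - norm_cdf((8 - 0.5 - mean) / sd)

print(f"short tail: exact={exact_short:.4f}  approx={approx_short:.4f}")
print(f"long tail:  exact={exact_long:.4f}  approx={approx_long:.4f}")
print(f"both tails: exact={exact_short + exact_long:.4f}  "
      f"approx={approx_short + approx_long:.4f}")
```

In this case the approximation is too large in the short tail and too small in the long tail, so the error in the two-tailed sum is smaller than the error in the short tail alone.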
In your case (U, with sample sizes between 5 and 8, which would
usually be called "very small"), I would use the exact distribution.