Q problems with a small-sized sample, parametric and non-parametric approach

Cosine

Jan 23, 2021, 6:22:56 AM

Hi:

When we only have a small-sized sample, what comes to mind is to use a non-parametric statistical method. But does using a non-parametric method really solve the problem?

Also, what are the drawbacks of solving a problem with a non-parametric method when the problem actually has some kind of distribution?

David Jones

Jan 23, 2021, 10:14:27 AM

Cosine wrote:

> Hi:
>
> When we only have a small-sized sample, what comes out to our mind
> is to use a non-parametric statistical method. But does using a
> non-parametric method really solve the problem?

It's all about assumptions. With a "non-parametric statistical method"
you avoid the need to make assumptions about a particular
distributional form, but you are still making assumptions, typically
that you have observations that are statistically independent in some
respect.

>
> Also, what are the drawbacks of solving a problem with a
> non-parametric method when the problem actually has some kind of
> distribution?

Again, assumptions. If your assumption of a particular distribution
happens to be true, then you have an analysis that is better (in some
sense) than a non-parametric one. The opposite is also true: if your
assumption of a distribution is wrong, then your analysis may be worse
than a non-parametric one ... although if the assumption is only
slightly wrong, the analysis may still be better than a non-parametric
one.

If the task is estimation, analyses based on different sets of
assumptions may differ in performance in two general ways: (i)
both produce results that are "correct", but the results from one
are more variable; (ii) the results from the one with incorrect
assumptions are incorrect.
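Case (i) can be seen in a small simulation. The sketch below is only an illustration under assumed conditions: samples of size 10, a standard normal model, and a hypothetical "contaminated" model in which 10% of values come from a much wider normal. Both the mean and the median target the same centre (zero) here, so the only difference is how variable each estimator is. The function names and the contamination model are made up for this example.

```python
import random
import statistics

def sampling_variance(estimator, draw, n=10, reps=4000, seed=1):
    """Variance of an estimator across many simulated samples of size n."""
    rng = random.Random(seed)
    estimates = [estimator([draw(rng) for _ in range(n)]) for _ in range(reps)]
    return statistics.pvariance(estimates)

def draw_normal(rng):
    return rng.gauss(0.0, 1.0)

def draw_contaminated(rng):
    # Hypothetical outlier model: 10% of values come from a much wider normal.
    return rng.gauss(0.0, 10.0) if rng.random() < 0.1 else rng.gauss(0.0, 1.0)

results = {
    (dist_name, est_name): sampling_variance(est, draw)
    for dist_name, draw in [("normal", draw_normal),
                            ("contaminated", draw_contaminated)]
    for est_name, est in [("mean", statistics.mean),
                          ("median", statistics.median)]
}
for key, value in results.items():
    print(key, round(value, 3))
```

Under the normal model the mean should be the less variable estimator; under the contaminated model the ordering should reverse. Which estimator is "better" depends on whether the distributional assumption holds.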

If the task is "testing", the analysis with fewer incorrect assumptions
will give a more powerful test. If assumptions are wrong, the test may
not have the size (alpha) you think it has.

A typical solution in small samples is to do both analyses and see if
the results are radically different. With large samples there is more
opportunity to do some assumption-checking.

Cosine

Jan 23, 2021, 11:28:26 AM

While one is free to make any assumptions, the assumptions made need to be verified.

But then how does one verify under such circumstances?

Rich Ulrich

Jan 23, 2021, 2:44:58 PM

On Sat, 23 Jan 2021 03:22:54 -0800 (PST), Cosine <ase...@gmail.com>
wrote:

>Hi:
>
> When we only have a small-sized sample, what comes out to our mind is to use a non-parametric statistical method.

I assume that when you say "non-parametric statistical method,"
you are referring to those methods based on ranks.

What you say: That's unfortunate, but too often it is true. I think
the idea was spread especially by psychologists who were
looking for an easy "out", to avoid dealing with those stats-
assumptions that they did not understand.

Psychologists are notoriously weak in math, for reasons I don't
know. My easiest A in college was in psy-stats; the final exam
took me less than 10 minutes. Maybe it was less than 5.

> But does using a non-parametric method really solve the problem?

In the 1980s, Conover provided a fine perspective. Using those
rank-order tests is - effectively - performing a rank-transformation
on the data, followed by the usual ANOVA. That's often the text-
book prescription for the "large-sample" use of rank tests. If you
have to take into account the text-book's approximated adjustments
for "tied values", you can be /better/ off using transform-plus-ANOVA
for the small samples, too.
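The link between rank tests and "rank-transform, then ANOVA" can be checked numerically. The sketch below uses made-up data with no ties and hand-written helper functions (all names are hypothetical). It computes the Kruskal-Wallis H statistic from the textbook formula, computes the ordinary one-way ANOVA F on the ranks, and compares F with the algebraic relation F = [H/(k-1)] / [(N-1-H)/(N-k)], which follows from H = (N-1) * SS_between / SS_total on the ranks.

```python
def midranks(values):
    """Ranks 1..N, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

def rank_transform(groups):
    """Replace each score by its rank in the pooled sample, keeping groups."""
    ranks = midranks([v for g in groups for v in g])
    out, start = [], 0
    for g in groups:
        out.append(ranks[start:start + len(g)])
        start += len(g)
    return out

def anova_f(groups):
    """Ordinary one-way ANOVA F statistic."""
    pooled = [v for g in groups for v in g]
    n_total, k = len(pooled), len(groups)
    grand = sum(pooled) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H, textbook formula without a tie correction."""
    ranked = rank_transform(groups)
    n_total = sum(len(r) for r in ranked)
    s = sum(sum(r) ** 2 / len(r) for r in ranked)
    return 12.0 * s / (n_total * (n_total + 1)) - 3 * (n_total + 1)

# Made-up scores, three groups of four, no tied values.
groups = [[2.1, 3.4, 1.9, 5.0], [4.2, 6.8, 7.1, 5.5], [8.3, 9.9, 6.1, 40.0]]
n_total, k = 12, 3
h = kruskal_wallis_h(groups)
f_on_ranks = anova_f(rank_transform(groups))
f_from_h = (h / (k - 1)) / ((n_total - 1 - h) / (n_total - k))
print(h, f_on_ranks, f_from_h)
```

Because F on the ranks is a monotone function of H, the two statistics order samples identically; they differ only in the reference distribution used for the p-value.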

If the rank-transformed data are closer to "equal interval" for
scores than the raw data are, then you get a better test after the rank
transformation.

>
> Also, what are the drawbacks of solving a problem with a non-parametric method when the problem actually has some kind of distribution?

The obvious problem to which ranking offers a quick fix is the
presence of visible outliers. Those can screw up both the means
and the variances.
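A toy example of how one outlier can dominate a comparison of means. This is a sketch on made-up data, and it swaps in an exact permutation test on the difference in means (run once on raw scores, once on ranks) as a stand-in for the usual t-test or ANOVA, since that keeps the example free of distribution tables. The helper names are hypothetical.

```python
from itertools import combinations

def exact_permutation_p(x, y):
    """Two-sided exact permutation p-value for the difference in group means."""
    pooled = x + y
    n_x, n = len(x), len(pooled)
    total = sum(pooled)

    def abs_mean_diff(sum_x):
        return abs(sum_x / n_x - (total - sum_x) / (n - n_x))

    observed = abs_mean_diff(sum(x))
    splits = [abs_mean_diff(sum(pooled[i] for i in idx))
              for idx in combinations(range(n), n_x)]
    # Small tolerance so floating-point noise does not drop exact ties.
    return sum(s >= observed - 1e-12 for s in splits) / len(splits)

def ranks(values):
    """Ranks 1..N; assumes no ties, which holds for the toy data below."""
    order = sorted(values)
    return [float(order.index(v) + 1) for v in values]

x = [1.1, 2.0, 2.3, 2.9, 60.0]  # made-up scores; 60.0 plays the outlier
y = [3.9, 4.4, 5.1, 5.8, 6.3]
pooled_ranks = ranks(x + y)
p_raw = exact_permutation_p(x, y)
p_rank = exact_permutation_p(pooled_ranks[:len(x)], pooled_ranks[len(x):])
print(p_raw, p_rank)
```

On the raw scores the result is driven almost entirely by which group holds the value 60.0; after ranking, that value counts only as "the largest". Running both analyses side by side is one way to notice when a single observation is steering the conclusion.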

If you don't know anything at all about your data, including the
likely distribution of scores, you should probably put off trying
to make any sense of it until you learn something.

If you think that the arithmetic average, the mean, ought to be
meaningful, then you probably don't want the rank transform.
Or any transform, if that holds for all likely samples. I've probably
chosen to "winsorize" data (set an extreme to a moderated value)
more often than I've taken rank-transformations as the tool.
(But for winsorizing, I am speaking of large samples, weird scaling.)
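For readers unfamiliar with winsorizing, here is a minimal sketch of one common variant: the k smallest and k largest values are pulled in to the nearest value that is kept. The function name, the choice of k, and the data are assumptions for illustration; other rules for choosing the "moderated value" are equally reasonable.

```python
def winsorize(values, k=1):
    """Clamp the k smallest and k largest values to the nearest retained value."""
    if k < 0 or 2 * k >= len(values):
        raise ValueError("k must satisfy 0 <= 2k < len(values)")
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    # Values inside [low, high] are unchanged; extremes are moved to the bounds.
    return [min(max(v, low), high) for v in values]

scores = [12, 15, 14, 13, 250, 16, 11]  # made-up data with one extreme value
print(winsorize(scores))
```

Unlike ranking, this keeps the original scale for every value that is not extreme, so the mean remains interpretable.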

Where data comes from can imply that certain transforms should
"bring in" the outliers, or otherwise fix the tails. For instance,
take the square root for (Poisson) counts; take the logit for
proportions; take the log for chemical traces in blood samples.
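The three transformations named above, written out as a small sketch. The function names and sample values are made up; the natural log is assumed for the log transform, though any base gives the same shape.

```python
import math

def sqrt_count(count):
    """Square root, for counts that behave roughly like Poisson counts."""
    if count < 0:
        raise ValueError("counts must be non-negative")
    return math.sqrt(count)

def logit(p):
    """Log-odds, for proportions strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def log_concentration(c):
    """Natural log, for positive concentrations such as trace amounts."""
    if c <= 0:
        raise ValueError("concentration must be positive")
    return math.log(c)

print([round(sqrt_count(c), 2) for c in [0, 1, 4, 9, 100]])
print([round(logit(p), 2) for p in [0.05, 0.2, 0.5, 0.8, 0.95]])
print([round(log_concentration(c), 2) for c in [0.1, 1.0, 10.0, 1000.0]])
```

Note that the logit is undefined at exactly 0 or 1 and the log at 0, so data containing those values need a separate decision before transforming.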

The "drawbacks" of using the rank-transformation approach
are (a) you throw away the mean and inter-group comparisons,
and (b) you can get a weaker test.

--
Rich Ulrich

David Duffy

Jan 25, 2021, 4:37:20 AM

Cosine <ase...@gmail.com> wrote:
>
> When we only have a small-sized sample, what comes out to our
> mind is to use a non-parametric statistical method. But does using a
> non-parametric method really solve the problem?
>

You might like to read the 2017 review by Szekely and Rizzo _The Energy of
Data_ (you can find it via Google Scholar) which discusses their
particular general nonparametric approach, _and_ then the more recent
papers citing that paper. Some relevant R packages are energy and HHG.
The latter tests their nonparametric test when the true distributions
are non-monotone:

W, Diamond, Parabola, 2Parabolas, Circle, Cubic, Sine, Wedge, Cross,
Spiral, Circles, Heavisine, Doppler, 5Clouds, 4Clouds.

Standard rank-based tests do poorly against many of these, but the
energy and Heller et al methods are OK.
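To make the "Parabola" case concrete, here is a from-scratch sketch of sample distance correlation, a dependence measure from the energy-statistics family. It is not the API of the R packages mentioned above, only the textbook definition (double-centred pairwise distances) written in plain Python on made-up data; all function names are hypothetical.

```python
import math

def centered_distances(values):
    """Pairwise |v_i - v_j| with row, column, and grand means removed."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]

def distance_correlation(x, y):
    """Sample distance correlation: 0 suggests independence, 1 a linear link."""
    a, b = centered_distances(x), centered_distances(y)
    n = len(x)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / n ** 2

    dcov2 = mean_product(a, b)
    dvar2 = mean_product(a, a) * mean_product(b, b)
    if dvar2 <= 0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar2))

def pearson(x, y):
    """Ordinary product-moment correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]  # a parabola: y depends on x, but not monotonically
print(pearson(x, y), distance_correlation(x, y))
```

The product-moment correlation is zero for this symmetric parabola, and by the same symmetry a rank correlation would be too, while the distance correlation is clearly positive. A real test would still need a reference distribution, for example from permutations.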

Rich Ulrich

Jan 31, 2021, 4:01:55 PM

On Sat, 23 Jan 2021 08:28:24 -0800 (PST), Cosine <ase...@gmail.com>
wrote:

>While one is free to make any assumptions, the assumptions made need to be verified.
>
>But then how does one verify under such circumstance?

- Pay attention to what generates the numbers, on what sort
of sample, and you often have a good idea about what
distribution to expect. Will anyone object to the assumption?

Rather than what you wrote, I would say that assumptions
need to be understood and accounted for.

ANOVA is defined in terms of normal distributions.

However, it is known to be robust (generally) to many sorts of
violations of normality, owing to the Central Limit Theorem.
Binary variables where proportions are between 20% and 80%
are sufficiently "normal" for ANOVA when the Ns are not tiny.
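One way to see why the 20% to 80% range is comfortable: the skewness of a binary (Bernoulli) variable is (1 - 2p) / sqrt(p(1 - p)), and the skewness of a mean of n independent values shrinks by a factor of sqrt(n). The sketch below tabulates both; the group size n = 20 is an arbitrary choice for illustration, not a threshold from this thread.

```python
import math

def bernoulli_skewness(p):
    """Skewness of a single 0/1 variable with success probability p."""
    return (1 - 2 * p) / math.sqrt(p * (1 - p))

def skewness_of_mean(p, n):
    """Skewness of the mean of n independent 0/1 values."""
    return bernoulli_skewness(p) / math.sqrt(n)

for p in (0.05, 0.2, 0.5, 0.8):
    print(p, round(bernoulli_skewness(p), 2), round(skewness_of_mean(p, 20), 2))
```

Inside the 20% to 80% range the skewness of a single observation stays at or below 1.5 in absolute value, and it grows quickly as p approaches 0 or 1.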

However, even a single /extreme/ outlier can screw up an ANOVA.
Strong skewness can weaken the power, which is why I like
power transformations (log, sq rt, reciprocal usually suffice).
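A quick way to compare candidate power transformations is to look at the sample skewness after each one. The sketch below does this for a made-up right-skewed sample; the data and the use of the simple moment-based skewness coefficient are assumptions for illustration.

```python
import math

def sample_skewness(values):
    """Moment-based skewness: third central moment over sd cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

raw = [1, 2, 2, 3, 4, 6, 9, 15, 40, 120]  # made-up right-skewed scores
transforms = {
    "raw": lambda v: v,
    "sqrt": math.sqrt,
    "log": math.log,
    "reciprocal": lambda v: 1.0 / v,
}
skews = {name: sample_skewness([f(v) for v in raw])
         for name, f in transforms.items()}
for name, value in skews.items():
    print(name, round(value, 2))
```

For this particular sample the log should leave less skewness than the square root, which in turn improves on the raw scores. The reciprocal is the strongest of the three and can over-correct, so it is worth checking rather than assuming.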

The assumption most often ignored by the users of rank-
transformations is that two distributions being compared should
be of the same kind, the same shape. - When that is true, it is
also - often - true that a power transformation will provide
better "equal intervals" than what you get from ranking, and
pretty good normality.

--
Rich Ulrich