On Tue, 2 Aug 2022 15:00:35 -0700 (PDT), Cosine <ase...@gmail.com> wrote:
>Hi:
>
> How do we determine the minimal number of samples required for a
> statistical experiment?
This is called "power analysis." The statistical procedure uses the
distribution of the non-central F or whatever. I suggest Jacob
Cohen's book for an introduction that goes beyond simply presenting
the tables that can be used for lookup.
It was in the 1980s when NIMH started requiring power analyses
as part of our research grants (psychiatric medicine).
Which statistical test (F, t, etc.)? What alpha-error? What
beta-error (chance of missing an effect), given what assumed
effect size? ... for what N? Power is equal to (1 - beta).
Thus, a power analysis might include a table that shows the
power obtained by using specific Ns with specific underlying
effects for the test we are using.
For a two-tailed t-test, at 5% (fixed-format table; view in a
monospaced font):

                  N needed, for assumed effect size (d)
                      0.5        0.6        0.8
   power   60%      < n's >    < n's >    < n's >
           80%      < n's >    < n's >    < n's >
           90%      < n's >    < n's >    < n's >
           95%      < n's >    < n's >    < n's >
Greater power implies larger N; larger effect
size implies smaller N.
In our area, 80% was the minimum for most studies.
If there are multiple hypotheses, the same table shows how
likely the study will "detect" the various effect sizes with
a nominal test of that size.
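To make that arithmetic concrete, here is a short sketch in Python
(using statsmodels -- my choice of tool, not anything from Cohen's book
or the grant work) that fills in the cells of a table like the one
above, reading it as a two-sided, two-sample t-test at alpha = 5%:

  # Sketch: per-group N for a two-sided, two-sample t-test at alpha = 0.05,
  # for the powers and assumed effect sizes (Cohen's d) in the table above.
  from statsmodels.stats.power import TTestIndPower

  analysis = TTestIndPower()
  for power in (0.60, 0.80, 0.90, 0.95):
      for d in (0.5, 0.6, 0.8):
          n = analysis.solve_power(effect_size=d, alpha=0.05,
                                   power=power, alternative='two-sided')
          print(f"power={power:.0%}  d={d}  n per group ~ {n:.0f}")

A one-sample or paired design would use statsmodels' TTestPower instead
of TTestIndPower, but the logic is the same.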
> For example, we found on the internet that "N < 30" is considered a set
> with a small number of samples. But how do we decide if the number of
> samples is too small? For example, are 2, 3, ..., 10 samples too small?
> Why is that?
I had a friend who did lab work on cells. He told me that his
typical N was 3: The only effect sizes he was interested in were
the HUGE ones. If he used just one or two, then a weird result
might be lab error; two similar weird results showed that he had
something.
> Any theory to support the decision? Likewise, what is the theory
> behind deciding that "N < 30" is a set with a small number of samples?
>
> Next, let's consider the number of metrics (e.g., accuracy and
> specificity) analyzed in the experiment. If we use too many metrics,
> it would be considered that we are fishing the dataset. But again, how
> do we determine the proper number of metrics analyzed in the
> experiment?
I think you are confusing two other discussions here. Metrics
like "specificity and sensitivity" are not assessed by statistical
tests like the t-test; they are estimated from a sample large enough
to give a small-enough standard error.
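To illustrate that kind of precision calculation (my own rough sketch,
using the usual normal approximation for a proportion; the numbers are
illustrative, not from any study):

  # Rough sketch: N needed so that an estimated proportion (e.g.,
  # sensitivity) has a 95% confidence interval of a chosen half-width.
  import math

  p = 0.85            # assumed true sensitivity (a planning guess)
  half_width = 0.05   # desired 95% CI half-width
  z = 1.96            # two-sided 95% critical value

  n_needed = math.ceil((z / half_width) ** 2 * p * (1 - p))
  se = math.sqrt(p * (1 - p) / n_needed)   # standard error of the estimate
  print("n needed:", n_needed, " standard error at that n:", round(se, 4))

The required N grows with the square of the desired precision: halve the
half-width and you need roughly four times the sample.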
"Multiple variables" opens a discussion that starts with setting
up your experiment: Have FEW "main" hypotheses. There can be
sub-hypotheses; there can be other, "frankly exploratory" results.
One approach for several variables is to use a Bonferroni correction;
the Power Table then might have to refer to a "nominal" alpha of 2.5%
or what-not, to correct for the multiple tests.
Another approach is to do a 'multivariate analysis' that tests
several hypotheses at once; that gets into other discussions of
how to properly consider multiple tests, since the OVERALL test
does not tell you about the relative import of different variables.
I've always recommended creating "composite scores" that combine
the main criteria -- if you can't just pick a single score.
I took part in a multi-million dollar study, several hundred patients
followed for two years, a dozen rating scales collected at multiple
time points ... where the main criterion for treatment success was
whether a patient had to be withdrawn from the trial because
re-hospitalization was imminent.
--
Rich Ulrich