
Feb 2, 2023, 2:05:50 AM
Hi:

I've found that many academic journals require submissions to report the statistical significance (as a p-value or confidence interval) of the results; however, it seems less common for a journal to require reporting the statistical power of the results. Why is that?

Should a "complete" report always include both statistical significance (p-value or alpha) and power (1 - beta)? What is the "practical" meaning of a power analysis? Say, would it be possible for the results to be non-significant but of high power? What would that mean in practice?

Feb 2, 2023, 3:18:52 AM
analyses and confidence intervals. Specifically not approximate confidence intervals, but the approach where points in the confidence interval are defined to be those not rejected by a significance test. So low power means a wide confidence interval.

But power analyses may not be thought of as being part of the "results" of an experiment, but rather something that goes before the experiment, being used to help decide on the design and sample size. After an initial experiment, you might do a power analysis to help you design the next one in terms of sample size, given that the power analysis will require assumptions or estimates of the sampling variation inherent in the experimental procedure.
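That design-stage use of power analysis can be sketched numerically. A minimal sketch, using the normal approximation to a two-sided two-sample z-test; the effect size and SD below are illustrative numbers, not anything from this thread:

```python
from math import sqrt, erf

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power_two_sample(n, delta, sigma, z_crit=1.959964):
    """Approximate power of a two-sided two-sample z-test with n per group,
    true mean difference delta, and common SD sigma."""
    ncp = delta / (sigma * sqrt(2.0 / n))   # expected z under the alternative
    return (1.0 - norm_cdf(z_crit - ncp)) + norm_cdf(-z_crit - ncp)

def n_for_power(target, delta, sigma):
    """Smallest per-group n whose approximate power reaches the target."""
    n = 2
    while power_two_sample(n, delta, sigma) < target:
        n += 1
    return n

# Illustrative inputs: detecting a difference of 0.5 SD at alpha = 0.05.
print(n_for_power(0.80, delta=0.5, sigma=1.0))   # -> 63 per group
print(n_for_power(0.95, delta=0.5, sigma=1.0))   # -> 104 per group
```

Note how the required n depends entirely on the assumed delta and sigma, which is exactly why the estimate of sampling variation has to come from somewhere (a pilot study, or the initial experiment) before the calculation means anything.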

Feb 2, 2023, 6:38:23 PM
probability" where "the power is estimated only after a statistical test has been performed, in order to evaluate the reproducibility of the test result", including:

Goodman SN, 1992. A comment on replication, p-values and evidence. Statistics in Medicine, 11, 875-879.

De Martini D, 2008. Reproducibility Probability Estimation for Testing Statistical Hypotheses. Statistics & Probability Letters, 78, 1056-1061.

Boos DD, Stefanski LA, 2011. P-Value Precision and Reproducibility. The American Statistician, 65(4), 213-221.

Feb 3, 2023, 4:00:24 PM
On Wed, 1 Feb 2023 23:05:48 -0800 (PST), Cosine <ase...@gmail.com> wrote:

> Hi:
>
> Found that many academic journals would require the submission
> report the statistical significance (in terms of p-value or
> confidence interval) of the results; however, it seems less often
> that a journal requires reporting the statistical power of the
> results. Why is that?

If you found something, obviously you had enough power.

In the US, the granting agencies of NIH want to hear what you have to say about power, to justify giving you money.

I remember a few things relevant about power and journals.

1970s - my stats professor told the class that The New England Journal of Medicine specified, 'Use /no/ p-levels' -- in an article he co-authored, reporting the results of a health survey of 30,000 people. Anything big enough to be interesting would be 'significant'. A number of non-interesting things also would be significant, at 0.05.

Years later, I analyzed a data set of similar size. I convinced the PI that the F-tests of 245 and 350 were the ones that were interesting. There were some ANOVA 2-way interactions that were p < 0.05 which were uninteresting -- some of them were the consequence of 'non-linear scaling' across 3 or 4 points on a rating scale, rather than any inherent interaction on the latent dimension being measured. So, we only reported p < .001, and (also) carefully dwelt on effect sizes.

In the opposite direction -- in one study, we did report a predicted one-tailed result at p < 0.05 for an interaction. The good journal we submitted to accepted our 'one-tailed test' (both 'one-tailed' and 'interaction' suggest ugly data-dredging) only because we could point to it as one of the (few) tests specified in advance in our research proposal.

I liked the European standard that I heard of, long ago -- I don't know how widespread it is/was -- they reported the "minimum N for which the test would be significant." I think that people who are not statisticians can relate to this more easily than to saying p < .05 and p < .001, or exact numbers. An experimental result that would be significant (.05) with N=10 is huge; one that requires N=500 is small.

(By the way, epidemiology results often /require/ huge N's because of small effects as measured by variance; that's why their effect sizes are reported as odds ratios. 'Effect size' based on N or power or p-level does not work well for rare outcomes.)

> Should a "complete" always include both statistical significance
> (p-value or alpha) and power ( 1-beta )? What are the "practical"
> meaning for power analysis? Say, would it be possible that the
> results are not significant, but of high power? What are the
> practical meaning for this situation?

As suggested elsewhere, high power gives a narrow confidence limit for the size of the actual effect in the experiment. Usually, "very close to zero difference in means."

--

Rich Ulrich


Feb 4, 2023, 2:02:37 PM
On Friday, February 3, 2023 at 4:00:24 PM UTC-5, Rich Ulrich wrote:

> If you found something, obviously you had enough power.

Rich, a former boss of mine made a statement similar to yours in a stats book he co-authored, and I challenged it (using simulation) in this short presentation:

https://www.researchgate.net/publication/299533433_Does_Statistical_Significance_Really_Prove_that_Power_was_Adequate
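A toy simulation in the same spirit (this is not the analysis in the linked slides -- the effect size, n, and test are made up for illustration): in a badly underpowered design, the runs that do reach significance necessarily overestimate the effect, so a significant result is not evidence that power was adequate.

```python
import random
from math import sqrt

random.seed(42)

TRUE_D, N, SIMS = 0.3, 10, 20_000    # small true effect, tiny groups: power ~0.10
se = sqrt(2.0 / N)                   # SE of the mean difference (sigma = 1 known)

sig_effects = []
for _ in range(SIMS):
    g1 = [random.gauss(TRUE_D, 1.0) for _ in range(N)]
    g2 = [random.gauss(0.0, 1.0) for _ in range(N)]
    d_hat = sum(g1) / N - sum(g2) / N
    if abs(d_hat / se) > 1.96:       # two-sided z-test at alpha = 0.05
        sig_effects.append(abs(d_hat))

print(len(sig_effects) / SIMS)                # ~0.10: power was clearly inadequate
print(sum(sig_effects) / len(sig_effects))    # yet significant runs report |d| near 1.0,
                                              # roughly triple the true 0.3
```

The significant runs exist, but only because sampling noise pushed the estimate far above the true effect -- "you found something" tells you about luck, not power.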

Cheers,

Bruce

Feb 5, 2023, 2:52:26 PM
Saying 'enough power' misleads the non-statistician reader, who might be tempted to replicate. Pointing to 'luck' is not a bad idea.

I usually tried this --

When a study shows something /barely/ at 0.05, then the power for replication is pretty close to 50% and not the 80% to 95% that most grant applications like to show. That's the simple logic of saying, "Any replication will come up weaker or stronger. If you are right at 0.05 in the first place, then weaker or stronger is equally likely -- 50% power, by definition."
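That 50% figure is easy to check by simulation, under the simplifying assumption that the replication's true effect equals the one originally observed, so the replication's z-statistic is centered exactly at the critical value:

```python
import random

random.seed(1)

Z_OBS = 1.96          # original result sits exactly at the two-sided 0.05 line
SIMS = 100_000

# Same design replicated: its z-statistic is N(Z_OBS, 1) if we take the
# observed effect at face value as the true one.
hits = sum(random.gauss(Z_OBS, 1.0) > 1.96 for _ in range(SIMS))
print(hits / SIMS)    # ~0.50: a coin flip, not the 80-95% grants promise
```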

IIRC, one rule of thumb I used was that to achieve the 5% chi-squared (1 d.f.) critical value of about 4.0, the expected effect-test that yields 80% power was the one that gives X2 = 8.0; so, twice the N, since X2 (2x2 contingency table, etc.) is linear in N. (IIRC. All from memory.)
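That rule of thumb can be sanity-checked by simulation. Reading 8.0 as the noncentrality of the 1-d.f. chi-squared (i.e., the underlying z is shifted by sqrt(8)) is my assumption, since the post is from memory:

```python
import random
from math import sqrt

random.seed(7)

CRIT = 3.841    # 5% critical value for chi-squared with 1 d.f.
LAM = 8.0       # rule-of-thumb target, read here as the noncentrality
SIMS = 100_000

# A 1-d.f. chi-squared statistic is z**2; under the alternative the
# underlying z is shifted by sqrt(noncentrality).
hits = sum(random.gauss(sqrt(LAM), 1.0) ** 2 > CRIT for _ in range(SIMS))
print(hits / SIMS)    # ~0.81, close to the 80% the rule targets
```

For comparison, the exact noncentrality for 80% power at the 3.84 cutoff is (1.96 + 0.8416)^2, about 7.85, so "double the N until the expected statistic is ~8" lands very close to 80% -- the memory holds up.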

--

Rich Ulrich
