
Feb 2, 2023, 2:05:50 AM
Hi:

I've found that many academic journals require submissions to report the statistical significance (as a p-value or confidence interval) of the results; however, it seems less common for a journal to require reporting the statistical power of the results. Why is that?

Should a "complete" report always include both statistical significance (p-value or alpha) and power (1 - beta)? What is the "practical" meaning of a power analysis? Say, would it be possible for the results to be non-significant but of high power? What would that mean in practice?

Feb 2, 2023, 3:18:52 AM
analyses and confidence intervals. Specifically not approximate confidence intervals, but the approach where points in the confidence interval are defined to be those not rejected by a significance test. So low power means a wide confidence interval.

But power analyses may not be thought of as being part of the "results" of an experiment, but rather something that goes before the experiment, being used to help decide on the design and sample size. After an initial experiment, you might do a power analysis to help you design the next one in terms of sample size, given that the power analysis will require assumptions or estimates of the sampling variation inherent in the experimental procedure.
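That design-stage use of power analysis can be sketched numerically. A minimal sketch, using the normal approximation to a two-sided two-sample z-test; the effect size and SD below are illustrative numbers, not anything from this thread:

```python
from math import sqrt, erf

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power_two_sample(n, delta, sigma, z_crit=1.959964):
    """Approximate power of a two-sided two-sample z-test with n per group,
    true mean difference delta, and common SD sigma."""
    ncp = delta / (sigma * sqrt(2.0 / n))   # expected z under the alternative
    return (1.0 - norm_cdf(z_crit - ncp)) + norm_cdf(-z_crit - ncp)

def n_for_power(target, delta, sigma):
    """Smallest per-group n whose approximate power reaches the target."""
    n = 2
    while power_two_sample(n, delta, sigma) < target:
        n += 1
    return n

# Illustrative inputs: detecting a difference of 0.5 SD at alpha = 0.05.
print(n_for_power(0.80, delta=0.5, sigma=1.0))   # -> 63 per group
print(n_for_power(0.95, delta=0.5, sigma=1.0))   # -> 104 per group
```

Note how the required n depends entirely on the assumed delta and sigma, which is exactly why the estimate of sampling variation has to come from somewhere (a pilot study, or the initial experiment) before the calculation means anything.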

Feb 2, 2023, 6:38:23 PM
probability" where "the power is estimated only after a statistical test has been performed, in order to evaluate the reproducibility of the test result", including:

Goodman SN, 1992. A comment on replication, p-values and evidence. Statistics in Medicine, 11, 875-879.

De Martini D, 2008. Reproducibility Probability Estimation for Testing Statistical Hypotheses. Statistics & Probability Letters, 78, 1056-1061.

Boos DD, Stefanski LA, 2011. P-Value Precision and Reproducibility. The American Statistician, 65(4), 213-221.

Feb 3, 2023, 4:00:24 PM
On Wed, 1 Feb 2023 23:05:48 -0800 (PST), Cosine <ase...@gmail.com> wrote:

> Hi:
>
> Found that many academic journals would require the submission
> report the statistical significance (in terms of p-value or
> confidence interval) of the results; however, it seems less often
> that a journal requires reporting the statistical power of the
> results. Why is that?

If you found something, obviously you had enough power.

In the US, the granting agencies of NIH want to hear what you have to say about power, to justify giving you money.

I remember a few things relevant about power and journals.

1970s - my stats professor told the class that The New England Journal of Medicine specified, 'Use /no/ p-levels' -- in an article he co-authored, reporting the results of a health survey of 30,000 people. Anything big enough to be interesting would be 'significant'. A number of non-interesting things also would be significant, at 0.05.

Years later, I analyzed a data set of similar size. I convinced the PI that the F-tests of 245 and 350 were the ones that were interesting. There were some ANOVA 2-way interactions that were p < 0.05 which were uninteresting -- some of them were the consequence of 'non-linear scaling' across 3 or 4 points on a rating scale, rather than any inherent interaction on the latent dimension being measured. So, we only reported p < .001, and (also) carefully dwelt on effect sizes.

In the opposite direction -- in one study, we did report a predicted one-tailed result at p < 0.05 for an interaction. The good journal we submitted to accepted our 'one-tailed test' (both 'one-tailed' and 'interaction' suggest ugly data-dredging) only because we could point to it as one of the (few) tests specified in advance in our research proposal.

I liked the European standard that I heard of, long ago -- I don't know how widespread it is/was -- they reported the "minimum N for which the test would be significant." I think that people who are not statisticians can relate to this more easily than to saying p < .05 and p < .001, or exact numbers. An experimental result that would be significant (.05) with N=10 is huge; one that requires N=500 is small.

(By the way, epidemiology results often /require/ huge N's because of small effects as measured by variance; that's why their effect sizes are reported as odds ratios. 'Effect size' based on N or power or p-level does not work well for rare outcomes.)

> Should a "complete" always include both statistical significance
> (p-value or alpha) and power ( 1-beta )? What are the "practical"
> meaning for power analysis? Say, would it be possible that the
> results are not significant, but of high power? What are the
> practical meaning for this situation?

As suggested elsewhere, high power gives a narrow confidence limit for the size of the actual effect in the experiment. Usually, "very close to zero difference in means."

--

Rich Ulrich


Feb 4, 2023, 2:02:37 PM
On Friday, February 3, 2023 at 4:00:24 PM UTC-5, Rich Ulrich wrote:

> If you found something, obviously you had enough power.

Rich, a former boss of mine made a statement similar to yours in a stats book he co-authored, and I challenged it (using simulation) in this short presentation:

https://www.researchgate.net/publication/299533433_Does_Statistical_Significance_Really_Prove_that_Power_was_Adequate
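A toy simulation in the same spirit (this is not the analysis in the linked slides -- the effect size, n, and test are made up for illustration): in a badly underpowered design, the runs that do reach significance necessarily overestimate the effect, so a significant result is not evidence that power was adequate.

```python
import random
from math import sqrt

random.seed(42)

TRUE_D, N, SIMS = 0.3, 10, 20_000    # small true effect, tiny groups: power ~0.10
se = sqrt(2.0 / N)                   # SE of the mean difference (sigma = 1 known)

sig_effects = []
for _ in range(SIMS):
    g1 = [random.gauss(TRUE_D, 1.0) for _ in range(N)]
    g2 = [random.gauss(0.0, 1.0) for _ in range(N)]
    d_hat = sum(g1) / N - sum(g2) / N
    if abs(d_hat / se) > 1.96:       # two-sided z-test at alpha = 0.05
        sig_effects.append(abs(d_hat))

print(len(sig_effects) / SIMS)                # ~0.10: power was clearly inadequate
print(sum(sig_effects) / len(sig_effects))    # yet significant runs report |d| near 1.0,
                                              # roughly triple the true 0.3
```

The significant runs exist, but only because sampling noise pushed the estimate far above the true effect -- "you found something" tells you about luck, not power.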

Cheers,

Bruce

Feb 5, 2023, 2:52:26 PM
Saying 'enough power' misleads the non-statistician reader, who might be tempted to replicate. Pointing to 'luck' is not a bad idea.

I usually tried this --

When a study shows something /barely/ at 0.05, then the power for replication is pretty close to 50% and not the 80% to 95% that most grant applications like to show. That's the simple logic of saying, "Any replication will come up weaker or stronger. If you are right at 0.05 in the first place, then weaker or stronger is equally likely -- 50% power, by definition."
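That 50% figure is easy to check by simulation, under the simplifying assumption that the replication's true effect equals the one originally observed, so the replication's z-statistic is centered exactly at the critical value:

```python
import random

random.seed(1)

Z_OBS = 1.96          # original result sits exactly at the two-sided 0.05 line
SIMS = 100_000

# Same design replicated: its z-statistic is N(Z_OBS, 1) if we take the
# observed effect at face value as the true one.
hits = sum(random.gauss(Z_OBS, 1.0) > 1.96 for _ in range(SIMS))
print(hits / SIMS)    # ~0.50: a coin flip, not the 80-95% grants promise
```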

IIRC, one rule of thumb I used was that to achieve the 5% chi-squared (1 d.f.) critical value of about 4.0, the expected effect-test that yields 80% power was the one that gives X2 = 8.0; so, twice the N, since X2 (2x2 contingency table, etc.) is linear in N. (IIRC. All from memory.)
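That rule of thumb can be sanity-checked by simulation. Reading 8.0 as the noncentrality of the 1-d.f. chi-squared (i.e., the underlying z is shifted by sqrt(8)) is my assumption, since the post is from memory:

```python
import random
from math import sqrt

random.seed(7)

CRIT = 3.841    # 5% critical value for chi-squared with 1 d.f.
LAM = 8.0       # rule-of-thumb target, read here as the noncentrality
SIMS = 100_000

# A 1-d.f. chi-squared statistic is z**2; under the alternative the
# underlying z is shifted by sqrt(noncentrality).
hits = sum(random.gauss(sqrt(LAM), 1.0) ** 2 > CRIT for _ in range(SIMS))
print(hits / SIMS)    # ~0.81, close to the 80% the rule targets
```

For comparison, the exact noncentrality for 80% power at the 3.84 cutoff is (1.96 + 0.8416)^2, about 7.85, so "double the N until the expected statistic is ~8" lands very close to 80% -- the memory holds up.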

--

Rich Ulrich
