Apr 29, 2021, 3:08:08 AM
We conducted a test on two groups (A and B). We used a

15-item scale to measure the results. A cut-off score of

6 (scores ranging from 0 to 15, with a higher score

indicating a stronger reaction) was set to differentiate

individuals with a clinical reaction from normal individuals.

The null hypothesis is that there is no difference between the two groups.

The alternative hypothesis is that the reaction of the members

of Group A is greater than that of Group B.

We defined the difference = score of A - score of B.

We chose alpha = 0.05.

We got the following data summarized in the table below.

Case Mean Difference P-value 95%CI N

1 0.15 0.001 0.05-0.25 2000

2 2.10 0.005 1.25-2.95 1200

3 1.30 0.089 -2.10-3.70 400

In addition to the following analysis, what else could we

draw from the data?

Case-1:

P-value < alpha -> significant

95% CI entirely > 0 -> A > B

Case-2:

the same conclusions as Case-1

Case-3:

P-value > alpha -> insignificant

95% CI contains 0 -> not sure if A > B or A < B
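The case-by-case reasoning above can be applied mechanically. A minimal Python sketch (not part of the original post) checking that, for each row of the table, the p-value rule and the CI rule agree:

```python
ALPHA = 0.05

# (case, mean difference, p-value, CI low, CI high, N), from the table above.
cases = [
    (1, 0.15, 0.001, 0.05, 0.25, 2000),
    (2, 2.10, 0.005, 1.25, 2.95, 1200),
    (3, 1.30, 0.089, -2.10, 3.70, 400),
]

for case, diff, p, lo, hi, n in cases:
    significant_by_p = p < ALPHA           # rule 1: compare p to alpha
    ci_excludes_zero = lo > 0 or hi < 0    # rule 2: does the 95% CI exclude 0?
    # For a test and CI built from the same statistic, the two rules must agree.
    print(f"Case {case}: significant={significant_by_p}, CI excludes 0={ci_excludes_zero}")
```

One caveat: the stated alternative is one-sided, while a 95% CI is the dual of a two-sided test at alpha = 0.05; the exact match for the one-sided test would be a one-sided 95% confidence bound.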


Apr 29, 2021, 11:07:53 AM
the results carefully, and think what other information you'd like to

have in order to make sense of it. I'd have several questions to ask,

starting with exactly what these 3 cases are (I'm pretty sure what

they're not).

Duncan

Apr 29, 2021, 12:29:24 PM
Cosine wrote on Thursday, April 29, 2021 at 3:08:08 PM [UTC+8]:

We could intuitively connect the P-value inference with the CI inference by: P-value < alpha <=> reject H0 <=> the (1-alpha) CI doesn't contain 0.

But is there a formal way to prove the latter part, i.e., making inference by the CI?

We could also draw a conclusion about clinical significance if we have additional information on a clinically meaningful value. Then we could

say that the result is clinically significant if 1) the CI contains that clinical measure, and 2) the width of the CI is narrow enough. Nevertheless,

are there ways to determine objectively whether the width of the CI is too wide?


Apr 29, 2021, 1:21:05 PM
general situation is to define the confidence interval to contain

exactly those values for which the significance test that the true

value is that particular value is not rejected. This is standard stuff

in any reliable textbook or statistics course.

> We could also draw the conclusion of clinical significance if we

> have additional information on a clinically meaningful value. Then we

> could say that the result is clinically significant if 1) the CI

> consists of that clinical measure, and 2) the width of the CI is

> narrow enough. Nevertheless, are there ways to determine if the width

> of the CI is too wide objectively?

importance if the confidence interval contains only values that are

large enough to be medically useful, and NO OTHERS. That last

stipulation replaces your concern about the confidence interval being

too wide.
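That criterion fits in one line of code. The sketch below is illustrative only; the minimal clinically important difference (mcid) of 1.0 scale points is a made-up placeholder, not a value from the thread:

```python
def clinically_important(ci_low: float, ci_high: float, mcid: float) -> bool:
    """True only when the entire CI lies at or above the minimal clinically
    important difference, i.e. the interval contains only values large
    enough to matter, and no others."""
    # ci_high is accepted for symmetry; only the lower bound can fail here.
    return ci_low >= mcid

# Hypothetical mcid of 1.0 points, applied to the three cases:
print(clinically_important(0.05, 0.25, 1.0))    # Case 1: significant but too small
print(clinically_important(1.25, 2.95, 1.0))    # Case 2: every plausible value matters
print(clinically_important(-2.10, 3.70, 1.0))   # Case 3: interval too wide to tell
```

This replaces the vague "is the CI too wide?" with a concrete question: does the interval stay above the clinically meaningful threshold?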

Apr 29, 2021, 1:58:02 PM
On Thu, 29 Apr 2021 00:08:06 -0700 (PDT), Cosine <ase...@gmail.com>

wrote:

>We conducted a test on two groups (A and B). We used a

>15-item scale to measure the results. A cut-off score of

> 6 (scores ranging from 0 to 15, with the higher score being

> indicative for stronger reaction) was set to differentiate

>the individuals with a clinical reaction from normal individuals.

>

> The null hypothesis is that the two groups have no difference.

>The alternative hypothesis is that the reaction of the members

>of Group A is greater than that of Group B.

>

> We defined the difference = score of A - score of B.

> We chose the alpha = 0.05

>

> We got the following data summarized in the table below.

>

>Case Mean Difference P-value 95%CI N

> 1 0.15 0.001 0.05-0.25 2000

> 2 2.10 0.005 1.25-2.95 1200

> 3 1.30 0.089 -2.10-3.70 400

>

> In addition to the following analysis, what else could we

>draw from the data?

Bad reporting. Is the N a total, for equal group sizes?

Whatever the "cases" are, they are vastly different in SD.

Perhaps Case 1 has scores near zero for all. Or: it would be more

sensible if Case 1 happened to report "Average item score"

whereas the others reported "Scale Total". That would make

the adjusted line for Case 1 read

1 2.25 0.001 0.75-3.75 2000

I haven't done calculations to be sure, but that does

seem like a large SE (on all three) for the reported Ns and

a 15 point scale.

Then too, some numbers have to be wrong. For Case

3, the mean difference is the midpoint of (-1.1, 3.7), not

of the reported (-2.1, 3.7). I assume -1.1 is correct.

But, more seriously, the test results (CI) are inconsistent with

the reported p-values. The SE for each comparison, the

denominator of the t-tests, is about 1/4th the range of the

CI. Using that for a close approximation gives me t-tests of

3.0, 4.94, and 1.08, respectively. The difference for case 2

is clearly the largest, and its implied p-value is smaller than the reported "p-value = 0.005".
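The approximation described above (SE about one quarter of the CI's range, t = mean difference / SE) can be checked in a few lines. A sketch using the large-sample normal approximation for the t distribution, and assuming the corrected lower bound of -1.10 for Case 3:

```python
from math import erf, sqrt

def two_sided_p(t: float) -> float:
    """Two-sided p-value for a t statistic, using the standard normal
    approximation (adequate at these sample sizes)."""
    phi = 0.5 * (1.0 + erf(t / sqrt(2.0)))   # standard normal CDF
    return 2.0 * (1.0 - phi)

# (mean difference, reported p, CI low, CI high)
cases = [(0.15, 0.001, 0.05, 0.25),
         (2.10, 0.005, 1.25, 2.95),
         (1.30, 0.089, -1.10, 3.70)]

for diff, reported_p, lo, hi in cases:
    se = (hi - lo) / 4.0        # a 95% CI spans roughly +/- 2 SE
    t = diff / se
    print(f"t = {t:.2f}, implied p = {two_sided_p(t):.4f}, reported p = {reported_p}")
```

This reproduces the 3.0, 4.94, and 1.08, and shows the p implied for Case 2 is far below the reported 0.005.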

>

> Case-1:

> P-value < alpha -> significant

> 95%CI all > 0 -> A > B

>

> Case-2:

> the same as those of A

>

> Case-3:

> P-value > alpha -> insignificant

> 95%CI consists of 0 -> not sure if A > B or A < B

If this is a homework assignment, as Duncan suggests,

you should give credit where credit is due.

--

Rich Ulrich


Apr 29, 2021, 2:45:01 PM
Rich Ulrich wrote on Friday, April 30, 2021 at 1:58:02 AM [UTC+8]:

This has nothing to do with homework whatsoever.

The table came from Table I of this following paper.

Aarts, S., B. Winkens and M. van den Akker (2012). "The insignificance of statistical significance." European Journal of General Practice 18(1): 50-52.

But the 95% CI of case 3 was printed as: 21.10-3.70.


Apr 29, 2021, 9:43:03 PM
On Thu, 29 Apr 2021 11:44:58 -0700 (PDT), Cosine <ase...@gmail.com>

wrote:

>Rich Ulrich wrote on Friday, April 30, 2021 at 1:58:02 AM [UTC+8]:

Without looking, I would guess that I correctly nailed the

distinction of Case 1 vs. 2 and 3. And they were trying to make

a point which turns out to be a point about incompetent readers.

I'm reminded of an article I read, maybe 1985, that documented

the surprisingly high error rate for footnotes to scientific studies.

(That is, where references cited gave the wrong page, named the

journal wrong, or whatever). The next issue of the journal

included a note that apologized for three errors in the footnotes

of that article.

Or, to the point: I don't have much respect for people who talk

about "the insignificance of statistical significance".

It doesn't surprise me a bit that they carelessly screwed up a table

both logically and typographically, because such people are not

careful people.

>

>But the 95% CI of case 3 was printed as: 21.10-3.70.

Okay. You guessed wrong on the correction. It was -1.10,

not 21.10 or -2.10.

--

Rich Ulrich


Apr 30, 2021, 1:49:53 AM
Cosine <ase...@gmail.com> wrote:

> In addition to the following analysis, what else could we

> draw from the data?

A different way of thinking about what a P-value is telling you is via the

literature on estimation or calibration of posterior P-values, e.g. Sellke

et al. (2001). The argument is basically seen for a result with P=0.05,
et al (2001). The argument is basically seen for a result with P=0.05,

when you have set alpha=0.05 - if what you saw is the true effect size,

then you only have a 50% chance of getting a significant result if you

repeated exactly the same study (same N etc).

For simple states of affairs,

"...here is the basic and surprising conclusion for normal testing, first

established (theoretically) by Berger and Sellke (1987). Suppose it is

known, a priori, that about 50% of the drugs tested have a negligible

effect. (We shortly consider the more general case.) Then:

"1. Of the Di for which the p value ~ .05, at least 23% (and typically

close to 50%) will have negligible effect.

"2. Of the Di for which the p value ~ .01, at least 7% (and typically

close to 15%) will have negligible effect.

If H0 and H1 have equal prior probabilities of 1/2, Sellke et al give

alpha(p) = 1/(1 + 1/(-e p log(p)))

as the posterior probability of H0, and as a frequentist calibration of

p. This is only simple for "precise" alternative hypotheses, obviously.
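The calibration is easy to evaluate numerically. A quick sketch (assuming the natural logarithm, as in Sellke et al., and valid for p < 1/e):

```python
from math import e, log

def calibrated_posterior(p: float) -> float:
    """Sellke et al.'s lower-bound calibration: posterior probability of H0
    under equal priors, alpha(p) = 1 / (1 + 1 / (-e * p * log(p)))."""
    return 1.0 / (1.0 + 1.0 / (-e * p * log(p)))

for p in (0.05, 0.01, 0.001):
    print(f"p = {p}: P(H0 | data) >= {calibrated_posterior(p):.3f}")
```

At p = 0.05 this lower bound is about 0.29, consistent with the quoted "at least 23%" figure.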

Relatedly, in genetic linkage analysis, where we set the critical

alpha to 0.0003 (chosen because there are 22 (pairs of) chromosomes),

the power to replicate a *true* finding using the same size and type

dataset (with P close to 0.0003) is ~20% (obtained via simulations).

You can think about the three results in your example and the

"replication crisis" through this lens.

Apr 30, 2021, 8:42:20 AM
How do we determine if the width of the CI is adequate or too wide?

The corrected data of Table I is given below:

Case Mean Difference P-value 95%CI N

1 0.15 0.001 0.05-0.25 2000

2 2.10 0.005 1.25-2.95 1200

3 1.30 0.089 -1.10-3.70 400

For the data provided by the above paper, the author wrote:

Let us reconsider the above-mentioned hypothetical study. The null hypothesis states that the mean difference between females and males on the GDS-15 (scale ranging from 0 to 15) is zero. Hence, if zero is detected in the 95% CI, the null hypothesis is not rejected. Examples of possible study results, using an α of 5%, are displayed in Table I. ...

Example 2 is not only statistically significant but also clinically relevant; the difference between females and males on the GDS-15 is approximately two whole points. Moreover, the confidence interval is quite

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

narrow, which indicates that the sample size is large enough to make a proper judgement.

^^^^^^^^^

What is the basis for the author to make this judgment?

The author also wrote:

Example 3 is not statistically significant. The confidence interval in this example is very large (almost six

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

points), which makes it difficult to draw any firm conclusions. Since the confidence interval in this

^^^^^^^

Again, why could the author make this statement? What did it mean by almost 6 points?

example includes both negative and positive values, it is not yet clear if there is a difference between these two groups (if females report more depressive symptoms than males or vice versa). Consequently, this study should be repeated using a larger sample size, which will decrease the width of the confidence interval.


May 1, 2021, 7:36:06 PM
On Fri, 30 Apr 2021 05:42:17 -0700 (PDT), Cosine <ase...@gmail.com>

wrote:

>How do we determine if the width of the CI is adequate or too wide?

>

> The corrected data of Table I is given below:

>

>Case Mean Difference P-value 95%CI N

> 1 0.15 0.001 0.05-0.25 2000

> 2 2.10 0.005 1.25-2.95 1200

> 3 1.30 0.089 -1.10-3.70 400

>

Here's some computation showing Cohen's d for each Case.

Cohen's d is the usual recommendation for two-group

comparisons of effect size. That seems very relevant to the

reported title of that paper.

Cohen's d = (m1-m2) / s_w for the Means and Within SD.

The s_w can be recovered from the t-test: note, the t is

incorporated in the computation of the CI, approximately

+/- 2 (easier than 1.96) for the 95% CI.

t-test t= (m1-m2)/ s_diff where I compute the standard error of

the difference, using the common s_w for Case 3, N= 400 as 200+200:

The variance of a difference is equal to the sum of the variances,

thus,

s_diff= sqrt( s_w**2 /200 + s_w**2 /200)

= sqrt( 2* s_w**2 /200)

= s_w /10

Or, s_w= 10* s_diff .

For Case 3, the range for +/- 1.96 is about 4* s_diff.

For Case 3, the range is 4.8, so that s_diff is 1.2.

Thus s_w is computed as 10 times that, or 12.

Cohen's d would be a "small" effect, 0.11 (from 1.3/12); but

that is less relevant than the fact that "12" is impossible as the

SD for scores between (0,15) -- If all scores are at 0 and

15, equally distributed, the maximum SD of 7.5 is achieved,

as you get by re-scaling of a 0-1 variable to 0-15.

Computations for Cases 1 and 2 get s_w's of 1.12 and 7.36

(nearly the max of 7.5); and Cohen's d's, respectively, of 0.13

and 0.29. Case 2 has a moderate difference.
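The recovery above (s_diff from the CI width, s_w from the group sizes, then d = difference / s_w) can be reproduced directly. A sketch using the same +/- 2 SE approximation and assuming equal group sizes:

```python
from math import sqrt

def recover_sd_and_d(diff: float, ci_low: float, ci_high: float, n_total: int):
    """Recover the within-group SD and Cohen's d from a two-group 95% CI,
    assuming equal group sizes and CI = diff +/- 2 * SE(diff)."""
    se_diff = (ci_high - ci_low) / 4.0
    n_per_group = n_total / 2
    s_w = se_diff / sqrt(2.0 / n_per_group)   # since SE(diff) = s_w * sqrt(2/n)
    return s_w, diff / s_w

for case, diff, lo, hi, n in [(1, 0.15, 0.05, 0.25, 2000),
                              (2, 2.10, 1.25, 2.95, 1200),
                              (3, 1.30, -1.10, 3.70, 400)]:
    s_w, d = recover_sd_and_d(diff, lo, hi, n)
    print(f"Case {case}: s_w = {s_w:.2f}, Cohen's d = {d:.2f}")
```

This reproduces the 1.12, 7.36, and 12 above; the Case 3 value of 12 exceeds the maximum possible SD of 7.5 for a 0-15 scale, which is the post's point.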

I don't like to criticize a paper from a distance, that is, without

actually reading it. I'm using the numbers and description,

as given.

Am I all confused, and screwing up? or is this example, as

it has been presented, totally bad?

>For the data provided by the above paper, the author wrote:

>

>Let us reconsider the above-mentioned hypothetical study. The null hypothesis states that the mean difference between females and males on the GDS-15 (scale ranging from 0 to 15) is zero. Hence, if zero is detected in the 95% CI, the null hypothesis is not rejected. Examples of possible study results, using an ? of 5%, are displayed in Table I. ...

Females rate higher on typical depression scales (U.S.)

because of non-depressive artifacts, like, TALKING more

with people about everything, including mood. Women

also see doctors more often, which is not entirely accounted

for by pregnancy or menstruation. Thus, such results should

be followed by showing that there are items that

/matter/ that are relevant and differ.

>

> The author also wrote:

>Example 3 is not statistically significant. The confidence interval in this example is very large (almost six

> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>points), which makes it difficult to draw any firm conclusions. Since the confidence interval in this

>^^^^^^^

> Again, why could the author make this statement? What did it mean by almost 6 points?

That's what he calls 4.8. "Clumsy" makes many mistakes.

"Careless" fails to catch them.

>

>example includes both negative and positive values, it is not yet clear if there is a difference between these two groups (if females report more depressive symptoms than males or vice versa). Consequently, this study should be repeated using a larger sample size, which will decrease the width of the confidence interval.

>

They should have started with real data.

--

Rich Ulrich

