# Sample Size Again!


### John Whittington

Feb 8, 2022, 5:18:12 PM
Hi folks, I hope that all is well with you all, and all of your
'yours', and (a little belatedly) that you all have a healthy, happy
and successful 2022.

This time, it's a pretty basic and straightforward 'sample size'
issue (essentially just t-tests), and I'm just seeking reassurance!

I'm playing with some sample size estimates for paired differences
from hypothesised Normal distributions with various variances. I think
that one-sided tests are appropriate, although it actually makes very
little difference with my figures.

With a simplified, but typical, example, I have a distribution with a
mean of 26 and an SD of 13. With that, I think that the sample size
required to give 95% power to reject H0: difference <= 0 with
(one-tailed) p = 0.0001 is about N = 14 (differences).

Moving on from that, what if I want the sample size to reject, say,
H0: difference <= 20 (with the same parameters)? If I have got it
right, that results in a sample size estimate of about N = 19 - and,
similarly, about N = 56 to reject H0: difference <= 10 (again, with
the same parameters). Is that correct? If not, "please advise"!

Thanks for any reassurance (and/or 'education'!).

Kind Regards,
John

----------------------------------------------------------------
Dr John Whittington, Voice: +44 (0) 1296 730225
Mediscience Services Fax: +44 (0) 1296 738893
Twyford Manor, Twyford, E-mail: Joh...@mediscience.co.uk
Buckingham MK18 4EL, UK
----------------------------------------------------------------

### Marc Schwartz

Feb 8, 2022, 5:46:30 PM
to John Whittington, meds...@googlegroups.com
Hi John,

Good to hear from you!

Here are the three results using R:

```r
> power.t.test(n = NULL, delta = 26, sd = 13,
               sig.level = 0.0001,
               power = 0.95, type = "one.sample",
               alternative = "one.sided")

     One-sample t test power calculation

              n = 13.74222
          delta = 26
             sd = 13
      sig.level = 1e-04
          power = 0.95
    alternative = one.sided

> power.t.test(n = NULL, delta = 20, sd = 13,
               sig.level = 0.0001,
               power = 0.95, type = "one.sample",
               alternative = "one.sided")

     One-sample t test power calculation

              n = 18.87723
          delta = 20
             sd = 13
      sig.level = 1e-04
          power = 0.95
    alternative = one.sided

> power.t.test(n = NULL, delta = 10, sd = 13,
               sig.level = 0.0001,
               power = 0.95, type = "one.sample",
               alternative = "one.sided")

     One-sample t test power calculation

              n = 55.51463
          delta = 10
             sd = 13
      sig.level = 1e-04
          power = 0.95
    alternative = one.sided
```

So, yes, you appear to be on the right track... :-)

Regards,

Marc
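[For readers without R, Marc's figures can be roughly cross-checked with nothing but the Python standard library, using the textbook normal approximation n ≈ ((z_{1-alpha} + z_{1-beta}) * sd / delta)^2. This is only a sketch, not what `power.t.test` computes: it swaps the noncentral-t calculation for normal quantiles, and at an alpha as extreme as 0.0001 with small n (where the t critical value is far larger than the normal one) it noticeably underestimates n.]

```python
from statistics import NormalDist

def approx_n(delta, sd, alpha=0.0001, power=0.95):
    """Normal-approximation sample size for a one-sided, one-sample test.

    Underestimates the exact noncentral-t answer when alpha is tiny
    and the resulting n is small.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    return ((z(1 - alpha) + z(power)) * sd / delta) ** 2

for delta in (26, 20, 10):
    print(delta, round(approx_n(delta, sd=13), 1))
# gives roughly 7.2, 12.2 and 48.6, versus power.t.test's
# 13.7, 18.9 and 55.5 above
```

[The approximate and exact values converge as n grows, since the t distribution approaches the normal.]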

### John Whittington

Feb 8, 2022, 5:58:11 PM
Hi Marc,

Many thanks for your very rapid response.  That sounds reassuring enough for me - in fact, once one has rounded up your figures, they couldn't possibly be any more reassuring :-)

Apologies for asking such a simple question!

Kindest Regards,
John
--
--
To post a new thread to MedStats, send email to MedS...@googlegroups.com .

---
You received this message because you are subscribed to the Google Groups "MedStats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to medstats+u...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/medstats/etPan.6202f2bf.73dbb9d1.13655%40me.com .

### Abhaya Indrayan

Feb 8, 2022, 11:45:07 PM
The sample size calculations done by Marc are for detecting a mean difference of 26, 20, and 10 respectively with a power of 95% and significance level 0.0001 when SD = 13. I am not clear whether John's question was this. In his first case, Ho: (mean) difference <= 0, in the second case Ho: (mean) difference <= 20, and in the third case Ho: (mean) difference <= 10. Note the mean difference in the first case.

I formulate a sample size problem, in the case of testing, in terms of how much effect is to be detected (if present), denoted by delta.

~Abhaya

### Rich Ulrich

Feb 9, 2022, 12:44:25 AM
I have the same problem as Abhaya. John does not describe the
problem in a consistent way, so I was happy to see that someone
made sense of it.

The problem as stated said that (reordering) for H0s of
<0, <10, and <20, the Ns would be 14, 56, and 19. Not good.

As solved, it would be for deltas <26, <20, <10, yielding Ns
that are consistent: 14, 19 and 56.

--
Rich Ulrich


### John Whittington

Feb 9, 2022, 11:54:30 AM
At 04:44 09/02/2022, Abhaya Indrayan wrote:
The sample size calculations done by Marc are for detecting a mean
difference of 26, 20, and 10 respectively with a power of 95% and
significance level 0.0001 when SD = 13. I am not clear whether
John's question was this. In his first case, Ho: (mean) difference <=
0, in the second case Ho: (mean) difference <= 20, and in the third
case Ho: (mean) difference <= 10. Note the mean difference in the
first case. .... I formulate a sample size problem in case of testing
as to how much effect is to be detected (if present), denoted by delta.

At 05:44 09/02/2022, Rich Ulrich wrote:
I have the same problem as Abhaya. John does not describe the problem
in a consistent way, so I was happy to see that someone made sense of it.
The problem as stated said that (reordering) for H0s of <0, <10, and
<20, the Ns would be 14, 56, and 19. Not good.
As solved, it would be for deltas <26, <20, <10, yielding Ns that
are consistent, 14, 19 and 56.

Thanks both. This goes to show that 'simple questions' are not
necessarily all that simple (particularly when badly formulated!),
and perhaps explains why I felt the need to ask the question! As
evidence of that, I'm now getting myself rather confused, and it seems
that my problem relates to the way in which I specified the null hypotheses.

Whether because he is a psychic or whatever, Marc's calculations
corresponded to what I was talking about (and the calculations I had
done myself) - in prose, "the sample sizes needed for one to be
("99.99%") confident that the population mean difference was (a) >0,
(b) >10 and (c) >20, respectively" - and it stands to reason that the
required sample sizes would increase as one moved from (a) through
(b) to (c) - per both my and Marc's calculations. It therefore seems
that the problem is simply that I expressed the H0s incorrectly - is
that the case?

I have more to say/ask, but it's probably best if I first wait for
comments at this stage before saying anything more (or making more of
a fool of myself!).

Kindest Regards,
John

### Abhaya Indrayan

Feb 9, 2022, 8:10:54 PM
John:

As I mentioned earlier, the way I formulate a sample size problem in the case of testing of hypotheses is in terms of what effect size is aimed to be detected (if present). The power depends on a specific value under the alternative, not on 'more than theta' or 'less than theta'. Power requires an exact value of delta (although it may imply generalisation to less than or more than). Whereas the significance level depends on the null, the power depends on the alternative.

May I suggest formulating the problem in terms of what effect is aimed to be detected?

Regards.

~Abhaya


### John Whittington

Feb 9, 2022, 8:30:22 PM
At 01:10 10/02/2022, Abhaya Indrayan wrote:
As I mentioned earlier, the way I formulate a sample size problem in the case of testing of hypothesis is what effect size is aimed to be detected (if present). The power depends on a specific value under the alternative and not more than theta or less than theta. Power requires an exact value of delta  (although it may imply generalization to less than or more than). Whereas the significance level depends on the null, the power depends on the alternative. ... May I suggest to formulate the problem as to what effect is aimed to be detected?

Thanks Abhaya.  As I implied, I think my problem is in formulating/stating the (null and alternative) hypotheses when the data in question consist of within-pair differences, which (at least to me) makes the concept of 'effect size' less clear cut.

Taking one of my example scenarios, there is no doubt that the hypothesis I want to reject is "mean within-pair difference <= 10" - so, if that is my null hypothesis, I would (maybe naively) assume that the corresponding alternative hypothesis would simply be "mean within-pair difference > 10".  Is that not correct?

I think I have adequately explained what I want to achieve - namely, to determine the sample size required to have 95% power to show that the mean within-pair difference is not 10 or less, with p = 0.0001.  How do you think I should approach that?

Kind Regards,
John
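[One way to make the shifted null concrete - an editorial sketch, not a resolution the thread itself reaches: if the assumed true mean difference is mu1 and the null boundary is m0, the delta that enters a power calculation for rejecting H0: mean <= m0 is mu1 - m0, the distance of the true mean from the boundary, rather than m0 itself. A normal-approximation sketch in Python (standard library only; it underestimates the exact noncentral-t n at such a small alpha):]

```python
from statistics import NormalDist

def approx_n_shifted(mu1, m0, sd, alpha=0.0001, power=0.95):
    """Approximate n to reject H0: mean <= m0 (one-sided) when the
    true mean is mu1 - normal approximation only."""
    z = NormalDist().inv_cdf
    delta = mu1 - m0  # effect size = distance of true mean from the null boundary
    return ((z(1 - alpha) + z(power)) * sd / delta) ** 2

# Illustrative only: assumed true mean 26, SD 13, null boundaries 0, 10, 20
for m0 in (0, 10, 20):
    print(m0, round(approx_n_shifted(26, m0, 13), 1))
```

[Under this reading the required n grows as the boundary approaches the assumed true mean (roughly 7, 19 and 135 here), the opposite ordering from Marc's calculations, which detect deltas of 26, 20 and 10 against a boundary of 0.]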


### Rich Ulrich

Feb 10, 2022, 12:14:35 AM
A power statement has a set of parameters; you GIVE all but
the one you will solve for. You GIVE a test by stating both the
name of the test and the alpha.

Thus, I work from something like this.

For a given test and alpha error [one-tailed t-test at p= 0.0001],
what is the sample size [solve for N]
required to GIVE a stated power [95%]
for this GIVEN effect size?  [Three: 26/13; 20/13; 10/13]

For clinical psychiatric research, I most often provided the PI
with tables for a 5% test at 80% power and 90% or 95% power,
showing Ns for several effect sizes.

By the way, none of my power analyses ever used alpha smaller
than 1%, but I am aware that the t and F distributions are
increasingly LESS accurate in real data for alphas smaller than that.
Does this bother folks who want to use p= 0.0001?

I think the discrepancy in two-group t occurs when there are
short-tail or long-tail distributions in the samples, which (IIRC)
yield the opposite excesses for the t's.  (On randomly generated
samples, excess big t's arise when the 'random' standard deviations
happen to be small.)   I never checked what happens for one-sample t.

--
Rich Ulrich


### Abhaya Indrayan

Feb 10, 2022, 5:54:27 AM
I would like to respond to John as follows:

In your case, the effect size is the mean difference. Let us keep that aside.

To detect a minimum mean difference delta = 10 with a power of at least 95% and significance level (one-tailed) 0.0001, I get a sample size of a minimum of 49 when SD = 13. This sample size will almost surely not miss a mean difference of 10 or more (if present) but can miss it if the mean difference is less than 10.

The other point is that the sample size calculations are for detecting a specified effect when present. If it is not present, no sample size, however big, will be able to detect it. That is my understanding and I hope I am not wrong.

~Abhaya
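[A hedged aside on the figure of 49: the normal approximation to this sample size gives about 48.6 for delta = 10 (hence "a minimum of 49" once rounded up), whereas the noncentral-t calculation in `power.t.test` gives 55.5. If - and this is an assumption, not something stated in the thread - a normal-based formula was used here, that would account for the 49-versus-56 discrepancy John raises in his reply. A standard-library check:]

```python
from statistics import NormalDist

z = NormalDist().inv_cdf
alpha, power, sd, delta = 0.0001, 0.95, 13, 10

# Normal-approximation sample size, one-sided one-sample test:
n_normal = ((z(1 - alpha) + z(power)) * sd / delta) ** 2
print(round(n_normal, 1))  # ~48.6, i.e. a minimum of 49 once rounded up
```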

### John Whittington

Feb 10, 2022, 11:02:11 AM
Thanks again, Rich.  However, I'm getting more confused, because I totally agree with everything you've written, and hence am not sure what point you are making.  Perhaps you can help me understand?

At 05:14 10/02/2022, Rich Ulrich wrote:
A power statement has a set of parameters; you GIVE all but the one you will solve for. You GIVE a test by stating both the name of the test and the alpha.

Totally agreed.

Thus, I work from something like this.  For a given test and alpha error [one-tailed t-test at p= 0.0001],  what is the sample size [solve for N] required to GIVE a stated power [95%]
for this GIVEN effect size?  [Three: 26/13; 20/13; 10/13]

Is that not exactly what I did (perhaps give or take the discussion about specification of the effect sizes)?

For clinical psychiatric research, I most often provided the PI with tables for a 5% test at 80% power and 90% or 95% power,  showing Ns for several effect sizes.

For all clinical research (and in relation to the matter I'm discussing here), I do exactly the same, usually graphically as well as in tabular form - and often (as in the current case) having to extend that to illustrate the effect of different SDs, different levels of alpha and, sometimes, both one- and two-tailed tests.

By the way, none of my power analyses ever used alpha smaller than 1%, but I am aware that the t and F distributions are increasingly LESS accurate in real data for alphas smaller than that.
Does this bother folks who want to use p= 0.0001?

Like you, in terms of clinical research, I don't think I would ever use alpha less than 1%, and it is very rare for me to use powers greater than 90%.  However, those conventions arise primarily out of practicalities, given the usually 'modest' (or worse!) effect sizes and the practical, ethical and cost (in the widest sense) considerations which preclude massive trials.  If we could undertake clinical trials of a realistic size when designed with a power of 99% and an alpha of 0.0001, I'm sure that would become the accepted/conventional practice - but that, of course, is not 'how it is'!!

I maybe muddied the waters by mentioning 'real' (in context) parameters, since I could just as easily have asked my question in terms of 80% power and alpha = 0.05.  The main feature of what I am talking about is that the expected effect size is massive, far greater than anything that would be seen clinically - thereby facilitating the hypothetical (but impossible) 'Utopian' scenario I mention above in relation to clinical research.

I didn't think it was going to be necessary to explain the context, but now it seems that I need to.  The 'differences' are between two measurements of the same quantity (by allegedly, but clearly not actually, very similar methodologies).  There is very strong circumstantial evidence to suggest that the mean difference will probably be around 26 units, but currently with an unknown distribution of the difference.

If, as I suspect, the difference is pretty consistent (i.e. pretty small SD), then the required sample size would be tiny for almost any credible power and alpha (even ones as extreme as I mentioned) for effect sizes of interest.  However, I have been asked to consider 'worst case scenarios' and to indicate what sample sizes would be required to provide "overwhelming evidence" of the difference in those scenarios, both in relation to 'any' (positive) difference (i.e. difference >0) and also differences "not less than X".

So, what I presented to you was the most extreme case of the wide range of scenarios I intend to present, such that (in terms of the 'worst cases' and the desire for 'overwhelming evidence') I can conclude something along the lines of "EVEN IF the SD were as great as 13 units, and EVEN IF one wanted 95% power to detect a specified level of difference with p = 0.0001, then the sample size required would be ("only"!!) N".  I say "only" because, if my suspicion that the SD will be pretty small is correct, then that N is likely to be very small even with such 'extreme' parameters.

Does that help, or alter what you feel you should say to me?

Kindest Regards,
John


### John Whittington

Feb 10, 2022, 12:42:09 PM
At 10:54 10/02/2022, Abhaya Indrayan wrote:
To detect a minimum mean difference delta = 10 with a power of at least 95% and significance level (one-tail) 0.0001, I get a sample size of a minimum of 49 when the SD = 13. This sample size almost surely will not miss a mean difference of 10 or more (if present) but can miss if the mean difference is less than 10.

Agreed, albeit Marc and I got 56 rather than 49.

The other point is that the sample size calculations are for detecting a specified effect when present. If it is not present, no sample size, however big, will be able to detect it. That is my understanding and I hope I am not wrong.

That is obviously true, provided that "not present" means EXACTLY zero.  If the effect is non-zero (even if incredibly close to zero), one can get whatever power one wants, with any alpha, to detect the effect if the sample size is large enough.

For example, if one can believe my software with such extreme parameters, with our SD of 13, a power of at least 95% could be achieved to detect an effect of 0.0001, using a one-tailed t-test with alpha = 0.0001, with a sample size of N = 5,178,280,000.

Kindest Regards,
John

On Thu, Feb 10, 2022 at 10:44 AM Rich Ulrich <rich-...@live.com> wrote:
A power statement has a set of parameters; you GIVE all but
the one you will solve for. You GIVE a test by stating both the
name of the test and the alpha.

Thus, I work from something like this.Â Â

For a given test and alpha error [one-tailed t-test at p= 0.0001],
what is the sample size [solve for N]
required to GIVE a stated power [95%]
for this GIVEN effect size?Â  [Three: 26/13; 20/13; 10/13]

For clinical psychiatric research, I most often provided the PI
with tables for a 5% test at 80% power and 90% or 95% power,
showing Ns for several effect sizes.

By the way, none of my power analyses ever used alpha smaller
than 1%, but I am aware that the t and F distributions are
increasingly LESS accurate in real data for alphas smaller than that.
Does this bother folks who want to use p= 0.0001?

I think the discrepancy in two-group t occurs when there are
short-tail or long-tail distributions in the samples, which (IIRC)
yield the opposite excesses for the t's.Â  (On randomly generated
samples, excess big t's arise when the 'random' standard deviations
happen to be small.) Â  I never checked what happens for one-sample t.

--
Rich Ulrich

From: meds...@googlegroups.com < meds...@googlegroups.com> on behalf of Abhaya Indrayan <a.ind...@gmail.com>
Sent: Wednesday, February 9, 2022 8:10 PM
Subject: Re: {MEDSTATS} Sample Size Again!

Â
John:

As I mentioned earlier, the way I formulate a sample size problem in the case of testing of hypothesis is what effect size is aimed to be detected (if present). The power depends on a specific value under the alternative and not more than theta or less than theta. Power requires an exact valueÂ of deltaÂ  Â  (although it may imply generalization to less than or more than). Whereas the significance levelÂ depends on the null, the power depends on the alternative.

May I suggest to formulate the problem as to what effect is aimed to be detected?

Regards.

~Abhaya

On Wed, Feb 9, 2022 at 10:24 PM John Whittington <Joh...@mediscience.co.uk > wrote:
At 04:44 09/02/2022, Abhaya Indrayan wrote:
The sample size calculations done by Marc are for detecting a mean
difference of 26, 20, and 10 respectively with a power of 95% and
significance level 0.0001Â  when SD = 13. I am not clear whether
John's question was this. In his first case, Ho: (mean) difference <=
0, in the second case Ho: (mean) difference <= 20, and in the third
case Ho: (mean) difference <= 10. Note the mean difference in the
first case. .... I formulate a sample size problem in case of testing
as to how much effect is to be detected (if present), denoted by delta.

At 05:44 09/02/2022, Rich Ulrich wrote:
I have the same problem as Abhaya. John does not describe the problem
in a consistent way, so I was happy to see that someone made sense of it.
The problem as stated said that (reordering) for H0s of <0, <10, and
<20, the Ns would be 14, 56, and 19. Not good.
As solved, it would be for deltasÂ  <26, <20, <10, yielding Ns that
are consistent, 14, 19 and 56.

Thanks both.Â  This goes to show that 'simple questions' are not
necessarily all that simple (particularly when badly formulated!),
and perhaps explains why I felt the need to ask the question!Â  As
evidence of that, I'm now getting myself rather confused,and it seems
that my problem relates to the way in which I specified the null hypotheses.

Whether because he is a psychic or whatever, Marc's calculations
corresponded to what I was talking about (and the calcuklations I had
done myself) - in prose "the sample sizes needed for one to be
("99.99%") confident that the population mean difference was (a) >0,
(b) >10 and (c) >20, respectively " and it stands to reason that the
required sample sizes would increase as one moved from (a) through
(b) to (c) - per both my and Marc's calculationsÂ  It therefore seems
that the problem is simply that I expressed the H0s incorrectly - is
that the case?

I have more to say/ask, but it's probably best if I first wait for
comments at this stage before saying anything more (or making more of
a fool of myself!).

Kindest Regards,
John

On Wed, Feb 9, 2022 at 4:28 AM John Whittington
< Joh...@mediscience.co.uk> wrote:
>Hi Marc,
>
>Many thanks for your very rapid response.Â  That sounds reassuring
>enough for me - in fact, once one has rounded up your figures, they
>couldn't possibly be any more reassuring :-)
>
>Apologies for asking such a simple question!
>
>Kindest Regards,
>John
>
>At 22:46 08/02/2022, 'Marc Schwartz' via MedStats wrote:
>>Hi John,
>>
>>Good to hear from you!
>>
>>Here are the three results using R:
>>
>> > power.t.test(n = NULL, delta = 26, sd = 13,
>>
>>Â  Â  Â  Â  Â  Â  Â  Â  sig.level = 0.0001,
>>Â  Â  Â  Â  Â  Â  Â  Â  power = 0.95, type = "one.sample",
>>Â  Â  Â  Â  Â  Â  Â  Â  alternative = "one.sided")
>>
>>Â  Â  Â  One-sample t test power calculation
>>
>>Â  Â  Â  Â  Â  Â  Â  Â n = 13.74222
>>Â  Â  Â  Â  Â  Â delta = 26
>>Â  Â  Â  Â  Â  Â  Â  sd = 13
>>
>>Â  Â  Â  Â sig.level = 1e-04
>>Â  Â  Â  Â  Â  Â power = 0.95
>>Â  Â  Â alternative = one.sided
>>
>> > power.t.test(n = NULL, delta = 20, sd = 13,
>>Â  Â  Â  Â  Â  Â  Â  Â  sig.level = 0.0001,
>>Â  Â  Â  Â  Â  Â  Â  Â  power = 0.95, type = "one.sample",
>>Â  Â  Â  Â  Â  Â  Â  Â  alternative = "one.sided")
>>
>>Â  Â  Â  One-sample t test power calculation
>>
>>Â  Â  Â  Â  Â  Â  Â  Â n = 18.87723
>>Â  Â  Â  Â  Â  Â delta = 20
>>Â  Â  Â  Â  Â  Â  Â  sd = 13
>>Â  Â  Â  Â sig.level = 1e-04
>>Â  Â  Â  Â  Â  Â power = 0.95
>>Â  Â  Â alternative = one.sided
>>
>> > power.t.test(n = NULL, delta = 10, sd = 13,
>>Â  Â  Â  Â  Â  Â  Â  Â  sig.level = 0.0001,
>>Â  Â  Â  Â  Â  Â  Â  Â  power = 0.95, type = "one.sample",
>>Â  Â  Â  Â  Â  Â  Â  Â alternative = "one.sided")
>>
>>Â  Â  Â  One-sample t test power calculation
>>
>>Â  Â  Â  Â  Â  Â  Â  Â n = 55.51463
>>Â  Â  Â  Â  Â  Â delta = 10
>>Â  Â  Â  Â  Â  Â  Â  sd = 13
>>Â  Â  Â  Â sig.level = 1e-04
>>Â  Â  Â  Â  Â  Â power = 0.95
>>
>>Â  Â  Â alternative = one.sided
>>
>>
>>So, yes, you appear to be on the right track... :-)
>>
>>Regards,
>>
>>Marc
>>
>>On February 8, 2022 at 5:17:16 PM, John Whittington
>>>Hi folks, I hope that all is well with you all, and all of your
>>>'yours', and (a little belatedly) that you all have a healthy, happy
>>>and successful 2022.
>>>
>>>This time, it's all pretty basic and straightforward 'sample size'
>>>issue (essentially just t-tests), and I'm just seeking reassurance!
>>>
>>>I'm playing with some sample sizes estimates for paired differences
>>>of hypothesised Normal distributions with various variances. I think
>>>that one-sided tests are appropriate, although it actually makes very
>>>little difference with my figures.
>>>
>>>With a simplified, but typical, example, I have a distribution with a
>>>mean of 26 and an SD of 13. With that, I think that the sample size
>>>required to give 95% power to reject H0:difference =<0 with
>>>(one-tailed) p=0.0001 is about N=14 (differences).
>>>
>>>Moving on for that, what if I want the sample size to reject, say,
>>>H0:difference =<20 (with the same parameters)? If I have got it
>>>right, that results in a sample size estimate of about N=19 - and,
>>>similarly, about N=56 to reject H0:difference =<10 (again, with same
>>>parameters. Is that correct? If not, "please advise" !
>>>
>>>Thanks for any reassurance (and/or 'education'!).
>>>
>>>Kind Regards,
>>>John

John

----------------------------------------------------------------
Dr John Whittington,       Voice:    +44 (0) 1296 730225
Mediscience Services       Fax:      +44 (0) 1296 738893
Twyford Manor, Twyford,    E-mail:   Joh...@mediscience.co.uk
Buckingham  MK18 4EL, UK
----------------------------------------------------------------

--
--
To post a new thread to MedStats, send email to MedS...@googlegroups.com .

---
You received this message because you are subscribed to the Google Groups "MedStats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to medstats+u...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/medstats/202202091654.219GsO18000577%40mail194c50.megamailservers.eu .


### Rich Ulrich

Feb 10, 2022, 2:17:02 PMFeb 10
John,
"Thanks again, Rich.  However, I'm getting more confused, because I totally agree with everything you've written, and hence am not sure what point you are making.  Perhaps you can help me understand?"

Okay - you seemed to have the right pieces, but they were
scattered around.  My post was providing a model - change my
words a bit and make your statement like that.

Your handling of "effect size" has been awkward.

And I offered a caution about p = 0.0001 -- I HAVE seen power
analyses applied with tiny alphas, by folks counting events for
sub-atomic particles or astronomical observations. In every
case I know of, they are using so-called simple "exact statistics",
so their tests do not have the far-tail inaccuracy of a t-test.

--
Rich Ulrich

### Abhaya Indrayan

Feb 10, 2022, 8:20:56 PMFeb 10
On Thu, Feb 10, 2022 at 11:12 PM John Whittington <Joh...@mediscience.co.uk> wrote:

> Agreed, albeit Marc and myself got 56, rather than 49.

I am getting 49.

> The other point is that the sample size calculations are for detecting a specified effect when present. If not present, no big sample size will not be able to detect it. That is my understanding and I hope I am not wrong.

The extra 'not' was an error.

> That is obviously true, provided that "not present" means EXACTLY zero.  If the effect is non-zero (even if incredibly close to zero) one can get whatever power one wants, with any alpha, to detect the effect if the sample size is large enough.

Yes, this is obvious for difference = 0. This is where we have to be careful. I said 'specified' effect when present. In this case that is 10, not zero. If the difference in the population is less than 10, no sample size will be able to detect a difference of at least 10. At least that is my understanding. I would like to be wiser on this count.

~Abhaya
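One plausible source of the 49-versus-56 discrepancy (a guess on my part; Abhaya's software may work differently) is that 49 is what the closed-form normal approximation gives, while 56 comes from the exact noncentral-t calculation that `power.t.test()` performs:

```python
# Comparing a z-based (normal approximation) and an exact noncentral-t
# sample size for delta = 10, sd = 13, one-sided alpha = 1e-4, power = 0.95.
# A sketch of one possible explanation for the 49 vs 56 discrepancy.
import math
from scipy import stats, optimize

alpha, power, delta, sd = 1e-4, 0.95, 10, 13

# Closed-form normal approximation: n = ((z_alpha + z_beta) * sd / delta)^2
z_n = ((stats.norm.ppf(1 - alpha) + stats.norm.ppf(power)) * sd / delta) ** 2
print(math.ceil(z_n))   # 49

# Exact noncentral-t calculation, as power.t.test() performs
def t_power(n):
    df = n - 1
    ncp = (delta / sd) * n ** 0.5
    return 1 - stats.nct.cdf(stats.t.ppf(1 - alpha, df), df, ncp)

t_n = optimize.brentq(lambda n: t_power(n) - power, 2, 1e6)
print(math.ceil(t_n))   # 56
```

The gap between the two is unusually large here because the critical value of t with ~50 degrees of freedom at alpha = 0.0001 is well above the corresponding z value; at conventional alphas the two formulas differ much less.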

### John Whittington

Feb 11, 2022, 10:59:34 AMFeb 11
At 01:20 11/02/2022, Abhaya Indrayan wrote:

> The extra 'not' was an error.

Yes, I assumed that.

> Yes, this is obvious for difference = 0. This is where we have to be careful. I said 'specified' effect when present. In this case that is 10, not zero. If the difference in the population is less than 10, no sample size will be able to detect a difference of at least 10. At least that is my understanding. I would like to be wiser on this count.

[ when you talk of "detecting a difference", I presume you are referring to rejecting the corresponding null hypothesis ]

My comment was a general one, relating to (the more common) two-tailed hypotheses.  In that situation, I'm sure that what I said was correct - i.e. that if we have H0:mean = 0, then if there is any finite effect in the population (even if incredibly close to zero), a sufficiently large sample size will enable one to have the desired (any) power to reject that null hypothesis, with whatever (any) alpha one might desire.

In the current context of one-tailed hypotheses, the equivalent statement would be, for example .... "if we have H0:mean =< 10, then if the effect in the population has any magnitude greater than 10 (even if incredibly close to 10), a sufficiently large sample size will enable one to have the desired (any) power to reject that null hypothesis, with whatever (any) alpha one might desire."

Of course, with any particular sample, one might 'detect' (reject the corresponding null) an effect when the population mean is less than 10 - that simply being a Type I Error, the risk of which is quantified by the value of alpha.
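John's point - that an effect only just beyond the null boundary still yields any desired power at a large enough n - is easy to see from the normal-approximation formula, under which the required n grows as 1/margin^2 as the true mean approaches the boundary. A quick sketch (the margins below are illustrative, not figures from the thread):

```python
# Required n (normal approximation) for a one-sided test of H0: mean =< 10
# with sd = 13, alpha = 1e-4, power = 0.95, as the true mean approaches
# the null boundary from above.  Margins are illustrative values.
from scipy import stats

alpha, power, sd = 1e-4, 0.95, 13
z_sum = stats.norm.ppf(1 - alpha) + stats.norm.ppf(power)

for margin in (10, 1, 0.1, 0.01):           # true mean = 10 + margin
    n = (z_sum * sd / margin) ** 2          # n grows as 1 / margin^2
    print(f"margin {margin:>5}: n ~ {n:,.0f}")
```

Each tenfold shrinkage of the margin above the boundary multiplies the required sample size by a hundred, which is exactly why "incredibly close to 10" is achievable in principle but astronomically expensive in practice.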

Kindest Regards,
John

At 10:54 10/02/2022, Abhaya Indrayan wrote:

> To detect a minimum mean difference delta = 10 with a power of at least 95% and significance level (one-tail) 0.0001, I get a sample size of a minimum of 49 when the SD = 13. This sample size almost surely will not miss a mean difference of 10 or more (if present) but can miss if the mean difference is less than 10.

Agreed, albeit Marc and myself got 56, rather than 49.

> The other point is that the sample size calculations are for detecting a specified effect when present. If not present, no big sample size will not be able to detect it. That is my understanding and I hope I am not wrong.

That is obviously true, provided that "not present" means EXACTLY zero.  If the effect is non-zero (even if incredibly close to zero) one can get whatever power one wants, with any alpha, to detect the effect if the sample size is large enough.

For example, if one can believe my software with such extreme parameters, with our SD of 13, a power of at least 95% could be achieved to detect an effect of 0.0001 using a one-tailed t-test with alpha=0.0001 with a sample size of N = 5,178,280,000.

Kindest Regards,
John

On Thu, Feb 10, 2022 at 10:44 AM Rich Ulrich <rich-...@live.com> wrote:

A power statement has a set of parameters; you GIVE all but
the one you will solve for. You GIVE a test by stating both the
name of the test and the alpha.

Thus, I work from something like this.

For a given test and alpha error [one-tailed t-test at p= 0.0001],
what is the sample size [solve for N]
required to GIVE a stated power [95%]
for this GIVEN effect size?  [Three: 26/13; 20/13; 10/13]

For clinical psychiatric research, I most often provided the PI
with tables for a 5% test at 80% power and 90% or 95% power,
showing Ns for several effect sizes.

By the way, none of my power analyses ever used alpha smaller
than 1%, but I am aware that the t and F distributions are
increasingly LESS accurate in real data for alphas smaller than that.
Does this bother folks who want to use p= 0.0001?
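Rich's caution about far-tail accuracy can be checked by simulation: draw samples under a true H0 from a skewed distribution and count how often a nominal one-sided t test rejects. A sketch (alpha = 0.001 rather than 0.0001 to keep the replication count manageable; all parameters are illustrative):

```python
# Monte Carlo sketch: empirical type I error of a one-sample, one-sided
# t test when the data are skewed (exponential) rather than normal.
# Parameters are illustrative, chosen to keep the run fast.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, reps = 0.001, 20, 200_000

# Exponential(1) has mean 1, so H0: mean =< 1 is true; upper-tail test
data = rng.exponential(scale=1.0, size=(reps, n))
t_stats = (data.mean(axis=1) - 1.0) / (data.std(axis=1, ddof=1) / np.sqrt(n))
empirical = np.mean(t_stats > stats.t.ppf(1 - alpha, n - 1))
print(f"nominal alpha {alpha}, empirical rejection rate {empirical:.5f}")
```

With skewed data the empirical far-tail rejection rate can sit well away from the nominal alpha, and the relative error generally worsens as alpha shrinks, which is the substance of Rich's warning about p = 0.0001 with a t-test.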
I think the discrepancy in two-group t occurs when there are
short-tail or long-tail distributions in the samples, which (IIRC)
yield the opposite excesses for the t's.  (On randomly generated