diabetes sample size calculation

ravi rohilla

May 14, 2012, 1:36:30 PM

to meds...@googlegroups.com

Elaborating on the problem I stated earlier about the sample size for a study to identify subjects at risk of diabetes: I want to calculate the sample size for a study that uses a risk assessment scale, the Diabetes Risk Scale, which assesses basic risk factors such as age, family history, physical activity and stress. What sample size is required to validate the study, given a diabetes prevalence of 10-13% in the study area and a relative precision of 10%? What factors do I need to consider in calculating the sample size for a study area with a population of 20,000?

--

Ravi

Dr Ravi Rohilla
Junior Resident, Community Medicine
Postgraduate Institute of Medical Sciences
Rohtak (Haryana) - 124001
Mo: +91 93153 80656

Thompson,Paul

May 14, 2012, 1:53:05 PM

to meds...@googlegroups.com

This is not usually what is used in a sample size calculation. To calculate sample size via power analysis, you need to specify the null and alternative hypotheses. For example, if your treatment produces an effect of 100, the null effect is 50, and the standard deviation is 45, you can fix either the power or the sample size and calculate the other.

You seem to be talking more about subject availability or patient flow – how many subjects will be available in a given area. That’s important, but not to power.
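As a rough illustration of the numbers in this reply, here is a minimal sketch of the "fix the power, solve for the sample size" direction. The post does not say what design is meant, so the two-group comparison, the normal approximation, the two-sided 5% significance level and the 80% power are all assumptions, and the function name `n_per_group` is made up for this example.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(mean_alt, mean_null, sd, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided comparison of two means.

    Uses the normal approximation n = 2 * ((z_alpha + z_beta) * sd / delta)^2.
    The two-group design, alpha and power are assumptions for this sketch.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value of the two-sided test
    z_beta = z.inv_cdf(power)            # quantile matching the target power
    delta = abs(mean_alt - mean_null)    # difference to be detected
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Effect of 100 under the alternative, 50 under the null, SD of 45
print(n_per_group(100, 50, 45))
```

Going the other way (fixing n and solving for power) uses the same relation rearranged. A t-based calculation would give a slightly larger n for samples this small.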

--
To post a new thread to MedStats, send email to MedS...@googlegroups.com .
MedStats' home page is http://groups.google.com/group/MedStats .
Rules: http://groups.google.com/group/MedStats/web/medstats-rules

Frank Harrell

May 14, 2012, 2:42:07 PM

to meds...@googlegroups.com

Besides power, precision-based sample size calculation is another good approach, and it has a few advantages.

Frank

Venkata Putcha

May 15, 2012, 6:27:08 AM

to meds...@googlegroups.com

I have done some clinical studies using the precision-based sample size approach that Frank mentioned. It is simple: set the required precision equal to the formula for the precision (which contains n), supply all the other parameters, and solve for "n", the required sample size. The Z value in the formula is fixed by the assumed confidence level (90%, 95% or 99%).
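The recipe described here can be sketched for the figures in the original question. This is only a sketch under stated assumptions: "relative precision of 10%" is read as a confidence interval half-width equal to 10% of the prevalence, sampling is simple random sampling, the interval is the usual normal-approximation (Wald) interval, and a standard finite population correction is applied for the population of 20,000. The function name `n_for_prevalence` is hypothetical. Note that this sizes a study for estimating prevalence; validating the risk scale itself may call for a different calculation.

```python
from math import ceil
from statistics import NormalDist


def n_for_prevalence(p, rel_precision, population=None, conf=0.95):
    """Sample size so that the CI half-width is rel_precision * p.

    Solves z * sqrt(p * (1 - p) / n) = d for n, with d = rel_precision * p,
    then optionally applies the finite population correction.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # e.g. about 1.96 for 95%
    d = rel_precision * p                         # absolute half-width
    n = z ** 2 * p * (1 - p) / d ** 2
    if population is not None:
        n = n / (1 + (n - 1) / population)        # finite population correction
    return ceil(n)


for p in (0.10, 0.13):
    print(p, n_for_prevalence(p, 0.10), n_for_prevalence(p, 0.10, population=20000))
```

Under these assumptions the lower prevalence (10%) drives the requirement: roughly 3,460 subjects before the correction and roughly 2,950 after it. A design effect for cluster sampling or an allowance for non-response would push the number up.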

Best wishes

Venkata

--

With regards,
Venkata Putcha MSc (Andhra), MPhil (IIPS), Ph.D (Reading)
Felix Fellow, Consultant Statistician, SAS & Health Demographer
Email : putc...@hotmail.com or Venkata...@consultant.com

Skype: putchavr

WWW :

BXC (Bendix Carstensen)

May 15, 2012, 6:35:00 AM

to meds...@googlegroups.com

As always when it comes to sample size and precision:

SIMULATE a number of datasets that look like the one you anticipate getting.
Analysing them will give you all you need:
- power is the fraction of significant results
- precision is the (average or median) width of the c.i. for the parameter you are interested in.

Simulation guarantees that you have considered all the relevant aspects of your study; if one is missing, you cannot simulate.

Therefore it is the safest thing to do.

As far as I can see, the only drawback of the simulation approach is that it requires that you have a computer, and know how to analyse your data with it.
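The two bullet points above can be sketched for a prevalence study like the one in the original question. Everything specific here is an assumption made for illustration: the sample size of 1,000, the true prevalence of 10%, the null value of 13% used for the test, the Wald interval, and the function name `simulate_study`.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulate_study(n, p_true, p_null, n_sims=2000, conf=0.95, seed=1):
    """Simulate n_sims prevalence studies of size n.

    Returns (average CI half-width, fraction of studies rejecting p = p_null).
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_widths = []
    rejections = 0
    for _ in range(n_sims):
        # One simulated dataset: n subjects, each diabetic with probability p_true
        cases = sum(rng.random() < p_true for _ in range(n))
        p_hat = cases / n
        half_width = z * sqrt(p_hat * (1 - p_hat) / n)  # Wald interval
        half_widths.append(half_width)
        # "Significant" means the interval excludes the null value
        if abs(p_hat - p_null) > half_width:
            rejections += 1
    return fmean(half_widths), rejections / n_sims


precision, power = simulate_study(n=1000, p_true=0.10, p_null=0.13)
print(f"average half-width: {precision:.4f}, power: {power:.2f}")
```

The analysis step inside the loop is deliberately the simplest one possible. In a real planning exercise it would be replaced by whatever analysis is intended for the collected data.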

Best regards,
Bendix Carstensen

SR Millis

May 15, 2012, 10:22:14 AM

to meds...@googlegroups.com

Bendix,

I agree with you that simulation can be a very useful method in calculating sample size. However, I see it used rarely.

Can you recommend a text or other resources on the topic?

Thanks,

SR Millis

Richard Goldstein

May 15, 2012, 11:02:31 AM

to meds...@googlegroups.com

Scott,

Note that for users of Stata, there is both a Stata Journal article
(SJ 2(2):107-124) and a Stata FAQ on this
(http://www.stata.com/support/faqs/stat/power.html).

Rich

SR Millis

May 15, 2012, 12:03:01 PM

to meds...@googlegroups.com

Thanks, Rich! Much appreciated!

Scott

~~~~~~~~~~~
Scott R Millis, PhD, ABPP, CStat, PStat®
Board Certified in Clinical Neuropsychology, Clinical Psychology, & Rehabilitation Psychology
Professor
Wayne State University School of Medicine
Email: aa3...@wayne.edu
Email: srmi...@yahoo.com
Tel: 313-993-8085

Philip Jones

May 16, 2012, 11:50:02 AM

to meds...@googlegroups.com

Dear Bendix,

This approach makes intuitive sense to me, but it seems that the rate-limiting step is figuring out how the data is going to look before the experiment is done (step one where one simulates a number of datasets that look like the one you anticipate to get).

If one knows with sufficient accuracy how the data will look, doesn't that mean a priori that the experiment does not need to be done? Likewise, if one doesn't know accurately how the data will look (for instance, a very skewed distribution), doesn't that mean the simulation is invalid?

Many thanks in advance for clarifying this for me and others!

Phil

BXC (Bendix Carstensen)

May 16, 2012, 3:20:05 PM

to meds...@googlegroups.com

Hi Phil,
When simulating data you are not making assumptions substantially different from those you make when doing power calculations. In a typical power calculation you will normally make much simpler assumptions about your distributions.

Now suppose that you want to test whether the mean of some measurement in a population is different from 0, say. In order to make a power calculation for a data collection you will have to make assumptions about the variance of the measurements, and the alternative (i.e. the true mean). Then you can compute the power for any given sample size algebraically. But you would get the same power (up to simulation error) if you simulated many thousands of datasets from the distribution under the alternative and analysed them.

Classical power calculations are basically used in situations where the setup is so simple that you can compute algebraically what the simulation would show. And therefore they invite researchers to squeeze their study into (sometimes over-) simplified setups.

Ideally I would like to see funding bodies require calculations of power, and in particular precision, by simulation. If researchers in the planning phase are not able to set up assumptions and (this is the crux) demonstrate how they would analyse the collected data in the simplest situation, then they are probably not worth funding.
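The equivalence described in this paragraph can be checked with a small sketch. To keep the algebra exact, it assumes the standard deviation is known and uses a two-sided z-test; the values n = 25, true mean 0.5 and SD 1 are invented for illustration, as are the function names.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def analytic_power(n, mu, sigma, alpha=0.05):
    """Power of the two-sided z-test of 'mean = 0' when the true mean is mu."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = mu * sqrt(n) / sigma  # mean of the test statistic under the alternative
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


def simulated_power(n, mu, sigma, alpha=0.05, n_sims=10000, seed=1):
    """Same quantity, estimated as the fraction of significant simulated datasets."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    significant = 0
    for _ in range(n_sims):
        sample_mean = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if abs(sample_mean) * sqrt(n) / sigma > z_crit:
            significant += 1
    return significant / n_sims


print(analytic_power(25, 0.5, 1.0))
print(simulated_power(25, 0.5, 1.0))
```

The two numbers agree up to Monte Carlo error, which shrinks as `n_sims` grows. The simulation version keeps working when the data-generating step or the analysis becomes too complicated for a closed-form answer.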

Note however that this is a biased statement as this will automatically require many more research groups to include qualified statisticians in planning and analysis, i.e. more work opportunities for people like me.
Normally this view is expressed as "heightening the quality of research".

Best regards,
Bendix

Peter Flom

May 16, 2012, 3:36:52 PM

to meds...@googlegroups.com

Hi Bendix et al.

What you say certainly makes sense.

However, in my experience, it is difficult enough to get my clients to even
suggest some reasonable effect size. To get them to suggest sensible
criteria for other values is even harder. Even for something such as an OLS
regression (i.e. not something complex), to simulate the data you would need
to have guesses for the distributions of all the variables. For a more
complex situation (e.g. a multilevel model) there are more assumptions to be
made.

I agree that "going with defaults" that are used in a power analysis program
just shoves the problem under the rug.

But what methods have people on this list used to elicit guesses from
investigators?

Also, I think that many investigators will be uncomfortable with the
simulation approach because it isn't "from a program" or "from a book". Have
others found this to be the case, and if so, what have you done?

Peter

Peter Flom
Peter Flom Consulting
http://www.statisticalanalysisconsulting.com/
http://www.IAmLearningDisabled.com

Thompson,Paul

May 16, 2012, 3:39:34 PM

to meds...@googlegroups.com

I try to get investigators to tell me what they expect to get out of the project. I am not clear what an effect size is in many cases. But we may know a reasonable guess for a SD, and we can guess on means, often.

BXC (Bendix Carstensen)

May 16, 2012, 4:23:21 PM

to meds...@googlegroups.com

This is where simulation has become handy for me.
If your colleagues (remember that term, it's never going to fly unless you consider yourself a responsible PART of the research team) have difficulties coming up with effect sizes, you invent some yourself and set up a couple of simulations showing what sort of precision you are likely to get.

That will often prove very useful for a better discussion and then a second iteration.

Suppose you are working with precision as the half-width of the c.i.; that is, the c.i. is \pm prc, where prc is what we call the precision here.
Then note that a set-up with a precision of prc has about 50% power to detect an effect size of prc. This is because if the true effect is prc, then there is about a 50% chance of getting an estimate less than prc (which will be insignificant with a precision of prc) and a 50% chance of getting an estimate larger than prc, which will be significant. Incidentally, this simple observation often makes non-statisticians better understand why a clear-cut significance level of, say, 5% is rather goofy.
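The 50% observation can be verified numerically. This sketch assumes a normally distributed estimate with known standard error and a two-sided test at the level matching the confidence interval; the function name `power_at` is made up for the example.

```python
from statistics import NormalDist


def power_at(effect_in_half_widths, conf=0.95):
    """Power to detect a true effect given as a multiple of the CI half-width (prc)."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - (1 - conf) / 2)  # half-width in standard-error units
    shift = effect_in_half_widths * z   # true effect in standard-error units
    # The estimate is significant when it falls more than z standard errors from 0
    return (1 - nd.cdf(z - shift)) + nd.cdf(-z - shift)


print(power_at(1.0))  # true effect equal to the precision
print(power_at(0.0))  # no effect: the power equals the significance level
```

The first value is 50% plus a negligible contribution from the opposite tail, which is why the statement above says "about 50%".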

b.r.
Bendix

Philip Jones

May 16, 2012, 7:08:55 PM

to meds...@googlegroups.com

Dear Bendix,

That is a very useful reply! Thank you for taking the time to write it.

Regards,

Phil

Peter Flom

May 17, 2012, 1:18:51 PM

to meds...@googlegroups.com

Thanks! This is helpful

Bruce Weaver

May 17, 2012, 3:30:40 PM

to MedStats

I tried sending this earlier, but it has not yet appeared. So
apologies if it eventually appears twice.

I am curious about this warning in that Stata FAQ:

Warning: If there are covariates, make sure they remain fixed
throughout all N iterations of the simulation! Do not regenerate them
each time.

Here's why I am curious about it. I think another approach one could
take is to simulate a very large population that has the desired
characteristics (including specified correlations among the
variables), and then draw a large number of random samples from that
population, running the model each time. With this approach, the
values of the covariates would not be fixed, but would vary from
sample to sample.
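The two approaches can be put side by side in a small sketch. Several things here are assumptions: a simple linear regression with one standard-normal covariate, a slope of 0.4, residual SD 1, n = 50, and a normal critical value in place of the t distribution (slightly liberal at this n). Instead of building a large finite population and sampling from it, the sketch draws covariates directly from their distribution, which stands in for that idea. The function names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def slope_is_significant(x, y, z_crit):
    """Fit y = a + b*x by least squares and test b = 0 (normal approximation)."""
    n = len(x)
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = sqrt(rss / (n - 2) / sxx)
    return abs(slope / se) > z_crit


def regression_power(n, beta, sigma, fixed_x, n_sims=2000, seed=1):
    """Fraction of simulated datasets with a significant slope."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(0.975)
    x = [rng.gauss(0, 1) for _ in range(n)]          # covariates drawn once
    hits = 0
    for _ in range(n_sims):
        if not fixed_x:
            x = [rng.gauss(0, 1) for _ in range(n)]  # redrawn for every dataset
        y = [beta * xi + rng.gauss(0, sigma) for xi in x]
        hits += slope_is_significant(x, y, z_crit)
    return hits / n_sims


print("covariates fixed:      ", regression_power(50, 0.4, 1.0, fixed_x=True))
print("covariates regenerated:", regression_power(50, 0.4, 1.0, fixed_x=False))
```

With fixed covariates the result is the power conditional on that particular set of covariate values; with regenerated covariates it is the power averaged over the covariate distribution. The sketch does not settle which of the two the FAQ's warning has in mind, it only shows what each procedure estimates.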

Any comments?

Thanks,
Bruce

Munya Dimairo

May 17, 2012, 3:49:29 PM

to MedStats

I agree with Bruce here. My view is that participants' characteristics are also random variables in sampling. Fixing them across all simulations doesn't seem right to me. Every time you draw a random sample from a population you get a different representation, and that's the reality.

If you fix participants' characteristics then it means their outcomes should be fixed as well. I don't see a reason why the outcome should vary for fixed participant characteristics.

I smell something dodgy here folks.

Am I getting something wrong?

Munya