The StatXact manual could probably be taken as an up-to-date and
'authoritative' reference on small sample statistics. It says:
“Fisher, Pearson and Likelihood Ratio Conditional Tests. Most
statisticians automatically pick one or another of these three exact
tests for p-value computations on a single 2 × 2 table. For one-sided
tests Davis (1986) has shown that the p-values computed by all three
methods are the same. For two-sided tests there can be differences.
Pearson and Fisher tend to have the same power and slightly higher
power than likelihood ratio in most designs, while in some cases,
perhaps characterized by heavily unbalanced designs, likelihood ratio
has highest power (Lydersen and Laake (2003), Kang and Kim (2004)).
Choosing any one of these tests implies that you accept the
statistical concept of conditional inference.” (StatXact Version 8
Manual, 2007; Chapter 17, Two Independent Binomial Samples, p. 287)
So it appears the other options mentioned in the manual are not
significantly better than the familiar Fisher exact test. Most
software packages (including Stata) will compute the Fisher exact
test.
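For anyone without Stata or StatXact to hand, here is a minimal sketch of the same computation in Python using scipy.stats.fisher_exact (the 2 x 2 counts below are made up purely for illustration):

from scipy.stats import fisher_exact

# Hypothetical 2 x 2 table: rows = intervention/control, columns = success/failure
table = [[12, 5],
         [6, 11]]

odds_ratio, p_two_sided = fisher_exact(table, alternative="two-sided")
_, p_one_sided = fisher_exact(table, alternative="greater")

print(f"Odds ratio estimate:         {odds_ratio:.2f}")
print(f"Two-sided Fisher exact p:    {p_two_sided:.4f}")
print(f"One-sided (greater) exact p: {p_one_sided:.4f}")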
Hope this helps.
References cited:
Davis LJ (1986). Exact tests for 2 × 2 contingency tables. The American Statistician, 40(2), 139-141.
Kang SH, Kim SJ (2004). A comparison of the three conditional exact tests in two-way contingency tables using the unconditional exact power. Biometrical Journal, 46(3), 320-330.
Lydersen S, Laake P (2003). Power comparison of two-sided exact tests for association in contingency tables using standard, mid p, and randomized test versions. Statistics in Medicine, 22(24), 3859-3871.
John Uebersax PhD
http://www.john-uebersax.com
There is a statistical effect size based on an arcsine transformation (see Cohen, 1988). However, it is better to use what is seen as an important difference in the particular field, also known as the clinical effect size. So what are the ramifications of a 10% difference?
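For reference, Cohen's arcsine-based effect size for two proportions is h = 2*arcsin(sqrt(p1)) - 2*arcsin(sqrt(p2)). A minimal sketch in Python, using the 20% baseline from the original question and a purely illustrative 10% intervention rate:

import math

def cohens_h(p1, p2):
    # Cohen (1988): effect size for the difference between two proportions,
    # computed on the variance-stabilising arcsine scale
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

h = cohens_h(0.20, 0.10)
print(f"Cohen's h = {h:.3f}")  # about 0.28, between Cohen's 'small' (0.2) and 'medium' (0.5) benchmarks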
Paul
Dr. Paul R. Swank,
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center-Houston
-----Original Message-----
From: zhon...@aol.com
Sent: Jan 13, 2010 1:57 PM
To: meds...@googlegroups.com
Subject: Re: {MEDSTATS} How much should be detected? Comparing two proportions, sample size and power
Hi Peter,
Thank you very much for your help! It is about public health law research. The MRSA rate in each state is the percentage we need to detect. We assume the MRSA rate is 20% if a state has no such public health law. The intervention is the law.
Frank
-----Original Message-----
From: Peter Flom <peterflom...@mindspring.com>
To: meds...@googlegroups.com
Sent: Wed, Jan 13, 2010 10:34 am
Subject: Re: {MEDSTATS} How much should be detected? Comparing two proportions, sample size and power
zhon...@aol.com wrote:
<<<
I am doing sample size and power computation. p1 = 20% for the control group is given; p2 is for the intervention group, and we hope p2 < p1. How large a difference should be detected? 10%? 15%? Or is it totally subjective?>>>
How much of what? With what intervention? This is entirely context specific. It's not *subjective* exactly;
it's just not a statistical question.
Peter
Peter L. Flom, PhD Statistical Consultant Website: http://www DOT statisticalanalysisconsulting DOT com/ Writing; http://www.associatedcontent.com/user/582880/peter_flom.html Twitter: @peterflom
In medicine, this is known as the minimum clinically significant
difference. It is the boundary between a difference so small that no one
would adopt the new intervention on the basis of such a meager change
and a difference large enough to make a difference (that is, to convince
people to change their behavior and adopt the new therapy).
Establishing the minimum clinically relevant difference is a tricky
task, but it is something that should be done prior to any research study.
For binary outcomes, the choice is not too difficult in theory. Suppose
that an intervention "costs" X dollars in the sense that it produces
that much pain, discomfort, and inconvenience, in addition to any direct
monetary costs. Suppose the value of a cure is kX, where k is a number
greater than 1. A value of k less than 1, of course, means that even if you
could cure everyone, the costs would outweigh the benefits of the cure.
For k>1, the minimum clinically significant difference in proportions is
1/k. So if the cure is 10 times more valuable than the costs, then you
need to show at least a 10% better cure rate (in absolute terms) than no
treatment or the current standard of treatment. Otherwise, the cure is
worse than the disease.
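To spell out the arithmetic behind that claim (in my notation, not necessarily Steve's): if the treatment cures a fraction D more patients than the comparator, the expected benefit per patient treated is D * kX, against a cost of X, so adopting the treatment only makes sense when

D * kX >= X, i.e. D >= 1/k.

With k = 10 this gives D >= 0.10, i.e. an absolute improvement of at least 10 percentage points in the cure rate.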
It helps to visualize this with certain types of alternative medicine.
If your treatment is aromatherapy, there is almost no cost involved, so
even a very slight probability of improvement might be worth it. But
Gerson therapy, which involves, among other things, coffee enemas, is a
different story. An enema is reasonably safe, but is not totally risk
free. And it involves a substantially greater level of inconvenience
than aromatherapy. So you'd only adopt Gerson therapy if it helped a
substantial fraction of patients. Exactly how many depends on the dollar
value that you place on having to endure a coffee enema, which I will
leave for someone else to quantify.
If there are side effects associated with the treatment that only occur
in a fraction of the patients receiving the treatment, then the
calculations are a bit trickier, but still possible in theory.
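One hedged sketch of how that extension might look (again my notation, not Steve's): if a side effect occurs in a fraction q of treated patients and carries a cost of cX, the break-even condition becomes

D * kX >= X + q * cX, i.e. D >= (1 + q*c) / k,

so the minimum clinically significant difference rises with both the frequency and the cost of the side effect.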
You explained in a later email that the intervention is passing a law.
Ask a politician how much change they would need to see in order to
justify passing the law, and that becomes your minimum clinically
significant difference.
Of course, no one does this, so typically they use a SWAG (if you don't
know this acronym, you'll have to look it up).
--
Steve Simon, Standard Disclaimer
"The first three steps in a descriptive
data analysis, with examples in PASW/SPSS"
Thursday, January 21, 2010, 11am-noon, CST.
Free to all! Details at www.pmean.com/webinars
Perhaps my greatest concern is that, having given a good definition of
"minimum clinically significant difference", indicated that it can be
tricky to establish, and said (with which I would agree) that one should
nevertheless always try to do this, Steve then devotes most of the rest of
what he writes to the matter of cost-benefit (and/or risk-benefit) issues -
which I personally regard as a very different matter.
To my mind, the design of most research should (and usually does) keep
these two concepts separated. Hence, most studies are designed to detect
an effect which is at least as great as the "minimum clinically significant
difference" (which, as Steve says, means exactly what it says), regardless
of 'costs' - particularly when "direct monetary costs" are part of the
overall 'cost' being considered. First, one wants to know whether a
treatment is 'clinically useful'; if it is, then one subsequently has to
look at 'costs' (monetary and otherwise) in order to decide (on the basis
of a whole host of considerations) whether the clinical usefulness
outweighs the 'costs'.
Next, in terms of the simple examples given (relating essentially to
cost-benefit assessment), the very difficult (many would say
'next-to-impossible') problem is the need to ascribe monetary values (or
some other unified measure) to non-monetary costs - like the "pain,
discomfort and inconvenience" mentioned by Steve, but sometimes even things
like a risk of death. At best, this is pretty arbitrary - and consensus
difficult to achieve given the degree of human diversity.
I think that one of the greatest problems with the mathematical approaches
which we are obliged to take to these situations is that 'statistics'
obviously majors on the concept of what happens 'on average', and on
probabilistic information - whereas the real world concerns individual
patients and their clinicians. Steve touches on this (indicating that
calculations may be 'a bit trickier'), the most common situation being when
major 'costs' (such as side effects) can only be handled in probabilistic
terms - and the problems are even greater when the outcome is not a simple
binary one. This is considerably further complicated by the fact that
different patients (and clinicians) will have very different views of what
one might call 'utility'
(i.e. the degree of risk-averseness) - and therefore will have different
views of situations in which one has to balance the probability of benefit
against the probability of harm. In the real world, it does not even
necessarily follow (as suggested by Steve in his example) that a given
patient or clinician will reject a treatment because ('on average') the
costs outweigh the benefits. A treatment which is ('on average') 'more
likely to kill than cure' can, in some people's minds, represent an
acceptable risk in certain situations.
For all these reasons, I think that it is best to first look to determine
whether a treatment achieves the "minimum clinically significant effect",
per se, and then subsequently try to grapple with the complex and partially
subjective issues of cost-benefit balance.
Returning to the issue of "minimum clinically significant difference",
although I rarely see this said, I would suggest that it can in some cases
be unethical to use that as the basis of a sample size estimation for a
study. If there is (as will often be the case) good reason to believe that
a treatment will result in effects considerably greater than the minimum
that would be considered clinically significant, then (quite apart from
cost/time considerations etc.) it would seem difficult to ethically justify
exposing (to the test treatment) the large number of subjects that would be
required if one were designing to be able to detect an effect which was
only "the minimum clinically significant".
That's how I see it, anyway,
Kind Regards,
John
John
----------------------------------------------------------------
Dr John Whittington, Voice: +44 (0) 1296 730225
Mediscience Services Fax: +44 (0) 1296 738893
Twyford Manor, Twyford, E-mail: Joh...@mediscience.co.uk
Buckingham MK18 4EL, UK
----------------------------------------------------------------
Indeed. But is this really a problem that comes up?
If we are fortunate enough to have good reason to believe there will be a large effect,
then all the researchers I've dealt with would be delighted to be told that they
need only a small N.
The problem is what to do when there is no reason to suspect some particular effect size.
Usually, the dialogue between me and the person requesting a power analysis consists of
me asking for estimates of things and my client telling me to take my best guess ...
Yes, of course - and if it were me that was being asked to undertake a
sample size estimation, it would not be a problem - because I always ask
about the expected magnitude of effect as well as the "minimum clinically
significant" effect magnitude. I would hope that this is what everyone
does but, as I wrote, I have rarely seen this said/written. It seems
almost universal that anything written or taught about sample size
estimation talks in terms of the "minimum clinically significant effect",
without any mention of the possible scenario I was discussing.
As for whether the issue comes up, it certainly does in my experience. For
obvious reasons, it is most likely to arise in relation to
placebo-controlled trials, and particularly when there is no 'pre-existing
treatment' available for whatever condition is being studied. In that
latter situation, almost any degree of effect would probably be regarded as
clinically significant, but the mechanism of action of the treatment might
be such that a very high degree of effect was expected. As an example,
consider a new antibiotic for treating a currently untreatable (maybe
because of the development of resistance to all existing drugs) serious
infection; in that situation, virtually any degree of true efficacy would
probably be regarded as "clinically significant" (i.e. 'better than
nothing"), but if pre-clinical studies had indicate a high level of
activity against the pathogen involved, there would be good reason to
expect a high level of efficacy.
>The problem is what to do when there is no reason to suspect some
>particular effect size.
Of course, and that's the common situation, in which one does need the
concept of "minimum clinically significant effect" - but, as I said, that
concept, per se, has not (at least in my mind) got anything to do with the
'cost' of the treatment.
>Usually, the dialogue between me and the person requesting a power
>analysis consists of
>me asking for estimates of things and my client telling me to take my best
>guess ...
Indeed so - although that is, at least in my experience, much more common
in relation to measures of variability than to the magnitude of effect.
Kind Regards,