Confidence interval for ratio of independent binomial proportions


John Uebersax

Nov 13, 2009, 3:15:00 PM
to MedStats
Can anyone give advice on the state of the art for estimating the
confidence interval for the ratio of two independent binomial
proportions (e.g., CI for a likelihood ratio)?

An FDA guidance seems to be recommending 'exact' confidence intervals
here, though I'm not sure what they mean -- maybe something available
with StatXact.

However StatXact 8 (p. 336 f. in the manual) has at least three
different methods for estimating the CI of a ratio of independent
binomial proportions, and their results don't always agree.

Possibly complicating things, I have data with large denominators and
small numerators (i.e., for each of the two individual proportions
whose ratio is considered). My test data required large cpu times
(e.g., 4 minutes).

Basically I have three questions:

1. Is there consensus that 'exact' CIs should be used here if
possible?
2. With large denominators and small numerators, can asymptotic
formulas be used instead of exact methods, or would the small
numerators control the decision?
3. Is there a recent review of this general topic (exact tests for a
ratio of independent proportions)?

Any suggestions or pointers would be appreciated.

Ray Koopman (if it's the same person) has published in this area; I
wonder if he's following the discussion group.

--
John Uebersax PhD
http://www.john-uebersax.com

Bruce Weaver

Nov 13, 2009, 3:49:45 PM
to MedStats
Hi John. I expect you'll find something useful on Robert Newcombe's
"Resources" page. Mind the line-wrap.

http://www.cardiff.ac.uk/medic/aboutus/departments/primarycareandpublichealth/ourresearch/resources/index.html

Cheers,
Bruce

--
Bruce Weaver
bwe...@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/Home
"When all else fails, RTFM."

Ray Koopman

Nov 14, 2009, 2:24:57 AM
to MedStats
If I did I don't remember it. Do you have a reference?

John Uebersax

Nov 14, 2009, 12:10:30 PM
to MedStats
Hi Ray,

Koopman PAR. Confidence limits for the ratio of two binomial
proportions. Biometrics 1984;40:513-517.

John Uebersax

Pedro Emmanuel Alvarenga Americano do Brasil

Nov 14, 2009, 1:11:49 PM
to meds...@googlegroups.com
I can't say that this is the most recommended method, but there is a formula in Zhou's book (chapter four, I think). I searched Google Books and did not find it. The book has a website, and software is available from the author's website. Also, at least two functions in R implement this for binary tests.

http://www.amazon.com/Statistical-Diagnostic-Medicine-Probability-Statistics/dp/0471347728


A big hug, and may the Force be with you,

Dr. Pedro Emmanuel A. A. do Brasil
Instituto de Pesquisa Clínica Evandro Chagas
Fundação Oswaldo Cruz
Rio de Janeiro - Brasil


Ted Harding

Nov 14, 2009, 1:50:45 PM
to meds...@googlegroups.com
Going back to John Uebersax's original query, he wrote:

"Possibly complicating things, I have data with large
denominators and small numerators (i.e., for each of
the two individual proportions whose ratio is considered).
My test data required large cpu times (e.g., 4 minutes)."

In those circumstances, it is very plausible (though I cannot
be definitive without knowing the details more precisely)
that a very adequate approximation may be to treat the two
counts (r1 and r2) as having Poisson distributions with
means n1*p1 and n2*p2 respectively, where n1 and n2 are the
two denominators, and p1 and p2 are the two proportions.

Now, it is much more straightforward to approach the question
of the ratio of two Poisson means than the ratio of two
binomial proportions.

Namely, conditional on the value of R = (r1 + r2), where
r1 & r2 have Poisson distributions with means mu1 & mu2,
the distribution of r1, given R, is binomial with n = R
and binomial proportion p = mu1/(mu1 + mu2) = rho/(1 + rho)
where rho = mu1/mu2 is the ratio of the two Poisson means.

Hence you can obtain a confidence interval (by an exact
method if you wish, especially if r1 & r2 are small) for
this binomial proportion p, and then derive algebraically
a confidence interval for p1/p2. Namely:

p = rho/(1 + rho) so rho = p/(1 - p)

so if L.p and U.p are the lower & upper confidence limits for p,
then the lower and upper limits for rho are

L.rho = L.p/(1 - L.p)      U.rho = U.p/(1 - U.p)

and, since rho = mu1/mu2 = (n1*p1)/(n2*p2), p1/p2 = (n2/n1)*rho.

Hence the lower and upper confidence limits for p1/p2 are:

L.[p1/p2] = (n2/n1)*L.rho = (n2/n1)*L.p/(1 - L.p)

U.[p1/p2] = (n2/n1)*U.rho = (n2/n1)*U.p/(1 - U.p)

The only inaccuracy involved in this approach (certainly if an
exact method for the binomial CLs L.p and U.p is used) lies in
the use of the Poisson approximation to the binomial probabilities
for r1 and r2. How close this is depends on n1, p1, n2, p2; and
this can be inferred from r1, n1, r2, n2.
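The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the "exact method" is taken to be the Clopper-Pearson interval (found here by bisection on the binomial tail), and the counts 5/1000 and 10/1000 are made-up numbers, not data from this thread.

```python
from math import comb, inf

def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); increasing in p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def solve_p(k, n, target, iters=100):
    """Bisection for the p in [0, 1] at which P(X >= k) equals target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if upper_tail(k, n, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_binomial_ci(x, n, conf=0.95):
    """Clopper-Pearson limits (L.p, U.p) for x successes in n trials."""
    alpha = 1 - conf
    lower = 0.0 if x == 0 else solve_p(x, n, alpha / 2)
    # P(X <= x) = alpha/2 is the same condition as P(X >= x+1) = 1 - alpha/2
    upper = 1.0 if x == n else solve_p(x + 1, n, 1 - alpha / 2)
    return lower, upper

def poisson_ratio_ci(r1, n1, r2, n2, conf=0.95):
    """CI for p1/p2, treating r1 and r2 as Poisson counts and
    conditioning on R = r1 + r2 (assumption: Poisson approximation holds)."""
    l_p, u_p = exact_binomial_ci(r1, r1 + r2, conf)
    scale = n2 / n1                      # p1/p2 = (n2/n1) * rho
    lower = scale * l_p / (1 - l_p)      # rho = p / (1 - p)
    upper = inf if u_p >= 1.0 else scale * u_p / (1 - u_p)
    return lower, upper

# Hypothetical counts: 5 events in 1000 versus 10 events in 1000.
print(poisson_ratio_ci(5, 1000, 10, 1000))
```

Swapping the two groups should give the reciprocal limits, which is a quick sanity check on any implementation of this approach.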

Hoping this helps!
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 14-Nov-09 Time: 18:50:39
------------------------------ XFMail ------------------------------

Ray Koopman

Nov 14, 2009, 8:26:11 PM
to MedStats
Sorry, John, but I'm R.F., not P.A.R.

Thomas Keller

Nov 16, 2009, 2:45:36 AM
to MedStats
John,

(this is not a mathematical/scientific but a formal answer)

CLSI guideline EP12-A2 "User protocol for Evaluation of Qualitative
test performance" says:
"Alternatively, exact confidence intervals (Clopper-Pearson method)
for sensitivity and specificity can be computed from the binomial
distribution ...".

The Clopper-Pearson method is proposed for users who have the capability
of calculating these exact CIs.

The term "Alternatively" refers to situations where the normal
approximation (which is the first proposal of the guideline) is not
appropriate, as is the case for proportions near zero or 100%.

The final recommendation of the guideline, however, is the "score
confidence interval" attributed to Wilson (1927), which is described in
Agresti A, Coull BA, The American Statistician 1998, 52: 119-126;
Newcombe RG, Stat Med. 1998, 17: 857-872; and the book by Altman DG
et al., Statistics with Confidence, 2000.
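For readers who want to see the Wilson score interval for a single proportion written out, here is a minimal Python sketch. The function name is hypothetical and the code uses the standard textbook form of the interval; it is not taken from the CLSI guideline itself.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(x, n, conf=0.95):
    """Wilson score interval for x successes in n trials."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # e.g. about 1.96 for 95%
    phat = x / n
    denom = 1 + z**2 / n
    centre = (phat + z**2 / (2 * n)) / denom
    half = z * sqrt(phat * (1 - phat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

print(wilson_interval(10, 25))
```

Unlike the simple normal approximation, the limits stay inside [0, 1] even when the observed proportion is 0 or 1.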

Because the FDA participates in developing CLSI guidelines, you might use
the Clopper-Pearson method and refer to this guideline.

Kind regards
Thomas

Thomas Keller
ACOMED statistik
Leipzig, Germany
www.addstat.com

John Uebersax

Nov 16, 2009, 10:44:47 AM
to MedStats
Thank you to all for your replies.

Ray,

Sorry for the confusion. I hadn't looked at P.A.R.'s initials before
making the post; I had just seen the article referred to by the last
name of the author. I don't know what in particular made me think it was
you (but consider it a compliment!).

Thomas,

Yes, I've seen CLSI guidance EP12-A2. That's a different guidance
from the one I mean. The passage you quote suggests methods for the
CI of a single binomial proportion. For this I use a Wilson score
interval.

The document I'm referring to is an FDA draft guidance, Establishing
the Performance Characteristics of In Vitro Diagnostic Devices for the
Detection or Detection and Differentiation of Human Papillomaviruses.

http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/ucm181509.htm

John Uebersax
http://www.john-uebersax.com

John Uebersax

Nov 16, 2009, 11:11:19 AM
to MedStats
Thanks, Pedro.

> I can't say that this is the most recommended but there is a formula in Zhou's book
> - chapter four I think. I searched in google books and did not find it.

Yes, the formulas for asymptotic estimates of the standard error are
on p. 107 of Zhou, McClish and Obuchowski.
The authors also suggest that a score-type estimate may be better,
giving two references:

Centor (1992). Estimating confidence intervals of likelihood ratios.
Med. Dec. Making 12:229-233.

Gart & Nam (1988). Approximate interval estimation of the ratio of
binomial parameters. A review and corrections for skewness.
Biometrics 44: 323-328.

The same asymptotic formulas are given in Margaret Pepe's book (p.
24):

Pepe, Margaret S. The statistical evaluation of medical tests for
classification and prediction. 2004.
http://books.google.com/books?id=kMyXEJEtFmkC&pg=PA24
http://www.amazon.com/Statistical-Evaluation-Medical-Classification-Prediction/dp/0198565828/

BTW, some might be interested to know that SAS proc freq computes the
asymptotic estimates for a 2x2 table via the option RELRISK.
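As a rough illustration of what such an asymptotic interval looks like, here is a Python sketch of the usual log-scale (delta-method) interval for a ratio of two independent proportions. It assumes this is the formula the books and RELRISK refer to; the function name and the example counts are hypothetical, and both numerators must be greater than zero.

```python
from math import exp, sqrt
from statistics import NormalDist

def ratio_ci_log(x1, n1, x2, n2, conf=0.95):
    """Asymptotic CI for (x1/n1)/(x2/n2), computed on the log scale.
    Requires x1 > 0 and x2 > 0."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    ratio = (x1 / n1) / (x2 / n2)
    # Delta-method standard error of log(ratio)
    se_log = sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    return ratio, ratio * exp(-z * se_log), ratio * exp(z * se_log)

# Hypothetical counts: 5 events in 1000 versus 10 events in 1000.
print(ratio_ci_log(5, 1000, 10, 1000))
```

With small numerators the 1/x1 and 1/x2 terms dominate the standard error, which is one way to see why the numerators, not the denominators, decide whether the approximation can be trusted.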

John Uebersax
http://www.john-uebersax.com



John Uebersax

Nov 16, 2009, 11:25:12 AM
to MedStats
Thomas,

> Clopper-Pearson method is proposed for users who have the capability
> of calculating these exact CI.

It's probably worth mentioning that the Clopper-Pearson intervals are
not exact in the modern sense (as in permutation-based methods).
George Casella recommends the latter (permutation) exact approach in
his comment to:

Lawrence D. Brown, T. Tony Cai and Anirban DasGupta, Interval
Estimation for a Binomial Proportion, Statistical Science 2001, Vol.
16, No. 2, 101–133.
http://www-stat.wharton.upenn.edu/~tcai/paper/Binomial-StatSci.pdf

Personally I don't think the Clopper-Pearson interval should be called
"the exact CI for a proportion", because this only invites
confusion among non-statisticians.

Also, computing the CP intervals is not difficult; it can be easily
done in Excel using the FINV function, or even more easily using the
BETAINV function.
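For those without Excel, the beta-quantile form can be sketched in plain Python. The spreadsheet version would be BETAINV(alpha/2, x, n-x+1) for the lower limit and BETAINV(1-alpha/2, x+1, n-x) for the upper limit. The helper names below are hypothetical, and the beta quantile is found by bisection using the identity between the beta CDF (integer parameters) and the binomial tail, so no external library is needed.

```python
from math import comb

def beta_cdf_int(p, a, b):
    """Beta(a, b) CDF at p for positive integers a, b, using
    I_p(a, b) = P(X >= a) with X ~ Binomial(a + b - 1, p)."""
    m = a + b - 1
    return sum(comb(m, i) * p**i * (1 - p)**(m - i) for i in range(a, m + 1))

def beta_inv_int(prob, a, b, iters=100):
    """Quantile of Beta(a, b) by bisection (plays the role of BETAINV)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, conf=0.95):
    """Clopper-Pearson limits for x successes in n trials."""
    alpha = 1 - conf
    lower = 0.0 if x == 0 else beta_inv_int(alpha / 2, x, n - x + 1)
    upper = 1.0 if x == n else beta_inv_int(1 - alpha / 2, x + 1, n - x)
    return lower, upper

print(clopper_pearson(10, 25))
```

For 10 successes in 25 trials the result should come out close to 0.2113 and 0.6133.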

John Uebersax
http://www.john-uebersax.com

Barry W Brown

Nov 16, 2009, 4:47:56 PM
to MedStats

Free Code for Clopper Pearson CI and requisite sample size
We provide two programs relevant to Clopper-Pearson confidence limits.
The first (BP1CI) calculates these intervals; the second (CONFINT)
calculates the sample size necessary to have a specified probability
that the length of the interval is not larger than a particular
amount.

The programs are written in standard Fortran95 and come with source
and Windows32 and Mac executables. The executables were compiled with
the free g95 compiler. The programs are free (open source, no charge)
and we encourage redistribution. They can be downloaded from:
http://biostatistics.mdanderson.org/SoftwareDownload

Here is a sample run of each of the two programs.

Barry W Brown
Professor, Biostatistics &
Applied Mathematics
U Texas MD Anderson Cancer
Center

bwb...@mdanderson.org


&&&&&&&&&&&&&&&&&&&& BEGIN EXAMPLE &&&&&&&&&&&&&&&&&&&&
&&&&&&&&&&&&&&&&&&&& BP1CI &&&&&&&&&&&&&&&&&&&&

bp1ci.exe
BP1CI
Version 2.0: May 2008

Confidence Intervals for One-Sample
Binomial and Poisson Trials

Barry W. Brown
Floyd M. Spears
Dan M. Serachitopol

Copyright 2008 for:
The University of Texas M.D. Anderson Cancer Center
Department of Biomathematics, Box 237
1515 Holcombe Boulevard
Houston, TX 77030 (USA)


Contact: BWB at above address or b...@mdanderson.org

This program can be freely copied and (noncommercially)
distributed.

Press the Return or Enter key to continue ...


Enter '0' to exit this program
      '1' to change the setup (confidence level, binomial or Poisson
          distribution, method of entry of binomial data)
          Current settings:
             Distribution - Binomial
             Confidence level - 95.0%
             Binomial entry method -
                Number of Successes, Number of Failures
      '2' to calculate a confidence interval
> 2

Enter number of successes then the number of failures. Separate
the two entries with a space.

> 10 15
======================================================================
Data

N Successes: 10.0    N Failures: 15.0    N Total: 25.0

95.0% Confidence Interval and Estimate of the Probability of Success

Low Bound: 0.2113    Estimate: 0.4000    High Bound: 0.6133

======================================================================

Enter '0' to exit this program
      '1' to change the setup (confidence level, binomial or Poisson
          distribution, method of entry of binomial data)
          Current settings:
             Distribution - Binomial
             Confidence level - 95.0%
             Binomial entry method -
                Number of Successes, Number of Failures
      '2' to calculate a confidence interval
> 0
STOP User chosen termination of program

&&&&&&&&&&&&&&&&&&&& END EXAMPLE &&&&&&&&&&&&&&&&&&&&
&&&&&&&&&&&&&&&&&&&& BP1CI &&&&&&&&&&&&&&&&&&&&

&&&&&&&&&&&&&&&&&&&& BEGIN EXAMPLE &&&&&&&&&&&&&&&&&&&&
&&&&&&&&&&&&&&&&&&&& CONFINT &&&&&&&&&&&&&&&&&&&&


confint.exe

CONFINT
Version 2.0: April 2008

Sample Sizes Required to Achieve a
Specified Small Confidence Interval

Barry W. Brown

Copyright 2008 for:

The University of Texas M. D. Anderson Cancer Center
Department of Biostatistics and Applied Mathematics
1515 Holcombe Boulevard
Houston, TX 77030 (USA)

Contact: BWB at above address of b...@mdanderson.org

This program may be freely copied and (noncommercially)
redistributed.


Do you want a report file (y/n)? The report file will contain all
answers to the calculations performed. You can print it instead of
copying numbers from the screen.

Please enter one of [yn]: > n

ENTER 0 to exit this program

Calculations for a one-sample confidence interval

ENTER number of calculation desired:

(1) Normal (Gaussian) mean

(2) Normal (Gaussian) standard deviation

(3) binomial

(4) Poisson

(5) Exponential Survival -- Clinical Trial
(Confidence Interval on the Hazard)

(6) Exponential Survival -- Clinical Trial
(Confidence Interval on the Mean Survival Time)

> 3
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+                                                                    +
+              One-Sample Binomial Confidence Interval               +
+                                                                    +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Enter (0) to return to the main menu
      (1) to compute the confidence interval properties of a study
          with fixed sample size.
      (2) to compute the sample size required to achieve a confidence
          interval with a maximum fixed size.

> 2

ENTER:

(1) The (presumed true) probability of an event at which
this calculations is made

(2) The maximum desired length of the confidence interval

(3) The confidence LEVEL of the interval
(0.8, 0.95 are popular values)

(4) The probability that the extent of the confidence
interval does not exceed the maximum desired length.

> 0.8 0.1 0.95 0.8
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+                                                                    +
+                            Input Values                            +
+                                                                    +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
The probability of an event is:                              0.8000
The maximum length of the confidence interval is:            0.1000
The confidence level of the interval is                      0.9500
The probability that the confidence interval is small is:    0.800000



!....................................................................!


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+                                                                    +
+                Calculating Sample Size of the Study                +
+                                                                    +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The sample size is:    278.0000



Let phat = (number of events)/(number of trials).

The confidence interval will have length at most 0.10000 if
phat is <= 0.21942 or else phat >= 0.78058


Hit the "Enter" or "Return" key to continue



PROBABILITY LENGTH        LOW            HIGH         MEDIAN
OF CONFIDENCE INT     PROBABILITY    PROBABILITY      LENGTH
   <= 0.1000           OF EVENT       OF EVENT        OF CI

      0.050             0.2642         0.7358         0.1062
      0.100             0.2545         0.7455         0.1052
      0.150             0.2480         0.7520         0.1043
      0.200             0.2430         0.7570         0.1032
      0.250             0.2386         0.7614         0.1027
      0.300             0.2348         0.7652         0.1022
      0.350             0.2312         0.7688         0.1017
      0.400             0.2279         0.7721         0.1011
      0.450             0.2247         0.7753         0.1005
      0.500             0.2216         0.7784         0.1005
      0.550             0.2184         0.7816         0.1000
      0.600             0.2153         0.7847         0.0994
      0.650             0.2121         0.7879         0.0988
      0.700             0.2087         0.7913         0.0982
      0.750             0.2051         0.7949         0.0976
      0.800             0.2011         0.7989         0.0969
      0.850             0.1965         0.8035         0.0963
      0.900             0.1908         0.8092         0.0950
      0.950             0.1825         0.8175         0.0936

NOTE: Due to the symmetry of the problem, the probability that the
length of the confidence interval is less than the specified value is
the same for any value of the true probability of an event and one
minus this value.


Enter (0) to return to the main menu
      (1) to compute the confidence interval properties of a study
          with fixed sample size.
      (2) to compute the sample size required to achieve a confidence
          interval with a maximum fixed size.

> 0

ENTER 0 to exit this program

Calculations for a one-sample confidence interval

ENTER number of calculation desired:

(1) Normal (Gaussian) mean

(2) Normal (Gaussian) standard deviation

(3) binomial

(4) Poisson

(5) Exponential Survival -- Clinical Trial
(Confidence Interval on the Hazard)

(6) Exponential Survival -- Clinical Trial
(Confidence Interval on the Mean Survival Time)

> 0
User requested termination to program
STOP User requested termination to program

&&&&&&&&&&&&&&&&&&&& END EXAMPLE &&&&&&&&&&&&&&&&&&&&
&&&&&&&&&&&&&&&&&&&& CONFINT &&&&&&&&&&&&&&&&&&&&

Thomas Keller

Nov 16, 2009, 5:55:47 PM
to MedStats
John, we agree.

That is why I made the introductory remark about the two worlds of
science/mathematics on the one hand and guidelines on the other.

From a former discussion in this forum I know that you are working on a
project that falls under the HPV-related FDA guideline. But this
guideline refers to CLSI EP12 for CIs. So EP12 might be appropriate,
although its meaning of "exact" does not match the scientific
understanding.

From a practical point of view, I prefer to use a regression framework
to analyse this type of study (as described in Pepe's book),
because covariates can be included (e.g. age in the case of HPV) if
necessary. I use the CIs of the estimates calculated by these models. In
my experience, one gets the same numbers as with the Clopper-Pearson
method, but I still have to check this for proportions near zero or 100%.
In addition, one should think about including relative measures (rTPF,
rFPF) in order to address the missing diagnostic accuracy criterion
for test negatives.

For other readers: I provide an Excel tool for "exact" CIs on my
website:
http://www.acomed-statistik.de/math_files/Konfidenzintervall_Ratio_V03.xls
(in German, however).

Kind regards
Thomas





