Comparison of the changes in two proportions?

5 views
Skip to first unread message

Bettina Haidich

unread,
Dec 1, 2009, 8:29:52 AM12/1/09
to meds...@googlegroups.com

Dear list,

 

 

How would I compare two changes (differences) in proportions and get a p-value. I have one sample of articles in 2003 and I compare the change of an event in another sample of articles in 2006 for two different journals. Specifically, in 2003 I have a sample of 83 articles and in 2006 I have a sample of 51 articles for a  journal. The event I am interested in, occur in 33 articles (33/83 = 39.8%) in 2003 and in 18 articles (18/51=35.3%) in 2006. Therefore, the change over time is 39.8%-35.3% = -4.5% I also calculated the 95% confidence interval (CI) of the change: -20.3%, 12.4%).

In another journal, the corresponding numbers are 22/43=51.2% in 2003 and 27/51=52.9% in 2006. Therefore, the change over time is 51.2%-52.9% =1.8% with 95% CI of the change: (-17.7%, 21.2%).

 

Now I want to compare if the change in one journal was significantly different of the change in the other journal, e.g. -4.5% versus 1.8% and get a p-value of this comparison. From the confidence intervals it is obvious that there is no significant difference but I was wondering if there is a way to calculate a p-value and in which way? Which is the appropriate statistical test for the comparison of the changes in two proportions?  

 

Thank you in advance,

 

Bettina


BXC (Bendix Carstensen)

unread,
Dec 1, 2009, 11:06:53 AM12/1/09
to meds...@googlegroups.com
This is but a 2 by 2 by 2 table, where you are asking for the interaction.

That can be put into a generalized linear model with binomial error, normally with a logit link, but since you are asking for differences in proportions you may use the identity link (since yiour data are well-behaved).

Here is a small R-program that does the two analyses and gives the likelihood-ratio tests (both non-significant) on the two scales (note the trick to generate the identity link on the fly, it is not available by default):

hh <- read.table( "Haidich.txt", header=T )
hh
str(hh)
# Model where events depend on journal and period, that is
m0 <- glm( event ~ jn + factor(per), weight=n, family=binomial(link=log), data=hh )
# Update (and compare to) a model where the difference between years depend onn journal
anova( m0, m1 <- update( m0, . ~ . + jn:factor(per) ), test="Chisq" )
summary(m0)
summary(m1)

# Now do the same on the identity link scale
m0i <- glm( event ~ jn + factor(per), weight=n, family=binomial(link=make.link("identity")), data=hh )
anova( m0i, m1i <- update( m0i, . ~ . + jn:factor(per) ), test="Chisq" )
summary(m0i)
summary(m1i)

This is the result of running the code; the file Haidich.txt looks just as the
printout of the dataframe hh.

Note that you original difference between changes 4.5% minus -1.8% = 6.3% appears as the estimate of the interaction term in the model using identity link, almost at the bottopm of the output.

Best regards,
Bendix

R 2.10.0
---------------------------------------------
Program: Haidich.R
Folder: C:\stat\R\BxC\Examples
Started: tirsdag 01. december 2009, 16:55:52
---------------------------------------------
> hh <- read.table( "Haidich.txt", header=T )
> hh
n event jn per
1 33 1 A 2003
2 50 0 A 2003
3 18 1 A 2006
4 33 0 A 2006
5 22 1 B 2003
6 21 0 B 2003
7 27 1 B 2006
8 24 0 B 2006
> str(hh)
'data.frame': 8 obs. of 4 variables:
$ n : int 33 50 18 33 22 21 27 24
$ event: int 1 0 1 0 1 0 1 0
$ jn : Factor w/ 2 levels "A","B": 1 1 1 1 2 2 2 2
$ per : int 2003 2003 2006 2006 2003 2003 2006 2006
> # Model where events depend on journal and period, that is
> m0 <- glm( event ~ jn + factor(per), weight=n, family=binomial(link=log), data=hh )
> # Update (and compare to) a model where the difference between years dpend onn journal
> anova( m0, m1 <- update( m0, . ~ . + jn:factor(per) ), test="Chisq" )
Analysis of Deviance Table

Model 1: event ~ jn + factor(per)
Model 2: event ~ jn + factor(per) + jn:factor(per)
Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1 5 308.14
2 4 307.89 1 0.25374 0.6145
> summary(m0)

Call:
glm(formula = event ~ jn + factor(per), family = binomial(link = log),
data = hh, weights = n)

Deviance Residuals:
1 2 3 4 5 6 7 8
7.934 -6.975 5.955 -5.554 5.287 -5.630 5.999 -5.881

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.95384 0.12320 -7.742 9.77e-15
jnB 0.31865 0.14989 2.126 0.0335
factor(per)2006 -0.03126 0.14959 -0.209 0.8345

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 312.63 on 7 degrees of freedom
Residual deviance: 308.14 on 5 degrees of freedom
AIC: 314.14

Number of Fisher Scoring iterations: 6

> summary(m1)

Call:
glm(formula = event ~ jn + factor(per) + jn:factor(per), family = binomial(link = log),
data = hh, weights = n)

Deviance Residuals:
1 2 3 4 5 6 7 8
7.802 -7.119 6.123 -5.360 5.430 -5.486 5.860 -6.015

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.9223 0.1351 -6.827 8.7e-12
jnB 0.2522 0.2011 1.254 0.210
factor(per)2006 -0.1191 0.2328 -0.512 0.609
jnB:factor(per)2006 0.1533 0.3063 0.500 0.617

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 312.63 on 7 degrees of freedom
Residual deviance: 307.89 on 4 degrees of freedom
AIC: 315.89

Number of Fisher Scoring iterations: 6

>
> # Now do the same on the identity link scale
> m0i <- glm( event ~ jn + factor(per), weight=n, family=binomial(link=make.link("identity")), data=hh )
> anova( m0i, m1i <- update( m0i, . ~ . + jn:factor(per) ), test="Chisq" )
Analysis of Deviance Table

Model 1: event ~ jn + factor(per)
Model 2: event ~ jn + factor(per) + jn:factor(per)
Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1 5 308.11
2 4 307.89 1 0.21537 0.6426
> summary(m0i)

Call:
glm(formula = event ~ jn + factor(per), family = binomial(link = make.link("identity")),
data = hh, weights = n)

Deviance Residuals:
1 2 3 4 5 6 7 8
7.908 -7.004 5.994 -5.510 5.273 -5.644 6.008 -5.873

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.38773 0.04908 7.901 2.78e-15
jnB 0.14386 0.06723 2.140 0.0324
factor(per)2006 -0.01906 0.06619 -0.288 0.7734

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 312.63 on 7 degrees of freedom
Residual deviance: 308.11 on 5 degrees of freedom
AIC: 314.11

Number of Fisher Scoring iterations: 4

> summary(m1i)

Call:
glm(formula = event ~ jn + factor(per) + jn:factor(per), family = binomial(link = make.link("identity")),
data = hh, weights = n)

Deviance Residuals:
1 2 3 4 5 6 7 8
7.802 -7.119 6.123 -5.360 5.430 -5.486 5.860 -6.015

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.39759 0.05372 7.401 1.35e-13
jnB 0.11404 0.09326 1.223 0.221
factor(per)2006 -0.04465 0.08581 -0.520 0.603
jnB:factor(per)2006 0.06243 0.13439 0.465 0.642

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 312.63 on 7 degrees of freedom
Residual deviance: 307.89 on 4 degrees of freedom
AIC: 315.89

Number of Fisher Scoring iterations: 3

>

---------------------------------------------
Program: Haidich.R
Folder: C:\stat\R\BxC\Examples
Ended: tirsdag 01. december 2009, 16:55:53
Elapsed: 00:00:01
---------------------------------------------

Ray Koopman

unread,
Dec 2, 2009, 2:39:33 AM12/2/09
to MedStats
Don't make it more complicated than it needs to be. Your null
hypothesis is that a particular linear combination of independent
binomial proportions should be 0. The samples sizes are large enough
that asymptotic standard errors can be used.

x = (33/83 - 18/51) - (22/43 - 27/51) = .062433

s.e.(x) = sqrt(33*50/83^3 + 18*33/51^3 + 22*21/43^3 + 27*24/51^3)

= .134385

z = x/s.e.(x) = .465, p = .64

Same answer as Bendix gave you, but simpler :)

BXC (Bendix Carstensen)

unread,
Dec 2, 2009, 5:02:06 AM12/2/09
to meds...@googlegroups.com
You can always debate wheter the simplest solution is the particular one or the general one. The latter gives you the opportunity to assess which scale you want the interaction on, the probability scale or the logit scale. And inevitably there will be another journal or another year or... so in the end it is usually time-saving to stick it all in a model from the very start.

Best regards,
Bendix

John Sorkin

unread,
Dec 2, 2009, 7:05:45 AM12/2/09
to meds...@googlegroups.com
Bendix,
I missed your original reply to the question. Could you send it to me?
Thanks,
John




John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)>>> "BXC (Bendix Carstensen)" <b...@steno.dk> 12/2/2009 5:02 AM >>>
Confidentiality Statement:
This email message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.

Thomas Keller

unread,
Dec 2, 2009, 7:15:26 AM12/2/09
to MedStats
Dear Bettina,

beside p-values: the first aspect one has to consider is the effect
itself (-4.5% vs. 1.8%) and it's relevance.

I would say, these are not relevant effects. Even in case of thousands
of articles and therefore large enough power to detect a statistical
significance of effect: what would be the impact of those small
changes?

Kind regards
Thomas

Thomas Keller



BXC (Bendix Carstensen)

unread,
Dec 2, 2009, 7:54:23 AM12/2/09
to meds...@googlegroups.com
The earlier posts are all on the group website, http://groups.google.com/group/MedStats.

I have uploaded a groomed version of the R-program that does the analysis with three different links (logit, log and identity) and using two different ways of setting up data; the file is called Haidich.r. Data is now contained in the file so it should be absolutely self-contained.
For the sake of completeness there is also a file, Haidich.Rout, with the entire output in it.

John Sorkin

unread,
Dec 2, 2009, 9:47:22 AM12/2/09
to meds...@googlegroups.com
Thank you
John


John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119

(Fax) 410-605-7913 (Please call phone number above prior to faxing)>>> "BXC (Bendix Carstensen)" <b...@steno.dk> 12/2/2009 7:54 AM >>>


The earlier posts are all on the group website, http://groups.google.com/group/MedStats.

Confidentiality Statement:

Martin Holt

unread,
Dec 2, 2009, 10:13:24 AM12/2/09
to meds...@googlegroups.com
Just in case you don't know, the location Bendix uploaded to is "Files" on
the right hand side of the Home webpage
http://groups.google.com/group/MedStats . (See posting below).

Many thanks to Bendix for these, from me and on behalf of the Group.

Best Wishes,
Martin

John Sorkin

unread,
Dec 2, 2009, 11:23:41 AM12/2/09
to meds...@googlegroups.com
Thank you. I was searching and could not find the file.
John




John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)>>> "Martin Holt" <m861...@btinternet.com> 12/2/2009 10:13 AM >>>
Reply all
Reply to author
Forward
0 new messages