Comparison of different measures of association

Arin Basu

Dec 16, 2009, 3:05:20 PM

to meds...@googlegroups.com

Dear Sarah,

While the Odds Ratio (OR) and the Risk Ratio (RR) are conceptually
similar measures, and are numerically close for rare diseases (the
rare disease assumption), they are used in different contexts. Odds
Ratios are typical for cross-sectional or case-control studies, while
Risk Ratios are calculated for studies in which the subjects were
followed up in time (either prospective, or retrospective cohort
studies). That said, both RR and OR differ fundamentally as measures
of association from the risk difference (RD). The risk difference is
measured on an absolute (additive) scale, whereas an Odds Ratio is on
a ratio scale, so the same number means different things: RD = 0
indicates no association, whereas the corresponding null value of an
OR is 1, not 0. In your case, as Steve suggested, NNT is a highly
practical measure to use.

HTH,
Arin

Sarah

Dec 17, 2009, 4:21:18 AM

to MedStats

Thanks, that explains my confusion at trying to explain all three in
one report! Very useful, thanks!

Ted Harding

Dec 17, 2009, 4:53:05 AM

to meds...@googlegroups.com

Some people may find it useful to read a summary of the mathematical
relationships between the RR = p2/p1, the OR = (p2/(1-p2))/(p1/(1-p1)),
and the difference = p2-p1 involving two proportions p1 and p2.
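
A minimal sketch of these three quantities, using the definitions just given. The function name and the example proportions are hypothetical, chosen only for illustration:

```python
def association_measures(p1, p2):
    """Return (RR, OR, difference) for two proportions p1 and p2."""
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("p1 and p2 must lie strictly between 0 and 1")
    rr = p2 / p1
    odds_ratio = (p2 / (1 - p2)) / (p1 / (1 - p1))
    diff = p2 - p1
    return rr, odds_ratio, diff

# Rare outcome (illustrative values): the OR is close to the RR.
rr_rare, or_rare, diff_rare = association_measures(0.01, 0.02)

# Common outcome (illustrative values): same RR, but the OR is much larger.
rr_common, or_common, diff_common = association_measures(0.30, 0.60)

print(f"rare:   RR={rr_rare:.3f}  OR={or_rare:.3f}  diff={diff_rare:.3f}")
print(f"common: RR={rr_common:.3f}  OR={or_common:.3f}  diff={diff_common:.3f}")
```

Both examples have RR = 2, yet the OR and the difference change with the baseline proportion, which is why the three measures cannot be used interchangeably.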

A little while ago I prepared an outline of the relationships between
these quantities, and in view of this correspondence I have uploaded
it onto my little website at

http://www.zen89632.zen.co.uk/R/TwoProportions/lambda_delta.pdf

in case anyone is interested. If anyone has comments or criticisms,
I would be grateful to hear of them.

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 17-Dec-09 Time: 09:53:02
------------------------------ XFMail ------------------------------

Doug Altman

Dec 17, 2009, 12:34:39 PM

to meds...@googlegroups.com, meds...@googlegroups.com

It is perhaps worth adding that there are two relative risks - the RR of an event and the RR of a non event. These are very different. Using the same notation, they are

RR1 = p2/p1
RR2 = (1-p2)/(1-p1)

It is easy to see that the OR is the ratio of these two RRs (which is interesting but not all that useful). With a very common event, RR1 is close to 1, so the OR will be close to the reciprocal of the RR of not having an event (and very different from the RR of an event).
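
The identity OR = RR1/RR2 can be checked numerically. This is only a sketch: the helper names and the two proportions are hypothetical, picked to represent a very common event.

```python
def event_and_non_event_rr(p1, p2):
    """Return (RR1, RR2): relative risk of the event and of the non-event."""
    rr1 = p2 / p1
    rr2 = (1 - p2) / (1 - p1)
    return rr1, rr2

def odds_ratio(p1, p2):
    """Odds ratio for the event, in the same notation."""
    return (p2 / (1 - p2)) / (p1 / (1 - p1))

p1, p2 = 0.90, 0.95  # illustrative: a very common event
rr1, rr2 = event_and_non_event_rr(p1, p2)
or_event = odds_ratio(p1, p2)

print(f"RR1 = {rr1:.4f}, RR2 = {rr2:.4f}")
print(f"OR = {or_event:.4f}, RR1/RR2 = {rr1 / rr2:.4f}")
# Because RR1 is near 1 here, the odds ratio for the non-event (1/OR)
# is close to RR2, while the OR itself is far from RR1.
print(f"1/OR = {1 / or_event:.4f}")
```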

While it is usually clear which RR one is interested in, that is not always the case - an example might be when considering the (relative) risk of continuing or ceasing some behaviour, such as smoking. For more discussion see:

Deeks JJ.
Issues in the selection of a summary statistic for meta-analysis of clinical trials with binary outcomes.
Stat Med. 2002 Jun 15;21(11):1575-600.

Doug

--
To post a new thread to MedStats, send email to MedS...@googlegroups.com .
MedStats' home page is http://groups.google.com/group/MedStats .
Rules: http://groups.google.com/group/MedStats/web/medstats-rules

_____________________________________________________

Doug Altman
Professor of Statistics in Medicine
Centre for Statistics in Medicine
University of Oxford
Wolfson College Annexe
Linton Road
Oxford OX2 6UD

email:  doug....@csm.ox.ac.uk
Tel:    01865 284400 (direct line 01865 284401)
Fax:    01865 284424
www:     http://www.csm-oxford.org.uk/

EQUATOR Network - resources for reporting research
www: http://www.equator-network.org/

Frank Harrell

Dec 17, 2009, 4:24:02 PM

to MedStats

What a great handout Ted.

To slightly disagree with Arin, the odds ratio is a great measure for
prospective cohort studies.

A major advantage of the odds ratio is that it does not impose any
restrictions on p1 and p2, whereas, for example, a risk ratio of 2 can
only apply to p1 <= 1/2. So models based on odds ratios will not
require 'mathematical' interactions (interactions that don't make
sense based on subject matter knowledge) just to keep probabilities
between 0 and 1.
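
To make the restriction concrete, here is a small sketch that applies a fixed risk ratio of 2 and a fixed odds ratio of 2 across a range of baseline risks. The function names and the list of baseline values are assumptions made for this example.

```python
def p2_under_risk_ratio(p1, rr):
    """p2 implied by a fixed risk ratio; nothing stops it exceeding 1."""
    return rr * p1

def p2_under_odds_ratio(p1, odds_ratio):
    """p2 implied by a fixed odds ratio; stays inside (0, 1) for 0 < p1 < 1."""
    odds2 = odds_ratio * p1 / (1 - p1)
    return odds2 / (1 + odds2)

baseline_risks = [0.1, 0.3, 0.5, 0.6, 0.9]  # hypothetical values of p1
for p1 in baseline_risks:
    via_rr = p2_under_risk_ratio(p1, 2.0)
    via_or = p2_under_odds_ratio(p1, 2.0)
    flag = "ok" if via_rr <= 1 else "impossible"
    print(f"p1={p1:.1f}  RR=2 -> p2={via_rr:.2f} ({flag})  OR=2 -> p2={via_or:.3f}")
```

For p1 above 1/2 the risk-ratio column leaves the unit interval, while the odds-ratio column remains a valid probability throughout.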

Frank


Ted Harding

Dec 18, 2009, 12:47:13 PM

to meds...@googlegroups.com

On 17-Dec-09 21:24:02, Frank Harrell wrote:
> What a great handout Ted.

Thanks Frank! That induced me to have another look at it, to see
how great it is -- and my eye promptly spotted an error! Namely,
Page 1 left-hand side, paragraph "For Constant RR", I wrote
"so 0 <= p1 <= 1 - 1/lambda", which (if you think and/or look at
the diagram) should be "so 0 <= p1 <= 1/lambda".

I have corrected this, and the new version is available as before:

http://www.zen89632.zen.co.uk/R/TwoProportions/lambda_delta.pdf

I plead initial laziness, in that I think I must have copied down
the preceding "For Constant Difference" and then edited the bits
that needed changing (but not all of them ... ). Be that as it may,
it links nicely into Frank's comment below:

> To slightly disagree with Arin, the odds ratio is a great measure
> for prospective cohort studies.
>
> A major advantage of the odds ratio is that it does not impose any
> restrictions on p1 and p2, whereas, for example, a risk ratio of 2
> can only apply to p1 <= 1/2. So models based on odds ratios will
> not require 'mathematical' interactions (interactions that don't
> make sense based on subject matter knowledge) just to keep
> probabilities between 0 and 1.
> Frank

Indeed. And this raises the perennial issue of the tensions between
[A] Mathematically smooth/tractable models
[B] Interpretation of models in terms of mechanisms
[C] Expressing the results of model fitting in terms people can grasp
with minimal risk of confusion or misinterpretation.

For a binary outcome, logistic regression models a linear predictor
for the log-Odds, so an increment in a covariate X emerges as a
proportional increment in log-Odds, i.e. as an Odds-Ratio. With
this linear predictor, the sufficient statistics are sum(Yi) and
sum(Yi*Xi) (outcome Yi = 0/1), and all fits smoothly into the
classical Fisherian theory of inference and information.
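
One way to see where those two sufficient statistics come from is to write out the log-likelihood. The sketch below assumes n independent observations (x_i, y_i) with the x_i treated as fixed, and uses alpha and beta for the intercept and slope of the linear predictor; that notation is an assumption of the sketch.

```latex
\begin{align*}
\log\frac{\Pr(Y_i = 1 \mid x_i)}{\Pr(Y_i = 0 \mid x_i)} &= \alpha + \beta x_i, \\
\ell(\alpha, \beta)
  &= \sum_{i=1}^{n} \Bigl[ y_i (\alpha + \beta x_i)
     - \log\bigl(1 + e^{\alpha + \beta x_i}\bigr) \Bigr] \\
  &= \alpha \sum_{i=1}^{n} y_i + \beta \sum_{i=1}^{n} x_i y_i
     - \sum_{i=1}^{n} \log\bigl(1 + e^{\alpha + \beta x_i}\bigr).
\end{align*}
```

The outcomes enter only through the sums of y_i and of x_i y_i; the last term depends on the covariate values and the parameters alone.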

But what does it represent? One interpretation of a model for
the probability of Y=1 for a binary outcome:

Prob(Y=1|X=x) = F(x; alpha, beta)

is that there is an underlying latent variable U, not directly
observed, such that each potential subject has a value of U,
distributed over subjects with distribution function F as above,
which can be interpreted as a "tolerance" towards "stimulus" X:
if, given X=x, a subject has U=u, then Y=1 if u<x (insufficient
tolerance) -- subject (say) dies; while if u>x (sufficient
tolerance) then Y=0 (subject survives).

So, if you use a logistic regression, you are implicitly acting
as though there is some U which has the Logistic distribution:

Prob(U <= u) = F(u; alpha, beta) = exp(L)/(1+exp(L))
L = alpha + beta*u
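
The tolerance interpretation can be illustrated with a small simulation. This is a sketch under stated assumptions: the values of alpha, beta and x are hypothetical, tolerances are drawn by inverting the logistic distribution function above, and the seed is fixed only so the run is reproducible.

```python
import math
import random

def logistic_cdf(u, alpha, beta):
    """Prob(U <= u) = exp(L) / (1 + exp(L)) with L = alpha + beta * u."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * u)))

def sample_tolerance(rng, alpha, beta):
    """Draw one tolerance U by inverting the logistic distribution function."""
    q = min(max(rng.random(), 1e-12), 1 - 1e-12)  # keep q away from 0 and 1
    return (math.log(q / (1 - q)) - alpha) / beta

def simulated_response_rate(x, alpha, beta, n=100_000, seed=0):
    """Fraction of simulated subjects whose tolerance is below stimulus x (Y=1)."""
    rng = random.Random(seed)
    responders = sum(sample_tolerance(rng, alpha, beta) < x for _ in range(n))
    return responders / n

alpha, beta, x = -2.0, 1.0, 1.5  # hypothetical parameter and stimulus values
simulated = simulated_response_rate(x, alpha, beta)
model = logistic_cdf(x, alpha, beta)
print(f"simulated Prob(Y=1) = {simulated:.4f}, logistic model = {model:.4f}")
```

The simulated response rate should agree with the logistic curve up to sampling error, which is all the latent-variable reading of the model asserts.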

But where, in the world of adopting probability distributions
to model naturally occurring variables, would you spontaneously
adopt a logistic distribution as a natural representation?
(Apart, of course, from when you are quietly coerced into it
by "spontaneously" adopting logistic regression, perhaps not
being aware of what it implies under its skin).

If you approach the question from this point of view, you might
more spontaneously adopt a Normal distribution for the underlying
latent variable. And then, of course, you would be led to use
a Probit model where F is now the distribution function for the
Normal distribution, not the Logistic.

And the Probit model was the first on the scene (Bliss, 1935;
further developed by Fisher, in part with Bliss). Interestingly,
however, Fisher (the discoverer of the concept of Sufficiency)
did not pursue the question of sufficiency in the Probit model
(which does not lead to interesting sufficient statistics).

The Logistic model seems to have been raised into prominence by Berkson
in the 1950s, and somewhere along that line the simple sufficient
statistics emerged. Also, the natural interpretation of the
coefficients as proportionality constants for changes in log-OR
emerged (especially in connection with contingency tables).

However, an increment in log-OR is not simple to understand
or explain. How should I understand some exposure which "doubles
my Odds of death" (an Odds-Ratio of 2)? I need a baseline odds (from which the
baseline risk can be calculated, of course) so then I can get
the odds if exposed (and then could get the exposed risk). But
it is the risk which is interesting, not the odds (unless someone
is taking bets on the outcome).
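
The conversion described here (baseline risk to baseline odds, apply the odds ratio, convert back to risk) can be sketched as follows. The function names and the two baseline risks are hypothetical.

```python
def odds_from_risk(risk):
    """Convert a probability to odds."""
    return risk / (1 - risk)

def risk_from_odds(odds):
    """Convert odds back to a probability."""
    return odds / (1 + odds)

def exposed_risk(baseline_risk, odds_ratio):
    """Risk in the exposed, given the baseline risk and an odds ratio."""
    return risk_from_odds(odds_ratio * odds_from_risk(baseline_risk))

# Illustrative baselines: the same odds ratio of 2 has very different
# consequences for the risk itself.
rare = exposed_risk(0.0001, 2.0)
common = exposed_risk(0.5, 2.0)
print(f"baseline 0.0001 -> exposed risk {rare:.6f}")
print(f"baseline 0.5    -> exposed risk {common:.4f}")
```

With a rare baseline the exposed risk is almost exactly doubled, whereas from a baseline of 0.5 it rises only to two thirds, so the odds ratio alone says little about the risk without the baseline.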

It is a bit easier to understand something which would double my
Risk (a Risk-Ratio of 2). But I still need the baseline. If, in round numbers,
a 50km car journey normally gives me a 1/10000 chance of becoming
a casualty, and (in the current wintry conditions here), that
risk is doubled, then it becomes 1/5000. That is the sort of
risk one normally treats lightly (though not too often).

But if I had a choice between (say) undergoing some procedure
which gave me a 50% chance of death, or not undergoing it which
would almost certainly result in death, then the choice between
the Risk (50%) and twice the Risk (100%) becomes starker.

However, often some result is expressed in the media as "X doubles
the risk of cancer", which tends to provoke a sensation of understanding
in the people who are given this message. Presumably they refer
"double the risk" to some intuitive notion of "the risk", which
implies that there is some populist Bayesian prior out there.
Few are they who ask "What, from 1 in 1,000,000 to 2 in 1,000,000?
Who cares?" (Provided 1 in a million is realistic, of course ... ).

The nice thing about the Logistic model is that the coefficients
which flow out of it relate directly to a risk-related entity,
namely the log-Odds. Setting up a model in terms of Risk is not
so straightforward. Using a GLM with a log link (e.g. Poisson) will
do it, but you are only safe from unrealistic results if you apply it
to fairly rare outcomes. Otherwise you can predict Risk > 1 (just as,
in Frank's comment, you can't have RR>2 if p1 > 0.5); a model that
is linear in Risk itself can even predict negative Risk.

But what is it that flows out of the Probit model? Nothing obviously
natural whatever.

Nevertheless, there is an interesting relationship between the
Logistic model and the Normal distribution. Imagine a population
consisting of two Groups, one labelled Y=0 and the other labelled Y=1.
In each, a variable X has a Normal distribution; the variance is
the same in both Groups, they differ only in their means.
Group 1 constitutes a proportion p of the population, Group 2
constitutes a proportion 1-p.

Now choose a member at random from the entire population, and observe
the value of X. Then the probability that that individual has Y=1
is given by the Logistic model. So you can at one and the same time
"spontaneously" adopt a Normal distribution for a naturally occurring
variable, and a Logistic model for the outcome Y.
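
This relationship can be checked numerically with Bayes' rule. The sketch below assumes that p is the proportion of the Y=1 Group, and the means, common standard deviation and value of p are hypothetical; alpha and beta are the intercept and slope obtained by simplifying the ratio of the two Normal densities.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd^2) distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def posterior_y1(x, p, mu1, mu0, sd):
    """Prob(Y=1 | X=x) by Bayes' rule for two equal-variance Normal groups."""
    num = p * normal_pdf(x, mu1, sd)
    return num / (num + (1 - p) * normal_pdf(x, mu0, sd))

def logistic_y1(x, p, mu1, mu0, sd):
    """The same probability written as a logistic function of x."""
    beta = (mu1 - mu0) / sd**2
    alpha = math.log(p / (1 - p)) - (mu1**2 - mu0**2) / (2 * sd**2)
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

p, mu1, mu0, sd = 0.2, 1.0, 0.0, 1.5  # hypothetical population values
grid = [i / 10 for i in range(-50, 51)]
max_gap = max(
    abs(posterior_y1(x, p, mu1, mu0, sd) - logistic_y1(x, p, mu1, mu0, sd))
    for x in grid
)
print(f"largest gap between Bayes posterior and logistic form: {max_gap:.2e}")
```

The two expressions agree to rounding error over the whole grid. The equal-variance assumption matters: with unequal variances a quadratic term in x would appear in the exponent.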

But, of course, when one looks at typical epidemiological data,
and segregates them into a Y=1 Group and a Y=0 Group, you are
very unlikely (as it appears in practice) to be dealing with a
situation where the distribution of X in each group would appear
to be Normal. So maybe the above relationship is not good comfort
for those who want to feel happy with both the "smoothness" of
the Logistic model and the "naturalness" of the Normal distribution.

So, all that being said, just *what* is the interpretation, as a
mechanism, of the Logistic distribution implied by the Logistic
regression model? Or is it just a tractable approximation to the
Normal distribution (and hence the Logistic model to the Probit
model)? Not forgetting, that in the Bioassay context where these
things were first developing, the Y=1 response rate was typically
fairly well clear of P=0 and fairly well clear of P=1 (e.g. over
a range 0.1 < P < 0.9); in such a range, there is not a lot of
difference between the Logistic model for Prob(Y=1|X=x) and the
Probit model. It is only when you get out into the P<0.05 (or
P>0.95) tail that they begin to differ markedly. But this sort
of prevalence is common in epidemiological studies -- so then
the question "Is the Logistic model adequately accurate for the
true mechanism?" becomes more pressing.
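
A rough numerical comparison of the two curves is sketched below. It assumes the logistic curve is rescaled by the conventional constant 1.702 so that it lines up with the standard Normal distribution function; that constant and the evaluation points are choices made for this illustration, not taken from the discussion above.

```python
import math

def normal_cdf(z):
    """Standard Normal distribution function (the Probit response curve)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z, scale=1.702):
    """Logistic response curve, rescaled to match the Normal in the middle."""
    return 1.0 / (1.0 + math.exp(-scale * z))

# Absolute difference over a wide grid: small everywhere.
grid = [i / 100 for i in range(-600, 601)]
max_abs_gap = max(abs(normal_cdf(z) - logistic_cdf(z)) for z in grid)

# Relative difference: near 1 in the middle, far from 1 in the tail.
mid_ratio = logistic_cdf(-0.5) / normal_cdf(-0.5)
tail_ratio = logistic_cdf(-3.0) / normal_cdf(-3.0)

print(f"max absolute gap      = {max_abs_gap:.4f}")
print(f"ratio at z = -0.5     = {mid_ratio:.3f}")
print(f"ratio at z = -3.0     = {tail_ratio:.3f}")
```

On the absolute scale the curves are nearly indistinguishable, but in the lower tail the logistic probability is several times the Normal one, which is where rare-outcome epidemiological data sit.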

Not that any of this is intended to be a definitive resolution
of the issues [A], [B] and [C]. It is just a case of observing
them float in turn to the surface, as you turn the problem round.

Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 18-Dec-09 Time: 17:47:09
------------------------------ XFMail ------------------------------

Frank Harrell

Dec 19, 2009, 2:43:19 PM

to MedStats

Hi Ted,

One could argue that the logistic model is more tied to the normal
distribution than is the probit model, because Bayes' rule gives you
the logistic model if you start with multivariate normality for X.
You're right that the probit model coefficients are very hard to
interpret.

I would argue that logistic models are very interpretable without
envisioning latent variables. I think that various plots of predicted
probabilities and nomograms to obtain risk differences for a given
covariate setting are some of the best ways to go.

Cheers
Frank
