Does regression prove causation ?


Vlad

Aug 15, 2012, 4:19:51 PM
to meds...@googlegroups.com
Do regression methods prove a causal relationship between the independent variable and the dependent one?

mdim...@gmail.com

Aug 15, 2012, 4:26:52 PM
to meds...@googlegroups.com
Hi Vlad

It doesn't prove causation at all; it only shows an association between the dependent and independent variables. Establishing causation is not that straightforward and goes beyond association. I would urge you to read about Sir Austin Bradford Hill's criteria for causation.

Cheers

Munya

Vlad

Aug 16, 2012, 2:27:13 AM
to meds...@googlegroups.com, mdim...@gmail.com
Thanks Munya,
But I came across a lecture, http://www.nemoursresearch.org/open/StatClass/January2011/Class5.ppt , which contains a statement that confuses me: "Correlation does not assume causality but regression does".

In the particular case I am interested in, it is not possible to prove causation with an experimental study for ethical reasons, so I am looking for another way.

Is it correct to say that if a predictor influences the outcome variable, then it is the cause ?!

Best Regards,
Vlad.

Munyaradzi Dimairo

Aug 16, 2012, 2:43:49 AM
to meds...@googlegroups.com
Yes, correlation doesn't assume causality but regression does. However,
assuming causality isn't the same as proving or establishing
causality. Having a predictor that influences the outcome is not
sufficient on its own to establish causation. There are minimum
criteria needed to establish causality between a predictor and an
outcome, such as biological plausibility, strength of association
(established through regression, say), a dose-response relationship,
etc. Hill's criteria for causation are a good starting point!

I'm not sure about your special case!

All the best

Munya

--
************************************************
Munyaradzi Dimairo
Medical Statistician
Clinical Trials Research Unit
ScHARR (School of Health Related Research)
University of Sheffield
30 Regent Street
Sheffield S1 4DA
Tel: 0114 (0) 22 25204
Mobile: +447531421509
Fax: +44 (0) 114 222 0870
Email: mdim...@gmail.com
m.di...@sheffield.ac.uk
jmdi...@yahoo.co.uk

http://www.shef.ac.uk/ctru/staff/munyadimairo.html

http://www.rds-yh.nihr.ac.uk/about-the-rds/staff/sheffield.aspx

"Statistics are like bikinis. What they reveal is suggestive, but
what they conceal is vital. ~Aaron Levenstein...can be easily
misinterpreted and abused!!"
************************************************

Ted Harding

Aug 16, 2012, 4:00:20 AM
to meds...@googlegroups.com
This had better be stated before things get spread too widely!

Regression does NOT assume causality. Regression is simply about
establishing a relationship between the variations of two (or more)
variables. Causality does not come into it at all. How such a
relationship might be caused is a totally separate question.

There are plenty of well-known examples where values of a variable Y
are (say) greater when the values of a variable X are also greater,
but in terms of what X and Y mean the notion that X might cause Y
(or vice versa) is clearly nonsense. One example (from way back)
is that in the 1930s it was observed that the number of admissions
to lunatic asylums per year increased from year to year, as also
did the number of listeners to BBC Radio. You can regress admissions
on listeners -- but be careful about radio causing madness. The plain
fact there is that, for quite separate reasons, both happened to be
increasing.
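
A minimal sketch of the point above, assuming nothing about the historical figures: the two series below are simulated, with invented trends and noise levels, and the names `listeners` and `admissions` are only labels. Each series rises over time independently of the other, yet their yearly levels correlate strongly; looking at year-to-year changes instead removes the shared trend.

```python
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1930)
years = range(30)

# Two simulated series that each rise over time for unrelated reasons.
# Trends and noise levels are invented for illustration only.
listeners = [100 + 50 * t + rng.gauss(0, 30) for t in years]
admissions = [20 + 3 * t + rng.gauss(0, 3) for t in years]

# Correlation of the raw yearly levels (dominated by the common upward trend).
r_levels = pearson_r(listeners, admissions)

# Year-to-year changes: differencing removes the shared trend.
d_listeners = [b - a for a, b in zip(listeners, listeners[1:])]
d_admissions = [b - a for a, b in zip(admissions, admissions[1:])]
r_changes = pearson_r(d_listeners, d_admissions)

print(f"correlation of yearly levels:  {r_levels:.2f}")
print(f"correlation of yearly changes: {r_changes:.2f}")
```

Neither series was generated from the other, so the strong correlation in levels reflects only that both were increasing.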

Nor is regression (at any rate in the basic linear regression case)
essentially different from correlation: the slope of the regression
of Y on X is simply the correlation coefficient between X and Y,
multiplied by the ratio SD(Y)/SD(X).

The primary difference between regression and correlation is that
correlation is symmetric (the correlation between X and Y is the same
as the correlation between Y and X), while regression (in general)
is not: the regression slope of Y on X is corr(X,Y)*SD(Y)/SD(X),
while the regression slope of X on Y is corr(X,Y)*SD(X)/SD(Y).
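
The identities above can be checked numerically. The following is a sketch on simulated data (the coefficients 1.5, 2 and 3 are arbitrary assumptions), with the least-squares slope and the correlation computed from their textbook definitions rather than taken from any particular library.

```python
import random
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ols_slope(x, y):
    """Least-squares slope of the regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)
x = [rng.gauss(0, 2) for _ in range(500)]
y = [1.5 * a + rng.gauss(0, 3) for a in x]  # arbitrary simulated relationship

r = pearson_r(x, y)
sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)

slope_y_on_x = ols_slope(x, y)  # regression of Y on X
slope_x_on_y = ols_slope(y, x)  # regression of X on Y

print(slope_y_on_x, r * sd_y / sd_x)    # the two values agree
print(slope_x_on_y, r * sd_x / sd_y)    # the two values agree
print(slope_y_on_x * slope_x_on_y, r ** 2)  # product of slopes equals r squared
```

Note that both regressions fit equally "well" (same r); nothing in the arithmetic says which variable, if either, is the cause.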

Of course it is fine to use regression in a context where you believe
that, if causation there be, X causes Y. Then the regression of Y on X
estimates how much variation in Y is caused by a given variation in X.
If, then, you find that there is no evidence for a change in Y for a
given change in X (e.g. very small regression slope, or non-significant
result), then you have not established that a change in X causes a
change in Y.

On the question:
"Is it correct to say that if a predictor influences
the outcome variable, then it is the cause ?!"
one must be careful about the semantics of "influences". In the normal
usage of the word "influence", causality is part of the meaning of the word.
Therefore, if you say "X influences Y", what you *mean* is that a change
in X *causes* a change in Y. That is built into your decision to use the
word "influence". Why you decide to use the word is another question.

Therefore the essential issue with regard to that question is whether
it is correct to use the word "influences". That is not established
by determining that you have, by regression of Y on X, established
a positive relationship. You could just as well regress X on Y, and
get it the other way round.

This essential question of keeping the logic clear about the use of
language tends, nowadays, to get obscured by a *false* habit, an
unthinking reaction, of stating that "We conclude that X influences Y:
in a regression of Y on X the slope was 2.14 (1.96, 2.32, P < 0.001)".
That is only valid if you have good general grounds for supposing that,
by the nature of X and Y, X might be the cause of Y (rather than Y being
the cause of X, or both X and Y being causally influenced by some
unobserved third variable, etc.).

Hoping this helps.
Ted.
-------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@wlandres.net>
Date: 16-Aug-2012 Time: 09:00:17
This message was sent by XFMail
-------------------------------------------------

Vlad

Aug 16, 2012, 4:52:47 AM
to meds...@googlegroups.com, Ted.H...@wlandres.net
Ted, Munya, thanks a lot for the explanation!

Peter Flom

Aug 16, 2012, 4:55:19 AM
to meds...@googlegroups.com
I wouldn't say regression assumes causation, exactly.

Regression investigates whether one variable *depends* on one or more
others. That doesn't exactly mean it is caused by them, although it's close.
Or, it can be only partially caused.

I am not exactly sure how to articulate this, myself. I think causation has
more philosophical connotations than dependence.

Regardless, regression certainly doesn't *prove* causation.

Peter Flom
Peter Flom Consulting
http://www.statisticalanalysisconsulting.com/
http://www.IAmLearningDisabled.com
http://www.linkedin.com/in/peterflom

Abhaya Indrayan

Aug 16, 2012, 5:34:51 AM
to meds...@googlegroups.com
Certainly. Causation has a different connotation. In India, chronic kidney diseases and cardiovascular diseases are increasing simultaneously, but one is not the cause of the other. In regression, though, we luckily have the opportunity to incorporate all the factors that can possibly influence the outcome and study the net effect. However, this is limited to the *known* factors. Unknown factors, or those that are not specified in the model, will not be accounted for. The shape of the regression we choose is a further limitation. I guess all this comes under my favorite "epistemic" zone that we tend to forget.

For causation, criteria such as consistency of association, temporality, dose-response, specificity, statistical significance, and above all biological plausibility can be advocated. By the way, the strength of association is not important. The correlation between child's level of cholesterol and mother's level of cholesterol is weak but that does not rule it out as a cause, although this would be one of the many causes.

~Abhaya Indrayan
-- 
Dr Abhaya Indrayan, PhD(OhioState),FAMS,FRSS,FASc
Medicalizing Biostatistics: http://www.MedicalBiostatistics.com

Vlad

Aug 16, 2012, 6:19:53 AM
to meds...@googlegroups.com
Abhaya, Peter, thank you guys!

How weak can a correlation be and still be considered an acceptable strength of association? E.g. r = 0.34 is also weak; is it acceptable?

John Sorkin

Aug 16, 2012, 6:43:49 AM
to meds...@googlegroups.com
The question posed quickly devolves into the arena of epistemology and metaphysics, but to a first approximation, no statistical test of any kind proves causation. Statistical tests can support a hypothesis of causation, but cannot prove the hypothesis. As noted by other posters, there can be many reasons for an association between x and y that is not causal. Causation can be proven (if in fact it can ever be proven) only by a properly designed experiment.
John

 
John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

Thompson,Paul

Aug 16, 2012, 7:20:42 AM
to meds...@googlegroups.com, mdim...@gmail.com

It is certainly not possible to conclude that a cause is revealed in a regression.

However, if a regression reveals a relationship which is not strong, we are in a good position to state that the variables on one side of the = sign cannot be causal for variables on the other side.

That is, correlation does not imply causation, but lack of correlation does imply lack of direct causation.


John Whittington

Aug 16, 2012, 7:52:25 AM
to meds...@googlegroups.com
At 06:43 16/08/2012 -0400, John Sorkin wrote:
> The question posed quickly devolves into the arena of epistemology and metaphysics, but to a first approximation, no statistical test of any kind proves causation. Statistical tests can support a hypothesis of causation, but cannot prove the hypothesis. As noted by other posters, there can be many reasons for an association between x and y that is not causal. Causation can be proven (if in fact it can ever be proven) only by a properly designed experiment.

I almost agree with all of that, give or take the fact that 'proving' anything (i.e. with 100% certainty) is rarely possible, particularly in biological sciences.

The one point I would make is that, in a world of variability, 'statistical tests' can be of value in helping us to get close to establishing causation.  Probably the most compelling way of establishing probable causation is by demonstrating a 'dose-response' relationship in the absence of (or with consideration of) differences in other factors.  If that dose-response relationship is anything other than 'blindingly obvious', then 'statistical tests' may be of value in putting a probabilistic handle on the likelihood of the dose-response relationship (hence causality) being 'real'.

Kind Regards,


John

----------------------------------------------------------------
Dr John Whittington,       Voice:    +44 (0) 1296 730225
Mediscience Services       Fax:      +44 (0) 1296 738893
Twyford Manor, Twyford,    E-mail:   Joh...@mediscience.co.uk
Buckingham  MK18 4EL, UK            
----------------------------------------------------------------

John Whittington

Aug 16, 2012, 7:55:05 AM
to meds...@googlegroups.com
At 11:20 16/08/2012 +0000, Thompson,Paul wrote:
>That is, correlation does not imply causation, but lack of correlation
>does imply lack of direct causation.

...provided that the situation is not being complicated by
interacting/confounding factors. In the presence of such factors, there
could be an apparent lack of correlation despite a real causal relationship.

Thompson,Paul

Aug 16, 2012, 9:27:36 AM
to meds...@googlegroups.com
Which is exactly why I put in the statement about "direct causation".

John Whittington

Aug 16, 2012, 9:40:53 AM
to meds...@googlegroups.com
At 13:27 16/08/2012 +0000, Thompson,Paul wrote:
>Which is exactly why I put in the statement about "direct causation".

I would have said that even 'direct causation' can be hidden by confounding
factors (i.e. resulting in a lack of correlation).

Kind Regards,
John

Vlad

Aug 16, 2012, 9:47:25 AM
to meds...@googlegroups.com
Thanks all, a very informative topic !

I was googling, and there is a statement on wiki: "In statistics, econometrics, epidemiology and related disciplines, the method of instrumental variables (IV) is used to estimate causal relationships when controlled experiments are not feasible." ( http://en.wikipedia.org/wiki/Instrumental_variable )

Can it be considered as a real solution to prove causation ?

Peter Flom

Aug 16, 2012, 11:20:15 AM
to meds...@googlegroups.com
John W wrote

> >That is, correlation does not imply causation, but lack of
> >correlation does imply lack of direct causation.

There are other possibilities too, like curvilinear relationships. The
relationship between stress and test performance has an inverted U shape:
the correlation can be very low, yet there can be causation.
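
A small sketch of the inverted-U case, using made-up numbers: the `stress` scores and the formula for `performance` are hypothetical, chosen so that performance is completely determined by stress and peaks in the middle of the range. The linear correlation is nonetheless essentially zero.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical stress scores from -10 (very low) to +10 (very high).
stress = list(range(-10, 11))

# Performance depends on stress alone, with its peak at moderate stress.
performance = [100 - s * s for s in stress]

# Linear correlation between stress and performance.
r_linear = pearson_r(stress, performance)

# Correlation with the squared term, which captures the curve.
r_squared_term = pearson_r([s * s for s in stress], performance)

print(f"r(stress, performance)   = {r_linear:.3f}")
print(f"r(stress^2, performance) = {r_squared_term:.3f}")
```

The dependence only shows up once the model allows for curvature, which is one reason a low correlation cannot by itself rule out a causal relationship.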

Peter

John Whittington

Aug 16, 2012, 11:32:16 AM
to meds...@googlegroups.com
To be clear, it was actually Paul T, not me, who wrote that, and I was
challenging its universal correctness - and, indeed, you now mention
another situation in which the 'lack of (strong) correlation' could
co-exist with a strong 'direct' (but non-linear) causal relationship.

Kind Regards,


John

Peter Flom

Aug 16, 2012, 11:33:24 AM
to meds...@googlegroups.com
Ooops..... Sorry Paul and John for mixing up the thread

Peter

Steve Simon, P.Mean Consulting

Aug 16, 2012, 5:06:33 PM
to meds...@googlegroups.com, Vlad
On 8/15/2012 3:19 PM, Vlad wrote:

> Does regression methods prove causal relationship between the
> independent variable and the dependent one ?

To expand on the already good comments, establishing causation typically
requires making untestable assumptions about your data. Are these
untestable assumptions reasonable? They are very reasonable in the case
of randomization. Other approaches like instrumental variables and
propensity score matching are good, but even so, the assumptions that
they require are difficult at times to believe.

It sounds just awful to make untestable assumptions, but we do it all
the time. We assume, for example, that the laws of physics apply equally
well at all points in time. But can we prove that the effect of gravity
is the same today as it was 4.5 billion years ago when our solar system
was just being formed?

We statisticians blithely assume independence, and even if you do a runs
test or other test of autocorrelation, it does not totally remove the
possibility of lack of independence. You often have to take the
independence assumption on faith. It is, for the most part, an
untestable assumption. Most of the time, if you are not dealing with an
infectious disease and you don't recruit patients at that festival of
twins that occurs every year in Twinsburg, Ohio and that one person is
not copying off another person during the exam, etc., etc., this is a
very reasonable assumption.

There's nothing wrong with untestable assumptions as long as you are
honest with yourself about them.

I like the Hill criteria that others have mentioned, but there are
plenty of counterexamples out there. For example, X and Y could have a
dose response relationship but it is totally an artefact. Birth order
has a dose response relationship with Down's syndrome. First born
children have less risk than second born, who have less risk than third
born, etc. But the real cause is mother's age and it just happens that
birth order and mother's age are closely related. It's pretty hard to be
the seventh child of a twenty year old mother and a lot easier to be the
seventh child of a forty year old mother.
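
The birth-order artefact can be reproduced in a toy simulation. Everything numeric here is an assumption made for illustration: the two age groups, the birth-order probabilities in `ORDER_PROBS` and the risks in `RISK_BY_AGE` are invented (and far larger than any real rates) so that the pattern is visible. Risk is generated from the mother's age group only, never from birth order.

```python
import random
from collections import defaultdict

rng = random.Random(42)

# Hypothetical inputs, NOT real epidemiological rates.
# Probability of being 1st, 2nd, or 3rd-or-later born, by mother's age group.
ORDER_PROBS = {"younger": (0.70, 0.25, 0.05), "older": (0.20, 0.30, 0.50)}
# Risk of the outcome depends on mother's age group only.
RISK_BY_AGE = {"younger": 0.05, "older": 0.20}

def draw_birth_order(age_group):
    """Draw a birth order (1, 2, or 3 meaning third or later)."""
    p1, p2, _ = ORDER_PROBS[age_group]
    u = rng.random()
    if u < p1:
        return 1
    if u < p1 + p2:
        return 2
    return 3

# (age_group, birth_order) -> [number of cases, number of births]
counts = defaultdict(lambda: [0, 0])
for _ in range(200_000):
    age_group = "older" if rng.random() < 0.5 else "younger"
    order = draw_birth_order(age_group)
    is_case = rng.random() < RISK_BY_AGE[age_group]  # birth order plays no role
    cell = counts[(age_group, order)]
    cell[0] += is_case
    cell[1] += 1

def risk(cells):
    """Pooled risk over a list of [cases, births] cells."""
    return sum(c for c, _ in cells) / sum(t for _, t in cells)

# Crude risk by birth order, ignoring mother's age.
crude = {k: risk([counts[(a, k)] for a in ORDER_PROBS]) for k in (1, 2, 3)}
# Risk by birth order within each age group.
stratified = {a: {k: risk([counts[(a, k)]]) for k in (1, 2, 3)}
              for a in ORDER_PROBS}

print("crude risk by birth order:", crude)
for a, risks in stratified.items():
    print(f"within {a} mothers:", risks)
```

The crude risks rise with birth order, which looks like a dose-response relationship, while within each age group the risk is roughly flat across birth orders.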

The Hill criteria have a cumulative impact in that if more of them are
satisfied, it is harder to envision how this statistical relationship
could have been produced artefactually. But the Hill criteria are never
sufficient. They get you closer to establishing causality, but there
will always be some residual doubt.

All in all, if you want to establish causation, especially in a
non-randomized study, you have to make qualitative arguments that are
independent of your data itself. There is no formal statistical test
that can establish causation.

If you want to understand this better, think of it as a missing data
problem. The people in the treatment group are missing data on what
their outcome would have been if they had been given the placebo.
Likewise the people in the placebo group are missing data on what their
outcome would have been if they had been given the treatment. In a
randomized study, you have what is equivalent to the missing completely
at random (MCAR) case. It's pretty easy to impute values in the MCAR
model. Missing data in an observational study is, at best, missing at
random (MAR). You can impute values in the MAR case, but it takes a lot
more work, and you are never quite sure if you have MAR or the dreaded
missing not at random (MNAR) case. To distinguish between MAR and MNAR
requires that you make untestable assumptions about your data that are
not too much different than the untestable assumptions that you have to
make about causation.
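
The missing-data view can be sketched with a simulation in which, unlike in real life, both potential outcomes are known to the simulator. All numbers are assumptions: the constant `TRUE_EFFECT`, the `severity` variable and the assignment probabilities are hypothetical. Each simulated person reveals only one outcome; the comparison is between random assignment and assignment that depends on severity.

```python
import random

rng = random.Random(7)
TRUE_EFFECT = 2.0  # assumed constant treatment effect (hypothetical)

def simulate(assign_prob, n=50_000):
    """Return the naive treated-minus-control difference in observed means.

    assign_prob maps a person's severity to their probability of treatment.
    """
    treated, control = [], []
    for _ in range(n):
        severity = rng.gauss(0, 1)
        y0 = 10 - 3 * severity + rng.gauss(0, 1)  # outcome under placebo
        y1 = y0 + TRUE_EFFECT                     # outcome under treatment
        # Only one of the two potential outcomes is ever observed.
        if rng.random() < assign_prob(severity):
            treated.append(y1)   # y0 is the missing value
        else:
            control.append(y0)   # y1 is the missing value
    return sum(treated) / len(treated) - sum(control) / len(control)

# Randomized study: assignment ignores severity.
randomized = simulate(lambda s: 0.5)

# Observational study: more severe cases are more likely to be treated.
confounded = simulate(lambda s: 0.8 if s > 0 else 0.2)

print(f"true effect:              {TRUE_EFFECT:.2f}")
print(f"randomized estimate:      {randomized:.2f}")
print(f"observational estimate:   {confounded:.2f}")
```

Under randomization the simple difference in means lands close to the true effect, because the missing outcomes are missing for reasons unrelated to the outcome. When assignment depends on severity, the same calculation is badly off, here to the point of making a beneficial treatment look harmful, unless severity is measured and adjusted for.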

If you can stand a philosophical approach to causation, read up on
counterfactual statements. To imply causation is to make a
counterfactual statement. I've never been comfortable with a
philosophical discourse on causation, but that's more my limitation than
a statement of the validity of the philosophical approach.

Steve Simon, n...@pmean.com, Standard Disclaimer.
Sign up for the Monthly Mean, the newsletter that
dares to call itself average at www.pmean.com/news

Dora Smith

Aug 16, 2012, 5:22:41 PM
to meds...@googlegroups.com
No.  Regression does a better job with causation than a correlation coefficient as it gives some kind of direction of an effect, but it may not be foolproof.   One problem is that something else that is associated with both variables could be the actual cause.   Kind of like proving that being Black tends to make people poor.   

With that said, I most often see this question raised in the context-free way you've put it, when a relationship genuinely exists and somebody doesn't want to believe it, or hopes other people won't believe it.  

Yours,
Dora