independent sample t-test or paired t-test?

shima younespour

Aug 4, 2011, 4:22:36 AM

to meds...@googlegroups.com

Dear all,

Thirty patients with psoriasis and 30 age- and sex-matched healthy
control subjects were recruited in the study. What statistical analysis
should I use: an independent-samples t-test or a paired t-test?
Should I use a matched analysis (paired t-test or Wilcoxon signed-rank
test) just because of the matching on age and sex?

Thank you in advance for any thoughts or comments.
Shima Younespour.

Karl Schlag

Aug 4, 2011, 5:14:04 AM

to meds...@googlegroups.com

As you matched healthy control subjects based on age and sex to the sick
ones you do not have independent samples. So you have matched pairs and
have to use a test for this case.

t test or nonparametric test? If you are willing to assume that your
data is normally distributed or that 30 is a sufficiently large sample
then the paired t test would do and you would be able to detect
differences in the mean.

The Wilcoxon signed-rank test can help to identify whether sick and
healthy patients are drawn from the same or from different distributions.
It can also help identify differences in the location (mean) if you assume
that sick and healthy patients differ only in the location of the
underlying distribution. However, the Wilcoxon test is not a general
nonparametric test for comparing means.

Karl

--
---------------------------------------------------------------------
Karl Schlag
Professor Tel: +43-1-4277-374-37
Department of Economics Fax: +43-1-4277-9374
University of Vienna email: karl....@univie.ac.at
Hohenstaufengasse 9 Room: 505
1010 Vienna, Austria

http://homepage.univie.ac.at/karl.schlag/

David Wooff

Aug 4, 2011, 5:31:54 AM

to meds...@googlegroups.com

I'm always a little uncomfortable using a paired test in this situation
(even though I teach its use to 1st year undergraduates as part of the
received canon) because I have problems with the idea that such matching
removes all the variation it's intended to remove. It's like pretending
that one has access to a healthy clone of the sick patient. Thus I'd
expect background variation from the paired test to understate actual
variation. The independence assumption, on the other hand, would tend to
overstate variation because you ignore your attempts at matching and
these may be relevant. I guess what I would do first is look at
controlling for age and sex as covariates and use ANCOVA. I'd also
compute an effect size to help assess practical significance.
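
[Illustration] The post does not say which effect size is meant. A minimal sketch, assuming Cohen's d (the difference in means divided by the pooled standard deviation) is an acceptable choice; the function name and the numbers are made up for illustration and are not data from the study:

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical outcome values, NOT taken from the study.
patients = [5.9, 6.4, 5.1, 7.2, 6.0, 5.6]
controls = [5.2, 5.8, 4.9, 6.1, 5.5, 5.0]
print(round(cohens_d(patients, controls), 2))
```

This version ignores the matching entirely; an effect size based on the ANCOVA-adjusted difference would need the residual SD from that model instead.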

David

--
Dr David Wooff,
Director, Statistics and Mathematics Consultancy Unit,
& Senior Lecturer in Statistics, University of Durham.
Department of Mathematical Sciences, Science Laboratories,
South Road, Durham, DH1 3LE, UK. email: d.a....@dur.ac.uk
Tel. 0191 334 3121, Fax 0191 334 3051.
Web: http://maths.dur.ac.uk/stats/people/daw/daw.html

Ted Harding

Aug 4, 2011, 5:55:44 AM

to meds...@googlegroups.com

There is one feature of this question (as posed) which has not
been discussed regarding "matching" versus "pairing".

Say your 30 patients with Psoriasis have the following numbers
by Age and Sex (admittedly an extreme example, but it makes
the point):

Males:
Age 30: 3; Age 31: 3; Age 32: 3; Age 33: 3; Age 34: 3

Females:
Age 25: 3; Age 26: 3; Age 27: 3; Age 28: 3; Age 29: 3

You now recruit 30 controls "matched for age and sex", so with the
same numbers as above.

If you propose to do a paired t-test, say, then how will you
choose what to pair with what? There are 3 Males aged 30 in the
Psoriasis Group, and 3 in the Control Group. Then there are 6
ways to pair one from Psoriasis with one from Control:

(1,1), (2,2), (3,3)
(1,2), (2,3), (3,1)
(1,3), (2,1), (3,2)
(1,1), (2,3), (3,2)
(1,3), (2,2), (3,1)
(1,2), (2,1), (3,3)

And that is 6 ways just for the 30-year-olds. There are 6 for each
of the other 4 Ages, so 6^5 = 7776 in all! Similarly for the Females.
Hence 7776*7776 = 60466176 in all.
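
[Illustration] The arithmetic above can be checked with a few lines of Python. This is only a sketch of the counting argument; the helper name count_pairings is made up here:

```python
from math import factorial


def count_pairings(cell_sizes):
    """Number of ways to pair patients with controls within each age/sex cell.

    A cell with k patients and k controls admits k! one-to-one pairings,
    and the cells are paired independently, so the counts multiply.
    """
    total = 1
    for k in cell_sizes:
        total *= factorial(k)
    return total


one_cell = count_pairings([3])        # one age/sex cell with 3 + 3 subjects
one_sex = count_pairings([3] * 5)     # five ages for one sex
both_sexes = count_pairings([3] * 10) # five ages for each of two sexes
print(one_cell, one_sex, both_sexes)  # 6 7776 60466176
```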

Now each of these 60466176 possibilities can potentially give a different
set of differences (Psoriasis - Control), depending on how you decided
(presumably arbitrarily, even randomly) to do the pairing. So the
result of your paired t-test will also depend on this.
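
[Illustration] A small sketch of this dependence, using made-up values for one cell of 3 patients and 3 controls of the same age and sex. The mean difference is the same for every pairing, but the standard deviation of the differences, and hence the paired t statistic, is not:

```python
from itertools import permutations
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic: mean difference over its standard error."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / sqrt(len(d)))


# Hypothetical values, NOT data from the study.
patients = [6.1, 5.4, 7.0]
controls = [5.0, 5.9, 5.2]

mean_diffs = []
t_values = []
for ordering in permutations(controls):  # the 6 possible pairings
    diffs = [p - c for p, c in zip(patients, ordering)]
    mean_diffs.append(mean(diffs))
    t_values.append(paired_t(patients, ordering))

for ordering, t in zip(permutations(controls), t_values):
    print(ordering, round(t, 2))
```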

Now, while that is an extreme case, if you have 15 Males (or 15 Females)
with Psoriasis, there is a non-negligible chance that there will be
one or more coincidences of Age amongst them, and the problem would
then also arise (though to a lesser extent).
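
[Illustration] How large that chance is depends on the age distribution, which the original query does not give. A rough sketch, assuming (purely for illustration) that each subject's age in whole years is equally likely to be any of n_ages values, which turns this into the classic birthday problem:

```python
def prob_shared_age(n_people, n_ages):
    """P(at least two of n_people share an age), ages uniform over n_ages values."""
    if n_people > n_ages:
        return 1.0  # pigeonhole: a coincidence is certain
    p_all_distinct = 1.0
    for i in range(n_people):
        p_all_distinct *= (n_ages - i) / n_ages
    return 1.0 - p_all_distinct


# 15 subjects of one sex; the age ranges tried here are assumptions.
for n_ages in (20, 40, 60):
    print(n_ages, round(prob_shared_age(15, n_ages), 3))
```

Real age distributions are not uniform, which makes coincidences more likely rather than less.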

This possibility is not covered in the original query. Of course, if all
the Males have different Ages, and all the Females have different ages,
then it does not arise at all. But I don't know that!

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.h...@wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 04-Aug-11 Time: 10:55:40
------------------------------ XFMail ------------------------------

Peter Flom

Aug 4, 2011, 5:58:14 AM

to meds...@googlegroups.com

shima younespour wrote

<<<
Thirty patients with psoriasis and 30 age and sex-matched healthy control
subjects were recruited in the study. What statistical analysis should I use?
independent sample t-test or paired t-test?
Should I use matched analysis (paired t-test or Wilcoxon signed-rank
test) just because of matching based on age and sex?
>>>

You've gotten several good answers already, comparing paired to independent
tests. But I'd like to step back and ask "Why t-tests of any sort?"

Clearly you have some continuous DV; I'd like to know what it is and how
it's distributed.

But what else? Did you assess these patients only once, or multiple times?
And what other information have you got? You matched on age and sex, but are
there no other IVs?

Peter

BXC (Bendix Carstensen)

Aug 4, 2011, 6:10:29 AM

to meds...@googlegroups.com

As always it all seems to boil down to:

Set up a sensible model and report estimates from this.

If you match on sex and age and are willing to assume that these variables have some simple effect on your outcome, then a linear model with these two variables is the answer.

If your two sets are matched closely on age and you insist on modelling age meticulously, with say one parameter per distinct age, then you end up using one parameter per matched pair. Which is the paired t-test.

So the moral is as always:

1) Computers were invented 20+ years ago.
2) ANOVA, t-tests etc. are relics of the days before computers, and are all special cases of what you get from a linear model.
3) Never use ANOVA or t-tests.
4) Always use a proper model to report your prespecified effect estimates.

The disadvantage of course being that this makes your basic assumptions more explicit than just stuffing your data into a computer implementation of an ancient hand-method.

The two-sample and the paired t-test are just the extremes of linear models, which either ignore confounding or over-model it.

Best regards,
Bendix

shima younespour

Aug 4, 2011, 6:52:21 AM

to meds...@googlegroups.com

Dear Peter,

I have two groups, patients and controls, that are matched
individually based only on the age and sex of participants. An FPG test
is done for both groups and I want to know if there is any statistically
significant difference between the mean FPG levels of the two groups.
But unfortunately, there is a controversy over applying matched or
unmatched analysis. I'd like to know which method is preferred.

kind regards,
Shima Younespour.

BXC (Bendix Carstensen)

Aug 4, 2011, 7:01:05 AM

to meds...@googlegroups.com

Dear Shima,
in this case the matching hardly warrants a matched analysis, that is, a paired t-test or one parameter per pair.

You would be perfectly OK with a linear model with a group indicator, age and sex. You will most likely find that the estimate of the patient effect is pretty much the same as the one you get from either of the extreme analyses (i.e. a model with ONLY a group effect, or a model with a group effect and one parameter per matched pair).

In any case you will want to report the average difference in FPG with a confidence interval.

Best regards,

Bendix Carstensen
Senior Statistician
Steno Diabetes Center A/S
Niels Steensens Vej 2-4
DK-2820 Gentofte
Denmark
+45 44 43 87 38 (direct)
+45 30 75 87 38 (mobile)
b...@steno.dk www.biostat.ku.dk/~bxc
www.steno.dk

shima younespour

Aug 4, 2011, 7:30:35 AM

to meds...@googlegroups.com

Dear Bendix,

Unfortunately, I did not clearly understand. Do you mean I should
apply a linear model to evaluate the effect of group status, age and
sex on my response variable?
In this study, the groups are matched based on age and sex. I think in
this situation, the effect of matching factors can no longer be
studied.
Also, what do you mean by "one parameter per matched pair"?

regards,
Shima.

BXC (Bendix Carstensen)

Aug 4, 2011, 8:00:37 AM

to meds...@googlegroups.com

You are not quite right about the matching; you CAN estimate the
effect of age and sex. I suppose that you are confusing your situation
with matching in case-control studies, where you match the outcome
variable, but you do nothing of the sort here.
Actually stratified sampling would probably be a better term for what
you did.

Now, suppose you have the following 5 variables in your data set:

- FPG (the outcome of interest, response variable)
- sex
- age
- set (the matched sets within which age and sex are the same)
- grp (the grouping, being either ptt or ctr)

so the first few lines of your data may look something like:

 FPG  sex  age  set  grp
 4.1  M     37    1  ptt
 7.8  M     37    1  ctr
 5.9  F     53    2  ptt
12.3  F     53    2  ctr
...

The model corresponding to the paired t-test would in R be:

lm( FPG ~ factor(grp) + factor(set) )

The model corresponding to the two-sample t-test is:

lm( FPG ~ factor(grp) )

In both cases the t-test would be the Wald test (estimate/s.e.) for the grp-parameter.
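
[Illustration] The equivalence for the paired case can be checked numerically. A minimal sketch in plain Python, not the R code above: the grp coefficient of the model with one parameter per set is obtained by centring FPG and the group indicator within each set, and its Wald t is compared with the textbook paired t statistic. The first two pairs are the example rows above; the others are made up:

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(ptt, ctr):
    """Textbook paired t statistic on the within-set differences."""
    d = [p - c for p, c in zip(ptt, ctr)]
    return mean(d) / (stdev(d) / sqrt(len(d)))


def wald_t_set_model(ptt, ctr):
    """Wald t for grp in the model FPG ~ grp + set (least squares).

    Centring FPG and the indicator (ptt = 1, ctr = 0) within each set
    removes the set parameters and leaves a one-variable regression.
    """
    n = len(ptt)
    y_c, g_c = [], []
    for p, c in zip(ptt, ctr):
        m = (p + c) / 2
        y_c += [p - m, c - m]
        g_c += [0.5, -0.5]
    sxx = sum(g * g for g in g_c)
    beta = sum(g * y for g, y in zip(g_c, y_c)) / sxx
    rss = sum((y - beta * g) ** 2 for g, y in zip(g_c, y_c))
    df = 2 * n - (n + 1)  # 2n observations, n set parameters + 1 grp parameter
    se = sqrt(rss / df / sxx)
    return beta / se


ptt = [4.1, 5.9, 6.3, 5.0, 7.1]   # first two from the example rows, rest made up
ctr = [7.8, 12.3, 6.0, 5.4, 6.2]  # first two from the example rows, rest made up
print(paired_t(ptt, ctr), wald_t_set_model(ptt, ctr))
```

The two printed numbers agree up to rounding error, which is the sense in which the paired t-test is the linear model with one parameter per matched set.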

What I suggest as scientifically more sound is:

lm( FPG ~ factor(grp) + factor(sex) + age )

or even (a separate linear relation for males and females):

lm( FPG ~ factor(grp) + factor(sex)*age )

which means that you estimate effects of sex and age and on top of
that the difference between the patients and the controls.

But my contention is that the resulting effect and s.e. for grp will
be very close across these models.

I guess the models in Stata would be:

xi: regress FPG i.grp i.set
xi: regress FPG i.grp
xi: regress FPG i.grp i.sex age
xi: regress FPG i.grp i.sex age i.sex*age

(correct me please, MEDSTAT)

and in SAS:

proc glm ; class grp set sex ;
model FPG = grp set ; run ;

proc glm ; class grp set sex ;
model FPG = grp ; run ;

proc glm ; class grp set sex ;
model FPG = grp sex age ; run ;

proc glm ; class grp set sex ;
model FPG = grp sex age sex*age ; run ;

Best regards,
Bendix

peyman Jafari

Aug 4, 2011, 9:48:46 AM

to meds...@googlegroups.com

Hi Shima
You should use independent t-test.
Peyman

shima younespour

Aug 4, 2011, 10:55:46 AM

to meds...@googlegroups.com

Dear Bendix,

I am really confused. I would like to know what your opinion is about my study design.

I think my study is a case-control study, because I have two groups of case and control subjects and the FPG level is measured in these groups. Each subject in the case group is individually matched with a subject in the control group, based on age and sex (which may be considered as potential confounding variables).

Also, I don't clearly understand what you said about "matching outcome variable in case-control studies".

Based on STROBE statement (Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): Explanation and Elaboration):

"In case-control studies matching is done to increase a study's efficiency by ensuring similarity in the distribution of variables between cases and controls, in particular the distribution of potential confounding variables. "

regards,

Shima Younespour

BXC (Bendix Carstensen)

Aug 4, 2011, 11:19:50 AM

to meds...@googlegroups.com

Your study is NOT a case-control study.

A case-control study is one where you sample based on the OUTCOME.

In order to find out how reproductive history affects ovarian cancer, say, we take all cases of ovarian cancer in a given period.
And a sample of non-cases, normally about the same number.
Then your sample has the same number of cases and non-cases. So your outcome is fixed in advance, namely an equal number of cases and non-cases. What you do is to compare the COVARIATES between cases and non-cases. Say if it turns out that cases have more children than non-cases you would suspect childbearing to be a risk factor for ovarian cancer.

Your study is entirely different. You compare the outcome FPG between two groups of people, namely patients and non-patients. There is no a priori fixed distribution of the response variable (FPG) as there is in the case-control study described above (ovarian cancer).
Your subjects are included in equal proportion according to their COVARIATE value (namely patient/non-patient status). It is merely a linguistic confusion that you happened to call one group cases and the other controls. The terminology does not make it a case-control study.

You wisely made sure that the age-range was the same among your patients and non-patients, so that the dependency of FPG on age in the two groups can be adequately assessed in the same age-range. If the two groups had different age-ranges you would not know whether the difference between groups was attributable to patient status or to the age difference. And that is basically all there is to it.

So your design is sound and reasonable, but it is not a case-control study.
Your study design is not what is referred to in the STROBE statement.
You analyse your outcome using a normal linear model.

A case-control study as described above is analysed by logistic regression for a binary outcome.

Best regards,
Bendix Carstensen

Doug Altman

Aug 4, 2011, 11:30:32 AM

to meds...@googlegroups.com

I agree with Bendix. The mislabelling of studies as case-control studies is extremely common and the labels "cases" and "controls" often cause confusion. Your study is not a study of cases and controls.

Grimes reviewed 124 published reports labeled as case-control studies - 30% were mislabelled. His abstract's conclusion is:

"Retrospective cohort studies are often mislabeled as "case-control" studies. This misleads readers as to what was done. Researchers need better training in methods and terminology, and editors and reviewers should scrutinize more carefully manuscripts claiming to be "case-control" studies."
(Obstet Gynecol 2009;114:1284-6)

In fact mislabelling also applies to prospective cohort studies.

STROBE also discusses reporting of cohort studies, which is what I think your study was. One can also use matching in cohort studies.

Doug

MaxJasper

Aug 4, 2011, 12:20:18 PM

to meds...@googlegroups.com

You can easily analyze matched case-control data in SPSS using Cox Survival analysis for the following situations:

Cases=1 Controls =1

Cases=1 Controls=n

But not:

Cases=n Controls=m

To see how easily you can perform this task, see:

Marija J Norusis: SPSS [15,16,17,18,19] Advanced Statistical Procedures Companion

Good luck,

Max.

shima younespour

Aug 7, 2011, 3:11:36 AM

to meds...@googlegroups.com

Dear MedStat,
Thanks to all, especially Bendix, for the follow-up comments.

I am confused about the type of this study. Maybe, I should explain
the situation better.
This study is performed to see if psoriasis is a risk factor for
insulin resistance. For this purpose, the metabolic state of thirty
chronic plaque type psoriatic patients in comparison with control
group is evaluated. Patient and control subjects are matched in their
age and sex. The criteria of insulin resistance (BMI, Systolic Blood
Pressure, Fasting Plasma Glucose, Oral Glucose Tolerance Test, Serum
Insulin and Lipid Profile) are measured for each participant.
Statistical analyses are performed to see if the two groups are
significantly different in the above factors.

I have read in books that the type of study is cohort when the
subjects, exposed and unexposed to the risk factor, are followed
forward in time to determine if one or more new outcomes (diseases)
occur. But in this study, some factors are checked and the
participants are not followed in time.
Now, I'd like to know what is the type of this study.

Best regards,
Shima Younespour