Comparing a sample mean to a normative data set mean


JG28

Oct 10, 2010, 8:03:39 PM10/10/10
to MedStats
Hi there,

I'd appreciate any advice about the following issue:

I'm analysing some data on 30 individuals all with a medical
condition. I have scores on a test that they performed. I also have a
large age and sex matched normative data set with about 2000 'healthy'
individuals with scores for the same test.

I would like to show that people with the medical condition perform
worse on the test than would be expected. Can I legitimately conduct a
test for a difference in mean scores between the 2 groups? Should I
use a nonparametric test (Wilcoxon rank-sum test) because of the huge
discrepancy in sample sizes? Is it silly to calculate the effect size
change (Cohen's d) using a pooled estimate of the standard deviation
in the scores, i.e. Cohen's d = mean difference/pooled SD?

Thanks in advance for help with this problem, or advice/references in
general about comparing samples/individuals to normative data.

Kind Regards,
Joanne

Steve Simon, P.Mean Consulting

Oct 10, 2010, 11:40:49 PM10/10/10
to meds...@googlegroups.com, JG28
On 10/10/2010 7:03 PM, JG28 wrote:

> I'm analysing some data on 30 individuals all with a medical
> condition. I have scores on a test that they performed. I also have
> a large age and sex matched normative data set with about 2000
> 'healthy' individuals with scores for the same test.
>
> I would like to show that people with the medical condition perform
> worse on the test than would be expected. Can I legitimately conduct
> a test for a difference in mean scores between the 2 groups?

This is a common concern that people raise, but it is a false concern
for the most part. I've written about this at my old website:
* http://www.childrensmercy.org/stats/weblog2004/UnequalSampleSizes.asp
* http://www.childrensmercy.org/stats/ask/unequal.asp

You should be careful anytime you have unequal sample sizes in the two
groups, but there is nothing that disallows an analysis when the sample
sizes are unequal, even grossly unequal. If you have unequal variances
in the two groups, then the unequal sample sizes will exacerbate that
problem. But if everything else is fine, the only difficulty caused by
unequal sample sizes is that you have to use slightly more complicated
formulas.
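As a sketch of the "slightly more complicated formulas," here is the pooled-variance two-sample t statistic computed by hand and checked against scipy. The scores are simulated; the means, SDs, and group sizes are hypothetical stand-ins for the 30 patients and ~2000 normative controls.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
patients = rng.normal(loc=45, scale=10, size=30)    # hypothetical patient scores
controls = rng.normal(loc=50, scale=10, size=2000)  # hypothetical normative scores

# The pooled-variance formula weights each group's variance by its
# degrees of freedom, so unequal sample sizes are handled automatically.
n1, n2 = len(patients), len(controls)
sp2 = ((n1 - 1) * patients.var(ddof=1) + (n2 - 1) * controls.var(ddof=1)) / (n1 + n2 - 2)
t = (patients.mean() - controls.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

# scipy's equal-variance two-sample t-test uses the same pooled formula.
t_scipy, p = stats.ttest_ind(patients, controls, equal_var=True)
```

Nothing in the formula breaks when n1 = 30 and n2 = 2000; the larger group simply contributes more weight to the pooled variance.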

> Should I use a nonparametric test (Wilcoxon rank-sum test) because
> of the huge discrepancy in sample sizes?

The unequal sample sizes, by itself, is no reason to use a
non-parametric test. Furthermore, the most troublesome potential
violation of assumptions, unequal variances, would cause just as many
problems with a rank-based nonparametric test. Nonparametric is
synonymous with fewer assumptions, but even nonparametric tests are not
assumption-free.

> Is it silly to calculate the effect size change (Cohen's d) using
> a pooled estimate of the standard deviation in the scores i.e.
> Cohen's d = mean difference/pooled SD?

I would argue that it is always silly to calculate the effect size
change, but there is nothing inherent in unequal sample sizes that makes
an effect size calculation more problematic.

Bottom line: don't let unequal sample sizes ruin your day.
--
Steve Simon, Standard Disclaimer
Sign up for The Monthly Mean, the newsletter that
dares to call itself "average" at www.pmean.com/news

Karl Schlag

Oct 11, 2010, 4:44:42 AM10/11/10
to meds...@googlegroups.com
Hi Joanne, which test you use really depends on your taste.

You can use the *Wilcoxon rank-sum test*, but you would have to assume
that the two underlying processes generating the data differ only by
their location and by nothing else, so in particular that they have
exactly the same variance. A classic assumption, but hard to swallow
for some.

You can use *my test* for comparing means (see my homepage at UPF; I
can help).
->>> It is free of assumptions provided your scores come from a
bounded scale.
The only disadvantage is that it has so far appeared only as a working paper.

You can also use the *two sample t test* and assume that the two
variances are the same and that the samples are sufficiently large.
There is nothing to prevent you from assuming that 30 is sufficiently
large; there is no way of testing how large is large enough.
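Both the rank-sum test and the two-sample t-test are one-liners in scipy. A minimal sketch with simulated scores (all numbers are hypothetical, not from the thread's data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
patients = rng.normal(42, 12, 30)    # hypothetical patient scores
controls = rng.normal(50, 10, 2000)  # hypothetical normative scores

# Equal-variance two-sample t-test: assumes both groups share one variance
t_stat, t_p = stats.ttest_ind(patients, controls, equal_var=True)

# Wilcoxon rank-sum / Mann-Whitney U test: as a test of location it
# assumes the two distributions differ only by a shift
u_stat, u_p = stats.mannwhitneyu(patients, controls, alternative='two-sided')
```

Which p-value you report then rests on which of the two sets of assumptions you are willing to swallow.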

Karl
-- 
---------------------------------------------------------------------
Karl Schlag  
Professor 			Tel:  +43-1-4277-374-37 
Department of Economics		Fax:  +43-1-4277-9374 
University of Vienna		email: karl....@univie.ac.at
Hohenstaufengasse 9		Room: 505
1010 Vienna, Austria		

http://homepage.univie.ac.at/karl.schlag/

Munyaradzi Dimairo

Oct 11, 2010, 5:23:18 AM10/11/10
to meds...@googlegroups.com
In addition to Karl's points: I think you can still use the two sample t-test even if the variances are unequal, but you then need to use Satterthwaite's or Welch's approximation formula to approximate the degrees of freedom of the test. Note this is an approximation to the "true" df!
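A sketch of the Welch/Satterthwaite approach with simulated, hypothetical scores: scipy's unequal-variance t-test uses the Welch-Satterthwaite df, which we can also compute by hand.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
patients = rng.normal(44, 15, 30)    # hypothetical: small group, larger spread
controls = rng.normal(50, 10, 2000)  # hypothetical normative scores

# Welch's t-test: the variances are NOT pooled
t_w, p_w = stats.ttest_ind(patients, controls, equal_var=False)

# Welch-Satterthwaite approximation to the degrees of freedom
v1 = patients.var(ddof=1) / len(patients)
v2 = controls.var(ddof=1) / len(controls)
df = (v1 + v2) ** 2 / (v1 ** 2 / (len(patients) - 1)
                       + v2 ** 2 / (len(controls) - 1))
# df always lands between min(n1, n2) - 1 and n1 + n2 - 2
```

With a 30-vs-2000 split and the smaller group having the larger variance, the approximate df tends to sit near n1 - 1 rather than near the pooled n1 + n2 - 2.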

BW

Munya

--
To post a new thread to MedStats, send email to MedS...@googlegroups.com .
MedStats' home page is http://groups.google.com/group/MedStats .
Rules: http://groups.google.com/group/MedStats/web/medstats-rules


Karl Schlag

Oct 11, 2010, 6:03:07 AM10/11/10
to meds...@googlegroups.com
Hi Munya

But you still have to believe - using the two sample t test with the corrections for unequal variances - that the samples are sufficiently large, and there will be no formula for how large they have to be. And for any sample size you will only know that the p values are approximately correct; there will be no formula that tells us what "approximate" means. In particular, they could still be off by 0.1.

Sorry about this bad news.

Greetings, Karl


Greg Snow

Oct 11, 2010, 11:30:13 AM10/11/10
to meds...@googlegroups.com
A few more questions to throw into the mix:

Have you plotted the data and compared the 2 groups that way? (I have heard this referred to as the interocular concussion test).

Would you be interested in differences in the variances? We often treat the variance as a nuisance parameter that we have to deal with, but in many cases (and this looks like one of them to me) a difference in variance between 2 groups would be of interest whether the means also differ or not.

Are you able/willing to explain what the Wilcoxon test is doing? If you want a non-parametric test with a simpler interpretation then you might want to consider a permutation test (the Wilcoxon is one particular case of a permutation test).
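A minimal permutation test on the difference in means, using only numpy (the scores are simulated and the numbers hypothetical): under the null that both groups come from the same distribution, group labels are exchangeable, so we repeatedly relabel the pooled scores at random.

```python
import numpy as np

rng = np.random.default_rng(3)
patients = rng.normal(45, 10, 30)    # hypothetical patient scores
controls = rng.normal(50, 10, 2000)  # hypothetical normative scores

observed = patients.mean() - controls.mean()
combined = np.concatenate([patients, controls])

# Shuffle the pooled scores and recompute the mean difference; count
# how often the permuted difference is at least as extreme as observed.
n_perm = 2000
n_extreme = 0
for _ in range(n_perm):
    rng.shuffle(combined)
    diff = combined[:30].mean() - combined[30:].mean()
    if abs(diff) >= abs(observed):
        n_extreme += 1
p_value = (n_extreme + 1) / (n_perm + 1)  # add-one rule avoids p = 0
```

The interpretation is direct: the p-value is simply the fraction of random relabellings that produce a mean difference as extreme as the one observed.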

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg...@imail.org
801.408.8111

Wei

Oct 11, 2010, 4:36:29 PM10/11/10
to MedStats
I would ask whether we can ignore the variability of the estimate from
the large dataset and use a one sample t test, because even if it is
included (as in a 2 sample t test), it would contribute very little to
the overall SE of the difference estimate.
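This can be checked numerically. A sketch with simulated, hypothetical scores: compare the SE of the difference when the normative mean is treated as estimated versus when it is treated as a known constant.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
patients = rng.normal(45, 10, 30)    # hypothetical patient scores
controls = rng.normal(50, 10, 2000)  # hypothetical normative scores

# SE of the mean difference when the normative mean is itself estimated
se_two = np.sqrt(patients.var(ddof=1) / 30 + controls.var(ddof=1) / 2000)
# SE when the normative mean is treated as a fixed, known constant
se_one = np.sqrt(patients.var(ddof=1) / 30)

# One-sample t-test against the normative mean as if it were known
t_one, p_one = stats.ttest_1samp(patients, popmean=controls.mean())
```

With n = 2000 in the normative group, the second variance term is smaller by a factor of roughly 2000/30, so the two SEs differ by well under a few percent.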

On the other hand, in the case-control setting, if you collect matched
samples, the testing procedure has to reflect that sampling scheme. I
am not sure how this applies to quantitative endpoints, but in any
case the overall normative sample is no longer strictly a random
sample once you use matching.

thanks
wei

mdim...@gmail.com

Oct 12, 2010, 4:05:58 AM10/12/10
to meds...@googlegroups.com
Hi Karl

I tried to emphasise that this is an approximation and that it is another available option! Myself, I would prefer a permutation test to an asymptotics-based test in this scenario, but in my limited experience the two sample t-test in small samples, after df adjustment, normally gives the same decision as non-parametric tests. Believing yourself to be in the world of asymptotia when you are not is disastrous at best.

With thanks Karl

Munya




Kornbrot, Diana

Oct 12, 2010, 2:11:19 PM10/12/10
to meds...@googlegroups.com
Firstly, it is MUCH easier for the list to help if one gives content as well as statistical structure.
Here: give the medical condition, the target test, and the source of the normal group, together with what other info you have (age, sex, income, marital status, etc.).
DESIGN ISSUES
For example, if the medical group is older than the normative group on average, it would be a serious error to attribute score differences to the medical condition unless you have corrected for age [e.g. by a covariate analysis]. This holds no matter what test you choose.
So, first check that your medical group is similar on extraneous variables to the normative group. If not, you need more complicated analyses than simple pairwise group comparisons.
TEST ISSUES
The most useful test will depend on the distribution of the test scores. If test scores are normal then the t-test is fine, BUT I strongly suggest you use the unequal variance form. You won't lose much power if the variances are equal, but you may avoid wrong inferences if they are not.
I STRONGLY advise against methods based on ranks, e.g. the Wilcoxon; as previously noted, they assume the SAME distribution in both groups. If the data are not normal (the usual reason for using ranks), it is extraordinarily unlikely that they will have the same distribution. Indeed, if they are Likert items it is, I believe, IMPOSSIBLE for there to be differences in mean without differences in shape.
In my view there are 2 useful alternatives. 1. It may be possible to transform the data to normality, e.g. with a log or power transformation. 2. Perform ordinal regression [available in most stats packages].
EFFECT SIZE ISSUES
It is perfectly reasonable to calculate effect size as the difference in means divided by an estimate of the standard deviation of the difference.
The estimate of standard deviation may come from the much larger normative group's SD (assuming both groups share it), or from a weighted mean of the two variances; then sd(difference) = √2 × sd(separate groups). Or you can use effect sizes based on % variance.
Statistical effect size is useful, but you may be more interested in the substantive effect, that is, whether the group differences on the test make a difference in real life.
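The two standardisation choices can be sketched in a few lines (simulated, hypothetical scores): Cohen's d with the df-weighted pooled SD, and the alternative of standardising by the normative SD alone, often called Glass's delta.

```python
import numpy as np

rng = np.random.default_rng(5)
patients = rng.normal(44, 10, 30)    # hypothetical patient scores
controls = rng.normal(50, 10, 2000)  # hypothetical normative scores

n1, n2 = len(patients), len(controls)
# Pooled SD: each group's variance weighted by its degrees of freedom
sp = np.sqrt(((n1 - 1) * patients.var(ddof=1)
              + (n2 - 1) * controls.var(ddof=1)) / (n1 + n2 - 2))
d = (patients.mean() - controls.mean()) / sp

# Glass's delta: standardise by the (well-estimated) normative SD only
delta = (patients.mean() - controls.mean()) / controls.std(ddof=1)
```

With n2 = 2000 dominating the pooled estimate, the two versions will usually be close unless the patient group's variance differs markedly from the normative one.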
I remain interested in the substantive problem.
Best

Diana




Professor Diana Kornbrot
  email: 
d.e.ko...@herts.ac.uk    
   
web:    http://web.mac.com/kornbrot/iweb/KornbrotHome.html
Work
School of Psychology
University of Hertfordshire
College Lane, Hatfield, Hertfordshire AL10 9AB, UK
    voice:     +44 (0) 170 728 4626
    mobile:   +44 (0) 796 890 2102
    fax          +44 (0) 170 728 5073
Home
19 Elmhurst Avenue
London N2 0LT, UK
   landline: +44 (0) 208 883 3657
   mobile:   +44 (0) 796 890 2102
   fax:         +44 (0) 870 706 4997




