independent t tests and unequal sample sizes

james

May 9, 2006, 1:56:34 PM

Hi everyone,
I want to clarify a point: when conducting an independent t-test in
SPSS, assuming that the two groups have equal variances, how do I correct
for UNEQUAL sample sizes?

I understand SPSS does this automatically, but how does it do this? I'm
curious.

thanks,
james

Richard Ulrich

May 9, 2006, 2:56:16 PM

This is a basic "statistics" question -- you should read some
basic statistics to get an orientation. But, in brief....

The t-test on a difference is computed as the Difference
divided by the standard error of the difference.

Or, generally speaking,
t_df = (A - B) / sqrt( var(A - B) )

where the variance of A-B is equal to the sum of the
two independent variances, Var(A) and Var(B).
If "B" is the mean of a group with a small N, then
the variance of B (a group mean) is larger than the
variance of A (a mean with a much larger N and using
the same pooled estimate of variance).
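
To make the formula above concrete, here is a minimal sketch of the equal-variance (pooled) t statistic computed from summary statistics. The function name `pooled_t` and its argument names are made up for illustration, and this is the textbook formula rather than SPSS's actual code. The sample sizes enter through the two terms `pooled_var / n_a` and `pooled_var / n_b`, so no separate "correction" for unequal N's is applied.

```python
import math

def pooled_t(mean_a, mean_b, var_a, var_b, n_a, n_b):
    """Equal-variance (pooled) independent-samples t statistic and its d.f."""
    df = n_a + n_b - 2
    # Pooled variance: each sample variance is weighted by its own d.f.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    # Var(A - B) = Var(mean of A) + Var(mean of B), both using pooled_var.
    se_diff = math.sqrt(pooled_var / n_a + pooled_var / n_b)
    return (mean_a - mean_b) / se_diff, df
```

Calling `pooled_t(10, 8, 4, 4, 10, 10)` returns the t value together with 18 d.f.; the smaller group always contributes the larger share of the standard error because its mean is divided by a smaller N.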

Hope this helps.

--
Rich Ulrich, wpi...@pitt.edu
http://www.pitt.edu/~wpilib/index.html

james

May 10, 2006, 9:58:40 AM

Rich,
thanks for the response, though I am still a bit confused:

So are you saying that if the two sample sizes are very different, then an
independent t-test automatically takes this into account? Or must I
'correct' for sample size differences? I've done loads of t-tests
before and assumed sample size differences (between the two groups)
were irrelevant...but I did wonder whether this is indeed right.

thanks.

Richard Ulrich

May 11, 2006, 2:37:25 PM

Please, as I suggested before, read up on the basics. You can't
get a deep understanding from a short answer here. Perhaps -
study the equation for the test, in a textbook that explains
what the terms stand for.

Sample sizes, or differences in them, are surely "relevant."
But there are different ways of being relevant.

If you double the d.f.'s while keeping means and s.d.'s the
same, the t value is multiplied by the square root of 2.

N1=18 and N2=2 gives a different t-test from N1=N2=10,
for the same means and variance.
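
A small worked example of the two statements above, with hypothetical summary numbers (mean difference 1.0, pooled s.d. 1.0) chosen only for illustration. The helper name `pooled_t_from_summary` is made up. Note that the sketch doubles both N's, which roughly (not exactly) doubles the d.f.; under that reading the factor of sqrt(2) is exact.

```python
import math

def pooled_t_from_summary(mean_diff, sd, n1, n2):
    """t for a given mean difference when both groups share the pooled s.d."""
    return mean_diff / (sd * math.sqrt(1 / n1 + 1 / n2))

# Hypothetical summary numbers: mean difference 1.0, pooled s.d. 1.0.
t_equal = pooled_t_from_summary(1.0, 1.0, 10, 10)    # N1 = N2 = 10
t_unequal = pooled_t_from_summary(1.0, 1.0, 18, 2)   # N1 = 18, N2 = 2
t_doubled = pooled_t_from_summary(1.0, 1.0, 20, 20)  # both N's doubled

print(round(t_equal, 3), round(t_unequal, 3), round(t_doubled / t_equal, 3))
# prints: 2.236 1.342 1.414
```

Both the 10/10 and the 18/2 designs have 18 d.f., yet the unbalanced one gives a noticeably smaller t for the same means and variance, because 1/18 + 1/2 is much larger than 1/10 + 1/10.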

Ray Koopman

May 12, 2006, 1:57:59 AM

The relevance of sample size differences is that if the sample sizes
are equal then the t-test is insensitive to heteroscedasticity, but
the more unequal the sample sizes are (i.e., the more the ratio of the
larger n over the smaller n departs from 1) the more sensitive the
test is to heteroscedasticity. In short, if your n's are equal then
you don't worry about heteroscedasticity; the more unequal they are,
the more you worry about heteroscedasticity.
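
A rough simulation sketch of this point, using only the Python standard library. Everything specific here is an assumption for illustration: normal data, true means of 0 in both groups, standard deviations of 1 and 2 (variance ratio 4), the smaller group given the larger variance, and 4000 replications. The constant 2.101 is the two-sided 5% critical value of t with 18 d.f., so the function is only meaningful for designs with n1 + n2 = 20.

```python
import math
import random
import statistics

T_CRIT_DF18 = 2.101  # two-sided 5% critical value of t with 18 d.f.

def rejection_rate(n1, sd1, n2, sd2, reps=4000, seed=1):
    """Share of pooled t-tests that reject H0 when both true means are 0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = [rng.gauss(0, sd1) for _ in range(n1)]
        b = [rng.gauss(0, sd2) for _ in range(n2)]
        # Pooled variance and standard error, as in the equal-variance test.
        pooled_var = ((n1 - 1) * statistics.variance(a)
                      + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
        se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
        t = (statistics.fmean(a) - statistics.fmean(b)) / se
        rejections += abs(t) > T_CRIT_DF18
    return rejections / reps

rate_equal_n = rejection_rate(10, 1.0, 10, 2.0)    # n's equal
rate_unequal_n = rejection_rate(18, 1.0, 2, 2.0)   # small group, big variance
```

With equal n's the false-positive rate should stay close to the nominal 5%, while the 18/2 design should reject far more often than 5% even though the null hypothesis is true in both cases.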

Bruce Weaver

May 12, 2006, 10:28:59 AM

Ray, when you say "don't worry about it" for equal n's, I assume you
mean provided the ratio of the larger to the smaller variance is not too
large. E.g., Dave Howell suggests (in his textbook) that as long as the
ratio is 4 or less, the t-test is robust to heterogeneity of variance
when the sample sizes are equal.

--
Bruce Weaver
bwe...@lakeheadu.ca
www.angelfire.com/wv/bwhomedir

Ray Koopman

May 12, 2006, 6:34:18 PM

Right. Even when the n's are equal, there is a point at which we start
wondering about heteroscedasticity, and another, more extreme, point
at which we start worrying. Just where those points might be is somewhat
uncertain. I think Howell may be a touch conservative, but not
enough to get into an argument about. And wherever those points may
be, they come sooner as the n's become more unequal.

ezinne...@gmail.com

Jul 6, 2014, 2:21:32 PM

It is nice to join this conversation. However, I noticed (I may be wrong) that the question here has yet to be addressed.

Rich Ulrich

Jul 6, 2014, 4:46:04 PM

On Sun, 6 Jul 2014 11:21:32 -0700 (PDT), ezinne...@gmail.com
wrote:

Answer: N's that differ by much can be problematic if you
do not assume that the variances are equal. So, the OP was
wrong when he assumed irrelevance of sample sizes.

The OP wondered if it was okay to ignore sample size differences
when computing the t-test. The answer was complete enough
for the question, I think.

Today, I might add a citation to the Behrens-Fisher problem,
http://en.wikipedia.org/wiki/Behrens%E2%80%93Fisher_problem
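
For reference, the usual approximate solution to the Behrens-Fisher problem is Welch's unequal-variance t-test, which I believe is what SPSS reports on its "Equal variances not assumed" line. Below is a minimal sketch from summary statistics; the function name `welch_t` is made up for illustration, and this is the textbook formula, not SPSS's code.

```python
import math

def welch_t(mean_a, mean_b, var_a, var_b, n_a, n_b):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite d.f."""
    va, vb = var_a / n_a, var_b / n_b   # estimated variance of each group mean
    t = (mean_a - mean_b) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df
```

With equal n's and equal variances, the result matches the pooled test (for example, 18 d.f. for two groups of 10). With N1=18 and N2=2 and the larger variance in the small group, the approximate d.f. falls to just above 1, which is the test's way of saying that the small group carries almost all of the uncertainty.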

- I hope that you did notice that you were posting a comment
on a question posted in 2006.

--
Rich Ulrich