I have estimated the same regression model in two different samples, A
and B. Can someone suggest a method by which I can test the hypothesis
that the regression coefficient b1 in sample A is different from b1 in
sample B? Would a simple t-test work or does this create a problem with
Type I error?
If I were to add a third sample, C, how would I compare the coefficients
across the three samples? Would ANOVA with contrasts work?
Thanks for any advice. I'm using SPSS 13 if there are any specific
routines you can suggest.
George
On Sat, 13 Nov 2004, No Reply wrote:
> I have looked in all my statistics references but been unable to find a
> way to do the following, so any advice will be appreciated.
>
> I have estimated the same regression model in two different samples, A
> and B. Can someone suggest a method by which I can test the hypothesis
> that the regression coefficient b1 in sample A is different from b1 in
> sample B? Would a simple t-test work or does this create a problem with
> Type I error?
Analyze the data together as described below (names given are
arbitrary). Include an indicator variable for the two samples
(-1 and +1 ... call this S). Center your continuous variable
(X); that is, subtract its mean (call this P). Compute the
interaction term by multiplying S times P (call this SXP).
Regress y on S, P, and SXP. If SXP is significant, then the slopes
for the two groups differ. If S is significant, then the intercepts
differ. Rationale illustrated below:
y = b0 + b_s*S + b_p*P + b_sxp*SXP
y = [b0 + b_s*S] + [b_p + b_sxp*S]*P
Substituting the two values for S gives the separate regression
equations for the two samples. Can be verified by running the
separate regressions.
Obviously becomes more complex if your regression equation
includes more than a single predictor, and I am not sure what the
effects of unequal ns or other complications might be.
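A numerical sketch of this setup, using simulated data and Python/NumPy
rather than SPSS (all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: sample A (S = -1) and sample B (S = +1) with different slopes
n = 200
x_a = rng.normal(size=n)
y_a = 1.0 + 0.5 * x_a + rng.normal(scale=0.3, size=n)
x_b = rng.normal(size=n)
y_b = 1.5 + 1.2 * x_b + rng.normal(scale=0.3, size=n)

S = np.concatenate([-np.ones(n), np.ones(n)])   # sample indicator
X = np.concatenate([x_a, x_b])
P = X - X.mean()                                # centered predictor
y = np.concatenate([y_a, y_b])

# Pooled design: intercept, S, P, and the interaction SXP = S*P
D = np.column_stack([np.ones(2 * n), S, P, S * P])
b0, b_s, b_p, b_sxp = np.linalg.lstsq(D, y, rcond=None)[0]

# Substituting S = -1 and S = +1 recovers the per-sample slopes
slope_a = b_p - b_sxp
slope_b = b_p + b_sxp

# Verify against separately fitted regressions (centering does not change slopes)
sep_a = np.polyfit(x_a, y_a, 1)[0]
sep_b = np.polyfit(x_b, y_b, 1)[0]
```

Because the pooled model with the interaction is just a reparameterization of
the two separate fits, slope_a and slope_b match sep_a and sep_b; the payoff
of the pooled fit is that it also gives a t-test on b_sxp, i.e. on the slope
difference itself.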
> If I were to add a third sample, C, how would I compare the coefficients
> across the three samples. Would ANOVA with contrasts work?
Right idea, but incorporate contrasts into the regression design
described above; for example, S1 = -2 +1 +1 and S2 = 0 -1 +1. Then
compute several interaction terms that provide distinct contrasts
among the groups.
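For three samples, the same pooled approach with those contrast codes looks
like this (again a simulated sketch in Python/NumPy, hypothetical numbers):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 150
true_slopes = {"A": 0.4, "B": 0.9, "C": 1.1}
codes = {"A": (-2, 0), "B": (1, -1), "C": (1, 1)}   # (S1, S2) contrast codes

xs, ys, s1, s2 = [], [], [], []
for g, beta in true_slopes.items():
    x = rng.normal(size=n)
    xs.append(x)
    ys.append(2.0 + beta * x + rng.normal(scale=0.2, size=n))
    s1 += [codes[g][0]] * n
    s2 += [codes[g][1]] * n

X = np.concatenate(xs)
y = np.concatenate(ys)
P = X - X.mean()                    # centered predictor
S1 = np.array(s1, float)
S2 = np.array(s2, float)

# Design: intercept, two contrasts, predictor, and one interaction per contrast
D = np.column_stack([np.ones(3 * n), S1, S2, P, S1 * P, S2 * P])
b = np.linalg.lstsq(D, y, rcond=None)[0]

# Slope for group g: b_P + b_S1P*S1_g + b_S2P*S2_g
slope = {g: b[3] + b[4] * c[0] + b[5] * c[1] for g, c in codes.items()}
```

The t-test on the S1*P column tests the A vs. (B and C) slope contrast; the
t-test on the S2*P column tests B vs. C.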
Aiken and West have an excellent book on interactions in
regression analysis.
Best wishes
Jim
============================================================================
James M. Clark (204) 786-9757
Department of Psychology (204) 774-4134 Fax
University of Winnipeg 4L05D
Winnipeg, Manitoba R3B 2E9 cl...@uwinnipeg.ca
CANADA http://www.uwinnipeg.ca/~clark
============================================================================
Jim,
Thanks, that's very helpful and solves my immediate problem. If I may,
allow me to ask a couple of followup questions now.
If I didn't have the data myself but only the reported statistics on the
two regressions, is there a way to compare the coefficients across
studies when the data are not available for pooling and retesting?
Second, on a related but different issue, if I need to compare two
regression coefficients within the _same_ model on a single sample, can
I do that? For example, assume I have included two dummy variables for
Political Party (which for the sake of discussion could be divided into
three groups: Democratic, Republican, and Independent/Undeclared).
Assume D1 is "Democrat/Not Democrat" and D2 is "Republican/Not
Republican" and "Independent/Undeclared" serves as the omitted or base
category. I want to test whether the coefficient of D1 is statistically
larger than that of D2 in a regression model. Is that possible? In
reported studies I have seen, it seems researchers treat apparently
different coefficients as statistically different as long as each is
statistically significant. Is that justified?
Thanks again.
George
On Sun, 14 Nov 2004, No Reply wrote:
> jim clark wrote:
> > Analyze the data together as described below (names given are
> > arbitrary). Include an indicator variable for the two samples
> > (-1 and +1 ... call this S). Center your continuous variable
> > (X); that is, subtract its mean (call this P). Compute the
> > interaction term by multiplying S times P (call this SXP).
> > Regress y on S, P, and SXP. If SXP is significant, then the slopes
> > for the two groups differ. If S is significant, then the intercepts
> > differ. Rationale illustrated below:
> >
> > y = b0 + b_s*S + b_p*P + b_sxp*SXP
> > y = [b0 + b_s*S] + [b_p + b_sxp*S]*P
> >
> > Substituting the two values for S gives the separate regression
> > equations for the two samples. Can be verified by running the
> > separate regressions.
> >
> > Obviously becomes more complex if your regression equation
> > includes more than a single predictor, and I am not sure what the
> > effects of unequal ns or other complications might be.
> >
> >> If I were to add a third sample, C, how would I compare the coefficients
> >> across the three samples? Would ANOVA with contrasts work?
> >
> > Right idea, but incorporate contrasts into the regression design
> > described above; for example, S1 = -2 +1 +1 and S2 = 0 -1 +1. Then
> > compute several interaction terms that provide distinct contrasts
> > among the groups.
>
> Thanks, that's very helpful and solves my immediate problem. If I may,
> allow me to ask a couple of followup questions now.
>
> If I didn't have the data myself but only the reported statistics on the
> two regressions, is there a way to compare the coefficients across
> studies when the data are not available for pooling and retesting?
As I noted in my posting, the overall regression equation can be
broken into the two separate regressions, so the opposite (i.e.,
building the overall equation from the separate ones) would not be
difficult. The challenge would be finding the error term for the
slope of the interaction term from the separate regressions. It
would take someone more knowledgeable than me to say whether that
is possible, and if so, how to do it. Sorry.
> Second, on a related but different issue, if I need to compare two
> regression coefficients within the _same_ model on a single sample, can
> I do that? For example, assume I have included two dummy variables for
> Political Party (which for the sake of discussion could be divided into
> three groups: Democratic, Republican, and Independent/Undeclared).
> Assume D1 is "Democrat/Not Democrat" and D2 is "Republican/Not
> Republican" and "Independent/Undeclared" serves as the omitted or base
> category. I want to test whether the coefficient of D1 is statistically
> larger than that of D2 in a regression model. Is that possible? In
> reported studies I have seen, it seems researchers treat apparently
> different coefficients as statistically different as long as each is
> statistically significant. Is that justified?
I have more often seen differences inferred when one coefficient
is significant and the other is not, which is not generally
justified. And I would expect that the same holds if both are
significant, although the inference is perhaps more sound if their
signs are in opposite directions. I cannot answer your basic question.
> jim clark wrote:
>
---- Jim's good advice snipped -----
>
>
> Jim,
>
> Thanks, that's very helpful and solves my immediate problem. If I may,
> allow me to ask a couple of followup questions now.
>
> If I didn't have the data myself but only the reported statistics on the
> two regressions, is there a way to compare the coefficients across
> studies when the data are not available for pooling and retesting?
Under the null hypothesis that the coefficients from your two
independent groups (b1 and b2) are equal, the difference between the two
coefficients is approximately normally distributed with mean = 0 and
variance = the sum of the individual variances. So,
b1-b2
t = --------
SE(diff)
where SE(diff) = SQRT[Var(b1) + Var(b2)] (each variance being the
squared standard error), and df = n1 + n2 - 4. Dave Howell's book
"Statistical Methods for Psychology" gives an example of this.
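With purely hypothetical reported numbers, the computation is just:

```python
import math

# Hypothetical reported statistics from two independent simple regressions
b1, se1, n1 = 0.80, 0.10, 120   # sample A: coefficient, SE, sample size
b2, se2, n2 = 0.45, 0.12, 100   # sample B

se_diff = math.sqrt(se1**2 + se2**2)   # Var(b) is the squared SE
t = (b1 - b2) / se_diff
df = n1 + n2 - 4                       # two parameters estimated per regression

# Two-sided p-value; the normal approximation is adequate at this df
p = 1 - math.erf(abs(t) / math.sqrt(2))
```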
>
> Second, on a related but different issue, if I need to compare two
> regression coefficients within the _same_ model on a single sample, can
> I do that? For example, assume I have included two dummy variables for
> Political Party (which for the sake of discussion could be divided into
> three groups: Democratic, Republican, and Independent/Undeclared).
> Assume D1 is "Democrat/Not Democrat" and D2 is "Republican/Not
> Republican" and "Independent/Undeclared" serves as the omitted or base
> category. I want to test whether the coefficient of D1 is statistically
> larger than that of D2 in a regression model. Is that possible? In
> reported studies I have seen, it seems researchers treat apparently
> different coefficients as statistically different as long as each is
> statistically significant. Is that justified?
No, that is not justified. If you want to compare the two coefficients,
you would need the SE of the difference between the two coefficients.
The variance of that difference would be the sum of the individual
variances minus two times the covariance, and the SE of the difference
would be its square root. So your question really boils down to, what
is the covariance between the two coefficients in the same model?
Offhand, I can't remember if SPSS will give those covariances in the
regression output.
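A sketch of that computation from scratch (hypothetical data in Python/NumPy;
the covariance matrix of the coefficients is built directly from the design
matrix rather than read from SPSS output):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 90

# Hypothetical data: outcome by party (0 = Dem, 1 = Rep, 2 = Ind/Undeclared)
party = rng.integers(0, 3, size=n)
y = np.array([5.0, 4.0, 3.0])[party] + rng.normal(size=n)

D1 = (party == 0).astype(float)   # Democrat / not
D2 = (party == 1).astype(float)   # Republican / not; Ind is the base category
X = np.column_stack([np.ones(n), D1, D2])

b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
sigma2 = resid @ resid / (n - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)   # covariance matrix of the coefficients

# Var(bD1 - bD2) = Var(bD1) + Var(bD2) - 2*Cov(bD1, bD2)
var_diff = cov[1, 1] + cov[2, 2] - 2 * cov[1, 2]
t = (b[1] - b[2]) / np.sqrt(var_diff)   # df = n - 3
```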
--
Bruce Weaver
bwe...@lakeheadu.ca
www.angelfire.com/wv/bwhomedir
>
> Second, on a related but different issue, if I need to compare two
> regression coefficients within the _same_ model on a single sample, can
> I do that? For example, assume I have included two dummy variables for
> Political Party (which for the sake of discussion could be divided into
> three groups: Democratic, Republican, and Independent/Undeclared).
> Assume D1 is "Democrat/Not Democrat" and D2 is "Republican/Not
> Republican" and "Independent/Undeclared" serves as the omitted or base
> category. I want to test whether the coefficient of D1 is statistically
> larger than that of D2 in a regression model. Is that possible? In
How big is the Independent base category?
- If there aren't many Independents, then D1 is approximately the
negative of D2, because both amount to approximately the same
comparison, R vs. D. Using logically distinct comparisons
is usually a better idea.
- It could be plainer to use R vs. D as one dummy
variable, and R+D vs. Ind as the other.
> reported studies I have seen, it seems researchers treat apparently
> different coefficients as statistically different as long as each is
> statistically significant. Is that justified?
No, not justified. See the other replies for good details.
--
Rich Ulrich, wpi...@pitt.edu
http://www.pitt.edu/~wpilib/index.html
---- snip ------
>
> Second, on a related but different issue, if I need to compare two
> regression coefficients within the _same_ model on a single sample, can
> I do that? For example, assume I have included two dummy variables for
> Political Party (which for the sake of discussion could be divided into
> three groups: Democratic, Republican, and Independent/Undeclared).
> Assume D1 is "Democrat/Not Democrat" and D2 is "Republican/Not
> Republican" and "Independent/Undeclared" serves as the omitted or base
> category. I want to test whether the coefficient of D1 is statistically
> larger than that of D2 in a regression model. Is that possible? In
> reported studies I have seen, it seems researchers treat apparently
> different coefficients as statistically different as long as each is
> statistically significant. Is that justified?
In your regression model, the coefficient for D1 gives the difference
between Democrat & Independent; and the coefficient for D2 gives the
difference between Republican & Independent. So the difference between
D1 and D2 gives the difference between these differences. In symbols,
with G standing for group:
D1 = G1 - G3
D2 = G2 - G3
D1 - D2 = (G1-G3) - (G2-G3)
= G1 - G3 - G2 + G3
= G1 - G2
So if you were to treat this as an ANOVA problem rather than regression,
any of a number of multiple comparison procedures could be used to
compare Democrats to Republicans (G1 to G2). It sounds like this is an
a priori (planned) contrast, so you could do the usual t-test (or
equivalent F-test) with the overall error term from the ANOVA in the
denominator.
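The algebra can be checked with a couple of lines of arithmetic on made-up
group means:

```python
# Hypothetical group means: G1 = Democrat, G2 = Republican, G3 = Ind/Undeclared
G1, G2, G3 = 5.0, 4.2, 3.1

# Dummy coefficients when G3 is the omitted (base) category
D1 = G1 - G3
D2 = G2 - G3

# The difference between the coefficients is the Democrat-Republican difference
diff = D1 - D2
```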
> On Sun, 14 Nov 2004 22:53:56 GMT, No Reply <nor...@nospam.invalid>
> wrote:
> [snip, much]
>
> >
> > Second, on a related but different issue, if I need to compare two
> > regression coefficients within the _same_ model on a single sample, can
> > I do that? For example, assume I have included two dummy variables for
> > Political Party (which for the sake of discussion could be divided into
> > three groups: Democratic, Republican, and Independent/Undeclared).
> > Assume D1 is "Democrat/Not Democrat" and D2 is "Republican/Not
> > Republican" and "Independent/Undeclared" serves as the omitted or base
> > category. I want to test whether the coefficient of D1 is statistically
> > larger than that of D2 in a regression model. Is that possible? In
>
> How big is the independent base category?
> - If there aren't many, then D1 is approximately the
> negative of D2, because it is approximately the same
> comparison, R vs. D. Using logically distinct comparisons
> is usually a better idea.
> - It could be plainer to use R vs. D as one dummy
> variable, and R+D vs. Ind as the other.
Oh, I should have also mentioned -- because D1 and D2
are somewhat (or very) similar, the "variance inflation factor"
will be meaningful, and the reported standard errors of the two
coefficients will be larger than the error for the simple contrast.
Cheers,
Kylie.