Logistic regression (via GENLIN) with 3:1 matching won't run

Bruce Weaver

Jan 15, 2009, 1:45:04 PM

Hello group. Some time ago, I used Stata to perform conditional
logistic regression for a study that had 3:1 matching (3 specialists
for every 1 GP).

An alternative to conditional logistic regression is use of
generalized estimating equations (GEE), and GEE is now possible in
SPSS via the GENLIN procedure. So, I wanted to try reanalyzing those
old data using GENLIN.

Here is the layout of my file (the key variables at least):

mchgrp gp spec all_maj

1 1 0 0
1 0 1 1
1 0 1 0
1 0 1 0
2 1 0 0
2 0 1 0
2 0 1 0
2 0 1 0
3 1 0 0
3 0 1 0
3 0 1 0
3 0 1 0
4 1 0 0
4 0 1 0
4 0 1 0
4 0 1 0
etc

MCHGRP = match group
GP = indicator for GP (1=GP, 0 = Specialist)
SPEC = indicator for Specialist
ALL_MAJ = binary outcome variable (1=Y, 0=N)

As you can see, there are 3 specialists matched to every GP.

Here is the syntax for my first attempt at analyzing this with GENLIN.

* Now account for match groups by using GEE .
* [1] Model with GP as lone predictor of ALL_MAJ .

* Generalized Estimating Equations.
GENLIN all_maj (REFERENCE=FIRST) WITH gp
/MODEL gp INTERCEPT=YES
DISTRIBUTION=BINOMIAL LINK=LOGIT
/REPEATED SUBJECT=mchgrp WITHINSUBJECT=gp SORT=YES
CORRTYPE=UNSTRUCTURED ADJUSTCORR=YES
COVB=ROBUST MAXITERATIONS=100 PCONVERGE=1e-006(ABSOLUTE)
UPDATECORR=1
/MISSING CLASSMISSING=EXCLUDE
/PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION.

This produced the following output:

Warnings
There are at least two records with the same values for the subject
and within-subject variables. No output will be displayed.
This command is not executed.

Model Information
Dependent Variable all major complications
Probability Distribution Binomial
Link Function Logit
Subject Effect 1 mchgrp
Within-Subject Effect 1 GP
Corr. Mat. Structure Unstructured

My first thought was that the 3:1 matching was causing some sort of
problem, given that the 3 specialists all had the same value of GP
(GP=0 indicates "specialist"). So I filtered out the last 2
specialists in each match group and tried again. With 1:1 matching,
the syntax shown above ran.

In another dataset (different analysis), I had 3 repeated measures (3
time points) for each subject, and GENLIN worked OK there--but the
obvious difference was that the TIME variable had a different value
for each time point, whereas here, 3 of the 4 cases within a match
group have the same value of the GP variable.
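
The warning can be reproduced outside GENLIN by counting how many records share each combination of the subject and within-subject variables. The sketch below is only an illustration: it reads the 16 example rows listed above (the first four match groups), and n_same is a made-up variable name, not something in the original file.

```spss
* Read the 16 example rows shown above (first four match groups only).
DATA LIST FREE / mchgrp gp spec all_maj.
BEGIN DATA
1 1 0 0
1 0 1 1
1 0 1 0
1 0 1 0
2 1 0 0
2 0 1 0
2 0 1 0
2 0 1 0
3 1 0 0
3 0 1 0
3 0 1 0
3 0 1 0
4 1 0 0
4 0 1 0
4 0 1 0
4 0 1 0
END DATA.

* Count the records that share the same SUBJECT and WITHINSUBJECT values.
* n_same is a hypothetical helper variable added to every case.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=mchgrp gp
  /n_same=N.

* Any value of n_same above 1 marks a combination that GENLIN would reject.
FREQUENCIES VARIABLES=n_same.
```

In these rows each GP record gets n_same = 1 and each specialist record gets n_same = 3, so every match group contains three records with identical (mchgrp, gp) values, which is the situation the warning describes.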

So in a nutshell, my question is how does one perform logistic
regression with n:1 matching (with n > 1) with GENLIN?

Sorry for the long preamble, but I thought the background info might
be useful.

--
Bruce Weaver
bwe...@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."

Ryan

Jan 15, 2009, 9:18:26 PM

Hi Bruce,

One possible approach would be to treat the mchgrp variable as the
within-subjects variable.

Dataset:

ID mchgrp gp spec all_maj

1 1 1 0 0
2 1 0 1 1
3 1 0 1 0
4 1 0 1 0
5 2 1 0 0
6 2 0 1 0
7 2 0 1 0
8 2 0 1 0
9 3 1 0 0
10 3 0 1 0
11 3 0 1 0
12 3 0 1 0
13 4 1 0 0
14 4 0 1 0
15 4 0 1 0
16 4 0 1 0
.
.
.

GENLIN code:

* Generalized Estimating Equations.
GENLIN all_maj (REFERENCE=FIRST) BY gp (ORDER=ASCENDING)
/MODEL gp INTERCEPT=YES
DISTRIBUTION=BINOMIAL LINK=LOGIT
/CRITERIA SCALE=1 PCONVERGE=1E-006(ABSOLUTE) SINGULAR=1E-012
ANALYSISTYPE=3(WALD) CILEVEL=95
LIKELIHOOD=FULL
/REPEATED SUBJECT=ID WITHINSUBJECT=mchgrp SORT=YES
CORRTYPE=UNSTRUCTURED ADJUSTCORR=YES
COVB=MODEL MAXITERATIONS=100 PCONVERGE=1e-006(ABSOLUTE)
UPDATECORR=1
/MISSING CLASSMISSING=EXCLUDE
/PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION.

This model compares specialists to GPs, after taking into
consideration correlations between individuals within the same match
group (the 3 specialists and 1 GP).

What do ya think?

Ryan

Bruce Weaver

Jan 16, 2009, 10:20:50 AM

Hi Ryan. I don't think that will work. In a repeated measures
situation, ID would be the subject variable (i.e., the variable that
says which subject or patient the repeated measures belong to). But
in this case, the measures that belong together are the ones from the
same match group, not from the same person. Note too that in my data
set, each row has a unique value of ID, so there are no repeated
measures within IDs.

Here's an example of repeated measures logistic regression from the
Syntax Reference Manual:

--- start of example ---
Repeated Measures Logistic Regression (Generalized Estimating
Equation)

* Generalized Estimating Equations.
GENLIN wheeze (REFERENCE=LAST) BY age smoker (ORDER=ASCENDING)
/MODEL age smoker INTERCEPT=YES DISTRIBUTION=BINOMIAL LINK=LOGIT
/CRITERIA METHOD=FISHER(1) SCALE=1 MAXITERATIONS=100 MAXSTEPHALVING=5
PCONVERGE=1E-006(ABSOLUTE) SINGULAR=1E-012 ANALYSISTYPE=3(WALD)
CILEVEL=95
/REPEATED SUBJECT=id WITHINSUBJECT=age SORT=YES CORRTYPE=UNSTRUCTURED
ADJUSTCORR=YES COVB=ROBUST MAXITERATIONS=100 PCONVERGE=1e-006(ABSOLUTE)
UPDATECORR=1
/MISSING CLASSMISSING=EXCLUDE
/PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION WORKINGCORR.

The procedure fits a model for the dependent variable wheeze, using
smoker and age as factors. The first category of wheeze is used as the
reference category.

The model specification assumes that wheeze has a binomial
distribution. A logit link function relates the probability of wheeze
to a linear combination of the predictors, including an intercept
term.

Clusters of correlated observations are defined by values of the
subject variable id. Repeated measurements are ordered within subjects
by values of age. An unstructured working correlation matrix is
estimated.

Model fitting criteria are set to their default values.

The working correlation matrix is requested as output in addition to
the default output.

--- end of example ---

The only differences in what I want to do are:

1) MatchGroup is the SUBJECT variable rather than ID, and
2) An indicator for GP (vs. specialist) is my WITHINSUBJECT variable.

The key difference seems to be that this example has no repetitions of
the same age within an ID, whereas my example has 1 GP (GP=1) and 3
different specialists (coded GP=0) within each match group. There is
no inherent ordering of the specialists, so this is the coding that
makes sense to me. But, as I've said, it won't run this way.
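
One way to satisfy the uniqueness requirement while keeping a WITHINSUBJECT variable might be to number the members of each match group and use that number instead of gp. This is a sketch under stated assumptions, not a tested recommendation: seq is a hypothetical new variable, and because the specialists have no inherent order, the sketch switches to CORRTYPE=EXCHANGEABLE, a structure that does not depend on the order of the within-group positions.

```spss
* Hypothetical workaround: give each record a unique position within its match group.
* seq is an assumed helper variable, not part of the original file.
* Sorting gp in descending order puts the GP first (seq = 1) in every group.
SORT CASES BY mchgrp (A) gp (D).
COMPUTE seq = 1.
IF (mchgrp = LAG(mchgrp)) seq = LAG(seq) + 1.
EXECUTE.

* Same model as before, but the within-subject variable now has no repeats.
GENLIN all_maj (REFERENCE=FIRST) WITH gp
  /MODEL gp INTERCEPT=YES
    DISTRIBUTION=BINOMIAL LINK=LOGIT
  /REPEATED SUBJECT=mchgrp WITHINSUBJECT=seq SORT=YES
    CORRTYPE=EXCHANGEABLE ADJUSTCORR=YES
    COVB=ROBUST MAXITERATIONS=100 PCONVERGE=1e-006(ABSOLUTE)
    UPDATECORR=1
  /MISSING CLASSMISSING=EXCLUDE
  /PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION.
```

With 3:1 matching, seq runs from 1 to 4 in each group, so no two records share the same (mchgrp, seq) pair. Whether the resulting estimates are sensible for matched data is a separate question that the sketch does not settle.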

Ryan

Jan 16, 2009, 10:49:57 PM

I *think* I follow now. Thanks for the clarification. Unfortunately, I
see no way to fit this with the GENLIN procedure using *all* of the
data. The only option I can think of would be to run a random-intercept
model (subject=mchgrp) in a generalized linear mixed modeling
procedure. With this model, you would not need to include group as a
within-subjects factor. As you know, SPSS version 17 does not offer
such a procedure.

In SAS, the code would look like this:

proc glimmix data=mydata;
class gp mchgrp;
/* fixed effect: GP vs. specialist */
model all_maj = gp / solution dist=binary link=logit;
/* random intercept for each match group */
random intercept / subject=mchgrp;
run;

I know that wasn't what you were asking, though...Sorry I couldn't be
more help.

Ryan

Bruce Weaver

Feb 3, 2009, 4:07:02 PM

Dave Matheson from SPSS Tech Support advised me to remove the
WITHINSUBJECT variable from the REPEATED sub-command. So the revised
syntax looks like this:

* Generalized Estimating Equations.
GENLIN all_maj (REFERENCE=FIRST) WITH gp
/MODEL gp INTERCEPT=YES
DISTRIBUTION=BINOMIAL LINK=LOGIT
/REPEATED SUBJECT=mchgrp SORT=YES
CORRTYPE=UNSTRUCTURED ADJUSTCORR=YES
COVB=ROBUST MAXITERATIONS=100 PCONVERGE=1e-006(ABSOLUTE)
UPDATECORR=1
/MISSING CLASSMISSING=EXCLUDE
/PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION.

This ran, and it gives results comparable to the conditional logistic
regressions I ran originally (in Stata). See below for odds ratios and
95% CIs for 4 different outcome variables. (Change font to Courier
if things don't line up.)

Comparison of Conditional Logistic Regression & GEE

------------------------------------------------------------------
Outcome   Predictor | Odds Ratio   [95% Conf. Interval]   Method
------------------------------------------------------------------
maj_surg  gp        |    1.599       1.064      2.403        1
                    |    1.606       1.072      2.406        2
------------------------------------------------------------------
all_maj   gp        |    1.626       1.126      2.346        1
                    |    1.631       1.128      2.358        2
------------------------------------------------------------------
blood     gp        |    0.755       0.536      1.065        1
                    |    0.839       0.616      1.142        2
------------------------------------------------------------------
maj_oth   gp        |    1.579       0.734      3.340        1
                    |    1.585       0.735      3.419        2
------------------------------------------------------------------

Method 1: Conditional logistic regression (Stata)
Method 2: GEE (via GENLIN in SPSS)
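
For anyone reproducing the Method 2 rows: the odds ratio for gp is exp(B) of the gp coefficient. A minimal sketch, assuming the active dataset holds mchgrp, gp and all_maj as laid out earlier in the thread. It is the revised syntax with one change, the EXPONENTIATED keyword on SOLUTION, which asks GENLIN to print Exp(B) and its confidence limits alongside the raw estimates.

```spss
* GEE with match group as the subject and no WITHINSUBJECT variable.
* Assumes the active dataset contains mchgrp, gp and all_maj.
GENLIN all_maj (REFERENCE=FIRST) WITH gp
  /MODEL gp INTERCEPT=YES
    DISTRIBUTION=BINOMIAL LINK=LOGIT
  /REPEATED SUBJECT=mchgrp SORT=YES
    CORRTYPE=UNSTRUCTURED ADJUSTCORR=YES
    COVB=ROBUST MAXITERATIONS=100 PCONVERGE=1e-006(ABSOLUTE)
    UPDATECORR=1
  /MISSING CLASSMISSING=EXCLUDE
  /PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION (EXPONENTIATED).
```

Swapping all_maj for maj_surg, blood or maj_oth would give the other three outcomes in the table.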