Case Control Syntax for SPSS

Denise

Mar 14, 2012, 6:27:48 PM
Hello,

I am writing because I am experiencing tremendous difficulty creating
a 3:1 match for my case control study in SPSS.

I have tried using the FUZZY extension, but to no avail, and was
wondering whether what I wish to accomplish could be done with some
syntax that is already out there.

Basically I have a sample of 130 patients who had a hospital
intervention and would like to identify 3 controls for each
intervention patient. My file is a list of emergency department
patient mrns and demographics, and there is a yes/no variable that
identifies which patients received the intervention.

I would also like to match on age, gender, race, payor, and zip.

Any input would be greatly appreciated.

Many thanks,

Denise




David Marso

Mar 14, 2012, 8:10:22 PM


Please be a little more specific.
What have you *actually* tried?
How many non-intervention patient cases do you have?
What does "to no avail" mean with respect to FUZZY?
What are demographics, if not "age, gender, race, payor, and zip"?
What is payor? What are patient MRNs?
Why do you need 3 controls? Have you even successfully found 1 control per patient?

Jon Peck

Mar 14, 2012, 9:04:04 PM
In particular, do you have FUZZY installed and recognized when you run the command? If not, have you installed the Python Essentials? What platform and Statistics version are you on?

If you run the command, does it just fail to find enough matches, or are you getting syntax errors? What are they? How big is your control dataset?

Note that since FUZZY takes separate input datasets for cases (demanders) and controls (suppliers), you need to split your dataset into two.
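
A minimal sketch of that split, assuming the combined file is open and the yes/no variable is a numeric flag. The variable name interv and the dataset names allpatients, cases, and controls are hypothetical placeholders, not names that FUZZY requires.

```spss
* Assumption: interv is a numeric flag in the active dataset (hypothetical name).
* It is coded 1 for intervention patients and 0 for everyone else.
DATASET NAME allpatients.
DATASET COPY cases.
DATASET COPY controls.

* Keep only intervention patients in the cases dataset.
DATASET ACTIVATE cases.
SELECT IF (interv = 1).
EXECUTE.

* Keep only non-intervention patients in the controls dataset.
DATASET ACTIVATE controls.
SELECT IF (interv = 0).
EXECUTE.
```

The original combined dataset stays untouched under the name allpatients, so the split can be rerun if the coding of the flag turns out to be different.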

Denise

Mar 15, 2012, 2:59:50 PM
Dear Jon -

Thanks for your input. I try to run the command and I get an error
message. I have Python 2.7 installed and am using SPSS 19 on a Windows
XP computer.

Also, thanks for letting me know about needing to split the datasets.

Best,

Denise

Denise

Mar 15, 2012, 2:57:52 PM
David - thank you very much for your feedback, and here are my answers
to your questions:

The only thing I have tried to use is FUZZY. However, I installed FUZZY
and Python and cannot seem to get it running.
Re: non-intervention patient cases, I have a file of roughly 150,000
patients who could be potential controls.
Patient MRN is the unique patient identifier.
The demographics are indeed age, gender, race, payor, and zip.
Payor is the type of health insurance the patient has.
We were hoping for three controls for enhanced power, but obviously
we haven't found any controls yet due to the inability to carry out
the matching in SPSS.

Many thanks,

Denise

David Marso

Mar 15, 2012, 4:40:40 PM
I can't comment on FUZZY, but have you tried using MATCH FILES or ADD FILES?
Rename the MRN variable in the intervention group to something else, say MRN_1.
SORT both files by the demogs, a random variable, and a counter, and save them.
Use the intervention file as a /TABLE in MATCH FILES to associate each intervention patient with an appropriate control.
Rinse, REPEAT.
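
The sort-and-counter step can be sketched on a toy file. The data and the variable names (id, sex, agegrp, rnd, counter) are invented for illustration and are not from the poster's file.

```spss
* Toy data, invented for illustration only.
DATA LIST FREE / id sex agegrp.
BEGIN DATA
1 1 2
2 1 2
3 1 3
4 2 2
5 1 2
END DATA.

* Random order within each demographic combination.
COMPUTE rnd = UNIFORM(1).
SORT CASES BY sex agegrp rnd.

* The counter restarts at 1 whenever the demographic combination changes.
COMPUTE counter = 1.
IF (sex = LAG(sex) AND agegrp = LAG(agegrp)) counter = LAG(counter) + 1.
LIST.
```

Doing the same in both files gives every record a key made of the demographics plus the counter, so a match BY that key pairs the first intervention record in a combination with the first control in it, the second with the second, and so on.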
---


David Marso

Mar 15, 2012, 7:49:24 PM
Here is some rather dirty syntax which simulates some data and obtains 3 matching cases from a second large file.
--
YMMV depending upon the distributions of the demogs (I have lazily named them D1...D5).
INPUT PROGRAM.
LOOP CASEID=1 TO 210.
DO REPEAT D=D1 TO D5 / X= 2 5 4 10 3.
COMPUTE D=TRUNC(UNIFORM(X))+1.
END REPEAT.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
COMPUTE INTERV=1.
SORT CASES BY D1 D2 D3 D4 D5.
SAVE OUTFILE "C:\TEMP\Interv.sav".

INPUT PROGRAM.
LOOP CASEID=1000 TO 200000.
DO REPEAT D=D1 TO D5 / X= 2 5 4 10 3.
COMPUTE D=TRUNC(UNIFORM(X))+1.
END REPEAT.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.

COMPUTE INTERV=0.
SORT CASES BY D1 D2 D3 D4 D5 .
SAVE OUTFILE "C:\TEMP\Control.sav".

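*------.
* Write each intervention case 3 times (REPLIC = 1, 2, 3), one copy per control wanted.
GET FILE "C:\TEMP\Interv.sav".
RENAME VARIABLES CASEID=ID_INT.
LOOP REPLIC=1 TO 3.
XSAVE OUTFILE "C:\TEMP\IntervRep3.sav".
END LOOP.
EXECUTE.

GET FILE "C:\TEMP\IntervRep3.sav".
SORT CASES BY D1 D2 D3 D4 D5 REPLIC.
SAVE OUTFILE "C:\TEMP\IntervRep3a.sav".

* Put the controls in random order within each demographic combination.
* REPLIC then numbers them 1, 2, 3 and so on within the combination.
GET FILE "C:\TEMP\Control.sav".
COMPUTE RANDOM=Uniform(1).
SORT CASES BY D1 D2 D3 D4 D5 RANDOM.
COMPUTE REPLIC=SUM(1,(D1=LAG(D1)&D2=LAG(D2)&D3=LAG(D3)&D4=LAG(D4)&D5=LAG(D5))*LAG(REPLIC)).
* Keep only controls whose demographics and REPLIC line up with an intervention copy.
MATCH FILES / FILE * / IN=CONTROL
/ FILE="C:\TEMP\IntervRep3a.sav" / IN=INTER
/ BY D1 D2 D3 D4 D5 REPLIC.
SELECT IF CONTROL AND INTER.
SORT CASES BY ID_INT.
FREQ REPLIC.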

Andy W

Mar 15, 2012, 10:00:37 PM
Also of potential interest: there is some code on Raynald's site to conduct propensity score matching (see http://spsstools.net/Syntax/RandomSampling/MatchCasesOnBasisOfPropensityScores.txt ). I know it is not the same thing as matching directly on covariates, but it could be useful given the intended subject.

With all the questions about the FUZZY command, I think it would be good if someone made a canonical example of its use (a blog post, a post on here, or the like).

Andy

David Marso

Mar 15, 2012, 11:25:12 PM
Less dirty (accounting for possible duplicates on the demogs for the intervention file):
SNIP input program...
---
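GET FILE "C:\TEMP\Interv.sav".
RENAME VARIABLES CASEID=ID_INT.
* RANDOM does not exist in this file, so sort on the demogs only.
SORT CASES BY D1 D2 D3 D4 D5.
* REP1 numbers intervention cases 1, 2, 3 and so on within each demographic combination.
COMPUTE REP1=SUM(1,(D1=LAG(D1)&D2=LAG(D2)&D3=LAG(D3)&D4=LAG(D4)&D5=LAG(D5))*LAG(REP1)).
* Write each intervention case 3 times, once per control wanted.
LOOP REPLIC=1 TO 3.
XSAVE OUTFILE "C:\TEMP\IntervRep3.sav".
END LOOP.
EXECUTE.

GET FILE "C:\TEMP\IntervRep3.sav".
* Give every copy a distinct slot number within its demographic combination.
COMPUTE REPLIC=(REP1-1)*3 +REPLIC.
SORT CASES BY D1 D2 D3 D4 D5 REPLIC.
SAVE OUTFILE "C:\TEMP\IntervRep3a.sav".


GET FILE "C:\TEMP\Control.sav".
* Random order within each combination, then number the controls the same way.
COMPUTE RANDOM=Uniform(1).
SORT CASES BY D1 D2 D3 D4 D5 RANDOM.
COMPUTE REPLIC=SUM(1,(D1=LAG(D1)&D2=LAG(D2)&D3=LAG(D3)&D4=LAG(D4)&D5=LAG(D5))*LAG(REPLIC)).
MATCH FILES / FILE * / IN=CONTROL
/ FILE="C:\TEMP\IntervRep3a.sav" / IN=INTER
/ BY D1 D2 D3 D4 D5 REPLIC.
EXE.


* Keep only the slots that were filled from both files.
SELECT IF CONTROL AND INTER.
SORT CASES BY ID_INT.
FREQ REPLIC.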

Denise

Mar 16, 2012, 10:33:15 AM
Dear Andy,

Thanks very much for the Raynald link! Also, re: the FUZZY example, I agree, I have Googled it so much and found little.

Denise

Denise

Mar 16, 2012, 10:31:38 AM
Dear David,

Thank you so much for sharing this syntax. I am going to run this today and see what happens. I really appreciate the help and will let you know what happens.

Best,

Denise

Jon Peck

Mar 16, 2012, 10:38:25 AM
Match/Add can't deal with fuzz in the matching criteria, which is usually necessary with variables like age.

Since the actual error still remains unstated, there is nothing that can be done to help with getting FUZZY to run.
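
On the point about fuzz: one coarse workaround for exact-match commands, which is not what FUZZY does, is to band a continuous variable before matching so that nearby values share a key. A sketch, assuming a numeric variable named age measured in years; the name ageband and the 5-year width are arbitrary choices for illustration.

```spss
* Assumption: age is numeric, in years. The 5-year band width is arbitrary.
* Run the same COMPUTE in both the case file and the control file.
COMPUTE ageband = TRUNC(age / 5).
EXECUTE.
* Then sort and match on ageband in place of age.
```

Banding has hard edges: ages 59 and 60 land in different bands even though they are a year apart, which is exactly the situation a per-variable tolerance handles better.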

David Marso

Mar 16, 2012, 9:29:33 PM
Here is a somewhat more general approach with a relatively efficient macro!
The Input Program merely simulates the data!
--
INPUT PROGRAM.
+ LOOP CASEID=1 TO 210000.
+ DO REPEAT D=D1 TO D5 / X= 2 5 14 30 3.
+ COMPUTE D=TRUNC(UNIFORM(X))+1.
+ END REPEAT.
+ DO IF $CASENUM <= 250.
+ COMPUTE INTERV=1.
+ XSAVE OUTFILE "C:\TEMP\Interv.sav".
+ ELSE.
+ COMPUTE INTERV=0.
+ XSAVE OUTFILE "C:\TEMP\Control.sav".
+ END IF.
+ END CASE.
+ END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

**BEGIN ACTUAL RELEVANT CODE HERE!!! ***.
*------.
DEFINE CTR (!POS !TOKENS(1)/ !POS !CMDEND ,).
!LET !OUTX="SUM(1,("
!DO !D !IN (!2) !LET !OUTX=!CONCAT (!OUTX,!D," EQ LAG(",!D,")&") !DOEND
!LET !OUTX=!CONCAT(!OUTX,"1)*LAG(",!1,"))")
COMPUTE !1=!OUTX .
!ENDDEFINE .
*------.
DEFINE CCMatch
(CaseFil !CHAREND ('/') /CtlFil !CHAREND ('/')/ID !CHAREND ('/')/NRep !CHAREND ('/')/Demog !CMDEND )
GET FILE !QUOTE(!CaseFil).
RENAME VARIABLES !ID=@ID_INT.
SORT CASES BY !DEMOG.
CTR @REP1 !Demog.
LOOP @REP=1 TO !NREP.
+ COMPUTE @REPLIC=(@REP1-1)*!NREP +@REP.
+ XSAVE OUTFILE !QUOTE(!CONCAT(!CaseFil,"X")).
END LOOP.
EXECUTE.
*------.
GET FILE !QUOTE(!CtlFil).
COMPUTE @RANDOM=Uniform(1).
SORT CASES BY !Demog @RANDOM.
CTR @REPLIC !Demog.
MATCH FILES / FILE * / IN=@CONTROL / FILE=!QUOTE(!CONCAT(!CaseFil,"X"))/ IN=@INTER/ BY !Demog @REPLIC.
SELECT IF @CONTROL AND @INTER.
SORT CASES BY @ID_INT.
FREQ @REPLIC.
!ENDDEFINE.
*------.
SET MPRINT ON.
CCMATCH CaseFil C:\TEMP\Interv.sav
/CtlFil C:\TEMP\Control.sav
/ID CaseID
/NRep 3
/Demog D1 D2 D3 D4 D5.
EXECUTE.

Art Kendall

Mar 17, 2012, 9:11:18 AM
Those who follow my posts know that I advocate the use of syntax to
facilitate refining one's process.

David is one of the most capable programmers I have ever been exposed to.

Note that even at his skill level he has produced several drafts of
syntax for matching "case-control" cases.

Art Kendall
Social Research Consultants

Denise

Mar 21, 2012, 10:42:54 AM
Dear David,

Thanks again for all your help with this. This code is running beautifully except for one issue with the "SORT CASES BY D1 D2 D3 D4 D5 RANDOM." syntax you provided above. When I try to run this:

SORT CASES BY age payor zip sex race RANDOM.

I get this message:

>Error # 701 in column 38. Text: RANDOM
>An undefined variable name, or a scratch or system variable was specified in a
>variable list which accepts only standard variables. Check spelling and
>verify the existence of this variable.
>Execution of this command stops.

I have scoured google for a possible reason, but can't find anything. Any feedback you may have would be greatly appreciated.

Many thanks,

Denise
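
Error 701 says that RANDOM does not exist in the active file at the point where SORT CASES runs. A likely cause, though the message alone cannot confirm it, is that the COMPUTE that creates RANDOM was not run on this file before the sort. A sketch using the variable names from the message above:

```spss
* Create the random tie-breaker first, then sort on it.
COMPUTE RANDOM = UNIFORM(1).
SORT CASES BY age payor zip sex race RANDOM.
```

If the file being sorted is the intervention file, where records are only being numbered within demographic groups, dropping RANDOM from the sort is another option; the CCMatch macro earlier in the thread sorts that file on the demographics alone.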

Denise

Mar 21, 2012, 10:45:53 AM
to A...@drkendall.org
Art - yes, David's syntax and feedback are invaluable. I feel extremely lucky to have access to this group.

Denise

Denise

Mar 21, 2012, 11:12:43 AM
Dear Jon,

I hate to say it, but I can't find the syntax I initially tried to run from the help for FUZZY; that's where I got stuck.

Thanks, Denise

Denise

Mar 21, 2012, 2:03:02 PM
On Friday, March 16, 2012 10:38:25 AM UTC-4, Jon Peck wrote:
Just wanted to let you know that I finally got FUZZY running. I am extremely impressed!

D.

Kashish Goel

Apr 12, 2012, 2:14:50 AM
Hi Jon,

I had a quick question.

The FUZZY/Help. command takes data from two different datasets. But,
my cases and controls are embedded in one variable (as 0 and 1). How
can I create a syntax for that?

Thanks for your help!

Naresh

David Marso

Apr 12, 2012, 7:59:06 AM
The *obvious* solution is to separate them into 2 different files ;-)
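
A sketch of that separation into two saved files, assuming the combined file is open and the 0/1 variable has the hypothetical name grp (1 = case, 0 = control). The output paths are placeholders.

```spss
* Assumption: grp is numeric, 1 = case and 0 = control (hypothetical name).
* TEMPORARY limits each SELECT IF to the SAVE that follows it.
TEMPORARY.
SELECT IF (grp = 1).
SAVE OUTFILE="C:\TEMP\Cases.sav".

TEMPORARY.
SELECT IF (grp = 0).
SAVE OUTFILE="C:\TEMP\Controls.sav".
```

Because both selections are temporary, the working file still holds every record after the two SAVE commands have run.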