SPSS merge files problem - any help appreciated!

587 views
Skip to first unread message

Marnie

unread,
Feb 9, 2012, 5:37:22 PM2/9/12
to MedStats
I'm sorry to bother the list with this, but I am sure someone knows an
easy solution to something I have been working on unsuccessfully for a
week (I hope so, at least!)..

I need to merge two files: they are patient files. The first one
contains one line for each patient, identified by an ID, and
containing his demographic info (name, address, etc).

The second file contains many lines for each patient, corresponding to
all his visits in a year (so there may be many of these!), each line
of which is identified by the same ID, but with completely different
data (diagnosis, payment mechanism, etc.).

I want to merge them so that there are the same number of lines as in
file 2 (one per visit), where each line contains all the info for the
patient - all his visit data and all his demographic data.

Thanks so much in advance for any help or syntax!

Ted Harding

unread,
Feb 9, 2012, 8:16:13 PM2/9/12
to meds...@googlegroups.com
> --

What you seek is a basic generic database operation called
a JOIN (for which there are several options). If you are
using a UNIX-type operating system (e.g. Linux, or Mac OS-X
[based on FreeBSD]) then you should consult 'man join',
which will tell you how to do it.

Sorry that it is too late today for me to go into details,
but that is the line to pursue.

Ted.

-------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@wlandres.net>
Date: 10-Feb-2012 Time: 01:16:07
This message was sent by XFMail
-------------------------------------------------

Rajeev Kumar

unread,
Feb 10, 2012, 2:42:27 AM2/10/12
to meds...@googlegroups.com
Hello Ted.

You can do that but it is tricky, first you convert the long format file (having many rows for each patients) into wide format using the restructure command present in data menu just above the merge command  is SPSS. Give ID as identifier variable. This will give one row for each patient.

 

Then merge these two files as usual because now you have one row for the each patient in both the files.

Then again convert this wide format into long format.

 

I hope you would knew how to convert the long format into wide format.


Rajeev Kumar
Department of Biostatistics and Medical Informatics
University College of Medical Sciences
Dilshad Garden
Delhi - 110095


--
To post a new thread to MedStats, send email to MedS...@googlegroups.com .
MedStats' home page is http://groups.google.com/group/MedStats .
Rules: http://groups.google.com/group/MedStats/web/medstats-rules



--

Ted Harding

unread,
Feb 10, 2012, 4:23:06 AM2/10/12
to meds...@googlegroups.com

Marnie:
My apologies for overlooking "SPSS" in the Subject line
last night -- I was basing my reply on the descriptions
in the message body.

Since I'm not an SPSS user, I can't offer advice on what
manoeuvres to adopt in that software. However, now that
my blood-caffeine levels are approaching normal this
morning, the following is an example of how a simple
command in a UNIX-type system (see above) can do the job.
The command line is:

join -t "#" -1 1 -2 1 patients.txt visits.txt > patvis.txt

of which the explanation is:

join: Use the "join" utility program
-t "#": Use the character "#" as field separator
-1 1: Join using field 1 ("1") of file 1 ("-1")
-2 1: Join using field 1 ("1") of file 2 ("-2")
patients.txt: This is file 1
visits.txt: This is file 2
> patvis.txt: Output the result into file patvis.txt

The files patients.txt and visits.txt (silly data) are:

patients.txt:
=============
ID#NAME#AGE#SEX
0001#Alice Abbot#21#F
0002#Brian Bishop#32#M
0003#Charles Cardinal#43#M
0004#Diana Deacon#54#F

visits.txt:
===========
ID#Date#Symptoms#Treatment
0001#01/02/2011#Sleepy#Prescribe caffeine
0001#02/03/2011#Agitated#Prescribe alcohol
0002#03/01/2/11#Itching#Advise scratching
0002#02/02/2011#Skin trauma#Advise stop scratching
0002#01/03/2011#Itching#Refer to hospital
0003#01/02/2011#Emaciated#Advise eat more
0003#02/04/2011#Hunger#Advise eat more
0003#03/06/2011#Indigestion#Advise more healthy diet
0003#04/08/2011#Obese#Advise cut down on food
0004#03/03/2011#No symptoms#Reassurance
0004#05/03/2011#No symptoms#Reassurance
0004#07/03/2011#No symptoms#Reassurance
0004#21/03/2011#Mood swings#Prescribe tranquiliser
0004#04/04/2011#Violent demeanour#Call Police

Then the result of the command:

join -t "#" -1 1 -2 1 patients.txt visits.txt > patvis.txt

is the file patvis.txt:
=======================
ID#NAME#AGE#SEX#Date#Symptoms#Treatment
0001#Alice Abbot#21#F#01/02/2011#Sleepy#Prescribe caffeine
0001#Alice Abbot#21#F#02/03/2011#Agitated#Prescribe alcohol
0002#Brian Bishop#32#M#03/01/2/11#Itching#Advise scratching
0002#Brian Bishop#32#M#02/02/2011#Skin trauma#Advise stop scratching
0002#Brian Bishop#32#M#01/03/2011#Itching#Refer to hospital
0003#Charles Cardinal#43#M#01/02/2011#Emaciated#Advise eat more
0003#Charles Cardinal#43#M#02/04/2011#Hunger#Advise eat more
0003#Charles Cardinal#43#M#03/06/2011#Indigestion#Advise more healthy diet
0003#Charles Cardinal#43#M#04/08/2011#Obese#Advise cut down on food
0004#Diana Deacon#54#F#03/03/2011#No symptoms#Reassurance
0004#Diana Deacon#54#F#05/03/2011#No symptoms#Reassurance
0004#Diana Deacon#54#F#07/03/2011#No symptoms#Reassurance
0004#Diana Deacon#54#F#21/03/2011#Mood swings#Prescribe tranquiliser
0004#Diana Deacon#54#F#04/04/2011#Violent demeanour#Call Police

which, from your description, is exactly what you would want to see.

The generic route to achieve this result, without relying on
the basic UNIX utility "join", would be to read the original
files into just about any relational database software, and
then use the relational database function JOIN according to
the details for that software package.

It may well be that SPSS itself has such utilities built in
-- though I confess that I found Rajeev Kumar's description
horribly convoluted: I would hope that SPSS has a simpler
method, analogous to what I have illustrated above. The idea
of collating all the lines in file 2 for a given patient into
a single "long format" line strikes me as strange, possibly
leading to a result that may be difficult to unpack!

Hoping this helps,
Ted.

-------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@wlandres.net>

Date: 10-Feb-2012 Time: 09:15:19

MacLennan, Graeme

unread,
Feb 10, 2012, 4:59:49 AM2/10/12
to meds...@googlegroups.com
Ted said:
It may well be that SPSS itself has such utilities built in -- though I confess that I found Rajeev Kumar's description horribly convoluted: I would hope that SPSS has a simpler method, analogous to what I have illustrated above. The idea of collating all the lines in file 2 for a given patient into a single "long format" line strikes me as strange, possibly leading to a result that may be difficult to unpack!


It does indeed, the OP should read for the following UCLA web page for syntax on the MATCH command, in particular section 3.
http://www.ats.ucla.edu/stat/spss/modules/merge.htm
Regards,
Graeme.

The University of Aberdeen is a charity registered in Scotland, No SC013683.

Martin Roberts

unread,
Feb 10, 2012, 5:09:52 AM2/10/12
to meds...@googlegroups.com
Hi Marnie,
Isn't it done by a MATCH FILES command with your visit-based file as the initial file and the patient-based file as a lookup table. For example:

MATCH FILES FILE=Visitfile /TABLE=Patientfile /BY PatientID.

(You would need to sort both files by PatientID first and you can use the KEEP or DROP commands to specify which variables from Patientfile are added if you don't want all of them.)

Or is it more complicated than that?
Regards,
Martin

Martin Roberts,
Research Fellow
Peninsula College of Medicine & Dentistry
Plymouth 01752 439259
Exeter 01392 726189
________________________________________
From: meds...@googlegroups.com [meds...@googlegroups.com] On Behalf Of Marnie [marnie...@gmail.com]
Sent: 09 February 2012 22:37
To: MedStats
Subject: {MEDSTATS} SPSS merge files problem - any help appreciated!

--

Rajeev Kumar

unread,
Feb 10, 2012, 5:16:53 AM2/10/12
to meds...@googlegroups.com
Hello ted:
It is true that my method is complicated if the file2 has large number of variables. But if you have three or four variables in file2 you can easily apply this.
Rajeev

--
To post a new thread to MedStats, send email to MedS...@googlegroups.com .
MedStats' home page is http://groups.google.com/group/MedStats .
Rules: http://groups.google.com/group/MedStats/web/medstats-rules



--

Bruce Weaver

unread,
Feb 10, 2012, 2:01:44 PM2/10/12
to MedStats
I have two comments for Marnie.

1. On the web-page Grame gave, the important thing is the distinction
between /FILE and /TABLE. /TABLE causes that file (or dataset) to be
used as a look-up table.

2. For questions about how to do X in SPSS, you'll have much better
luck posting to the SPSS newsgroup (comp.soft-sys.stat.spss) or
mailing list (http://spssx-discussion.1045642.n5.nabble.com/). It's
my impression that most of the regulars here are not SPSS users.

HTH.


On Feb 10, 4:59 am, "MacLennan, Graeme" <g.maclen...@abdn.ac.uk>
wrote:
> Ted said:
> It may well be that SPSS itself has such utilities built in -- though I confess that I found Rajeev Kumar's description horribly convoluted: I would hope that SPSS has a simpler method, analogous to what I have illustrated above. The idea of collating all the lines in file 2 for a given patient into a single "long format" line strikes me as strange, possibly leading to a result that may be difficult to unpack!
>
> It does indeed, the OP should read for the following UCLA web page for syntax on the MATCH command, in particular section 3.http://www.ats.ucla.edu/stat/spss/modules/merge.htm
Reply all
Reply to author
Forward
0 new messages