I am asking for some advice on a data management procedure that would
allow me to perform a function that would aggregate variables and
return mode values. Aggregate command in SPSS does not allow to
aggregate by mode offering options to aggregate by mean, sum, etc.
Does anyone know if there is a roundabout way to compute a variable
that would return mode by break variable?
In the sample of data below a participant in my data set makes five
visits to a center and during one of the visits reports their race as
Asian, which is an erroneous value. I would like to aggregate the data
based on the ID variable and return value "white" to the aggregated
race variable.
ID TotalVisits SequenceofVisit Race RaceNum
1 5 1 White 1
1 5 2 White 1
1 5 3 Asian 2
1 5 4 White 1
1 5 5 White 1
Thank you,
Anastasia
Maybe this will work:
1. Use OMS to send frequency distribution tables to a data set
2. Split file by ID, then run FREQUENCIES on Race Num with output
sorted by descending frequency -- i.e., /FORMAT=DFREQ
3. Turn off split file and OMS
4. Activate the dataset containing the frequency tables, and select
the first row for each ID. The variable containing the frequency
count will be the mode (or at least a mode, if there are ties).
Delete all variables other than ID and that frequency count.
5. Merge the two data sets with MATCH FILES. The data set with the
mode will have to be treated as a look-up table (i.e., /TABLE rather
than /FILE).
HTH.
--
Bruce Weaver
bwe...@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/Home
"When all else fails, RTFM."
Oops...bit of a slip-up on point 4 above. The variable just to the
LEFT of the frequency count is the one you want. Here's an example.
* This code assumes that variable RACENUM has no value labels.
* If it does have value labels, VAR3 ends up holding the
* strings "White" and "Asian", and the wheels fall off.
* One could get around that problem, but since you already
* have the labels in variable RACE, you may not also need
* the value labels for RACENUM.
new file.
dataset close all.
data list list / ID TotalVisits SequenceofVisit (3f5.0) Race(a8)
RaceNum (f2.0).
begin data
1 5 1 White 1
1 5 2 White 1
1 5 3 Asian 2
1 5 4 White 1
1 5 5 White 1
2 5 1 Asian 2
2 5 2 Asion 2
2 5 3 Asian 2
2 5 4 White 1
2 5 5 White 1
3 5 1 White 1
3 5 2 White 1
3 5 3 Asian 2
end data.
dataset name maindata.
*val lab RaceNum
1 'White'
2 'Asian'
.
* OMS.
DATASET DECLARE fredist.
OMS
/SELECT TABLES
/IF COMMANDS=['Frequencies'] SUBTYPES=['Frequencies']
/DESTINATION FORMAT=SAV NUMBERED=TableNumber_
OUTFILE='fredist'.
split file by ID.
FREQUENCIES RaceNum /FORMAT=DFREQ .
split file off.
OMSEND.
dataset activate fredist window = front.
* Var1 is a string -- recode it to numeric and call it ID .
recode var1 (convert) into ID.
* Keep first record for each ID .
select if ($casenum EQ 1) or (ID NE lag(ID)).
exe.
* Var3 has the Race mode, but it is string .
* Recode it to numeric.
recode var3 (convert) into RaceMode.
exe.
* Get rid of unneeded variables .
delete variables TableNumber_ TO CumulativePercent .
* Now merge the two files .
dataset activate maindata window = front.
MATCH FILES
FILE=* /
TABLE='fredist' /
BY ID.
exe.
dataset close fredist.
Thank you very much. I am just trying it out and I think it would work
for all my demographic variables. Also, I did not know that OMS
existed, thank you for showing it to me. This is not the first time
you respond to my questions and I appreciate your help very much.
Again,
Thank you,
Anastasia Vishnyakova
agg out *
mode=addvar
/brea=ID RaceNum
/n=n.
agg out *
mode=addvar
/brea=ID
/max_n=max(n).
temp.
sel if max_n=n.
agg out 'c:\..................\Race.sav'
/brea=id
/Mode_race=first(racenum).
match fil /fil=*
/tab='c:\..................\Race.sav'
/by id.