Aggregate Variables by Mode

Anastasia

unread,

May 5, 2010, 12:22:56 PM5/5/10

to

Hi All!

I am asking for some advice on a data management procedure that would
allow me to perform a function that would aggregate variables and
return mode values. Aggregate command in SPSS does not allow to
aggregate by mode offering options to aggregate by mean, sum, etc.

Does anyone know if there is a roundabout way to compute a variable
that would return mode by break variable?

In the sample of data below a participant in my data set makes five
visits to a center and during one of the visits reports their race as
Asian, which is an erroneous value. I would like to aggregate the data
based on the ID variable and return value "white" to the aggregated
race variable.

ID TotalVisits SequenceofVisit Race RaceNum
1 5 1 White 1
1 5 2 White 1
1 5 3 Asian 2
1 5 4 White 1
1 5 5 White 1

Thank you,
Anastasia

Bruce Weaver

unread,

May 5, 2010, 1:05:50 PM5/5/10

to

Maybe this will work:

1. Use OMS to send frequency distribution tables to a data set

2. Split file by ID, then run FREQUENCIES on Race Num with output
sorted by descending frequency -- i.e., /FORMAT=DFREQ

3. Turn off split file and OMS

4. Activate the dataset containing the frequency tables, and select
the first row for each ID. The variable containing the frequency
count will be the mode (or at least a mode, if there are ties).
Delete all variables other than ID and that frequency count.

5. Merge the two data sets with MATCH FILES. The data set with the
mode will have to be treated as a look-up table (i.e., /TABLE rather
than /FILE).

HTH.

--
Bruce Weaver
bwe...@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/Home
"When all else fails, RTFM."

Bruce Weaver

unread,

May 5, 2010, 3:25:25 PM5/5/10

to

Oops...bit of a slip-up on point 4 above. The variable just to the
LEFT of the frequency count is the one you want. Here's an example.

* This code assumes that variable RACENUM has no value labels.
* If it does have value labels, VAR3 ends up holding the
* strings "White" and "Asian", and the wheels fall off.
* One could get around that problem, but since you already
* have the labels in variable RACE, you may not also need
* the value labels for RACENUM.

new file.
dataset close all.

data list list / ID TotalVisits SequenceofVisit (3f5.0) Race(a8)
RaceNum (f2.0).
begin data

1 5 1 White 1
1 5 2 White 1
1 5 3 Asian 2
1 5 4 White 1
1 5 5 White 1

2 5 1 Asian 2
2 5 2 Asion 2
2 5 3 Asian 2
2 5 4 White 1
2 5 5 White 1
3 5 1 White 1
3 5 2 White 1
3 5 3 Asian 2
end data.

dataset name maindata.

*val lab RaceNum
1 'White'
2 'Asian'
.

* OMS.
DATASET DECLARE fredist.
OMS
/SELECT TABLES
/IF COMMANDS=['Frequencies'] SUBTYPES=['Frequencies']
/DESTINATION FORMAT=SAV NUMBERED=TableNumber_
OUTFILE='fredist'.

split file by ID.
FREQUENCIES RaceNum /FORMAT=DFREQ .
split file off.

OMSEND.

dataset activate fredist window = front.

* Var1 is a string -- recode it to numeric and call it ID .

recode var1 (convert) into ID.

* Keep first record for each ID .

select if ($casenum EQ 1) or (ID NE lag(ID)).
exe.

* Var3 has the Race mode, but it is string .
* Recode it to numeric.

recode var3 (convert) into RaceMode.
exe.

* Get rid of unneeded variables .

delete variables TableNumber_ TO CumulativePercent .

* Now merge the two files .

dataset activate maindata window = front.

MATCH FILES
FILE=* /
TABLE='fredist' /
BY ID.
exe.

dataset close fredist.

Anastasia

unread,

May 5, 2010, 5:43:49 PM5/5/10

to

Bruce,

Thank you very much. I am just trying it out and I think it would work
for all my demographic variables. Also, I did not know that OMS
existed, thank you for showing it to me. This is not the first time
you respond to my questions and I appreciate your help very much.

Again,
Thank you,
Anastasia Vishnyakova

fred....@lsc.gov.uk

unread,

May 7, 2010, 5:03:12 AM5/7/10

to

If you don't wnat to use OMS you could try

agg out *
mode=addvar
/brea=ID RaceNum
/n=n.

agg out *
mode=addvar
/brea=ID
/max_n=max(n).

temp.
sel if max_n=n.
agg out 'c:\..................\Race.sav'
/brea=id
/Mode_race=first(racenum).

match fil /fil=*
/tab='c:\..................\Race.sav'
/by id.

tina.l...@gmail.com

unread,

Jan 26, 2017, 4:42:20 PM1/26/17

to

I know this is an outdated post, but I figured I would post an alternative way of doing this that I figured out. I too went searching online to find out to Aggregate a Variable by it's mode or the cases that appear most often. In my example, I was looking at preferred language spoken.

First you need to create a dummy variable that contains the number 1 in all the cases. I named my dummy variable VAR00002.

Then you basically Aggregate breaking by the ID and the Pref_Language, or in your example the Race and calculate the "SUM" of the dummy variable (VAR00002). This gives you a frequency count of the Pref_Lang(or Race) based on ID.

Then you need to Sort cases so ID is the primary sort, and the new variable you created (VAR00002_sum) as the secondary sort. I kept it sorted Ascending. This sorts the frequencies of your variable with the highest frequency appearing at the bottom of the list grouped by ID.

Then once that is completed you can run an aggregate breaking by ID and calculate the last of the Pref_Lang (or Race). This will create a variable that is based on the last variable in the ID groups, which should be the variable with the highest frequency.

Here is my Syntax.

Create dummy variable VAR00002 which contains 1 for all cases.

AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=ID PREF_LANG_COMPLETE
/VAR00002_sum=SUM(VAR00002).

SORT CASES BY ID(A) VAR00002_sum(A).

AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=ID
/PREF_LANG_COMPLETE_last=LAST(PREF_LANG_COMPLETE).