The data (I present here one question from the survey) were generated
in answer to the following question:
What is your current position?
Response options:
Elementary school teacher
Assistant Teacher
Principal
Assistant Principal
(There are more possible responses, including an open-ended "Other"
but I will simplify and stop here.)
(See a sample excerpt at the bottom of this post.)
The Excel file generated by SurveyMonkey is arranged such that every
respondent is, as should be, a row.
It is the columns that are problematic.
Rather than have a particular column represent a variable, a series of
columns represents the possible responses to a particular question.
There is a column A for ID but I leave it out in the below sample
since it didn't come through well.
Thus, for instance, columns B through E represent current position
(per labels in rows 1 and 2).
There are 4 possible responses, one column for every response.
Column B (according to the label in row 2) is elementary school
teacher, C is assistant teacher, D is principal, E is assistant
principal.
Respondent 1 is a principal so in the appropriate row columns B, C, E
are blank, and in column D it says "Principal". (Not yes! not 1! but
the same thing as was written at the top of the column).
If there is no answer, then all the columns are blank.
I note that occasionally a respondent has multiple positions (e.g.,
the row that has both principal and assistant principal, the third
survey respondent, below), which complicates things somewhat, but is
not the major issue at the moment.
What I would like to do is combine these data into a single variable
called "current position" with basic values from 1 to 4.
8 for multiple answers (I think, not sure), 9 for missing)
Is there some ready way to do this in either Excel or PASW (SPSS)?
I have some ideas about how to do this, but I'm wondering if it can be
done faster. My ideas:
1) Somehow combine the first two (header) rows. Sometimes I need the
first and sometimes I need the second.
2) Convert the words to numbers -- could be done as a Find and Replace
in Excel, but that would have to be done one column at a time -- is
there a faster way?
3) In SPSS, use COMPUTE to create variables which combine the multiple
Excel variables into a single one.
In general, do people think this would work?
Below is an excerpt from the sample Excel data, for the first 3 survey
respondents. Note that data starts in Row 3; Rows 1 and 2 are
headings. (I hope this sample comes through correctly. By the way, am
I correct that the only way to send data to this list is using tabs
and the like?)
Current Position
Elementary school teacher Assistant teacher
Principal Assistant Principal
Principal
Elementary school teacher
Principal Assistant Principal
Thank you.
Esther
Also check the archives to SPSSX-L for replies about translating from
multiple dichotomies to a single nominal level variable. I think but am
not sure that there is a PYTHON add-on? Plug-in? that automates that.
Art Kendall
Social Research Consultants
If I'm understanding correctly in this example you have if the
individual answered "Elementary school teacher" then it will say
"Elementary school teacher" in Column B in the excel spreadsheet.
In syntax you can use the compute and if statements to make a numeric
value for this data. I typically do it like this although there are
other ways.
******************.
compute Elem = 0.
if B = "Elementary school teacher" Elem = 1.
execute.
******************.
I would just use dummy variables for each response at the onset since
a respondent could choose multiple categories, although you can change
this however you want to later on (like you suggested with the 1 to 4
in one variable, although this will take some coding scheme to deal
with people who gave more than one response).
You didn't give enough information to answer in regards to combining
the header rows. Is the id# present in every row or only in the row
with the other header information? If the id# is present in every row
you may be able to just get rid of the header row (if it contains no
other unique information).
[snip, detail]
>
>There are 4 possible responses, one column for every response.
>
>Column B (according to the label in row 2) is elementary school
>teacher, C is assistant teacher, D is principal, E is assistant
>principal.
>
>Respondent 1 is a principal so in the appropriate row columns B, C, E
>are blank, and in column D it says "Principal". (Not yes! not 1! but
>the same thing as was written at the top of the column).
>
>If there is no answer, then all the columns are blank.
You can get rid of the words pretty easily.
If you know that every non-blank is the right name - which is
something that you should make sure of, anyway - all you
need to do, for variables (Columns) B to D is
RECODE B, C, D, E (blank=0) (else=1).
Then you can convert the set of 1's to a variable ranges
1-4 by saying,
COMPUTE NewVar= 1*B + 2*C + 3*D + 4*E.
You can make sure that there is no more than one value by
using a COUNT, or else by simple using
If (SUM(B, C, D, E) GT 1) NewVar= 8.
[snip, rest]
--
Rich Ulrich
I would expand the suggestion a little.
numeric numB numC numD numE(f1).
RECODE B, C, D, E (blank=0) (else=1) into numB numC numD numE.
SPSS won't recode string variables into numeric variables in the same field.
<soapbox>
Also, I wish there options in SPSS that could be set so that it would
ask if the user were sure that (s)he wanted to use the replace format of
recode rather than the into format.
In the early days recoding into the same field was probably introduced
in order to save storage, but in my experience, writing over input can
be hazardous.
Storage is no longer as expensive as it was in 1968.
<\soapbox>
Art Kendall
Good soapbox comment Art, and nice modification of Rich's neat
recode. But given that Esther's goal was a single variable with
values of 1-4, here's a method using scratch variables that
eliminates the 4 unneeded indicator variables from the final file.
The variable names I'm using are based on a guess as to how what
happens to the Excel column headings when the data are imported to
SPSS.
recode Elementaryschoolteacher Assistantteacher Principal
Assistantprincipal
("" = 0) (else = 1) into #a #b #c #d .
compute CurrentPosition = Lg10(#a + #b*10 + #c*10**2 + #d*10**3) + 1.
if missing(CurrentPosition) CurrentPosition = 99 .
format CurrentPosition (f2.0).
variable level CurrentPosition (nominal).
variable labels CurrentPosition "Current Position".
value labels CurrentPosition
1 'Elementary School Teacher'
2 'Assistant Teacher'
3 'Principal'
4 'Assistant Principal'
99 'No position selected'
.
missing values CurrentPosition (99).
frequencies CurrentPosition.
The expression #a + #b*10 + #c*10**2 + #d*10**3 will return the
following values:
0 - when no position is selected
1 - when only first position is selected
10 - only second position
100 - only third position
1000 - only 4th position
I assume only one of the 4 can be selected.
And for those values, Log10 returns:
0 - SYSMIS (so a warning will appear)
1 - 0
10 - 1
100 - 2
1000 - 3
HTH.
--
Bruce Weaver
bwe...@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/Home
"When all else fails, RTFM."