Combining multiple dichotomous variables into one categorical variable

marcus.cr...@gmail.com

Oct 1, 2015, 4:53:23 PM
I have a data set that lists reasons for removal from the home for foster children. Each variable is dichotomous though (Physical abuse, sexual abuse, neglect, alcohol abuse, etc). Each is no =0 or yes = 1.

I would like to create a new categorical variable that lists all of the options for reason for removal such as physical abuse = 1, sexual abuse = 2, neglect = 3, alcohol abuse = 4, etc.

I tried the compute command but that just added everything together. I also tried to transform into a new variable but I was not able to because each variable uses 0 and 1. Do I need to recode each so that there is a 0,1 and a 0,2 and a 0,3 etc? Then I could transform into a new variable? Or is there a simpler way that makes more sense?

Rich Ulrich

Oct 1, 2015, 7:29:29 PM
On Thu, 1 Oct 2015 13:53:18 -0700 (PDT), marcus.cr...@gmail.com
wrote:

>I have a data set that lists reasons for removal from the home for foster children. Each variable is dichotomous though (Physical abuse, sexual abuse, neglect, alcohol abuse, etc). Each is no =0 or yes = 1.
>
>I would like to create a new categorical variable that lists all of the options for reason for removal such as physical abuse = 1, sexual abuse = 2, neglect = 3, alcohol abuse = 4, etc.
>
>I tried the compute command but that just added everything together. I also tried to transform into a new variable but I was not able to because each variable uses 0 and 1. Do I need to recode each so that there is a 0,1 and a 0,2 and a 0,3 etc? Then I could transform into a new variable? Or is there a simpler way that makes more sense?

Two brute-force possibilities -

* to code up all the combinations of causes .
COMPUTE Causes= 1*PhysAb + 2*SexAb + 4*Neglect + 8*Alcohol + 16* ...
* The awkward step here is adding all the right Value Labels.

* to code with the Index of the Last-listed (most important) cause .
COMPUTE Causes= 0.
DO REPEAT reasons= Alcohol, Neglect, SexAb, PhysAb
/ code= 4, 3, 2, 1.
IF reasons EQ 1) Causes= code.
END REPEAT.


--
Rich Ulrich

Marcus Crawford

Oct 2, 2015, 3:34:48 PM
I tried both of these methods and did not get either to work. The first multiplied everything together it looked like. The second method did give me what looked right once I ran a frequency. I had a table with 1-15 listed. However, when I attempted to match the numbers from the table to the dichotomous variable frequencies, none of them matched.

For instance, physical abuse should have been 15 on the new chart or possibly 1. This is one reason I wanted to check to make sure I paired them correctly. On the new chart, 15 has a frequency of 32,636 and 1 has a frequency of 67,736. Physical abuse, though, had a yes frequency of 90,659. That number was not anywhere in the new chart. None of the yes frequencies from the dichotomous variables ended up in the new chart.

Here is the syntax I used for the first way:

COMPUTE Causes= 1*PHYABUSE + 2*SEXABUSE + 3*NEGLECT + 4*AAPARENT + 5*DAPARENT + 6*AACHILD + 7*DACHILD + 8*CHILDIS + 9*CHBEHPRB + 10*PRTSDIED + 11*PRTSJAIL + 12*NOCOPE + 13*ABANDMNT +14*RELINQSH + 15*HOUSING

And here is the syntax I used for the second way:

COMPUTE Causes= 0.
DO REPEAT reasons= PHYABUSE, SEXABUSE, NEGLECT, AAPARENT, DAPARENT, AACHILD, DACHILD, CHILDIS, CHBEHPRB, PRTSDIED,PRTSJAIL,NOCOPE, ABANDMNT, RELINQSH, HOUSING
/ code= 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1.
IF reasons EQ 1 Causes= code.
END REPEAT.

At the end of EQ 1, I had to remove a ) from the original because it caused an error message. Was there a ( somewhere earlier that was missed?

Rich Ulrich

Oct 2, 2015, 4:46:18 PM
On Fri, 2 Oct 2015 12:34:42 -0700 (PDT), Marcus Crawford
<marcus.cr...@gmail.com> wrote:

>On Thursday, October 1, 2015 at 6:29:29 PM UTC-5, Rich Ulrich wrote:
[snip, previous, a few lines of which I will repeat...]

>
>I tried both of these methods and did not get either to work. The first multiplied everything together it looked like.

Here was my code:
>> COMPUTE Causes= 1*PhysAb + 2*SexAb + 4*Neglect + 8*Alcohol + 16* ...
Notice that the multipliers increase, not 1,2,3,4, ..., - but rather,
1,2,4,8,16 ...

This will get you ALL of the 32,768 (2^15) possible combinations of 15 items,
which is something that I believe you do not really want. I have done
this with 4 or 5 items: those give 16 or 32 categories, which is
(IMO) about enough before trying something else.
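
A minimal sketch of that arithmetic, written in Python rather than SPSS syntax so the counts can be checked directly. It uses 4 items instead of 15 to keep the enumeration small; the function names `encode` and `decode` are illustrative only. It compares the 1, 2, 4, 8 multipliers with 1, 2, 3, 4 multipliers and counts how many distinct codes each produces.

```python
from itertools import product

def encode(flags, weights):
    """Weighted sum of 0/1 flags, mirroring the COMPUTE statement."""
    return sum(w * f for w, f in zip(weights, flags))

def decode(code, n_items):
    """Recover the 0/1 flags from a powers-of-two code."""
    return tuple((code >> i) & 1 for i in range(n_items))

n_items = 4  # small example; the data set in this thread has 15 items
binary_weights = [2 ** i for i in range(n_items)]   # 1, 2, 4, 8
linear_weights = list(range(1, n_items + 1))        # 1, 2, 3, 4

patterns = list(product([0, 1], repeat=n_items))    # every yes/no combination
binary_codes = {encode(p, binary_weights) for p in patterns}
linear_codes = {encode(p, linear_weights) for p in patterns}

# Powers of two give one code per pattern; 1, 2, 3, 4 do not
# (for example, item 3 alone and items 1 + 2 together both sum to 3).
print(len(patterns), len(binary_codes), len(linear_codes))
print(2 ** 15)  # number of combinations for 15 yes/no items
```

With the powers-of-two multipliers every code can be decoded back to its yes/no pattern, which is why the value labels are the awkward part: there is one label per combination.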

One thing to try might be to use the 4 most important reasons
as themselves, plus one more as "any other reason".

Under the rubric of "data reduction", you could try a factor analysis
to lead to the creation of a few "factors" that might be either "any
of A, B, C, or D" or "The number marked of A, B, C, and D". But
reducing the number of categories would be a priority for me.


> The second method did give me what looked right once I ran a frequency. I had a table with 1-15 listed. However, when I attempted to match the numbers from the table to the dichotomous variable frequencies, none of them matched.

Right. And Proper. And logical. Perhaps you might use
"Mult-Response" to look at some frequencies and crosstabs.
With 15 separate variables, you either keep all the combinations,
as I did above, or your single, new variable has to shed a lot
of distinctions if it is going to have a small number of categories.
I offered to transform (what turns out to be) 15 items into a
new variable, "Most important of the 15".
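
As a rough illustration of what a multiple-response style tally reports, here is a short Python sketch (not SPSS syntax). The five records are made up, only three of the thread's variable names are used, and basing the percent of cases on records with at least one reason marked is an assumption about how one would want the table set up.

```python
# Hypothetical records: 1 = reason marked, 0 = not marked.
records = [
    {"PHYABUSE": 1, "NEGLECT": 1, "HOUSING": 0},
    {"PHYABUSE": 1, "NEGLECT": 0, "HOUSING": 0},
    {"PHYABUSE": 0, "NEGLECT": 1, "HOUSING": 1},
    {"PHYABUSE": 1, "NEGLECT": 0, "HOUSING": 1},
    {"PHYABUSE": 0, "NEGLECT": 0, "HOUSING": 0},
]
reasons = ["PHYABUSE", "NEGLECT", "HOUSING"]

# Count each reason separately, so a child with two reasons is counted twice.
counts = {name: sum(r[name] for r in records) for name in reasons}
total_responses = sum(counts.values())
cases_with_any = sum(1 for r in records if any(r[name] for name in reasons))

for name in reasons:
    pct_of_responses = 100 * counts[name] / total_responses
    pct_of_cases = 100 * counts[name] / cases_with_any
    print(f"{name:10s} {counts[name]:3d} {pct_of_responses:6.1f} {pct_of_cases:6.1f}")
```

Because every marked reason is counted, the percent-of-cases column can add up to more than 100, which is the information a single 15-category variable cannot hold.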

You can compute that by (a) setting to the first one encountered
when listed from Most to Least, and skipping further tests; or
(b) test from Least to Most, setting to each one when it is
encountered, so that the Most important encountered is the
value left in the variable after testing the whole list.

Thus, the only count that is likely to match its original yes-frequency
is the one for the final item in the list: "Last-listed (most important)",
as I said. To repeat:

>> IF (reasons EQ 1) Causes= code.

- is my code that replaces the result of every previous test,
whenever (reasons EQ 1). [I usually put that IF condition
inside of parentheses, for clarity, but the parens are optional
in SPSS.] I showed the list in reverse order so that the
most important would end up coded "1".

When the count shows "most important", then it should be
apparent that any person who marked off more than one
Reason is not going to have that second or third reason
shown in the total.
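
The overwrite behaviour can be sketched in a few lines of Python (not SPSS syntax). The records are hypothetical and only three of the variable names from the thread are used; the point is just to show which counts survive when a later match replaces an earlier one.

```python
from collections import Counter

# Hypothetical records: 1 = reason marked, 0 = not marked.
records = [
    {"PHYABUSE": 1, "NEGLECT": 1, "HOUSING": 0},
    {"PHYABUSE": 1, "NEGLECT": 0, "HOUSING": 0},
    {"PHYABUSE": 0, "NEGLECT": 1, "HOUSING": 1},
    {"PHYABUSE": 1, "NEGLECT": 0, "HOUSING": 1},
    {"PHYABUSE": 0, "NEGLECT": 0, "HOUSING": 0},
]

# Tested in this order, so a later match overwrites an earlier one,
# just as each IF inside the DO REPEAT loop replaces Causes.
order = ["PHYABUSE", "NEGLECT", "HOUSING"]
codes = {"PHYABUSE": 3, "NEGLECT": 2, "HOUSING": 1}

def last_match(record):
    cause = 0  # stays 0 when no reason is marked
    for name in order:
        if record[name] == 1:
            cause = codes[name]
    return cause

causes = [last_match(r) for r in records]
cause_freq = Counter(causes)
yes_freq = {name: sum(r[name] for r in records) for name in order}

print(cause_freq)  # frequencies of the new single variable
print(yes_freq)    # yes-counts of the original dichotomous variables
```

In this toy data only HOUSING, the last item tested, keeps its full yes-count in the new variable; PHYABUSE is marked three times but appears as the final code only once.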


>
>For instance, physical abuse should have been 15 on the new chart or possibly 1. This is one reason I wanted to check to make sure I paired them correctly. On the new chart, 15 has a frequency of 32,636 and 1 has a frequency of 67,736. Physical abuse, though, had a yes frequency of 90,659. That number was not anywhere in the new chart. None of the yes frequencies from the dichotomous variables ended up in the new chart.

So: You need to decide what you want your new variable
to include. If it is going to have only 15 categories, it is NOT
going to have all 32,000+ possible combinations.

I now conclude... You PROBABLY want more than one variable.

You want to know about a few important combinations.
Perhaps you can use my binary coding that uses 1,2,4,8
for the most important, and after that, see what remains
that is worth mentioning.
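
One way that reduction might look, sketched in Python (not SPSS syntax). Which four reasons count as "most important" is purely an assumption made here for illustration, as is lumping the remaining eleven into a single "any other reason" flag; the function name `reduced_code` is hypothetical.

```python
# Assumed choice of the four "most important" reasons; purely illustrative.
main = ["PHYABUSE", "SEXABUSE", "NEGLECT", "AAPARENT"]
others = ["DAPARENT", "AACHILD", "DACHILD", "CHILDIS", "CHBEHPRB", "PRTSDIED",
          "PRTSJAIL", "NOCOPE", "ABANDMNT", "RELINQSH", "HOUSING"]

def reduced_code(record):
    """Binary-code the four main reasons, plus one 'any other reason' flag."""
    flags = [record.get(name, 0) for name in main]
    flags.append(1 if any(record.get(name, 0) == 1 for name in others) else 0)
    # Multipliers are 1, 2, 4, 8 for the main reasons and 16 for "other".
    return sum(flag << i for i, flag in enumerate(flags))

# Hypothetical case: physical abuse, neglect, and housing all marked.
example = {"PHYABUSE": 1, "NEGLECT": 1, "HOUSING": 1}
print(reduced_code(example))
```

This keeps the new variable to at most 32 categories, in line with the 4 or 5 items suggested earlier in the thread.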

- Try to imagine a "nice writeup" of your sort of data. What
would it report on?

>
>Here is the syntax I used for the first way:
>
>COMPUTE Causes= 1*PHYABUSE + 2*SEXABUSE + 3*NEGLECT + 4*AAPARENT + 5*DAPARENT + 6*AACHILD + 7*DACHILD + 8*CHILDIS + 9*CHBEHPRB + 10*PRTSDIED + 11*PRTSJAIL + 12*NOCOPE + 13*ABANDMNT +14*RELINQSH + 15*HOUSING
>
>And here is the syntax I used for the second way:
>
>COMPUTE Causes= 0.
>DO REPEAT reasons= PHYABUSE, SEXABUSE, NEGLECT, AAPARENT, DAPARENT, AACHILD, DACHILD, CHILDIS, CHBEHPRB, PRTSDIED,PRTSJAIL,NOCOPE, ABANDMNT, RELINQSH, HOUSING
> / code= 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1.
>IF reasons EQ 1 Causes= code.
>END REPEAT.
>

--
Rich Ulrich