Merging overlapping variables

MsHolm

Jan 23, 2015, 1:19:07 PM

Hi,

I'm currently working in an SPSS file where 2 different people have punched partly overlapping data for the same cases into 2 different variables, e.g. Var1: mac143 and Var2: mam143. I want to include the information from both vars, so:

How do I merge these 2 variables into 1 where they have the same value, and code the cases where they disagree as missing?

This is the case with nearly 200 variables, so efficient help is much appreciated!

M

Bruce Weaver

Jan 23, 2015, 5:01:02 PM

So you want to compute a NEW variable that is set to the value of var1
(or, equivalently, var2) when var1 and var2 are the same, but you want the
new variable left blank when var1 and var2 are not the same?

For just those two variables, does this do what you want?

STRING Combined12(A8).
IF var1 EQ var2 Combined12 = var1.

--
Bruce Weaver
bwe...@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/Home
"When all else fails, RTFM."

MsHolm

Jan 24, 2015, 12:12:44 PM

Thank you for answering, your help is much appreciated! This didn't work, although I might get it wrong. I used the following: mammac64 as the new variable, and mac64_abn_fetal_blood_velocity / mam64_abn_fetal_blood_velocity as Var1 and Var2.

COMPUTE mammac64=mac64_abn_fetal_blood_velocity.
IF mac64_abn_fetal_blood_velocity EQ mam64_abn_fetal_blood_velocity mammac64 = mac64_abn_fetal_blood_velocity.

The thing I struggle with is to fill in the blanks in my new variable where var 1 does not have a number wiht the values from var 2 .

I did this before by making dummies for when I have var 1 and var 2 values, but I hoped to do this more efficiently!

Mari

Rich Ulrich

Jan 24, 2015, 3:22:37 PM

On Sat, 24 Jan 2015 09:12:42 -0800 (PST), MsHolm
<mari....@gmail.com> wrote:

>Thank you for answering, your help is much appreciated! This didn't work, although I might get it wrong. I used the following: mammac64 as the new variable, and mac64_abn_fetal_blood_velocity / mam64_abn_fetal_blood_velocity as Var1 and Var2.
>
>COMPUTE mammac64=mac64_abn_fetal_blood_velocity.
>IF mac64_abn_fetal_blood_velocity EQ mam64_abn_fetal_blood_velocity mammac64 = mac64_abn_fetal_blood_velocity.
>
>The thing I struggle with is to fill in the blanks in my new variable where var 1 does not have a number wiht the values from var 2 .

"wiht the values from" ... seems to mean, with the same values as

>
>I did this before by making dummies for when I have var 1 and var 2 values, but I hoped to do this more efficiently!

I don't follow what you mean by "making dummies" for this problem.

You said that you wanted the "different" instance to be coded
as missing, but your lines above initialize the new value to the
mac value.

Bruce's example leaves the presumed-string variable blank for those
cases (a numeric variable would stay $sysmis).

If you want to initialize to a specific missing code, you can
let the first COMPUTE set the value to a string or -99 or
whatever.
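
A minimal sketch of that idea for numeric variables, with var1, var2 and newvar as placeholder names and -99 as an arbitrary missing code. Note that the comparison is not true when either rating is missing, so cases rated by only one reader also end up as -99 here:

```spss
* Sketch only, var1 var2 and newvar are placeholder names for numeric variables.
COMPUTE newvar = -99.
IF (var1 EQ var2) newvar = var1.
* Declare -99 as user-missing so procedures exclude it.
MISSING VALUES newvar (-99).
EXECUTE.
```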

--
Rich Ulrich

MsHolm

Jan 24, 2015, 5:22:41 PM

Thanks again, and I'm sorry that I didn't manage to make myself clear. Thanks for your patience! I'll try again (hopefully better :-)

Two readers rate the same variable and punch it individually into SPSS, (mac64_abn_fetal_blood_velocity and mam64_abn_fetal_blood_velocity).
Reader 1 rates 150 subjects and reader 2 rates 150, the total N = 300.
100 cases have only reader 1 score, 100 have only reader 2, while 100 of the subjects have been rated twice. Of the cases rated twice, 50 have the same score from both reader 1 and 2, 50 have a divergent score.

So, I want to merge the information from both readers into one variable where the 50 with divergent scores are missing.

I understand that I can put in the 50 cases from either reader where they overlap and agree.

Thank you so much again!

Maria

Bruce Weaver

Jan 24, 2015, 8:14:36 PM

I think it would help *immensely* if you generated a small sample
dataset (e.g., one tenth the size of your actual dataset) that shows
exactly what the data look like now, and what you want it to look like
when you're finished. This will be far more efficient than trying to
describe it.
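
For example, a toy dataset along these lines could be posted as syntax. The variable names and values below are invented for illustration, and want is a hypothetical column showing the desired result:

```spss
* Hypothetical toy data, one case per line, where a lone period is system-missing.
DATA LIST LIST / id mac_x mam_x want.
BEGIN DATA
1 1 . 1
2 . 2 2
3 1 1 1
4 1 2 .
END DATA.
LIST.
```

Each row covers one situation: only reader 1, only reader 2, both agree, both disagree.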

HTH.

Rich Ulrich

Jan 25, 2015, 1:59:48 AM

On Sat, 24 Jan 2015 14:22:39 -0800 (PST), MsHolm
<mari....@gmail.com> wrote:

>Thanks again, and I'm sorry that I didn't manage to make myself clear. Thanks for your patience! I'll try again (hopefully better :-)
>
>Two readers rate the same variable and punch it individually into SPSS, (mac64_abn_fetal_blood_velocity and mam64_abn_fetal_blood_velocity).
>Reader 1 rates 150 subjects and reader 2 rates 150, the total N = 300.
>100 cases have only reader 1 score, 100 have only reader 2, while 100 of the subjects have been rated twice. Of the cases rated twice, 50 have the same score from both reader 1 and 2, 50 have a divergent score.

Here is what is confusing: You SEEM to have two raters
whose scores are called mac_ and mam_ respectively.

That would distinguish them, if they are in the same file and
each in the same record. That is what Bruce and I have assumed,
and our solution works. If they are in separate files, you can match
the two files on ID in order to put them in the same record.
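
A rough sketch of that matching step, assuming two hypothetical files reader1.sav and reader2.sav that share a key variable named id (all three names are assumptions):

```spss
* Sketch only, the file names and the key variable id are hypothetical.
* Both files must be sorted by id before they are matched.
GET FILE = 'reader1.sav'.
SORT CASES BY id.
SAVE OUTFILE = 'reader1_sorted.sav'.
GET FILE = 'reader2.sav'.
SORT CASES BY id.
MATCH FILES
  /FILE = 'reader1_sorted.sav'
  /FILE = *
  /BY id.
EXECUTE.
```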

Now, knowing that the interesting cases that need testing are
the ones with BOTH values instead of just one, a slightly different
test would work.

COMPUTE comb = MIN.1(mac_, mam_).
IF (comb NE MAX.1(mac_, mam_)) comb = -99.
* Set to -99 when there are two non-matched values, then declare -99 as missing.
MISSING VALUES comb (-99).
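
With nearly 200 pairs, the same test could be wrapped in DO REPEAT. This is only a sketch: the three pairs of variable names and the comb names are hypothetical, and the lists must name the pairs in matching order:

```spss
* Sketch only, all variable names below are hypothetical.
DO REPEAT r1 = mac64_x mac65_x mac66_x
        / r2 = mam64_x mam65_x mam66_x
        / out = comb64 comb65 comb66.
COMPUTE out = MIN(r1, r2).
IF (out NE MAX(r1, r2)) out = -99.
END REPEAT.
MISSING VALUES comb64 comb65 comb66 (-99).
EXECUTE.
```

If each reader's variables sit next to each other in the file, the lists can probably be shortened with the TO keyword (for example mac64_x TO mac66_x), provided both lists then expand in the same order.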

You have increased the confusion with the counts that you
have given, because they don't add up.
100 for reader 1 only
100 for reader 2 only
200 ratings for the 100 rated twice: 100 by reader 1 and 100 by reader 2

If that is true, then each rater did 200, not 150.

>
>So, I want to merge the information from both readers into one variable where the 50 with divergent scores are missing.
>
>I understand that I can put in the 50 cases from either reader where they overlap and agree.
>

I'm sure that the analyses that you run should include a crosstab
or scattergram of the 100 subjects, to look at the apparent,
inherent error of all the measures (assuming the dual ratings
were not done because the cases were more difficult).
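
A minimal sketch of such a check for one pair, with mac_x and mam_x as placeholder names. CROSSTABS drops cases that are missing on either variable by default, so only the double-rated subjects enter the table:

```spss
* Sketch only, mac_x and mam_x stand in for one pair of rating variables.
CROSSTABS
  /TABLES = mac_x BY mam_x
  /STATISTICS = KAPPA
  /CELLS = COUNT.
```

Kappa is only reported when both variables take the same set of values, so the plain table of counts may be the more dependable output.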

--
Rich Ulrich

Bruce Weaver

Jan 25, 2015, 4:34:32 PM

On 25/01/2015 1:59 AM, Rich Ulrich wrote:
> On Sat, 24 Jan 2015 14:22:39 -0800 (PST), MsHolm
> <mari....@gmail.com> wrote:
>
>> Thanks again, and I'm sorry that I didn't manage to make myself clear. Thanks for your patience! I'll try again (hopefully better :-)
>>
>> Two readers rate the same variable and punch it individually into SPSS, (mac64_abn_fetal_blood_velocity and mam64_abn_fetal_blood_velocity).
>> Reader 1 rates 150 subjects and reader 2 rates 150, the total N = 300.
>> 100 cases have only reader 1 score, 100 have only reader 2, while 100 of the subjects have been rated twice. Of the cases rated twice, 50 have the same score from both reader 1 and 2, 50 have a divergent score.
>
> Here is what is confusing: You SEEM to have two raters
> whose scores are called mac_ and mam_ respectively.
>
> That would distinguish them, if they are in the same file and
> each in the same record. That is what Bruce and I have assumed,
> and our solution works. If they are in separate files, you can match
> the two files on ID in order to put them in the same record.

--- snip ---

Here is one other thing that confused me way back in the original post.
MsHolm (Mari?) wrote:

"Var1: mac143 and Var2: mam143"

I understood the mac143 and mam143 to be VALUES of variables Var1 and
Var2. That is why my solution used a new string variable. But I think
Rich understood (probably correctly) that mac143 and mam143 were two
variable names, not VALUES of Var1 and Var2. Hence his suggestions that
have treated them as numeric variables.

Again, all of this would have been much clearer if a full example of the
data BEFORE and AFTER had been posted. ;-)