Incremental Contribution to Regression of Correlated Dichotomous Variables

Haris

Nov 13, 2009, 2:13:44 PM

I have a dataset where Gold Standard Diagnosis (Y/N) is known for each
patient in the set. Then there also are three tests (T1-T3, also Y/N)
that are trying to match the Gold Standard. This is a case of four
correlated, non-independent dichotomous variables. When taken two at a
time, I can use the McNemar test to assess the relationship and compute
kappa reliability, but I need to consider them all simultaneously.

1. How do I test if T2 contributes anything to the prediction of Gold
Standard when CONTROLLING for what is already explained by T1?

2. Can I compute a kappa inter-test reliability coefficient with SPSS
when there are more than two correlated variables involved?

I am stuck for now. Need help.

Rich Ulrich

Nov 13, 2009, 4:09:47 PM

On Fri, 13 Nov 2009 11:13:44 -0800 (PST), Haris <karov...@gmail.com>
wrote:

>I have a dataset where Gold Standard Diagnosis (Y/N) is known for each
>patient in the set. Then there also are three tests (T1-T3, also Y/N)
>that are trying to match the Gold Standard. This is a case of four
>correlated non-independent dichotomous variables. When taken two at a
>time, I can use McNemar test to assess the relationship and compute
>KAPPA reliability but I need to consider them all simultaneously.
>
>1. How do I test if T2 contributes anything to the prediction of Gold
>Standard when CONTROLLING for what is already explained by T1?

Compute a regression with Gold as the DV and T1, T2 as
explanatory variables. The test on each partial regression
coefficient is the test of how much that variable adds to the other.
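
A minimal sketch of that suggestion, using only the Python standard library. Several things here are assumptions, not from the thread: the cell counts are made up, the regression is taken to be logistic, and the comparison is a likelihood-ratio test of the model with and without T2 (a swapped-in alternative that asks the same question as the test on T2's partial coefficient). The helper names solve and fit_logistic are hypothetical.

```python
import math

# Hypothetical counts per (gold, t1, t2) cell; NOT data from the thread.
CELLS = {
    (1, 1, 1): 30, (1, 1, 0): 8, (1, 0, 1): 10, (1, 0, 0): 5,
    (0, 1, 1): 3, (0, 1, 0): 7, (0, 0, 1): 6, (0, 0, 0): 31,
}

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_logistic(rows):
    """Newton-Raphson fit. rows: (x_vector, y, weight). Returns (beta, loglik)."""
    p = len(rows[0][0])
    beta = [0.0] * p
    for _ in range(100):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for x, y, w in rows:
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            for j in range(p):
                grad[j] += w * (y - mu) * x[j]
                for k in range(p):
                    hess[j][k] += w * mu * (1.0 - mu) * x[j] * x[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    loglik = 0.0
    for x, y, w in rows:
        mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
        loglik += w * (y * math.log(mu) + (1 - y) * math.log(1.0 - mu))
    return beta, loglik

# Full model: intercept + T1 + T2. Reduced model: intercept + T1 only.
full = [([1.0, t1, t2], g, n) for (g, t1, t2), n in CELLS.items()]
reduced = [([1.0, t1], g, n) for (g, t1, t2), n in CELLS.items()]

beta_full, ll_full = fit_logistic(full)
beta_red, ll_red = fit_logistic(reduced)

# Likelihood-ratio statistic on 1 df; for 1 df the chi-square upper tail
# equals erfc(sqrt(x / 2)).
lr_stat = 2.0 * (ll_full - ll_red)
p_value = math.erfc(math.sqrt(lr_stat / 2.0))

print("coefficients (intercept, T1, T2):", beta_full)
print("LR statistic for adding T2:", lr_stat, "p =", p_value)
```

A small p-value would suggest that T2 carries information about Gold beyond what T1 already explains. Swapping the roles of T1 and T2 in the reduced model gives the corresponding test for T1.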

>
>2. Can I compute a kappa inter-test reliability coefficient with SPSS
>when there are more than two correlated variables involved?
>

The only kappa that is regularly understood is the 2x2
comparison. In my opinion, no one should bother to use
any of the others -- except, say, for describing the average
of several 2x2's. Jacob Cohen (deviser of kappa) once
mentioned that the more complicated versions are very well
approximated by computing the analogous intraclass correlation
coefficient (ICC).
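
For what "the average of several 2x2's" could look like in practice, here is a rough sketch in Python rather than SPSS. The ratings are invented for illustration, and cohen_kappa is a hypothetical helper that implements the usual two-rater formula, kappa = (observed agreement - chance agreement) / (1 - chance agreement). It does not compute the ICC.

```python
from itertools import combinations

def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length lists of 0/1 results."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    # Chance agreement: both say 1, or both say 0, if they were independent.
    expected = pa * pb + (1 - pa) * (1 - pb)
    return (observed - expected) / (1 - expected)

# Hypothetical results for ten patients; NOT data from the thread.
tests = {
    "T1": [1, 1, 1, 0, 0, 0, 1, 0, 1, 0],
    "T2": [1, 1, 0, 0, 0, 1, 1, 0, 1, 0],
    "T3": [1, 0, 1, 0, 0, 0, 1, 1, 1, 0],
}

# One kappa per pair of tests, then the plain average as a summary.
pairwise = {
    (u, v): cohen_kappa(tests[u], tests[v])
    for u, v in combinations(tests, 2)
}
mean_kappa = sum(pairwise.values()) / len(pairwise)

for pair, k in pairwise.items():
    print(pair, round(k, 3))
print("mean pairwise kappa:", round(mean_kappa, 3))
```

The helper divides by zero when chance agreement is exactly 1 (for example, both tests say Yes for every patient), so real data would need a guard for that case.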

--
Rich Ulrich

Haris

Nov 16, 2009, 3:42:15 PM

> >I have a dataset where Gold Standard Diagnosis (Y/N) is known for each
> >patient in the set. Then there also are three tests (T1-T3, also Y/N)
> >that are trying to match the Gold Standard. This is a case of four
> >correlated non-independent dichotomous variables. When taken two at a
> >time, I can use McNemar test to assess the relationship and compute
> >KAPPA reliability but I need to consider them all simultaneously.
>
> >1. How do I test if T2 contributes anything to the prediction of Gold
> >Standard when CONTROLLING for what is already explained by T1?
>
> Compute a regression with Gold as the DV and T1, T2 as
> explanatory variables. The test on each of partial regression
> coefficients is the test on how much they add to the other.

How do you account for the fact that Gold Standard, T1, and T2 are not
INDEPENDENT? Conventional logistic regression assumes independent
sampling but the results of T1 and T2 are NOT independent of Gold
Standard. For two variables one is to use McNemar rather than the
chi-square test. T-tests come in two flavors: independent and paired.
What regression should I use for paired or repeated measures
dichotomous data? I don't think that regular LOGISTIC is appropriate.

Bruce Weaver

Nov 16, 2009, 5:45:42 PM

McNemar's chi-square is used when you measure the same variable on two
occasions (e.g., pre and post), or in matched pairs of individuals.
But that's not what you describe. If I follow, you have 4 different
variables (T1-T3 and Gold), not the same variable measured 4 times, or
in matched groups of 4.

--
Bruce Weaver
bwe...@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/Home
"When all else fails, RTFM."

Rich Ulrich

Nov 16, 2009, 8:38:45 PM

Here are some comments that might be read after reading Bruce's.
Additional detail.

On Mon, 16 Nov 2009 12:42:15 -0800 (PST), Haris <karov...@gmail.com>
wrote:

>> >I have a dataset where Gold Standard Diagnosis (Y/N) is known for each
>> >patient in the set. Then there also are three tests (T1-T3, also Y/N)
>> >that are trying to match the Gold Standard. This is a case of four
>> >correlated non-independent dichotomous variables. When taken two at a
>> >time, I can use McNemar test to assess the relationship and compute
>> >KAPPA reliability but I need to consider them all simultaneously.
>> >
>> >1. How do I test if T2 contributes anything to the prediction of Gold
>> >Standard when CONTROLLING for what is already explained by T1?

[me]>>

>> Compute a regression with Gold as the DV and T1, T2 as
>> explanatory variables. The test on each of partial regression
>> coefficients is the test on how much they add to the other.

Haris[I am breaking up the long paragraph]>

>How do you account for the fact that Gold Standard, T1, and T2 are not
>INDEPENDENT?

What are you thinking of? Every regression uses sets of
cases. Each predictor is associated with the same outcome.
If they are not also associated (correlated) with each other,
you don't need to use a simultaneous procedure like regression.

> Conventional logistic regression assumes independent
>sampling but the results of T1 and T2 are NOT independent of Gold
>Standard.

No... or else I don't understand what you are saying.

> For two variables one is to use McNemar rather than
>Chi-Square test.

Well, Yes. I can see a way to apply a version of the McNemar
test to the data - I would probably retreat to describing it as
a sign test, since the typical situation for "McNemar's" is something
different. I have sometimes liked this, when the data are thin,
because it provides the detailed description of what change has
been introduced by (say) the introduction of a slightly revised
diagnostic system.

The idea here would be to look at the complete 2x2x2 table for
T1, T2 and Gold. There are 4 cells where T1 and T2 are the
same, both No (diagnosis) or both Yes. These counts represent
cases where neither T1 nor T2 adds anything to the information of
the other. What matters are the other cells, where T1 (or, read
"T2") is right (matching Gold) and T2 (or "T1") is wrong.
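
As an illustration of picking out those cells, the sketch below tabulates a 2x2x2 table and counts the discordant cases. The cell counts are invented, and discordant_counts is a hypothetical helper; its optional gold_value argument restricts the count to Gold = Yes cases (sensitivity) or Gold = No cases (specificity).

```python
from collections import Counter

# Hypothetical counts per (gold, t1, t2) cell; NOT data from the thread.
CELLS = {
    (1, 1, 1): 30, (1, 1, 0): 8, (1, 0, 1): 10, (1, 0, 0): 5,
    (0, 1, 1): 3, (0, 1, 0): 7, (0, 0, 1): 6, (0, 0, 0): 31,
}

# Expand to one record per patient, as the raw data would arrive.
records = [cell for cell, n in CELLS.items() for _ in range(n)]

# The complete 2x2x2 table, keyed by (gold, t1, t2).
table = Counter(records)

def discordant_counts(records, gold_value=None):
    """Return (T1 right and T2 wrong, T2 right and T1 wrong).

    Cases where T1 and T2 agree are skipped: they cannot favor either test.
    """
    t1_only = t2_only = 0
    for gold, t1, t2 in records:
        if gold_value is not None and gold != gold_value:
            continue
        if t1 == t2:
            continue
        if t1 == gold:
            t1_only += 1
        else:
            t2_only += 1
    return t1_only, t2_only

print("all cases:       ", discordant_counts(records))
print("Gold = Yes only: ", discordant_counts(records, gold_value=1))
print("Gold = No only:  ", discordant_counts(records, gold_value=0))
```

Because the tests are dichotomous, whenever T1 and T2 disagree exactly one of them matches Gold, so the two counts cover every discordant case.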

If you can state the null hypothesis as the simple notion that
these two numbers might be expected to split 50-50, then
you can look at the sign test. That might be applied to
Specificity; or Sensitivity; or Total errors vs. improvement.

The McNemar version takes the square of the difference between
the two counts and divides by their sum.
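
Both versions can be sketched from the two discordant counts alone. In the code below, b and c stand for those counts (the names and the example values are assumptions for illustration), the sign test is computed as an exact two-sided binomial test against a 50-50 split, and the McNemar statistic is (b - c)^2 / (b + c) without a continuity correction, referred to chi-square on 1 df.

```python
import math

def sign_test_p(b, c):
    """Exact two-sided sign test: are b and c consistent with a 50-50 split?"""
    n = b + c
    k = min(b, c)
    # Probability of a split at least as lopsided as the one observed,
    # in one direction, under Binomial(n, 0.5).
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

def mcnemar_chi2(b, c):
    """McNemar statistic (no continuity correction) and its 1-df p-value."""
    stat = (b - c) ** 2 / (b + c)
    # For 1 df, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p

# Hypothetical discordant counts: T1 right / T2 wrong, and the reverse.
b, c = 14, 17
print("sign test p:", sign_test_p(b, c))
print("McNemar chi-square, p:", mcnemar_chi2(b, c))
```

With thin data the exact sign test is the safer of the two, since the chi-square approximation depends on b + c being reasonably large.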

> T-tests come in two flavors: independent and paired.
>What regression should I use for paired or repeated measures
>dichotomous data? I don't think that regular LOGISTIC is appropriate.

- don't see any relevance.

Hope this helps.

--
Rich Ulrich