
Multiple Kappa in SPSS?


sethxyz

Jul 28, 2008, 6:04:05 PM
I have a dataset with binomial (yes/no) face validity ratings of 54
self-report scale items by 19 "expert" raters (i.e., each rating is a
"1" when an expert thinks the item measures the scale's construct and
"0" when an expert believes the item does not measure the construct).
I am trying to figure out the best way to quantify the level of
agreement between my 19 raters. This leads me to two basic questions.
First, after reading up, it seems that a Cohen's Kappa for multiple
raters would be the most appropriate means for doing this (as opposed
to an intraclass correlation, mean interrater correlation, etc.). I'm
curious if folks agree/disagree with that. Second, the big question,
is there a way to calculate a Multiple Kappa in SPSS? It seems easy to
get a kappa for two raters in Crosstabs, but I have 19 raters. It also
appears that SAS lets you calculate a multiple kappa, but I have
neither experience with nor access to SAS. Any help would be GREATLY
appreciated. Thanks, Seth

sethxyz

Jul 28, 2008, 6:12:29 PM
PS--I of course meant to say that I have a dataset of binary, not
binomial ratings. Sorry for the typo and multiple postings. Seth

Kylie

Jul 28, 2008, 8:27:03 PM
Hi Seth,

There is an SPSS macro for multiple rater Kappa available from the SPSS
technical support website (http://support.spss.com), in the Macro
Library. It computes the multi-rater Kappa as discussed in S. Siegel
and J. N. Castellan's Nonparametric Statistics for the Behavioral
Sciences.
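
(If you want a quick cross-check outside SPSS, the statistic is only a few
lines in, say, Python. The sketch below uses the Fleiss (1971) form of the
multi-rater kappa, which as far as I know is what the Siegel & Castellan K
amounts to when every item is rated by the same number of raters; the array
name, shape, and example data are purely illustrative.)

# Rough cross-check of the multi-rater kappa outside SPSS (Python/numpy).
# Assumes `ratings` is a 54 x 19 items-by-raters array of 0/1 values.
import numpy as np

def multirater_kappa(ratings, categories=(0, 1)):
    """Fleiss-style kappa: each item rated by the same number of raters."""
    ratings = np.asarray(ratings)
    n_items, n_raters = ratings.shape
    # counts[i, j] = number of raters assigning item i to category j
    counts = np.stack([(ratings == c).sum(axis=1) for c in categories], axis=1)
    # Observed agreement per item, averaged over items
    p_i = (np.square(counts).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    # Chance agreement from the overall category proportions
    p_j = counts.sum(axis=0) / (n_items * n_raters)
    p_e = np.square(p_j).sum()
    return (p_bar - p_e) / (1 - p_e)

# Example with made-up data standing in for the real 54 x 19 matrix:
rng = np.random.default_rng(0)
ratings = rng.integers(0, 2, size=(54, 19))
print(multirater_kappa(ratings))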

Hope this helps.

Cheers,
Kylie.

Bruce Weaver

Jul 28, 2008, 9:57:26 PM

Here is something I posted a while ago in MedStats. (It is a
Google group, not a Usenet group.) Note especially the paragraph
before the footnote.

--- From MedStats post (21-Apr-2008) ---

This is from Norman & Streiner's "Biostatistics: The Bare
Essentials" (2000, 2nd Ed., p. 221):

"One reason why Cicchetti was fighting a losing battle [wrt
weights for weighted kappa] is that the weighted kappa using
quadratic weights has a very general property--it is
mathematically (i.e., exactly) equal to the ICC. We must be
pulling your leg, right? Nope."

They then proceed to give an example using a one-way repeated
measures problem from earlier in the book (Chapter 11, data in
Table 11-1).

Then...

"It also follows that if we were to analyze a 2x2 table with
ANOVA, using numbers equal to 0 and 1, unweighted kappa would
equal this ICC when calculated like we did above (Cohen, 1968)."

And a bit further on...

"Who cares [about the equivalence of weighed kappa and the ICC]?
Well, this eases interpretation. Kappa can be looked on as just
another correlation, explaining some percent of the variance. And
there is another real advantage. If we have multiple observers,
we can do an intraclass correlation and report it as an average
kappa(8) instead of doing a bunch of kappas for Observer 1 vs.
Observer 2, Observer 1 vs. 3, etc."

Footnote 8 - "This also accommodates apparently religious
differences among journals. Some journals like ICCs, some like
kappas. We have on occasion calculated an ICC and called it a
kappa, and vice versa, just to keep the editor happy."

--- End of MedStats post ---
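
(And the "do an ICC across all the observers and report it as an average
kappa" route is easy to try outside SPSS as well. A minimal sketch on the
54 x 19 items-by-raters 0/1 matrix follows; using the single-measure,
two-way random-effects form, Shrout & Fleiss ICC(2,1), is my own choice
here, not something Norman & Streiner prescribe.)

# ANOVA-based ICC from the items-by-raters 0/1 matrix (Python/numpy).
import numpy as np

def icc_2_1(x):
    """Shrout & Fleiss ICC(2,1): two-way random effects, single measure."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape                      # n items (targets), k raters
    grand = x.mean()
    row_means = x.mean(axis=1)          # per-item means
    col_means = x.mean(axis=0)          # per-rater means
    ss_items = k * np.square(row_means - grand).sum()
    ss_raters = n * np.square(col_means - grand).sum()
    ss_total = np.square(x - grand).sum()
    ss_error = ss_total - ss_items - ss_raters
    ms_items = ss_items / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_items - ms_error) / (
        ms_items + (k - 1) * ms_error + k * (ms_raters - ms_error) / n)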

HTH.

--
Bruce Weaver
bwe...@lakeheadu.ca
www.angelfire.com/wv/bwhomedir
"When all else fails, RTFM."

RichUlrich

Jul 29, 2008, 12:13:55 AM
On Mon, 28 Jul 2008 15:04:05 -0700 (PDT), sethxyz <set...@gmail.com>
wrote:

The raters are rating "Whether this item measures the construct"?
And then, it seems, you must be computing a statistic across the
54 items. Is this right?

If that is the case -
This does not seem like an application for kappa, or for
ICC. You can compute an ICC-like statistic from it, but the
basic, assumed difference between items should keep you
from using either kappa or ICC to describe it.

It seems to me that the important information about each item
is how many of 19 experts consider it to measure the construct.
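
(That tally is trivial to get from an items-by-raters matrix; a short
illustration, with a made-up 54 x 19 array standing in for the real data:)

import numpy as np
# `ratings`: hypothetical 54 x 19 items-by-raters array of 0/1 judgments
ratings = np.random.default_rng(0).integers(0, 2, size=(54, 19))  # stand-in data
yes_votes = ratings.sum(axis=1)        # number of the 19 experts endorsing each item
print(yes_votes / ratings.shape[1])    # per-item proportion of endorsement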


--
Rich Ulrich

Ryan

Jul 29, 2008, 8:35:19 AM
On Jul 28, 6:04 pm, sethxyz <seth...@gmail.com> wrote:
> I have a dataset with binomial (yes/no) face validity ratings of 54
> self-report scale items by 19 "expert" raters (i.e., each rating is a
> "1" when an expert thinks the item measures the scale's construct and
> "0" when an expert believes the item does not measure the construct).

I would not consider this to be face validity. This is content
validity.

> I am trying to figure out the best way to quantify the level of
> agreement between my 19 raters.

One option is to use the content validity ratio.
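
(Lawshe's content validity ratio is simple to compute per item:
CVR = (n_yes - N/2) / (N/2), with N = 19 raters here. The sketch below
assumes the ratings sit in a 54 x 19 items-by-raters 0/1 array and treats
a "1" as Lawshe's "essential" judgment; that mapping is my assumption.)

import numpy as np

def content_validity_ratio(ratings):
    """Per-item Lawshe-style CVR: (n_yes - N/2) / (N/2), from -1 to +1."""
    ratings = np.asarray(ratings)
    n_raters = ratings.shape[1]
    n_yes = ratings.sum(axis=1)
    return (n_yes - n_raters / 2) / (n_raters / 2)

# A CVR near +1 means nearly all 19 experts endorse the item; values at or
# below 0 mean half or fewer do.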

> This leads me to two basic questions.
> First, after reading up, it seems that a Cohen's Kappa for multiple
> raters would be the most appropriate means for doing this (as opposed
> to an intraclass correlation, mean interrater correlation, etc.). I'm
> curious if folks agree/disagree with that. Second, the big question,
> is there a way to calculate a Multiple Kappa in SPSS?

Yes. Others have pointed out how to do this.

> It seems easy to
> get a kappa for two raters in Crosstabs, but I have 19 raters. It also
> appears that SAS lets you calculate a multiple kappa, but I have
> neither experience with nor access to SAS. Any help would be GREATLY
> appreciated. Thanks, Seth

Ryan
