So I need to calculate Cohen's Kappa for two raters in 61 cases. There
are 6 categories that constitute the total score, and each category
received either a 0, 1, 2 or 3. First, I'm wondering if I can
calculate Cohen's Kappa overall for the total score (a sum of the 6
categories) and for each category. Secondly, assuming this can be
done, how do I structure the dataset? I know each rater should be in
columns, but where do the categories go for each score? The cases (61)
would go in rows, but would the categories go in columns or rows?
Would I have to have multiple datasets? Thanks,
Julie
You want something other than Cohen's kappa.
Kappa is pretty good for describing the agreement
between two dichotomous judgments, like Yes/No on
diagnosis. That is the only version of kappa that is
widely and wisely used. Kappa also exists for multiple
unordered categories - but it is not a good choice when
the categories are ordered. With more than two categories,
the cross-tabulation also has too many cells to estimate
with any precision when you have only 61 cases.
It *sounds* like you have a "scale" which is made up
of 6 "items", each of which is scored from 0 to 3.
For that, you might want to assess the scale itself for
internal reliability, using the Reliability procedure to get
Cronbach's alpha.
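
For anyone working outside SPSS, here is a minimal sketch of the usual Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). The function name `cronbach_alpha` and the scores are made up for illustration; they are not Julie's data, and this is not the Reliability procedure itself.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of cases, each a list of k item scores.

    Hypothetical helper for illustration. Uses sample variances (n - 1).
    """
    k = len(rows[0])
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Made-up scores from ONE rater: 5 cases (rows) x 6 items (columns), each 0-3.
scores = [
    [0, 1, 0, 1, 0, 0],
    [1, 1, 2, 1, 1, 2],
    [2, 2, 2, 3, 2, 2],
    [3, 3, 2, 3, 3, 3],
    [1, 0, 1, 1, 2, 1],
]
print(round(cronbach_alpha(scores), 3))
```

Alpha here describes how well the 6 items hang together within one rater's scores; it says nothing yet about agreement between the two raters.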
To compare the two raters on both average level and agreement,
you can look at the paired t-test procedure: the t-test
is a test of the difference in means, and the Pearson r is
a good measure of agreement. For the simple situation
of two raters, that is all you need. For more complicated
designs, or for abstract consideration of "other raters", etc.,
it is occasionally desirable to report one sort or another of
intraclass correlation (ICC).
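
As a rough sketch of the layout and the two statistics, assuming one row per case with each rater's 6 item scores side by side, and the total taken as the sum of the items. The data, the variable names, and the helpers `paired_t` and `pearson_r` are invented for illustration. Only the t statistic and its degrees of freedom are computed; a p-value needs a t distribution (for example `scipy.stats.ttest_rel`).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for the differences x - y.

    Fails with a division by zero if every difference is identical.
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up data. One row per case: (rater 1 items 1-6, rater 2 items 1-6).
cases = [
    ([0, 1, 0, 1, 0, 0], [0, 1, 1, 1, 0, 0]),
    ([1, 1, 2, 1, 1, 2], [1, 2, 2, 1, 1, 2]),
    ([2, 2, 2, 3, 2, 2], [2, 2, 2, 3, 2, 2]),
    ([3, 3, 2, 3, 3, 3], [3, 3, 3, 3, 3, 3]),
    ([1, 0, 1, 1, 2, 1], [1, 0, 1, 1, 1, 1]),
]
total_r1 = [sum(r1) for r1, _ in cases]  # total score per case, rater 1
total_r2 = [sum(r2) for _, r2 in cases]  # total score per case, rater 2

t, df = paired_t(total_r1, total_r2)
print("t =", round(t, 3), "df =", df)
print("r =", round(pearson_r(total_r1, total_r2), 3))
```

The same two calls can be repeated on a single item (one column from each rater) instead of the totals, so one dataset covers both the overall and the per-category comparisons.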
--
Rich Ulrich
Rich, you've not mentioned that in the case of ordered categories, one
can compute a weighted kappa. And weighted kappa with quadratic
weights is equivalent to the most commonly used variety of ICC. One
reference for this is "Biostatistics, The Bare Essentials", by Norman
& Streiner. I don't have the page numbers handy, but it is near the
end of the chapter on repeated measures ANOVA. I believe those
authors say the same thing in their book on Health Measurement Scales.
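
For concreteness, a minimal sketch of weighted kappa with quadratic weights, computed as 1 minus the ratio of weighted observed disagreement to weighted chance-expected disagreement. The function name and the ratings are made up for illustration, and categories are assumed to be coded 0 to k-1. If scikit-learn is available, `sklearn.metrics.cohen_kappa_score` with `weights="quadratic"` should give the same quantity.

```python
def quadratic_weighted_kappa(r1, r2, n_categories):
    """Weighted kappa with quadratic disagreement weights (i - j)**2.

    Hypothetical helper; r1 and r2 hold integer codes 0..n_categories-1.
    Fails with a division by zero if both raters use one identical category.
    """
    k = n_categories
    n = len(r1)
    # Observed joint proportions: rows = rater 1, columns = rater 2.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        observed[a][b] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2  # any constant scaling of w cancels in the ratio
            num += w * observed[i][j]     # weighted observed disagreement
            den += w * row[i] * col[j]    # weighted chance disagreement
    return 1 - num / den


# Made-up scores on one 0-3 item from two raters.
rater1 = [0, 1, 2, 3, 1, 2, 3, 0]
rater2 = [0, 1, 2, 2, 1, 3, 3, 1]
print(round(quadratic_weighted_kappa(rater1, rater2, 4), 3))
```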
--
Bruce Weaver
bwe...@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/Home
"When all else fails, RTFM."
I would only mention "weighted kappa" to say, Ignore it. The
ICC is more familiar and more available. Unweighted kappa
with 3 or more categories is too problematic to worry about.
If you need some obscure generalizations, consider the variety of
"reliabilities" that the ICCs offer to estimate-- same two or more
raters jointly; other, untested raters, each taken separately; and
intermediate combinations.
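
To make the "variety" concrete, here is a minimal sketch of two single-rater ICCs from a two-way layout (cases by raters), written in the form usually labelled ICC(2,1), absolute agreement, and ICC(3,1), consistency, in Shrout and Fleiss's numbering. How these map onto the situations listed above is my own reading, and the function name and data are invented for illustration.

```python
def icc_two_way(ratings):
    """Return (ICC(2,1), ICC(3,1)) for a cases x raters table of scores.

    Hypothetical helper; ratings is a list of rows, one row per case.
    """
    n = len(ratings)       # number of cases
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-cases mean square
    msc = ss_cols / (k - 1)              # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    agreement = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    consistency = (msr - mse) / (msr + (k - 1) * mse)
    return agreement, consistency


# Made-up total scores: rater 2 is always exactly one point higher.
table = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
agreement, consistency = icc_two_way(table)
print(round(agreement, 3), round(consistency, 3))
```

In this toy table the raters rank the cases identically but differ by a constant, so the consistency form is at its maximum while the absolute-agreement form is pulled down by the rater offset; that offset is the same thing the paired t-test picks up.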
--
Rich Ulrich