
ICC analysis for inter-rater and intra-rater reliability


vbar...@pathwayscenter.org

May 22, 2006, 10:08:07 AM5/22/06
I am trying to run some ICCs to assess inter- and intra-rater
reliability in a clinical setting, but I have doubts about setting up
the SPSS file and choosing the right model. We had two raters
collecting data on 3 children, twice within a week. My plan was to
have one child per row, with four columns (rater1time1, rater1time2,
rater2time1, rater2time2), and run four ICC analyses: rater1time1
vs. rater1time2, and rater2time1 vs. rater2time2, giving the
intra-rater/test-retest reliability for each rater; then rater1time1
vs. rater2time1, and rater1time2 vs. rater2time2, giving the
inter-rater reliability. Does that make sense to you? Or should it
be reversed, with the raters on the rows and the subjects on the
columns (i.e., client1time1, client1time2, etc.), since my interest
is in the raters? Do you have any other suggestions? Since all
raters evaluated all children and I have repeated measures, should I
use the two-way random model from the ICC statistics section in
SPSS? My interest is in the reliability of our raters here; the
tests used have reported reliability, including some Rasch
analysis.
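The wide layout described above can be sketched in Python with made-up scores (all numbers here are hypothetical, just to make the column structure and the four planned comparisons concrete):

```python
# One row per child, one column per rater-by-occasion combination.
# Scores are invented for illustration only.
children = ["child1", "child2", "child3"]
data = {
    "rater1time1": [12.0, 15.0, 9.0],
    "rater1time2": [11.0, 16.0, 10.0],
    "rater2time1": [13.0, 14.0, 9.0],
    "rater2time2": [12.0, 15.0, 11.0],
}

# The four planned ICC comparisons, each a pair of columns:
intra_rater = [("rater1time1", "rater1time2"),   # rater 1, test-retest
               ("rater2time1", "rater2time2")]   # rater 2, test-retest
inter_rater = [("rater1time1", "rater2time1"),   # occasion 1, across raters
               ("rater1time2", "rater2time2")]   # occasion 2, across raters
```

Each pair of columns would be fed into one ICC run, so subjects stay on the rows either way; only the choice of column pair changes.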

Thanks a lot in advance. I would be so grateful if you could help me.

Richard Ulrich

May 22, 2006, 3:36:23 PM5/22/06
On 22 May 2006 07:08:07 -0700, "vbar...@pathwayscenter.org"
<vbar...@pathwayscenter.org> wrote:

Scale correlation, including ICCs, gives an assessment of a *scale*
as evaluated in a specific *sample* that shows some amount
of variation. Correlation is another measure of the ability to
discriminate: the ICC can be computed explicitly from the
ANOVA table, and a large between-classes F implies high
reliability in measuring (and discriminating) those classes.
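The link between the ANOVA table and the ICC can be sketched for the simplest, one-way random-effects case. With invented ratings (k = 2 raters, n = 3 subjects), the ICC falls straight out of the between- and within-subject mean squares, and equivalently out of the F ratio:

```python
# One-way random-effects ICC from ANOVA mean squares.
# Ratings are made up: each inner list is one subject's two ratings.
ratings = [
    [12.0, 13.0],   # subject 1
    [15.0, 14.0],   # subject 2
    [9.0, 10.0],    # subject 3
]
n = len(ratings)          # subjects
k = len(ratings[0])       # ratings per subject

grand = sum(sum(row) for row in ratings) / (n * k)
row_means = [sum(row) / k for row in ratings]

ss_between = k * sum((m - grand) ** 2 for m in row_means)
ss_within = sum((x - m) ** 2
                for row, m in zip(ratings, row_means) for x in row)

ms_between = ss_between / (n - 1)
ms_within = ss_within / (n * (k - 1))

icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# The same ICC follows directly from the between-subjects F ratio:
f = ms_between / ms_within
icc_from_f = (f - 1) / (f + k - 1)
```

This is why a large between-subjects F and a high reliability are two views of the same partition: when subjects differ much more than replicate ratings of the same subject, both F and the ICC are large.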

Notice that when there is little variation in whatever-is-
measured, the "reliability" will measure as poor.

N=3 is very small. Are these individuals supposed to be
reliably different? Or are these individuals supposed to differ
dramatically over time, so that the paired ratings are specific?


You can approach all these questions at once by setting up
the full ANOVA table: subjects, raters, occasions. There is
some "Total variance," a certain amount of which is
partitioned into each of the Sources, plus Error.
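That partition can be sketched for the 3 (subjects) x 2 (raters) x 2 (occasions) design with hypothetical scores; with no replication, the residual absorbs the interactions and error together:

```python
# Main-effect sums of squares for a 3x2x2 table of invented scores,
# keyed (subject, rater, occasion).
scores = {
    (0, 0, 0): 12.0, (0, 0, 1): 11.0, (0, 1, 0): 13.0, (0, 1, 1): 12.0,
    (1, 0, 0): 15.0, (1, 0, 1): 16.0, (1, 1, 0): 14.0, (1, 1, 1): 15.0,
    (2, 0, 0): 9.0,  (2, 0, 1): 10.0, (2, 1, 0): 9.0,  (2, 1, 1): 11.0,
}
n_subj, n_rater, n_occ = 3, 2, 2
grand = sum(scores.values()) / len(scores)

def ss_main(axis, levels):
    """Sum of squares for one main effect (the factor on key position `axis`)."""
    per_level = len(scores) // levels
    total = 0.0
    for lev in range(levels):
        mean = sum(v for key, v in scores.items() if key[axis] == lev) / per_level
        total += per_level * (mean - grand) ** 2
    return total

ss_total = sum((v - grand) ** 2 for v in scores.values())
ss_subjects = ss_main(0, n_subj)
ss_raters = ss_main(1, n_rater)
ss_occasions = ss_main(2, n_occ)
ss_residual = ss_total - ss_subjects - ss_raters - ss_occasions
```

With these made-up numbers the subject effect dominates and the rater effect is tiny, which is the pattern one hopes to see when the raters are reliable.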

That's all the information that you have. The SPSS documentation
points to a good article on partitioning error. You can estimate or
construct ICCs by various formulas, but none of them are going
to be very convincing for your total N=12, simply on the grounds
of non-representativeness.

The simplest ICC just wants to say that Subjects are seen as different
by Raters: a big Subject effect in the ANOVA. You could list out your
12 scores as if they were different subjects, to get paired ratings.
That would be a simple start, but I think I would want to know
the whole 3x2x2 ANOVA table, too.
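The "list out your 12 scores" step can be sketched like this (again with invented scores): each (child, occasion) combination is treated as its own "subject," which turns the 12 scores into six paired ratings across the two raters:

```python
# Hypothetical scores, indexed wide[child][rater][occasion].
wide = {
    "child1": {"rater1": [12.0, 11.0], "rater2": [13.0, 12.0]},
    "child2": {"rater1": [15.0, 16.0], "rater2": [14.0, 15.0]},
    "child3": {"rater1": [9.0, 10.0],  "rater2": [9.0, 11.0]},
}

# Six (rater1, rater2) pairs: one per child-by-occasion cell.
pairs = [(wide[c]["rater1"][t], wide[c]["rater2"][t])
         for c in sorted(wide) for t in (0, 1)]
```

Those six rows of paired ratings are then ready for a simple one-way ICC, with the caveat about the very small N noted above.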


--
Rich Ulrich, wpi...@pitt.edu
http://www.pitt.edu/~wpilib/index.html
