Scale correlation, including ICCs, gives an assessment of a *scale*
when evaluated in a specific *sample* that shows some amount
of variation. Correlation is another measure of the ability to
discriminate: an ICC can be computed explicitly from the
ANOVA table, and a large between-classes F implies high
reliability in measuring (and discriminating) those classes.
Notice that when there is little variation in whatever-is-
measured, the "reliability" will measure as poor.
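To make the ANOVA-table route concrete, here is a minimal sketch of
ICC(1) computed from a one-way table. The numbers are made up (k=4
raters each scoring n=3 subjects once), purely for illustration:

```python
# ICC(1) from the one-way ANOVA table: (MSB - MSW) / (MSB + (k-1)*MSW)
import numpy as np

ratings = np.array([            # rows = subjects, cols = raters (made-up data)
    [9.0, 8.0, 9.0, 10.0],
    [5.0, 6.0, 4.0,  5.0],
    [2.0, 1.0, 2.0,  3.0],
])
n, k = ratings.shape
grand = ratings.mean()
row_means = ratings.mean(axis=1)

ss_between = k * np.sum((row_means - grand) ** 2)
ss_within = np.sum((ratings - row_means[:, None]) ** 2)
ms_between = ss_between / (n - 1)
ms_within = ss_within / (n * (k - 1))

F = ms_between / ms_within      # big between-classes F <-> high reliability
icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
print(F, icc1)
```

Algebraically, ICC(1) = (F - 1)/(F + k - 1), which is why a big F and a
high ICC go together.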
N=3 is very small. Are these individuals supposed to be
reliably different? Or are these individuals supposed to differ
dramatically over time, so that the paired ratings are occasion-specific?
You can approach all these questions at once, by setting up
the full ANOVA table: subjects, raters, occasions. There is
some "Total variance," a certain amount of which is
partitioned into each of the Sources, plus Error.
That's all the information that you have. The SPSS documentation
points to a good article on partitioning error. You can estimate or
construct ICCs by various formulas, but none of them are going
to be very convincing for your total N=12, simply on the grounds
of non-representativeness.
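Here is a sketch of that partition for the 3x2x2 layout (subjects x
raters x occasions, one score per cell). The scores are made up, and
for brevity the two-way and three-way interactions are lumped into a
single residual line:

```python
# Partition of Total SS into the three main-effect Sources plus a residual
import numpy as np

y = np.array([                  # shape (3 subjects, 2 raters, 2 occasions)
    [[9.0, 8.0], [10.0, 9.0]],
    [[5.0, 4.0], [ 6.0, 5.0]],
    [[2.0, 3.0], [ 2.0, 1.0]],
])
grand = y.mean()
ss_total = ((y - grand) ** 2).sum()

def main_effect_ss(axis):
    # SS for one factor: cases-per-level times squared level-mean deviations
    keep = tuple(a for a in range(3) if a != axis)
    level_means = y.mean(axis=keep)
    n_per_level = y.size // y.shape[axis]
    return n_per_level * ((level_means - grand) ** 2).sum()

ss_subj = main_effect_ss(0)
ss_rater = main_effect_ss(1)
ss_occ = main_effect_ss(2)
ss_resid = ss_total - ss_subj - ss_rater - ss_occ   # interactions + error
print(ss_subj, ss_rater, ss_occ, ss_resid)
```

With made-up scores like these, nearly all of the Total goes to
Subjects, which is the pattern a "reliable" scale would show.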
The simplest ICC just wants to say that Subjects are seen as different
by Raters: big Subject effect in ANOVA. You could list out your
12 scores as if they are different subjects, to get Paired-ratings.
That would be a simple start, but I think I would want to know
the whole 3x2x2 ANOVA table, too.
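As a sketch of that simple start, with made-up scores, and reading
"list out as different subjects" as pairing the two raters across
every subject-by-occasion case:

```python
# Pearson r of paired ratings, treating each subject-occasion as a case
import numpy as np

rater1 = np.array([9.0, 8.0, 5.0, 4.0, 2.0, 3.0])   # 3 subjects x 2 occasions
rater2 = np.array([10.0, 9.0, 6.0, 5.0, 2.0, 1.0])  # same cases, second rater

r = np.corrcoef(rater1, rater2)[0, 1]
print(r)
```

Note that Pearson r ignores any constant difference between the two
raters, which is one reason the full ANOVA table is still worth seeing.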
--
Rich Ulrich, wpi...@pitt.edu
http://www.pitt.edu/~wpilib/index.html