Evie (et al.):
The actual distribution of the measurements is irrelevant.
What counts is the distribution of the DIFFERENCES of the measurements.
If you compute the differences, you can produce a 95% prediction interval for the difference as mean +/- 2 SD, though 5 observations is not much to base an estimate of an SD on. These are the so-called limits of agreement, see:
author = "JM Bland and DG Altman",
title = "Statistical methods for assessing agreement between two
methods of clinical measurement",
journal = "Lancet",
year = "1986",
volume = "i",
pages = "307--310"
or
author = {JM Bland and DG Altman},
title = {Measuring agreement in method comparison studies.},
journal = {Statistical Methods in Medical Research},
year = {1999},
volume = {8},
pages = {136--160}
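A minimal sketch in Python of the limits-of-agreement calculation described above (mean difference +/- 2 SD of the differences). The function name and the numbers are made up for illustration; they are not the data under discussion.

```python
from statistics import mean, stdev


def limits_of_agreement(x, y, k=2.0):
    """Return (mean difference, lower limit, upper limit) for paired data.

    x and y are the paired measurements by the two observers;
    k=2 gives the approximate 95% prediction interval for a difference.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    d = [a - b for a, b in zip(x, y)]
    m = mean(d)
    s = stdev(d)  # sample SD of the differences (n - 1 denominator)
    return m, m - k * s, m + k * s


# Hypothetical measurements on 5 moulds by two observers.
obs_a = [10.2, 11.5, 9.8, 12.1, 10.9]
obs_b = [10.0, 11.9, 9.5, 12.4, 10.6]

m, lower, upper = limits_of_agreement(obs_a, obs_b)
print(f"mean difference: {m:.3f}")
print(f"limits of agreement: {lower:.3f} to {upper:.3f}")
```

With only 5 pairs the SD, and hence the limits, will be very uncertain; the sketch only shows the arithmetic.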
With 5 observations there is of course so little information that a test of the difference = 0 is non-significant, but this hypothesis is beside the point. You are interested in whether the two observers are sufficiently close, and that is what you use the prediction interval for the differences to assess. But with 5 observations this interval is very poorly determined.
You may of course also consider whether you should take the logarithm before you take the differences, in which case you will get the differences as the log of the ratio of the measurements, and you can back-transform to a prediction interval for the ratio of the measurements by the two observers.
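The log-scale variant can be sketched the same way: take differences of natural logs, form the interval, and exponentiate to get limits for the ratio. Again the function name and the numbers are purely illustrative, and the measurements are assumed to be strictly positive.

```python
import math
from statistics import mean, stdev


def ratio_limits_of_agreement(x, y, k=2.0):
    """Return (central ratio, lower limit, upper limit) for the ratio x/y.

    Differences are taken on the natural-log scale and the resulting
    interval is back-transformed with exp().
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    log_ratios = [math.log(a) - math.log(b) for a, b in zip(x, y)]
    m = mean(log_ratios)
    s = stdev(log_ratios)
    return math.exp(m), math.exp(m - k * s), math.exp(m + k * s)


# Hypothetical measurements on 5 moulds by two observers.
obs_a = [10.2, 11.5, 9.8, 12.1, 10.9]
obs_b = [10.0, 11.9, 9.5, 12.4, 10.6]

centre, lower, upper = ratio_limits_of_agreement(obs_a, obs_b)
print(f"central ratio A/B: {centre:.3f}")
print(f"ratio limits of agreement: {lower:.3f} to {upper:.3f}")
```

Note that the back-transformed interval is symmetric around the central ratio in the multiplicative sense, not the additive one.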
The ICC is meaningless in the context of comparing two observers, as is any other correlation measure, see e.g.:
author = {G Atkinson and A Neville},
title = {Comment on the use of concordance correlation to
assess the agreement between two variables.},
journal = {Biometrics},
year = {1997},
volume = {52},
pages = {775--778},
Your main problem is the small number of moulds measured. You can get a slightly better handle on the individual observers' precision by using the original replicate data;
the problem can be handled with a fairly simple variance components model that can be fitted in the usual statistical packages, see:
author = {B Carstensen and J Simpson and LC Gurrin},
title = {Statistical models for assessing agreement in method
comparison studies with replicate measurements},
journal = {International Journal of Biostatistics},
year = {2008},
volume = {4},
number = {1},
pages = {Article 16}
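As a rough illustration of what the replicates buy you, here is a sketch in Python that pools the within-mould variance of one observer's replicates into a single repeatability SD. This is NOT the variance components model from the paper above, only a simpler stand-in for the residual (within-mould) component; the function name and data are hypothetical.

```python
import math
from statistics import variance


def pooled_within_sd(replicates_by_mould):
    """Pooled within-mould SD for one observer.

    replicates_by_mould is a list with one list of replicate
    measurements per mould. Each mould's sample variance is weighted
    by its degrees of freedom (number of replicates minus 1).
    """
    ss = 0.0
    df = 0
    for reps in replicates_by_mould:
        if len(reps) < 2:
            continue  # a single measurement carries no within-mould information
        ss += variance(reps) * (len(reps) - 1)
        df += len(reps) - 1
    if df == 0:
        raise ValueError("need at least one mould with two or more replicates")
    return math.sqrt(ss / df)


# Hypothetical replicates by one observer on 3 moulds.
observer_a = [
    [10.1, 10.3, 10.2],
    [11.4, 11.6, 11.5],
    [9.7, 9.9, 9.8],
]
print(f"pooled within-mould SD: {pooled_within_sd(observer_a):.3f}")
```

Computing this separately for each observer gives a comparison of their precision, but it says nothing about the between-observer bias or the mould-by-observer variation, which is what the full model is for.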
But even with 10 replicates by each observer, you will not get a reliable prediction interval with only 5 moulds measured, regardless of the scaling.
If you make the log-transform you should do it on the original replicate measurements, and the SDs you get from the variance components model can then be interpreted, approximately, as coefficients of variation (provided you use the natural log).
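A quick numerical check of that interpretation, as a Python sketch with made-up numbers: when the relative spread is small, the SD of the natural logs is close to SD/mean on the original scale.

```python
import math
from statistics import mean, stdev

# Made-up replicate measurements by one observer on one mould.
y = [98.0, 100.0, 102.0, 101.0, 99.0]

cv = stdev(y) / mean(y)                    # coefficient of variation, original scale
sd_log = stdev([math.log(v) for v in y])   # SD on the natural-log scale

print(f"CV on original scale:    {cv:.4f}")
print(f"SD on natural-log scale: {sd_log:.4f}")
```

The approximation gets worse as the relative spread grows, and it does not hold at all for log base 10 without rescaling.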
Best regards,
Bendix Carstensen
_______________________________________________
Bendix Carstensen
Senior Statistician
Steno Diabetes Center
Niels Steensens Vej 2-4
DK-2820 Gentofte
Denmark
+45 44 43 87 38 (direct)
+45 30 75 87 38 (mobile)
b...@steno.dk http://www.biostat.ku.dk/~bxc
www.steno.dk