Well, you could just answer "no" but that's probably not what they want :-)
First, you have to figure out what they mean by 'agree'. Some possibilities:
1) That people who are higher on one scale are higher on the other
2) That people who are at the extremes on one are at the extremes on the other, and otherwise as in 1)
3) That people who score 1 on the four-point scale score 1 or 2 on the six-point scale, those at 2 score 2 or 3, those at 3 score 4 or 5, and those at 4 score 5 or 6,
and other schemes that are slight variations on 3)
4) That the means of groups of people fit a linear approximation between the scales
and who knows what else!
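Definition 3) is concrete enough to check directly. Here is a minimal Python sketch, assuming paired scores from the same people on both scales. It reports the proportion of people whose six-point score falls in the band that 3) allows for their four-point score. The function name and the scores are made up for illustration. Any of the slight variations on 3) would only require editing the band table.

```python
# Hypothetical sketch: how often do paired scores satisfy definition 3)?
# The band table is copied from 3); names and data are illustrative only.

ALLOWED_BANDS = {1: {1, 2}, 2: {2, 3}, 3: {4, 5}, 4: {5, 6}}


def proportion_in_band(four_point, six_point, bands=ALLOWED_BANDS):
    """Fraction of pairs whose 6-point score lies in the band allowed
    for the corresponding 4-point score."""
    if len(four_point) != len(six_point):
        raise ValueError("score lists must have the same length")
    if not four_point:
        raise ValueError("need at least one pair of scores")
    hits = sum(1 for a, b in zip(four_point, six_point) if b in bands[a])
    return hits / len(four_point)


# Made-up scores for six people, purely for illustration.
four = [1, 2, 2, 3, 4, 4]
six = [2, 3, 4, 5, 6, 4]
print(proportion_in_band(four, six))
```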
Peter
Peter L. Flom, PhD
Statistical Consultant
www DOT peterflom DOT com
I agree that you need to talk to them about what they actually want. My
guess would be that they mean "Do they measure the same or similar
things?". I would suggest a correlation approach. This is not useful
for agreement, because it does not notice things like one scale giving
bigger numbers than the other, but is useful for validity. You will
need a rank correlation, I think, because these are only ordinal
variables, and I would suggest Kendall's tau b to deal with the many
tied ranks. However, once you have done that and got your number, what
does it mean? You can test the null hypothesis that tau = 0, but a
significant correlation does not tell you much. You need to set a
reasonable value for tau to represent agreement. This is difficult to
do, because tau, like any correlation, depends on the variability of the
quantity being measured. When we have a square table, with the same
possible scores for both variables, tau b tends to be a bit bigger than
linearly weighted kappa. I would suggest multiplying the usual lower
limits for categories of kappa (>0.8 = very good agreement, >0.6 = good
agreement, >0.4 = moderate agreement, >0.2 = fair agreement) by 1.1 to
give plausible categories for tau b.
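Martin's recipe can be sketched in Python. This is a minimal illustration, not his code. Tau-b is computed by brute-force pair counting with the usual tie correction. Linearly weighted kappa uses weights 1 - |i - j|/(k - 1). The category labels are the kappa lower limits quoted above, multiplied by 1.1. Kappa needs a square table (the same categories for both ratings), so the example uses made-up ratings on a single 4-point scale rather than the four- versus six-point comparison. The function names and data are hypothetical. In practice a statistics package would do the work; for example, scipy.stats.kendalltau computes tau-b by default.

```python
from math import sqrt

# Lower limits for kappa categories, as quoted in the post.
KAPPA_LIMITS = [(0.8, "very good"), (0.6, "good"), (0.4, "moderate"), (0.2, "fair")]


def kendall_tau_b(x, y):
    """Kendall's tau-b by counting all pairs (O(n^2)), corrected for ties."""
    n = len(x)
    concordant = discordant = tied_x = tied_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                tied_x += 1  # pair tied on x (possibly also on y)
            if dy == 0:
                tied_y += 1  # pair tied on y (possibly also on x)
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) / 2
    denom = sqrt((n0 - tied_x) * (n0 - tied_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when either variable is constant")
    return (concordant - discordant) / denom


def linear_weighted_kappa(x, y, k):
    """Linearly weighted kappa; both ratings must use the same categories 1..k."""
    if k < 2:
        raise ValueError("need at least two categories")
    n = len(x)
    p = [[0.0] * k for _ in range(k)]  # cell proportions of the k x k table
    for a, b in zip(x, y):
        p[a - 1][b - 1] += 1 / n
    row = [sum(p[i]) for i in range(k)]
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]
    w = [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    p_obs = sum(w[i][j] * p[i][j] for i in range(k) for j in range(k))
    p_exp = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    if p_exp == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_obs - p_exp) / (1 - p_exp)


def tau_b_category(tau, factor=1.1):
    """Label tau-b using the kappa lower limits multiplied by `factor`."""
    for limit, label in KAPPA_LIMITS:
        if tau > limit * factor:
            return label
    return "below fair"


# Made-up ratings of eight subjects by two raters on the same 4-point scale.
rater_a = [1, 2, 2, 3, 4, 4, 1, 3]
rater_b = [1, 2, 3, 3, 4, 3, 2, 3]
tau = kendall_tau_b(rater_a, rater_b)
kappa = linear_weighted_kappa(rater_a, rater_b, 4)
print(f"tau-b = {tau:.3f}, weighted kappa = {kappa:.3f}, label: {tau_b_category(tau)}")
```

With the factor of 1.1, the tau-b cut-offs become 0.88, 0.66, 0.44 and 0.22.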
Martin
--
***************************************************
J. Martin Bland
Prof. of Health Statistics
Dept. of Health Sciences
Seebohm Rowntree Building Area 2
University of York
Heslington
York YO10 5DD
Email: mb...@york.ac.uk
Phone: 01904 321334 Fax: 01904 321382
Web site: http://martinbland.co.uk/
***************************************************
I thought so .... glad to have independent confirmation :-)
I did not know of this relation between kappa and tau-b, so that is good to know.
Correlation is certainly one vital part of validating one scale against another. But, depending on what use they are making of these things, it may not be enough. Although the original poster didn't say why this question was being asked (always a useful thing to know!), it strikes me that some group might want to substitute one scale for the other, perhaps because it's cheaper or requires less expertise. If this is the case, then I, for one, would insist on considerably more evidence of validity.
Bendix
______________________________________________
Bendix Carstensen
Senior Statistician
Steno Diabetes Center
Niels Steensens Vej 2-4
DK-2820 Gentofte
Denmark
+45 44 43 87 38 (direct)
+45 30 75 87 38 (mobile)
b...@steno.dk http://www.biostat.ku.dk/~bxc
Hi All,
The use of Pearson's correlation coefficient to assess agreement has been criticised by Bland & Altman (see below).
They have also suggested a very understandable and doable method to evaluate agreement between two numerical scales.
Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986; i: 307-310. (Department of Clinical Epidemiology and Social Medicine, St. George's Hospital Medical School, London SW17 0RE; and Division of Medical Statistics, MRC Clinical Research Centre, Northwick Park Hospital, Harrow, Middlesex.)
Dr Ghasem Yadegarfar
PhD in Epidemiology, MSc in Biostatistics, BSc in Maths
Assistant Professor, Epidemiologist
Biostat & Epidemiology Dept.,
School of Public Health Sciences
Isfahan University of Medical Sciences
Isfahan, IRAN
I do not think this is the kind of agreement we are looking for. I
suspect we are asking "How closely do these methods measure the same
thing, even though it is measured in different units". As the two
scales are unlikely to have interval properties, I suggested rank
correlation as an approach. The limits of agreement approach proposed
by Altman and Bland should not be used for these data.
Martin
--
1) I think agreement is a weird word to use when scales are completely incompatible. But people do use it. When they do, the statistician must clarify what they mean before he or she can say anything useful.
2) Even when scales are compatible, agreement can be tricky. How close do two measurements have to be before they agree? Suppose we are talking about ordinary bathroom scales. Five people try each of two scales and get:
First scale:  180 150 192 198 125
Second scale: 181 149 194 200 123
Do they "agree"? How could we tell? Would correlation be good?
If the two were off by only ounces, would that be enough?
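One way to make the question concrete for these five pairs is to compute both the correlation and limits of agreement (mean difference ± 1.96 SD of the differences), the approach of the Bland & Altman paper cited in this thread. The sketch below uses only the Python standard library. The function names are illustrative, and the units are presumably pounds. The correlation comes out very close to 1, and the limits are roughly -3 to +4. Whether a spread of that size counts as "agreement" is still the judgement call the question is about.

```python
from statistics import mean, stdev

# The bathroom-scale readings from the post.
first = [180, 150, 192, 198, 125]
second = [181, 149, 194, 200, 123]


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def limits_of_agreement(x, y, z=1.96):
    """Mean difference (y - x) and approximate 95% limits of agreement."""
    diffs = [b - a for a, b in zip(x, y)]
    d_bar, s_d = mean(diffs), stdev(diffs)
    return d_bar, d_bar - z * s_d, d_bar + z * s_d


r = pearson_r(first, second)
d_bar, lower, upper = limits_of_agreement(first, second)
print(f"r = {r:.5f}")
print(f"mean difference = {d_bar:.2f}, 95% limits = ({lower:.2f}, {upper:.2f})")
```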
3) In the case that started all this, the scales aren't *completely* incompatible, they are somewhat incompatible. 1 through 4 and 1 through 6 aren't so far off, but it's not clear what to do to make them equivalent.
4) Whether correlation is, or is not, a good measure of agreement does not, to me, depend on the compatibility of the scales, but on the meaning of agreement.
Or, phrased otherwise: if we use method 1 and predict that method 2 would have given the same result, we are off by only a clinically irrelevant amount.
So it all boils down to clinical judgement (which I normally find very hard to tease out of clinicians).
Likewise, if there is a linear relationship between methods 1 and 2, we may predict that method 2 would have given a value of A + B x method1, apart from a clinically irrelevant random error. Thus the scaling does not make any difference to the essence of the problem. And a linear transformation of the method 2 results does not alter the correlation, so the correlation is not relevant in this case either.
But in the case of A ≠ 0 and B ≠ 1 there is of course a problem of estimating A and B.
This has partly been addressed by regressing differences on averages in:
@Article{Bland:Altman.1999,
author = {Bland, J.M. and Altman, D.G.},
title = {Measuring agreement in method comparison studies.},
journal = {Statistical Methods in Medical Research},
year = {1999},
volume = {8},
pages = {136--160},
}
A fuller discussion of this specific topic is available in:
@TechReport{Carstensen.2008a,
author = {B Carstensen},
title = {Limits of agreement:
How to use the regression of differences on averages.},
institution = {Department of Biostatistics, University of Copenhagen},
year = {2008},
number = {08.6},
address = {\url{http://cms.ku.dk/sund-sites/ifsv-sites/english/about/departments/biostatistics/reports/2008/researchreport08-06.pdf}},
}
Best regards,
Bendix
> -----Original Message-----
> From: MedS...@googlegroups.com
> [mailto:MedS...@googlegroups.com] On Behalf Of Bland, M.
> Sent: 28. august 2008 10:39
> To: MedS...@googlegroups.com
> Subject: {MEDSTATS} Re: Agreement