Is anyone aware of any literature that discusses or investigates the
statistical implications of analysing BMI (body mass index) as categories
rather than leaving it in its continuous form?
I am putting together a class discussing the statistical problems of
dichotomising/categorising variables in analyses, and BMI is such a commonly
used categorical measure that it would be nice to specifically discuss it.
Thanks,
Kylie.
... amidst the countless discussions and publications one sees about the
problems/undesirability of dichotomising or categorising data for analysis,
the one issue I haven't seen featuring very much is the fact that, having
done an (appropriate) analysis, one will often necessarily have to
interpret the results in relation to a (usually fairly arbitrarily defined)
dichotomy or 'multicotomy' - since the decisions to be made on the basis of
the analysis will so often be in terms of such categories.
Hence, whilst I really cannot see how anyone can sensibly argue for any
sort of 'data reduction' prior to analysis (it seems so obvious to me that
'losing information' has got to be detrimental), it seems fairly unusual
to see the (e.g. dichotomous) hypotheses of interest actually being
formally tested in these situations.
Consider, for example, a study of measures taken with a view to reducing
the amount of 'drink driving' (as defined sociologically/legally). The
finding that the measures resulted in a significant and substantial
reduction in mean blood alcohol levels of drivers would certainly be
interesting, and to some extent 'promising' - but that could, of course,
simply reflect the fact that previously 'very drunk' drivers were 'not
quite so drunk' as the result of the measures. The question of interest,
hence the hypothesis that should be (but seemingly often isn't) tested, is
presumably whether the measure had a significant effect on the proportion
of drivers deemed (on the basis of whatever arbitrary criterion) to be
totally unsafe to drive - whether that criterion is a zero or finite blood
alcohol level, or whatever.
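A minimal sketch of the drink-driving point, using made-up numbers: the blood alcohol values and the limit of 80 are assumptions chosen purely for illustration, not data from any study. It shows a clear fall in the mean level while the proportion of drivers over the limit does not change at all.

```python
from statistics import mean

LIMIT = 80  # hypothetical legal limit (assumption, arbitrary units)

# Made-up blood alcohol levels for ten drivers before the measures.
before = [10, 20, 30, 40, 50, 60, 150, 180, 210, 240]
# After: only drivers already far over the limit drink less,
# and every one of them is still over the limit.
after = [10, 20, 30, 40, 50, 60, 100, 120, 140, 160]


def proportion_over(levels, limit=LIMIT):
    """Fraction of drivers whose level exceeds the limit."""
    return sum(level > limit for level in levels) / len(levels)


mean_before, mean_after = mean(before), mean(after)
prop_before, prop_after = proportion_over(before), proportion_over(after)

print(f"mean level: {mean_before:.0f} -> {mean_after:.0f}")
print(f"proportion over limit: {prop_before:.1f} -> {prop_after:.1f}")
```

Here the mean falls from 99 to 73, yet four drivers in ten are over the limit both before and after, so a test on the mean and a test on the proportion would be answering different questions.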
Just my few thoughts.
Kind Regards
John
----------------------------------------------------------------
Dr John Whittington, Voice: +44 (0) 1296 730225
Mediscience Services Fax: +44 (0) 1296 738893
Twyford Manor, Twyford, E-mail: Joh...@mediscience.co.uk
Buckingham MK18 4EL, UK
----------------------------------------------------------------
This is touched on briefly in David Streiner's introductory paper (ref below),
and rebutted, but you're right - it would be nice to see a fuller discussion of
this point.
Streiner DL. Breaking up is hard to do: the heartbreak of dichotomizing
continuous data. Can J Psychiatry 2002; 47: 262-266.
Thanks,
Kylie.
_____________________________________________________
Doug Altman
Professor of Statistics in Medicine
Centre for Statistics in Medicine
University of Oxford
Wolfson College Annexe
Linton Road
Oxford OX2 6UD
email: doug....@csm.ox.ac.uk
Tel: 01865 284400 (direct line
01865 284401)
Fax: 01865 284424
www:
http://www.csm-oxford.org.uk/
EQUATOR Network - resources for reporting research
www:
http://www.equator-network.org/
I completely agree with John. We also made this point briefly in relation to developing a prognostic model:
We agree that medical decision making often requires categorization of data, e.g. to define a high-risk group of patients for
a clinical trial ... However, categorization should be applied to the prognostic index, not to the original prognostic variables.
Royston P, Altman DG, Sauerbrei W. Dichotomizing continuous predictors in multiple regression: a bad
idea. Statistics in Medicine 2006; 25:127-141.
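As a rough sketch of what "categorise the index, not the predictors" could look like in practice: the coefficients, intercept and cutpoints below are hypothetical values invented for illustration, and are not taken from the paper. The predictors enter the index in continuous form, and only the resulting index is cut into risk groups.

```python
# Hypothetical coefficients from a previously fitted model (assumption).
COEFS = {"age": 0.04, "bmi": 0.08}
INTERCEPT = -5.0
RISK_CUTS = [-0.5, 0.5]  # illustrative cutpoints on the index scale


def prognostic_index(patient):
    """Linear predictor built from the predictors in continuous form."""
    return INTERCEPT + sum(COEFS[name] * patient[name] for name in COEFS)


def risk_group(index, cuts=RISK_CUTS):
    """Categorise the index itself: 0 = low, 1 = intermediate, 2 = high."""
    return sum(index > cut for cut in cuts)


patients = [
    {"age": 40, "bmi": 22.0},
    {"age": 60, "bmi": 29.0},
    {"age": 75, "bmi": 36.0},
]
groups = [risk_group(prognostic_index(p)) for p in patients]
print(groups)
```

With these invented values the three patients fall into groups 0, 1 and 2 respectively; age and BMI themselves are never dichotomised.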
Similar comments apply to other contexts. But how this should best be done is not agreed, both in terms of the number of groups and (especially) the placement of the cutpoints. I reviewed this issue in
Altman DG. Categorizing continuous variables. In: Armitage P, Colton T (eds) Encyclopedia of
biostatistics. 2nd edn. Chichester: John Wiley, 2005: 708-711.
We do know, though, that choosing the cutpoints to minimise the P value (or to maximise differences in outcome) gives highly biased results, in common with other data-dependent analysis approaches. See
Altman DG, Lausen B, Sauerbrei W, Schumacher M. Dangers of using “optimal” cutpoints in the
evaluation of prognostic factors. [Commentary] Journal of the National Cancer Institute 1994; 86:829-835.
and many other papers since.
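The bias from a minimum-P cutpoint can be illustrated with a small simulation. This is only a sketch under stated assumptions: the sample size, the number of simulations, the restriction to splits leaving at least 10 observations per group, and the normal approximation to the two-sample test are all choices made here for illustration, not the method used in the papers cited. The outcome is generated independently of the predictor, so every "significant" result is a false positive.

```python
import math
import random


def two_sided_p(a, b):
    """Approximate two-sample test of means using a normal reference."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((v - mean_a) ** 2 for v in a) / (len(a) - 1)
    var_b = sum((v - mean_b) ** 2 for v in b) / (len(b) - 1)
    z = (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_sims=300, n=100, min_group=10, alpha=0.05, seed=1):
    """Return false positive rates for a median split and a minimum-P split."""
    rng = random.Random(seed)
    hits_fixed = hits_min_p = 0
    for _ in range(n_sims):
        # Null data: outcome y is independent of the predictor x.
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        y_by_x = [yi for _, yi in sorted(zip(x, y))]
        # Prespecified cutpoint: split at the median of x.
        p_fixed = two_sided_p(y_by_x[: n // 2], y_by_x[n // 2:])
        # Data-dependent cutpoint: try every allowed split, keep the smallest P.
        p_min = min(
            two_sided_p(y_by_x[:k], y_by_x[k:])
            for k in range(min_group, n - min_group + 1)
        )
        hits_fixed += p_fixed < alpha
        hits_min_p += p_min < alpha
    return hits_fixed / n_sims, hits_min_p / n_sims


rate_fixed, rate_min_p = simulate()
print(f"false positive rate, median split:    {rate_fixed:.2f}")
print(f"false positive rate, minimum-P split: {rate_min_p:.2f}")
```

Under these assumptions the prespecified median split should reject at roughly the nominal 5% rate, while the minimum-P split rejects far more often, even though there is no association to find.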
Doug
--
---------------------------------------------------------------------
Karl Schlag
Professor                          Tel: +34 93 542 1493
Dept. of Economics and Business    Fax: +34 93 542 1746
Universitat Pompeu Fabra           email: karl....@upf.edu
Ramon Trias Fargas 25-27           NEW: www.econ.upf.edu/~schlag/
Barcelona 08005, Spain             room: 20-221 Jaume I
There is already a body of work on this...
http://scholar.google.co.uk/scholar?hl=en&q=optimal%20cut%20points%20categorizing%20data
Neil
--
"The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data." ~ John Tukey (1986), "Sunset salvo". The American
Statistician 40(1).
Email - nshe...@gmail.com
Website - http://slack.ser.man.ac.uk/
Photos - http://www.flickr.com/photos/slackline/
On Thu, Jun 25, 2009 at 3:14 PM, Karl Schlag <karl....@upf.edu> wrote:
Would there be interest in finding theoretical cutpoints? Say the objective
would be to create 5 bins such that the power is largest. One would let the
statistician choose the 4 cutpoints without "nature" being able to observe
the choice.