categorising BMI

Kylie Lange

Jun 23, 2009, 11:17:06 PM

to MedS...@googlegroups.com

Hi all,

Is anyone aware of any literature that discusses or investigates the
statistical implications of analysing BMI (body mass index) as categories
rather than leaving it in its continuous form?

I am putting together a class discussing the statistical problems of
dichotomising/categorising variables in analyses, and BMI is such a commonly
used categorical measure that it would be nice to specifically discuss it.
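
As a starting point for such a class, here is a minimal simulation sketch of what dichotomising BMI can cost in power. Everything in it is an assumption made for illustration: BMI is drawn from a Normal(27, 5) distribution, the outcome depends linearly on BMI with a made-up slope, and both analyses use a Fisher z test of zero correlation with a normal approximation. Only the obesity threshold of 30 kg/m^2 is the standard WHO cut-off.

```python
import math
import random

OBESITY_CUTOFF = 30.0  # WHO threshold for obesity, kg/m^2


def pearson_r(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rejects_null(x, y, z_crit=1.96):
    """Two-sided 5% test of zero correlation (Fisher z, normal approximation)."""
    z = math.atanh(pearson_r(x, y)) * math.sqrt(len(x) - 3)
    return abs(z) > z_crit


def estimate_power(n=100, slope=0.05, n_sims=1000, seed=1):
    """Share of simulated studies that detect the effect: continuous vs dichotomised BMI."""
    rng = random.Random(seed)
    hits_continuous = hits_dichotomised = 0
    for _ in range(n_sims):
        # Assumed population (illustrative only): BMI ~ Normal(27, 5),
        # outcome linear in BMI plus standard normal noise.
        bmi = [rng.gauss(27, 5) for _ in range(n)]
        outcome = [slope * b + rng.gauss(0, 1) for b in bmi]
        obese = [1.0 if b >= OBESITY_CUTOFF else 0.0 for b in bmi]
        hits_continuous += rejects_null(bmi, outcome)
        hits_dichotomised += rejects_null(obese, outcome)
    return hits_continuous / n_sims, hits_dichotomised / n_sims


if __name__ == "__main__":
    continuous, dichotomised = estimate_power()
    print(f"power with continuous BMI: {continuous:.2f}")
    print(f"power with obese vs not:   {dichotomised:.2f}")
```

Under these assumptions the continuous analysis should reject the null noticeably more often than the obese/non-obese split, since the same data are being analysed with less information. The size of the gap depends entirely on the assumed distribution, slope and sample size.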

Thanks,
Kylie.

John Whittington

Jun 24, 2009, 11:13:54 AM

to MedS...@googlegroups.com

As an aside ...

... amidst the countless discussions and publications one sees about the
problems/undesirability of dichotomising or categorising data for analysis,
the one issue I haven't seen featuring very much is the fact that, having
done an (appropriate) analysis, one will often necessarily have to
interpret the results in relation to a (usually fairly arbitrarily defined)
dichotomy or 'multicotomy' - since the decisions to be made on the basis of
the analysis will so often be in terms of such categories.

Hence, whilst I really cannot see how anyone can sensibly argue for any
sort of 'data reduction' prior to analysis (it seems so obvious to me that
'losing information' has got to be detrimental), it seems fairly unusual
to see the (e.g. dichotomous) hypotheses of interest actually being
formally tested in these situations.

Consider, for example, a study of measures taken with a view to reducing
the amount of 'drink driving' (as defined sociologically/legally). The
finding that the measures resulted in a significant and substantial
reduction in mean blood alcohol levels of drivers would certainly be
interesting, and to some extent 'promising' - but that could, of course,
simply reflect the fact that previously 'very drunk' drivers were 'not
quite so drunk' as the result of the measures. The question of interest,
hence the hypothesis that should be (but seemingly often isn't) tested, is
presumably whether the measure had a significant effect on the proportion
of drivers deemed (on the basis of whatever arbitrary criterion) to be
totally unsafe to drive - whether that criterion is a zero or finite blood
alcohol level, or whatever.
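
A toy sketch of that distinction, using made-up blood alcohol readings and an assumed limit of 0.08 (neither comes from any real study): the mean falls after the measures, yet the proportion of drivers over the limit does not move. The pooled two-proportion z-test is just one simple way of testing the dichotomous hypothesis and is far too crude for samples this small; it is there only to show which quantity is being tested.

```python
import math
from statistics import NormalDist, fmean

LEGAL_LIMIT = 0.08  # assumed threshold, for illustration only

# Made-up readings for two independent samples of drivers.
before = [0.00, 0.02, 0.05, 0.12, 0.20, 0.25]
after = [0.00, 0.02, 0.05, 0.10, 0.12, 0.15]


def proportion_over(levels, limit=LEGAL_LIMIT):
    """Fraction of drivers whose reading exceeds the limit."""
    return sum(level > limit for level in levels) / len(levels)


def two_proportion_p(p1, n1, p2, n2):
    """Two-sided p-value for a difference in proportions (pooled z-test)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    print(f"mean level: {fmean(before):.3f} -> {fmean(after):.3f}")
    p_before, p_after = proportion_over(before), proportion_over(after)
    print(f"proportion over limit: {p_before:.2f} -> {p_after:.2f}")
    p_value = two_proportion_p(p_before, len(before), p_after, len(after))
    print(f"p-value for change in proportion: {p_value:.2f}")
```

Here the heaviest drinkers are less drunk than before, so a test on the mean could look impressive, while the test on the proportion over the limit shows no change at all.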

Just my few thoughts.

Kind Regards
John


----------------------------------------------------------------
Dr John Whittington, Voice: +44 (0) 1296 730225
Mediscience Services Fax: +44 (0) 1296 738893
Twyford Manor, Twyford, E-mail: Joh...@mediscience.co.uk
Buckingham MK18 4EL, UK
----------------------------------------------------------------

Kylie Lange

Jun 24, 2009, 8:08:57 PM

to MedS...@googlegroups.com, John Whittington

Hi John,

This is touched on briefly in David Streiner's introductory paper (ref below),
and rebutted, but you're right - it would be nice to see a fuller discussion of
this point.

Streiner DL. Breaking up is hard to do: the heartbreak of dichotomizing
continuous data. Can J Psychiatry 2002; 47: 262-266.

Thanks,
Kylie.

Doug Altman

Jun 25, 2009, 5:25:10 AM

to MedS...@googlegroups.com

I completely agree with John. We also made this point briefly in relation to developing a prognostic model:

We agree that medical decision making often requires categorization of data, e.g. to define a high-risk group of patients for
a clinical trial ... However, categorization should be applied to the prognostic index, not to the original prognostic variables.

Royston P, Altman DG, Sauerbrei W. Dichotomizing continuous predictors in multiple regression: a bad
idea. Statistics in Medicine 2006; 25:127-141.

Similar comments apply to other contexts. But how this should best be done is not agreed, both in terms of the number of groups and (especially) the placement of the cutpoints. I reviewed this issue in

Altman DG. Categorizing continuous variables. In: Armitage P, Colton T (eds) Encyclopedia of
biostatistics. 2nd edn. Chichester: John Wiley, 2005: 708-711.

We do know, though, that choosing the cutpoints to minimise the P value - or maximise differences in outcome - is highly biased, in common with other data-dependent analysis approaches. See

Altman DG, Lausen B, Sauerbrei W, Schumacher M. Dangers of using “optimal” cutpoints in the
evaluation of prognostic factors. [Commentary] Journal of the National Cancer Institute 1994; 86:829-835.

and many other papers since.
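
For readers who want to see the effect, here is a minimal simulation sketch (not taken from any of the papers above). It assumes a marker that is completely unrelated to the outcome, tries a cutpoint at each sample decile, and keeps the smallest P value. The sample size, the decile grid and the normal-approximation test are all illustrative assumptions, and `n` should be a multiple of 10.

```python
import math
import random
from statistics import NormalDist


def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    return mean, sum((v - mean) ** 2 for v in values) / (n - 1)


def two_sample_p(a, b):
    """Two-sided p-value for a difference in means (Welch statistic, normal approximation)."""
    mean_a, var_a = mean_and_variance(a)
    mean_b, var_b = mean_and_variance(b)
    z = (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rates(n=100, n_sims=1000, alpha=0.05, seed=1):
    """Rejection rates under the null: pre-specified median cut vs minimum-P cut."""
    rng = random.Random(seed)
    splits = range(n // 10, n, n // 10)  # candidate cutpoints at the sample deciles
    fixed_hits = optimal_hits = 0
    for _ in range(n_sims):
        # Null scenario: the outcome y is generated independently of the marker x.
        pairs = sorted((rng.random(), rng.gauss(0, 1)) for _ in range(n))
        y_by_x = [y for _, y in pairs]
        # Low-marker group vs high-marker group at each candidate cutpoint.
        p_values = {k: two_sample_p(y_by_x[:k], y_by_x[k:]) for k in splits}
        fixed_hits += p_values[n // 2] < alpha
        optimal_hits += min(p_values.values()) < alpha
    return fixed_hits / n_sims, optimal_hits / n_sims


if __name__ == "__main__":
    fixed, optimal = false_positive_rates()
    print(f"false positive rate, median cut:    {fixed:.3f}")
    print(f"false positive rate, minimum-P cut: {optimal:.3f}")
```

The pre-specified median cut should reject at close to the nominal 5%, whereas the minimum-P cut rejects far more often even though there is nothing to find. A finer grid of candidate cutpoints would be expected to make this worse.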

Doug

_____________________________________________________

Doug Altman
Professor of Statistics in Medicine
Centre for Statistics in Medicine
University of Oxford
Wolfson College Annexe
Linton Road
Oxford OX2 6UD

email:  doug....@csm.ox.ac.uk
Tel:    01865 284400 (direct line 01865 284401)
Fax:    01865 284424
www:     http://www.csm-oxford.org.uk/

EQUATOR Network - resources for reporting research
www: http://www.equator-network.org/

John Whittington

Jun 25, 2009, 6:41:06 AM

to MedS...@googlegroups.com

It's reassuring to find someone authoritative agreeing with me for once!

The exercise of developing a prognostic model which Doug mentions obviously introduces even more issues (such as he discusses), since one is then having to decide upon the 'cutpoints' (category boundaries) as well as developing the model.

The situations I was thinking of were those in which the cutpoints (which relate to decision-making) are already (at least for the time being) 'externally defined' (often pretty arbitrarily) - whether they relate to criteria for diagnosis, treatment, prosecution or whatever. It is in those situations that I feel that (having analysed all of the data, without categorisation) hypotheses relating to the (pre-defined) categorisation really should be tested - but I am surprised by how rarely this seems to happen. A prognostic model is, I assume, likely to be tested against 'known facts' (i.e. actual observed outcome/prognosis), so other analytical techniques would presumably be employed.

Kind Regards,
John


Karl Schlag

Jun 25, 2009, 10:14:52 AM

to MedS...@googlegroups.com

Would there be interest in finding theoretical cutpoints? Say the objective would be to create 5 bins such that the power is largest.
One would let the statistician choose the 4 cutpoints without "nature" being able to observe the choice.

A theorist thinking out loud, Karl

-- 
---------------------------------------------------------------------
Karl Schlag  
Professor 				Tel:  +34 93 542 1493
Dept. of Economics and Business		Fax:  +34 93 542 1746
Universitat Pompeu Fabra		email: karl....@upf.edu
Ramon Trias Fargas 25-27		NEW: www.econ.upf.edu/~schlag/
Barcelona 08005, Spain			room: 20-221 Jaume I

Neil Shephard

Jun 25, 2009, 10:18:15 AM

to MedS...@googlegroups.com

On Thu, Jun 25, 2009 at 3:14 PM, Karl Schlag<karl....@upf.edu> wrote:
> Would there be interest in finding theoretical cutpoints? Say the objective
> would be to create 5 bins such that the power is largest.
> One would let the statistician choose the 4 cutpoints without "nature" being
> able to observe the choice.

There is already a body of work on this...

http://scholar.google.co.uk/scholar?hl=en&q=optimal%20cut%20points%20categorizing%20data

Neil
--
"The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data." ~ John Tukey (1986), "Sunset salvo". The American
Statistician 40(1).

Email - nshe...@gmail.com
Website - http://slack.ser.man.ac.uk/
Photos - http://www.flickr.com/photos/slackline/

Karl Schlag

Jun 25, 2009, 10:30:49 AM

to MedS...@googlegroups.com

Yet none of them is exact nonparametric, unless they only create two categories (and even then it is not obvious how to analytically derive the optimal cut) ... perhaps the existence of this list means that there is interest.
Karl

