Re: {MEDSTATS} categorising BMI

Neil Shephard

Jun 24, 2009, 5:03:48 AM

to MedS...@googlegroups.com

On Wed, Jun 24, 2009 at 4:17 AM, Kylie Lange<kylie...@adelaide.edu.au> wrote:
>
> Hi all,
>
> Is anyone aware of any literature that discusses or investigates the
> statistical implications of analysing BMI (body mass index) as categories
> rather than leaving in its continuous form?
>
> I am putting together a class discussing the statistical problems of
> dichotomising/categorising variables in analyses, and BMI is such a commonly
> used categorical measure that it would be nice to specifically discuss it.

Nothing specific to BMI, but Frank Harrell has a useful page on the
problems associated with binning continuous variables...

http://biostat.mc.vanderbilt.edu/wiki/Main/CatContinuous

There was some discussion earlier this year on MedStats itself (which
pertains to age)...

http://groups.google.com/group/MedStats/browse_thread/thread/f14ced3d35a4ce4f/ed262012c02d8ecc?pli=1

And here are a few references....

Altman DG (1991) Categorising continuous covariates. British Journal
of Cancer 64:975

Altman DG, Royston P (2006) The cost of dichotomising continuous
variables. BMJ 332(7549):1080
http://171.66.124.147/cgi/content/extract/332/7549/1080

Faraggi D, Simon R (1996) A simulation study of cross-validation for
selecting an optimal cutpoint in univariate survival analysis.
Statistics in Medicine 15(20):2203-2213
http://www.ncbi.nlm.nih.gov/pubmed/8910964

Owen SV, Froman RD (2005) Why carve up your continuous data? Research
in Nursing & Health 28(6):496-503
http://www3.interscience.wiley.com/journal/112141778/abstract

Hadzi-Pavlovic D (2007) Correlations II: categorizing continuous
data. Acta Neuropsychiatrica 19(2):129-130

The problem is really independent of the variable that you are
attempting to categorize, and I'm sure you could phrase the above
points in the context of BMI.
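
As a rough illustration of the cost of binning, here is a minimal simulation sketch. Everything in it is an assumption made for illustration (the BMI distribution, the linear outcome model, the noise level, the median split); it is not taken from any of the references above. It compares the correlation between an outcome and continuous BMI with the correlation between the same outcome and a "high BMI" indicator.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Plain Pearson correlation, written out by hand."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=20000, slope=0.5, seed=1):
    rng = random.Random(seed)
    # Hypothetical BMI values and an outcome that is linear in BMI plus noise.
    bmi = [rng.gauss(27.0, 4.0) for _ in range(n)]
    outcome = [slope * b + rng.gauss(0.0, 4.0) for b in bmi]
    # Dichotomise at the sample median ("high" versus "low" BMI).
    cut = statistics.median(bmi)
    high = [1.0 if b > cut else 0.0 for b in bmi]
    return pearson(bmi, outcome), pearson(high, outcome)


r_cont, r_dich = simulate()
ratio = r_dich / r_cont
print(f"r with continuous BMI: {r_cont:.3f}")
print(f"r with median split:   {r_dich:.3f}")
print(f"ratio: {ratio:.3f} (normal theory suggests sqrt(2/pi) = {math.sqrt(2 / math.pi):.3f})")
```

Under these assumptions the median split attenuates the correlation by a factor of about 0.8, which is the usual way of saying that dichotomising at the median throws away roughly a third of the information.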

Neil
--
"The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data." ~ John Tukey (1986), "Sunset salvo". The American
Statistician 40(1).

Email - nshe...@gmail.com
Website - http://slack.ser.man.ac.uk/
Photos - http://www.flickr.com/photos/slackline/

William Stanbury

Jun 24, 2009, 5:48:11 AM

to MedS...@googlegroups.com

For the source, I'd recommend reading all publications by the inventor of the BMI in the 19th century, the Belgian scientist Adolphe Quetelet. BMI used to be called the Quetelet Index (QI); initially the QI faced rival indices such as the Broca Index, which have become vestigial or extinct in recent years, at least in modern medical science. I trust that your French is up to scratch!

Best wishes,

William Stanbury.

--
William Stanbury

Founder & General Leader
William Stanbury SPRL
Kerselarenlaan 62
1030 Schaarbeek
Belgium
Mobile: 32 478 62 40 29
Landline: 32 23 47 72 18.

kornbrot

Jun 24, 2009, 6:40:38 AM

to MedS...@googlegroups.com

BUT
Current evidence suggests that waist-to-hip ratio is considerably more predictive of weight-related problems.
So WHY oh WHY do GPs and heart units routinely take BMI and ignore w/h?
Best

Diana


Professor Diana Kornbrot
email: d.e.ko...@herts.ac.uk
web:    http://web.mac.com/kornbrot/iweb/KornbrotHome.html
Work
School of Psychology
University of Hertfordshire
College Lane, Hatfield, Hertfordshire AL10 9AB, UK
voice:   +44 (0) 170 728 4626
   fax:     +44 (0) 170 728 5073
Home
19 Elmhurst Avenue
London N2 0LT, UK
    voice:   +44 (0) 208 883 3657
    mobile: +44 (0) 796 890 2102
   fax:      +44 (0) 870 706 4997

Neil Shephard

Jun 24, 2009, 6:48:12 AM

to MedS...@googlegroups.com

On Wed, Jun 24, 2009 at 11:40 AM, kornbrot<d.e.ko...@herts.ac.uk> wrote:
> BUT
> Current evidence suggests that waist-to-hip ratio is considerably more
> predictive of weight-related problems.
> So WHY oh WHY do GPs and heart units routinely take BMI and ignore w/h?

I'd suggest that there are a few reasons why...

1. They are unaware of the evidence that waist/hip ratio is more predictive.

2. Studies don't collect the waist or hip measurements because of
this, but do collect height and weight (statisticians should be
consulted at the design phase of a study, not at the end to perform
surgery on the data that has been collected).

3. Studies using older data don't have a chance of using waist-hip
ratio as only height and weight were collected at the time as BMI was
thought to be the appropriate metric.

It's a case of 'educating' people as to the most appropriate measure to
take, and all statisticians have a role in this (assuming they work in
biostatistics where this sort of data measurement is used of course!).

Greg Snow

Jun 24, 2009, 11:57:50 AM

to MedStats

Another article to take into account is:

Richard Kronmal (1993) "Spurious Correlation and the Fallacy of the
Ratio Standard Revisited". Journal of the Royal Statistical Society.
Vol. 156, No 3, 379-392.

Which does not deal with the categorization issue, but rather the fact
that using the ratio rather than the 2 numbers (whether height-weight
or waist-hip) can lead to misleading results and it is better to use
the original values rather than their ratio.
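
A minimal sketch of the ratio problem, under assumptions chosen purely for illustration: three mutually independent positive variables with the same coefficient of variation, where two of them are divided by the third. The variable names and distributions are hypothetical and are not taken from Kronmal's paper.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out by hand."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(7)
n = 20000
# Three mutually independent measurements (hypothetical units, CV = 0.1 each).
x = [rng.gauss(100.0, 10.0) for _ in range(n)]
y = [rng.gauss(100.0, 10.0) for _ in range(n)]
z = [rng.gauss(100.0, 10.0) for _ in range(n)]  # the shared denominator

r_raw = pearson(x, y)
r_ratio = pearson([a / c for a, c in zip(x, z)],
                  [b / c for b, c in zip(y, z)])

print(f"correlation of x and y:     {r_raw:.3f}")
print(f"correlation of x/z and y/z: {r_ratio:.3f}")
```

The raw variables are essentially uncorrelated, while the two ratios are clearly correlated simply because they share a denominator. With equal coefficients of variation, Pearson's classical approximation puts the spurious correlation at about 0.5.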


Kylie Lange

Jun 24, 2009, 8:11:18 PM

to MedS...@googlegroups.com, Neil Shephard

Thanks Neil. There are a couple new references in that list that I haven't read
yet and so will be interested to check out.

Thanks,
Kylie.

Peter Flom

Jun 25, 2009, 7:15:17 AM

to MedS...@googlegroups.com

Fascinating stuff.

Do the readers of this list think there is room for another article?

Maybe I will try to write one.

Peter

-----Original Message-----
From: John Whittington
Sent: Jun 25, 2009 6:41 AM
To: MedS...@googlegroups.com
Subject: {MEDSTATS} Re: categorising BMI

It's reassuring to find someone authoritative agreeing with me for once!

The exercise of developing a prognostic model which Doug mentions obviously introduces even more issues (such as those he discusses), since one then has to decide upon the 'cutpoints' (category boundaries) as well as developing the model.

The situations I was thinking of were those in which the cutpoints (which relate to decision-making) are already (at least for the time being) 'externally defined' (often pretty arbitrarily), whether they relate to criteria for diagnosis, treatment, prosecution or whatever. It is in those situations that I feel that (having analysed all of the data, without categorisation) tests of hypotheses relating to the (pre-defined) categorisation really should be undertaken, but I am surprised by how rarely this seems to happen. A prognostic model is, I assume, likely to be tested against 'known facts' (i.e. actual observed outcome/prognosis), so other analytical techniques would presumably be employed.

Kind Regards,
John

At 10:25 25/06/2009 +0100, Doug Altman wrote:

I completely agree with John. We also made this point briefly in relation to developing a prognostic model:

We agree that medical decision making often requires categorization of data, e.g. to define a high-risk group of patients for
a clinical trial ... However, categorization should be applied to the prognostic index, not to the original prognostic variables.

Royston P, Altman DG, Sauerbrei W. Dichotomizing continuous predictors in multiple regression: a bad
idea. Statistics in Medicine 2006; 25:127-141.

Similar comments apply to other contexts. But how this should best be done is not agreed, both in terms of the number of groups and (especially) the placement of the cutpoints. I reviewed this issue in

Altman DG. Categorizing continuous variables. In: Armitage P, Colton T (eds) Encyclopedia of
biostatistics. 2nd edn. Chichester: John Wiley, 2005: 708-711.

We do know, though, that choosing the cutpoints to minimise the P value (or to maximise differences in outcome) is highly biased, in common with other data-dependent analysis approaches. See

Altman DG, Lausen B, Sauerbrei W, Schumacher M. Dangers of using “optimal” cutpoints in the
evaluation of prognostic factors. [Commentary] Journal of the National Cancer Institute 1994; 86:829-835.

and many other papers since.
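
The bias from data-driven cutpoints can be sketched with a small null simulation. This is an illustration under stated assumptions, not the procedure used in the papers above: the outcome is generated independently of the predictor, the group comparison uses a normal approximation to a Welch-type test (chosen only to keep the code free of third-party libraries), and the candidate cutpoints are restricted to splits between the 10th and 90th percentiles.

```python
import math
import random


def two_sided_p(group_a, group_b):
    """Two-sided P value from a Welch-type z statistic (normal approximation)."""
    na, nb = len(group_a), len(group_b)
    ma, mb = sum(group_a) / na, sum(group_b) / nb
    va = sum((v - ma) ** 2 for v in group_a) / (na - 1)
    vb = sum((v - mb) ** 2 for v in group_b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rates(n_sims=500, n=100, alpha=0.05, seed=2):
    rng = random.Random(seed)
    hits_fixed = hits_optimal = 0
    for _ in range(n_sims):
        # Null scenario: outcome drawn independently of the predictor.
        data = sorted((rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n))
        y = [outcome for _, outcome in data]  # outcomes ordered by predictor
        # Pre-specified cutpoint: the median of the predictor.
        if two_sided_p(y[: n // 2], y[n // 2:]) < alpha:
            hits_fixed += 1
        # "Optimal" cutpoint: the split with the smallest P value.
        best = min(two_sided_p(y[:k], y[k:])
                   for k in range(n // 10, n - n // 10 + 1))
        if best < alpha:
            hits_optimal += 1
    return hits_fixed / n_sims, hits_optimal / n_sims


rate_fixed, rate_optimal = false_positive_rates()
print(f"false-positive rate, pre-specified median cut: {rate_fixed:.2f}")
print(f"false-positive rate, minimum-P cutpoint:       {rate_optimal:.2f}")
```

The pre-specified cut should reject at close to the nominal 5%, whereas picking the cutpoint with the smallest P value rejects far more often even though there is no true association.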

Doug

John

----------------------------------------------------------------

Dr John Whittington,       Voice:    +44 (0) 1296 730225

Mediscience Services       Fax:      +44 (0) 1296 738893

Twyford Manor, Twyford,    E-mail:   Joh...@mediscience.co.uk

Buckingham MK18 4EL, UK
----------------------------------------------------------------


Peter L. Flom, PhD
Statistical Consultant
www DOT peterflomconsulting DOT com

kornbrot

Jun 26, 2009, 8:55:29 AM

to MedS...@googlegroups.com

There are nevertheless situations where the ratio, in addition to the raw numbers, is useful.
For example, the ratio of total cholesterol to LDL may be a better guide for statin treatment decisions than either LDL or total cholesterol alone.
Diana

Professor Diana Kornbrot
email: d.e.ko...@herts.ac.uk

John Whittington

Jun 26, 2009, 9:26:35 AM

to MedS...@googlegroups.com

At 13:55 26/06/2009 +0100, kornbrot wrote:
>There are nevertheless situations where the ratio, in addition to the raw
>numbers, is useful.
>For example, the ratio of total cholesterol to LDL may be a better guide for
>statin treatment decisions than either LDL or total cholesterol alone.

Is not the point that, in reality, it is very unlikely that a ratio (or any
other fixed mathematical combination) of two variables is going to remain
an ideal predictor/guide across all values of both variables - hence the
suggestion that it is better to look at both of the values? In effect,
using your example, it would mean that for each value of total cholesterol
there would be a specific 'treatment decision threshold' in terms of LDL
level (or vice versa), with 'the ratio' (at the threshold point) not
necessarily always being the same.

Of course, that's far more complicated (both to estimate the thresholds and
to apply them) - so 'less-than-ideal' fixed combinations (e.g. a ratio) are
likely to continue to be used in practice, for their simplicity.
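
To make the threshold argument concrete, here is a toy sketch. The logistic risk model, its coefficients and the 20% decision threshold are all invented for illustration and have no clinical standing. For each total cholesterol value it solves for the LDL level at which the modelled risk reaches the threshold, then reports the ratio at that point.

```python
import math

# Hypothetical model: logit(risk) = B0 + B_TC * tc + B_LDL * ldl (values in mmol/L).
B0, B_TC, B_LDL = -5.0, 0.3, 0.6
RISK_THRESHOLD = 0.20  # hypothetical "treat above this risk" rule


def ldl_threshold(tc):
    """LDL level at which the modelled risk equals RISK_THRESHOLD for a given tc."""
    target_logit = math.log(RISK_THRESHOLD / (1 - RISK_THRESHOLD))
    return (target_logit - B0 - B_TC * tc) / B_LDL


rows = [(tc, ldl_threshold(tc)) for tc in (5.0, 6.0, 7.0, 8.0)]
ratios = [tc / ldl for tc, ldl in rows]

for (tc, ldl), ratio in zip(rows, ratios):
    print(f"total cholesterol {tc:.1f}: treat if LDL > {ldl:.2f} (ratio at threshold {ratio:.2f})")
```

In this toy model the LDL threshold falls as total cholesterol rises, and the ratio at the threshold is different on every row, so no single cutoff on the ratio reproduces the decision rule.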

Kind Regards,

Dr Neville Calleja

Jun 26, 2009, 10:17:54 AM

to MedS...@googlegroups.com

There have also been differences between schools of thought on the best BMI cutoff between normal weight and overweight, with some setting the cutoff at 27 and others at 25.

Using the ratio in its original, continuous form would, of course, maximise power. Using the individual weight and height rather than BMI, however, may require some transformation of height to allow for a better fit.
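
One way to read the transformation point: BMI = weight / height^2 fixes the height exponent at 2, whereas regressing log(weight) on log(height) lets the data estimate it. The sketch below uses simulated data with a hypothetical true exponent of 1.7; the scaling constant, the spread of heights and the noise level are also assumptions made only for illustration.

```python
import math
import random


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(3)
TRUE_EXPONENT = 1.7  # hypothetical value, chosen to differ from the BMI exponent of 2
heights = [rng.gauss(1.70, 0.09) for _ in range(5000)]           # metres
weights = [22.0 * h ** TRUE_EXPONENT * math.exp(rng.gauss(0.0, 0.12))
           for h in heights]                                      # kilograms

estimated_exponent = ols_slope([math.log(h) for h in heights],
                               [math.log(w) for w in weights])
print(f"estimated height exponent: {estimated_exponent:.2f} (BMI assumes 2)")
```

In a real analysis the same idea amounts to entering log(weight) and log(height) as separate terms, so that the height exponent is estimated rather than assumed.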

Neville

--
==================
Dr Neville Calleja
12 Mon Nid
Ganni Faure Str
Tarxien TXN2421
MALTA


Frank

Jun 28, 2009, 9:44:14 AM

to MedStats

See also

@Article{fil07cat,
  author =  {Filardo, Giovanni and Hamilton, Cody and Hamman, Baron and Ng, Hon K. T. and Grayburn, Paul},
  title =   {Categorizing {BMI} may lead to biased results in studies investigating in-hospital mortality after isolated {CABG}},
  journal = {J Clin Epi},
  year =    2007,
  volume =  60,
  pages =   {1132-1139},
  annote =  {BMI; CABG; surgical adverse events; hospital mortality; epidemiology; smoothing methods; categorization; categorizing continuous variables; investigators should waive categorization entirely and use smoothed functions for continuous variables; examples of non-monotonic relationships}
}
