Nothing specific to BMI, but Frank Harrell has a useful page on the
problems associated with binning continuous variables...
http://biostat.mc.vanderbilt.edu/wiki/Main/CatContinuous
There was some discussion earlier this year on MedStats itself (which
pertains to age)...
http://groups.google.com/group/MedStats/browse_thread/thread/f14ced3d35a4ce4f/ed262012c02d8ecc?pli=1
And here are a few references....
Altman DG (1991) Categorising continuous covariates. British Journal
of Cancer 64:975
Altman DG, Royston P (2006) The cost of dichotomising continuous
variables. BMJ 332(7549):1080
http://171.66.124.147/cgi/content/extract/332/7549/1080
Faraggi D, Simon R (1996) A simulation study of cross-validation for
selecting an optimal cutpoint in univariate survival analysis.
Statistics in Medicine 15(20):2203-2213
http://www.ncbi.nlm.nih.gov/pubmed/8910964
Owen SV, Froman RD (2005) Why carve up your continuous data? Research
in Nursing & Health 28(6):496-503
http://www3.interscience.wiley.com/journal/112141778/abstract
Hadzi-Pavlovic D (2007) Correlations II: categorizing continuous
data. Acta Neuropsychiatrica 19(2):129-130
The problem is really independent of the variable that you are
attempting to categorize, and I'm sure you could phrase the above
points in the context of BMI.
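For a class on this, a quick simulation can make the information loss concrete. This is a minimal sketch under assumed conditions, not taken from any of the references above: X and Y are bivariate normal with correlation rho = 0.5, X is split at its median, and the two correlations are compared. For normal data the split should shrink the correlation by a factor of about sqrt(2/pi) ≈ 0.80, so r² falls to roughly 0.64 of its original value, which for small effects is broadly like discarding about a third of the sample.

```python
import math
import random
import statistics

def pearson(x, y):
    """Pearson correlation coefficient, computed directly."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate(rho=0.5, n=20000, seed=1):
    """Return corr(X, Y) and corr(median-split X, Y) for simulated normal data."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    # Y = rho*X + independent noise, so the population corr(X, Y) is rho
    y = [rho * xi + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for xi in x]
    cut = statistics.median(x)
    x_split = [1.0 if xi > cut else 0.0 for xi in x]
    return pearson(x, y), pearson(x_split, y)

r_cont, r_split = simulate()
print(f"X continuous:   r = {r_cont:.3f}")
print(f"X median split: r = {r_split:.3f}")
print(f"ratio = {r_split / r_cont:.3f} (theory for normal X: {math.sqrt(2 / math.pi):.3f})")
```

Changing rho, n or the distribution of X (e.g. a skewed variable like BMI) is an easy classroom exercise; the 0.80 factor is specific to a median split of normally distributed data.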
Neil
--
"The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data." ~ John Tukey (1986), "Sunset salvo". The American
Statistician 40(1).
Email - nshe...@gmail.com
Website - http://slack.ser.man.ac.uk/
Photos - http://www.flickr.com/photos/slackline/
For the source, I'd recommend reading all publications by the inventor of the BMI in the 19th century, the Belgian scientist Adolphe Quetelet. BMI used to be called the Quetelet Index (QI); initially the QI faced rival indices such as the Broca Index, which have become vestigial or extinct in recent years, at least in modern medical science. I trust that your French is up to scratch!
Best wishes,
William Stanbury.
2009/6/24 Neil Shephard <nshe...@gmail.com>
On Wed, Jun 24, 2009 at 4:17 AM, Kylie Lange <kylie...@adelaide.edu.au> wrote:
>
> Hi all,
>
> Is anyone aware of any literature that discusses or investigates the
> statistical implications of analysing BMI (body mass index) as categories
> rather than leaving in its continuous form?
>
> I am putting together a class discussing the statistical problems of
> dichotomising/categorising variables in analyses, and BMI is such a commonly
> used categorical measure that it would be nice to specifically discuss it.
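To make the BMI case concrete, the sketch below computes the Quetelet index and applies the widely used WHO adult cutpoints (18.5, 25 and 30 kg/m²). The cutpoints are an assumption added here for illustration; they are not specified anywhere in this thread.

```python
import bisect

# WHO adult BMI cutpoints in kg/m^2 (assumed here; not from the thread):
# underweight < 18.5 <= normal < 25 <= overweight < 30 <= obese
CUTPOINTS = [18.5, 25.0, 30.0]
LABELS = ["underweight", "normal", "overweight", "obese"]

def bmi(weight_kg, height_m):
    """Quetelet index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

def bmi_category(value):
    """Map a BMI value to its category; a value on a cutpoint goes up a group."""
    return LABELS[bisect.bisect_right(CUTPOINTS, value)]

print(round(bmi(70, 1.75), 1))
for value in (24.9, 25.0, 29.9):
    print(value, bmi_category(value))
```

A difference of 0.1 between 24.9 and 25.0 changes the category, whereas 25.0 and 29.9, nearly five units apart, are treated as identical; any analysis of the categories inherits that.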
I'd suggest that there are a few reasons why...
1. They are unaware of the evidence that waist/hip ratio is more predictive.
2. Studies don't collect the waist or hip measurements because of
this, but do collect height and weight (statisticians should be
consulted at the design phase of a study, not at the end to perform
surgery on the data that has been collected).
3. Studies using older data don't have a chance of using waist-hip
ratio, as only height and weight were collected at the time because BMI
was thought to be the appropriate metric.
It's a case of 'educating' people as to the most appropriate measure to
take, and all statisticians have a role in this (assuming they work in
biostatistics where this sort of data measurement is used, of course!).
Thanks,
Kylie.
-----Original Message-----
From: John Whittington
Sent: Jun 25, 2009 6:41 AM
To: MedS...@googlegroups.com
Subject: {MEDSTATS} Re: categorising BMI
It's reassuring to find someone authoritative agreeing with me for once!
The exercise of developing a prognostic model which Doug mentions obviously introduces even more issues (such as he discusses), since one then has to decide upon the 'cutpoints' (category boundaries) as well as develop the model.
The situations I was thinking of were those in which the cutpoints (which relate to decision-making) are already, at least for the time being, 'externally defined' (often pretty arbitrarily), whether they relate to criteria for diagnosis, treatment, prosecution or whatever. It is in those situations that I feel that (having analysed all of the data without categorisation) tests of hypotheses relating to the pre-defined categorisation really should be undertaken, but I am surprised by how rarely this seems to happen. A prognostic model is, I assume, likely to be tested against 'known facts' (i.e. actual observed outcome/prognosis), so other analytical techniques would presumably be employed.
Kind Regards,
John
At 10:25 25/06/2009 +0100, Doug Altman wrote:
I completely agree with John. We also made this point briefly in relation to developing a prognostic model:
We agree that medical decision making often requires categorization of data, e.g. to define a high-risk group of patients for
a clinical trial ... However, categorization should be applied to the prognostic index, not to the original prognostic variables.
Royston P, Altman DG, Sauerbrei W. Dichotomizing continuous predictors in multiple regression: a bad
idea. Statistics in Medicine 2006; 25:127-141.
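The idea of categorising the prognostic index rather than the original variables can be sketched as follows. Assume a model (logistic, Cox or similar) has already been fitted with age and BMI kept continuous; the coefficients below are hypothetical placeholders, not estimates from any study. The prognostic index is the linear predictor, and only that index is cut into groups. Tertiles are used purely for illustration, since the number of groups and the placement of the cutpoints are not agreed.

```python
import statistics

# Hypothetical coefficients from a model fitted with age and BMI kept
# continuous (e.g. logistic or Cox regression); illustrative values only.
COEFS = {"age": 0.04, "bmi": 0.06}

def prognostic_index(patient):
    """Linear predictor: sum of coefficient * continuous covariate value."""
    return sum(coef * patient[name] for name, coef in COEFS.items())

def risk_groups(patients):
    """Cut the prognostic index (not age or BMI themselves) into tertiles."""
    pis = [prognostic_index(p) for p in patients]
    lower, upper = statistics.quantiles(pis, n=3)
    groups = []
    for pi in pis:
        if pi <= lower:
            groups.append("low")
        elif pi <= upper:
            groups.append("intermediate")
        else:
            groups.append("high")
    return groups

patients = [
    {"age": 40, "bmi": 22}, {"age": 55, "bmi": 27}, {"age": 70, "bmi": 33},
    {"age": 50, "bmi": 24}, {"age": 65, "bmi": 30}, {"age": 45, "bmi": 35},
]
groups = risk_groups(patients)
print(groups)
```

The 45-year-old with a BMI of 35 lands in the intermediate group: the grouping reflects the combined modelled risk, not any single variable's category.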
Similar comments apply to other contexts. But how this should best be done is not agreed, both in terms of the number of groups and (especially) the placement of the cutpoints. I reviewed this issue in
Altman DG. Categorizing continuous variables. In: Armitage P, Colton T (eds) Encyclopedia of
biostatistics. 2nd edn. Chichester: John Wiley, 2005: 708-711.
We do know though that choosing the cutpoints to minimise P value - or maximise differences in outcome - is highly biased, in common with other data-dependent analysis approaches. See
Altman DG, Lausen B, Sauerbrei W, Schumacher M. Dangers of using “optimal” cutpoints in the
evaluation of prognostic factors. [Commentary] Journal of the National Cancer Institute 1994; 86:829-835.
and many other papers since.
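The bias from data-driven cutpoints is easy to demonstrate by simulation. The sketch below uses an illustrative setup assumed here, not the design of Altman et al.: X and Y are generated independently (so there is no true association), every cutpoint between the 10th and 90th percentiles of X is tried, and the smallest P value is kept. P values come from a large-sample z-test for a difference in mean Y, which is slightly liberal for the smallest groups. The 'significance' rate at a nominal 5% level should come out well above 5%.

```python
import math
import random

def two_sample_p(a_sum, a_sq, na, b_sum, b_sq, nb):
    """Two-sided P value from a large-sample z-test for a difference in means."""
    ma, mb = a_sum / na, b_sum / nb
    va = (a_sq - na * ma ** 2) / (na - 1)
    vb = (b_sq - nb * mb ** 2) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return math.erfc(abs(z) / math.sqrt(2))

def min_p_over_cutpoints(n, rng, lo=0.1, hi=0.9):
    """Null data: X and Y independent. Scan cutpoints on X, keep the smallest P."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    ys = [yi for _, yi in sorted(zip(x, y))]  # Y values ordered by X
    total, total_sq = sum(ys), sum(v * v for v in ys)
    best = 1.0
    s = sq = 0.0
    for k in range(1, n):  # the first k observations form the "low X" group
        s += ys[k - 1]
        sq += ys[k - 1] ** 2
        if lo * n <= k <= hi * n:
            p = two_sample_p(s, sq, k, total - s, total_sq - sq, n - k)
            best = min(best, p)
    return best

def false_positive_rate(reps=500, n=100, alpha=0.05, seed=2009):
    """Fraction of null datasets where the 'optimal' cutpoint gives P < alpha."""
    rng = random.Random(seed)
    hits = sum(min_p_over_cutpoints(n, rng) < alpha for _ in range(reps))
    return hits / reps

rate = false_positive_rate()
print(f"nominal alpha 0.05, observed rate: {rate:.2f}")
```

Running the same loop with a single pre-specified cutpoint (e.g. only k = n // 2) brings the rate back close to the nominal level, which is the contrast with externally defined cutpoints.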
Doug
John
----------------------------------------------------------------
Dr John Whittington,        Voice:  +44 (0) 1296 730225
Mediscience Services        Fax:    +44 (0) 1296 738893
Twyford Manor, Twyford,     E-mail: Joh...@mediscience.co.uk
Buckingham MK18 4EL, UK
----------------------------------------------------------------
Peter L. Flom, PhD Statistical Consultant www DOT peterflomconsulting DOT com
Is not the point that, in reality, it is very unlikely that a ratio (or any
other fixed mathematical combination) of two variables is going to remain
an ideal predictor/guide across all values of both variables - hence the
suggestion that it is better to look at both of the values? In effect,
using your example, it would mean that for each value of total cholesterol
there would be a specific 'treatment decision threshold' in terms of LDL
level (or vice versa), with 'the ratio' (at the threshold point) not
necessarily always being the same.
Of course, that's far more complicated (both to estimate the thresholds and
to apply them) - so 'less-than-ideal' fixed combinations (e.g. a ratio) are
likely to continue to be used in practice, for their simplicity.
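The geometry behind this can be sketched briefly. A fixed ratio cut-off k amounts to the rule TC > k × LDL, a straight line through the origin in the (LDL, TC) plane, so the ratio at the threshold is always k. Any other boundary, even a straight line with a non-zero intercept, gives a threshold ratio that changes with LDL. The numbers (k = 1.6, boundary TC = 2.0 + 1.2 × LDL, in mmol/L) are hypothetical and chosen only to illustrate the point, not as clinical guidance.

```python
# All thresholds below are hypothetical illustrations, not clinical guidance.

def ratio_threshold(ldl, k=1.6):
    """Fixed-ratio rule: treat when TC > k * LDL (a line through the origin)."""
    return k * ldl

def general_threshold(ldl):
    """Hypothetical two-variable boundary: treat when TC > 2.0 + 1.2 * LDL."""
    return 2.0 + 1.2 * ldl

def implied_ratio_at_threshold(threshold_for_ldl, ldl):
    """TC/LDL ratio exactly on the decision boundary for a given LDL."""
    return threshold_for_ldl(ldl) / ldl

for ldl in (2.0, 3.0, 4.0, 5.0):
    fixed = implied_ratio_at_threshold(ratio_threshold, ldl)
    general = implied_ratio_at_threshold(general_threshold, ldl)
    print(f"LDL {ldl}: ratio rule {fixed:.3f}, general rule {general:.3f}")
```

Under the hypothetical general rule the ratio at the threshold falls from 2.2 to 1.6 as LDL rises, which is the sense in which 'the ratio' at the threshold need not always be the same.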
Kind Regards,
--------- Original Message --------
From: MedS...@googlegroups.com
To: "MedS...@googlegroups.com" <MedS...@googlegroups.com>
Subject: Re: {MEDSTATS} Re: categorising BMI
Date: 26/06/09 15:55
There are nevertheless situations where the ratio, in addition to the raw numbers, is useful.
For example, the ratio of total cholesterol to LDL may be a better guide for statin treatment decisions than either LDL or total cholesterol alone.