Splits and Trees (was Re: {MEDSTATS} Re: categorising BMI)

Peter Flom

Jun 25, 2009, 12:44:16 PM

to MedS...@googlegroups.com

Thinking some more about this -

How do the pitfalls of dichotomization relate to the use of classification and regression trees?

There are some clear differences - both in terms of how splits are found and how they are utilized - but
there seems to be a fundamental similarity.

Differences:
1. In tree analysis, the nodes from the first split are treated separately
2. Tree analysis almost always involves some form of validation - regression often does not
3. Tree analysis often involves looking at multiple models (bagging, boosting, forests, etc.); regression rarely does

Similarities:
1. Both use splits of the data - often dichotomous splits
2. Cutpoints in regression models MAY be chosen based on the data and the bivariate relationships; in tree analysis, this is always done
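
To make similarity 2 concrete, here is a minimal sketch of how a regression tree might choose its first cutpoint from the data: it tries every possible split and keeps the one with the smallest within-node sum of squared errors. The data are simulated, the BMI-like predictor and the `best_split` helper are hypothetical names used only for illustration, and real tree software adds stopping rules and pruning on top of this.

```python
import random


def best_split(x, y):
    """Return (cutpoint, sse) for the binary split on x that minimizes
    the within-node sum of squared errors of y."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    total = sum(v for _, v in pairs)
    total_sq = sum(v * v for _, v in pairs)
    best_cut, best_sse = None, float("inf")
    left_sum = left_sq = 0.0
    for i in range(n - 1):
        xi, yi = pairs[i]
        left_sum += yi
        left_sq += yi * yi
        if pairs[i + 1][0] == xi:
            continue  # cannot split between tied x values
        n_left, n_right = i + 1, n - i - 1
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        # SSE of a node = sum(y^2) - (sum(y))^2 / n
        sse = (left_sq - left_sum ** 2 / n_left) + (right_sq - right_sum ** 2 / n_right)
        if sse < best_sse:
            best_cut = (xi + pairs[i + 1][0]) / 2  # midpoint between neighbours
            best_sse = sse
    return best_cut, best_sse


# Simulated data (an assumption): the outcome rises smoothly with a
# BMI-like predictor, so there is no true threshold to find.
rng = random.Random(0)
bmi = [rng.uniform(18, 40) for _ in range(200)]
outcome = [0.5 * b + rng.gauss(0, 3) for b in bmi]

cut, sse = best_split(bmi, outcome)
print(f"data-chosen cutpoint: {cut:.1f}")
```

The search always returns some cutpoint, even though the simulated relationship is linear and has no threshold. That is the same concern raised about data-driven cutpoints in regression.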

Any thoughts or references or what-have-you appreciated.

Peter


Peter L. Flom, PhD
Statistical Consultant
www DOT peterflomconsulting DOT com

Ted Harding

Jun 25, 2009, 1:10:52 PM

to MedS...@googlegroups.com

Could people please make an effort to revert to responding in
plain text (Google will add the HTML version anyway ...).
Peter Flom's response below: quoted as received.
(I spare you the earlier one, almost unreadable, from Karl Schlag).
Thanks,
Ted.

On 25-Jun-09 16:44:16, Peter Flom wrote:
>
> <head><style>body{font-family:
> Geneva,Arial,Helvetica,sans-serif;font-size:9pt;background-color:
>#ffffff;color: black;}</style></head><body id="compText">Thinking some
>#more about this - How do the pitfalls of dichotimization relate
>#to the use of classification and regression trees? There are

> some clear differences - both in terms of how splits are found and how

> they are utilized - but there seems to be a fundamental
> similarity.  Differences:   1.  In tree analy
> sis, the nodes from the first split are treated separately  
> 2.  Tree analysis almost always involves some form of validation -
> regression often does not   3.  Tree analysis often

> involves looking at multiple model (bagging, boosting, forests, etc)

> regression rarely does Similarities   1. Both use splits
> of the data - often dichotomous splits   2. Cutpoints in

> regression models MAY be chosen based on the data and the bivariate

> relationships; in tree analysis, this is always done Any

> thoughts or references or what-have-you

> appreciated. Peter </body><pre>

>
> Peter L. Flom, PhD
> Statistical Consultant

> www DOT peterflomconsulting DOT com</pre>

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 25-Jun-09 Time: 18:10:49
------------------------------ XFMail ------------------------------

Peter Flom

Jun 25, 2009, 1:20:30 PM

to MedS...@googlegroups.com

Sorry about that ...

I will try to remember to check. I think the default on my system is to reply
in the same mode as the message I am replying to.

Peter


Ted Harding

Jun 25, 2009, 1:42:18 PM

to MedS...@googlegroups.com

On 25-Jun-09 17:20:30, Peter Flom wrote:
> Sorry about that ...
> I will try to remember to check. I think the default on my system
> is to reply in the same mode as the message I am replying to.
> Peter

Thanks, Peter! (Not that I was trying to "make an example" of you
particularly -- you just happened to be "on top of the stack").

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861

Date: 25-Jun-09 Time: 18:42:14
------------------------------ XFMail ------------------------------

Frank

Jun 28, 2009, 9:35:40 AM

to MedStats

The evils of dichotomization are one of the reasons that recursive
partitioning fails unless you have an incredibly large sample size to
make up for the loss of information. (The other problem is allowing for
all possible interactions, i.e., not using any additivity
assumptions.) Much has been written about this. See my
bibliographic database at http://biostat.mc.vanderbilt.edu/rms, near
the bottom of the page, and look for recursive partitioning or CART.

Recursive partitioning may seem to work on sample sizes below 20,000,
but this is usually a mirage. Bootstrapping reveals that the tree
architecture is really blowing in the wind.
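
The bootstrap check can be sketched roughly as follows: resample the data with replacement, refit only the first split each time, and tally which variable and cutpoint get chosen. Everything here is an assumption for illustration: the data are simulated with five predictors carrying the same weak additive signal, the sample size of 200 is arbitrary, and `best_split` and `first_split` are hypothetical helpers, not any package's API.

```python
import random
from collections import Counter


def best_split(x, y):
    """Best single cutpoint on x by within-node sum of squared errors."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    total = sum(v for _, v in pairs)
    total_sq = sum(v * v for _, v in pairs)
    best_cut, best_sse = None, float("inf")
    left_sum = left_sq = 0.0
    for i in range(n - 1):
        xi, yi = pairs[i]
        left_sum += yi
        left_sq += yi * yi
        if pairs[i + 1][0] == xi:
            continue  # bootstrap samples contain ties; skip them
        n_left, n_right = i + 1, n - i - 1
        right_sum, right_sq = total - left_sum, total_sq - left_sq
        sse = (left_sq - left_sum ** 2 / n_left) + (right_sq - right_sum ** 2 / n_right)
        if sse < best_sse:
            best_cut, best_sse = (xi + pairs[i + 1][0]) / 2, sse
    return best_cut, best_sse


def first_split(rows, y):
    """Return (variable index, cutpoint) of the best split over all predictors."""
    best = None
    for j in range(len(rows[0])):
        cut, sse = best_split([r[j] for r in rows], y)
        if cut is not None and (best is None or sse < best[2]):
            best = (j, cut, sse)
    return best[0], best[1]


# Simulated data (an assumption): purely additive signal, no thresholds.
rng = random.Random(1)
n, p = 200, 5
rows = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [0.3 * sum(r) + rng.gauss(0, 1) for r in rows]

n_boot = 100
choices, cuts = Counter(), []
for _ in range(n_boot):
    idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
    j, cut = first_split([rows[i] for i in idx], [y[i] for i in idx])
    choices[j] += 1
    cuts.append(cut)

print("first-split variable counts:", dict(choices))
print("cutpoint range:", min(cuts), max(cuts))
```

If the tallies spread across several variables and the cutpoints range widely, the root of the tree is not reproducible across resamples, and every split below it inherits that instability.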

Frank


Peter Flom

Jun 28, 2009, 9:43:11 AM

to MedS...@googlegroups.com

Frank

That page is going in my bookmark list, right away!

Looks to be chock full of interesting stuff.

Peter
