Splits and Trees (was Re: {MEDSTATS} Re: categorising BMI)


Peter Flom

Jun 25, 2009, 12:44:16 PM
to MedS...@googlegroups.com
Thinking some more about this -

How do the pitfalls of dichotomization relate to the use of classification and regression trees?

There are some clear differences - both in terms of how splits are found and how they are utilized - but
there seems to be a fundamental similarity. 

Differences:
  1. In tree analysis, the nodes from the first split are treated separately
  2. Tree analysis almost always involves some form of validation - regression often does not
  3. Tree analysis often involves looking at multiple models (bagging, boosting, forests, etc.) - regression rarely does

Similarities:
  1. Both use splits of the data - often dichotomous splits
  2. Cutpoints in regression models MAY be chosen based on the data and the bivariate relationships; in tree analysis, this is always done
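
As a rough illustration of the dichotomization pitfall raised above, here is a minimal sketch in Python (standard library only). Everything in it is an assumption made for illustration: the simulated "bmi" values, the linear outcome, the noise level, and the median cutpoint are hypothetical and are not taken from any data discussed in this thread. It compares the squared correlation of an outcome with a continuous predictor against the squared correlation with the same predictor split at its median.

```python
import math
import random


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


rng = random.Random(2009)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical simulated data: a continuous predictor and a noisy linear outcome.
bmi = [rng.gauss(27, 4) for _ in range(n)]
outcome = [0.5 * b + rng.gauss(0, 4) for b in bmi]

# Dichotomize the predictor at its sample median ("high" vs "low").
median = sorted(bmi)[n // 2]
high = [1.0 if b > median else 0.0 for b in bmi]

r2_cont = pearson_r(bmi, outcome) ** 2    # predictor kept continuous
r2_split = pearson_r(high, outcome) ** 2  # predictor cut at the median

print(f"R^2, continuous predictor:   {r2_cont:.3f}")
print(f"R^2, median-split predictor: {r2_split:.3f}")
print(f"fraction retained:           {r2_split / r2_cont:.2f}")
```

With a normally distributed predictor and a truly linear relationship, theory says a median split retains roughly 2/pi (about 64%) of the explained variance, so the last line should land somewhere near that value. A tree chooses its cutpoint from the data rather than fixing it at the median, but each split still replaces the continuous value with a two-level indicator.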


Any thoughts or references or what-have-you appreciated.


Peter

Peter L. Flom, PhD
Statistical Consultant
www DOT peterflomconsulting DOT com

Ted Harding

Jun 25, 2009, 1:10:52 PM
to MedS...@googlegroups.com
Could people please make an effort to revert to responding in
plain text (Google will add the HTML version anyway ...).
Peter Flom's response below: quoted as received.
(I spare you the earlier one, almost unreadable, from Karl Schlag).
Thanks,
Ted.

On 25-Jun-09 16:44:16, Peter Flom wrote:
>
> <head><style>body{font-family:
> Geneva,Arial,Helvetica,sans-serif;font-size:9pt;background-color:
>#ffffff;color: black;}</style></head><body id="compText">Thinking some
>#more about this - <br><br>How do the pitfalls of dichotimization relate
>#to the use of classification and regression trees?<br><br>There are
> some clear differences - both in terms of how splits are found and how
> they are utilized - but<br>there seems to be a fundamental
> similarity.&nbsp; <br><br>Differences: <br>&nbsp; 1.&nbsp; In tree analy
> sis, the nodes from the first split are treated separately<br>&nbsp;
> 2.&nbsp; Tree analysis almost always involves some form of validation -
> regression often does not<br>&nbsp; 3.&nbsp; Tree analysis often
> involves looking at multiple model (bagging, boosting, forests, etc)
> regression rarely does<br><br>Similarities<br>&nbsp; 1. Both use splits
> of the data - often dichotomous splits<br>&nbsp; 2. Cutpoints in
> regression models MAY be chosen based on the data and the bivariate
> relationships; in tree analysis, this is always done<br><br><br>Any
> thoughts or references or what-have-you
> appreciated.<br><br><br>Peter<br></body><pre>
>
> Peter L. Flom, PhD
> Statistical Consultant
> www DOT peterflomconsulting DOT com</pre>

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 25-Jun-09 Time: 18:10:49
------------------------------ XFMail ------------------------------

Peter Flom

Jun 25, 2009, 1:20:30 PM
to MedS...@googlegroups.com
Sorry about that ...

I will try to remember to check. I think the default on my system is to reply
in the same mode as the message I am replying to.

Peter


Ted Harding

Jun 25, 2009, 1:42:18 PM
to MedS...@googlegroups.com
On 25-Jun-09 17:20:30, Peter Flom wrote:
> Sorry about that ...
> I will try to remember to check. I think the default on my system
> is to reply in the same mode as the message I am replying to.
> Peter

Thanks, Peter! (Not that I was trying to "make an example" of you
particularly -- you just happened to be "on top of the stack").

Best wishes,
Ted.


Frank

Jun 28, 2009, 9:35:40 AM
to MedStats
The evils of dichotomization are one of the reasons that recursive
partitioning fails unless you have an incredibly large sample size to
make up for the loss of information [the other problem is allowing for
all possible interactions, i.e., not using any additivity
assumptions]. Much has been written about this. See my
bibliographic database at http://biostat.mc.vanderbilt.edu/rms near
the bottom of the page, and look for recursive partitioning or CART.

Recursive partitioning seems to work on sample sizes less than 20,000
but this is usually a mirage. Bootstrapping reveals that the tree
architecture is really blowing in the wind.
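
One way to see the kind of instability described above is to bootstrap only the first split of a regression tree and watch which predictor and cutpoint get chosen. The sketch below is a minimal illustration in Python (standard library only), not the analysis behind the statement above and not any particular package's algorithm. The data are simulated, and the function names best_split and root_split, the two predictors, the sample size, and the number of resamples are all hypothetical choices.

```python
import random


def best_split(xs, ys):
    """Return (sse, cutpoint) for the binary split on xs that minimizes
    the within-node sum of squared errors of ys."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total = sum(y for _, y in pairs)
    total_sq = sum(y * y for _, y in pairs)
    best = (float("inf"), None)
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot place a cutpoint between tied x values
        right_sum = total - left_sum
        # SSE of both nodes = sum(y^2) - n_L * mean_L^2 - n_R * mean_R^2
        sse = total_sq - left_sum ** 2 / i - right_sum ** 2 / (n - i)
        if sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best


def root_split(rows):
    """Choose the predictor index and cutpoint for the first split.
    Each row is (x1, x2, ..., y) with the outcome in the last position."""
    ys = [r[-1] for r in rows]
    candidates = []
    for j in range(len(rows[0]) - 1):
        sse, cut = best_split([r[j] for r in rows], ys)
        candidates.append((sse, j, cut))
    _, j, cut = min(candidates)
    return j, cut


rng = random.Random(1)  # fixed seed so the run is repeatable
n, B = 200, 200         # hypothetical sample size and number of resamples

# Simulated data: two predictors that matter equally, plus noise.
rows = []
for _ in range(n):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    rows.append((x1, x2, x1 + x2 + rng.gauss(0, 1)))

# Refit the first split on each bootstrap resample (drawn with replacement).
splits = [root_split(rng.choices(rows, k=n)) for _ in range(B)]

for j in (0, 1):
    cuts = [c for v, c in splits if v == j]
    if cuts:
        print(f"x{j + 1}: chosen {len(cuts)} of {B} times, "
              f"cutpoints from {min(cuts):.2f} to {max(cuts):.2f}")
```

If the chosen variable flips between resamples, or the cutpoint wanders over a wide range, then everything below the first split is being built on a different partition each time, which is one concrete reading of the tree "blowing in the wind".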

Frank


Peter Flom

Jun 28, 2009, 9:43:11 AM
to MedS...@googlegroups.com
Frank

That page is going in my bookmark list, right away!

Looks to be chock full of interesting stuff.

Peter
