Re: Best Subset Multiple Regression - SPSS

Gaj Vidmar

Mar 22, 2005, 10:53:05 AM
"Bruce Weaver" <bwe...@lakeheadu.ca> wrote:
> B.S....@ed.ac.uk wrote:
>
> > I am looking at data which relates to clinical variables (heart rate,
> > blood pressure etc. ) and their relationship to the stiffness of
> > arteries (continuous variable). I am interested in carrying out multiple
> > regression on the data set. I realise that SPSS does not run Best
> > Subset Multiple Regression - could someone advise what might be a good
> > alternative in SPSS and whether I should consider using Minitab instead
> > for this.
>
> Best subset regression is not a good idea. For details, see Mike
> Babyak's recent article (ref below) and Rich Ulrich's Stats FAQ on
> stepwise regression (link below).
>
> Babyak MA. What You See May Not Be What You Get: A Brief, Nontechnical
> Introduction to Overfitting in Regression-Type Models. Psychosomatic
> Medicine 66:411–421 (2004).
>
> http://www.pitt.edu/~wpilib/statfaq/regrfaq.html

Personally, I kinda like the "all subsets" procedure, in contrast to the
rightly detested standard stepwise procedures. Of course, I mean "all
subsets" as a tool for helping to select candidate (feasible, reasonable)
models, rather than for any kind of inference!

An even better tool for the same purpose, to be used in the same spirit,
could be the "leaps and bounds" algorithm. A web reference for all this is

http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/bestcp.htm

Without trying to argue from authority, let me just note that the Dataplot
software comes from NIST ...

As for performing "all subsets regression" in SPSS, you will find syntax by
that name at Raynald Levesque's great SPSS site (spsstools.net), but it will
still not be quite "it", since in addition to R-squared other measures are
usually used, mainly RMSE and Mallows' Cp. You can, of course, calculate
both for each model by hand, or if you can program in SPSS's scripting
language you might be able to automate it, or perhaps something can be done
with the new OMS, but I am too stupid and lazy for either, so the few times
I've used "all subsets" I did it with NCSS.
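
For readers without NCSS, the calculation described above (R-squared, RMSE
and Mallows' Cp for every subset of predictors) can be sketched in plain
Python. This is only an illustration under stated assumptions, not the
spsstools.net syntax or the NCSS procedure: the data values and the variable
names (heart_rate, systolic_bp, age, stiffness) are made up, RMSE is taken
as sqrt(SSE / (n - p)), and Cp uses the residual variance of the full model.

```python
from itertools import combinations

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def sse(rows, y, cols):
    """Residual sum of squares for OLS of y on an intercept plus `cols`."""
    X = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(X, y))

def all_subsets(rows, y, names):
    """Fit every non-empty subset of predictors; return R2, RMSE and Cp."""
    n, k = len(y), len(names)
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    sigma2_full = sse(rows, y, range(k)) / (n - k - 1)  # full-model MSE
    out = []
    for size in range(1, k + 1):
        for cols in combinations(range(k), size):
            e = sse(rows, y, cols)
            p = size + 1  # number of parameters, intercept included
            out.append({
                "predictors": tuple(names[c] for c in cols),
                "p": p,
                "r2": 1 - e / sst,
                "rmse": (e / (n - p)) ** 0.5,
                "cp": e / sigma2_full - n + 2 * p,
            })
    return out

# Hypothetical data, invented purely for illustration.
names = ["heart_rate", "systolic_bp", "age"]
rows = [(62, 118, 45), (70, 125, 52), (75, 130, 38), (68, 140, 60),
        (80, 135, 55), (90, 150, 63), (66, 122, 41), (85, 145, 58)]
stiffness = [7.1, 7.9, 7.0, 9.2, 8.6, 10.1, 7.0, 9.4]

models = all_subsets(rows, stiffness, names)
for m in sorted(models, key=lambda m: m["cp"]):
    label = ", ".join(m["predictors"])
    print(f"{label:<34} p={m['p']} R2={m['r2']:.3f} "
          f"RMSE={m['rmse']:.3f} Cp={m['cp']:.2f}")
```

Enumerating all subsets means 2^k - 1 fits, so this brute-force approach is
only practical for a modest number of predictors.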

In addition to the tables of statistics, NCSS also produces plots of R2,
RMSE and Cp vs. the number of predictors (each model is a point in the
scatterplot), which I found a nice tool for the "human analyst who knows
more than the computer". They might be worth producing after you do the
necessary calculations with whichever tool (but first read about the
"ideal" value of Cp).
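
For reference, the usual textbook definition of Mallows' Cp, and what is
meant by its "ideal" value, can be written as follows. The notation is an
assumption of this note, not taken from NCSS or Dataplot: n is the number of
observations, p the number of parameters in the candidate model (intercept
included), SSE_p its residual sum of squares, and \hat{\sigma}^2 the
residual mean square of the full model with all K candidate predictors.

```latex
C_p = \frac{\mathrm{SSE}_p}{\hat{\sigma}^2} - n + 2p,
\qquad
\hat{\sigma}^2 = \frac{\mathrm{SSE}_{\text{full}}}{n - K - 1}
```

If a candidate model has little bias, SSE_p is roughly (n - p) times
\hat{\sigma}^2, so Cp comes out close to p. On a plot of Cp against p, the
attractive models are therefore those with small Cp lying near the line
Cp = p. The full model has Cp = p by construction, so that point alone says
nothing.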

Finally, as a point in favour of the "all subsets" procedure, I believe you
may use it after you have already selected (and/or interpreted in more
detail) a particular model! Maybe don't even mention in the final report or
article or whatever you produce that you used it; just run it to "hear what
the automaton says", because if it tends to agree with your choice, if
nothing else it may give you the nice feeling that, even though of inferior
capabilities, there is at least someone who agrees with you :))

Regards,

Gaj Vidmar
Univ. of Ljubljana, Fac. of Medicine, Inst. of Biomedical Informatics


Bruce Weaver

Mar 22, 2005, 8:14:44 AM
B.S....@ed.ac.uk wrote:

> Hi,


>
> I am looking at data which relates to clinical variables (heart rate,
> blood pressure etc. ) and their relationship to the stiffness of
> arteries (continuous variable). I am interested in carrying out multiple
> regression on the data set. I realise that SPSS does not run Best
> Subset Multiple Regression - could someone advise what might be a good
> alternative in SPSS and whether I should consider using Minitab instead
> for this.
>

> Thanks,
> Ilyas
>

Best subset regression is not a good idea. For details, see Mike
Babyak's recent article (ref below) and Rich Ulrich's Stats FAQ on
stepwise regression (link below).

Babyak MA. What You See May Not Be What You Get: A Brief, Nontechnical
Introduction to Overfitting in Regression-Type Models. Psychosomatic
Medicine 66:411–421 (2004).

http://www.pitt.edu/~wpilib/statfaq/regrfaq.html


--
Bruce Weaver
bwe...@lakeheadu.ca
www.angelfire.com/wv/bwhomedir

B.S....@ed.ac.uk

Mar 22, 2005, 8:01:43 AM