On Sun, 10 Jul 2016 06:36:24 -0700 (PDT),
hansra...@gmail.com
wrote:
>
>The answer did not help.
>
>My question is: when one is using Mallows' Cp, what is the acceptable norm (provided that the sample size is decent and the explanatory variables used in the multiple regression can reasonably be expected to affect the dependent variable)? Of course, the Cp statistic has its shortfalls, and there are other arguable statistics, like AIC and BIC, that can be used for model selection.
>
>Would help if someone can answer with respect to the Cp statistic only.
The Wikipedia article asserts, early on,

    "Mallows's Cp has been shown to be equivalent to Akaike
    information criterion in the special case of Gaussian linear
    regression." [3]

And that [3] is a 2013 reference. So Cp and AIC must be equally
questionable.
You mean, "How should you proceed?" Take the advice and
ignore models with Cp < p, because that /does/ indicate over-fitting.
Logically, I can see it. Your problem seems to be that you don't find
that advice stated explicitly in Mallows's original paper, and you
wonder whether this is a myth that has survived the last 40 years
without being questioned. Well, what got questioned (as I see it) is
the worth of that sort of judgement, so the details became moot.
Perhaps you could write to one of the authors who seem to mis-assign
the credit to Mallows, if you wish to persevere.
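Not from the thread, but for concreteness: the usual definition is
Cp = SSE_p / s^2 - n + 2p, with s^2 the residual mean square from the
full model and p counting the submodel's parameters (intercept
included). A minimal numpy sketch, on simulated data of my own making
(the variable names and the noise predictor x3 are illustrative
assumptions, not anything from the original exchange):

```python
import numpy as np

def ols_sse(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def mallows_cp(sse_p, s2_full, n, p):
    """Mallows' Cp = SSE_p / s^2 - n + 2p."""
    return sse_p / s2_full - n + 2 * p

rng = np.random.default_rng(0)
n = 100
x1, x2, x3 = rng.normal(size=(3, n))      # x3 is pure noise
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

full = np.column_stack([np.ones(n), x1, x2, x3])
k_full = full.shape[1]
s2 = ols_sse(full, y) / (n - k_full)      # s^2 from the full model

for cols, label in [((0, 1), "x1 only"),
                    ((0, 1, 2), "x1 + x2"),
                    ((0, 1, 2, 3), "full model")]:
    X = full[:, cols]
    p = X.shape[1]
    cp = mallows_cp(ols_sse(X, y), s2, n, p)
    print(f"{label:10s}  p={p}  Cp={cp:8.2f}")
```

The biased "x1 only" model shows Cp far above its p, while the full
model's Cp equals its p exactly, by algebra; that identity is one
reason the "Cp near p" yardstick gets debated.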
This SPSS group, and the stats groups as well, tended toward the
social sciences when they were active; this group exists especially
for SPSS advice, though we do what we can with the other questions.
I pointed out that the whole area of "stepwise" selection is now
regarded with some skepticism in the social sciences. If your
particular specialty still likes stepwise, then you should probably
search for advice or examples in the literature of your own area.
There are textbooks on data mining, where "selection" is a renewed
problem. I can't say whether I ran across Cp in my scanning of
data-mining textbooks, years ago; but I think that cross-validation
was the general advice.
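For what cross-validation looks like in this setting, here is a
minimal numpy-only sketch (my own simulated data again; the k-fold
splitting is the textbook scheme, not any particular package's):

```python
import numpy as np

def cv_mse(X, y, k=5, seed=0):
    """k-fold cross-validated mean squared prediction error
    for an OLS fit of y on X."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)          # indices not in this fold
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        errs.append(np.mean((y[fold] - X[fold] @ beta) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(1)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))             # x3 is pure noise
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)
full = np.column_stack([np.ones(n), x1, x2, x3])

for cols, label in [((0, 1), "x1 only"),
                    ((0, 1, 2), "x1 + x2"),
                    ((0, 1, 2, 3), "x1 + x2 + x3")]:
    print(f"{label:13s} CV-MSE = {cv_mse(full[:, cols], y):.3f}")
```

The under-specified "x1 only" model pays a visible prediction
penalty, which is the property that makes cross-validation the usual
default where Cp-style rules are distrusted.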
--
Rich Ulrich