Adrian
2008/5/27 <martin...@yahoo.co.uk>:
> When doing logistic regression, I've always understood that there must
> be at least 10 of the rarer observations per parameter going into the
> model - as an absolute minimum. Then I read in "Categorical Data
> Analysis using the SAS System" by Maura E. Stokes et al (a SAS
> Institute publication): "Your choice depends partially on the sample
> size. There should be at least 5 observations for the rarer outcome
> per parameter being considered in the expanded model. Some analysts
> would prefer at least 10".
>
> Does anyone know of any support for 5?
As far as I know, the only empirical support for any of these rules
comes from a publication by Frank Harrell. He simulated some models and
showed that stepwise procedures did not replicate well when a ratio of
10-15 events per independent variable was not maintained.
Obviously this would not apply to models that use cross-validation like
CART. Also, if you are willing to cite your work as exploratory and
needing replication prior to use in the real world, then a 10-15 to 1
ratio is not as critical.
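To make the arithmetic behind these rules of thumb concrete, here is a minimal sketch in Python. The function names and the example counts (60 events, 440 non-events, 8 candidate parameters) are made up for illustration and do not come from Harrell's paper or the SAS book. It only divides the count of the rarer outcome by the number of candidate parameters and compares the result with the 5, 10 and 15 thresholds mentioned in this thread.

```python
def events_per_parameter(n_events, n_nonevents, n_parameters):
    """Observations of the rarer outcome per candidate parameter."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    # The rule of thumb is based on the rarer of the two outcomes.
    rarer = min(n_events, n_nonevents)
    return rarer / n_parameters


def max_parameters(n_events, n_nonevents, min_per_parameter=10):
    """Largest number of parameters that still meets the chosen ratio."""
    if min_per_parameter <= 0:
        raise ValueError("min_per_parameter must be positive")
    rarer = min(n_events, n_nonevents)
    return rarer // min_per_parameter


# Hypothetical data set: 60 events, 440 non-events, 8 candidate parameters.
ratio = events_per_parameter(60, 440, 8)
print(f"events per parameter: {ratio:.1f}")
for threshold in (5, 10, 15):
    limit = max_parameters(60, 440, threshold)
    print(f"at least {threshold} per parameter -> at most {limit} parameters")
```

With these made-up counts the model would satisfy the "at least 5" rule but not the "at least 10" rule, which is exactly the gap the original question is about.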
I believe the proper citation is
Regression modeling strategies for improved prognostic prediction. Frank
E. Harrell. Statistics in Medicine 1984; 3: 143-152.
but I don't have the article in front of me right now to verify this.
I mention these issues briefly at
http://www.childrensmercy.org/stats/weblog2004/survival.asp
and
http://www.childrensmercy.org/stats/weblog2004/ratioobsivs.asp
but do not address this in the detail that it deserves.
Steve Simon, ssi...@cmh.edu, Standard Disclaimer
Evidence Based Medicine gives my book 4/4.5 stars out of five!
Full text is at http://ebm.bmj.com/cgi/content/full/12/2/59
However, I am not sure. Overdispersion is one concept which I really
struggle with. I have been reading the grail of McCullagh and Nelder,
and I can't find a specific reference to the number of parameters to
estimate and its consequences for an overdispersed model,
except to say that if you have too many parameters you won't be able to
estimate the overdispersion parameter.
Not sure though?
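One way to see why too many parameters gets in the way: the usual moment estimate of the dispersion parameter is the Pearson chi-square statistic divided by the residual degrees of freedom, n - p. A minimal sketch follows, assuming a Poisson-type variance function V(mu) = mu. The function name and the numbers are hypothetical and are not taken from any real fit. As p approaches n the divisor shrinks to zero and the estimate is no longer defined.

```python
def pearson_dispersion(y, mu, n_params, variance=lambda m: m):
    """Pearson estimate of dispersion: X^2 / (n - p).

    y        : observed responses
    mu       : fitted means from the model
    n_params : number of estimated regression parameters (p)
    variance : variance function V(mu); defaults to Poisson, V(mu) = mu
    """
    if len(y) != len(mu):
        raise ValueError("y and mu must have the same length")
    resid_df = len(y) - n_params
    if resid_df <= 0:
        # No residual degrees of freedom left to estimate dispersion.
        raise ValueError("need more observations than parameters")
    chi2 = sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu))
    return chi2 / resid_df


# Hypothetical counts and fitted means from a two-parameter model.
y = [2, 5, 9, 14]
mu = [3.0, 5.0, 8.0, 14.0]
print(pearson_dispersion(y, mu, n_params=2))
```

Values well above 1 would suggest overdispersion relative to the assumed variance function. With four observations and four parameters the function raises an error, which is the situation described above.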
Adrian
2008/5/28 <martin...@yahoo.co.uk>: