Internal validation of logistic regression model with bootstrap doubt


Marta Garcia-Granero

Dec 13, 2013, 6:06:48 AM
to meds...@googlegroups.com
Hi everybody:

I am working with an ophthalmologist on a binary multivariable logistic
regression model for glaucoma (3 relevant predictors, selected manually
and carefully from all the data generated by an OCT apparatus, and
adjusted for age). I have generated 1000 bootstrap samples from the
original dataset (sample size = 426; 181 glaucoma cases), obtained the
1000 sets of b-coefficients for those predictors, checked their
distributions and average values, and computed the bias.

Now my doubt: which is better, to apply the original model to the 1000
bootstrap samples to obtain 1000 c-statistics (fix the b-coefficients,
use them to generate the predicted values, and compute the AUC), or to
refit 1000 slightly different models to generate the predicted
probabilities and then the 1000 AUCs? As I see it, the first approach
would amount to directly bootstrapping the predicted values, while the
second would allow the same case to get a different predicted
probability in each sample.

Thanks in advance

Marta García-Granero
Dpt. of Biochemistry and Genetics, Biostatistics Unit
University of Navarra

Giulio Flore

Dec 13, 2013, 6:17:59 AM
to meds...@googlegroups.com
Hi,

I assume that what you want to achieve here is a validation of the C-statistic. If you intend to keep all the predictors from the first phase, you should do the same for the C/AUC values, as both will be expressions of the bootstrapping process. If you hold the coefficients fixed (they are estimates of their bootstrap distribution), you get a C-statistic/AUC distribution that 1) is only partially representative of the original bootstrap exercise and 2) will have a lower, and probably unrealistic, variance.

In brief, for consistency's sake, you ought to obtain the bootstrap distributions of all relevant model outputs at the same time.
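The distinction can be sketched in a few lines of Python on synthetic data. Everything here is an illustrative stand-in: `X` and `y` play the role of the OCT predictors and glaucoma status, and the logistic fit is a bare Newton-Raphson helper rather than the real clinical model.

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Newton-Raphson maximum-likelihood fit of a logistic model with intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    b = np.zeros(Xd.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-Xd @ b))
        # Hessian = X' diag(p(1-p)) X ; gradient = X'(y - p)
        b += np.linalg.solve((Xd.T * (p * (1 - p))) @ Xd, Xd.T @ (y - p))
    return b

def predict(b, X):
    return 1 / (1 + np.exp(-(np.column_stack([np.ones(len(X)), X]) @ b)))

def auc(y, p):
    """C-statistic: fraction of case/control pairs ranked correctly."""
    return (p[y == 1][:, None] > p[y == 0]).mean()

rng = np.random.default_rng(0)
n = 426                                   # same size as the original dataset
X = rng.normal(size=(n, 4))               # stand-in for 3 predictors + age
y = (X @ np.array([1.0, 0.8, 0.5, 0.3]) + rng.normal(size=n) > 0).astype(int)

b_orig = fit_logit(X, y)                  # the "original" model

auc_fixed, auc_refit = [], []
for _ in range(200):
    idx = rng.integers(0, n, n)           # one bootstrap resample
    Xb, yb = X[idx], y[idx]
    # Scheme 1: coefficients held fixed at the original fit
    auc_fixed.append(auc(yb, predict(b_orig, Xb)))
    # Scheme 2: model refitted on each bootstrap sample
    auc_refit.append(auc(yb, predict(fit_logit(Xb, yb), Xb)))

# Refitting lets the coefficient uncertainty feed into the AUC
# distribution; holding the coefficients fixed suppresses that source
# of variability. Compare the spread of the two distributions:
print(np.std(auc_fixed), np.std(auc_refit))
```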

Regards

Giulio




--
--
To post a new thread to MedStats, send email to MedS...@googlegroups.com .
MedStats' home page is http://groups.google.com/group/MedStats .
Rules: http://groups.google.com/group/MedStats/web/medstats-rules

--- You received this message because you are subscribed to the Google Groups "MedStats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to medstats+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Marta Garcia-Granero

Dec 13, 2013, 9:10:03 AM
to meds...@googlegroups.com
Thanks a lot, Giulio, for your quick, concise, and clarifying answer. That was what I thought.

Regards,
MGG


Frank Harrell

Dec 13, 2013, 3:01:20 PM
to meds...@googlegroups.com
I don't think what you've described is the appropriate Efron-Gong optimism bootstrap for internal validation of predictive accuracy.  Also, examining the 1000 sets of coefficients is not very productive.

Frank

Marta Garcia-Granero

Dec 16, 2013, 10:37:30 AM
to meds...@googlegroups.com
Hi Frank:

Thank you for your comments. I have looked for the Efron-Gong paper, and also one of yours, but I'm still a bit at a loss. Could you please indicate whether I'm on the right path?

1) Fit the logistic model on each of my 1000 bootstrap samples.
2) Apply each of those 1000 models to my original dataset and compute its AUC there.
3) Average those 1000 AUCs and compare the average to the AUC of the original model.

Am I correct?
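One common reading of the Efron-Gong procedure is slightly different from the steps above: the correction is made on the *optimism* (bootstrap-sample AUC minus original-sample AUC of the same refitted model), whose average is subtracted from the apparent AUC. A minimal Python sketch on synthetic stand-in data (`X`, `y`, the Newton-fit helper, and the C-statistic are all illustrative, not the real glaucoma model):

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Newton-Raphson maximum-likelihood fit of a logistic model with intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    b = np.zeros(Xd.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-Xd @ b))
        b += np.linalg.solve((Xd.T * (p * (1 - p))) @ Xd, Xd.T @ (y - p))
    return b

def predict(b, X):
    return 1 / (1 + np.exp(-(np.column_stack([np.ones(len(X)), X]) @ b)))

def auc(y, p):
    """C-statistic: fraction of case/control pairs ranked correctly."""
    return (p[y == 1][:, None] > p[y == 0]).mean()

rng = np.random.default_rng(1)
n = 426
X = rng.normal(size=(n, 4))
y = (X @ np.array([1.0, 0.8, 0.5, 0.3]) + rng.normal(size=n) > 0).astype(int)

# Apparent AUC: the model evaluated on its own training data
apparent = auc(y, predict(fit_logit(X, y), X))

optimism = []
for _ in range(200):
    idx = rng.integers(0, n, n)
    b = fit_logit(X[idx], y[idx])               # 1) refit on the bootstrap sample
    auc_boot = auc(y[idx], predict(b, X[idx]))  #    ...its apparent AUC there
    auc_orig = auc(y, predict(b, X))            # 2) same model on the original data
    optimism.append(auc_boot - auc_orig)        #    per-replicate optimism

corrected = apparent - np.mean(optimism)        # 3) optimism-corrected AUC
print(round(apparent, 3), round(corrected, 3))
```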

Thanks in advance

Marta GG

SR Millis

Dec 16, 2013, 11:04:54 AM
to meds...@googlegroups.com
Here's what I do, in R:

library(rms)

# Fit the logistic model, keeping the design matrix and response
# (x=TRUE, y=TRUE) so that validate() and calibrate() can resample:
fit <- lrm(group ~ LDFR + DISC + dP, data = cv_old, x = TRUE, y = TRUE)
fit

residuals(fit, type = 'gof')      # goodness-of-fit test

v <- validate(fit, B = 500)       # bootstrap optimism-corrected indices
v                                 # (Dxy, from which C = Dxy/2 + 0.5)

cal <- calibrate(fit, B = 1000)   # overfitting-corrected calibration curve
plot(cal)

 
~~~~~~~~~~~
Scott R Millis, PhD, ABPP, CStat, PStat®
Board Certified in Clinical Neuropsychology, Clinical Psychology, & Rehabilitation Psychology 
Professor
Wayne State University School of Medicine
Email: aa3...@wayne.edu
Email: srmi...@yahoo.com
Tel: 313-993-8085

Marta Garcia-Granero

Dec 16, 2013, 11:47:31 AM
to meds...@googlegroups.com
Hi:

Unfortunately, I am working with SPSS (both the original dataset and the bootstrapped samples), although I could export my data to Stata. That's why I am asking about the steps involved, so I can program them in SPSS. I have already written the bootstrap-sample generator, and I use OMS to extract the relevant output for further analysis.

Is there a paper that explains in detail what the R package does? I might be able to program the same in SPSS.

Thanks,
Marta

SR Millis

Dec 16, 2013, 12:49:37 PM
to meds...@googlegroups.com
Frank Harrell, in his book Regression Modeling Strategies, provides an excellent discussion of bootstrapping to estimate the amount of over-optimism in a model.

SR Millis
 
~~~~~~~~~~~
Scott R Millis, PhD, ABPP, CStat, PStat®
Board Certified in Clinical Neuropsychology, Clinical Psychology, & Rehabilitation Psychology 
Professor
Wayne State University School of Medicine
Email: aa3...@wayne.edu
Email: srmi...@yahoo.com
Tel: 313-993-8085