good / bad validation stats in rms


Daniel Smith

Oct 16, 2015, 11:40:06 PM
to regmod
Dear Frank,

The statistics that are displayed after a bootstrap validation in rms are all new to me. I have been doing my best to get a rough idea of what they are, but I can't seem to find any guidelines on what counts as a good / ok / bad validation statistic - for example, what is a good value of Dxy? I have seen statements like ‘the validation indicates some overfitting’, but I don't know what values of the validation stats would allow someone to say that. Perhaps there are no conventions? But then how do we know whether our model has been validated?! Which of the validation stats do we use - all of them, or just a few? Or is it more the case that we fit a few models and then decide which is best using the validation stats?
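
For context, this is the kind of run I mean (a rough sketch with a made-up data frame d, binary outcome y, and predictors x1 and x2):

  library(rms)
  dd <- datadist(d); options(datadist = "dd")
  f <- lrm(y ~ x1 + x2, data = d, x = TRUE, y = TRUE)  # x, y needed by validate()
  set.seed(1)
  v <- validate(f, B = 200)  # 200 bootstrap repetitions
  v  # prints Dxy, R2, Slope, ... with original, optimism, and corrected columns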

Many thanks!

Frank Harrell

Oct 17, 2015, 9:06:22 AM
to regmod
It's all relative and any cutoffs would be arbitrary. Also, we sometimes worry about the amount of optimism (overfitting), and sometimes we just care about how decent the final overfitting-corrected indexes are.

If you use validation to select from among more than 2 competing models, you will have to enclose the entire process in an outer bootstrap loop to properly penalize for this additional layer of data dredging.
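
One rough sketch of such an outer loop (assuming two hypothetical candidate formulas f1 and f2, a data frame d with binary outcome y, and corrected Dxy as the selection criterion):

  library(rms)
  select_best <- function(data, B_inner = 100) {
    fits <- list(lrm(f1, data = data, x = TRUE, y = TRUE),
                 lrm(f2, data = data, x = TRUE, y = TRUE))
    dxy  <- sapply(fits, function(m) validate(m, B = B_inner)["Dxy", "index.corrected"])
    fits[[which.max(dxy)]]          # winner of the inner validation contest
  }
  set.seed(1)
  final    <- select_best(d)        # the selection you would actually report
  apparent <- somers2(predict(final, d, type = "fitted"), d$y)["Dxy"]
  optimism <- replicate(100, {      # outer bootstrap over the whole process
    db <- d[sample(nrow(d), replace = TRUE), ]
    fb <- select_best(db)           # redo fitting *and* selection on the resample
    somers2(predict(fb, db, type = "fitted"), db$y)["Dxy"] -
      somers2(predict(fb, d,  type = "fitted"), d$y)["Dxy"]
  })
  apparent - mean(optimism)         # Dxy corrected for fitting + selection

The key point is that select_best, i.e., the entire selection process, is repeated on every outer resample, not just the refitting of a single pre-chosen model.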

You can think of 1 minus the calibration slope as a "proportion of overfitting", i.e., the fraction of what was learned from the data that was based on noise.
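
In code, using the matrix returned by validate() (the hypothetical object v from an earlier validate() run):

  1 - v["Slope", "index.corrected"]  # rough "proportion of overfitting"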