When to use bootstrapping


Emma Jane

Dec 4, 2009, 10:03:01 AM
to MedStats
Dear all

I'm new to bootstrapping and I'd appreciate your opinions on the
following:

Let's say I have a moderately sized random sample and I wish to
calculate a mean and 95% CI.

(1) Are there any theoretical advantages to using sampling theory over
bootstrap resampling to generate the CI?

(2) What if the sample was non-random?

(3) What about more interesting statistics e.g. 95% CI for a median,
difference between 2 means?

(4) What if I had a small sample? Which of course begs the question,
how big is big enough!

Many thanks,

Emma Jane
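
For questions (1) and (3), a percentile bootstrap interval can be sketched as below. This is only a minimal illustration in Python (standard library only), not a recommendation for any particular dataset: the function name bootstrap_ci, the sample values, and the choice of 2000 resamples are all assumptions made for the example. The same function handles the mean or the median simply by swapping the statistic.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Recompute the statistic on n_boot resamples drawn with replacement,
    # each the same size as the original sample.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lo_idx = round((alpha / 2) * n_boot)
    hi_idx = round((1 - alpha / 2) * n_boot) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Made-up measurements, purely for illustration.
sample = [4.1, 5.3, 2.8, 6.0, 5.5, 3.9, 4.7, 5.1, 6.4, 4.4, 3.2, 5.8]

mean_ci = bootstrap_ci(sample, statistics.mean)
median_ci = bootstrap_ci(sample, statistics.median)
print("mean:", statistics.mean(sample), "95% CI:", mean_ci)
print("median:", statistics.median(sample), "95% CI:", median_ci)
```

The percentile method is the simplest variant; it still assumes the sample is a random draw from the population of interest, so it does not rescue a non-random sample (question 2), and with very small samples the resampled distribution can be too coarse to trust (question 4).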

Thomas Keller

Dec 5, 2009, 4:19:25 AM
to MedStats

roland andersson

Mar 5, 2011, 10:01:38 AM
to meds...@googlegroups.com
Thomas

We are planning to construct a clinical score for diagnosing a disease. We have 500 patients and will construct the score from about 8-10 variables. In a previous project we divided another set of patients into two samples: one construction sample and one validation sample. For this new project, with new patients and new variables, it has been proposed that we should use resampling methods. Can you give us some advice about this issue?

Regards

Roland E Andersson, MD PhD 

2009/12/5 Thomas Keller <tho...@gmx.de>
--~--~---------~--~----~------------~-------~--~----~
To post a new thread to MedStats, send email to MedS...@googlegroups.com .
MedStats' home page is http://groups.google.com/group/MedStats .
Rules: http://groups.google.com/group/MedStats/web/medstats-rules
-~----------~----~----~----~------~----~------~--~---


Marc Schwartz

Mar 5, 2011, 10:24:56 AM
to meds...@googlegroups.com
Roland,

Why not use k-fold cross-validation:


I don't know of too many folks these days that would proactively suggest split-sample approaches, although I do see it used.
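
As a rough sketch of what k-fold cross-validation does (Python, standard library only): the data are shuffled into k non-overlapping folds, each fold is held out once, and the model is rebuilt from scratch on the remaining folds. The names k_fold_indices, cross_validate, fit and score are hypothetical, and the "model" here (predict the training mean, scored by mean squared error on made-up data) is only a placeholder for a real modelling procedure.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Split range(n) into k shuffled, near-equal, non-overlapping folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(rows, fit, score, k=10, seed=0):
    """Average out-of-fold score; each row is held out exactly once."""
    scores = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train = [r for i, r in enumerate(rows) if i not in held]
        test = [rows[i] for i in held_out]
        model = fit(train)  # every modelling step is repeated on the training folds
        scores.append(score(model, test))
    return sum(scores) / len(scores)

# Placeholder example: made-up outcomes, a mean-only "model", MSE as the metric.
rng = random.Random(1)
rows = [rng.gauss(0, 1) for _ in range(500)]
fit = lambda train: sum(train) / len(train)
score = lambda model, test: sum((y - model) ** 2 for y in test) / len(test)

cv_mse = cross_validate(rows, fit, score, k=10)
print("10-fold cross-validated MSE:", cv_mse)
```

The point of the design is that all 500 patients contribute to both model building and testing, but never to both within the same fold.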

HTH,

Marc Schwartz

Gary Collins

Mar 5, 2011, 11:29:02 AM
to meds...@googlegroups.com

If you want to throw away some of your data, have inappropriately optimistic performance data and produce a “rule” which is relatively useless, then the split-sample approach is great.

 

With n of this magnitude (though it is the number of events in your sample, not n, that drives the model-building process, contrary to what some believe), a split-sample approach runs the risk of leaving you with too few events to derive a useful (and stable) model and also too few events to validate the model appropriately.  Splitting a sample into two datasets only creates two very similar (smaller) datasets (apart from differences by chance); as such it does not provide anything meaningful, and your model will likely fail when subjected to external validation.

 

So if you want to make the most of your data then you should be thinking about using bootstrapping.  You can then develop, validate (internal validation) and estimate optimism in model performance using all your data and not throw portions of it away.

 

Many papers show the split-sample approach to be inappropriate, with bootstrapping the preferred alternative, including (amongst others):

 

Hirsch, R. P. (1991). "Validation Samples." Biometrics 47(3): 1193-1194.

Steyerberg, E. W., F. E. Harrell Jr, et al. (2001). "Internal validation of predictive models: efficiency of some procedures for logistic regression analysis." J Clin Epidemiol 54: 774-781.

Steyerberg, E. W. (2009). Clinical prediction models: a practical approach to development, validation, and updating. New York, Springer.

 

Hope this helps.

 

Gary

 

---------------------------------------------------------------

Dr Gary S Collins                       Tel: +44 (0)1865 284418

Senior Medical Statistician             Fax: +44 (0)1865 284424

Centre for Statistics in Medicine         www.csm-oxford.org.uk

University of Oxford,

Wolfson College Annexe, Linton Road, Oxford, OX2 6UD

---------------------------------------------------------------


roland andersson

Mar 5, 2011, 5:18:38 PM
to meds...@googlegroups.com
Gary and Marc

Thanks a lot. I will check out k-fold cross validation. I would appreciate any hint on how to use the methods that are available in Stata for this purpose. 

Roland   

2011/3/5 Gary Collins <gary.c...@csm.ox.ac.uk>

Marc Schwartz

Mar 5, 2011, 7:20:03 PM
to meds...@googlegroups.com
Hi Roland,

Unfortunately, I do not use Stata, but R, so am unfamiliar with the functionality that might be available in that application. If there are standard functions available specific to this methodology, I would envision that they would be noted in the application's documentation along with examples.

Alternatively, you may need to post to a Stata specific forum or contact Stata's support folks, who should be able to guide you further.

Regards,

Marc

roland andersson

Mar 7, 2011, 1:26:53 AM
to meds...@googlegroups.com
Gary

You are kicking in an open door. I am a surgeon and not a statistician.

Please give some more positive advice about how to construct the model
and how to validate. I used ordered logistic regression because I have
two levels of the stage of the disease. Which sample should I use for
the construction? How should the validation be done, according to you?

Greetings
Roland

2011/3/5 Gary Collins <gary.c...@csm.ox.ac.uk>:

Frank Harrell

Mar 7, 2011, 2:42:09 PM
to MedStats
I think you missed Gary's message. His advice was spot-on. Split
sample validation requires > 20,000 subjects before it is reliable.

There are entire books on the subject (such as Steyerberg's Clinical
Prediction Models and my Regression Modeling Strategies) just as there
are on surgical subspecialties.

Frank

On Mar 7, 12:26 am, roland andersson <rolanders...@gmail.com> wrote:
> Gary
>
> You are kicking in an open door. I am a surgeon and not a statistician.
>
> Please give some more positive advice about how to construct the model
> and how to validate. I used ordered logistic regression because I have
> two levels of the stage of the disease. Which sample should I use for
> the construction? How should the validation be done, according to you?
>
> Greetings
> Roland
>
> 2011/3/5 Gary Collins <gary.coll...@csm.ox.ac.uk>:

roland andersson

Mar 7, 2011, 3:13:34 PM
to meds...@googlegroups.com
Frank

I did not miss it. In my first posting I referred to the methodology I
used in a previous study. I assumed that it was not optimal. Therefore
I asked for advice about the methodology that should be used in a
future study. The answer I got was that I was using bad and
inefficient methods, which I already knew. I had opened the door but
Gary smashed it.

However, I can leave that and am still asking for advice on the
optimal method to construct a clinical score for predicting a disease,
and for validating this score. I have 500 new patients with suspicion
of the disease, 200 of whom were finally diagnosed with the disease.
120 had the mild form and 80 the more serious form. I have 8-10
predictors which are all significant discriminators according to
ordered logistic regression.

I use Stata, and if someone is familiar with that software you can give
me more precise guidance. However, as this forum is not bound to any
specific software, I will gladly receive any advice given from a
general perspective.

A general reference to a textbook is of some help, of course, but I
would appreciate some more practical advice.


Regards

Roland Andersson


2011/3/7 Frank Harrell <f.ha...@vanderbilt.edu>:

SR Millis

Mar 7, 2011, 3:28:37 PM
to meds...@googlegroups.com
Roland,

In terms of your request for "more practical advice," I'd strongly recommend that you seek consultation from an appropriately trained statistician to find a solution to your problem.

However, as long as we're on the topic, can you tell me how to perform an appendectomy?

Best wishes,
SR Millis

~~~~~~~~~~~
Scott R Millis, PhD, ABPP, CStat, PStat®
Professor
Wayne State University School of Medicine
Email: aa3...@wayne.edu
Email: srmi...@yahoo.com
Tel: 313-993-8085


--- On Mon, 3/7/11, roland andersson <roland...@gmail.com> wrote:

Richard Goldstein

Mar 7, 2011, 3:29:35 PM
to meds...@googlegroups.com

I have not been following this thread but I do note that for the
Stata-specific aspect of the issue it would be better to send a question
to either tech-support or to the Stata listserv (you can join at
www.stata.com). I also note that Stata has quite extensive built-in
routines for bootstrapping so that all the set-up and output is done for you

Rich


Gary Collins

Mar 7, 2011, 4:04:09 PM
to meds...@googlegroups.com
Hi Roland,

I gave advice based on your brief initial posting and cautioned against using a split-sample approach, since it was not clear (to me) from your posting that you were aware of the major problems of conducting such an analysis. I also offered suggestions on how I would personally tackle your problem.

However, an additional piece of advice from your subsequent email is also to be very wary of using univariate regression models to identify predictors to include in a multivariable model. Selecting risk factors based solely on p-values from such univariate screening has been shown to be a poor approach and should be strongly cautioned against.

Sun, G. W., T. L. Shook, et al. (1996). "Inappropriate use of bivariable analysis to screen risk factors for use in multivariable analysis." J Clin Epidemiol 49(8): 907-916.

Either fit a full model with all predictors and perform some kind of shrinkage (or penalized approach), or use clinical judgement (and not statistical criteria) to remove redundant variables. One crude measure of the potential for over-fitting is the so-called events per variable, which should be at least 10 but preferably higher. This is not the number of variables in your final model, but the number of (candidate) variables you considered at the outset that were subject to some kind of statistical testing, and this includes interactions and other types of terms.
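
As a back-of-envelope illustration of the events-per-variable check, here is a small Python sketch using the counts reported earlier in the thread (500 patients, 200 with the disease, up to 10 candidate predictors). The function name events_per_variable is hypothetical, and taking the smaller outcome group as the limiting count is an assumption borrowed from the usual convention for a binary outcome; it is not a rule for the ordinal model discussed here.

```python
def events_per_variable(n_events, n_nonevents, n_candidate_params):
    """Crude over-fitting check: limiting outcome count per candidate parameter.

    n_candidate_params should count every term that was considered,
    including interactions, not just the terms kept in the final model.
    """
    limiting = min(n_events, n_nonevents)  # assumed convention for a binary outcome
    return limiting / n_candidate_params

# Disease vs no disease, with 10 candidate predictors.
epv_any_disease = events_per_variable(200, 300, 10)

# The same check if the 80 patients with the serious form are the limiting group.
epv_serious_form = events_per_variable(80, 420, 10)

print(epv_any_disease, epv_serious_form)
```

On these assumptions the first figure clears the "at least 10" guide and the second does not, which is why the count that matters is the smallest outcome category rather than the total n.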

Re practical advice: I don't use Stata (but R), so I can't comment on the relevant syntax you need. However, as Frank has pointed out (in addition to my earlier reply), Ewout Steyerberg's book Clinical Prediction Models and Frank's book Regression Modeling Strategies cover bootstrapping in the context of developing and validating a prediction model in detail and at an easily accessible level; I suggest you consult them for further information. The basic principles and practical steps are (with possible variants):

(1) Build a model using all the data and evaluate this model using some performance metric of interest (discrimination or calibration). This is the apparent performance.
(2) Sample with replacement from the original data to obtain a bootstrap sample of the same size.
(3) Build a model from this sample using the same steps as in step (1).
(4) Evaluate this model (built from the sampled data) both in the bootstrap sample and in the original data, using whatever metric you are interested in.
(5) Calculate the difference between these two performance estimates (bootstrap-sample performance minus original-data performance); this is one estimate of the optimism.
(6) Repeat steps 2-5 at least 200 times and average the differences to obtain a reasonably stable estimate of the optimism.

You can then subtract this value of the optimism from the performance metric obtained in (1) to obtain an optimism-corrected performance estimate. Following these steps you can derive and evaluate your model and obtain some estimate of optimism.
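
The steps above can be sketched in Python (standard library only) as follows. This is a hedged illustration, not the procedure from either book: the data are simulated, the names auc, fit, score and optimism_corrected are hypothetical, and the toy "model" (keep the single predictor with the highest c-statistic) merely stands in for whatever modelling procedure is actually used. What matters is that fit repeats every data-driven step, including variable selection, inside each bootstrap sample.

```python
import random

def auc(scores, labels):
    """c-statistic: chance that a random case scores above a random non-case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5  # undefined without both outcome classes; treat as chance
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def fit(rows):
    """Toy model building: return the index of the predictor with the best AUC."""
    labels = [y for _, y in rows]
    n_pred = len(rows[0][0])
    return max(range(n_pred), key=lambda j: auc([x[j] for x, _ in rows], labels))

def score(model, rows):
    """Performance (AUC) of a fitted model, i.e. a predictor index, on rows."""
    return auc([x[model] for x, _ in rows], [y for _, y in rows])

def optimism_corrected(rows, n_boot=200, seed=0):
    rng = random.Random(seed)
    apparent = score(fit(rows), rows)              # step 1: model on all data
    diffs = []
    for _ in range(n_boot):                        # step 6: repeat steps 2-5
        boot = rng.choices(rows, k=len(rows))      # step 2: resample with replacement
        model = fit(boot)                          # step 3: rebuild the model
        # steps 4-5: bootstrap-sample performance minus original-data performance
        diffs.append(score(model, boot) - score(model, rows))
    optimism = sum(diffs) / n_boot
    return apparent, optimism, apparent - optimism

# Simulated data: 5 candidate predictors, only the first related to the outcome.
rng = random.Random(1)
rows = []
for _ in range(150):
    x = [rng.gauss(0, 1) for _ in range(5)]
    y = 1 if x[0] + rng.gauss(0, 1) > 0 else 0
    rows.append((x, y))

apparent, optimism, corrected = optimism_corrected(rows)
print("apparent:", apparent, "optimism:", optimism, "corrected:", corrected)
```

Because fit and score are passed the resampled rows unchanged, a real modelling procedure (for example a regression with its selection and shrinkage steps) could be substituted for the toy one without altering the bootstrap loop.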

You should also be wary of any missing data in your dataset; this is an additional concern which should be appropriately dealt with (using multiple imputation). Most risk prediction models exclude patients with missing data and conduct a complete-case analysis, which is potentially a serious source of bias, and you need to account for this in your strategy to derive a prediction model.

Hope this helps

Gary

---------------------------------------------------------------
Dr Gary S Collins Tel: +44 (0)1865 284418
Senior Medical Statistician Fax: +44 (0)1865 284424
Centre for Statistics in Medicine www.csm-oxford.org.uk
University of Oxford,
Wolfson College Annexe, Linton Road, Oxford, OX2 6UD
---------------------------------------------------------------

roland andersson

Mar 7, 2011, 4:19:37 PM
to meds...@googlegroups.com
Dr Millis

Thank you for this advice. However, the problem is that appropriately
trained statisticians who have thorough knowledge of bootstrapping
techniques (and less so in multiple imputation - another technique we
intend to use in the same project) do not abound where I am working. I
have been lucky to find professional help a few times, but often when I
have tried to get advice from the statisticians that I have access to,
I have found that the statistician understood less of
the problem and the appropriate statistical methods than what I had
found out by myself.

For many years I have done most of the statistical work in my
publications, which has also involved advanced techniques. Some of my
work has also been published in high-ranking journals, like NEJM,
Annals of Surgery etc. I guess that I am not alone in this situation.
Do you think that this is wrong? Should statisticians always be
involved in clinical research? How much of the statistics should be
done by the clinical researcher?

We are of course not fully trained statisticians, but with some
guidance we can also do quite a lot of advanced statistics. Of course
we risk that the referee will turn us down and ask for a statistical
review, but it has never happened to me.

So my question is if this forum is open for people like me? Can I ask
questions and get advice without being patronised?

Your question about instructions on how to perform an appendectomy is
irrelevant as I think there are regulations about who is allowed to
practise surgery. I think we are not there yet when it comes to
practising statistical methods.

Regards

Roland Andersson

2011/3/7 SR Millis <srmi...@yahoo.com>:

roland andersson

Mar 7, 2011, 4:31:14 PM
to meds...@googlegroups.com
Richard

I understand that this forum is not related to any specific software.
You can just tell me the principles or direct me to a place where I
can find this information. From that I can figure out myself how this
can be applied in my situation. As you have noted, Stata has a large
number of appropriate routines. I can of course direct my questions to
Statalist, but this forum is especially intended for medical
statistics and I thought that there may be more participants that
share the same problem.

Regards

Roland Andersson

2011/3/7 Richard Goldstein <rich...@ix.netcom.com>:


Richard Goldstein

Mar 7, 2011, 4:37:31 PM
to meds...@googlegroups.com
Roland,

Stata has a -bootstrap- command and some post-estimation commands also;
look in the reference manual or type -help bootstrap-; these should do
all the necessary housekeeping; note the warning at the top of p. 199 of
the manual re: comparison of different ways of using the bootstrap
within Stata

Rich


roland andersson

Mar 7, 2011, 5:10:18 PM
to meds...@googlegroups.com
Gary

Thank you for the advice. Indeed, in our previous study we selected
only complete cases for the model. We knew about multiple imputation
but this method is not well known to reviewers of surgical journals
so we decided not to use it. We also did not have access to the appropriate
software at that time.

We have followed Steyerberg's advice to include variables with p<0.10
from the multivariable model.

I imagine I understand what you mean by "optimism-corrected
performance estimate", but it would be helpful if you could give an
example of a report where such methods have been used as a model for
how this can be presented.

It seems that these methods are common in basic research, but I think
it is not so common in clinical research. A problem is that clinical
journals want reports that describe results that can be easily
understood and applied. I am not sure that such methods have been used
very much in reports describing the construction and (internal)
validation of clinical prediction rules or scoring systems (which are
also not common). I have not come across any such report.

Regards

Roland Andersson


2011/3/7 Gary Collins <gary.c...@csm.ox.ac.uk>:

Gary Collins

Mar 7, 2011, 5:36:17 PM
to meds...@googlegroups.com
Hi Roland,

I appreciate your comments re multiple imputation, but in my opinion, whether or not reviewers are aware of the statistical details does not justify inappropriate treatment of data. Standards for developing and validating prediction models have notoriously been very poor, with reviewers as much to blame as authors.

Multiple imputation has been used appropriately in numerous studies developing and validating prediction models, for example

http://www.ncbi.nlm.nih.gov/pubmed/18573856
http://www.ncbi.nlm.nih.gov/pubmed/20466793

Patients with incomplete data should rarely (if ever) be discarded; conducting a complete-case analysis makes much stronger assumptions (which are often not met) than those of multiple imputation. Data are time-consuming and expensive to collect, so as researchers and authors we need to ensure we are utilising them appropriately, in addition to minimising any potential biases. It is up to us to ensure that readers and reviewers are fully aware of all the relevant issues. The main potential drawback of not treating the data with appropriate care is that you run the risk of producing a prediction rule which works perfectly well on your development dataset but fails when applied to another group of patients. This is often the case, and ultimately the prediction rule never gets used.

Re optimism. A selection of examples of studies that have estimated optimism (amongst many) can be found in

http://www.ncbi.nlm.nih.gov/pubmed/20155439
http://www.ncbi.nlm.nih.gov/pubmed/17371884
http://www.ncbi.nlm.nih.gov/pubmed/19900801
http://www.ncbi.nlm.nih.gov/pubmed/19187369

These approaches are possibly not as rare as you think (though certainly not widespread), they are increasingly being used as the methodological standards in building and evaluating prediction models (slowly) increase.

Hope this helps.

bw,

Abhaya Indrayan

Mar 7, 2011, 6:14:06 PM
to meds...@googlegroups.com
Great! I was looking for a number where split-sample validation could work and now I know it is 20,000. I do not have access to the books referred to by Frank, but it would be interesting to know how this number was arrived at. 

~Abhaya


Frank Harrell

Mar 7, 2011, 7:26:18 PM
to MedStats
Here's the "scientific" derivation. I had a dataset of 17,000
patients with a 30% mortality, binary endpoint. I split into two
random halves. I developed a model on the first half and validated on
the 2nd half. Just to be sure I started the whole process over. This
second time the validation ROC area was substantially different from
what I obtained from the first split.

Frank

On Mar 7, 5:14 pm, Abhaya Indrayan <a.indra...@gmail.com> wrote:
> Great! I was looking for a number where split-sample validation could work
> and now I know it is 20,000. I do not have access to the books referred to by
> Frank, but it would be interesting to know how this number was arrived at.
>
> ~Abhaya
>

Max Jasper

Mar 7, 2011, 8:05:21 PM
to meds...@googlegroups.com
Instead of "one construction sample and one validation sample" , a clean and more accurate way to pick up your significant variables is to use a case-control analysis....you can use any number of your data cases.

Max.

Richard Goldstein

Mar 7, 2011, 9:24:14 PM
to meds...@googlegroups.com
Roland,

re: MI and Stata, I note that (1) there is an official command now
(version 11), but (2) there has been an excellent user-written command
(-ice-) for a number of years, and you should look at each of these. To
see Royston's articles on -ice-, go to the Stata Journal part of the web
site: articles over 3 years of age are available at no cost, and there
are several articles by Royston and colleagues that are more than 3
years old.

Rich

roland andersson

Mar 8, 2011, 1:27:38 AM
to meds...@googlegroups.com
Richard

We have tried MI but for various reasons we will use ice and mim in
the next study. As I said - the previous article was published in 2008
and the work was started around 2005. At that time we knew about
imputation of missing values and we were tempted to use it, but had no
software and had not seen it used in published reports.

My question was mainly about bootstrapping for construction and
validation of a clinical score.

Roland


2011/3/8 Richard Goldstein <rich...@ix.netcom.com>:

Adrian Sayers

Mar 8, 2011, 4:23:11 AM
to meds...@googlegroups.com
I would suggest looking at this book,

The Statistical Evaluation of Medical Tests for Classification and Prediction (Oxford Statistical Science Series) [Paperback]

Margaret Sullivan Pepe

From recollection it describes the creation of a risk model, and the use of ROC curves in conjunction with GLM to adjust for many variables.

I think it has an accompanying website, and datafiles.




bw
Adrian

Abhaya Indrayan

Mar 8, 2011, 5:22:56 AM
to meds...@googlegroups.com
If split half of 17,000 with a mortality of 30% could not give a replicable ROC, I doubt if 20,000 would be enough, particularly if mortality is 5 or 10%. 

Perhaps n by itself should not be the criterion. Correct knowledge about the factors affecting mortality and their appropriate measurement may be more important. Dominant problem could be the epistemic uncertainty. This is my concern in all such efforts. We believe we know enough about biological processes but possibly we do not in some (many?) cases. Working with existing knowledge is the stark reality, and this is not sufficiently realized in some scientific endeavours.

~Abhaya

roland andersson

Mar 8, 2011, 6:02:38 AM
to meds...@googlegroups.com
Frank

How big was the difference in the coefficients? I understand that even
small (clinically nonsignificant) differences can become statistically
different in such big samples.

In the construction of the clinical score I am talking about we
rounded all coefficients to the nearest larger integer, because we
wanted a score that was simple for the clinician to use. We found that
this brutal manipulation produced a score that discriminated as well
as a score based on the regression coefficients. And in external
validation we have now obtained usable results. Using all the
available patients in the construction would probably have led to a
slightly different model, but I doubt that it would have changed the
scoring value by one integer.

I agree with everything that you and Gary have said and I want to use
the currently best approach, but what I am saying is that we sometimes
can get usable results also with substandard methods that were
regarded as the best only some years ago. It all depends on how the
results are intended to be used. This is my opinion as an amateur in
statistics. I am open to correction.

Roland Andersson


2011/3/8 Frank Harrell <f.ha...@vanderbilt.edu>:

Frank Harrell

Mar 8, 2011, 8:29:29 AM
to MedStats


On Mar 7, 4:10 pm, roland andersson <rolanders...@gmail.com> wrote:
> Gary
>
> Thank you for the advice. Indeed, in our previous study we selected
> only complete cases for the model. We knew about multiple imputation
> but this method is not well known to reviewers of surgical journals
> so we decided not to use it. We also did not have access to the appropriate
> software at that time.
>
> We have followed Steyerberg's advice to include variables with p<0.10
> from the multivariable model.

Please re-read his advice. I would be surprised if he recommended
that.
Frank

>
> I imagine I understand what you mean by "optimism-corrected
> performance estimate", but it would be helpful if you could give an
> example of a report where such methods have been used as a model for
> how this can be presented.
>
> It seems that these methods are common in basic research, but I think
> it is not so common in clinical research. A problem is that clinical
> journals want reports that describe results that can be easily
> understood and applied. I am not sure that such methods have been used
> very much in reports describing the construction and (internal)
> validation of clinical prediction rules or scoring systems (which are
> also not common). I have not come across any such report.
>
> Regards
>
> Roland Andersson
>
> 2011/3/7 Gary Collins <gary.coll...@csm.ox.ac.uk>:

Frank Harrell

Mar 8, 2011, 8:30:24 AM
to MedStats
That would destroy the intercept. For predictive modeling, cohort
studies are usually preferred.
Frank

On Mar 7, 7:05 pm, Max Jasper <maxjas...@shaw.ca> wrote:
> Instead of "*one construction sample and one validation sample*" , a clean
> and more accurate way to pick up your significant variables is to use a *case-control
> analysis*....you can use any number of your data cases.
>
> Max.

SR Millis

Mar 8, 2011, 10:15:49 AM
to meds...@googlegroups.com
Roland,

I'm a board certified clinical neuropsychologist who spent many years evaluating and treating patients prior to my transition to full-time research. Later in life, I returned to university to receive formal training as a statistician. My training and experience as a clinician has certainly made me a better statistician. However, at a certain point, I came to the realization that, at least for me, my statistical practice would be quite limited without going back to school.

So, yes, speaking as a former clinician and current statistician, I welcome your participation on this listserv. I think that we need to acknowledge that there are limitations to what any listserv can provide its participants. Without access to your database and face-to-face interaction with you, I think that it's difficult to give you the sort of detailed advice you're seeking. As a Stata user, too, I'm not aware of any Stata program that will do what you're seeking to do.

What to do next? I suspect that you can do the sort of bootstrapping you've described in R. The texts by Harrell and Steyerberg are excellent resources in this regard. If you don't have access to a statistician to assist you, I think that it would be an excellent investment of your time to work your way through Harrell and Steyerberg. That's how I learned to use the bootstrap for model validation. Both texts provide examples, R commands, output, and interpretation.

Although not regulated like surgery, in the UK, there is the Chartered Statistician (CStat) and in the USA, the accredited Professional Statistician (PStat).

Regards,

Martin Holt

Mar 8, 2011, 11:08:48 AM
to meds...@googlegroups.com
Hi Roland,
 
MedStats was created to help anyone with a statistical query, from a research nurse to a highly trained consultant. And the first rule I put in place was "No flaming" .....which covers your statement about feeling patronised.
 
Do you feel more informed now, Roland? I think that your query is just the sort of query that motivated me to found MedStats: to have an open discussion about what is a very interesting area of Medical Statistics. I have first-hand experience of what you are trying to achieve, and I doubt you could obtain responses of the quality seen in this discussion as easily anywhere else as by putting the query to MedStats. But, yes, patronising is something we do not want.
 
As to your question of whether or not you should perform the statistics on work for which you are the lead clinician: that's an interesting question. I think, ultimately, that you're covered if you state in the published paper that this is what you have done. Who was responsible for what? Otherwise I think you get into the field of adequate blinding.
 
Best Regards,

Martin Holt
Medical Statistician

Steve Simon, P.Mean Consulting

Mar 8, 2011, 12:25:31 PM
to meds...@googlegroups.com, roland andersson
Roland Andersson raised an interesting point about the type of advice
being offered on lists like this one.

On 3/7/2011 3:19 PM, roland andersson wrote:

> So my question is if this forum is open for people like me? Can I ask
> questions and get advice without being patronised?

You get what you pay for, and since the advice here is free, that should
tell you something. You can't get complex answers to difficult questions
by email. The best you can hope for on a list like this is being pointed
in the right direction. If we get a bit patronizing at times, please
understand that we are doing this on our own time and at our own expense.

There are lots of people who can provide more detailed answers for a
fee. My fee is $175 per hour, but you can probably find someone who is a
bit cheaper than me. There are hundreds of us out there. My purpose in
mentioning this is not to solicit business, but rather to point out that
you have an excellent alternative to advice from MEDSTATS if you have
the budget.

If you don't want to spend that kind of money and you are finding the
advice here to be not that helpful, you can raise a plea like you have
done above, or you can try to phrase your questions differently, or you
can ignore the advice that you find patronizing.

But the previous emailer raised a very serious and important point. You
are in an area that is very complex and it may be a bad idea to continue
to seek advice for free. It may seem like patronizing to point out that
you may be out of your depth, but I'm sure it was intended as a helpful
nudge.

When I offer advice on a list like this, I always feel a bit guilty
because I don't take the time to ask some background questions and I
don't try to totally understand the context of the problem. I also don't
provide enough details in many of my answers. That's just the nature of
advice on a list like this. No one is going to invest hours of time
without compensation. We do get some non-monetary compensation, of
course, such as the thrill of an intellectual challenge, but it doesn't
pay the light bills.

The quality of advice I offer to paying clients is substantially better
than the advice I offer here, but the advice I provide here is sometimes
helpful and it is certainly a bargain at $0 per hour.

I do wish you the best of luck with your project. I have not offered any
answers to your questions earlier because you are already getting better
advice than I could provide. Seriously! You have some of the rock stars
of Statistics offering you advice and I've learned a lot from reading
their comments.

Steve Simon, n...@pmean.com, Standard Disclaimer.
Sign up for the Monthly Mean, the newsletter that
dares to call itself average at www.pmean.com/news

SR Millis

Mar 8, 2011, 1:02:41 PM
to meds...@googlegroups.com
Thanks, Steve. Well-stated.

Scott


~~~~~~~~~~~
Scott R Millis, PhD, ABPP, CStat, PStat®
Professor
Wayne State University School of Medicine
Email: aa3...@wayne.edu
Email: srmi...@yahoo.com
Tel: 313-993-8085


--- On Tue, 3/8/11, Steve Simon, P.Mean Consulting <n...@pmean.com> wrote:

roland andersson

Mar 8, 2011, 1:52:01 PM
to meds...@googlegroups.com
Martin

Thank you for this comment. I certainly am more informed after the discussion. I am happy that this forum exists. 

I have received very helpful and objective answers and comments to my previous postings to this forum and assume that the patronising tone I received this time was just an accident or a misunderstanding. 

I think I will come back to your last comment about the clinician doing the statistics in another post later on. 

Greetings
Roland Andersson
  

2011/3/8 Martin Holt <m861...@btinternet.com>

Neil Shephard

Mar 9, 2011, 5:17:59 AM
to meds...@googlegroups.com
On Tue, Mar 8, 2011 at 6:52 PM, roland andersson <roland...@gmail.com> wrote:
>
> I have received very helpful and objective answers and comments to my previous postings to this forum and assume that the patronising tone I received this time was just an accident or a misunderstanding.

It's worth bearing in mind that in non-verbal communication such as
discussion forums a lot of information that provides context and
meaning (e.g. intonation, body language, and so forth) is lost, so it's
very easy to misinterpret what is being conveyed.

I always try to bear this in mind, overlook anything that I think
is negative, and focus on the meat of the responses.

Neil


--
“Truth in science can be defined as the working hypothesis best suited
to open the way to the next better one.” - Konrad Lorenz

Email - nshe...@gmail.com
Website - http://kimura.no-ip.org/
Photos - http://www.flickr.com/photos/slackline/

roland andersson

Mar 9, 2011, 2:40:19 PM
to meds...@googlegroups.com
Gary

Thank you for the reference to Steyerberg's book Clinical Prediction
Models. I have looked it up and it seems to suit my needs very well.

Roland

2011/3/7 Gary Collins <gary.c...@csm.ox.ac.uk>:

Pedro Emmanuel Alvarenga Americano do Brasil

Mar 9, 2011, 4:11:11 PM
to meds...@googlegroups.com
Hello stats masters and Mr Roland,

Sorry to be late to this topic. Just a few more suggestions. Like Mr Roland, I am also involved in projects to develop clinical prediction models, with no statistician around to give me a hand. Unlike him, I left Stata behind long ago and currently work only in R. This makes it much easier to read "Clinical Prediction Models". As an infectious disease specialist, I see a lot of myself in what Mr Roland says.

A few thoughts that may help...

1. The evidence-based medicine books state that validation of a prediction model (or decision rule), that is, checking whether the model's performance holds up in another clinical scenario, can be "narrow" or "broad". The "narrow" kind may fit the definition of temporal validation in the "Clinical Prediction Models" book. So splitting the sample into two samples from different time periods may be an acceptable way to validate a model (e.g. develop the model with patient data from 2008 to 2009, and then validate it with data from patients seen in 2010), although it may compromise the precision of the estimates. If you have data from patients at different hospitals or research centers, you may be able to run a "broad" validation (see "Evidence based clinical diagnosis" or "User's guide to evidence based medicine"). If performance is not similar, updating the model may be a good decision. So before developing a new model with new information, you may want to try the old model on the new data. If its performance is poor, this could be an argument in favor of adding further information to the model.

2. The bootstrap, like every method, has its advantages and its limitations. The good thing about it is that it can be used to select variables, to estimate optimism (and so correct for overfitting), and to calibrate the model at the same time (in R: http://finzi.psych.upenn.edu/R/library/tmp/rms/html/validate.html). But that does not mean other methods for selecting variables, penalizing the model, and calibrating could not be used. Perhaps Stata has other statistical tools for these analysis steps.
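
To make the idea of "estimating optimism" in point 2 a bit more concrete, here is a minimal sketch of bootstrap optimism correction. It is not the rms `validate` function and not R; it is plain Python for illustration only. A straight-line least-squares fit and R² stand in for a real clinical model and its performance measure, and the function names (`fit_line`, `r_squared`, `optimism_corrected`) and the simulated data are assumptions made up for this example.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y - slope * mean_x, slope

def r_squared(model, xs, ys):
    """Share of the variance in ys explained by the fitted line."""
    intercept, slope = model
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - sse / sst

def optimism_corrected(xs, ys, n_boot=200, seed=1):
    """Return (apparent, optimism, corrected) performance."""
    rng = random.Random(seed)
    n = len(xs)
    # Apparent performance: model fitted and evaluated on the same data.
    apparent = r_squared(fit_line(xs, ys), xs, ys)
    optimism = 0.0
    for _ in range(n_boot):
        # Resample patients with replacement and refit the model.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        model = fit_line(bx, by)
        # Optimism: performance in the bootstrap sample minus
        # performance of that same model in the original sample.
        optimism += r_squared(model, bx, by) - r_squared(model, xs, ys)
    optimism /= n_boot
    return apparent, optimism, apparent - optimism

# Simulated data, purely illustrative.
data_rng = random.Random(0)
xs = [data_rng.uniform(0, 10) for _ in range(40)]
ys = [2 + 0.5 * x + data_rng.gauss(0, 2) for x in xs]

apparent, optimism, corrected = optimism_corrected(xs, ys)
print(apparent, optimism, corrected)
```

In a real project every modeling step (including variable selection) would need to be repeated inside the loop, and the performance measure would be something like the c-statistic or calibration slope rather than R².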

I'm sorry I can't recommend the Stata function names... but I would say that I'm now glad I migrated from Stata to R. It took me 2 to 3 years to feel comfortable using R. That is a lot of time, but no regrets so far.

R is open source, so there is no concern about buying a license (or worse, cracking it). R is available for every OS, and my experience with the R users mailing list has been better than with Stata support... and there are several other things that I believe are advantages of R. Also, there are several graphical interfaces for R, such as R Commander and RStudio, which may help make R friendlier to beginners. It is like learning a new language. But I always say... the best software is the one you know how to use and that suits your purposes.

I strongly suggest reading "Clinical Prediction Models". It is much friendlier for those with less statistics experience than "Regression Modeling Strategies", and it may be quite an incentive to understand and use R. There are some exercises here that you may want to look at... http://www.clinicalpredictionmodels.org/

A big hug, and may the force be with you,

Dr. Pedro Emmanuel A. A. do Brasil
Instituto de Pesquisa Clínica Evandro Chagas
Fundação Oswaldo Cruz
Rio de Janeiro - Brasil
Av. Brasil 4365
Tel 55 21 3865-9648
email: pedro....@ipec.fiocruz.br
email: emmanue...@gmail.com

---Supporting free software
www.zotero.org - reference management.
www.broffice.org or www.openoffice.org - documents, spreadsheets, or presentations.
www.epidata.dk - data entry.
www.r-project.org - data analysis.
www.ubuntu.com - operating system

