confidence interval calculation

Ellen Ann

Nov 3, 2009, 10:45:28 AM

to meds...@googlegroups.com

Dear all,

I am looking for help to calculate the confidence interval around Observed - Expected. My expected values come from a logistic regression model. Any suggestions would be most appreciated.

Ted Harding

Nov 3, 2009, 11:57:22 AM

to meds...@googlegroups.com

There is a problem, of some kind, with your question!

Since it is a logistic regression model, the observed outcome will
be 0 or 1 (though if you have several cases at the same values of
the covariates these can be grouped into a binomial outcome of
'r' 1s out of 'n' cases, but you do not describe your data enough
to make it clear which is the case).

The expected outcome, at a given set of covariate values, will be the
fitted probability P of Outcome=1, for some value of P with 0 < P < 1.

So (Observed - Expected) = (1 - P), or (0 - P) = -P.

Since this can have only two values, the concept of a "confidence
interval" for it is a bit elusive. There is, of course, uncertainty
in the value of P, which can be expressed as a standard error for P
(which any decent statistical software should be able to supply).
So there is certainly a confidence interval for P; but the binary
nature of the Observed still interferes with getting one for the
(Observed - Expected).
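
The two-valued nature of this residual can be written out explicitly. What follows is a worked sketch rather than part of the original reply, and it assumes the fitted probability $P$ is treated as known (its own standard error is ignored):

```latex
\[
O - E =
\begin{cases}
1 - P & \text{with probability } P,\\
-P    & \text{with probability } 1 - P,
\end{cases}
\]
\[
\mathbb{E}[O - E] = P(1 - P) + (1 - P)(-P) = 0,
\qquad
\operatorname{Var}(O - E) = P(1 - P)^2 + (1 - P)P^2 = P(1 - P).
\]
```

So for a single case the residual has mean zero and variance $P(1 - P)$, but its distribution sits on just two points, which is why an interval for it is hard to interpret.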

It is not like a standard linear regression, where the predicted
(Expected) value has uncertainty and varies continuously, and the
Outcome is distributed about the true mean with (usually) a Normal
distribution, so that (Observed - Expected) is a continuous variable
for which one can certainly calculate a "confidence interval"
(combining the uncertainty about the Expected value with the random
distribution of the Observed value).

There is also the question: What is the interest in calculating
a "confidence interval" for (Observed - Expected) anyway?

If you told us more about what is going on, it may be possible
to give a properly targeted answer.

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 03-Nov-09 Time: 16:57:20
------------------------------ XFMail ------------------------------

Ellen Ann

Nov 3, 2009, 12:10:29 PM

to meds...@googlegroups.com

Sorry for not giving enough details, I was trying to be brief but I was probably too brief.

What I'm ultimately trying to do is compare the performance of several "testers" who are diagnosing a certain disease in animals. I have created a logistic model to predict whether an animal should be diseased or not, based on well-known (from previous studies) predictors. I now want to compare the observed number of diseased animals with the predicted number, to be able to rank the testers in order of performance for diagnosing disease in the animals (the test is fairly subjective). I have summed the number of observed and predicted for each tester and I now want to calculate the confidence interval around the O-E, but I wasn't too sure how to do this. I hope this makes a little more sense?

Thanks for your help.

Peter Flom

Nov 3, 2009, 12:34:19 PM

to meds...@googlegroups.com

Ellen Ann wrote

<<
Sorry for not giving enough details, I was trying to be brief but I was probably too brief.

What I'm ultimately trying to do is compare the performance of several "testers" who are diagnosing a certain disease in animals. I have created a logistic model to predict whether an animal should be diseased or not, based on well-known (from previous studies) predictors. I now want to compare the observed number of diseased animals with the predicted number, to be able to rank the testers in order of performance for diagnosing disease in the animals (the test is fairly subjective). I have summed the number of observed and predicted for each tester and I now want to calculate the confidence interval around the O-E, but I wasn't too sure how to do this. I hope this makes a little more sense?
>>>

I think this is a continuation of an earlier post of yours.

If I recall correctly, you had several testers, each testing different animals - I think it was a hundred or so animals per tester.

You then say that you have a logistic regression program that predicts the number of diseased animals.

So you now have something like this:

Tester   Predicted by model   Diagnosed by tester
1        45                   52
2        51                   57
3        42                   38

etc.

Is that right?

And you are saying that the differences (diagnosed minus predicted) of +7, +6 and -4 are indicative of the skill of the testers.

This may be, I think, a mistake.

First, it assumes that the prediction of the model is always correct. If you have such a model, then
why do you need testers who use a subjective test?

Second, it assumes that calling a sick animal healthy is equivalent to calling a healthy animal sick ... I am not sure if this is sensible.

But, assuming that these values ARE what you want, I think the way to go is NOT to try to find the SE of O-E, but to turn these numbers into proportions, and then compare the proportions.
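
To make the arithmetic concrete, here is a minimal sketch of turning the illustrative counts from the table above into O-E (diagnosed minus predicted), O/E and proportions. The counts are the made-up figures from this post, and `n_animals = 100` is an assumption based on the "hundred or so animals per tester" recollection; the function name `summarise` is hypothetical.

```python
# Illustrative only: counts are the made-up figures from the table above,
# and n_animals = 100 per tester is an assumption, not a figure from the data.
testers = {1: (45, 52), 2: (51, 57), 3: (42, 38)}  # tester: (predicted, diagnosed)
n_animals = 100


def summarise(predicted, diagnosed, n):
    """Return O-E, O/E and the observed/expected proportions for one tester."""
    return {
        "o_minus_e": diagnosed - predicted,
        "o_over_e": diagnosed / predicted,
        "p_observed": diagnosed / n,
        "p_expected": predicted / n,
    }


summary = {t: summarise(e, o, n_animals) for t, (e, o) in testers.items()}
for tester, stats in summary.items():
    print(tester, stats)
```

Working with proportions makes testers with different numbers of animals comparable, which raw O-E differences are not.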

Peter

Peter L. Flom, PhD
Statistical Consultant
Website: www DOT peterflomconsulting DOT com
Writing: http://www.associatedcontent.com/user/582880/peter_flom.html
Twitter: @peterflom

Steve Simon, P.Mean Consulting

Nov 3, 2009, 1:23:58 PM

to meds...@googlegroups.com

Ellen Ann wrote:

> What I'm ultimately trying to do is compare the performance of several
> "testers" who are diagnosing a certain disease within animals. I have
> created a logistic model to predict whether an animal should be diseased
> or not based on well known (from previous studies) predictors. I now
> want to compare the observed number of diseased animals with the
> predicted number to be able to rank the testers in order of performance
> for diagnosing disease in the animals (the test is fairly subjective).
> I have summed the number of observed and predicted for each tester and I
> now want to calculate the confidence interval around the O-E, but I
> wasn't too sure how to do this. I hope this makes a little more sense?

Why not use sensitivity and specificity here?

Let the prediction of the logistic model be your "gold standard" of
disease. Let the results of an individual tester be a "diagnostic test".

Suppose the "gold standard" calls 50 animals as diseased. Among these 50
animals, the tester gets 35 right and 15 wrong. Then your sensitivity is
70%.

Among the 200 healthy animals (by the gold standard) a tester gets 180
right and 20 wrong. Then your specificity is 90%.

Plot each rater's sensitivity/specificity pair on a scatterplot. You can
also show confidence limits for these pairs using standard formulas for
a confidence interval for a single proportion.

The problem here, of course, is that you won't be able to rank the
performance easily. Is someone who has sens=70% and spec=90% better or
worse than someone who has sens=90% and spec=70%? If the cost of a false
positive diagnosis and the cost of a false negative diagnosis are both
the same, then it would not be too outrageous to compute an overall
accuracy. For the first rater, it would be (35+180)/(50+200) = 86%. Note
that this is not the same as the average of 70% and 90% but rather is
weighted towards the specificity because of the greater number of
healthy animals.
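
The numbers in this example can be reproduced with a short sketch. The post only says "standard formulas" for the interval; the Wilson score interval used here is one common choice and is an assumption, as are the function and variable names.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a single proportion (about 95% for z=1.96)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Counts from the worked example in the post.
true_pos, false_neg = 35, 15   # among the 50 "gold standard" diseased animals
true_neg, false_pos = 180, 20  # among the 200 "gold standard" healthy animals

sensitivity = true_pos / (true_pos + false_neg)
specificity = true_neg / (true_neg + false_pos)
accuracy = (true_pos + true_neg) / (true_pos + false_neg + true_neg + false_pos)

sens_ci = wilson_interval(true_pos, true_pos + false_neg)
spec_ci = wilson_interval(true_neg, true_neg + false_pos)

print(f"sensitivity {sensitivity:.2f}, interval {sens_ci[0]:.3f} to {sens_ci[1]:.3f}")
print(f"specificity {specificity:.2f}, interval {spec_ci[0]:.3f} to {spec_ci[1]:.3f}")
print(f"overall accuracy {accuracy:.2f}")
```

The interval for sensitivity is the wider of the two because it rests on only 50 animals, which matters when many testers see few diseased animals.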

I hope this helps.
--
Steve Simon, Standard Disclaimer
Second free statistics webinar, Wed, Nov 4, 11am-noon CST.
"The first three steps in data entry, with examples in PASW/SPSS"
Details at www.pmean.com/webinars

Ellen Ann

Nov 4, 2009, 5:19:53 AM

to meds...@googlegroups.com

Dear Peter and Steven

Thank you both very much for your helpful suggestions.

I agree with Peter that I am making the assumption that the model is a gold standard. I know it isn’t but I think it’s the best I have to predict which animals are probably diseased in order to rank the testers.

I think the analysis is similar to other studies looking at performance indicators for hospitals or surgeons. They use logistic regression to calculate risk-adjusted predicted mortalities, then compare the O/E ratio. I will look at the O/E ratio, but I am also interested in the absolute number they are missing or overestimating, so I will also look at O-E. I tried using the proportions Observed and Expected and then calculated a confidence interval based on the difference; however, the resulting limits were quite wide. I wasn't sure whether I should have tried to incorporate the standard error from the model for the CI of the expected proportion?

Peter - I am interested in both extremes: either missing diseased animals or diagnosing healthy animals as diseased. You are also right in thinking I have posted this problem before - I mailed the SAS list the previous time and I thought I had a solution, but I didn't.

Steven - thank you for your thoughts on looking into sensitivity/specificity and the number correct. The problem with this method is that I have simply summed the predicted probabilities to calculate the expected number of diseased animals. Therefore I don't know exactly which ones are expected to be diseased. Maybe I should have used a cut-off value to assign animals to diseased/not?

Thanks once again for your time and effort

Peter Flom

Nov 4, 2009, 6:40:14 AM

to meds...@googlegroups.com

From: Ellen Ann

<<<
I think the analysis is similar to other studies looking at performance indicators for hospitals or surgeons. They use logistic regression to calculate risk-adjusted predicted mortalities, then compare the O/E ratio. I will look at the O/E ratio, but I am also interested in the absolute number they are missing or overestimating, so I will also look at O-E. I tried using the proportions Observed and Expected and then calculated a confidence interval based on the difference; however, the resulting limits were quite wide. I wasn't sure whether I should have tried to incorporate the standard error from the model for the CI of the expected proportion?
>>>>

The CIs for differences in proportions are very tricky. There's a big literature on this.

<<<<

Peter - I am interested in both extremes: either missing diseased animals or diagnosing healthy animals as diseased. You are also right in thinking I have posted this problem before - I mailed the SAS list the previous time and I thought I had a solution, but I didn't.

>>>

Another problem with looking at the totals rather than the individual cases is that a person could be wrong on EVERY animal and still get the right total.

Let's say half the animals are diseased. Well, if someone called every diseased animal healthy and every healthy animal sick, he or she would get EXACTLY the right number of diseased animals.
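
A tiny sketch of this counterexample, using hypothetical data (ten animals, half of them diseased, and a tester who gets every single one wrong):

```python
# Hypothetical data: 1 = diseased, 0 = healthy; half the animals are diseased.
truth = [1, 0] * 5
calls = [1 - t for t in truth]  # the tester inverts every diagnosis

observed_total = sum(calls)   # number the tester called diseased
expected_total = sum(truth)   # number actually diseased
n_correct = sum(c == t for c, t in zip(calls, truth))

print("O - E =", observed_total - expected_total)
print("correct calls =", n_correct, "of", len(truth))
```

The totals agree exactly while not one individual call is right, so O-E on its own cannot distinguish this tester from a perfect one.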

In your model, what you are testing is NOT accuracy of diagnosis exactly, although it's partly that.

Peter

Martin Holt

Nov 4, 2009, 6:44:46 AM

to meds...@googlegroups.com

Hi Ellen,

As I understand your postings, your ultimate aim is to rank the testers, using the O/E ratio. I agree that this is similar to the development of ranks for hospitals or surgeons, where they often use logistic regression. Two thoughts:

Sample size: Peter remembered 100 animals or so per tester. For logistic regression, the most common rule of thumb is at least 10 of the least common events (deaths) per risk factor. You don't mention use of risk factors, but if the mortality rate is low, this could lead to the need for a bigger sample size than you've got. A common indicator is the Hosmer-Lemeshow statistic, to test goodness of fit, and another is the c-index: together these can be used to assess the need for individual risk factors.

League tables: there is a societal need for league tables, but generally speaking they are not welcomed as much by statisticians as by the public who want them. One very good reason lies behind your question: the confidence intervals around any one school (for example) are so wide relative to the rankings that the position of that school could be anywhere over a wide range in the rankings. And a small change in the method of calculating the ranking (for example, in which risk factors are used) can take a school from the bottom area to the top area of the table.

I'm aware I haven't answered how to calculate the CI....I'm leaving that to the previous postings.
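
For reference, one simple starting point for an interval around O-E can be sketched as below. This is not a method given anywhere in the thread. It assumes the fitted probabilities are fixed constants (so the model's own estimation error is ignored and the interval will tend to be too narrow), that diagnoses are independent, and that a normal approximation is acceptable, which may be poor when a tester has few animals or few expected cases. The function name and the data are hypothetical.

```python
import math


def o_minus_e_interval(outcomes, probs, z=1.96):
    """Approximate interval for O - E for one tester.

    outcomes: 0/1 diagnoses made by the tester.
    probs: model-fitted probabilities for the same animals, treated as fixed
           (assumption: uncertainty in the fitted model is ignored).
    """
    observed = sum(outcomes)
    expected = sum(probs)
    # Variance of a sum of independent Bernoulli(p_i) outcomes.
    variance = sum(p * (1 - p) for p in probs)
    half_width = z * math.sqrt(variance)
    diff = observed - expected
    return diff, diff - half_width, diff + half_width


# Hypothetical tester with ten animals.
outcomes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
probs = [0.5, 0.1, 0.2, 0.5, 0.1, 0.2, 0.1, 0.5, 0.1, 0.2]

diff, lower, upper = o_minus_e_interval(outcomes, probs)
print(f"O - E = {diff:.2f}, approx. interval {lower:.2f} to {upper:.2f}")
```

With only ten animals the interval is wide relative to the difference, which is the league-table problem described above in miniature.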

Best Wishes,

Martin Holt

BXC (Bendix Carstensen)

Nov 4, 2009, 2:35:26 PM

to meds...@googlegroups.com

As far as I can see, "logistic model to predict whether an animal should be
diseased or not" produces a probability for each animal. And each tester produces a decision. So for each tester you could draw up the ROC curve, the tester's outcome being the test, and the predicted probability being the score.
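
A minimal sketch of the area under such a curve for one tester, computed by pairwise (Mann-Whitney) comparison rather than by drawing the curve. The tester's 0/1 decision is used as the class label and the model's predicted probability as the score; the data and the function name `auc` are hypothetical.

```python
def auc(decisions, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    decisions: 0/1 calls made by the tester.
    scores: model-predicted probabilities for the same animals.
    Returns None when the tester made only one kind of call.
    """
    pos = [s for d, s in zip(decisions, scores) if d == 1]
    neg = [s for d, s in zip(decisions, scores) if d == 0]
    if not pos or not neg:
        return None
    # Each (positive, negative) pair scores 1 if ordered correctly, 0.5 if tied.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical tester with six animals.
decisions = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.6, 0.7, 0.8, 0.2, 0.1]

tester_auc = auc(decisions, scores)
print(tester_auc)
```

Because this reduces each tester to a single number, it can be computed in a loop over testers without plotting anything, though it is undefined for a tester who called every animal healthy.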

Best regards,
Bendix


Ellen Ann

Nov 5, 2009, 5:33:14 AM

to meds...@googlegroups.com

Thank you to everyone for your help with this. Just to answer a few questions:

Bendix: I have 880 testers (who tested at least 10 animals - I'm not sure if this is an appropriate cut-off?) so running individual ROC curves for each tester would probably be impossible? However, I could run an overall ROC curve (I think?) and come up with an appropriate cut-point to allocate animals as diseased - but is this introducing an extra source of error? Or is it much better than just summing totals because of the reasons Peter gave earlier?

Martin: You queried the sample size. I have 1,833 cases and over 92,000 records. In the model I have 8 independent variables. I did not include tester in the model because a lot of them have 0 diseased animals and this caused problems. The method I have used (not including tester in the model) is referred to in other papers as indirect standardisation.

I'm still having a few problems with producing an appropriate confidence interval. I found a paper that recommends using propagation of errors to calculate CIs for O/E, and they gave a SAS programme on their site to run it, but the programme runs out of memory on my dataset! Can anyone recommend any good papers/books/links on how to calculate the appropriate variance for O/E and maybe even O-E?

Thanks

Martin Holt

Nov 5, 2009, 1:57:09 PM

to meds...@googlegroups.com

Hi Ellen,

I'd like to refer you to this link:

http://www.indicators.scot.nhs.uk/Trends_Jan_2009/Standard.htm

and to this link

http://www.indicators.scot.nhs.uk/Work/CISTWorkingPaper3July13th20011.htm

Here it says that logistic regression and indirect standardisation are equivalent only with a fully saturated logistic model, that is with all main effects and all interactions....a bit much with 8 independent variables. The second link discusses the pros/cons of removal of (interaction) terms from the model. And whether using logistic regression is better than the 'manual' method.

Both methods are fully explained and each leads to an SMR....Observed/Expected scaled up by 100.

Googling leads to many ways of calculating a confidence interval for the SMR. This link, for example:

http://www.openepi.com/SMR/SMR.htm
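
As one example of such a calculation, here is a sketch of Byar's approximation to the Poisson limits for O/E. It is offered as one commonly cited option, not as the method any particular site uses. It assumes the observed count is Poisson and treats the expected count as a fixed constant, so uncertainty in the logistic model is again ignored. The counts 52 and 45 are the illustrative figures from earlier in the thread, and the function name is hypothetical.

```python
import math


def smr_byar_interval(observed, expected, z=1.96):
    """O/E ratio with Byar's approximate limits (expected treated as fixed)."""
    smr = observed / expected
    if observed == 0:
        lower = 0.0
    else:
        lower = (
            observed
            * (1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))) ** 3
            / expected
        )
    o1 = observed + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3 / expected
    return smr, lower, upper


# Illustrative counts only: 52 diagnosed against 45 predicted.
smr, lower, upper = smr_byar_interval(observed=52, expected=45)
print(f"O/E = {smr:.3f}, approx. limits {lower:.3f} to {upper:.3f}")
print(f"as an SMR scaled by 100: {100 * smr:.1f}")
```

Multiplying all three values by 100 gives the SMR on the scale described above.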

I hope that this helps.

Best Wishes,

Martin Holt
