I am looking for help to calculate the confidence interval around Observed - Expected. My expected values come from a logistic regression model. Any suggestions would be most appreciated.
There is a problem, of some kind, with your question!
Since it is a logistic regression model, the observed outcome will be 0 or 1 (though if you have several cases at the same values of the covariates these can be grouped into a binomial outcome of 'r' 1s out of 'n' cases, but you do not describe your data enough to make it clear which is the case).
The expected outcome, at a given set of covariate values, will be the fitted probability P of Outcome=1, for some value of P with 0 < P < 1.
So (Observed - Expected) = (1 - P), or (0 - P) = -P.
Since this can have only two values, the concept of a "confidence interval" for it is a bit elusive. There is, of course, uncertainty in the value of P, which can be expressed as a standard error for P (which any decent statistical software should be able to supply). So there is certainly a confidence interval for P; but the binary nature of the Observed still interferes with getting one for the (Observed - Expected).
It is not like a standard linear regression, where the predicted (Expected) value has uncertainty and varies continuously, and the Outcome is distributed about the true mean with (usually) a Normal distribution, so that (Observed - Expected) is a continuous variable for which one can certainly calculate a "confidence interval" (combining the uncertainty about the Expected value with the random distribution of the Observed value).
There is also the question: What is the interest in calculating a "confidence interval" for (Observed - Expected) anyway?
If you told us more about what is going on, it may be possible to give a properly targeted answer.
Sorry for not giving enough details, I was trying to be brief but I was
probably too brief.
What I'm ultimately trying to do is compare the performance of several
"testers" who are diagnosing a certain disease within animals. I have
created a logistic model to predict whether an animal should be diseased or
not based on well known (from previous studies) predictors. I now want to
compare the observed number of diseased animals with the predicted number to
be able to rank the testers in order of performance for diagnosing disease
in the animals (the test is fairly subjective). I have summed the number of
observed and predicted for each tester and I now want to calculate the
confidence interval around the O-E, but I wasn't too sure how to do this. I
hope this makes a little more sense?
Thanks for your help.
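A minimal sketch of one way to put an approximate interval around O-E for a single tester, assuming the fitted probabilities are treated as fixed and the animals are independent, so that the variance of the observed count is the sum of p(1-p). The function name and the example numbers are hypothetical, and the sketch ignores the uncertainty in the logistic model itself, so the interval is likely too narrow.

```python
import math

def o_minus_e_interval(observed, predicted_probs, z=1.96):
    """Approximate interval for O - E for one tester.

    Assumption: the fitted probabilities are fixed and animals are
    independent, so Var(O) = sum of p_i * (1 - p_i). Uncertainty in the
    fitted model is NOT included.
    """
    expected = sum(predicted_probs)                       # E = sum of p_i
    variance = sum(p * (1.0 - p) for p in predicted_probs)
    diff = observed - expected
    half_width = z * math.sqrt(variance)
    return diff, diff - half_width, diff + half_width

# Hypothetical tester: 10 animals examined, 4 diagnosed as diseased.
probs = [0.05, 0.10, 0.20, 0.02, 0.50, 0.30, 0.08, 0.15, 0.40, 0.20]
diff, lower, upper = o_minus_e_interval(4, probs)
print(f"O - E = {diff:.2f}, approx. 95% interval ({lower:.2f}, {upper:.2f})")
```

With only a handful of animals per tester the normal approximation is rough, which is one reason such intervals come out wide.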
On Tue, Nov 3, 2009 at 4:57 PM, Ted Harding <Ted.Hard...@manchester.ac.uk> wrote:
> On 03-Nov-09 15:45:28, Ellen Ann wrote:
<< Sorry for not giving enough details, I was trying to be brief but I was probably too brief.
What I'm ultimately trying to do is compare the performance of several "testers" who are diagnosing a certain disease within animals. I have created a logistic model to predict whether an animal should be diseased or not based on well known (from previous studies) predictors. I now want to compare the observed number of diseased animals with the predicted number to be able to rank the testers in order of performance for diagnosing disease in the animals (the test is fairly subjective). I have summed the number of observed and predicted for each tester and I now want to calculate the confidence interval around the O-E, but I wasn't too sure how to do this. I hope this makes a little more sense?
I think this is a continuation of an earlier post of yours.
If I recall correctly, you had several testers, each testing different animals - I think it was a hundred or so animals per tester.
You then say that you have a logistic regression program that predicts the number of diseased animals.
So, you now have something like this:

Tester   Predicted by model   Diagnosed by tester
1        45                   52
2        51                   57
3        42                   38
etc.
Is that right?
And you are saying that the differences (predicted minus diagnosed) of -7, -6 and +4 are indicative of the skill of the testers.
This may be, I think, a mistake.
First, it assumes that the prediction of the model is always correct. If you have such a model, then why do you need testers who use a subjective test?
Second, it assumes that calling a sick animal healthy is equivalent to calling a healthy animal sick ... I am not sure if this is sensible.
But, assuming that these values ARE what you want, I think the way to go is NOT to try to find the SE of O-E, but to turn these numbers into proportions, and then compare the proportions.
Why not use sensitivity and specificity here?
Let the prediction of the logistic model be your "gold standard" of disease. Let the results of an individual tester be a "diagnostic test".
Suppose the "gold standard" calls 50 animals as diseased. Among these 50 animals, the tester gets 35 right and 15 wrong. Then your sensitivity is 70%.
Among the 200 healthy animals (by the gold standard) a tester gets 180 right and 20 wrong. Then your specificity is 90%.
Plot each rater's sensitivity/specificity pair on a scatterplot. You can also show confidence limits for these pairs using standard formulas for a confidence interval for a single proportion.
The problem here, of course, is that you won't be able to rank the performance easily. Is someone who has sens=70% and spec=90% better or worse than someone who has sens=90% and spec=70%? If the cost of a false positive diagnosis and the cost of a false negative diagnosis are both the same, then it would not be too outrageous to compute an overall accuracy. For the first rater, it would be (35+180)/(50+200) = 86%. Note that this is not the same as the average of 70% and 90% but rather is weighted towards the specificity because of the greater number of healthy animals.
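The arithmetic above can be sketched as follows, using the counts from the example (35 of 50 diseased and 180 of 200 healthy animals called correctly). The Wilson score interval is only one of several standard formulas for a single proportion and is an assumption here, as are the function and variable names.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a single binomial proportion."""
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    return centre - half, centre + half

# Counts from the worked example, with the model as "gold standard".
tp, fn = 35, 15     # diseased animals: called right / wrong
tn, fp = 180, 20    # healthy animals: called right / wrong

sensitivity = tp / (tp + fn)                  # 35/50
specificity = tn / (tn + fp)                  # 180/200
accuracy = (tp + tn) / (tp + fn + tn + fp)    # 215/250

print("sensitivity", sensitivity, wilson_interval(tp, tp + fn))
print("specificity", specificity, wilson_interval(tn, tn + fp))
print("accuracy   ", accuracy)
```

The sensitivity interval is much wider than the specificity interval simply because it rests on 50 animals rather than 200.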
I hope this helps.
--
Steve Simon, Standard Disclaimer
Second free statistics webinar, Wed, Nov 4, 11am-noon CST.
"The first three steps in data entry, with examples in PASW/SPSS"
Details at www.pmean.com/webinars
Thank you both very much for your helpful suggestions.
I agree with Peter that I am making the assumption that the model is a gold
standard. I know it isn’t but I think it’s the best I have to predict which
animals are probably diseased in order to rank the testers.
I think the analysis is similar to other studies looking at performance
indicators for hospitals or surgeons. They use logistic regression to
calculate risk-adjusted predicted mortalities then compare the O/E ratio. I
will look at the O/E ratio but I am also interested in the absolute number
they are missing or overestimating so I will also look at O-E. I tried
using the proportions Observed and Expected then calculated a confidence
interval based on the difference, however, the resulting limits were quite
wide. I wasn't sure whether I should have tried to incorporate the standard
error from the model for the CI of the expected proportion?
Peter - I am interested in both extremes of either missing healthy animals
or diagnosing healthy animals as diseased. You are also right in thinking I
have posted this problem before - I mailed the SAS list the previous time
and I thought I had a solution but I didn't.
Steven - thank you for your thoughts on looking into sensitivity/specificity
and the number correct. The problem with this method is that I have simply
summed the predicted probabilities to calculate the expected number of
diseased animals. Therefore I don't know exactly which ones are expected to
be diseased. Maybe I should have used a cut-off value to assign animals to
diseased/not?
Thanks once again for your time and effort
On Tue, Nov 3, 2009 at 6:23 PM, Steve Simon, P.Mean Consulting <n...@pmean.com> wrote:
From: Ellen Ann
<<< I think the analysis is similar to other studies looking at performance indicators for hospitals or surgeons. They use logistic regression to calculate risk-adjusted predicted mortalities then compare the O/E ratio. I
will look at the O/E ratio but I am also interested in the absolute
number they are missing or overestimating so I will also look at O-E. I
tried using the proportions Observed and Expected then calculated a
confidence interval based on the difference, however, the resulting
limits were quite wide. I wasn't sure whether I should have tried to
incorporate the standard error from the model for the CI of the
expected proportion?
>>>>
The CIs for differences in proportions are very tricky. There's a big literature on this.
<<<<
Peter
- I am interested in both extremes of either missing healthy animals or
diagnosing healthy animals as diseased. You are also right in thinking
I have posted this problem before - I mailed the SAS list the previous
time and I thought I had a solution but I didn't.
>>>
Another problem with looking at the totals rather than the individual cases is that a person could be wrong on EVERY animal and still get the right total.
Let's say half the animals are diseased. Well, if someone called every diseased animal healthy and every healthy animal sick, he or she would get EXACTLY the right number of diseased animals.
In your model, what you are testing is NOT accuracy of diagnosis exactly, although it's partly that.
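A toy illustration of that point, with made-up data in which half the animals are diseased and the tester gets every single call wrong, yet the total number called diseased matches exactly:

```python
# Hypothetical data: 1 = diseased, 0 = healthy; half the animals are diseased.
truth = [1, 1, 1, 0, 0, 0]
calls = [1 - t for t in truth]   # the tester is wrong on every animal

total_truth = sum(truth)         # number truly diseased
total_calls = sum(calls)         # number the tester called diseased
n_correct = sum(t == c for t, c in zip(truth, calls))

print(total_truth, total_calls, n_correct)   # prints: 3 3 0
```

So O-E is zero here even though not one animal was diagnosed correctly.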
Peter
Peter L. Flom, PhD
Statistical Consultant
Website: www DOT peterflomconsulting DOT com
Writing; http://www.associatedcontent.com/user/582880/peter_flom.html
Twitter: @peterflom
As I understand your postings, your ultimate aim is to rank the testers, using the O/E ratio. I agree that this is similar to the development of ranks for hospitals or surgeons, where they often use logistic regression. Two thoughts:
Sample size~Peter remembered 100 animals or so, per tester. For logistic regression, the most common rule of thumb is at least 10 of the least common events (deaths) per risk factor. You don't mention use of risk factors, but if the mortality rate is low, this could lead to the need for a bigger sample size than you've got. A common indicator is the Hosmer-Lemeshow statistic, to test goodness of fit, and another is the c-index: together these can be used to assess the need for individual risk factors.
League tables~there is a societal need for league tables, but generally speaking they are not welcomed as much by statisticians as by the public who want them. One very good reason lies behind your question: the confidence intervals around any one school (for example) are so wide relative to the rankings that the position of that school could be anywhere over a wide range in the rankings. And a small change in the method of calculating the ranking (for example, in which risk factors are used), can take a school from the bottom area to the top area of the table.
I'm aware I haven't answered how to calculate the CI....I'm leaving that to the previous postings.
----- Original Message ----- From: Ellen Ann To: medstats@googlegroups.com Sent: Wednesday, November 04, 2009 10:19 AM
Subject: {MEDSTATS} Re: confidence interval calculation
As far as I can see, "logistic model to predict whether an animal should be diseased or not" produces a probability for each animal. And each tester produces a decision. So for each tester you could draw up the ROC curve, the tester's outcome being the test, and the predicted probability being the score.
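One way to summarise such a ROC curve in a single number is the area under it, which equals the proportion of (called diseased, called healthy) pairs in which the animal called diseased has the higher predicted probability. The sketch below is only an illustration: the function name and the data are hypothetical, and the area is undefined for a tester who called all or none of their animals diseased.

```python
def auc(scores, labels):
    """Area under the ROC curve by direct pairwise comparison.

    scores: model-predicted probabilities; labels: tester calls (1/0).
    Ties between a positive and a negative count one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None   # undefined: tester called all or none diseased
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical tester with six animals.
model_probs = [0.05, 0.40, 0.10, 0.70, 0.20, 0.60]
tester_calls = [0, 1, 0, 1, 1, 0]
print(auc(model_probs, tester_calls))
```

The pairwise loop is quadratic in the number of animals per tester, which is fine for small groups; a rank-based version would be preferable for large ones.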
Thank you to everyone for your help with this. Just to answer a few
questions:
Bendix: I have 880 testers (who tested at least 10 animals - I'm not sure if
this is an appropriate cut-off?) so running individual ROC curves for each
tester would probably be impossible? However, I could run an overall ROC
curve (I think?) and come up with an appropriate cut-point to allocate
animals as diseased - but is this introducing an extra source of error? Or
is it much better than just summing totals because of the reasons Peter gave
earlier?
Martin: You queried the sample size. I have 1,833 cases and over 92,000
records. In the model I have 8 independent variables. I did not include
tester in the model because a lot of them have 0 diseased animals and this
caused problems. The method I have used (not including tester in the model)
is referred to in other papers as indirect standardisation.
I'm still having a few problems with producing an appropriate confidence
interval. I found a paper that recommends using propagation of errors to
calculate CIs for O/E and they gave a SAS programme on their site to run it
but the programme runs out of memory on my dataset!! Can anyone recommend
any good papers/books/links on how to calculate the appropriate variance for
O/E and maybe even O-E???
Thanks
On Wed, Nov 4, 2009 at 7:35 PM, BXC (Bendix Carstensen) <b...@steno.dk> wrote:
Here it says that logistic regression and indirect standardisation are equivalent only with a fully saturated logistic model, that is with all main effects and all interactions....a bit much with 8 independent variables. The second link discusses the pros/cons of removal of (interaction) terms from the model. And whether using logistic regression is better than the 'manual' method.
Both methods are fully explained and each leads to an SMR....Observed/Expected scaled up by 100.
Googling leads to many ways of calculating a confidence interval for the SMR. This link, eg
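As a sketch of one of those ways, the code below uses Byar's approximation to the exact Poisson limits. It treats the expected count as fixed and the observed count as Poisson, both of which are assumptions, and the function name is hypothetical. The example numbers are the illustrative 52 diagnosed against 45 predicted from earlier in the thread.

```python
import math

def smr_with_byar_ci(observed, expected, z=1.96):
    """SMR (O/E x 100) with Byar's approximate confidence limits.

    Assumptions: O is Poisson distributed and E is known without error.
    """
    smr = 100.0 * observed / expected
    if observed > 0:
        lower_count = observed * (
            1.0 - 1.0 / (9.0 * observed) - z / (3.0 * math.sqrt(observed))
        ) ** 3
    else:
        lower_count = 0.0   # lower limit is zero when nothing is observed
    o1 = observed + 1
    upper_count = o1 * (1.0 - 1.0 / (9.0 * o1) + z / (3.0 * math.sqrt(o1))) ** 3
    return smr, 100.0 * lower_count / expected, 100.0 * upper_count / expected

smr, lower, upper = smr_with_byar_ci(52, 45.0)
print(f"SMR = {smr:.1f}, approx. 95% CI ({lower:.1f}, {upper:.1f})")
```

Because the expected count comes from a fitted model, an interval of this kind understates the total uncertainty; propagation of errors is one way to add the model's contribution.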
----- Original Message ----- From: Ellen Ann To: medstats@googlegroups.com Sent: Thursday, November 05, 2009 10:33 AM
Subject: {MEDSTATS} Re: confidence interval calculation