confidence interval calculation
From: Ellen Ann <ellenst...@gmail.com>
Date: Tue, 3 Nov 2009 15:45:28 +0000
Local: Tues, Nov 3 2009 10:45 am
Subject: confidence interval calculation

Dear all,

I am looking for help to calculate the confidence interval around Observed -
Expected.  My expected values come from a logistic regression model.  Any
suggestions would be most appreciated.


From: (Ted Harding) <Ted.Hard...@manchester.ac.uk>
Date: Tue, 03 Nov 2009 16:57:22 -0000 (GMT)
Local: Tues, Nov 3 2009 11:57 am
Subject: RE: {MEDSTATS} confidence interval calculation
On 03-Nov-09 15:45:28, Ellen Ann wrote:

> Dear all,
> I am looking for help to calculate the confidence interval around
> Observed - Expected.  My expected values come from a logistic
> regression model.
> Any suggestions would be most appreciated.

There is a problem, of some kind, with your question!

Since it is a logistic regression model, the observed outcome will
be 0 or 1 (though if you have several cases at the same values of
the covariates these can be grouped into a binomial outcome of
'r' 1s out of 'n' cases, but you do not describe your data enough
to make it clear which is the case).

The expected outcome, at a given set of covariate values, will be the
fitted probability P of Outcome=1, for some value of P with 0 < P < 1.

So (Observed - Expected) = (1 - P), or (0 - P) = -P.

Since this can have only two values, the concept of a "confidence
interval" for it is a bit elusive. There is, of course, uncertainty
in the value of P, which can be expressed as a standard error for P
(which any decent statistical software should be able to supply).
So there is certainly a confidence interval for P; but the binary
nature of the Observed still interferes with getting one for the
(Observed - Expected).
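To make Ted's point concrete, here is a minimal Python sketch. It assumes the software reports the linear predictor (log-odds) for a covariate pattern together with its standard error; the values -1.0 and 0.25 are hypothetical. The interval for P is built on the logit scale and back-transformed, which keeps the limits inside (0, 1). For a single animal, Observed - Expected still takes only the two values 1 - P and -P.

```python
import math

def expit(x: float) -> float:
    """Inverse logit: maps a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def prob_ci(eta: float, se_eta: float, z: float = 1.96) -> tuple[float, float, float]:
    """Fitted probability P with a Wald interval built on the logit scale.

    eta: linear predictor (log-odds) for one covariate pattern
    se_eta: its standard error, as reported by the fitting software
    """
    return expit(eta), expit(eta - z * se_eta), expit(eta + z * se_eta)

# Hypothetical covariate pattern: log-odds -1.0 with standard error 0.25.
p, lo, hi = prob_ci(-1.0, 0.25)

# For a single animal, Observed - Expected can only be 1 - P or 0 - P.
residuals = (1 - p, 0 - p)
print(f"P = {p:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
print(f"O - E is either {residuals[0]:.3f} or {residuals[1]:.3f}")
```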

It is not like a standard linear regression, where the predicted
(Expected) value has uncertainty and varies continuously, and the
Outcome is distributed about the true mean with (usually) a Normal
distribution, so that (Observed - Expected) is a continuous variable
for which one can certainly calculate a "confidence interval"
(combining the uncertainty about the Expected value with the random
distribution of the Observed value).

There is also the question: What is the interest in calculating
a "confidence interval" for (Observed - Expected) anyway?

If you told us more about what is going on, it may be possible
to give a properly targeted answer.

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Hard...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 03-Nov-09                                       Time: 16:57:20
------------------------------ XFMail ------------------------------


From: Ellen Ann <ellenst...@gmail.com>
Date: Tue, 3 Nov 2009 17:10:29 +0000
Local: Tues, Nov 3 2009 12:10 pm
Subject: Re: {MEDSTATS} Re: confidence interval calculation

Sorry for not giving enough details, I was trying to be brief but I was
probably too brief.

What I'm ultimately trying to do is compare the performance of several
"testers" who are diagnosing a certain disease within animals.  I have
created a logistic model to predict whether an animal should be diseased or
not based on well known (from previous studies) predictors.  I now want to
compare the observed number of diseased animals with the predicted number to
be able to rank the testers in order of performance for diagnosing disease
in the animals (the test is fairly subjective).  I have summed the number of
observed and predicted for each tester and I now want to calculate the
confidence interval around the O-E, but I wasn't too sure how to do this.  I
hope this makes a little more sense?

Thanks for your help.

On Tue, Nov 3, 2009 at 4:57 PM, Ted Harding <Ted.Hard...@manchester.ac.uk> wrote:


From: Peter Flom <peterflomconsult...@mindspring.com>
Date: Tue, 3 Nov 2009 12:34:19 -0500 (GMT-05:00)
Local: Tues, Nov 3 2009 12:34 pm
Subject: Re: {MEDSTATS} Re: confidence interval calculation

Ellen Ann  wrote

<<
Sorry for not giving enough details, I was trying to be brief but I was probably too brief.

What I'm ultimately trying to do is compare the performance of several "testers" who are diagnosing a certain disease within animals.  I have created a logistic model to predict whether an animal should be diseased or not based on well known (from previous studies) predictors.  I now want to compare the observed number of diseased animals with the predicted number to be able to rank the testers in order of performance for diagnosing disease in the animals (the test is fairly subjective).  I have summed the number of observed and predicted for each tester and I now want to calculate the confidence interval around the O-E, but I wasn't too sure how to do this.  I hope this makes a little more sense?
>>


I think this is a continuation of an earlier post of yours.

If I recall correctly, you had several testers, each testing different animals - I think it was a hundred or so animals per tester.  

You then say that you have a logistic regression program that predicts the number of diseased animals.

so, you now have something like this

Tester    Predicted by model    Diagnosed by tester
 1              45                      52
 2              51                      57
 3              42                      38

etc.

Is that right?

And you are saying that the numbers -7, -6 and +4 (Predicted minus Diagnosed) are indicative of the skill of the testers.

This may be, I think, a mistake.

First, it assumes that the prediction of the model is always correct.  If you have such a model, then
why do you need testers who use a subjective test?

Second, it assumes that calling a sick animal healthy is equivalent to calling a healthy animal sick ... I am not sure if this is sensible.

But, assuming that these values ARE what you want, I think the way to go is NOT to try to find the SE of O-E, but to turn these numbers into proportions, and then compare the proportions.
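Peter's table worked through in Python. His figures -7, -6 and +4 are Predicted minus Diagnosed; Observed minus Expected has the opposite signs. Turning the counts into proportions needs each tester's number of animals, and the 100 per tester used here is an assumption based on Peter's recollection, not a figure from the data.

```python
# Peter's illustrative table: tester -> (predicted by model, diagnosed by tester).
TABLE = {1: (45, 52), 2: (51, 57), 3: (42, 38)}
N_PER_TESTER = 100  # assumption: Peter recalls "a hundred or so" animals per tester

def summarise(table, n_per_tester):
    """Return (tester, O - E, observed proportion, expected proportion) rows."""
    rows = []
    for tester, (expected, observed) in table.items():
        rows.append((tester, observed - expected,
                     observed / n_per_tester, expected / n_per_tester))
    return rows

for tester, diff, p_obs, p_exp in summarise(TABLE, N_PER_TESTER):
    print(f"tester {tester}: O - E = {diff:+d}, "
          f"observed {p_obs:.2f} vs expected {p_exp:.2f}")
```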

Peter

Peter L. Flom, PhD
Statistical Consultant
Website: www DOT peterflomconsulting DOT com
Writing; http://www.associatedcontent.com/user/582880/peter_flom.html
Twitter:   @peterflom


From: "Steve Simon, P.Mean Consulting " <n...@pmean.com>
Date: Tue, 03 Nov 2009 12:23:58 -0600
Local: Tues, Nov 3 2009 1:23 pm
Subject: Re: {MEDSTATS} confidence interval calculation

Ellen Ann wrote:
> What I'm ultimately trying to do is compare the performance of several
> "testers" who are diagnosing a certain disease within animals.  I have
> created a logistic model to predict whether an animal should be diseased
> or not based on well known (from previous studies) predictors.  I now
> want to compare the observed number of diseased animals with the
> predicted number to be able to rank the testers in order of performance
> for diagnosing disease in the animals (the test is fairly subjective).  
> I have summed the number of observed and predicted for each tester and I
> now want to calculate the confidence interval around the O-E, but I
> wasn't too sure how to do this.  I hope this makes a little more sense?

Why not use sensitivity and specificity here?

Let the prediction of the logistic model be your "gold standard" of
disease. Let the results of an individual tester be a "diagnostic test".

Suppose the "gold standard" calls 50 animals as diseased. Among these 50
animals, the tester gets 35 right and 15 wrong. Then your sensitivity is
70%.

Among the 200 healthy animals (by the gold standard) a tester gets 180
right and 20 wrong. Then your specificity is 90%.

Plot each rater's sensitivity/specificity pair on a scatterplot. You can
also show confidence limits for these pairs using standard formulas for
a confidence interval for a single proportion.
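A minimal sketch of such limits, using the Wilson score interval on Steve's example counts. The Wilson interval is one of several standard single-proportion intervals; Steve does not say which formula he has in mind, so this choice is an assumption.

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a single binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Steve's example: 35 of 50 diseased and 180 of 200 healthy animals called correctly.
sens, spec = 35 / 50, 180 / 200
sens_ci = wilson_ci(35, 50)
spec_ci = wilson_ci(180, 200)
print(f"sensitivity {sens:.2f}, 95% CI {sens_ci[0]:.3f} to {sens_ci[1]:.3f}")
print(f"specificity {spec:.2f}, 95% CI {spec_ci[0]:.3f} to {spec_ci[1]:.3f}")
```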

The problem here, of course, is that you won't be able to rank the
performance easily. Is someone who has sens=70% and spec=90% better or
worse than someone who has sens=90% and spec=70%? If the cost of a false
positive diagnosis and the cost of a false negative diagnosis are both
the same, then it would not be too outrageous to compute an overall
accuracy. For the first rater, it would be (35+180)/(50+200) = 86%. Note
that this is not the same as the average of 70% and 90% but rather is
weighted towards the specificity because of the greater number of
healthy animals.
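The arithmetic in Steve's example, written out as a small Python check: overall accuracy is the prevalence-weighted average of sensitivity and specificity, here 0.2 * 70% + 0.8 * 90% = 86%, rather than the simple mean of 80%.

```python
# Steve's first rater: 35 of 50 diseased and 180 of 200 healthy called correctly.
tp, n_diseased = 35, 50
tn, n_healthy = 180, 200

sensitivity = tp / n_diseased
specificity = tn / n_healthy
accuracy = (tp + tn) / (n_diseased + n_healthy)

# Accuracy weights sensitivity and specificity by the share of each group.
prevalence = n_diseased / (n_diseased + n_healthy)
weighted = prevalence * sensitivity + (1 - prevalence) * specificity
simple_mean = (sensitivity + specificity) / 2
print(f"accuracy {accuracy:.2f}, weighted {weighted:.2f}, simple mean {simple_mean:.2f}")
```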

I hope this helps.
--
Steve Simon, Standard Disclaimer
Second free statistics webinar, Wed, Nov 4, 11am-noon CST.
"The first three steps in data entry, with examples in PASW/SPSS"
Details at www.pmean.com/webinars


From: Ellen Ann <ellenst...@gmail.com>
Date: Wed, 4 Nov 2009 10:19:53 +0000
Local: Wed, Nov 4 2009 5:19 am
Subject: Re: {MEDSTATS} Re: confidence interval calculation

Dear Peter and Steven

Thank you both very much for your helpful suggestions.

I agree with Peter that I am making the assumption that the model is a gold
standard.  I know it isn’t but I think it’s the best I have to predict which
animals are probably diseased in order to rank the testers.

I think the analysis is similar to other studies looking at performance
indicators for hospitals or surgeons.  They use logistic regression to
calculate risk-adjusted predicted mortalities then compare the O/E ratio.  I
will look at the O/E ratio but I am also interested in the absolute number
they are missing or overestimating so I will also look at O-E.  I tried
using the Observed and Expected proportions, then calculated a confidence
interval based on the difference; however, the resulting limits were quite
wide. I wasn't sure whether I should have tried to incorporate the standard
error from the model for the CI of the expected proportion?
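One common approximation for the O - E question, sketched in Python under stated assumptions: given the model, each animal's outcome is treated as an independent Bernoulli(p_i) draw, so Var(O) = sum p_i(1 - p_i), and E = sum p_i is treated as a fixed constant. This ignores the model's own parameter uncertainty; including it, as Ellen asks, would typically widen the interval somewhat. The function name and data are hypothetical.

```python
import math

def o_minus_e_ci(observed: list[int], probs: list[float], z: float = 1.96):
    """O - E for one tester with an approximate normal-theory interval.

    Assumes independent Bernoulli(p_i) outcomes under the model, so
    Var(O) = sum p_i (1 - p_i); uncertainty in the fitted p_i is ignored.
    """
    o = sum(observed)
    e = sum(probs)
    se = math.sqrt(sum(p * (1 - p) for p in probs))
    diff = o - e
    return o, e, diff, (diff - z * se, diff + z * se)

probs = [0.1] * 5 + [0.3] * 5               # hypothetical fitted probabilities
observed = [0, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # hypothetical tester calls (1 = diseased)
o, e, diff, (lo, hi) = o_minus_e_ci(observed, probs)
print(f"O = {o}, E = {e:.2f}, O - E = {diff:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```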

Peter - I am interested in both extremes: missing diseased animals and
diagnosing healthy animals as diseased.  You are also right in thinking I
have posted this problem before - I mailed the SAS list the previous time
and I thought I had a solution but I didn't.

Steven - thank you for your thoughts on looking into sensitivity/specificity
and the number correct.  The problem with this method is that I have simply
summed the predicted probabilities to calculate the expected number of
diseased animals.  Therefore I don't know exactly which ones are expected to
be diseased.  Maybe I should have used a cut-off value to assign animals to
diseased/not?
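If a cut-off were used, the per-tester comparison could look like this minimal sketch. The 0.5 cut-off, the helper names and the data are hypothetical. With a rare outcome (roughly 2% in the dataset described later in the thread) a 0.5 cut-off may label almost no animals as diseased, so the choice of cut-off matters.

```python
def classify(probs, cutoff=0.5):
    """Hypothetical helper: 1 = model says diseased, 0 = not, by thresholding."""
    return [1 if p >= cutoff else 0 for p in probs]

def sens_spec(reference, calls):
    """Sensitivity and specificity of a tester's calls against a reference."""
    pairs = list(zip(reference, calls))
    tp = sum(1 for r, c in pairs if r == 1 and c == 1)
    fn = sum(1 for r, c in pairs if r == 1 and c == 0)
    tn = sum(1 for r, c in pairs if r == 0 and c == 0)
    fp = sum(1 for r, c in pairs if r == 0 and c == 1)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec

probs = [0.8, 0.6, 0.2, 0.1, 0.7, 0.3]   # hypothetical fitted probabilities
calls = [1, 0, 0, 1, 1, 0]               # hypothetical tester decisions
reference = classify(probs, cutoff=0.5)
sens, spec = sens_spec(reference, calls)
print(f"sensitivity {sens:.2f}, specificity {spec:.2f}")
```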

Thanks once again for your time and effort

On Tue, Nov 3, 2009 at 6:23 PM, Steve Simon, P.Mean Consulting <


From: Peter Flom <peterflomconsult...@mindspring.com>
Date: Wed, 4 Nov 2009 06:40:14 -0500 (GMT-05:00)
Local: Wed, Nov 4 2009 6:40 am
Subject: Re: {MEDSTATS} Re: confidence interval calculation
From: Ellen Ann
<<<
I think the analysis is similar to other studies looking at performance indicators for hospitals or surgeons.  They use logistic regression to calculate risk-adjusted predicted mortalities then compare the O/E ratio.  I will look at the O/E ratio but I am also interested in the absolute number they are missing or overestimating so I will also look at O-E.  I tried using the Observed and Expected proportions, then calculated a confidence interval based on the difference; however, the resulting limits were quite wide. I wasn't sure whether I should have tried to incorporate the standard error from the model for the CI of the expected proportion?
>>>>

The CIs for differences in proportions are very tricky.  There's a big literature on this. 


<<<<
Peter - I am interested in both extremes: missing diseased animals and diagnosing healthy animals as diseased.  You are also right in thinking I have posted this problem before - I mailed the SAS list the previous time and I thought I had a solution but I didn't.
>>>

Another problem with looking at the totals rather than the individual cases is that a person could be wrong on EVERY animal and still get the right total.

Let's say half the animals are diseased.  Well, if someone called every diseased animal healthy and every healthy animal sick, he or she would get EXACTLY the right number of diseased animals.
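Peter's example can be checked in a few lines; the ten animals are hypothetical.

```python
# Hypothetical: 10 animals, half truly diseased (1) and half healthy (0).
truth = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
# A tester who calls every diseased animal healthy and every healthy one diseased.
calls = [1 - t for t in truth]

totals_match = sum(calls) == sum(truth)   # same number of "diseased" calls
accuracy = sum(c == t for c, t in zip(calls, truth)) / len(truth)
print(f"totals match: {totals_match}, per-animal accuracy: {accuracy:.0%}")
```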

In your model, what you are testing is NOT accuracy of diagnosis exactly, although it's partly that. 



Peter

From: "Martin Holt" <m861h...@btinternet.com>
Date: Wed, 4 Nov 2009 11:44:46 -0000
Local: Wed, Nov 4 2009 6:44 am
Subject: Re: {MEDSTATS} Re: confidence interval calculation

Hi Ellen,

As I understand your postings, your ultimate aim is to rank the testers, using the O/E ratio. I agree that this is similar to the development of ranks for hospitals or surgeons, where they often use logistic regression. Two thoughts:

Sample size~Peter remembered 100 animals or so, per tester. For logistic regression, the most common rule of thumb is at least 10 of the least common events (deaths) per risk factor. You don't mention use of risk factors, but if the mortality rate is low, this could lead to the need for a bigger sample size than you've got. A common check is the Hosmer-Lemeshow statistic, to test goodness of fit, and another is the c-index: together these can be used to assess the need for individual risk factors.
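For reference, a minimal pure-Python sketch of the Hosmer-Lemeshow statistic, assuming the usual grouping into roughly equal-sized groups by sorted fitted probability; the function name and toy data are illustrative. The statistic would then be compared with a chi-square distribution on groups - 2 degrees of freedom (for example with scipy.stats.chi2.sf).

```python
def hosmer_lemeshow(outcomes, probs, groups=10):
    """Hosmer-Lemeshow goodness-of-fit statistic.

    Animals are sorted by fitted probability and split into `groups`
    roughly equal-sized groups; each group contributes
    (O - E)^2 / (E * (1 - E / n)), which equals the chi-square sum over
    its diseased and non-diseased cells. Returns (statistic, groups - 2).
    """
    # Stable sort on probability only, so tied animals keep their input order.
    pairs = sorted(zip(probs, outcomes), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        variance = expected * (1 - expected / size)
        if variance > 0:  # skip groups whose fitted probabilities are all 0 or all 1
            stat += (observed - expected) ** 2 / variance
    return stat, groups - 2

# Toy check: 20 animals at P = 0.5 with alternating outcomes, so each group
# of two has exactly one diseased animal and fits perfectly.
stat, df = hosmer_lemeshow([0, 1] * 10, [0.5] * 20, groups=10)
print(stat, df)
```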

League tables~there is a societal need for league tables, but generally speaking they are not welcomed as much by statisticians as by the public who want them. One very good reason lies behind your question: the confidence intervals around any one school (for example) are so wide relative to the rankings that the position of that school could be anywhere over a wide range in the rankings. And a small change in the method of calculating the ranking (for example, in which risk factors are used), can take a school from the bottom area to the top area of the table.

I'm aware I haven't answered how to calculate the CI; I'm leaving that to the previous postings.

Best Wishes,
Martin Holt


From: "BXC (Bendix Carstensen)" <b...@steno.dk>
Date: Wed, 4 Nov 2009 20:35:26 +0100
Local: Wed, Nov 4 2009 2:35 pm
Subject: RE: {MEDSTATS} Re: confidence interval calculation
As far as I can see, "logistic model to predict whether an animal should be
diseased or not" produces a probability for each animal. And each tester produces a decision. So for each tester you could draw up the ROC curve, the tester's outcome being the test, and the predicted probability being the score.

Best regards,
Bendix
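Bendix's suggestion can be sketched as follows, assuming the tester's decisions (1 = called diseased) are the labels and the model's predicted probabilities are the scores. The area under the ROC curve is computed with the Mann-Whitney pair-counting definition (ties count one half), which is the same quantity as the c-index. Names and data are hypothetical. The area is undefined for a tester who called no animal diseased, so the function raises an error in that case.

```python
def auc(labels, scores):
    """Area under the ROC curve by Mann-Whitney pair counting.

    labels: the tester's decisions (1 = called diseased, 0 = called healthy)
    scores: the model's predicted probabilities for the same animals
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one animal in each class")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count one half
    return wins / (len(pos) * len(neg))

# Hypothetical tester: two animals called diseased, two called healthy.
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

Counting all pairs is quadratic in the number of animals, which is negligible at around 100 animals per tester.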


From: Ellen Ann <ellenst...@gmail.com>
Date: Thu, 5 Nov 2009 10:33:14 +0000
Local: Thurs, Nov 5 2009 5:33 am
Subject: Re: {MEDSTATS} Re: confidence interval calculation

Thank you to everyone for your help with this.  Just to answer a few
questions:

Bendix: I have 880 testers (who tested at least 10 animals - I'm not sure if
this is an appropriate cut-off?) so running individual ROC curves for each
tester would probably be impossible?  However, I could run an overall ROC
curve (I think?) and come up with an appropriate cut-point to allocate
animals as diseased - but is this introducing an extra source of error?  Or
is it much better than just summing totals because of the reasons Peter gave
earlier?

Martin: You queried the sample size.  I have 1,833 cases and over 92,000
records.  In the model I have 8 independent variables.  I did not include
tester in the model because a lot of them have 0 diseased animals and this
caused problems.  The method I have used (not including tester in the model)
is referred to in other papers as indirect standardisation.
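A quick check of Martin's rule of thumb against these figures. It assumes the 1,833 "cases" are the diseased animals (the less common outcome) and that each of the 8 independent variables uses one parameter; a categorical variable with several levels would use more.

```python
EVENTS = 1833       # "cases", assumed to be the diseased animals
PREDICTORS = 8      # independent variables in the model
RECORDS = 92_000    # "over 92,000 records", so the prevalence below is an upper bound

epv = EVENTS / PREDICTORS
prevalence = EVENTS / RECORDS
print(f"{epv:.1f} events per predictor (rule of thumb: at least 10)")
print(f"prevalence at most {prevalence:.1%}")
```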

I'm still having a few problems with producing appropriate confidence
interval.  I found a paper that recommends using propagation of errors to
calculate CIs for O/E and they gave a SAS programme on their site to run it
but the programme runs out of memory on my dataset!!  Can anyone recommend
any good papers/books/links on how to calculate the appropriate variance for
O/E and maybe even O-E???

Thanks

On Wed, Nov 4, 2009 at 7:35 PM, BXC (Bendix Carstensen) <b...@steno.dk> wrote:


From: "Martin Holt" <m861h...@btinternet.com>
Date: Thu, 5 Nov 2009 18:57:09 -0000
Local: Thurs, Nov 5 2009 1:57 pm
Subject: Re: {MEDSTATS} Re: confidence interval calculation

Hi Ellen,

I'd like to refer you to this link:

http://www.indicators.scot.nhs.uk/Trends_Jan_2009/Standard.htm

and to this link

http://www.indicators.scot.nhs.uk/Work/CISTWorkingPaper3July13th20011...

Here it says that logistic regression and indirect standardisation are equivalent only with a fully saturated logistic model, that is with all main effects and all interactions, which is a bit much with 8 independent variables. The second link discusses the pros and cons of removing (interaction) terms from the model, and whether using logistic regression is better than the 'manual' method.

Both methods are fully explained and each leads to an SMR: Observed/Expected multiplied by 100.
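A minimal sketch of indirect standardisation as those links describe it: a tester's expected count is the number of animals examined in each stratum multiplied by the pooled stratum-specific rate, summed over strata. The strata, rates and counts below are hypothetical. A saturated logistic model reproduces each covariate cell's pooled observed rate as its fitted probability, which is why summing its fitted probabilities gives the same expected count.

```python
def indirect_expected(tester_counts, reference_rates):
    """Expected diseased count for one tester by indirect standardisation.

    tester_counts: {stratum: animals this tester examined in that stratum}
    reference_rates: {stratum: proportion diseased in that stratum, all testers pooled}
    """
    return sum(n * reference_rates[s] for s, n in tester_counts.items())

def smr(observed, expected):
    """Standardised ratio: Observed / Expected, multiplied by 100."""
    return 100.0 * observed / expected

# Hypothetical strata, pooled rates and counts for a single tester.
rates = {"young": 0.01, "old": 0.05}
counts = {"young": 40, "old": 60}
E = indirect_expected(counts, rates)
O = 5
print(f"E = {E:.2f}, SMR = {smr(O, E):.1f}")
```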

Googling turns up many ways of calculating a confidence interval for the SMR; see, for example, this link:

http://www.openepi.com/SMR/SMR.htm
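One of those options is Byar's approximation to the exact Poisson interval. A minimal sketch, assuming O is Poisson and E is a fixed constant, so uncertainty in the logistic model's coefficients is ignored. The counts are hypothetical, and this is not necessarily the method the OpenEpi page uses by default.

```python
import math

def smr_byar_ci(observed: int, expected: float, z: float = 1.96):
    """SMR (100 * O / E) with Byar's approximate Poisson confidence limits.

    Treats E as fixed and O as a Poisson count.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    o = observed
    # Byar's lower limit for the count; defined as 0 when nothing is observed.
    lower = o * (1 - 1 / (9 * o) - z / (3 * math.sqrt(o))) ** 3 if o > 0 else 0.0
    upper = (o + 1) * (1 - 1 / (9 * (o + 1)) + z / (3 * math.sqrt(o + 1))) ** 3
    return 100 * o / expected, 100 * lower / expected, 100 * upper / expected

# Hypothetical tester: 5 diseased animals observed, 3.4 expected.
s, lower, upper = smr_byar_ci(5, 3.4)
print(f"SMR {s:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```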

I hope that this helps.

Best Wishes,

Martin Holt

