confidence interval for proportions with not-normally distributed data


ak47schumann

Mar 5, 2013, 4:26:09 AM
to statforli...@googlegroups.com
Hi,

I would like to ask how to compute a confidence interval for a proportion when the data are not normally distributed. Basically, what I want to do is estimate the maximum proportion of a certain value in my text corpus. I think this can be done by taking the upper bound of a confidence interval.
I have a proportion of 0.014 on N=500 samples. According to a formula given in Bourier: Wahrscheinlichkeitsrechnung und schließende Statistik (a German textbook on statistics), the normal approximation does not apply here since n*P*(1-P) = 500*0.014*0.986 ≈ 6.9 < 9. The book goes on to say that in such cases the confidence interval can be determined using the F distribution, and gives formulas for calculating the respective degrees of freedom and so on.

In the English version of Statistics for Linguistics I found a brief description of the method for normally distributed data. But is there a way to calculate the confidence interval for non-normal data using R?

Thanks and cheers,
anne

Stefan Th. Gries

Mar 5, 2013, 11:31:42 AM
to statforli...@googlegroups.com
Check out <http://cran.r-project.org/web/packages/binom/index.html>

Cheers,
STG
--
Stefan Th. Gries
-----------------------------------------------
University of California, Santa Barbara
http://www.linguistics.ucsb.edu/faculty/stgries
-----------------------------------------------

Anne Schumann

Mar 6, 2013, 5:10:54 AM
to statforli...@googlegroups.com
Stefan, thank you for this suggestion. However, it is not clear to me how the binomial distribution relates to my case.
A totally different question I have is about Yates's continuity correction. What does it do? Are there restrictions on its use? Can I use it on data sets where I would normally use Fisher's test (expected freqs < 5)?

Best,
anne



Stefan Th. Gries

Mar 6, 2013, 12:37:27 PM
to statforli...@googlegroups.com
> Stefan, thank you for this suggestion. However, it is not clear to me how the binomial distribution relates to my case?
Uh ... You have a proportion, you say, and such proportions are usually
seen as the proportion of successes out of all successes and failures;
the probability of any one ratio of successes and its confidence
interval are then computed based on the binomial distribution.
Now in your case, your heuristic says that this may be problematic
because too high/low success rates may cause problems with the normal
approximation, so I sent you a reference to a package that uses
corrections for this.
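
To make this concrete, here is a minimal sketch in base R of an interval computed from the binomial distribution rather than from the normal approximation. It assumes that the 0.014 on N=500 from the original question corresponds to 7 hits (0.014 * 500); all variable names are illustrative only. binom.test() returns the exact (Clopper-Pearson) interval, and the qf() lines reproduce the same bounds from F quantiles. That is presumably the kind of formula the textbook gives, but this is an assumption, since the book's formulas are not quoted in this thread.

```r
# Counts implied by the question (assumption): 0.014 of N = 500 is 7 hits.
n <- 500
x <- round(0.014 * n)

# Rule of thumb quoted from the textbook: the normal approximation
# is considered acceptable only if n * p * (1 - p) > 9.
p_hat <- x / n
rule_value <- n * p_hat * (1 - p_hat)   # about 6.9, i.e. below 9

# Exact (Clopper-Pearson) two-sided 95% interval from base R.
ci_exact <- binom.test(x, n, conf.level = 0.95)$conf.int

# The same interval written out with quantiles of the F distribution.
alpha <- 0.05
f_lo <- qf(1 - alpha / 2, df1 = 2 * (n - x + 1), df2 = 2 * x)
f_hi <- qf(1 - alpha / 2, df1 = 2 * (x + 1),     df2 = 2 * (n - x))
ci_f <- c(x / (x + (n - x + 1) * f_lo),
          (x + 1) * f_hi / ((n - x) + (x + 1) * f_hi))

# One-sided 95% upper bound, for the case where only the maximum
# plausible proportion is of interest.
upper_one_sided <- binom.test(x, n, alternative = "less",
                              conf.level = 0.95)$conf.int[2]

print(ci_exact)
print(ci_f)
print(upper_one_sided)
```

If only the maximum plausible proportion is wanted, upper_one_sided is the relevant number. The binom package linked above also provides binom.confint(), which returns several interval types side by side.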

> A totally different question I have is about Yates continuity correction. What does it do?
First sentence of
<http://en.wikipedia.org/wiki/Yates%27s_correction_for_continuity> ...

> Are there restrictions for its use? Can I use it on data sets where I would normally use Fisher's test (expected freqs < 5)?
With the Fisher-Yates exact test, you DON'T want to use it; FYE is an
exact test, so there is no chi-squared approximation to correct.
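
A small illustration of the difference, using a made-up 2x2 table (the counts are hypothetical, not from this thread). Yates's correction subtracts 0.5 from each |observed - expected| before squaring, which shrinks the chi-squared statistic; in R it is the correct argument of chisq.test(). fisher.test() computes its p-value exactly and has no such argument.

```r
# Hypothetical 2x2 table with small counts (illustrative data only).
tab <- matrix(c(2, 7, 6, 1), nrow = 2,
              dimnames = list(group = c("A", "B"),
                              outcome = c("yes", "no")))

# Pearson chi-squared test without and with Yates's continuity correction.
# suppressWarnings() hides R's warning that the chi-squared approximation
# may be incorrect because the expected frequencies are small.
chi_plain <- suppressWarnings(chisq.test(tab, correct = FALSE))
chi_yates <- suppressWarnings(chisq.test(tab, correct = TRUE))

# Fisher's exact test: no approximation, hence nothing to correct.
fisher <- fisher.test(tab)

print(chi_plain$expected)   # all expected frequencies are below 5 here
print(c(plain  = chi_plain$p.value,
        yates  = chi_yates$p.value,
        fisher = fisher$p.value))
```

With expected frequencies this small, the exact test is the one to report; the two chi-squared variants are shown only to make the effect of the correction visible.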