
Re: Sample Size


Proginoskes

Nov 10, 2007, 1:50:52 AM
On Nov 9, 1:27 pm, Gregory <> wrote:
> I have a bundle of files that can each be assessed as being either YES or
> NO.
>
> Sometimes there are 300.
>
> Sometimes there are 2000
>
> Sometimes there are 10000
>
> How do you work out how many you need to sample to give an assessment
> of the lot + or - x% accuracy? Is there a standard calculation? Does
> anyone know of any websites that explain all that?

You're looking for some kind of statistical method. I suspect this is
a basic question. (Cross-posted to sci.stat.math for this reason;
someone there can probably give you an answer without having to look
it up.)

--- Christopher Heckman

Nasser Abbasi

Nov 10, 2007, 7:46:34 AM

"Proginoskes" <CCHe...@gmail.com> wrote in message
news:1194677452....@v29g2000prd.googlegroups.com...

I think you can use the confidence interval for a population proportion for
this problem.

The formula is this:

let n = sample size (>= 20)
let p = proportion of YES files in the sample picked.

Then the proportion of YES files in the population will (with 95%
confidence) be within

p +- 1.96 * Sqrt( p*(1-p)/n )

So to bring the estimate p closer to the true population proportion, make n larger. The
larger n is, the narrower the confidence interval is.
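
A minimal sketch of the interval above in Python, assuming the sample is summarized as a count of YES files out of n. The function name proportion_ci and the example counts (60 YES out of 100) are illustrative only, not something from this thread.

```python
import math


def proportion_ci(yes_count, n, z=1.96):
    """Normal-approximation interval p +- z * sqrt(p*(1-p)/n).

    yes_count: number of YES files in the sample (hypothetical input name)
    n:         sample size
    z:         normal quantile, 1.96 for 95% confidence
    """
    if n <= 0 or not 0 <= yes_count <= n:
        raise ValueError("need n > 0 and 0 <= yes_count <= n")
    p = yes_count / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clip to [0, 1] because a proportion cannot fall outside that range.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Made-up example: 60 YES files in a sample of 100.
low, high = proportion_ci(60, 100)
print(f"95% interval: {low:.3f} to {high:.3f}")
```

With these made-up counts the interval runs from about 0.504 to about 0.696.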

Nasser


Luis A. Afonso

Nov 10, 2007, 9:54:02 AM
Date: Nov 10, 2007 7:46 AM
Author: Nasser Abbasi
Subject: Re: Sample Size

**************************************

OK!
One thing that wasn't taken into account: the normal approximation behind the expression is pretty good as long as
    n*p > 5    and    n*(1-p) > 5
which, for n=20, limits the proportion to
    0.25 < p < 0.75
For large n these limits approach 0 and 1 respectively.
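
The rule of thumb above can be checked with a short Python sketch. The function names normal_approx_ok and usable_p_range are hypothetical, and the threshold of 5 is simply the one quoted in this post.

```python
def normal_approx_ok(n, p, threshold=5):
    """True when both n*p and n*(1-p) exceed the rule-of-thumb threshold."""
    return n * p > threshold and n * (1 - p) > threshold


def usable_p_range(n, threshold=5):
    """Open interval of p for which the rule of thumb holds at sample size n."""
    lower = threshold / n
    return lower, 1 - lower


for n in (20, 100, 1000):
    lower, upper = usable_p_range(n)
    print(f"n={n}: {lower} < p < {upper}")
```

For n=20 this reproduces the 0.25 < p < 0.75 limits, and the range widens toward (0, 1) as n grows.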


________

Luis A. Afonso

John Smith

Nov 10, 2007, 9:43:37 PM
Luis,

I have missed you. Why did you not post one of your stupid basic programs so that I could make fun of you.

I do not think you have ever answered a question without writing a stupid basic program.

Why don't you simulate your answer? Then I can have some fun.

Welcome back!

Let's have some fun.

John

Nasser Abbasi

Nov 11, 2007, 1:21:17 AM

"Proginoskes" <CCHe...@gmail.com> wrote in message
news:1194677452....@v29g2000prd.googlegroups.com...
> On Nov 9, 1:27 pm, Gregory <> wrote:
>> I have a bundle of files that can each be assessed as being either YES or
>> NO.
>>
>> Sometimes there are 300.
>>
>> Sometimes there are 2000
>>
>> Sometimes there are 10000
>>
>> How do you work out how many you need to sample to give an assessment
>> of the lot + or - x% accuracy? Is there a standard calculation? Does
>> anyone know of any websites that explain all that?

I've looked more into this. The following shows how to solve for the sample
size itself.

You need to provide the margin of error you are willing to accept. Suppose
you want the estimate of the population proportion to be within, say, 0.1 of
the true proportion, i.e. within +- 10 percentage points of the actual
proportion in the population. Call this value B. Hence B=0.1

Next you need to provide the confidence level you want for this estimate
being within the above margin of error. Say you want the 95%
confidence level.

Next, assume that p=0.5. (This maximizes p*(1-p) and so gives the largest
sample size n, which ensures the sample size you obtain will be sufficient
whatever the true proportion is.)

Hence the procedure to find n is simply to solve for n from the following
equation

B = 1.96 * SQRT( p*(1-p) / n )

For example, for B=0.1, p=0.5, this gives n = 96.04, so the sample size is n = 97

If you want 90% confidence that you are within this margin of error, then
replace 1.96 above with 1.645, and you'll get a sample size of 68.
So you see, the sample size becomes smaller if you are willing to reduce
your confidence level.

If you want the 95% confidence level but a smaller margin of error,
say B=0.07, then the sample size will be 196

etc..
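
As a rough sketch, the calculation above can be written in Python by solving the margin-of-error equation for n, giving n = z^2 * p*(1-p) / B^2, rounded up. The function name sample_size is illustrative only. Exact fractions are used here as a design choice, because with ordinary floating point a result that is exactly a whole number (such as the B=0.07 case) can come out a hair above it and get rounded up one too many.

```python
import math
from fractions import Fraction


def sample_size(margin, z=1.96, p=0.5):
    """Smallest n with z * sqrt(p*(1-p)/n) <= margin.

    margin: acceptable margin of error B, e.g. 0.1
    z:      normal quantile (1.96 for 95%, 1.645 for 90%)
    p:      assumed proportion; 0.5 is the worst case
    """
    # Go through str() so that 0.07 becomes exactly 7/100 rather than
    # the nearest binary floating-point value.
    b, zq, pq = (Fraction(str(v)) for v in (margin, z, p))
    if b <= 0:
        raise ValueError("margin must be positive")
    return math.ceil(zq * zq * pq * (1 - pq) / (b * b))


print(sample_size(0.1))           # 95% confidence, B = 0.1
print(sample_size(0.1, z=1.645))  # 90% confidence, B = 0.1
print(sample_size(0.07))          # 95% confidence, B = 0.07
```

These three calls reproduce the figures 97, 68 and 196 quoted above.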

Nasser


Mike

Nov 11, 2007, 9:50:15 PM
In article <TKhZi.3260$pr6....@newsfe06.phx>, n...@12000.org says...
...and note that the assessment is independent of the population size as long as:
1) the population size is sufficient to supply n samples,
2) the samples can be chosen 'randomly' from the population.

Mike

Allen McIntosh

Nov 11, 2007, 11:38:10 PM
[I'm directing followups away from alt.math.recreational]

Mike wrote:
> In article <TKhZi.3260$pr6....@newsfe06.phx>, n...@12000.org says...

>> let n = sample size (>= 20)
>> let p = proportion of YES files in the sample picked.
>>
>> Then the proportion of YES files in the population will (with 95%
>> confidence) be within
>> p +- 1.96 * Sqrt( p*(1-p)/n )
>> So to bring the estimate p closer to the true population proportion, make n larger. The
>> larger n is, the narrower the confidence interval is.

> ...and note that the assessment is independent of the population size as long as:
> 1) the population size is sufficient to supply n samples,
> 2) the samples can be chosen 'randomly' from the population.

To elaborate a little on what Mike says, and to mention a few assumptions:

1) The formula 1.96*sqrt(...) uses a Normal approximation that doesn't
work well if n*p is too small

2) All this assumes that the OP is sampling without replacement

3) As Mark points out the OP is sampling from a finite population, so
that 1.96*sqrt(...) should really be multiplied by sqrt(1-n/N) where n
is the sample size and N is the population size. This term matters when
the sample is a significant fraction of the population, and matches the
intuition that when the sample is the entire population, there is no
uncertainty in the result
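
To make point 3 concrete, here is a hedged Python sketch that applies the sqrt(1-n/N) factor exactly as stated above and solves for the sample size at the three population sizes from the original question. The function names are hypothetical. The closed form n = n0 / (1 + n0/N), where n0 is the uncorrected size z^2 * p*(1-p) / B^2, is just the algebra of setting the corrected margin equal to B. It is not something given in this thread.

```python
import math


def margin_with_fpc(n, N, p=0.5, z=1.96):
    """Margin z*sqrt(p*(1-p)/n) multiplied by the factor sqrt(1 - n/N)."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    return z * math.sqrt(p * (1 - p) / n) * math.sqrt(1 - n / N)


def sample_size_with_fpc(margin, N, p=0.5, z=1.96):
    """Smallest n whose corrected margin is at most `margin`."""
    n0 = z ** 2 * p * (1 - p) / margin ** 2  # size needed if N were unlimited
    # Solving margin^2 = z^2*p*(1-p) * (1/n - 1/N) for n gives n0 / (1 + n0/N).
    return min(N, math.ceil(n0 / (1 + n0 / N)))


for N in (300, 2000, 10000):
    print(N, sample_size_with_fpc(0.1, N))
```

For B=0.1 at 95% confidence this gives 73, 92 and 96 files for populations of 300, 2000 and 10000, against 97 without the correction, so under these assumptions the factor only matters much for the smallest lot.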

Richard Ulrich

Nov 12, 2007, 8:28:06 PM
On Sun, 11 Nov 2007 23:38:10 -0500, Allen McIntosh
<nos...@mouse-potato.com> wrote:

[snip]


>
> To elaborate a little on what Mike says, and to mention a few assumptions:
>
> 1) The formula 1.96*sqrt(...) uses a Normal approximation that doesn't
> work well if n*p is too small
>
> 2) All this assumes that the OP is sampling without replacement
>
> 3) As Mark points out the OP is sampling from a finite population, so
> that 1.96*sqrt(...) should really be multiplied by sqrt(1-n/N) where n
> is the sample size and N is the population size. This term matters when
> the sample is a significant fraction of the population, and matches the
> intuition that when the sample is the entire population, there is no
> uncertainty in the result

There's "no uncertainty in the result" if it is a vote.

There's the same uncertainty as always if the count is
taken as representative of similar groups.

Every mention of the Finite Sampling Fraction should
include the advice that it is almost never appropriate, if you
are not predicting the vote that has yet to be tabulated on
Election Evening.

--
Rich Ulrich, wpi...@pitt.edu
http://www.pitt.edu/~wpilib/index.html

John Smith

Nov 20, 2007, 3:19:27 PM
Luisa,

Still can't answer a simple question, can you?

John

PS -- Have someone who knows English read posts by myself and by Tomsky. Only a moron would mistake the writing styles but, guess what?

Luis A. Afonso

Nov 20, 2007, 3:29:06 PM
N(0,1)

N(0,2)


N(0,3)


**** Date: Nov 20, 2007 3:19 PM
Author: John Smith
Subject: Re: Sample Size

Luisa,

Still can't answer a simple question, can you?

John

PS -- Have someone who knows English read posts by myself and by Tomsky. Only a moron would mistake the writings styles but, guess what?****


Jean, Joan

In what concern STUPIDITY I found no difference between you and Jackie, Jacqueline.
****

Luis Amaral Afonso

Luis A. Afonso

Nov 20, 2007, 3:30:16 PM

John Smith

Nov 20, 2007, 4:17:46 PM
Luisa,

I wrote:
PS -- Have someone who knows English read posts by myself and by Tomsky. Only a moron would mistake the writings styles but, guess what?****


you wrote: In what concern STUPIDITY I found no difference between you and Jackie, Jacqueline.
****

It's obvious you can't answer a simple statistics question, but can't you follow instructions? I said "some who knows English"; that obviously excludes you.

John

Luis A. Afonso

Nov 21, 2007, 3:40:31 AM
The PROOF that John Smith knows not what is saying

see a new thread I posed

Luis Amaral Afonso
