Sampling From Finite Population with Replacement

Cagdas Ozgenc

Sep 24, 2010, 6:20:50 AM
In statistics text books it is proposed that sampling from a finite
population with replacement is equivalent to sampling from an infinite
population. I find this somewhat misleading.

Suppose that we have a population of size N generated by a random
variable distributed Normal(MeanM, StdDevM). Then take a sample of size
n < N from this population and calculate its average (let's call it MeanS).

MeanS = (1/n)*sum of samples

There is no way you can estimate MeanM in an unbiased fashion. You can
only estimate the population mean (let's call it MeanP), which is not
equal to MeanM, the mean of the random variable that generated the
population.

Is my thinking flawed? Or do we always infer about a hypothetical
infinite population?

Greg Heath

Sep 25, 2010, 5:42:45 PM
On Sep 24, 6:20 am, Cagdas Ozgenc <cagdas.ozg...@gmail.com> wrote:
> In statistics text books it is proposed that sampling from a finite
> population with replacement is equivalent to sampling from an infinite
> population. I find this somewhat misleading.

I doubt that any book proposes that. My impression
is that if you have a random sample containing N
measurements, mean, variance and other estimates
obtained using B random samples of size N with
replacement are asymptotically superior to the
estimate obtained from the original sample.

See
http://en.wikipedia.org/wiki/Bootstrapping_(statistics)
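
For readers unfamiliar with the resampling procedure referred to here, a minimal sketch in Python follows. The data values, the seed, and the helper name `bootstrap_means` are made up for illustration and are not from the thread; the sketch only shows the mechanics of drawing B resamples with replacement from one observed sample.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples, rng):
    """Return the means of n_resamples bootstrap resamples of `sample`.

    Each resample has the same size as `sample` and is drawn with replacement.
    """
    size = len(sample)
    return [
        statistics.fmean(rng.choices(sample, k=size))
        for _ in range(n_resamples)
    ]

# Hypothetical measurements (illustrative only).
sample = [4.1, 5.3, 3.8, 6.0, 4.7, 5.1, 4.4, 5.8, 3.9, 5.0]
rng = random.Random(0)  # fixed seed so the run is repeatable
boot = bootstrap_means(sample, n_resamples=2000, rng=rng)

sample_mean = statistics.fmean(sample)
boot_se = statistics.stdev(boot)  # bootstrap estimate of the standard error
print(f"sample mean = {sample_mean:.3f}, bootstrap SE = {boot_se:.3f}")
```

Note that every resample is built from the same ten observed values, so the bootstrap means scatter around the observed sample mean rather than around the unknown population mean.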

Hope this helps.

Greg

Luis A. Afonso

Sep 25, 2010, 6:58:01 PM
Yes. It is unacceptable that (quote):
" In statistics text books it is proposed that sampling from a finite population with replacement is equivalent to sampling from an infinite population. "
In fact, as Greg pointed out, from ONE SINGLE SAMPLE one can obtain a set of Bootstrap samples able to provide parameter estimates for the Population.
HOWEVER this set, because it is constructed from the same items, is NEVER EQUIVALENT to a set of samples DIRECTLY drawn from the Population.

Rich Ulrich

Sep 25, 2010, 7:25:59 PM
On Fri, 24 Sep 2010 03:20:50 -0700 (PDT), Cagdas Ozgenc
<cagdas...@gmail.com> wrote:

>In statistics text books it is proposed that sampling from a finite
>population with replacement is equivalent to sampling from an infinite
>population. I find this somewhat misleading.
>
>Suppose that we have a population of size N generated by random
>variable Normal(MeanM, StdDevM). Then take samples of size n < N from
>this population and calculate average (let's call it MeanS).
>
>MeanS = (1/n)*sum of samples
>
>There is no way you can estimate MeanM in an unbiased fashion.

Where do you see "bias"? I think you need to check on that word.

> You can
>only estimate population mean (let's call it MeanP) which is not
>equal to MeanM, the mean of random variable that generated the
>population.

This population mean is the best "unbiased estimate" of the
generating mean that you can have here.

Where do you get the notion that an unbiased estimator
has zero error? It is supposed to be zero "on the average".

It is convenient for us that in many cases, the easiest unbiased
estimate of something in particular is smaller than any of
the biased estimates, as well as being generally convenient.

On the other hand, you can divide either by N, (N-1) or
(N+1) to get three different estimates of the variance
of the normal, each of which has its uses. (N-1) gives
unbiased. I think it is (N+1) that gives minimum variance
for the estimate.

>
>Is my thinking flawed? Or do we always infer about an hypothetical
>infinite population?

If we are doing experimental science that is intended as
inferential, there is a future that we point to. For those
cases, there is an infinite population. That's the only case
that most of us ever need to worry about.

When we are predicting the final election returns from
the 10 p.m. returns that include 50% of the precincts,
and using previously known patterns, the N is not infinite.

--
Rich Ulrich

Ray Koopman

Sep 25, 2010, 8:16:33 PM

Dividing by N+1 minimizes the expected squared error
in the estimated variance.

Ray Koopman

Sep 25, 2010, 8:39:12 PM
On Sep 25, 5:16 pm, Ray Koopman <koop...@sfu.ca> wrote:
>
> Dividing by N+1 minimizes the expected squared error
> in the estimated variance.

Sorry, I should have been more specific. N+1 is for samples from
a mesokurtic distribution such as the normal, with mu4/mu2^2 = 3.
In general, the divisor that minimizes the expected squared error
is N + k-2 - (k-3)/N, where k = mu4/mu2^2.
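
The divisor formula above can be checked numerically. The sketch below is an illustration, not part of the original post: it assumes i.i.d. draws, takes S as the sum of squared deviations from the sample mean, and uses the standard result Var(s^2) = (mu4 - (N-3)/(N-1) * mu2^2) / N for the unbiased sample variance. The function names are hypothetical, and all quantities are in units of sigma^4.

```python
def sum_sq_variance(n, kurtosis):
    """Variance of S = sum((x - xbar)^2), in units of sigma^4, for i.i.d. samples."""
    return (n - 1) * ((n - 1) * kurtosis / n - (n - 3) / n)

def mse_of_divisor(c, n, kurtosis):
    """Expected squared error of S / c as an estimate of sigma^2 (sigma^4 units)."""
    bias = (n - 1) / c - 1.0  # E[S / c] - sigma^2, with sigma^2 = 1
    return sum_sq_variance(n, kurtosis) / c**2 + bias**2

def optimal_divisor(n, kurtosis):
    """Divisor from the post: N + k - 2 - (k - 3)/N, with k = mu4/mu2^2."""
    return n + kurtosis - 2 - (kurtosis - 3) / n

n = 10
for c in (n - 1, n, n + 1):
    # Normal case, k = 3: the expected squared error shrinks from N-1 to N to N+1.
    print(c, round(mse_of_divisor(c, n, kurtosis=3.0), 4))

print(optimal_divisor(n, kurtosis=3.0))  # normal (mesokurtic) case
print(optimal_divisor(n, kurtosis=1.8))  # uniform distribution, k = 9/5
```

For the normal case the three printed errors decrease as the divisor goes from N-1 to N+1, in line with the post; N-1 remains the only one of the three that is unbiased.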

Bruce Weaver

Sep 26, 2010, 12:04:44 PM
On 24/09/2010 6:20 AM, Cagdas Ozgenc wrote:
> In statistics text books it is proposed that sampling from a finite
> population with replacement is equivalent to sampling from an infinite
> population. I find this somewhat misleading.
>


I suspect what the book or books are getting at is that the distinction
between samples drawn with and without replacement becomes less
important as the population size increases. Let N be the population
size, and p the probability of being sampled. When one draws a simple
random sample *with replacement*, p = 1/N for every member of the
population on every draw. When one samples *without replacement*,
though, the denominator decreases by 1 after each draw. So p = 1/N on
the first draw. For those not yet drawn, p = 1/(N-1) on the second draw,
1/(N-2) on the third draw, etc. Once a member has been drawn, p = 0
thereafter.

When the population size is large enough, reducing the denominator by 1
each time has a tiny impact on the probability of being drawn; and
likewise the probability of being drawn at any point is very close to 0.
So, when the population is large enough relative to the sample, the
distinction between sampling with and without replacement becomes a
pretty fine one. Here
are some typical notes on all of this:

http://www.ma.utexas.edu/users/parker/sampling/repl.htm
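
The shrinking-denominator argument above can be made concrete with a few lines of Python. This is only an illustrative sketch; the function name and the population sizes are arbitrary choices, not taken from the linked notes.

```python
from fractions import Fraction

def draw_probabilities_without_replacement(population_size, n_draws):
    """Chance that a given not-yet-drawn member is picked on draws 1..n_draws."""
    return [Fraction(1, population_size - i) for i in range(n_draws)]

for population_size in (10, 1_000, 100_000):
    probs = draw_probabilities_without_replacement(population_size, n_draws=5)
    with_replacement = Fraction(1, population_size)  # constant on every draw
    # Ratio of the fifth-draw probability to the with-replacement probability.
    ratio = float(probs[-1] / with_replacement)
    print(population_size, [str(p) for p in probs], round(ratio, 5))
```

As the population grows with the number of draws held fixed, the ratio approaches 1, which is the sense in which the two sampling schemes become hard to tell apart.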

HTH.

--
Bruce Weaver
bwe...@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/Home
"When all else fails, RTFM."

Luis A. Afonso

Sep 27, 2010, 6:04:16 PM
On 24/09/2010 6:20 AM, Cagdas Ozgenc wrote:
> In statistics text books it is proposed that sampling from a finite
> population with replacement is equivalent to sampling from an infinite
> population. I find this somewhat misleading.
>
____

It can be understood only (exclusively) as saying that from a finite
Population one can draw WITH REPLACEMENT as many samples as we wish, exactly as we do from an infinite Population.

Cagdas Ozgenc

Sep 28, 2010, 7:50:05 AM
to wpi...@pitt.edu
On Sep 26, 3:25 am, Rich Ulrich <rich.ulr...@comcast.net> wrote:
> On Fri, 24 Sep 2010 03:20:50 -0700 (PDT), Cagdas Ozgenc
>

Here is what I am trying to say. Take any statistics book and you will
find a statement that starts something like the following:

"You have a population of size N with elements normally distributed
with Mu and Sigma. If we sample from this population with
replacement..." Then they continue calculating population mean and
variance, and then claim that Expected Value of Sample mean is equal
to Mu.

When I read that, it gives me the creeps. First of all, the normal
distribution is a model. Yes, the sample mean will give an unbiased
estimate of the population mean (which is a population parameter, not a
model parameter). But on average it will not be Mu. Sampling with
replacement from a finite population will not give an unbiased
estimate of the model parameters. Either I am not reading my books
carefully or this issue is somehow swept under the rug.

In the infinite population case my understanding is that the population
parameter and the model parameter will converge. But when we talk about
inference do we ever care about the model parameter?

Bruce Weaver

Sep 28, 2010, 10:31:00 AM

I don't understand your objection. Have you ever tried it with a
population small enough so that you can enumerate all possible samples
of a given size? E.g., try the following:

1. Let the population consist of 5 scores: 2, 3, 4, 5, 6

2. Compute the population mean and SD (with N, not N-1, in the
denominator).

3. Draw all possible samples of n=2 (with replacement) from the
population--there are 25 of them. For each one, compute the sample
mean.

4. Compute the mean and SD of the 25 sample means. For the SD, use
N=25 in the denominator, because you have the entire population of
sample means.

Notice that the mean of the sample means = the population mean; and
the SD of the sample means = the population SD over the square root
of the sample size.
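
The four steps above are small enough to run directly. The following Python sketch enumerates all 25 ordered samples; it uses only the standard library, and the variable names are illustrative choices.

```python
from itertools import product
from math import isclose, sqrt
from statistics import fmean, pstdev

population = [2, 3, 4, 5, 6]
n = 2  # sample size

# Step 2: population mean and SD (pstdev divides by N).
pop_mean = fmean(population)
pop_sd = pstdev(population)

# Step 3: all ordered samples of size n drawn with replacement (5**2 = 25).
sample_means = [fmean(sample) for sample in product(population, repeat=n)]

# Step 4: mean and SD of the full set of sample means (again dividing by N = 25).
mean_of_means = fmean(sample_means)
sd_of_means = pstdev(sample_means)

print(len(sample_means), pop_mean, mean_of_means)
print(isclose(sd_of_means, pop_sd / sqrt(n)))
```

This confirms unbiasedness for the population mean of the five listed scores. It says nothing about any process that might have generated those five scores, which is the distinction the original question is about.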

Cagdas Ozgenc

Sep 28, 2010, 2:48:47 PM

Let me try to explain one more time. My questions are usually too deep
down there for me to explain properly.

First of all, I am talking about 3 different things here: model
parameters, population parameters, and sample statistics.

I have no objection to the fact that the population mean can be estimated
by the sample mean in an unbiased way. However, you will commonly find in
textbooks and real-life research that what is being inferred is
not the population mean but the model mean (that of the generating process).

For example take a look at the lecture notes of a stats class in
UCDavis that I just found on the internet (page 5):

http://www.stat.ucdavis.edu/~jie/stat13.winter2010/lec20.pdf

It is trying to show the difference between sampling with replacement
vs sampling without replacement. But that's not the issue here.
There's something else wrong about it.

Starts with the following:

"Suppose the heights of female students entering UC Davis in year 2005
follows a normal distribution, with mean mu and standard deviation
sigma"

First of all, the number of female students entering UC Davis in year 2005
is finite. If this is really our population then there is no way it
can be normally distributed. Normal distribution is a model for
infinite populations. This means that actually we are not looking at a
population but we are looking at a sampling from normal distribution.

Now the rest of the problem:

"A random sample of 100 students are taken" from the above so called
population.

Now we are looking at a sample of a sample. This means that no matter
what you do, you will never find the mean of the normal distribution
(Mu) by repeated sampling. It doesn't matter whether you do it with
replacement or without replacement.

You will end up calculating the mean of the population, which will be
slightly or significantly different from Mu depending on how many
students entered UC Davis in year 2005. This means that our samples of
100 students will be an unbiased estimate of the population mean but a
biased estimate of Mu.

This is the difference between sampling with replacement from a finite
population and sampling from an infinite population. It seems to me
that this is a chronic problem in stat texts.

I understand that when you have 1000 elements in your population the
difference in the result will be minuscule. Or some could say that it
is just a model and a model is not 100% reflection of real life
(that's why it is called a model). However, members of a finite
population can well be generated by a normal random variable, there is
nothing wrong with that. The problem arises when we start calling this
a population and start calculating mean, variance, and confidence
intervals. Here we were trying to capture the essence of the
generating stochastic process (mu, sigma), but we actually ended up
with something else.

To put the final nail in the coffin, let's look at the last sentence
on that page of lecture notes:

"Then X1,...,X100 are i.i.d. N(mu, sigma) random variables". This is
just NOT TRUE!

Cagdas Ozgenc

Sep 28, 2010, 2:56:13 PM

No you cannot. You don't know the data generating process as I
explained to the other poster.

Bruce Weaver

Sep 28, 2010, 5:49:20 PM
On Sep 28, 2:48 pm, Cagdas Ozgenc <cagdas.ozg...@gmail.com> wrote:

>
> Let me try to explain one more time. My questions are usually too deep
> down there for me to explain properly.
>
> First of all I am talking about 3 diferent things here: model
> parameters, population parameters, sample statistics
>
> I have no objection to the fact that population mean can be calculated
> by sample mean in an unbiased way. However you will find commonly in
> text books and real life research that what's trying to be inferred is
> not the population mean but the model mean (or generating process).
>
> For example take a look at the lecture notes of a stats class in
> UCDavis that I just found on the internet (page 5):
>
> http://www.stat.ucdavis.edu/~jie/stat13.winter2010/lec20.pdf
>
> It is trying to show the difference between sampling with replacement
> vs sampling without replacement. But that's not the issue here.
> There's something else wrong about it.
>
> Starts with the following:
>
> "Suppose the heights of female students entering UC Davis in year 2005
> follows a normal distribution, with mean mu and standard deviation
> sigma"
>
> First of all number of female students entering UC Davis in year 2005
> is finite. If this is really our population then there is no way it
> can be normally distributed.

I agree. But nothing else is truly normal either, at least if you're
working with real (rather than simulated) data. George Box summed it
up pretty nicely as follows:

“…the statistician knows…that in nature there never was a normal
distribution, there never was a straight line, yet with normal and
linear assumptions, known to be false, he can often derive results
which match, to a useful approximation, those found in the real
world.” (JASA, 1976, Vol. 71, 791-799)


> Normal distribution is a model for
> infinite populations.

This strikes me as too restrictive. I think the normal distribution
can also serve as a fairly decent model for finite populations,
provided they are large enough. According to the website given below,
UC Davis had a little over 5,000 freshman students in 2005. If even
half were female, the normal distribution described might not be too
bad as a *model* for the height distribution of incoming female
students.

http://facts.ucdavis.edu/student_population_headcount_fall.lasso


> This means that actually we are not looking at a
> population but we are looking at a sampling from normal distribution.
>
> Now the rest of the problem:
>
> "A random sample of 100 students are taken" from the above so called
> population.
>
> Now we are looking at a sample of a sample. This means that no matter
> what you do, you will never find the mean of the normal distribution
> (Mu) by repeated sampling. It doesn't matter whether you do it with
> replacement or without replacement.
>
> You will end up calculating the mean of the population, which will be
> slightly or significantly different from Mu depending on how many
> students entered UC Davis in year 2005. This means that our samples of
> 100 students will be an unbiased estimate of the population mean but a
> biased estimate of Mu.
>
> This is the difference between sampling with replacement from a finite
> population and sampling from an infinite population. It seems to me
> that this is a chronical problem in stat texts.
>
> I understand that when you have 1000 elements in your population the
> difference in the result will be miniscule. Or some could say that it
> is just a model and a model is not 100% reflection of real life
> (that's why it is called a model).

Or as Box said,

"All models are wrong. Some are useful."

> However, members of a finite
> population can well be generated by a normal random variable, there is
> nothing wrong with that. The problem arises when we start calling this
> a population and start calculating mean, variance, and confidence
> intervals. Here we were trying to capture the essence of the
> generating stochastic process (mu, sigma), but we actually ended up
> with something else.
>
> The put the final nail in the coffin, let's look at the last sentence
> on that page of lecture notes:
>
> "Then X1,...,X100 are i.i.d. N(mu, sigma) random variables". This is
> just NOT TRUE!

No, it's not. But I think the question is whether the approximation
is close enough to be useful under the circumstances.

HTH.
--
Bruce Weaver
bwe...@lakeheadu.ca

Ray Koopman

Sep 28, 2010, 6:23:18 PM
On Sep 28, 11:48 am, Cagdas Ozgenc <cagdas.ozg...@gmail.com> wrote:
> [...]

> Now we are looking at a sample of a sample. This means that no matter
> what you do, you will never find the mean of the normal distribution
> (Mu) by repeated sampling. It doesn't matter whether you do it with
> replacement or without replacement.
>
> You will end up calculating the mean of the population, which will
> be slightly or significantly different from Mu depending on how many
> students entered UC Davis in year 2005. This means that our samples
> of 100 students will be an unbiased estimate of the population mean
> but a biased estimate of Mu.

The population mean is an unbiased estimate of the generator mean.
The sample mean is an unbiased estimate of the population mean,
and therefore of the generator mean.
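
The two-step argument above can be written out with the law of total expectation. The notation is introduced here for illustration: \mu is the generator mean (Mu), \bar{X}_P the population mean (MeanP), and \bar{X}_S the sample mean (MeanS). The sketch assumes the population values are i.i.d. draws from the generator and that the sample is a simple random sample from the population.

```latex
\begin{align*}
E\!\left[\bar{X}_S \mid \text{population}\right] &= \bar{X}_P
  && \text{(sample mean is unbiased for the population mean)} \\
E\!\left[\bar{X}_P\right] &= \mu
  && \text{(population mean is unbiased for the generator mean)} \\
E\!\left[\bar{X}_S\right]
  &= E\!\left[\,E\!\left[\bar{X}_S \mid \text{population}\right]\right]
   = E\!\left[\bar{X}_P\right] = \mu
  && \text{(law of total expectation)}
\end{align*}
```

The last line averages over both stages of randomness: the generation of the population and the drawing of the sample from it.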

Rich Ulrich

Sep 28, 2010, 6:38:32 PM
On Tue, 28 Sep 2010 11:48:47 -0700 (PDT), Cagdas Ozgenc
<cagdas...@gmail.com> wrote:

My understanding of your problem differs from what
Bruce is arguing.

[snip, previous posts]


>
>Let me try to explain one more time. My questions are usually too deep
>down there for me to explain properly.
>
>First of all I am talking about 3 diferent things here: model
>parameters, population parameters, sample statistics
>
>I have no objection to the fact that population mean can be calculated
>by sample mean in an unbiased way. However you will find commonly in
>text books and real life research that what's trying to be inferred is
>not the population mean but the model mean (or generating process).
>
>For example take a look at the lecture notes of a stats class in
>UCDavis that I just found on the internet (page 5):
>
>http://www.stat.ucdavis.edu/~jie/stat13.winter2010/lec20.pdf
>
>It is trying to show the difference between sampling with replacement
>vs sampling without replacement. But that's not the issue here.
>There's something else wrong about it.
>
>Starts with the following:
>
>"Suppose the heights of female students entering UC Davis in year 2005
>follows a normal distribution, with mean mu and standard deviation
>sigma"

I agree that the above is ambiguous if you really want to press
the point. It uses mu and sigma, which describe populations.
It does not state whether the population is the class of 2005,
or something wider that would be more useful for generalization.

>
>First of all number of female students entering UC Davis in year 2005
>is finite. If this is really our population then there is no way it

First, that is not necessarily the population. The class of 2005
is not described here, necessarily, as other than a sample from an
infinite population with those (assumed) parameters.

On the other hand, I see no need at all that a sample drawn
with a particular distribution needs to be infinite. A single
observation can be "drawn from a distribution."

And if the class of 2005 is the "population", then "normal" is
a description which is assumed, or presumed to be close enough
for the purposes of some problem.


>can be normally distributed. Normal distribution is a model for
>infinite populations. This means that actually we are not looking at a
>population but we are looking at a sampling from normal distribution.
>
>Now the rest of the problem:
>
>"A random sample of 100 students are taken" from the above so called
>population.
>
>Now we are looking at a sample of a sample.

The terminology of "population" versus "sample" is not
particularly illuminating unless one is going to talk about the
finite sampling correction, or otherwise make use of the
limitation of the total N. This could be a sample from a sample,
or a sample from a population.

> This means that no matter
>what you do, you will never find the mean of the normal distribution
>(Mu) by repeated sampling. It doesn't matter whether you do it with
>replacement or without replacement.

So the mean is wrong. It does not pretend to be an estimate with zero
error, only with zero bias when the whole procedure is applied many
times. As I tried to emphasize in my previous reply, "zero bias" does
not say that the estimate has zero variance.
For normal, it does not even have the least possible variance.

>
>You will end up calculating the mean of the population, which will be
>slightly or significantly different from Mu depending on how many
>students entered UC Davis in year 2005.

Huh? If 2005 *is* the "population", how could "the mean of the
population" be "significantly different from Mu"?


> This means that our samples of
>100 students will be an unbiased estimate of the population mean but a
>biased estimate of Mu.

"Population mean" is Mu, by conventional definition. So, I can't
see how you can accept that the samples of 100 are unbiased in
estimating one but not the other.

>
>This is the difference between sampling with replacement from a finite
>population and sampling from an infinite population. It seems to me
>that this is a chronical problem in stat texts.

You've lost me for good.

>
>I understand that when you have 1000 elements in your population the
>difference in the result will be miniscule. Or some could say that it
>is just a model and a model is not 100% reflection of real life

By "model", do you refer to the "sample"?

>(that's why it is called a model). However, members of a finite
>population can well be generated by a normal random variable, there is
>nothing wrong with that. The problem arises when we start calling this
>a population and start calculating mean, variance, and confidence
>intervals. Here we were trying to capture the essence of the
>generating stochastic process (mu, sigma), but we actually ended up
>with something else.
>
>The put the final nail in the coffin, let's look at the last sentence
>on that page of lecture notes:
>
>"Then X1,...,X100 are i.i.d. N(mu, sigma) random variables". This is
>just NOT TRUE!

If X1, ..., X100 are individuals sampled, I thought that was the
starting point. I don't see it as a conclusion.

--
Rich Ulrich

Cagdas Ozgenc

Sep 28, 2010, 11:49:38 PM

Of course normal is a good model for finite populations as well. I
agree with you on that.

The point I was trying to make is that once you start using a
model (this can be any probability distribution with infinitely many
values: normal, uniform, or even a discrete distribution with infinite
variety in values; or density estimation, for example), there is indeed
a difference between sampling from an infinite population and sampling
from a finite population with replacement.

Cagdas Ozgenc

Sep 28, 2010, 11:54:07 PM

I think you have a point here. But as you can see, there is a
problem with consistency.

Let's say that the generator mean is Mu, and the population mean is Mu + Eps.
And I take it, as you suggest, that Eps is a random error, not a systematic
error (not a bias).

Now as you take more and more sample means, you will see that they
will start to gather around Mu+Eps not Mu. Now do we have a random
error or a systematic error?

Cagdas Ozgenc

Sep 29, 2010, 12:21:42 AM

> I agree that the above is ambiguous if you really want to press
> the point.  It uses mu and sigma which describe populations
> It  does not state whether the population is the class of 2005,
> or something wider that would be more useful for generalization.
>

That's not the issue. Take any finite population with a data
generating process behind it. The population mean is an unbiased
estimate of the mean of the data generating process, as Ray pointed
out. But once you start drawing samples from that population, your
random error turns into a systematic error (a bias).

Luis A. Afonso

Sep 29, 2010, 7:56:34 AM
When a problem is ill posed it is difficult to get what it is really asked for.
Suppose I generate data from a NORMAL MODEL N(mu, var). Considering any finite set N of values we know that, generally speaking, the sample mean, m, does not coincide with mu.
_______ mu = m + bias
This bias tends to ZERO when N grows to infinity.
The same holds for the variance and its unbiased estimator, the sample variance, svar, which is the sum of the squared deviations divided by N-1.
________ svar = var + bias´

Cagdas Ozgenc

Sep 29, 2010, 8:12:01 AM

I don't know what question you answered here but I will disagree with
you here again. What you are describing here is not a bias, it is a
random error. If N is a random sampling from the normal model you
specified, calculated sample mean contains a random error not a bias
because error is just related to samples you selected, not to a
systematic mistake you made in your experiment.

Luis A. Afonso

Sep 29, 2010, 1:59:13 PM
You are right: they are
Estimation errors = sampling errors

http://en.wikipedia.org/wiki/Sampling_error

Ray Koopman

Sep 29, 2010, 2:54:35 PM

It all depends on whether we're talking about the conditional
distribution of the sample mean, given the population mean; or the
unconditional (or marginal) distribution of the sample mean. As an
estimate of the generator mean, the sample mean is conditionally
biased but marginally unbiased.

Cagdas Ozgenc

Sep 29, 2010, 3:03:24 PM

>
> >> The population mean is an unbiased estimate of the generator mean.
> >> The sample mean is an unbiased estimate of the population mean,
> >> and therefore of the generator mean.
>
> > I think you have a point here. But as you can see that there is a
> > problem with consistency.
>
> > Let's say that generator mean is Mu, and population mean is Mu + Eps.
> > And I take as you suggest Eps is a random error not a systematic error
> > (not a bias).
>
> > Now as you take more and more sample means, you will see that they
> > will start to gather around Mu+Eps not Mu. Now do we have a random
> > error or a systematic error?
>
> It all depends on whether we're talking about the conditional
> distribution of the sample mean, given the population mean; or the
> unconditional (or marginal) distribution of the sample mean. As an
> estimate of the generator mean, the ssmple mean is conditionally
> biased but marginally unbiased.
>

I don't think I am following you. How is all that related to
conditioning?

Rich Ulrich

Sep 29, 2010, 3:09:48 PM

That's clever, but basically wrong. That is not the definition
of bias that we had in play previously.

You can get closer and closer to obtaining the value of
the population mean; but you never have more precision
than what the population mean provides, in regards to
estimating the underlying process.

So? That is the error of a single sampling (the "population").

Yes, colloquially speaking, we say that any single drawing of
a sample is going to be biased, or it gives a biased estimate.
But the relevant meaning when we speak of "an unbiased
statistic" is limited to the venue of the procedure that is being
repeated.


Subsamples give an unbiased estimate of the sample.
The sample gives an unbiased estimate of the generating
process -- and the mean of the whole sample has smaller error
than any of its subsamples will have. Technically, we want to
say that subsamples *do* give an unbiased estimate of the
generating process, (inevitably) with larger error than the
whole sample.

The prospect of mis-statement arises from imagining that
using the subsamples can escape the original error of the
sample. Even though we may casually call it "biased" when
we describe its effect, that is applying the adjective on a
different level of intercourse.

--
Rich Ulrich

Cagdas Ozgenc

Sep 29, 2010, 3:39:14 PM
On Sep 29, 23:09, Rich Ulrich <rich.ulr...@comcast.net> wrote:
> On Tue, 28 Sep 2010 21:21:42 -0700 (PDT), Cagdas Ozgenc
>

I am glad that we are now at least on the same ground.

If I look at the definition of Sampling Bias in Wikipedia it is
actually exactly what you describe above.

http://en.wikipedia.org/wiki/Sampling_bias

Even though I understand you, I don't understand why you think I am
wrong. Is it on the definition of the word "bias" that we differ?
If so, according to which source (book, article, etc.)?

Ray Koopman

Sep 29, 2010, 3:57:14 PM

In the marginal distribution all the error is random, and the sample
mean is an unbiased estimate of the generator mean. In the
conditional distribution there is both random and systematic error;
the sample mean is a biased estimate of the generator mean, with the
bias being the unknown but fixed difference between the population
mean and the generator mean.
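
A small simulation can illustrate the conditional/marginal distinction described above. This is a sketch under assumed values, not something posted in the thread: the generator is Normal(0, 1), the population has 20 members, samples of 5 are drawn with replacement, and the seed, sizes, and function names are arbitrary.

```python
import random
from statistics import fmean

MU, SIGMA = 0.0, 1.0           # generator ("model") parameters, assumed
POP_SIZE, SAMPLE_SIZE = 20, 5  # small on purpose, to make the effect visible
REPS = 20_000

rng = random.Random(1)  # fixed seed so the run is repeatable

def make_population():
    """Stage 1: generate a finite population from the normal generator."""
    return [rng.gauss(MU, SIGMA) for _ in range(POP_SIZE)]

def sample_mean(population):
    """Stage 2: mean of a with-replacement sample from the given population."""
    return fmean(rng.choices(population, k=SAMPLE_SIZE))

# Conditional view: one fixed population, sampled again and again.
fixed_population = make_population()
pop_mean = fmean(fixed_population)
conditional_avg = fmean([sample_mean(fixed_population) for _ in range(REPS)])

# Marginal view: a fresh population is generated before every sample.
marginal_avg = fmean([sample_mean(make_population()) for _ in range(REPS)])

print(f"generator mean         = {MU:.3f}")
print(f"fixed population mean  = {pop_mean:.3f}")
print(f"avg of sample means, fixed population   = {conditional_avg:.3f}")
print(f"avg of sample means, fresh populations  = {marginal_avg:.3f}")
```

With the population held fixed, the sample means settle around that population's mean, whatever its offset from the generator mean happens to be. When the population is regenerated each time, they settle around the generator mean.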

Cagdas Ozgenc

Sep 29, 2010, 4:12:02 PM

Sorry I didn't make myself clear. Basically I am trying to relate your
conclusion to my initial question. What does this in general tell us
about sampling from an infinite population vs sampling from a finite
population with replacement? Can I conclude that they cannot be
treated equally? Why is this issue never mentioned in stat texts?

Ray Koopman

Sep 29, 2010, 5:40:04 PM

You're asking about more than just the difference between sampling
from an infinite population and sampling with replacement from a
finite population. You specified that the population was generated by
a random process and that you wanted to estimate the generator mean.
That means you're doing two-stage sampling, which is something that
most texts (other than those on survey sampling) do not get into,
and that you must also specify whether you are interested in the
conditional or the marginal distribution of the sample mean.
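
The two-stage structure also shows up in the variance of the sample mean. The following is a sketch under assumptions that go beyond the post: the N population values are i.i.d. from a generator with variance \sigma^2, the sample of size n is drawn with replacement, and \sigma_P^2 = \frac{1}{N}\sum_i (x_i - \bar{X}_P)^2 denotes the variance of the realized population. The symbols \bar{X}_P and \bar{X}_S are the population and sample means.

```latex
\begin{align*}
\operatorname{Var}\!\left(\bar{X}_S\right)
  &= E\!\left[\operatorname{Var}\!\left(\bar{X}_S \mid \text{population}\right)\right]
   + \operatorname{Var}\!\left(E\!\left[\bar{X}_S \mid \text{population}\right]\right) \\
  &= E\!\left[\frac{\sigma_P^2}{n}\right] + \operatorname{Var}\!\left(\bar{X}_P\right) \\
  &= \frac{N-1}{N}\,\frac{\sigma^2}{n} + \frac{\sigma^2}{N}
   \;=\; \frac{\sigma^2}{n} + \frac{n-1}{n}\,\frac{\sigma^2}{N}.
\end{align*}
```

Compared with n direct draws from the generator, which give \sigma^2/n, the marginal variance carries an extra term that vanishes as N grows. That term is one way to quantify the gap between sampling with replacement from a finite generated population and sampling from the generator itself.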

Cagdas Ozgenc

Sep 30, 2010, 6:30:49 AM

Thank you. That pretty much summarizes and concludes what I wanted
to discuss with the group.

I still think it's strange that it is quite common for stat texts to
have a generating distribution in their sample problems ("suppose that
we have a pop with Normal(mu, sigma)" sort of stuff and then use Mu
and Sigma in various calculations), and yet still contain the argument
that sampling from an infinite pop is equal to sampling from a finite
pop with replacement. This issue is too insidious.

Thanks once again.
