An estimation question

anal...@hotmail.com

Apr 28, 2012, 11:18:48 AM

This problem occurs a lot in real life.

You sample n people, and a proportion p of them are found to be
carrying a red flag (they support some political party, prefer a brand
of soap, etc.). Textbooks say that the estimate of the proportion
carrying the
red flag in the total population is p with a variance of n.p.(1-p).
This would indicate that p close to 0 or 100 pct can be estimated with
smaller samples than p around 50 pct with the same confidence.

But suppose we have carried out these samplings repeatedly and past
results show that the proportion carrying the red flag always comes in
between 0 and say 15 pct. We can even estimate a histogram
distribution of p from past samples. If we now make a new sampling of
n items - and we wish to rely on the past sampling results, how would
the mean and variance estimates change?

Thanks for any replies.

anal...@hotmail.com

Apr 28, 2012, 2:42:50 PM

On Apr 28, 11:18 am, "analys...@hotmail.com" <analys...@hotmail.com>
wrote:

> This problem occurs a lot in real life.
>
> You sample n people, and a proportion p of them are found to be
> carrying a red flag (they support some political party, prefer a brand
> of soap, etc.). Textbooks say that the estimate of the proportion
> carrying the
> red flag in the total population is p with a variance of n.p.(1-p).

correction: variance = p(1-p)/n.
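
A minimal sketch of the corrected formula, to show how the standard error sqrt(p(1-p)/n) depends on p and n. The function names are made up for illustration, and the sample-size helper targets a fixed absolute standard error, which is only one possible reading of "the same confidence".

```python
import math


def proportion_std_error(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1.0 - p) / n)


def sample_size_for_std_error(p, target_se):
    """Smallest n for which sqrt(p * (1 - p) / n) is at most target_se."""
    return math.ceil(p * (1.0 - p) / target_se ** 2)


if __name__ == "__main__":
    # Compare a proportion near the edge with one near the middle.
    for p in (0.05, 0.15, 0.50):
        print(p, proportion_std_error(p, 400), sample_size_for_std_error(p, 0.02))
```

For a fixed absolute standard error, the required n is largest at p = 0.5 and shrinks toward either end. The relative error (standard error divided by p) behaves differently and grows as p approaches 0.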

David Jones

Apr 30, 2012, 3:29:20 PM

wrote in message
news:b425cd93-987c-4e5b...@l18g2000vbx.googlegroups.com...

One approach is to choose another distribution that is more suitable. Older
standard textbooks outline methods for choosing a distribution: one based on
the index of dispersion, and one based on computing the proportions q_k of
samples with k "successes" and then plotting the ratio of adjacent values as
a function of k ... standard distributions have recognisable forms.

If you just want to stick to what you have outlined, then you can use your
"sample of past values" of p to estimate the standard deviation you want,
which is a combination of between and within sample variations.
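
The combination of between-sample and within-sample variation can be sketched as a simple method-of-moments calculation. This is one possible reading of the suggestion above, not necessarily the calculation the poster had in mind. The function name, the unweighted averaging, and the truncation of a negative between-sample estimate at zero are all assumptions made for illustration.

```python
from statistics import mean, variance


def variance_components(past_p, past_n, new_n):
    """Split the spread of past sample proportions into two parts.

    past_p: observed proportions from earlier samples
    past_n: the corresponding sample sizes
    new_n:  size of the planned new sample
    Returns (mean proportion, between-sample variance,
             within-sample variance for the new sample, their sum).
    """
    p_bar = mean(past_p)
    # Spread of the observed proportions: between plus within variation.
    observed_var = variance(past_p)
    # Average binomial sampling variance of the past samples.
    within_past = mean([p * (1.0 - p) / n for p, n in zip(past_p, past_n)])
    # What is left over is attributed to real differences between samples.
    between = max(observed_var - within_past, 0.0)
    # Binomial sampling variance expected for the new sample.
    within_new = p_bar * (1.0 - p_bar) / new_n
    return p_bar, between, within_new, between + within_new


if __name__ == "__main__":
    # Hypothetical history of five past samples, for illustration only.
    print(variance_components([0.04, 0.09, 0.12, 0.07, 0.14],
                              [800, 1000, 900, 1200, 1000], 1000))
```

When the past samples are large, the between-sample term dominates, so increasing the size of the new sample reduces the total variance only up to a point.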

Rich Ulrich

Apr 30, 2012, 9:37:45 PM

I thought you would elicit some sort of Bayesian answer,
but that hasn't happened.

Bayesian computation uses a "prior distribution" and
comes up with a combined, Bayesian estimate -- but that
is not the same, exactly, as reporting Mean and SD.
And I'm not a Bayesian advocate, nor am I up-to-speed
on what they are doing, but my impression is that the
effect, in terms of narrowing or modifying the estimators,
is ordinarily of the magnitude that you get by adding a
total of 1 case, or very few cases, to the observed sample
size.
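
One way to make the "adding cases" picture concrete is the conjugate Beta prior for a binomial proportion: a Beta(a, b) prior acts like a + b extra observations, a of them "successes". The sketch below fits a Beta prior to the mean and variance of past proportions by the method of moments and then updates it with a new sample. The function names are hypothetical, and treating the past proportions as draws from a Beta distribution is an assumption.

```python
def fit_beta_by_moments(mu, var):
    """Beta(a, b) parameters matching a given mean and variance."""
    if not 0.0 < mu < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if not 0.0 < var < mu * (1.0 - mu):
        raise ValueError("variance must be positive and below mu * (1 - mu)")
    # a + b plays the role of the prior's "equivalent sample size".
    strength = mu * (1.0 - mu) / var - 1.0
    return mu * strength, (1.0 - mu) * strength


def beta_binomial_update(a, b, successes, n):
    """Posterior mean and variance after observing successes out of n."""
    a_post = a + successes
    b_post = b + (n - successes)
    total = a_post + b_post
    post_mean = a_post / total
    post_var = a_post * b_post / (total ** 2 * (total + 1.0))
    return post_mean, post_var


if __name__ == "__main__":
    # Hypothetical history: past proportions with mean 0.08 and SD 0.03.
    a, b = fit_beta_by_moments(0.08, 0.03 ** 2)
    print("prior equivalent sample size:", a + b)
    print("posterior mean, variance:", beta_binomial_update(a, b, 12, 200))
```

With a flat Beta(1, 1) prior the effect is indeed that of adding two cases. A prior fitted to a tight history of past proportions can carry considerably more weight than that.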

If you want to make a statement based on a long time-series
of observations, there are classical techniques that *might*
be applicable -- What is appropriate would depend on
whether you are tapping some dimension that is
constant ("is thought to be constant") or that might
have a slow change, relative to the number of census points.

For the simplest instance -- If there is no change expected
or suggested by the data, you might decide to pool all the
available data, and present the overall mean and SD, based
on the total N. -- If that comes to a really large N, it will
produce a SD that is too small, because it will not take into
account the standard error of the bias of the estimations.
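
A minimal sketch of the pooling described above, assuming every sample is a simple random sample from the same unchanging population. The function name and the example counts are made up for illustration.

```python
import math


def pooled_estimate(successes, sizes):
    """Pool several samples into one proportion and its standard error."""
    total_m = sum(successes)
    total_n = sum(sizes)
    p = total_m / total_n
    # Binomial standard error based on the total N only.
    se = math.sqrt(p * (1.0 - p) / total_n)
    return p, se


if __name__ == "__main__":
    # Hypothetical counts from four past samples.
    print(pooled_estimate([40, 95, 110, 70], [800, 1000, 900, 1200]))
```

The standard error here keeps shrinking as the total N grows, whether or not the samples really share one underlying proportion. That is why a very large pooled N can give an SD that is too small.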

If there is slow change, you might argue for a time-series
projection. That would mainly use the most recent points,
but it might afford a more precise estimate of the present
mean than you get by using the latest data alone.

--
Rich Ulrich

anal...@hotmail.com

May 2, 2012, 7:51:05 PM

On Apr 30, 9:37 pm, Rich Ulrich <rich.ulr...@comcast.net> wrote:

Thanks, Rich and David. I thought some more about the problem and it
seems to me that we have to specify what is being measured.

(1) The classical problem can be stated in terms of an urn with a
fixed number of black and white balls. You sample n balls (the number
of balls in the urn is >> n so that replacement or non-replacement
doesn't matter) and m of them turn out to be black. m/n is the best
estimate of the proportion of black balls in the urn.

(2) In the "Bayesian" version there are k urns with the proportion of
black balls in urn j being p(j), which are all known. You control n,
the number of balls sampled, but they all come from a single urn whose
identity is not known to you. The problem here is: if m of them
turn out to be black, what is the probability that they all came from
urn j?
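
The urn question in (2) can be sketched with Bayes' rule. The post does not say how likely each urn is before sampling, so the equal prior used by default below is an assumption, as are the function name and the example proportions.

```python
def urn_posterior(urn_props, m, n, prior=None):
    """Probability of each urn given m black balls in n draws.

    urn_props: known proportion of black balls p(j) for each urn
    prior:     prior probability of each urn (equal weights if omitted)
    """
    k = len(urn_props)
    if prior is None:
        prior = [1.0 / k] * k
    # Binomial likelihood of m blacks in n draws from each urn; the
    # binomial coefficient is the same for every urn and cancels out.
    weights = [w * p ** m * (1.0 - p) ** (n - m)
               for w, p in zip(prior, urn_props)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no urn can produce the observed sample")
    return [w / total for w in weights]


if __name__ == "__main__":
    # Three hypothetical urns with 5, 10 and 15 pct black balls.
    print(urn_posterior([0.05, 0.10, 0.15], m=10, n=100))
```

For very large n the direct powers can underflow to zero, and a version that works with logarithms would be safer.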

In real life - case (1) applies when you are measuring an objective
reality outside your sampling - such as the proportion of women or
left-handed people in a population. In this case the observed
variation arises purely from the finiteness of the sampling.
Successive samples should simply be cumulated to get the best estimate
of the population proportion.

Case (2) applies when each sampling is actually a "campaign" of sorts
- you send out n mailings that solicit some action and the response
rate is not something that is objectively out there, independent of your
measurement. But if all "campaigns" are not too dissimilar from each
other, then past response rates can be used as a guide as to what to
expect. In this case there are two sources of variation - first, which
past campaign your current campaign is most similar to and, second, the
normal sampling variation from finite sampling.

Rich Ulrich

May 2, 2012, 10:20:32 PM

If you are speaking, precisely, of multiple samples of
one fixed population, you should have described it differently
than you did when you compared it to repeated samples over time.
- Sum the samples if they are random.

>
>(2) In the "Bayesian" version there are k urns with the proportion of
>black balls in urn j being p(j), which are all known. You control n,
>the number of balls sampled, but they all come from a single urn whose
>identity is not known to you. The problem here is: if m of them
>turn out to be black, what is the probability that they all came from
>urn j?

Maybe it is because I know too little about Bayesian estimation,
but that doesn't sound to me like any general description of
Bayesian estimation.

>
>In real life - case (1) applies when you are measuring an objective
>reality outside your sampling - such as the proportion of women or
>left-handed people in a population. In this case the observed
>variation arises purely from the finiteness of the sampling.
>Successive samples should simply be cumulated to get the best estimate
>of the population proportion.
>
>Case (2) applies when each sampling is actually a "campaign" of sorts
>- you send out n mailings that solicit some action and the response
>rate is not something that is objectively out there, independent of your
>measurement. But if all "campaigns" are not too dissimilar from each
>other, then past response rates can be used as a guide as to what to
>expect. In this case there are two sources of variation - first, which
>past campaign your current campaign is most similar to and, second, the
>normal sampling variation from finite sampling.

Sure, if you can identify all the Fixed Factors that are relevant,
(prevalence of the factors, and effects of each of them)
you can get a smaller range for an estimate of the overall proportion.

That's just ANOVA.

--
Rich Ulrich