
Probability of Outcome


Eugene Kononov

Oct 19, 2003, 7:16:04 PM
I was watching the recent California gubernatorial election in
progress, and very soon after I saw the intermediate results
indicating 55%/45% in favor of the recall with only 19% of the
precincts reporting, Gray Davis was announced to be out of a job by
the networks.

Well, I thought I understood the law of large numbers, but apparently
not. With only 19% of the votes counted, and a 10-point spread
between the two sides, I thought there was a significant chance that
the final result might be different. Something like 0.05 probability,
just off the top of my head. Apparently, that probability is much
lower, and that's what motivated the proclamations of Davis' defeat.

Can anyone demonstrate how such a probability can be calculated,
statistically? More specifically, given a normal distribution of
votes, 19% of the votes counted, and a 55%/45% split so far, what is
the probability that after 100% of the votes are in, the final
result would be, say, 49.99%/50.01%?

Kenmlin

Oct 19, 2003, 9:41:25 PM
First of all, they assumed that the data from the first 19% of the
precincts are an unbiased representation of all precincts.

Based on the available information, you could say that

P(a person will vote for Arnold) = 0.55

Then you'd have to determine how many votes constitute 49.99%. If
there is a total of 1 million votes, then your question becomes

P(499,900 people voted for Arnold out of 1,000,000)

You can derive the probability from the binomial distribution,
although it's probably not computationally feasible directly, since
you'd have to calculate combinations of very large numbers.

You could also run a bunch of simulations and see how often Arnold
gets 49.99% of the votes.
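
For instance, a minimal sketch in Python (scipy and numpy assumed;
the 1,000,000-vote total and p = 0.55 are just the illustrative
figures above):

    import numpy as np
    from scipy.stats import binom

    n, p = 1_000_000, 0.55   # invented total; p from the early returns
    k = 499_900              # 49.99% of the vote

    # Exact binomial log-probability of exactly k votes for the leader.
    # The raw pmf underflows to zero at this scale, so work in logs.
    print(binom.logpmf(k, n, p))   # on the order of -5000, i.e. e^-5000

    # Monte Carlo: simulate many elections and count how often the
    # leader finishes at or below 49.99% of the total.
    sims = np.random.binomial(n, p, size=100_000)
    print((sims <= k).mean())      # 0.0 -- essentially never happens

Either way, with a true 55/45 split, a million-vote count essentially
never lands near 50/50.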

Hope that helped.

Ken

Stan Brown

Oct 19, 2003, 10:18:39 PM
In article <d4a4fd16.03101...@posting.google.com> in
sci.stat.math, Eugene Kononov <nonli...@yahoo.com> wrote:
>With only 19% of the votes counted, and a 10-point spread between
>the two sides, I thought there was a significant chance that the
>final result might be different. Something like 0.05 probability,
>just off the top of my head.

This is a common misconception. In fact, with binomial (yes/no)
data, the margin of error and the p-value depend on the absolute
size of your sample, not on the sample size as a fraction of the
population.

With just over a thousand people, assuming it's a good random
sample, you know the outcome for the whole population to within 3%
either way. That's true whether the population is Montana or
California or the entire world.

One big question, obviously, is whether the 19% of votes counted are
a random sample. Without more information we can't say. If they were
mostly from urban districts, or mostly from rural districts, or
mostly from one region of the state, they might be a biased sample.

--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
http://OakRoadSystems.com
Address munging may or may not reduce the spam you get; it surely
reduces the number of useful answers you get.
http://www.cs.tut.fi/~jkorpela/usenet/laws.html

Rich Ulrich

Oct 20, 2003, 10:21:15 AM
On 19 Oct 2003 16:16:04 -0700, nonli...@yahoo.com (Eugene Kononov)
wrote:

> I was watching the recent California gubernatorial election in
> progress, and very soon after I saw the intermediate results
> indicating 55%/45% in favor of the recall with only 19% of the
> precincts reporting, Gray Davis was announced to be out of a job by
> the networks.
>
> Well, I thought I understood the law of large numbers, but apparently
> not. With only 19% of the votes counted, and a 10-point spread
> between the two sides, I thought there was a significant chance that
> the final result might be different. Something like 0.05 probability,
> just off the top of my head. Apparently, that probability is much
> lower, and that's what motivated the proclamations of Davis' defeat.

[ snip: Question, how to figure it?]

I don't know what p-level they use, but 5% might be enough.
Consider how often they have been wrong.

On the other hand: by the simplest model, those data
provide evidence much stronger than the 5% level.
Construct the 2x2 table with
- Row 1, the 19% counted (a few million votes), divided 55%-45%;
- Row 2, the other 81%, divided 50-50.
Compute the 2x2 contingency chi-squared.
The 5% critical value is 3.84.
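
In Python, that looks something like this (the 8-million turnout is
an invented figure; correction=False gives the plain Pearson
statistic):

    import numpy as np
    from scipy.stats import chi2_contingency

    total = 8_000_000            # invented turnout, "a few million votes"
    early = int(0.19 * total)    # the 19% counted, split 55/45
    late = total - early         # the other 81%, split 50/50
    table = np.array([[0.55 * early, 0.45 * early],
                      [0.50 * late, 0.50 * late]])

    chi2, pval, dof, expected = chi2_contingency(table, correction=False)
    print(chi2, pval)   # chi2 roughly 12,000 -- far beyond the 3.84 cutoff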

The Bush miscall in Florida was based on a lead of perhaps
150 thousand votes, with only "a half million remaining."
It illustrates some problems with the 'finite population
correction.' The scare quotes are there because the half-million
turned out to be a full million -- a rather poor estimate -- coming
in from places where the vote counting was slowest.

The vote counting was slow in certain urban precincts that
were mainly black, owing to record turnout. Turnout was
50% above the previous (and expected) level in some, with
up to 1/3 being first-time voters. (And first-time voters
created much of the 'chad' problem.) Oh, yes, some black
precincts went for Gore by 90%-10%, so the missing votes
were not random.

That is why vote-forecasting for a total ought to account
for biases in who remains. Here in Pennsylvania, the
state-wide votes are divided by Pittsburgh versus
Philadelphia, for local favorites; and urban (those two)
versus rural, for Democratic versus Republican. On election
nights, the city reports come in first.
A "Pittsburgh Democrat" who does not "win big in
Pittsburgh" is seldom going to win.

--
Rich Ulrich, wpi...@pitt.edu
http://www.pitt.edu/~wpilib/index.html
"Taxes are the price we pay for civilization."

Bruce Weaver

Oct 20, 2003, 11:53:14 AM
Rich Ulrich wrote:
> On 19 Oct 2003 16:16:04 -0700, nonli...@yahoo.com (Eugene Kononov)
> wrote:
>
>
>>I was watching the recent California gubernatorial election in
>>progress, and very soon after I saw the intermediate results
>>indicating 55%/45% in favor of the recall with only 19% of the
>>precincts reporting, Gray Davis was announced to be out of a job by
>>the networks.
>>
>>Well, I thought I understood the law of large numbers, but apparently
>>not. With only 19% of the votes counted, and a 10-point spread
>>between the two sides, I thought there was a significant chance that
>>the final result might be different. Something like 0.05 probability,
>>just off the top of my head. Apparently, that probability is much
>>lower, and that's what motivated the proclamations of Davis' defeat.
>
> [ snip: Question, how to figure it?]
>
> I don't know what p-level they use, but 5% might be enough.
> Consider how often they have been wrong.
>
> On the other hand: by the simplest model, those data
> provide evidence much stronger than the 5% level.
> Construct the 2x2 table with
> - Row 1, the 19% counted (a few million votes), divided 55%-45%;
> - Row 2, the other 81%, divided 50-50.
> Compute the 2x2 contingency chi-squared.
> The 5% critical value is 3.84.

----- snip the rest -----

Rich, do you want 50-50 in the remaining 81%, or row 2 cell
counts that will result in 50% plus 1 in total for the
column that had 45% in the early going? I would have
thought the latter. I worked out a little example (using
20% and 80% to make life easier) with 55% and 45% in row 1,
and row 2 cell counts that result in 50% + 1 as the total
count for the column that had 45% in the early going.

Here's the 2x2 table for N = 1800 (chi-square = 4.601, p =
0.032).

|---------------------|--------------------|------|
| |B |Total |
| |------|-------------| |
| |Recall|Do not recall| |
|----|-----|----------|------|-------------|------|
|A |Early|Count |198 |162 |360 |
| | |----------|------|-------------|------|
| | |% within A|55.0% |45.0% |100.0%|
| |-----|----------|------|-------------|------|
| |Late |Count |701 |739 |1440 |
| | |----------|------|-------------|------|
| | |% within A|48.7% |51.3% |100.0%|
|----|-----|----------|------|-------------|------|
|Total |Count |899 |901 |1800 |
| |----------|------|-------------|------|
| |% within A|49.9% |50.1% |100.0%|
|----------|----------|------|-------------|------|


And here it is with N = 18,000,000 (chi-square = 45000, p =
virtually 0).

|---------------------|---------------------|--------|
| |B |Total |
| |-------|-------------| |
| |Recall |Do not recall| |
|----|-----|----------|-------|-------------|--------|
|A |Early|Count |1980000|1620000 |3600000 |
| | |----------|-------|-------------|--------|
| | |% within A|55.0% |45.0% |100.0% |
| |-----|----------|-------|-------------|--------|
| |Late |Count |7019999|7380001 |14400000|
| | |----------|-------|-------------|--------|
| | |% within A|48.7% |51.3% |100.0% |
|----|-----|----------|-------|-------------|--------|
|Total |Count |8999999|9000001 |18000000|
| |----------|-------|-------------|--------|
| |% within A|50.0% |50.0% |100.0% |
|----------|----------|-------|-------------|--------|
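
(For anyone checking the arithmetic, a quick Python sketch reproduces
both statistics; correction=False matches the uncorrected chi-square
used above.)

    from scipy.stats import chi2_contingency

    small = [[198, 162], [701, 739]]                   # N = 1,800
    big = [[1980000, 1620000], [7019999, 7380001]]     # N = 18,000,000

    for t in (small, big):
        chi2, pval, dof, exp = chi2_contingency(t, correction=False)
        print(round(chi2, 3), pval)   # 4.601, p = 0.032; then ~45000, p ~ 0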


Cheers,
Bruce
--
Bruce Weaver
wea...@mcmaster.ca
www.angelfire.com/wv/bwhomedir/

Eugene Kononov

Oct 20, 2003, 12:41:40 PM
Stan Brown <the_sta...@fastmail.fm> wrote in message:

> With just over a thousand people, assuming it's a good random
> sample, you know the outcome for the whole population to within 3%
> either way. That's true whether the population is Montana or
> California or the entire world.

What's the formula to calculate the 3% error? Also, what does it
tell us in terms of the probability that the outcome will deviate
more than 3% in either direction?

Rich Ulrich

Oct 20, 2003, 3:17:44 PM
On Mon, 20 Oct 2003 11:53:14 -0400, Bruce Weaver <wea...@mcmaster.ca>
wrote:

> Rich Ulrich wrote:
> > On 19 Oct 2003 16:16:04 -0700, nonli...@yahoo.com (Eugene Kononov)

[ how to project for Finite Population, 2x2 table. ]


> ----- snip the rest -----
>
> Rich, do you want 50-50 in the remaining 81%, or row 2 cell
> counts that will result in 50% plus 1 in total for the
> column that had 45% in the early going? I would have
> thought the latter. I worked out a little example (using

Right, of course. Not, "Row 2 equals 50%,"
but "Total adds up to 50%."

So easy. It is too bad that we can hardly ever apply it,
since we can hardly ever *fairly* assume that the sampling
is suitably random.

Stan Brown

Oct 20, 2003, 3:31:57 PM
In article <d4a4fd16.0310...@posting.google.com> in
sci.stat.math, Eugene Kononov <nonli...@yahoo.com> wrote:

What follows assumes the true proportion is not too close to 1 or 0,
and that's quite a good assumption for elections.

The standard error of the proportion is sqrt(pq/n), where p is the
(unknown) true proportion, q is 1-p, and n is sample size. pq
achieves a maximum at p=q=.5, so the standard error can never be
greater than .5/sqrt(n).

You then decide your desired confidence level, 1-alpha. While 95% is
customary in many opinion polls, it's not "magically delicious"; but
let's stick with custom. From the confidence level you calculate
alpha/2 (.025 in this case). You then find the critical z such that
the area in the right-hand tail is alpha/2. This is the inverse
normal distribution, and can be found from tables or on many
calculators. If alpha/2 = .025, z(alpha/2) = 1.96.

The margin of error is +/- the critical z times the standard error.
If your sample is n = 1000, the standard error is about 1.58% and
1.96 times that is +/- 3.1%. If you picked a higher (lower)
confidence level than 95%, you would similarly compute a higher
(lower) margin of error.
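
In Python, the whole calculation is only a few lines (a sketch;
n = 1000 and 95% confidence as in the example above):

    from math import sqrt
    from scipy.stats import norm

    n = 1000
    alpha = 0.05                  # 95% confidence
    z = norm.ppf(1 - alpha / 2)   # critical z, about 1.96

    se_max = 0.5 / sqrt(n)        # worst case, at p = q = 0.5
    print(z * se_max)             # margin of error, about 0.031 (3.1%)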

Dave Swanson

Oct 20, 2003, 8:30:16 AM
I always assumed that the statisticians improved their accuracy by
stratifying the population into, say, Republican and Democratic
districts, and then using stratified estimators. But I really don't
know.
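
For what it's worth, here's a minimal sketch of what such a
stratified estimate might look like (all weights, rates, and sample
sizes invented):

    from math import sqrt

    # (share of electorate, observed 'yes' rate, sample size) per stratum
    strata = [(0.40, 0.65, 500),   # hypothetical Republican districts
              (0.60, 0.48, 700)]   # hypothetical Democratic districts

    est = sum(w * p for w, p, n in strata)
    var = sum(w * w * p * (1 - p) / n for w, p, n in strata)
    print(est, 1.96 * sqrt(var))   # point estimate and 95% margin

The gain over simple random sampling comes from each stratum being
more homogeneous than the electorate as a whole.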

A few statisticians where I work go to New York on election night and
help CBS predict the outcomes. One of these days I'll have to ask
them how they do it.

Herman Rubin

Oct 21, 2003, 10:16:45 AM
In article <d4a4fd16.03101...@posting.google.com>,

The law of large numbers, as usually considered, was NOT
used by the networks; they do not assume that the reports
come from a random sample of precincts. Instead, estimates
and confidence regions were produced from which precincts
had reported, and from their past behavior, by methods of
inference the networks have developed from past data.


--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
hru...@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558
