Election polling, 2020.

Rich Ulrich

Jun 28, 2020, 12:16:50 AM

The NY Times this week offered a sidebar describing the
methods behind the poll it conducted with Siena and released a week
ago. That poll reported a 14-point lead for Biden over Trump,
alongside other recent polls showing leads of 12, 8, and 14 points.

Theirs was a phone poll, and it included calls to cell phones,
which I had not known was done before, or was even legal.

Their starting point was the voter registration lists from each
state. They use whatever each list provides: sex, party
affiliation, and probably age. They turn to other sources
to obtain phone numbers and whatever else they need.

They report that the interview-completion rate for those
contacted was 1% to 2%. That seems amazingly low to
me, but they claim to have done very well with the same
techniques in the last elections.
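To put the 1% to 2% figure in perspective, here is a back-of-the-envelope sketch. It assumes a hypothetical target of 1000 completed interviews (a round number, not the actual size of the Times/Siena sample) and treats the completion rate as completed interviews per person contacted; the function name is illustrative.

```python
import math

def contacts_needed(target_completes: int, completion_rate: float) -> int:
    """Contacts needed to expect `target_completes` interviews at a given rate."""
    if not 0 < completion_rate <= 1:
        raise ValueError("completion_rate must be in (0, 1]")
    # Round up: a fractional contact still has to be made in full.
    return math.ceil(target_completes / completion_rate)

# Hypothetical target of 1000 completes at the reported 1% and 2% rates.
for rate in (0.01, 0.02):
    print(f"{rate:.0%}: {contacts_needed(1000, rate):,} contacts")
```

Under those assumptions, 1000 completes would take somewhere between 50,000 and 100,000 contacts.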

--
Rich Ulrich

Rich Ulrich

Nov 6, 2020, 9:10:06 PM

UPDATE.
Nov 06, 2020. How the polling may have failed in Pennsylvania.

This year, for the first time ever, a large number of Pennsylvanians
voted by mail, and Trump's campaign against voting by mail clearly
made an impact. Election returns show that, across Pennsylvania, Biden
took 75% of the vote-by-mail.

Most pollsters report their predictions for those “likely to
vote.” From what I read long ago, that screen used previous
voting (from either state records or self-report) and perhaps
also factored in enthusiasm.

If you are a pollster, scrabbling to reach 1000 subjects when 9 out of
10 people won't talk to you on the phone, I think you quietly and
happily put the person who has voted by mail into the likely-to-vote
stack; this year, in this state, that would have been a mistake that
created a bias of several points toward Biden.

To see that, consider this example. By election week, “already voted”
could be half of your 1000, most of whom would have qualified by any
standard. Now suppose that 100 of your “likely-to-vote” sample earned
that label solely by telling you they had already voted. If the rest
of the sample was split 450-450, and those 100 split 75-25 like the
state's mail vote, you have raised the totals to 525-475, an
unearned 50-person (5-point) lead for Biden.
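The arithmetic of the example can be checked in a few lines. The function and argument names are hypothetical; the 75% Biden share for the 100 "voted-only" respondents is the Pennsylvania mail-vote figure quoted above, and the rest of the sample is assumed tied, as in the text.

```python
def pooled_totals(rest_d, rest_r, voted_only, voted_d_share):
    """Biden/Trump counts after pooling 'already voted' respondents
    into the likely-voter group alongside everyone else."""
    voted_d = round(voted_only * voted_d_share)  # voted-only respondents for Biden
    voted_r = voted_only - voted_d               # the remainder for Trump
    return rest_d + voted_d, rest_r + voted_r

d, r = pooled_totals(rest_d=450, rest_r=450, voted_only=100, voted_d_share=0.75)
lead = d - r
print(d, r, lead, f"{100 * lead / (d + r):.0f} points")  # 525 475 50 5 points
```

The point of the sketch is that the bias comes entirely from the 100 respondents who qualified only by having voted; the tied remainder contributes nothing to the lead.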

In retrospect, this is an obvious mistake. But I can see how it could
have crept in. The decision to pool Voted with Likely never made a
difference before, because: (a) "Voted" was a tiny number; (b) it was
always balanced evenly, more or less, between parties. Suddenly, in
2020, it is a big number and there is a big partisan split in mail
voting, in states new to mail-in voting.

One reason I suspect this blunder occurred is that I
have seen NO comments about what pollsters did with “already voted”
when they /could/ have informed us of what margins to expect in the
states in final contention. Steve Kornacki (MSNBC) is reporting on
what he has seen when he says 75-25 statewide.

As the final votes come in (Nevada, Pennsylvania, Arizona, Georgia),
the Finite Population Correction is difficult to apply because of the
mixing-in (it seems) of many so-called “provisional ballots” whose
propensities are unknown.
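For reference, the textbook finite population correction multiplies the standard error of a sample proportion (n units drawn from N) by sqrt((N - n)/(N - 1)). The sketch below applies it under the strong assumption that the counted ballots are a simple random sample of all ballots, which is exactly the assumption that late-counted mail and provisional ballots with unknown propensities break. The function name and the ballot counts in the usage line are hypothetical.

```python
import math

def se_proportion_fpc(p_hat: float, n: int, N: int) -> float:
    """Standard error of a sample proportion, with finite population correction.

    Assumes the n counted units are a simple random sample (without
    replacement) from a population of N units.
    """
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    if N == 1:
        return 0.0  # the single unit has been observed; no sampling error
    se_srs = math.sqrt(p_hat * (1 - p_hat) / n)  # usual SRS standard error
    fpc = math.sqrt((N - n) / (N - 1))           # shrinks to 0 as n -> N
    return se_srs * fpc

# Illustrative only: 900,000 of 1,000,000 hypothetical ballots counted, 50-50 split.
print(se_proportion_fpc(0.5, 900_000, 1_000_000))
```

The formula makes the remaining uncertainty look tiny once most ballots are counted; that reassurance is only as good as the random-sample assumption behind it.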

--
Rich Ulrich

David Jones

Nov 8, 2020, 3:10:08 PM

> when they could have informed us of what margins to expect in the
> states in final contention. Steve Kornacki (MSNBC) is reporting on
> what he has seen, when he says 75-25 statewide
>
> As the final votes come in (Nevada, Pennsylvania, Arizona, Georgia),
> the Finite Population Correction is difficult to apply because of the
> mixing-in (it seems) of many so-called “provisional ballots” whose
> propensities are unknown.

This recent article (27 October) may be of interest if you have not
seen it ...
https://www.significancemagazine.com/politics/689-forecast-error-potus-2020

Rich Ulrich

Nov 9, 2020, 1:32:37 AM

Interesting review, thanks!

I remember from a few elections ago that there were predictions
based on "economic factors." Those seem to have disappeared from
the sources I see, probably because of repeated failures. It is
hard to get a reliable prediction equation when your training data
is, say, 8 elections, and you are fitting some selection from
30 or so a-priori variables.
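A quick simulation shows why so few elections cannot support that many candidate variables. This is a generic overfitting demonstration on pure random noise, not a model of any actual economic forecasting method: with 8 observations, any 7 of the 30 noise "predictors" plus an intercept give 8 parameters for 8 data points, and least squares fits the noise exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

n_elections = 8      # training cases, as in the text
n_candidates = 30    # a-priori candidate variables, as in the text

# Pure noise: neither the "outcome" nor the predictors carry any signal.
y = rng.normal(size=n_elections)
X_all = rng.normal(size=(n_elections, n_candidates))

# Intercept plus the first 7 noise predictors: 8 parameters for 8 points.
X = np.column_stack([np.ones(n_elections), X_all[:, :7]])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef

print("max |residual|:", np.abs(residuals).max())  # essentially zero
```

A perfect in-sample fit here says nothing about the next election; searching over which 7 of the 30 to keep only makes the apparent fit more convincing and the forecast no better.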

--
Rich Ulrich

Rich Ulrich

Nov 11, 2020, 8:00:52 PM

On Fri, 06 Nov 2020 21:09:59 -0500, Rich Ulrich
<rich....@comcast.net> wrote:

>
>UPDATE.
>Nov 06, 2020. How the polling may have failed in Pennsylvania.
>
>This year, for the first time ever, a large number of Pennsylvanians
>voted by mail, and Trump's campaign against voting by mail clearly
>made an impact. Election returns show that, across Pennsylvania, Biden
>took 75% of the vote-by-mail.
>
>Most pollsters report their predictions according to those “likely to
>vote.” From what I have read long ago, that made use of previous
>voting – using either state records or self-report – and maybe
>factoring in enthusiasm.

...

Here is a long article about What Went Wrong -
https://www.nytimes.com/2020/11/10/upshot/polls-what-went-wrong.html

It features a table that shows that the errors of 2016 were
(apparently) still present. It does that by applying "corrections"
directly from the error of 2016 to the poll results of 2020, to make
an adjusted "prediction" for 16 states. As I count it, 12 of the 16
predictions for 2020 show fewer points of error after this adjustment,
with error for only two states being worse (and two the same).
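The comparison in that table can be sketched as follows, under one plausible reading of the article's method: subtract each state's 2016 poll error (poll margin minus actual margin) from its 2020 poll margin, then ask whether the adjusted margin lands closer to the 2020 result. The function name and the three states' numbers are hypothetical, not the Times' figures.

```python
def compare_adjustment(poll_2020, result_2020, error_2016):
    """Count states where subtracting the 2016 poll error reduces the 2020 error.

    Margins are Democratic minus Republican, in points; error = poll - result.
    Returns (better, worse, same).
    """
    better = worse = same = 0
    for state, poll in poll_2020.items():
        raw_err = abs(poll - result_2020[state])
        adj_err = abs((poll - error_2016[state]) - result_2020[state])
        if adj_err < raw_err:
            better += 1
        elif adj_err > raw_err:
            worse += 1
        else:
            same += 1
    return better, worse, same

# Hypothetical margins for three made-up states.
poll = {"A": 8.0, "B": 5.0, "C": 2.0}
result = {"A": 2.0, "B": 5.0, "C": 1.0}
err16 = {"A": 5.0, "B": 1.0, "C": 2.0}
print(compare_adjustment(poll, result, err16))  # (1, 1, 1)
```

With real data, "the same" outcomes mostly arise when published margins are rounded to whole points.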

Here is a comment in the article that implicitly supports the idea
that I floated (one which they do NOT mention), that pollsters
blundered by pooling "already voted" with all other "likely voters" -

"Heading into the election, many surveys showed something unusual:
Democrats faring better among likely voters than among registered
voters. Usually, Republicans hold the turnout edge.

"Take Pennsylvania. The final CNN/SSRS poll of the state showed Mr.
Biden up by 10 points among likely voters, but just by five among
registered voters."

In Pa., where mail voting was new and Republicans tended to heed
Trump's plea to vote in person, the mail vote broke 3-1 for Biden.
My example proposed a five-point gain from labeling voters as
"likely" because they had already voted: 100 such subjects, split
75D-25R, in a 1000-subject sample.

--
Rich Ulrich