Jonathan,
Thanks for this interesting post (and may I add, thanks for all
your important work since 2004!). However, I have doubts about
your dismissal of Richard Charnin's view--which I share--that the
"mathematical impossibility" of the 'adjusted' exit poll figures
points to complicity in fraud. I think the question of whether the
number of respondents listed for an exit poll refers to actual or
to "virtual" respondents doesn't invalidate the use of
"mathematical impossibility" as an indicator of fraud.
There are, as you suggest, a frustrating number of uncertainties
and imponderables about the practices of exit pollsters. From
their own statements and explanations, we do know quite a bit
about how, on the basis of past voting patterns, they choose
sampling precincts in such a manner as to obtain as representative
as possible an overall (clustered) sample; and about how they
deliberately over-sample minority groups in the hope of getting
statistically significant results (and then, as you indicate,
re-weight their figures, according to their best estimate of the
actual percentage each group amounts to in the total number of
people who voted). The demographic calculations involved are, as
you say, systematically right-shifted in the US: that would be
because the vote tallies from prior elections are marked by
systemic patterns of vote suppression and fraud directed at
Democratic-trending minorities, and because pre-election polling
(where that is incorporated into the preparatory planning for exit
polls) typically has a similar rightward deflection (remarked on
by Steve Freeman and others, with reference to Gallup's use of the
"likely voter" category).
I have no more than amateur status in these matters--but is it
correct to assert that the late Warren Mitofsky and his ilk have
feared any awareness on the public's part of these general
processes of clustered sampling and re-weighting? My impression,
on the contrary, has been that boasting about the large knowledge
base, skill, and artful statistical procedures (or, if you prefer,
manipulations) required for exit polling has always been part of
their self-promotion--though they would of course indignantly
reject the notion of a systematic right-shift built into the
process, as well as any suggestion that vote tallies have been
marked by systemic fraud since at least the 1980s.
The early contamination of exit polling data by vote-tally
information is another matter. That's something the pollsters
would firmly deny--though Mitofsky acknowledged making unspecified
use of what he called "quick counts." (I think this term refers to
information collected--in parallel with the full exit poll
questionnaires--by samplers who asked voters to indicate simply,
without further elaboration, which presidential candidate they've
just voted for. From what I read, years ago, about "quick counts,"
I understood them to be used by Mitofsky as a source of
supplementary early information about vote trending that he
provided to his media-consortium subscribers.) Data from the 2004
election might suggest that Mitofsky and Joseph Lenski (the
principals of the 2004 Edison-Mitofsky exit polling consortium)
would have had good reason to avoid stirring the early officially
announced vote tallies into their calculations. (By the time polls
were closing in the eastern states on November 2, 2004, the
vote-tally figures provided by CNN at 8:50 pm EST showed
Bush ahead of Kerry by 6,590,476 to 5,239,414 votes, and at 9:06 pm
EST by 9,257,135 to 7,652,510--a lead shrinking from 11 percent to
9 percent--while pre-election polling would have led
Edison-Mitofsky to expect a close race, with a probable Kerry
victory; and the national exit poll figures available on CNN
at 9:06 pm, with a stated 13,047 respondents, showed Kerry leading
by nearly 3 percent.) But we don't know, in this and other
elections, what procedures were followed in the exit-poll
calculations. Are we faced here with an imponderable? Some kind of
early contamination of exit polling data is widely suspected, but
I'm not aware of hard information on the subject.
The key issue, however, is the subsequent adjusting or forcing of
the exit poll data, undertaken once the official vote tallies are
largely complete.
Here we have clear evidence, at least in the 2004 election, of
deliberate deception on the part of the exit pollsters. The
national exit poll figures posted by CNN on the evening of
November 2, 2004 were replaced at 1:36 am EST on November 3 by new
figures, based on a stated 13,531 respondents, showing Bush ahead
of Kerry by nearly 1.5 percent. A rise of 3.6 percent in the
stated number of respondents was accompanied by a swing of 4.5
percent from Kerry to Bush in voters' reports of their choices. In
the state exit polls that I was tracking during that election,
similar effects were observable: in Ohio a 2.8 percent increase in the
stated number of respondents, posted at 1:41 am on November 3, was
accompanied by a 6.5 percent swing from Kerry to Bush; and in
Florida, a 0.55 percent increase in stated respondents was
accompanied by a 4 percent swing to Bush.
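Charnin's "mathematical impossibility" point can be checked with simple arithmetic. The sketch below uses the figures given above for the national poll (13,047 respondents with Kerry leading 51 to 48 percent, then 13,531 respondents with Bush ahead by nearly 1.5 percent); the exact shares in the forced 1:36 am figures are an assumption here (Bush 50.5 percent, Kerry 49.0 percent):

```python
# Respondent counts are from the text; the 11/3 percentage shares
# (Bush 50.5%, Kerry 49.0%) are an assumed reading of "nearly 1.5
# percent" and are not stated exactly in the text.
old_n, new_n = 13_047, 13_531
old_kerry, old_bush = round(old_n * 0.51), round(old_n * 0.48)
new_kerry, new_bush = round(new_n * 0.49), round(new_n * 0.505)

added = new_n - old_n            # 484 additional respondents
bush_gain = new_bush - old_bush  # Bush respondents gained
kerry_gain = new_kerry - old_kerry

print(added, bush_gain, kerry_gain)
# Bush would need more new supporters than there are new
# respondents, while Kerry's raw count would have to shrink.
```

On these assumptions Bush gains some 570 respondents from only 484 new interviews, and Kerry's raw count of supporters goes down: no sequence of genuine interviews can produce that result.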
Lenski told Richard Morin of the Washington Post that an
Edison-Mitofsky server had malfunctioned shortly before 11
pm--"barely minutes before the consortium was to update its exit
polling with the results of later interviewing that found Bush
with a one-point lead"--and this "glitch prevented access to any
exit poll results until technicians got a backup system
operational at 1:33 am" on November 3. But this explanation
appears to be false: the adjusted or forced Florida figures were
posted at 1:01 am.
Lenski offered Jim Rutenberg of the New York Times an
equally deceptive explanation of the divergences between the
November 2 and the subsequent exit poll figures, saying "that it
was possible that more Democrats and more women were voting
earlier, perhaps skewing the data in the afternoon. But, he said,
by the end of the night the system's polling data basically
tracked with the actual results. 'Sophisticated users of this data
know the limitations of partial survey results,' he said." In
fact, the data sets released to CNN and the other media
participants in the National Election Pool at 3:59 pm, 7:33 pm,
and around 9 pm (based, respectively, on 8,349, 11,027, and 13,047
stated respondents) consistently showed Kerry leading Bush by 51
to 48 percent. If by "polling data" we mean numbers actually
derived from the (demographically re-weighted) responses of
voters to the exit poll questionnaires, as opposed to altered
figures substituted for the data to make it appear to conform to
the actual vote tally, there was no point on November 2 or
afterwards at which the "polling data ... tracked with the actual
results."
Warren Mitofsky participated in this deception. When Keith
Olbermann referred on the November 23, 2004 MSNBC Countdown
program to "the variance among the early and late exit polls, and
the voting," Countdown received what Olbermann described
on the November 24 program as a "strident" email from Mitofsky
protesting against the program's "misinformation," and insisting
that "no early exit polls" had been released by his company or by
Lenski's Edison Media Research: "the early release came from
unauthorized leaks to bloggers who posted misinformation."
There were indeed unauthorized leaks, presumably from within the
National Election Pool. (But this was not raw data: the figures
had been demographically re-weighted.) Mitofsky may have thought
he could wish away the figures that he and his colleagues had
supplied to the NEP on the afternoon and evening of November 2:
after all, those percentages had been erased after midnight when CNN
and other subscribers replaced them with corrupted ones. He was
perhaps hoping that people would forget that the Washington
Post had published the final November 2 exit poll data in
the morning edition of November 3, and would be unaware that you
and Steve Freeman had preserved and circulated screen shots of the
November 2 data. But neither Olbermann's remark nor the leaked
early data posted by bloggers were "misinformation."
It's important to recognize that the process of adjusting or
forcing exit polls to bring them into conformity with the official
vote tallies involves the conflation of two categorically
different sets of data: one which is rich in demographic
information, and one which contains no such information
whatsoever. You quite rightly say that a forced re-weighting
"ripples through" the whole exit poll data set, producing
"distorting effects" that can be "glaring." But one should
distinguish between these effects and those produced by the prior
demographic re-weighting undertaken in order to remove distortions
such as those introduced by the deliberate oversampling of
minorities. The demographic re-weighting involves changes that (if
we concede to exit polling a place among the social sciences) are
scientifically justifiable; in contrast, the forced re-weighting
is necessarily arbitrary: it involves fudging the data in whatever
ways seem convenient in order to have it add up to a total that is
more or less closely aligned with the official vote tally.
In 2004, the distorting effects that resulted from this fudging
were indeed glaring. One of these is something you allude to: a
distortion in the party IDs of voters. There is evidence in the
2004 exit poll figures of a sampling bias that favoured the
Republican Party. Although Al Gore won the popular vote in 2000 by
540,000 votes, or 0.5 percent, the successive waves of November 2,
2004 exit polls show 3 percent more Bush than Gore voters among
respondents who said they had voted in 2000. This difference was
inflated to fully 6 percent in the "forced" November 3 figures,
according to which 43 percent of 2004 voters had supported Bush in
2000, and only 37 percent had voted for Gore. These percentages
generate the absurd conclusion that the active 2004 electorate
included 52.6 million people who had voted for Bush in 2000--an
election in which he received 50.5 million votes.
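The arithmetic behind that absurd conclusion is easy to reproduce. The 43 percent and 50.5 million figures are given above; the total 2004 turnout of roughly 122.3 million votes cast for president is an assumption supplied here:

```python
# The 43% and 50.5 million figures are from the text; the 2004
# turnout figure (~122.3 million) is an assumption added here.
turnout_2004 = 122_300_000
claimed_bush_2000_voters = 0.43 * turnout_2004  # ~52.6 million
actual_bush_2000_votes = 50_500_000

print(round(claimed_bush_2000_voters / 1e6, 1))
# More people claimed as Bush-2000 voters in 2004 than voted
# for Bush in 2000 -- before any allowance for deaths or
# non-voting in 2004.
```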
The forcing produced similar effects with respect to minority
voters. For example, what appears to have been a pro-Bush sampling
bias in the November 2 exit poll's reporting of Hispanic votes was
exacerbated in the November 3 figures. Although an exit poll by
the William C. Velasquez Institute showed Bush receiving less than
the 35 percent of Hispanic votes he received in 2000, the November
2 exit poll credited him with 41 percent of such votes, a figure
raised on November 3 to 44 percent. A month later, NBC News
took the unprecedented step of revising its exit poll estimates,
reducing Bush's Hispanic support to 40 percent, and, in Texas,
changing an 18-point win for Bush into a 2-point win for Kerry
among Hispanics.
But the largest distortion was produced by the claim, embedded in
the forced data, that Bush's 2004 victory was based on a massive
66 percent increase in voter turnout in the major urban centres,
led by an increase of more than four million in the number of
white voters. However, as Michael Collins demonstrated, there is
strong evidence that the supposed surge in big-city white
Republican voters never occurred: the actual increase in turnout
in big cities was more on the order of 13 percent, and the
likelihood that most of the people in this group supported Bush is
vanishingly small.
As these examples suggest, the forcing of exit poll data to fit
divergent vote tallies amplifies existing errors, and makes the
corrupted exit polls useless for any honest purpose. (A number of
academic political scientists made use of it in the years
following the election and were--one hopes unwittingly--led into
various follies as a result.)
Turning now to the issue you raise of 'virtual' as opposed to
actual exit poll respondents, I confess myself puzzled. First,
then, a request for help: can you point to texts in which
practitioners use "# respondents" in the "term of art" manner that
you describe?
Here's one source of my puzzlement. It seems to me that a very
simple thought-experiment can demonstrate the superfluity, when it
comes to legitimate re-weighting of exit polls, of a distinction
between 'actual' and 'virtual' respondents. Let's imagine an exit
poll carried out in a jurisdiction in which one-tenth of the
registered voting population is African-American and nine-tenths
is 'white'. The sample size is, for convenience, 1,000; and for
the sake of statistical validity in relation to minority voters,
20 percent of that sample is African-American. Re-weighting the
sample is simple: divide the numerical responses of the 200 black
respondents by 2, and multiply the numerical responses of the 800
white respondents by 1.125. (This produces the effect of a sample
divided 90 percent to 10 percent between white and black
respondents.) One notes in passing that the resulting number of
'virtual' respondents here would be the same as the number of
'actual' respondents: 1,000. The distinction has no apparent
function.
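For what it's worth, the arithmetic of the thought-experiment can be set out explicitly (a minimal sketch; the numbers are those of the hypothetical above, not real polling data):

```python
# The hypothetical jurisdiction: 10% African-American, 90% white;
# a deliberately over-sampled poll of 1,000 (200 black, 800 white).
black_n, white_n = 200, 800
black_w, white_w = 0.5, 1.125  # divide by 2; multiply by 1.125

weighted_total = black_n * black_w + white_n * white_w
print(weighted_total)
# The weighted ('virtual') N equals the actual N: 1,000, now
# split 10% / 90% as the jurisdiction requires.
```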
I'd prefer to remain skeptical as to the use of this notion of
'virtual' respondents until I've seen evidence that it is actually
and not just hypothetically in play. And in defence of a
commonsense notion that when pollsters indicate a number of
respondents they are making reference to a determinate sequence of
actual encounters, mediated by exit poll questionnaires, between
samplers and real people who have just cast votes, I'm tempted to
quote the philosopher David Hume. In controversy with Calvinist
theologians who supposed that, the deity being mysterious and
incomprehensible, divine attributes such as justice and mercy
could not have any determinate meaning, Hume declared that if our
ideas on the subject, "so far as they go," are not just, adequate,
and in line with actuality, then "I know not what there is in this
subject worth insisting on."
One might think of expanding my little thought-experiment by
adding gender breakdowns within the samples of black and white
respondents, by assuming (we're still in 2004) a vote split
between Kerry and Bush of 55 percent for the former and 45 percent
for the latter, and by imagining an Evil Manipulator whose job it
would be to reproduce schematically the effects observable in 2004
in the national exit poll and the state exit polls in Ohio and
Florida by bringing about a 10 percent swing in the vote with a 5
percent increase in the number of exit poll respondents. (The Evil
Manipulator could be assured that no outsider would have access to
his manipulations of real or imaginary data, and his only
governing criterion would be one of plausibility.) But the fatuity
of such an exercise is immediately obvious, because the array of
possible linked data manipulations is a garden of forking paths.
The precise fraudulent re-weightings that were carried out are of
no interest beside the basic point: the fraud is manifest in the
fact that a small change in sample size produces a much larger
percentage shift in voters' choices.
I want to quibble with your claim that the ripple effect of
re-weightings carries through the entire data field including
"the #respondents." That would only be the case if the person
doing the re-weighting decided to include the number of
respondents as a category within the data matrix--and there is no
reason to do so. Unlike the other figures, which make up the
content of the exit poll and are properly interdependent (e.g.,
re-weighting the sample of black voters will alter the figures
reported for the incomes of Democratic Party supporters), the
number of respondents is of interest only as an indicator of the
poll's margin of error.
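On that last point: for a simple random sample the margin of error follows from the sample size alone. Real exit polls use clustered samples, so a design effect (values around 1.5 to 2 are often cited) inflates the figure; the sketch below leaves it at 1 by default:

```python
import math

# Margin of error for a proportion p with sample size n, at the
# 95% confidence level (z = 1.96). Clustered exit-poll samples
# carry a design effect > 1; the default of 1.0 is the simple-
# random-sample case.
def moe(n, p=0.5, z=1.96, design_effect=1.0):
    return z * math.sqrt(design_effect * p * (1 - p) / n)

print(round(100 * moe(13_047), 2))
# With 13,047 respondents, under 1 percentage point even before
# the design effect is applied.
```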
I guess my puzzlement or confusion comes down to this (with
apologies if I'm being repetitive). Where we have a pattern of
post-election alterations of exit polls, accompanied (as in the
2004 examples I've given, and in this year's Massachusetts and
New York Democratic primaries) by statements of increases in the
number of respondents that are not nearly large enough for the
percentage changes in voters' choices to be legitimate, then we
have, as I first wrote on November 5, 2004, "footprints of
electoral fraud." To dismiss this as not forensically
interesting doesn't make sense to me. Whatever manipulations were
carried out by the exit pollsters during the process of corrupting
their data by conflating it with vote tally percentages
remain--in the absence of whistleblowers--occult to us. In the
language of classical epistemology, those manipulations inhabit
the realm of the Kantian noumenal, the Ding an sich to
which our perceptions can't by definition give unmediated access.
But so what? We still have unmistakable evidence of illegitimate
alterations of data within the instrument that--bar tampering of
this kind--we know to be a reliable indicator of corruption in the
vote count.
Michael Keefer