On 10/3/13, Jameson Quinn <jameso...@gmail.com> wrote:
> Coincidentally, I'm right this instant doing my statistics homework on
> means and standard deviations using Lebesgue integrals. So, taking a
> busman's holiday from the homework:
>
> One easy-to-find parameter of interest is: for candidates A and B, what is
> the chance that a random sample of N voters, with replacement, from the
> actual electorate, will elect A? That would be called a bootstrap win %,
> and a candidate with a higher standard deviation will be more random, while
> one with lower STDEV will tend to get numbers more like 1 or 0. If you did
> that pairwise with N=1 a bunch of times, the most frequent winner would be
> a Condorcet winner; if you did it groupwise with N=1, you'd get plurality.
--agree re Condorcet & plurality.
> If you do it pairwise with N=large, you'd get a median system (GMJ, I
> think);
--disagree. Counterexample election:
A's scores: 0,0,0,0,0,5,6,6,6,6,6,6
B's scores: 4,4,4,4,4,4,9,9,9,9,9,9
A has the greater median (5 versus 4, taking the lower median), while
B has the greater average (6.5 versus about 3.4). I wrote this example
with six below-median scores and six above, but imagine that "six" is
"very large."
Now upon taking a random sample of N scores, N large, the sample
averages converge (by the law of large numbers) to the true averages,
so the candidate with the higher average, here B, will almost surely
win.
>and if you do it groupwise with N=large you get Range.
> So this
> trick would let you get hybrids in between those 4 corners, by varying the
> average number of candidates you threw into the ring, and the average
> number of ballots you picked.
--careful: the error above indicates this is not quite what you think it is.
> This is computationally tractable; you could
> pretty easily prove which candidate had the best win probability
> analytically, if the empirical bootstrap percentages came out close to a
> tie.
--More precisely: Given V range-style ballots, we wish to determine:
"Who would have the greatest win-probability if we selected N ballots
at random & ignored the rest?" (0 < N <= V.)
This is an interesting algorithm-design problem. (Actually there are
two problem-variants -- sampling with & without replacement.) However,
at present, I am not seeing any efficient algorithmic answer to either
variant. The obvious algorithm is to exhaustively enumerate every
possible sample. Without replacement there are binomial(V,N) samples;
with replacement there are V^N. These counts are exponentially large
(consider V=2000000, N=1000000), so the obvious algorithm is
infeasible.
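For concreteness, a quick magnitude check (a small Python sketch,
using log-gamma to avoid computing the huge integers themselves):

from math import lgamma, log, log10

V, N = 2_000_000, 1_000_000
# log10 of binomial(V,N) = V! / (N! * (V-N)!):
log10_binom = (lgamma(V + 1) - lgamma(N + 1) - lgamma(V - N + 1)) / log(10)
print(f"binomial(V,N) ~= 10^{log10_binom:.0f}")  # about 10^602056
print(f"V^N ~= 10^{N * log10(V):.0f}")           # about 10^6301030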
You could use Monte Carlo experiments to get a probabilistic answer
(one that comes with a statistical confidence), but in close elections
you'd be screwed, and I'm not seeing an efficient "analytical" way to
resolve near-ties.
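For what it's worth, here is a minimal Monte Carlo sketch (Python; the
function name, the uniform tie-break, and the with-replacement
sampling are my own choices, not anything fixed above):

import random

def bootstrap_win_pct(ballots, N, trials=10000):
    """ballots: list of per-voter score-tuples. Returns the estimated
    frequency with which each candidate tops the score-totals of a
    random size-N sample drawn with replacement."""
    C = len(ballots[0])
    wins = [0] * C
    for _ in range(trials):
        totals = [0] * C
        for ballot in random.choices(ballots, k=N):  # with replacement
            for i, s in enumerate(ballot):
                totals[i] += s
        top = max(totals)
        winners = [i for i, t in enumerate(totals) if t == top]
        wins[random.choice(winners)] += 1  # split exact ties randomly
    return [w / trials for w in wins]

The standard error of each estimated percentage shrinks only like
1/sqrt(trials), which is exactly why near-ties are so expensive to
resolve this way.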
Is this merely because I am too stupid to see a good algorithm (which
nevertheless exists) or is it because this problem really is hard?
[By the way, I actually used this technique in the past to examine a
poll by me, Greene & Quintal for the 2004 USA presidential election.
The point was that by subsampling our poll & doing Monte Carlo
bootstrapping, you get confidence intervals for various claims about
the election. Balinski & Laraki later did a similar study using the
same kind of statistical technique -- which, by the way, is not due to
any of us; it is called the "bootstrap" and has been around for a
while.]
Anyhow. I think this is a good algorithm problem. I would initially
guess it is #P-hard or NP-hard, probably "weakly" so (like the NUMBER
PARTITION problem; see the book by Garey & Johnson), in which case
this whole idea, although nice in some mathematical sense, would be
useless practically.
That "weakness" insight pays off -- there is a "pseudopolynomial time algorithm"
in the event the allowed scores are selected from a fixed finite set
such as (to be concrete) the 10-element set {0,1,2,3,4,5,6,7,8,9}.
It works as follows. You compute the entire probability distribution
for the C score-totals of the N-voter sample (C=#candidates) with N=1.
Then, you add one more voter to the sample and recompute the entire
probability distrib.
You keep doing that, adding 1 more, then recompute entire probdist.
The point is
that with that fixed finite score-set, the entire probdist can be represented as
a table with at most (10*N)^C entries, each entry is a probability value.
Since C and 10 are constants, this is a polynomial function of N.
If however, "10" is instead exponentially large then this is no longer
polynomial
(that is the meaning of the terminology "pseudo"polynomial time).
Also if C is allowed
to be unbounded, ditto.
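Here is a minimal sketch (Python) of that dynamic program, for the
with-replacement variant; the exact-rational arithmetic and the even
tie-splitting are my own choices:

from collections import Counter, defaultdict
from fractions import Fraction

def win_probabilities(ballots, N):
    """ballots: list of per-voter score-tuples, scores drawn from a
    small fixed set such as {0,...,9}. Returns the exact probability
    that each candidate has the top score-total in a random size-N
    sample drawn with replacement."""
    V = len(ballots)
    C = len(ballots[0])
    # Distribution of the score-vector contributed by one random draw:
    step = {vec: Fraction(cnt, V) for vec, cnt in Counter(ballots).items()}
    # dist maps the C-tuple of running score-totals to its probability.
    dist = {(0,) * C: Fraction(1)}
    for _ in range(N):  # add one voter; recompute the whole distribution
        new = defaultdict(Fraction)
        for totals, p in dist.items():
            for vec, q in step.items():
                new[tuple(t + s for t, s in zip(totals, vec))] += p * q
        dist = new  # at most (10*N)^C entries, as claimed above
    wins = [Fraction(0)] * C
    for totals, p in dist.items():
        top = max(totals)
        tied = [i for i, t in enumerate(totals) if t == top]
        for i in tied:
            wins[i] += p / len(tied)
    return wins

# The counterexample election above, pairing the two score-lists
# voter-by-voter (one possible alignment -- an assumption):
ballots = list(zip((0,0,0,0,0,5,6,6,6,6,6,6), (4,4,4,4,4,4,9,9,9,9,9,9)))
print([float(w) for w in win_probabilities(ballots, 25)])
# B's win probability approaches 1 as N grows, as argued above.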
> Wow. That's awesome. 4 of the 5 "good" voting systems (except not
> approval), all seen as special cases of the same basic procedure. And
> approval, the fifth, is of course the special case of all those special
> cases.
>
> If this isn't clear to someone... I don't have time to explain this
> instant, but I'll definitely come back and post more with a bigger
> exploration of this idea some time later.
>
> Jameson
--yeah, well, it looks like your idea was somewhat half-baked.
However, there is certainly an interesting problem in there.
As I said, I would guess that this core problem can be proven to be
#P-hard or NP-hard.
But that would be the interesting research -- to prove (or disprove) that.
Can you? I do not currently see a solution, but feel there is a
decent chance this can be accomplished.
--
Warren D. Smith
http://RangeVoting.org <-- add your endorsement (by clicking
"endorse" as 1st step)