Kemeny Condorcet method, NP-complete to determine winner

62 views
Skip to first unread message

Warren Smith

unread,
Sep 11, 2011, 11:11:12 PM9/11/11
to electionscience
The Kemeny Condorcet voting method is this.

Each ballot is a rank-ordering of the N candidates.
The output of the voting method is also such an ordering.
Among all possible orderings, the one which maximizes the total number
of agreements with individual ballots about pairwise preference relationships,
is chosen.

Description length of that paragraph: 38 words. (Kemeny is one of the
simplest Condorcet methods to describe.) As opposed to Range voting,
here in 15 words:
A voter scores all the candidates as her ballot; candidate with
highest average score wins.

Anyhow, it is known to be NP-complete to determine the Kemeny order
from the ballots.
For some discussion of this see
http://en.wikipedia.org/wiki/Feedback_arc_set
[It appears it may even be NP-hard to find an APPROXIMATELY best order
within an approximation factor of 1.3? No, C.Mathieu & W.Schudy in
press have a sequence of algorithms
to approximate the Kemeny order to within 1+epsilon factor in time
that grows only exponentially with 1/epsilon.]

I now want to make the simple remark that MERELY DETERMINING THE
WINNER (i.e. the top
element of the Kemeny output ordering) ALONE is also NP-complete.

Why? Because if it were Polynomial Time, then you could find the
winner, then eliminate him
from all ballots and the election, then find the winner of that
reduced election, etc, thus
deducing the whole Kemeny order in polynomial time. This would
contradict the theorem
that is NP-hard. QED.

The naive brute force algorithm to find the Kemeny order runs in
N!*constant steps.
Better algorithms exist which run faster in typical instances, but I
am unaware of any proof that any algorithm will ALWAYS manage to run
for a 50-candidate instance in less than the age of the universe. Nor
do I think anybody will produce any such proof in the next 100 years.

If you believe a general purpose election method should have the
property that it is always feasible
to determine the winner from the votes, and you believe general purpose election
methods ought to be able to handle elections with over 50 (or indeed
200) candidates, then you conclude Kemeny is not acceptable as a
general purpose election method, and will not become acceptable for at
least 100 years (if ever). California 2003 governor election had 135
candidates. More generally, I basically think only election methods
than always run in low-degree polynomial time, can be acceptable for
general purpose use.

This is not at all a huge restriction, but it does blow away Kemeny.

--
Warren D. Smith
http://RangeVoting.org  <-- add your endorsement (by clicking
"endorse" as 1st step)
and
math.temple.edu/~wds/homepage/works.html

Warren Smith

unread,
Sep 12, 2011, 12:10:15 AM9/12/11
to electionscience
The Mathieu-Schudy breakthrough algorithm for finding
a Kemeny ordering approximately optimal to
1+epsilon factor, for N candidates, assuming the pairwise
table is pre-computed, runs in this time bound
according to their theorem 1.2 with b=1:

time=
O(N^3 * logN * [1/epsilon] + N * 2^(epsilon^(-6.001)) ).

so for example, if you wanted to get approximation to within 10%
(epsilon=0.1) then this runtime bound would be

O( 10 * N^3 * logN + N * 2^1002305 )

note the million appearing in the EXPONENT.
The quantity inside this O is far, far, far greater than the number of
atoms in the observable universe times the number of femtoseconds in


the age of the universe.

---

Tom Coleman & Anthony Wirth:
Ranking tournaments, local search a a new algorithm,
Journal of Experimental Algorithmics 14 (2009) pp. 2.6–2.22
http://www.siam.org/proceedings/alenex/2008/alx08_013colemant.pdf

reviews the empirical performance of practical algorithms for this
task (as opposed to theoretic guarantees about worst case runtime of
algorithms). This is based on extensive testing by them of 13
algorithms (as well as some hybrid algorithms) on 5 kinds of
datasets. For us (if I understand aright) their most relevant dataset
seems to be the Kemeny problem for the task of ranking 100 movies in
quality order (they got vote data from an online user movie-rating
site).

They found that none of the 13 algorithms they tested was able to find
the optimum Kemeny order on all their 100-movie datasets. The best of
the 13 algorithms they tried was CHANAS, which produced the winning
order (i.e. as good or better than the 12 rival algorithms got) for
76% of the 100-movie datasets they tried.
That is, no algorithm of the 13 they tried could solve 100-movie
instance to optimality more than 76% of the
time (and 76% is probably a severe overestimate since
"beating the 12 rivals" is probably usually not as good as "being the
global optimum").

Coleman & Worth also cite a proof by
C.Dwork, R.Kumar, M.Naor, D.Sivakumar: Rank aggregation revisited, Proceedings
Tenth International World Wide Web Conference
WWW 10 (2001) 613-622
that KEMENY Ordering still is NP-hard even if there are only 4 voters.

This is an even stronger NP-hardness result.

Another paper about this is
V.Conitzer, A.Davenprt, J.Kalagnanam:
Improved bounds for computing Kemeny rankings,
AAAI'06 Proceedings of the 21st national conference on Artificial
intelligence - Volume 1 (2006) 620-626
http://www.cs.cmu.edu/~conitzer/kemeny.pdf

They considered algorithms based on "branch and bound" and "integer
programming" using linear programming to give bounds which could be
used to prune the search, making search take way less time than naive
brute force search.

They tested their algorithms on artificial 5-voter elections with 20
to 40 candidates each. They were able using integer programming to
solve all 25-candidate problems they tried (25 problems in all) to
deduce exact optimum Kemeny order. [But the backtracking approach
sometimes failed to solve 25-candidate 5-voter problems.] The maximum
solve time was under 5 minutes on a 3GHz machine.

With 5 voters and now 40 candidates, they were unable to solve all
their 25 problems "within any reasonable time"
using any approach.

Bottom line: the papers I found were unable to reliably find Kemeny
optimum orders for 100-candidate situations
via any algorithm. Also, among algorithms that
not only aimed to find optimum order, but also aimed to prove they'd
found it (as opposed to maybe finding it but not being sure they'd
found it like in the Coleman-Wirth paper) apparently nobody so far has
been able to reliably solve 40-candidate 5-voter problems.

Jan Kok

unread,
Sep 13, 2011, 12:48:06 AM9/13/11
to electio...@googlegroups.com
A practical solution to difficult elections where it is impractical to
find the optimal Kemeny ordering would be for the election office to
publish the pairwise matrix, and let anyone who wants to crank away on
the problem. The best solution received by the election office after
24 hours is accepted as the optimal solution.

That would give an advantage to candidates with tech-savvy staff and
supporters. :-)

Leon Smith

unread,
Sep 13, 2011, 6:44:25 PM9/13/11
to electio...@googlegroups.com
On Sun, Sep 11, 2011 at 11:11 PM, Warren Smith <warre...@gmail.com> wrote:
> The naive brute force algorithm to find the Kemeny order runs in
> N!*constant   steps.
> Better algorithms exist which run faster in typical instances, but I
> am unaware of any proof that any algorithm will ALWAYS manage to run
> for a 50-candidate instance in less than the age of the universe.  Nor
> do I think anybody will produce any such proof in the next 100 years.

Ehh, even if an organization were to adopt Kemeny-Condorcet, I
seriously doubt that a "real" 100-candidate election would prove to be
intractable. I mean, we've solved Travelling Salesman Problem
instances with nearly 100,000 nodes. Solving TSPs with a few
thousand nodes isn't even enough to make modern algorithms running on
modern computers really break a sweat.

Best,
Leon

Warren Smith

unread,
Sep 13, 2011, 7:42:45 PM9/13/11
to electio...@googlegroups.com
TSP is different than Kemeny.

Euclidean Plane TSP is an example of an extremely atypical
comparatively easy NP-hard problem.
(1) Euclidean plane TSP has (1+epsilon)-approximation algorithm
running in polynomial
time for any fixed epsilon.
(2) EPTSP is hard in the sense you can solve an N^2-size EPTSP problem
to solve an N-size SAT problem. Note the squaring. For Kemeny there
is no such squaring. I.e. way harder.
(3) Euclidean TSP has extremely strong easily computed upper and lower bounds
useful for pruning backtrack search.

For Kemeny and feedback arc set problems, points 2 and 3 are untrue, and 1 is
either untrue or true in a much weaker sense.

> Ehh,  even if an organization were to adopt Kemeny-Condorcet,  I
> seriously doubt that a "real" 100-candidate election would prove to be
> intractable.  I mean, we've solved Travelling Salesman Problem
> instances with nearly 100,000 nodes.    Solving TSPs with a few
> thousand nodes isn't even enough to make modern algorithms running on
> modern computers really break a sweat.

--Also,
(a) good luck trying to actually obtain or write such a TSP algorithm
and prove it correct.
They're very complicated.
(b) I don't believe in acting based on speculations about how good
future algorithms will be;
I believe in actual facts.
(c) I have posted a 27-candidate Kemeny challenge problem. If you think it is
so easy to do 100 candidates, why don't you take a whack at my Kemeny challenge
problem?

By the way, I happen to be the inventor and/or co-inventor of the best
(current world
record) theoretical algorithm bounds for both exact and approximate TSP in low
dimensional spaces, so I know something about that.

I'd be interested to know if it is within reach, and to get a better
idea of where the
boundary is between solvable and unsolvable. If my 27 example turns out to be
solvable, I can come back with somewhat larger sizes. I think 48 will be out
of reach.

Warren Smith

unread,
Sep 13, 2011, 7:58:34 PM9/13/11
to electio...@googlegroups.com
Also one more point.
Although it is true some planar TSP problems have been solved with
1000s of cities (Leon says "nearly 100,000" is the current record?
Quite likely he is right)
that does not mean that those programs can solve
every TSP problem with even 100 cities.

There might be a 100-city planar TSP problem, which is too hard for
even those super-codes, to solve. I do not
know that. As far as I am aware, nobody has intentionally tried to
create really hard 100-city planar TSP problems. Again, it might be
interesting if somebody did and tried out the programs on them, trying
to understand better just where the borderline is between doable
versus not. Well, for example,
just take a goodsize chunk of the equilateral triangle lattice
(vertices=cities) then perturb each city a small amount randomly.
That might be a pretty hard
traveling salesman problem. Maybe certain carefully designed "gadgets"
at each lattice point might
make it even harder.

All that is not an issue for Kemeny because there are no super-codes
for Kemeny -- even the best programs currently out there often fail
with only 40 candidates and only 5 voters without anybody even trying
to make the elections be hard.

Reply all
Reply to author
Forward
0 new messages