More soft quorum stuff

Clay Shentrup

Sep 26, 2011, 7:12:47 PM

to electio...@googlegroups.com

Related to the expected value discussion at http://ScoreVoting.net/BetterQuorum, but based on a confidence threshold.

http://www.evanmiller.org/how-not-to-sort-by-average-rating.html

Excerpt:

CORRECT SOLUTION: Score = Lower bound of Wilson score confidence interval for a Bernoulli parameter

Say what: We need to balance the proportion of positive ratings with the uncertainty of a small number of observations. Fortunately, the math for this was worked out in 1927 by Edwin B. Wilson. What we want to ask is: Given the ratings I have, there is a 95% chance that the "real" fraction of positive ratings is at least what? Wilson gives the answer. Considering only positive and negative ratings (i.e. not a 5-star scale), the lower bound on the proportion of positive ratings is given by:

(For a lower bound use minus where it says plus/minus.) Here p is the observed fraction of positive ratings, z_α/2 is the (1-α/2) quantile of the standard normal distribution, and n is the total number of ratings.
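The formula itself was an image on Miller's page and did not carry over into the excerpt. What follows is the standard Wilson score interval written with the symbols defined above; it is a reconstruction, not a copy of the original image. Note that the term under the square root is z^2 divided by (4n):

```latex
\frac{p + \dfrac{z_{\alpha/2}^{2}}{2n} \;\pm\; z_{\alpha/2}\sqrt{\dfrac{p(1-p) + \dfrac{z_{\alpha/2}^{2}}{4n}}{n}}}{1 + \dfrac{z_{\alpha/2}^{2}}{n}}
```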

Comment: Intuitively, it seems to me that you want expected value more than you want this confidence-based formula.

Warren Smith

Sep 26, 2011, 9:32:08 PM

to electio...@googlegroups.com

http://www.evanmiller.org/how-not-to-sort-by-average-rating.html

Evan Miller's (E. B. Wilson's) idea is interesting, but
I think it is not the right idea for range voting quorum purposes.

Also, it quite likely is not the right idea for Miller's web-page purposes
either, in many situations anyhow.

Reason I say that is, let's say Candidate Joe
recruits 100 rabid supporters and nobody else has heard of him.

Joe gets 100 scores of 9 on a 0-9 scale, and that is all.
That average is 9, but if every candidate had 100 fake zero scores
before voting starts, Joe's average would be only 4.5.

Meanwhile well known Candidate Bob gets millions of scores averaging 7.7
and this is affected negligibly by the 100 fake zero scores.

I think in that case Bob ought to win.
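The effect of the fake votes on both candidates can be checked with a short calculation. This is a minimal sketch, not code from the thread: the function name `soft_quorum_average`, its parameter names, and Bob's vote count of 2,000,000 (standing in for "millions") are assumptions made for illustration.

```python
def soft_quorum_average(score_sum, n_votes, n_fake=100, fake_score=0.0):
    """Average score after adding n_fake fake votes, each worth fake_score."""
    return (score_sum + n_fake * fake_score) / (n_votes + n_fake)

# Joe: 100 real votes, all 9s.
joe = soft_quorum_average(score_sum=100 * 9, n_votes=100)

# Bob: assumed 2,000,000 real votes averaging 7.7 ("millions" in the post).
bob_votes = 2_000_000
bob = soft_quorum_average(score_sum=7.7 * bob_votes, n_votes=bob_votes)

print(f"Joe: {joe:.4f}")
print(f"Bob: {bob:.4f}")
```

Joe drops from 9 to 4.5, while Bob stays within a thousandth of a point of 7.7, so Bob wins under this rule.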

OK... in such a situation, the Wilson/Miller formula
[which involves a term z^2/4n, meaning what? (z^2/4)*n or z^2/(4*n)?
I'm assuming the latter]
will, I think, give Joe a score of about 8.9.

[Other issues with Miller's statement of the formula: why does he
say alpha/2 when he could say alpha? Guess it does not matter,
but it made stating his formula more complicated for no reason.
Also, the way he wrote his formula it suffers from a lot of numerical
cancellation error, which he could avoid with the usual conjugacy trick
which is standard for such situations.]
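A rough sketch of both points, under stated assumptions. The post does not spell out the "conjugacy trick"; the version here multiplies numerator and denominator by the conjugate (centre + spread), which turns the lower bound into p^2 / (centre + spread) and removes the subtraction. Applying a positive/negative formula to Joe's 0-9 scores by setting p = 1.0 and rescaling by 9 is also an assumption, as are the function names and the choice z = 1.96 (a 95% two-sided interval).

```python
import math

def wilson_lower_direct(p, n, z=1.96):
    """Lower Wilson bound, written as in the usual formula (minus sign)."""
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return (centre - spread) / (1 + z * z / n)

def wilson_lower_conjugate(p, n, z=1.96):
    """Same bound, rearranged to avoid subtracting two nearly equal numbers.

    (centre - spread) * (centre + spread) = p^2 * (1 + z^2/n),
    so the bound equals p^2 / (centre + spread).
    """
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return p * p / (centre + spread)

# Joe: 100 scores of 9 on a 0-9 scale, treated as p = 1.0 (assumption).
joe = 9 * wilson_lower_direct(1.0, 100)
joe_conj = 9 * wilson_lower_conjugate(1.0, 100)
print(f"Joe, direct form:    {joe:.3f}")
print(f"Joe, conjugate form: {joe_conj:.3f}")
```

With z = 1.96 this gives Joe roughly 8.67, a little below the estimate above; a smaller z moves it closer to 9. Either way Joe still beats Bob's 7.7.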

The problem is that Wilson and Miller's underlying statistical model assumes that Joe's voters are sampled from the same distribution as Bob's. But that is
exactly wrong, which in fact is the whole point and the whole reason this manipulation attempt by Joe and his supporters can work.

The Dirichlet approach I suggest (the 100 fake votes) is not about
confidence under a false assumption of "same distribution sampled;"
it instead is about smooth interpolation between "prior" and "posterior"
estimates. The 100 fake votes represent prior.

You can see that the Wilson/Miller formula is blowing it on that whole issue by considering what it does in the case of ZERO DATA.
In that case, the right thing to do is to use the prior estimate
(i.e. the fake votes).
The wrong thing to do, which Miller does, is to divide zero by zero, causing a machine crash.
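The interpolation view can be made concrete with a small sketch. The function name `interpolated_score` and its parameters are hypothetical; the prior is taken to be 100 fake zero votes, as in the example earlier in this post. The weight on the real data is n / (n + n_fake), so with zero votes the result is simply the prior and nothing is divided by zero.

```python
def interpolated_score(observed_mean, n_votes, prior_mean=0.0, n_fake=100):
    """Blend the prior mean and the observed mean.

    Equivalent to averaging the real votes together with n_fake fake
    votes of value prior_mean.
    """
    weight = n_votes / (n_votes + n_fake)  # 0 when there is no data
    return weight * observed_mean + (1 - weight) * prior_mean

# A candidate whose real votes are all 9s, at increasing turnout.
for n in (0, 10, 100, 1000, 1_000_000):
    print(n, round(interpolated_score(9.0, n), 3))
```

The score moves smoothly from the prior (0 votes) toward the observed mean as real votes accumulate.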

Clay Shentrup

Sep 27, 2011, 12:18:16 PM

to electio...@googlegroups.com

Here's another thing (maybe already covered, but I didn't notice).

Say there's a 51% chance that the average reader would rate X 1 point higher than Y.

And a 49% probability that the average reader would rate Y 2 points higher than X.

So the electorate probably prefers X. But would you still rank Y higher in the list, due to expected value? I would say so. Some others might not.
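Reading the two cases as the only possibilities (an assumption; the post does not say so explicitly), the expected rating difference works out as:

```latex
\mathbb{E}[X - Y] = 0.51 \cdot (+1) + 0.49 \cdot (-2) = 0.51 - 0.98 = -0.47
```

So Y comes out ahead by 0.47 points in expectation, even though X is the more likely favorite.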
