I thank Warren for this page.
At 09:48 PM 5/23/2013, Warren D Smith wrote:
>See http://rangevoting.org/MeasTheory.html
>The British Association for Advancement of
>Science in 1932 tasked a committee to report on
>"quantitative measurement of sensory events." It
>produced its final report in 1940. This was
>stimulated by the "sone scale of loudness"
>purported to measure "objective scale of [subjective] auditory sensation."
>Encyclopedia Britannica, "Sone": Loudness is a
>subjective characteristic of a sound (as opposed
>to the sound-pressure level in decibels, which
>is objective and directly measurable).
>Consequently, the sone scale of loudness is
>based on data obtained from subjects who were
>asked to judge the loudness of pure tones and
>noise. One sone is arbitrarily set equal to the
>loudness of a 1,000-hertz tone at a sound level
>of 40 decibels above the standard reference
>level (i.e., the minimum audible threshold). A
>sound with a loudness of four sones is one that
>listeners perceive to be four times as loud as the reference sound.
Eek! "Four times as loud"? What does that mean?
From Wikipedia, article on "Sone":
>The study of apparent loudness is included in
>the topic of psychoacoustics
><http://en.wikipedia.org/wiki/Psychoacoustics>
>and employs methods of psychophysics
><http://en.wikipedia.org/wiki/Psychophysics>.
Or, really psycho physics. Okay, cheap shot.
The Wikipedia article gives rough equivalence of
sones to sound pressure levels. It looks to me
like sones are an artifact of translation of
perceived sound levels into a numerical scale,
because the perception of one sound as a multiple
of another is largely subjective. It may relate
to a measurable quantity, such as the numbers of
neurons firing, or rates of firing.
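For reference, the rough equivalence can be written down. This is a minimal sketch, assuming the usual textbook rule of thumb (not stated in the quotes above) that loudness is one sone at 40 phons and doubles for every further 10 phons; the function name is made up for illustration.

```python
def phons_to_sones(phons: float) -> float:
    """Approximate loudness in sones for a loudness level in phons.

    Assumes the conventional rule of thumb, valid only from about
    40 phons upward: 1 sone at 40 phons, doubling every 10 phons.
    """
    if phons < 40:
        raise ValueError("rule of thumb only applies at 40 phons and above")
    return 2 ** ((phons - 40) / 10)


print(phons_to_sones(40))  # 1.0, the reference sound
print(phons_to_sones(60))  # 4.0, "four times as loud" as the reference
```

The mapping is a fitted convention built from listener judgments, so it summarizes those judgments rather than explaining what "four times as loud" means.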
What can be studied with objectivity, even though
it's "subjective," is relative loudness, i.e., is
sound A louder than sound B? It is easy if the
sounds are otherwise identical, i.e., same
frequency distribution, timing, context, etc.
>Such scales are highly important, in fact
>crucial, for purposes such as telephony,
>computer speech synthesis, audio compression,
>etc. But one member of the BAAS committee
>claimed any such quantitative scale "is not
>merely false but in fact meaningless unless and
>until a meaning can be given to the concept of
>addition as applied to sensation." (Final report
>p.245.) But other members had extremely opposite views!
If it is useful, it is not meaningless, even if
no clear, objective meaning has been discovered.
The "addition" comment is intrinsic to the
concept of sones, i.e., is a sound at four sones
equivalent to four sounds of one sone played
simultaneously?
Obviously there would be phase relationships,
etc., to consider. (Because the sum of two tones,
as described, could be *silent*. So we must
assume no phase difference for this to be meaningful.)
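The phase point is easy to check numerically. The following is only a sketch, assuming two pure tones of equal frequency and unit amplitude; it looks at the summed pressure amplitude, not at perceived loudness.

```python
import math


def summed_amplitude(phase_difference: float, samples: int = 1000) -> float:
    """Peak amplitude of two unit sine tones of equal frequency added together."""
    peak = 0.0
    for i in range(samples):
        t = 2 * math.pi * i / samples
        peak = max(peak, abs(math.sin(t) + math.sin(t + phase_difference)))
    return peak


print(round(summed_amplitude(0.0), 3))      # 2.0, tones in phase
print(round(summed_amplitude(math.pi), 3))  # 0.0, tones in opposite phase
```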
Then, for tones with different characteristics,
I'd expect that the correspondence of sound
pressure to the perception of loudness would
vary. Hence any correspondence of the sone scale
to a sound pressure scale would be variable, and
it may, indeed, vary with the individual. Again,
this would be tested by playing the two differing
sounds and adjusting them in volume so they are perceived as equally loud.
I'm not at all sure that the sone concept is
needed for those applications. That it is used in
some way, however, does show possible utility.
(A well-tested set of correspondences between
sones and sound pressure could be used, but ...
it seems likely that the real application is
always comparative, i.e., is a particular sound
louder than another. Or enough louder to
distinguish signal from noise, another application?)
Warren commented:
>You can already tell that that quote was
>hogwash. A counterexample is "temperature," an
>apparently obscure and little known concept
>unfamiliar to eminent members of the British
>Association for Advancement of Science.
>Temperature is meaningful and measurable,
>despite the fact that the "sum of two
>temperatures" seems meaningless and you
>never <http://rangevoting.org/FeynTexts.html#temp>
>do that ("what is the total temperature of
>this apple and this cup of coffee?")
If temperature is a proxy for thermal energy, as
it is, then the sum of two temperatures could be
meaningful, if they are temperatures of the same
body. "Same" might refer to thermal mass. I don't
recall the specific relationships and am far too
lazy to look them up or derive them. However, if
we "add" an apple to the cup of coffee, they will
reach thermal equilibrium, so the "sum" of A and
B is actually a kind of weighted average of A and
B. "Net combined" temperature, not "total" temperature.
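The weighted average can be sketched as follows, under simplifying assumptions that are not in the text above: no heat is lost to the surroundings, nothing changes phase, and each body has a constant heat capacity. The numbers are invented for illustration.

```python
def equilibrium_temperature(bodies):
    """Final common temperature of bodies brought into thermal contact.

    bodies: list of (heat_capacity, temperature) pairs. The result is
    the average of the temperatures weighted by heat capacity.
    """
    total_capacity = sum(capacity for capacity, _ in bodies)
    return sum(capacity * temp for capacity, temp in bodies) / total_capacity


apple = (300.0, 20.0)   # hypothetical heat capacity (J/K) and temperature (deg C)
coffee = (900.0, 80.0)  # hypothetical values as well
print(equilibrium_temperature([apple, coffee]))  # 65.0
```

The quantity that actually adds is thermal energy (heat capacity times temperature); the temperature that comes out is an average, which is the sense in which the "sum" is a net combined temperature.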
Skipping over more details of the debate over
sones and the like, we come to the real question
here, the reason for interest in the topic:
>What does this have to do with Range voting?
>
>The present essay was stimulated by an insanely
>wrong-headed attack on range voting – actually,
>incredibly, an attack on every voting method
>that uses numbers!! – by M.Balinski & R.Laraki.
>They prefer an alternative and more complicated,
>but related voting system – which they invented
>and called majority judgment (MJ) – based not on
>"greatest average score wins" but rather on
>"greatest median score wins, with an additional
>tie-breaking scheme." MJ also uses, not a
>numerical score-set, but rather a set of 6 verbal scores
>Excellent / Tres bien / Bien / Assez Bien /
>Passable / Insuffisant / a Rejeter.
>
>We quote the attack from Balinski & Laraki's paper
>Election by Majority Judgment: Experimental
>Evidence, pages 13-54 in Bernard Dolez, Bernard
>Grofman, Annie Laurent: Studies In Public
>Choice: In Situ and Laboratory Experiments On
>Electoral Law Reform: French Presidential Elections, Springer 2011.
>
>We have numbered their paragraphs for later reference:
>
>1. Is it reasonable to use numerical scales in
>voting? The answer is a resounding no, for several reasons:
Great example of the abuse of the word "reason."
It certainly is possible to use numerical scales
in voting, it's been widely done, and it is
*clearly* reasonable by a relatively objective
standard of reason. That is, "reasonable people"
-- not insane -- use such scales. Whether or not
it's *optimal* would require some standard of optimality. Do they define one?
>2. The numbers mean nothing unless they are
>defined: proposals to use weights give them no definition.
Suppose voters vote in an election by tossing
uniform weights into baskets. They are permitted
to toss up to N of these weights into each
labelled basket. What is the definition of a vote
of one weight? It's obvious: it is an *action*
that is defined by its *effect*. That is, tossing
N weights maximally acts to elect that candidate,
and tossing none maximally acts against the
election of that candidate, and tossing an
intermediate number has a proportional effect intermediate between these.
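The basket picture amounts to a plain sum. Here is a minimal sketch with made-up ballots; the function name and data are illustrative only.

```python
def tally_weights(ballots, max_weights):
    """Sum the weights tossed into each candidate's basket.

    ballots: list of dicts mapping candidate -> number of weights tossed.
    max_weights: N, the most weights one voter may toss into one basket.
    """
    totals = {}
    for ballot in ballots:
        for candidate, weights in ballot.items():
            if not 0 <= weights <= max_weights:
                raise ValueError(f"{weights} weights is outside 0..{max_weights}")
            totals[candidate] = totals.get(candidate, 0) + weights
    return totals


# Made-up ballots with N = 4.
ballots = [
    {"A": 4, "B": 0, "C": 2},
    {"A": 1, "B": 4, "C": 3},
    {"A": 4, "B": 2, "C": 0},
]
totals = tally_weights(ballots, max_weights=4)
print(totals)                       # {'A': 9, 'B': 6, 'C': 5}
print(max(totals, key=totals.get))  # A
```

Nothing in the count refers to any meaning of a weight outside the election; each weight simply moves a total by one.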
Generally, proposals to use weights have exactly
that "meaning." The error is in assuming that to
be meaningful, it must have meaning *outside of
the election process.* I.e., that there must be
some correspondence between the number of weights
chosen by the voter and some condition
independent of the election. In fact, Balinski
and Laraki have attempted to create that, by
using words that *may* correspond to an
independent judgment. And that leads to election pathologies with Range.
The same error can exist, naturally, if a voter
thinks that the voter should, say, vote for all
candidates according to an independent
assessment, and that "honesty" requires, then
equal rating all "excellent" candidates when, in
fact, the system is asking for a *choice* between
them. "Excellent" is a *category* as Arrow has
discussed in his Center for Election Science
interview. It does not negate the existence of
differences between candidates within the
category (as placed by the voter). It means,
simply, that the voter has made a choice to
suppress the difference as being insignificant *under the election conditions.*
Gad, you'd think that political scientists would
actually engage in discussion with those who have
studied these matters in detail, before
committing themselves to print. But, we have seen
over and over, they don't, and as a result,
"political science" is often decades behind what
is commonly known. It's hubris.
> Their only real "meaning" is found in their
> strategic use. This induces comparisons, which
> immediately leads to
> Arrow's paradox <http://rangevoting.org/ArrowThm.html>...
> E.g. with these actual ballot instructions
>Give a grade to each of the twelve candidates:
>either 0, or 1, or 2 (2 the best grade, 0 the
>worst). To do so, place a cross in the
>corresponding box etc. The candidate elected
>with [this] method is the one who receives the highest number of points.
This does not lead to Arrow's paradox, that's
preposterous and totally misreads Arrow's
Theorem. The voters have not necessarily ranked
all the candidates, they have ranked
*categories.* Analyses of range that propose
violations of Arrovian criteria generally assume
an underlying ranking, and then study the
election from the point of view of those
rankings, which totally neglects the concept of
an insignificant ranking, totally neglects the
action of the voter to *deliberately* rank two
candidates in the same category, as an *exercise of power.*
The analysis completely denies the concept of
"preference strength," when preference strength
is obviously active in real-world social choice.
It is thinking like this that delayed the
development of election science for decades. It was a denial of the *obvious*.
>3. nothing is said concerning the meaning of 0, 1, or 2.
That's directly in contradiction to the example
given. The meaning of the vote was precisely
defined by the result. The meaning *is* the
result, the effect of the voter's voting pattern
on the result. This is *so* embarrassing. With
vote-for-one, what is the "meaning" of the vote?
>The numbers induce relative, so strategic,
>behavior. Other numbers could have been given.
>For example, with {-1,0 +1} mathematically there
>is no difference, but were these numbers used
>the behavior of the voters would almost surely
>have been different. [In fact, this experiment
>later was
>tried <http://rangevoting.org/France2012.html>
>and voter behavior was significantly different.]
The behavior of voters under differing conditions
will vary. So? The real issue would be which
system will generate maximized social utility,
(or, symmetrically, minimize social discontent
with the result) and that is a problem that
depends on the definition of social utility.
There are techniques for studying this, and the
denial of any absolute meaning to "social
utility" is useless. It's true, generally, i.e.,
we have no means of measuring "true absolute
social utility," but we can measure proxies for
it, in real elections, and we can measure the
effect of variations in system on results in simulations.
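On the narrow mathematical point in the quote, that {0,1,2} and {-1,0,+1} are equivalent, a small check is possible. This sketch assumes every voter rates every candidate and that the ballots themselves do not change; the ballots are invented.

```python
def score_totals(ballots):
    """Total score per candidate over all ballots."""
    totals = {}
    for ballot in ballots:
        for candidate, score in ballot.items():
            totals[candidate] = totals.get(candidate, 0) + score
    return totals


ballots_012 = [
    {"A": 2, "B": 1, "C": 0},
    {"A": 0, "B": 2, "C": 1},
    {"A": 2, "B": 0, "C": 1},
]
# The same ballots relabelled on the {-1, 0, +1} scale.
ballots_shifted = [{c: s - 1 for c, s in b.items()} for b in ballots_012]

for ballots in (ballots_012, ballots_shifted):
    totals = score_totals(ballots)
    print(totals, max(totals, key=totals.get))
# {'A': 4, 'B': 3, 'C': 2} A
# {'A': 1, 'B': 0, 'C': -1} A
```

Every total drops by the number of ballots, so the order is unchanged. Whether voters would mark the same ballots under the other labels is the behavioral question, and the arithmetic says nothing about it.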
Simulations cannot address the difference between
the name-system and the numerical-system, but
that can be studied in properly designed
statistical trials. I.e., a population is divided
randomly into two groups, with each group being
given an "election," say as an exit poll. One
group has a name-system and the other group has a numerical-system.
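The random division might be sketched like this. The function name and the optional seed are illustrative assumptions, and how the two ballots are then administered is left out.

```python
import random


def split_randomly(population, seed=None):
    """Shuffle the population and cut it into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(population)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical exit-poll respondents identified by number.
name_group, number_group = split_randomly(range(100), seed=2013)
print(len(name_group), len(number_group))  # 50 50
```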
However, such a trial could be biased by the
absence of strategic incentive. Balinski and
Laraki seem to assume that "strategic incentives"
somehow distort results. That's true in a sense.
If voters voted absolute utilities *not
normalized*, then we could easily maximize
utility. But there is no way to define these
utilities that is practical. In reality, in
everyday choice, our application of words like
"Excellent" varies with *expectations.* I.e., it is a *strategic choice.*
Strategic choices test preference strength. What
has been called "strategic voting" with Range is
simply a manifestation of how people, real-world,
make choices. Contradictory states have been
posited in order to paint this strategic voting as
somehow "dishonest." I.e., a voter supposedly has
a weak preference for A over B, but down-rates B
to min rating, say, because they want A to win.
Uh, that means they have a *strong preference.*
And if the only choice is between A and B, this
is totally rational and to be expected, and is,
in the ordinary meaning of the word, "honest."
Present these voters with a different election
scenario, they will choose differently, and this
is how multiple-round systems *powerfully* test
preference strength. By testing preference
strength, social utility maximization is possible.
>4. When numbers are used, they may well not be
>used in the same way at all: when a 0-100 scale
>is used, some voters may view 80 to be an
>excellent grade, others may see it as merely middling.
Balinski and Laraki completely miss the real
behavior of voters, and the "meaning" of ratings,
and this comment makes it obvious.
I have one full vote to cast for each candidate.
The meaning of the vote is the effect on the
outcome, nothing more, nothing less.
(If a voter thinks something else, they have been
misled, probably by people demanding "honest
votes," i.e., absolute approval, etc.) If I rate
a candidate at 80%, this is a *strong vote* for
the candidate, generally. By the way, I don't
like grading systems, like ABCDF. The purpose of
voting systems is not to "grade" the candidates,
it is to choose one or more. My favorite will get
an A, even if I think he's pretty bad (assuming I
choose to vote at all, I might not.) The worst
will get an F, even if I perceive the worst as
*almost* as good as the favorite. That's in a two
candidate election. What happens with more
candidates is more complex, for sure. But voting
under range is still a matter of deciding where
to place voting power, and that power is
expressed in the pairwise elections, it can vary
from zero (equal rating) to one full vote (max/min rating.)
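The pairwise reading of a range ballot can be made concrete. This is a sketch with a made-up ballot on a 0-100 scale; the function name is illustrative.

```python
from fractions import Fraction


def pairwise_vote(ballot, a, b, min_score, max_score):
    """Signed fraction of one full vote that the ballot casts for a over b."""
    return Fraction(ballot[a] - ballot[b], max_score - min_score)


ballot = {"A": 100, "B": 80, "C": 0}
print(pairwise_vote(ballot, "A", "B", 0, 100))  # 1/5
print(pairwise_vote(ballot, "B", "C", 0, 100))  # 4/5
print(pairwise_vote(ballot, "A", "C", 0, 100))  # 1, a full vote (max/min rating)
print(pairwise_vote(ballot, "A", "A", 0, 100))  # 0, equal rating
```

On this reading the 80 given to B is four fifths of a vote for B over C, whatever word a given voter would attach to "80".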
Voting is an exercise of *power,* not a
sentiment. We *always* choose where to put our
power based on expectations, unless we are
asleep. Do we prefer voters who are asleep or those who are awake?
If I am presented with a ballot with candidates
on it, and I am familiar with the candidates, but
have no specific knowledge of how others are
likely to vote, I am quite likely to cast what is
called a "fully sincere" Range ballot. That is,
it will be unaffected by "strategic
considerations." But in a real-world election,
that circumstance is only present in minor
elections. Strategic considerations will shift
the vote, for most voters, because voters dislike
wasting their vote, sometimes. And the actual
behavior depends on preference strength, so it is
arguable that it *improves* results.
(Simulations, so far, have not clearly addressed
this. *Especially to be studied* are two-round
systems, where voter turnout also tests preference strength.)
>5. Even if the numbers did provide a common
>language, they will almost certainly not be a
>proper interval measure [in the sense of Stevens
>– it is here that Balinski & Laraki invoke
>"measurement theory"] – that depends on who the
>candidates are and how the voters give their
>grades. For example, the 0-20 scale used in
>France is a common language, but an 18, 19, or
>20 is unheard of in philosophy or literature, so
>the scale is not an interval measure. Once the
>distribution of the grades is known – after many
>elections (or many examinations) – it is
>possible to determine whether the scale is an
>interval measure and, if not, to correct it (as
>did the Danes). But then it is too late, since
>the weights must be announced ahead of time.
If a name-scale is used, it should be defined in
terms of the fractional vote assigned (or the
numerator of the fractional vote, same thing). I
still don't like it, though I have proposed that
the ranks in Bucklin be named. (They were
numbered in the original Bucklin, 1, 2, 3).
I have also proposed using a Range ballot for
Bucklin, and, again, names could be attached. But
these are *comparative* names, not absolute
categories or names easily interpreted as such.
So, I'll give the rank, the equivalent rating,
and the name, for a Range-Bucklin implementation.
And I'll assume a two-round system, with some
extra explanation that might not be on the ballot.
INSTRUCTIONS
Categorize each candidate into one of the
following ranks. You may categorize more than one
candidate into the rank. The first rank will be
counted for all voters, and, if the majority of
voters have, with this counting, approved the
election of the candidate, the election will
complete. If more than one candidate has a
majority, then the candidate with the most votes
will be elected. [Ties not considered here].
If there is no majority approval, then the next
rank votes will be added to those already
counted, and, if necessary, this counting of
lower ranks will be repeated down to the Approved
rank, until a majority is found or all approved ranks have been counted.
If there is still no majority, a runoff election
will be held. On the runoff ballot will be the
two most-approved candidates from this election,
plus any candidate who would, by comparison of
all ranks voted, including the unapproved ranks,
defeat, pairwise, both of those candidates.
Rank, Rating, Name
1, 4, Favorite
2, 3, Preferred
3, 2, Approved
4, 1, Disliked
5, 0, Rejected
If you do not mark a category for a candidate,
that candidate will be classified as Rejected.
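The counting rules in these instructions can be sketched in code. This is one possible reading, not a formal specification: ties are not handled, the further contingency mentioned afterward is not handled, and all names and ballots are invented for illustration.

```python
APPROVED_RANKS = (1, 2, 3)  # Favorite, Preferred, Approved
REJECTED = 5                # unmarked candidates are treated as Rejected


def rank_of(ballot, candidate):
    return ballot.get(candidate, REJECTED)


def beats(ballots, a, b):
    """True if more ballots rank a above b than b above a (all five ranks)."""
    for_a = sum(1 for ballot in ballots if rank_of(ballot, a) < rank_of(ballot, b))
    for_b = sum(1 for ballot in ballots if rank_of(ballot, b) < rank_of(ballot, a))
    return for_a > for_b


def count(ballots, candidates):
    """Return (winner, None), or (None, runoff_list) if no majority is found."""
    totals = {c: 0 for c in candidates}
    for rank in APPROVED_RANKS:
        # Add this rank's votes to those already counted.
        for ballot in ballots:
            for c in candidates:
                if rank_of(ballot, c) == rank:
                    totals[c] += 1
        with_majority = [c for c in candidates if totals[c] > len(ballots) / 2]
        if with_majority:
            # Several majorities: most votes wins (ties not considered here).
            return max(with_majority, key=totals.get), None
    # No majority: top two by approvals, plus anyone who beats both pairwise.
    top_two = sorted(candidates, key=totals.get, reverse=True)[:2]
    extra = [c for c in candidates
             if c not in top_two and all(beats(ballots, c, t) for t in top_two)]
    return None, top_two + extra


ballots = [
    {"A": 1, "B": 2},
    {"A": 1, "B": 3},
    {"B": 1, "C": 3},
    {"C": 1, "B": 2},
    {"C": 1, "A": 4},
]
print(count(ballots, ["A", "B", "C"]))  # ('B', None): B has 3 of 5 after the second rank
```

In the example no candidate has a majority of first-rank votes (A 2, B 1, C 2), and adding the second-rank votes brings B to three of five.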
(The range ratings I give above as equivalent are
not used in the method as described. They would
be used in a variant. The 3rd rank has been
defined as an approved rank, which was the case
with original Bucklin. A vote at that rank *can*
elect a candidate. The most that votes below that
rank can do is to select a Condorcet winner and
place that candidate into the runoff. It would
also be possible to define the runoff as being
top two, *or*, if there is a candidate who
defeats both of the two, between the top approved
candidate and the pairwise winner. There is
another contingency I have not addressed. It is
*highly* unlikely, but would need to be
considered in the formal method for logical completeness.)
The system above could be Range instead of
Bucklin; all that is necessary is to have a
formal approval cutoff. I.e., say, mid-range or
higher is "approved." The descending approval
cutoff counting could be skipped and the range
votes added to find highest range sum, or, if
that candidate does not have majority approval,
then a runoff with top two (which is known to
improve range votes with real voters, i.e.,
"strategic voters,") and the pairwise test still considered as above.
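A rough sketch of this Range variant follows. It assumes a 0-4 scale with mid-range (2) as the approval cutoff, reads "top two" as top two by range sum, treats unrated candidates as 0, and leaves out both the pairwise test and tie handling; those readings are assumptions, and the ballots are invented.

```python
def range_with_cutoff(ballots, candidates, cutoff=2):
    """Return (winner, None), or (None, top_two) when a runoff is needed."""
    sums = {c: sum(b.get(c, 0) for b in ballots) for c in candidates}
    approvals = {c: sum(1 for b in ballots if b.get(c, 0) >= cutoff)
                 for c in candidates}
    ordered = sorted(candidates, key=sums.get, reverse=True)
    leader = ordered[0]
    # The range leader is elected only with majority approval.
    if approvals[leader] > len(ballots) / 2:
        return leader, None
    return None, ordered[:2]


ballots = [
    {"A": 4, "B": 2, "C": 0},
    {"A": 4, "B": 3, "C": 0},
    {"A": 0, "B": 3, "C": 4},
    {"A": 1, "B": 2, "C": 4},
    {"A": 0, "B": 3, "C": 4},
]
print(range_with_cutoff(ballots, ["A", "B", "C"]))  # ('B', None)
```

Here B has the highest sum (13, against 12 for C and 9 for A) and is rated at or above the cutoff on all five ballots, so no runoff is needed.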
>6. Even if it turned out that the scale did
>approximate an interval measure, the procedure
>depends on irrelevant alternatives, [hence] is
>subject to Arrow's paradox: for if one or
>several candidates drop out, the distribution of
>the remaining grades will almost certainly be
>different, so the scale is no longer an interval
>measure. [For example, in the French 2007
>presidential election, the counts of the number
>of times each of their 6 verbal scores was
>used, changed considerably when all scores for
>the 8 "unimportant" among the 12 candidates were removed.]
It's insane. An election is a *choice*, and
choice depends on context. The context is the
specific ballot, set of options, used by the
voters, not some other hypothetical or
previously-possible ballot. IIA as applied to the
expressed votes is not violated. Arrow knows
this. Of course, Balinski and Laraki did not have
the benefit of the CES interview, but we already
knew that Arrow's theorem did not apply to
systems that categorize candidates into ranked
categories, allowing equal ranking.
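The claim about expressed votes can be illustrated with a small check. This is only a sketch with invented ballots, and it holds the marked ballots fixed while a candidate is struck from the count.

```python
def score_order(ballots, candidates):
    """Candidates sorted by total expressed score, highest first."""
    totals = {c: sum(ballot[c] for ballot in ballots) for c in candidates}
    return sorted(candidates, key=totals.get, reverse=True)


ballots = [
    {"A": 2, "B": 1, "C": 0},
    {"A": 0, "B": 2, "C": 1},
    {"A": 2, "B": 0, "C": 1},
]
print(score_order(ballots, ["A", "B", "C"]))  # ['A', 'B', 'C']
print(score_order(ballots, ["A", "B"]))       # ['A', 'B'], C struck from the count
print(score_order(ballots, ["B", "C"]))       # ['B', 'C'], A struck from the count
```

Striking a candidate leaves every other candidate's total untouched, so the order among the rest cannot change. What voters would have marked on a different ballot is a separate question about context, not about the counting rule.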
Arrow's theorem deliberately did not consider
cardinal voting systems, nor did it consider
"ballots." It assumed individual preference
profiles that ranked all candidates, strictly, no
equal ranking, and no skipped ranks. In the real
world we are often faced with choices where we
have difficulty deciding. That is an expression
of low or no preference strength. Voting systems
that allow this as an expression, therefore,
collect more accurate data from voters.
Bucklin is a Range system, not a ranked system.
The difference has often been overlooked, but
voters could skip ranks with Bucklin, and did,
thus expressing strong preference. In original
Bucklin, they could equal-rank in third rank, and
it's an obvious extension to allow equal ranking
in all ranks. Why not? Forcing the voter to rank
suppresses valuable information. Equal ranking is
*information,* just as ranking is information.
Warren went on to say much the same as I wrote
above, independently. (I hadn't read him through
before writing this commentary; I often do that.)