Contest Scoring Question


Avrom Faderman

Aug 21, 1999
Hi all,

This is a question that's been nagging me for a while, and since the contest
will begin fairly soon, I figured this might be a good time to ask it.

Is there, or if not should there be, any guidelines for scoring (other than
10=highest, 1=lowest, integers only)? I don't mean anything as specific as
how important puzzles vs. plot vs. writing are--such opinions differ, and I
gather the point of the judging is to see how the community as a whole likes
various games.

I just mean questions like:

What does a "10" mean? Does it mean "a really good game, and one of the 2-3
best this comp?" "The very best game this comp?" "The sort of game that
only comes out every few years?" "The best game ever?" "The best game
imaginable?"

What does a "5" mean? Solidly average? Kind of mediocre? Is it more like
what a "C" officially means ("average") or what it has generally come to
mean ("minimally decent")?

I think scoring styles currently differ pretty dramatically. I've seen
people's score lists with nothing below a 6; I've seen people who
normalized their score lists (giving the top 10% of games a 10, the next 10%
a 9, and so on); I myself tend to bunch pretty heavily around the 4-6
range, have never given a 10 (I think subconsciously I had associated it
with "the best game imaginable"), and give 8-9 only to really really
impressive games.

Maybe this isn't a problem. Or maybe it is: For example, I tend to only
play TADS and Inform games, and since I think my scores are much lower than
the average, that constitutes a slight penalty to any game written in TADS
or Inform. Maybe this is the sort of noise that's expected to all come out
in the wash; I don't know enough about the sample size of judges to know
whether that's realistic.

What do people think? Would requiring normalization of scores (or doing so
automatically) be a good idea? Or would instituting some other official
guideline (even something as simple as "A game with a score of 10 should be
at or nearly on a par with your favorite game; one with a score of 5 should
be about average; one with a score of 1 should be one worthy only of
parody") be better?

Best,
Avrom


Adam Cadre

Aug 21, 1999
Avrom Faderman wrote:
> What do people think? Would requiring normalization of scores (or doing so
> automatically) be a good idea? Or would instituting some other official
> guideline (even something as simple as "A game with a score of 10 should be
> at or nearly on a par with your favorite game; one with a score of 5 should
> be about average; one with a score of 1 should be one worthy only of
> parody") be better?

To me, 10 means "I want this game to win the comp" and 1 means "I want this
game to come in last." Since I tend to have just a few favorites in any
given comp batch, most of my scores cluster around 2 and 3, making my 10
even more powerful.

An example. Let's say there are three games in the comp, each with 99
votes, and all three have an average score of 5. Now, you like the first
game quite a bit, and want it to win the comp; the others you don't care
for. Grading against all games that could conceivably be written, you
decide to hand out a 7 and two 4s. This leads to a net score of 5.02 for
the game you like, and a 4.99 for the games you don't.

Now, let's say you're like me, and give the games a 10, a 2 and a 1. Now
the scores end up 5.05, 4.97 and 4.96. A somewhat bigger spread.

Now let's put these approaches in the same pool. There are 98 votes for
each of these games, and they average out to 5. You vote 7, 4 and 3; I
give those games, respectively, a 1, a 2 and a 10. Your favorites work
out to a 4.98, a 4.96 and a 5.03. My vote has had a greater impact on
the final scores.
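
If you want to check the arithmetic, here's a rough Python sketch of the above.
It assumes the other 99 (or 98) judges all handed in straight 5s, which is the
simplest way to hit those averages; the names are mine, nothing official:

  # One game, 99 background judges who all gave it a 5, plus one more vote.
  background = [5] * 99

  def average_with(my_vote):
      votes = background + [my_vote]
      return sum(votes) / len(votes)

  print(average_with(7))    # 5.02 -- the "graded against all conceivable games" 7
  print(average_with(10))   # 5.05 -- the "I want this game to win" 10

  # Both judges in the same pool of 98 background fives:
  pool = sum([5] * 98)
  print((pool + 7 + 1) / 100)    # 4.98
  print((pool + 4 + 2) / 100)    # 4.96
  print((pool + 3 + 10) / 100)   # 5.03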

And that's why I vote the way I do. I'll give up to 4 points to a game
that I acknowledge to be well-done, but don't care for. To get even as
high as a 5, your game has to be one I really enjoy.

-----
Adam Cadre, Issaquah, WA
http://adamcadre.ac

Andrew Plotkin

Aug 22, 1999
Avrom Faderman <Avrom_F...@email.msn.com> wrote:
> What do people think? Would requiring normalization of scores (or doing so
> automatically) be a good idea? Or would instituting some other official
> guideline (even something as simple as "A game with a score of 10 should be
> at or nearly on a par with your favorite game; one with a score of 5 should
> be about average; one with a score of 1 should be one worthy only of
> parody") be better?

The question you seem to be asking is, over what *set* do you normalize
the 1 - 10 range. (The set of competition games that year? The set of all
competition games you've ever played? The set of all notional games you
ever might play, from the worst imaginable to the best imaginable?)

I more or less use the second of those definitions. I can see arguments
for all three, but none really convincing, so I guess I'd say we should
continue leaving it up to each player's discretion.

--Z

"And Aholibamah bare Jeush, and Jaalam, and Korah: these were the
borogoves..."

Steve Young

Aug 22, 1999

Adam Cadre wrote in message ...

>Avrom Faderman wrote:
>> What do people think? Would requiring normalization of scores (or doing so
>> automatically) be a good idea? Or would instituting some other official
>> guideline (even something as simple as "A game with a score of 10 should be
>> at or nearly on a par with your favorite game; one with a score of 5 should
>> be about average; one with a score of 1 should be one worthy only of
>> parody") be better?
>
>To me, 10 means "I want this game to win the comp" and 1 means "I want this
>game to come in last." Since I tend to have just a few favorites in any
>given comp batch, most of my scores cluster around 2 and 3, making my 10
>even more powerful.


I always thought this competition was for the IF community to promote
and encourage IF to be written, and to participate by fairly marking all
the games to finally decide which ones are the best of that particular
year. From your comments I can see how wrong I was.

There is always going to be disagreement about how games are marked,
with some people marking high, others average, and others low, but at
least they are marking all the games by the same criteria, unlike
yourself who is trying to manipulate the results by putting in rogue
scores. In a large pool of participants this would probably not have
much effect, but in the smaller sample of people voting here, one or two
people like yourself could fix the result, making it more like an
election for a banana republic than a serious comparison of the IF
games.

>An example. Let's say there are three games in the comp, each with 99
>votes, and all three have an average score of 5. Now, you like the first
>game quite a bit, and want it to win the comp; the others you don't care
>for. Grading against all games that could conceivably be written, you
>decide to hand out a 7 and two 4s. This leads to a net score of 5.02 for
>the game you like, and a 4.99 for the games you don't.
>
>Now, let's say you're like me, and give the games a 10, a 2 and a 1. Now
>the scores end up 5.05, 4.97 and 4.96. A somewhat bigger spread.
>
>Now let's put these approaches in the same pool. There are 98 votes for
>each of these games, and they average out to 5. You vote 7, 4 and 3; I
>give those games, respectively, a 1, a 2 and a 10. Your favorites work
>out to a 4.98, a 4.96 and a 5.03. My vote has had a greater impact on
>the final scores.
>
>And that's why I vote the way I do. I'll give up to 4 points to a game
>that I acknowledge to be well-done, but don't care for. To get even as
>high as a 5, your game has to be one I really enjoy.


I can understand you rating games you like and enjoy above others you
might not so much enjoy, but surely they shouldn't be marked right down
just on this criterion alone. I recently played Hitchhikers Guide to the
Galaxy and didn't like it at all, finding it very annoying in parts and
repetitive, but even taking this into account I also recognised the many
positive attributes of the game and would give it around 5/10.

The trouble is, if your scoring system took hold, the only games that
would come anywhere would be middle-of-the-road games which, although they
added nothing new, were fairly inoffensive and fairly well liked, and as
such wouldn't be marked to either extreme. The best games, meanwhile, would
be marked way up or down according to the whims of each participant and how
it might affect their own favourites. Hardly a way to pick out the best
short games of the day.

Steve


Vincent Lynch

Aug 22, 1999
Andrew Plotkin wrote in message
<7pnj8p$n...@dfw-ixnews12.ix.netcom.com>...

>
>The question you seem to be asking is, over what *set* do you normalize
>the 1 - 10 range. (The set of competition games that year? The set of all
>competition games you've ever played? The set of all notional games you
>ever might play, from the worst imaginable to the best imaginable?)
>
>I more or less use the second of those definitions. I can see arguments
>for all three, but none really convincing, so I guess I'd say we should
>continue leaving it up to each player's discretion.

Since having voting in the competition only serves to compare games in
that year of the competition against each other (at least, that's the
case if it's just the ordered list at the end that's important), only
the first suggestion makes sense to me. I don't think enforced
normalising works, though, unless judges are obliged to play all the
games.

-Vincent


Andrew Plotkin

Aug 22, 1999
Vincent Lynch <vincen...@lineone.net> wrote:
> Andrew Plotkin wrote in message
> <7pnj8p$n...@dfw-ixnews12.ix.netcom.com>...
>>
>>The question you seem to be asking is, over what *set* do you normalize
>>the 1 - 10 range. (The set of competition games that year? The set of
> all
>>competition games you've ever played? The set of all notional games you
>>ever might play, from the worst imaginable to the best imaginable?)
>>
>>I more or less use the second of those definitions. I can see arguments
>>for all three, but none really convincing, so I guess I'd say we should
>>continue leaving it up to each player's discretion.
>
> Since having voting in the competition only serves to compare games in
> that year of the competition against each other (at least, that's the
> case if it's just the ordered list at the end that's important), only
> the first suggestion makes sense to me.

The problem is, some people (and me) will say "What if all the games suck?
Should I give a mediocre game a 10 just because there aren't any better
entries that year?"

> I don't think enforced
> normalising works, though, unless judges are obliged to play all the
> games.

That's another good point.

Andrew Plotkin

Aug 22, 1999
Steve Young <steve...@eclipse.co.uk> wrote:
>
> Adam Cadre wrote in message ...
>> To me, 10 means "I want this game to win the comp" and 1 means "I want
>> this game to come in last." Since I tend to have just a few favorites
>> in any given comp batch, most of my scores cluster around 2 and 3,
>> making my 10 even more powerful.
>
> I always thought this competition was for the IF community to promote
> and encourage IF to be written, and to participate by fairly marking all
> the games to finally decide which ones are the best of that particular
> year. From your comments I can see how wrong I was.

<eliza>You seem bitter. Do you want to talk about it?</eliza>

> [...]


> I can understand you rating games you like and enjoy above others you
> might not so much enjoy, but surely they shouldn't be marked right down
> just on this criterion alone.

You're getting into an area entirely separate from the original question.
I've spent a lot of time in the past posting about this, and I *really*
don't want to start again.

However, please consider that, for me, the phrase "games I like and enjoy"
includes *everything* that shapes my opinion about a game. Including all
the things you're accusing Adam (and me) of leaving out.

Avrom Faderman

Aug 22, 1999

Steve Young wrote in message <93533465...@ns2.saturn.ispc.net>...

>
>Adam Cadre wrote in message ...
>>To me, 10 means "I want this game to win the comp" and 1 means "I want
>>this game to come in last." Since I tend to have just a few favorites in
>>any given comp batch, most of my scores cluster around 2 and 3, making my
>>10 even more powerful.
>
>The trouble is, if your scoring system took hold, the only games that
>would come anywhere would be middle-of-the-road games which, although they
>added nothing new, were fairly inoffensive and fairly well liked, and as
>such wouldn't be marked to either extreme. The best games, meanwhile,
>would be marked way up or down according to the whims of each participant
>and how it might affect their own favourites. Hardly a way to pick out
>the best short games of the day.


Actually, it seems to me that if there's any trouble with Adam's scoring
system, it's exactly the opposite--broad-appeal,
everyone-enjoys-but-nobody-loves games have almost no chance if everyone
scores like that.

Massive simplification to demonstrate basic point ahead.

Let's suppose that after eliminating the games that nobody likes, the comp
were left with 2 other sorts: "Controversial" games (ones that are in half
the players' top 15% and in half the players' bottom half) and
"uncontroversial" games (ones that every player ranks as pleasant but
basically middle-of-the-road...in the second 25%, maybe).

If everyone used a scoring system like mine, with a huge bunch near 4-6 and
only outliers near the top and bottom, "controversial" and "uncontroversial"
games might do about evenly well--controversial games would get a pretty
even mix of 8-10 and 1-5 and uncontroversial games would get 5-7 from almost
everyone, giving games of both types an expected average of around 6.

If everyone used Adam's system, with the big bunch near the very bottom but
with a very sharp curve towards the top as one approaches the best games in
the competition, controversial games would still come off OK--they'd get
8-10 from about half the people and 1's from the rest, giving them an
average of about 5, but uncontroversial games would be getting 3's and 4's
from almost everyone, with an expected average of about 3.5.
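
In case those expected values look like hand-waving, here's the
back-of-the-envelope version in Python. The numbers I plug in (9 as the
average of an 8-10 score, 3 for a 1-5 score, and so on) are just my guesses
at the midpoints of those ranges:

  def expected(first_half_avg, second_half_avg):
      # Half the judges score around one average, half around the other.
      return (first_half_avg + second_half_avg) / 2

  # My bunched-in-the-middle style:
  print(expected(9, 3))      # controversial game: about 6
  print(expected(6, 6))      # uncontroversial game: about 6

  # Adam's style (big bunch at the bottom, sharp curve toward the top):
  print(expected(9, 1))      # controversial game: about 5
  print(expected(3.5, 3.5))  # uncontroversial game: about 3.5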

Now I said "trouble," but I'm not sure it is trouble. Maybe we *want* to
encourage games at least some people love over games everybody likes--that
sounds like the better bet if we want lots of experimental stuff, at least.
Which would be an argument for Adam's system.

(The extreme version of fostering this would be to ask everyone to vote for
the best game rather than rate all of them. Then even being everyone's
second-favorite wouldn't be as good as being the top pick of a few.)

(For pure useless intellectual curiosity's sake, it would be interesting to
see an analysis of past competition results along the lines of...which games
were the most people's top pick? How closely does that correlate with the
actual contest results? But that question's probably more trouble to answer
than it's worth.)

Best,
Avrom


Quentin.D.Thompson

Aug 23, 1999
In article <7ppi7n$c...@dfw-ixnews5.ix.netcom.com>,

Andrew Plotkin <erky...@netcom.com> wrote:
> Vincent Lynch <vincen...@lineone.net> wrote:
> > Andrew Plotkin wrote in message
> > <7pnj8p$n...@dfw-ixnews12.ix.netcom.com>...
> >>
> >>The question you seem to be asking is, over what *set* do you normalize
> >>the 1 - 10 range. (The set of competition games that year? The set of all
> >>competition games you've ever played? The set of all notional games you
> >>ever might play, from the worst imaginable to the best imaginable?)

In my opinion, a combination of (1) and (3). First decide where all the games
stand based on previous/current experience (but not necessarily based on
previous Comp rankings) and then award them scores based on the general
quality that year. For example, suppose you had three games, A, B and C.
Having played a bit of IF in your time, (and having your own conception of
what the perfect 10 is, be it "Jigsaw" or "Trinity" or "So Far"), you decide
that A rates around 6, B rates about 4 and C rates about 7. The next step
(i.e. "C was the best of this comp; let's give it a 9 or 10" .vs. "This game
is obviously inferior to what _I_ feel is the perfect 10; let the 7 stand")
should be up to the individual judges. Trying to draw up "official"
guidelines doesn't help much, and can even generate some unnecessary
friction. There are arguments for all scoring systems, including Adam's.

> > Since having voting in the competition only serves to compare games in
> > that year of the competition against each other (at least, that's the
> > case if it's just the ordered list at the end that's important), only
> > the first suggestion makes sense to me.
>
> The problem is, some people (and me) will say "What if all the games suck?
> Should I give a mediocre game a 10 just because there aren't any better
> entries that year?"

Spokesman: "That would, indeeed, be a remote contingency." :-D Seriously,
though, even if it does happen (and judging by form, it won't..) my own
opinion would be _not_ to place the 'best' (i.e.'least sucky') game at 9 or
10, the next at 8, and so on. If it does boil down to a choice of "which of
these games is the least lousy?" there's no obligation to give game X (which
is inferior to the 16th place winner of the Comp held 2 years ago) a 10 or
even a 9. Then again, some people score all games fairly high. Though I've
never posted ratings as an 'official' judge (and won't this year, for obvious
reasons :-D), I've had a bash at scoring most of the Comp games from last
year, and my ratings tend to be at the extremes: there were two 9s, several
8s, but several 3s, 2s and 1s - though that had more to do with parsers.

-- Quentin.D.Thompson.



SteveG

Aug 23, 1999
On Sat, 21 Aug 1999 14:27:41 -0700, "Avrom Faderman"
<Avrom_F...@email.msn.com> wrote:

>Hi all,
>
>This is a question that's been nagging me for a while, and since the contest
>will begin fairly soon, I figured this might be a good time to ask it.
>
>Is there, or if not should there be, any guidelines for scoring (other than
>10=highest, 1=lowest, integers only)?

I don't think it's necessary to have rules for scoring.

Guidelines might be nice except for the fact that I don't think
there's a widely accepted view of what those guidelines should be. :-)

And, in the end, I don't think it matters too much, as my observation
has been that despite widely differing views on and methods of voting,
the results have each year pretty much lined up with my own views. (So
the results must be "correct"! :-) My favourite game has never won and
I could quibble over some of the relative rankings but all in all I've
always been comfortable with the outcomes. I think the current simple
rules produce results that rgif readers have been happy with.)

> I don't mean anything as specific as
>how important puzzles vs. plot vs. writing are--such opinions differ, and I
>gather the point of the judging is to see how the community as a whole likes
>various games.
>
>I just mean questions like:
>
>What does a "10" mean? Does it mean "a really good game, and one of the 2-3
>best this comp?" "The very best game this comp?" "The sort of game that
>only comes out every few years?" "The best game ever?" "The best game
>imaginable?"
>
>What does a "5" mean? Solidly average? Kind of mediocre? Is it more like
>what a "C" officially means ("average") or what it has generally come to
>mean ("minimally decent")?
>

> [snip]

In my votes last year "10" meant most enjoyable in the contest, "1"
meant least enjoyable, "5" meant average.

I initially scored each game on an absolute scale (ie: "10" = best
game imaginable). So at the end of the judging period I had games
scoring from about 2 to 9 with many games clustered around 4-7.

In previous years I would've submitted those votes but someone pointed
out that votes which used the full range of scores had more influence
on the outcome. So I re-scaled my scores to stretch from 1 to 10 and
took the opportunity to spread out the games I'd scored between 4 and
7 to more accurately reflect my opinions. So in the scores I submitted
the meaning of "10" had become "best in this comp".

I thought this was an easy and fair way for me to translate my
judgements into scores.


--
SteveG
(Please remove erroneous word from address if emailing a reply)

Steve Young

Aug 23, 1999

Andrew Plotkin wrote in message
<7ppj4m$c...@dfw-ixnews5.ix.netcom.com>...

>Steve Young <steve...@eclipse.co.uk> wrote:
>>
>> Adam Cadre wrote in message ...
>>> To me, 10 means "I want this game to win the comp" and 1 means "I want
>>> this game to come in last." Since I tend to have just a few favorites
>>> in any given comp batch, most of my scores cluster around 2 and 3,
>>> making my 10 even more powerful.
>>
>> I always thought this competition was for the IF community to promote
>> and encourage IF to be written, and to participate by fairly marking all
>> the games to finally decide which ones are the best of that particular
>> year. From your comments I can see how wrong I was.
>
><eliza>You seem bitter. Do you want to talk about it?</eliza>


I can't decide if you are joking or not. If not, I would say that I'm
not bitter about anything, but am raising an important point about how
competition games are marked and how it might affect the final result of
any contest.

>> I can understand you rating games you like and enjoy above others you
>> might not so much enjoy, but surely they shouldn't be marked right down
>> just on this criterion alone.


Steve


Marnie Parker

Aug 23, 1999
>Subject: Re: Contest Scoring Question
>From: stev...@erroneous.moc.govt.nz (SteveG)
>Date: Sun, 22 August 1999 09:05 PM EDT

>In my votes last year "10" meant most enjoyable in the contest, "1"
>meant least enjoyable, "5" meant average.
>
>I initially scored each game on an absolute scale (ie: "10" = best
>game imaginable). So at the end of the judging period I had games
>scoring from about 2 to 9 with many games clustered around 4-7.
>
>In previous years I would've submitted those votes but someone pointed
>out that votes which used the full range of scores had more influence
>on the outcome. So I re-scaled my scores to stretch from 1 to 10 and
>took the opportunity to spread out the games I'd scored between 4 and
>7 to more accurately reflect my opinions. So in the scores I submitted
>the meaning of "10" had become "best in this comp".
>
>I thought this was an easy and fair way for me to translate my
>judgements into scores.
>

I, personally, think this is the best way to do it. So I agree.

10 = excellent, 5 = average, 1 = very poor

Then spread them from there.

This question actually reaches into the very heart of the contest. Is it about
competition? Or is it about encouraging new and more IF writers?

If the second, then the authors of the games entered have a right to good
feedback, i.e., how would their game stack up outside the contest?

Doe :-) IMHO.

-----------------------------
doea...@aol.com
The Doepage - http://members.aol.com/doepage/index.htm
IF Art Gallery - http://members.aol.com/iffyart/gallery.htm
"I can live for two months on a good compliment." Mark Twain

BrenBarn

Aug 24, 1999
>>Hi all,
>>
>>This is a question that's been nagging me for a while, and since the contest
>>will begin fairly soon, I figured this might be a good time to ask it.
>>
>>Is there, or if not should there be, any guidelines for scoring (other than
>>10=highest, 1=lowest, integers only)?
I wonder this every time anyone is scoring anything. I personally think
guidelines would lead to a contest in which the eventual outcome more
accurately reflected the aggregate opinions of the judges, but the problem
is. . .

>Guidelines might be nice except for the fact that I don't think
>there's a widely accepted view of what those guidelines should be. :-)
And that's a big problem. For example, if (hypothetically) I liked
scoring games one way, and EVERYONE else liked scoring them a different way,
and they all instituted guidelines or rules or what have you on how I should
rank the games, I would be irritated, and tempted to judge my own way and
thumb my nose. (This is hypothetical, remember. :-)
That's the reason we shouldn't have guidelines. The reason we should,
though, is that unless everyone (or almost everyone) can agree on a way to
judge, the outcome of the contest will not reflect the opinions of the judges.
For example, suppose there are three games, and three judges. Two of the
judges (#1 and #2) like A the best and C the worst, and think B is average.
Judge #3 thinks the reverse (A is bad, C is great, B is average). Judge #1
ranks based on "All the games that could theoretically be made", and he's got
high hopes, so he gives A a 3, B a 2, and C a 1. Judge #2 ranks based on "All
the games I've ever played," and he's played better games than A, so he gives A
a 6, B a 3, and C a 1. Judge #3 (the rebel) ranks based on "All the games in
this year's contest," so he gives A a 1, B a 5, and C a 10.
Adding those up, we get 10 for A, 10 for B, and 12 for C. So C wins, even
though two out of three judges thought it was THE WORST game in the
competition! That's heinous! Admittedly, this is an unlikely situation, but
it is illustrative of how guidelines would be helpful.
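
Here is the same toy example spelled out in Python, in case anyone wants to
fiddle with the numbers (the judges and games are made up, obviously):

  # Each hypothetical judge's scores for games A, B and C:
  scores = {
      'judge1': {'A': 3, 'B': 2, 'C': 1},   # grades against "all conceivable games"
      'judge2': {'A': 6, 'B': 3, 'C': 1},   # grades against "all games I've played"
      'judge3': {'A': 1, 'B': 5, 'C': 10},  # grades against "this year's entries"
  }

  totals = {}
  for judge in scores.values():
      for game, score in judge.items():
          totals[game] = totals.get(game, 0) + score

  print(totals)  # {'A': 10, 'B': 10, 'C': 12} -- C wins despite two last-place opinions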
Perhaps one of the problems is that even the lowest possible score still
RAISES that game's overall score. In this case, it might be better to grade
games on a scale of 5 to -5, with 0 being a completely average game. Thus, if
I REALLY hated the game, I could actually DECREASE its score, rather than just
not increasing it very much.
And of course, you have to take into account the fact that I have never
judged, never entered, and never even participated in any way in the IF
Competition, and therefore the tedious ramble you just read is probably all
nonsense. Sorry! :-)


From,
Brendan B. B. (Bren...@aol.com)
(Name in header has spam-blocker, use the address above instead.)

"Do not follow where the path may lead;
go, instead, where there is no path, and leave a trail."
--Author Unknown

ct

Aug 24, 1999
In article <19990824104709...@ng-fc1.aol.com>,
BrenBarn <bren...@aol.comRemove> wrote:

> That's the reason we shouldn't have guidelines. The reason we
> should, though, is that unless everyone (or almost everyone) can
> agree on a way to judge, the outcome of the contest will not reflect
> the opinions of the judges.

Yes it will - just a weighted reflection where some judges' opinions
matter more than others.

> [#1] gives A a 3, B a 2, and C a 1. [#2] gives A a 6, B a 3, and C a 1.
> [#3] gives A a 1, B a 5, and C a 10.


> Adding those up, we get 10 for A, 10 for B, and 12 for C. So C
> wins, even though two out of three judges thought it was THE WORST
> game in the competition! That's heinous!

No, that's just the sort of thing that happens with additive score
voting. Now if everyone would just use a sensible /rank/ voting system
(since ranking is all we're supposed to get out at the end anyway)
this kind of thing wouldn't happen...

regards, ct


James M. Power

Aug 24, 1999
This is actually a problem that pops up in many areas. One solution that
seems to work quite well is to give people a number of votes to be
spread out as they wish. Often the number of votes is equal to the
number of candidates. For example, let's say you have 10 games, a thru j,
and the judges have ten votes. It is implicit that you could give all
ten votes to one candidate. Or you could give four to your top candidate
and two to three others you thought worthy of merit. This results in a
ranking of the games, but not an average rating. (I think the previous
discussion shows that the average rating is at best only crudely
accurate). Some people feel that this ranking more accurately reflects
both the number and intensity of approving voters. It is more likely to
result in favorable rankings for a game that a small number of people
really love, but the majority think is a stinker. In fact, if the human
mind were capable of digesting more than one number in a rating system,
we could come up with a very good one. (Quick, on a scale of 1 to 10,
who is a better athlete, Michele Kwan or George Foreman?)
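
A rough Python sketch of the tallying, just to make the mechanics concrete;
the ballots here are invented, and the rule that each judge must spend exactly
as many votes as there are games is my reading of how it would work:

  games = list("abcdefghij")             # ten games, a thru j
  ballots = [
      {'a': 4, 'c': 2, 'f': 2, 'j': 2},  # four to the favorite, two each to three others
      {'a': 10},                         # all ten votes on one game
      {'b': 5, 'c': 5},
  ]

  totals = {g: 0 for g in games}
  for ballot in ballots:
      assert sum(ballot.values()) == len(games)   # each judge spends exactly ten votes
      for game, votes in ballot.items():
          totals[game] += votes

  ranking = sorted(totals, key=totals.get, reverse=True)
  print(ranking[:3])   # ['a', 'c', 'b'] -- a ranking, not an average rating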

Of course, it is a little different here since most of the voters don't
play all the games. Would it work with the number of votes limited to
the number of games you played? You wouldn't be able to vote one by one
then, but only all at once.

FWIW
-Jim Power

BrenBarn

Aug 24, 1999
>Yes it will - just a weighted reflection where some judges' opinions
>matter more than others.
In my opinion, that is not really "the opinions of the judges." I don't
consider an unevenly weighted result to be representative.

>Now if everyone would just use a sensible /rank/ voting system
>(since ranking is all we're supposed to get out at the end anyway)
>this kind of thing wouldn't happen...

You mean like, pick your top 10 games? Or rank all the games from best to
worst? That's a good idea, but there are problems there, too. For example,
judges might not get to play all the games (at least if I understand the
contest correctly). That would throw off the scores if, for example, some
judges happened to play only the worst or only the best games. Also, it's a
real pain to have to make the hair-splitting decision of which of two games to
rank higher, if both are very close in quality.
I suppose we could use a "lump ranking" system, where you get ranks 1
(best) through 10 (worst), and you can give many games the same rank. But that
sounds eerily similar to the kind of grading already in use. . . creepy. . .

BrenBarn

Aug 24, 1999
>One solution that
>seems to work quite well is to give people a number of votes to be
>spread out as they wish. Often the number of votes is equal to the
>number of candidates.
That's a good idea.

>It is more likely to
>result in favorable rankings for a game that a small number of people
>really love, but the majority think is a stinker.

Yeah. So now we have a new question: "Is that a good thing?" Which
carries more weight? A lot of people who "like" it, or a few people who "love"
it? How many "likers" does it take to equal one "lover"?

>Of course, it is a little different here since most of the voters don't
>play all the games. Would it work with the number of votes limited to
>the number of games you played?

It seems like it would, but then we have the problem I mentioned in another
post: If some judges happened to play only the best or only the worst games,
the outcome would be jarred.
It's looking to me like any "relative" rating system (in which the score of
one game affects and is affected by the scores of others) will have this
problem. This includes "ranking" systems and the "distributed votes" system
you described.
Assuming I'm right, then to solve this, we either have to use an absolute
rating system (a la "scale of 1 to 10"), or we have to require every judge to
play every game. The latter would be either impossible or impossibly
time-consuming, and probably annoying and imposing to the judges as well.
So any thoughts on a rating scale of 5 to -5? I haven't thought the
scheme through, but it's got that. . . that. . . je ne sais quoi. . . :-)

ct

Aug 25, 1999
In article <19990824160240...@ng-fm1.aol.com>,

BrenBarn <bren...@aol.comRemove> wrote:
> So any thoughts on a rating scale of 5 to -5? I haven't thought the
>scheme through, but it's got that. . . that. . . je ne sais quoi. . . :-)

I know what it's got - nothing of additional merit. All you do is
reduce all scores (including the final average) by 5.

regards, ct


Vincent Lynch

Aug 25, 1999
James M. Power wrote in message
<37C2B983...@SpecialtyMarking.com>...
>This is actually a problem that pops up in many areas. One solution that
>seems to work quite well is to give people a number of votes to be
>spread out as they wish. Often the number of votes is equal to the
>number of candidates. For example, let's say you have 10 games, a thru j,
>and the judges have ten votes. It is implicit that you could give all
>ten votes to one candidate. Or you could give four to your top candidate
>and two to three others you thought worthy of merit. This results in a
>ranking of the games, but not an average rating. (I think the previous
>discussion shows that the average rating is at best only crudely
>accurate.) Some people feel that this ranking more accurately reflects
>both the number and intensity of approving voters.

I don't think this is an improvement. The main problem I see with the
current system is that judges can, to an extent, decide how much effect
their votes have on the outcome. In the system you're suggesting,
that's even more true; if I really want game A to win, I give it 10
votes.

>Of course, it is a little different here since most of the voters don't
>play all the games. Would it work with the number of votes limited to
>the number of games you played? You wouldn't be able to vote one by one
>then, but only all at once.

That's the other problem, generally. If the frontrunners are game A and
game B, and I've only played game A, I shouldn't have any say as to
which of the two games is better.

I'd suggest the following: that each judge simply submits an ordered
list of the games they've played. I'm not sure exactly how it would
work (I'm thinking of a modified version of STV, if that means anything
to anyone), but assuming it does, it should completely eliminate any
tactical voting.

I can't imagine that anyone really wants to change anything this close
to the competition, but I'd be happy to supply more details if people
think it's a good idea.

-Vincent


ct

Aug 25, 1999
In article <19990824155353...@ng-fm1.aol.com>,

BrenBarn <bren...@aol.comRemove> wrote:
>>Yes it will - just a weighted reflection where some judges' opinions
>>matter more than others.
> In my opinion, that is not really "the opinions of the judges." I don't
> consider an unevenly weighted result to be representative.

The weighting, though, is entirely decided by the judges
themselves. If they choose to reduce their vote in overall effect...

> You mean like, pick your top 10 games? Or rank all the games from best to
> worst?

I think rank all games played from best to worst. (I haven't entirely
thought this through, but...) It should be possible to cope easily with
missing ranks.

> Also, it's a real pain to have to make the hair-splitting decision
> of which of two games to rank higher, if both are very close in
> quality.

Oh, I'm happy for 'equivalent' ranks to be posted.

> I suppose we could use a "lump ranking" system, where you get ranks
> 1 (best) through 10 (worst), and you can give many games the same
> rank.

That would be a crudified version of above, yes. I'm not sure it'd
gain you much save easier parsing for the vote counter, and it'd likely
confuse people into interpreting it as a scoring system.

> But that sounds eerily similar to the kind of grading
> already in use. . . creepy. . .

Just 'cos the voting looks similar doesn't mean the counting has to be...

regards, ct

Vincent Lynch

Aug 25, 1999
BrenBarn wrote in message
<19990824160240...@ng-fm1.aol.com>...

>>It is more likely to
>>result in favorable rankings for a game that a small number of people
>>really love, but the majority think is a stinker.
> Yeah. So now we have a new question: "Is that a good thing?" Which
>carries more weight? A lot of people who "like" it, or a few people who
>"love" it? How many "likers" does it take to equal one "lover"?

I want my favourite game to win, even if I don't think it's _much_
better than the others. So I say I "love" it. I choose how much effect
my vote has.

> It's looking to me like any "relative" rating system (in which the score
>of one game affects and is affected by the scores of others) will have
>this problem. This includes "ranking" systems and the "distributed votes"
>system you described.

I disagree. See my other post.

> Assuming I'm right, then to solve this, we either have to use an absolute
>rating system (a la "scale of 1 to 10"), or we have to require every judge
>to play every game. The latter would be either impossible or impossibly
>time-consuming, and probably annoying and imposing to the judges as well.

I agree that judges shouldn't have to play every game. I don't agree
that the only alternative is absolute scoring.

> So any thoughts on a rating scale of 5 to -5? I haven't thought the
>scheme through, but it's got that. . . that. . . je ne sais quoi. . . :-)

I don't think it's really any different. It only makes any sense if
voting is additive; this is a Bad Thing, since a game can win just by
having more judges judge it. If we average it, it becomes equivalent to
scoring from 0 to 10.

-Vincent

Vincent Lynch

Aug 25, 1999

BrenBarn wrote in message
<19990824155353...@ng-fm1.aol.com>...

>>Now if everyone would just use a sensible /rank/ voting system
>>(since ranking is all we're supposed to get out at the end anyway)
>>this kind of thing wouldn't happen...
> You mean like, pick your top 10 games? Or rank all the games from
>best to worst? That's a good idea, but there are problems there, too.

>For example, judges might not get to play all the games (at least if I
>understand the contest correctly). That would throw off the scores if,
>for example, some judges happened to play only the worst or only the
>best games. Also, it's a real pain to have to make the hair-splitting

>decision of which of two games to rank higher, if both are very close
>in quality.

If judges rank the games from best to worst, it shouldn't matter if they
don't play all of them. They compare the games they play against each
other; no comparison is made between games they've played, and games they
haven't. Ideally, anyway; it depends on how you count the votes.

I don't think the other consideration, of how to separate two equally
good games is important; the contest has to decide a winner somehow.
Suppose two games are entered which earn a 10 from every judge under the
current system; there's no way of choosing a winner. If judges are
forced to give a preference, there may be a clear favourite.

-Vincent


BrenBarn

Aug 25, 1999
>I know what it's got - nothing of additional merit. All you do is
>reduce all scores (including the final average) by 5.
That may or may not be true, depending on how the judges make use of the
negative score. The difference between giving a game a 1 out of 10 and giving
it a -5 out of 5 is that the former INCREASES the game's score, and the latter
DECREASES it. If I were judging, I would certainly use a negative score
differently than the corresponding positive score.

BrenBarn

Aug 25, 1999
>The weighting, though, is entirely decided by the judges
>themselves. If they choose to reduce their vote in overall effect...
Yes and no. True, they decide how they are going to rate the games. But
without knowing how the other judges will do it, no one can know how their
choice will affect the results.
All in all I advocate a 5 to -5 scale, with guidelines. Nothing Nazi.
Just stuff like:
5: One of the best conceivable works of IF
3: A very good work
1: A somewhat above-average work
0: A completely average work
-1: A somewhat below average work
-3: A game whose flaws considerably overpower its good parts
-5: One of the worst conceivable works of IF

BrenBarn

Aug 25, 1999
>If judges rank the games from best to worst, it shouldn't matter if they
>don't play all of them. They compare the games they play against each
>other; no comparison is made between games they've played, and games they
>haven't.
That's the problem. Each additional game a judge plays will affect the
ranking of all the others (unless it is the best or worst so far). So if A and
B were judging and A happened to play the ten worst games in the competition,
and B played the ten best, A's #1, which is an awful game, would score as much
as B's #1, which is a superb game.

BrenBarn

Aug 25, 1999
>> So any thoughts on a rating scale of 5 to -5? I haven't thought the
>>scheme through, but it's got that. . . that. . . je ne sais quoi. . . :-)
>
>I don't think it's really any different. It only makes any sense if
>voting is additive; this is a Bad Thing, since a game can win just by
>having more judges judge it. If we average it, it becomes equivalent to
>scoring from 0 to 10.
Hmmm. You're right. I have a hunch, however, that judges would react
differently to being able to give a negative score rather than a positive one.
I'm not familiar with how the voting is averaged, but I'll hazard a guess:
Every judge who plays game X gives it a score, and those scores are averaged
for the final score. In other (albeit more obtuse) words, the average is
conducted "laterally", between judges. Perhaps what we need is a way to
average the votes "vertically," within the realm of a single judge. So that,
for example, the number of games played and/or the scores awarded by any one
judge will affect how his opinion affects the end result of the competition.
Sorry for the unintelligible nature of the preceding post. . .

BrenBarn

Aug 25, 1999
>The main problem I see with the
>current system is that judges can, to an extent, decide how much effect
>their votes have on the outcome. In the system you're suggesting,
>that's even more true; if I really want game A to win, I give it 10
>votes.
If you really want to dilute the judge's power to manipulate the scoring
like this, you need some kind of guideline system. It's a lot harder to rate
all your games either 10 or 1 (and be considered sane) when the guidelines call
a 10 "The best game you've ever played" and a 1 "An abomination of the genre."

Gene Wirchenko

Aug 25, 1999
"Vincent Lynch" <vincen...@lineone.net> wrote:

[snipped previous]

>If judges rank the games from best to worst, it shouldn't matter if they
>don't play all of them. They compare the games they play against each
>other; no comparison is made between games they've played, and games they
>haven't. Ideally, anyway; it depends on how you count the votes.
>
>I don't think the other consideration, of how to separate two equally
>good games is important; the contest has to decide a winner somehow.
>Suppose two games are entered which earn a 10 from every judge under the
>current system; there's no way of choosing a winner. If judges are
>forced to give a preference, there may be a clear favourite.

Or there might not be. You still haven't eliminated the
possibility of ties and more evil...

If I have truly no preference between two games i.e. consider
them equally worthy, you've just denied me the ability to cast my
honest, accurate vote.

As to ties, just say that these games are considered equally
good.

I'm starting to think that the Interactive Fiction is the voting.

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
I have preferences.
You have biases.
He/She has prejudices.

Gene Wirchenko

Aug 25, 1999
bren...@aol.comRemove (BrenBarn) wrote:

>>I know what it's got - nothing of additional merit. All you do is
>>reduce all scores (including the final average) by 5.
> That may or may not be true, depending on how the judges make use of the
>negative score. The difference between giving a game a 1 out of 10 and giving
>it a -5 out of 5 is that the former INCREASES the game's score, and the latter
>DECREASES it. If I were judging, I would certainly use a negative score
>differently than the corresponding positive score.

It does nothing of the sort. The difference between scores in -5
to 5 vs. 0 to 10 is simply 5. The latter are five higher.

Does anyone seriously believe that voting 2 in 0 to 10 increases
a game's score? It's the same as -3 in -5 to 5.

Trig

Aug 25, 1999
If you want to eliminate the tactical voting without hurting games that do not
get played, there is a simple system. Just rank all of the games that are
played from best to worst with no concern as to actual scores. Then the
official score collector applies the 10% template (i.e. top 10% get 10, next
10% get 9, etc.). This would provide a normalized system that the judges
cannot "tamper" with.

The problem here is that I think it hamstrings the feedback that the games get.
Sure, there'll be reviews. There always are plenty, but if an excellent game
is entered alongside an unprecedented crop of spectacular efforts, it will get
an average rating, which may discourage a talented newcomer, and vice versa
for a bunch of schlock.

Of course, in reality, these extreme circumstances aren't all that likely.
Most likely there will be some good, some bad, and mostly average games which
happen to fit nicely into a pretty bell curve. Also, I don't believe that
there is really any justifiable cause to worry about some judges trying to rig
the final outcomes with a voting master plan since there are enough judges to
render this strategy fairly ineffectual.

All in all, I think that the scoring method employed has little effect on the
overall contest results. The good games are going to rise to the top, and the
bad ones will suck and receive their deserved scores.

Trig
--
"This may look like a slab of liver, but really, it's an external brain pack!"

ct

Aug 25, 1999
In article <19990824213615...@ng-ff1.aol.com>,

BrenBarn <bren...@aol.comRemove> wrote:
>>The weighting, though, is entirely decided by the judges
>>themselves. If they choose to reduce their vote in overall effect...
> Yes and no. True, they decide how they are going to rate the games. But
>without knowing how the other judges will do it, no one can know how their
>choice will affect the results.

No, the 'weighting' given to each judge's votes is exactly the range
of results they use. (Or 'distribution' perhaps, but you can't fiddle
this without cheating.)

regards, ct

James M. Power

Aug 25, 1999
Vincent,

FWIW, your point is absolutely correct. The supposed advantage of this
system is that you can't affect the scores of the other games at that
point.

I realized a huge hole in this, though. How do we know if a game didn't
receive votes because no one played it, or because a rater didn't like
it?

I guess the rating would have to reflect number of votes received versus
number of potential voters.

Makes it a little more complicated.

-Jim Power

Vincent Lynch wrote:
>

> I don't think this is an improvement. The main problem I see with the

Andrew Plotkin

Aug 25, 1999
BrenBarn <bren...@aol.comremove> wrote:
>>The weighting, though, is entirely decided by the judges
>>themselves. If they choose to reduce their vote in overall effect...
> Yes and no. True, they decide how they are going to rate the games. But
> without knowing how the other judges will do it, no one can know how their
> choice will affect the results.
> All in all I advocate a 5 to -5 scale, with guidelines. Nothing Nazi.
> Just stuff like:
> 5: One of the best conceivable works of IF
> 3: A very good work
> 1: A somewhat above-average work
> 0: A completely average work
> -1: A somewhat below average work
> -3: A game whose flaws considerably overpower its good parts
> -5: One of the worst conceivable works of IF

The guidelines don't tell the judge anything, because everyone has a
different conception of what the best and worst IF might be.

Therefore, this devolves to the current system. (Okay, except there are
eleven possible scores instead of ten.)

--Z

"And Aholibamah bare Jeush, and Jaalam, and Korah: these were the
borogoves..."

BrenBarn

Aug 25, 1999
Okay, it looks like my romantic fascination with the 5 to -5 scale was
misadvised; it is, indeed, not functionally different from a 1 to 10. Sorry
about that. :-S
But I have another idea. What if we had every judge score as many games
as they wanted? Then we take each game and add up all the scores it was
given. Then, to compensate for the number of scores it was given, we multiply
its total score by the total number of judges and divide by the number of
judges who scored it.
So if a game was given a 10 by one judge, an 8 by another, a 5 by another,
a 2 by yet another, and wasn't played by two others. . . The total score is 25,
times six (the total number of judges) is 150, divided by four (the number of
judges who judged it) is 37.5. (Or you could round it to 38 if you want.)
I have done some VERY limited testing (well, okay, one test) on arbitrary
test values, and this method does give slightly different results than a pure
average. I intend to continue with the testing with different numbers to see
how it plays out. Any thoughts would be appreciated.

Jesse Welton

Aug 25, 1999
In article <37c3701a...@news.shuswap.net>,
Gene Wirchenko <ge...@shuswap.net> wrote:
>bren...@aol.comRemove (BrenBarn) wrote:
>
>>[nibble] If I were judging, I would certainly use a negative score

>>differently than the corresponding positive score.
>
[nibble]

>
> Does anyone seriously believe that voting 2 in 0 to 10 increases
>a game's score? It's the same as -3 in -5 to 5.

As you state, it's mathematically equivalent. I think BrenBarn's
point is that it's psychologically different, which no doubt will be
true for some people and not for others. Offhand, I'd guess it will
be less true for the sort of people who intentionally skew their
scores to increase the effect of their vote on the outcome.

-Jesse

Jesse Welton

Aug 25, 1999
In article <19990825100115...@ng-fc1.aol.com>,

BrenBarn <bren...@aol.comRemove> wrote:
> What if we had every judge score as many games
>as they wanted. Then, we take each game and add up all the scores it was
>given. Then, to compensate for the number of scores it was given, multiply
>it's total score by the number of total judges, and divide by the number of
>judges who scored it.

Add up the scores, divide by the number of judges who voted: this
gives the average score. Then multiply this by the total number of
judges, which is a constant. This is no different than averaging,
except you've scaled the results by a constant, which won't change the
final order.

> I have done some VERY limited testing (well, okay, one test) on arbitrary
>test values, and this method does give slightly different results than a pure
>average.

You must have made an arithmetic mistake.

-Jesse

Iain Merrick

Aug 25, 1999
BrenBarn wrote:

> Okay, it looks like my romantic fascination with the 5 to -5 scale was
> misadvised; it is, indeed, not functionally different from a 1 to 10. Sorry
> about that. :-S

> But I have another idea. What if we had every judge score as many games
> as they wanted. Then, we take each game and add up all the scores it was
> given. Then, to compensate for the number of scores it was given, multiply
> it's total score by the number of total judges, and divide by the number of
> judges who scored it.

[...]

What particular problem are you trying to address here? The present
system isn't perfect, but it does have the advantage of being simple.

There has been a _lot_ of discussion about possible voting systems over
the years. It might be worthwhile trawling through dejanews to read some
of it.

Dan Schmidt

Aug 25, 1999
bren...@aol.comRemove (BrenBarn) writes:

| Okay, it looks like my romantic fascination with the 5 to -5 scale was
| misadvised; it is, indeed, not functionally different from a 1 to 10. Sorry
| about that. :-S
| But I have another idea. What if we had every judge score as many games
| as they wanted. Then, we take each game and add up all the scores it was
| given. Then, to compensate for the number of scores it was given, multiply
| it's total score by the number of total judges, and divide by the number of
| judges who scored it.

That is also equivalent to just taking the average. All you've done
is taken the average and multiplied it by the total number of judges.
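
A quick check in Python, using BrenBarn's own numbers (the variable names are
mine):

  scores = [10, 8, 5, 2]     # the four judges who actually played the game
  total_judges = 6

  average  = sum(scores) / len(scores)                  # 6.25
  adjusted = sum(scores) * total_judges / len(scores)   # 37.5, i.e. 6.25 * 6

  print(average, adjusted)   # a constant factor can't change the games' ordering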

--
Dan Schmidt -> df...@harmonixmusic.com, df...@alum.mit.edu
Honest Bob & the http://www2.thecia.net/users/dfan/
Factory-to-Dealer Incentives -> http://www2.thecia.net/users/dfan/hbob/
Gamelan Galak Tika -> http://web.mit.edu/galak-tika/www/

David Glasser

Aug 25, 1999
ct <turn...@xserver.sjc.ox.ac.uk> wrote:

> In article <19990824160240...@ng-fm1.aol.com>,
> BrenBarn <bren...@aol.comRemove> wrote:

> > So any thoughts on a rating scale of 5 to -5? I haven't thought the
> >scheme through, but it's got that. . . that. . . je ne sais quoi. . . :-)
>

> I know what it's got - nothing of additional merit. All you do is
> reduce all scores (including the final average) by 5.

Well, to nitpick, the 5 to -5 allows 11 choices, where 1-10 only allows
10. But you're still right.

--
David Glasser: gla...@iname.com | http://www.uscom.com/~glasser/
DGlasser@ifMUD:orange.res.cmu.edu 4001 | raif FAQ http://come.to/raiffaq
"So, is that superior artistry, or the easy way out?"
--TenthStone on white canvases as art, on rec.arts.int-fiction

Vincent Lynch

Aug 25, 1999

Gene Wirchenko wrote in message <37c36efe...@news.shuswap.net>...

>"Vincent Lynch" <vincen...@lineone.net> wrote:
>
>[snipped previous]
>
>>I don't think the other consideration, of how to separate two equally
>>good games is important; the contest has to decide a winner somehow.
>>Suppose two games are entered which earn a 10 from every judge under the
>>current system; there's no way of choosing a winner. If judges are
>>forced to give a preference, there may be a clear favourite.
>
> Or there might not be. You still haven't eliminated the
>possibility of ties and more evil...

If exactly half of the voters prefer game A to game B, and the other half
prefer game B to game A, there's no fair way of deciding a winner. Any way
of deciding a winner (by using another system that gives a winner) is
basically arbitrary.

Similarly, if no-one can decide which game they prefer, so there isn't a
clear favourite, you're not going to find any sensible way of resolving the
tie. But it won't happen.

> If I have truly no preference between two games i.e. consider
>them equally worthy, you've just denied me the ability to cast my
>honest, accurate vote.

That's a fair point. But I think it's easier to decide which of two games I
prefer than whether a game is worth a 7 or an 8.

> As to ties, just say that these games are considered equally
>good.

If you like. Or toss a coin.

> I'm starting to think that the Interactive Fiction is the voting.

I'm not sure I understand that...

-Vincent


Vincent Lynch

Aug 25, 1999
ct wrote in message <7pvb3r$lhr$1...@xserver.sjc.ox.ac.uk>...
>In article <19990824155353...@ng-fm1.aol.com>,

>> Also, it's a real pain to have to make the hair-splitting decision
>> of which of two games to rank higher, if both are very close in
>> quality.
>
>Oh, I'm happy for 'equivalent' ranks to be posted.

As long as you can work that into whatever system you have for counting the
votes.

>> I suppose we could use a "lump ranking" system, where you get ranks
>> 1 (best) through 10 (worst), and you can give many games the same
>> rank.
>
>That would be a crudified version of above, yes. I'm not sure it'd
>gain you much save easier parsing for the vote counter, and it'd likely
>confuse people into interpreting it as a scoring system.

It could really confuse people if you ask them to start giving a 1 to the
best games, and a 10 to the worst. ;-)

>> But that sounds eerily similar to the kind of grading
>> already in use. . . creepy. . .
>
>Just 'cos the voting looks similar doesn't mean the counting has to be...

When I posted yesterday, I thought I had a voting system that worked, where
judges just submitted an ordered list of games, didn't have to play all of
them, and couldn't vote tactically. I realised the flaw in my
system at about 6am this morning. ;-) In elections, the STV system does
this, but assumes that you evaluate all the candidates, even if you can lump
any number of them together (perhaps all but one) as "joint last". (This
wasn't the flaw.) I think my idea would just about work, but it would
probably be worse than what we have now. I don't think what I intended is
actually possible, but I could well be wrong about that as well.
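
For concreteness, here is a minimal sketch (Python, with made-up game titles)
of one way a ranked-ballot count *could* handle joint rankings and judges who
only play some of the games: a crude pairwise tally. It is purely an
illustration -- not STV, not the scheme described above -- and it does nothing
about tactical voting or about breaking pairwise ties and cycles.

# Pairwise tally of ranked ballots (illustration only; game names invented).
# A ballot is a list of ranks, best first; each rank is a list of games
# considered equal. Games missing from a ballot simply aren't compared.
from itertools import combinations
from collections import defaultdict

GAMES = ["Alpha", "Beta", "Gamma", "Delta"]

ballots = [
    [["Alpha"], ["Beta", "Gamma"]],      # Alpha first, Beta/Gamma joint second
    [["Beta"], ["Alpha"], ["Delta"]],    # this judge never played Gamma
    [["Gamma", "Alpha"], ["Beta"]],
]

wins = defaultdict(int)   # wins[(a, b)] = ballots ranking a strictly above b

for ballot in ballots:
    position = {}
    for rank, group in enumerate(ballot):
        for game in group:
            position[game] = rank        # lower rank number = better
    for a, b in combinations(position, 2):
        if position[a] < position[b]:
            wins[(a, b)] += 1
        elif position[b] < position[a]:
            wins[(b, a)] += 1
        # equal ranks count for neither side

# Summarise by how many head-to-head contests each game wins outright
# (a Copeland-style count; ties and cycles still need some extra rule).
score = {g: sum(wins[(g, h)] > wins[(h, g)] for h in GAMES if h != g)
         for g in GAMES}
print(sorted(score.items(), key=lambda kv: -kv[1]))

With the three toy ballots above, Alpha wins all three of its head-to-head
contests and comes out on top.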

My point (if I have one) is that the problems of how to count votes in a
system, and how to allow equal rankings, or judges to just play some of the
games, or whatever, are non-trivial. And what we have seems to work fairly
well.

-Vincent



Adam Cadre

unread,
Aug 25, 1999, 3:00:00 AM8/25/99
to
> Thanks. That IS what I meant, but I'm coming to realize that that's not
> really a good enough reason. As for people who INTENTIONALLY throw the
> scores out of whack, I think that's grossly immoral and disrespectful to the
> honorable intentions of the competition.

I'm not sure what "out of whack" means in this context. In another post
you refer to the practice of "grab[bing] the biggest possible piece of the
pie by leaning heavily toward the extremes of the scale" as "manipulating
the contest," so I assume that's what you're talking about here, but I'm
not at all certain how this practice is somehow not "in whack." After
all, I'm just grading the games based on how much I enjoyed them --
what's dishonest about that? What better criteria would you propose? My
scores cluster around the low end of the scale because, truth be told,
for about 3/4 of the games in any given comp, at the end of the time I
allot to evaluating them, I find myself wishing I'd done something else
with that time. Giving a 3 or 4 to such a game is *generous*, to my
mind. And thus, my scores accurately reflect the extent to which I
prefer the handful of games that I truly do enjoy. I've never downgraded
a game I liked in order to benefit a game I liked more -- the fact that
I'd be willing to downgrade such a game means that I didn't like it all
that much and it therefore deserves the lower grade. No?

-----
Adam Cadre, Issaquah, WA
http://adamcadre.ac

ct

unread,
Aug 26, 1999, 3:00:00 AM8/26/99
to
In article <4OYw3.8893$9i....@newreader.ukcore.bt.net>,

Vincent Lynch <vincen...@lineone.net> wrote:
> My point (if I have one) is that the problems of how to count votes
> in a system, and how to allow equal rankings, or judges to just play
> some of the games, or whatever, are non-trivial.

Yes.

> And what we have seems to work fairly well.

Yes. This was also the point I was trying to make.

(Although the thought that everyone understood 'spread' would cheer me.)

regards, ct

ct

unread,
Aug 26, 1999, 3:00:00 AM8/26/99
to
In article <%NYw3.8890$9i....@newreader.ukcore.bt.net>,

Vincent Lynch <vincen...@lineone.net> wrote:
> Gene Wirchenko wrote in message <37c36efe...@news.shuswap.net>...
> > I'm starting to think that the Interactive Fiction is the voting.
>
> I'm not sure I understand that...

And there I thought it was the bestest, funniest line I'd read in rai-f
for months. Each to his own!

regards, ct

ct

unread,
Aug 26, 1999, 3:00:00 AM8/26/99
to
In article <cUYv3.4447$36.4...@typhoon-sf.snfc21.pbi.net>,
Avrom Faderman <fade...@pacbell.net> wrote:
>(The extreme version of fostering this would be to ask everyone to vote for
>the best game rather than rate all of them. Then even being everyone's
>second-favorite wouldn't be as good as being the top pick of a few.)
>
>(For pure useless intellectual curiosity's sake, it would be interesting to
>see an analysis of past competition results along the lines of...which games
>were the most people's top pick? How closely does that correlate with the
>actual contest results? But that question's probably more trouble to answer
>than it's worth.)

See now, I've been planning to follow this up since it was posted, do
the re-working of the votes and see how it changed everything. But it
obviously ain't gonna happen - I'm actually working too much :-(

Remind me sometime when I'm written up.

regards, ct

Gene Wirchenko

unread,
Aug 26, 1999, 3:00:00 AM8/26/99
to
"Vincent Lynch" <vincen...@lineone.net> wrote:

>Gene Wirchenko wrote in message <37c36efe...@news.shuswap.net>...

>>"Vincent Lynch" <vincen...@lineone.net> wrote:
>>
>>[snipped previous]
>>
>>>I don't think the other consideration, of how to separate two equally
>>>good games is important; the contest has to decide a winner somehow.
>>>Suppose two games are entered which earn a 10 from every judge under the
>>>current system; there's no way of choosing a winner. If judges are
>>>forced to give a preference, there may be a clear favourite.
>>
>> Or there might not be. You still haven't eliminated the
>>possibility of ties and more evil...
>
>If exactly half of the voters prefer game A to game B, and the other half
>prefer game B to game A, there's no fair way of deciding a winner. Any way
>of deciding a winner (by using another system that gives a winner) is
>basically arbitrary.

That's right.

>Similarly, if no-one can decide which game they prefer, so there isn't a
>clear favourite, you're not going to find any sensible way of resolving the
>tie. But it won't happen.

So what? Hand each author a gun and have them shoot it out for
the prizes. OTOH, maybe they can agree on something.

>> If I have truly no preference between two games i.e. consider
>>them equally worthy, you've just denied me the ability to cast my
>>honest, accurate vote.
>
>That's a fair point. But I think it's easier to decide which of two games I
>prefer than whether a game is worth a 7 or an 8.

With me, it depends. Sometimes, it's easier to score the games
and sometimes, the straight preference is easier.

>> As to ties, just say that these games are considered equally
>>good.
>
>If you like. Or toss a coin.

Since they're apparently just as good, play the one whose genre
you prefer or use some other arbitrariness.

>> I'm starting to think that the Interactive Fiction is the voting.
>
>I'm not sure I understand that...

There's a lot of interaction in this thread. Some of the methods
would give results that are fiction.

Gene Wirchenko

unread,
Aug 26, 1999, 3:00:00 AM8/26/99
to
turn...@xserver.sjc.ox.ac.uk (ct) wrote:

>In article <%NYw3.8890$9i....@newreader.ukcore.bt.net>,


>Vincent Lynch <vincen...@lineone.net> wrote:
>> Gene Wirchenko wrote in message <37c36efe...@news.shuswap.net>...

>> > I'm starting to think that the Interactive Fiction is the voting.
>>
>> I'm not sure I understand that...
>

>And there I thought it was the bestest, funniest line I'd read in rai-f
>for months. Each to his own!

Thank you.

BrenBarn

unread,
Aug 26, 1999, 3:00:00 AM8/26/99
to
>As you state, it's mathematically equivalent. I think BrenBarn's
>point is that it's psychologically different, which no doubt will be
>true for some people and not for others.
Thanks. That IS what I meant, but I'm coming to realize that that's not
really a good enough reason. As for people who INTENTIONALLY throw the
scores out of whack, I think that's grossly immoral and disrespectful to the
honorable intentions of the competition.

From,

BrenBarn

unread,
Aug 26, 1999, 3:00:00 AM8/26/99
to
>No, the 'weighting' given to each judge's votes is exactly the range
>of results they use. (Or 'distribution' perhaps, but you can't fiddle
>this without cheating.)
I see what you're saying, but what I'm saying is that if I decide to, say,
calibrate my scoring so that most games are in the 4-6 range, I don't know how
much difference my score will make. If everyone else is giving 1-3s, it will
make a big difference; if they're doing the same as me, it'll make some
difference; if they're giving 7-10s, it won't make so much of a difference.
The obvious solution is to grab the biggest possible piece of the pie by
leaning heavily toward the extremes of the scale, but that comes dangerously
close to the realm of "manipulating the contest," which, as I stated in another
post just a few minutes ago, is despicable.
I think this discussion originally got started with the suggestion of
guidelines, and I also think that's a good way to go. It's a lot harder to
believably give every game a 9 when the "official" guidelines say that a 9 is
"One of the best games I've ever played."
Of course, because the scoring is totally mathematical, it doesn't matter
if anyone believes you or not. Is banning certain people from judging allowed?
Suppose I habitually and blatantly manipulated the scores like this? Could I
be barred?
These have been some whimsical thoughts, all delivered in a rambling
package. . .

BrenBarn

unread,
Aug 26, 1999, 3:00:00 AM8/26/99
to
>The guidelines don't tell the judge anything, because everyone has a
>different conception of what the best and worst IF might be.
Right, but at least now we have a fix on what, for example, an 8 (or a 3)
means. I just posted something about two minutes ago that explains this more
fully, but the long and the short of it is that no one will take seriously a judge
who repeatedly gives "Best game ever" and/or "Worst game ever" scores.

BrenBarn

unread,
Aug 26, 1999, 3:00:00 AM8/26/99
to
>You must have made an arithmetic mistake.
I did. (Whoops.) Sorry for cluttering up this discussion with this kind
of junk. . . :-S

BrenBarn

unread,
Aug 26, 1999, 3:00:00 AM8/26/99
to
>That is also equivalent to just taking the average. All you've done
>is taken the average and multiplied it by the total number of judges.
This is indeed so. My apology can be found in a nearby post. :-S

BrenBarn

unread,
Aug 26, 1999, 3:00:00 AM8/26/99
to
>I'm not sure what "out of whack" means in this context. In another post
>you refer to the practice of "grab[bing] the biggest possible piece of the
>pie by leaning heavily toward the extremes of the scale" as "manipulating
>the contest," so I assume that's what you're talking about here, but I'm
>not at all certain how this practice is somehow not "in whack." After
>all, I'm just grading the games based on how much I enjoyed them --
>what's dishonest about that? What better criteria would you propose?
Well, there's been some reference in this discussion to people (be they
real or hypothetical) who would or do try to artificially affect the scores,
and who allow this intent to supersede a desire to honestly and fairly rate the
games. That's what I mean by "manipulating the contest." And you got me right
-- "manipulating the contest" is "out of whack." (Heck, it's just plain
"wack.") :-)

Lucian Paul Smith

unread,
Aug 26, 1999, 3:00:00 AM8/26/99
to
Avrom Faderman (Avrom_F...@email.msn.com) wrote:

>OPEN CAN OF WORMS
Which can of worms do you mean, the one opened last year, the one opened
two years ago, or the one opened three years ago?

>AREN'T THEY ALL THE SAME?
Oh, that's right. Sorry. You open the can of worms, spraying usenet
posts all over the newsgroup. Again.


Ahem.

The purpose of voting is so that the prizes can be distributed relatively
fairly. That's pretty much it. There's a certain amount of bragging
rights that come with the top places, but honestly, the standard
deviations on the scores are just huge. Anyone who believes that the
contest proved that 'The Edifice' is just a smidge better than 'Babel' is
deluding themselves. A bunch of people liked them both. I got to pick my
prize first. That's about it.

With the voting as it is, each judge has an influence of 1-10 they can
distribute as they wish. Adam chooses to spend most of that influence on
the games he liked the most. So, in a close race between first and second
place, Adam's vote may swing the results one way or the other. But then
his influence is used up. In a close race between second and third, he's
given both a '3', so it's up to someone else to cast the deciding vote.
And maybe the game he really liked was only a 4th place contender, and
Adam's vote helped clinch it. Wheee.

Personally, I like to spread my votes out evenly from 1-10. My vote may
not cinch the difference between third and fourth place, but it has more
influence setting up the overall hierarchy to begin with. And I usually
*don't* have a clear favorite among the games I give 10's to. Better to
give that decision to someone who does.
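
To put rough numbers on that "influence" idea, here is a tiny Python
illustration with invented scores for five hypothetical games: a clustered
ballot pours nearly all of one judge's weight into a single gap, while an
evenly spread ballot contributes a little to every gap. The figures are made
up and say nothing about any real comp.

# Invented example: how one judge's ballot separates adjacent places
# among five hypothetical games, best first.
games = ["A", "B", "C", "D", "E"]

clustered = {"A": 10, "B": 3, "C": 3, "D": 2, "E": 1}   # one big favourite
spread    = {"A": 10, "B": 8, "C": 6, "D": 4, "E": 2}   # even two-point steps

def gaps(ballot):
    """Points this single ballot adds to each adjacent pair's gap."""
    return [ballot[a] - ballot[b] for a, b in zip(games, games[1:])]

print("clustered:", gaps(clustered))   # [7, 0, 1, 1] -- big say at the top, none between B and C
print("spread:   ", gaps(spread))      # [2, 2, 2, 2] -- a small, equal say everywhere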

And let's stay away from giving games negative votes, shall we? C.E.
Forman essentially graded the competition one year on a 0 to -9 scale and
there was no end of tears as a result.

There is one difference between the 1-10 scale and the -5 to +5 scale that
nobody has mentioned yet--the latter has a spread of 11. And I'm all for
doing that. "IF Comp '99: Ours goes up to 11!"

-Lucian

TenthStone

unread,
Aug 26, 1999, 3:00:00 AM8/26/99
to
On 26 Aug 1999 03:14:00 GMT, bren...@aol.comRemove (BrenBarn) wrote:

>>The guidelines don't tell the judge anything, because everyone has a
>>different conception of what the best and worst IF might be.
> Right, but at least now we have a fix on what, for example, an 8 (or a 3)
>means. I just posted something about two minutes ago that explains this more
>fully, but the long and the short of it is that no one will take seriously a judge
>who repeatedly gives "Best game ever" and/or "Worst game ever" scores.

I've been staying out of this, but I thought I'd note that judging is
essentially anonymous, and that unless you're suggesting the
maintainers pick and choose which votes to accept, there wouldn't
be any consequences to doing this.

Frankly, I don't think the voting is important enough to really
discuss manipulating. I mean, it's a competition, not a contest.

----------------
The Imperturbable TenthStone
mcc...@erols.com tenth...@hotmail.com mcc...@gsgis.k12.va.us

TenthStone

unread,
Aug 26, 1999, 3:00:00 AM8/26/99
to
On 26 Aug 1999 00:25:19 +0100, turn...@xserver.sjc.ox.ac.uk (ct)
wrote:

>In article <%NYw3.8890$9i....@newreader.ukcore.bt.net>,
>Vincent Lynch <vincen...@lineone.net> wrote:
>> Gene Wirchenko wrote in message <37c36efe...@news.shuswap.net>...
>> > I'm starting to think that the Interactive Fiction is the voting.
>>
>> I'm not sure I understand that...
>
>And there I thought it was the bestest, funniest line I'd read in rai-f
>for months. Each to his own!

Blessed irony.

BrenBarn

unread,
Aug 26, 1999, 3:00:00 AM8/26/99
to
>"IF Comp '99: Ours goes up to 11!"
I laughed out loud when I read this. That'll drum up publicity if
anything can! :-D

Second April

unread,
Aug 26, 1999, 3:00:00 AM8/26/99
to
On 26 Aug 1999, Lucian Paul Smith wrote:

[many thoughtful, lucid things]

What Lucian said, plus:

There have been four years of the competition now. In none of those years
has anyone seriously complained that judges were unfair to a game, or that
a game was ranked below where it should have been because of, I dunno, a
grudge against the author or something.

Most of us have personal faves that we would have liked to see ranked a
bit higher than they were. One of mine was "Tempest," which came in 25th
in the '97 comp. On the other hand, I understood (at least, came to
understand when I read the reviews posted here) why so many people hated
"Tempest," and I never imagined for a second that anyone ranked it low
for any reasons other than that they didn't enjoy it. Always look for the
simplest explanation: people are more than willing, post-comp, to give
detailed opinions about every game.

Moreover, the IF community is growing (or so it seems to me), and the pool
of judges is more than likely also growing. So with every year, the
ability of one deranged judge to skew the final results diminishes.

In short: since there have been no clearly unjust results in the past
(besides the disgracefully low finish of CASK, of course :-D ), let's
stick with what we've got. Hypothesizing scenarios where an angry judge
(in a tiny sample, of course) skews the results doesn't mean those scenarios
are likely.

Duncan Stevens
d-st...@nwu.edu
773-728-9721

Love in the open hand, no thing but that,
Ungemmed, unhidden, wishing not to hurt,
As one should bring you cowslips in a hat
Swung from her hand, or apples in her skirt,
I bring you, calling out as children do,
"Look what I have!--And these are all for you."

--Edna St. Vincent Millay

Gene Wirchenko

unread,
Aug 26, 1999, 3:00:00 AM8/26/99
to
bren...@aol.comRemove (BrenBarn) wrote:

>>No, the 'weighting' given to each judge's votes is exactly the range
>>of results they use. (Or 'distribution' perhaps, but you can't fiddle
>>this without cheating.)
> I see what you're saying, but what I'm saying is that if I decide to, say,
>calibrate my scoring so that most games are in the 4-6 range, I don't know how
>much difference my score will make. If everyone else is giving 1-3s, it will
>make a big difference; if they're doing the same as me, it'll make some
>difference; if they're giving 7-10s, it won't make so much of a difference.

All other things being the same, why would it make much
difference at all what you vote? You're but one person. Unless only
a few people vote, the difference between you voting a 0 and you
voting a 10 will be very little. It's only after a number of people
vote similarly that it shows.
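
A back-of-the-envelope check of that claim, assuming a field of 50 judges (a
pure guess; the real number isn't given here) and the 1-10 scale: swinging one
vote from the very bottom to the very top moves a game's average by less than
a fifth of a point.

# Assumed figures for illustration only.
n_other = 49        # other judges (guess)
other_avg = 5.0     # their average score for some game (guess)

for my_score in (1, 5, 10):
    avg = (n_other * other_avg + my_score) / (n_other + 1)
    print(f"my score {my_score:2d} -> overall average {avg:.2f}")

# my score  1 -> overall average 4.92
# my score  5 -> overall average 5.00
# my score 10 -> overall average 5.10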

I'm curious as to how many people have been voting in the past.

> The obvious solution is to grab the biggest possible piece of the pie by
>leaning heavily toward the extremes of the scale, but that comes dangerously
>close to the realm of "manipulating the contest," which, as I stated in another
>post just a few minutes ago, is despicable.
> I think this discussion originally got started with the suggestion of
>guidelines, and I also think that's a good way to go. It's a lot harder to
>believably give every game a 9 when the "official" guidelines say that a 9 is
>"One of the best games I've ever played."

True. I find that many people appear to rate things higher if
they like them and ignore the other aspects. I would not be surprised
to see this happening in IF judging.

> Of course, because the scoring is totally mathematical, it doesn't matter
>if anyone believes you or not. Is banning certain people from judging allowed?

Is banning certain people from banning certain people from
judging banned? How do you judge whom to ban from judging? How do
you judge who to ban from banning from judging?

> Suppose I habitually and blatantly manipulated the scores like this? Could I
>be barred?

Bar bar.

> These have been some whimsical thoughts, all delivered in a rambling
>package. . .

Sincerely,

David Thornley

unread,
Aug 26, 1999, 3:00:00 AM8/26/99