
What do YOU like in IF?


Conrad

Nov 12, 2009, 12:48:51 AM

I've recently gone through the online reviews of IF Comp 09 games,
looking for commonalities and patterned differences in what people
like about IF. What I found is counter-intuitive; or at least,
contrary to the prevailing wisdom.

Admittedly, my method required a lot of gisting. So, I'd like to
collect more numerical data. Then when we get the scores, we can sort
through the survey results to see what the strongest influences are.
And that means we can tell IF authors with specificity how to write
games you'll like better.

Take the survey and tell us what you like in IF:

http://tinyurl.com/y8m5oxo

Thanks!

Conrad.

Ben Cressey

Nov 12, 2009, 4:11:51 PM

> I've recently gone through the online reviews of IF Comp 09 games,
> looking for commonalities and patterned differences in what people
> like about IF.  What I found is counter-intuitive; or at least,
> contrary to the prevailing wisdom.

The question that bothered me the most was the one where I had to rate
how meaningful the various games were. Maybe I am just a cynical,
jaded curmudgeon, but I had trouble rating any of this year's entries
higher than a 3. Most were solidly in the 1 or 2 territory.

What exactly are we measuring here? I saw it as numerical shorthand
for the novelty, relevance and importance of the game's themes to the
player. As such, I didn't get the sense that any of this year's crop
had genuine ambitions in that regard.

Any thoughts on what constitutes meaningful IF? Offhand, I would
suggest Cry Wolf from last year, in that it presents the player with
an interesting personal decision in an unusual context. Floatpoint,
too, for similar reasons. Open-ended, morally ambiguous endings seem
to help, although I consider Vespers meaningful and it shoots pretty
straight.

Kate McKee

Nov 12, 2009, 8:54:02 PM

I'm not sure that the survey will collect the metadata you're seeking,
which seems to be: What's most important to you in assigning a score:
A) playability, B) story, C) writing, or D) agency?

The essential barrier to scientifically evaluating IF Comp scores is
the lack of a standardized scoring rubric to address philosophically
difficult questions like:
* How much better than a 5 is a 6? 20% better (linear) or 10 times
better (log)?
* Are you scoring the games relative to each other only, or relative
to all the IF you've ever played?
* Do you ever give 10s, and under what circumstances?
* Are there any problems in games where, despite their virtues, you
rank them as FULL OF FAIL and give them a disproportionately low
score? (Any "pet peeve" factors should be eliminated by honest use of
a standardized scoring system, right?)
* Conversely, is there anything -- a really stupendous puzzle,
perhaps, or use of your favorite in-joke -- that would make you give a
relatively high score to a game that was otherwise undeserving?

However, getting to invent your own rubric is part of the fun of the
Comp, I think. I enjoy reading people's analyses of why they scored
not just one game a certain way, but their whole methodology. Some
people try to score the games on a curve. For others, it's "Score:
NC (NOT CADRE) = 1." This doesn't tell me a lot about the game,
perhaps, but it does tell me something about the judge. :) And
occasionally a game comes along that's so innovative in a certain way
that it would be off the scale of whatever axis you'd imposed
beforehand (assuming, of course, the rubric contained an axis for
whatever the awesome element was.)

One thing that might be interesting to evaluate is: how has the mean
score of games (all games, all judges) drifted over time? (Or has
it?) I'm interested in knowing whether there's "grade inflation."

-K


Conrad

Nov 12, 2009, 9:22:07 PM

On Nov 12, 4:11 pm, Ben Cressey <bcres...@gmail.com> wrote:
>
> The question that bothered me the most was the one where I had to rate
> how meaningful the various games were.  ...

> Any thoughts on what constitutes meaningful IF?  

Well, it's not about what I think constitutes meaningful IF; I want
judges to tell me how meaningful they themselves found the games,
based on whatever criteria they use.

If what people call meaningfulness highly correlates with the game's
final score, then we can start talking about what's meaningful and
what's not. If there's no correlation, or a weak one, then we can
look to unpack strategies that have a greater impact, as we would
consider them a higher priority.
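
A minimal sketch of the check described above, assuming each game ends up
with one averaged "meaningfulness" rating from the survey and one final Comp
score. The function name `pearson` and the numbers are placeholders invented
for illustration; they are not survey data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)

# Made-up placeholder values, one entry per game (NOT real survey results).
meaningfulness = [2.0, 3.5, 1.5, 4.0, 3.0]
final_score = [4.1, 6.2, 3.8, 7.0, 5.5]

r = pearson(meaningfulness, final_score)
print(f"r = {r:.2f}")
```

A value of r near +1 or -1 would suggest the survey dimension tracks the
final score closely; a value near 0 would suggest looking at other
dimensions first.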


Conrad.

Conrad

Nov 12, 2009, 9:53:57 PM

On Nov 12, 8:54 pm, Kate McKee <kate-mc...@hotmail.com> wrote:
> I'm not sure that the survey will collect the metadata you're seeking,
> which seems to be:  What's most important to you in assigning a score:
> A) playability, B) story, C) writing, or D) agency?

Those are some of the categories I ask about, yes; although I think
this survey would give us a far more interesting answer than your
multiple-choice version, because for example it would tell us *how
much more important* the judges' first choice is than their second.

> The essential barrier to scientifically evaluating IF Comp scores is
> the lack of a standardized scoring rubric, to address philosophically-
> difficult questions like:
> * How much better than a 5 is a 6?    20% better (linear) or 10 times
> better (log) ?

> [..etc..]


> However, getting to invent your own rubric is part of the fun of the
> Comp, I think.  

But it doesn't even matter how rational people think they're being in
the tricky math formulas. When it comes down to it, they *still* have
to go with a gut feeling, or an inner vision, or to listen to their
inner voice, to assign the initial scores. And that inner strategy
they use will vary with how they understand the quality of the game.

People's end behavior in the Comp is Olympic-style scoring: each judge
simply holds up a number. We can treat judging purely behaviorally --
judging strategies as black boxes.


> I enjoy reading people's analyses of why they scored
> not just one game a certain way, but their whole methodology.  Some
> people try to score the games on a curve.  For others, it's  "Score:
> NC (NOT CADRE) = 1."  This doesn't tell me a lot about the game,
> perhaps, but it does tell me something about the judge. :)  

Well, if that's a significant source of scoring behavior, we should
eventually be able to identify it and see how strongly it correlates
to the final score. If it's more significant a factor than any other,
then we can recommend to authors that they adopt "Cadre" as a pen
name.

It's a peculiar fact of aggregate behavior that it can be rational and
patterned while no one member is behaving rationally. On the other
hand, if the survey shows judges behave purely chaotically, then that
would itself be interesting information.

> And
> occasionally a game comes along that's so innovative in a certain way
> that it would be off the scale of whatever axis you'd imposed
> beforehand (assuming, of course, the rubric contained an axis for
> whatever the awesome element was.)

Again, it doesn't matter. We're not in this case interested in the
judges' private worlds. We're interested in how they behave, and how
they behave is given shape by the Comp rules and scoring process:

* The judges rate the games 1-10;
* and at the end of six weeks those scores are averaged for every
game.

Because of the averaging math, a "5" score is in fact one point away
from a "6" score; it's not a power of ten or a natural log or any
other goofy thing, unless a judge wants to redefine all of
mathematics. In which case, it doesn't matter anyway, because it's in
the judge's private world.
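
The averaging point can be illustrated with a short sketch. The vote lists
are hypothetical, and `mean_score` is just an illustrative helper name; the
only assumption taken from the rules above is that a game's result is the
plain average of its 1-10 votes.

```python
def mean_score(votes):
    """Plain arithmetic mean of a game's votes."""
    if not votes:
        raise ValueError("a game needs at least one vote")
    return sum(votes) / len(votes)

# Hypothetical votes from four judges for one game.
votes = [5, 7, 4, 8]
# The same votes, except the first judge gives a 6 instead of a 5.
bumped = [6, 7, 4, 8]

shift = mean_score(bumped) - mean_score(votes)
print(shift)  # 0.25, i.e. 1 / len(votes)

# Moving a vote from 1 to 2 shifts the average by exactly the same amount,
# so every one-point step counts equally in the final result.
low_shift = mean_score([2, 7, 4, 8]) - mean_score([1, 7, 4, 8])
print(low_shift == shift)  # True
```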

There's another, similar case: Look at a troll game. A troll author
(such as whoever wrote last year's _The Absolute Worst IF Game In
History_) is presumably trying to get the lowest score possible. Then
we can gather that if his game gets a low score, he's happy, and if it
gets a high or middle score, he's less happy, and perhaps confused.

Then low scores are "good" and high scores are "bad." But this is in
the author's private world; this analysis is not meant to track it.


> One thing that might be interesting to evaluate is: how has the mean
> score of games (all games, all judges) drifted over time? (Or has
> it?)  I'm interested in knowing whether there's "grade inflation."

Well, that's easily answerable. If you go to the IF Comp's web page,
you'll see that the final score sheets are all posted. Of course,
whether increasing Comp scores mean that judges are rating higher --
"grade inflation" -- or whether games are getting actually better,
would be open to question.


Conrad.

Conrad

Nov 12, 2009, 10:04:54 PM

Guys,

To get a good confidence interval, I'm hoping to get 30-50 people.
So, every person who completes the survey before the Comp scores are
announced -- which is coming up real quick! -- really helps.
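
A rough sketch of why the number of respondents matters, using the
normal-approximation confidence interval for a mean. The standard deviation
of 2.0 is an assumed placeholder, not a measured value, and for very small
samples a t-based interval would be somewhat wider than this approximation.

```python
from math import sqrt

def ci_half_width(std_dev, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a mean."""
    if n < 1:
        raise ValueError("need at least one respondent")
    return z * std_dev / sqrt(n)

assumed_std_dev = 2.0  # placeholder spread of ratings on the 1-10 scale

# 7 is an arbitrary small sample for comparison with the 30-50 target.
for n in (7, 30, 50):
    print(n, round(ci_half_width(assumed_std_dev, n), 2))
```

The half-width shrinks with the square root of the sample size, so each
extra respondent narrows the interval most when few people have answered.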


Conrad.

DJ Hastings

Nov 13, 2009, 2:55:24 PM

The question "Game flow - How stuck did you get?" says the answers go
from 1 (worst) to 10 (best). Does that mean I should rate games lower
for getting me more stuck? (My first impulse was to rate them higher for
more "stuckness.")
-DJ

Ben Cressey

Nov 13, 2009, 5:21:05 PM

> The question "Game flow - How stuck did you get?" says the answers go
> from 1 (worst) to 10 (best). Does that mean I should rate games lower
> for getting me more stuck? (My first impulse was to rate them higher for
> more "stuckness.")

My approach was to decide which of the two states was more desirable,
then treat that as a 10.

Though it does make me wonder. Resonance, a game I rated pretty
highly overall, was extremely smooth and polished, so much so that it
felt nearly devoid of any significant challenge. That actually cost
it some points in the overall score, a fact which may not be easily
extracted from the raw survey numbers.

Condemned was far more interesting at first when you had to rescue
your sister, and where it was possible to get somewhat stuck. It went
rapidly downhill thereafter, when (spoiler alert!) the solution
consisted of approximately one thousand "wait" actions. For the
purposes of the survey question, I counted this as "stuck doing
something I really don't enjoy" and gave it a very low rating.

Conrad

Nov 13, 2009, 10:17:55 PM

On Nov 13, 5:21 pm, Ben Cressey <bcres...@gmail.com> wrote:

> My approach was to decide which of the two states was more desirable,
> then treat that as a 10.

That's how I designed the survey. In other words, if you were to rate
the game *just* on that one criterion, how would you rate it?

> Though it does make me wonder.  Resonance, a game I rated pretty
> highly overall, was extremely smooth and polished, so much so that it
> felt nearly devoid of any significant challenge.  That actually cost
> it some points in the overall score, a fact which may not be easily
> extracted from the raw survey numbers.
>
> Condemned was far more interesting at first when you had to rescue
> your sister, and where it was possible to get somewhat stuck.  It went
> rapidly downhill thereafter, when (spoiler alert!) the solution
> consisted of approximately one thousand "wait" actions.  For the
> purposes of the survey question, I counted this as "stuck doing
> something I really don't enjoy" and gave it a very low rating.

Maybe I should've included a "challenging" scale. Certainly, I should
have included a "your final Comp score" scale, but I didn't think
people would be willing to go through the trouble.


I suppose, ideally, we should have a Goldilocks rating system, where
you can exceed the top of the scale and have the scores start going
down again.
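
One way such a Goldilocks scale might work, purely as a sketch: let the
respondent answer on an extended scale, and fold anything past the ideal
point back down. The function name `goldilocks`, the 1-19 input range and
the peak at 10 are all assumptions made for illustration.

```python
def goldilocks(raw, peak=10):
    """Fold an extended rating so that overshooting the peak lowers the score.

    raw runs from 1 to 2 * peak - 1: values below the peak mean "too little",
    the peak means "just right", and values above it mean "too much".
    """
    if not 1 <= raw <= 2 * peak - 1:
        raise ValueError("raw rating outside the extended scale")
    return peak - abs(raw - peak)

print(goldilocks(4))   # 4: not enough of the quality
print(goldilocks(10))  # 10: just right
print(goldilocks(12))  # 8: two steps past the ideal
```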

Food for thought for next year. We'll see what this one can tell us.


Conrad.

Conrad

Nov 14, 2009, 9:21:02 AM

On Nov 14, 10:17 am, Conrad <conradc...@gmail.com> wrote:
> On Nov 13, 5:21 pm, Ben Cressey <bcres...@gmail.com> wrote:
>
> > My approach was to decide which of the two states was more desirable,
> > then treat that as a 10.
>
> That's how I designed the survey.  In other words, if you were to rate
> the game *just* on that one criterion, how would you rate it?

ps-

A couple of people used the "stuckness" scale in the reverse of the
direction I intended. However, they noted the fact in the comments,
which will allow me to normalize their answers after data collection.
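
A minimal sketch of that kind of normalization, assuming a 1-10 scale and
assuming the respondents who reversed it have been identified by hand from
their comments. The names `flip_rating` and `normalize` and the respondent
ids are hypothetical.

```python
def flip_rating(rating, low=1, high=10):
    """Mirror a rating on the scale, so 1 becomes 10, 2 becomes 9, and so on."""
    if not low <= rating <= high:
        raise ValueError("rating outside the scale")
    return low + high - rating

def normalize(responses, reversed_ids):
    """Flip the ratings of respondents known to have used the scale backwards.

    responses maps a respondent id to that person's list of ratings;
    reversed_ids is the set of ids whose ratings need mirroring.
    """
    return {
        rid: [flip_rating(r) for r in ratings] if rid in reversed_ids else list(ratings)
        for rid, ratings in responses.items()
    }

# Hypothetical data: respondent "a" said in the comments the scale was reversed.
raw = {"a": [1, 10, 3], "b": [2, 9, 5]}
print(normalize(raw, {"a"}))  # {'a': [10, 1, 8], 'b': [2, 9, 5]}
```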


C.

Conrad

Nov 15, 2009, 6:11:59 AM

On this blog post, you can see the tentative results of this survey
and how well they predict (or attempt to predict) the IF Comp results.
Now, that's important, because I am not collecting the judges' final
scores for the games.

See the graph here:

http://wp.me/py3Iu-iP

Meantime, the problem I'm facing here is that the data are still TOO
FUZZY -- I've had only *seven* people from newsgroups answer this
survey. If you want IF authors to know where they should invest their
development time to make games you like, you should take the survey
*now* -- it's about to close.

Go here to tell me what you think: http://tinyurl.com/y8m5oxo


Conrad.
