Ratings and player tiering


alight

Sep 1, 2011, 8:07:11 PM
to nullpomino-dev
I've stated before that I don't like Elo much for rating in Tetris,
but I only recently realized precisely why and what can be done about
it. You may also have heard me express a liking for TF's level system.
After some chat with Zircean I believe I've arrived at a system that
would work well for Nullpomino.

Let's start with the level thing first of all. Most people's
complaints with TF's system have to do with properties of that system
that have nothing to do with the levels, such as its inflationary
nature, the ability to buy items to prevent ranking down, and so on.
The thing I like about it is that it provides a way of quite stably
representing a player's relative skill such that it doesn't change
frequently, while still providing some appearance of progress (either
up or down). While my original complaints about Elo's behavior are now
assuaged (it's an Implementation Thing, I'll talk about it later),
level tiering does present us some other benefits:

* A noise-free representation of relative skill among peers
* Easy and intuitive grouping into divisions
* Medium-term goals for players to strive for (leveling up)
* Weakening of the focus on rating in general (more players in an
"equal" rating)

I think these things make it worth using a level system, particularly
in light of a push for a more casual-friendly Nullpomino. Plus, should
the path be taken as I suggested in the HD "Let's get serious!"
thread, with a Nullpo Lite and a Nullpo Pro (it shouldn't be called this;
we want people who wouldn't care to feel like they are getting the
"whole" Nullpo, so the "Pro" version should be hosted elsewhere or
something) - anyway, should we go that route, we could always show the
full Elo rating to players who strongly care for whatever reason.

On to Elo itself.

My biggest complaint with Elo is its behavior. Your rating changes
every game, sometimes by a lot, and for the higher-rated player one
loss can really sting in a way that it shouldn't. While it ranges
around "near" what it should be, it never gets there and stays there.
For me, the longer two players play together in a session, the *less
the ratings should change at all* until an equilibrium is reached.
Equally, for the first few games, there isn't a significant enough
amount of data to decide how the ratings should be altered.

The problem with the way the Tetris games I have seen implement Elo
is that they calculate the updates after each game, using only that
game, as if each game were a separate, unrelated session. Elo
actually provides a means for calculating rating updates that are
based on many games, but it seems that nobody has taken advantage of
this.

Elo at its heart is a valuation of relative player skill that
represents the likelihood that each player will win a game. This means
that the more statistical data we include, the more accurately we can
estimate the involved players' ratings - and yet, in all
implementations so far, that information is simply thrown away. This
results in the undesirable behavior I mentioned.
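To make the difference concrete, here is a minimal sketch comparing the usual per-game update with Elo's multi-game form, in which every expected score comes from the ratings at the start and only the total of (actual - expected) is applied. K = 32 and the function names are assumptions for illustration. Note that with a fixed K this total can keep growing across a long session, so some extra dampening (like what is proposed below) is still needed for ratings to settle.

```python
# Sketch: per-game Elo vs. a multi-game update from fixed starting ratings.
# K = 32 is an assumed value, not one taken from this post.


def expected_score(r_a: float, r_b: float) -> float:
    """Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def sequential_update(r_a, r_b, scores, k=32.0):
    """What most implementations do: update after every game, on that game alone."""
    for s in scores:
        change = k * (s - expected_score(r_a, r_b))
        r_a, r_b = r_a + change, r_b - change
    return r_a, r_b


def batch_update(r_a, r_b, scores, k=32.0):
    """Multi-game Elo: every expected score is taken from the starting ratings."""
    e = expected_score(r_a, r_b)
    change = k * sum(s - e for s in scores)
    return r_a + change, r_b - change


# scores are from A's point of view: 1 = win, 0.5 = draw, 0 = loss
print(sequential_update(1500, 1500, [1, 1]))  # roughly (1530.5, 1469.5)
print(batch_update(1500, 1500, [1, 1]))       # (1532.0, 1468.0)
```

The two disagree after only two games because the per-game version has already moved the ratings before evaluating the second result.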

I propose, then, that Elo can be used in a specific way to beget the
traits I mentioned above. Specifically, I would consider ALL the games
played in a room when calculating rating change, or at least the most
recent (large number) of them. This allows Elo to approach an
equilibrium instead of dancing around it. Additionally, for the first
ten games or so, any rating changes should be scaled down and
gradually brought up to 100%, as a way of preventing the initial
noise from severely affecting ratings until some data has been
gathered that can be useful. It also disincentivizes people from
leaving a room and coming back in order to possibly get a lucky win
for lots of points.
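The ramp for the first ten games could look something like this minimal sketch. The linear shape, K = 32, and the names are assumptions; the only detail taken from the post is "ten games or so". Here the ramp scales the whole session total, so early games count fully once the tenth game is reached; scaling each game's own contribution instead would be the other obvious option.

```python
# Sketch: scale a session's Elo adjustment down over the first games.
# ramp_games = 10 follows "ten games or so"; the linear ramp and K = 32
# are assumptions.


def expected_score(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def ramp_factor(games_played: int, ramp_games: int = 10) -> float:
    """Grows linearly from 1/ramp_games after one game to 1.0 at ramp_games."""
    return min(1.0, games_played / ramp_games)


def session_adjustment(r_join, opp_join, scores, k=32.0, ramp_games=10):
    """Rating change over all session games, computed from join-time ratings."""
    e = expected_score(r_join, opp_join)
    raw = k * sum(s - e for s in scores)
    return raw * ramp_factor(len(scores), ramp_games)


print(session_adjustment(1500, 1500, [1]))       # about 1.6 instead of 16
print(session_adjustment(1500, 1500, [1] * 10))  # 160.0, fully weighted
```

A single lucky win right after rejoining a room is worth a tenth of its usual value, which is the disincentive described above.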

I'm uncertain how Elo would be calculated for games involving more
than two players, but I'm sure somebody who knows the math better than
I can make an informed decision. I suspect you could represent a win
in a 3-player match as 1, 2nd place as .5, and third place as 0 - or
some non-linear distribution even.
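One way to read the 1 / .5 / 0 idea, sketched below under assumptions (K = 32, the names, and averaging pairwise expectations are choices made here, not settled math): the linear placement score is exactly the fraction of opponents a player finished ahead of, so the matching expected score is the average of the pairwise Elo expectations. With both sides defined that way, the adjustments for a match sum to zero. A non-linear distribution could replace `placement_score`, but would need its own matching expectation to stay zero-sum.

```python
# Sketch: Elo for one free-for-all match, using linear placement scores
# (3 players -> 1, 0.5, 0). K = 32 is an assumption.


def expected_score(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def placement_score(place: int, n_players: int) -> float:
    """1st place -> 1.0, last place -> 0.0, evenly spaced in between."""
    return (n_players - place) / (n_players - 1)


def multiplayer_deltas(ratings: dict[str, float], finish_order: list[str],
                       k: float = 32.0) -> dict[str, float]:
    """finish_order lists player names from winner to last place."""
    n = len(finish_order)
    deltas = {}
    for place, name in enumerate(finish_order, start=1):
        opponents = [other for other in finish_order if other != name]
        # expected score = average of the pairwise expected scores
        expected = sum(expected_score(ratings[name], ratings[o])
                       for o in opponents) / (n - 1)
        deltas[name] = k * (placement_score(place, n) - expected)
    return deltas


ratings = {"a": 1600.0, "b": 1500.0, "c": 1400.0}
print(multiplayer_deltas(ratings, ["c", "a", "b"]))
```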

I believe that players do want to see "something happen" after a
match, so we want to update ratings every match when appropriate. To
avoid repeatedly sampling the previous games, then, what we want to do
is calculate the adjustment from the ratings the players had when they
joined the room, inclusive of all games played since then. This
enables us to arrive at successively more accurate ratings until
eventually the sample size of the games is large enough to result in
very little change at all.
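The post doesn't pin down how this recomputation should settle, so the sketch below swaps in one concrete technique: after every match, refit the rating difference from scratch using all games since both players joined, as a regularized maximum-likelihood (MAP) estimate under Elo's logistic model with a normal prior centered on the difference at join time. The prior width `sigma = 200`, the even split of the change between the two players, and every name here are assumptions. Because each new game is one observation among n, the refit moves less and less as the session grows.

```python
import math

C = math.log(10) / 400  # slope factor of Elo's logistic curve


def win_prob(d: float) -> float:
    """Elo expected score for a player rated d points above the opponent.

    Same value as 1 / (1 + 10 ** (-d / 400)), written to avoid overflow.
    """
    x = C * d
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fit_difference(d0: float, scores: list[float], sigma: float = 200.0) -> float:
    """MAP estimate of rating(A) - rating(B) from every game in the session.

    Assumed model: Elo's logistic likelihood for each game, plus a normal
    prior on the difference centred on d0 (the difference at room entry).
    """
    n = len(scores)
    if n == 0:
        return d0
    total = sum(scores)

    def slope(d: float) -> float:
        # Derivative of the log-posterior; strictly decreasing in d.
        return C * (total - n * win_prob(d)) - (d - d0) / sigma**2

    # slope() is positive at the left end of this bracket and negative at
    # the right end, so bisection always finds the single root.
    span = C * n * sigma**2
    lo, hi = d0 - span, d0 + span
    for _ in range(100):
        mid = (lo + hi) / 2
        if slope(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def session_ratings(ra_join, rb_join, scores, sigma=200.0):
    """Recompute both ratings from join-time ratings and all games so far.

    scores are from A's point of view (1 win, 0.5 draw, 0 loss). The change
    in the fitted difference is split evenly, so the update is zero-sum.
    """
    d0 = ra_join - rb_join
    shift = (fit_difference(d0, scores, sigma) - d0) / 2
    return ra_join + shift, rb_join - shift


# A wins three of every four games against an equally rated B.
games = [1, 1, 1, 0] * 50
for n in (1, 2, 100, 199, 200):
    print(n, session_ratings(1500, 1500, games[:n]))
```

Early games move the ratings by tens of points, while by game 200 each result nudges them by only a point or two, approaching the difference that a 75% win rate implies.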

Finally, a system like Glicko 2 may be a better choice than Elo, and
should really be able to handle the same conditions I've laid out
here. In Glicko's case, I think updating faster than a "group of
rating periods" can be done in a similar way to what I suggested above
- by calculating the player's rating according to all currently
available data, and recalculating it from scratch when new data is
available (new games).
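Glicko 2 adds a volatility parameter and an iterative solve, so to keep things short this sketch uses the original Glicko (Glicko-1) formulas to show the "recalculate from scratch" idea: all games since the player joined are treated as one rating period and re-run from the join-time rating and RD whenever a game is added. It skips Glicko's step that inflates RD for time since the last period, and the names are illustrative. The three-game run reproduces the worked example from Glickman's description of Glicko (new rating about 1464, RD about 151.4).

```python
import math

Q = math.log(10) / 400


def g(rd: float) -> float:
    """Glicko's attenuation factor for an opponent's rating deviation."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def glicko1_period(r: float, rd: float,
                   games: list[tuple[float, float, float]]) -> tuple[float, float]:
    """One Glicko-1 rating-period update.

    games: (opponent rating, opponent RD, score) with score 1 / 0.5 / 0.
    """
    if not games:
        return r, rd
    inv_d2 = 0.0  # 1 / d^2 in Glickman's notation
    weighted = 0.0
    for r_j, rd_j, s_j in games:
        g_j = g(rd_j)
        e_j = 1.0 / (1.0 + 10 ** (-g_j * (r - r_j) / 400))
        inv_d2 += Q**2 * g_j**2 * e_j * (1.0 - e_j)
        weighted += g_j * (s_j - e_j)
    denom = 1.0 / rd**2 + inv_d2
    return r + (Q / denom) * weighted, math.sqrt(1.0 / denom)


# Recompute from the join-time (rating, RD) every time a game is added.
join = (1500.0, 200.0)
session: list[tuple[float, float, float]] = []
for game in [(1400.0, 30.0, 1.0), (1550.0, 100.0, 0.0), (1700.0, 300.0, 0.0)]:
    session.append(game)
    print(glicko1_period(*join, session))
```

Since RD shrinks as the session accumulates games, each further game moves the rating less, which is the settling behavior wanted above.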

---

How then do we arrive at levels with Elo or some similar system? We
don't want players to be able to "hover around the border" and rank
back and forth between two levels - this defeats the noise-dampening
purpose of said levels. We could do like Tetris Friends does and move
your rating to the middle of the level when you cross a level
boundary, but that is mathematically silly and may present other
problems since nobody is going to make an equivalent gain or loss when
it happens to you. There is another way that I find preferable,
however (credit to caffeine, though I'm not sure if what I am about to
describe is specifically what he had in mind):

If you consider each level to overlap the next by half, then your
current level is the last level you entered - but you only enter a level
once your rating passes its middle. For example, if level 10 consists of ratings from 1450 to
1550, then level 11 might consist of ratings from 1500 to 1600. To
enter level 11, you must pass 1550 rating - but to fall back to level
10 you must pass below 1500. The range of ratings that make up a level
will necessarily be tied to certain parameters of the rating system,
but you get the picture.
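The overlap rule works out to a simple hysteresis check, sketched here with the numbers from the example (level 10 covers 1450-1550, level 11 covers 1500-1600). The constants and function names are placeholders, not tuned values.

```python
# Sketch: overlapping levels with hysteresis. Numbers follow the example
# above; they are placeholders, not tuned values.
HALF_WIDTH = 50        # each level spans 2 * HALF_WIDTH rating points
ANCHOR_LEVEL = 10
ANCHOR_RATING = 1500   # midpoint of ANCHOR_LEVEL


def level_center(level: int) -> float:
    """Midpoint of a level; adjacent midpoints are HALF_WIDTH apart."""
    return ANCHOR_RATING + HALF_WIDTH * (level - ANCHOR_LEVEL)


def update_level(level: int, rating: float) -> int:
    """Return the new visible level after a rating change."""
    # The top of the current level is the middle of the next one up,
    # so passing it means entering that level at its middle.
    while rating > level_center(level + 1):
        level += 1
    # Likewise, dropping below the bottom of the current level means
    # falling to the middle of the level below.
    while rating < level_center(level - 1):
        level -= 1
    return level


level = 10
for rating in [1549, 1551, 1549, 1551, 1549, 1499]:
    level = update_level(level, rating)
    print(rating, level)
```

A rating bouncing between 1549 and 1551 promotes the player once and then stays at level 11; it takes a drop below 1500 to fall back to level 10.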

How many levels should be used? Initially, I think 10. The default
rating should always be the MIDDLE of the level range, allowing it to
be a 0-sum game centered around the middle rating. If more levels were
added, then, this becomes easy - change the middle point to level 10,
and now we have levels from 1-20. This will affect everybody's visible
level, but it won't affect their rating at all. Levels will
necessarily be capped off at 0, and I suggest that they be capped at
the top end as well (though they wouldn't necessarily have to be). The
top end cap will prevent high rating players from "losing" as many
levels if the level range is adjusted further in the future.
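The re-centering arithmetic can be shown with a plain, non-overlapping mapping (the half-overlap rule described above could be layered on top). This sketch assumes 50 rating points per level and clamps at the lowest and highest levels; the constants and names are hypothetical.

```python
import math

DEFAULT_RATING = 1500  # every new player starts here
STEP = 50              # rating points per level: an assumption


def display_level(rating: float, num_levels: int) -> int:
    """Map a rating to a visible level 1..num_levels, default in the middle.

    The rating itself is never modified; only its presentation depends on
    num_levels. Levels are clamped at both ends.
    """
    middle = num_levels // 2
    raw = middle + math.floor((rating - DEFAULT_RATING) / STEP)
    return max(1, min(num_levels, raw))


for rating in (1200, 1500, 1620):
    print(rating, display_level(rating, 10), display_level(rating, 20))
```

Going from 10 to 20 levels moves a 1500-rated player from level 5 to level 10 without touching the rating, and the bottom cap stops 1200 from showing as a negative level in the 10-level scheme.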

Alternately, if ratings are floored so that they cannot fall below 100
(plain Elo has no hard floor of its own), that knowledge can be used to assign a scale that places 1500 at some
appropriate point and leaves the top end open - no level clipping is
then necessary at all.
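With a floor, the open-ended scale is just an offset and a step size. This is a sketch assuming a floor of 100 and 100 points per level, which happens to put the default 1500 at level 15; both constants are hypothetical.

```python
import math

RATING_FLOOR = 100  # assumed hard floor on ratings
STEP = 100          # rating points per level: an assumption


def open_level(rating: float) -> int:
    """Level 1 starts at the floor; there is no upper cap."""
    return 1 + math.floor((max(rating, RATING_FLOOR) - RATING_FLOOR) / STEP)


print(open_level(100), open_level(1500), open_level(2950))  # 1 15 29
```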

The problem of starting players at a "level" in the middle was raised
- no player wants to start out and lose levels, after all. A
probationary rating period can be imposed whereby a player's level is
not assigned until they participate in matches that are close enough to
reach a rating equilibrium (after a certain number of games, the
ratings are not changing by very much). This process can be hastened
perhaps by handicapping methods - enabling the player to find
equilibrium even with players out of his range. At this point, the
player's rating can be "locked in" and represented as their actual
level.
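The post leaves "not changing by very much" open. One simple reading, sketched here, is to call the hidden rating settled once each of the last few per-game changes stays under a tolerance. The window of 5 games, the 5-point tolerance, and the function name are hypothetical.

```python
def is_settled(history: list[float], window: int = 5, tolerance: float = 5.0) -> bool:
    """True once each of the last `window` rating changes is within `tolerance`.

    history holds the player's hidden rating after each probationary game,
    oldest first.
    """
    if len(history) < window + 1:
        return False
    recent = history[-(window + 1):]
    return all(abs(after - before) <= tolerance
               for before, after in zip(recent, recent[1:]))


print(is_settled([1500, 1560, 1545, 1548, 1550, 1549, 1551, 1550]))  # True
print(is_settled([1500, 1530, 1500, 1530, 1500, 1530, 1500]))        # False
```

Once this returns True, the rating can be "locked in" and shown as a level.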

A bit of behind-the-scenes cleverness could be used to make the player
appear to be gaining rank from 0 based on the number of games played
and their results before actually reaching this point, but I'm
uncertain yet if I think this is an idea that should be implemented.
It may be better simply to not display any level at all until an
approximate rating can be determined for the player.

---

Lastly, some reasonable points with regards to "rated" vs "unrated"
games were raised in the discussion thread on HD. I am for removing
the distinction entirely, but I am also for representing the player's
level and something like a bar graph to indicate their position within
that level - and entirely removing any numerical ratings aside from
those indicators. I think this will help focus players more on playing
and less on maintaining some particular rating.
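A level plus a bar could be rendered like this sketch, which shows where the hidden rating sits inside the current level's range. It reuses the example level layout (level 10 covers 1450-1550, levels overlap by half); the constants, bar width, and names are hypothetical. Because a player enters a level at its middle, a fresh promotion shows a half-full bar.

```python
HALF_WIDTH = 50
ANCHOR_LEVEL = 10
ANCHOR_RATING = 1500


def level_center(level: int) -> float:
    return ANCHOR_RATING + HALF_WIDTH * (level - ANCHOR_LEVEL)


def level_bar(level: int, rating: float, width: int = 10) -> str:
    """Text bar showing where a hidden rating sits inside the visible level."""
    low = level_center(level) - HALF_WIDTH
    fraction = (rating - low) / (2 * HALF_WIDTH)
    fraction = min(1.0, max(0.0, fraction))  # stay inside the bar
    filled = round(fraction * width)
    return f"Lv {level} [" + "#" * filled + "-" * (width - filled) + "]"


print(level_bar(10, 1500))  # Lv 10 [#####-----]
print(level_bar(11, 1551))  # just promoted, so the bar starts half full
```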

I'm further in support of creating certain divisions, such as the
Bronze through Platinum stuff on TF, whereby the room settings
progress towards the "delay-free" play that Tetris pros seem to know
and love so much. This is a contentious point and warrants further
discussion in another post. I would like to see both that and
automatic handicapping, with automatic handicapping to be disabled in
the higher divisions as the room settings get faster overall.

All of these things I believe should apply to all players in the main
release version of Nullpomino as a means of creating one single game
experience for everybody to participate in. All the optional room
wankery can be available in a separate "Pro" release as mentioned
before.