
rating inflation


Kit Woolsey

Nov 1, 1995
Stephan Pomp (po...@vidar.tsl.uu.se) wrote:

: Hi all,

: actually the top 13 on FIBS all have a rating of 1900 or more. Two are
: even above 2000. Two years ago the number 1 player had a rating of
: about 1850 if I remember correctly, and during that time the top ratings
: got higher and higher.
: How can one explain this? Is it only due to the increasing number of
: players? Or are there, e.g., more and more 'good' players involved, not only
: computer and net freaks? Remember also that kitwoolsey was the first one
: to be above 2000 (was it this spring?); he even reached 2040-something, I
: think. Now he is some 200 points lower, for whatever reason.
: Does anyone have any kind of statistics about the number of _active_ players?

As I understand it, this rating inflation has to do with new players who
aren't very strong. They play for a while, lose more than they win, and
eventually give up playing. When they quit, they take their deflated
ratings with them, leaving behind the points they lost; the players they
faced have increased their ratings at their expense. Those fortunate
players' opponents in turn face players of the same skill but slightly
higher ratings, and so on. It's a slow process, but the end result is the
rating inflation you see. I believe this occurs in the chess world also,
where a similar rating system is employed.
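This mechanism can be demonstrated with a toy simulation. Everything
below is a simplification for illustration only -- the pairing scheme,
the Elo-style update with K=8, and the skill distributions are my own
assumptions, not the actual FIBS formula:

```python
import random

random.seed(1)
K = 8  # arbitrary update step, not the real FIBS constant

def expected(ra, rb):
    # Elo-style logistic winning expectancy from the rating gap
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

# long-term players whose true skill averages 1500
veterans = [{"skill": random.gauss(1500, 150), "rating": 1500.0}
            for _ in range(200)]

for season in range(50):
    # each season, weaker newcomers enter at 1500, play, and quit
    pool = veterans + [{"skill": random.gauss(1350, 100), "rating": 1500.0}
                       for _ in range(50)]
    for _ in range(5000):
        a, b = random.sample(pool, 2)
        win_prob = 1.0 / (1.0 + 10 ** ((b["skill"] - a["skill"]) / 400.0))
        score = 1.0 if random.random() < win_prob else 0.0
        delta = K * (score - expected(a["rating"], b["rating"]))
        a["rating"] += delta
        b["rating"] -= delta
    # the newcomers now drop out, taking their lowered ratings with them

avg = sum(p["rating"] for p in veterans) / len(veterans)
print(avg)  # ends up above the 1500 everyone started at
```

Each departing cohort leaves with less than it arrived with, so the
survivors' average creeps upward even though every single game is
zero-sum.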

As for my own rating drop from my high point, I guess I'm just another
statistic.

Kit

Stephan Pomp

Nov 1, 1995

Hi all,

actually the top 13 on FIBS all have a rating of 1900 or more. Two are
even above 2000. Two years ago the number 1 player had a rating of
about 1850 if I remember correctly, and during that time the top ratings
got higher and higher.
How can one explain this? Is it only due to the increasing number of
players? Or are there, e.g., more and more 'good' players involved, not only
computer and net freaks? Remember also that kitwoolsey was the first one
to be above 2000 (was it this spring?); he even reached 2040-something, I
think. Now he is some 200 points lower, for whatever reason.
Does anyone have any kind of statistics about the number of _active_ players?

Greetings from Sweden,

Stephan (aka SAP)

Stephen Turner

Nov 1, 1995
Stephan Pomp wrote:
>
> actually the top 13 on FIBS all have a rating of 1900 or more. Two are
> even above 2000. Two years ago the number 1 player had a rating of
> about 1850 if I remember correctly, and during that time the top ratings
> got higher and higher.
> How can one explain this? Is it only due to the increasing number of
> players?

Yes; if there are more players, one would expect them to be spread more widely.
Also there is no reason that the mean should stay the same over time. Ratings
only mean something relative to other ratings from the same date.

This is an issue in chess when people try to compare Kasparov with
Fischer (say) by comparing their best ratings. It's meaningless.
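The first point is easy to check numerically: even when every player is
drawn from the same underlying skill distribution, the expected top
rating grows with the pool size. (The Normal(1500, 200) spread here is
made up purely for illustration.)

```python
import random

random.seed(7)

def expected_top(n, trials=500):
    # average best rating among n players drawn from an identical
    # Normal(1500, 200) distribution (parameters are illustrative)
    total = 0.0
    for _ in range(trials):
        total += max(random.gauss(1500, 200) for _ in range(n))
    return total / trials

# same skill distribution, ten times the players: a higher top rating
print(expected_top(100), expected_top(1000))
```

So a growing player base alone pushes the number 1 rating upward, with
no change in anyone's actual skill.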

--
Stephen R. E. Turner
Stochastic Networks Group, Statistical Laboratory, University of Cambridge
e-mail: sr...@cam.ac.uk WWW: http://www.statslab.cam.ac.uk/~sret1/home.html
"I always keep one big file in case I run out of space." A colleague of mine

Steve Koca

Nov 2, 1995
(This is really Bob Koca writing.)


Here is another possible explanation of ratings inflation.
Actually I think ratings expansion would be a better term.


Suppose that, for some game, there is a fixed pool of players of
various abilities, and they start a FIBS-like server (with a FIBS-like
rating system). They all play regularly and no new players ever join.


If they all start at 1500, then at first the ratings will obviously
expand, with the best players climbing higher and higher and the worst
players sinking lower and lower. The ratings keep expanding until the
rating system becomes accurate; in other words, until the best players
are rated far enough above an average player that the ratings formula
gives an accurate estimate of the probability of the best player
beating an average player.


Looking at FIBS 2 or 3 years ago, when the best ratings were about 1800,
I think it is clear that the ratings formula would have underestimated
the probability of a best player beating an average player. I am unsure
whether this is still true today.
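The "until the formula is accurate" condition can be made concrete. If
I remember the server's help text correctly -- treat the formula below
as an assumption on my part -- FIBS predicts the underdog's winning
chance from the rating gap d and the match length n like this:

```python
import math

def underdog_prob(d, n):
    # FIBS-style winning chance for the lower-rated player, given a
    # rating gap d in an n-point match (my recollection of the formula)
    return 1.0 / (10 ** (d * math.sqrt(n) / 2000.0) + 1.0)

# When the best ratings were ~1800 and the average ~1500, the formula
# called the best player only a modest favourite in a short match:
print(round(1 - underdog_prob(300, 1), 3))   # favourite's chance
print(round(1 - underdog_prob(300, 25), 3))  # longer match, bigger edge
```

If a top player actually beats an average opponent much more often than
the formula says, his rating keeps climbing until prediction matches
reality -- which is exactly the expansion described above.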


Some miscellaneous points:

1) If this idea does not explain current inflation, it surely explains
a lot of, I would even say most of, past ratings inflation.

2) I don't think it would be too hard to test whether this idea
explains current inflation. (hint)

3) Perhaps the rating system could have been devised so that ratings
started at the right amount of spread.

4) I agree that the other explanations (new players losing some games
and then dropping out; more players meaning more randomness, which
pushes the highest ratings higher) also account for some of the inflation.

,Bob Koca
bobk on FIBS
replies to ko...@bobrae.bd.psu.edu please

see...@accessone.com

Nov 5, 1995
to po...@vidar.tsl.uu.se
Stephan Pomp <po...@vidar.tsl.uu.se> wrote:
>
>
>actually the top 13 on FIBS all have a rating of 1900 or more. Two are
>even above 2000. Two years ago the number 1 player had a rating of
>about 1850 if I remember correctly, and during that time the top ratings
>got higher and higher.
>How can one explain this? Is it only due to the increasing number of
>players? Or are there, e.g., more and more 'good' players involved, not only
>computer and net freaks? Remember also that kitwoolsey was the first one
>to be above 2000 (was it this spring?); he even reached 2040-something, I
>think. Now he is some 200 points lower, for whatever reason.
>Does anyone have any kind of statistics about the number of _active_ players?
>
>Greetings from Sweden,
>
>Stephan (aka SAP)

******************

Hello, Stephan.

As a lurker-about-to-join-FIBS, I will intro with a post to you. :)

Having been a longtime chess player and local chess club office-holder, I
once wrote a ratings program for the Tacoma (Washington, USA) Chess Club.
This entailed a rather involved exploration of the many rating and
ranking systems (e.g., Ingo, British Grading, Swiss) used by various
sporting associations and performance institutes. As it happens, I
selected the Elo system for my program. This is the same system used by
FIDE (the World Chess Federation), the USCF (US Chess Federation), and --
I believe -- FIBS. The remarks that follow are based on my (strictly
informal and casual) research, the experience of writing the program, and
the impressions I gathered from speaking with Prof. Arpad Elo shortly
before his death several years ago. Please consider them off-the-cuff,
and open to free criticism or revision by anyone with a better grasp of
the subject matter. I go into detail only insofar as it might help those
who are unfamiliar with Elo's remarkable system. Others, please excuse my
self-indulgence in the next couple of paragraphs!

The Elo system is based wholly on statistical probability theory. Simply
put, it is a universal system for predicting outcomes when opponents or
teams face each other in direct competition. (Those interested might
consult Elo's "The Rating of Chess Players, Past and Present", or Elo's
original monograph in the Journal of Gerontology, 1965.) The Elo system
has been used by the USCF since 1960 and was adopted by FIDE as the
international standard in 1970.

Elo's concept was straightforward -- quite unlike the complex math it
involves: humans have good days and bad, and their performances in a
given situation fluctuate accordingly. Any two or more people who
compete in a given activity will, over time, yield a record of their
performances. These recorded results can be used to build a
probabilistic table of winning expectancies. Inversely, this can be used
to derive "ratings": the probable winning expectancy of a given
competitor in a given field. Elo combined the principles of standard
deviation and normal distribution (familiar to any insurance company)
with his numerical interval scale to develop a system in which the many
performances of an individual are normally distributed. That is to say,
each competitor has a scoring probability that can be converted to a
rating differential. What this rating means to us, of course, is that a
player rated 2000, when competing with an 1800, should win about 75% of
the time.

A cornerstone of the Elo theory is that "class" or "category" intervals
follow the statistical concept of standard deviation in single games.
What this means to us is that, while a player competing in a group of
those with similar ratings finds relatively even competition, there is a
quantifiable point at which the difference in ratings leaves him either
clearly outclassed or clearly outclassing his opponents. Thus, for us
chessplayers, the "class prize" became statistically valid, allowing
players in the same numerical rating group to compete with reasonable
chances. All "Class A" players (rated 1800-1999) might compete for the
"A" prize, while "Experts" (2000-2199) would fight for the "Expert"
prize. Even in competitions like open-Swiss pairings, where the strength
of the opposition in individual games is randomized, the players compete
for the highest score within their own class.
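That "about 75%" figure for a 200-point gap falls straight out of the
expectancy curve. Elo's own tables are built on the normal distribution,
but the common logistic approximation (which I believe the USCF uses) is
close enough to show it:

```python
def elo_expected(ra, rb):
    # logistic approximation to Elo's winning expectancy; Elo's own
    # tables use the normal curve, but the two agree closely
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

print(round(elo_expected(2000, 1800), 2))  # 0.76 -- the "about 75%" above
```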

It is recognized that, given any general rating pool, the improvement of
young players (and others new to the competition) will be rapid, while
established competitors see their ratings stabilize for a long period,
then slowly decline. For either group, however, ratings seldom fall back
to the level at which the player entered the pool. This tends to
"inflate" the rating pool, skewing the results, after a while, in the
direction of change. That makes it impossible to compare a player from
one era to another, or to see a rating stabilize at a "true" level for
purposes of performance evaluation. Fortunately, there are a number of
remedies, most (if not all) of which have been employed by the USCF,
among others:

1. Use of 'provisional' ratings to recognize players whose youth or
introduction to the game would signal a rapid improvement of results.

2. Use of a multiplier to enhance or degrade a given performance
grouping. This "K-coefficient" (as Elo describes it) can be used to
accelerate improving players to their proper location in the rating
field, while a lower "K" is used for older or better-established
players.

3. Periodic adjustment of the entire pool, as the British Chess
Federation does (or did, anyway, when I was up on all this).

4. The award of bonus or feedback points, whose purpose is to
compensate or handicap players who encounter those with unstabilized or
less-documented performance tracks (for example, giving a Master a 10%
return in rating points if he is defeated in a match by a new, talented
player).

5. Regardless of the above, establishing a Ratings Director, who would
monitor the systematic drift of the rating pool and maintain a 'control
group' of players, whose ratings would be used as a baseline for
implementing appropriate adjustments. This method is standard for all
well-established rating systems.

Frankly, I know little of the Internet Backgammon Server, but I am sure
they already have their pool-inflation under control. I tracked
undesired rating-pool inflation and wrote a compensating routine that
used the FIDE standard for K-coefficients and monitored all provisional
ratings for 30 games before issuing established ratings and modifying the
K-values. Testing it against grandmaster result-tables from Sarajevo,
Moscow, and the International Chess Olympiad, the final version
paralleled FIDE within a single percentage point. You know the funny
thing, Stephan? I only wrote the thing to rate after-hours pinochle
games at the chess club! :) I only add this to show that if I could do
that program on an old Commodore-64, they can (and probably are) doing
wonders now, assuming they have a ratings director.

Well, so much for all that. I hope I haven't bored you with too many
non-essentials, and that the final sections answered your questions about
ratings inflation. In closing, I want to add that the ONLY reason I
posted this was that, after having my wife dig out all my old flowcharts
and notes and books, she said more than one person had better read it! :)

So long, and nice to have corresponded with you. Feel free to write me
or post a reply, Stephan.

see...@accessone.com
