KGS's slow rating system


. . .

Mar 28, 2003, 7:07:29 AM
Hello...

What's up with KGS's rating system? I've won about 17 of my last 20
games as an 8 kyu, and still haven't gone up. I'm not so much
concerned about the value of my rank itself as about playing even
games. Or should I just stop whining and add handicap when necessary?

. . .

Mehmet Dardeniz

Mar 28, 2003, 9:14:25 AM
". . ." <x...@x.com> wrote in message
news:gae88vshbd3br8cp8...@4ax.com

Hi,
When you look at your rating graph at http://kgs.kiseido.com/,
what do you see?

--
Posted via Mailgate.ORG Server - http://www.Mailgate.ORG

Steve

Mar 28, 2003, 9:35:17 AM
There could be any one of a number of reasons for this. It all depends on
who you played and what the results were. For example, if you won even
games against players weaker than yourself and lost three games at the
correct handicap, then this is not unreasonable.

I suggest that the best solution is to chat with an admin when you are
online. It is very lielly that they can help you.

DrStraw
KGS Admin


". . ." <x...@x.com> wrote in message

news:gae88vshbd3br8cp8...@4ax.com...

Oliver Richman

Mar 28, 2003, 12:09:43 PM

"Steve" <rgg@eklectika..com> wrote in message
news:b61mie$8d5d$1...@news3.infoave.net...

> I suggest that the best solution is to chat with an admin when you are
> online. It is very lielly that they can help you.

Lielly? You sure know how to give a guy hope, DrStraw ;)

-frl


Christopher Hayashida

Apr 1, 2003, 9:57:33 PM
. . . <x...@x.com> wrote in message news:<gae88vshbd3br8cp8...@4ax.com>...

Part of the confusion with both the AGA and KGS rating methods is the
fact that they use statistical models to determine ratings. The idea
is something like what follows:

Since you are 8k, if you play another 8k (like me) in an even game,
you should win roughly 50% of the time.

If you are playing a 9k in an even game, however, the model predicts
that it is more likely that you will win. In the AGA model, it's about
83% of the time. If you are playing a 10k in an even game, you should
win 97% of the time. I didn't compute the percentage for the KGS
algorithm, but it's a similar function.

Your rating will go up or down the more you deviate from the
statistical norm. If you are on a winning (or losing) streak against
other 8k players, your rating will adjust. If you are winning a lot
more than 50% of those games, your rating will move to 7k. If you are
losing a lot more than 50% of those games, your rating will move
closer to 9k.

However, if you are playing even games against 10k players, it will
take a lot more wins to improve your rating. You'd have to win a lot
more than 97% of the games before your rating will move. If you only
win about 83% of even games against 10k players, your rating can
actually move closer to 9k (because a 9k player should win 83% of his
even games against a 10k player).

Therefore, if you are on a winning streak (in even games) against 9k
and 10k players, it takes a lot more wins before the model will find
that you are outside of the normal range and your rating goes up.
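
Just to illustrate the mechanics (this is an Elo-style toy in Python,
not the actual AGA or KGS algorithm, and the constants are made up):

import math

def expected_win(my_kyu, opp_kyu, scale=1.0):
    # Logistic win probability from the rank difference. Kyu numbers
    # shrink as you get stronger, so an 8k playing a 10k in an even
    # game has an advantage of 2.
    advantage = opp_kyu - my_kyu
    return 1.0 / (1.0 + math.exp(-scale * advantage))

def update(my_kyu, opp_kyu, won, k_factor=0.05):
    # The rating moves in proportion to (actual - expected); a kyu
    # rating improves by getting smaller, hence the subtraction.
    actual = 1.0 if won else 0.0
    return my_kyu - k_factor * (actual - expected_win(my_kyu, opp_kyu))

print(update(8.0, 8.0, won=True))   # beating a peer: ~7.975
print(update(8.0, 10.0, won=True))  # beating a 10k even: ~7.994

Beating someone the model already expects you to beat barely moves the
number, which is exactly the winning-streak effect described above.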

The way that both models regard handicap games is also similar. If you
are playing a 10k player at 2 stones, or a no-komi game against a 9k
player, the models reset so that you should have a 50% chance of
winning.

I suspect what is happening is that you are winning a large number of
games, but they are under-handicapped. If you win 8 out of 10 games
against a 9k, but they are even games, then you are still really close
to the statistical model, and your rating will not change much. (Note
that if you play even against a 9k, and you lose 5 out of the 10
games, it can still hurt your rating because you are further from the
statistical norm.)

The best way to increase your rating is to win games where you give
more stones than normal (against a weaker player) or take fewer than
normal (against a stronger player.) Both situations are weighted
against you, and if you manage a 50/50 split in the games, your rating
will still go up.

There are, of course, other factors. You can go up in rating (even
without playing) if players you beat are moving up as well. If
everyone else is beating the players that you beat, your rating might
stay about the same, while the player with the losing streak will have
his rating fall. It's a more complicated model than what's presented
here.

At any rate, I suggest at least giving as many stones as the client
suggests and going from there. If nothing else, it will help you
learn to settle groups quickly and play lightly.

HTH,

Chris Hayashida

frisco

Apr 2, 2003, 10:58:44 AM
What body of data was used to construct these models; or were they simply
made up for their nice-looking appearance, and now reality is whatever
conforms to them?

"Christopher Hayashida" <ch...@prosum.com> wrote in message
news:556ed15a.03040...@posting.google.com...

Bill Spight

Apr 2, 2003, 6:39:09 PM
Dear Chris,

> If you are playing a 9k in an even game, however, the model predicts
> that it is more likely that you will win. In the AGA model, it's about
> 83% of the time.

Which is ridiculous. I doubt if it is more than 60% of the time. (Low
kyu games have high variance).

> The way that both models regard handicap games is also similar. If you
> are playing a 10k player at 2 stones, or a no-komi game against a 9k
> player, the models reset so that you should have a 50% chance of
> winning.

Which is just wrong. If an 8 kyu takes White against a 9 kyu, with no
komi, the game favors White. The fact that White does not receive komi
is worth only a 1/2 rank difference. White should give komi. (Or Black
should take 2 stones and give komi. Or the rating system should take
White's advantage into account.)

Best,

Bill

Araldo van de Kraats

Apr 2, 2003, 6:45:11 PM
> Since you are 8k, if you play another 8k (like me) in an even game,
> you should win roughly 50% of the time.
>
> If you are playing a 9k in an even game, however, the model predicts
> that it is more likely that you will win. In the AGA model, it's about
> 83% of the time. If you are playing a 10k in an even game, you should
> win 97% of the time.

These numbers are quite high compared with the European tournament
statistics:
http://www.european-go.org/rating/statev.html

About 50% - 54% - 60% instead of 50% - 83% - 97%
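
In logistic terms (an illustrative fit, not how the EGF actually builds
its table), the two sets of numbers imply very different steepness
constants k in P(win) = 1/(1 + exp(-k*d)) for a rank difference d:

import math

k_egf = math.log(0.54 / 0.46)  # ~0.16, from 54% at d = 1
k_aga = math.log(0.83 / 0.17)  # ~1.59, from 83% at d = 1
print(round(k_egf, 2), round(k_aga, 2))

So this is not a small calibration difference; the two curves are an
order of magnitude apart in steepness.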

Araldo van de Kraats


William M. Shubert

Apr 2, 2003, 9:31:22 PM
Bill,

Just to clear it up, neither of your complaints applies to the
KGS system. (Maybe not to the AGA's either; I can't say.) For the first, I
actually did an analysis about a year ago of all KGS rank data and
computed the proper constant for the probability of a win. For the equation at
http://kgs.kiseido.com/en_US/help/math.html the constant k is set to -0.8,
giving a 68% chance of winning an even game against a 1-stone-weaker
opponent. When I set the tuning parameter to this, it gave the overall rank
system the best accuracy in predicting the outcome of future KGS games. I
tested making this constant vary based on the ranks of the players
involved, but did not find any measurable correlation, so by my
measurements the "low kyu games have high variance" statement is not
correct on KGS. It took a lot of work to do all this, but since doing this
work I've been very happy with the rank system's performance. In a few
years I may revisit the whole problem (KGS is bigger now and maybe the
population characteristics have changed), but it is so time consuming and
complex that for now I'm going to just leave it be!
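
For anyone who wants to experiment, a sketch assuming the logistic form
implied by the numbers above (the exact parameterization on the math
page may differ):

import math

K = -0.8  # the tuning constant described above

def p_win(rank_advantage):
    # Probability that the stronger player wins an even game, where
    # 1.0 means one rank stronger.
    return 1.0 / (1.0 + math.exp(K * rank_advantage))

print(round(p_win(1.0), 2))  # ~0.69, matching the ~68% figure
print(round(p_win(0.5), 2))  # ~0.60 for a half-rank (no-komi) edge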

For your second complaint, KGS does treat a zero komi game as giving a 0.5
stone advantage to black. I would be very surprised if the AGA did not do
the same.

Christopher Hayashida

Apr 3, 2003, 3:19:41 AM
Bill Spight <Xbsp...@pacbell.net> wrote in message news:<3E8B74B7...@pacbell.net>...

> Dear Chris,
>
> > If you are playing a 9k in an even game, however, the model predicts
> > that it is more likely that you will win. In the AGA model, it's about
> > 83% of the time.
>
> Which is ridiculous. I doubt if it is more than 60% of the time. (Low
> kyu games have high variance).

I can't vouch for the model, or for the statistics. I just tried to
explain the model a bit when I probably should have just left a link.
The information for the AGA model is
http://www.usgo.org/resources/downloads/aga-rating.pdf. I apologize if
I confused the issue.

> > The way that both models regard handicap games is also similar. If you
> > are playing a 10k player at 2 stones, or a no-komi game against a 9k
> > player, the models reset so that you should have a 50% chance of
> > winning.
>
> Which is just wrong. If an 8 kyu takes White against a 9 kyu, with no
> komi, the game favors White. The fact that White does not receive komi
> is worth only a 1/2 rank difference. White should give komi. (Or Black
> should take 2 stones and give komi. Or the rating system should take
> White's advantage into account.)

You are correct, komi is handled differently. I should have used a
2-stone game with a 10k and a 3-stone game with an 11k instead. I was
just trying to give an example of how using a handicap moves the expected
outcome closer to 50/50.

Well, the point of the post, before wandering off into the statistics,
was that the person who was on the winning streak benefits less from
underhandicapped games. It still holds true, despite the errors in my
explanation.

Sorry for the confusion,

Chris Hayashida

Petri P

Apr 3, 2003, 3:32:55 AM
Why don't you start a new account? That way your rating is evaluated
afresh, and you can get it corrected in far fewer games. And you don't end up
sandbagging people - unless you like it :)

Any rating system is a negative feedback control system and has some
settling time. The shorter the settling time, the noisier the control result.
And such systems are really meant for estimating stable ratings.

What the KGS system does by complex ML estimation ends up as a sort of
non-linear P-controller. To cope with improving players quickly, it would
have to be a PD or PID type of controller. How to do that on top of ML
estimation, I don't know. Somehow the delta of the rating over past games
would have to be introduced into the system, to keep it
stable and still responsive to developing players.
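
To make the analogy concrete, a toy comparison in Python (nothing like
what KGS actually runs; the gains kp and kd are arbitrary):

def p_update(rating, error, kp=0.1):
    # error = actual result minus the result the model expected
    return rating + kp * error

def pd_update(rating, error, prev_error, kp=0.1, kd=0.05):
    # The derivative term reacts to the trend in the error, so a
    # steadily improving player is tracked more quickly.
    return rating + kp * error + kd * (error - prev_error)

The derivative term is what a static maximum-likelihood fit lacks.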

Petri P
(petri on KGS)


". . ." <x...@x.com> kirjoitti
viestissä:gae88vshbd3br8cp8...@4ax.com...

kogo

Apr 5, 2003, 1:32:52 AM
This reminds me of that bad 80's song: "Blinded by Science". Instead, it's
"blinded by statistics", and it keeps being played. Rating should be decided
by win/loss record, not misapplication of probability statistics. The fact
that you win or lose is not probabilistic; it's a fact, and probabilistic
extrapolation is inappropriate. The externalities to the common rating
system statistical approach are enormous, but remain unseen by the
conceptually challenged (Bill Shubert & Tweedie, most notably, but with many
peers). Just because you can use statistics doesn't mean you have a
conceptual understanding of their proper application. It appears that only
those players who are experientially victimized by the rating system because
they're winning, but not being promoted, seem to catch onto the fact that
something is wrong, even if they don't know what.

- kogo
__________
"William M. Shubert" <w...@igoweb.org> wrote in message
news:pan.2003.04.03....@igoweb.org...

-

Apr 5, 2003, 2:44:50 AM

"kogo" <ko...@waterfire.us> wrote:
> This reminds me of that bad 80's song: "Blinded by Science". Instead,
> it's "blinded by statistics", and it keeps being played. Rating should be
> decided by win/loss record, not misapplication of probability statistics.


Is this the same "kogo" who keeps cropping up in these
discussions and then runs away whenever somebody exchanges?
What is your purpose in posting if you pay no attention to replies?
Or is it the case that you're simply dysfunctionally non-communicative?


> The fact that you win or lose is not probabilistic; it's a fact, and
> probabilistic extrapolation is inappropriate.


Considerable thought has been invested into the source of your
misunderstanding, but it seems to boil down to certain perceptual
differences among different classes of observers, i.e. (a) players who
experience the "fact" of win/loss record in context to a tournament,
and (b) tournament organizers and ratings-calculators/assigners who
do not experience the "fact" of individual win/loss records but instead
deal with "the hurricane of facts" in terms of very large numbers which,
in the limit, could tend to approach -continuous- variational quantities.


> The externalities to the common rating system statistical approach are
> enormous, but remain unseen by the conceptually challenged (Bill Shubert
> & Tweedie, most notably, but with many peers).


Everybody is "conceptually challenged" to be sure, but you are amiss
in presuming that Bill Shubert & Tweedie are "conceptually challenged"
in some fashion you think, or that they are "conceptually challenged" in
a manner similar to your own, or that they are "conceptually challenged"
similarly to each other. People are not all "conceptually challenged" in
the same way: this -essential- error is one source of your difficulties.


> Just because you can use statistics doesn't mean you have a
> conceptual understanding of their proper application.


Use of statistics is -not- a moral philosophy. You should clarify
what you mean by "proper" and, if you seek to make a case, prove it.

> It appears that only those players who are experientially victimized by
> the rating system because they're winning, but not being promoted,
> seem to catch onto the fact that something is wrong, even if they
> don't know what.


Perhaps if they stopped losing altogether then they'd figure it out.
The total number of wins equals the total number of losses. Overall,
the "ratings system" tends to stay balanced, with very little drift. If
minor drift occurs, it is correctable. If an entire ratings system needs
adjustment, then there can be "a great leap forward" (or "a great leap
backward"). An event such as that -did- occur, which means that its
implementation is not impossible, and that committees are formed to
investigate concerns such as yours, awarding them also a statistical
level of significance, whenever "appropriate" and "proper."

- regards
- jb

------------------------------------------------------------
STILL RUNNING THE SHOW...
THE "HEAD" OF AL-QAEDA
http://www.geocities.com/beaver_militia/brain.html
------------------------------------------------------------

Andrew Walkingshaw

Apr 5, 2003, 6:03:23 AM
In article <XLuja.372359$sf5.6...@rwcrnsc52.ops.asp.att.net>, kogo wrote:
> This reminds me of that bad 80's song: "Blinded by Science". Instead, it's
> "blinded by statistics", and it keeps being played. Rating should be decided
> by win/loss record, not misapplication of probability statistics.

This statement is meaningless. Ratings are decided by a combination of
win-loss record and strength of opponents; clearly winning 50% against
3 dans is more valuable than winning 50% against 20 kyus.

> The fact
> that you win or lose is not probabilistic; it's a fact, and probabilistic
> extrapolation is inappropriate.

Um. It's *precisely* probabilistic, _before the game is played_[0];
based on one's prior knowledge of the two players, one has essentially a
Bayesian prior which allows a prediction of likely outcomes - in terms
of the likelihood of each player winning. Once the game is complete, you
have perfect information on the result - so you update your Bayesian
prior in the light of this.

In what way is this *not* a statistical process? The way rating systems
work is by, essentially, predicting the result of a game or series of
games (these systems all work best in the large game limit, but that's
obvious population statistics), and then correcting the prior in light
of the accuracy, or otherwise, of the results.
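
The simplest possible instance of that prior-to-posterior loop (a toy,
not how any real rating system is parameterized) estimates the unknown
probability p that player A beats player B:

# Beta(a, b) prior over p = P(A beats B); Beta(1, 1) is uniform.
a, b = 1.0, 1.0

for result in [1, 1, 0, 1]:  # 1 = A won, 0 = B won
    a += result
    b += 1 - result

print(a / (a + b))  # posterior mean of p: 4/6 ~ 0.67

A rating system does the same thing one level up, updating per-player
parameters instead of one per-pair probability.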

I see no other way a rating system could work; indeed, even if a rating
system is non-algorithmic, at heart it's going to boil down to "he's
consistently beating 3 kyus, so he's 2 kyu" - which is the same thing
without the algebraic coating. As such, I await the publication of your
"non-statistical" system with baited breath...

- Andrew

[0] Pretty much standard in statistical mechanics, really, which is what
a rating system is a good example of.

[1] Author of the most successful rating algorithm, the ELO system, which
has been very accurately *predicting* the result of chess tournaments
for over twenty years. It's also applied to Go in Europe with an
impressive degree of success.

--
Andrew Walkingshaw | andrew...@lexical.org.uk

Nick Wedd

Apr 5, 2003, 6:08:46 AM
In message <XLuja.372359$sf5.6...@rwcrnsc52.ops.asp.att.net>, kogo
<ko...@waterfire.us> writes

>This reminds me of that bad 80's song: "Blinded by Science". Instead, it's
>"blinded by statistics", and it keeps being played. Rating should be decided
>by win/loss record, not misapplication of probability statistics. The fact
>that you win or lose is not probabilistic; it's a fact, and probabilistic
>extrapolation is inappropriate. The externalities to the common rating
>system statistical approach are enormous, but remain unseen by the
>conceptually challenged (Bill Shubert & Tweedie, most notably, but with many
>peers). Just because you can use statistics doesn't mean you have a
>conceptual understanding of their proper application. It appears that only
>those players who are experientially victimized by the rating system because
>they're winning, but not being promoted, seem to catch onto the fact that
>something is wrong, even if they don't know what.

The rating systems used on KGS, IGS, and elsewhere are based on
statistical models which are documented and understood. If you are
proposing something else, perhaps you could let us know what it is?

Nick
--
Nick Wedd ni...@maproom.co.uk

-

Apr 5, 2003, 6:58:52 AM

Andrew Walkingshaw <andrew...@lexical.org.uk> wrote:
> [1] Author of the most successful rating algorithm, the ELO system,
> which has been very accurately *predicting* the result of chess
> tournaments for over twenty years. It's also applied to Go in
> Europe with an impressive degree of success.


Was this some footnote for a missing footnoted text?


"Nick Wedd" <ni...@maproom.co.uk> wrote:
> The rating systems used on KGS, IGS, and elsewhere are based on
> statistical models which are documented and understood. If you are
> proposing something else, perhaps you could let us know what it is?


"Kogo" has been asked this a few times during the past
three-or-four years, if this is indeed the same "kogo" as before.
As an aside, consider the numerical profiling for the political
statistics chart http://www.digitalronin.f2s.com/politicalcompass/ .
The number of questions was limited, maybe selected from a
much larger set. Presumably there had been a very large set of
questions, which was presented to a very large sample of people
answering the balloting quiz, and then the smaller set of questions
was selected which provided the best "quantizing" over the range
depicted ( essentially "civil liberties" along vertical and "economic
theory" along horizontal ). A distribution was obtained over the
grid that maximizes the normalized sum of neighbor distances among
all components in the mix. Given "N questions" in an original set,
the quantizing procedure might look at -all- "combi(N,X) subsets",
(where a < X < b provides bracketing parameters) yielding some
suitable "X" value for numbers of questions on a finalized ballot
quiz, which was what is obtained today, apparently.

The same sort of "quantizing" occurred more rigorously by some
political scientists (Nolan McCarty & Keith T. Poole) using a Fortran
program which performed "multi-dimensional W-NOMINATE" for locating the
primary axes about which political events cluster. Their development
milestones occurred during 1982-1984, 1986-1987, 1991, 1992, & 1997.
Crunching legislative data from the past 100 years of "Congressional
Activity" (perhaps an oxymoron), they arrived at confirmation of
these two axes (as with the "political compass" quiz) providing the most
reliable means for -framing- the breakout for political sentiments,
and thereby a highly-correlative means for political quantizing,
given the grid-spread. This indicates also a centralizing tendency
which works against any of the outliers at the extremes, since
any outliers would imply relative "clustering" of interior data more
densely, thus reducing any maximizing of all neighbor distances.

In work by McCarty & Poole, however, they first wished to
examine for clustering phenomena, in order to obtain each
multidimensional axis now found also in the "Political Compass"
balloting quiz. Furthermore, they found only two, basically.
Then, once obtaining reliable clustering, a logical next step,
for "political compassing", was to obtain a reasonable subset of a
maximal set of questions concerning political topics, which thereby
could supply quantizing, (max entropy) rather than clustering.

- regards
- jb

----------------------------------------------------
Examining the Robustness of Ideological Voting ...
http://www.msu.edu/~jenki107/AJPSfinal2.pdf

W-NOMINATE FORTRAN program and executable
http://voteview.uh.edu/dwnl_4.htm

MULTI-DIMENSIONAL W-NOMINATE ( flow diagram )
http://voteview.uh.edu/ideal_point_NOMINATE.htm

The Geometry of Multidimensional Quadratic Utility ...
http://web.polmeth.ufl.edu/pa/PA93-211-226.pdf

The Influence of Jurisprudential Considerations on Supreme Court ...
http://www.arches.uga.edu/~cmslind/research/influence.htm

The Statistical Analysis of Roll Call Data
http://jackman.stanford.edu/papers/siqss2.pdf

Government Instability with Perfect Spatial Voting ...
http://home.gwu.edu/~voeten/RosenthalVoeten.pdf

Critical Elections, Divided Government, and Gridlock ...
http://sobek.colorado.edu/~esadler/macropolitics/HeitshusenYoung.pdf

Examining the Linkage Between Descriptive and Substantive ...
http://www2.chass.ncsu.edu/cobb/me/past%20articles%20and%20working%20papers/cobb%20and%20jenkins%202001%20PRQ.PDF
----------------------------------------------------

Roy Schmidt

Apr 5, 2003, 1:27:26 PM
"kogo" <ko...@waterfire.us> wrote:

> This reminds me of that bad 80's song: "Blinded by Science". Instead, it's
> "blinded by statistics", and it keeps being played. Rating should be decided
> by win/loss record, not misapplication of probability statistics. The fact
> that you win or lose is not probabilistic; it's a fact, and probabilistic
> extrapolation is inappropriate. The externalities to the common rating
> system statistical approach are enormous, but remain unseen by the
> conceptually challenged (Bill Shubert & Tweedie, most notably, but with many
> peers). Just because you can use statistics doesn't mean you have a
> conceptual understanding of their proper application. It appears that only
> those players who are experientially victimized by the rating system because
> they're winning, but not being promoted, seem to catch onto the fact that
> something is wrong, even if they don't know what.

Questions for kogo:

1. Without reference to probability, how do you determine whether two
players are the same strength, given that komi practically eliminates
draws?

2. George enters a tournament. In the first round, he plays a 3-kyu
on even and wins. In the second round, he loses to a shodan on even.
Third round, he beats a 1-kyu on even, and in the fourth round he wins
against a 2-dan on even. What rank/rating do we assign to George?
Answer this question without reference to statistics or probability.

Cheers, Roy

--
my reply-to address is gostoned at insightbb dot com
-------------------------------------------------
Roy Schmidt
Part-time Translator for Yutopian
Full-time Professor of Business Computer Systems
Bradley University

Bill Spight

Apr 6, 2003, 3:06:54 AM
Dear Roy,

> 2. George enters a tournament. In the first round, he plays a 3-kyu
> on even and wins. In the second round, he loses to a shodan on even.
> Third round, he beats a 1-kyu on even, and in the fourth round he wins
> against a 2-dan on even. What rank/rating do we assign to George?
> Answer this question without reference to statistics or probability.

We assign him the average of the ratings of the shodan and 2-dan. One or
both of those results are anomalous. We assume that both are, but to the
minimum extent.

You can make perfectly reasonable even game ratings without appeal to
probability.
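
A worked instance, assuming dan grades map onto a number line (shodan =
1.0, 2-dan = 2.0); the wins over the 3-kyu and 1-kyu are consistent
with anything in between, so only the two anomalous results bind:

shodan, two_dan = 1.0, 2.0
george = (shodan + two_dan) / 2
print(george)  # 1.5, splitting the two anomalies evenly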

Ciao,

Bill

Alec Edgington

Apr 6, 2003, 3:53:46 AM
The statistical model is, of course, just a model. There is no perfect
model of rank. It's easy to see why: there can be three players A, B and C
with peculiar (deterministic) strategies such that A always beats B, B
always beats C and C always beats A! So any rank-based model is an
imperfect model of results. In my experience the KGS model works well
enough to give interesting games most of the time, and that is all that
matters.

Alec

"kogo" <ko...@waterfire.us> wrote in
news:XLuja.372359$sf5.6...@rwcrnsc52.ops.asp.att.net:

Bill Spight

Apr 6, 2003, 4:04:49 AM
Dear Andrew,

> > The fact
> > that you win or lose is not probabilistic; it's a fact, and probabilistic
> > extrapolation is inappropriate.
>
> Um. It's *precisely* probabilistic, _before the game is played_[0];
> based on one's prior knowledge of the two players, one has essentially a
> Bayesian prior which allows a prediction of likely outcomes - in terms
> of the likelihood of each player winning. Once the game is complete, you
> have perfect information on the result - so you update your Bayesian
> prior in the light of this.
>
> In what way is this *not* a statistical process? The way rating systems
> work is by, essentially, predicting the result of a game or series of
> games (these systems all work best in the large game limit, but that's
> obvious population statistics), and then correcting the prior in light
> of the accuracy, or otherwise, of the results.
>
> I see no other way a rating system could work; indeed, even if a rating
> system is non-algorithmic, at heart it's going to boil down to "he's
> consistently beating 3 kyus, so he's 2 kyu" - which is the same thing
> without the algebraic coating. As such, I await the publication of your
> "non-statistical" system with baited breath...
>

[snip]

> [0] Pretty much standard in statistical mechanics, really, which is what
> a rating system is a good example of.
>

First, a person's go strength is multi-dimensional, so a single-number
rating is fuzzy, and not to be taken too seriously, anyway.

Second, the connection between ratings systems and statistical mechanics
is debatable.

Third, traditional go rankings are based, not on probabilities, but upon
handicaps. The translation between rankings and probabilities is not
obvious. The statistics from European tournaments differ from Shubert's,
for example. When you define the ratings in terms of probabilities, you
gain consistency, but attenuate that connection. From what I have heard
over the years, online rank differences tend to be narrower than
traditional rank differences, so that Black is favored in handicap
games. For that reason, I have heard, players have become reluctant to give
handicaps. So ratings are based primarily upon even
games, perpetuating the disconnect with traditional rankings. (I do not
know if that is still the case, if it ever truly was. I'm just reporting
what I heard.)

Fourth, there is the question of the stability of the system. This is a
problem for all rating systems. 100 years ago a Japanese amateur shodan
took 4 stones from pros. There has been obvious inflation over the
years. Pro ratings have undergone inflation, as well. If yours is the
only system, that is not much of a problem, because it is the relative
ratings that are the main thing. However, an unstable system poses a big
psychological problem: it violates players' expectations that winning
should increase their ratings and that losing should decrease them. That
is a recurrent complaint about online systems. And, from what I heard a
few years ago, online systems are not particularly designed for
stability. (Again, I do not know to what extent those complaints are or
were justified.)

Fifth, all reasonable ratings systems are self-correcting. Even those
that reward you for winning and punish you for losing. Even those that
do not appeal to sophisticated mathematics. Even those that are closely
tied to handicaps. Years ago I administered such a system. There was a
certain amount of mathematical sophistication in its design, but it was
designed for simple administration by the players themselves. As for
stability, I had to promote everybody by 1/2 stone after 2 years, to
bring us in line with other rankings. Then I tweaked the system to make
it more stable.

For psychological reasons, and for predicting proper handicaps, I would
favor such a system today. But that's not my department. ;-) BTW, I have
noted fewer complaints in recent years about online rating systems, and
besides, they are self-correcting. :-)

Best,

Bill

Barry Phease

Apr 6, 2003, 5:11:30 AM
Bill Spight wrote:


> First, a person's go strength is multi-dimensional, so a single-number
> rating is fuzzy, and not to be taken too seriously, anyway.

I think that this effect is overstated. While there are examples of people
having difficulty with a particular player who is nominally weaker than
they are, in most cases a properly designed rating system performs well at
predicting the result of any particular game.

>
> Second, the connection between ratings systems and statistical mechanics
> is debatable.

Not if the rating system is based on statistical mechanics. :)

You can design ratings systems that ignore statistical mechanics but they
usually turn out to be approximations of what the statistical model would
give (if they are any good).

>
> Third, traditional go rankings are based, not on probabilities, but upon
> handicaps. The translation between rankings and probabilities is not
> obvious. The statistics from European tournaments differ from Shubert's,
> for example. When you define the ratings in terms of probabilities, you
> gain consistency, but attenuate that connection.

It is true that the exact function P1(r1,r2,h) describing the probability of
winning for the average person of any rank r1 when playing the average
person of rank r2 on handicap h is not known. Shubert and the Europeans
have different approximations. Yes! such approximations can deviate from
the traditional ratings in terms of handicaps. The only way to get rid of
any deviation is to make sure that the ratings are based on handicap games
as much as possible. Of course this is true for traditional systems too.
It makes no sense to adjust the ratings for games that are played off
handicap.
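
One common family of approximations for such a P1 (purely illustrative;
neither Shubert's nor the EGF's exact choice) is a logistic curve in
the handicap-adjusted rank difference:

import math

def p1(r1, r2, h, k=0.8):
    # Ranks as numbers, larger = stronger; each handicap stone is
    # treated here as one rank of compensation for the weaker player.
    d = (r1 - r2) - h
    return 1.0 / (1.0 + math.exp(-k * d))

print(round(p1(10, 8, 2), 2))  # ~0.5 for a "correct" handicap game

Different choices of k are exactly what separate the AGA-style 83%
figure from the European 54% figure quoted earlier in the thread.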

> From what I have heard
> over the years, online rank differences tend to be narrower than
> traditional rank differences, so that Black is favored in handicap
> games. For that reason, I have heard, players have become reluctant to give
> handicaps. So ratings are based primarily upon even
> games, perpetuating the disconnect with traditional rankings. (I do not
> know if that is still the case, if it ever truly was. I'm just reporting
> what I heard.)

It was my experience with IGS. It seems to be less of a problem on KGS
where more handicap games are played.

>
> Fourth, there is the question of the stability of the system.

There is no automatic solution to this, apart from making the world's
strongest player always 9 dan. There is some evidence that the levels
where most people get bogged down are the same, and this could also be used
to match up different rating systems (or different times). However there
is the possibility that the plateaus in strength might depend on the
environment that people are learning in.

>
> Fifth, all reasonable ratings systems are self-correcting.

This is true, but they correct to the games that are being played. If there
are distinct populations, then they drift apart. If only even games are
played, then they only correct for even games, etc.

--
Barry Phease

mailto://bar...@es.co.nz
http://homepages.ihug.co.nz/~barryp

JKP

Apr 6, 2003, 7:34:01 AM
> I see no other way a rating system could work; indeed, even if a rating
> system is non-algorithmic, at heart it's going to boil down to "he's
> consistently beating 3 kyus, so he's 2 kyu" - which is the same thing
> without the algebraic coating. As such, I await the publication of your
> "non-statistical" system with baited breath...
>
> - Andrew
>


Well, I am perfectly happy with the KGS system, and as far as I can see, the
only case where this fearsome 'fuzzy maths' of probabilities (I am a
mathematician myself) could cause much controversy is if someone has not
played a reasonable number of games against both weaker and stronger players.
And in that case I would say it's the player's fault, not the system's.

Anyway, just for the sake of discussion I suggest a simple system with no
explicit reference to probabilities:

---
A player's rank Z is the highest value Z such that the player has won more
even games than lost against players of rank Z or higher.
---

This needs of course some additional details:

1. Handicap games are converted to even games by modifying the opponent's
perceived rank according to the handicap.

2. The opponent ranks used in the calculation are the ones that prevailed
when the game in question was started, so that simultaneous adjustment of
both ranks doesn't complicate things.

3. One needs some way to make the system more responsive to player
development. One could, e.g., use only the last 30 games, or the games
within the last 2 weeks, whichever is the bigger set. Or there could be a
more sophisticated weighting system based on how old a result is. This
part is not very easy to solve well, but the same problem is there with
any system.
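
For concreteness, a minimal sketch of the basic rule in Python (my
reading of it: kyu numbers, where smaller = stronger, with handicap
games already converted per detail 1):

def rank_kyu(results):
    # results: list of (opponent_kyu, won) pairs. Returns the
    # strongest Z (smallest kyu number) such that the player has more
    # wins than losses against opponents of rank Z or stronger.
    for z in sorted({opp for opp, _ in results}):  # strongest first
        wins = sum(1 for opp, won in results if opp <= z and won)
        losses = sum(1 for opp, won in results if opp <= z and not won)
        if wins > losses:
            return z
    return None  # no rank established yet

# Beats a 20k three times, loses to a 10k twice: the rule says 20k.
print(rank_kyu([(20, True), (20, True), (20, True),
                (10, False), (10, False)]))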


Maybe someone with more experience or theoretical understanding could
explain why more complicated systems are needed. I am not saying there
couldn't be major problems that I don't see. Maybe they relate to the
stability of the system, or something else ...

juhop, KGS 23 kyu

JKP

Apr 6, 2003, 9:39:21 AM
> Anyway, just for the sake of discussion I suggest a simple system with no
> explicit reference to probabilities:
>
> ---
> A player's rank Z is the highest value Z such that the player has won
> more even games than lost against players of rank Z or higher.
> ---
>

...


> Maybe someone with more experience or theoretical understanding could
> explain why more complicated systems are needed. I am not saying there
> couldn't be major problems that I don't see. Maybe they relate to the
> stability of the system, or something else ...

I'll answer my question myself. I think that this system would be OK, if the
only purpose is to establish a conservative rank for a single player, given
an existing population of players who already have correct ranks. But it
would not work in the dynamic situation where there's a pool of players with
constantly changing rankings.

I'll try to give a simple example. Say there are two players in the system
already, with ranks 10k and 20k. Now add a player who consistently beats the
20k player and consistently loses to the 10k player. My system would give
the new player a rank of 20k, while he could actually be a 13k, say. This
is conservative, but basically it would be fair so far, I would say. However,
if we now start adding players, they will also become under-ranked when
judged against the first under-ranked new player. So there will probably be
a constant deflation of the ranks when new players arrive. Therefore the
system must not be conservative; it should give an unbiased rank estimate
given the games played so far.

Well, someone might suggest giving the player a ranking of 15k. But what
if he wins 95% of his games against the 20k player and loses 70% of his
games against the 10k player? It is again obvious that 15k would probably
be too conservative. What to do then? To give a reasonable ranking, one
needs to use some kind of probabilistic reasoning and elaborate
statistical models. And the system needs to be usable when there are not 3
but 3000 players. It seems there's just no way to avoid building
statistical models that not everyone will ever understand in detail.

Well, I am sure this is all perfectly obvious to anyone who has thought
about all this before, but maybe my dialogue with myself will help someone
who has not ... :)

Cheers,

juhop


kogo

Apr 6, 2003, 2:04:41 PM
Well, it seems there is some interest in this topic. It would take too long
to explain all facets of the problem, but here are some major issues and
hints.

The standard probabilistic statistical analysis on which everyone seems so
keen is suitable for ranking players, but inappropriate for rating players
on an ongoing basis in a go server environment. A rank is an ordinal,
whereas a rating is a statement of a player’s skill level. The basic
problem is that a probabilistic ranking system is only a ranking system,
not a skill rating system.

The root problem with the probabilistic ranking system is its bounded
nature: there is no good way to promote players at the top rating. The
effect is to keep players at their same rank: suppression of promotion.

Araldo van de Kraats hints at another part of the problem with using
probabilistic ranking systems as an ongoing rating system:

“Since you are 8k, if you play another 8k (like me) in an even game,
you should win roughly 50% of the time.”

“About 50% - 54% - 60% instead of 50% - 83% - 97%.”

An 8k playing a 9k should be an even match (50%) at proper handicap (no
komi). That’s what the handicap system in go is for. Likewise 8k playing 10k
with a two-stone handicap. There’s something seriously wrong, purely from a
statistical viewpoint, with numbers that show stronger players consistently
beating weaker players when the right handicap is applied: the handicap is
supposed to account for difference in player strength. What’s wrong, in the
large, is suppression of promotion.

There’s an even bigger issue with suppression of promotion.

Analogy - Question: what’s wrong with a computer interface comprising 2 or 3
letter text commands? It’s what computer interfaces were for decades.
Answer: it’s non-intuitive. What is better are today’s ubiquitous graphical
user interfaces, which roughly conform to an operational paradigm of an
office; most importantly, the interface interacts and responds in a way
consistent with natural user experience and expectations.

It took a major paradigm shift to recognize the problem with command-based
computer interfaces, and another shift to get to the solution. Same with the
go rating system issue. If you don’t think there is a problem, you won’t
think a solution is needed.

A major part of the problem with the probabilistic approach for go ratings
is user experience. In traditional (non-computerized) rating systems, a
player was promoted/demoted based upon win/loss record. A player had a sense
that it was time for a promotion, because he was winning most of his games.
An overambitious player might promote himself early, then backpedal to his
old rating because he wasn’t as strong as he hoped. That was about the only
time demotion occurred. Now it’s not uncommon on IGS or KGS to be playing
for quite some time at a rating and suffer demotion, or be winning most
games, and still not be promoted. That is, most often, inherently wrong. It
is true that players occasionally suffer losing streaks, but generally,
active players become more skillful over time; that’s the nature of human
experience. Suppression of promotion, caused by probabilistic ranking
systems, which can only rank players, not rate them, utterly fails to
recognize advancement in skill.

As a goal, a rating system should be predictable to a player. If you don't
think that is a worthwhile goal, then analogously you are in the camp with
the command line interface crowd of old, and the possibility of religious
conversion is remote. With a good rating system, a player would know how
close he is to promotion (or demotion). The solution is a rating system
based on win/loss record.

I contend that there are more problems with the current probabilistic
ranking systems than there would be with a win/loss record rating system.
The most important benefit of a win/loss rating system is that players, by
knowing where they stand at their current rating, and having a humanly
predictable rating system, would have the feeling of being in control of
their rating, rather than being dictated to by a humanly indecipherable
ranking system.

- kogo


Bill Spight

Apr 6, 2003, 2:35:09 PM
Dear Kogo,

> An 8k playing a 9k should be an even match (50%) at proper handicap (no
> komi). That’s what the handicap system in go is for. Likewise 8k playing 10k
> with a two-stone handicap. There’s something seriously wrong, purely from a
> statistical viewpoint, with numbers that show stronger players consistently
> beating weaker players when the right handicap is applied: the handicap is
> supposed to account for difference in player strength. What’s wrong, in the
> large, is suppression of promotion.

If traditional handicaps are used, the reason is simple. They give a 1/2
stone advantage to White.

> Now it’s not uncommon on IGS or KGS to be playing
> for quite some time at a rating and suffer demotion, or be winning most
> games, and still not be promoted. That is, most often, inherently wrong.

It is certainly disconcerting. It also seems to be an effect of using a
model based upon statistical mechanics. It is also unnecessary, as
rating systems without such behavior can be devised, which may yield
slightly poorer predictions of results in the short run, but which track
the players' skills better. It appears that current rating systems aim
to predict the results of an even game between any 2 rated players. That
aim is different from the one of charting the players' skill levels, and
how they change over time. The latter is actually a more difficult goal
to achieve, but one which most players expect, I believe.

> It
> is true that players occasionally suffer losing streaks, but generally,
> active players become more skillful over time; that’s the nature of human
> experience. Suppression of promotion, caused by probabilistic ranking
> systems, which can only rank players, not rate them, utterly fails to
> recognize advancement in skill.

One problem is that players who believe that they are significantly
underrated may register under a new name and start off under that name
with a higher provisional rating. That may solve their personal
problems, but it masks any problem of underpromotion in the system.

Ciao,

Bill

Patrick Bridges

Apr 6, 2003, 2:18:38 PM
"kogo" <ko...@waterfire.us> writes:

> The standard probabilistic statistical analysis on which everyone
> seems so keen is suitable for ranking players, but inappropriate for
> rating players on an ongoing basis in a go server environment. A rank
> is an ordinal, whereas a rating is a statement of a player's skill level.

So, what you're saying is that currently servers only have "ranking
systems", and that to have a "rating system", a server would have to
be able to assess the level of a player's skills, as opposed to just
ranking players relative to one another. Right?

How do you propose a server do that without making relative
comparisons between players? How can a computer reasonably assess the
level of a player's skills independently of other players if it has
no such skills itself? If a server can only measure those skills
relative to other players, then it is again a "ranking system", is it
not? If it is not possible for a server to assess the level of a
player's skills, then servers can *only have* "ranking
systems".

You've already ceded that KGS and IGS have suitable "ranking systems",
so I really don't see what your beef is until you can put forth a
concrete way for servers to systematically and objectively assess the
level of a player's skill without making relative comparisons between
players.

--
Patrick G. Bridges bri...@cs.unm.edu GPG ID = CB074C71
GPG fingerprint = FEEA ECFF 1E23 148C 2804 FDD9 DB63 6993 CB07 4C71

"Anyone that can't make money on Sports Night should get out of the
money-making business" - Calvin, on the last episode of Sports Night

Patrick Bridges

Apr 6, 2003, 9:41:58 PM
"kogo" <ko...@waterfire.us> writes:

> The root problem with the probabilistic ranking system is its bounded
> nature: there is no good way to promote players at the top rating. The
> effect is to keep players at their same rank: suppression of promotion.

Why? Mathematically, there's no reason for this that I can see. The
fact that you assert it makes me doubt that you actually understand
the systems IGS and KGS use.

Current servers (both IGS and KGS) and national systems (e.g. AGA)
have a cap on the *displayed* rating, but computed ratings aren't
capped, at least on KGS. For example, there are players on KGS whose
rank is well above 7d, even though the highest rating the server
displays right now is 7d. If those players keep increasing the
handicaps they give to other very strong players, their "ranking" will
continue to climb. Even if they don't, it can continue to climb; if a
player consistently beats other "real" 7ds, the odds of them not
having a single loss in more and more games (e.g., the player mymy on
KGS) get less and less, and this pushes their computed rating higher
and higher.

The fact that mymy's displayed rank is still 7d is a technical
shortcoming of the KGS client/server, not its "ranking system".

Patrick Bridges

Apr 6, 2003, 9:46:15 PM
"kogo" <ko...@waterfire.us> writes:

> I contend that there are more problems with the current probabilistic
> ranking systems than there would be with a win/loss record rating system.
> The most important benefit of a win/loss rating system is that players, by
> knowing where they stand at their current rating, and having a humanly
> predictable rating system, would have the feeling of being in control of
> their rating, rather than being dictated to by a humanly indecipherable
> ranking system.

Okay, why don't you define appropriate metrics for evaluating rating
and ranking systems (predictability, stability, accuracy, etc.),
implement a win-loss rating/ranking system, get the KGS pool of
results, and demonstrate that your proposal would in fact be
superior. If you can do that instead of just bad-mouthing everyone
else's hard work from the sidelines, there might actually be a chance
your proposal would be used.
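
For example, one concrete metric of this kind (my sketch, not an
established KGS benchmark) is the mean log-loss of a system's win
predictions over historical games; lower is better, and it rewards
well-calibrated probabilities rather than lucky guesses:

import math

def log_loss(games):
    # games: list of (p, won) where p is the predicted probability
    # that player A wins and won is 1 if A actually won, else 0.
    return -sum(w * math.log(p) + (1 - w) * math.log(1.0 - p)
                for p, w in games) / len(games)

print(log_loss([(0.69, 1), (0.69, 0), (0.5, 1)]))

Any proposed win/loss system could be scored the same way against the
same game records.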

-

Apr 7, 2003, 1:08:23 PM

"kogo" <ko...@waterfire.us> wrote:
> Well, it seems there is some interest in this topic. It would take too long
> to explain all facets of the problem, but here are some major issues and
> hints.


I'm gladdened to find your reply. Hopefully, you won't get discouraged.


> The standard probabilistic statistical analysis on which everyone seems
> so keen is suitable for ranking players, but inappropriate for rating
> players on an ongoing basis in a go server environment.


What qualitative differences occur for "a go server environment" to
render those discussions about the relative merits/demerits of ranking
or rating systems distinct from discussions that might concern "a non-go
server environment"? Why shouldn't max-ent be applied outside the
context of "a go server environment" ? It's a statistical procedure,
right? Why has the topic of "go server environment" become mixed
up with your thesis concerning how ranking or rating systems operate?


> A rank is an ordinal,


An ordinal of what?


> ... whereas a rating is a statement of a player’s skill level. The basic
> problem is that a probabilistic ranking system is only a ranking
> system, not a skill rating system.


Now you posit a discorrelation between ranks and skill? Please
elaborate. What are ranks providing (other than the honorary ranks)?


> The root problem with the probabilistic ranking system is its bounded
> nature: there is no good way to promote players at the top rating. The
> effect is to keep players at their same rank: suppression of promotion.


Yet this (alleged) "root problem" is administrative, rather than
something chaotic. Administrative difficulties are easily resolved
because administrators can be placed under operation of committees
who form assessments and evaluations based upon empirical data
considerations, and can be elected representatives by a democracy,
if not the owners of some enterprises servicing the general community.
The rate of migration upwards from the pigeonholed rank is sufficiently
slow so as to offer little by way of administrative tasking. The higher
ranked players simply report upward mobility according to external
systems regarded as authoritative, or are periodically queried by use
of email conveniences. A small downward movement in drift-factor
may alert the wary administrator(s) of the need for high-ranking review.
Since this was the obvious point-of-attack, and critique, considerable
attention is devoted to some high-ranking players, to monitor accuracy.


> Araldo van de Kraats hints at another part of the problem with using
> probabilistic ranking systems as an ongoing rating system:
>
> “Since you are 8k, if you play another 8k (like me) in an even
> game, you should win roughly 50% of the time.”
>
> If you are playing a 9k in an even game, however, the model
> predicts that it is more likely that you will win. In the AGA model,
> it's about 83% of the time. If you are playing a 10k in an even game,
> you should win 97% of the time.”
>
> These numbers are quite high compared with the European tournament
> statistics: http://www.european-go.org/rating/statev.html
>
> About 50% - 54% - 60% instead of 50% - 83% - 97%.”


So the effect of bell-curve sharpening is to adjust for quick
movements in ratings, and to accord more weight to the "upsets"?
Wasn't it instead your complaint that players do not progress at a
rate sufficiently fast to keep commensurate with study improvement?


> An 8k playing a 9k should be an even match (50%) at proper handicap (no
> komi). That’s what the handicap system in go is for. Likewise 8k playing 10k
> with a two-stone handicap. There’s something seriously wrong, purely from a
> statistical viewpoint, with numbers that show stronger players consistently
> beating weaker players when the right handicap is applied: the handicap is
> supposed to account for difference in player strength. What’s wrong, in the
> large, is suppression of promotion.


Similarly, did you offer data for comparative statistics on
handicapped games, rather than the comparisons on the even games?
Now the "model" (of which you speak) being applied on the max-ent systems
is NOT reflective of expectations or reality, but consists of an adjusted
curve to either accelerate or decelerate sensitivities. If the curve
were to be "flattened" as you suggest, according to the empirical data
from European tournaments, then progression would be slower than now.
Is that what you prefer, or would you prefer the more rapid progression?


> There’s an even bigger issue with suppression of promotion.
>
> Analogy - Question: what’s wrong with a computer interface comprising 2 or 3
> letter text commands? It’s what computer interfaces were for decades.
> Answer: it’s non-intuitive. What is better are today’s ubiquitous graphical
> user interfaces, which roughly conform to an operational paradigm of an
> office; most importantly, the interface interacts and responds in a way
> consistent with natural user experience and expectations.
>
> It took a major paradigm shift to recognize the problem with command-based
> computer interfaces, and another shift to get to the solution. Same with the
> go rating system issue. If you don’t think there is a problem, you won’t
> think a solution is needed.


Eh? How does Text vs. GUI connect with ranking/rating/promotions?
Am I about to ask if you're cracking up?


> <snippage>


>
> As a goal, a rating system should be predictable to a player. If you don't
> think that is a worthwhile goal, then analogously you are in the camp with
> the command line interface crowd of old, and the possibility of religious
> conversion is remote. With a good rating system, a player would know how
> close he is to promotion (or demotion). The solution is a rating system
> based on win/loss record.


Seems to be the "solution" currently in application ...


> I contend that there are more problems with the current probabilistic
> ranking systems than there would be with a win/loss record rating system.
> The most important benefit of a win/loss rating system is that players, by
> knowing where they stand at their current rating, and having a humanly
> predictable rating system, would have the feeling of being in control of
> their rating, rather than being dictated to by a humanly indecipherable
> ranking system.


Let's examine issues of progression speed, and bell-curve sharpness.


- regards
- jb

-------------------------------------------------------------
"If there were always the truth in media,
there would be no need for the Truth in Media."
http://www.truthinmedia.org/
-------------------------------------------------------------

Chris Schack

Apr 3, 2003, 12:13:59 PM
In article <556ed15a.03040...@posting.google.com>,
ch...@prosum.com (Christopher Hayashida) wrote:

<snip>

>Well, the point of the post, before wandering off into the statistics,
>was that the person who was on the winning streak benefits less from
>underhandicapped games. It still holds true, despite the errors in my
>explanation.

Of course, you have to feel sorry for the rank beginner I lost to the
other day. They're going to get beaten up pretty badly until their
rating corrects itself, all because I was on a catastrophic losing
streak and simply missed an atari of a big group. It's bad when you
lose 2/3 of your stones on the board...

Chris Schack

Jenny Radcliffe

Apr 7, 2003, 1:13:39 PM
"Bill Spight" <Xbsp...@pacbell.net> wrote

Yes, but Roy specified "without reference to statistics or probability", and
I would consider an "average" to be a statistic, myself.

Jenny - applied mathematician


Jenny Radcliffe

Apr 7, 2003, 1:17:54 PM
"Patrick Bridges" <bri...@cs.unm.edu> wrote

> "kogo" <ko...@waterfire.us> writes:
> > The standard probabilistic statistical analysis on which everyone
> > seems so keen is suitable for ranking players, but inappropriate for
> > rating players on an ongoing basis in a go server environment. A rank
> > is an ordinal, whereas a rating is a statement of a player's skill level.
> So, what you're saying is that currently servers only have "ranking
> systems", and that to have a "rating system", a server would have to
> be able to assess the level of a player's skills, as opposed to just
> ranking players relative to one another. Right?

But that's meaningless.

A "rating" (by kogo's definition) cannot exist in isolation, and is only
meaningful when there are two or more, to make a relative system - i.e. a
ranking ...

There is no way of measuring - for human or for computer - a person's
Go-playing skill, except by comparison with another player.

It just doesn't make sense.


Patrick Bridges

Apr 7, 2003, 3:02:58 PM