Thoughts on FIBS rating system

Lloyd V. Lawrence

Aug 3, 1995, 3:00:00 AM

Some Thoughts on the FIBS Rating System

First, as a newbie to FIBS, let me apologise if I am
reinventing the wheel, and also reveal that I am neither a
mathematician nor statistician.

The rating system is useful, sophisticated, and confusing.
I have had more than one opponent exclaim because their
rating did not change by five points after they had won a
five point match. And, because the explanation in the FIBS
help is exact, though perhaps terse, I wrote a little
program in FORTRAN to see what the rating changes would be
given equal experience and with match lengths running up to
ten with rating differences from equal up to 500.

The formula in the help is not particularly easy to
understand (codedaddy and joshu helped with this).

NAME
formula - The formulas used to calculate rating changes

DESCRIPTION
These are the formulas used to determine the ratings of a player:
Let's say that two players P1 and P2 were playing a n-point match.
The ratings of the players are r1 for P1 and r2 for P2 .

Let D = abs(r1-r2) (rating difference)
(D is easy: subtract the smaller rating from the larger. Humans do
this easily; it's trickier with computers.)

Let P_upset = 1/(10^(D*sqrt(n)/2000)+1) (probability that underdog wins)
Let P=1-P_upset if the underdog wins and P=P_upset if the favorite wins.
I do not understand the rationale behind the probability computation, but
it does produce a number less than one for all values I tried.

One problem immediately arose: the use of 'experience.'

The 'experience' of a player is the sum of the lengths of all matches
a player has finished. Every player starts with a rating of 1500 and
an experience of 0.

For the winner:
Let K = max ( 1 , -experience/100+5 )
The rating change is: 4*K*sqrt(n)*P
For the loser:
Let K = max ( 1 , -experience/100+5 )
The rating change is: -4*K*sqrt(n)*P

Put in normal language, the formula says 'compute the following:

Take the player's experience (the total length of all matches
finished so far), and divide by 100.
Then subtract that number from 5.

If the result is greater than 1,
then take that value for K.
If not, take 1 for K.'

This gives players with experience of less than 400 an
adjustment to the rating change. With no experience, K
is 5; as experience grows, it shrinks to 1 at an experience
of 400, and stays at 1 thereafter. When K is equal to 1 for both
players, the winner gains exactly what the loser loses.
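
The help-file formula and the K rule can be collected into a few lines of Python. This is only a sketch: the function names (p_upset, k_factor, rating_change) are made up for illustration and are not part of FIBS, and it assumes each player's experience is the value held before the match is scored, which the help text does not spell out.

```python
import math

def p_upset(diff, n):
    """Probability that the underdog wins an n-point match (help-file formula)."""
    return 1.0 / (10.0 ** (abs(diff) * math.sqrt(n) / 2000.0) + 1.0)

def k_factor(experience):
    """Experience multiplier: 5 for a new player, falling to 1 at experience 400."""
    return max(1.0, 5.0 - experience / 100.0)

def rating_change(r_winner, r_loser, n, exp_winner=400, exp_loser=400):
    """Return (winner's change, loser's change) for one finished match."""
    upset = p_upset(r_winner - r_loser, n)
    # P is 1 - P_upset when the underdog wins, P_upset when the favorite wins.
    p = 1.0 - upset if r_winner < r_loser else upset
    base = 4.0 * math.sqrt(n) * p
    return k_factor(exp_winner) * base, -k_factor(exp_loser) * base

if __name__ == "__main__":
    print(rating_change(1500, 1600, 1))    # underdog wins a 1-point match
    print(rating_change(1600, 1500, 10))   # favorite wins a 10-point match
    print([k_factor(e) for e in (0, 200, 400, 1000)])
```

With both experiences at 400 or more, the two returned numbers are equal and opposite; with a newcomer on one side they are not, because each player's change is scaled by that player's own K.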

Since FIBS is addictive, and players who are hooked quickly
reach an experience of 400, I decided to eliminate K from the program,
and considered only the situation of players with experience of
greater than 400 (K = 1).

One question I was interested in answering was 'How can I
maximize my chances of improving my rating?' Would it be
better to play 10 matches of, say, 10 games, or to play 100
matches of 1 game against a player rated 100 points higher than
me?
(The data follow the body of this note.)

For one game matches, the probability of upset is 0.471. Out of
100 such matches, the underdog will win 47.1 and lose 52.9 (I know you
can't win a tenth of a game).

The rating change for one game matches is 2.12 (that's 2.12
per game) if the underdog wins, and 1.88 if the favorite
wins.

Multiply 47.1 by 2.12 = 99.85 points won by underdog
Multiply 52.9 by 1.88 = 99.45 points lost by underdog
Net rating change = 0.40 gain for the underdog.

Playing 100 matches of 10 games each, the underdog will win 41.0 and
lose 59.0. The rating change for a 10 game match is 7.46 if
the underdog wins, and 5.19 if the favorite wins.

Multiply 41.0 by 7.46 = 305.86 points won by underdog
Multiply 59.0 by 5.19 = 306.21 points lost by underdog
Net rating change = 0.35 loss for the underdog.

That's interesting. As Henry Morgan said, lose a million
here, a million there, and pretty soon, you're talking real
money. In the long run, it seems that the underdog should
seek short matches, and the favorite longer.

One concluding remark. If you want a reputation as a chess
player, join the most prestigious club you can -- like the
Steiner, in Hollywood. Observe -- do not play. Figure out
who the current champ is. Play him one game for a
substantial wager. If you win, return to the club, but
never play again; if you lose, never go back. Think of the
gossip if you win: "See that guy? He played so-so for
thousands, and beat him so badly, he won't even bother with
us."

The source code and the results are listed below for anyone curious
enough to follow up.

(Adverse comments will be read. Praise will be cherished)

elvee

$$=======
Results of ratings computations.

difference 0
   n   pupset   FAVOR WINS   UNDER WINS
   1     .500         2.00         2.00
   2     .500         2.83         2.83
   3     .500         3.46         3.46
   4     .500         4.00         4.00
   5     .500         4.47         4.47
   6     .500         4.90         4.90
   7     .500         5.29         5.29
   8     .500         5.66         5.66
   9     .500         6.00         6.00
  10     .500         6.32         6.32

difference 100
   n   pupset   FAVOR WINS   UNDER WINS
   1     .471         1.88         2.12
   2     .459         2.60         3.06
   3     .450         3.12         3.81
   4     .443         3.54         4.46
   5     .436         3.90         5.04
   6     .430         4.21         5.59
   7     .424         4.49         6.09
   8     .419         4.74         6.57
   9     .415         4.97         7.03
  10     .410         5.19         7.46

difference 200
   n   pupset   FAVOR WINS   UNDER WINS
   1     .443         1.77         2.23
   2     .419         2.37         3.28
   3     .402         2.78         4.15
   4     .387         3.09         4.91
   5     .374         3.35         5.60
   6     .363         3.55         6.25
   7     .352         3.73         6.86
   8     .343         3.88         7.44
   9     .334         4.01         7.99
  10     .326         4.12         8.53

difference 300
   n   pupset   FAVOR WINS   UNDER WINS
   1     .415         1.66         2.34
   2     .380         2.15         3.51
   3     .355         2.46         4.47
   4     .334         2.67         5.33
   5     .316         2.83         6.12
   6     .300         2.94         6.86
   7     .286         3.03         7.55
   8     .274         3.09         8.22
   9     .262         3.14         8.86
  10     .251         3.18         9.47

difference 400
   n   pupset   FAVOR WINS   UNDER WINS
   1     .387         1.55         2.45
   2     .343         1.94         3.72
   3     .311         2.15         4.78
   4     .285         2.28         5.72
   5     .263         2.35         6.59
   6     .245         2.40         7.40
   7     .228         2.42         8.17
   8     .214         2.42         8.90
   9     .201         2.41         9.59
  10     .189         2.39        10.26

difference 500
   n   pupset   FAVOR WINS   UNDER WINS
   1     .360         1.44         2.56
   2     .307         1.74         3.92
   3     .270         1.87         5.06
   4     .240         1.92         6.08
   5     .216         1.93         7.01
   6     .196         1.92         7.88
   7     .179         1.89         8.69
   8     .164         1.86         9.46
   9     .151         1.81        10.19
  10     .139         1.76        10.89
$$============

PROGRAM fibs
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! A program to explore the FIBS rating formula
!
! WRITTEN BY; L. V. Lawrence
! VERSION 1.0
! DATE: 9 July
!
!!!!!!!!!!!!!!
! Formula taken from on-line help in FIBS, July 1995.
!
!DESCRIPTION These are the formulas used to determine the ratings of a player:
! Let's say that two players P1 and P2 were playing a n-point match.
! The ratings of the players are r1 for P1 and r2 for P2 . Let D = abs(r1-r2) (rating difference)
! Let P_upset = 1/(10^(D*sqrt(n)/2000)+1) (probability that underdog wins)
! Let P=1-P_upset if the underdog wins and P=P_upset if the favorite wins.
!
! For the winner:
! Let K = max ( 1 , -experience/100+5 )
! The rating change is: 4*K*sqrt(n)*P
! For the loser:
! Let K = max ( 1 , -experience/100+5 )
! The rating change is: -4*K*sqrt(n)*P
!
! The 'experience' of a player is the sum of the lengths of all matches
! a player has finished. Every player starts with a rating of 1500 and
! an experience of 0.
!
!!!!!!!!!!!!!!!!!
!print format
5 FORMAT ( I4, 5X, F4.3, 2X, (F6.2), 2X, (F6.2))
!
DOUBLE PRECISION experience, D ,pupset, P, realn, K, favwin, underwin
INTEGER difference, n
PARAMETER (experience = 500.0)
! assume equal experience; it's not really needed, since K is set to 1.
! Since K will go to one for players with experience greater than 400,
! there is no need to figure it.
!see SUBROUTINE testexp
PARAMETER (K = 1.0)
!
! D is real version of difference in ratings
! pupset is probability of upset
! P is set depending on winner ( read the online help)
! realn is conversion of number of games in match to double precision
! K is used in the computation (again, look at the help)
! experience is assumed equal.
! difference and n are used as counters for the loops.
!!!!!!!!!!!!!!!!!!!!!!!!
!to check and see how experience changes, run the subroutine.
! CALL testexp
OPEN (UNIT = 5, FILE = 'ratings')
DO difference = 0, 500, 100
PRINT *, 'difference', difference
PRINT *, ' n pupset FAVOR UNDER'
PRINT *, ' WINS WINS'
! loop through rating differences from 0 to 500 by 100s
WRITE (5, *) 'difference', difference
WRITE (5, *) ' n pupset FAVOR UNDER'
WRITE (5, *) ' WINS WINS'
DO n = 1, 10
! loop through the number of games from 1 to 10
D = DBLE (difference)
realn = DBLE (n)
! P_upset = 1 / (10 ^ (D * sqrt (n ) / 2000 ) + 1)
pupset = 1.0 / (10.0 ** (D * sqrt (realn) / 2000.0) + 1.0)
! Let P=1-P_upset if the underdog wins and P=P_upset if the favorite wins.
!favorite wins

P = pupset
favwin = 4.0 * K * sqrt (realn) * P
!underdog wins
P = 1.0 - pupset
underwin = 4.0 * K * sqrt (realn) * P
PRINT 5, n, pupset, favwin, underwin
WRITE(5, 5) n, pupset, favwin, underwin
END DO
PAUSE
END DO
CLOSE (5)
END
!!!!!!!!!!!!!!!!!!
SUBROUTINE testexp
! a little test to show how K varies with experience.
! K starts at 5 for players with no experience,
! and drops to 1 as experience approaches 400.
!(thanks to codedaddy and Anders Nielsen)
real experience, k
integer step
PRINT *, ' experience K'
! integer loop counter: REAL loop variables are not portable
do step = 0, 450, 50
experience = real (step)
! both arguments of max must have the same type, hence 1.0
k = max (1.0, -experience / 100. + 5.)
print *, experience, k
end do
pause
return
end
!!!!!!!!!!!!!!!
!end of source code.

Robert D. Johnson

Aug 3, 1995, 3:00:00 AM

I did the following UNIX shell script last year.
It outputs your expected odds and rating change.
---------------
#!/bin/sh
if [ $# -ne 3 ]; then
echo SYNTAX: `basename $0` YourRating OpponentRating MatchPoints
exit 1
fi
# Assume experience is at least 400 for both players.
echo $@ | awk '{\
r1= $1; r2 = $2; n = $3; D = r2-r1;
P = 1/(10^(D*sqrt(n)/2000)+1);
K = 1;
Ra = 4*K*sqrt(n)*(1-P);
Rb = -4*K*sqrt(n)* P ;
printf("k FIBS odds: ");
if ( Ra > -Rb ) printf("1 to %3.1f",-Ra/Rb);
else printf("%3.1f to 1",-Rb/Ra);
printf(" (%2.0f%% to win %3.2f pts, ", P *100, Ra);
printf("%2.0f%% to lose %3.2f pts)\n",(1-P)*100, -Rb);
}'
--------------
Robert D. Johnson MAIL: rjoh...@cvbnet.cv.com FIBS: rjohnson

gha...@dsm1.dsmnet.com

Aug 4, 1995, 3:00:00 AM

In <3vrajs$3...@newsreader.wustl.edu>, lvla...@artsci.wustl.edu (Lloyd V. Lawrence) writes:
>
>
>Some Thoughts on the FIBS Rating System
>
>First, as a newbie to FIBS, let me apologise if I am
>reinventing the wheel, and also reveal that I am neither a
>mathematician nor statistician.
>
>The rating system is useful, sophisticated, and confusing.
>I have had more than one opponent exclaim because their
>rating did not change by five points after they had won a
>five point match. And, because the explanation in the FIBS
>help is exact, though perhaps terse, I wrote a little
>program in FORTRAN to see what the rating changes would be
>given equal experience and with match lengths running up to
>ten with rating differences from equal up to 500.
>

I have a simple way of thinking about ratings that may help some people.
The basic formula for ratings changes is: 4*k*p*squareroot(n), where
k is the experience multiplier, p is the probability from the formula
(the chance, before the match, that the eventual loser would win), and n
is the number of points in the match.

When you get an experience of 400 or greater, k is equal to 1, so the formula
becomes: 4*p*squareroot(n).

For most people that you play, p will be approx 0.5, so the formula becomes:

2 * squareroot(n)

This is a simple formula, and approximates the rating change fairly well
for people with an experience >= 400.

So, here is the approx rating change for matches of different lengths:

match length    approx ratings change
     1               2 points
     4               4 points
     9               6 points
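
To see how far the 2*squareroot(n) rule of thumb drifts from the exact figures, here is a small Python comparison for a rating difference of 100 with k = 1. It is a sketch only; exact_changes and approx_change are names invented for this illustration.

```python
import math

def exact_changes(diff, n):
    """(favorite-wins, underdog-wins) rating change from the FIBS formula, k = 1."""
    upset = 1.0 / (10.0 ** (diff * math.sqrt(n) / 2000.0) + 1.0)
    scale = 4.0 * math.sqrt(n)
    return scale * upset, scale * (1.0 - upset)

def approx_change(n):
    """Rule of thumb: treat p as 0.5, so the change is 2 * sqrt(n)."""
    return 2.0 * math.sqrt(n)

for n in (1, 4, 9):
    fav, under = exact_changes(100, n)
    print(f"{n:2d}-point match: approx {approx_change(n):.2f}, "
          f"exact {fav:.2f} (favorite wins) / {under:.2f} (underdog wins)")
```

The approximation always sits midway between the two exact values, and the gap widens as the rating difference or the match length grows.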

Thanks,

gha...@dsmnet.com
ghandy on FIBS

Christopher Yep

Aug 6, 1995, 3:00:00 AM

In article <3vrajs$3...@newsreader.wustl.edu>,
Lloyd V. Lawrence <lvla...@artsci.wustl.edu> wrote:

|> [...]

^^^^^^^^^

|>Playing 10 game matches, the underdog will win 41.0, and
|>lose 59.0. The rating change for a 10 game match is 7.46 if
|>the underdog wins, and 5.19 if the favorite wins.
|>
|>multiply 41.0 by 7.46 = 305.86 points won by underdog
|>multiply 59.0 by 5.19 = 306.21 points lost by underdog
|>net rating change = - 0.35 loss for the underdog.

^^^^^^^^^^^^

|>That's interesting. As Henry Morgan said, lose a million
|>here, a milion there, and pretty soon, you're talking real
|>money. In the long run, it seems that the underdog should
|>seek short matches, and the favorite longer.

The two sections that I careted (^) should both be zero. Your calculations
showed slight positive and negative values due to roundoff error.

The rating system was designed with the goal that accurately rated players
(players at their "true" rating) should have a net ratings expectation of
0.00 pts., over the whole spectrum of possible match lengths (1-99 pt.
matches).
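
The zero expectation can be checked directly. The underdog wins with probability P_upset and gains 4*sqrt(n)*(1-P_upset), and loses with probability 1-P_upset and gives up 4*sqrt(n)*P_upset, so the two products are identical. A short Python sketch (expected_net is a made-up name; K = 1 is assumed) shows this without rounding the intermediate values to two or three digits:

```python
import math

def expected_net(diff, n):
    """Underdog's expected rating change per match when both ratings are accurate (K = 1)."""
    upset = 1.0 / (10.0 ** (diff * math.sqrt(n) / 2000.0) + 1.0)
    gain_if_win = 4.0 * math.sqrt(n) * (1.0 - upset)
    loss_if_lose = 4.0 * math.sqrt(n) * upset
    # The underdog wins with probability `upset` and loses otherwise.
    return upset * gain_if_win - (1.0 - upset) * loss_if_lose

for n in (1, 10):
    print(n, expected_net(100, n))
```

Any value printed is zero up to floating-point noise; the +0.40 and -0.35 figures per 100 matches appear only when the table entries are rounded before multiplying.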

|> [...]

|>The source code and the results are listed below. For anyone curious
|>enough to follow up.
|>
|>(Adverse comments will be read. Praise will be cherished)
|>
|>elvee
|>

|> [Code/results deleted]

You have a keen scientific spirit. Hope you understand the ratings system
better now.

Chris

USRobots

Aug 7, 1995, 3:00:00 AM

Greetings,

Chris Yep wrote:

>The rating system was designed with the goal that accurately rated players
>(players at their "true" rating) should have a net ratings expectation of
>0.00 pts., over the whole spectrum of possible match lengths (1-99 pt.
>matches).

I'm curious about the sqrt(n) term, where n is the length of the match.
How was this term derived initially? Has anyone checked a database of
FIBS matches or arranged an experiment to see if winning chances really do
vary as sqrt(n)? (Let's not consider the obvious problem of 2-point
matches in this discussion...)

Also, a while ago, I saw a post stating that a perfect player could
maximize his/her rating by playing only one-point matches. Is this true?
If so, is it because the term above is imperfect, or because playing
one-point matches increases the variability around a player's "true"
rating, or is there another reason?

Thanks,
USRobots
usro...@aol.com

Christopher Yep

Aug 7, 1995, 3:00:00 AM

In article <40436e$d...@newsbf02.news.aol.com>,
USRobots <usro...@aol.com> wrote:

|>Chris Yep wrote:
|>
|>>The rating system was designed with the goal that accurately rated players
|>>(players at their "true" rating) should have a net ratings expectation of
|>>0.00 pts., over the whole spectrum of possible match lengths (1-99 pt.
|>>matches).

Actually, I don't know if this was doc/marvin's goal, but I assume(d)
that it was, and believe that it should be the goal of any reasonable
rating system.

|>I'm curious about the sqrt(n) term, where n is the length of the match.
|>How was this term derived initially? Has anyone checked a database of
|>FIBS matches or arranged an experiment to see if winning chances really do
|>vary as sqrt(n)? (Let's not consider the obvious problem of 2-point
|>matches in this discussion...)
|>
|>Also, a while ago, I saw a post stating that a perfect player could
|>maximize his/her rating by playing only one-point matches. Is this true?
|>If so, is it because the term above is imperfect, or because playing
|>one-point matches increases the variability around a player's "true"
|>rating, or is there another reason?

I think the implication was based on the idea that the sqrt(n) term is
imperfect. Btw, I was told that Kent Goulding's rating list does not
use the sqrt(n) term, but was based more on the Central Limit Theorem.
Can someone verify this?

Chris

Craig Connell

Aug 8, 1995, 3:00:00 AM

If I understand the rating system, your rating is influenced by the
number of matches won less the number lost, weighted by opponent strength.
Many people think it is the percentage of matches won, weighted by
opponent strength.

This confused me at first because I figured that if you maintained a
50.1% winning percentage your rating would rise simply by playing more
matches - not by increasing your playing skill. (Someone who played 100
matches and maintained a 51% winning percentage would have won 2 matches
more than lost. If he maintained the winning percentage through to 1,000 he
would have 20 more wins than losses. 20 more wins than losses would
produce a higher rating than 2 more wins than losses, but there was no
change in ability)

What I forgot was that in order for your rating to rise, you would have to
maintain the 50.1% winning percentage against increasingly more difficult
players. Otherwise, if you did not play more difficult players and your
winning percentage remained the same, the adjustment for opponent strength
would eventually prevent your rating from rising.
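
This levelling-off can be made concrete. The Python sketch below follows the average rating path of a player who really wins 51% of 1-point matches against opponents who are always rated 1500, with K = 1 throughout. All names are invented for the illustration, and it tracks the expected change per match rather than simulating individual wins and losses.

```python
import math

def predicted_win_prob(rating, opp_rating, n):
    """Win probability the formula assigns to `rating` against `opp_rating`."""
    return 1.0 / (10.0 ** ((opp_rating - rating) * math.sqrt(n) / 2000.0) + 1.0)

def expected_change(rating, opp_rating, n, true_win_rate):
    """Average rating change per match for a player who really wins `true_win_rate`."""
    p = predicted_win_prob(rating, opp_rating, n)
    scale = 4.0 * math.sqrt(n)
    return true_win_rate * scale * (1.0 - p) - (1.0 - true_win_rate) * scale * p

def average_rating_path(true_win_rate, n, matches, start=1500.0, opp_rating=1500.0):
    """Rating after `matches` matches if every match moved it by its expected change."""
    rating = start
    for _ in range(matches):
        rating += expected_change(rating, opp_rating, n, true_win_rate)
    return rating

for matches in (100, 1000, 20000):
    print(matches, round(average_rating_path(0.51, 1, matches), 1))
```

The expected change simplifies to 4*sqrt(n)*(true win rate - predicted win rate), so the climb stops at the rating where the formula itself predicts 51%, roughly 35 points above the opponents, however many more matches are played.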

I suppose these comments are obvious to those who understand the rating
system. It took me a while to understand it.

Craig Connell con...@alpha.fdu.edu

Asger Kring

Aug 9, 1995, 3:00:00 AM

usro...@aol.com (USRobots) writes:

>I'm curious about the sqrt(n) term, where n is the length of the match.
>How was this term derived initially? Has anyone checked a database of
>FIBS matches or arranged an experiment to see if winning chances really do
>vary as sqrt(n)? (Let's not consider the obvious problem of 2-point
>matches in this discussion...)

>Also, a while ago, I saw a post stating that a perfect player could
>maximize his/her rating by playing only one-point matches. Is this true?
>If so, is it because the term above is imperfect, or because playing
>one-point matches increases the variability around a player's "true"
>rating, or is there another reason?

>Thanks,
>USRobots
>usro...@aol.com

I haven't seen the posting you are referring to, but I must say that I
agree. An above-average player can increase his rating by playing shorter
matches, and the reason IMHO is that the rating increase is
proportional to SQRT(n). This basically means that your rating
increase will be the same whether you beat an equal opponent in one
9-point match or in three 1-point matches. It would be nice to have some
statistics regarding this, to examine whether the rating change should be
proportional to SQRT(n) or n, or something in between (most likely IMO).

A big difference between FIBS and RLBG (Real Life BackGammon)
is the average match length. Most tournament matches (on which, for example,
Kent Goulding's rating list is based) are played for 11 or more points. A
lot of 5-point matches from Last Chance or Warm-up events are rated too.
But on FIBS the average match length is much shorter, and in particular
it's possible to play 1-point matches and matches for an even number of
points, especially 2-point matches. This has revealed that shorter matches
are weighted too much compared to longer matches.

How much higher a rating can a player achieve if he only plays
1-point matches compared to if he played an average match length of 5 points?
My guess is about 100 points, based on the rating difference of loner and
TD-gammon.

I'm not suggesting that we should eliminate 1- and 2-point matches from
FIBS. It would be nice if someone (I might do it sometime when time
permits) would examine the rating system based on matches actually
played, and after that perhaps a minor change could be implemented, if
FIBSters feel strongly for it.

Regards,

Asger Kring aka Albatross

Darse Billings

Aug 11, 1995, 3:00:00 AM

kr...@login.dknet.dk (Asger Kring) writes:

>usro...@aol.com (USRobots) writes:

>>I'm curious about the sqrt(n) term, where n is the length of the match.
>>How was this term derived initially? Has anyone checked a database of
>>FIBS matches or arranged an experiment to see if winning chances really do
>>vary as sqrt(n)? (Let's not consider the obvious problem of 2-point
>>matches in this discussion...)

I wondered about this too. I'm sure the average length of a match *isn't*
sqrt(n), and I would be interested in knowing a better estimate. I did a
thumbnail approximation, but I don't want to influence the judgement of
more experienced players, so I won't say what it is.

This is something that could be provided by an observer program. Such a
program exists on the poker server, and has accumulated a large database
of very useful information.

In any case, the sqrt(n) term doesn't really affect the ultimate accuracy
of the rating system, since all that matters is that the long-term
outcomes converge on an accurate measure of strength. Fine tuning the
sqrt(n) term could improve the rate of convergence, but shouldn't
significantly change the eventual rating attained.

>>Also, a while ago, I saw a post stating that a perfect player could
>>maximize his/her rating by playing only one-point matches. Is this true?
>>If so, is it because the term above is imperfect, or because playing
>>one-point matches increases the variability around a player's "true"
>>rating, or is there another reason?

Playing only one-point matches does appear to have a significant effect
on rating. I think the reason for this is that the complete game must be
played out, so the average game is longer than when the cube is in play.
This affords the stronger player more opportunities to outplay her weaker
opposition, magnifying the eventual difference in ratings.

It would seem to be appropriate to adjust the rating system for the
special cases of one and two point matches, but if marvin doesn't want
to invest the time on it, then it's certainly no big deal.

>I haven't seen the posting you are referring to, but I must say that I
>agree. An above-average player can increase his rating by playing shorter
>matches, and the reason IMHO is because the rating increase is
>proportional with SQRT (n). This basically means that your rating
>increase will be the same weather you beat an equal opponent in a
>9-pointmatch or in 3 1-point matches. It would be nice to have some
>statistics regarding this, to examine if the rating change should be
>proportional with SQRT(n) or n, or something in between (most likely IMO).

As noted, it isn't proportional, and it doesn't matter.

>A big difference from FIBS in comparison with RLBG (Real Life BackGammon)
>is the average matchlengths. Most tournament matches (on which for example
>Kent Gouldings ratinglist is based) are played for 11 or more points. A
>lot of 5pointmatches from Last Chance or Warm-up events are rated too.
>But on FIBS the average matchlength is much shorter, and in particular
>it's possible to play 1pointmatches and matches for and even number of
>points, especially 2pointmatches. This has revealed that shorter matches
>are weighed too much compared to longer matches.

I think you mean the one and two point matches are weighted *less*
heavily than they should be. If the games last 50% longer, they should
be weighted about 50% more than match games, with one and two point
matches being weighted the same as each other.

>How much higher rating can a player achieve if he only plays
>1pointmatches compared to if he played an average matchlength of 5 points?
>My guess is about 100 points, based on the rating difference of loner and
>TD-gammon.

It could be this high, for very strong players. Note that if the
hypothesis is true, weak one-point players should have a *lower* rating
than otherwise expected.

>I'm not suggesting that we should eliminate 1- and 2-pointmatches from
>FIBS. It would be nice if someone (I might do it sometime when time
>permits) would examine the rating system based on matches actually
>played, and after that perhaps a minor change could be implemented, if
>FIBSters feel strongly for it.

I think the opinions of FIBSters are completely secondary to the
opinions of the people who do the work...
Cheers, - Darse.
--

BlddLDfddFdbfRuuruBuubUF

Robert D. Johnson

Aug 11, 1995, 3:00:00 AM

If someone wins 65% of the time when playing 7pt matches against
people rated 1500, then that person's rating will become 1700.

If someone wins 65% of the time when playing 1pt matches against
people rated 1500, then that person's rating will become 2000.

If someone wins 82% of the time when playing 7pt matches against
people rated 1500, then that person's rating will become 2000.

This means a player rated 2000 will lose 35% of their games,
but lose only 18% of their 7pt matches, against a player rated 1500.

The above numbers come straight out of the FIBS rating formula.
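
These figures can be reproduced by solving the probability formula for the rating difference: if the formula predicts a win rate w in an n-point match, the rating lead is 2000*log10(w/(1-w))/sqrt(n). A Python sketch, with implied_rating_difference as an invented name:

```python
import math

def implied_rating_difference(win_rate, n):
    """Rating lead at which the formula predicts `win_rate` in an n-point match."""
    return 2000.0 * math.log10(win_rate / (1.0 - win_rate)) / math.sqrt(n)

for win_rate, n in ((0.65, 7), (0.65, 1), (0.82, 7)):
    lead = implied_rating_difference(win_rate, n)
    print(f"{win_rate:.0%} in {n}-point matches vs 1500: about {1500 + lead:.0f}")
```

The results land close to the rounded 1700, 2000 and 2000 quoted above; the 1-point case comes out a few dozen points over 2000.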

Percentages are the way to look at this. It is true that you get just
as many rating points if you win one 9pt match versus winning three
1pt matches. But, FIBS considers you a great player if you lose
1 out of 3 of your 1pt matches. FIBS would expect you to lose only
1 out of 7 of your 9pt matches. (Roughly speaking.)

How do I interpret this? A good player improves his/her edge
somewhat by playing longer matches -- but not by THAT much, I feel.
Perhaps the ideal would be something between "square root" and
"linear" in the FIBS formula?

Lloyd V. Lawrence

Aug 12, 1995, 3:00:00 AM

Christopher Yep (chri...@soda.CSUA.Berkeley.EDU) wrote:
: In article <3vrajs$3...@newsreader.wustl.edu>,

: Lloyd V. Lawrence <lvla...@artsci.wustl.edu> wrote:

: |> [...]

: |>One question I was interested in answering was 'How can I

: ^^^^^^^^^

: |>Playing 10 game matches, the underdog will win 41.0, and

: ^^^^^^^^^^^^

: |>That's interesting. As Henry Morgan said, lose a million

: |>here, a milion there, and pretty soon, you're talking real
: |>money. In the long run, it seems that the underdog should
: |>seek short matches, and the favorite longer.

: The two sections that I careted (^) should both be zero. Your calculations

: showed slight positive and negative values due to roundoff error.

: The rating system was desgined with the goal that accurately rated players

: (players at their "true" rating) should have a net ratings expectation of
: 0.00 pts., over the whole spectrum of possible match lengths (1-99 pt.
: matches).

: |> [...]

: |>The source code and the results are listed below. For anyone curious

: |>enough to follow up.
: |>
: |>(Adverse comments will be read. Praise will be cherished)
: |>
: |>elvee

: |>

: |> [Code/results deleted]

: You have a keen scientific spirit. Hope you understand the ratings system
: better now.

: Chris

Robin Davies

Aug 13, 1995, 3:00:00 AM

> Btw, I was told that Kent Goulding's rating list does not
> use the sqrt(n) term, but was based more on the Central Limit Theorem.

Anyone know where I can find info on Kent Goulding's rating system?

Thanks

Robin.
--
------------------------------------------------------------------------
Robin Davies 224 3rd Avenue Ottawa ON, Canada. K1S 2K3
1-(613)-231-2783 rda...@magi.com
------------------------------------------------------------------------

Walter G Trice

Aug 17, 1995, 3:00:00 AM

rda...@fox.nstn.ns.ca (Robin Davies) writes:

>> Btw, I was told that Kent Goulding's rating list does not
>> use the sqrt(n) term, but was based more on the Central Limit Theorem.

>Anyone know where I can find info on Kent Goulding's rating system?

The KG system was written up in Inside Backgammon, Volume 1, Number 5
(Sept.-Oct. 1991) by Larry Kaufman, who invented the system. Vol. 1 of
Inside Backgammon may still be available from the Gammon Press -- ask
Robertie.

The KG system, as far as I know, uses the same formulas that FIBS does,
except that KG starts off 'intermediate' players at a rating lower than
1500 (1400 or 1300 I think.)

Note that sqrt(n) occurs twice in the formula, once as an exponent
in determining the winning probability P and again as a coefficient
of P in calculating the rating change. Kaufman's justification of
the exponent sqrt(n) is as follows: "... an answer is provided by
random walk theory, which says that the expected distance of a
random walker from his starting point is proportional to the square
root of the number of random steps he has taken."
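
The random-walk statement itself is easy to check for the simplest case. The Python sketch below computes the exact expected distance from the start for a walk of independent +1/-1 steps, each with probability 1/2. That plain symmetric walk is an assumption chosen for illustration; it is not a model of a backgammon match, and expected_distance is an invented name.

```python
import math

def expected_distance(steps):
    """Exact E|S| for a walk of `steps` independent +1/-1 moves, each with probability 1/2."""
    total = 0
    for ups in range(steps + 1):
        # `ups` steps up and (steps - ups) down leave the walker at 2*ups - steps.
        total += math.comb(steps, ups) * abs(2 * ups - steps)
    return total / 2 ** steps

for steps in (1, 4, 16, 64, 256):
    d = expected_distance(steps)
    print(f"{steps:3d} steps: E|distance| = {d:.3f}, "
          f"ratio to sqrt(steps) = {d / math.sqrt(steps):.3f}")
```

The ratio settles near sqrt(2/pi), about 0.8, which is what "proportional to the square root" means here: quadrupling the number of steps only doubles the expected distance.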

Skill-differenced match equity tables have actually been around
since 1977 and the sqrt(n) prediction seems to conform pretty
well to more sophisticated analytical models. But it would be
EXTREMELY interesting to see a large volume of empirical data
based on FIBS matches.

The other sqrt(n) which multiplies 4*P*(experience factor) to get
the rating gain or loss has no special justification. In his
article Kaufman says "It is clear that a long match victory should
count more than a short match..." but to me it ISN'T clear. Given
that the system says player A wins 40% against player B in a match,
why should an upset in a long match count more than an upset in
a short match? (Note that match length was already factored into
the 40%.) This is what Doug Roberts calls a "truth-and-justice
function" as opposed to its having any mathematical significance.

Originally the KG ratings used n rather than 4*sqrt(n). One noticeable
result was that people whose 1st tournament was Monte Carlo sometimes
got truly spectacular ratings from the combination of long matches
and a high experience coefficient. Going to 4*sqrt(n) reduced the
disparity between matches of different lengths but also INCREASED
the rating changes for short matches. In my opinion the factor is
much too high -- totally ridiculous rating swings are very common
both in the KG listings and on FIBS.
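
The effect of switching multipliers is quick to tabulate. The Python sketch below compares n*P with 4*sqrt(n)*P for evenly matched players (P = 0.5) and leaves the experience coefficient out. The function names are invented, and the n*P form is taken only from the description in this post, not from the Inside Backgammon article.

```python
import math

def swing_linear(n, p=0.5):
    """Rating change with the earlier multiplier n, as described in the post."""
    return n * p

def swing_sqrt(n, p=0.5):
    """Rating change with the 4*sqrt(n) multiplier."""
    return 4.0 * math.sqrt(n) * p

for n in (1, 5, 11, 16, 25):
    print(f"{n:2d} points: n*P = {swing_linear(n):5.2f}   "
          f"4*sqrt(n)*P = {swing_sqrt(n):5.2f}")
```

The two multipliers cross at n = 16: below that length 4*sqrt(n) gives the larger swing (four times larger for a 1-point match), and above it the smaller one.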

-- Walter Trice

Robert D. Johnson

Aug 17, 1995, 3:00:00 AM

w...@world.std.com (Walter G Trice) wrote:

>Note that sqrt(n) occurs twice in the formula, once as an exponent
>in determining the winning probability P and again as a coefficient
>of P in calculating the rating change. Kaufman's justification of
>the exponent sqrt(n) is as follows: "... an answer is provided by
>random walk theory, which says that the expected distance of a
>random walker from his starting point is proportional to the square
>root of the number of random steps he has taken."

Only the 1st sqrt(n) affects ratings in the long run, since
it controls the ratio of points won/lost. The 2nd sqrt(n)
scales both the points won and points lost equally.

So I feel the 2nd sqrt(n) is not much of a big deal -- and might
even make sense. Larger matches should push you toward your
"true" rating just a bit faster than smaller matches.

I am undecided about the 1st sqrt(n) for a couple of reasons. Large
cubes might occur in a larger match, occasionally lowering the number
of games. Also, the scores in a match tend to even out
(slightly) due to the leader playing more conservatively.

Walter G Trice

Aug 17, 1995, 3:00:00 AM

"Robert D. Johnson" <rjoh...@cvbnet.cv.com> writes:

>w...@world.std.com (Walter G Trice) wrote:

>>Note that sqrt(n) occurs twice in the formula, once as an exponent
>>in determining the winning probability P and again as a coefficient
>>of P in calculating the rating change. Kaufman's justification of
>>the exponent sqrt(n) is as follows: "... an answer is provided by
>>random walk theory, which says that the expected distance of a
>>random walker from his starting point is proportional to the square
>>root of the number of random steps he has taken."

>Only the 1st sqrt(n) affects ratings in the long run, since
>it controls the ratio of points won/lost. The 2nd sqrt(n)
>scales both the points won and points lost equally.

>So I feel the 2nd sqrt(n) is not much of a big deal -- and might
>even make sense. Larger matches should push you toward your
>"true" rating just a bit faster than smaller matches.

The 2nd sqrt(n) doesn't affect the expected value of a player's
rating but it does affect the expected value of the ERROR in a
player's rating. But my complaint was about the overall magnitude
of the coefficient rather than the use of a term reflecting match
length. Surely you'd agree that replacing 4*sqrt(n) with
40000000*sqrt(n) would make the ratings useless? My claim is
that 4*sqrt(n) is also too high. There are various ways the
problem could be fixed -- one would be to let the experience
coefficient continue to diminish somewhat after 500 points.
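
The distinction between the expected rating and the expected error can be put in numbers. For an accurately rated player the single-match rating change has mean zero whatever the coefficient, but its standard deviation is coefficient*sqrt(n)*sqrt(p*(1-p)). The Python sketch below computes both; change_stats is an invented name, and the coefficients 1 and 40 are hypothetical values chosen only for contrast with the actual 4.

```python
import math

def change_stats(coefficient, n, p_win):
    """Mean and standard deviation of one match's rating change for an accurately rated player.

    A win (probability p_win) gains coefficient*sqrt(n)*(1 - p_win);
    a loss costs coefficient*sqrt(n)*p_win.
    """
    scale = coefficient * math.sqrt(n)
    gain, loss = scale * (1.0 - p_win), -scale * p_win
    mean = p_win * gain + (1.0 - p_win) * loss
    variance = p_win * (gain - mean) ** 2 + (1.0 - p_win) * (loss - mean) ** 2
    return mean, math.sqrt(variance)

for coefficient in (1.0, 4.0, 40.0):
    mean, spread = change_stats(coefficient, 5, 0.5)
    print(f"coefficient {coefficient:5.1f}: mean {mean:+.2f}, std dev {spread:.2f}")
```

The mean stays at zero while the spread scales directly with the coefficient, which is the sense in which a larger multiplier makes an individual rating noisier without moving its long-run average.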

>I am undecided about the 1st sqrt(n) for a couple reasons. Large
>cubes might occur in a larger match, lowering the number of games
>occassionally. Also, the scores in a match tend to even out
>(slightly) due to the leader playing more conservatively.

I'll follow this up later with a posting containing numbers from
Norman Zadeh's skill-differenced tables as well as my own for
comparison with what the ratings formula predicts. The analytic
models do take all the relevant factors into account.

-- Walter Trice

Darse Billings

Aug 18, 1995, 3:00:00 AM

w...@world.std.com (Walter G Trice) writes:

>rda...@fox.nstn.ns.ca (Robin Davies) writes:

>>> Btw, I was told that Kent Goulding's rating list does not
>>> use the sqrt(n) term, but was based more on the Central Limit Theorem.

>>Anyone know where I can find info on Kent Goulding's rating system?

>The KG system was written up in Inside Backgammon, Volume 1, Number 5
>(Sept.-Oct. 1991) by Larry Kaufman, who invented the system. Vol. 1 of
>Inside Backgammon may still be available from the Gammon Press -- ask
>Robertie.

>The KG system, as far as I know, uses the same formulas that FIBS does,
>except that KG starts off 'intermediate' players at a rating lower than
>1500 (1400 or 1300 I think.)

>Note that sqrt(n) occurs twice in the formula, once as an exponent
>in determining the winning probability P and again as a coefficient
>of P in calculating the rating change. Kaufman's justification of
>the exponent sqrt(n) is as follows: "... an answer is provided by
>random walk theory, which says that the expected distance of a
>random walker from his starting point is proportional to the square
>root of the number of random steps he has taken."

This is exactly what I suspected, but it isn't clear to me why it should
be used in the way it is... At the very least, this should only apply to
matches between players of even strength. In any case, it doesn't really
matter as long as convergence is assured.
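
The random-walk statement Kaufman cites can be checked with a short
Python sketch. It assumes the simplest case, a symmetric walk of n
independent +1/-1 steps (which corresponds to players of even
strength, not to the general case), and the function name is made up
for illustration. It computes the exact expected distance from the
start and divides it by sqrt(n).

```python
import math


def expected_distance(n):
    """Exact E|S_n| for n independent +1/-1 steps, each with probability 1/2."""
    # After k "up" steps the walker stands at position 2k - n.
    total = sum(math.comb(n, k) * abs(2 * k - n) for k in range(n + 1))
    return total / 2 ** n


if __name__ == "__main__":
    for n in (1, 4, 16, 64, 256):
        d = expected_distance(n)
        print(f"n={n:4d}  E|S_n|={d:7.3f}  ratio to sqrt(n)={d / math.sqrt(n):.3f}")
```

The ratio settles toward a constant (sqrt(2/pi) for this symmetric
walk) as n grows, which is what "proportional to the square root"
means here. It says nothing by itself about how the result should
carry over to unevenly matched players.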

>Skill-differenced match equity tables have actually been around
>since 1977 and the sqrt(n) prediction seems to conform pretty
>well to more sophisticated analytical models. But it would be
>EXTREMELY interesting to see a large volume of empirical data
>based on FIBS matches.

>The other sqrt(n) which multiplies 4*P*(experience factor) to get
>the rating gain or loss has no special justification. In his
>article Kaufman says "It is clear that a long match victory should
>count more than a short match..." but to me it ISN'T clear. Given
>that the system says player A wins 40% against player B in a match,
>why should an upset in a long match count more than an upset in
>a short match? (Note that match length was already factored into
>the 40%.) This is what Doug Roberts calls a "truth-and-justice
>function" as opposed to its having any mathematical significance.

Given the high ratings attained by one_pointer and loner, it could be
argued that longer matches are not given enough weight. mloner should be
able to maintain roughly the same rating as loner, assuming it handles
match conditions competently. So far, this is not the case.

>Originally the KG ratings used n rather than 4*sqrt(n). One noticeable
>result was that people whose 1st tournament was Monte Carlo sometimes
>got truly spectacular ratings from the combination of long matches
>and a high experience coefficient. Going to 4*sqrt(n) reduced the
>disparity between matches of different lengths but also INCREASED
>the rating changes for short matches. In my opinion the factor is
>much too high -- totally ridiculous rating swings are very common
>both in the KG listings and on FIBS.
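
To make the comparison above concrete, here is a small Python sketch
that tabulates the two match-length coefficients side by side. It
assumes, as the quoted text describes, that the old KG coefficient was
simply n and the current one is 4*sqrt(n); the function names and the
sample match lengths are illustrative only.

```python
import math


def old_coefficient(n):
    """Match-length coefficient described as the original KG choice: n."""
    return float(n)


def new_coefficient(n):
    """Match-length coefficient in the current formula: 4*sqrt(n)."""
    return 4.0 * math.sqrt(n)


if __name__ == "__main__":
    print(" n    old (n)   new (4*sqrt(n))")
    for n in (1, 3, 5, 9, 16, 25):
        print(f"{n:2d}   {old_coefficient(n):7.2f}   {new_coefficient(n):7.2f}")
```

The two coefficients coincide at n = 16. Below that the 4*sqrt(n)
version is larger, so short matches move ratings more than they did
under the old scheme; above it the 4*sqrt(n) version is smaller.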

Actually, I am surprised at how stable and robust the FIBS ratings are.
Swings of +/- 80 points are not uncommon, but if that denotes about one
standard deviation, I think that's quite reasonable for a game with such
high variance.

I was playing with the FIBS system in order to make some recommendations
for the new Go server (NNGS), and one thing I noticed was the difference
in scale between FIBS ratings and chess ratings.

If two players with equal established ratings play a 1-point match on
FIBS, the winner will earn two points, while in chess, an even game is
typically worth about 16 points. A difference of 200 points in chess is
usually called a rating class, and the higher ranked player should win
about 75% of the points. The same percentage in a 1-point match would
indicate a 1000 point rating difference on FIBS -- or about the entire
range of the distribution!
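
The scale comparison can be reproduced with a few lines of Python. This
is a sketch under stated assumptions: the FIBS side uses the help-text
formula with n = 1 and K = 1, while the chess side uses the standard
Elo expected-score formula with a K-factor of 32, which is an assumed
value chosen because it yields the "about 16 points" figure. All
function names are made up for illustration.

```python
import math


def fibs_favorite_win_prob(diff, n=1):
    """FIBS help formula: 1 - P_upset for an n-point match."""
    p_upset = 1.0 / (10 ** (diff * math.sqrt(n) / 2000.0) + 1.0)
    return 1.0 - p_upset


def fibs_even_match_gain(n=1, k=1.0):
    """Winner's gain between equally rated players: 4*K*sqrt(n)*P with P = 0.5."""
    return 4.0 * k * math.sqrt(n) * 0.5


def elo_expected_score(diff):
    """Standard Elo expected score for the higher-rated player."""
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


def elo_even_game_gain(k_factor=32.0):
    """Winner's gain in an even chess game; k_factor = 32 is an assumption."""
    return k_factor * 0.5


if __name__ == "__main__":
    print("even-match gain, FIBS :", fibs_even_match_gain())
    print("even-game gain, chess :", elo_even_game_gain())
    print("chess, 200 points     :", round(elo_expected_score(200), 4))
    print("FIBS, 1000 points     :", round(fibs_favorite_win_prob(1000), 4))
```

Under these assumptions a 1000 point gap in a 1-point FIBS match and a
200 point gap in chess give the same predicted score for the favorite,
since 1000/2000 and 200/400 are the same exponent.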

There are obviously many more skill levels attainable in chess, and
probably many more again in Go. But the fact that the difference in
backgammon strength can be measured to a precision adequate to distinguish
a 1600 FIBS player from a 1700 FIBS player is, I think, quite commendable.