
Oct 6, 1998, 3:00:00 AM10/6/98


Hi,

I have looked a little at the formula for calculating ratings at FIBS.
Of course it's nearly impossible to construct a perfect formula. What I
think is the weakest part of it is the calculation of the probability
that someone wins. In the formula, the match length is one factor, and
it seems that when playing 1- or 2-pointers it's possible to get a much
higher rating than otherwise. I don't mind people getting a high rating
by playing short matches. However, I would find it interesting to know
how good I am at different match lengths. Because of this, I would find
it interesting to have one rating system for match lengths 1-2 and maybe
also 3-4, and 5 to infinity (or 3 to infinity). I realise a change needs
many shared opinions, so does anyone else share these suggestions?

/Dennis Nilsson

/fortuna on fibs

Oct 7, 1998, 3:00:00 AM10/7/98


One thing I would observe is this:

In an issue of Inside Backgammon from several years ago, Kit Woolsey

made the statement that he felt he would be lucky to win 55% of his

games against an intermediate player. Now, I suspect that Kit may

have a fairly high standard for "intermediate" - I really don't know.

Does intermediate mean you get some of the quiz problems in IB right,

does it mean you even UNDERSTAND the problems, or does it mean you

know that it's better to be at the edge of a prime than a pip or two

away? (I'm not putting down Kit, understand, I'm saying that I have a

lot of respect for his game!) Anyway, under the FIBS rating formula,

a difference of 175 points translates to a 55% chance of winning a

one-point match. Now, even by his statement, I'm sure that Kit would

win more than 55% of his one-point matches, since his 55% PROBABLY

refers to games - and I suspect that Kit's average win will win more

points than his average loss will lose - if he's just playing to win

the game, he'll probably win more than 55%. I would venture a

wild-a**ed guess that I am probably what Kit would call an

intermediate player, by which I mean that I could play him an

intermediate-length match (7, 9, 11) and not embarrass myself, even win
occasionally. My FIBS rating tends to be about 175 points below

Kit's. So maybe, just maybe, by this theory, the FIBS formula is

right.
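As an aside, that 175-point/55% correspondence is easy to check directly. Here is a minimal sketch of the FIBS winning-probability formula as usually quoted (P_upset = 1 / (10^(D * sqrt(n) / 2000) + 1), where D is the rating difference and n the match length; the function name is mine):

```python
import math

def fibs_favorite_win_prob(rating_diff, match_length):
    # FIBS estimate: the underdog wins with probability
    # 1 / (10^(D * sqrt(n) / 2000) + 1); the favorite gets the rest.
    upset = 1.0 / (10 ** (rating_diff * math.sqrt(match_length) / 2000) + 1)
    return 1.0 - upset

# A 175-point favorite in a one-point match:
print(round(fibs_favorite_win_prob(175, 1), 3))  # 0.55
```

So the formula does put a 175-point favorite at almost exactly 55% in a one-point match.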

I do have to say that the formula, though, doesn't quite seem right in

this sense. The "random walk" theory by which the square root is used

for match length assumes that each game is independent. But a 3-point

match is very very different than playing one-point games, best

3-out-of-5. I would suspect that the difference between players of

different skill levels is much greater at cube handling than at

checker play. Most intermediate players will at least not fail to
consider a checker play with an equity difference of .05 or so over
another they're considering, but it's very easy to believe that

an intermediate player would misevaluate the overall equity in a

position. The reason is simple - checker plays present a choice

between two positions; cube decisions require considering a whole

range of positions that could result one, two, three, five, ten rolls

later. So the FIBS formula just might favor the weaker player in a

one-point match versus a longer match. But then, I've been led to

understand that some players fatten their ratings by playing one-point

matches against weak players, so what do I know?

Oct 7, 1998, 3:00:00 AM10/7/98


hankyou...@home.com (Hank Youngerman) writes:

> On Tue, 06 Oct 1998 20:35:55 +0200, Dennis Nilsson

> <den...@student.csd.uu.se> wrote:

> >I have looked a little at the formula for calulating ratings

> >at fibs. Ofcause its nearly impossible to construct a perfect formula.

> >What I think is the weakest part of it is the calulation of the

> >probability that someone wins. In the formula is the match length one

> >factor

> >and it seems as when playing 1 or 2 pointers it's possible to get a much

> >higher rating than otherwise. I don't mind people who is getting high

> >rating because of playing short matches. However i would find it

> >interesting to know how good I am at different matchlengths. Beacause of

> >this i would find it

> >interesting with one ratingsystem for matchlengts 1-2 and maybe also

> >3-4, 5 to infinity or 3 to infinity. I realise a change needs many

> >shared opinions, so

> >anyone else share these suggestions ?

>


> I do have to say that the formula, though, doesn't quite seem right in

> this sense. The "random walk" theory by which the square root is used

> for match length assumes that each game is independent. But a 3-point

> match is very very different than playing one-point games, best

> 3-out-of-5.


Yes, I think you've hit the nail right on the head. This effect seems to

be observed from time to time, and people make vague comments about what

is causing it without solidly advocating a single conclusion (very unusual

for r.g.b. posters :-) I agree with you that gammons and the cube destroy

independence between games in a match: there is a positive correlation

between each of the "points" won, and less than n degrees of freedom in an

n-point match. The theory assumes games are independent and therefore

underestimates the "skill" (i.e. probability of the higher-rated player

winning) in shorter matches and/or overestimates the skill in longer

ones (depending which length you assign as "correct").

It is very easy to derive examples or find empirical evidence supporting

this hypothesis. For instance:

- Consider 2 point matches. It should be fairly obvious that a 2-point

match is identical to a 1-point match, because the weaker player can

double immediately and be certain of having a greater probability of

winning the match than by potentially playing it out as 2 or more games.

So, it is clear that the "skill" in a 2-pointer is exactly the same

as that in a 1-pointer. But the Elo system will credit the underdog

with MORE (and the favourite with less) if he/she wins!

- Tom Keith shows that the above example generalises to longer matches

by comparing the probabilities the Elo system estimates against those

in a skill-adjusted match equity table in an article at:

http://www.bkgm.com/rgb/rgb.cgi?view+523

- Looking at real life data, we see the effect occurring in practice.

Peter Fankhauser looks at the matches in the Big Brother database

and summarises the results for different length matches at:

http://x7.dejanews.com/getdoc.xp?AN=135801990

(look at table 7).

- David Montgomery mentions the discrepancies in records of his own

matches of varying lengths at:

http://www.bkgm.com/rgb/rgb.cgi?view+44

- There was a thread at the end of last year with the subject

"rankings and ratings" with some interesting discussion (look for

Don Banks' and Chuck Bower's articles).
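The 2-point example above can also be checked numerically. A sketch under the FIBS-style formula quoted elsewhere in this thread (the 200-point difference is an arbitrary choice): strategically a 2-pointer is identical to a 1-pointer, because the underdog can double immediately, yet the formula assigns the underdog a smaller winning chance at n = 2.

```python
import math

def upset_prob(rating_diff, n):
    # FIBS estimate of the lower-rated player's chance in an n-point match.
    return 1.0 / (10 ** (rating_diff * math.sqrt(n) / 2000) + 1)

d = 200  # hypothetical rating difference
p1, p2 = upset_prob(d, 1), upset_prob(d, 2)
print(round(p1, 3), round(p2, 3))
# The formula thinks the 2-pointer is "harder" for the underdog, even
# though immediate doubling makes the two match lengths identical:
assert p2 < p1
```

Because the formula underrates the underdog's real chances at n = 2, it overpays him/her on a win, exactly as the example claims.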

I think it's safe to conclude that the Elo system correctly predicts

the probabilities of the favourite winning only when the games in the

match are independent. For backgammon (with cubes and gammons), it is

vaguely adequate but nowhere near perfect. An ideal system would give

any accurately rated player an expected gain of zero when playing any

length match against another accurately rated player; Peter's analysis

(see above) shows that FIBS typically gives expected gains of 0.3

points to the favourite in 1 point matches (this will obviously depend

on the rating difference of the players). Since a 1 point match is

worth about 2 ratings points, this is an error of about 15%.
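The back-of-the-envelope behind that expected-gain figure can be sketched as follows, using the 4 * K * SQRT(n) * P adjustment mentioned later in the thread; the "true" 65% winning probability for the favourite is invented purely for illustration (the real number would have to come from match data):

```python
import math

def upset_prob(d, n):
    # FIBS estimate of the lower-rated player's chance in an n-point match.
    return 1.0 / (10 ** (d * math.sqrt(n) / 2000) + 1)

def favorite_expected_gain(d, n, true_fav_prob, k=1):
    # Winner gains 4 * K * sqrt(n) * (loser's estimated winning chance);
    # the loser drops the mirror amount, so an exact estimate nets zero.
    p = upset_prob(d, n)
    win_gain = 4 * k * math.sqrt(n) * p
    lose_loss = 4 * k * math.sqrt(n) * (1 - p)
    return true_fav_prob * win_gain - (1 - true_fav_prob) * lose_loss

# With the formula's own estimate taken as truth, the expectation is zero:
assert abs(favorite_expected_gain(300, 1, 1 - upset_prob(300, 1))) < 1e-9

# Hypothetical: the formula says ~58.5% for a 300-point favourite in a
# 1-pointer, but suppose the true figure is 65%:
print(round(favorite_expected_gain(300, 1, 0.65), 2))  # 0.26
```

With these invented numbers the favourite nets roughly a quarter of a rating point per 1-point match, the same order of magnitude as the 0.3-point figure from Peter's analysis.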

Coming up with a better scheme is pretty tricky. Existing skill-adjusted
match equity tables address the problem of players having different
probabilities of winning each game, but they assume efficient cube
handling by both players and do not account for the weaker player making
more cube errors. A truly accurate match equity table would have to

account for the fact that different strength players make different

kinds of cube errors (eg. the match equity table for a 7 point match

between a 2000 ranked player and an 1800 ranked player will be

different to that between a 1200 and a 1000, because the 1000 player

might be expected to make silly errors like failing to double immediately

trailing -2, -1 post-Crawford. This will throw the match equities

out of whack and violate another assumption of the Elo system, that

the winning probability depends only on the relative ratings of the

players and not the absolute ratings). When it comes down to it,

even attempting to measure someone's skill with a (scalar) rating is

a little bit presumptuous anyway; it fails to reflect the distinction

between cube skill and chequer skill, for instance. Scalars also imply

transitivity which we don't really have (A is a favourite against B

and B is a favourite against C does not necessarily imply A is a

favourite against C). Those are more problems than I'd care to solve!

When it comes down to it, applying the Elo system to backgammon is

trying to do something that can't really be done, so we shouldn't get

too upset if it doesn't get things quite right :-)

Cheers,

Gary.

--

Gary Wong, Department of Computer Science, University of Arizona

ga...@cs.arizona.edu http://www.cs.arizona.edu/~gary/

Oct 8, 1998, 3:00:00 AM10/8/98


With all of that - not that Gary doesn't make some good points...

here's something I don't really know.


The key variable in the formula is the factor:

SQR(n) * (rating difference) / 2000

Now - is that 2000 factor empirically derived, or is it an assumption?

It would be relatively easy to test if you had enough data on resolved

matches.

Somehow I think that would have more impact than questions of whether

the sqr part of the formula is correct.

It's not 100% clear to me what the impact of the cube is in

multi-point matches. Think of it this way. Every decision in the

game gives the better player an opportunity to gain equity. We could

model a game by saying that on each turn, the favorite (F) gains some

random amount of equity by making better decisions. The amount will

vary - he will gain zero equity when his opponent rolls an opening

3-1, for example. He also usually gains zero equity on cube decisions

for at least the first few rolls.

It's clear that your average cube decision has more equity impact than

your average checker play. But the more games in a match, the more

checker plays, and the more opportunities for F to squeeze a little

extra equity out. Remember, F can find a correct double a weaker

player will never even think of, his opponent can make a foolish take,

and roll a joker to gammon F. Now, all F is left with is the

consolation that he played right, and wistful longing that he'd tried

to grind down his opponent.

But I digress. I'd still like to know whether the 2000 is just an

assumption, or empirically tested. That would seem to be the first

thing to refine.

Oct 10, 1998, 3:00:00 AM10/10/98


In article <361c1b63.53764026@news> hankyou...@home.com (Hank Youngerman) writes:

[Paraphrasing]:

> What is the effect of the "2000" in "SQR(n) * (rating difference) / 2000"

> (from the FIBS rating formula)? Is it empirically derived?

2000 is actually a constant (call it c). If c is changed to 2000 * t
(with t > 0), the average rating will remain the same (approx. 1500).
Ratings which differ from the population average will be scaled away
from the average by a factor of t, i.e. new_rating = avg_rating + t *
(old_rating - avg_rating).

e.g. if avg_rating = 1500, c = 400 (i.e. t = 0.2), then

new_rating = 1500 + 0.2 * (old_rating - 1500)

e.g. 2000 under the old ratings will correspond to 1600 under the new

ratings. 1000 under the old ratings will correspond to 1400 under the

new ratings. Old ratings between 1000 and 2000 have an equivalent new

rating between 1400 and 1600, which can be obtained by linear

interpolation.
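In code, that rescaling is just a linear map around the mean; a minimal sketch reproducing the numbers above:

```python
def rescale(old_rating, t, avg=1500.0):
    # Equivalent rating after changing c = 2000 to c = 2000 * t:
    # the spread around the average shrinks (or grows) by a factor of t.
    return avg + t * (old_rating - avg)

print(rescale(2000, 0.2))  # 1600.0
print(rescale(1000, 0.2))  # 1400.0
```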

Thus, the choice of c = 2000 only affects the spread of the ratings, while not

changing the ordering of "true" ratings, i.e. consider two players

(player1 and player2): if player1_true_rating > player2_true_rating for c =

2000 then player1_true_rating > player2_true_rating for any other c > 0

(and vice-versa).

Note that the above discussion is referring to one's "true" rating. For small

values of c there is a large amount of noise in the ratings system (since the

adjustment factor, 4 * K * SQRT (n) * P, is independent of c), i.e. ratings

will move relatively more quickly with a small c than with a large c. E.g.

in an extreme case, if c = 1, then the "true" ratings would likely range from

1499.75 to 1500.25, yet a 9-pt. match between two players of equal rating

would boost the winner's rating by 6 pts. (assume K = 1) and reduce the

loser's rating by 6 pts. Obviously in this case, the rating system would be

too unreliable for use. Many players would be massively (in a relative

sense) over- or under-rated.
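The numbers in that extreme case are easy to verify: a quick check of the 4 * K * SQRT(n) * P adjustment for a 9-point match between equally rated players (so P = 0.5, and K = 1 as assumed above):

```python
import math

def match_adjustment(n, p, k=1):
    # Rating points transferred after an n-point match: 4 * K * sqrt(n) * P,
    # where P is the estimated winning chance of the player who lost.
    return 4 * k * math.sqrt(n) * p

# 9-point match, equally rated players (P = 0.5), K = 1:
print(match_adjustment(9, 0.5))  # 6.0
```

Note that the adjustment contains no c at all, which is why a tiny c leaves the swings huge relative to the spread of "true" ratings.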

I assume that 2000 was chosen so that there would be a reasonable spread

between the high and low ratings.

[Note: below I will define "ELO-style rating formula" to be a rating
formula similar to the FIBS one in which P_upset = 1 / (10^(D *
sqrt(n)/c) + 1), for c > 0.]

Assuming that an ELO-style rating formula is appropriate (although there is a

lot of evidence to the contrary), c should be chosen so that the

match-adjustment [4 * K * SQRT (n) * P] moves/changes ratings at a relatively

slow (but not too slow) rate. If c is chosen too low, then ratings will move

too fast, i.e. they will be too volatile and thus unreliable. If c is chosen

too high then the rating system will take a very long time to correct the

ratings of those who are significantly under- or over-rated.

Personally, I think that the FIBS ratings system is a bit too volatile. I

would like to see c = 4000. Better yet, to avoid having to scale everyone's

mean-adjusted rating by 2 overnight (and thus alarming many new users), we

can equivalently just change the match-adjustment factor to [2 * K * SQRT (n)

* P].

As noted by some of the empirical evidence referenced in Gary Wong's recent

post, an ELO-style rating formula is not robust over the possible match

lengths. Perhaps a better solution (requiring more housekeeping) would be to

have separate rating formulas for different match lengths. Perhaps there

could be five different ratings: one for 1-pt. matches, one for 2-pt. matches,

one for 3-6 pt. matches, one for 7-16 pt. matches, and one for 17+ pt matches.

Having a separate category for 2-pt. matches may be a little controversial

since among expert players a 2-pt. match is virtually identical to a 1-pt.

match, however among novices, there is still room for cube strategy. :-)

Even better, the value of t (the scaling factor - see the first line of my

post) can be empirically set (different for each of the 5 rating groups) so

that the spread (high rating - low rating) in each of the 5 ratings is

approximately the same. Perhaps one could even be assigned an "overall

rating" which would be the average of each of the 5 ratings (or maybe with

only 50% weighting on the 1-pt. and 2-pt. ratings, i.e. overall_rating = .125

r(1) + .125 r(2) + .25 r(3-6) + .25 r(7-16) + .25 r(17+)). This would mean

that a player has to be good at both short and long matches in order to
have a good overall rating.
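That weighted average is straightforward to write down; a sketch (the sample ratings are invented for illustration):

```python
def overall_rating(r1, r2, r3_6, r7_16, r17_plus):
    # Half weight on the 1- and 2-point ratings, full weight on the
    # three longer-match ratings; the weights sum to 1.
    return (0.125 * r1 + 0.125 * r2
            + 0.25 * r3_6 + 0.25 * r7_16 + 0.25 * r17_plus)

# Hypothetical player strong at short matches, weaker at long ones:
print(overall_rating(1700, 1650, 1600, 1580, 1560))  # 1603.75
```

A specialist who only excels at 1-pointers would see that strength diluted in the overall figure, which is the point of the proposal.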

Under most backgammon rating systems that I've seen, a player who plays

perfectly in 1-pt. matches (and who plays only 1-pt. matches) can obtain an

extremely high rating, even if he is awful in cube strategy (i.e. since he

will never have to make a cube decision). My proposal would remedy this

problem.

Just my $0.02

Chris
