
Rating system suggestions


Murat Kalinyaprak

Oct 11, 1998

During the past few days, there has been some discussion
about the obvious deficiencies in the current rating
systems. Without repeating the points already made,
I'll get right to my suggestions.

Start by dumping the overly complicated, pretentious
formulas and use one that is at least simple enough
to get rid of those ridiculous decimal digits in the
ratings.

For example, start new players at an arbitrary entry
rating like 1000, which sounds like a nice round number to
me. Then pick another number to be used as a player's
"window" size; for this, 100 (=10% of 1000) sounds like a
good number to me also. It may be better for this to
be a percentage (e.g. 200 at a 2000 rating), since the
number of players at the extremities may be sparse.
Then limit players to choose opponents within their
"window" only and count each game simply as 1 point.

Thus, a new player's window would be 950-1050. After
he wins his first 5 point match, his rating would be
1005 and his "window" would slide up to 955-1055. If
some players' ratings go up to 6000, let it be. If some
players' ratings fall below 0, just keep track of them
as negative numbers. It is possible that at the upper
and lower brackets the number of players may become
too sparse to allow enough selection (or in fact gaps
may happen in such a way that some player's "window"
may get detached from the rest). If/when that happens,
just adjust the span of those players' "windows" to
encompass a minimum number of players (e.g. 50).
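
In code, the scheme amounts to roughly the sketch below. The 5% half-window,
the mutual-window check and all the names are only my illustration of the
idea, not a spec:

ENTRY_RATING = 1000
WINDOW_FRACTION = 0.05    # half-width of the "window", as a share of own rating

def window(rating):
    """Return the (low, high) range of opponents visible for rated play."""
    half = rating * WINDOW_FRACTION
    return rating - half, rating + half

def can_play_rated(a, b):
    """Allow a rated game only if each player sits inside the other's window."""
    lo_a, hi_a = window(a)
    lo_b, hi_b = window(b)
    return lo_a <= b <= hi_a and lo_b <= a <= hi_b

def play_rated(winner, loser):
    """Each rated game moves exactly one point, whatever the rating gap."""
    return winner + 1, loser - 1

# Example: a new player at 1000 beats a 1005 opponent and slides up to 1001.
a, b = ENTRY_RATING, 1005
if can_play_rated(a, b):
    a, b = play_rated(a, b)
print(a, b)    # 1001 1004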

This "sliding window" scheme will eliminate the need
to worry about and compute who is how likely to win
against whom. Experience factor should also be done
away with. It's meaningless when somebody who played
bg for 10, 20, 30 years gets on a certain server and
starts at experience 0. Who cares if Joe learned as
much and became as good a player in 6 months vs. Jim
who did in 16 years...?

Keep all players' ratings in 3 different numbers, as in
1723/1644/1802 for lackgammon/backgammon/jackgammon,
which are obviously different enough to have caused the
arguments made during the past days about how people
use 1pt, 2pt, etc. matches to their advantage to boost
their ratings.

This is it. Simple, and I believe it would eliminate all
the shortcomings I have read about so far.

Details like what if Joe rated at 1900 wants to play
with Jane rated at 2200 because they also enjoy each
other's cyber-company as much as bg may be discussed
and solutions like non-rated games may be offered. I
would, however, be against refining too much for fear
that it would eventually turn back into one of the
currently criticised systems. Of course, additional
statistics may be kept to any extent desired without
making them part of future computations.

In a related area, I would also suggest that dropped
matches be forced to resume by the server. This could
be achieved by not letting the dropping party play
with anybody else (when the other party is logged on)
until the dropped game is resumed/finished. To prevent
the other side from dragging his feet also, when both
players are on-line, the server could keep that player
from starting another game and autoexecute for them an
invite/accept sequence.

I would like to hear what deficiencies anybody may see
in my suggestions or what refinements they may have to
propose...?

MK

thehub

Oct 11, 1998

<snip>

>Start by dumping the overly complicated, pretentious
>formulas and use one that is at least simple enough
>to get rid of those ridiculous decimal digits in the
>ratings.

<snip details of proposed system>

>This is it. Simple

<Snip>

Sounds much more complicated than the present system to me.

>Details like what if Joe rated at 1900 wants to play
>with Jane rated at 2200 because they also enjoy each
>other's cyber-company as much as bg may be discussed
>and solutions like non-rated games may be offered.

There is already a system for unrated games (on fibs), i.e. unlimited
matches.

<snip>

>In a related area, I would also suggest that dropped
>matches be forced to resume by the server. This could
>be achieved by not letting the dropping party play
>with anybody else (when the other party is logged on)
>until the dropped game is resumed/finished. To prevent
>the other side from dragging his feet also, when both
>players are on-line, the server could keep that player
>from starting another game and autoexecute for them an
>invite/accept sequence.

Again, sounds complicated. Who will determine, and how, which player is
the dropper and which the droppee (sp?)?

>
>I would like to hear what deficiencies anybody may see
>in my suggestions or what refinements they may have to
>propose...?
>
>MK

I don't like a system that limits who I am allowed to play. Although you
offer no formula it seems to me that such a system would distort the idea of
a uniform rating system, resulting in a myriad of meaningless little
"window" ratings. If you want to regulate who plays whom, why not a system
that forces you to play other players of ALL rating levels?

How could limiting free interaction among players add to the collective
experience?

Any rating system (as well as any set of rules for a game) is completely
arbitrary. As long as everyone lives under the same set of rules, I am
perfectly satisfied with the present system.

No system is perfect, and any system can be abused. As for those who would bend and
twist the system to inflate their own egos, they will have to live with that
knowledge.

A rating is, after all is said and done, just a rating.

thehub

Hank Youngerman

Oct 11, 1998

Although (a) I rarely play on FIBS anymore and (b) with the advent of
highly-talented computer programs for backgammon I can get a very
strong game anytime I want.......

I think that I would be horrified if I were online and noticed Kit
Woolsey or Kent Goulding or a similarly highly-ranked player online,
was able to get them to give me a game, and then was told by the
server "Sorry, you're not good enough to play them. That "cure" would
be far worse than any disease.

It's also occurred to me that playing a 2-point match should work in
favor of the WEAKER player. Suppose I'm playing Kit in a 2-point
match. By doubling immediately, I do two things: (a) I increase the
luck factor by giving him only one game, rather than two, to use his
superior checker play skills, and (b) I eliminate any possibility for
him to better evaluate the cube decisions.

The theory of doubling immediately in 2-point matches is mainly based
on the idea that both players will play perfectly. In practice,
that's not usually what happens. You can still safely postpone
doubling so long as you have no market losers. If you do have some
among the 1296 possible two-roll combinations, your equity loss is
proportional to how many sequences there are; if there are just a very
few (like a 17-to-1 hit followed by a 35-to-1 flunk) that's not really
a big deal.

And when playing a 2-point match against a much weaker player, I'm not
even sure that the ability to turn the cube when it's to your
advantage is that helpful. Say your opponent is 60% to win the game
and doesn't double because he doesn't know it's the theoretically
correct play. You've gained some equity. But would YOU really double
in the same situation? I know others disagree with me on this, but I
would hold the cube. I want two games to grind him down, all else
being the same, rather than one, and he might take even AFTER I've
lost my market, especially if I haven't lost it by much.

Even some bad players are clever enough to know that turning the cube
against a good player gives them some "mix-it-up" vig. OK, not many.

All in all, I'd rather be the weaker player in a 2-point match than
the stronger.


On Sun, 11 Oct 1998 07:09:23 GMT, mu...@cyberport.net (Murat
Kalinyaprak) wrote:

>During the past few days, there has been some discussion
>about the obvious deficiencies in the current rating
>systems. Without repeating the points already made,
>I'll get right to my suggestions.
>

>Start by dumping the overly complicated, pretentious
>formulas and use one that is at least simple enough
>to get rid of those ridiculous decimal digits in the
>ratings.
>

>Details like what if Joe rated at 1900 wants to play
>with Jane rated at 2200 because they also enjoy each
>other's cyber-company as much as bg may be discussed

>and solutions like non-rated games may be offered. I
>would, however, be against refining too much for fear
>that it would eventually turn back into one of the
>currently criticised systems. Of course, additional
>statistics may be kept to any extent desired without
>making them part of future computations.
>

>In a related area, I would also suggest that dropped
>matches be forced to resume by the server. This could
>be achieved by not letting the dropping party play
>with anybody else (when the other party is logged on)
>until the dropped game is resumed/finished. To prevent
>the other side from dragging his feet also, when both
>players are on-line, the server could keep that player
>from starting another game and autoexecute for them an
>invite/accept sequence.
>

Vince Mounts

Oct 11, 1998

>
>This "sliding window" scheme will eliminate the need
>to worry about and compute who is how likely to win
>against whom.

I believe that is the entire purpose of a rating system. To predict who will
win. Because that is another way of saying "who is the better player". Your
system makes no sense whatsoever. And according to your system, if I play
someone rated -1000 I get the same ratings bump I would if I played someone
rated +10,000. This only makes the situation much worse, not better. And it does
nothing to "fix" the "problems" with the 2 point matches either (newbies can
still be taken advantage of).


>Experience factor should also be done
>away with. It's meaningless when somebody who played
>bg for 10, 20, 30 years gets on a certain server and
>starts at experience 0.

I believe the experience factor is used to tell the ratings formula when to
"stabilize". It is also a good indicator of whether this person may be a
dropper or not. A high rating in a short time can be an indication of a
dropper. A high experience also suggests that a person is not a dropper,
since a dropper would be unlikely to have "survived" on the server that long.


>Keep all players' ratings in 3 different numbers as
>1723/1644/1802 for lackgammon/backgammon/jackgammon
>which are obviously different enough to cause the
>arguments made during the past days about how people
>use 1pt, 2pt, etc. matches to their advantage to boost
>their ratings.


Oh great, more of that ignorant theory that the cube is somehow less
skillful. Get a clue Murat. Just because you don't like it doesn't mean it
takes less skill (and should therefore be labeled with stupid names). In fact I still
hold that the combination of perhaps slightly less checker play skill plus
cube skill still adds up to more required skill in the cubeful game. In
fact double/play-on decisions, along with "should I blitz with the cube in my
hand or play safer because I have the cube" types of checker play
decisions, probably more than make up for any lack of skill needed in the
straight checker game as you claim.

>
>This is it. Simple, and I believe it would eliminate all
>the shortcomings I have read about so far.


It makes up for none of the shortcomings.

>In a related area, I would also suggest that dropped
>matches be forced to resume by the server. This could
>be achieved by not letting the dropping party play
>with anybody else (when the other party is logged on)
>until the dropped game is resumed/finished. To prevent
>the other side from dragging his feet also, when both
>players are on-line, the server could keep that player
>from starting another game and autoexecute for them an
>invite/accept sequence.


Doesn't sound like it would work. Imagine this very possible scenario.
Player A is playing a match. Player B logs in (player B has a match to
resume with A that A left for very valid reasons such as a bad network
connection or an emergency, or maybe his boss walked into his office). B is
playing on his lunch hour but can't play because A is in the middle of a
match. It wasn't B's fault that A had an emergency, but now B can't play
because he doesn't have enough time to wait on A and then play (again, B is
on his lunch hour or in another time-limited situation). How is this fair to
anyone? Also consider the fact that when A finishes the match in progress
he may absolutely _need_ to leave. The automatic invite/accept idea just
won't work. The only way it would work would be to modify the idea so that
if both players are logged on _and_ in the ready state at the same time, then
the person cannot play another match. Of course droppers will just go to
"not ready" or log out. It solves nothing.

The only reasonable way to solve the dropper problem is one I saw on VOG. If you
need to leave you can A) request to postpone the match, and your opponent
decides whether to accept or not, or B) leave the "table" and therefore resign the
match. But even this has the problem that your opponent (especially if
losing) might just decide to be a jerk and not allow a postponement so that you
have to resign. The other downfall is that a lost connection is not an
automatic postponement (which is necessary in order not to penalize people for
a bad internet connection). The problem here is that someone can just kill
their internet connection and still be a dropper. Droppers are a rare enough
situation that I think we should all just live with it. I know that I have 2
droppers on my saved games list right now at FIBS and I think I can live
with it.

Please spare us your half-baked ideas about


OSMAN

Oct 11, 1998

Murat Kalinyaprak wrote:
>
> During the past few days, there has been some discussion
> about the obvious deficiencies in the current rating
> systems. Without repeating the points already made,
> I'll get right to my suggestions.
>
> Start by dumping the overly complicated, pretentious
> formulas and use one that is at least simple enough
> to get rid of those ridiculous decimal digits in the
> ratings.

[snip]

But there is a reliable rating system already in existence that has been
field tested, validated, and improved upon over decades of rigorous
use. The chess rating system had to be fairly reliable, since the great
majority of "official" tournaments use the "Swiss system", which heavily
depends on the participants' ratings and is only successful if the ratings
are reasonably accurate.

The BG rating systems already use a formula similar to that of chess
(i.e., Elo), but we are adopting it only partially. My earlier
suggestion of rating only tournament matches (5 points or more) and
adopting the "provisional rating" system to replace the 1500 or 1600
baseline starting point will move us considerably ahead.

Why re-invent the wheel when there is already a system that works?

--
Osman F. Guner
os...@prodigy.net
http://pages.prodigy.net/osman

Gary Wong

Oct 11, 1998

mu...@cyberport.net (Murat Kalinyaprak) writes:
> Start by dumping the overly complicated, pretentious
> formulas and use one that is at least simple enough

Whoa! Since when was the rating formula pretentious? In my opinion
the Elo system IS simple, to the point of being beautiful. You can
derive it from only two assumptions:

1) That ratings differences are linearly transitive, ie. if A is 150 points
"better" than B, and B is 75 points "better" than C, then A must be
225 points "better" than C. This may sound trivial, but is actually
fundamentally important.

2) That ratings differences are interpreted logarithmically as the odds
of the favourite winning a single trial. For instance, if in some
particular Elo system, 500 points was odds of 2:1 for the favourite,
then 1000 points must mean odds of 4:1, 2000 points implies 16:1, etc.

(You need additional assumptions to cope with different length matches, but
that's not important right now. Personally I think BOTH assumptions are
technically invalid, but that's beside the point :-) You need to make
some assumptions or you can't have a system at all; and these two are as
good as any.)

So there you have it: the Elo system summed up in 8 lines. Given that
his system can be deduced from 8 lines of assumptions, and your
explanation of your "improved" system required over 40, I think the
Elo system is MUCH simpler! Not only that, but the simplicity of the
assumptions means you can effectively regard them as axioms in a
simple formal system, and derive all sorts of useful theorems and
prove certain properties about the system.
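
To make that concrete, here is a small sketch of how the two assumptions pin
down the formula: linear transitivity plus logarithmic odds force something
of the form odds(D) = base**(D/scale), since then odds(D1+D2) =
odds(D1)*odds(D2). The base of 10 and the 2000-point scale below are the
FIBS-style constants (200 points = 1dB) quoted later in this thread; treat
the exact numbers as assumptions:

def win_probability(r_favourite, r_underdog, base=10.0, scale=2000.0):
    """Probability that the higher-rated player wins a single game."""
    d = r_favourite - r_underdog
    odds = base ** (d / scale)     # e.g. d = 2000 gives 10:1 odds
    return odds / (1.0 + odds)

print(round(win_probability(1700, 1500), 3))   # a 200-point favourite: ~0.557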

> to get rid of those ridiculous decimal digits in the
> ratings.

Why are decimal digits ridiculous? They're necessary so that short term
rating "signals" (of the order of 2 points) aren't influenced by rounding
"noise" (of the order of 0.01 points, if you keep 2 decimal places).

> For example, start new players at an arbitrary entry
> rating like 1000, which sounds like a nice round number to
> me. Then pick another number to be used as a player's
> "window" size; for this, 100 (=10% of 1000) sounds like a
> good number to me also. It may be better for this to
> be a percentage (e.g. 200 at a 2000 rating), since the
> number of players at the extremities may be sparse.
> Then limit players to choose opponents within their
> "window" only and count each game simply as 1 point.

Hang on, I don't like this one bit! A good rule of thumb when designing any
service is to provide _mechanism_, not _policy_. (In this context, mechanism
is rating players at all; policy is dictating who they can and cannot play
with.) Who are you to decide who I'm allowed to play against? Some of my
most enjoyable games of backgammon have been against opponents significantly
stronger or weaker than me (by as much as 400 points). I would resent any
policy that denied me games against them!

> I would like to hear what deficiencies anybody may see
> in my suggestions or what refinements they may have to
> propose...?

One deficiency I see in your system (besides those above) is that it is
"unstable". A very nice property of Elo ratings is that they predict
the probability of the favourite winning in any match; this allows you
to assign the "reward" for winning or losing to have an expected value
of zero. That is, if two players are at their "true" ratings (ie. the
prediction is perfectly accurate), then they don't expect to win or
lose anything by playing. This is true regardless of their ratings!
(Loosely speaking, if an underrated player plays a match, her expected
gain is positive; if an overrated player plays then his expected gain
is negative. Essentially, this equilibrium effect keeps the system
stable.)

In your system however, a player at a "true" rating of 1000 would be a
slight favourite against a player with a "true" rating of 950.
However, your system does not adjust the reward to reflect this! The
favourite would therefore have a POSITIVE EXPECTED GAIN, however
slight, which makes the system unstable. By repeatedly playing whoever
was ranked the 50th best player in the system, the best player would be
a favourite to win points (even if it's only 0.01 points per game), and
could rack up an arbitrarily large rating, limited only by the number
of games played. The numbers therefore become meaningless.
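
To see how quickly that adds up, here is a rough Monte Carlo sketch of the
drift. The 55% edge over the bottom-of-window opponent is an assumed figure;
any edge above 50% behaves the same way, only slower or faster:

import random

random.seed(1)
rating = 1000.0
GAMES = 100_000
P_WIN = 0.55           # assumed probability of beating the 50th-best player

for _ in range(GAMES):
    rating += 1 if random.random() < P_WIN else -1

print(rating)          # about 1000 + GAMES * (2*P_WIN - 1), i.e. roughly 11000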

I also notice that you provide no facility for playing tournament-style
backgammon (ie. matches), nor using the cube. While it's certainly fair
to provide some means for two players who do not like using the cube to
play a cubeless game, I think you ought to allow tournament backgammon
for those of us who happen to prefer playing that way!


If you are interested in the mechanics of rating systems, I suggest
you read Prof. Arpad Elo's book, "The Rating of Chessplayers, Past and
Present". Cross out "chess" and write "backgammon" if you prefer :-)

Cheers,
Gary.
--
Gary Wong, Department of Computer Science, University of Arizona
ga...@cs.arizona.edu http://www.cs.arizona.edu/~gary/

OSMAN

Oct 11, 1998

Murat Kalinyaprak wrote:
[snip]
> >Why re-invent the wheel when there is already a system that
> >works?
>
> The question is: does it really work? I think that
> sometimes it's better to toss the old baggage and
> "re-invent"...
>
> MK

No disagreement here (philosophically, that is)... I stand corrected.
One cannot make progress without critically examining and re-examining
the status quo.

On the other hand, science is full of redundant and wasted resources
merely because some scientist fails to fully investigate and absorb the
existing alternatives before "re-inventing."

Murat Kalinyaprak

Oct 12, 1998

In <362097...@prodigy.net> OS...@prodigy.net wrote:

>Murat Kalinyaprak wrote:

>> During the past few days, there has been some discussion
>> about the obvious deficiencies in the current rating
>> systems. Without repeating the points already made,
>> I'll get right to my suggestions.

>The BG rating systems already use a formula similar to
>that of chess (i.e., ELO), but we are adopting it only
>partially. My earlier suggestion of rating only tournament
>matches (5 points or more) and adopting the "provisional
>rating" system to replace the 1500 or 1600 baseline
>starting point will move us considerably ahead.

I have read your articles and found your suggestions
reasonable and possibly quite useful for improving on
the existing systems. I just didn't want to rehash
all the previous arguments made, and although this is
under a new thread title, I indicated that my suggestions
were a follow-up to what was previously argued on the
subject.

>Why re-invent the wheel when there is already a system that works?

Murat Kalinyaprak

Oct 12, 1998

In <3620a5d7.696432@news> hankyou...@home.com wrote:

>I think that I would be horrified if I were online and
>noticed Kit Woolsey or Kent Goulding or a similarly
>highly-ranked player online, was able to get them to
>give me a game, and then was told by the server "Sorry,
>you're not good enough to play them." That "cure" would
>be far worse than any disease.

It looks like many readers misunderstood what I
suggested. Under the system I proposed you would
be able to play against anybody, including the
players named above, but if there is too big of
a difference between your and their ratings, the
result of the match will not be used for rating
purposes.

MK

Murat Kalinyaprak

Oct 12, 1998

In <6vq7mk$d9e$1...@samsara0.mindspring.com> thehub wrote:

>>This is it. Simple

><Snip>

>Sounds much more complicated than the present system to me.

Maybe the way I explained it made it sound complicated...?

What I'm proposing is pretty much this: as long as you
play against opponents within plus or minus 5% of your
rating, 1 game = 1 point. There are no other variables.
What's complicated about this...? If you don't like it
for other reasons, I would have no problem with it...

>>Details like what if Joe rated at 1900 wants to play
>>with Jane rated at 2200 because they also enjoy each
>>other's cyber-company as much as bg may be discussed
>>and solutions like non-rated games may be offered.

>There is already a system for unrated games (on fibs),
>i.e. unlimited matches.

Yes, I know, and I have played many unrated games myself. I
was just giving an example of how such arguments could
be countered, whether by already existing mechanisms,
their modified versions, or by altogether new ones. For
example, this could be implemented as simply as the
server allowing players to invite whomever they please
but not rating the match if the difference between their
ratings exceeds a certain percentage (optionally with a
courtesy notice to both players at the beginning of the
match that the match will not be rated).

>>In a related area, I would also suggest that dropped
>>matches be forced to resume by the server. This could
>>be achieved by not letting the dropping party play
>>with anybody else (when the other party is logged on)
>>until the dropped game is resumed/finished. To prevent
>>the other side from dragging his feet also, when both
>>players are on-line, the server could keep that player
>>from starting another game and autoexecute for them an
>>invite/accept sequence.

>Again, sounds complicated. Who will determine, and how, which player is
>the dropper and which the droppee (sp?)?

I agree that this issue can get really complicated
and since it's a different/secondary issue anyway,
I will not pursue it any further but will leave it
to others to discuss it if they wish.

>I don't like a system that limits who I am allowed to play.
>Although you offer no formula it seems to me that such a
>system would distort the idea of a uniform rating system,
>resulting in a myriad of meaningless little "window" ratings.

Maybe you are misunderstanding because I didn't do a
good job of explaining things. There is not a
"myriad of windows". All it means is that players who
are beyond a certain distance above or below your
rating will not be "visible" to you for purposes
of playing rated games with them...

This concept is adopted in innumerable kinds of
competitions (especially where luck is not a big
factor).

>If you want to regulate who plays whom, why not a system
>that forces you to play other players of ALL rating levels?

I don't understand what would be the benefit of
this at all...?

>How could limiting free interaction among players add to
>the collective experience?

Limitation only exists for and applies to ratings.
I'm not suggesting any other limitations on who
can play with whom.

>Any rating system (as well as any set of rules for a game)
>is completely arbitrary. As long as everyone lives under
>the same set of rules, I am perfectly satisfied with the
>present system.

Fine. I fully respect your opinion also.

>A rating is, after all is said and done, just a rating.

It's not because it matters all that much to me
that I'm offering suggestions but because I see
that it seems to matter to a lot of people. I'm
just trying to pitch in my little contribution,
hopefully for the better...

MK

Murat Kalinyaprak

Oct 12, 1998

In <6vqggl$e4d$1...@samsara0.mindspring.com> Vince Mounts wrote:

>>This "sliding window" scheme will eliminate the need
>>to worry about and compute who is how likely to win
>>against whom.

>I believe that is the entire purpose of a rating system.
>To predict who will win. Because that is another way of
>saying "who is the better player".

The better of two players is the one who wins the
match. Based on that, you may choose to predict
that he is likely to win the next match also, for
certain purposes, to a certain extent. What I
find idiotic is the practice of allowing a rated
match between a 1900 and a 1200 rated player and
then trying to make up for it by adjusting the
points earned or lost by each player using fancy
computations.

Since some writers resort to analogies with sports
quite often, let me try to use one. Let's say we
want to match two boxers for a million dollar bout.
If we did it using the FIBS rating formula, we would
end up putting a 140 pound guy against Mike Tyson,
computing that his chances of winning are 0.02%, and
based on that splitting the prize as 10,000 dollars
for M. Tyson if he wins and 990,000 for the skinny
dude if he wins. Of course, this is for a 1-round
match. If it's a multi-round match, then we would
multiply those amounts by the square root of the
number of rounds. I don't mean to offend anyone
but I find the concept/practice uselessly complex.

>Your system makes no sense whatsoever.

Maybe because you didn't understand it well enough...?

>And according to your system if I play someone rated
>-1000 i get the same ratings bump I would if I played
>someone rated +10,000. This only makes the situation
>much worse not better.

According to my system, you won't be able to play
rated matches against opponents with ratings more than
5% (or something similar) above or below your rating.
Sorry, but your playing against someone rated
-1000 or +10,000 is simply out of the question... :)

However, you are right that players would earn/lose
the same amount of points (i.e. 1/game) whether
they played the lowest or the highest rated player
within their "window". Nobody has to buy it, but
it's true that my proposition compromises some of
the current accuracy for more simplicity. I believe
that the loss won't be significant, because the
players who appear at the bottom of your window are
not obligated to let you pick on them, if that's what
you would decide to do... This is so because they
themselves would be seeing other players at the
bottom of their own windows, whom in turn they may
try to pick on also...

>And it does nothing to "fix" the "problems" with the 2 point
>matches either (newbies can still be taken advantage of).

This is correct. My proposal does at least include
rating 1 point and multi-point matches separately.
Small but nevertheless an improvement, isn't it?

>I believe the experience factor is used to tell the
>ratings formula when to "stabilize".

Current systems are criticized for not achieving
this quickly enough. My proposition does away
with it completely for the sake of simplicity but
a more efficient mechanism similar to what's used
in chess rating systems (as suggested by others)
may be incorporated into what I proposed.

>>Keep all players' ratings in 3 different numbers as
>>1723/1644/1802 for lackgammon/backgammon/jackgammon
>>which are obviously different enough to cause the
>>arguments made during the past days about how people
>>use 1pt, 2pt, etc. matches to their advantage to boost
>>their ratings.

>Oh great, more of that ignorant theory that the cube
>is somehow less skillful. Get a clue Murat. Just
>because you don't like it doesn't mean it takes less
>skill (and should therefore be labeled with stupid names).

Try to understand what you read first before you
react so strongly. My proposition says nothing
about one being more or less skillful than the
other. It simply differentiates among them. Don't
just take my word for it either. Arguments made
by other people and in fact the current formulas
themselves prove that they are different enough
to warrant adjustments in calculations. Rating
them separately would be much simpler and make
more sense to the horses than multiplying X by
the square root of Y, etc. etc...

>Please spare us your half-baked ideas about

Are they really so bad as to deserve the term "spare
us"...? I wouldn't say I don't care at all, but I
can't care about "sparing you" to the point of
not writing what I want in this newsgroup. If you
find them so useless, you have the option of not
reading what I write...

MK

Kevin Dickover

Oct 12, 1998

On Mon, 12 Oct 1998 00:28:43 GMT, mu...@cyberport.net (Murat
Kalinyaprak) wrote:

>In <6vq7mk$d9e$1...@samsara0.mindspring.com> thehub wrote:
>
>>>This is it. Simple
>
>><Snip>
>
>>Sounds much more complicated than the present system to me.
>
>Maybe the way I explained it made it sound complicated...?
>
>What I'm proposing is pretty much this: as long as you
>play against opponents within plus or minus 5% of your
>rating, 1 game = 1 point. There are no other variables.
>What's complicated about this...? If you don't like it
>for other reasons, I would have no problem with it...

As a rule, when I log on to FIBS I search for opponents that are
within 100 points of me one way or the other, because I find that I get
a better game that way. Frequently an opponent is not available.
What do I do then under your system? Do I wait for someone to finish
playing? Do I log off? Do I make appointments to play only within my
class?
In the real world (well, as real as FIBS gets :-) ) I look for someone
out of my normal range or accept any invitation that I get. When I
log on to FIBS I do so to play backgammon, not to improve my rating.
That may be sort of unusual, but I think more people are playing on
FIBS to play than to get higher ratings (or at least I hope that is
the case, in my naive little way).
(KevinDickover on FIBS)

Solo Deo Favente,
Kevin Dickover

Murat Kalinyaprak

Oct 12, 1998

>mu...@cyberport.net (Murat Kalinyaprak) writes:

>> Start by dumping the overly complicated, pretentious
>> formulas and use one that is at least simple enough

>Whoa! Since when was the rating formula pretentious?

Sorry, this is my opinion of it...

>In my opinion the Elo system IS simple, to the point of
>being beautiful.

It may even be too beautiful for the purpose it's
being used for. Maybe that's why I thought
it was pretentious. The real issue, however, is
whether it produces desirable results as applied.

>You can derive it from only two assumptions:

> 1) That ratings differences are linearly transitive,
> ie. if A is 150 points "better" than B, and B is
> 75 points "better" than C, then A must be 225
> points "better" than C. This may sound trivial,
> but is actually fundamentally important.

I haven't proposed anything to defeat this. To
the contrary, I'm trying to propose something
that would result in more accurate ratings and
thus better transitivity among them...

> 2) That ratings differences are interpreted
> logarithmically as the odds of the favourite
> winning a single trial. For instance, if in some
> particular Elo system, 500 points was odds of 2:1
> for the favourite, then 1000 points must mean odds
> of 4:1, 2000 points implies 16:1, etc.

The question is: why concentrate so much on the
odds of winning? Are we at the racetrack betting
on horses...? I'm surprised that arguments made
against me seem to highlight the luck factor in bg
again and again. At the horse races, adjustments
are made (such as making favorite horses carry extra
weight, etc.) because it's a matter of gambling
more than anything else. If you are going to let
1900 and 1200 rated people play for rating points,
maybe it would be better to adopt the handicapping
formulas used in horse races than the formulas used
in chess. I mean, what is that little X% chance
a 1200 rated player is given based on? Surely not
skill, is it? If it's based on the possibility of
him rolling "good dice", why not just let people
roll dice and do away with playing altogether...?

Sorry to say it, but I think the current approach
in bg rating looks too much like a gambler's (maybe
inadvertently made worse by scientists who love
the beauty of numbers for their own sake).

If we look at other competitions, like the sports
which get used here often as examples, where else
do we see such applications as are used in bg? Just
because they may be the underdogs, does the "Blue
team" get more points when they score a touchdown
against the "Green team"...? They have brackets
(various levels of leagues, etc.) similar to what
I suggested using the term "window". Within each
bracket, approximation in scored points is good
enough in most cases. Now, why won't they let a
high school team play in a national league? Aren't
there any scientists interested in those areas who
could devise beautiful formulas for them so that
they are compensated proportionately based on their
winning chances...? Wouldn't it be better to not
allow something apparently wrong in the first place
rather than rectify it with intricate calculations...?

>So there you have it: the Elo system summed up in 8 lines.
>Given that his system can be deduced from 8 lines of
>assumptions, and your explanation of your "improved"
>system required over 40, I think the Elo system is MUCH
>simpler!

I think not even close. How can it be said to be
simple when players need "rating calculators" in
order to decide if they should invite/join...?
It relies on completely irrelevant/unreliable
variables such as "experience". The resulting
rating is a "combined rating" achieved through
*different* variations of bg, which leads to
arguments like who boosted his rating with 1pt
matches, etc.

You compare your 8 lines of assumptions to my
explaining the actual operation of the system
I proposed. If 40 lines were indeed enough for
you or others to get a fairly good idea of the
concept, I would think that it beats the volume
of some web pages and newsgroup articles which
explain the details of the FIBS rating system.

>> For example, start new players at an arbitrary entry
>> rating like 1000, which sounds like a nice round number to
>> me. Then pick another number to be used as a player's
>> "window" size; for this, 100 (=10% of 1000) sounds like a
>> good number to me also. It may be better for this to
>> be a percentage (e.g. 200 at a 2000 rating), since the
>> number of players at the extremities may be sparse.
>> Then limit players to choose opponents within their
>> "window" only and count each game simply as 1 point.

>Hang on, I don't like this one bit! A good rule of thumb
>when designing any service is to provide _mechanism_, not
>_policy_. (In this context, mechanism is rating players
>at all; policy is dictating who they can and cannot play
>with.)

I would prefer to see it as part of the mechanism
rather than get allergies from words like "policy",
but policy isn't inherently or necessarily always
bad either. If you prefer a chaotic service that
produces meaningless ratings, you have the right
to have that also...

>Who are you to decide who I'm allowed to play against?
>Some of my most enjoyable games of backgammon have been
>against opponents significantly stronger or weaker than
>me (by as much as 400 points). I would resent any
>policy that denied me games against them!

I did in previous responses to others, but let me
clarify again that the limitation applies to rated
games only. Certainly I wouldn't even think about
preventing you from playing with your under or over
rated friends otherwise. The purpose of this is to
prevent a player from feeding himself points from a
second account, etc. The question is whether we want
a more accurate rating system, where a rating actually
means what it is supposed to mean. The alternative is
discussions in newsgroups on subjects like whether
"BROHAM"'s or some other player's rating is inflated
or not, etc...

>In your system however, a player at a "true" rating of
>1000 would be a slight favourite against a player with
>a "true" rating of 950. However, your system does not
>adjust the reward to reflect this! The favourite would
>therefore have a POSITIVE EXPECTED GAIN, however slight,
>which makes the system unstable. By repeatedly playing
>whoever was ranked the 50th best player in the system,
>the best player would be a favourite to win points (even
>if it's only 0.01 points per game), and could rack up an
>arbitrarily large rating, limited only by the number
>of games played. The numbers therefore become meaningless.

Your first observation above is correct. I don't
see considerable harm in giving up a little
accuracy for the sake of simplicity. I don't think
the results would be as drastic as you predict.
The resulting ratings may be called "meaningless"
by some absolute measure, but compared to ratings
resulting from the current systems, they surely
couldn't be any more "meaningless".

What would prevent that is the fact that all
players' "windows" would be dynamic and the inaccuracy
would be self-checking. Let me try to explain this
a little and see if it makes sense to you all.

Let's take as an example a player "A" rated at 1000
and a player "B" rated at 950. After "A" beats "B",
"B" will drop out of "A"'s window. Unless "B" can
constantly keep up with "A" in sliding his window
up by beating other players, "A" won't be able to
keep preying on "B". Yes, "A" can always try to
prey on the lowest rated player in his window, but
as he climbs up, at least he will have to prey on
higher and higher rated ones. This is much more of
an improvement over the current system, which allows
a 1900 rated player to prey on 1200 rated players
*forever*! Also, because of the inaccuracy itself,
"A" will never know for sure that he is an absolute
1000 rated player nor that "B" is an absolute 950 rated
player. In this example "A" may really be a 964
rated player trying to prey on a 978 rated player
and may be in for a surprise loss (assuming an
absolute! link between rating and outcome)... I'm
sure others may work out the numbers better than
I could (which I didn't even attempt), but I think
that as long as the rounding error applies to all
players' ratings, the errors will for the most part
cancel each other out and not have any effect
beyond the negligible. And if this can be shown with
hard numbers, we can say that more accurate and
meaningful ratings can be achieved while at the
same time eliminating the piddling with decimal
points, etc.

Personally, I would be happier to know my and
other players' ratings to within 5% accuracy than
having to read articles here about how a 2100+
rated player's true rating on FIBS should really
be around the 1850s, blah, blah... Such things
alone make those rating systems pretty close
to worthless.

>I also notice that you provide no facility for playing
>tournament-style backgammon (ie. matches), nor using
>the cube. While it's certainly fair to provide some
>means for two players who do not like using the cube to
>play a cubeless game, I think you ought to allow
>tournament backgammon for those of us who happen to
>prefer playing that way!

I didn't propose anything to eliminate the existing
options. I simply suggested that ratings achieved
through what I consider 3 distinct varieties of bg
be itemized as 3 separate ratings. You would of
course be free to add them up and divide by 3 to
come up with a combined/overall rating...

>If you are interested in the mechanics of rating systems,
>I suggest you read Prof. Arpad Elo's book, "The Rating of
>Chessplayers, Past and Present". Cross out "chess" and
>write "backgammon" if you prefer :-)

I have refrained from saying it in this newsgroup thus
far, but I have to say bg ain't no chess... I think
bg players are stuck in a peculiarly tough spot
between luck and skill. I don't know if any bg
rating system (except mine :) can go beyond being a
cross between chessplayer rating and horse race
handicapping...

MK

Gary Wong

Oct 12, 1998

mu...@cyberport.net (Murat Kalinyaprak) writes:
> In <wtiuhqy...@brigantine.CS.Arizona.EDU> Gary Wong wrote:
> >mu...@cyberport.net (Murat Kalinyaprak) writes:
> >> Start by dumping the overly complicated, pretentious
> >> formulas and use one that is at least simple enough
>
> >Whoa! Since when was the rating formula pretentious?
>
> Sorry, this is my opinion of it...

OK, I guess this is probably a matter of opinion, there's not much point in
anybody trying to convince anybody else whether it is or isn't :-)

> > 1) That ratings differences are linearly transitive,
> > ie. if A is 150 points "better" than B, and B is
> > 75 points "better" than C, then A must be 225
> > points "better" than C. This may sound trivial,
> > but is actually fundamentally important.
>
> I haven't proposed anything to defeat this. To
> the contrary, I'm trying to propose something
> that would result in more accurate ratings and
> thus better transitivity among them...

Perhaps I should have been more explicit and said that (in conjunction with
assumption 2) you can show that ratings points are in some definite units.
In fact, the units measure the ratio of the two players' winning
probabilities. Such ratios are often measured in decibels; it so happens
that in the FIBS implementation of the Elo system, 200 FIBS rating points
are equal to 1 dB.
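
Spelled out, assuming the single-game scaling that figure implies (odds of
10**(D/2000) for a D-point favourite):

import math

def odds_in_db(rating_difference):
    odds = 10 ** (rating_difference / 2000.0)
    return 10.0 * math.log10(odds)     # simplifies to rating_difference / 200

print(odds_in_db(200))    # 1.0 dB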

In your system, ratings points don't directly correspond to anything at
all. (James points out that you are essentially proposing a _ranking_
system; in which the ORDER of ratings are meaningful, but the numbers
themselves are not.)

In particular, your system does NOT provide the transitivity property.
Let player A have a true rating of 1500 (where "true rating" means
"the limit on the expected value of the player's rating as the number
of games played increases without bound".) Assume that the
transitivity property holds; ie. X is a better player than Y if and
only if his true rating is higher. The trouble is, ANYBODY whose true
rating is above 1500 can achieve and maintain a rating of 1550 simply
by repeatedly playing against A! (To be precise, if you are better
than A, then the limit of your expected rating as the number of games
played increases without bound is A's rating plus 50, or whatever
"window" size you choose. I assume we all agree this is the case by
inspection; it could be shown formally with a stochastic model.)
Therefore, there is NO difference between a rating of 1501 and 1550.
This means that the former assumption is invalid and your system does
NOT guarantee transitivity.
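
Here is a toy version of that stochastic model. It assumes A's rating stays
pinned at 1500 (he recoups against the rest of the field whatever he loses
to us), that we beat him 60% of the time, a window half-width of 50, and one
point per rated game; the exact numbers are assumptions, not part of the
argument:

import random

random.seed(2)
A_RATING = 1500.0
WINDOW = 50.0
P_WIN = 0.60           # any probability above 0.5 gives the same cap

rating = 1500.0
games = 0
while abs(rating - A_RATING) <= WINDOW:    # A is still inside our window
    rating += 1 if random.random() < P_WIN else -1
    games += 1

print(rating, games)   # about 1551 after a few hundred games: A's rating plus
                       # the window (give or take a point), however good we are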

In comparison, any Elo system DOES guarantee transitivity (subject to
the two assumptions given). True ratings ARE transitive, and every player's
rating does converge in probability to their true rating.

> > 2) That ratings differences are interpreted
> > logarithmically as the odds of the favourite
> > winning a single trial. For instance, if in some
> > particular Elo system, 500 points was odds of 2:1
> > for the favourite, then 1000 points must mean odds
> > of 4:1, 2000 points implies 16:1, etc.
>
> The question is: why concentrate so much on the
> odds of winning. Are we at the racetrack betting
> on horses...? I'm surprised that arguments made
> against me seem to highlight the luck factor in bg
> again and again.

I try very hard to avoid using the word "luck" when I don't mean it. I
definitely don't mean it in this case. I am NOT talking about dice, or
lucky rolls, or anything like that. I'm talking about probability only
because the quantity "the result of a game between A and B" is a random
event. At this level of abstraction, it doesn't even matter what
game it is! But appreciating that the outcome of ANY game (whether it's
backgammon, chess, soccer, or arm wrestling) is a random event is
vitally important. I would even go as far as to say that A is a better
player than B _BECAUSE_ A has a greater probability of winning. The
fact that the outcome of a game is uncertain is absolutely essential;
the mechanics of how the game is played is not. It is purely coincidental
that backgammon happens to include random elements (dice) as part of the
process of playing the game; it does not and should not affect the rating
system one bit.

> I mean, what is that little X% chance
> a 1200 rated player given based on? Surely not
> skill, is it? If it's based on the possibilty of
> him rolling "good dice", why not just let people
> roll dice and do away with playing altogether...?

No! The probability of a 1200-level player winning a game is ENTIRELY
determined by his (lack of) skill. When it comes down to it, "skill"
measures the ability of somebody to play a game without making mistakes.
The probability of the favourite winning a game of backgammon can be
predicted PERFECTLY (if you have a good enough model) by knowing the
distribution of mistakes each player is expected to make; "good dice"
has absolutely nothing to do with it.

> Sorry to say it but I think the current aproach
> in bg rating looks too much of a gambler's (maybe
> inadvertently made worse by scientist who love
> the beauty of numbers for their own sake).

I apologise if it looks that way. Perhaps I'm using misleading language or
something, I don't seem to be able to write clearly what I mean. Measuring
the outcomes of games is UNAVOIDABLY probabilistic, purely because the
result is uncertain. The Elo system isn't probabilistic merely to appeal
to bookmakers or because somebody with a mathematical fetish happened to
like it; it's the way it is because probabilities are an inherent,
unavoidable, integral, essential part of what we are measuring. Any metric
which ignores the probabilities is not doing justice to the system in
question.

If you restrict yourself to ignoring probabilities, the best ranking system
I can think of is the simple ladder system ("if you beat the player above
you, switch places with them"). It's even simpler than the Elo system and
makes only one assumption (that a total ordering exists on the "skill" of
all players). Unlike your system, it is stable (the expected ranking of
players does converge to the "true" ranking, given arbitrarily many games),
but you can say NOTHING about players except statements like "X is better
than Y". There's no way to say how MUCH better X is.

> If we look at other competitions, like sports
> which get used here often as examples, where else
> do we see such applications as used in bg? Just
> because they may be the underdogs, does the "Blue
> team" get more points when they score a touchdown
> against the "Green team"...?

Of course not! The match is played exactly the same way, regardless of the
players. The only thing that differs is our interpretation of the match
results. If Brazil beat Stuart Island (a small island in New Zealand which
I presume does not have a particularly strong soccer team) at soccer, our
reaction is "so what?"; but if Stuart Island beat Brazil then we would think
"wow! Perhaps those guys are better than we thought they were!". The Elo
system just goes that little bit further and allows us to predict that we
expect (for instance) 70 "so what?"s for every "wow!" :-)

> I think not even close. How can it be said to be
> simple when players need "rating calculators" in
> order to decide if they should invite/join...?

I would argue that they don't. Perhaps some people like them, or are
curious as to what the numbers are, but ideally there is no benefit
in accepting or declining games based on the ratings involved.

> It relies on completely irrelevant/unreliable
> variables such as "experience".

It doesn't _rely_ on experience for anything. The FIBS implementation of
the Elo system happens to use experience as a heuristic (ie. players
with low experience have their ratings changes scaled to be larger than
normal in an attempt to converge quickly to their "true" rating), but this
is purely an extra optimisation. You could omit the experience calculation
entirely and it wouldn't change the ratings system one bit (except that
new players would take longer to reach their "true" rating).
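
For concreteness, the ramp looks roughly like this; the constants are the
ones usually quoted for FIBS and should be treated here as assumptions
rather than a definitive description of the server:

def ramp_multiplier(experience):
    """Scale factor applied to a new player's rating change."""
    return max(1.0, 5.0 - experience / 100.0) if experience < 400 else 1.0

for exp in (0, 100, 300, 400, 1000):
    print(exp, ramp_multiplier(exp))   # 5.0, 4.0, 2.0, 1.0, 1.0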

> The resulting
> rating is a "combined rating" achieved through
> *different* variations of bg, which leads to
> arguments like who boosted his rating with 1pt
> matches, etc.

Yes, your criticism is quite valid. Loosely speaking, it results from
a third assumption (that separate points in a single match are
stochastically independent) which is unjustified. Maintaining separate
ratings for different length matches, as you suggest, or "fixing" the
predictions with (for example) match equity adjusted probabilities
would go a long way to correcting this effect.

> >In your system however, a player at a "true" rating of
> >1000 would be a slight favourite against a player with
> >a "true" rating of 950. However, your system does not
> >adjust the reward to reflect this! The favourite would
> >therefore have a POSITIVE EXPECTED GAIN, however slight,
> >which makes the system unstable. By repeatedly playing
> >whoever was ranked the 50th best player in the system,
> >the best player would be a favourite to win points (even
> >if it's only 0.01 points per game), and could rack up an
> >arbitrarily large rating, limited only by the number
> >of games played. The numbers therefore become meaningless.
>
> Your first observation above is correct. I don't
> see a considerable harm in giving up a little
> accuracy for the sake of simplicity. I don't think
> the results would be as drastic as you predict.

Why not? Loosely speaking (and ignoring cheating for the moment), the only
way to improve your rating in an Elo system is to play better than your
current rating predicts you will. In other words, you expect your rating
to go up if and only if you are underrated. In your system though, you
expect your rating to go up if and only if you are the favourite against
your opponent.

> >If you are interested in the mechanics of rating systems,
> >I suggest you read Prof. Arpad Elo's book, "The Rating of
> >Chessplayers, Past and Present". Cross out "chess" and
> >write "backgammon" if you prefer :-)
>
> I witheld from saying it in this newsgroup thus
> far but I have to say bg ain't no chess... I think
> bg players are stuck in a peculiarly tough spot
> between luck and skill. I don't know if any bg
> rating system (except mine:) can go beyond being a
> cross between chessplayer rating and horse race
> handicapping...

In my opinion, there should be no difference (from the point of view of
the rating system) between rating chess players, backgammon players,
horses, whatever. The rating system doesn't know or care what it's being
applied to. The Elo system depends only on the two assumptions I gave
at the beginning (plus the third one -- that points in a match are
stochastically independent -- in the case of varying length matches).
Backgammon, chess, and horse racing all fulfill the first two assumptions
fairly well (close enough to use in practice, anyway). Chess, but not
backgammon, fulfills the third. This effect leads to the only significant
deficiency we see in the FIBS implementation of the Elo system (that
the higher or lower rated player may be favoured, depending on the
match length). The rest of the system isn't broken, so we shouldn't try to
fix it.

Murat Kalinyaprak

Oct 13, 1998

In <362b7190....@news.erols.com> Kevin Dickover wrote:

> As a rule, when I log on to FIBS I search for opponents
> that are within 100 points of me one way or the other
> because I find that I get a better game that way.

Ok, so you are self-imposing a "window" (bracket)
similar to what I proposed for rated matches. In
fact, I was hoping that what I proposed would suit
most players based on my assumption that most are
probably already doing it by their own choice.

> Frequently an opponent is not available. What do I do
> then under your system? Do I wait for someone to finish
> playing? Do I log off? Do I make appointments to play
> only within my class?

If you wanted the result of that match to apply
toward your rating, then the answers to any/all
those questions could/would be "yes".

> In the real (well, as real as FIBS gets :-) ) I look
> for someone out of my normal range or accept any
> invitation that I get. When I log on to FIBS I do so
> to play backgammon, not to improve my rating.

In that case, you wouldn't feel any more restricted
because you could play unrated matches against any
opponent you wish and/or find available just as you
are doing now.

> That may be sort of unusual but I think more people are
> playing on FIBS to play then to get higher ratings (or
> at least I hope that is the case in my naive little way)

I myself don't mind my rating (at least not yet) and at
the moment play almost as though I were conducting
some experiment, playing rated or unrated
matches with people I have come to know/like to various
degrees. I share your hope, but I think most people
do care about their ratings quite a bit. I don't
necessarily see it as something good or bad, and
I'm participating in this discussion not for my
own sake but for the sake of the subject itself.

MK

Murat Kalinyaprak

Oct 13, 1998

In <wtbtnhx...@brigantine.CS.Arizona.EDU> Gary Wong wrote:

>mu...@cyberport.net Murat Kalinyaprak writes:

>In your system, ratings points don't directly correspond to
>anything at all. (James points out that you are essentially
>proposing a _ranking_ system; in which the ORDER of ratings
>are meaningful, but the numbers themselves are not.)

Are you guys trying to baffle me with word plays...? :)

>In particular, your system does NOT provide the transitivity
>property. Let player A have a true rating of 1500 (where "true
>rating" means "the limit on the expected value of the player's
>rating as the number of games played increases without bound".)
>Assume that the transitivity property holds; ie. X is a better
>player than Y if and only if his true rating is higher. The
>trouble is, ANYBODY whose true rating is above 1500 can achieve
>and maintain a rating of 1550 simply by repeatedly playing
>against A! (To be precise, if you are better than A, then the
>limit of your expected rating as the number of games played
>increases without bound is A's rating plus 50, or whatever
>"window" size you choose. I assume we all agree this is the
>case by inspection; it could be shown formally with a stochastic
>model.) Therefore, there is NO difference between a rating of
>1501 and 1550. This means that the former assumption is invalid
>and your system does NOT guarantee transitivity.
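
(To make the limiting behaviour Gary describes
concrete, here is a rough simulation sketch. The
assumptions are mine: 1-point games, ratings that
move by 1 point per game, and a pairing that is
allowed only while the two ratings are within 50
points of each other. This is not code from any
server, just an illustration.)

import random

def limiting_rating(p_win, window=50, games=200000, start=1500):
    """Random-walk sketch: B (who wins each 1-point game against A
    with probability p_win) plays nobody but A.  The winner of each
    game moves up 1 point, the loser down 1 point, and the pairing
    is allowed only while the two ratings are within `window` points
    of each other."""
    rating_a, rating_b = start, start
    for _ in range(games):
        if abs(rating_b - rating_a) > window:
            break                      # the window rule forbids further games
        if random.random() < p_win:    # B wins this game
            rating_b += 1
            rating_a -= 1
        else:
            rating_a += 1
            rating_b -= 1
    return rating_a, rating_b

# A slightly better player (55%) and a far better player (90%) both
# end up parked at roughly A's rating plus the window size.
for p in (0.55, 0.90):
    a, b = limiting_rating(p)
    print(f"p_win={p:.2f}: A={a}  B={b}  gap={b - a}")

However good B really is, the gap settles at about
the window size, which is the 1501-versus-1550
point made above.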

Your argument is a "beautiful" one, so I quoted it in
its entirety. Now comes my counter-argument:

Your system does NOT guarantee transitivity either.
Based on the whole numbers you used as examples, you
don't differentiate between a rating of 1501.1 and
1501.2! I hope there is no need to repeat back your
entire above argument by just replacing the numbers.

It's all merely a matter of degree of comfort. I
feel comfortable with 5% accuracy and find FIBS
ratings carried to two decimal places overkill
(i.e. "pretentious").
But I could just as well turn around and argue that
in the current system there is NO difference between
a rating of 1501.0000001 and a rating of 1501.0000002
and that therefore it does NOT guarantee transitivity.

>In comparison, any Elo system DOES guarantee transitivity
>(subject to the two assumptions given). True ratings ARE
>transitive, and every player's rating does converge in
>probability to their true rating.

Sorry, but unless I'm completely misunderstanding
what you are saying, it all boils down to degree
of accuracy. I'll also skip many of the arguments
below for the same reason.

>> I mean, what is that little X% chance
>> a 1200 rated player given based on? Surely not
>> skill, is it? If it's based on the possibilty of
>> him rolling "good dice", why not just let people
>> roll dice and do away with playing altogether...?

>No! The probability of a 1200-level player winning a game
>is ENTIRELY determined by his (lack of) skill. When it
>comes down to it, "skill" measures the ability of somebody
>to play a game without making mistakes. The probability of
>the favourite winning a game of backgammon can be predicted
>PERFECTLY (if you have a good enough model) by knowing the
>distribution of mistakes each player is expected to make;
>"good dice" has absolutely nothing to do with it.

I must say that I admire your courage in using (and
capitalizing!) words like "ENTIRELY", "PERFECTLY", etc.
when talking about a game that involves dice and in
which we can't even determine what a "mistake" is...
(Actually there is a philosophical argument that
"mistakes" always happen in the past, never at the
present or in the future, thus we really shouldn't
be able to say that someone "will" make a mistake).

Personally, I prefer a more practical approach,
while allowing a little more room for the
possibility that I am wrong. Since I started
playing on FIBS, I remember (hopefully accurately)
one occasion where I played
22 1-point matches against a player rated about 200
points below me. I won 19 of them hands down. Even if
we were to make some allowance because I may not have
yet reached a rating to reflect my cubeless skills,
experiments like this are enough for me to think that
letting players with ratings 300, 400, 500, 600, 700
points apart play for points is a very bad idea...

By your intricate calculations you may argue that a
1200 rated player has a 0.01 or 1.03 or 3.72 (or
whatever) percent chance of winning against a 1900
rated player, but to me all such numbers would for
bg purposes round to nothing more than zero...
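
(For what it is worth, here is what the formula
FIBS itself uses -- quoted further down this
thread as Pu = 1/(10^(D*sqrt(N)/2000)+1) -- would
say about that 22-match run. The function names
are mine, and the matches are treated as
independent trials, which is an assumption:)

from math import comb, sqrt

def underdog_prob(d, n):
    """FIBS-style chance that the lower-rated player wins an n-point
    match when the rating difference is d."""
    return 1.0 / (10.0 ** (d * sqrt(n) / 2000.0) + 1.0)

def at_most(k, n, p):
    """Chance of at most k wins in n independent matches, each won
    with probability p."""
    return sum(comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k + 1))

p = underdog_prob(200, 1)   # about 0.44 for a 1-pointer, 200-point gap
print(f"underdog's chance per 1-point match: {p:.3f}")
print(f"chance the underdog wins 3 or fewer of 22: {at_most(3, 22, p):.4f}")

Under that model, winning only 3 of the 22 comes
out as well under a 1% event.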

>If you restrict yourself to ignoring probabilities, the
>best ranking system I can think of is the simple ladder
>system ("if you beat the player above you, switch places
>with them").

This is a little too drastic compared to what I'm
proposing, which merely compromises some accuracy
and does not eliminate the role of probabilities...
In other words, instead of calculating probabilities
for each pair of opponents, you "bracket" them
(where the probabilities still apply) in groups
within which fractions of 1 point can be ignored
(to whatever degree of comfort). In what I termed
a "window", I group ratings from 950-1050 while
in the system you are defending each rating is a
"window" in itself (i.e. grouping ratings 950-950,
951-951, etc.) Even narrower (fractional) "windows"
are also possible, of course...

I should also stress that a player's "window" as
I'm proposing it is extremely dynamic (i.e. it
moves up or down after each match). A player is
never in the same "window" for more than 1 match.
I hope nobody understands it as players moving
from one bracket to another in chunks of 50 or
100 points. This is not the case! Depending on
the length of a match, it can move by as little
as a single point...
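
(A minimal sketch of the bookkeeping as I read the
proposal -- the rating moving by the match length,
the window being the rating plus or minus 50 --
with helper names that are purely illustrative:)

def window(rating, half_width=50):
    """The range of opponents a player may currently choose for a
    rated match."""
    return (rating - half_width, rating + half_width)

def after_match(rating, match_length, won):
    """The rating simply moves by the match length, up or down."""
    return rating + match_length if won else rating - match_length

r = 1000
print(r, window(r))              # 1000 (950, 1050)
r = after_match(r, 1, False)     # loses a 1-point match
print(r, window(r))              # 999 (949, 1049) -- the window slides down by 1
r = after_match(r, 3, True)      # wins a 3-point match
print(r, window(r))              # 1002 (952, 1052)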

>Of course not! The match is played exactly the same way,
>regardless of the players. The only thing that differs is
>our interpretation of the match results. If Brazil beat
>Stuart Island (a small island in New Zealand which I
>presume does not have a particularly strong soccer team)
>at soccer, our reaction is "so what?"; but if Stuart Island
>beat Brazil then we would think "wow! Perhaps those guys
>are better than we thought they were!". The Elo system
>just goes that little bit further and allows us to predict
>that we expect (for instance) 70 "so what?"s for every
>"wow!" :-)

No. It's not 70 "so whats" for every "wow". Maybe
you meant 70 wins to 1 win ratio...? Because the
difference between a "so what" and a "wow" is
already based on and conveys that ratio. It's just
a different way of measuring/expressing the same
thing, perhaps much less accurate/elegant/etc.
than the ELO formula but sufficient enough for
its purpose.

Imagine that Brazil beats Stuart Island 69-1...
Poor Stuart Island would still watch Brazil go
to the world finals, while if they used the ELO
formula their feat would translate to a 70-69
win and they would go to the world finals instead.
If somebody were to put a bug in their ear, maybe
Stuart Island would start campaigning for adoption
of ELO by FIFA (world soccer association)... :)

BTW: does anybody play for money using ELO...?

>The rest of the system isn't broken, so we shouldn't try
>to fix it.

Based on the discussions going on, I got the
impression that it wasn't working. I guess it
could all depend on what is called the "system",
too. Maybe what you call the "system" is not too
bad, but its implementations have failed. In
that case maybe just a dose of "policy" may
help quite a bit. Incidentally, I had reacted
to that word the way I did because what you
called "policy" could be integrated into my
ratings formula as a few arithmetical operations.
So, if anybody decides to integrate it into the
current formula, I won't call it "policy" either.

MK

Ian Shaw

unread,
Oct 13, 1998, 3:00:00 AM10/13/98
to

Murat Kalinyaprak wrote in message <6vv3rc$5p2$1...@news.chatlink.com>...

>Imagine that Brazil beats Stuart Island 69-1...
>Poor Stuart Island would still watch Brazil go
>to the world finals, while if they used the ELO
>formula their feat would translate to a 70-69
>win and they would go to the world finals instead.
>If somebody were to put a bug in their ear, maybe
>Stuart Island would start campaigning for adoption
>of ELO by FIFA (world soccer association)... :)
>

>
>MK

FIFA does use an ELO-type system to produce a rating for each international
football team. However, it is not actually used for anything important. The
top 32 teams in the rankings do not necessarily play in the World Cup.
To qualify for the World Cup Finals, you have to beat your opponents. If
Stuart Island lose to Brazil, it's Brazil that go to the party whether the
score was 70-1, 70-69 or whatever.

(There are tie-breakers on goal difference etc, but I'm simplifying for
clarity.)

In backgammon, you can enter a competition whatever your rating, and to win
it you have to beat your opponents. Any rating change is irrelevant, to the
extent that it is usually calculated after the event.

Regards,
Ian Shaw

Phill Skelton

unread,
Oct 13, 1998, 3:00:00 AM10/13/98
to
Ian Shaw wrote:

> It's not 70 "so whats" for every "wow". Maybe
> you meant 70 wins to 1 win ratio...? Because the
> difference between a "so what" and a "wow" is
> already based on and conveys that ratio. It's just
> a different way of measuring/expressing the same
> thing, perhaps much less accurate/elegant/etc.
> than the ELO formula but sufficient enough for
> its purpose.
>
> Imagine that Brazil beats Stuart Island 69-1...
> Poor Stuart Island would still watch Brasil go
> to the world finals, while if they used the ELO
> formula their feat would translate to a 70-69
> win and they would go to the world finals instead.
> If somebody were to put a bug in their ear, maybe
> Stuart Island would start campaigning for adoption
> of ELO by FIFA (world soccer association)... :)

No, the result would still be 69-1 to Brazil and Brazil would
still go to the finals. But in the unlikely event of the Stuart
Islands beating Brazil, their standing in the world would go up
by a large amount, whereas they don't go down very far by losing
to Brazil because that's what everybody expects to happen. The ELO
system doesn't alter the outcome of a match.

Phill

Ian Shaw

unread,
Oct 13, 1998, 3:00:00 AM10/13/98
to

Phill Skelton wrote in message <362361...@sun.leeds.ac.uk>...

>Ian Shaw wrote:
>
>> It's not 70 "so whats" for every "wow". Maybe
>> you meant 70 wins to 1 win ratio...? Because the
:
:
:

>> Stuart Island would start campaigning for adoption
>> of ELO by FIFA (world soccer association)... :)
>
>No, the result would still be 69-1 to Brazil and Brazil would
>still go to the finals. But in the unlikely event of the Stuart
>Islands beating Brazil, their standing in the world would go up
>by a large amount, whereas they don't go down very far by losing
>to Brazil because that's what everybody expects to happen. The ELO
>system doesn't alter the outcome of a match.
>
>Phill

Murat wrote that bit in >> marks, not me. I agree with what you're saying.
The ratings are just a bit of fun which may (hopefully) give some guide as
to how good people are relative to each other. The real events are the
football/bg matches. I don't think anybody puts much store by FIFA rankings
BTW; it's just for media hype.

Ian

Chuck Bower

unread,
Oct 13, 1998, 3:00:00 AM10/13/98
to
In article <362361...@sun.leeds.ac.uk>,
Phill Skelton <!remove!this!ph...@sun.leeds.ac.uk> wrote:

>Ian Shaw wrote:
>
>> It's not 70 "so whats" for every "wow". Maybe
>> you meant 70 wins to 1 win ratio...? Because the
>> difference between a "so what" and a "wow" is
>> already based on and conveys that ratio.

(snip)

No. Ian didn't write that (and the rest of the quote which
Phill included but I have snipped). Murat did. Ian just
quoted what Murat said. I know it can get confusing, but we all
need to be careful in crediting (or discrediting) the person who
actually posted the words.


Chuck
bo...@bigbang.astro.indiana.edu
c_ray on FIBS

Kevin Dickover

unread,
Oct 14, 1998, 3:00:00 AM10/14/98
to
On Tue, 13 Oct 1998 06:08:00 GMT, mu...@cyberport.net (Murat
Kalinyaprak) wrote:

>In <362b7190....@news.erols.com> Kevin Dickover wrote:
>

>
>> Frequently an opponent is not available. What do I do
>> then under your system? Do I wait for someone to finish
>> playing? Do I log off? Do I make appointments to play
>> only within my class?
>
>If you wanted the result of that match to apply
>toward your rating, then the answers to any/all
>those questions could/would be "yes".
>

I'm afraid that would be unacceptable. The reality of the situation
is that I more often end up playing someone outside my window than in
it. The window is just where my preferences lie. Secondly, one of the
reasons people play higher-rated people is to learn from their
technique, and while I'm not a ratings hog, when I do defeat a player
that is rated 100 points or so better than me (I nearly always play
5 point matches BTW, so luck is minimized) I do like my rating to
reflect that victory. Conversely, when I am defeated by a weaker (by
rating) opponent I think it is only fair that I take a ratings hit.
If my decision-making is so clouded that I get whipped (because I
play when I am tired), or my opponent has grown as a player recently,
the ratings change is appropriate.

>> In the real (well, as real as FIBS gets :-) ) I look
>> for someone out of my normal range or accept any
>> invitation that I get. When I log on to FIBS I do so
>> to play backgammon, not to improve my rating.
>
>In that case, you wouldn't feel any more restricted
>because you could play unrated matches against any
>opponent you wish and/or find available just as you
>are doing now.
>

I think I would be restricted and I know I would feel restricted.
When I have an opportunity to play an rgb poster I always take the
time to invite. Your system would limit that. And while I do not
love my rating I do like the system.

In fairness, while your system *may* prevent things like jf_level_8
or (fill your favourite FIBS cheater/manipulator in here), it still
won't stop droppers and other ratings hawks. They will just have to
be a little more creative. It strikes me that the benefits of a
switch will not outweigh the losses.

Murat Kalinyaprak

unread,
Oct 14, 1998, 3:00:00 AM10/14/98
to
I, Murat Kalinyaprak, had written:

> Imagine that Brazil beats Stuart Island 69-1...
> Poor Stuart Island would still watch Brazil go
> to the world finals, while if they used the ELO
> formula their feat would translate to a 70-69
> win and they would go to the world finals instead.

and thanks to Ian Shaw and Phill Skelton (and
perhaps others whose responses I haven't seen)
for correcting me on this. What I tried to express
ended up having no relation to reality, which is
of course that Stuart Island would get no points
unless they beat Brazil.

MK

Lou Poppler

unread,
Nov 11, 1998, 3:00:00 AM11/11/98
to
On 12 Oct 1998 17:33:34 -0700, Gary Wong <ga...@cs.arizona.edu> wrote:
: mu...@cyberport.net (Murat Kalinyaprak) writes:

:> The resulting
:> rating is a "combined rating" achieved through
:> *different* variations of bg, which leads to
:> arguments like who boosted his rating with 1pt
:> matches, etc.

: Yes, your criticism is quite valid. Loosely speaking, it results from
: a third assumption (that separate points in a single match are
: stochastically independent) which is unjustified. Maintaining separate
: ratings for different length matches, as you suggest, or "fixing" the
: predictions with (for example) match equity adjusted probabilities
: would go a long way to correcting this effect.

Are there any specific suggestions for this?
I happen to have a laboratory (NOBS) where we could experiment.

The current formula makes use of the square root of the match length,
in some non-linear way, to calculate the probability. With rating
difference D and match length N, the probability that the underdog
wins is:

    Pu = 1 / (10^(D*sqrt(N)/2000) + 1)
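
(In case anyone wants to play with the numbers, a direct transcription
of that formula; the function name is mine:)

from math import sqrt

def underdog_win_prob(rating_diff, match_length):
    """Pu = 1 / (10^(D*sqrt(N)/2000) + 1), as used by FIBS/NOBS."""
    return 1.0 / (10.0 ** (rating_diff * sqrt(match_length) / 2000.0) + 1.0)

# e.g. a 350-point difference at a few match lengths:
for n in (1, 3, 5, 7, 13):
    print(n, round(underdog_win_prob(350, n), 3))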

There exists a set of match equity tables adjusted for the ratings
difference between the players, computed by Tomas Szabo. (The link
to these, http://sg3.organ.su.se/~tsz/equity.html, in Turner's webpage is
not working, but I have a copy at http://www.msen.com/~lwp/szabo.tar.gz ).
For example, here is his table for ratings difference of 350:

Rating difference: 350.00

P L A Y E R 2 N E E D S

1 2 3 4 5 6 7 8 9 10 11 12 13
P
L 1 .599 .777 .840 .892 .911 .945 .957 .972 .978 .986 .989 .993 .994
A 2 .430 .638 .736 .796 .859 .901 .928 .948 .962 .973 .981 .986 .990
Y 3 .359 .565 .669 .736 .804 .855 .891 .918 .938 .955 .966 .975 .981
E 4 .288 .507 .618 .690 .760 .817 .859 .890 .916 .936 .952 .963 .972
R 5 .258 .436 .554 .634 .709 .771 .819 .856 .888 .913 .932 .947 .959
6 .198 .377 .496 .580 .660 .728 .781 .823 .859 .889 .912 .931 .945
1 7 .172 .332 .449 .534 .616 .686 .744 .789 .830 .863 .890 .912 .930
8 .134 .292 .407 .494 .576 .648 .708 .757 .800 .837 .867 .892 .913
N 9 .118 .255 .366 .452 .534 .608 .671 .723 .770 .809 .843 .871 .894
E 10 .092 .222 .327 .412 .495 .570 .635 .690 .739 .782 .818 .849 .875
E 11 .081 .195 .294 .377 .459 .534 .600 .657 .708 .753 .792 .826 .854
D 12 .063 .170 .264 .345 .424 .499 .566 .625 .678 .725 .766 .802 .833
S 13 .055 .150 .237 .315 .392 .467 .534 .593 .648 .697 .740 .778 .811

from which we could presumably use the diagonal list of probabilities
.599, .638, .669, ..., .811 for matches of 1 to 13 points.
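
For comparison, here is a small sketch that puts that diagonal next to
what the current sqrt(N) formula gives for the same 350-point difference
(I am reading the diagonal entries as the favourite's match-winning
chances, if I understand the table correctly):

from math import sqrt

# Diagonal of the 350-point table above (n-away vs n-away, n = 1..13),
# read as the favourite's chance of winning an n-point match.
szabo_diag = [.599, .638, .669, .690, .709, .728, .744,
              .757, .770, .782, .792, .802, .811]

def fibs_favourite_prob(d, n):
    """Favourite's match-win chance under the current formula."""
    return 1.0 - 1.0 / (10.0 ** (d * sqrt(n) / 2000.0) + 1.0)

for n, table_p in enumerate(szabo_diag, start=1):
    print(f"{n:2d}-pointer: table {table_p:.3f}   formula {fibs_favourite_prob(350, n):.3f}")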

Does anyone feel qualified to comment on Tomas' calculations,
or to propose a formula for calculating the underdog's probability
of winning a match, given the match length and ratings difference ?

If we can develop a credible alternative, I will put it into NOBS for
people to try out.

-- Spider

Jeffrey Mendelsohn

unread,
Nov 11, 1998, 3:00:00 AM11/11/98
to
The below is, I believe, a bad idea. The equity table approach does
not account for the better player using the doubling cube better. In
fact, it gives exactly zero weight to this factor. The current FIBS
system - if there were no cube advantage for the better player -
implies longer matches are in favor of the LOWER rated player. The
better player's use of the cube is assumed to make up the difference.

I'm not saying that the square root of match length is an appropriate
value; just that I believe it is better than a pure equity table
approach.

I think the equity table approach modified to account for better use
of the cube is a method to test. However, I don't have any good ideas
on what the weighting should be.
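
(Purely to show the shape of what I mean -- the functional form and the
numbers here are placeholders, not a proposal -- something like this
could be tried, with the cube term fitted against real match results:)

def cube_adjusted_prob(table_prob, match_length, cube_edge=0.005):
    """Hypothetical adjustment: start from the equity-table chance for
    the favourite and add a small extra edge that grows with match
    length (more cube decisions for the better player to get right),
    capped so the result stays a probability.  The 0.005-per-point edge
    is made up; finding the right weighting is exactly the open
    question."""
    return min(table_prob + cube_edge * match_length, 1.0)

# e.g. the 13-point diagonal entry from Tomas' table earlier in the thread:
print(cube_adjusted_prob(0.811, 13))   # 0.876 with the placeholder edge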


- Jeff Mendelsohn

>The current formula makes use of the square root of the match length,
>in some non-linear way, to calculate the probability. With rating
>difference D and match length N, the probability that the underdog
>wins is:
>
>    Pu = 1 / (10^(D*sqrt(N)/2000) + 1)
>
>There exists a set of match equity tables adjusted for the ratings
>difference between the players, computed by Tomas Szabo. (The link
>to these, http://sg3.organ.su.se/~tsz/equity.html, in Turner's webpage is
>not working, but I have a copy at http://www.msen.com/~lwp/szabo.tar.gz ).
>For example, here is his table for ratings difference of 350:

.
.
table deleted
.
.
