
FIBS: Keeping the rankings credible


WHill
Feb 1, 1999

Hi FIBSTERS

This is my fourth year playing at FIBS and I still have no real idea who
the best players are. Excluding the bots, there are about 10 people who I
would say are definitely better players than I am; I've played about 500
different players in total. About 100 of the others are of a similar standard
to myself, the rest being noticeably poorer. The problem is that this does not
bear much relation to the current rankings. Regular opponents who beat me
consistently are often lower ranked.

So, is the ranking list of any real value at the moment? I would say no.
There are too many people manipulating their rankings, too many people
playing within closed groups, and a small group of people just plain
cheating.

So, what's the answer? Well I think publishing the results of the top 100,
and all those entering the top 100, would be a good start. That way, any
cheating becomes obvious, and the good players stand out. Surely this
information could be easily filtered from the FIBS log and displayed at the
fibs.com web site. All you need is a filter on for example "xxxx wins a
....." or "...match against xxxx" where xxxx is each of the top names in
sequence. This could be updated weekly or monthly.
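
For what it's worth, here is a minimal sketch, in Python, of the kind of
filter Bill describes. The log line format, file name, and player list are
all assumptions for illustration, not the real FIBS log format:

    import re

    # Hypothetical names: the real FIBS log format and file name are assumptions.
    TOP_PLAYERS = {"oegger", "crlo", "dorn"}   # stand-in for the current top 100
    pattern = re.compile(
        r"(?P<winner>\S+) wins a (?P<length>\d+) point match against (?P<loser>\S+)")

    with open("fibs.log") as log:
        for line in log:
            m = pattern.search(line)
            if m and {m.group("winner"), m.group("loser")} & TOP_PLAYERS:
                print(line.rstrip())   # collect for the weekly/monthly report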

We had a very strange set of occurrences this week surrounding the rating of
the bot MonteCarlo, whereby at the start of the week its rating had
dropped to 1722 but as of yesterday had climbed back up to a spectacular
1975. Now for an inconsistent human that might not be so surprising, but
surely for an ultra-consistent bot it is very strange. Can anyone explain?
When I posed the question on FIBS it was received with general apathy, only
one person even hazarding that it might be the run of the dice. But surely
if that were so, it would make the ratings next to useless. Or is it perhaps
that we are using an inappropriate rating system adapted from a pure-skill
game like chess?

I would like to add my congratulations to Jerry Grandell for his stunning
wins in Istanbul. I was amused to hear someone comparing his style to that
of Jellyfish, after my little story of a few months back :-) But if the
reports of Jerry's victories over the last few years are correct, then he
has far surpassed anything Jellyfish could hope to achieve :-) One up for
the humans, I say!


Bill Hill, illium on FIBS

Patti Beadles
Feb 1, 1999

In article <B2DBA9E09...@whills.demon.co.uk>,
WHill <ill...@whills.demon.co.uk> wrote:
>We had a very strange set of occurrences this week surrounding the rating of
>the bot MonteCarlo, whereby at the start of the week its rating had
>dropped to 1722 but as of yesterday had climbed back up to a spectacular
>1975. Now for an inconsistent human that might not be so surprising, but
>surely for an ultra-consistent bot it is very strange. Can anyone explain?

This isn't at all surprising, really.

Ed Rybak did some work a while ago wherein he simulated a player who
played only five-point matches against similar opponents, and concluded
that a 100-point rating swing in either direction from "true" rating
was well within the norm.

Therefore, it's not at all surprising that a bot who plays 1-pointers
against all comers would have a 250-point swing.
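
Ed's experiment is easy to repeat. Here is a rough Python reconstruction
(not his code), assuming the commonly published FIBS update rule,
change = 4 * sqrt(n) * (1 - p), with p = 1 / (1 + 10^((opp - you) * sqrt(n) / 2000)):

    import math, random

    def p_win(me, opp, n):
        # Commonly published FIBS win probability for a match to n points.
        return 1.0 / (1.0 + 10.0 ** ((opp - me) * math.sqrt(n) / 2000.0))

    def simulate(true_rating=1800.0, opp=1800.0, n=5, matches=10000):
        # A player of fixed true strength; only the displayed rating wanders.
        rating = low = high = true_rating
        for _ in range(matches):
            won = random.random() < p_win(true_rating, opp, n)
            p = p_win(rating, opp, n)
            rating += 4.0 * math.sqrt(n) * ((1.0 - p) if won else -p)
            low, high = min(low, rating), max(high, rating)
        return low, high

    print(simulate())   # extremes 100+ points from the true rating are routine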

-Patti
--
Patti Beadles |
pat...@netcom.com/pat...@gammon.com |
http://www.gammon.com/ |
or just yell, "Hey, Patti!" | Quisque comoedus est.

Gary Wong
Feb 1, 1999

ill...@whills.demon.co.uk (WHill) writes:
> This is my fourth year playing at FIBS and I still have no real idea who
> the best players are. Excluding the bots, there are about 10 people who I
> would say are definitely better players than I am; I've played about 500
> different players in total. About 100 of the others are of a similar standard
> to myself, the rest being noticeably poorer. The problem is that this does not
> bear much relation to the current rankings. Regular opponents who beat me
> consistently are often lower ranked.

Could you please post any data you have about these games? I'd be
particularly interested to see information of the form "against
opponent X, I won only 17 of the 43 5-point matches we played, even
though I am currently rated 125 points higher". Even better would be
many samples of each player's ratings over the course of those
matches.

> So, is the ranking list of any real value at the moment? I would say no.
> There are too many people manipulating their rankings, too many people
> playing within closed groups, and a small group of people just plain
> cheating.

That's true. But those are all problems that cannot be addressed by a
rating system in any case. There will always be ways to cheat, and
some of them cannot possibly be detected by the FIBS server. If people
choose to play within closed groups, you cannot possibly derive a total
ordering of the players from the results of their games. So we shouldn't
pretend we can solve these problems; we know it's impossible.

> So, what's the answer? Well I think publishing the results of the top 100,
> and all those entering the top 100, would be a good start. That way, any
> cheating becomes obvious, and the good players stand out. Surely this
> information could be easily filtered from the FIBS log and displayed at the
> fibs.com web site. All you need is a filter on for example "xxxx wins a
> ....." or "...match against xxxx" where xxxx is each of the top names in
> sequence. This could be updated weekly or monthly.

The good players and cheaters were generally apparent on the FIBS Ratings
Reports (good players are more likely than normal to appear under
La Creme de la Creme and to some extent Best New Players; cheaters are
more likely than normal to appear under both of those and especially
On The Way Up).

Whether it's worth wasting any attention on cheaters is a different
question :-)

> We had a very strange set of occurrences this week surrounding the rating of
> the bot MonteCarlo, whereby at the start of the week its rating had
> dropped to 1722 but as of yesterday had climbed back up to a spectacular
> 1975. Now for an inconsistent human that might not be so surprising, but
> surely for an ultra-consistent bot it is very strange. Can anyone explain?

I don't think it really needs an explanation, other than a shrug of one's
shoulders and a mention that that sort of behaviour is to be expected from
time to time. All FIBS ratings are subject to sampling error -- that is
unavoidable. My estimate is that the standard error of a rating is around
50 points (somebody could derive a more accurate parameter either
analytically or empirically, if they felt it was worth the effort).
Loosely speaking, that means that if you take a large sample of ratings,
you can expect about 95% of those samples to fall within +/-100 points of
each player's "true" rating (where the true rating is what your observation
would tend to, if you took the mean of n independent samples of a player's
rating as n increased without bound). Bots play about 1000 matches a week,
and I assert that samples taken 1000 matches apart can be considered
stochastically independent for practical purposes. The standard error when
comparing the difference between two independent samples is probably around
70 points: a discrepancy of 250 points between two samples is very unlikely,
to be sure, but not unheard of; when those two samples are self-selected it
isn't quite so strange after all. (If you took samples every week, and
found that this happened significantly more frequently than predicted,
then I would be very surprised and admit that something was probably going
very wrong.)
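
The arithmetic behind the 70-point figure, and the rarity of a 250-point
gap, as a quick check (taking Gary's assumed 50-point standard error at
face value):

    import math

    se = 50.0                              # assumed standard error of one rating
    se_diff = math.sqrt(se**2 + se**2)     # SE of a difference: ~70.7 points
    z = 250.0 / se_diff                    # the observed gap: ~3.5 standard errors
    p = math.erfc(z / math.sqrt(2))        # two-sided normal tail: ~4e-4
    print(se_diff, z, p)                   # rare for one pair of samples, but
                                           # expected eventually over many weeks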

Of course this result does not rule out a systematic bias affecting MC's
rating (perhaps the software changed during that week; perhaps it played
a large number of matches against underrated or overrated players; astrologers
might claim that two blue moons in three months are interacting with MC's
star sign to give it spectacularly (un)lucky dice :-). But since the
sampling error is large enough to plausibly account for the observation,
we shouldn't jump to conclusions and invoke some external influence as
the only possible justification for what we see.

(As a side note: it pays to remember that the standard error is very large
whenever drawing conclusions from rating observations. Now and then I see
posts reading along the lines of "My rating went up 30 points in one night
so I'm obviously improving quickly; but I started playing lower ranked
opponents and it went down by 15 points, so the dice are clearly biased in
favour of the weaker player". Anybody looking at figures like those is
measuring pure noise. There is NO WAY that example observation indicates
anything whatsoever.)

> When I posed the question on FIBS it was received with general apathy, only
> one person even hazarding that it might be the run of the dice.

Sorry to receive your question with general apathy here as well, but I agree
that the run of the dice could quite possibly result in such an observation :-)

> But surely
> if that were so, it would make the ratings next to useless. Or is it perhaps
> that we are using an inappropriate rating system adapted from a pure-skill
> game like chess?

I believe the FIBS system is as useful as can be expected for general
purpose backgammon ratings. The trouble is that a short backgammon
match is a very poor indicator of skill (ie. the result of the match
yields very little information -- there's a little bit of signal and a
lot of noise). Technically, a rating is an _estimator_, a mapping of
a set of samples onto an estimate of a parameter. Estimators from noisy
samples are _inefficient_; they have large variances unless we can get
a big sample size. Note that this problem isn't caused by FIBS or the
particular rating system it uses or anything like that -- it's an
inherent restriction on _anything_ we derive from the results of
matches! In fact it is possible to show that there is a limit on the
accuracy with which we can estimate ratings, no matter what system we
use; there is an _absolutely efficient estimator_ which gives the lower
bound on the variance of all estimators we could possibly imagine.
Loosely speaking, the maximum likelihood method is absolutely efficient
(see http://www.cs.arizona.edu/~gary/backgammon/elo.html) -- the system
used by FIBS isn't this efficient, but in my opinion is a reasonable
compromise. We can always record data ourselves and calculate a
MLE, if we want to.
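
A toy version of that maximum-likelihood calculation, under two simplifying
assumptions of mine (all matches are 1-pointers, and win probability follows
the same logistic curve FIBS is usually described as using); the match data
is made up for illustration:

    import math

    # Hypothetical results: (opponent's rating, did I win?) for 1-point matches.
    results = [(1750, True), (1820, False), (1690, True), (1900, False),
               (1780, True), (1810, True), (1700, False), (1760, True)]

    def p_win(r, opp):
        return 1.0 / (1.0 + 10.0 ** ((opp - r) / 2000.0))

    def neg_log_likelihood(r):
        return -sum(math.log(p_win(r, opp) if won else 1.0 - p_win(r, opp))
                    for opp, won in results)

    # The MLE is the rating making the observed results most probable;
    # a brute-force grid search is plenty for one parameter.
    print(min(range(1000, 2201), key=neg_log_likelihood))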

Therefore I claim that, given workarounds to the problems we know
FIBS does not address (eg. we ignore cheaters, and require players whose
ratings we compare to play opponents and match lengths generally
representative of those on FIBS), the ratings are more or less as
useful as we can get. The problem of ratings being inefficient
estimators is not because the rating system is inappropriate; it's a
direct result of the nature of the matches whose outcomes we are
measuring!

Cheers,
Gary.
--
Gary Wong, Department of Computer Science, University of Arizona
ga...@cs.arizona.edu http://www.cs.arizona.edu/~gary/

OSMAN
Feb 1, 1999

WHill wrote:
> This is my fourth year playing at FIBS and I still have no real idea who
> the best players are. [...]
>
> So, is the ranking list of any real value at the moment? I would say no.
> There are too many people manipulating their rankings, too many people
> playing within closed groups, and a small group of people just plain
> cheating.
> [...]
> Bill Hill, illium on FIBS

There are several factors that can cause one's FIBS rating to be
considerably different from one's "actual" strength. One of these, which
has been discussed at length on and off in this newsgroup, is the abuse
by some players who deliberately inflate their ratings.

Excluding the above, what is more intriguing to me is the difference in
ratings of the rest of us who do not abuse the system. There are factors
like freely experimenting with different strategies in casual games (but
not in tournament games) that may lead to a series of losses for an
otherwise stronger player. There are differences due to cube play (or the
lack of it), due to gammons (or the lack of them at, for example, DMP), or
due to different match lengths, etc.

I believe the FIBS rating system is quite reasonable, though I am sure
it can be improved. What is harder to accommodate in any rating system
is the daily change in a player's attitude (although this too averages out
in the long term). Seeking an environment as stable as it can get in order
to establish a benchmark, I decided to investigate the rating
differences of people who play in tournaments (as opposed to
casual play). This reduces my pool to only those people who play
tournaments, but I thought it would be a reasonable compromise.
I made several assumptions (like eliminating the number-of-games
dependency in the rating system after establishing a minimum number of
games, etc.). I have also decided to adopt a much simpler Elo formula
that has been used for correspondence chess ratings.

At any rate, you can find the details of the rating formula at
http://pages.prodigy.net/osman/ratingsystem.htm. The results, based on
all the DavidE tournaments that I could get my hands on since 1994,
and the annual rankings are listed at
http://pages.prodigy.net/osman/Davideratings.htm, and they may
constitute a partial answer to your question.
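
I don't know which correspondence-chess variant Osman adopted (his page
above has the details); for reference, the plain Elo update that such
systems build on looks like this, with an assumed K-factor of 32:

    def elo_update(r_a, r_b, a_won, k=32.0):
        # One plain Elo update; the K-factor is an assumption, not Osman's value.
        expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
        delta = k * ((1.0 if a_won else 0.0) - expected_a)
        return r_a + delta, r_b - delta

    print(elo_update(1800, 1700, a_won=False))   # upset: favourite drops ~20 points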

I apologize for calling the system "FIBS Tournament Ratings", as it is
not official, and I can change the name if needed. The top 15 players,
based on their FIBS tournament performance (and the number of matches
played), are listed below:

 1. oegger        1920/53
 2. crlo          1876/29
 3. dorn          1846/38
 4. indianajones  1839/43
 5. heinrich      1810/29
 6. kitwoolsey    1804/48
 7. SallySue      1790/25
 8. Elric_Tsol    1752/36
 9. ToddC         1733/29
10. natV          1730/43
11. md            1730/36
12. abenjamin     1726/26
13. ronkarr       1720/44
14. simonw        1707/45
15. jimmer        1703/36

The above are not necessarily the best players at FIBS; but they are
*reasonably accurately* the best performers at the DavidE tournaments
since 1994. Also note that only players with established ratings are
listed. There are many good players with fewer than 25 matches and
higher ratings (you can see the full listing at the same URL above). As
the database grows to include more people with "established ratings", the
standings will become more accurate. One potential source of
inaccuracy is the number of provisionally rated and unrated
players. These players dominate the early tournaments and cause a
slight inflation in the ratings of the top players. It is too early to
really analyze these results.

Meanwhile, I am about to complete the 1998 standings (waiting for
Morten's December Jackpot to finalize). The 1998 year-end tables will
contain more people with established ratings and include more
tournaments (as I am also including Morten's Jackpot tournaments in the
evaluation).

You may want to compare these tournament ratings with the actual FIBS
ratings for the same people; you will see some significant deviations for
some players. These people may have a different attitude when they are
playing tournament games than when they are playing casual games. Other
than that, I wouldn't be surprised if the relative orders of the
players' rankings based on the FIBS ratings and the tournament ratings
approximately matched each other. However, I would consider the
tournament ratings a more accurate representation of their actual
strengths.

I would give the system a few more years before betting on it.
Nevertheless, take a look and feel free to use the data for your own
analysis.

Cheers...Osman

--
Osman F. Guner
os...@prodigy.net

wi...@wonka.com
Feb 2, 1999

On Mon, 01 Feb 1999 18:58:40 -0100, ill...@whills.demon.co.uk (WHill)
wrote:

>Hi FIBSTERS
>
>This is my fourth year playing at FIBS and I still have no real idea who
>the best players are. Excluding the bots, there are about 10 people who I
>would say are definitely better players than I am; I've played about 500
>different players in total. About 100 of the others are of a similar standard
>to myself, the rest being noticeably poorer. The problem is that this does not
>bear much relation to the current rankings. Regular opponents who beat me
>consistently are often lower ranked.

That's because expert players like me, who have several login names
ranging from the 1100s right through to the low 2000s, often play people
like you while we use one of our "low rating" logins. My true rating
fluctuates between 1890 and 1920, and it's quite satisfying taking rating
points off all you high-ranked players from a lowly ranked 1250 login, and
then losing on purpose to other lowly ranked players. It's like robbing
the rich and redistributing the rating points to the poor.
So next time you play a low-ranked player, beware of the wolf in
sheep's clothing; watch how they play and you might learn something.


Gary Wong
Feb 2, 1999

ill...@whills.demon.co.uk (WHill) writes:
> In article <pattibF6...@netcom.com>,
> pat...@netcom.com (Patti Beadles) wrote:
> >Ed Rybak did some work a while ago wherein he simulated a player who
> >played only five-point matches against similar opponents, and concluded
> >that a 100-point rating swing in either direction from "true" rating
> >was well within the norm.
>
> Dear me, now I am confused. So, I should really consider my rating to be
> about 1770 plus or minus 100; no, that assumes I'm sitting at my "real"
> rating.

Yes, that is a reasonable conclusion. We do not and cannot know that you
are sitting at your "real" rating (if we did, we could say your rating
was exactly 1779.62 and forget about the +/-100).

If FIBS currently estimates your rating at 1779.62, then my interpretation
is that the range 1680-1880 is a 95% confidence interval for your true
rating.

> Makes me wonder why we have digits after the decimal point; spurious
> accuracy, surely?

It depends what you are using the value for. If you are reporting
your current rating to give an indication of your skill, then yes,
those digits are spurious. But note that FIBS still needs to maintain
more accuracy than you would normally use, just so it can update your
rating accurately. I doubt you'd be very happy if FIBS recorded your
rating as "roughly 1800"; you win a 1-point match against another 1800
rated player, which is worth 2 rating points; so it adds on 2 and
assigns you a new rating of "roughly 1800" :-)
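
That 2-point example drops straight out of the update rule FIBS is
usually described as using (a reconstruction from public descriptions,
not the server source):

    import math

    def winner_gain(winner, loser, n):
        # Commonly published FIBS rule: the winner of an n-point match gains
        # 4 * sqrt(n) * (1 - p), where p was their chance of winning.
        p = 1.0 / (1.0 + 10.0 ** ((loser - winner) * math.sqrt(n) / 2000.0))
        return 4.0 * math.sqrt(n) * (1.0 - p)

    print(winner_gain(1800.0, 1800.0, 1))   # 2.0 -- why fractional digits matter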

> In fact, if we do away with the tens and units we could
> just have level 15, level 16, level 17, etc. Now, isn't that so much more
> chummy and equitable? :-)

Yes, that approach would be perfectly valid too. I believe that's
roughly how go players are graded; there are 30 kyu (pupil) ranks and
9 dan (master) ranks. There is a handicap system so that the stronger
player allows a handicap to the weaker, depending on how many ranks
separate them.

> My real point about ratings for bots as opposed to humans was: we all know
> that it's easy for high-ranked human players to drop rapidly; this could be
> due to bad luck, inconsistency or boredom. But bots, as far as I can tell,
> always play consistently and of course never get bored. So surely a bot's
> rating should be much more stable than a human's? Or am I missing something
> important?

I think the important point is that uncertainties in the results of
matches alone are sufficient to generate large fluctuations (ie. the
50 point standard error I'm assuming) in ratings. This is true
regardless of whether the player is a bot or a human! Inconsistency
and/or boredom may or may not have a significant effect on
ratings; but you don't need to invoke them to explain the large
variance of ratings, whether for a bot or a human.

> My pet theory is that MonteCarlo has a fatal flaw which some clever players
> can exploit: they wait until MonteCarlo has harvested all the points from
> the novices and then step in and reap the rewards, maybe with a little bot
> help of their own. Why not publish the results and let everyone see? :-)

Yes, that theory seems consistent with observations so far. So do
some theories that FIBS is an elaborate hoax and/or psychological
experiment and/or (insert your favourite conspiracy theory here). So
does the theory that everything is quite normal and that the unusual
events we occasionally see are to be expected when we examine data as
noisy as FIBS ratings.

We seem to have many theories which are all capable of explaining the
observed effects. Which one do we assume? Me, I go with Occam's
Razor and prefer the simplest ("everything is quite normal") until I
find evidence inconsistent with that theory. And so far, I haven't
seen any.

Gary Wong
Feb 2, 1999

ill...@whills.demon.co.uk (WHill) writes:
> 1) Yes, I do have records of the match scores of more than 1,000 matches
> and over 3,000 individual games at the last count. They are in notebooks
> and are currently being entered into a spreadsheet. I did not, however,
> track my opponents' ratings as we played.

Wow! I'd be interested to see the spreadsheet when you're finished, if
you want to share it. Did you record your _own_ rating at the time of
each game? That would be sufficient to calculate what the opponent's
rating was.

> 2) My view on cheaters is that you should name them and shame them; pretty
> feeble, but better than doing nothing. And of course you should actively
> look at how people are doing it and close a few loopholes.

Fair enough.

> 3) As far as rating systems go, I already stated my preference for a
> system several months ago, and that would be for a ratingBot to rate you.
> In chess, for example, not everyone gets to play Garry Kasparov, but a couple
> of dozen games against Chessmaster 5000 and you realise you'll never be in
> his league (will anyone!). Surely such a system would be much more accurate,
> as it goes on the premise that "good players play good moves" and virtually
> eliminates the vagaries associated with the dice.

By "having the bot rate you" I presume you mean a rating of the style
Snowie provides (which is essentially a measure of how closely your
preference for moves correlates to its own). I agree that this system
would be much more accurate (it works around the problems I described
last time by measuring a large number of moves, which provide a
significant amount of information; rather than merely the results of
matches, which are noisy and provide very little information).
However, it would not be as accurate when measuring players as good as
or better than itself (if a very strong player makes a move the bot
does not like, who is wrong -- the human or the bot?).

Now you've got me wondering about how well the bot's evaluation of
an opponent would correspond to its record of wins and losses against
that player. Perhaps I'll program a bot to record both of those
statistics one day.
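
The two statistics Gary is wondering about could be tracked side by side;
a sketch, where the equity-loss values would come from a hypothetical
bot's evaluation of each human move (no real bot API is assumed):

    class OpponentRecord:
        """Track a Snowie-style error rate alongside the raw match record."""

        def __init__(self):
            self.losses_per_move = []   # equity given up vs. the bot's choice
            self.won = 0
            self.lost = 0

        def record_move(self, equity_loss):
            self.losses_per_move.append(equity_loss)

        def record_match(self, human_won):
            if human_won:
                self.won += 1
            else:
                self.lost += 1

        def summary(self):
            moves = len(self.losses_per_move)
            avg = sum(self.losses_per_move) / moves if moves else 0.0
            return {"avg_equity_loss": avg, "record": (self.won, self.lost)}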

WHill
Feb 3, 1999

In article <pattibF6...@netcom.com>,
pat...@netcom.com (Patti Beadles) wrote:

>>We had a very strange set of occurrences this week surrounding the rating of
>>the bot MonteCarlo, whereby at the start of the week its rating had
>>dropped to 1722 but as of yesterday had climbed back up to a spectacular
>>1975. Now for an inconsistent human that might not be so surprising, but
>>surely for an ultra-consistent bot it is very strange. Can anyone explain?
>
>This isn't at all surprising, really.
>
>Ed Rybak did some work a while ago wherein he simulated a player who
>played only five-point matches against similar opponents, and concluded
>that a 100-point rating swing in either direction from "true" rating
>was well within the norm.
>
>Therefore, it's not at all surprising that a bot who plays 1-pointers
>against all comers would have a 250-point swing.

Hi

Dear me, now I am confused. So, I should really consider my rating to be
about 1770 plus or minus 100; no, that assumes I'm sitting at my "real"
rating. Maybe I'm sitting at the top of my rating and have just been really
lucky. That would mean my "real" rating was around 1670, or, if I had a real
run of bad luck, 1570. But hang on! What if I'm just in a trough (always
seems like that)? My "real" rating could be 1870, wow! And if I hit a real
lucky streak I could hit 1970 (don't worry, I never get lucky streaks longer
than 40 points).

Makes me wonder why we have digits after the decimal point; spurious
accuracy, surely? In fact, if we do away with the tens and units we could
just have level 15, level 16, level 17, etc. Now, isn't that so much more
chummy and equitable? :-)

My real point about ratings for bots as opposed to humans was: we all know
that it's easy for high-ranked human players to drop rapidly; this could be
due to bad luck, inconsistency or boredom. But bots, as far as I can tell,
always play consistently and of course never get bored. So surely a bot's
rating should be much more stable than a human's? Or am I missing something
important?

My pet theory is that MonteCarlo has a fatal flaw which some clever players
can exploit: they wait until MonteCarlo has harvested all the points from
the novices and then step in and reap the rewards, maybe with a little bot
help of their own. Why not publish the results and let everyone see? :-)

Mark that down as way to cheat No. 59 :-)

Best Regards

Bill Hill, illium on FIBS.

WHill
Feb 3, 1999

Hi Again

A follow-up to my original post, having read the posts by Gary Wong and
Osman, which were both very interesting.

Firstly, Gary:-

1) Yes, I do have records of the match scores of more than 1,000 matches
and over 3,000 individual games at the last count. They are in notebooks
and are currently being entered into a spreadsheet. I did not, however,
track my opponents' ratings as we played.

2) My view on cheaters is that you should name them and shame them; pretty
feeble, but better than doing nothing. And of course you should actively
look at how people are doing it and close a few loopholes.

3) As far as rating systems go, I already stated my preference for a
system several months ago, and that would be for a ratingBot to rate you.
In chess, for example, not everyone gets to play Garry Kasparov, but a couple
of dozen games against Chessmaster 5000 and you realise you'll never be in
his league (will anyone!). Surely such a system would be much more accurate,
as it goes on the premise that "good players play good moves" and virtually
eliminates the vagaries associated with the dice.

Secondly, Osman

You have some interesting information on top players, but I would be more
inclined to go with a system as stated in 3) above.

I have only played 4 players from your top 15, however, and I don't go in
for Internet backgammon tournaments at present. I've played indianajones,
kitwoolsey, SallySue and ronkarr, I think, but the last three only lightly.
It would be unfair to comment on them from such limited knowledge, but I
can see why they would be there. I'm delighted to see indianajones well up
in the rankings, as I've played him almost 30 matches now, and it confirms
my opinion. For a long while he was well in front, 12 matches to 5 at one
stage, but now there are probably only a couple of matches between us, so
there is hope that I may be improving after all! :-)

Thanks for both your comments

Bill Hill, illium on FIBS

P.S. Impossibilities take only an instant; miracles need to be booked in
advance :-)

Murat Kalinyaprak
Feb 3, 1999

Gary Wong wrote in message ...

> rating observations. Now and then I see posts reading
> along the lines of "My rating went up 30 points in one
> night so I'm obviously improving quickly; but I started
> playing lower ranked opponents and it went down by 15
> points, so the dice are clearly biased in favour of the
> weaker player".

Come on, Gary, you can do better than this, can't you...?

I'm following this newsgroup pretty closely and haven't
seen anybody make the statement above. It sounds
like a very badly distorted and mixed paraphrasing of
linked but separate comments I made a few days ago.

I had made those comments after playing 1-point games
against about 200 new opponents, all with ratings below
1550, in a span of 19 days. Of course, this in itself
may not necessarily prove anything, and you are welcome
to argue so, but don't try to make less of things than
they are. Frankly, I'm very much irked by all the
"cheap-shot arguments", "unjustified sarcasm", etc. in
this newsgroup...

MK

Murat Kalinyaprak
Feb 3, 1999

Gary Wong wrote in news:wtvhhkr...@brigantine.CS.Arizona.EDU...

>ill...@whills.demon.co.uk (WHill) writes:

>> Surely such a system would be much more accurate, as it
>> goes on the premise that "good players play good moves"
>> and virtually eliminates the vagaries associated with
>> the dice.

> By "having the bot rate you" I presume you mean a rating


> of the style Snowie provides (which is essentially a
> measure of how closely your preference for moves correlates

> to its own). I agree that this system would be much more


> accurate (it works around the problems I described last
> time by measuring a large number of moves, which provide a
> significant amount of information; rather than merely the
> results of matches, which are noisy and provide very little
> information). However, it would not be as accurate when
> measuring players as good as or better than itself (if a
> very strong player makes a move the bot does not like, who
> is wrong -- the human or the bot?).

This argument points to a dilemma which is likely to
be discussed for some time to come. How do we decide
whether SW or any other robot or human player is better
than any other robot or human player without looking
at the results of matches? If certain moves deemed
better by a certain player don't produce winning
results, how can they be claimed to be better moves?
Based on the "prediction" that in so many hundred
years and after so many billion games they will be
proven to be the better moves? If not by the argument
of a player's being "temporarily lucky" (which will
wear away in time), what other argument(s) could be
offered to say "A beats B but B is better than A"...?

MK


Gary Wong
Feb 3, 1999

"Murat Kalinyaprak" <mu...@compuplus.net> writes:
> Gary Wong wrote in message ...
> > rating observations. Now and then I see posts reading
> > along the lines of "My rating went up 30 points in one
> > night so I'm obviously improving quickly; but I started
> > playing lower ranked opponents and it went down by 15
> > points, so the dice are clearly biased in favour of the
> > weaker player".
>
> may not necessarily prove anything, and you are welcome
> to argue so, but don't try to make less of things than
> they are. Frankly, I'm very much irked by all the
> "cheap-shot arguments", "unjustified sarcasm", etc. in
> this newsgroup...

I'm sorry if I made up a bad example, but I assure you I'm not being
sarcastic.

The important thing I was trying to express in that paragraph was
that we should interpret any FIBS rating as a measurement with a
standard error of something like 50 points, so we all need to be
careful about drawing unjustified conclusions from the ratings we see,
particularly when dealing with relatively small differences like
15, 30 or even 80 rating points.

WHill
Feb 5, 1999

In article <79afnh$2v...@taisp3.in-tch.com>,
"Murat Kalinyaprak" <mu...@compuplus.net> wrote:

Hi again

I see I need to explain a little more fully. My idea would be that you
would play a series of perhaps 20 9-point matches against a bot. I'm
assuming here that the strongest bot available would be used and that it
would normally beat 99%+ of its opponents. It would rate opponents
using two factors: 1) rating each individual move against its own ranked
list of moves, and 2) taking into account the results of each match. If,
however, the opponent wins more than 10 of the twenty matches, the bot is
unqualified to rate such a player and they would go into an "elite" or
"pro" category. Everyone else is easily rated by the bot. Elite players
could decide their final rankings using a regular Swiss-type tournament
format, perhaps, assuming this group will be relatively small. Additionally,
elite players could also be ranked on their performance against "normal"
players (by matches won, average rating of opponent and diversity of
players), and their records could be made available for scrutiny. Hope this
clarifies things a little. This type of system only really has problems
with the elite category of players; for the other 99.9% of us it's fine.
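
Bill's decision rule, sketched for concreteness (the 20-match and 10-win
thresholds are his; how the two factors combine, and the output scale,
are placeholders of mine):

    def rate_player(match_results, move_scores):
        # match_results: 20 booleans, True where the human won the 9-pointer.
        # move_scores: per-move agreement with the bot's ranked list, each 0..1.
        wins = sum(match_results)
        if wins > 10:
            return "elite"   # the bot is unqualified to rate this player
        move_part = sum(move_scores) / len(move_scores)      # factor 1
        result_part = wins / len(match_results)              # factor 2
        # Equal weighting and a 1000-2000 output scale are assumptions.
        return 1000.0 + 1000.0 * (0.5 * move_part + 0.5 * result_part)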

Regards

Bill Hill...

Chris W.
Feb 7, 1999

On Wed, 03 Feb 1999 00:38:11 -0100, ill...@whills.demon.co.uk (WHill)
wrote:

Bill & Gary,

[snip]

>1) Yes, I do have records of the match scores of more than 1,000 matches
>and over 3,000 individual games at the last count. They are in notebooks
>and are currently being entered into a spreadsheet. I did not, however,
>track my opponents' ratings as we played.

I too track my matches and have done so since my first day on FIBS
back in 1994 (8,781 matches at this time). I track my rating, the
rating of my opponent, the match length, and the score. I think that in
order to get a real feel for what your true rating is, you have to look
at your performance over time. For this exercise, I decided to go back
4 months. Now I'm no statistics guru, but this is how my rating
fluctuated over that period:

4 month period (10/7/98-2/7/99) - 715 matches played (2980 exp points)

4.17 - average match length
1823.82 - my current rating
1881.59 - my high rating
1735.38 - my low rating
1803.67 - my avg. rating
1790.16 - opp. avg. rating

The result is that my rating ranged 73.10 points on either side of the
midpoint between my high and my low during that period. So based on my
data, 100 points on either side seems a bit high but certainly possible.


---
"chrisw" can be reached "at iftech.net"

Kevin Bastian
Feb 14, 1999

Hi folks. Long time, no see. I haven't been here for a while. I got tired
of sifting through the 90% of posts dealing with whether or not FIBS,
Jellyfish, etc. cheat :-)

As for ratings, I thought I'd share some data with you. I started logging
my FIBS matches a year ago, inspired in part by my old buddy (and
nemesis!) chrisw!

Having been a spreadsheet junkie/guru for... hmmm... 18 years or so
(yikes!), I've been logging them in an Excel spreadsheet. I have quite an
array of linked graphs and such. The other night, someone asked my
opponent and me how much ratings seem to vary from one's "true" rating,
so I checked my spreadsheet. I added a few lines to compute the mean,
median, max, min, and standard deviation.

For what it's worth, if it adds to the discussion any, here is the data:

2/27/98-2/13/99 KevinB on FIBS:

1150 matches
4911 experience
4.27 average match length
1658.55 minimum
1792.08 maximum
1724.50 median
1725.82 mean
24.25 standard deviation
18.98 average deviation
70% of the time my rating was within 1 std. deviation of the mean
94% of the time it was within 2 std. deviations
100% of the time it was within 3 (the max was 2.73 std. dev above the mean;
the min was 2.77 below mean)
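
Kevin's summary numbers are easy to reproduce outside a spreadsheet; a
sketch, assuming a one-column text file of rating samples (the file name
is hypothetical):

    import statistics

    with open("kevinb_ratings.txt") as f:
        ratings = [float(line) for line in f]

    mean = statistics.mean(ratings)
    sd = statistics.stdev(ratings)   # sample std. dev., like a spreadsheet's STDEV
    print(min(ratings), max(ratings), statistics.median(ratings), mean, sd)

    for k in (1, 2, 3):
        share = sum(abs(r - mean) <= k * sd for r in ratings) / len(ratings)
        print(f"{share:.0%} within {k} standard deviation(s) of the mean")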

What does all this mean? Well, first of all, I'm sure many of you know far,
far more than I do about statistics, so you tell me! But I will offer some
observations.

First, a bit of background. I've played 21,919 experience points on FIBS
over the past 4 years. I'm a "serious" player; i.e., I don't play 99-point
matches that are over in one game, etc. I play nearly all my matches with
opponents who are rated between 1600 and 1900. I wish I'd logged the
ratings of my opponents at the time of the matches, but I didn't.
Generally, I look at the ready list and invite the highest-rated player
who meets several criteria: not a cheater (or a jerk), 1000+ experience,
and ideally someone I've played before and know moves quickly, isn't
unpleasant, etc. Anyway, I start at the top. So I'd guess that the average
rating of my opponents is higher than mine by 25 points or so, but that's
just a guess.

The other thing is that I'm (unfortunately) pretty well past my learning
curve in this game. For the first several years on here I improved a lot,
but since then I've been pretty stable. So I think the above data is a
pretty good indication of what a "stable" (not learning and improving a
lot) player experiences.

I have had some long-term upswings and downswings over the past year. I do
NOT attribute them to luck. I'm sure a bit of it might have been luck, but
I don't think much of it was. One of the graphs my spreadsheet system
automatically produces for me is a semi-monthly bar graph of how much I
play. My interest and available time have varied over the past year, and I
found that when I got really psyched about backgammon, and was determined
to get my rating up, two things ALWAYS happened: I played a lot more, and
my rating went up. Then I'd start to lose interest. I'd play less. My
rating would go down. Recently, I got a bit down in the dumps for a few
days. I had trouble being very interested in much of anything. So I played
a lot of FIBS, although I wasn't highly focused. I lost 10 out of 11
matches. And I was NOT swearing at the dice! For the most part, I was
pretty sure I just wasn't playing that well. After a couple of days of
that, I woke up feeling much better. I had my head sorted out, I guess.
And one of the things I decided to do was win back my points. I
immediately won 10 of my next 11 matches, and was right back where I
started before the slump. Talk about mind over matter! Now, this may have
been accompanied by a bit of bad luck on the downslope and good luck on
the upslope, but by and large, it was mood, focus, concentration, etc.
Certainly, these swing more with some people than with others.

Anyway, what my data shows for me, at least over the past year, is that I
spend roughly 95% of the time within +/- 50 points of my "true" FIBS rating
of 1725.

Don't know if anyone is still reading this rambling post. It's after
midnight and I'm too tired to edit it, so here it is for what it's worth.

:-)

Kevin Bastian
KevinB on FIBS



Paul Ferguson
Feb 14, 1999

Kevin Bastian <kba...@ibm.net> wrote:

> As for ratings, I thought I'd share some data with you.
> I started logging my FIBS matches a year ago, inspired
> in part by my old buddy (and nemesis!) chrisw! [...]

Very well said, Kevin. This is a fascinating analysis, the most
detailed I've seen on this newsgroup. It should be required reading for
anyone who posts to this group about "fixed" dice.

Can you post any of the graphs to the Web somewhere?

//fergy


--
Paul Ferguson <http://www.best.com/~fergy/>
"Life is not chess but backgammon,
with a throw of the dice at every turn."
-- Steven Pinker, How the Mind Works

WHill
Feb 17, 1999

Hi again, folks!

Following on from what ChrisW and KevinB have already posted, I decided to
see what information I could dig out of my logs. It's only partially
complete and has plenty of gaps.

Date       Rating  Experience
           1539      563
           1534      625
9/3/96     1549      691
           1552      707
           1557      721
           1557      737
           1560      775
           1562      889
           1558      905
           1556     1051
           1604     1913
           1681     5025
           1701     5163
           1705     5318
22/2/97    1706     5538
26/2/97    1727     5668
2/3/97     1706     5764
8/3/97     1725     5936
11/3/97    1733
15/3/97    1720     6079
17/8/97    1648     6923
8/11/97    1711     7888
24/11/97   1709     8019
30/12/97   1728     8626
16/1/98    1737     8863
20/2/98    1738     9331
23/2/98    1739     9479
25/2/98    1746     9501
4/10/98    1747    10768
5/10/98    1775
17/10/98   1779
31/10/98   1779
6/11/98    1787
14/11/98   1802    11304
16/2/99    1760    12412

Dates are in British DD/MM/YY format

As you can probably see, I learned most of my backgammon skills playing
on FIBS, especially match play and cube action, which I hadn't really
tried before. So from when I started, about October '95, up until late in
'96, I was probably on a steep learning curve; I certainly played a lot of
bot matches during that period. When I reached 1720 in March '97 I felt
maybe I was getting somewhere, but a slump back to about 1650 decided me
to take a break from backgammon for almost 5 months. On returning in
August '97 I made a slow but steady climb through the 1700s. I passed the
1800 barrier only once (for about 3 minutes).

The thing is, I thought my rating would be pretty stable by now, but it
looks like it could fluctuate anywhere between 1725 plus or minus 75, and
I wouldn't be able to say whether I was improving or not. And as I noted
at the start of this thread, other "people's" ratings can fluctuate a lot
more dramatically than this.

Just think about it for a moment. Let's say you're a fairly new backgammon
player and you come to FIBS. After your first 400 experience points you
might be lucky enough to end up at 1600, say. Now what happens if you take
a 200-point slump like the one that happened to MonteCarlo? You end up in
the 1400s, not really playing any worse than you did before! And at this
point most people would get fed up with backgammon and just leave. I
checked up on some of the people I used to play when I started at FIBS,
and I would say a good 20% have either been deleted or are in the 1400s
and not actively playing very much. So that's one big reason I think the
rating system isn't very good.

Here's one last thought that I probably mentioned before: how much luck is
involved in a backgammon match between two equally matched opponents
(equally good or equally bad, it doesn't really matter)? If you think
about it, I'm sure you already know the answer.


Regards

Bill Hill

P.S. Latest stats: 908 different opponents played, 2,389 matches logged,
3,000+ games recorded.
