Relation between field size and ratings spread?

Daniel Murphy

Jun 30, 2000, 3:00:00 AM

It seems commonsensical that the more players within a ratings system,
the larger the spread will tend to be from top to bottom.

For instance, oldtime FIBSters remember when 1800 really meant
something :)

Currently,

Rating       Currently       Highest    Lowest
System       Rated Players   (difference from start rating)

Norway:         126          +256       -300
BIBA:           194          +288       -396
Sweden:         369          +356       -192
Denmark:       1042          +264       -373
GamesGrid:     2402          +543       -550
FIBS:          6769          +768*      -1170**
Netgammon         ?             ?          ?

How can this effect be quantified?

*2nd highest is +580
**2nd lowest is -845

Nis Jørgensen

Jul 10, 2000, 3:00:00 AM

On Fri, 30 Jun 2000 06:22:17 GMT, rac...@best.com (Daniel
Murphy) wrote:

>It seems commonsensical that the more players within a ratings system,
>the larger the spread will tend to be from top to bottom.

However commonsensical it might be, it is not in line with the
workings of the rating system. The system is designed to
guarantee that a difference of 200 points is a difference of 200
points, no matter the number of players. The numbers you give
for the internet servers are not really interesting, since the
lowest ranked players are certainly (ill-programmed) bots. A more
interesting number would be the average deviation of the
rankings. Please note that an internet server, and probably also
a large national system like the Danish one, will tend to
attract more weak players.

>For instance, oldtime FIBSters remember when 1800 really meant
>something :)

This has to do with inflation IMHO (the average rating rising),
not with the rating differences.

Don Hanlen

Jul 10, 2000, 3:00:00 AM

Say Nis, how do you explain NIHILST's manipulation of the ratings on FIBS?
He plays 1 pointers against low rated players, and has pushed his ratings
to the top. Is there a way to create a ratings formula that won't let
this happen?

--
don

Gary Wong

Jul 11, 2000, 3:00:00 AM

Nis Jørgensen <n...@dkik.dk> writes:
> On Fri, 30 Jun 2000 06:22:17 GMT, rac...@best.com (Daniel Murphy) wrote:
> >It seems commonsensical that the more players within a ratings system,
> >the larger the spread will tend to be from top to bottom.
>
> However commonsensical it might be, it is not in line with the
> workings of the rating system. This system is designed to
> guarantee that a difference of 200 points is a difference of 200
> points - no matter the number of players.

Well, I think you're both right here. Yes, the expected difference
between two players should (by and large) be equivalent regardless of
the number of players in the system, but it is perfectly normal to
find larger differences between the extremes with larger samples.
(After all, if you rate 10 randomly selected backgammon players,
chances are the best player of the 10 will be somewhat better than the
worst, but not by a huge amount. But if you rate every player in the
world, you'll be able to measure the difference between the world
champ and somebody who barely knows the rules, which we would expect
to be enormous.)

> The numbers you give for the internet servers are not really
> interesting - since the lowest ranked players are certainly
> (ill-programmed) bots. A more interesting number would be the average
> deviation of the rankings.

That's quite possibly true, although it depends what you mean by
"interesting" :-). A better metric for the "spread" of a distribution
would be the inter-quartile range (the difference between the 25th and
75th percentiles). If we are allowed to assume that the populations
we're sampling from are equivalent (e.g. that FIBS does not attract a
different type of player than those measured in the Norwegian
system), then the expected inter-quartile ranges of the rating
systems ought to be the same. Of course this assumption is unlikely to
be reasonable in practice (FIBS attracts all kinds of players from the
casual to world-class professionals, whereas national ratings mostly
consist of regular tournament players; the tournament players are more
likely to be closely matched than the FIBS ones). But the important
thing about the inter-quartile range is that its expectation is
independent of sample size.
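
The contrast between the two measures can be illustrated with a small simulation. This is only a sketch: it assumes ratings are normally distributed around 1500 with a standard deviation of 150 points, which is an illustrative assumption rather than a measured property of any rating pool, and the function name `spread_summary` is made up for the example.

```python
import random
import statistics


def spread_summary(n, mean=1500.0, sd=150.0, seed=0):
    """Return (inter-quartile range, top-to-bottom range) for n simulated ratings.

    The normal distribution and its parameters are illustrative assumptions,
    not measured properties of any real rating pool.
    """
    rng = random.Random(seed)
    ratings = [rng.gauss(mean, sd) for _ in range(n)]
    # statistics.quantiles with n=4 returns the 25th, 50th and 75th percentiles.
    q1, _, q3 = statistics.quantiles(ratings, n=4)
    return q3 - q1, max(ratings) - min(ratings)


if __name__ == "__main__":
    # Pool sizes taken from the table in the first post.
    for n in (126, 1042, 6769):
        iqr, full_range = spread_summary(n)
        print(f"n={n:5d}  IQR={iqr:6.1f}  range={full_range:7.1f}")
```

For a normal distribution the inter-quartile range is about 1.35 standard deviations, so under these assumptions it should stay near 200 points at every pool size, while the top-to-bottom range tends to widen as the pool grows.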

> >For instance, oldtime FIBSters remember when 1800 really meant
> >something :)

Well, I think it still means more or less the same thing, depending on
how you interpret it; a rating of 1800 today means you are in the top
6% (or so) of FIBS players. When there were only 300 players, that would
put you in the top 20; now that there are going on 7000, it's only enough
to make it into the top 400.

> This has to do with inflation IMHO (the average rating rising),
> not with the rating differences.

Actually I believe the effect of inflation is rather small compared to the
other factors. The median FIBS rating at the moment is only 1528, after
8 years of FIBS -- inflation of 3 or 4 points a year doesn't seem like much
to me!

To get back to the original question ("How can this effect be
quantified?"), there has surely been plenty of work on the expected
maxima of samples of various distributions. I'm at work at the moment
and don't have any references handy, so I cheated and made a quick
simulation, which suggests that the expected deviation of the
maximum of n samples from a normal distribution grows
slightly less than proportionally with log(n). I plotted a graph of
this expectation and superimposed Daniel's data on it; I had to assume
that backgammon ratings are normally distributed with std. dev. 150
points. I have no idea whether this assumption is reasonable or not;
in practice the FIBS/GG standard deviations are likely to be higher
than the national ratings, because they include a wider variety of
players, as described above. (Daniel, do you still have your original
samples available? It might be interesting to compute the inter-quartile
ranges and standard deviations to see how much they vary between pools
of players.)
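
A simulation along these lines might look like the sketch below. It is not the original program, which was not posted; it uses the same stated assumption (normally distributed ratings with a standard deviation of 150 points), and the function name and trial counts are illustrative choices.

```python
import math
import random


def mean_max_deviation(n, sd=150.0, trials=100, seed=0):
    """Estimate how far the best of n ratings lies above the mean, on average.

    Ratings are assumed to be normally distributed with standard deviation sd;
    the estimate is a plain Monte Carlo average over `trials` simulated pools.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(0.0, sd) for _ in range(n))
    return total / trials


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        dev = mean_max_deviation(n)
        print(f"n={n:6d}  log10(n)={math.log10(n):.0f}  mean deviation of max={dev:6.1f}")
```

For a normal distribution the expected maximum is known to grow roughly like sqrt(2 ln n) standard deviations, which is consistent with growth that is slower than proportional to log(n).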

The graph is available in PostScript form at:

http://www.cs.arizona.edu/~gary/backgammon/spread.ps

for those interested.

Cheers,
Gary.
--
Gary Wong, Department of Computer Science, University of Arizona
ga...@cs.arizona.edu http://www.cs.arizona.edu/~gary/

Nis Jørgensen

Jul 11, 2000, 3:00:00 AM

On 11 Jul 2000 12:27:00 -0700, Gary Wong <ga...@cs.arizona.edu>
wrote:

>> This has to do with inflation IMHO (the average rating rising),
>> not with the rating differences.
>
>Actually I believe the effect of inflation is rather small compared to the
>other factors. The median FIBS rating at the moment is only 1528, after
>8 years of FIBS -- inflation of 3 or 4 points a year doesn't seem like much
>to me!

Hmmm - I saw that too. Perhaps the playing strength of players
signing up has decreased - that could give inflation in ratings,
without increasing the average. Or perhaps it is just as you say
- there is no real inflation - only 1800 doesn't make you
"someone".

Pip_Panther

Jul 11, 2000, 3:00:00 AM

>"Don Hanlen" <dha...@oneworld.owt.com> wrote in message
news:8kc0kc$423$1...@news.owt.com...
[snip]

>Is there a way to create a ratings formula that won't let this happen?

Brings up a good point about the formulas used. Most of it makes sense to me
but there is one number in it that I wonder about. Maybe one of the old
timers will know.

Given:
The value (length in points) of the match = n
The absolute value of the difference between the two players' ratings = D
The probability of the lower rated player winning = U
The probability of the higher rated player winning = 1-U
The formulas for the ratings change feed directly on the probabilities
computed above.

U is computed by this formula, which then feeds into the ratings change
calculations:

U = 1/(10^(D*SQRT(n)/2000)+1)

Where did the "2000" come from?

Whatever number is used, the ratings change for two equally rated
players stays the same: a 1-point match is worth 2 ratings points. However, as
the difference between two players increases, the higher the number, the
closer the win/loss probabilities are. This translates to a lower winning
probability for the higher rated player, and thus more points for winning and
fewer points deducted for losing. In other words, as the number is raised, the
"curve" flattens; as it is lowered, the curve steepens. The closer the
players' ratings are, the less effect the number has.

For the sake of simplicity, below are some calcs for players separated
by 500 points. It's the same whether it is 1000 vs 500, or 2500 vs 2000.

With a value of 2000, if players are 1900 vs 1400:
Probability = 64.01%/35.99%
Ratings change for 1900 player (Win/Loss) = 1.440/-2.560

When using 3000:
Probability = 59.48%/40.52%
Ratings change for 1900 player (Win/Loss) = 1.621/-2.379

Conversely, the lower the number, the wider the gap between the two
probabilities. Using 1000 instead of 2000, it calcs like so:
Probability = 75.97%/24.03%
Ratings change for 1900 player (Win/Loss) = 0.961/-3.039
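
The figures above can be reproduced with a few lines of code. The sketch below takes the formula for U from this post; the rule for the size of the ratings change (4 * sqrt(n) times the probability that the loser would have won, with no experience multiplier) is an assumption, chosen because it matches the quoted numbers and gives 2 points for a 1-point match between equals. The function names are illustrative.

```python
import math


def upset_probability(diff, match_length, scale=2000.0):
    """U: probability that the lower-rated player wins, per the formula quoted above."""
    return 1.0 / (10.0 ** (diff * math.sqrt(match_length) / scale) + 1.0)


def favourite_changes(diff, match_length, scale=2000.0):
    """Return (gain if the higher-rated player wins, loss if that player loses).

    Assumes the change is 4 * sqrt(n) * (probability that the loser would have
    won), with no experience multiplier. This rule is an assumption that
    reproduces the figures quoted in the post.
    """
    u = upset_probability(diff, match_length, scale)
    base = 4.0 * math.sqrt(match_length)
    return base * u, -base * (1.0 - u)


if __name__ == "__main__":
    for scale in (1000, 2000, 3000):
        u = upset_probability(500, 1, scale)
        win, loss = favourite_changes(500, 1, scale)
        print(f"scale={scale}: {100 * (1 - u):.2f}%/{100 * u:.2f}%  "
              f"win/loss = {win:.3f}/{loss:.3f}")
```

Changing only the `scale` argument shows how the same 500-point gap is read as a larger or smaller difference in winning chances.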

As long as a player's real-life winning percentage exceeds the computed
probability, that player's rating WILL go up with time.

Even if all of this proves a change is needed, it does not necessarily mean
FIBS (and the rest of the BG community) should change anything. However:

Does a "flattened" probability scale more accurately reflect the real world?
If so, are there distributions and data that provide real world input to
what the number should be?
Also if so, what are the impressions and opinions of what it should be,
based upon experience?
Here is another way to look at it. Out of 1000 one-point games, how many
would a 1450-rated player win against Jellyfish? I doubt it would be 360. Maybe we
could set up two bots with lots of games (some of them have tens of
thousands of games) whose ratings differ by 500 points and let them go
at it in an unrated 1,000 point money game?

Daniel Murphy

Jul 11, 2000, 3:00:00 AM

On 11 Jul 2000 12:27:00 -0700, Gary Wong <ga...@cs.arizona.edu> wrote:

>Nis Jørgensen <n...@dkik.dk> writes:
>> The numbers you give for the internet servers are not really
>> interesting - since the lowest ranked players are certainly
>> (ill-programmed) bots. A more interesting number would be the average
>> deviation of the rankings.

Server rating extremes on either end are subject to uncompetitive
manipulation. The lowest rating on FIBS is 649.78, the 2nd lowest
701.50. The 3rd lowest (775.72) is most definitely a bona fide human.
The 4th lowest (also human) boasts a substantially more respectable
rating of 870.16. All the lowest rated GamesGrid players appear to be
bona fide human players.

See below, where the second set of statistics excludes 1% of
players at each end of the ratings lists.

>Actually I believe the effect of inflation is rather small compared to the
>other factors. The median FIBS rating at the moment is only 1528, after
>8 years of FIBS -- inflation of 3 or 4 points a year doesn't seem like much
>to me!

Agreed, and an aside: it's been mentioned in other discussions that
average, not median rating, is a better indication of ratings
inflation. Danish median is 1502.86, the average 1516.6. Norway median
is 1526.50, average is 1523.13. Calculating averages for other systems
is beyond my endurance for tedium.

>To get back to the original question ("How can this effect be
>quantified?"), there has surely been plenty of work on the expected
>maxima of samples of various distributions. I'm at work at the moment
>and don't have any references handy, so I cheated and made a quick
>simulation which appears to show that the expected deviation of the
>maximum of n samples from a normal distribution appears to grow
>slightly less than proportionally with log(n). I plotted a graph of
>this expectation and superimposed Daniel's data on it; I had to assume
>that backgammon ratings are normally distributed with std. dev. 150
>points. I have no idea whether this assumption is reasonable or not;
>in practice the FIBS/GG standard deviations are likely to be higher
>than the national ratings, because they include a wider variety of
>players, as described above. (Daniel, do you still have your original
>samples available? It might be interesting to compute the inter-quartile
>ranges and standard deviations to see how much they vary between pools
>of players.)

Can you use these statistics, Gary?

Group     #       #1  75%-ile            median   25%-ile            lowest
FIBS   6683  2273.74  1640.40 (+111.76)  1528.64  1430.50 (-69.50)   701.50
GG     2418  2068.09  1711.68 (+132.64)  1579.04  1491.28 (-87.76)   957.58
DBgF    992  1764.93  1555.22 (+ 52.36)  1502.86  1473.80 (-29.06)  1126.31
SBgF    388  1879.00  1605.00 (+ 99.00)  1506.00  1428.00 (-78.00)  1173.00
BIBA    217  1781.00  1597.00 (+ 92.00)  1505.00  1432.00 (-73.00)  1102.00
NBgF    130  1760.00  1618.00 (+ 91.50)  1526.50  1435.00 (-91.50)  1200.00

Group     #  99%-ile  75%-ile            median   25%-ile            1%-ile
FIBS   6683  1911.42  1640.40 (+111.76)  1528.64  1430.50 (-69.50)  1136.30
GG     2418  1960.75  1711.68 (+132.64)  1579.04  1491.28 (-87.76)  1234.71
DBgF    992  1696.22  1555.22 (+ 52.36)  1502.86  1473.80 (-29.06)  1373.77
SBgF    388  1827.00  1605.00 (+ 99.00)  1506.00  1428.00 (-78.00)  1266.00
BIBA    217  1772.00  1597.00 (+ 92.00)  1505.00  1432.00 (-73.00)  1187.00
NBgF    130  1741.00  1618.00 (+ 91.50)  1526.50  1435.00 (-91.50)  1272.00

FIBS: excludes unknown # of players with less than 50 TMP, and lowest
ranked "player."
GG: excludes the non-player at bottom of list
DK: excludes 51 members with 0 TMP
Sweden: includes all rated players (qualifications unknown)
BIBA: includes all rated players (qualifications unknown)
Norway: includes all listed players (i.e., minimum 15 matches and at
least 1 match played in last year).

Danish system start point is 1000, not 1500; ratings adjusted by +500
for comparison.

Gary Wong

Jul 11, 2000, 3:00:00 AM

rac...@best.com (Daniel Murphy) writes:
> On 11 Jul 2000 12:27:00 -0700, Gary Wong <ga...@cs.arizona.edu> wrote:
> >Actually I believe the effect of inflation is rather small compared to the
> >other factors. The median FIBS rating at the moment is only 1528, after
> >8 years of FIBS -- inflation of 3 or 4 points a year doesn't seem like much
> >to me!
>
> Agreed, and an aside: it's been mentioned in other discussions that
> average, not median rating, is a better indication of ratings
> inflation.

True -- I tried searching for the articles about inflation that had
been posted here in the past, but unfortunately now that we have only
a "precision buying service" instead of Deja News, things like that
aren't easy to find.

Luckily we still have Tom Keith's r.g.b. archive -- one relevant article
is:

http://www.bkgm.com/rgb/rgb.cgi?view+416

which does seem to indicate that a FIBS rating of 1800 has been reasonably
consistent at marking the 95th percentile in 1995, 1997 and 2000.

One other snippet -- Michael Klein's latest FIBS Ratings Report shows
the mean FIBS rating to be 1534, which is surprisingly close to the median.

Thanks for those data! (I believe that the "-69.50" figure in the FIBS
25%-ile should be "-98.14".)

A few random observations:

- The inter-quartile ranges of the online servers do seem to be
  significantly higher than the national ratings (~210 vs. ~170), which
  supports the hypothesis that the Internet servers attract a more
  varied range of players than real-life tournaments.

  The Danish range is much smaller than the others, though; I have no
  idea why this would be the case (perhaps the results include a large
  number of relatively new players? The other descriptions make it
  sound as if they do or might exclude inexperienced players.)

- The Danish, Swedish and British medians show virtually no sign of
  inflation. I suspect this may be because they "include all rated
  players": the main cause of inflation is that weak players are more
  likely to leave the system than strong players, and so weak ratings
  are gradually deleted over time which effectively raises whatever is
  left behind. The Norwegian ratings (which require at least 1 match
  played in the last year) show comparable inflation to FIBS.

  GamesGrid shows the most inflation of all. This might well be because
  the financial cost increases the tendency of weak players to leave. I
  understand that GG have added points to all players' ratings in the
  past when a server crash lost the results of some games (I'm not sure
  which is more disturbing -- that somebody thought this was a good idea,
  or that users were apparently pacified by it!) which would certainly
  add to this effect.

- The results show that the distributions tend to be skewed slightly to
  the right (the gap from the median up to the upper quartile is larger
  than the gap from the median down to the lower quartile). One
  explanation for this might be that weak players tend to improve faster
  than strong players (hopefully nobody's getting significantly worse!)
  which could shrink the left-hand tail somewhat.
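
The arithmetic behind these observations can be checked directly from the quartiles in Daniel's table. The sketch below only recomputes differences from the posted figures (the Danish numbers already include the +500 adjustment); the dictionary and function names are illustrative.

```python
# (75th percentile, median, 25th percentile) copied from Daniel's table.
QUARTILES = {
    "FIBS": (1640.40, 1528.64, 1430.50),
    "GG":   (1711.68, 1579.04, 1491.28),
    "DBgF": (1555.22, 1502.86, 1473.80),
    "SBgF": (1605.00, 1506.00, 1428.00),
    "BIBA": (1597.00, 1505.00, 1432.00),
    "NBgF": (1618.00, 1526.50, 1435.00),
}


def summarize(q3, median, q1):
    """Return (inter-quartile range, upper gap, lower gap) for one rating pool."""
    return q3 - q1, q3 - median, median - q1


if __name__ == "__main__":
    for name, (q3, median, q1) in QUARTILES.items():
        iqr, upper, lower = summarize(q3, median, q1)
        # median - 1500 is the drift of the median away from the start rating.
        print(f"{name:5s} IQR={iqr:7.2f}  upper={upper:7.2f}  lower={lower:7.2f}  "
              f"median-1500={median - 1500:+7.2f}")
```

An upper gap larger than the lower gap corresponds to the right skew described in the last point, and the final column is a rough indicator of inflation relative to the 1500 starting rating.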

Nis Jørgensen

Jul 12, 2000, 3:00:00 AM

On Tue, 11 Jul 2000 16:28:24 -0500, "Pip_Panther"
<Pip_noPa...@my-deja.com> wrote:

[snip]

>1/(10^(D*SQRT(n)/2000)+1)
>
>Where did the "2000" come from?

It sets the scale of the rating system, i.e. how much a
certain difference means in playing strength. If you replace it
with 1000, two players who are 200 points apart in the old rating
system will be 100 points apart in the new one. You will see that
they then have exactly the same expected outcomes as in the old
system.
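
The equivalence is easy to verify numerically. The following sketch uses the formula quoted above with the divisor exposed as a parameter; the parameter name `scale` is an illustrative choice.

```python
import math


def upset_probability(diff, match_length, scale):
    """Probability that the lower-rated player wins, per the formula quoted above."""
    return 1.0 / (10.0 ** (diff * math.sqrt(match_length) / scale) + 1.0)


if __name__ == "__main__":
    for n in (1, 5, 9):
        old = upset_probability(200, n, scale=2000)  # 200 apart, divisor 2000
        new = upset_probability(100, n, scale=1000)  # 100 apart, divisor 1000
        print(f"{n}-point match: old system {old:.4f}, rescaled system {new:.4f}")
```

Only the ratio of rating difference to divisor enters the formula, so halving both leaves every predicted outcome unchanged.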

The number is not totally irrelevant, though, as the scaling must
be balanced against the number of points lost and gained in single
matches. If the scaling is too low, people will be "elevatoring". If
the scaling is too high, it takes too long to reach your
true rating.

Nis Jørgensen

Jul 12, 2000, 3:00:00 AM

On 11 Jul 2000 18:17:48 -0700, Gary Wong <ga...@cs.arizona.edu>
wrote:

> - The Danish, Swedish and British medians show virtually no sign of
> inflation. I suspect this may be because they "include all rated
> players": the main cause of inflation is that weak players are more
> likely to leave the system than strong players, and so weak ratings
> are gradually deleted over time which effectively raises whatever is
> left behind. The Norwegian ratings (which require at least 1 match
> played in the last year) show comparable inflation to FIBS.

I am almost sure the Danish numbers are only of paying members
(correct me if I'm wrong, Daniel). I suspect the Norwegians, like
FIBS, use a gearing system. This way, when a strong player enters
at 1500, he injects rating points into the system.

Daniel Murphy

Jul 12, 2000, 3:00:00 AM

The Danish rating list includes only current, paid-up members. Members
who neglect to renew their membership are dropped from the rankings.
Ditto for GamesGrid and NBgF and, I assume, for BIBA and SBgF. Not
only because seeing one's name in the ratings list is an incentive to
remain a member, but because (as is the case in Denmark) membership in
the national federation is mandatory for residents to participate in
Open or Intermediate flights of almost all tournaments.

But several factors do limit inflation in the national ratings. No one
can drop out and then rejoin under a different identity. No one ever
gets his rating "re-set" to par. The system never awards all players X
points. At least in Denmark, everyone new to the system starts out at
par regardless of real or estimated ability. And my impression is that
in Denmark, for example, there's a small but steady outflow of
higher-ranked players every year, as people move or give up real life
play for whatever reason -- I imagine this effect isn't so notable on
the online servers. Nis mentions another reason -- unlike all the
online systems, the Danish system has no accelerated ratings boost for
low-experience players. I believe he's correct that the Norwegian
system has adopted the exact FIBS formula, including the "boost" for
players with fewer than 400 TMP.
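
To see why such a boost can add points to a system, consider the sketch below. The exact multiplier is an assumption made for illustration (5x for a brand-new player, tapering linearly to 1x at 400 TMP); the thread only says that players below 400 TMP get a boost. The base change of 4 * sqrt(n) times the loser's winning probability is likewise assumed, and all function names are hypothetical.

```python
import math


def upset_probability(diff, match_length, scale=2000.0):
    """Probability that the lower-rated player wins a match of the given length."""
    return 1.0 / (10.0 ** (diff * math.sqrt(match_length) / scale) + 1.0)


def boost(experience, cutoff=400):
    """Assumed multiplier: 5x with no experience, falling linearly to 1x at the cutoff."""
    return max(1.0, 5.0 - 4.0 * experience / cutoff)


def match_changes(winner_rating, loser_rating, winner_exp, loser_exp, match_length=1):
    """Return (winner's gain, loser's loss) under the assumed rules."""
    u = upset_probability(abs(winner_rating - loser_rating), match_length)
    # Probability that the actual loser would have won the match.
    p_loser = u if winner_rating >= loser_rating else 1.0 - u
    base = 4.0 * math.sqrt(match_length) * p_loser
    return base * boost(winner_exp), -base * boost(loser_exp)


if __name__ == "__main__":
    # A strong newcomer entering at 1500 beats an established 1500 player.
    gain, loss = match_changes(1500, 1500, winner_exp=0, loser_exp=1000)
    print(f"newcomer gains {gain:.1f}, veteran loses {-loss:.1f}, net injected {gain + loss:+.1f}")
    # Two established players: the exchange is zero-sum.
    gain, loss = match_changes(1500, 1500, winner_exp=1000, loser_exp=1000)
    print(f"veteran vs veteran: net injected {gain + loss:+.1f}")
```

Without the multiplier every match is zero-sum, which fits the observation that a system with no boost has one less source of inflation.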

Daniel Murphy

Jul 12, 2000, 3:00:00 AM

On 11 Jul 2000 18:17:48 -0700, Gary Wong <ga...@cs.arizona.edu> wrote:

>True -- I tried searching for the articles about inflation that had
>been posted here in the past, but unfortunately now that we have only
>a "precision buying service" instead of Deja News, things like that
>aren't easy to find.

The DejaNews newsgroup archive is still there; it's just not mentioned
on the deja.com front page. Who knows why ... I think if you
click on the link to their FAQs you can find your way to their
archive.

Gary Wong

Jul 12, 2000, 3:00:00 AM

rac...@best.com (Daniel Murphy) writes:
> The DejaNews newsgroup archive is still there, it's just not mentioned
> on the deja.com homepage front page. Who knows why ... I think if you
> click on the link to their FAQs you can find your way to their
> archive.

Well, it's still there, but old messages aren't. The earliest available
message from r.g.b. is Henrik Jensen's "Gammon?" from May 15th last year.

Bits and pieces of the archive have been coming and going for the last
two months. The last I heard was a notice at:

http://www.deja.com/=dnc/info/site_move.shtml

which reads:

Old Usenet messages - Between May 20 and May 26, messages posted 2
weeks to a year ago will not be available. Starting May 4, many
messages posted over two years ago will not be accessible on a
temporary basis, and after May 15, all messages posted over a year ago
will not be accessible on a temporary basis. We will be taking this
opportunity to reconfigure the service that provides messages posted
prior to May 1999. Therefore, these messages will not be accessible on
the site for some time, possibly a few months. Have no fear: We're
committed to bringing these messages back online as soon as possible.

Gary Wong

Jul 12, 2000, 3:00:00 AM

rac...@best.com (Daniel Murphy) writes:
> On Wed, 12 Jul 2000 11:03:04 +0200, Nis Jørgensen <n...@dkik.dk> wrote:
> >I am almost sure the Danish numbers are only of paying members
> >(correct me if I'm wrong, Daniel). I suspect the Norwegians, like
> >FIBS, use a gearing system. This way, when a strong player enters
> >at 1500, he injects rating points into the system.
>
> The Danish rating list includes only current, paid-up members. Members
> who neglect to renew their membership are dropped from the rankings.
> Ditto for GamesGrid and NBgF and, I assume, for BIBA and SBgF.

Thanks! Those explanations make a lot of sense, and appear to correspond
well to the data Daniel compiled. If the Danish system makes equal rating
changes regardless of experience, that could well explain why the observed
range is smaller there than with the other systems.

Mary Hickey

Jul 13, 2000, 3:00:00 AM

Pip_Panther wrote:

> [snip]
>
> Does a "flattened" probability scale more accurately reflect the real world?

What is the real world, LOL? But I think I know what you meant...

>
> If so, are there distributions and data that provide real world input to
> what the number should be?
> Also if so, what are the impressions and opinions of what it should be,
> based upon experience?

It depends on whether we are talking bot vs. bot, human vs. human, or human vs.
bot--and in this last group, it depends on who is higher-rated, the human or the
bot.

>
> Here is another way to look at it. How many one point games out of 1000
> would a 1450 rated player beat Jellyfish? I doubt it would be 360.

If by JellyFish, you mean 3.0 or 3.5, Level 7, the games sure would be ugly to
watch.

> Maybe we
> could set up two bots with lots of games (some of them have tens of
> thousands of games) that are rated differently by 500 points and let them go
> at it in an unrated 1,000 point money game?

Low-rated bots tend to play only "one-pointers", but you could still have them
play 1000 games. However, the score at the end of 1000 games would be different
when you have a 1900-rated human vs. a 1400-rated bot, as opposed to a
1900-rated bot vs. a 1400-rated bot. I believe the 1900-rated human would do
better than the 1900-rated bot, because he/she would notice what the bot's
weakest areas are, and intentionally steer the games in that direction. As yet,
there is no bot (that I know of, anyway) that changes its plays based on what it
sees its opponent doing wrong.

For example, Costello (a bot on FIBS, rating class 1500-1600) will run from a
defensive anchor, especially a 3-anchor, long before it should. Therefore, if it
would normally be a close call as to whether to hit loose to prevent his getting
the anchor, you might as well punt and let him make it. He won't know what to
do with it once he has it anyway.

Costello is also rather clueless in a backgame, as are even some of the
higher-rated bots. If you are playing one-pointers (which are all it plays
anyway), if you fall behind in the race early on, you might as well go all-out
and play from the back, even when against a human a bit more moderation would be
called for. It has been my experience that the bot will cheerfully help you
solve your timing problems and also dump checkers behind you instead of clearing
forward points ASAP, causing it to leave shots later that could have been
avoided.

The above are just two examples. Put the cube in the picture, and you have even
more angles to discover and exploit. These angles can be found against any
player, myself included. The difference is that a human will look for these
things, but a bot won't because it doesn't know how. This means that a
technically superior bot may be ranked below humans that go this extra mile.

One more thing: Even if you did decide to have the bots play each other, you
would need to establish the true rating for each bot based on some consistent
method. I have seen one FIBS bot, MonteCarlo, rated below 1800 and I've also
seen it in the higher half of the 1900s! I haven't watched the others as much,
but I am sure that they, too, cover a lot of ground as they rack up their
5-figure experience levels.

mamabear

Pip_Panther

Jul 13, 2000, 3:00:00 AM

"Mary Hickey" <mamab...@att.net> wrote in message
news:396D72BF...@att.net...

>
>
> What is the real world, LOL? But I think I know what you meant...
>

You'd have to ask my other personalities; the trouble is they all cheat each
other for who gets the final say.

>
> > Here is another way to look at it. How many one point games out of 1000
> > would a 1450 rated player beat Jellyfish? I doubt it would be 360.
>
> If by JellyFish, you mean 3.0 or 3.5, Level 7, the games sure would be
> ugly to watch.
>

I was thinking of the one I saw that was once rated 1950 or so. Was that 3.0
lvl5? I don't remember.

You are right about a test of bot vs bot or even human vs bot. The only way
to tell would be to gather bulk data from actual match statistics. For
example, how often do matches between players separated by x points adhere
to the computed probabilities? As in the example I gave, what is the
dispersion of win/loss between players separated by 500 points and does it
match the formula? 450? 300? I think it is evident there is "elevatoring"
going on. If the data showed that the greater the point spread between
players, the greater the deviation from the formula, then it would be proof,
and the 2000 "seed" number could be adjusted up or down.
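
A check of this kind could be sketched as follows. The record format (rating difference, match length, whether the lower-rated player won) and the function names are hypothetical, and the sample records are made up purely to show the call; real server match logs would be needed for any conclusion.

```python
import math
from collections import defaultdict


def upset_probability(diff, match_length, scale=2000.0):
    """Probability that the lower-rated player wins, per the quoted formula."""
    return 1.0 / (10.0 ** (diff * math.sqrt(match_length) / scale) + 1.0)


def calibration_table(matches, bucket_width=100, scale=2000.0):
    """Compare observed and predicted upset rates, grouped by rating difference.

    `matches` is an iterable of (rating_diff, match_length, underdog_won) tuples.
    Returns {bucket_start: (match count, observed rate, mean predicted rate)}.
    """
    buckets = defaultdict(lambda: [0, 0, 0.0])  # count, upsets, summed prediction
    for diff, length, underdog_won in matches:
        entry = buckets[int(diff // bucket_width) * bucket_width]
        entry[0] += 1
        entry[1] += 1 if underdog_won else 0
        entry[2] += upset_probability(diff, length, scale)
    return {start: (count, upsets / count, predicted / count)
            for start, (count, upsets, predicted) in sorted(buckets.items())}


if __name__ == "__main__":
    # Made-up records, only to demonstrate the call.
    sample = [(520, 1, True), (510, 1, False), (530, 1, False), (480, 1, True),
              (90, 1, True), (60, 1, False)]
    for start, (count, observed, predicted) in calibration_table(sample).items():
        print(f"diff {start}-{start + 99}: n={count} "
              f"observed={observed:.2f} predicted={predicted:.2f}")
```

If the observed upset rate drifted further above the predicted rate as the rating difference grew, that would be the pattern described above; passing a different `scale` shows how an adjusted number would fit the same data.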

It doesn't mean that the whole thing makes it any less fun to play. Ratings
are fun to have, but they're just not that big of a deal to a lot of
people. Some don't even care at all. And for those that do care, adjusting
the formula that has been used so widely for so long would not be easily
accepted. The cure could be worse than the problem.

Nis Jørgensen

Jul 13, 2000, 3:00:00 AM

On Thu, 13 Jul 2000 12:31:08 -0500, "Pip_Panther"
<Pip_noPa...@my-deja.com> wrote:

>You are right about a test of bot vs bot or even human vs bot. The only way
>to tell would be to gather bulk data from actual match statistics. For
>example, how often do matches between players separated by x points adhere
>to the computed probabilities?

The question is not "how often do they match", but "how well do
they match".

> As in the example I gave, what is the
>dispersion of win/loss between players separated by 500 points and does it
>match the formula? 450? 300? I think it is evident there is "elevatoring"
>going on. If the data showed that the greater the point spread between
>players the greater the deviation from the formula then it would be proof
>and the 2000 "seed" number could be adjusted up or down.

As I explained, or tried to, the number 2000 is not the important
part here. We would have to change either the outer formula (the
1/(10^(something) + 1) part) or the formula for "something",
which basically involves (rating difference)/2000 and match length.

I think DBgF has data lying around which would prove
useful in this matter.