
Estimation of ZARKOV rating


Teun Hendriks

Dec 8, 1989, 2:42:32 PM
Apparently, there is a large discussion going on in r.g.chess about
the validity of estimated ratings for chess computers. At the same
time, this discussion seems to be intermixed with some personal wars.

Rather than contributing to the latter, I will try to give a
scientifically supported estimate for the rating in question.

Let's cite the data first:


Article 3743 of rec.games.chess:
Stuart Cracraft writes:

> if you like, make it 14-6 out of 20 games which is what two
> Zarkov matches against Mach 3 have produced.
>
> In a standard ELO table, a 70%-30% result is given as
> a rating difference of 146-153 points.
>
> you, to a small sample, gives 2265 less 150 points. The first
> is Mach 3's CRA rating, and the latter is the ELO table number.

==> 20 games sample
==> 70%-30% result in favor of Mach3.
==> rating Mach3 = 2265
==> Elo table 70%-30% results = 146-153
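For reference, the table entry can be reproduced from the standard Elo
expected-score formula; a minimal Python sketch (the 146-153 spread in the
printed table presumably reflects rounding of the tabulated percentages):

```python
import math

def elo_diff(score: float) -> float:
    """Rating difference implied by an expected score, per the standard
    Elo logistic model: score = 1 / (1 + 10**(-diff/400))."""
    return 400 * math.log10(score / (1 - score))

# A 70%-30% result implies a difference of about 147 points,
# consistent with the 146-153 range read off the ELO table.
print(round(elo_diff(0.70)))  # -> 147
```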


Article 3744 of rec.games.chess:
Thomas Anantharaman writes:
>
> While working on my PhD Thesis I computed the following table of
> 95% confidence intervals for ELO rating differences. Thus the
> 95% confidence interval for the ELO rating difference based
> on a 20 game match with 10 Wins, 0 Draws, 10 Losses is -158 to +158 ELO
> points. The values in the tables were computed using Monte Carlo
> simulation
> and have a 95% confidence interval of +- 10%.

> Score (Win/Draw/Lose)
> Games (.5/0/.5) (.25/.5/.25) (.75/0/.25)
> 20 158 108 171
> 32 124 85.2 153
> 64 83.2 60.6 100
> 256 42.6 30.2 48.0

(only relevant entries mentioned)

>
> From the table it follows that the CRA's official rating for commercial
> chess computers based on 48 games has a 95% confidence interval spanning
> about 200 rating points (+-100 points).

==> confidence interval bounds for a .75/0/.25 result, based on a
    sample of 20 games = +/- 171
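These entries are consistent with the usual rule that confidence intervals
shrink roughly as one over the square root of the number of games; a quick
Python check against the (.5/0/.5) column above (the small discrepancies are
within the quoted +-10% Monte Carlo accuracy):

```python
import math

# 95% confidence half-widths for the (.5/0/.5) column of Thomas's table.
table = {20: 158, 32: 124, 64: 83.2, 256: 42.6}

# Scale the 20-game entry as 1/sqrt(N) and compare with the tabulated value.
for n, half_width in table.items():
    predicted = table[20] * math.sqrt(20 / n)
    print(n, half_width, round(predicted, 1))
# Each prediction agrees with the tabulated value to within about 10%.
```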


MY CONCLUSION:
-------------

In accordance with standard estimation theory:

Estimated rating (mean) = Opponent's rating - ELO table difference

mean: 2265 - (146, 153) = 2112 to 2119 (Mach3 rating minus the
70%-30% Elo table entry)


The 95% confidence interval, based on Thomas Anantharaman's table for 20
games with a .75/0/.25 score (closest to the 70%-30% result), is +/- 171
points.

This results in:

upper bound: 2119 + 171 = 2290
lower bound: 2112 - 171 = 1941

So based on the reported facts we can say with 95% confidence that:

The rating of ZARKOV is between 1941 and 2290.
---------------------------------------------
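The arithmetic above can be checked mechanically; a small Python sketch,
with all numbers taken directly from the cited articles:

```python
mach3_rating = 2265
table_diff = (146, 153)   # ELO table entry for a 70%-30% result
half_width = 171          # 95% half-width, 20 games, .75/0/.25 score

means = [mach3_rating - d for d in table_diff]   # 2119 and 2112
upper = max(means) + half_width                  # 2119 + 171
lower = min(means) - half_width                  # 2112 - 171
print(lower, upper)  # -> 1941 2290
```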

This is the ONLY scientifically valid estimate we can make based on the
data mentioned in the articles above. All other claims based only on
this data are unsupported guesses.


Specifically, the reported range of 1995 - 2348 is incorrect:

Article 3755 of rec.games.chess:
Stuart Cracraft writes:
>
> From Thomas's data we can tentatively say then that since the
> range for Zarkov's performance vs. Mach III was 2153-2190, we take
^^^^^^^^
Note that this range is found, after applying the 50%, 25% "correction".
This "correction" has no scientific (estimation-theoretic) ground.

> the boundary of -158 and subtract from 2153, giving 1995 as a lower bound
> for Zarkov's rating on a 16mhz 80386, and 2190+158 giving 2348 as
> the upper bound the same hardware.


I hope this clears the discussion about ratings and their calculation.
Comments welcome.

-Teun Hendriks


Some final remarks:

1 - Noting that the official CRA rating for chess computers has a
95% confidence interval of approx. +/- 100 points, one may wonder how
accurate the rating of Mach3 is.

2 - To significantly decrease the 95% confidence intervals, at least
hundreds of games should be played, according to Thomas' table.

Stuart Cracraft

Dec 10, 1989, 2:16:05 PM
In article <70...@philabs.Philips.Com> t...@philabs.philips.com (Teun Hendriks) writes:
...
First, thank you for your thoughtful reply. I will comment on the
parts I feel need elaboration; the other parts of your message,
although left unquoted, should be taken to imply agreement.

>MY CONCLUSION:
>-------------
>
...


>So based on the reported facts we can say with 95% confidence that:
>
> The rating of ZARKOV is between 1941 and 2290.
> ---------------------------------------------
>

>> From Thomas's data we can tentatively say then that since the
>> range for Zarkov's performance vs. Mach III was 2153-2190, we take
> ^^^^^^^^
>Note that this range is found, after applying the 50%, 25% "correction".
>This "correction" has no scientific (estimation-theoretic) ground.
>

Maybe, but it has considerable empirical ground.

According to Larry Kaufman, in the latest (vol 1 no 1) issue of
Computer Chess Quarterly,

"Allowing for the tendency of computer vs. computer results
to overstate rating differences by around a 4 to 3
ratio..."
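To illustrate (a sketch only, not an endorsement of the correction):
applying Kaufman's 4:3 deflation to the raw Elo-table difference for a
70% score gives numbers close to the 2153-2190 range quoted earlier in
the thread.

```python
# Raw Elo-table rating difference for a 70%-30% score (quoted above).
raw_diff = 147

# Kaufman's empirical claim: computer-vs-computer results overstate
# rating differences by about 4 to 3, so deflate by multiplying by 3/4.
corrected_diff = raw_diff * 3 / 4

mach3_rating = 2265
print(round(corrected_diff), mach3_rating - round(corrected_diff))
# A 147-point raw gap shrinks to about 110 points, putting the corrected
# point estimate near 2155.
```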


>
>Some final remarks:
>
>1 - Noting that the official CRA rating for chess computers has a
> 95% confidence interval of approx. +/- 100 points, one may wonder how
> accurate the rating of Mach3 is.
>

The latest rating list in the Quarterly suggests that Fidelity Mach III
is 2215 (CCR tests), 2220 (English tests) at normal time control (40/2).
It is fairly widely accepted that Mach III got lucky and is 45-50 points over-rated.

>2 - To significantly decrease the 95% confidence intervals, at least
> hundreds of games should be played, according to Thomas' table.

Perhaps thousands. David Kittinger indicates that it is almost a
"crap shoot" when playing computers vs. each other for brief
matches. Brief should be thought of as anything below 20 games.

In particular, the results of 4 or 5 round ACM and World Micro
and World Computer matches are literally meaningless (except for
the winners).

--Stuart

Greg King

Dec 18, 1989, 7:48:21 PM
The discussion about the uncertainty in Zarkov's rating based
on a 30% winning percentage against Mach3 in a 20-game contest
prompted me to write a program to figure out rating differences
and their uncertainties in a general way.

Here's the approach I used. First some definitions of terms:

x = the probability that Player1 will beat Player2 in any one game;
xapp = the apparent value of x, based on an N-game contest;
xtrue = the true value of x, based on an infinite-game contest;
xlo = the lower bound of xtrue;
xhi = the upper bound of xtrue;
dR_12 = rating of Player1 relative to Player2 (i.e. R_1 - R_2).

The exact value of xtrue cannot be found, but we can find xlo and xhi
such that the inequality:

xlo < xtrue < xhi

is true with a given amount of confidence (e.g. 95%). If x is the
probability that Player1 will beat Player2 in any one game, then:

P(N,m,x) = C * x^{m} * (1-x)^{N-m}

is the probability that Player1 will win m of N games with Player2.
Here x is the only variable, since N and m are taken as constants.
The program calculates the function P(N,m,x) for the complete range
of possible values of x (0 <= x <= 1), and the normalization constant
C is chosen to satisfy:

1 = int_{0}^{1} dx P(N,m,x).

Not surprisingly, the x for which P(N,m,x) is at its maximum value
is just: x = xapp = m/N (xapp is the most probable value of x).

xlo and xhi are determined by evaluating the integrals:

0.025 = int_{0}^{xlo} dx P(N,m,x)

and

0.025 = int_{xhi}^{1} dx P(N,m,x)

Thus we have:

0.95 = int_{xlo}^{xhi} dx P(N,m,x)

The Zarkov vs. Mach3 match (m=6, N=20) yields the following results:

xapp = 0.300
xlo = 0.146
xhi = 0.522

A winning percentage x can be converted to a rating difference dR_12
with the relation:

dR_12 = 400 * log_10(x/(1-x))

Plugging xapp into this equation tells us that the most probable
rating difference is dR_12 = -147. Plugging xlo and xhi into the
equation tells us that -307 < dR_12 < 15 is true with 95% confidence.
If Mach3 is rated at 2265, then the most probable rating for Zarkov
is 2118, and the 95% confidence interval is 1958 to 2280.
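Greg's bounds can be reproduced by straightforward numerical integration;
a Python sketch (his actual program may differ in details; note that,
viewed as a density in x, the normalized P(N,m,x) is a Beta(m+1, N-m+1)
distribution):

```python
import math

def confidence_bounds(n_games, n_wins, conf=0.95, steps=100_000):
    """Equal-tail bounds (xlo, xhi) on the per-game win probability x,
    from the normalized P(N,m,x) = C * x^m * (1-x)^(N-m)."""
    # Evaluate the unnormalized density on a grid over (0, 1).
    xs = [(i + 0.5) / steps for i in range(steps)]
    dens = [x**n_wins * (1 - x)**(n_games - n_wins) for x in xs]
    total = sum(dens)                      # plays the role of 1/C
    tail = (1 - conf) / 2
    # Walk the cumulative sum to find the two tail quantiles.
    cum, xlo, xhi = 0.0, 0.0, 1.0
    for x, d in zip(xs, dens):
        cum += d / total
        if cum < tail:
            xlo = x                        # last x below the lower tail
        if cum <= 1 - tail:
            xhi = x                        # last x below the upper tail
    return xlo, xhi

def rating_diff(x):
    """dR_12 implied by win probability x: 400 * log_10(x / (1-x))."""
    return 400 * math.log10(x / (1 - x))

xlo, xhi = confidence_bounds(20, 6)        # the Zarkov vs. Mach3 match
print(round(xlo, 3), round(xhi, 3))        # about 0.146 and 0.522
print(round(rating_diff(6 / 20)))          # most probable dR: -147
print(round(2265 + rating_diff(xlo)),
      round(2265 + rating_diff(xhi)))      # about 1958 and 2280
```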

If you are interested in the details of the program, contact me
via e-mail.

--Greg King
gk...@kronos.usc.edu or ki...@native.usc.edu
