Rather than contributing to the latter, I will try to give a scientifically
supported estimate of the rating in question.
Let's cite the data first:
Article 3743 of rec.games.chess:
Stuart Cracraft writes:
> if you like, make it 14-6 out of 20 games which is what two
> Zarkov matches against Mach 3 have produced.
>
> In a standard ELO table, a 70%-30% result is given as
> a rating difference of 146-153 points.
>
> This, due to a small sample, gives 2265 less 150 points. The first
> is Mach 3's CRA rating, and the latter is the ELO table number.
==> 20 games sample
==> 70%-30% result in favor of Mach3.
==> rating Mach3 = 2265
==> Elo table 70%-30% results = 146-153
Article 3744 of rec.games.chess:
Thomas Anantharaman writes:
>
> While working on my PhD Thesis I computed the following table of
> 95% confidence intervals for ELO rating differences. Thus the
> 95% confidence interval for the ELO rating difference based
> on a 20 game match with 10 Wins, 0 Draws, 10 Losses is -158 to +158 ELO
> points. The values in the tables were computed using Monte Carlo
> simulation
> and have a 95% confidence interval of +- 10%.
> Score (Win/Draw/Lose)
> Games (.5/0/.5) (.25/.5/.25) (.75/0/.25)
> 20 158 108 171
> 32 124 85.2 153
> 64 83.2 60.6 100
> 256 42.6 30.2 48.0
(only relevant entries mentioned)
>
> From the table it follows that the CRA's official rating for commercial
> chess computers based on 48 games has a 95% confidence interval spanning
> about 200 rating points (+-100 points).
==> confidence interval bounds for a .75/0/.25 result, based on a sample
of 20 games = +/- 171
MY CONCLUSION:
-------------
In accordance with standard estimation theory:
Estimated rating (mean) = opponent's rating - Elo table difference
mean: 2265 - (146 to 153) = 2112 to 2119 (Mach3's rating minus the 70%-30% Elo table value)
The 95% confidence interval, based on Thomas Anantharaman's table for 20 games
with a .75/0/.25 score (the closest entry to a 70%-30% score), is +/- 171 points.
This results in:
upper bound: 2119 + 171 = 2290
lower bound: 2112 - 171 = 1941
So based on the reported facts we can say with 95% confidence that:
The rating of ZARKOV is between 1941 and 2290.
---------------------------------------------
This is the ONLY scientifically valid estimate we can make based on the data
in the articles above. All other claims based solely on this data are
unsupported guesses.
In particular, the reported range of 1995 - 2348 is incorrect:
Article 3755 of rec.games.chess:
Stuart Cracraft writes:
>
> From Thomas's data we can tentatively say then that since the
> range for Zarkov's performance vs. Mach III was 2153-2190, we take
^^^^^^^^
Note that this range was found after applying the 50%, 25% "correction".
This "correction" has no scientific (estimation-theoretic) grounding.
> the boundary of -158 and subtract from 2153, giving 1995 as a lower bound
> for Zarkov's rating on a 16mhz 80386, and 2190+158 giving 2348 as
> the upper bound on the same hardware.
I hope this clears up the discussion about ratings and their calculation.
Comments welcome.
-Teun Hendriks
Some final remarks:
1 - Noting that the official CRA rating for chess computers has a
95% confidence interval of approx. +/- 100 points, one may wonder how
accurate the rating of Mach3 is.
2 - To significantly decrease the 95% confidence intervals, at least
hundreds of games should be played, according to Thomas' table.
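As a rough illustration of why hundreds of games are needed: the half-widths in Thomas' even-score column shrink roughly as 1/sqrt(games), as this sketch (my own framing, using only numbers from his table) shows:

```python
import math

# Anantharaman's +/- values vs. number of games, even-score column (.5/0/.5)
table = {20: 158, 32: 124, 64: 83.2, 256: 42.6}

# Predict each entry from the 20-game value via 1/sqrt(games) scaling;
# agreement is approximate (his Monte Carlo values carry +/- 10% themselves).
for games, half in table.items():
    predicted = 158 * math.sqrt(20 / games)
    print(games, half, round(predicted, 1))
```
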
>MY CONCLUSION:
>-------------
>
...
>So based on the reported facts we can say with 95% confidence that:
>
> The rating of ZARKOV is between 1941 and 2290.
> ---------------------------------------------
>
>> From Thomas's data we can tentatively say then that since the
>> range for Zarkov's performance vs. Mach III was 2153-2190, we take
> ^^^^^^^^
>Note that this range is found, after applying the 50%, 25% "correction".
>This "correction" has no scientific (estimation-theoretic) ground.
>
Maybe, but it has considerable empirical ground.
According to Larry Kaufman, in the latest (vol 1 no 1) issue of
Computer Chess Quarterly,
"Allowing for the tendency of computer vs. computer results
to overstate rating differences by around a 4 to 3
ratio..."
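One possible reading of Kaufman's 4-to-3 ratio (my interpretation, not stated explicitly in the quote) is to multiply the raw computer-vs-computer rating difference by 3/4 before applying it:

```python
import math

raw_diff = 400 * math.log10(0.70 / 0.30)  # ~147, from the 70%-30% match score
corrected = raw_diff * 3 / 4              # deflate by Kaufman's 4:3 ratio
print(round(2265 - corrected))            # ~2155, near the quoted 2153-2190 range
```
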
>
>Some final remarks:
>
>1 - Noting that the official CRA rating for chess computers has a
> 95% confidence interval of approx. +/- 100 points, one may wonder how
> accurate the rating of Mach3 is.
>
The latest rating list in the Quarterly suggests that Fidelity Mach III
is 2215 (CCR tests), 2220 (English tests) at normal time control (40/2).
It is fairly well accepted that Mach III got lucky and is 45-50 points over-rated.
>2 - To significantly decrease the 95% confidence intervals, at least
> hundreds of games should be played, according to Thomas' table.
Perhaps thousands. David Kittinger indicates that it is almost a
"crap shoot" when playing computers vs. each other for brief
matches. Brief should be thought of as anything below 20 games.
In particular, the results of 4 or 5 round ACM and World Micro
and World Computer matches are literally meaningless (except for
the winners).
--Stuart
Here's the approach I used. First some definitions of terms:
x = the probability that Player1 will beat Player2 in any one game;
xapp = the apparent value of x, based on an N-game contest;
xtrue = the true value of x, based on an infinite-game contest;
xlo = the lower bound of xtrue;
xhi = the upper bound of xtrue;
dR_12 = rating of Player1 relative to Player2 (i.e. R_1 - R_2).
The exact value of xtrue cannot be found, but we can find xlo and xhi
such that the inequality:
xlo < xtrue < xhi
is true with a given amount of confidence (e.g. 95%). If x is the
probability that Player1 will beat Player2 in any one game, then:
P(N,m,x) = C * x^{m} * (1-x)^{N-m}
is the probability that Player1 will win m of N games with Player2.
Here x is the only variable, since N and m are taken as constants.
The program calculates the function P(N,m,x) for the complete range
of possible values of x (0 <= x <= 1), and the normalization constant
C is chosen to satisfy:
1 = int_{0}^{1} dx P(N,m,x).
Not surprisingly, the x for which P(N,m,x) is at its maximum value
is just: x = xapp = m/N (xapp is the most probable value of x).
xlo and xhi are determined by evaluating the integrals:
0.025 = int_{0}^{xlo} dx P(N,m,x)
and
0.025 = int_{xhi}^{1} dx P(N,m,x)
Thus we have:
0.95 = int_{xlo}^{xhi} dx P(N,m,x)
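This integration can be sketched in pure Python with a trapezoid rule (the grid size and names are my own; the unnormalized posterior x^m (1-x)^(N-m) is what the text calls P(N,m,x) up to the constant C):

```python
# Numerically find xlo and xhi for the m-of-N match described above.
N, m = 20, 6          # Zarkov scored 6 of 20 against Mach3
steps = 100000        # grid resolution (assumption; finer = more accurate)

xs = [i / steps for i in range(steps + 1)]
ps = [x**m * (1 - x)**(N - m) for x in xs]

# Cumulative trapezoid integral of the unnormalized posterior.
cum = [0.0]
for i in range(1, len(xs)):
    cum.append(cum[-1] + (ps[i] + ps[i - 1]) / 2 / steps)
total = cum[-1]       # plays the role of 1/C: divide by it to normalize

def quantile(q):
    """Smallest x whose cumulative probability reaches q."""
    target = q * total
    for x, c in zip(xs, cum):
        if c >= target:
            return x

xlo, xhi = quantile(0.025), quantile(0.975)
print(round(m / N, 3), round(xlo, 3), round(xhi, 3))
```
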
The Zarkov vs. Mach3 match (m=6, N=20) yields the following results:
xapp = 0.300
xlo = 0.146
xhi = 0.522
A winning percentage x can be converted to a rating difference dR_12
with the relation:
dR_12 = 400 * log_10(x/(1-x))
Plugging xapp into this equation tells us that the most probable
rating difference is dR_12 = -147. Plugging xlo and xhi into the
equation tells us that -307 < dR_12 < 15 is true with 95% confidence.
If Mach3 is rated at 2265, then the most probable rating for Zarkov
is 2118, and the 95% confidence interval is 1958 to 2280.
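The conversion from winning percentages to ratings works out as follows (a direct transcription of the dR_12 formula above; the function name is mine):

```python
import math

def rating_diff(x):
    """dR_12 = 400 * log10(x / (1 - x)) from the text above."""
    return 400 * math.log10(x / (1 - x))

mach3 = 2265
for x in (0.300, 0.146, 0.522):   # xapp, xlo, xhi from the m=6, N=20 match
    print(round(mach3 + rating_diff(x)))
# prints 2118, 1958, 2280: the most probable rating and the 95% bounds
```
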
If you are interested in the details of the program, contact me
via e-mail.
--Greg King
gk...@kronos.usc.edu or ki...@native.usc.edu