
Proposal for a new rating list based on humans vs. machines


Fernando Villegas Darroui

Feb 28, 1997, 3:00:00 AM

Hi all:
This is a proposal to create a rating list based on games against human
beings, and I want your opinion. As we all know, the SSDF rating list,
although a great and useful effort, is not the best way to rate chess
programs. Ratings derived from machine-versus-machine or
program-versus-program play cannot give us, human beings, a real
picture of how well they play against flesh-and-blood opponents. The
list is especially biased against speculative, human-like programs,
although these are precisely the programs that best fit what people
want. Sure, we know that a top-ranked program on the SSDF list will do
better than a middle- or low-ranked one, but apart from those extreme
cases, within the wide range of gradations inside the top, middle and
low categories, differences of 50 or even more points give us no more
than a glance at what we can expect when WE, not other programs, face
them over the board.
Of course, the problem is how to get something different, accurate and
fast. After all, the human Elo list -in itself not free of criticism-
that seems such a natural part of the scenery of chess competition is
the result of a long historical accumulation of results throughout the
years. Hundreds of thousands of games have been played since chess
authorities began to use Mr. Elo's invention. It took 30 years for each
rated player's Elo to mean something, to the point where we can predict
his performance.
Some time ago a weak effort was made in that direction by the USCF with
the so-called "Action Games" ratings, but it was performed with only a
few stand-alone units and a very modest number of games. Just a handful
of them sufficed to give a rating, and sometimes the conditions were
not adequate or fair, as the case of the Par Excellence showed in 1986.
Even under such poor conditions the experiment was expensive, and
interest in taking part was very scarce. The companies were prepared to
do it only if they believed they would get a fantastic rating for
publicity's sake; if not, in some cases they even withdrew their
product before testing was concluded.
Then, what? Of course nobody has the money and the time needed to get
hundreds of first-category, expert and master-level players to play
enough games. But at the same time, every day at least hundreds of
games are played all over the world by people like us, chess computer
fans. And almost all those games are lost when the computer is switched
off. And of course among us there is a great variety of players, from C
category to maybe master level, and maybe even some IMs. So each month
thousands of games are played under different time conditions against
all kinds of programs, on all kinds of computers and by all kinds of
players, and they are lost except for the one or two games per head
that we save because we won and/or they were especially interesting. I
think that in six months more than an adequate number of these games
would be played, and if they were used to rate programs according to
the rating of the human opponent, they would be enough to produce a
relatively accurate and meaningful list of strength ratings. And in
this case this would be THE list that matters, the list where programs
would show what they can really do against human opposition.
How to do it? I suppose some kind of organization and protocol should
be created, more or less along these lines:
a) A good number -the more the better- of chess computer fans,
previously identified -no programmers allowed-, should commit to
playing, if not all, then at least as many games as possible under
serious conditions, that is, without takebacks and respecting time
limits in a variety of settings, for example 5-minute finish, 30-minute
finish, 40 moves in one hour, etc., whatever is considered best. In
fact, to do what they already do with their programs, but under more
formal conditions.
b) After completing 10 games in one category -or any other previously
agreed number- the member of this organization would calculate the
rating obtained by the program on the basis of his own official rating
or, failing that, an approximate rating justified by solid and good
reasons (see the sketch after this list). Moreover, a rule could be
adopted to simply assume -let us say- an 1800 Elo rating for any
unrated player. Since all the programs would face the same conditions,
even this initially arbitrary number would be useful to discriminate
between them, although only in the long run.
c) The results of each member, WITH the games in PGN notation, would be
submitted to a number of previously designated supervisors dedicated to
collecting them and doing the final calculation. Pre-digested results
would be easy to compute, because it would just be a matter of taking
an average. Here and there a check could be made to see whether a game
is real or just a joke or something makeshift. The results would be
published as the rating list of the organization and, with time, they
would surely show very accurate data even if at the beginning the
system is loose and inexact. The very mechanism of Elo calculation
guarantees that.
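
Just to make the arithmetic in b) and c) concrete, here is a small
Python sketch. It assumes the simple linear performance formula
(opponent's rating plus 400 * (wins - losses) / games); the function
names and the sample numbers are only my illustration, not part of the
proposal itself.

# Hypothetical sketch: a member's report gives the program a
# performance rating, and the supervisors average the reports.
# Assumed formula: R_perf = R_opponent + 400 * (W - L) / N.

def performance_rating(opponent_rating, results):
    # results: the program's scores against one human opponent,
    # 1.0 for a win, 0.5 for a draw, 0.0 for a loss
    n = len(results)
    score = sum(results)
    return opponent_rating + 400 * (2 * score - n) / n

def pooled_rating(reports):
    # reports: (opponent_rating, results) pairs sent in by members;
    # the supervisors simply average the per-member numbers
    perfs = [performance_rating(r, res) for r, res in reports]
    return sum(perfs) / len(perfs)

# Example: a 1950-rated member against whom the program scores 5/10,
# and an unrated member (assumed 1800) against whom it scores 7/10.
reports = [
    (1950, [1.0] * 3 + [0.5] * 4 + [0.0] * 3),   # 5.0/10 -> 1950
    (1800, [1.0] * 6 + [0.5] * 2 + [0.0] * 2),   # 7.0/10 -> 1960
]
print(round(pooled_rating(reports)))             # -> 1955

With only ten or so games per report each individual number is noisy,
but averaged over many members the list should settle down, which is
all the proposal needs.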
Is this too difficult to implement? Am I missing something so big that,
because of its sheer size, I don't see it?
Please let me know and let's talk about it. Maybe something can be
achieved for very little additional effort. The point is to use the
games we already play and get something more out of them.

Komputer Korner

Mar 2, 1997, 3:00:00 AM
to Fernando Villegas Darroui

Fernando Villegas Darroui wrote:
>
snipped

> c) The results of each member, WITH the games in PGN notation, would be
> submitted to a number of previously designated supervisors dedicated to
> collecting them and doing the final calculation. Pre-digested results
> would be easy to compute, because it would just be a matter of taking
> an average. Here and there a check could be made to see whether a game
> is real or just a joke or something makeshift. The results would be
> published as the rating list of the organization and, with time, they
> would surely show very accurate data even if at the beginning the
> system is loose and inexact. The very mechanism of Elo calculation
> guarantees that.
> Is this too difficult to implement? Am I missing something so big that,
> because of its sheer size, I don't see it?
> Please let me know and let's talk about it. Maybe something can be
> achieved for very little additional effort. The point is to use the
> games we already play and get something more out of them.

Eric Hallsworth has been doing this for years, and also Larry Kaufman.
--
Komputer Korner

The inkompetent komputer.
