Jouni Uski
It's been reported several times over the past 10-12 years; Ken Thompson and
Hans Berliner both ran such tests and came to the same conclusion. Basically,
the difference between a 4-ply search and a 5-ply search is much more
significant than the difference between a 10-ply search and an 11-ply
search.
However, that's holding everything else "constant". If you use increased
speed to do more selective extensions, or more evaluation, then this changes
things, again as reported in Berliner's paper.
50 points is a lot. If program X is 50 points better than another
program, it will score 57% vs that program, which is significant.
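The arithmetic checks out against the standard Elo expected-score formula; a quick sketch in Python (the usual logistic formula, nothing program-specific assumed):

```python
# Expected score for a player rated `diff` points above the opponent,
# using the standard Elo logistic formula.
def expected_score(diff):
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

print(round(expected_score(50) * 100))   # 50-point edge -> ~57%
```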
bruce
Suppose that a top PC program can achieve a 2200 rating at 1K nodes per
second and 2270 at 2K nodes per second (reasonable figures, I think).
Then the following table shows what might happen as search speed
increases:
Nodes/Sec Rating
--------------------
1K 2200
2K 2270
4K 2333
8K 2390
16K 2441
32K 2487
64K 2528
128K 2565
256K 2598
512K 2628
1M 2655
2M 2679
4M 2701
8M 2721
16M 2739
32M 2755
64M 2763
Hmm, could be tough to beat Kasparov...
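For what it's worth, the table follows mechanically from the stated assumptions (+70 Elo for the first doubling, each later doubling worth 10% less); a small sketch that regenerates it, give or take a point of hand-rounding in the original (the original's last entry falls a few points short of the strict rule):

```python
# Regenerate John's table: +70 Elo for 1K->2K, each further doubling
# worth 10% less than the one before it.
labels = ["1K", "2K", "4K", "8K", "16K", "32K", "64K", "128K", "256K",
          "512K", "1M", "2M", "4M", "8M", "16M", "32M", "64M"]
rating, gain = 2200.0, 70.0
for label in labels:
    print(f"{label:>5}  {rating:6.0f}")
    rating += gain
    gain *= 0.9
```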
If one makes the assumption that DB (at 250M nodes/sec) has the equivalent
SEARCH capability to such a program running at 64M nodes/sec then
DB would rate 2763 on this scale. My reason for believing that DB
would need more nps is that it uses parallel processing and its
hardware probably imposes some search inefficiencies.
If the evaluation algorithms are similar in strength then a PC program
searching at 64K nps would be about 235 rating points below DB
and win about 20%.
However, the DB hardware makes it difficult to implement an evaluation
function that is as accurate as that of a general purpose computer.
This could reduce the rating difference such that a top PC program
on a Pentium Pro-200 might be only around 100 points worse than DB.
Of course, my assumptions could be wrong....
John
Several points. Hans Berliner showed that this "rating curve" varies from
program to program, evaluation to evaluation, etc. That is, nps, or any
other performance measure, can probably be plotted as a curve predicting rating
for *that* program based on search speed... but any changes to that program
blow the curve away. Check out his paper in one of the Advances in Computer
Chess books on HiTech vs LoTech... LoTech was HiTech with most of the
knowledge ripped out. The speed vs performance curves were drastically different,
and were different from the two different results Ken Thompson reported for
similar tests using belle many years ago.
DB hardware does have some shortcomings. The special-purpose chips don't
access a global hash table for example. That would take a 1024-port memory
system for their full-blown system, not something anyone would want to
design. Ditto for the q-search, which appears to be much like what I now do
in Crafty, no checks or whatever. In short, they lose something in the
process of moving algorithm into hardware. However, I can guarantee you that
they are more than 100 points stronger than a program on a P6/200. As I
mentioned, Deep Thought was awarded the Fredkin prize for the first program
to achieve a true GM rating over 24 consecutive games. No PC program has
come close to this. And Deep Blue is only 25-50 times faster at present,
with possibly another 4x if they build all 1024 chips. Hsu reported that in
his testing on-site, he won over 90% against Fritz. That, too, indicates
there's more than a 100 point rating gap. I suspect it's more like 300,
because 200 points is roughly a 75-25% split, which is too low.
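Those score-to-rating conversions can be checked with the usual Elo inversion; a quick Python sketch (standard formula only, no DB-specific assumptions):

```python
import math

# Rating gap implied by an expected score p (inverse of the Elo curve).
def gap_from_score(p):
    return 400.0 * math.log10(p / (1.0 - p))

print(round(gap_from_score(0.75)))   # 75-25 split -> ~191 points
print(round(gap_from_score(0.90)))   # 90% score  -> ~382 points
```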
Robert Hyatt <hy...@crafty.cis.uab.edu> wrote in article
<50quvq$s...@juniper.cis.uab.edu>...
> John Stanback (j...@verinet.com) wrote:
> : My opinion is that the ELO rating for a chess program will
> : be reduced by about 10% for each doubling in search rate.
> :
> : Suppose that a top PC program can achieve a 2200 rating at 1K nodes per
> : second and 2270 at 2K nodes per second (reasonable figures I think),
> : Then the following table shows what might happen as search speed
> : increases:
> :
> : DB would rate 2763 on this scale. My reason for believing that DB
> : would need more nps is that it uses parallel processing and its
> : hardware probably imposes some search inefficiencies.
It just struck me as 'funny': here we are talking about a machine that
searches 50,000,000,000 positions per move during the normal course of a
game, and yet... we have some search 'inefficiencies'. The author is
probably right if we were to compare this to some huge single fictional
'big blue' processor that exists only in my mind, but until then I would
be hard pressed to discuss the 'inefficiencies' of DB...
Jouni Uski <Jouni...@nce.nokia.com> wrote in article
<322FBB...@nce.nokia.com>...
>John Stanback (j...@verinet.com) wrote:
>: My opinion is that the ELO rating for a chess program will
>: be reduced by about 10% for each doubling in search rate.
>: [rest of quoted table and DB discussion snipped; same as above]
From my search, the equation for rating vs ply level is
Elo Rating = XXXX + YYYY * sqrt(ply level)
XXXX = Basic Knowledge (constant for a version of program)
YYYY = Dynamic Knowledge (constant for a version of program)
So yes: the higher the ply level, the less it helps the rating.
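If one had measured (ply, rating) pairs, the two constants would drop out of an ordinary least-squares fit on sqrt(ply); a minimal sketch in Python (the sample numbers below are hypothetical placeholders, not Louis's data):

```python
import math

# Hypothetical (ply, rating) measurements -- placeholders only.
data = [(4, 1900), (6, 2100), (8, 2250), (10, 2370)]

# Least-squares fit of: rating = XXXX + YYYY * sqrt(ply)
xs = [math.sqrt(p) for p, _ in data]
ys = [float(r) for _, r in data]
n = len(data)
mx, my = sum(xs) / n, sum(ys) / n
yyyy = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        / sum((x - mx) ** 2 for x in xs))
xxxx = my - yyyy * mx
print(f"XXXX (basic knowledge)   ~ {xxxx:.0f}")
print(f"YYYY (dynamic knowledge) ~ {yyyy:.0f}")
```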
Louis Geoffroy
The exponentiation is understood already.
The 'diminishing returns' issue is a question of whether the
advantage gained from getting an extra ply decreases with the ply number.
I.e., you get a good ELO increase from 4 to 5 ply, but probably very
little from 12 to 13 ply, and so on.
Chris Whittington
True, although the "equation" varies from program to program. In Berliner's
paper, he found that additional depth improved HiTech more than LoTech,
because all LoTech got was more depth, but HiTech gave up some depth to
add knowledge. I don't have his paper in front of me, but the rating
increase for LoTech from depth 9 to 10 was less than the rating increase for
HiTech from depth 9 to 10, by a significant (but not huge) amount. Just goes
to show that each program responds differently to different modifications.
Roger Davis
http://www.cs.ualberta.ca/~andreas/Papers/dim.ps.Z
There you will find explanations of diminishing returns, pointers to
previous work and graphs showing those results and our experiments on
the subject as well as the conclusions we draw. Of course, this work is
by no means final, there are still open questions. For those of you who
like all that in a digested form, I will try to give an abstract of
the paper here:
We started this work, because our group here in Alberta found CLEAR
diminishing returns for games similar to chess, such as Othello and
checkers (check our home-page http://www.cs.ualberta.ca/~games for
further information on work about the other games!). The question we
wanted to answer was:
Why can't we see diminishing returns AS CLEARLY (some would say, "if at
all") in chess? In the paper you can find graphs on diminishing returns
for checkers and Othello. What are the differences between chess,
Othello and checkers with respect to diminishing returns?
The literature gives different conclusions on the subject. Whereas Ken
Thompson (as repeated at the conference) interprets his later results as
indication of diminishing returns, the last publication on the subject (that
we are aware of), Peter Mysliwietz's thesis, draws the conclusion that there
are no diminishing returns. Those contradictions alone are reason enough to
try to understand the problem better.
ALL the self-play data (the same program plays itself with different
search depths) given in the literature is extremely noisy (see Figure
1 in the paper) and the statistician would reject the hypothesis that
the curves show a negative slope (diminishing returns) with high
confidence, especially because the amount of experimental data is small.
As Robert Hyatt stated already: These curves are a function of the entire
program including search, extensions, knowledge... you change one thing
and the curves change. The question we tried to answer was: How? What
happens if you have a faster machine? Better knowledge?
In short: We found statistically significant evidence of the existence
of diminishing returns, using other measures than winning percentages
(ELO is the same). Using the same data and interpreting it in the
traditional way (ELO or winning percentages) we cannot draw this
conclusion. Furthermore, the worse your knowledge, the more beneficial
the search, and the later you will observe diminishing returns. (Berliner's
HiTech/LoTech results are based on 16 games only, and therefore I would
not draw any conclusions from them, especially because one crucial data
point is estimated and another one seems to be an anomaly.) The
implications for DB are sort of clear: If their evaluation function has
limitations because of the hardware implementation, they are more
likely to gain by searching deeper than a program with a lot of
knowledge searching to the same depth (if it could :).
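The 16-game caveat is easy to quantify with binomial noise; a rough sketch in Python (assuming independent games and ignoring draws; the one-sigma rating band uses the standard Elo inversion):

```python
import math

# Standard error of a match score measured over n games, and the
# one-sigma rating band that score error implies near a 50% result.
def score_stderr(p, n):
    return math.sqrt(p * (1.0 - p) / n)

for n in (16, 100, 1000):
    se = score_stderr(0.5, n)
    elo = 400.0 * math.log10((0.5 + se) / (0.5 - se))
    print(f"{n:5d} games: score +/- {se:.1%} => rating +/- ~{elo:.0f} Elo")
```

With 16 games the one-sigma score error is 12.5%, which is why single data points from such a sample are hard to trust.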
Andreas
--
Andreas Junghanns / "It's not a bug - it's a feature!"
10726 85Ave #102 /
Edmonton/AB T6E 2K8 email: and...@cs.ualberta.ca
Tel/Fax: (403) 437 8098 http://www.cs.ualberta.ca/~andreas
An unanswerable question. In computing theory we'd call this an
"undecidable" issue. We can prove that DB is not better right now
by a simple match, but we can't prove that it never will be. The
big problem at present is we don't have much of an understanding of
how humans play chess, so it's impossible to analyze what we do as
humans, to see if we can clearly program a way that's superior. So
we will continue to debate, develop, debug, and debunk, and one day
we'll have the answer. When? I'm reminded of my old calculus teacher
from years ago, when we were just getting into "limits." He posed the
classic question "you stand 4 feet from the blackboard, and step 1/2 of
the distance to the board. You repeat this over and over. Will you
ever reach the board?" The answer, of course, is yes, because the
limit of the sum 4/2 + 4/4 + 4/8 + ... + 4/2^n as n -> infinity
is 4. His answer was the ubiquitous "you'll get there if you try long
enough." :) Ditto for determining if a machine can ever beat Kasparov.
Fortunately I don't think it will take a limit as n -> infinity, I
hope... :)
I hope it does.
Chris Whittington