This was the article I was thinking of, per my earlier post, not the
Jeff Sonas article.
RL
All this study shows is that Capablanca played moves that
agree with a relatively weak chess engine.
J.Lohner
Lohner, you're a 1300-rated IMBECILE! You're in no position to call a
program that plays at 2600 Elo "weak". You will never in your life even BEGIN
to understand the rudiments of any of Capablanca's games.
JMR
That article is, frankly, junk: I'm surprised it was ever accepted for
an academic conference.
They haven't determined the strongest champion of all time: they've
determined which World Champion plays most like a crippled version of
Crafty. That's better than working out which World Champion plays
most like me but not much better. See Soren Riis's rebuttal
http://www.chessbase.com/newsdetail.asp?newsid=3465
Dave.
--
David Richerby, www.chiark.greenend.org.uk/~davidr/
Moistened Toy (TM): it's like a fun child's toy but it's moist!
I don't think Riis or you understood the original article. The
researchers addressed in detail the objection that Crafty is not
the ultimate authority on the best move. Obviously
we can find some specific positions where the version of
Crafty used in the analysis is wrong, but that is not a
fundamental objection.
There is much very interesting and original work
in the article; perhaps the ChessBase synopsis concentrates
excessively on the findings rather than on the methodology,
since that makes a better story. Certainly there were analyses
that they didn't do which should get done. That's just the normal
way that research advances. In any case, the approaches
investigated in the article are far preferable to the "historical
Elo" or "Chessmetrics" nonsense, which are *completely*
lacking in rigor of any kind.
> All this study shows is that Capablanca played moves that
> agree with a relatively weak chess engine.
So does this mean that if you could find a chess engine
weak enough, my moves would agree even more? Then
I would rate even higher than all these world champions,
right?
-- help bot
It seems likely that whatever conclusions may be
drawn from such studies are largely determined by
the way in which the study is constructed, which
is just the opposite of what is supposedly desired
(i.e. computer-like objectivity).
For instance, had the study shown "desirable"
results right off the bat, the need to compensate
for the simplicity of position would never have
even occurred. If it turned out that, say, GM
Capablanca was more accurate because he
preferred simple positions, this could have been
interpreted as meaning he was simply the
strongest player; instead, there arose an
"emotional need" to compensate for some
assumed flaw, as if his choice of style were
somehow unfair to the other contenders.
What is never shown and rarely mentioned
is all the tweaking of the various formulae
that goes on before finalizing the charts and
results presented to us as readers, and this
invisible stuff is precisely what determines
the final rankings.
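A minimal sketch of the point about formula tweaking, using invented numbers for two hypothetical players (nothing here comes from the study). Ranking by raw average error and ranking by a hypothetical complexity-adjusted error (error divided by average position complexity; the study's actual adjustment may well differ) give opposite orders from the same data:

```python
# Toy, invented data for two hypothetical players (NOT figures from the study).
players = {
    "Player A": {"mean_error": 0.10, "mean_complexity": 0.20},
    "Player B": {"mean_error": 0.14, "mean_complexity": 0.40},
}


def rank(score):
    """Return player names ordered best-first, where a lower score is better."""
    return sorted(players, key=lambda name: score(players[name]))


# Formula 1: plain average error per move.
raw_ranking = rank(lambda p: p["mean_error"])

# Formula 2 (hypothetical adjustment): error per unit of position complexity,
# which forgives errors made in harder positions.
adjusted_ranking = rank(lambda p: p["mean_error"] / p["mean_complexity"])

print("raw:     ", raw_ranking)       # ['Player A', 'Player B']
print("adjusted:", adjusted_ranking)  # ['Player B', 'Player A']
```

Neither formula is wrong in itself; the choice between them is exactly the kind of unseen decision that settles the final order.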
-- help bot
> > That article is, frankly, junk: I'm surprised it was ever accepted for
> > an academic conference.
>
> > They haven't determined the strongest champion of all time: they've
> > determined which World Champion plays most like a crippled version of
> > Crafty. That's better than working out which World Champion plays
> > most like me but not much better. See Soren Riis's rebuttal
>
> > http://www.chessbase.com/newsdetail.asp?newsid=3465
>
> I don't think Riis or you understood the original article. The
> researchers addressed in detail the objection that Crafty is not
> the ultimate in determining the best move - obviously
> we can find some specific positions where the version of
> Crafty used in the analysis is wrong, but that is not a
> fundamental objection.
You know, if you took the games of a typical (1300) rated
player and checked them with a dumbed-down-Crafty (1500),
you might get some useful information, but not nearly as
much as hoped for. But when you take the games of the
world champions and check them with a program which is
short of 2800, you get mainly garbage, combined with many
instances where a tactical oversight is correctly pinpointed.
You also penalize those players who *deliberately* chose
to play what they knew to be sub-optimal moves, for
whatever reason. I just did this myself at RedHotPawn,
choosing to grab a Knight rather than leap in with another
piece to set up a 95%-certain mating net. Why? Because
while the mating net was around 95% certain, the capture
of the free piece was 100% certain (unless I have lost my
mind)! When I spot another mating net, things should be
simple enough for me to get the 100% certainty I desire,
and having captured yet another piece, this is all but
inevitable, barring my opponent's resignation.
Another item which these statistical analyses overlook
is the deliberate gift of, say, a half-point. These have
been known to occur in world championship level play,
and of course the "nice guys" will be penalized for not
being "tough players", despite clinching the match
with their action.
In short, what can be learned is who was least prone
to tactical blunders, and apparently, whose style leans
most toward a sizable gap between what the program
sees as the #1 optimal move, and #2 -- something I
think may be termed the sharpness of play. For one
example, I am playing a game at RedHot now where
I had to decide whether to develop my QB "normally"
via ...d6 and then B-moves somewhere, or fianchetto
via ...b6 and B-b7. It was a toss-up, since it makes
no difference whatever to the outcome. I expect a
computer would see both moves as being nearly
equal, weighing them in such a way as to slightly
favor the move which gives the Bishop immediate
control of squares, though this immediacy is quite
irrelevant to the true value of the moves.
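The "sharpness" idea above can be sketched as the gap between an engine's best and second-best evaluations in a position. The evaluations below are hypothetical numbers in pawns, from the side to move's point of view, not output from any real engine:

```python
def sharpness(candidate_evals):
    """Gap (in pawns) between the best and second-best candidate moves.

    candidate_evals maps a move to an engine evaluation from the side to
    move's point of view (higher is better). A small gap means a "toss-up"
    position; a large gap means only one move really works.
    """
    if len(candidate_evals) < 2:
        raise ValueError("need at least two candidate moves")
    best, second = sorted(candidate_evals.values(), reverse=True)[:2]
    return best - second


# Hypothetical evaluations, invented for illustration.
quiet = {"...d6": 0.12, "...b6": 0.15}
sharp = {"Nxe5": 1.50, "Qd2": -0.80, "h3": -0.90}

print(round(sharpness(quiet), 2))  # 0.03
print(round(sharpness(sharp), 2))  # 2.3
```

In a toss-up like the ...d6/...b6 choice, a metric that charges every deviation from the engine's top choice would still book the tiny gap as an "error", even if it makes no difference to the outcome.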
I wonder for how long, and to what depth,
the moves were analyzed before scoring them. I
recall that a player's move may often be scored poorly,
but if it is executed and the position stepped forward, a program may
change its mind completely about it, suddenly
realizing it had overlooked something.
-- help bot
I am in the process of reading this article now, and just
noticed a laughably absurd claim by the authors: that
the truncated Crafty used would naturally rank all
superior programs in reverse order. LOL! This is the
most ignorant comment I have seen since before I began
ignoring many recent postings by the Evans ratpack.
Of course, it is not the strength but rather the
*similarity in style* which would actually determine how
truncated Crafty ranks *all* other programs. It is theoretically possible
for Crafty to rank Rybka near the top, penalizing it
(unfairly) only for the few moves which it correctly sees
but where Crafty would blunder horribly. All this would
require is that Rybka *usually* agree with Crafty, but
when they disagree, for Rybka to always be right. The
gap in ratings could easily be 400 points, if the key
differences of opinion were instant game-losers.
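A toy model of this argument, assuming a score based purely on how often a player's move matches the reference engine's choice (a deliberate simplification of any real scoring scheme). The engines and the 100 positions are hypothetical stand-ins:

```python
def agreement_rate(player_moves, reference_moves):
    """Fraction of positions where the player chose the reference engine's move."""
    matches = sum(p == r for p, r in zip(player_moves, reference_moves))
    return matches / len(reference_moves)


N_POSITIONS = 100
# The (hypothetical) truncated reference engine's choice in each position.
reference = [f"ref_move_{i}" for i in range(N_POSITIONS)]

# In 5 positions the reference engine blunders; the stronger engine plays the
# correct move there and agrees everywhere else.
blunder_positions = set(range(5))
stronger_engine = [
    f"correct_move_{i}" if i in blunder_positions else move
    for i, move in enumerate(reference)
]

# A clone of the reference engine copies every choice, blunders included.
clone_engine = list(reference)

print(agreement_rate(stronger_engine, reference))  # 0.95
print(agreement_rate(clone_engine, reference))     # 1.0
```

The clone outranks the stronger engine even though it repeats every blunder, so agreement measures similarity to the reference engine rather than strength.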
I am beginning to get the impression that people who
play around with statistics in an attempt to demonstrate
something are loony, as well as utterly incompetent at
applying statistics rationally.
-- help bot
J.Lohner
I tend to agree with you, David Kane.
I find the rebuttal by Dr. Søren Riis, Oxford, UK unconvincing for a
number of reasons.
- it was clearly written with a popular audience in mind (witness the
exclamation point! It's been said that no serious article has ever
been written with an exclamation point! Unless the author did so
deliberately)
- it fails to understand the simple argument of 'normalization'. The
Matej Guid and Ivan Bratko original article pointed out that Crafty
was used since it was open source and could be modified; the stronger
programs are not, but in any event Crafty is hardly a weak tactics
program and the authors are looking for a standardized (normalized)
way of spotting blunders.
-The fact that Riis found positional sacrifices not evaluated by
Crafty is not convincing since: (1) such positional sacrifices are
rare--as computers have shown, chess is largely tactics; (2) everybody
will be judged equally by Crafty, so others' pos sacs are also scored
'badly', so nobody will lose relative standing to one another, and
(3), as long as assumption (1) is valid, Crafty will find the most
"mistake free" chess player, or one that plays closest to being
"tactics mistake free", which is a very good way to determine a good
chess player IMO.
Now of course the surrebutter (rebuttal to the rebuttal) will be that
players like Tal will score poorly--and indeed they (he) did--but
let's face it, Tal was more of a shock player that relied on playing
the man rather than the board. In a match of coolheaded Karpov or
Kramnik versus Tal, all in their prime, the less emotional player is
likely to win (unless he loses his cool and loses...haha... think of
Topalov vs Kramnik). Also nobody ever became champion ignoring
tactics. That is the lesson of chess. Think of all the bogus moves
made by beginners, sacrificing knight for pawn, "to break up their
pawn chain", with no positional advantage. If you believe chess is
positional play more than tactics then such bogus moves should work
more often than they do. They do not.
So, understanding how chess works, and how chess playing computers
work, and having seen Crafty evaluate pretty well myself, I have to
side with the original article.
RL
Obviously, anything written with a popular audience in mind cannot
possibly be accurate.
> - it fails to understand the simple argument of 'normalization'. The
> Matej Guid and Ivan Bratko original article pointed out that Crafty
> was used since it was open source and could be modified; the stronger
> programs are not, but in any event Crafty is hardly a weak tactics
> program and the authors are looking for a standardized (normalized)
> way of spotting blunders.
Just because they used the same system for everyone doesn't mean the
system was good or useful. For example, they could declare that every
king move is a blunder. That's consistent across all the players but
would declare players who tend to win in the endgame (where the king
gets moved more) to be weaker than players who tend to win in the
middlegame. You need to apply the same *good* measure to everyone.
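Dave's deliberately bad metric is easy to write down, which shows that consistency alone proves nothing. The move lists are toy sequences in standard algebraic notation (not real games); in SAN every king move except castling starts with "K":

```python
def king_move_blunder_rate(moves):
    """Deliberately silly metric from the post: every king move is a 'blunder'.

    Moves are in SAN; king moves start with 'K' (castling, 'O-O'/'O-O-O', does not).
    """
    if not moves:
        raise ValueError("no moves to score")
    return sum(move.startswith("K") for move in moves) / len(moves)


# Toy move lists, invented for illustration (not actual games).
middlegame_winner = ["e4", "Nf3", "Bb5", "O-O", "Re1", "Qe2", "Nc3", "d4"]
endgame_winner = ["e4", "Nf3", "Kf2", "Ke3", "Kd4", "c4", "Kc5", "b4"]

# The same rule is applied to both players, yet it systematically punishes
# the one whose wins come from king activity in the endgame.
print(king_move_blunder_rate(middlegame_winner))  # 0.0
print(king_move_blunder_rate(endgame_winner))     # 0.5
```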
> -The fact that Riis found positional sacrifices not evaluated by
> Crafty is not convincing since: (1) such positional sacrifices are
> rare--as computers have shown, chess is largely tactics; (2) everybody
> will be judged equally by Crafty, so others' pos sacs are also scored
> 'badly', so nobody will lose relative standing to one another
No. A player who plays more positional sacrifices will be penalized
for playing moves that Crafty doesn't understand.
> and (3), as long as assumption (1) is valid, Crafty will find the
> most "mistake free" chess player, or one that plays closest to being
> "tactics mistake free", which is a very good way to determine a good
> chess player IMO.
But World Champions make very few tactical mistakes.
> Now of course the surrebutter (rebuttal to the rebuttal) will be
> that players like Tal will score poorly--and indeed they (he)
> did--but let's face it, Tal was more of a shock player that relied
> on playing the man rather than the board.
I'm not convinced by that assertion. Tal played games that were sound
enough that they were very hard to defeat over the board. I don't
think that counts as playing the man rather than the board.
> In a match of coolheaded Karpov or Kramnik versus Tal, all in their
> prime, the less emotional player is likely to win
Hmm... The two Botvinnik-Tal matches between them were only won by
Botvinnik +12-11=19. Hardly a convincing victory for the cool head.
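For reference, converting a +W -L =D record to points is simple arithmetic (a win counts 1, a draw 0.5). Applied to the combined record quoted above:

```python
def match_points(wins, losses, draws):
    """Convert a +W -L =D record into (own points, opponent points, games)."""
    games = wins + losses + draws
    own = wins + 0.5 * draws
    return own, games - own, games


own, opponent, games = match_points(wins=12, losses=11, draws=19)
print(f"Botvinnik {own} - Tal {opponent} over {games} games")
# Botvinnik 21.5 - Tal 20.5 over 42 games
```

That is a one-point margin over 42 games.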
> Think of all the bogus moves made by beginners, sacrificing knight
> for pawn, "to break up their pawn chain", with no positional
> advantage. If you believe chess is positional play more than
> tactics then such bogus moves should work more often than they do.
> They do not.
This argument is bogus. Sacrificing a knight against one's opponent's
pawn structure is hardly a prime example of `positional chess'. You
might as well say that all the bogus tactical shots attempted by
beginners to `win material' or `checkmate the king' show that tactics
play a small role in chess.
Dave.
> > Now of course the surrebutter (rebuttal to the rebuttal) will be
> > that players like Tal will score poorly--and indeed they (he)
> > did--but let's face it, Tal was more of a shock player that relied
> > on playing the man rather than the board.
>
> I'm not convinced by that assertion. Tal played games that were sound
> enough that they were very hard to defeat over the board. I don't
> think that counts as playing the man rather than the board.
The whole idea of judging a player by his "error rate" presumes that the
way to win at chess is to commit no errors.
But a quick look at players like Lasker, Tal, and Bronstein shows that
there's another way: make an error in order to induce your opponent to
make a bigger error.
Many of Tal's sacrifices would be considered errors by a chess program
(and that's just counting the ones where you could expect a program to
see it through to the end, in all variations, in however much time you
gave it - and if you're only giving even a top program ten minutes a
move, you're not getting there on a lot of sacrifices) but Tal wasn't
trying to play perfect chess. He was trying to win games.
And judging by his results (a world championship; the longest undefeated
streak in tournament games) he did so incredibly well.
To say, therefore, that he was making errors strikes me as somewhat
absurd.
If the "error" was never intended to be an irrefutable move, and it
leads directly to victory against a top player, how can you call it an
error?
-Ron
That's total rubbish "Ron". You're obviously someone who doesn't know much
about the game of chess. Tal didn't set out to make errors, with the
lamebrain idea that this would somehow cause his opponents to make bigger
errors. Tal set out to create COMPLICATIONS for his opponents. Obviously Tal
desired for all of his sacrifices to be sound and forcing, but no human can
calculate everything to the end, so computer analysis has shown flaws in
many of his games. This is meaningless, because he wasn't playing against
computers.
Your comment is similar to a common theme of beginner (or patzer) level
thinking, i.e.: "I know this move is bad, but if he doesn't see Bxf7+ then it
will be very good for me."
JMR
> That's total rubbish "Ron". You're obviously someone who doesn't know much
> about the game of chess. Tal didn't set out to make errors, with the
> lamebrain idea that this would somehow cause his opponents to make bigger
> errors. Tal set out to create COMPLICATIONS for his opponents. Obviously Tal
> desired for all of his sacrifices to be sound and forcing, but no human can
> calculate everything to the end, so computer analysis has shown flaws in
> many of his games. This is meaningless, because he wasn't playing against
> computers.
Have you read Tal's books?
I have. There are many times when he says things like, "It's clear 36. f4
was stronger," (Tal-Gligoric, Zagreb 1959), or see his note to 5. ... Qc7
in Tal-Olafsson, Bled 1961 (a move he describes as "bad", but that he
clearly made intentionally) or, say, 10. a3 in Tal-Bagirov,
Dnepropetrovsk 1970, which he describes as "in no way stronger than the
approved Re1."
(I found these notes by basically opening "The Life and Games of Mikhail
Tal" at random. Stuff like this is all over that book. You should try
reading it sometime, before you talk about what Tal was, or wasn't,
thinking. His book on his match with Botvinnik goes into even more depth
on his thinking, again, and does a good job explaining the emphasis Tal
put on psychology over soundness. And what is psychology, in chess,
other than playing an inferior move which you think your opponent will
respond badly to? In particular, I'd point you to his discussion of his
12th move of game 17.)
It's clear from his notes that he didn't care whether his sacrifices were
"correct" or not. He made a move - which he knew could well be unsound -
with the expectation that in the resulting position his opponents would
play incorrectly.
That's pretty much the definition of "making an error to induce your
opponent into making a bigger one."
-"Ron"
> > I find the rebuttal by Dr. Søren Riis, Oxford, UK unconvincing for
> > a number of reasons.
>
> > - it was clearly written with a popular audience in mind (witness the
> > exclamation point!
>
> Obviously, anything written with a popular audience in mind cannot
> possibly be accurate.
No, but a popular piece is not as rigorous as a journal paper, which the
original paper was. Otherwise it's like saying that whoever wins this
Usenet thread is right, more so than two chess researchers debating.
>
> > - it fails to understand the simple argument of 'normalization'. The
> > Matej Guid and Ivan Bratko original article pointed out that Crafty
> > was used since it was open source and could be modified; the stronger
> > programs are not, but in any event Crafty is hardly a weak tactics
> > program and the authors are looking for a standardized (normalized)
> > way of spotting blunders.
>
> Just because they used the same system for everyone doesn't mean the
> system was good or useful. For example, they could declare that every
> king move is a blunder. That's consistent across all the players but
> would declare players who tend to win in the endgame (where the king
> gets moved more) to be weaker than players who tend to win in the
> middlegame. You need to apply the same *good* measure to everyone.
That is the ideal, but my point stands--equally bad is not so bad.
And BTW using your example, a player who wins in the middlegame is
indeed probably stronger than one who wins in the endgame (it's
tougher to win a short game--think of winning a chess brilliancy
against equally matched opposition--than to grind out a win in the
endgame. In fact, a standard technique I use to draw against my much
more powerful chess-playing computer is to reduce to the endgame and
go for the draw).
>
> > -The fact that Riis found positional sacrifices not evaluated by
> > Crafty is not convincing since: (1) such positional sacrifices are
> > rare--as computers have shown, chess is largely tactics; (2) everybody
> > will be judged equally by Crafty, so others' pos sacs are also scored
> > 'badly', so nobody will lose relative standing to one another
>
> No. A player who plays more positional sacrifices will be penalized
> for playing moves that Crafty doesn't understand.
No. See my point above. And chess is 99% tactics (Teichmann's famous quote).
>
> > and (3), as long as assumption (1) is valid, Crafty will find the
> > most "mistake free" chess player, or one that plays closest to being
> > "tactics mistake free", which is a very good way to determine a good
> > chess player IMO.
>
> But World Champions make very few tactical mistakes.
Not true. Nearly all games are full of tactical mistakes, except
perhaps at the correspondence chess level. I was reading a book by
John Nunn ("Understanding Chess Move by Move") that makes this point in
the preface--Nunn had a hard time finding 20 OTB games that were
'mistake free' for his book, after searching thousands of games.
>
> > Now of course the surrebutter (rebuttal to the rebuttal) will be
> > that players like Tal will score poorly--and indeed they (he)
> > did--but let's face it, Tal was more of a shock player that relied
> > on playing the man rather than the board.
>
> I'm not convinced by that assertion. Tal played games that were sound
> enough that they were very hard to defeat over the board. I don't
> think that counts as playing the man rather than the board.
But on balance Tal was a shock player. Deny that and you become a
chess revisionist.
>
> > In a match of coolheaded Karpov or Kramnik versus Tal, all in their
> > prime, the less emotional player is likely to win
>
> Hmm... The two Botvinnik-Tal matches between them were only won by
> Botvinnik +12-11=19. Hardly a convincing victory for the cool head.
Consider Karpov's lifetime record against Tal, which is strongly positive. Of
course it was a young Karpov against an older, sick Tal, but the point
stands.
>
> > Think of all the bogus moves made by beginners, sacrificing knight
> > for pawn, "to break up their pawn chain", with no positional
> > advantage. If you believe chess is positional play more than
> > tactics then such bogus moves should work more often than they do.
> > They do not.
>
> This argument is bogus. Sacrificing a knight against one's opponent's
> pawn structure is hardly a prime example of `positional chess'. You
> might as well say that all the bogus tactical shots attempted by
> beginners to `win material' or `checkmate the king' show that tactics
> play a small role in chess.
Positional chess SACRIFICE was my point: a positional sacrifice
is rare in chess (which goes to chess being 99% tactics). A
positional sacrifice is one where you do indeed exchange a knight
for two pawns, so you're down a pawn, with no immediate hope of
recapturing your lost material. But the positional gain will help you
20 moves from now. This is common in Go but not in chess.
Ray
> You know, if you took the games of a typical (1300) rated
> player and checked them with a dumbed-down-Crafty (1500),
> you might get some useful information, but not nearly as
> much as hoped for. But when you take the games of the
> world champions and check them with a program which is
> short of 2800, you get mainly garbage, combined with many
> instances where a tactical oversight is correctly pinpointed.
But chess is 99% tactics, help bot.
>
> You also penalize those players who *deliberately* chose
> to play what they knew to be sub-optimal moves, for
> whatever reason. I just did this myself at RedHotPawn,
> choosing to grab a Knight rather than leap in with another
> piece to set up a 95%-certain mating net. Why? Because
> while the mating net was around 95% certain, the capture
> of the free piece was 100% certain (unless I have lost my
> mind)! When I spot another mating net, things should be
> simple enough for me to get the 100% certainty I desire,
> and having captured yet another piece, this is all but
> inevitable, barring my opponent's resignation.
But you risk letting your opponent escape. Remember the
maxim: "always check, since the next move may be mate". Just recently
I did not follow this maxim and, instead of winning a pawn against my PC,
I drifted and eventually lost.
>
> Another item which these statistical analyses overlook
> is the deliberate gift of, say, a half-point. These have
> been known to occur in world championship level play,
> and of course the "nice guys" will be penalized for not
> being "tough players", despite clinching the match
> with their action.
Keep in mind this was not a statistical analysis of the kind Sonas is
famous for, but a different kind. Also over time the "nice guys"
penalty will statistically average out.
>
> In short, what can be learned is who was least prone
> to tactical blunders, and apparently, whose style leans
> most toward a sizable gap between what the program
> sees as the #1 optimal move, and #2 -- something I
> think may be termed the sharpness of play. For one
> example, I am playing a game at RedHot now where
> I had to decide whether to develop my QB "normally"
> via ...d6 and then B-moves somewhere, or fianchetto
> via ...b6 and B-b7. It was a toss-up, since it makes
> no difference whatever to the outcome. I expect a
> computer would see both moves as being nearly
> equal, weighing them in such a way as to slightly
> favor the move which gives the Bishop immediate
> control of squares, though this immediacy is quite
> irrelevant to the true value of the moves.
Again, over time this will "wash out" or "average out". In general
sharp play is better than just pushing yourself into a passive
position, don't you think? That's what Crafty is looking for--sharp
play. Sharp play = sharp mind, bot!
>
> I wonder just how much time, and to what depth
> the moves were analyzed before scoring them. I
> recall that often a player's move may be scored poorly,
> but if executed and stepped forward, a program may
> change its mind completely about this, suddenly
> realizing it had overlooked something.
>
No, you're talking about the "move on opponent's time" feature. The way
the study was done was to analyze each move for a fixed time, so no
"changing of mind", and even if so, each player had the same scoring
applied, so it doesn't really matter (over time). Besides, have you
noticed that _MOST_ of the time (not always) the best move found by
Fritz or Crafty in the first five seconds is also the best move found
after 60 seconds? Because chess is 99% tactics, and often the tactics
are no more than 4 moves deep (most of the time).
RL (a 1950 Elo player, so I can speak with some authority).
> - it fails to understand the simple argument of 'normalization'. The
> Matej Guid and Ivan Bratko original article pointed out that Crafty
> was used since it was open source and could be modified; the stronger
> programs are not, but in any event Crafty is hardly a weak tactics
> program and the authors are looking for a standardized (normalized)
> way of spotting blunders.
But the test crippled Crafty by cutting off the search
at only 12 plies. In a game between two patzers, this
might be a minor flaw, but at the world championship
level, things are not always so simple.
Although this cutting off at a specific ply makes it
possible to duplicate the test on any computer, it may
have been more useful to use a fixed time instead
(provided the time is equal to or greater than the average
time to complete 12 plies).
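A minimal sketch of why a fixed ply limit matters: plain depth-limited minimax over a hand-built toy tree (evaluations in pawns from the first player's point of view, all invented). A sacrifice that loses material at shallow depth only shows its value once the search reaches the mating line. Real engines are far more sophisticated, so this illustrates the horizon effect in general, not Crafty's actual search:

```python
def node(evaluation, children=None):
    """A toy game-tree node: a static evaluation plus optional child moves."""
    return {"eval": evaluation, "children": children or {}}


def minimax(tree, depth, maximizing):
    """Plain depth-limited minimax; falls back to the static eval at the horizon."""
    children = tree["children"]
    if depth == 0 or not children:
        return tree["eval"]
    scores = [minimax(child, depth - 1, not maximizing) for child in children.values()]
    return max(scores) if maximizing else min(scores)


def best_move(root, depth):
    """Best root move for the maximizing side when searching `depth` plies."""
    return max(
        root["children"],
        key=lambda move: minimax(root["children"][move], depth - 1, False),
    )


# Invented toy position: a quiet move keeps a small edge; a sacrifice
# loses material at first but leads to a forced mate four plies deep.
root = node(0.0, {
    "quiet": node(0.2, {"reply": node(0.2, {"cont": node(0.2, {"reply2": node(0.2)})})}),
    "sac": node(-3.0, {"forced": node(-3.0, {"attack": node(-2.0, {"mate": node(100.0)})})}),
})

for depth in range(1, 5):
    print(depth, best_move(root, depth))
# 1 quiet
# 2 quiet
# 3 quiet
# 4 sac
```

Scaled up, the same thing happens to a 12-ply search whenever the point of a sacrifice lies beyond ply 12; a fixed time limit moves the horizon but does not remove it.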
> -The fact that Riis found positional sacrifices not evaluated by
> Crafty is not convincing since: (1) such positional sacrifices are
> rare--as computers have shown,
No, they aren't. Only in games between patzers is
the intentional sacrifice of material for position "rare".
> chess is largely tactics;
True. But not all tactics are visible at a depth of
only 12 plies. Tactics can flow from positional
advantage, with virtually no limit as to depth.
> (2) everybody
> will be judged equally by Crafty,
Misjudged would be more accurate.
> so others' pos sacs are also scored
> 'badly', so nobody will lose relative standing to one another,
Except at random, due to all the errors.
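Whether errors "average out" can be made concrete. A sketch, assuming each per-move error measurement equals a player's true error plus random noise plus, for one player, a hypothetical systematic bias (say, an evaluator that consistently underrates positional sacrifices). All numbers are invented; the seeds only make the run repeatable:

```python
import random


def measured_mean_error(true_error, bias, n_moves, noise=0.5, seed=0):
    """Average of n_moves simulated per-move error measurements.

    Each measurement = true error + systematic bias + uniform noise in
    [-noise, noise]. The noise term shrinks as n_moves grows; the bias does not.
    """
    rng = random.Random(seed)
    total = sum(true_error + bias + rng.uniform(-noise, noise) for _ in range(n_moves))
    return total / n_moves


# Two hypothetical players with identical true accuracy (0.10 pawns per move).
tactician = measured_mean_error(true_error=0.10, bias=0.0, n_moves=100_000, seed=1)
sacrificer = measured_mean_error(true_error=0.10, bias=0.05, n_moves=100_000, seed=2)

print(f"tactician  {tactician:.3f}")   # close to 0.100
print(f"sacrificer {sacrificer:.3f}")  # close to 0.150: the bias never washes out
```

Random misjudgments do shrink with enough moves, but a bias that hits one style harder than another survives any sample size, so averaging protects relative standings only if every player is affected equally.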
> and (3), as long as assumption (1) is valid, Crafty will find the most
> "mistake free" chess player, or one that plays closest to being
> "tactics mistake free", which is a very good way to determine a good
> chess player IMO.
But not good enough for these guys.
As all the world champions were good at tactics, it
requires a bit of subtlety to differentiate between them.
> Now of course the surrebutter (rebuttal to the rebuttal) will be that
> players like Tal will score poorly--and indeed they (he) did--but
> let's face it, Tal was more of a shock player that relied on playing
> the man rather than the board. In a match of coolheaded Karpov or
> Kramnik versus Tal, all in their prime, the less emotional player is
> likely to win
A silly statement. As we saw, the wild, attacking
style of GM Kasparov gave GM Karpov a very hard time, except
in their very first match. How was GM Tal, in his
prime, all that different from GM Kasparov?
Another example was the cool, calm, collected
Bobby Fischer, who was overwhelmed by GM Tal in
his prime, and who calmly observed after the fact
that GM Tal's hyper-aggressive play was "unsound".
> (unless he loses his cool and loses...haha... think of
> Topalov vs Kramnik). Also nobody ever became champion ignoring
> tactics.
> That is the lesson of chess. Think of all the bogus moves
> made by beginners, sacrificing knight for pawn, "to break up their
> pawn chain", with no positional advantage. If you believe chess is
> positional play more than tactics then such bogus moves should work
> more often than they do. They do not.
It's not this simple. The world champions are all
competent at tactics, so the differences between
them are more subtle than just "who was the best
tactician".
> So, understanding how chess works, and how chess playing computers
> work, and having seen Crafty evaluate pretty well myself, I have to
> side with the original article.
If you mean the one I think, it was horribly
skewered by a whole slew of critics under
"Reader's Feedback", in addition to all the
points made by the various critics who had
their articles published.
The primary issue is not that computers are
incapable of ranking the world champions by
accuracy, it is that attempting to do this with
a crippled Crafty and just the games from the
world championships is a poor method.
I would have preferred a deeper analysis by
a stronger program of all their important games,
in conjunction with a side-by-side subjective
analysis of the same games by a human GM
who, instead of tweaking the program to suit his
whims/preconceptions, simply comments on
where he thinks the program went astray.
The ideal might be the HAL9000 computer
"discussing" the games and results in plain
English, and giving "his" considered opinion
on the strengths and weaknesses of each of
the world champions, as seen by a program
rated (in the future) 9000 USCF. :>D
-- help bot
> > In a match of coolheaded Karpov or Kramnik versus Tal, all in their
> > prime, the less emotional player is likely to win
>
> Hmm... The two Botvinnik-Tal matches between them were only won by
> Botvinnik +12-11=19. Hardly a convincing victory for the cool head.
In the first match, won by GM Tal, he often stood
worse out of the opening but maintained a cool head,
realizing the only chance was to complicate, to apply
pressure to the opponent's King, and to randomize the
position a bit. It was GM Botvinnik who choked,
rather than maintaining his coolness.
In the second match, the annoying attacks were
fended off in part by a switch to different openings
which were less conducive to GM Tal's wild,
attacking style. It should not be assumed that
GM Tal was a hothead, while his victims were all
coolheaded.
-- help bot
Mr. Mitchell has made a serious error here in
equating game commentary after the fact with
what a player may have been thinking at the
time.
All of these games have been annotated --
often after looking at annotations by others -- by
such players as GM Tal. This in no way means
that if, say, GM Kortchnoi said move x was
better and then GM Tal wrote in his book that
move x was better (since he agreed), that at
the time the game was played GM Tal *saw*
that move x was better, but deliberately chose
to play a stupid move instead!
On the contrary, in match one against GM
Botvinnik, GM Tal often points out that he had
to choose a different line because his old
choice had failed the last time out. In effect,
he readily admits when his openings were poor,
despite having won the game anyway.
Besides, those books on GM Tal, by GM Tal,
were all written by someone we now know to be
a horrible patzer, thanks to crippled-Crafty! :>D
-- help bot
>So, understanding how chess works, and how chess playing computers
>work, and having seen Crafty evaluate pretty well myself, I have to
>side with the original article.
I would not go so far as to say that I side with the original argument,
only that Riis' objections were groundless. In fact, the original authors
have done some groundbreaking work on developing a
methodology to rate chess players. It is, at the very least,
very interesting, and a refreshing change from the pseudo-science
historical Elo/Chessmetrics stuff. The problem with the work is
that it applies a new method to a very hard problem (ranking
world champions) when they haven't even shown the method's
worth when applied to easy problems (ranking everybody else).
I have previously expressed belief in the theory that "move rating" will
eventually surpass "result rating" as the gold standard measurement of
chess skill. This is a small first step, but there is much work left to
do.
> The whole idea of judging a player by his "error rate" presumes that the
> way to win at chess is to commit no errors.
>
> But a quick look at players like Lasker, Tal, and Bronstein shows that
> there's another way: make an error in order to induce your opponent to
> make a bigger error.
Right. This is precisely the, um, strategy I use when
I make all my errors. I am deliberately failing to see
the correct move and instead playing a turkey, with
the intention of inducing a similar blunder by my
opponent. Of course, I could always find the best
move if I really wanted to; I just *want* to play poorly.
;>D
> Many of Tal's sacrifices would be considered errors by a chess program
> (and that's just counting the ones where you could expect a program to
> see it through to the end, in all variations, in however much time you
> gave it - and if you're only giving even a top program ten minutes a
> move, you're not getting there on a lot of sacrifices) but Tal wasn't
> trying to play perfect chess. He was trying to win games.
This is why it is rather unfair to try and judge
competitive players by how closely their moves
match up to a chess program; the program is
under no pressure to protect its title, for instance.
Nor is it ever faced with stupid questions from
reporters like this: Q: "In game one, why did you allow
42.Q-g7 mate?" A: "As world champion, I never
overlook such things. Clearly then, I must have
been offered, and accepted, a huge bribe, of say,
ten billion dollars. Pardon me, but I *must* get to
the bank before it closes. The interest I'm losing
as we speak is KILLING me!"
> And judging by his results (a world championship; the longest undefeated
> streak in tournament games) he did so incredibly well.
Because of all the hype surrounding GM Fischer
and all the controversies brought on by Cold War
politics, we seldom remember that even as BF was
taking the title from the "evil axis" in 1972, at the
same time GM Tal was undergoing a period of near
invincibility -- the streak you mentioned above.
Countless fans of BF will recount a 6-0 match
victory or two, while never once realizing the
simultaneous exploits of GM Tal, who, by the way,
"took" the year 1972 according to Chessmetrics,
over GM Fischer!
> To say, therefore, that he was making errors strikes me as somewhat
> absurd.
If we go by what GM Botvinnik said, only Tigran
Petrosian never made any (combinational) errors.
(In the position after 1.e4 Nc6 2.Qh5 Nb8 3.Qxf7+,
one would be wise to decline the sac according to
GM Botvinnik's advice, if GM Petrosian has White.)
> If the "error" was never intended to be an irrefutable move, and it
> leads directly to victory against a top player, how can you call it an
> error?
All this shows is how closely a given player's
world championship games matched up with
move selections by a crippled Crafty. I don't
know about you, but if I were world champion,
I would hope to be a bit stronger than crippled
Crafty, and want my moves to match up well in
simple tactical exchanges, but not otherwise.
I really think the scope of such a statistical
analysis ought to have been limited to finding
out which world champion was the least
afflicted by a tendency to blunder, and which
was most afflicted.
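To make that narrower proposal concrete, here is a rough Python sketch that counts blunders instead of averaging agreement. It assumes someone has already produced, for every move, the engine's evaluation drop in centipawns; the 200-centipawn blunder threshold and the two sample players are invented for illustration, not taken from the study.

```python
from typing import Sequence

BLUNDER_THRESHOLD_CP = 200  # assumption: a drop of 2 pawns or more counts as a blunder


def blunder_rate(losses_cp: Sequence[int], threshold: int = BLUNDER_THRESHOLD_CP) -> float:
    """Fraction of moves whose evaluation loss meets or exceeds the threshold."""
    if not losses_cp:
        return 0.0
    return sum(1 for loss in losses_cp if loss >= threshold) / len(losses_cp)


def average_loss(losses_cp: Sequence[int]) -> float:
    """Mean evaluation loss per move, the kind of figure a match-up study reports."""
    return sum(losses_cp) / len(losses_cp) if losses_cp else 0.0


# Two made-up players with the same average loss but very different profiles.
steady = [25] * 8            # many small inaccuracies, no blunders
volatile = [0] * 7 + [200]   # seven perfect moves and one blunder

print(average_loss(steady), average_loss(volatile))
print(blunder_rate(steady), blunder_rate(volatile))
```

The average hides the difference entirely; the blunder count is what separates the two.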
-- help bot
> Mr. Mitchell has made a serious error here in
> equating game commentary after the fact with
> what a player may have been thinking at the
> time.
>
> All of these games have been annotated --
> often after looking at notations by others -- by
> such players as GM Tal. This in no way means
> that if, say, GM Kortchnoi said move x was
> better and then GM Tal wrote in his book that
> move x was better (since he agreed), that at
> the time the game was played GM Tal *saw*
> that move x was better, but deliberately chose
> to play a stupid move instead!
Is "Mr. Mitchell" supposed to be me?
In all of the cases I cited, it's clear that Tal is talking about what
he saw during the game, not about his after-the-fact analysis.
Tal does quite a bit of this in his books. He'll give notes from
after-the-fact analysis, but he tends to focus much more than most
players on what he saw, when, and what his motivations were for playing.
So, nice try, but you're actually completely wrong here.
-Ron
> >So, understanding how chess works, and how chess playing computers
> >work, and having seen Crafty evaluate pretty good myself, I have to
> >side with the original article.
>
> I would not go so far as to say that I side with the original argument,
> only that Riis' objections were groundless. In fact, the original authors
> have done some groundbreaking work on developing a
> methodology to rate chess players. It is, at the very least,
> very interesting, and a refreshing change from the pseudo-science
> historical ELO/chessmetrics stuff. The problem with the work is
> that it applies a new method to a very hard problem (ranking
> world champions) when they haven't even shown the method's
> worth when applied to easy problems (ranking everybody else).
It seems to me that the above comments themselves do
a decent job of showing how the "groundbreaking work"
is little different from ChessMetrics' pseudo-science.
------
In one of the defenses to a criticism, it was argued that
even a weak chess program could be utilized effectively
to rank players, due to a strong correlation of some sort.
But in constructing their example to demonstrate how
this works, the authors (as always) made some invalid
assumptions; in this particular case, that apart from the
single strongest move in a given position, the remaining
choices are distributed or chosen evenly. Obviously,
the remaining move choices are anything but equal, and
how a player chooses among them is a big part of how
strongly they play. The stronger the player, the more
likely he would be to go for #2 as opposed to #10
(granting the oddball assumption of exactly ten choices
per position). All these invalid assumptions make the
authors come off as clueless math majors having fun
"playing around with" numbers which just happen to relate to chess.
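A small Python illustration of the objection: two hypothetical players who match the engine's first choice equally often can have very different average losses, depending on how they choose among the remaining moves. The ten-move position and its centipawn losses are made up; nothing here reproduces the paper's actual computation.

```python
from typing import Sequence

# Invented centipawn losses for ten candidate moves, ranked by the engine
# (rank 1 is the engine's own choice, so it loses nothing).
LOSSES_CP = (0, 10, 20, 40, 60, 90, 130, 180, 250, 400)


def top_match_rate(choice_probs: Sequence[float]) -> float:
    """How often the player's move coincides with the engine's first choice."""
    return choice_probs[0]


def expected_loss(choice_probs: Sequence[float], losses: Sequence[int] = LOSSES_CP) -> float:
    """Expected centipawn loss per move, given the chance of picking each ranked move."""
    if abs(sum(choice_probs) - 1.0) > 1e-9:
        raise ValueError("choice probabilities must sum to 1")
    return sum(p * loss for p, loss in zip(choice_probs, losses))


# Both hypothetical players find the engine's move half the time.
uniform_rest = [0.5] + [0.5 / 9] * 9                   # the "evenly distributed" assumption
skewed_rest = [0.5, 0.3, 0.1, 0.05, 0.05] + [0.0] * 5  # mostly picks the 2nd or 3rd best

for name, probs in (("uniform", uniform_rest), ("skewed", skewed_rest)):
    print(name, top_match_rate(probs), round(expected_loss(probs), 1))
```

Same match rate, yet the second player gives away far less per move.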
Thus far, the only works I have seen which are not
seriously flawed in terms of logic and reason are a
few of the brief criticisms of the published works by
the math whiz-kids.
-- help bot
Well, it worked for Petrosian. ;-)
> [...] Tal wasn't trying to play perfect chess. He was trying to win
> games.
This is a crucial point, yes.
Dave.
--
David Richerby Broken Atlas (TM): it's like a map of
www.chiark.greenend.org.uk/~davidr/ the world but it doesn't work!
Popular does not mean `not as accurate'.
> Otherwise it's like saying whoever wins this Usenet thread is right
> moreso than two chess researchers debating.
It's not like saying that at all. And what's this about `winning'
threads? I'm posting to share information, help people understand
things and hopefully be entertaining from time to time. What are you
posting for?
>>> [...] the authors are looking for a standardized (normalized) way
>>> of spotting blunders.
>>
>> Just because they used the same system for everyone doesn't mean
>> the system was good or useful. [...] You need to apply the same
>> *good* measure to everyone.
>
> That is the ideal, but my point stands--equally bad is not so bad.
Strongly disagree. Consider an even more ludicrous measure: the
better player is the one with most a's in his name. (That said, this
does explain why Lasker beat Steinitz, why Capablanca beat Lasker, why
Alekhine beat Euwe, why Tal and Petrosian beat Botvinnik, why Karpov
kept beating Korchnoi and why Kasparov kept beating Karpov. Perhaps
there's something in this, after all?)
> And BTW using your example, a player who wins in the middlegame is
> indeed probably stronger than one who wins in the endgame
Disagree strongly.
> (it's tougher to win a short game--think of winning a chess
> brilliancy against equally matched opposition--than to grind out a
> win in the endgame.
The scarcity of short games at high level is because short games are
the result of catastrophic mistakes and high-level players tend not to
make those.
>>> -The fact that Riis found positional sacrifices not evaluated by
>>> Crafty is not convincing since: (1) such positional sacrifices are
>>> rare--as computers have shown, chess is largely tactics; (2)
>>> everybody will be judged equally by Crafty, so others pos sacs are
>>> also scored 'badly', so nobody will lose relative standing to one
>>> another
>>
>> No. A player who plays more positional sacrifices will be
>> penalized for playing moves that crafty doesn't understand.
>
> No. See my point above.
No, really. A player who makes more moves that Crafty doesn't
understand (e.g., positional sacrifices) will have greater deviation
from Crafty's play than a player who makes only moves that Crafty does
understand. Hence, he will score lower. Not because he plays bad
moves but because he plays moves that are better than the ones Crafty
found.
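A rough Python sketch of why this happens, assuming the study charges each move with the drop in the reference engine's own evaluation. The "true" evaluations are hypothetical stand-ins for what a far stronger judge would say; in practice nobody has them, which is exactly the problem.

```python
from dataclasses import dataclass


@dataclass
class Move:
    name: str
    engine_eval_cp: int  # reference engine's evaluation after the move (invented)
    true_eval_cp: int    # hypothetical "ground truth" evaluation (invented)


def measured_loss(played: Move, engine_best: Move) -> int:
    """Loss as a match-up study would record it: relative to the engine's own pick."""
    return max(0, engine_best.engine_eval_cp - played.engine_eval_cp)


def true_loss(played: Move, truly_best: Move) -> int:
    """Loss relative to the genuinely best move (unknowable in practice)."""
    return max(0, truly_best.true_eval_cp - played.true_eval_cp)


quiet = Move("quiet developing move", engine_eval_cp=30, true_eval_cp=30)
exchange_sac = Move("positional exchange sacrifice", engine_eval_cp=-80, true_eval_cp=60)

print(measured_loss(exchange_sac, engine_best=quiet))    # charged as a loss
print(true_loss(exchange_sac, truly_best=exchange_sac))  # but it was the best move
```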
> And chess is 99% tactics (famous quote).
If you're going to argue by quotation, I'll throw in ``82% of
statistics are made up on the spot,'' ``History is more or less bunk''
and ``The devil may quote scripture for his own purposes.'' Oh,
and ``I never drink water because of the disgusting things fish do in
it.''
>>> and (3), as long as assumption (1) is valid, Crafty will find the
>>> most "mistake free" chess player, or one that plays closest to
>>> being "tactics mistake free", which is a very good way to
>>> determine a good chess player IMO.
>>
>> But World Champions make very few tactical mistakes.
>
> Not true. Nearly all games are full of tactical mistakes, except
> perhaps at the correspondence chess level. I was reading a book by
> John Nunn ("Chess explained move by move") that makes this point in
> the preface--Nunn had a hard time finding 20 OTB games that were
> 'mistake free' for his book, after searching 1000s of games.
I assume you mean ``Understanding Chess Move by Move''? Does he
say that he had a hard time finding games that were mistake free or
free of *tactical* mistakes?
>> Tal played games that were sound enough that they were very hard to
>> defeat over the board. I don't think that counts as playing the
>> man rather than the board.
>
> But on balance Tal was a shock player. Deny that and you become a
> chess revisionist.
But, on balance, Tal was a very successful player. As Ron said, the
point is to win, not to play perfect chess.
>>> In a match of coolheaded Karpov or Kramnik versus Tal, all in their
>>> prime, the less emotional player is likely to win
>>
>> Hmm... The two Botvinnik-Tal matches between them were only won by
>> Botvinnik +12-11=19. Hardly a convincing victory for the cool head.
>
> Pace Karpov's lifetime record against Tal, which is way positive. Of
> course it was a young Karpov against an older, sick Tal, but the point
> stands.
Well, I'm not sure the point does stand. That rider about Tal being
old and sick strikes me as being just a leetle bit significant.
>>> Think of all the bogus moves made by beginners, sacrificing knight
>>> for pawn, "to break up their pawn chain", with no positional
>>> advantage. If you believe chess is positional play more than
>>> tactics then such bogus moves should work more often than they do.
>>> They do not.
>>
>> This argument is bogus. Sacrificing a knight against one's
>> opponent's pawn structure is hardly a prime example of `positional
>> chess'. [...]
>
> Positional chess SACRIFICE was my point. A positional chess
> sacrifice is rare in chess is my point (goes to chess being 99%
> tactics).
If you meant `positional sacrifices' you should have said that. What
you said was `If you believe chess is positional play more than
tactics.' Anyway, I certainly agree with you that tactics are much
more common than positional sacrifices. But there's an awful lot more
to positional play than positional sacrifices. Indeed, one might say
that ``Positional play is 99% positional-sacrifice--free'' but that's
much less snappy than ``Chess is 99% tactics.''
Dave.
--
David Richerby Generic Laptop Tool (TM): it's like
www.chiark.greenend.org.uk/~davidr/ a hammer that you can put on your lap
but it's just like all the others!
Er, I took that record from chessgames.com -- I don't know if their
database has all the Tal-Fischer games.
Dave.
--
David Richerby Addictive Dictator (TM): it's like a
www.chiark.greenend.org.uk/~davidr/ totalitarian leader but you can never
put it down!
Tal sacrificed for the attack; Kasparov sacrificed for the initiative,
according to somebody I read. Tal and Kasparov were dramatically
different in style. Quite apart from anything else, Kasparov's play
is much more sound than Tal's. Of course, Tal's play was sound enough
to win in many cases; who needs to be more sound than that?
> Another example was the cool, calm, collected Bobby Fischer, who was
> overwhelmed by GM Tal in his prime
You mean the +4-2=5 career record (excluding the two blitzgames at
Herceg Novi, which were both won by Fischer and which were ten years
after the rest of the games) in Tal's favour? That's not particularly
overwhelming.
> and who calmly observed after the fact that GM Tal's
> hyper-aggressive play was "unsound".
Tal never claimed to be sound. He just claimed to be sound enough to
be very difficult to beat over the board. Who cares if some sacrifice
takes two hours of computer time to defeat? The opponent doesn't have
two hours of computer time.
> The world champions are all competent at tactics, so the differences
> between them are more subtle than just "who was the best tactician".
Exactly. Ditto to the rest of your comments from this point, which
I've snipped.
Dave.
--
David Richerby Chocolate Widget (TM): it's like a
www.chiark.greenend.org.uk/~davidr/ thingy that's made of chocolate!
> Tal never claimed to be sound.
In fact, he went even further than that.
His quote was something to the effect of:
"There are two kinds of sacrifices: sound ones, and mine."
-Ron
> > No, but popular means not as accurate as a journal paper, which the
> > original paper was.
>
> Popular does not mean `not as accurate'.
Generally it does, statistically speaking.
>
> > Otherwise it's like saying whoever wins this Usenet thread is right
> > moreso than two chess researchers debating.
>
> It's not like saying that at all. And what's this about `winning'
> threads? I'm posting to share information, help people understand
> things and hopefully be entertaining from time to time. What are you
> posting for?
To win this thread. I win.
>
> >>> [...] the authors are looking for a standardized (normalized) way
> >>> of spotting blunders.
>
> >> Just because they used the same system for everyone doesn't mean
> >> the system was good or useful. [...] You need to apply the same
> >> *good* measure to everyone.
>
> > That is the ideal, but my point stands--equally bad is not so bad.
>
> Strongly disagree. Consider an even more ludicrous measure: the
> better player is the one with most a's in his name. (That said, this
> does explain why Lasker beat Steinitz, why Capablanca beat Lasker, why
> Alekhine beat Euwe, why Tal and Petrosian beat Botvinnik, why Karpov
> kept beating Korchnoi and why Kasparov kept beating Karpov. Perhaps
> there's something in this, after all?)
Irrelevant. We are talking about using a chess program not as strong
as the players it rates, to rate the players based on the least number
of tactical mistakes (and the least number of positional mistakes,
since chess programs do make positional evaluations, often
surprisingly good). We are not talking Mensa word games.
>
> > And BTW using your example, a player who wins in the middlegame is
> > indeed probably stronger than one who wins in the endgame
>
> Disagree strongly.
>
> > (it's tougher to win a short game--think of winning a chess
> > brilliancy against equally matched opposition--than to grind out a
> > win in the endgame.
>
> The scarcity of short games at high level is because short games are
> the result of catastrophic mistakes and high-level players tend not to
> make those.
Whatever dude. My point stands: winning in the middlegame is tougher
for equally rated players than winning in the endgame. Probably worth
10 centipawns.
>
> >>> -The fact that Riis found positional sacrifices not evaluated by
> >>> Crafty is not convincing since: (1) such positional sacrifices are
> >>> rare--as computers have shown, chess is largely tactics; (2)
> >>> everybody will be judged equally by Crafty, so others pos sacs are
> >>> also scored 'badly', so nobody will lose relative standing to one
> >>> another
>
> >> No. A player who plays more positional sacrifices will be
> >> penalized for playing moves that crafty doesn't understand.
>
> > No. See my point above.
>
> No, really. A player who makes more moves that Crafty doesn't
> understand (e.g., positional sacrifices) will have greater deviation
> from Crafty's play than a player who makes only moves that Crafty does
> understand. Hence, he will score lower. Not because he plays bad
> moves but because he plays moves that are better than the ones Crafty
> found.
But this is rare. Remember, chess is 99.7% tactics. Has a decade of
computer chess and Deeper Blue not taught you anything? And you, a
programmer no less?
>
> > And chess is 99% tactics (famous quote).
>
> If you're going to argue by quotation, I'll throw in ``82% of
> statistics are made up on the spot,'' ``History is more or less bunk''
> and ``The devil may quote scripture for his own purposes.'' Oh,
> and``I never drink water because of the disgusting things fish do in
> it.''
>
> >>> and (3), as long as assumption (1) is valid, Crafty will find the
> >>> most "mistake free" chess player, or one that plays closest to
> >>> being "tactics mistake free", which is a very good way to
> >>> determine a good chess player IMO.
>
> >> But World Champions make very few tactical mistakes.
>
> > Not true. Nearly all games are full of tactical mistakes, except
> > perhaps at the correspondence chess level. I was reading a book by
> > John Nunn ("Chess explained move by move") that makes this point in
> > the preface--Nunn had a hard time finding 20 OTB games that were
> > 'mistake free' for his book, after searching 1000s of games.
>
> I assume you mean ``Understanding Chess Move by Move''? Does he
> say that he had a hard time finding games that were mistake free or
> free of *tactical* mistakes?
I think he implies both, but since chess is 99.69% tactics, it implies
the latter.
>
> >> Tal played games that were sound enough that they were very hard to
> >> defeat over the board. I don't think that counts as playing the
> >> man rather than the board.
>
> > But on balance Tal was a shock player. Deny that and you become a
> > chess revisionist.
>
> But, on balance, Tal was a very successful player. As Ron said, the
> point is to win, not to play perfect chess.
Tal won, and is a winner. But he was not the strongest player--that
is the player who made the fewest mistakes. This is routine, btw, in
computer chess: once a GM, I think it was Walter Browne, was rated to
see how 'closely' he played an ending where the best play is already
known (I think it's the B+N+P database called "Nablom*" something,
pretty famous), just to see how 'close' he came to the (already
known) perfect ending database. The theory was that the stronger the
player, the closer he plays to the 'theoretically correct' perfect
play of the database. Why is this so hard to understand? You
understand pointers, don't you, yet can't grasp this? Or perhaps you
still program in Visual Basic and Perl?
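For the endgame-database comparison described above, a minimal Python sketch might look like this. It assumes a database that reports a distance to mate for every legal move (or None if the move throws away the win); the positions, moves and numbers are invented, and this is not a reconstruction of the experiment mentioned above.

```python
from typing import Optional

# Hypothetical data format: each legal move mapped to the distance to mate
# (in moves) it leaves for the winning side, or None if it gives the win away.
Position = dict[str, Optional[int]]


def score_against_tablebase(positions: list[Position], played: list[str]) -> tuple[float, int, int]:
    """Return (fraction of optimal moves, extra moves toward mate, wins thrown away).

    Assumes every position is a win, i.e. at least one move has a distance to mate.
    """
    optimal = extra_moves = wins_lost = 0
    for options, move in zip(positions, played):
        best = min(d for d in options.values() if d is not None)
        result = options[move]
        if result is None:
            wins_lost += 1                # drawing or losing move: the worst kind of error
        else:
            extra_moves += result - best  # 0 when the fastest mate was kept
            optimal += result == best     # True counts as 1
    return optimal / len(played), extra_moves, wins_lost


# Made-up three-move excerpt from a winning endgame.
positions = [
    {"Kc6": 20, "Bd5": 22, "Nb5": None},
    {"Nd6": 19, "Kb6": 21},
    {"Bc4": 18, "Ba6": None},
]
played = ["Kc6", "Kb6", "Bc4"]
print(score_against_tablebase(positions, played))
```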
>
> >>> In a match of coolheaded Karpov or Kramnik versus Tal, all in their
> >>> prime, the less emotional player is likely to win
>
> >> Hmm... The two Botvinnik-Tal matches between them were only won by
> >> Botvinnik +12-11=19. Hardly a convincing victory for the cool head.
>
> > Pace Karpov's lifetime record against Tal, which is way positive. Of
> > course it was a young Karpov against an older, sick Tal, but the point
> > stands.
>
> Well, I'm not sure the point does stand. That rider about Tal being
> old and sick strikes me as being just a leetle bit significant.
Actually I was wrong. Per another post in this thread the same day I
posted, apparently in the early 1970s Tal was playing better than
Fischer. He had gotten over his serious kidney ailment. So my point
still stands and indeed is stronger than before, like Tal was.
>
> >>> Think of all the bogus moves made by beginners, sacrificing knight
> >>> for pawn, "to break up their pawn chain", with no positional
> >>> advantage. If you believe chess is positional play more than
> >>> tactics then such bogus moves should work more often than they do.
> >>> They do not.
>
> >> This argument is bogus. Sacrificing a knight against one's
> >> opponent's pawn structure is hardly a prime example of `positional
> >> chess'. [...]
>
> > Positional chess SACRIFICE was my point. A positional chess
> > sacrifice is rare in chess is my point (goes to chess being 99%
> > tactics).
>
> If you meant `positional sacrifices' you should have said that. What
> you said was `If you believe chess is positional play more than
> tactics.'
Sorry, but now you know what I meant. Move on.
> Anyway, I certainly agree with you that tactics are much
> more common that positional sacrifices.
Then you concede my point and indeed the point of Crafty rating chess
players.
> But there's an awful lot more
> to positional play than positional sacrifices. Indeed, one might say
> that ``Positional play is 99% positional-sacrifice--free'' but that's
> much less snappy than ``Chess is 99% tactics.''
Whatever. Point being Crafty does both positional and tactical
evaluations, the latter better than the former, but it does both.
Case closed. Like the header says: Greatest chess players ever?
Capa, Kramnik, Karpov, Kasparov, *in that order* (cuz 'puters don't
lie!)
Goodbye, duffer.
RL
>
> Dave.
>
> --
> David Richerby                      Geriatric Laptop Fool (TM): it's like
> www.chiark.greenend.org.uk/~davidr/ an old dumb stripper that you can put on your lap
>
I agree Riis' objections were groundless.
> In fact, the original authors
> have done some groundbreaking work on developing a
> methodology to rate chess players. It is, at the very least,
> very interesting, and a refreshing change from the pseudo-science
> historical ELO/chessmetrics stuff.
If you have a moment kindly articulate what you mean by pseudo-
science. I like Sonas' work, which has been used by the PGA (defunct
GM association). What I am aware of is that (from memory) it seems
Sonas rates players from different rating pools (which Sonas
acknowledges is faulty), that is to say from different time periods,
when in fact as Arpad Elo pointed out, as time progresses the rating
pool as a whole gets stronger. Also Sonas apparently (from reading
some of these threads) makes assumptions such as a player who plays
infrequently should gain or lose a different number of Elo points than
a player who plays regularly. What other 'pseudo-science' (which I
take to be what you feel are defects) are you referring to?
> The problem with the work is
> that it applies a new method to a very hard problem (ranking
> world champions) when they haven't even shown the method's
> worth when applied to easy problems (ranking everybody else).
>
Ranking everybody else meaning what? If the players compete regularly
against one another, the Elo Gaussian distribution seems a good way of
ranking to me. BTW I've seen Arpad Elo's scheme applied even to rank
world football (soccer) teams, and it's surprising how well the system
seems to work. Brazil was #1, as expected, and the other familiar
winners were in the top 10. Even Greece, which won the Euro
Championships in 2004 and was considered a 'surprise team', was in fact
ranked in the top 10 by this system at the time, so the 2004
championship wasn't quite that big an upset, much as Euwe's victory
over Alekhine was not that big an upset, because Euwe was in fact quite
a good player, albeit obscure.
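For reference, a short Python sketch of the two expected-score curves usually associated with Elo's system: the normal ("Gaussian") model from Elo's original work, taking a per-game performance spread of 200 points, and the logistic curve most rating lists use today. It only shows how close the two curves are; it says nothing about whether either is the right model for champions from different eras.

```python
import math


def expected_score_normal(diff: float) -> float:
    """Normal model: Phi(diff / (200 * sqrt(2))), the difference of two N(., 200) performances."""
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))); with x = diff / (200 * sqrt(2))
    # the argument of erf simplifies to diff / 400.
    return 0.5 * (1.0 + math.erf(diff / 400.0))


def expected_score_logistic(diff: float) -> float:
    """Logistic form used by most modern rating lists."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


for d in (0, 100, 200, 400):
    print(d, round(expected_score_normal(d), 3), round(expected_score_logistic(d), 3))
```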
> I have previously expressed belief in the theory that "move rating" will
> eventually surpass "result rating" as the gold standard measurement of
> chess skill. This is a small first step, but there is much work left to
> do.
Agreed, though I doubt move rating will surpass 'result
rating' anytime soon, unless we develop a very powerful PC (which is
possible) and/or a quantum computer that can instantly solve the
entire chess tree to give a 'perfect' verdict on every move (i.e.,
whether, with perfect play, the move ends in victory, defeat or a draw).
RL
OK, now if I used Crafty for analysis of Fritz or Rybka games, it would
certainly not agree and would often call their moves 'errors' even though
they are FAR stronger than Crafty.
To properly analyze the world champions
you would need to use a program that is at least equal in strength to
these champions. Crippled Crafty just doesn't cut it... now if they
used Rybka for analysis I wouldn't have any problem
with the study.
J.Lohner
They say there's a fine line between love and hate,
but *this* is really splitting hairs. Obviously, the
road to attack is paved with stones bearing the label:
"gain the initiative".
> Tal and Kasparov were dramatically
> different in style. Quite apart from anything else, Kasparov's play
> is much more sound than Tal's.
Perhaps. But I know of one game where GM Kasparov
sacked an entire Rook for nothing more than a few spite
checks -- and this is about as unsound as it gets with
these guys. And that game was hardly an exception,
apart from the rarity of winning a full Rook down at that
level despite the idiotic attack.
> Of course, Tal's play was sound enough
> to win in many cases; who needs to be more sound than that?
It depends mainly on whom you are playing. For
instance, against DeepFritz, it would be wise to refrain
from unsound sacrifices altogether.
> > Another example was the cool, calm, collected Bobby Fischer, who was
> > overwhelmed by GM Tal in his prime
>
> You mean the +4-2=5 career record
Now you are lapsing well beyond GM Tal's prime, and into GM
Fischer's. In between there were, I believe, three other
world champions.
> (excluding the two blitzgames at
> Herzeg Novi, which were both won by Fischer and which were ten years
> after the rest of the games) in Tal's favour?
No, I was talking about GM Tal's prime, just as I wrote.
The year 1959 leaps to mind.
> > [GM Fischer] calmly observed after the fact that GM Tal's
> > hyper-aggressive play was "unsound".
>
> Tal never claimed to be sound.
Who cares? You are beginning to sound like Taylor
Kingston, who wishes to substitute mere hearsay for hard
evidence. As I see it, it makes no difference what GM
Tal said; he either was, or he wasn't, sound; and on top
of this, he is to be disqualified on the basis of knowing
himself personally.
> He just claimed to be sound enough to
> be very difficult to beat over the board.
This is immaterial to what I was talking about, although
it might prove relevant to a discussion of GM Tal's
humility.
-- help bot
Elo's methodology was designed for a certain set of conditions
(playing actively in a pool) but is applied under conditions well outside
them. The issue is whether those extrapolations
are supported by any evidence.
>
>> The problem with the work is
>> that it applies a new method to a very hard problem (ranking
>> world champions) when they haven't even shown the method's
>> worth when applied to easy problems (ranking everybody else).
>>
>
> Ranking everybody else meaning what? If the players compete regularly
> against one another, the Elo Gaussian distribution seems a good way of
> ranking to me.
That is a big if. Look at actual ratings lists and you will find:
1. They contain a very small number of players compared to the
   total number of chess players.
2. Many of the players in the list have very few games.
3. Many are based on old results.
The proposed method should be confirmed by applying
it to problems where we know what the answer should be. I.e.
first determine a general relationship between moves and wins,
and *then* apply it to a hard problem (like ranking world
champions).
> BTW I've seen Arpad Elo's scheme even applied to rank
> world football (soccer) teams, and it's surprising how well the system
> seems to work (Brazil was #1, as expected, and the other familiar
> winners were in the top 10, even Greece, which won the Euro
> Championships in 2004 and were considered a 'surprise team', in fact
> was ranked at the time in the top 10 by this system, so the 2004
> championship wasn't quite that big an upset, not unlike Euwe's victory
> over Alekhine was not that big an upset because in fact Euwe was quite
> a good player, albeit obscure).
I'm not familiar with the soccer ratings which you speak of, but many ratings
of this form consider factors *other* than the result. Famously,
statisticians have analyzed in great detail what wins games in major
league baseball. While results certainly tell us something, it is absurd
to think that W/L results are the *only* things with predictive value.
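As one example of a result-based system that still looks past the bare W/L result, here is a Python sketch of an update in the style of the World Football Elo Ratings, which (as I recall the site describing it) scales K by the winning margin and adds about 100 points for the home side. The constants are from memory and the K used in the example is an assumption; check eloratings.net before relying on them.

```python
def goal_difference_multiplier(goal_diff: int) -> float:
    """Scale K by the winning margin (constants recalled, not verified)."""
    n = abs(goal_diff)
    if n <= 1:
        return 1.0
    if n == 2:
        return 1.5
    if n == 3:
        return 1.75
    return 1.75 + (n - 3) / 8


def football_elo_update(home: float, away: float, home_goals: int, away_goals: int,
                        k: float = 20.0, neutral_ground: bool = False) -> tuple[float, float]:
    """Return the new (home, away) ratings after one match."""
    dr = home - away + (0 if neutral_ground else 100)  # assumed 100-point home advantage
    expected_home = 1.0 / (10.0 ** (-dr / 400.0) + 1.0)
    if home_goals > away_goals:
        w = 1.0
    elif home_goals == away_goals:
        w = 0.5
    else:
        w = 0.0
    change = k * goal_difference_multiplier(home_goals - away_goals) * (w - expected_home)
    return home + change, away - change


# Evenly rated teams on neutral ground; a 3-0 win moves the ratings more than 1-0 would.
print(football_elo_update(1800, 1800, 3, 0, neutral_ground=True))
```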
>
>> I have previously expressed belief in the theory that "move rating" will
>> eventually surpass "result rating" as the gold standard measurement of
>> chess skill. This is a small first step, but there is much work left to
>> do.
>
> Agreed, though I doubt move rating will eventually surpass 'result
> rating' anytime soon, unless we develop a very powerful PC (which is
> possible) and/or a quantum computer that can instantly solve the
> entire chess tree to give a 'perfect' verdict on every move (i.e., the
> move, with perfect play will end in victory, defeat or a draw).
This is not at all necessary. Most people conveniently overlook
that, in practice, the conditions required for ELO rating are rarely
met. Move rating becomes useful when its errors are smaller than
those already present in result rating. The key advantage of move
rating is that you get much more information per game, so in theory
get better ratings faster.
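A back-of-the-envelope Python sketch of the "more information per game" argument, under a generous assumption: that every move is an independent sample carrying as much information as a game result. Moves in the same game are certainly correlated, so the real gain would be smaller; the 40-moves-per-game figure is also just an assumption.

```python
import math


def relative_standard_error(n_samples: int) -> float:
    """Standard error of a sample mean, in units of one sample's standard deviation."""
    return 1.0 / math.sqrt(n_samples)


def games_needed(target: float, samples_per_game: int) -> int:
    """Games required before the relative standard error drops to `target` or below."""
    samples = math.ceil(1.0 / target ** 2)
    return math.ceil(samples / samples_per_game)


MOVES_PER_GAME = 40  # assumption: typical game length, every move counted as a sample

print(games_needed(0.1, samples_per_game=1))               # one result per game
print(games_needed(0.1, samples_per_game=MOVES_PER_GAME))  # one sample per move
```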
Not true at all. Crafty could easily tell you which programs far
stronger than itself played the most perfect chess. This is not
debatable. For instance, the winning program between two chess
programs playing each other by definition will produce at least one
fewer error than the losing program--and Crafty could, at some point,
appreciate this.
The only way you can get around your erroneous statement is to qualify
"properly" in "properly analyze". If you mean that it is better to
have an even stronger chess program than Crafty to better ("properly")
rate the champions, of course you're right and nobody would disagree
with you. But that doesn't mean Crafty's efforts are of no value.
Perhaps with a 'properly' written program you might have, in a close
tie, a switch between two players, say, tied for fifth place in the
pantheon of all-time champions.
RL
Thanks David Kane. My speculation about what you thought was bogus
about the current Elo rating was then largely correct--seems like it's
the sample size being too small--for a moment I thought you had some
other special insight and/or were a crank. I still find ELO quite
useful when done in a normal distribution--play more games, and you
lower the error rate. Of course if you don't play often then you can
have an erroneous rating.
BTW here is the list of soccer (football) ELO ranked champions:
http://www.eloratings.net/world.html
RL
Oh. Well, if you put it like that, goodbye.
Dave.
--
David Richerby Swiss Apple (TM): it's like a tasty
www.chiark.greenend.org.uk/~davidr/ fruit but it's made in Switzerland!
In principle, a system that truly rated the moves wouldn't have
to use the result at all. For example, if a 1900 played like a 1700
but beat a 1300 playing like a 1500, then the winner would lose
points and the loser would gain them! Of course, some hybrid
such as that used in soccer could also have merit.
Geez, don't be so sensitive, I was only flaming you.
RL
Interesting, now I see where you're going. So, to further amplify, if
you fail to find the best move possible and, in a mating net, fail to
mate your opponent in the most efficient manner possible, you could in
theory lose points or not win as many points as somebody who mates in
X moves, as opposed to mating in 2X moves. I suppose this is
analogous to losing Elo points if you fail to win, but only draw,
against an opponent who is much weaker than you.
RL
That's an example but likely not a good one. It depends on how strongly
finding the most efficient mate is correlated with players who win more.
The moves with the most predictive value (I'd guess) would be those
where the outcome hangs in the balance - better players will find them,
weaker players won't. But it's something that would have to be
determined empirically.
With ELO, you gain points if your performance is better
than expected by your rating. With move rating, you'd gain
points if your moves are better than expected by your rating.
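A minimal sketch of the contrast, using the scenario of a 1900 who played like a 1700 beating a 1300 who played like a 1500. The Elo part uses the standard logistic expectation; the K-factor of 20 and the `weight` of the move-based rule are hypothetical values, and the sketch simply assumes a "move-quality rating" for each game is already available, which is the genuinely hard part.

```python
def expected_score(rating: float, opponent: float) -> float:
    # Standard logistic Elo expectation for `rating` against `opponent`.
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))

def elo_update(rating: float, opponent: float, score: float, k: float = 20.0) -> float:
    # Result-based: move the rating by K times (actual score - expected score).
    return rating + k * (score - expected_score(rating, opponent))

def move_rating_update(rating: float, move_quality_rating: float,
                       weight: float = 0.1) -> float:
    # Hypothetical move-based rule: nudge the rating toward the level the
    # moves were played at, whatever the result of the game.
    return rating + weight * (move_quality_rating - rating)

# 1900 (who played like a 1700) beats 1300 (who played like a 1500).
print(elo_update(1900, 1300, 1.0), elo_update(1300, 1900, 0.0))
print(move_rating_update(1900, 1700), move_rating_update(1300, 1500))
```

The result-based update moves each player by only about 0.6 points, because the win was expected; the hypothetical move-based rule takes 20 points from the winner and gives 20 to the loser.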
I will also comment on the "Tal" argument that players
can play objectively bad moves on purpose in order to
increase their winning chances. This isn't a fundamental
objection at all - all that it means is that the analysis function
will have to be more complex. I.e. if it is determined that
playing objectively inferior moves really gives
superior winning chances, then we'd just have to quantify what
it is about those moves that makes them work, and then adjust
the calculation accordingly to give credit for them. For example,
in the lower levels of scholastic chess if you arrange your Queen
and Bishop in a battery aimed at the castled King, there is a pretty
good chance that it will pay off with a win. The 20-ply "best play"
analysis of the moves has no more relevance to the game than
the phase of the moon. Perhaps at that level we don't need Crafty
analysis at all: we can have just two dimensions: "threatens mate
in 1" and "hangs pieces."
The trouble with your system is that you can play a "beautiful" game,
full of profound moves, then, like Kramnik did against the computer
last year, miss a mate in one and lose the game. But your rating
would go up under your system (if you played, sans the one losing
move, 'above' your level).
I'm not sure the average person will understand this proposed system.
RL
Huh? Players at Kramnik's level almost never miss mates in one.
That is an indication of playing *way* below his level. Beautiful
GM moves are normal for strong GMs. They're expected and hence
don't change the rating.
> Not true at all. Crafty could easily tell you which programs
> far stronger than itself played the most perfect chess.
Wrong.
> This is not debatable.
Wrong again.
> For instance, the winning program between two chess
> programs playing each other by definition will produce at least one
> less error than the losing program
And again. You are completely ignoring the *magnitude* and
severity of these errors.
> The only way you can get around your erroneous statement is to qualify
> "properly" in "properly analyze". If you mean that it is better to
> have an even stronger chess program than Crafty to better ("properly")
> rate the champions, of course you're right and nobody would disagree
> with you.
Still wrong. There are those who will always refuse
to admit that a computer program has sufficient chess
"understanding" to rate the world champions, though
they are gradually declining in numbers.
IMO, the fastest way to make progress here would
be to utilize the very strongest programs for this sort
of game analysis, and give them plenty of time to
look at each position -- far more than the players had.
It is also good to make full use of endgame tablebases.
One more thing: a trio (for instance) of the top-rated
programs, working in tandem, might well do a better
job of evaluating such games than any single program,
because there would be fewer oversights/misjudgments
where the program mistakenly penalizes a good move
which it simply cannot fathom.
-- help bot
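One way the "trio in tandem" idea could work, sketched under assumptions: each engine scores both its own best move and the move actually played, and the move is charged the median of the engines' centipawn losses, so a single engine that cannot fathom a move does not on its own penalize it. The evaluations below are made up for illustration.

```python
from statistics import median

def ensemble_loss(best_evals, played_evals):
    # best_evals[i] and played_evals[i] are engine i's scores, in centipawns
    # from the mover's point of view, for its own best move and for the
    # move actually played. Negative losses are clipped to zero.
    losses = [max(0, b - p) for b, p in zip(best_evals, played_evals)]
    # The median ignores one outlying engine out of three.
    return median(losses)

# Hypothetical: the second engine badly misjudges a deep move.
print(ensemble_loss([35, 40, 30], [30, -120, 28]))  # -> 5
```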
I was replying to this assertion by David Kane.
I don't think the average Joe will accept a rating scheme where
winning a game will lose you points ("then the winner would lose
points and the loser would gain them!")
RL
From your other posts, bot, you clearly show you are not qualified to
answer. This is over your head.
SO you are wrong.
Bye
RL
You should go to where the data takes you without
worrying about consumer acceptance. That said,
games with a 1900 beating a 1300 have almost
zero significance in the ELO system and should
not be expected to have much significance in a move-
rating system either.
BTW, the average Joe does not accept the existing
performance-based rating scheme, if you
judge by the small numbers of people with such ratings.
I suspect a good many of those people without ELO ratings
would be happy to have their moves rated, esp. if they
see that it gives accurate answers quickly. That number
might very well dwarf the number who currently
"accept" the ELO rating scheme.
You are aware I hope of the theory behind voting, and that game
theorists have determined there is no such thing as a 'free and fair'
vote scheme where more than two parties exist? So you cannot say
"where the data takes you" since your scheme is normative, not based
on the laws of mathematics or science.
> That said,
> games with a 1900 beating a 1300 have almost
> zero significance in the ELO system and should
> not be expected to have much significance in a move-
> rating system either.
>
> BTW, the average Joe does not accept the existing
> performance-based rating scheme, if you
> judge by the small numbers of people with such ratings.
> I suspect a good many of those people without ELO ratings
> would be happy to have their moves rated, esp. if they
> see that it gives accurate answers quickly. That number
> might very well dwarf the number who currently
> "accept" the ELO rating scheme.
You suspect. And I suspect otherwise. Elo's scheme is the granddaddy
of rating, and FIDE with Pres. Ill's of Soviet Muslimlands millions is
promoting ELO, not your speculative scheme. But good luck and I wish
you well. You should get a programmer to write a slick interface and
then market your scheme as shareware--it might just become a fun way
of getting an 'instant' rating from the scholastic crowd. Witness
those books that sold well that purported to rate you (Elo) based on
how accurately you solved certain chess positions. People nowadays
don't want to play 30-plus games before getting an Elo rating (the
minimum number of games needed to get a less than 5% error rate, I
believe), so if your scheme correlates well with traditional Elo
rating schemes, as you imply it does, then it's a good way of getting
an 'instant' rating.
RL
>>
>> You should go to where the data takes you without
>> worrying about consumer acceptance.
>
> You are aware I hope of the theory behind voting, and that game
> theorists have determined there is no such thing as a 'free and fair'
> vote scheme where more than two parties exist? So you cannot say
> "where the data takes you" since your scheme is normative, not based
> on the laws of mathematics or science.
>
Ratings have predictive value that can be measured objectively.
That in reality ratings are administered with consumer acceptance
in mind rather than accuracy doesn't change that. Many of
the ranking systems of which I am aware (for example, you gave
the soccer example) don't restrict themselves in the way that
chess does. Consumer acceptance doesn't seem to suffer.
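As a sketch of what "measured objectively" can mean: the Brier score (the mean squared difference between predicted and actual game scores) is one standard way to compare forecasts, and any rating system, result-based or move-based, could be judged by it on the same games. The games below are hypothetical.

```python
def expected_score(rating: float, opponent: float) -> float:
    # Logistic Elo expectation for the first player.
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))

def brier_score(predictions, results):
    # Mean squared gap between predicted and actual scores (1, 0.5 or 0).
    # Lower is better; a constant 0.5 forecast never does worse than 0.25.
    return sum((p - r) ** 2 for p, r in zip(predictions, results)) / len(results)

# Hypothetical games: (rating, opponent's rating, result for the first player).
games = [(2000, 1800, 1.0), (1800, 2000, 0.0), (1900, 1900, 0.5)]
results = [r for _, _, r in games]
elo_forecast = [expected_score(a, b) for a, b, _ in games]
coin_flip = [0.5] * len(games)
print(round(brier_score(elo_forecast, results), 4),
      round(brier_score(coin_flip, results), 4))
```

Here the Elo forecasts score lower (better) than the constant 50% forecast; on real data the interesting comparison would be between competing rating schemes.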
Not only is it debatable, it's not true.
> For instance, the winning program between two chess
>programs playing each other by definition will produce at least one
>less error than the losing program--and Crafty could, at some point,
>appreciate this.
Er, how? If Crafty is less able than the losing program, how
can it reliably see the error the losing program couldn't?
>
>The only way you can get around your erroneous statement is to qualify
>"properly" in "properly analyze". If you mean that it is better to
>have an even stronger chess program than Crafty to better ("properly")
>rate the champions, of course you're right and nobody would disagree
>with you. But that doesn't mean Crafty's efforts are of no value.
No, they simply aren't of enough value.
>Perhaps with a 'properly' written program you might have, in a close
>tie, a switch between two players say tied for fifth place in the
>pantheon of all-time champions
>
Perhaps a 'properly' written program would completely rewrite the list
because Crafty's analysis was inadequate.
--
Christopher Mattern
NOTICE
Thank you for noticing this new notice
Your noticing it has been noted
And will be reported to the authorities
That old adage is very useful for instructing newbies, but
every world chess champion is competent in this area, so
the multitude of positional misjudgments comes to the fore.
> > You also penalize those players who *deliberately* chose
> > to play what they knew to be sub-optimal moves, for
> > whatever reason. I just did this myself at RedHotPawn,
> > choosing to grab a Knight rather than leap in with another
> > piece to set up a 95%-certain mating net. Why? Because
> > while the mating net was around 95% certain, the capture
> > of the free piece was 100% certain (unless I have lost my
> > mind)! When I spot another mating net, things should be
> > simple enough for me to get the 100% certainty I desire,
> > and having captured yet another piece, this is all but
> > inevitable, barring my opponent's resignation.
>
> But you risk the chance of letting your opponent escape--remember the
> maxim: "always check, since the next move may be mate".
What you are missing is this: the reason hopping another
Knight into the fracas was not a 100%-certain mate was that
it wasn't a check, but a quiet move. OTOH, capturing the
free piece was a 100% certain massive gain. With my own
King safe, there was virtually no risk of "escape". BTW, I
was not able to execute a mate because my opponent quickly
resigned after giving up (in addition to the free piece) the
exchange to slow me down a bit. He had no counter play
and the material deficit was continuing to mount.
> Just recently
> I did not follow this move and instead of winning a pawn against my PC
> I drifted and eventually lost.
That was *you*. You are a drifter, a patzer, while
I am a "star".
> > Another item which these statistical analyses overlook
> > is the deliberate gift of, say, a half-point. These have
> > been known to occur in world championship level play,
> > and of course the "nice guys" will be penalized for not
> > being "tough players", despite clinching the match
> > with their action.
>
> Keep in mind this was not a statistical analysis of the kind Sonas is
> famous for, but a different kind. Also over time the "nice guys"
> penalty will statistically average out.
No, it won't. As someone once said: there are nice
guys and there are tough players. The "nice guys" tend
to remain "nice", while the tough players tend to go insane,
getting meaner and even more self-obsessed.
> > In short, what can be learned is who was least prone
> > to tactical blunders, and apparently, whose style leans
> > most toward a sizable gap between what the program
> > sees as the #1 optimal move, and #2 -- something I
> > think may be termed the sharpness of play. For one
> > example, I am playing a game at RedHot now where
> > I had to decide whether to develop my QB "normally"
> > via ...d6 and then B-moves somewhere, or fianchetto
> > via ...b6 and B-b7. It was a toss-up, since it makes
> > no difference whatever to the outcome. I expect a
> > computer would see both moves as being nearly
> > equal, weighing them in such a way as to slightly
> > favor the move which gives the Bishop immediate
> > control of squares, though this immediacy is quite
> > irrelevant to the true value of the moves.
>
> Again, over time this will "wash out" or "average out".
No, it won't. Chess programs are written to penalize
certain aspects while rewarding others (such as mobility,
for instance). There is no averaging-out, but rather the
semi-flaw will manifest itself again and again, ad infinitum.
A chess program is a bit like a doorbell: press it and it
makes the same sound *every time*.
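The difference between noise that averages out and a bias that does not can be simulated in a few lines. This is a toy model under assumptions: each per-move scoring error, in centipawns, is a fixed bias (standing in for an evaluation term that always misjudges a given style) plus zero-mean Gaussian noise; the numbers are hypothetical.

```python
import random
from statistics import mean

def average_error(n_moves: int, bias: float, noise_sd: float, seed: int = 1) -> float:
    # Per-move error = fixed bias + zero-mean Gaussian noise (centipawns).
    rng = random.Random(seed)
    return mean(bias + rng.gauss(0.0, noise_sd) for _ in range(n_moves))

for n in (100, 10_000):
    noise_only = average_error(n, bias=0.0, noise_sd=30.0)
    biased = average_error(n, bias=-15.0, noise_sd=30.0)
    print(n, round(noise_only, 1), round(biased, 1))
```

The noise-only average stays within a few centipawns of zero and shrinks roughly as one over the square root of the number of moves, while the biased average stays near -15 however many moves are added.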
Let me give a little example here. In a game against
GM Petrosian, GM Fischer singled out the move Knight
on f3 to d1 as being one which set the champion apart
from other GMs, making quite a fuss. Of course, a
chess program like crippled-Crafty might very well see
this same move -- many plys beforehand -- as a retreat
which temporarily gives up control of the vital central
squares. In other words, a program is crippled by its
inflexibility in terms of depth of search, while a human
is crippled by his inability to "see everything obvious"
all the time.
> In general
> sharp play is better than just pushing yourself into a passive
> position, don't you think?
Sure. Since my goal is to win, I dislike dead positions,
and passive ones tend to drag things out to the point of
boredom. I don't want to win at move 123; I want to win
quickly, or at least as quickly as reasonably possible.
> That's what Crafty is looking for--sharp
> play. Sharp play = sharp mind bot!
Crippled-Crafty is not capable of accurately assessing
what is the sharpest move in games at that level. For
one thing, the 12-ply cutoff means that in many positions,
the program is not even in the same league as those it
is attempting to assess. However, if in addition to those
12 plys, it adds a bunch more for tactical search
extensions, that would mean it can perhaps do a good
job on just the tactical exchanges. As I understand it,
the stated goal was not to merely judge which world
champions were least prone to gross tactical blunders.
If that had been the stated goal, there would never
have been so much criticism.
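The "sharpness" idea from the quoted post (the gap between the program's #1 and #2 moves) is easy to state as code. A minimal sketch, assuming the engine reports a score for every candidate move from the side to move's point of view; the scores are hypothetical and inherit whatever depth limit the engine was run with.

```python
def sharpness(move_evals: dict) -> float:
    # Gap in centipawns between the best and second-best candidate move.
    # A large gap means only one move holds; a small gap means several
    # moves are roughly equal.
    if len(move_evals) < 2:
        return float("inf")  # only one legal move: nothing to compare
    best, second = sorted(move_evals.values(), reverse=True)[:2]
    return best - second

# Hypothetical scores for the ...d6 versus ...b6 decision in the quoted post.
print(sharpness({"d6": 18, "b6": 12, "a6": -25}))  # -> 6
```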
> > I wonder just how much time, and to what depth
> > the moves were analyzed before scoring them. I
> > recall that often a player's move may be scored poorly,
> > but if executed and stepped forward, a program may
> > change its mind completely about this, suddenly
> > realizing it had overlooked something.
>
> No, you're talking about "move on opponent's time" feature. The way
> the study was done was to analyze each move for a fixed time,
Um, the article I read (by following the links provided here)
stated that the search was cut off at exactly 12 plys. This
is not the same as a fixed-time search at all.
> so no "changing of mind",
FYI: in the famous match where world champion
Kasparov lost to Deeper Blue, in one game (at least)
the program sacrificed a pawn for whatever it thought
it saw, but then immediately changed its mind, going
into defensive mode due to the horizon effect. I am
telling you this because Deeper Blue was about a
bazillion times faster than other programs of the time,
and yet it still managed to lose due to a problem
which has plagued computers since the dawn of time.
Crippled-Crafty at 12 plys is hardly immune.
> and even if so, each player had the same scoring
> applied, so it doesn't really matter (over time).
What you are missing here is that only the games
of the world championships were scrutinized, so for
some, the sample size was quite small. I would
prefer a *large* sample size when making excuses
about how it all evens out in the end.
> Besides, have you
> noticed that _MOST_ of the time (not always) the best move found by
> Fritz or Crafty in the first five seconds is also the best move found
> after 60 seconds?
Yes. And this phenomenon is not limited to chess
programs. The same flaw can be found in shallow-
thinking human players, who are unable to improve
on their first guesses no matter how much time they
are given. IMO, the ability to guess well yet also to
improve on one's first guesses is the mark of a good
player. If a computer cannot do this, odds are it is
because it lacks positional understanding.
> Because chess is 99% tactics, and often the tactics
> are no more than 4 moves deep (most of the time).
You seem to be stuck at the beginner level, where
indeed, chess is 99% tactics, and little else matters.
> RL (a 1950 Elo player, so I can speak with some authority).
The world champions have many games which were
decided by tactics, but they also have many where
strategy was the decisive battlefield, and many where
both tactics and strategy played key roles. In view of
this I think it would be wise, at the very least, to make
use of the strongest program available, and give it
plenty of time to assess each position. Additionally,
I would like to see as many top-level games as
possible included, among them those from tournament
play -- not just world championship matches.
Even if all this were done, it would still be a simple
matter to skewer the idea of determining "the greatest
player of all time" in this manner. I could easily
produce an example where playing a bad move was
not only intentional, but necessary in order to win.
Where a "blunder" is a thousand times more effective
than the "best" move. Where the room is filled with
gossip about a certain player having allowed his
inferior opponent a sure lock on a draw, but
where they all have to re-figure the pairings when the
actual result is posted.
-- star bot
> > > Not true at all. Crafty could easily tell you which programs
> > > far stronger than itself played the most perfect chess.
>
> > Wrong.
>
> > > This is not debatable.
>
> > Wrong again.
> From your other posts bot you clearly show you are not qualified to
> answer. This is over your head.
>
> SO you are wrong.
What are you still doing here, after you already lost this
thread? :>D
The idiot statisticians who had fun playing around with
math which sort of related to chess were electrocuted
long ago by their own readers! Those readers used
something called "reason" to pinpoint a few of the many
huge problems with these articles. Reason is something
beyond your grasp, but then, so too is chess. :>D
-- help bot
In particular it will fail to spot moves where short to medium term
material gain is obtained at the expense of a losing long term
positional disadvantage that only shows up well beyond its terminal
node search horizon. The commercial programs with more aggressive
pruning and the largest range of positional heuristics at the moment
have the edge. And even then it is probably safer to use an ensemble
of the strongest programs guided by an independent GM to try and
analyse top players games for "perfection" meaningfully. All of the
engines have blind spots in certain positions.
> > To properly analyze the world champions
> > you would need to use a program that is at least equal in strength to
> > these champions. Crippled Crafty just doesn't cut it... now if they
> > used Rybka for analysis I wouldn't have any problem
> > with the study.
>
> > J.Lohner
>
> Not true at all. Crafty could easily tell you which programs far
> stronger than itself played the most perfect chess. This is not
> debatable.
Not necessarily. It might take Crafty an interminably long time to
spot that a certain capture leads to a situation where a pawn will
promote 30 or more ply into the future. Whereas an engine with more
sophisticated pruning and a heuristic for detecting "pawn can run"
patterns might see the ultimate outcome with a search of less than 20 ply.
I use this only as an example (I think Crafty is somewhat smarter than
this).
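For a concrete instance of a "pawn can run" pattern, the classic rule of the square decides whether a lone king can catch a passed pawn without any search. The sketch below is only that textbook rule, not a description of how Crafty or any commercial engine implements its passed-pawn knowledge; it covers a White pawn against the Black king and ignores every other piece.

```python
def king_catches_pawn(pawn_file: int, pawn_rank: int,
                      king_file: int, king_rank: int,
                      black_to_move: bool) -> bool:
    # Rule of the square for a lone White pawn versus the Black king.
    # Files and ranks run 1-8; all other pieces (blockers, the White king,
    # checks) are ignored, so this is a pattern test, not a search.
    # Moves the pawn needs to promote; from rank 2 it may advance two squares.
    moves_to_promote = 8 - max(pawn_rank, 3)
    # King moves (Chebyshev distance) needed to reach the queening square.
    distance = max(abs(king_file - pawn_file), 8 - king_rank)
    # Moving first gives the king one extra tempo.
    return distance <= moves_to_promote + (1 if black_to_move else 0)

# Pawn on e5 (file 5, rank 5), Black king on h4 (file 8, rank 4).
print(king_catches_pawn(5, 5, 8, 4, black_to_move=False))  # False: pawn queens
print(king_catches_pawn(5, 5, 8, 4, black_to_move=True))   # True: king catches it
```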
And I am not being rude about Crafty here. It does a lot better at
blocked pawn positions than Fritz 8, which takes fully 15 minutes on
top-end PC hardware to see into the classic puzzle quoted in Roger
Penrose's book "The Emperor's New Mind" as the sort of position
computers "will never understand". At the time he wrote the book in
1994 it was inconceivable that a program could search deeply enough or
understand that grabbing the rook and breaking up the protective pawn
barrier would lead to total disaster about 20 moves in the future.
Shredder 10 now uses the position as a demo, solved in under 2s.
>
> For instance, the winning program between two chess
> programs playing each other by definition will produce at least one
> less error than the losing program--and Crafty could, at some point,
> appreciate this.
Apart from the fact that Crafty may not be able to see far enough into
the future to match the equivalent search depth of top commercial
engines or GMs there is another serious fault in your reasoning. The
only thing that you can say for certain when one program beats another
is that the losing side made a mistake that the winning program could
recognise and then exploit to its advantage. Or equivalently the
winning side saw something through selective extensions that the other
did not. Either side could have made many less-than-ideal moves up to
that point provided that the other was unable to extract any advantage
from it.
> The only way you can get around your erroneous statement is to qualify
> "properly" in "properly analyze". If you mean that it is better to
> have an even stronger chess program than Crafty to better ("properly")
> rate the champions, of course you're right and nobody would disagree
> with you. But that doesn't mean Crafty's efforts are of no value.
Only that they are likely to be highly misleading in the situation
where the GM and/or a stronger program can see why the most obvious
strong move is not the best principal variation for gaining long-term
advantage.
For detecting classic human blunders that can occur in any game any
decent chess engine will do. But if you seek to find the notional
"best" or "strongest" move in a given position you are first going to
have to define exactly what you mean by best or strongest. Some
positions may have several perfectly playable continuations that are
equivalent to within the noise on the evaluation function - even
though the program might still give them slightly different scores.
Playing against a much stronger player the continuation line most
likely to hold a draw has clear merit, whereas playing against a much
weaker player the one leading to a slightly risky quick win may be
perfectly OK.
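One way to handle continuations that are "equivalent to within the noise" is to charge nothing for any move within a tolerance of the best score. A minimal sketch; the 20cp tolerance and the scores are hypothetical choices, not values from the study.

```python
def move_loss(move_evals: dict, played: str, noise_cp: int = 20) -> int:
    # Centipawn loss of the played move relative to the best-scoring move.
    best = max(move_evals.values())
    loss = best - move_evals[played]
    # Differences inside the evaluation noise window are not charged.
    return 0 if loss <= noise_cp else loss

evals = {"Nf3": 32, "Be2": 25, "h4": -40}   # hypothetical engine scores
print(move_loss(evals, "Be2"), move_loss(evals, "h4"))  # -> 0 72
```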
> Perhaps with a 'properly' written program you might have, in a close
> tie, a switch between two players say tied for fifth place in the
> pantheon of all-time champions
I am not convinced that scoring human GMs by how closely their play
resembles any particular named chess engine has merit. Perhaps ranking
them by percentage blunder rate might be meaningful though (and well
within the capability of any good chess engine). It is surprising how
effective blunder check can be even on GM level games given sufficient
time.
Regards,
Martin Brown
What do you mean by `percentage blunder rate'? The proportion of the
time that the GM plays a move that the engine thinks is, say, more
than one pawn worse than the best move? How does that make a
difference?
Dave.
--
David Richerby Frozen Book (TM): it's like a romantic
www.chiark.greenend.org.uk/~davidr/ novel but it's frozen in a block
of ice!
I wasn't being sensitive. I was realizing that I have better things
to do with my life than read your Usenet posts.
Dave.
--
David Richerby Crystal Perforated Boss (TM): it's
www.chiark.greenend.org.uk/~davidr/ like a middle manager but it's full
of holes and completely transparent!
That would probably do as a rough working definition. The search depth
or time might also need to be specified.
If a move is sufficiently far off the mark then the engine is probably
right to fault it. I reckon 100cp ought to be a wide enough window to
avoid too many false positives.
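Martin's working definition translates directly into code. A minimal sketch, assuming per-move centipawn losses (best-move score minus played-move score, at a fixed search depth or time) are already available; the loss figures below are made up.

```python
def blunder_rate(losses_cp, threshold_cp=100):
    # Fraction of moves losing more than threshold_cp (one pawn by default).
    if not losses_cp:
        return 0.0
    return sum(1 for loss in losses_cp if loss > threshold_cp) / len(losses_cp)

# Hypothetical per-move losses for one player.
losses = [0, 12, 0, 250, 35, 0, 140, 8, 0, 5]
print(f"{blunder_rate(losses):.0%}")  # -> 20%
```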
> How does that make a
> difference?
Unforced tactical errors play their part in the outcome of games. And
these are precisely the sorts of thing that computer chess engines are
very good at spotting. Subtle long term structural games are much
harder for them to score.
Regards,
Martin Brown
Sure.
>> How does that make a difference?
>
> Unforced tactical errors play their part in the outcome of games.
> And these are precisely the sorts of thing that computer chess
> engines are very good at spotting. Subtle long term structural games
> are much harder for them to score.
So you're suggesting that ``Player X makes a one-pawn blunder in n% of
games'' is a better measure than ``Player X, on average scores n cp
lower per move.'' That does sound like a reasonable statement, though
I do worry that sacrifices of pawns are relatively common and might
still be mis-evaluated quite often. Kasparov used to sacrifice a pawn
for long-term initiative faster than you can say, ``My computer thinks
that's a pretty dodgy move.'' :-)
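The two measures can rank players differently. This sketch, with invented per-move losses, computes both: player X makes many moderate inaccuracies (the kind of positional disagreement an engine is least reliable about) but no one-pawn blunders, while player Y plays cleanly apart from a single 150cp error.

```python
from statistics import mean

def share_of_games_with_blunder(games, threshold_cp=100):
    # Fraction of games containing at least one move losing > threshold_cp.
    return sum(any(loss > threshold_cp for loss in g) for g in games) / len(games)

def average_loss_per_move(games):
    # Mean centipawn loss over every move of every game.
    return mean(loss for g in games for loss in g)

x = [[60, 70, 80, 60], [70, 60, 80, 70]]   # steady but imprecise
y = [[0, 0, 0, 0], [0, 0, 0, 150]]         # clean apart from one blunder
for name, games in (("X", x), ("Y", y)):
    print(name, share_of_games_with_blunder(games), average_loss_per_move(games))
```

Average loss per move prefers Y (18.75 vs 68.75 cp), while the share of games with a blunder prefers X (0% vs 50%).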
Do you have any guess (or, shock!, data) on how often errors occur in
WC games that an engine (given reasonable time) would score down by
say 100cp?
Dave.
--
David Richerby Swiss Old-Fashioned Atom Bomb (TM):
www.chiark.greenend.org.uk/~davidr/ it's like a weapon of mass destruction
but it's perfect for your grandparents
and made in Switzerland!
>
> >> To properly analyze the world champions
> >> you would need to use a program that is at least equal in strength to
> >> these champions. Crippled Crafty just doesn't cut it... now if they
> >> used Rybka for analysis I wouldn't have any problem
> >> with the study.
>
> >> J.Lohner
>
> >Not true at all. Crafty could easily tell you which programs far
> >stronger than itself played the most perfect chess. This is not
> >debatable.
>
> Not only is it debatable, it's not true.
No it is true.
>
> > For instance, the winning program between two chess
> >programs playing each other by definition will produce at least one
> >less error than the losing program--and Crafty could, at some point,
> >appreciate this.
>
> Er, how? If Crafty is less able than the losing program, how
> can it reliably see the error the losing program couldn't?
>
Easy. The evaluation function of Crafty will indicate that the losing
program, which we've said is much stronger than Crafty, scored, over
the length of the game, worse than the winning program.
To give a simple example: two programs, A and B, both much stronger
than Crafty, play a slugfest game that extends over 100 moves. Play
is evenly matched, and Crafty scores both programs about the same up
to this point. However, at the 101st move, program A sees a winning
10-move combination--that happens to be a mating net--that is just
outside the 8-move horizon of program B. Program A enters into the
combination and after, say, the 5th move, Crafty, with a mere five-move
chess horizon, also "sees" the winning combination. Of course, program
B has also seen that the combination wins after the second move, but
let's say it is programmed with a contempt factor not to resign but to play to
the end. Program A checkmates program B after the 10 move
combination. Crafty will reward Program A and penalize Program B for
this play, even though it is much weaker than either program A or B.
> --
> Christopher Mattern
>
Do you really work for Sun? What a disaster that stock has been.
Back to work for you.
RL
> Do you have any guess (or, shock!, data) on how often errors occur in
> WC games that an engine (given reasonable time) would score down by
> say 100cp?
I will say that, in my own practical experience running through games,
Fritz 8, 9 and 10 have evaluated the same, not unusual, positions over
100cp differently from Rybka 2.3.1, and have suggested different
moves.
That alone should provide enough of a question as to the results here.
The fact is that we don't know when the engines will be strong enough to
represent the "truth".
I will say that I do not use Crafty for day-to-day analysis so I don't
have an opinion, other than that you need to remember that in Elo terms the
difference between 2500 and 2800 is vast, and the difference between
2800 and ~3100 is just as vast. It is not 10% better; it is closer to think
of it as TWICE as good, or as likely to win MOST of the time. It is
a HUGE difference.
In theory, the engine being too strong could be a source of error
in the analysis, as much as the engines being too weak could.
For example, the best move leads to a win in 20 moves based on
a complicated calculation that no human considers. The second
best move wins more slowly but in a way that strong GMs might be
able to see.
Player makes the best move (for the wrong reasons) overlooking the
alternate way to win. That's evidence of weaker, not
stronger, play.
This happens all of the time if you look at scholastic games. Crafty
sees the win of a rook at 8-ply and deems it superior to winning
a piece at 3-ply. But the 8-ply analysis is essentially irrelevant to the
game because the kids are not able to calculate that deeply.
Specifically, an Elo-rating gap of 300 points (and it's the difference
that's significant so, yes, 2800 vs 3100 gives the same results as
2500 vs 2800, gives the same results as 1100-1400) corresponds to the
stronger player being expected to score roughly 85%. The approximate
values are tabulated below, by approximating the real data on FIDE's
website[1].
Rating diff. Score
----------------------
600 99%
500 96%
400 92%
300 85%
250 81%
200 76%
150 70%
100 64%
75 60%
50 57%
25 54%
Dave.
[1] http://www.fide.com/official/handbook.asp?level=B0210
Beware that the table is rather hard to read as the columns are
too narrow. The expected score is to the left of the rating
difference in each case.
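The table can be approximately reproduced from Elo's original model, in which each player's performance is normally distributed with standard deviation 200, so the rating difference has standard deviation 200 times the square root of 2 (about 283). The sketch prints that curve next to the logistic curve that many later Elo-style systems use; it is an approximation for comparison, not FIDE's published table.

```python
import math

def expected_normal(diff: float) -> float:
    # Normal model: per-player SD 200, so the difference has SD 200*sqrt(2).
    z = diff / (200 * math.sqrt(2))
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))   # standard normal CDF

def expected_logistic(diff: float) -> float:
    # Logistic curve used by many later Elo-style rating systems.
    return 1 / (1 + 10 ** (-diff / 400))

for d in (600, 500, 400, 300, 250, 200, 150, 100, 75, 50, 25):
    print(f"{d:>4}  {expected_normal(d):.0%}  {expected_logistic(d):.0%}")
```

The normal-model column agrees with the table above to within about one percentage point; for a 300-point gap it gives the stronger player roughly 85-86%.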
--
David Richerby Poetic Toy (TM): it's like a fun
www.chiark.greenend.org.uk/~davidr/ child's toy but it's in verse!
The point of this WHOLE argument was comparing WORLD championship skills
throughout the ages by comparing play to Crafty.
I point out that the two strongest programs can be worlds apart, even by
the magic 100cp measure, in the same common positions.
And that people, on the surface, get confused by the huge and substantial
difference between ~3100 and the 2500 quoted for Crafty, which is
much farther than they would imagine.
And you state that in this world championship case, comparing players
through the ages, the software could be too strong, and you use
scholastic games to try to prove that.
I just can't give it to you here. You might have a point in some
other argument with a different set of facts. But it just has nothing
to say here.
David Kane wrote:
Thank you, a complete explanation like this would be a good FAQ item.
(Is there a FAQ?)
I think this is important when looking at things like the computer
rankings so you can understand how measurably stronger than the field
Rybka is, and how far behind Crafty is.
It is substantial, and gives tremendous credence to the argument that
the engine is substantially too weak to answer these questions in the
survey, even if the questions are worth asking.
And I was just trying to add some anecdotal evidence that Fritz and
Rybka are often 100cp apart in positions, and that this value is not a
significant enough measure to say that Crafty is suitable. Indeed, it
adds even more weight to the view that Crafty is unsuitable.
>
> I think this is important when looking at things like the computer
> rankings so you can understand how measurably stronger than the field
> Rybka is. And how far behind Crafty is.
>
> It is substantial, and gives tremendous credence to the argument that
> the engine is substantially too weak to answer these questions in the
> survey. Even if the questions are worth asking.
>
> And I was just trying to add some anecdotal evidence that Fritz and
> Rybka are often a 100cp apart in positions, and that value is not a
> significant enough measure to say that Crafty is suitable. And that
> indeed is even more weight that Crafty is unsuitable.
Once again you, and others like you, fail to understand what
normalization of results means. You do not have to find the 'best'
chess program to rate human champions, as long as Crafty (a second,
third or fourth best chess-playing program, or just a not-bad chess
program) scores everybody the same way. That is normalization. In fact,
the biggest potential problem with Crafty (without knowing how it
works, I'm guessing) is that it might use a random number generator to
pick the best move out of a series of candidate moves, with a
different 'seed' for rand() each time. That would mean it might not
score an identical position the same way twice in a row, since it
could pick a slightly different move whenever the seed differs (often
the seed is the system clock, or the last key the user pressed). One
way to stop this in computer programming is to make sure the seed
never changes. Without knowing how Crafty is coded I can't tell you if
this is an actual problem, but I sense intuitively that even if such a
problem exists, most of the time it won't make much difference to the
normalization, since candidate moves are usually reasonably close to
one another in efficacy.
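To make the seeding point concrete, here is a minimal sketch of a hypothetical move picker that breaks near-ties at random. Nothing here is taken from Crafty's source: the `pick_move` name, the candidate list and the 10cp "close enough" margin are all assumptions for illustration. It only shows that a fixed seed makes the choice repeatable, while no seed does not guarantee that.

```python
import random

# Assumed "close enough" window in centipawns; NOT a Crafty parameter.
MARGIN_CP = 10


def pick_move(candidates, seed=None):
    """Pick one move from (move, score_cp) pairs, breaking near-ties randomly.

    Hypothetical illustration: every move within MARGIN_CP of the best score
    is treated as equally good. A fixed seed makes the pick reproducible;
    seed=None seeds from OS randomness (or the clock), so repeated calls
    on the same position may return different moves.
    """
    best = max(score for _, score in candidates)
    near_best = [move for move, score in candidates if best - score <= MARGIN_CP]
    rng = random.Random(seed)
    return rng.choice(near_best)


# Hypothetical candidate list for one position.
cands = [("Nf3", 32), ("d4", 30), ("e4", 28), ("h4", -40)]
print(pick_move(cands, seed=42), pick_move(cands, seed=42))  # same move twice
```

Because only near-equal moves are ever in the pool, a changing seed shifts the pick among moves whose scores differ by at most the margin, which is the intuition behind "it won't make much difference to the normalization".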
A larger question looms from this thread: have you people not learned
anything after nearly a generation of computer chess? That the 'puter
is never wrong? (with a few exceptions, that prove the rule) My gawd,
you people act like those philosophers in the 1960s who said
computers would never win at chess because a chess program cannot be
stronger than the person who wrote it. Idiocy! My next
thread will be cross-posted to alt.young-earth and alt.creationism if
this ignorance keeps up.
RL
> You do not have to find the 'best'
> chess program to rate human champions--as long as Crafty, a second or
> third or fourth best chess playing program, or a not bad chess
> program, scores everybody the same. That is normalization.
Sure, you can make this evaluation.
The problem is when you then judge what that evaluation means.
If Crafty scores Karpov as better than Tal, does that mean Karpov was a
stronger player than Tal?
Absolutely not.
And yet the title of this thread implies that's exactly what people are
using Crafty for.
The problem isn't in creating an objective measure. The problem - as
with so many statistics - is that given this measure, it appears that
some people in this thread have no idea what it really means.
It would be interesting, for example, to take a decisive match between
two players of different styles, won by the bigger risk-taker (say,
Capablanca-Alekhine, or Tal-Botvinnik 1, or one of the decisive
Kasparov-Karpov matches), run it through this Crafty evaluation,
and see how the two players measure up on just those games. I suspect
that Crafty might judge the more conservative player (Capa, Botvinnik,
Karpov) as "better" despite the fact that he lost the match (but I
don't have the tools to test this out. Does anyone?)
It's very easy to hypothesize a situation where Crafty gives a better
score to a player who loses a game than it does to the player who wins
the game. This non-trivial flaw doesn't invalidate the Crafty rankings,
but it does punch a big hole in the notion that they accurately reflect
who's stronger.
-Ron
You do realize that what makes a strong program is that the moves it
makes are "different". It is precisely the quality of that difference
that shows the strength.
The fact that Rybka has a ridiculously high Elo compared with Crafty
means that it plays many, many things differently.
But seriously, let that sink in. THE MOVES ARE DIFFERENT in the same
position. Not a little, or rarely, but often, and, results-wise, more
correctly.
So if you ask questions based on the move, you are asking questions of
an engine that, while good, is not world-champion class. Of course it
will be wrong. And it is amazing how many times you find different moves.
Wild blunders, I suppose, can be measured. But the tool itself leaves
enough doubt, and those differences are easy enough to demonstrate,
that it calls the results into question.
They simply had another option, if they wanted to change the tool. They
could have done the same thing, but instead of manipulating Crafty, they
could have manipulated, say, Arena or Winboard. Then the questions could
have been asked of Crafty, Fruit, Toga, Shredder, Rybka, and other VERY
strong UCI engines. The differences between the engines in style and
strength, measured against the world champions, would have made for a much more
interesting set of answers, and would have killed this argument before
it started.
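Doing the measurement in the client mostly means reading what each engine reports over UCI. A client would send standard commands such as `position fen <FEN>` and `go depth 12`, then read the engine's `info ... score cp N ... pv ...` lines. The sketch below only parses those lines; `parse_uci_info` is a hypothetical helper, and it ignores bound qualifiers (`lowerbound`/`upperbound`) and `multipv` for brevity.

```python
def parse_uci_info(line):
    """Parse a UCI 'info' line into (depth, kind, value, pv), or None.

    kind is 'cp' (centipawns) or 'mate' (moves to mate); value is an int,
    from the point of view of the side to move, as UCI engines report it.
    Lines without a score, and free-text 'info string' lines, return None.
    """
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "info" or tokens[1] == "string":
        return None
    if "score" not in tokens:
        return None
    depth = int(tokens[tokens.index("depth") + 1]) if "depth" in tokens else None
    i = tokens.index("score")
    kind, value = tokens[i + 1], int(tokens[i + 2])
    pv = tokens[tokens.index("pv") + 1:] if "pv" in tokens else []
    return depth, kind, value, pv


print(parse_uci_info("info depth 12 seldepth 18 score cp 35 nodes 1000 pv e2e4 e7e5"))
```

Since every UCI engine speaks the same protocol, the same parser works unchanged for Crafty, Fruit, Toga, Shredder or Rybka, which is what makes the multi-engine comparison cheap to set up.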
But from the last paragraph, maybe you were joking about the whole thing
here, and you are jesting in wild agreement.
Perhaps you just don't understand what we are getting at...
In this study they crippled Crafty down to 12 ply. What we
are trying to get at is that many (if not all) of the
world champions saw well
beyond 12 ply. Crafty would mark these moves as errors, where
a stronger program would note that they were not. This is due to
the playing styles of each of the world champions. Tal and
Alekhine were calculating machines and would often complicate
a position... The crippled Crafty would simply mark these as
errors.
It doesn't surprise me that Capablanca came out
as the best in this study, because he would often gain a slight edge,
simplify, and use his fantastic endgame skills to win. Even
a crippled Crafty would not mark these as errors.
> A larger question looms from this thread: have you people not learned
> anything after nearly a generation of computer chess? That the 'puter
> is never wrong?
lol but different programs are definitely wrong. I have used
several programs to examine my games, from Chessmaster 10k to
Fritz 7, Fritz 8 and Rybka. It's amazing to see the difference
in what each program believes is 'right'. Guess which one
is closest to choosing the 'right' answer? I put my money on the
strongest program.
> > >Not true at all. Crafty could easily tell you which programs far
> > >stronger than itself played the most perfect chess. This is not
> > >debatable.
>
> > Not only is it debatable, it's not true.
>
> No it is true.
No, it's not. Would you like to debate the point?
> > > For instance, the winning program between two chess
> > >programs playing each other by definition will produce at least one
> > >less error than the losing program--and Crafty could, at some point,
> > >appreciate this.
>
> > Er, how? If Crafty is less able than the losing program, how
> > can it reliably see the error the losing program couldn't?
>
> Easy. The evaluation function of Crafty will indicate that the losing
> program, which we've said is much stronger than Crafty, scored, over
> the length of the game, worse than the winning program.
So you say. How about some hard evidence?
> To give a simple example: two programs, A and B, both much stronger
> than Crafty, play a slugfest game that extends over 100 moves. Play
> is evenly matched, and Crafty scores both programs about the same up
> to this point.
Perhaps; perhaps not.
> However, at the 101st move, program A sees a winning
> 10 move combination--that happens to be a mating net-- that is just
> outside the 8 move horizon of program B.
Hold on there! If the game was a tactical slugfest, as
you said, then how on earth did the dumb program ever
manage to hold its own against the deeper-sighted one
for 100 moves? This seems rather unlikely.
> Program A enters into the
> combination and after say the 5th move, Crafty, with a mere five move
> chess horizon, also "sees" the winning combination.
Unless the game is being scored backwards, from
end to beginning, this means that Crafty would have
penalized the winning program *five times* for a move
which won perforce! Until it "sees" the mate, none
of the moves of the combination make any sense to
a patzer.
> Of course program
> B also has seen this combination wins after the second move but let's
> say is programmed with a contempt factor not to resign but to play to
> the end.
Things are getting uglier all the time. Now, not only
is the dumb program so lucky as to somehow survive
a tactical slugfest for 100 moves, but in addition, it did
so despite the handicap of a contempt factor which of
course distorts its meager vision. How likely is that?
> Program A checkmates program B after the 10 move
> combination.
This statement is the only part of your example so
far which makes any rational sense.
> Crafty will reward Program A and penalize Program B for
> this play, even though it is much weaker than either program A or B.
Whoopie. So it got lucky at the very end.
Instead of rationalizing or "justifying" the use of
a weak program like crippled-Crafty to judge the
quality of play of the world champions, why not
simply admit that it was quite unnecessary in
view of the fact that there now exists a far
superior program, which is widely available. In
order to do this sort of thing with most players,
just use any modern computer and any strong
program. But in order to do it with the world
championships, get a FAST computer and the
TOP program, put lots of memory in the
computer and give it lots of time to think. So
simple!
-- help bot
> > That alone should provide enough of a question as to the results here. The
> > fact is that we don't know when the engines will be strong enough to represent
> > the "truth".
Sure we do. It will happen gradually, as the endgame
table bases grow to include, first, all of the end game, and
later, the late middle game, and so forth.
> In theory, the engine being too strong could be a source of error
> in the analysis, as much as the engines being too weak could.
> For example, the best move leads to a win in 20 moves based on
> a complicated calculation that no human considers. The second
> best move wins more slowly but in a way that strong GMs might be
> able to see.
> Player makes the best move (for the wrong reasons) overlooking the
> alternate way to win. That's evidence of weaker, not
> stronger, play.
Only if you dump understanding/motive into the formula.
As I see it, the way things were done is that every game
was judged, move by move -- not plan by plan. The whole
point was to be as objective as possible.
> This happens all of the time if you look at scholastic games. Crafty
> sees the win of a rook at 8-ply and deems it superior to winning
> a piece at 3-ply. But the 8-ply analysis is essentially irrelevant to the
> game because the kids are not able to calculate that deeply.
But you can't determine which is the stronger
player by adjusting to their weaknesses. You
must remain objective, unbiased. (This seems
to be why game results are used, rather than
any voting on the quality of play). No matter
how weak or how strong, we ought to take the
results straight, with no sugar-coating.
If we wish to do a purely subjective analysis, that
is another matter.
> > I will say that I do not use Crafty for day-to-day analysis so I don't have an
> > opinion other than that you need to remember in ELO that the difference
> > between 2500 and 2800 is vast, and the difference between 2800 and ~ 3100 is
> > as vast. It is not 10% better, it is closer to think of it as TWICE as good.
> > Or more likely to win MOST of the time. It is a HUGE difference.
Yeah, yeah -- that's what they WANT us to believe!
But we all know that in that game where world champion
Kramnik allowed mate-in-one on himself, not one of us
would have been so daft. (Don't take my word for it -- go
to GetClub and look at my games. Not ONE overlooked
mate on the move.) And in the match where Deeper Blue
defeated GM Kasparov, which ought to have put it in the
vicinity of almost 3100, it still made daft errors, now and
then. One game saw the computer recklessly leave its
king wide open to perpetual check while winning, and another
showed the notorious horizon effect resulting in the simple
giveaway of a free pawn (and with it, the game).
IMO, in order to more accurately visualize what we
think of as perfect chess, we need to set the bar well
above the 3100 mark -- perhaps 4 or 5 thousand will
do *for now*.
And in terms of ratings, the difference between
2800 and 2500 is 300 points -- precisely the same as
between 1800 and 1500. The real difference here is
not in the vastness of the gap, but in the difficulty of
getting from point A (2500) to point B (2800). It's a
bit like climbing Mt. Everest, whereas going from
1500 to 1800 is more like climbing a tree and then
jumping over to the rooftop while barefoot.
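The point that 300 points is 300 points at any level can be checked against the Elo model itself. In its commonly used logistic form the expected score is E = 1/(1 + 10^(-D/400)), which depends only on the rating difference D. The sketch below (the function name is ours) shows that a 300-point gap means roughly an 85% expected score for the stronger side, whether at 2800 vs 2500 or 1800 vs 1500.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B (logistic Elo model)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Only the difference matters: both lines print the same value (about 0.849).
print(round(expected_score(2800, 2500), 3))
print(round(expected_score(1800, 1500), 3))
```

So "TWICE as good" undersells it in one sense: at a 300-point gap the stronger side's expected score is roughly 5.6 times the weaker side's. It does not, however, say anything about how hard it is to climb those 300 points at different levels.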
IMO, the authors I saw unwisely sacrificed quality
of analysis for the sake of repeatability, which merits
the term pseudo-science.
-- help bot
For someone who is attempting to live in the "truth" here: instead of
stating stuff like "the late middle game", how about, just for giggles,
telling us how many *peta*bytes you think that will take?
> I will try not to laugh too hard.
As a reward for having refrained from laughing, the
elders have decided to award you "a win" of this thread.
You are now free to go to other threads and brag
about having "won" this one. If you can win just two
more threads before the deadline, you can cash them
in for a shot at what's behind door #1, #2, or #3.
(Unfortunately, what is behind these doors is just
more of the same drivel you already found here.)
-- help bot
Woo Hoo!
> (Is there a FAQ?)
No, there is no FAQ. In fact, this is perhaps the
single most often asked question, and as such it
is the very first one answered in the FAQ.
> I think this is important when looking at things like the computer
> rankings so you can understand how measurably stronger than the field
> Rybka is. And how far behind Crafty is.
As I have noticed over the years, standings on the
computer rating list *change* over time.
For instance, at one time there was a big difference
among chess programs from, say, 1980, whereas now
all such programs are "compressed" near the bottom
of the current list. Old magazine ads might list a
Mephisto at 2200 and a Fidelity at 1800, while now
you could find both programs, having been beaten to
a pulp by their successors, scrunched together at
say 1900 and 1650. By the same token, I would
expect Rybka's now substantial lead to begin, before
long, to evaporate slowly, once another program comes
along which can draw or beat it.
The computer vs. computer rating lists have certain
advantages, but they also provide little in the way of
information as to how well a program would handle
humans, relative to one another. The one thing we
can be sure of is that if Rybka can squash all the
other top programs, it simply cannot be weak.
-- help bot
>
> The computer vs. computer rating lists have certain
> advantages, but they also provide little in the way of
> information as to how well a program would handle
> humans, relative to one another. The one thing we
> can be sure of is that if Rybka can squash all the
> other top programs, it simply cannot be weak.
All true. But it will be even harder to find humans to fight, and all
we know is that the computer program that tied Kramnik loses to
Rybka.
Rybka has clearly drawn a new line in the sand, and I am sure that the
major engine designers have a new bar to overcome.
But I gotta say, the whole UCI concept means that I can do it all in
the interface that I like, which for me is the Chessbase one. So I
guess they all win, at least from me.
> >> What do you mean by `percentage blunder rate'? The proportion of
> >> the time that the GM plays a move that the engine thinks is, say,
> >> more than one pawn worse than the best move?
>
> > That would probably do as a rough working definition. The search
> > depth or time might also need to be specified.
>
> Sure.
And in fact the graph for %blunder rate for every player is in the
original article.
http://www.chessbase.com/newsdetail.asp?newsid=3455
That its shape broadly correlates with the rms error graph of the
players lends credence to the possibility that Crafty might have been
adequate for the task. And to be fair to the authors they did say that
others with access to the internals of stronger engines should repeat
their tests to see how they compare.
I would have liked to see the rms error graph with blunders excluded.
That might have shed some more light.
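Using the rough working definition from this thread (a blunder is a move the engine scores more than one pawn worse than its best move), a sketch of both measures, plus the rms error with blunders excluded, looks like this. The per-move losses are assumed to be already computed as best eval minus played eval, in centipawns; the 100cp threshold and the sample data are illustrative assumptions, not values from the paper.

```python
import math

BLUNDER_CP = 100  # "more than one pawn worse than the best move" (assumed)


def rms(xs):
    """Root-mean-square of a list of numbers (0.0 for an empty list)."""
    return math.sqrt(sum(x * x for x in xs) / len(xs)) if xs else 0.0


def move_stats(losses_cp):
    """Return (blunder_rate, rms_all, rms_without_blunders) for per-move losses."""
    blunders = [x for x in losses_cp if x > BLUNDER_CP]
    rest = [x for x in losses_cp if x <= BLUNDER_CP]
    return len(blunders) / len(losses_cp), rms(losses_cp), rms(rest)


# Hypothetical losses for six moves: one 250cp blunder, otherwise small errors.
losses = [0, 0, 10, 20, 0, 250]
print(move_stats(losses))
```

In this toy sample the single blunder pushes the overall rms above 100cp while the rms of the other five moves is 10cp, which is why excluding blunders could separate "accurate but occasionally careless" from "consistently slightly inaccurate".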
>
> >> How does that make a difference?
>
> > Unforced tactical errors play their part in the outcome of games.
> > And these are precisely the sorts of thing that computer chess
> > engines are very good at spotting. Subtle long term structural games
> > are much harder for them to score.
>
> So you're suggesting that ``Player X makes a one-pawn blunder in n% of
> games'' is a better measure than ``Player X, on average scores n cp
> lower per move.'' That does sound like a reasonable statement, though
> I do worry that sacrifices of pawns are relatively common and might
> still be mis-evaluated quite often. Kasparov used to sacrifice a pawn
> for long-term initiative faster than you can say, ``My computer thinks
> that's a pretty dodgy move.'' :-)
Although that may be true, if the program is analysing in blunder-check
mode or classical analysis mode, it will know the outcome of the
principal variation actually played as well as that of its own
hypothetical better move(s).
> Do you have any guess (or, shock!, data) on how often errors occur in
> WC games that an engine (given reasonable time) would score down by
> say 100cp?
It is in the paper referred to by this thread. Sadly the link to the
original article is broken. Anyone have a full copy?
Capablanca maintained a blunder rate of 0.01% (1 blunder in every
10000 moves) and the worst performer was Steinitz at 0.054% (a blunder
roughly every 2000 moves). These are interesting numbers and
right at the limits of human error rates for purely trivial mechanical
tasks like punch-key data entry. It is quite astonishing how low these
are!
So as a rough guide, if the average game lasts 40-50 moves (80-100
player actions), less than 4% of them will have their final outcome
determined by a blunder at GM level. So the other 96% of cases clearly
need study.
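The per-game figure follows from the per-move rates if we assume each move blunders independently with the same probability (a simplification): the chance of at least one blunder in n moves is 1 - (1 - p)^n. The sketch below plugs in the two rates quoted above for 80 and 100 moves; it gives roughly 1% per game at the lower rate and roughly 4 to 5% at the higher one.

```python
def p_game_has_blunder(rate_per_move, moves_per_game):
    """Chance of at least one blunder in a game, assuming independent moves."""
    return 1.0 - (1.0 - rate_per_move) ** moves_per_game


capa = 0.0001       # 0.01% per move, as quoted above
steinitz = 0.00054  # 0.054% per move, as quoted above
for rate in (capa, steinitz):
    for n in (80, 100):
        print(f"rate={rate:.4%} moves={n}: {p_game_has_blunder(rate, n):.2%}")
```

For rates this small the result is close to the simple product n * p, which is the back-of-envelope version of the same estimate.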
It would be interesting to know if the downward march of error rate
with time in GM level play is actually due to improved training
methods or sparring against computers which always seize on any minor
tactical error. There look to be clusters of players from other eras
with similar error rates (and a few notable exceptions).
To put it into perspective, commercial programming has an effective
error rate around 1-2% (and in some shops >10% is not unknown). That is
much greater than the 0.2% achievable by the best formal development
methods. But competitive GM-level chess is more than an order of
magnitude more accurate still.
Human error rates for various tasks are online at: http://panko.cba.hawaii.edu/HumanErr/
Regards,
Martin Brown
There is quite often a systematic difference between the absolute
value of the evaluation function of different engines on a given
position, but the relative difference between alternate continuation
moves is what really matters as far as the decision making goes. Yes a
stronger engine will find new resources, but provided the weaker
engine is given sufficient time it can still make useful insights into
a game. That is my main worry with Crafty here - it doesn't always see
far enough into the future because its tree pruning is a lot more
conservative than Shredder or Rybka.
Usually moves where engines' evaluations radically disagree are well
worth investigating to see why.
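The absolute-versus-relative point can be shown in a few lines: if one engine's evaluations are simply another's shifted by a constant, the loss of the played move (best eval minus played eval) is unchanged. The engine data below is invented for illustration, and `move_losses` is a hypothetical helper. Real engines also disagree on the *ordering* of moves, which no offset correction removes, and that is the disagreement worth investigating.

```python
def move_losses(evals_by_position, played_moves):
    """Per-move loss in cp: best eval minus eval of the move actually played.

    evals_by_position: list of {move: eval_cp} dicts, from the mover's view.
    played_moves: the move actually chosen in each position.
    """
    return [max(evals.values()) - evals[played]
            for evals, played in zip(evals_by_position, played_moves)]


# Hypothetical: engine B scores every move 40cp higher than engine A.
engine_a = [{"Nf3": 30, "d4": 25, "g4": -60}]
engine_b = [{move: cp + 40 for move, cp in engine_a[0].items()}]
played = ["d4"]
print(move_losses(engine_a, played), move_losses(engine_b, played))  # [5] [5]
```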
> That alone should provide enough of a question as to the results here.
> The fact is that we don't know when the engines will be strong enough to
> represent the "truth".
There may not be a "the truth" to be found... only successive
approximations to it given our computational limitations. Like
peeling an onion, each time you make the engines an order of magnitude
more powerful or add enhanced heuristics you allow deeper searching of
the game tree, which may alter the outcome. However, we have now crossed
the point where the best computer programs are demonstrably better at
match play than humans. A computer aided by a human in freestyle mode,
where the blunder rate from human error is essentially nil, is
stronger still.
Working back from the tablebases, where absolute knowledge and theorem
proof are possible, may allow some further progress, but the storage
requirements and intense computational effort needed even for the
important 7-man tablebases are so great that they are only likely to be
done in a research lab. Having said that, in 2 decades the size of
removable storage has gone from 360KB to 2GB (5000x) and consumer-grade
hard disks from 10MB to 1TB (100000x). If this trend continues,
then affordable petabyte storage might well be available by 2030.
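The extrapolation can be made explicit by assuming the hard-disk growth rate stays constant (a big assumption): 100000x over 20 years is a factor of 10^0.25, about 1.78, per year, so another 1000x (1TB to 1PB) takes log(1000)/log(1.78), or about 12 years from whenever 1TB disks are current. `years_to_reach` is a hypothetical helper written for this estimate.

```python
import math


def years_to_reach(current, target, growth_factor, over_years):
    """Years for capacity to grow from current to target, assuming it keeps
    growing by growth_factor every over_years years (constant exponential rate)."""
    annual = growth_factor ** (1.0 / over_years)
    return math.log(target / current) / math.log(annual)


# Hard disks: 10MB -> 1TB is 100000x in about 20 years (figures from the post).
# Time to go from 1TB (1e12 bytes) to 1PB (1e15 bytes) at the same rate:
print(years_to_reach(1e12, 1e15, 100000, 20))  # about 12 years
```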
>
> I will say that I do not use Crafty for day-to-day analysis so I don't
> have an opinion other than that you need to remember in ELO that the
> difference between 2500 and 2800 is vast, and the difference between
> 2800 and ~ 3100 is as vast. It is not 10% better, it is closer to think
> of it as TWICE as good. Or more likely to win MOST of the time. It is
> a HUGE difference.
You should also note that in engine vs engine games there is a
tendency for the strongest commercial engines to include a few tricks
in their prepared opening book that exploit known weaknesses in other
engines. This makes it a bit unfair to older engines that are not
heavily maintained and prepared for engine-engine matches.
Regards,
Martin Brown
This is a worry, but you really need to play with Rybka some, to
understand what I am saying. You need to follow several games with
Rybka and your engine of choice. You will find numerous positions
where the programs will disagree violently (over 100cp) over the
favorite moves. (Realize that many times it is not that different).
It is precisely that difference where "strength" lies. Different
engines simply do not come up with the same moves given enough time.
Rybka seems to dramatically show that, and it is dramatically stronger.
>> That alone should provide enough of a question as to the results here.
>> The fact is that we don't know when the engines will be strong enough to
>> represent the "truth".
>
> There may not be a "the truth" to be found...only successive
> approximations to it given our computational limitations. Like
> peeling an onion each time you make the engines an order of magnitude
> more powerful or add enhanced heuristics you allow deeper searching of
> the game tree that may alter the outcome. However, we have now crossed
> the point where the best computer programs are demonstrably better at
> match play than humans. Computer aided by a human in freestyle mode
> and where the blunder rate from human error is essentially nil is
> stronger still.
Thank you. That is my point. But you are trying to determine some sort
of "truth" by comparing to Crafty's hobbled play.
> Working back from the tablebases where absolute knowledge and theorem
> proof is possible may allow some further progress, but the storage
> requirements and intense computational effort needed even for the
> important 7 men tablebases is so great that it is only likely to be
> done in a research lab. Having said that in 2 decades the size of
> removable storage has gone from 360kb to 2GB (5000x) and consumer
> grade hard disks from 10MB to 1TB (100000x). If this trend continues
> then affordable PetaByte storage might well be available by 2030.
Don't let physics get in the way. We will probably be storing stuff
in the strings by then!
>> I will say that I do not use Crafty for day-to-day analysis so I don't
>> have an opinion other than that you need to remember in ELO that the
>> difference between 2500 and 2800 is vast, and the difference between
>> 2800 and ~ 3100 is as vast. It is not 10% better, it is closer to think
>> of it as TWICE as good. Or more likely to win MOST of the time. It is
>> a HUGE difference.
>
> You should also note that in engine vs engine games there is a
> tendency for the strongest commercial engines to include a few tricks
> in their prepared opening book that exploit known weaknesses in other
> engines. This makes it a bit unfair to older engines that are not
> heavily maintained and prepared for engine-engine matches.
Yes, and you can take the exact same opening book and have the same
issue between these engines. The interesting thing about Rybka is that it
really is *that* strong.
Ultimately the point is that they could have made their measurements in
the client rather than the engine. They could have modified any sort
of client with source code available to do this. Then they could have
used multiple engines to ask their questions.
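If several engines were asked the same questions, one simple way to see whether the choice of engine matters is to compare the player rankings they produce with a rank correlation. The sketch below uses Spearman's formula for orderings without ties; the player labels and orderings are placeholders, not results from any engine.

```python
def spearman(rank_a, rank_b):
    """Spearman rank correlation for two orderings of the same items (no ties).

    rank_a, rank_b: lists of item names, best first.
    Returns 1.0 for identical orderings and -1.0 for fully reversed ones.
    """
    n = len(rank_a)
    pos_b = {item: i for i, item in enumerate(rank_b)}
    d2 = sum((i - pos_b[item]) ** 2 for i, item in enumerate(rank_a))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Placeholder orderings from two hypothetical engines.
engine_1 = ["Player A", "Player B", "Player C", "Player D"]
engine_2 = ["Player A", "Player B", "Player D", "Player C"]
print(spearman(engine_1, engine_2))
```

A value near 1.0 across many engine pairs would suggest the ranking is not an artifact of one engine's style; low values would support the single-engine bias argument.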
You seem to misunderstand what is being said. I didn't
say that 12-ply was too strong, but that the notion that
human strength correlates with deeper analysis is something
that should be proved, not asserted.
As to whether 12-ply + quiescence analysis was sufficient
to give meaningful results, the authors have addressed this
point, though not fully. (They're certainly more credible than
historical ELO, in any event.) One thing they did was to
show that analyzing the games of stronger programs
gave small errors - smaller than those of the humans. However,
there are certainly things they could have done but didn't: showing
results for plies less than 12, doing some analyses at higher ply,
showing that the analysis gives sensible answers for weaker players
whose ratings are known to a high degree of accuracy. There is
only a brief discussion of the correlation of the selected measure
of move quality with results, but that is really *the* key
connection that has to be established.
An interesting first step.
>
> An interesting first step.
You know, others apparently may disagree. But I do agree with you here.
Even if the study had apparent flaws, and questions.
It is, fundamentally, an interesting first step.
> Even if all this were done, it would still be a simple
> matter to skewer the idea of determining "the greatest
> player of all time" in this manner. I could easily
> produce an example where playing a bad move was
> not only intentional, but necessary in order to win.
> Where a "blunder" is a thousand times more effective
> than the "best" move. Where the room is filled with
> gossip about a certain player having allowed his
> inferior opponent a certain-lock on a draw, but
> where they all have to re-figure the pairings when the
> actual result is posted.
>
> -- star bot
Are you talking about grandmaster draws, and how (as Fischer
complained) the Soviets would prearrange who would win? If so, you
make a good point.
Thanks for your input star bot. What is your rating?
BTW I am a true patzer. Today, against my PC (Fritz on a Pentium), I
was up a WHOLE ROOK!!! (I trapped the black queen and won it for a
rook) and I STILL LOST!!!
I can't believe it. I got so overconfident I drifted, got frustrated,
and eventually lost (I was actually down 2 pawns, but out of disgust I
resigned).
Ray (Woodpusher) Lopez
> I am not convinced that scoring human GMs by how closely their play
> resembles any particular named chess engine has merit.
Premise #1
> Perhaps ranking
> them by percentage blunder rate might be meaningful though (and well
> within the capability of any good chess engine). It is surprising how
> effective blunder check can be even on GM level games given sufficient
> time.
Premise #2
Logically, Premise #1 =~= Premise #2. So, are you for or against the
proposal made by the authors of the original paper that started this
thread? Seems that you are both for and against. Please take a
stand.
RL
But not better things to do than reply to my Usenet posts. Hahaha!
RL
"Moves are different" says Johnny. This is irrelevant.
Normalization, again. Normalization. What you were not as a child.
>
> Simply they had another option, if they wanted to change the tool. They
> could have done the same thing, but instead of manipulating crafty, they
> could have manipulated say Arena or Winboard. Then the questions could
> have been asked to Crafty, Fruit, Toga, Shredder, Rybka, and other VERY
> strong UCI engines. The Differences between the engines in style and
> strength and the world champions would have made for a much more
> interesting set of answers, and would have killed this argument before
> it started.
Ponder this: if Crafty, Fruit, Toga, Shredder, Rybka, and other VERY
strong UCI engines *ALL* rate the best players ever as Capa, Kramnik,
Karpov, Kasparov, *in that order* (cuz 'puters don't lie!), what does
that say?
If Fruit, Toga, Shredder, Rybka *ALL* rate the very best players as
"Capa, Kramnik, Kasparov, Karpov, *in that order*", but Crafty puts
Karpov before Kasparov, does this mean that the top two players are
indeed Capa and Kramnik?
If, like another poster says, Tal won a match by playing complications
(but lost the rematch BTW), does this make Tal the 'best' player at
that point, or, over his career, or, do we analyse ALL his games,
including games played when he first learned chess, or do we just count
the 20 games where he became champion? Some players can "peak", but
should not the test be over the player's career (like Lasker)?
If Gata Kamsky becomes champion by spiking Nigel Short's orange juice,
does that make Kamsky the better player, because he used "shock"
tactics? Are Tal's 'unsound' sacrifices good chess, or shock tactics?
Was "Fischer Fear", where 6 great players all choked and got
whitewashed in Fischer's championship run, a true test of Fischer?
Or just good players choking?
Long story short: we need further research, to the extent anybody is
motivated to do it (since let's face it, computer chess outside of
writing chess programs for the masses is not exactly the best funded
field anymore), but at the moment, the tentative research shows:
!!! Greatest chess players ever? Capa, Kramnik, Karpov, Kasparov, *in
that order* (cuz 'puters don't lie!) !!!
What part of that sentence did you not understand?
RL
> If, like another poster says, Tal won a match by playing complications
> (but lost the rematch BTW), does this make Tal the 'best' player at
> that point, or, over his career, or, do we analyse ALL his games,
> including game played when he first learned chess, or do we just count
> the 20 games where he became champion? Some players can "peak", but
> should not the test be over the player's career (like Lasker)?
It seems unfair to penalize a player who continues to play once his
skills have deteriorated.
As for what we should test, it depends on what you want to know. If you
simply want to know who was the strongest at his peak, it's rather
silly to include games which weren't at that peak, and I'm not sure that
saying Capa was better as a 14-year-old than Tal was means anything at
all when talking about their strength as champions.
My point, however, was that you can test your hypothesis about how
meaningful these results are by looking at small samples where we can
conclusively say one player played better, such as Tal-Botvinnik 1.
You could do the same thing in T-B 2, of course, but since I suspect the
program's bias will be towards Botvinnik in both matches, it doesn't
really tell you anything interesting.
The point was there were very specific numbers in there. Very specific
numbers are problematic. It wasn't just comparative results. And there
are reasons to suspect those results.
As I said elsewhere I think it is an interesting first try.
As I said elsewhere, they can get further by modifying the client rather
than the engine.
And I have no idea what the heck you were trying to say. Maybe I am
thrown off the train with phrases that start with "cuz".
Fact: The various named chess engines produce significantly different
move rankings in key positions. This is already well studied in the
literature. See, for example, this study of Fritz 8 vs Junior 9, which
represent two extremes, online at
http://www.dcs.bbk.ac.uk/~mark/download/fritz_junior_icga.pdf
The world's greatest ever chess player rankings should not be a
function of the engine they are compared against!
Who is to say which of Crafty, Shredder, Junior, Fritz, Rybka, Fruit,
etc. is the closest approximation to optimum GM play? I suspect
Crafty was only marginally adequate for this test, but looking at the
apparent correlation of the blunder rates with overall rms player
error in the original paper I think they do have a point. It will be
interesting to see what happens if/when the test is repeated with
other engines and a hefty search depth.
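For readers wondering what "overall rms player error" means in practice, here is a minimal Python sketch, assuming each move has already been reduced to a centipawn loss by some engine (the rms_error helper and the numbers are made up for illustration, not taken from the paper):

```python
import math
from statistics import mean


def rms_error(losses_cp):
    """Root-mean-square of per-move centipawn losses.

    Each loss is assumed to be (engine's eval of its best move) minus
    (engine's eval of the move actually played), from the mover's side,
    so every value is >= 0.
    """
    if not losses_cp:
        raise ValueError("need at least one scored move")
    return math.sqrt(sum(x * x for x in losses_cp) / len(losses_cp))


# Made-up losses for one game: four accurate moves and one blunder.
losses = [0, 10, 0, 30, 200]
print(f"mean loss: {mean(losses):.1f} cp")       # mean loss: 48.0 cp
print(f"rms loss:  {rms_error(losses):.1f} cp")  # rms loss:  90.6 cp
```

With these made-up numbers the single 200cp mistake nearly doubles the RMS figure relative to the plain mean, which may be part of why an RMS measure tracks blunder rates so closely.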
My point here is that comparing them to a single engine produces an
inherent systematic bias in favour of players with a style similar to
the specific named engine and in this case at a rather limited ply
depth. I reckon that by about ply 22* with extensions any of the top
engines would be able to annotate GM level games authoritatively
(though some would take much longer than others to do it). *The main
exceptions are in nasty endgame transitional positions with active
high mobility pieces but well out of range of the tablebases where
even the top engines can still get lost.
>
> > Perhaps ranking
> > them by percentage blunder rate might be meaningful though (and well
> > within the capability of any good chess engine). It is surprising how
> > effective blunder check can be even on GM level games given sufficient
> > time.
What I am saying here is that the detection of GM *blunders* is well
within the capacity of any of the half decent chess engines and that
these results are unambiguous. The problem with this is that the
unforced error rate of top players is very low so this factor only
determines the outcome of a small percentage of games.
The most notable recent example was when Kramnik overlooked a mate in
one against Deep Fritz in game 2, an otherwise drawn game.
http://www.chessbase.com/newsdetail.asp?newsid=3514
>
> Premise #2
>
> Logically, Premise #1 =~= Premise #2.
You must have pretty dumb "logic" if you cannot see the difference
between detecting *blunders* in games and scoring players according to
much smaller deviations from the engine's preferred "best" line. Even
more so when the engine was not allowed sufficient time to look deep
enough to match or exceed GM strength play.
> So, are you for or against the
> proposal made by the authors of the original paper that started this
> thread? Seems that you are both for and against. Please take a
> stand.
I believe that where the engine evaluation has sufficient signal to
noise to make a clear call on the best move being different to the GM
choice the methodology will work just fine. However, there are a lot
of positions in most games where the continuation lines are too close
to call even with the current crop of state of the art engines.
Although the paper claims that these will average out, the
systematic bias in favour of playing like the specific named engine
will not.
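Martin's "clear call" criterion could be sketched like this, assuming the engine reports a score both for its own choice and for the move actually played. The ScoredPosition type and the 50cp margin are hypothetical choices for illustration:

```python
from dataclasses import dataclass


@dataclass
class ScoredPosition:
    best_cp: int    # engine's eval of its own preferred move (mover's view)
    played_cp: int  # engine's eval of the move the GM actually played


def clear_calls(positions, margin_cp=50):
    """Keep only positions where the engine clearly prefers another move.

    If the played move is within margin_cp of the engine's best, the
    position is treated as too close to call and dropped from scoring.
    """
    return [p for p in positions if p.best_cp - p.played_cp > margin_cp]


positions = [
    ScoredPosition(85, 75),   # 10cp gap: too close to call
    ScoredPosition(85, 5),    # 80cp gap: clear call
    ScoredPosition(30, 30),   # GM played the engine's move
    ScoredPosition(120, 0),   # 120cp gap: clear call
]
kept = clear_calls(positions)
print(f"{len(kept)} of {len(positions)} positions are clear calls")
# 2 of 4 positions are clear calls
```

Note that this only drops the noisy positions; every score still comes from one engine, so it does nothing about the single-engine bias described above.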
Regards,
Martin Brown
It especially doesn't see far enough into the future if it's been set
to search to only twelve ply (or whatever depth it was that the
researchers chose).
> Like peeling an onion each time you make the engines an order of
> magnitude more powerful or add enhanced heuristics you allow deeper
> searching of the game tree that may alter the outcome.
I'm not sure exactly how this is `like peeling an onion' but that
doesn't really seem to be important. :-)
Dave.
I assumed the tricks Martin was talking about were of the form of
steering the game into positions where other engines do badly once the
book is exhausted. As such, it doesn't help if the other engine is
using the same book.
Dave.
No shit, Sherlock!
> For instance, at one time there was a big difference between chess
> programs from say, 1980, where now all such programs are
> "compressed" near the bottom of the current list. Old magazine ads
> might list a Mephisto at 2200, and a Fidelity at 1800, while now you
> could find both programs having been beaten to a pulp by their
> successors, scrunched together at say 1900 and 1650.
You seem to be making the mistake of assuming that `2200' means some
fixed level of strength. (Otherwise, it would be entirely
unremarkable that a program that formerly scored 2200 now scores
1900.) Ratings do not measure strength.
Dave.
> Perhaps you just don't understand what we are getting at...
>
> In this study they crippled Crafty down to 12 ply. Now what we
> are trying to get at is that many (if not all of them)
> of the world champions saw well
> beyond 12 ply. Crafty would mark these moves as Errors, where
> a stronger program would note that they were not. This is due to
> the playing styles of each of the world champions. Tal and
> Alekhine were calculating machines and would often complicate
> a position... The crippled crafty would simply mark these as
> errors.
Two people who often post in these newsgroups [both with PhDs] are now
almost ready to publicly produce their reviews of the new MAMS book, How to
Fool Fritz, which addresses many of these subjects.
-----------
>> A larger question looms from this thread: have you people not learned
>> anything after nearly a generation of computer chess? That the 'puter
>> is never wrong?
>
> lol but different programs are definitely wrong. I have used
> several programs to examine my games from Chessmaster 10k,
> Fritz 7, Fritz 8 and Rybka. It's amazing to see the difference
> in what each program believes is 'right'. Guess which one
> is closest to choosing the 'right' answer? I put my money on the
> strongest program.
Although MAMS concentrates on Fritz (so that at least the same evaluation
can be produced uniformly throughout the book), the commentary makes clear
that any software program can be evaluated this way.
But the thesis of the book is much as Inconnux suggests above re Alekhine
and Tal: that software evaluation often provides a lousy guide to complex
positional situations. In fact, it is usually 'blind' to evaluating their
respective worth even /within/ its search depth.
Sometimes, by overriding the computer's move just a few times [or even
once] during the game, a radically different evaluation shows up just a
few moves later. These interventions are usually positional, and somehow
Fritz can't find the same line itself, but after being shown it, it is
quite capable of carrying on and winning the game.
An end-note is that author Alberts suggests various means of successfully
playing against Fritz's blindness to positional evaluation.
Cordially, Phil Innes
I never really understand that comment. That is, (a) what in your opinion
does measure strength, and (b) what do ratings measure?
Phil Innes
Chess One (Phil Innes) wrote:
>I never really understand that comment.
That's pretty much true of *every* comment. When, as is true in
your case, someone values stroking their own inflated ego higher
than understanding or learning, they remain in willful ignorance.
Thanks for this citation. This paper, whose abstract and a key
paragraph I reproduce below, does not support anything of relevance to
this thread. All it shows is that the various chess engines produce
different move rankings. What is key, which the paper does not
address, and which I intuitively surmise (having played various
engines over the years) is whether the top few computer generated
moves, when closely ranked together, are indeed the best moves in any
given position. That is to say, whether these moves lead to winning
positions (unless counteracted by another move of course). This is
the key, not whether BxN or NxN is ranked first or second.
>
> The world's greatest ever chess player rankings should not be a
> function of the engine they are compared against!
>
But they are not. NORMALIZATION, I repeat. What this means is that
you use ONE program, and you set the Rand() function seed to zero, so
that the SAME repeatable move selection algorithm is used to rate
every human player. So you will never have players ranked as a
function of the engine, except for the trivial example where the
engine goes through an evaluation to determine whether a given move is
sound. For that matter, you can employ a human to mechanically go
through an algorithm, if you fear computers. Further, if you object
to an algorithm being used to determine chess moves, then say so, and
be laughed at, given that microprocessors and chess software have
shown they can beat or draw the best human in a match.
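RL's "set the seed" point amounts to making the scoring run deterministic. A minimal sketch, assuming the engine exposes a score per candidate move (the pick_move helper and the scores are hypothetical, not any engine's real interface):

```python
import random


def pick_move(scored_moves, seed=0):
    """Return the top-scoring move, breaking ties with a seeded RNG.

    scored_moves maps a move (e.g. "e4") to a centipawn score. Sorting the
    tied moves first means the result doesn't depend on dict order, and a
    fixed seed makes the tie-break identical on every run.
    """
    best = max(scored_moves.values())
    tied = sorted(m for m, s in scored_moves.items() if s == best)
    return random.Random(seed).choice(tied)


moves = {"Nf3": 25, "e4": 25, "d4": 25, "a3": -10}
print(pick_move(moves) == pick_move(moves))  # True
```

This makes one engine's verdicts repeatable from run to run; on its own it doesn't show that a different engine would reach the same verdicts.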
> Who is to say which of Crafty, Shredder, Junior, Fritz, Rybka, Fruit
> etc etc is the closest approximation to optimum GM play.
That's not the test, whether PCs play close to GMs. In fact, it's
well known that computers play differently from humans. And again,
normalization means it doesn't matter which of these engines is used,
since largely they all use the famous alpha-beta and minimax
algorithms, with pruning, and evaluation functions that score
positions with open files, central pawn rollers, good bishops, pins
and the like more highly than the opposite.
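For anyone unfamiliar with the algorithms RL names, here is a toy Python version of textbook minimax with alpha-beta pruning over a hand-built tree. It is not any engine's actual code; real engines add move ordering, transposition tables, quiescence search and a far richer evaluation function:

```python
import math


def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Minimax with alpha-beta pruning over a toy game tree.

    A node is either a number (a static evaluation, from the maximizing
    side's point of view) or a list of child nodes.
    """
    if not isinstance(node, list):
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizing side already has a better option elsewhere
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizing side already has a better option elsewhere
    return value


# Max to move, three candidate moves, each with the opponent's replies.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree))  # 3
```

In this tree the second and third candidate moves are each refuted by their first low reply, so their remaining replies are never examined; that skipping is the "pruning".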
> I suspect
> Crafty was only marginally adequate for this test, but looking at the
> apparent correlation of the blunder rates with overall rms player
> error in the original paper I think they do have a point. It will be
> interesting to see what happens if/when the test is repeated with
> other engines and a hefty search depth.
Yes, that would be interesting, but it may not change the rankings
much, see the above.
>
> My point here is that comparing them to a single engine produces an
> inherent systematic bias in favour of players with a style similar to
> the specific named engine and in this case at a rather limited ply
> depth.
Unsupported by any evidence. This is your intuition, and my intuition
says the opposite (see the above). Of course in Chessmaster xxxx you
can set the parameters slightly differently so the 'puter 'plays like'
Capa, or Fischer, or Petrosian, but at the end of the day, if you use
the same parameters to rate all humans, you will not differ much in
the ranking of their play, I intuit, since chess is largely tactics.
> I reckon that by about ply 22* with extensions any of the top
> engines would be able to annotate GM level games authoritatively
> (though some would take much longer than others to do it). *The main
> exceptions are in nasty endgame transitional positions with active
> high mobility pieces but well out of range of the tablebases where
> even the top engines can still get lost.
>
Maybe. I think they are already at 15 ply or so, without much
pruning, no? But this is immaterial to this thread. I've noticed
that at five seconds Fritz largely scores the winning moves (top 3)
the same as at 30 seconds, and probably (never had time to test this)
the same at 180 seconds. Of course certain positions are exceptions,
that readers of this thread will gleefully point to, but these are
rare exceptions, not the rule.
>
>
> > > Perhaps ranking
> > > them by percentage blunder rate might be meaningful though (and well
> > > within the capability of any good chess engine). It is surprising how
> > > effective blunder check can be even on GM level games given sufficient
> > > time.
>
> What I am saying here is that the detection of GM *blunders* is well
> within the capacity of any of the half decent chess engines and that
> these results are unambiguous. The problem with this is that the
> unforced error rate of top players is very low so this factor only
> determines the outcome of a small percentage of games.
Yes, you're right, I understood this point about blunders, agreed--
blunders are rare and rarely decide games. Thanks for the link to the
Hawaiian professor's site on error rates, which was interesting.
> You must have pretty dumb "logic" if you cannot see the difference
> between detecting *blunders* in games and scoring players according to
> much smaller deviations from the engine's preferred "best" line. Even
> more so when the engine was not allowed sufficient time to look deep
> enough to match or exceed GM strength play.
>
> > So, are you for or against the
> > proposal made by the authors of the original paper that started this
> > thread? Seems that you are both for and against. Please take a
> > stand.
>
> I believe that where the engine evaluation has sufficient signal to
> noise to make a clear call on the best move being different to the GM
> choice the methodology will work just fine.
Or where the top two or three moves are not substantially different but
all lead equally to winning positions. We agree.
> However, there are a lot
> of positions in most games where the continuation lines are too close
> to call even with the current crop of state of the art engines.
Yes, but think logically for once, Martin: if these continuation lines
are "too close to call", how much are the human GMs penalized by
Crafty? Answer: very little! Because the difference between what
the GM chose (let's say move 3 of Crafty's top three) and what
Crafty chose (the first move) is, by definition, very small. So
let's say in centipawns the 'best' Crafty move is +85, while the
actual GM move was rated +75, meaning a penalty of 10 centipawns is
applied. Overall, this is a trivial penalty, because the continuation
lines were too close to call. However, if the GM's chosen move rates
only +5, then Crafty rightly penalizes the GM a hefty 80 centipawns,
and clearly, based on the superior knowledge of the game that PCs have
shown, the GM is (usually) picking an inferior line.
Of course there are exceptions, most notably a positional sacrifice
(not a pseudosacrifice; I trust you know the difference) well beyond
the move horizon of the program, but these exceptions are rare in
chess (which is why they are so delightful when seen). BTW, on this
last point: my Pentium IV PC at 30 seconds a move is great at scoring
exchange sacs in the Sicilian, where in certain lines Black gives up
the QR for the QN at about even, showing that processors are indeed
not as bad as people think even at scoring positional sacrifices.
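RL's penalty arithmetic, written out as a minimal sketch. It assumes both numbers come from the same engine search and are from the mover's point of view; centipawn_penalty is a hypothetical helper, not Crafty's or the paper's code:

```python
def centipawn_penalty(best_cp, played_cp):
    """Penalty for the move played relative to the engine's preferred move.

    Both scores are the engine's evals from the mover's point of view.
    Clamped at zero in case the played move scores at least as well.
    """
    return max(0, best_cp - played_cp)


print(centipawn_penalty(85, 75))  # 10: too close to call, trivial penalty
print(centipawn_penalty(85, 5))   # 80: a real inaccuracy
```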
> Although the paper makes the claim that these will average out - the
> systematic bias in favour of playing like the specific named engine
> will not.
Normalization. Irrelevant, since the 'heart' of these engines is the
same and chess is primarily tactics.
RL
From the paper:
Anecdotal evidence exists that in many positions two distinct chess
engines will choose different moves and, moreover, that their top-n
ranking of move choices also differ. Here we set out to quantify this
difference, including the difference between move choices by chess
engines and those made by humans. For our analysis we used FRITZ 8 and
JUNIOR 9 as representative chess search engines and the POWERBOOK
opening book as representing human choices. We collected the top-5
ranked moves and their scores as reported by FRITZ and JUNIOR, after 15
and 30 minutes of thinking time, and the top-5 moves recorded in the
POWERBOOK, for the Nunn2 test positions and the initial board position.
The data analysis was carried out using several nonparametric measures,
including the amount of overlap in the top-5 choices of the engines and
their association as measured by three variants of Spearman's footrule.
Our preliminary results show that, overall, the engines differ
substantially in their choice of moves, and, furthermore, the engines'
choices also differ substantially from human choice.
The results confirm that, overall, the engines differ in their choice
of moves. Although the overlap in the top-5 move choices is about 3 on
average, the top-1 overlap is close to 0 and the top-2 overlap is close
to 1. The F, G, and M measures show that FRITZ and JUNIOR rank moves in
a different order, and when there is agreement, it is not necessarily in
the top-3 move choices. There is higher agreement between FRITZ's
ranking and that of humans than there is between JUNIOR's and humans'
rankings. Both FRITZ's and JUNIOR's rankings are stable over time, on
average, although there are still fluctuations in the rankings.
Furthermore, FRITZ's score difference between moves is slightly higher
than JUNIOR's, possibly indicating that FRITZ is 'more confident' in its
ranking than JUNIOR is. Finally, the average scores of moves per rank
are similar and decreasing with rank, and they indicate a small
advantage for White in the positions tested.
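The overlap and footrule measures the paper mentions can be illustrated with a short sketch. This assumes one common convention for comparing top-k lists (a move missing from a list counts as position k + 1); the paper's F, G and M variants may well be defined differently, and the move lists below are invented:

```python
def top_k_overlap(a, b, k):
    """Number of moves appearing in both lists' top k."""
    return len(set(a[:k]) & set(b[:k]))


def footrule_top_k(a, b, k):
    """Spearman's footrule distance between two top-k move lists.

    Convention assumed here: a move missing from one list is treated as
    sitting at position k + 1 in that list. 0 means identical rankings.
    """
    a, b = a[:k], b[:k]
    pos_a = {m: i + 1 for i, m in enumerate(a)}
    pos_b = {m: i + 1 for i, m in enumerate(b)}
    return sum(abs(pos_a.get(m, k + 1) - pos_b.get(m, k + 1))
               for m in set(a) | set(b))


# Made-up top-5 lists for one position (not data from the paper).
engine_1 = ["e4", "d4", "Nf3", "c4", "g3"]
engine_2 = ["d4", "c4", "e4", "Nf3", "b3"]
print(top_k_overlap(engine_1, engine_2, 1))   # 0
print(top_k_overlap(engine_1, engine_2, 5))   # 4
print(footrule_top_k(engine_1, engine_2, 5))  # 8
```

Here the two lists share four of five moves, yet disagree on the top choice, which is the same pattern the paper reports on average.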
Nothing.
> (b) what do ratings measure?
Performance.
We've been through this a hundred times in these groups. While I'm
prepared to explain it to newbies, I'm not doing it again for somebody
who's been here longer than I have.
Dave.
Troll off, nitwit! Address the subject, or wait! Do you even understand the
subject? Prove it! Pi
Fine. But though we have been through this so many times, perhaps the reason
is that the explanations aren't very convincing? And that's why people
continue to challenge it! :)
I don't understand terms which are undefined, since they can mean just about
anything. For example, 'performance' is a measure quantifiable by rating, and
isn't performance synonymous with strength, in that people use the terms
interchangeably?
Since Dave is perhaps exhausted from explaining the issue, can anyone else
actually state the difference between asking about the strength of a
player in respect of other players, and asking about the performance of
a player in respect of other players?
As I say, there may be some worthwhile distinction, and I do not want to
slight Dave, except to say that whatever the distinction is escapes me and
presumably the previous 100 people who have inquired.
Phil Innes
>> (b) what do ratings measure?
>
> Performance.
>
> We've been through this a hundred times in these groups. While I'm
> prepared to explain it to newbies, I'm not doing it again for somebody
> who's been here longer than I have.
Ratings measure win/loss predictability within the pool a player is rated
against. A side effect of rating is an *implied* strength based on that
rating.
It is difficult to assign a number for implied strength to two kinds of
players in the pool: those who lose the vast majority of their games,
and those who win the vast majority of their games.
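The difficulty with players who win (or lose) nearly everything falls straight out of the Elo formula. A minimal sketch using the logistic form of the expected-score curve; the function names are made up for illustration:

```python
import math


def expected_score(rating_a, rating_b):
    """Expected score (0..1) for A against B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def rating_gap_from_score(p):
    """Rating difference implied by a score fraction p, inverting the above."""
    if not 0 < p < 1:
        raise ValueError("a 0% or 100% score implies an unbounded rating gap")
    return -400 * math.log10(1 / p - 1)


for p in (0.5, 0.9, 0.99, 0.999):
    print(p, round(rating_gap_from_score(p)))
```

The implied gap grows without bound as the score approaches 100%: going from a 99% score to 99.9% adds roughly another 400 points, so a handful of extra results moves the implied strength a long way.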
When pools are small or closed to other populations, it is difficult to
correlate the numbers of one pool to another, especially in implied
strength, if not in predictability of win/loss ratio.
Occasional cross-checking between pools (like the Fritz 10/Kramnik match)
can help validate the implied strength of the numbers, so long as the
match was fair, the results were as predicted, and the match wasn't
entirely one-sided.
We have a couple of interesting candidates in the computer pool that
haven't been fully calibrated against the human pool: Rybka and Hydra.
But back to subject...
The problem here is that Crafty is well off the pace. There are
engines that soundly beat Crafty, apparently on pure strength and not
just tricks, and those engines are in turn beaten soundly by Rybka.
The other problem is the 100cp threshold for a blunder, which does not
appear to be enough.
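To see why the threshold matters, here is a minimal sketch of a blunder rate computed at several cut-offs. It assumes a "blunder" is simply a move losing more than threshold_cp centipawns by the engine's count, which may not match the paper's exact definition; the data are invented:

```python
def blunder_rate(losses_cp, threshold_cp=100):
    """Fraction of moves whose centipawn loss exceeds threshold_cp."""
    if not losses_cp:
        raise ValueError("need at least one scored move")
    return sum(1 for x in losses_cp if x > threshold_cp) / len(losses_cp)


# Made-up per-move losses for one player.
losses = [0, 0, 15, 40, 120, 0, 250, 5, 90, 0]
for threshold in (50, 100, 200):
    print(threshold, blunder_rate(losses, threshold))
# 50 0.3
# 100 0.2
# 200 0.1
```

Even on ten moves, moving the cut-off changes the rate by a factor of three, so the choice of 100cp is not a neutral detail.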
This means that the comparison raises an interesting set of questions,
but the objections to using a restrained version of Crafty are so severe
that, even if it was the only easy choice because it is open source,
they call into question the validity of its conclusions.
Clearly, there are other ways, within the researchers' hands, to control
the engine and ask these questions: ways that use engines closer to
world championship level than Crafty, and ways that let you use engines
of different styles.
Ultimately, however, you will always have to deal with the difficult
question of truth, which is only implied through results.
> It seems unfair to penalize a player who continues to play once his
> skills have deteriorated.
Another interesting point.
It means that you really want to see this as some sort of career
graph rather than just a final number.
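A career graph along those lines could start as simply as averaging per-move error by year. A minimal sketch, assuming per-move centipawn losses tagged with the year the game was played (the input format and numbers are hypothetical):

```python
from collections import defaultdict
from statistics import mean


def career_curve(moves):
    """Average centipawn loss per year, as a list sorted by year.

    moves is an iterable of (year, loss_cp) pairs.
    """
    by_year = defaultdict(list)
    for year, loss in moves:
        by_year[year].append(loss)
    return [(year, mean(losses)) for year, losses in sorted(by_year.items())]


# Made-up data: (year, centipawn loss) for individual moves.
data = [(1960, 20), (1960, 30), (1961, 40), (1961, 60), (1962, 35)]
for year, avg in career_curve(data):
    print(year, avg)
```

Plotting these points, rather than collapsing them into one average, is what would keep a late-career decline from dragging down a player's peak.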
> Who is to say which of Crafty, Shredder, Junior, Fritz, Rybka, Fruit
> etc etc is the closest approximation to optimum GM play.
This is probably too low a measure. "GM" play and World champion play
are usually not close (latest FIDE world knockout champions aside).