
Cray Blitz / ACM '94


Robert Hyatt

Jul 7, 1994, 1:00:47 PM

Many have asked what happened to us at the ACM event this year. (we
finished with 2.5 out of 5.0...) Rather than repeating this as email
dozens of times, I thought that I would give my thoughts on the tournament
as well as report on a really interesting problem we encountered.

First, we were not playing any worse than in previous years, except for
a couple of things that I'll discuss later. Everyone else is simply
getting better. Ten years ago, we were outsearching everyone but Belle,
and as a result we really only had to win one game in a tournament to
finish first: simply beat Belle. Now, everybody is searching deeply
enough that simple tactical crushes no longer happen very often (the Spector
game against us being an exception, where a single bad move in the opening
let the roof fall in...)

A couple have noted that we were running on a machine 1/4 the speed of the
fastest machine we have used. Our machine was a 4 processor Cray C90 with
a gigabyte of memory, rather than the normal 16 processors with 8 gigabytes
of memory. However, we found ourselves doing 9-10 ply searches anyway, so
the machine speed really had no noticeable effect. Yes, we might have gotten
another ply at times, but we don't think that this had any impact on our play
at all.

If you play over our games, the two we lost were very informative. In looking
at our evaluation procedures, the only piece that has practically no scoring
is the bishop. We have not found a reasonable way to handle mobility to
understand good/bad diagonals, good/bad bishops, etc. If you look at the
Zarkov game, both bishops were locked onto the queenside behind a wall of
pawns and almost became "tall pawns" themselves. Zarkov then played quite
well, moved its pieces to the kingside, where Cray Blitz could not use its
bishops for defense and won a well-played game. CB understands well that two
bishops are better than a bishop/knight or knight/knight, but it tried so hard
to preserve the bishop pair that after it did, they were useless...

In the wchess (Dave Kittinger's program) game, the same thing happened again.
CB wound up with the bishop pair again, but both bishops were on its second
rank with pawns diagonally in front of each bishop on *both* diagonals. Again,
the bishops became tall pawns and playing two pieces down was too much to
overcome.

The really sad thing was that CB "saw" itself losing both of these games at
least two or three moves *before* its opponents saw anything. In both games
it saw significant tactics that the opponents both missed, but, due to the
locked-in bishops, it could not avoid problems even with its deep searches.
(one variation I'm looking at right now is 27 plies deep, and not all of the
moves are checks and captures...) In short, its tactics were superb, but its positional
understanding had a small "chink" that its two opponents exploited in simply
superb fashion. I am finally rewriting Cray Blitz for the final time, using
the "bit board" approach which will be significantly faster since I believe that
I can get everything to vectorize much better on the Crays. Additionally, I
have discovered a way to handle piece mobility that is not computationally too
expensive to bear, which should show immediate improvement in the "great bishop
debacle of '94."
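As an aside, the mobility idea can be sketched with bitboards. The following is my own minimal Python illustration, not the actual rewrite (which would be vectorized Fortran/CAL on the Cray): walk each diagonal ray through the occupancy to get a bishop's attack set, then count the reachable squares. A bishop boxed in behind its own pawns, the "tall pawn" above, scores zero.

```python
# A minimal sketch of bitboard-based bishop mobility. Squares are 0..63
# (a1 = 0, h8 = 63); real engines precompute or use magic bitboards, but
# the cost model is the same: a few ANDs and a population count per piece.

def bishop_attacks(sq, occupied):
    """Bitboard of squares a bishop on 'sq' attacks, given occupancy."""
    attacks = 0
    for df, dr in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
        f, r = sq % 8 + df, sq // 8 + dr
        while 0 <= f < 8 and 0 <= r < 8:
            bit = 1 << (r * 8 + f)
            attacks |= bit
            if occupied & bit:      # ray stops at the first blocker
                break
            f += df
            r += dr
    return attacks

def bishop_mobility(sq, occupied, own_pieces):
    """Count attacked squares not occupied by our own pieces."""
    return bin(bishop_attacks(sq, occupied) & ~own_pieces).count("1")
```

On an empty board a bishop on d4 reaches 13 squares; hemmed in on c1 by its own pawns on b2 and d2 it reaches none, which is exactly the configuration the evaluator needs to punish.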

One final problem caused us some difficulty, particularly in the NOW game which
we were lucky to draw (more on this later..) but also in the game against
Innovation which we won, but in which the program nearly went "ape..."

Several that attended watched as we debugged between rounds, so I'll take this
opportunity to explain the bug I just found and fixed.

About 20 years or so ago, after watching (then) Blitz lose endgame after endgame
where pawn races were lost, I wrote a fairly simple piece of code that correctly
evaluated positions that met the following criteria:

1. If one side has at least one passed pawn, then the other side can't have any
pieces, otherwise races are very difficult to analyze. The following cases are
included:

o if the losing king is "outside the square" of the pawn, the pawn wins.

o if the losing king is "inside the square" but the queening square is
controlled by the winning king, the pawn wins.

o special cases for rook-pawns are also included; e.g., the winning king
in front of its own pawn with the losing king "locking" the winning king
in front is handled correctly.


2. If both sides have a passed pawn and no pieces, then the "queening race" is
handled correctly:

o if one side queens two moves before the other, he wins;

o if one side queens first and either checks the opponent or the
queen attacks the queening square of the opponent then he wins
unless the other king is close enough to *his* queening square
that the pawn can still promote.
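The first of these rules, the "square of the pawn," is easy to sketch. The following is a simplified illustration of my own, not the Cray Blitz code: it ignores the winning king's control of the queening square and the rook-pawn special cases described above, and just compares the pawn's promotion distance against the defending king's move distance.

```python
# A minimal "rule of the square" test for a lone White pawn racing the
# Black king. Files and ranks are 0..7; White promotes on rank 7.

def chebyshev(f1, r1, f2, r2):
    """King-move distance between two squares."""
    return max(abs(f1 - f2), abs(r1 - r2))

def pawn_wins_race(pawn_file, pawn_rank, king_file, king_rank,
                   defender_to_move):
    """True if the pawn outruns the defending king to promotion."""
    dist = 7 - pawn_rank
    if pawn_rank == 1:
        dist -= 1          # the double-step from the start square saves a tempo
    king_dist = chebyshev(king_file, king_rank, pawn_file, 7)
    if defender_to_move:
        king_dist -= 1     # the defender moves first, gaining a tempo
    return king_dist > dist
```

Under these assumptions an h2 pawn against a king on a8 wins the race, while against a king already on d5 it does not.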


Our "bug" came about due to an interesting set of circumstances. First, before
explaining the bug, here's what had to happen:

o one side had a passed pawn while the other did not;

o the opposing king was "in the square," so that the pawn couldn't
run and promote;

o the pawn was still on its original square (i.e., it could still advance
two squares on its first move...)

The bug happened as follows: Both sides' queening distance was set to 100 to
indicate that neither has a won race yet;

Next, we evaluate the only passed pawn on the board and find that its
queening distance is 6 (assuming the pawn is on h2, this would be the moves
h3, h4, h5, h6, h7, h8=Q; never mind that no one would play h3 instead of
h4...). Now we discover that the opposing king is in front of the pawn and
that it can't queen anyway... so we set the queening distance back to 100.
Now the bug: if the pawn is on its original square, decrement the queening
distance by one, since we know that playing h3 instead of h4 simply throws
away a tempo when checking the "square of the pawn." Now the queening
distance is 99, which
is bogus. Seeing this, the program notices that it is not "100," which is
the value used to indicate that no pawn queens freely, and therefore it starts
moving a pawn on the board when one is not there. In short, the scoring
routine "creates" a new pawn on the board and leaves it there. It turns out
to have little effect since we maintain a list of where each "real" piece is
and therefore we won't generate moves for this phantom pawn, but it is there.
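In code, the bug boils down to an ordering problem between the sentinel assignment and the double-step tempo adjustment. This is a hypothetical Python reconstruction of my own (the real code is Cray Fortran), using 100 as the "no one queens" sentinel as described above:

```python
# Hypothetical reconstruction of the queening-distance bug; names are
# mine, not the actual Cray Blitz identifiers.

NO_QUEEN = 100   # sentinel meaning "no pawn queens freely"

def queening_distance(pawn_on_start_square, king_in_square,
                      king_controls_queening_sq, raw_distance):
    dist = raw_distance                      # e.g. 6 for a pawn on h2
    if king_in_square and not king_controls_queening_sq:
        dist = NO_QUEEN                      # king catches the pawn: no race
    if pawn_on_start_square:
        dist -= 1                            # double-step saves a tempo...
        # BUG: this also turns the sentinel 100 into 99, so downstream
        # code believes a pawn promotes and "creates" a phantom pawn.
    return dist

def queening_distance_fixed(pawn_on_start_square, king_in_square,
                            king_controls_queening_sq, raw_distance):
    dist = raw_distance
    if pawn_on_start_square:
        dist -= 1                            # tempo adjustment first
    if king_in_square and not king_controls_queening_sq:
        dist = NO_QUEEN                      # sentinel assigned last, stays 100
    return dist
```

The fix is simply to apply the tempo adjustment before, not after, the sentinel is assigned, so a caught pawn stays at 100 instead of leaking out as 99.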

Its real impact: suppose this pawn gets put on square a6 (for example)
and left there, since no moves will ever cause it to move and CB really
never notices it. However, when we make a move like Ba6, we update the
hash key by the normal trick of exclusive or'ing the from square, the to square,
and (unfortunately in this case) the captured piece. Since the move Ba6 should
capture *nothing* the hash key should get two changes. However, since this
phantom pawn is sitting there, the hash key got modified a third time and is
now "out of sync" with the real board.
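The XOR update trick itself is standard Zobrist hashing, and the desync is easy to demonstrate. This toy Python version is my own illustration, not the Cray Blitz code; it shows how a single spurious "captured piece" XOR, triggered by the phantom pawn, leaves the key out of sync with the real board:

```python
# Toy incremental Zobrist-style hash update: XOR out the from-square,
# XOR in the to-square, and XOR out any captured piece on the to-square.

import random

random.seed(1994)
# One random 64-bit key per (piece, square); just 'B' bishop and 'p' pawn here.
ZOBRIST = {(pc, sq): random.getrandbits(64)
           for pc in ('B', 'p') for sq in range(64)}

def make_move(hash_key, piece, frm, to, captured=None):
    hash_key ^= ZOBRIST[(piece, frm)]        # piece leaves the from-square
    hash_key ^= ZOBRIST[(piece, to)]         # piece arrives on the to-square
    if captured is not None:
        hash_key ^= ZOBRIST[(captured, to)]  # captured piece disappears
    return hash_key

# Ba6 to a genuinely empty a6 changes the key twice. If the evaluator
# left a phantom pawn on a6, the same move XORs a third key, and the
# hash no longer matches the real board.
start = 0xDEADBEEF
clean = make_move(start, 'B', 16, 40)                  # a6 really empty
phantom = make_move(start, 'B', 16, 40, captured='p')  # phantom pawn "captured"
assert clean != phantom
```

Because XOR is its own inverse, making and then unmaking a move restores the original key, which is why the incremental scheme works at all; the phantom pawn breaks it by injecting one extra XOR that nothing ever undoes.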

The result? We now get hits from the transposition table when we shouldn't,
which gives misleading or erroneous scores all over the place. I have played
back through the NOW game and we play that game significantly differently now.
I have no idea whether we would now win, lose or draw, and it's not worth
the time to try and understand what we did that might have been bad. One
amusing thing happened in the Innovation game: we were winning pretty easily
after an even struggle out of the opening when we noticed that the program had
announced a mate in 5. We were over a rook ahead, and it didn't surprise
me at all until I saw that the first move was something like dxe8=q+. On
our "real" board, there was no CB pawn on d7. When I asked the program to
display what "it" thought the board looked like, damned if it didn't have a
pawn on d7... and on d6, and on d4, and on... After entering the next move,
the program played back through all the moves and "fixed" things, but that
was "eye popping.."

Now for the funny part. When was this bug "born" you say? I have saved
every version of the source code dating back to March of 1986, the year
we won the World Computer Chess Championship for the second time. It was
there then. I have some listings that go back to 1981 and, you guessed
it, the bug was there then too. Why did it just now become so critical?

A couple of guesses: 1. The pawn had to be on its original square, which is
not too common with passed pawns. This probably prevented its happening too
frequently, although Bert and I can recall an occasional game where the program
reported that it detected hash "collisions" when it compared the move it found
in the hash table with the game board and discovered that the move was illegal.
These were few and far between, sometimes none at all in a 5 round tournament,
so that we simply (naively it turns out!) assumed that they were "real"
collisions. Some of you might recall the discussion we had a few months back
about "how many hash key bits are needed for safety?" and the results I posted
from Cray Blitz indicating that our 64-bit hash key failed to produce any
collisions over many 2-billion-node searches. That should have gotten me
thinking, but I suspect that I might have gotten a little "lazy" in my
old age (46 is old?).

guess 2: Over the past few years, we have really extended the search along
"interesting" lines, particularly when pushing passed pawns or pushing any
pawn past the fourth rank (which frequently produces passed pawns), along
with our other extensions. It is pretty likely that our increased search
depth has made it more likely that we can see how to create passed pawns
even when the passer has never moved. For example, one of the NOW buggy
positions takes the current version of Cray Blitz (version 49g) only 8
plies and a second or so before it reports the bad hashing stuff. I then
went back to version 46e (about 1.5 years old, which we used in Indianapolis
and which finished 2nd there) and it takes 13 plies and several minutes before
it, too, finds bad hashing going on. Since they both have exactly the same
buggy passed pawn evaluator, obviously something we are now doing lets the
program reach the position when the older version can't. We played the last
round of the recent ACM event with version 46e thinking that we were playing
with code that was correct after all of the "crap" we saw in two of the
games with pawns "appearing." Of course, we were simply giving up enough
search depth along selected lines that it appeared less likely that the bug
would be seen. I have now played both versions through the Now game, and
around move 33, 49g starts having problems where 46e sails cleanly on its
way. After move 38-39, neither has any more problems, since all pawns are
off their original squares or else can't become passed.

In all, this has been an interesting "lesson" in humility. Just because
something has apparently worked for 20 years, doesn't mean that it is, in
fact, correct. I have long believed that the phrase "debugged parallel
program" is some sort of an oxymoron, but now I tend to include "debugged
chess program" as well. CB is simply too big, too complex, and too well
written (AKA optimized for the Cray which makes the program even more
difficult to read and understand) for it to ever be 100% bug-free. From
a philosophical point of view, I suppose that's good since this is supposed
to be an artificial intelligence experiment, and humans are certainly never
100% "debugged." From that point of view, I hereby give the Cray Blitz team
a tremendous "job well done" since we have captured so much of the "essence"
of human play into the program.. :^) Now, if we could only capture an
equivalent amount of skill to at least partially offset these damn bugs...

One final comment. I am utterly amazed to listen to some of the programmers
each year. A more cocky group would be hard to find. Many find lots of fault
with Hsu's program... "it plays horribly positionally" "it doesn't understand
that type of opening" "it .....". Dave Kittinger was playing us and fully
expected to see Star Socrates blow Deep Thought away, since it was such a
superior chess-playing program... A word to the wise: don't ever count Deep
Thought (or Deep *) out. It has the following going for it: (1) superior
speed... using 12 processors is significantly more efficient than using 512
as Star Socrates was; (2) superior search extensions; (3) any other feature
of your choice in this space... In short, Hsu's program is still the "cream
of the crop" whether or not it loses a game to a computer. Personally, I
believe that Cray Blitz can (and will one day) beat Deep * in a single game
in an ACM or whatever event. I don't believe that we will ever be able to
win a match, based on our hardware future and what Deep Blue will look like
later this year. It's the same old same old... Faster hardware makes the
selectivity less important, which reduces errors, which produces better play,
etc. Socrates is (was) a microcomputer-based program that used selectivity
to produce the depth necessary to stay on the board with a Deep Thought or
a Cray Blitz or a HiTech or whatever... However, moving it to faster
hardware simply magnifies the selectivity errors such a program inevitably
has. Deep Thought doesn't have such shortcomings.

I hate to say it, but "good work Hsu, Murray, John, etc." I'm impressed.

Bob Hyatt

--
Robert Hyatt Computer and Information Sciences
hy...@cis.uab.edu University of Alabama at Birmingham
(205) 934-2213 115A Campbell Hall, UAB Station
(205) 934-5473 FAX Birmingham, AL 35294-1170
