Cray Blitz Evaluation

55 views
Skip to first unread message

Robert Hyatt

unread,
Mar 5, 1993, 9:44:01 AM3/5/93
to
Here is an excerpt from the book chapter I mentioned. After getting over
100 email requests, this seems like the best way to distribute it. One note,
the typesetter macros used in the book are not available here and as a result
I did a lot of "tweaking and fiddling" to make it readable and postable via
pnews. If you find something that reads "funny" please let me know and I
will fix it. I tried to proofread it, but since I wrote it it is easy to
"read between the lines" and let something slip by.

By the way, note that this is "state of the art" for Cray Blitz as of the
publication data of the book "Computers, Chess and Cognition". Since this
was written late in 1989, some things have probably changed a little since
then. Certainly the machine has as we are now running on a C90 and searching
about 500,000 nodes per second as opposed to 200,000 in 1990.

Bob

--------------------------cut here---------------------------------------

Cray Blitz version 46e (current version=47d)

Evaluation Procedure

The evaluation procedures in Cray Blitz have evolved
over the years and are a direct result of analyzing the many
games played by the program. Some of the ideas are well-
known and came from the early days of computer chess pro-
grams. Other ideas address specific weaknesses that the
program exhibited at some point during tournament play.

The following sections explain the scoring ideas for
each piece. However, the descriptions do not include the
actual point scores because they are too specific to Cray
Blitz and the machine it runs on. In fact, the numbers
might actually lead the beginning computer chess programmer
completely astray because there is a specific interaction
between search strategies, search speed, and evaluation.

The basic scoring unit in Cray Blitz is a pawn worth
1,000 points. Following elementary chess theory, knights
and bishops are worth 3,000 points, a rook is worth 5,000
points, a queen is worth 9,000 points, and the king is worth
infinitely many points. Since current chess knowledge indi-
cates that bishops are worth more than knights, the program
does contain such knowledge. However, the knowledge is
dynamic rather than static so that in certain cases a knight
can be better.

In the following sections, if the program is playing
white, bonuses for white pieces are positive numbers and
bonuses for black pieces are negative; the opposite holds
true for penalties.

King Scoring

King scoring recognizes two distinct phases in the
game. If the opposing side has no more than 13,000 points
of material (not counting pawns), then the king uses endgame
scoring (described later in this section). If the opposing
side has more than 13,000 points of material (a queen and
rook, for example) then the king uses middlegame scoring.
The middlegame scoring quantifies the safety (or lack
thereof) of the area around the kings. The location of
friendly pawns governs the king safety.

The absolute location of the king is the first scoring
component for king safety. The king prefers either knight
file with the rook or bishop file a second choice; the kings
dislike the two central files even though in some positions
they might actually be better. In addition to the preferred
files, the kings prefer the back rank as first choice; suc-
cessive ranks are worse in a linear manner. The optimum
square for the king is therefore on the first rank on either
of the knight files; alternatives are worse in proportion to
the distance from the "optimum" square. This number
represents a starting point for the king safety score, other
values can easily outweigh the initial approximation and
make the king stay in the center or move somewhere else, as
needed.

Holes in the pawn structure or open lines leading to
the king are positional weaknesses that lead to attacking
chances. The program penalizes holes in the pawn structure
surrounding the king. For example, pawns at e3, f2, g3 and
h2 leave weak squares or holes at f3 and h3. The program
carefully examines the status of files around the king,
specifically the files from the edge of the board to the
file one past the king. For a king on the knight file, the
safety zone includes the bishop through the rook files. For
a king on the rook file, the safety zone still includes the
same three files. For a king on the kings bishop file, the
zone would include the king, bishop, knight and rook files.
There are three levels of penalties associated with king
safety. For completely open files, the penalty is substan-
tial since enemy pieces have direct access to the king. If
the king does not have a friendly pawn in front of it, but
an opposing pawn is blocking the file, the penalty is less,
but still substantial since opening the file is under the
opponent's control. If the opponent has a half-open file
bearing on the king (no opponent pawn but the king does have
one), then the penalty is less because it is somewhat harder
to force the file open.

Another safety consideration is luft, or breathing
space. A king trapped on the back rank with the three
squares on the second rank in front of it blocked by
friendly pieces/pawns or else controlled by the opponent
earns a penalty. This avoids those positions where Cray
Blitz overlooks back-rank mates at nodes near the maximum
depth of the tree.

The final safety term penalizes moving the king and/or
rooks before castling. Moving the king earns the largest
penalty, moving the king-rook is next, and moving the
queen-rook earns the smallest penalty. Upon reaching the
endgame phase (13,000 points) Cray Blitz disables the above
penalties. >From this point forward, the only scoring con-
cerns centralizing the king. As described later, passed-
pawn scoring depends on king location, but since pawn posi-
tions drive this heuristic, the pawn module computes the
score.

Queen Scoring

The queen-scoring routine recognizes three basic posi-
tional features:
1) centralization, 2) king tropism (closeness to
opponent's king), and 3) the opponent's king safety.
Of the three considerations, the last needs clarification.
The value of a queen is directly related to the safety of
the opponent's king. If the opponent has an exposed king
position, then the program's queen becomes more valuable and
vice-versa. This gives the program some sense of when a
queen trade is advantageous and when it is not. Rook Scoring

The rook-scoring routine recognizes five basic posi-
tional features:
1) king tropism, 2) open and half-open files, 3) rooks
on the 7th rank, 4) rooks behind a passed pawn, and 5)
rook mobility.
Rook mobility is a recent addition that attempts to elim-
inate tactics based on a rook that is trapped, or a rook
that cannot move without losing material. The mobility term
penalizes a rook that cannot move horizontally because of
friendly pieces, pawns and/or the edge of the board. An
example would be playing Ra2 to defend a pawn at b2. The
rook at a2 incurs a penalty for no horizontal mobility.
Although this heuristic evaluates some cases incorrectly, it
does eliminate some awkward rook moves that later cause the
program great difficulty. In one game played, Cray Blitz won
a pawn, but then defended a pawn on b2 with Ra2 and later
lost the b2 pawn as a result.)

Half-open files and king tropism are really long-range
heuristics that try to anticipate attacks and file openings
long before the search actually reaches the position. This
attracts rooks to places where they will probably be useful
later, although at times the rooks end up on squares where
they have to move again.

Bishop Scoring

The bishop-scoring routine recognizes three basic posi-
tional features:
1) the presence of two bishops, 2) "good" and "bad"
bishops, and 3) open diagonals.
The concept of "good" and "bad" bishops is a measure of how
many friendly pawns are on the same colored squares as a
bishop. The more friendly pawns there are on the same
colored squares, the more restricted the bishop becomes.
With enough of these pawns, a bishop can become worth less
than a knight since the knight does not "slide" and cannot
be blocked by pawns.

Open diagonals are similar to open files for rooks
except that it is easier to close a diagonal than it is to
close a file. This makes diagonals more difficult to evalu-
ate, but it helps to recognize when a bishop attacks the
king zone along with other pieces.

Knight Scoring

The knight-scoring routine recognizes three basic posi-
tional features:
1) central control, 2) king tropism, and 3) outpost
knight.
The concept of an outpost knight is a knight on the
opponent's side of the board that is occupying a square that
cannot be attacked by a pawn. The only exception is that a
knight on the edge of the board cannot be an outpost.

Pawn Scoring

Nimzovitch said that "pawns are the soul of chess," and
to that ideal a significant part of the programs's total
positional evaluation time is devoted to pawn positions.
The pawn-scoring routine recognizes eleven basic positional
features.
1) center pawn movement, 2) advancement, 3) pawn rams, 4)
presence of eight pawns, 5) doubled/tripled pawns, 6) iso-
lated pawns, 7) backward pawns, 8) passed pawns, 9) out-
side passed pawns, 10) square of the king, and 11) king
against king and pawn(s).
The center pawn movement term produces a large penalty if
both the king-pawn and queen-pawn stand on their original
square. Only one center pawn standing on its original
square produces a smaller penalty. This encourages advanc-
ing the central pawns to create the open type of positions
that the program plays best.

Pawn-advancement scoring is a dynamic system that
varies depending on the type of opening, always with the
idea of favoring open positions where the program's tactical
ability is most useful. An example is that for openings
based on d4 for white (Cray Blitz), then the program strives
to play c4 when possible to open the position (the same is
true for d5 as black). This idea of altering the scoring
based on the opening is described in more detail later.

The match between Cray Blitz and David Levy in 1984 led
to the development of the pawn-ram term. In that match,
Levy won each game easily by blocking things up so well that
the program could find no tactics since no open lines were
available. Ivan Bratko suggested this term which adds a
penalty for each pawn ram (where two pawns of opposing sides
are "face to face" on the same file).

The presence of eight pawns incurs a penalty to
encourage at least one pawn exchange (this is an asymmetric
scoring term that only evaluates rams from the program's
point of view). This term attempts to avoid locked posi-
tions in a manner similar to the pawn-ram term. (Note that
Ken Thompson developed this idea for Belle '80).

The isolated-pawn term evaluates pawns with no friendly
pawns on adjacent files. An isolated pawn with no opposing
pawns in front of it receives the largest penalty since it
is easy to attack from the front with rooks and queen. An
isolated pawn with enemy pawns in front of it is not as bad,
because it is harder to attack and win it. This scoring
routine also evaluates the special case of artificially iso-
lated pawns which occurs when a pawn advances so far that
other pawns are unable to support it.

Backward pawns are simply special cases of isolated
pawns. A backward pawn has adjacent pawns advanced far
enough that they cannot defend it. If the backward pawn is
on an open file, the penalty is the same as that for an iso-
lated pawn on an open file, otherwise it is the same as that
for an isolated pawn on a closed file.

Passed pawns are surprisingly difficult to evaluate in
middle-game positions. For example, is an isolated passed
pawn weak or strong? For this reason, isolated passed pawns
earn no special bonus as long as the opponent has more than
13,000 points of material (pieces only) since enough pieces
can be used to attack the pawn continually, and perhaps
eventually win it. On the other hand, when forced to accept
such a weak pawn, the program understands that trading
pieces makes the pawn more valuable.

The outside passed pawn is a well-known positional
advantage which in simple king and pawn endings is almost
always a win for the side with such a pawn. This evaluation
is complex and also handles the case where both sides have
an outside passed pawn (the most outside wins unless a pawn
defends the other one). The program correctly assesses
these positions and generates the appropriate bonus for the
correct side.

When one side has passed pawns and the other side has
no pieces (or when both sides have passed pawns and neither
side has pieces), the evaluator quickly determines whether
the resulting position is won or lost by counting squares
and noting where the pawn(s) queen. The simplest idea is
the square of the pawn, where the pawn can race to promotion
without being caught by the opposing king. Variations on
this theme include both sides having passed pawns (where one
or both are unstoppable). Here, the program correctly
understands concepts such as "queening at least two moves
before the other side wins" (unless the opposing king is
supporting the opposing pawn to promotion which results in a
draw), understanding queening with check or queening and
simultaneously attacking the opponent's queening square
(unless the opposing king is also defending the queening
square, resulting in a draw for KP vs. K endings). The pro-
gram can solve any type of pawn race position without
searching a tree of any type.

The one principle adhered to rigidly in the evaluation
is correctness. If there is any doubt in the outcome, then
the passed pawn score produces nothing. If the scoring term
produces a bonus, then one side is definitely winning (as
opposed to "probably" winning). The bonus is more than a
rook, but less than a queen so that obtaining a queen is
better than simply being able to queen. The insistence on
correctness here is important, in that if the position can-
not be classified as won or lost, then a deeper tree search
(before applying this routine) may clear things up and allow
proper evaluation.

In simple king against king and pawn(s) end games,
current programs can solve the resulting positions easily
and quickly. The difficulty comes when trading pieces
and/or pawns and reaching such a position deep enough in the
tree that a search to the conclusion is impossible. This
evaluation routine allows Cray Blitz to play these endings
correctly with no tree search whatever, or conversely to
reach such a position anywhere in the tree search and still
correctly assess the resulting position as won or lost.

Miscellaneous Scoring

Several special-purpose scoring routines are used to
handle opening development, to assess cases where both sides
have a bishop and pawns, and finally to evaluate rook-pawn
endings. During opening development, Cray Blitz penalizes
moving any piece twice before moving all pieces once. This
is done by examining the current search variation and penal-
izing moves that do not move pieces/pawns from their origi-
nal squares. The penalty is dropped either after move fif-
teen, or after developing all pieces and castling the king.
Although such a concept seems almost too primitive to
include in a former World Champion, the program's ability to
find tactical forays before completing development (without
this heuristic) still amazes its authors. The program has
won (and lost) games where one piece never moved from its
original square. An example was the queen bishop in the game
against Belle in the 1981 ACM North American Computer Chess
Championship and the game against Deep Thought in the 6th
World Computer Chess Championship in Edmonton, Canada. This
simple idea eliminates these premature skirmishes and
results in more logical play during the opening. The rou-
tine also implements the common opening knowledge for the
particular opening it is playing. For example, it penalizes
blocking the c-pawn with a knight in queen-pawn openings.
This guides the program along usual book opening theory,
even when its opening book does not include an early reply
by the opponent. It also prevents the program from trying
to exploit some tactical weakness of the opponent without
having all its pieces developed, so that it later falls into
tactical difficulty itself. Note that this is one of the
few asymmetrical scoring terms in the program, applying only
to the program's pieces and not the opponent's.

The next special-purpose scoring routine evaluates cer-
tain types of drawn endgames such as bishops of opposite
color and endgames with a rook-pawn. When in a bishop of
opposite colors endgame, the positional score is reduced,
since drawing is probable, unless one side is significantly
ahead (two pawns or more). The program correctly evaluates
king against king and rook-pawn, and also king and rook-pawn
against king and bishop (with other pawns present or not).
The essence here is that it is possible to be material
ahead, but in reality be in a drawn ending. This routine
alters the score to reflect a draw, even though one side is
ahead in material. An example here would be to avoid a
position with king, rook-pawn and bishop of the wrong color
against a lone king where one side is 4,000 points ahead in
a drawn position. Using this routine, the program avoids
numerous swindles by strong humans who found other programs
susceptible to these traps. It appears that the idea can be
generalized to other known draws.

Draw Scoring

A heuristic added several years ago was to not assign
draw scores a value of zero, but rather assign them a value
equal to the ply where the search reports a draw. This
makes the program accept a draw when it is behind, but it
forces it to follow the longest possible drawing sequence it
can while still maintaining the forced draw. Against
humans, delaying the draw for as long as possible gives them
ample opportunity to blunder the draw away and lose. For
the program, since a two-fold and three-fold repetition are
identical, this heuristic also allows the program to find a
way around the first two-fold repetition and perhaps avoid
the draw altogether if it is possible (and if it is desir-
able to do so).

To implement this heuristic, the program reserves a
scoring window between 0 and 100 and uses these values
exclusively for draws. If a real score is greater than
zero, 100 is added to it. This leaves a hole in the scoring
range that contains no positional component. Thus, a posi-
tional score of 1 is reported as 101, leaving a 100 point
range and preventing the draw scores from interacting with
or duplicating positional scores.

The only negative aspect comes from the transposition
table scoring where the score might indicate that there is a
draw in n-ply from the stored position. In fact, the
looked-up draw score can be false because a shorter draw
occurs somewhere in the variation. This difficulty caused
Slate and Atkin to avoid storing any draw scores in the
transposition table. While this design is inherently inac-
curate, program performance improved dramatically in posi-
tions where draws occur frequently Even though both draw
scoring methods have inaccuracies, both will always find the
draw and accept or decline as the score dictates.

Hash Scoring

Since the king safety and pawn scoring is so complex
and time-consuming, the program reduces the computational
overhead that these routines consume by storing the values
in a hash table and re-using them as needed For pawn scor-
ing, the locations of all pawns on the board are used to
compute a hash key for each resulting pawn formation encoun-
tered during the tree search (as opposed to Slate and Atkin
which used three adjacent files to form a key to retrieve a
score for the three files only). The hashing routine uses
the complete pawn structure to form the hash key so that
more interactions between pawns can be included than were in
the early Slate and Atkin's program. The search updates the
hash key dynamically when moving, capturing, or promoting
pawns. The program keeps all pawn scoring terms in the
hashed scoring table, except for those that depend on the
locations of pieces (since piece locations are not used for
forming the hash key). During a typical 10-ply middlegame
search, 99% of all pawn positions are retrieved from the
table, reducing the pawn scoring overhead to one percent of
its original total. The evaluation routines compute the
remaining pawn scoring terms at each of the remaining nodes
and then add the result to the value retrieved from the hash
table. This allows us to implement almost any type of
pawn-evaluation strategy without considering the computa-
tional time required. In a typical chess competition,
2^23 entries for pawn positions are used, where each entry
is one 64 bit word.

For king safety, the location of all pawns on the board
and the king position are used to create a hash key. This
key is dynamically updated as the board changes. The search
then uses the key to save the computed king safety for later
reuse. In typical 10-ply middlegame searches, over 95% of
the positions reached require no king safety evaluation
since the value comes from the hash table. In a typical
chess competition, 2^24 entries are used for king safety
positions, where each entry is one 64 bit word.

Miscellaneous Features

Scoring Parameter Blocks

The scoring parameters were recently moved to a common
block in lieu of the hard coding used in the past. This
provided two immediate benefits. The first is that it makes
tuning the program's scoring values simpler and eliminates
the constant re-compiling that hard-coded values required.
It is now possible to alter a value, test the new code
against the old and tune the value as necessary, all without
re-compiling the source code.

A second benefit is the ability to tune the evaluation
parameters to more closely fit the game. For example, most,
if not all programs play Nc3 after playing d4 when playing
without an opening book. Now, the program uses one parame-
ter block for playing e4 openings, another parameter block
for playing d4 openings, and so forth. In this way, the
opening book dictates which basic plan the program follows
by loading the appropriate parameter block depending on the
book line played. Previous to this change, Cray Blitz
played e4 openings better because they result in more open
(and tactical) positions. Now it is possible to train the
program to play any opening system by adjusting the various
positional parameters to make the program follow the known
plans for such openings.

Tree Search Hashing

Cray Blitz uses the transposition/refutation table
hashing briefly described by Slate and Atkin and in more
detail by Marsland. The search stores real search scores as
well as search bounds since the real score is frequently
unknown when using the alpha-beta algorithm. The differ-
ences in the implementation by Cray Blitz from Chess 4.X
will be described here.

First, a complete hash table entry occupies 96 bits, of
which 40 bits are used to hold the hashed board value. The
low order bits of a 64-bit hash key are used for the hash
address and the high order 40 bits are stored in the table
to resolve collisions. For typical hash table sizes (2^26
entries on a Cray C90), the entire 64 bits are used. In
1974, Cray Blitz adopted the concept of storing a hashed
board representation rather than a true representation of
the board to save memory; after years of testing most other
programs now subscribe to this method as well. The only
danger inherent in this technique is that two different
positions can produce the same 64 bit hash key. Since the
program stores a "best" move in the hash table entry, the
move is validated to further "prove" that the two positions
are identical. Hours of simulation produced almost no colli-
sions with the current hashing algorithm that follows the
ideas presented by Zobrist. The only concern is that as
machine speeds continue to improve, a 64-bit hash key might
soon be insufficient. In 1974 simulations using thirty-two
bits produced no collisions due to the slow processor speeds
available then. However, proved that 32 bits are completely
inadequate for the size of tree searched by today's faster
processors.

The next part of the entry is either the true score or
the appropriate upper or lower search bound if the true
score is not known. The search only needs one bound if the
real score is not known, and it does not need either bound
if the real score is known.

The third piece of information, called the draft by
Warnock and Wendroff (authors of Lachex) is the search depth
below the current position that the search penetrated to.
For example, when doing an 8-ply search and storing a posi-
tion at ply three, the draft is five.

The last useful piece of information kept in the table
is the best move from the current position. This move is
always tried first in the search to improve the move order-
ing, and is saved, even in the quiescence analysis. The
only requirement is that the type of move must match the
search region. For example, a simple piece move (not a cap-
ture) is not allowed in region three.

Deciding what to keep and what to discard in the table
is a difficult issue. The first decision made was that a
single probe (as made by Slate and Atkin) into the hash
table is not sufficient. The program retrieves several hash
table entries with a negligible time penalty by utilizing
vector operations on the Cray computer system. The hashing
routine uses the low order N bits of the hash key as the
base hash table address and the middle ten bits as the ran-
dom offset for the re-hash of the second through the eighth
entries (it currently uses eight possible table entries to
store and/or look up positions). >From the list of eight
possible hash table locations, the following criteria select
the best table entry for replacement (stopping with the
first criterion met, if any): (1) store over a position from
a previous search (NOT the same thing as the current itera-
tion), (2) store over a position that has a smaller draft
(i.e. retain the one that saves the most work later.), and
(3) do not replace real scores by search bounds.

These rules let Cray Blitz find about 30% of typical
middlegame positions in the hash table, and well beyond 90%
in certain endgame positions.

One important implementation detail (that most chess
programs avoid) is that the hash table is never cleared.
The table carries information from move to move in a con-
tinuous manner. The program recognizes that a position
comes from a previous search so that it may be overwritten
first, but the table is never cleared since it contains
potentially useful information.

Time Utilization

Another area where Cray Blitz has been innovative is in
the utilization of time during tournament play Using this
timing algorithm, the program goes into a "deep think" when
it finds itself in trouble. The search remembers the value
returned from an iteration. It is possible that the next
iteration finds that the best move really loses material
when searching deeper. If the next iteration fails low but
the search reaches the target time for the move before com-
pleting the iteration, the program continues to think beyond
the time limit to find a move that does not lose anything.
The amount of extra time used is proportional to the
material lost. This algorithm was developed to avoid the
circumstance where the program sees a loss of material and
the programmers are on the edge of their seats hoping that
the program finds the saving move before exhausting the
search time. This overflow algorithm minimizes these
occurrences and generally helps save material at least once
in each game played. Avoiding one weak move is ample justif-
ication, even though the program sometimes "deep thinks"
when forced to give back a pawn it "won" earlier, after
finding that the pawn really cannot be held.

To assist with the time overflow algorithm, the program
attempts to accumulate extra time during the search. First,
the program attempts to save a fixed amount of time (set by
the operator and normally 20 minutes for tournament games).
The large opening book and thinking on the opponent's time
usually saves enough time to meet this requirement. Until
saving the required amount of time, the program computes the
target time by taking the total time allowed, subtracting
the operator overhead (normally 10 minutes) and dividing by
the total moves in the time control. As it gets ahead on
time, it continues to use this target until it saves the
amount of time required (20 minutes here). Once it saves
the necessary 20 minutes, it then computes the target time
as the time left on the clock less the 20 minute deep think
buffer, and divides this by the number of moves remaining.
Thereafter, as it saves time by correctly predicting the
opponent's move, the average time per move increases
steadily.

Another timing issue is the decision of whether to
start the n+1 th ply iteration when it is hopeless to fin-
ish the iteration. The next iteration is always started for
an important reason. Since the program uses the aspiration
search and computes the lower and upper search bounds for
iteration n from the iteration n-1 score (plus and minus
a quarter of a pawn), a fail-low condition happens quickly
(if it is going to). The program therefore starts the next
iteration and gives it a chance to fail low. This invokes
the deep think time overflow algorithm to let the program
search for a solution to the problem that it would not have
seen had it stopped after the last iteration. This allows
the program to more or less verify that the move and score
from iteration n are reasonable, even though it really
cannot search iteration n+1 within normal time con-
straints. This also happens at least once per game,
although sometimes it is impossible to avoid the material
loss an iteration discovers.

Also, starting the next iteration and then aborting it
on a time-out interrupt does not waste the time, even if the
search does not fail low. After announcing the move, the
search is re-started and the transposition table prevents
searching the parts of the tree already examined, effec-
tively skipping the search back to the point at which the
interruption occurred.
--
!Robert Hyatt Computer and Information Sciences !
!hy...@cis.uab.edu University of Alabama at Birmingham !

Reply all
Reply to author
Forward
0 new messages