Alpha-beta inconsistencies

Chua Kong Sian

unread,

Feb 18, 1994, 9:38:36 PM2/18/94

to

I would like to know how chess programmers deal with alpha-beta
inconsistencies. Or is this a secret? First, let me explain
what I mean.

Alpha-beta implicitly assumes that the score return from the search
is independent of the search window. Therefore when a fail high
occurs, a search window of <beta, inf> can be carried out with the
confidence that the score will be guaranteed to lie within this
window.

But what happens when the scoring mechanism itself is dependent upon
the values of alpha & beta? This can arise because not all terminal
nodes need to be exactly evaluated. For nodes whose estimated score
lies outside alpha/beta, an estimate can be used. In this case, when
a fail high occurs and a search is carried out about <beta, inf>, it
is entirely possible that a score < beta can be obtained giving rise
to inconsistencies.

If the inconsistency happens only at ply 1, we can always re-search
using <-inf, inf> and be assured of a valid score. However, if some
form of scouting search is used at internal nodes, this would be harder
to resolve.

Any insights, solutions, hints?

Kong Sian

Robert Hyatt

unread,

Feb 23, 1994, 9:30:50 AM2/23/94

to

In article <199402231...@ibm3090.rz.uni-karlsruhe.de> UK...@ibm3090.rz.uni-karlsruhe.de (Peter W. Gillgasch) writes:
>In article <2k3u3c$g...@nuscc.nus.sg>,

>Solution: You are talking about an "estimated" value that is outside
>the alpha-beta window. The estimated values are normaly obtained
>using the "fast" evaluation (incremental first order evaluation) only.
>If you limit your "slow" evaluation (non incremental second order
>evaluation) like pawn structure and king safety evaluation to yield
>a value that is within a certain window, say (-max2,max2) then you
>can store the node value in the hashtable as
>lower bound = estimate - max2 or upper bound = estimate + max2
>iff these values are outside the alpha-beta window.
>The only trouble with this solution is that you have to limit the
>2nd order eval. But a range of, say, +/- 3 pawns is in almost all
>cases realistic, and in endgames with passed pawns you can adjust
>this range.
>PWG.

Also, you are going to have to live with such inconsistencies. A real
interesting one occurs when using transposition tables. Suppose you
are waiting on your opponent's move, and your program ponders quite
deeply (much deeper than a normal search due to your opponent's long
think time. You can load the transposition table with lots of
interesting things.

Now, it is your move, your opponent didn't make the move you expected
and you re-start the search from ply 1. For move "x" you find a
"beta cutoff" score in the table, indicating that this branch is better
than the current branch. Unfortunately, you were able to "prove" this
in the "long" search because the basic search depth was quite high
allowing the selective search more room to look at interesting things.

You then get a move xxx fails high, but when you relax "beta" and do
the search again, this "fail hi" entry in the table is now useless
since the stored value of beta is lower than the current (real) value
for beta. You search the branch, but since the basic search depth
(iteration) is significantly lower than that used to compute the table
entry, you can't follow the checks and whatever deep enough to see
how good this move is. You simply assume it was better than the old
beta value. This is pretty bad, as this move might win a queen or
whatever, and yet beta might have been .001 pawns better than the
score when this branch was computed (we use the zero-width window
search after firmly establishing alpha at the root of the tree.).

The search can then change to another move, but the next iteration might
well make this "xxx" move fail high again, only to return no P.V. since
again the search can't see enough to understand why the move is good.
Really annoying... it happens with one processor or with n processors.

To solve the problem you mention, we simply latch the lowest and highest
evaluations our positional evaluator produces during a search. For the
quick cutoff test, we check (material score) + highest positional score
or (material score) + lowest positional score. This is because our
endgame pawn scoring can produce values of up to +/-8 pawns for those
endings that are clearly won. Additionally, our "determined outcome"
evaluator can produce scores of another +/- 15 pawns when it notices that
one side has a clear win (KR vs k, KBN vs K, etc... ie, any ending that
the program "knows" it can win handily)

There is a tiny "hole" in this logic, as the minimum positional score
might not represent the score to be returned for "this" node as this
might be the very first lost KP vs K ending we've encountered. We
accept this and keep right on going.

--
!Robert Hyatt Computer and Information Sciences !
!hy...@cis.uab.edu University of Alabama at Birmingham !

Peter W. Gillgasch

unread,

Feb 23, 1994, 6:25:00 AM2/23/94

to

In article <2k3u3c$g...@nuscc.nus.sg>,
nsr...@leonis.nus.sg (Chua Kong Sian) writes:

Solution: You are talking about an "estimated" value that is outside

Robert Hyatt

unread,

Feb 24, 1994, 9:24:00 AM2/24/94

to

In article <1994Feb24.0...@cca.vu.nl> wie...@iodine.chem.vu.nl (Gijsb. Wiesenekker) writes:
>Peter W. Gillgasch (UK...@ibm3090.rz.uni-karlsruhe.de) wrote:
>: In article <2k3u3c$g...@nuscc.nus.sg>,

>In ZZZZZZ-3.0 there were alpha/beta inconsistencies, as I found out
>when I built in a warning error in the aspiration search whenever
>a fail-low or fail-high move failed high or failed low in the research.
>I consider research errors serious bugs, because I think the alpha-beta
>search result is useless if lower and upper bounds are not handled
>correctly.
>After some debugging, I found out that I could avoid all research
>errors by doing the following:
>
>(1)
>Do not use extensions that are dependent of alpha or beta (e.g.
>'there is a threat if the null move score falls 2 pawns below alpha')
>When researching, alpha or beta may have changed, so some extensions may
>not occur. Use something like the current material balance instead (e.g.
>'there is a threat if the null move score falls 2 pawns below the
>current material balance')
>
>(2)
>Do not use search history dependent extensions (e.g. 'if there were
>two or more checks in the current line of play, then..' or 'if there
>were three or more captures in the current line of play, then..' or
>'if there were two threat extensions in the current line of play, then..')
>If you do use search history dependent extensions, make sure that you
>include the history dependencies in the hash key if you
>use transposition tables (e.g. hash the number of threat extensions
>in the current line of play into the hash key).
>
>(3)
>Only use the hash table score if the current search depth is equal to
>the requested search depth, and not if the current search depth is
>less than or equal to the stored search depth.
>
>The only search inconsistency left in ZZZZZZ is the fact that I
>only allow null-moves cutoff's when not in the principle variation.
>This gives research errors when the null-move heuristic fails, but
>this is nice, because in this way I can detect (half of the)
>zugzwang cases.
>
>Gijsbert Wiesenekker
>ZZZZZZ author.

I'm afraid you "threw out the baby with the bath water here"... Not
using "search history dependent extensions" is a severe limitation.
A fail high followed by a fail low on the research is rare, but when
it happens, it doesn't cause any problem that your solution doesn't
make worse. IE, with this hi/low fail, you DO know that the move is
better. If you omit the extensions, you don't learn anything at all.

David Slate and I had many discussions about draw scores in the
hash table... we do it, he didn't. His reason: some draw inconsistencies
that he didn't like. We simply tolerate them as the result (even when
inaccurate) is still better (a draw is a draw is a draw)

(3) makes *NO* sense at all..... you are really struggling to make something
"fit" your idea of what it should do.

In short, if you search with a window of +/- infinity ALL THE TIME, you also
can eliminate your fail hi/fail low problem.... Why not go all the way and
solve the problem? (hint: many more nodes will reduce your depth and you
pay in yet another way.)

Gijsb. Wiesenekker

unread,

Feb 25, 1994, 12:55:54 PM2/25/94

to

Robert Hyatt (hy...@cis.uab.edu) wrote:
: In article <1994Feb25....@cca.vu.nl> wiese...@sara.nl writes:
: >(1)
: >I think fail highs followed by fail lows DO make your solution worse, because
: >it indicates that the handling of lower and upper bounds in the alpha-beta
: >is wrong, making the alpha beta search result useless. What is the
: >value of the alpha-beta if beta cutoffs are based on incorrect fail highs?
: >And what do you do about 'incorrect fail lows' in the iterative deepening?
: >Incorrect fail highs are detected because they are researched,
: >but incorrect fail lows are not because they are not researched.

: The fail high is NOT incorrect. That CANNOT happen in a correct
: alpha/beta search. The problem is, you often know that a position
: is better than another, but, when doing a shallower search next time,
: you can't see why. If you ever search to the point where the
: position from the table has insufficient "draft" then you are also
: deep enough that your "search history" extensions will carry you to
: the same point again, but this time you won't fail high but will get
: the real value. I have NEVER had a position where we failed hi and
: then failed low, and the fail high move was bad. It is ALWAYS better
: than the move it failed high after. The only problem is that we can't
: find out how much better it is, and another move may fail high behind
: this move and replace it. This isn't a BAD thing, but it means that
: TWO moves are better than the old best move, and we simply take the
: second without knowing whether the second fail high move is better or
: worse than the first fail high move. we DO know that both are better
: than the original move, else our alpha/beta implementation is simply
: wrong.

Of course, if you know that your alpha-beta routine is correct, and
you know that the origin of the fail low is for example a hash
table score with a greater search depth (see below) than you
can ignore this.
But if you do not know that, the origin of the fail low after
the fail high can also be a bug or a problem in your alpha/beta routine.
The researched fail high move can for example fail low because (due
to for example incorrect hashing) additional extensions are used
in the research, so that the program now sees that the fail high
move instead of winning material, now loses material (again, this
would not happen in a correct alpha-beta routine).
The problem might also be a hash fault: the move failed high because of a
wrong hash table entry (a different position with the same hash key).
If the entry gets overwritten before the research, the research might
fail low.
I wanted to make sure that the only origin for research errors (or
warnings as you would say) were zugzwang cases so that I can detect
these.
(As in GNU-chess, I only allow null-moves when not in the principle variation.
So in cases of zugzwang a move might fail high/low, but when researched
it might fail low/high because the program is now in the pv and the null-move
is not allowed any more)

: >
: >(2)
: >I only wanted to stress that if you do use search history dependent
: >extensions, you should include the history dependency in the hash key to
: >avoid hash table inconsistencies. Of course, every chess program uses
: >search history dependent extensions.

: This is good and bad. Including the history will reduce the number of
: hash hits you get, since you now differentiate identical chess positions
: by the sequence of moves leading to them. This is exactly why the
: transposition table is useful in the first place.

The search history determines what we are going to do in that position.
If the number of threat extensions is for example search history dependent
(we only allow two threat extensions in a line of play for example),
we can reach the same chess position in different ways: after one
threat extension (so we will only allow one more when searching that
position) or after two threat extensions (we will not allow them any more).
So these for humans identical chess positions are not identical in the program:
when searched, they will give different scores and should have a
different hash key (not by including the moves that lead to that position into
the key, but by including the number of threat extensions that we
are going to allow).

: >
: >(3)
: >If you use the hash table score if the current search depth is less than
: >the stored search depth, you assume that there is a relation between the
: >scores at depth (d + n) and the scores at depth (d) (that the scores
: >at depth (d + n) always lie within the search window for depth (d)
: >for example). As no such relation exists (then we would not need
: >iterative deepening) you should only use the score if the stored depth
: >is equal to the requested depth, or make sure that you research the
: >move at at least the stored depth. Consider the following example:
: >
: >Assume the position occurs in the hash table, the stored depth is
: >greater than the requested depth, the score is a lower bound and
: >greater than beta. Then we return the stored score.
: >Here we might have a problem, because the score is based on a larger
: >search depth. It could be a mate score for example, which might
: >not have been discovered when we would search the position at
: >the requested depth.
: >In the iterative deepening we now have a fail high, so we research.
: >If the hash table entry has been overwritten, we now search the position
: >at the requested depth and we might not find mate, giving a fail low..

: My response still is, "so what?" If a master tells you that when playing
: the xxx variation of the King's gambit as black, you never play the move
: "yyy" because it loses. I would take his word, even though I don't
: understand exactly why. This is what the table is doing. It is saying
: that this move is bad, but we currently aren't searching deep enough to
: understand why... given enough time we can. How can avoiding such a
: move be bad?

Again, I wanted to eliminate research errors that were due to this.

: >
: >(4)
: >I have had the same problem with draw scores in the hash tables, so
: >I do not store them.
: >
: >Finally I want to remark that the above solutions to the alpha-beta
: >inconsistencies eliminated all research errors in ZZZZZZ with no
: >performance loss (it solved even more win at chess positions,
: >because moves that should have failed high, but were 'incorrect
: >fail lows', now did fail high..)
: >

: I have run exactly the same tests. We have solved almost all of the
: win at chess positions, and when I was working on this "problem" and
: tried various solutions, all they did was slow the search down. They
: NEVER made it find better moves at shallower depths, better moves at
: equal depths, period. If you are playing better after "fixing" a
: problem that really isn't "broke" I would strongly suspect that either
: your search has one or more bugs, or your hash table implementation has
: a problem. If you have a position or positions to try, let me know.

As every chess programmer knows: methods that 'work great' or 'make
no difference' in other chess programs usually 'work bad' or 'make
a big difference' in your chess program, because so much depends on
all the tricks used by the program. (The history heuristic does
nothing in ZZZZZZ, and discounting immediate recaptures in the search
depth makes it run twice as slow. No doubt they work great in other programs).

Gijsbert Wiesenekker
ZZZZZZ author.

Robert Hyatt

unread,

Feb 25, 1994, 3:45:34 PM2/25/94

to

OK. Maybe we ought to separate the two topics: (a) what to do about fail
high/research/fail low when debugging your search and (b) what to do about
fail high/research/fail low on a "production" program.

For (a) anything goes... it is hard to debug. Try debugging on a C90 where
we can search 100,000,000 nodes... tough to print out and follow through by
hand. However, for (b) we don't worry about it after we have carefully
exercised the search for hundreds or thousands of test positions.

BTW, null moves work EVERYWHERE. In Cray Blitz, when backing up a score,
procedure backup() simply asks: "are we backing up a score to a node
which just tried a null move?" If the answer is yes, we DON'T back up
the score, just return up the tree one level and try a real move since
the null move "can't" be best. This simply leaves the previous score
(alpha or beta) whatever it was. Simple and works well. To show how well
the null move works, I'm going to include two short searches, one with
the null move (the first one) and one with the null move disabled (the
second one.) The search will be to a fixed depth so that time and node
count comparisons are valid.

First, the null move search:

This is a position (move 22) from the Mephisto Genius 2.0 - Cray Blitz
game I'm going to post this afternoon. Depth is set to 7 plies to get a
reasonable search time. I didn't do anything to the hash table which
means it's going to be way too small. However, it doesn't matter for
this test.

While I'm waiting, we use the "recursive" implementation of null move
which means that multiple null moves can be tried in the same variation,
just not back-to-back which would degrade the search to nothing.

OK, machine is slow, but here is the search USING null move:

depth time eval variation (s 1)
14:25:29 1 0:00 0.227 Bf5 ...
14:25:29 1-> 0:00 0.227 Bf5 ...
14:25:29 2 0:00 0.227 Bf5 Qd4
14:25:29 2a 0:00 ++0 a5
14:25:29 2 0:00 0.251 a5 Ra1 axb4 axb4
14:25:29 2-> 0:00 0.251 a5 Ra1 axb4 axb4
14:25:29 3 0:00 0.251 a5 Ra1 axb4 axb4
14:25:30 3-> 0:00 0.251 a5 Ra1 axb4 axb4
14:25:30 4 0:00 0.351 a5 Rb1 axb4 axb4
14:25:30 4-> 0:00 0.351 a5 Rb1 axb4 axb4
14:25:34 5 0:01 0.395 a5 Bc3 axb4 axb4 Bf5 Qd4
14:25:36 5-> 0:02 0.395 a5 Bc3 axb4 axb4 Bf5 Qd4
14:25:45 6 0:05 0.430 a5 Bc3 Nd5 Rc1 f6 f4 fxe5 fxe5 axb4
axb4
14:25:55 6-> 0:07 0.430 a5 Bc3 Nd5 Rc1 f6 f4 fxe5 fxe5 axb4
axb4
14:28:09 7 0:38 0.520 a5 Ba1 axb4 axb4 Ra7 Bc3 Ra2 Rc1
14:28:58 7-> 0:52 0.520 a5 Ba1 axb4 axb4 Ra7 Bc3 Ra2 Rc1
time: 0:52 0:01 0.3p nodes: 1537023 h 20% 97% 94% 29038 nps

Here is the same search NOT USING null move:

depth time eval variation (s 1)
14:32:37 1 0:00 0.227 Bf5 ...
14:32:37 1-> 0:00 0.227 Bf5 ...
14:32:37 2 0:00 0.227 Bf5 Qd4
14:32:37 2a 0:00 ++0 a5
14:32:37 2 0:00 0.251 a5 Ra1 axb4 axb4
14:32:37 2-> 0:00 0.251 a5 Ra1 axb4 axb4
14:32:37 3 0:00 0.251 a5 Ra1 axb4 axb4
14:32:37 3-> 0:00 0.251 a5 Ra1 axb4 axb4
14:32:38 4 0:00 0.351 a5 Rb1 axb4 axb4
14:32:39 4-> 0:00 0.351 a5 Rb1 axb4 axb4
14:32:42 5 0:01 0.395 a5 Bc3 axb4 axb4 Bf5 Qd4
14:32:45 5-> 0:03 0.395 a5 Bc3 axb4 axb4 Bf5 Qd4
14:32:51 6 0:07 0.430 a5 Bc3 Nd5 Rc1 f6 f4 fxe5 fxe5 axb4
axb4
14:33:04 6-> 0:18 0.430 a5 Bc3 Nd5 Rc1 f6 f4 fxe5 fxe5 axb4
axb4
14:34:10 7 0:54 0.542 a5 Ba1 Nf5 bxa5 Nxd6 Rxd6 Qa2 Bd4 Qxa3
14:36:36 7-> 1:55 0.542 a5 Ba1 Nf5 bxa5 Nxd6 Rxd6 Qa2 Bd4 Qxa3
time: 1:55 0:02 0.5p nodes: 3608883 h 17% 97% 95% 31302 nps

Discussion: First, null move took 52 seconds to search the position, 115 seconds
without. Roughly 2-1 time advantage for null move. Nodes: 1.5M with null move
and 3.6 without. Again, the expected 2-1 (actually slightly more). Nodes
per second went up slightly without null move as well if you really want a big
number here.

Note that the 7-ply variation and score are different when you use null-move
and when you don't. a really "nice" feature, but for a speedup of 2x, we
tolerate it.

7-ply is minimal for recursive null move, as we hardly see any difference between
recursive null-move and non-recursive null-move at depth 5 or 6 (all I can do
on my sun sparc.) On the C90 (above) 8 or 9 would have probably even been more
dramatic, and with the full machine (16 cpus) and 10 -11 plies, it is even
better.

>
>: >
>: >(2)
>: >I only wanted to stress that if you do use search history dependent
>: >extensions, you should include the history dependency in the hash key to
>: >avoid hash table inconsistencies. Of course, every chess program uses
>: >search history dependent extensions.
>
>: This is good and bad. Including the history will reduce the number of
>: hash hits you get, since you now differentiate identical chess positions
>: by the sequence of moves leading to them. This is exactly why the
>: transposition table is useful in the first place.
>
>The search history determines what we are going to do in that position.
>If the number of threat extensions is for example search history dependent
>(we only allow two threat extensions in a line of play for example),
>we can reach the same chess position in different ways: after one
>threat extension (so we will only allow one more when searching that
>position) or after two threat extensions (we will not allow them any more).
>So these for humans identical chess positions are not identical in the program:
>when searched, they will give different scores and should have a
>different hash key (not by including the moves that lead to that position into
>the key, but by including the number of threat extensions that we
>are going to allow).

As I said, you can do this. However, we prefer to accept the inaccuracy
that our method causes, since it doesn't hurt, and often helps if you get
a "glimpse" of the future that you really shouldn't be seeing.

If I said this, I "lied" as we DO store draw scores in the hash table,
and, since our draws are treated just like mates (ie draw in "n" plies)
we occasionally get some really funny draw scores (draw in 90 plies when
the program can't reach past 64 plies due to table dimensions....)

>: >
>: >Finally I want to remark that the above solutions to the alpha-beta
>: >inconsistencies eliminated all research errors in ZZZZZZ with no
>: >performance loss (it solved even more win at chess positions,
>: >because moves that should have failed high, but were 'incorrect
>: >fail lows', now did fail high..)
>: >
>
>: I have run exactly the same tests. We have solved almost all of the
>: win at chess positions, and when I was working on this "problem" and
>: tried various solutions, all they did was slow the search down. They
>: NEVER made it find better moves at shallower depths, better moves at
>: equal depths, period. If you are playing better after "fixing" a
>: problem that really isn't "broke" I would strongly suspect that either
>: your search has one or more bugs, or your hash table implementation has
>: a problem. If you have a position or positions to try, let me know.
>
>As every chess programmer knows: methods that 'work great' or 'make
>no difference' in other chess programs usually 'work bad' or 'make
>a big difference' in your chess program, because so much depends on
>all the tricks used by the program. (The history heuristic does
>nothing in ZZZZZZ, and discounting immediate recaptures in the search
>depth makes it run twice as slow. No doubt they work great in other programs).
>

I still don't see how you can "further qualify" hash table entries so that
you don't get confused and not lose some speed. I suspect that you haven't
tried the right positions yet. In any case, it appears to me that when you
start folding in "other things" into the hash table (we already have to deal
with castling and enpassant) positions become that much more unique and less
likely to help along other branches since the positions begin to depend as
much on the branch itself as they depend on the placement of the pieces.

James Alan Riechel

unread,

Feb 26, 1994, 2:50:34 PM2/26/94

to

Just to clarify:

(a) by "fail high" do you mean when a new (greater) alpha cutoff
is found, thereby invalidating the current one?

(b) likewise, by "fail low" do you mean when a new (lesser) beta
cutoff is found, thereby invalidating the current one?

Thank you,

---------------------------- -James Riechel (jrie...@cc.gatech.edu)
| *k: ::: ::: ::: |
| ::: ::: ::: ::: |
| *p ::: ::: :::*p ::: |
| ::: ::: ::: *p: |
| ::: ::: ::: P ::: |
| ::: ::: ::: P ::: P |
| ::: ::: ::: P :K: |
| ::: ::: ::: ::: | White to play and draw!
----------------------------

Robert Hyatt

unread,

Feb 26, 1994, 10:11:13 PM2/26/94

to

The general terms "fail hi" and "fail lo" usually refer to the root
of the tree. On a fail hi condition, the current branch is better than
or equal to the current beta value. Most of us now use the zero-width
window where beta is set to root_score+1 after the first move is searched
and a value for it is determined. The fail hi here is not very exciting
yet as we may be talking about only a few "millipawns." The larger the
gap between the current root score (alpha) and beta, the more "interesting"
the fail high condition becomes.

Fail lo can only happen on the first move at the root. This means that
the first branch has a score lower than whatever alpha (lower bound) value
was chosen to start the search. In Cray Blitz, when starting iteration
"n" we use the score from iteration n-1 and then add +/- 250 (1/4 of a pawn
for a total window of 1/2 pawn centered on the previous score.) Obviously,
when C-B "fails low" we take notice because the first move is going to be
at least 1/4 pawn worse than we expected from the previous search. If the
first move fails high, we are also more interested as this means that the
first move is at least 1/4 pawn higher than we expected.

Hope that this helps....

Bob