Testing Chess Programs

Jan Eric Larsson

unread,

Feb 9, 1996, 3:00:00 AM2/9/96

to

Hi!

There has been some discussion about this lately, and I found it one
of the most important practical issues in building a strong program:

How does one test and verify a chess program?

Of course, some bugs are easy to spot. If a position is obviously
scored wrong, mate when there is none, say, and this information is
updated to the root, the bug is easy to see. It may not be easy to
understand where and why it happens, but then, that's chess
programming for you :-)

But what about all those other bugs we never see? What about when a
variation is scored completely wrong, but only in situations where it
won't reach the root node or main variation. This may simply look like
if the program is not searching deeply enough, does not have the
extension for this situation, or maybe as a misevaluation.

Other bugs may be, for example, expanding a move several times from
the same node, (first as a hash move and then then during the rest of
the moves phase, for example). This will simply look like less speed
or less efficient tree pruning.

So I'm asking for all kinds of experiences and practical methods and
tricks from the bunch of chess programmer out there. Note that I'm
not demanding theoretical methods as much as sound methodology tricks.

I'll contribute two myself:

1. Robert Hyatt once wrote that he actually prints out rather large
trees and checks them by hand, (eh, eye), which certainly will
give away lots of bugs that would otherwise go unnoticed. Whether
this is a trick or not can be discussed, but it is important to
note that this can be done even if the trees searched contains
tens of thousands of nodes.

2. In one of my earlier chess programs, (Citrus C; basically never
completely finished), I implemented move generation and threat
detection in both bitboards and mailbox versions, and I had a
setup where one representation was used to check the other.
I discovered lots of bugs concerning capture promotions to
corner squares, en passant moves on the capture search horizon,
etc. Some work, of course, but a search that I really thought
was fairly bug-free certainly wasn't.

Any experiences, comments, or suggestions?

Jan Eric Larsson
Knowledge Systems Laboratory, Stanford University, Room 248, M/C 9020
Gates Computer Science Building 2A, Stanford, CA 94305, USA
Phone: +1 415 725 3859, Fax: +1 415 725 5850, E-mail: Lar...@KSL.Stanford.Edu

Robert Hyatt

unread,

Feb 9, 1996, 3:00:00 AM2/9/96

to

In article <4fginq$5...@nntp.Stanford.EDU>,
Jan Eric Larsson <lar...@hpp.Stanford.EDU> wrote:
-->
-->Hi!
-->
-->There has been some discussion about this lately, and I found it one
-->of the most important practical issues in building a strong program:
-->
--> How does one test and verify a chess program?
-->
-->Of course, some bugs are easy to spot. If a position is obviously
-->scored wrong, mate when there is none, say, and this information is
-->updated to the root, the bug is easy to see. It may not be easy to
-->understand where and why it happens, but then, that's chess
-->programming for you :-)
-->
-->But what about all those other bugs we never see? What about when a
-->variation is scored completely wrong, but only in situations where it
-->won't reach the root node or main variation. This may simply look like
-->if the program is not searching deeply enough, does not have the
-->extension for this situation, or maybe as a misevaluation.
-->
-->Other bugs may be, for example, expanding a move several times from
-->the same node, (first as a hash move and then then during the rest of
-->the moves phase, for example). This will simply look like less speed
-->or less efficient tree pruning.
-->
-->So I'm asking for all kinds of experiences and practical methods and
-->tricks from the bunch of chess programmer out there. Note that I'm
-->not demanding theoretical methods as much as sound methodology tricks.
-->
-->I'll contribute two myself:
-->
-->1. Robert Hyatt once wrote that he actually prints out rather large
--> trees and checks them by hand, (eh, eye), which certainly will
--> give away lots of bugs that would otherwise go unnoticed. Whether
--> this is a trick or not can be discussed, but it is important to
--> note that this can be done even if the trees searched contains
--> tens of thousands of nodes.
-->
-->2. In one of my earlier chess programs, (Citrus C; basically never
--> completely finished), I implemented move generation and threat
--> detection in both bitboards and mailbox versions, and I had a
--> setup where one representation was used to check the other.
--> I discovered lots of bugs concerning capture promotions to
--> corner squares, en passant moves on the capture search horizon,
--> etc. Some work, of course, but a search that I really thought
--> was fairly bug-free certainly wasn't.
-->
-->Any experiences, comments, or suggestions?
-->
-->
-->Jan Eric Larsson
-->Knowledge Systems Laboratory, Stanford University, Room 248, M/C 9020
-->Gates Computer Science Building 2A, Stanford, CA 94305, USA
-->Phone: +1 415 725 3859, Fax: +1 415 725 5850, E-mail: Lar...@KSL.Stanford.Edu
-->

One important part of crafty is the "DEBUG" switch. This will slow it
down by a factor of 3x, but carefully validates each bitboard, hash
keys, piece counts, etc. Other variations of this switch will validate
the evaluation hashing by retrieving the hash score, computing it anyway,
and comparing. Several other things are also done. I do this every so
often to make sure obvious bugs haven't gotten in.

Debugging the hash table is nearly impossible, except on rare occasions
because of the size of the trees. Two techniques I use regularly are
(a) dump a tree and carefully go through it, looking for moves that are
searched twice, not searched at all, etc. and (b) "read" the code by
studying a module carefully to see if anything "jumps" out.

Lots of witchcraft and voodoo to make one of these things work well...

:)

--
Robert Hyatt Computer and Information Sciences
hy...@cis.uab.edu University of Alabama at Birmingham
(205) 934-2213 115A Campbell Hall, UAB Station
(205) 934-5473 FAX Birmingham, AL 35294-1170

Thomas Likens

unread,

Feb 10, 1996, 3:00:00 AM2/10/96

to

Excellent, another interesting thread!!

Another technique is to compare the attack.from tables against the attack.to tables.
Of course this is only possible if your program stores both sets of tables :)
If you incremetally update the piece attack tables it is possible to also generate
the attack tables from scratch and compare the two results. If a mis-match occurs
it is often useful to log the offending position to either the screen or a debug log file.
If you use the rotated bitboard idea you can compare the queen, rook, bishop attacks
generated with the rotated bitboards against the more traditional bitboard methods and flag
and error if they do not agree. Another point is that the traditional debugging ideas such
as input parameter validation, unit testing, using assertions, walking through your code with
a good source code debugger are especially relevant when programming computer chess since a
bug that rears its ugly head in a 7-ply deep search is especially nasty to track
down. The technique of double redundancy-checking can be used when hashing in an analogous
fashion. Incremental hashing scores can be compared against hashing scores generated from
scratch. Of course adding debug scaffolding usually slows the program's execution time
down to a crawl so it pays to make the debug checking optional. I use a number of compile
time defines, but command-line switches could also be used. My experience has been that
the time invested in debug scaffolding has paid off handsomely in the long run. It is
very comforting at two A.M after your program crashes hideously to be able to turn on
the appropriate debug switch and watch the program flag the offending code. It has saved
me countless hours.

--Tom Likens

Patrick Leung

unread,

Feb 11, 1996, 3:00:00 AM2/11/96

to

There's a branch in Computer Science where we prove a program's
correctness. For a simple, short program, writing the proof is
probably not too difficult, but for any long, complicated program
such a proof is not practical to write.

Here are some of the other things you can do to verify a program's
correctness:

1. testing, testing, testing.
Simply run your computer program. Test it yourself or have other
people test it with as many possbilities as possible.
More importantly, test the boundary points if any exist.
Now, make sure that the output is correct.
Use sanity checks where they may apply.
However, for any program like chess, where there are so many
possiblities, you can not possibily test every single case.

2. If you're the programmer, you can read, and reread your code.
It's pretty easy to slip if you're not careful.
You may also want to run your program with a debugger.

3. If possible, take the output of another a computer program
which does exactly the same thing by
someone else which you know for certain is correct, and compare
that output to your own.

4. Right now, I think there are a few computer software out there
in development which will check help programmers check their
program's correctness automaticly.
The ADA compiler does check for syntatic and semantic errors.

5. Verify your algorithms with books, or other resources.

6. Any others you can think of not listed above.

Some of the above suggestions may not apply the chess programs specificly;
I'm being a little more general here.

I don't think there's really no good way to verify a [complex] computer
program's correctness.

The best thing to do is still to write "perfectly correct" programs
from the start, and continue that way; so, you won't have to debug
anything later. Of course, programmers are still humans, and do make
mistakes.

Patrick

Steven J. Edwards

unread,

Feb 11, 1996, 3:00:00 AM2/11/96

to

It is very important to test the basic move generation, move
execution, move retraction, and check detection before advancing to
other parts of a program.

One way of doing this is to implement an enumeration routine. This
routine recursively calls itself to enumerate all possible and legal
move pathways to a specified depth. The idea is to compare the
reported sums with those of a known good program.

For example, some enumerations of the starting position are:

ply 1: 20
ply 2: 400
ply 3: 8902
ply 4: 197281
ply 5: 4865609
ply 6: 119060324

Once this routine is in place and the program checks out for
enumerations of the initial array, a more thorough test can be made by
performing enumerations of all the positions in a test suite and
checking them against a known good program. Until recently, there was
no easily available "known good" program that could provide a
reference. Fortunately, this has changed and the freely available
program Crafty does have a suite enumerator that can generate such
counts. The command of interest is:

epdenum <depth-limit> <EPD-input-file> <EPD-output-file>

A dozen or so standard EPD test suite text files can be found at the
chess.onenet.net ftp site in the pub/chess/Tests directory.

Running the Crafty command:

epdenum 1 WAC.epd WACd1.epd

produces:

Enumeration to depth 1 totals 11962

And running the Crafty command:

epdenum 2 WAC.epd WACd2.epd

produces:

Enumeration to depth 2 totals 393416

The output EPD files contain, for each position, the enumeration for
that position. So, if your program can output EPD for enumeration
activity, it is easy to find disreprencies by simple file comparison.

-- Steven (s...@mv.mv.com)

Vincent Diepeveen

unread,

Feb 12, 1996, 3:00:00 AM2/12/96

to

In <4fkd0j$h...@azure.acsu.buffalo.edu> ple...@cs.buffalo.edu (Patrick Leung) writes:

>There's a branch in Computer Science where we prove a program's
>correctness. For a simple, short program, writing the proof is
>probably not too difficult, but for any long, complicated program
>such a proof is not practical to write.

>Here are some of the other things you can do to verify a program's
>correctness:

>1. testing, testing, testing.
> Simply run your computer program. Test it yourself or have other
> people test it with as many possbilities as possible.
> More importantly, test the boundary points if any exist.
> Now, make sure that the output is correct.
> Use sanity checks where they may apply.
> However, for any program like chess, where there are so many
> possiblities, you can not possibily test every single case.

I can make the question even more detailed:

We know that a certain fault in the program causes to give in the
root the wrong evaluation. How do you find the position where it misevaluated?

This would be of great help to all chessprogrammers!
Debugging is possible as soon as you know WHERE the fault has been made.

>6. Any others you can think of not listed above.

Cheap.

Vincent Diepeveen
vdie...@cs.ruu.nl
--
+--------------------------------------+
|| email : vdie...@cs.ruu.nl ||
|| Vincent Diepeveen ||
+======================================+

Jon Dart

unread,

Feb 12, 1996, 3:00:00 AM2/12/96

to

I implemented a console program that added ECO codes to PGN files.
I never released it because it wasn't all that reliable (in terms
of accurately coding games), but as a nice side effect it was
a very effective test of my chess engine (which the ECO program
used).

For each game in the PGN file, and for each position in
that game, the program generated all legal moves and tested that
the move in the PGN file matched one of the generated moves.
There's probably 100,000 games or more available in PGN format
on the Net. In running some of these game files through the
program, I found a number of bugs in move generation and related
operations. Of course, when you find an error, you have to check
to see if the PGN file is wrong: but most of the ones I tried were
ok.

This doesn't test subtle problems like evaluation bugs, but it is
a good way to make sure your chess engine is basically solid.

--Jon

--
Jon Dart
jd...@netcom.com
--
"Put the boogie beat in it." -- John Lee Hooker

Vincent Diepeveen

unread,

Feb 12, 1996, 3:00:00 AM2/12/96

to

In <4fginq$5...@nntp.Stanford.EDU> lar...@hpp.Stanford.EDU (Jan Eric Larsson) writes:

>But what about all those other bugs we never see?

>I'll contribute two myself:
>

>1. Robert Hyatt once wrote that he actually prints out rather large

> trees and checks them by hand, (eh, eye), which certainly will

> give away lots of bugs that would otherwise go unnoticed. Whether

> this is a trick or not can be discussed, but it is important to

> note that this can be done even if the trees searched contains

> tens of thousands of nodes.
>

>2. In one of my earlier chess programs, (Citrus C; basically never

> completely finished), I implemented move generation and threat

> detection in both bitboards and mailbox versions, and I had a

> setup where one representation was used to check the other.

> I discovered lots of bugs concerning capture promotions to

> corner squares, en passant moves on the capture search horizon,

> etc. Some work, of course, but a search that I really thought

> was fairly bug-free certainly wasn't.

I have a combination and a third etc.
For bugs that cause my program to get hung, or for incremental evaluation
bugs i use 2)

However this still does not give you why it doesn't play the right move,
even if it has the knowledge for a position.

for that i use 1). I print out a whole tree (tens of thousands),
with a lot of different scores, including alfa,beta,evaluation,depth,previous
variation, backed up scores.

As soon as is detect something is wrong, i try to find out why it doesn't
want to search that node. Usually there is a stupid bug in evaluation that
causes it to get a large score.

The problem is of course that you cannot look after millions of evaluations.
For that reason i never know what Diep is doing at great depths.
So i try to get that position to the root.

A big problem are extensions: the first few ply you can easily debug them,
but what happens at great depth i never know.

I have several extensions, and that gives a huge problem.
For example: the first few ply they seem to work fine, but when
getting > 10 ply they begin to overlap; 1 extensions takes care that next
ply the next extension works etc.

For that reason i'm also having a third debug option: i print out every second
the current variation it is searching. I also print out things like the
average length of a variation etc.

This night i tested Diep in the beginning position.
When i left this morning, it was busy searching 15 ply. Although the search
depth is limited to 15 ply, the average length of a variation was at that
moment 21 ply! I cannot explain that... ...i'm even using nullmove.
It is NOT a counting fault.

This is a big problem in Diep: when searching the PV the average length of
a variation is simply too big. This 21 ply average length will go down
when it starts to search other variation than the PV. the average length
will then become something between 13.5 and 14.5 ply, because of nullmove.

Now i consider that there is a huge bug in my extensions: 21 ply average
over all searched variations is simply too huge. Although i know
that it MUST be the extensions, i simply do not know how to improve them,
and they seem to work fine with limited time controls (<= 15 minutes).

Another thing i simply don't get is cutoff's.

For example when searching 14 ply yesterday in a previous test,
the PV was 1.d4 Nf6 ...

After calculating the PV it went on searching the other 19 moves.
When searching 1.c4 i saw something very very strange.

1.c4,Nc6 Nf3 couldn't get a cutoff. It tried move after move,
it tried c4,Nc6 Nf3,d5 and about all moves for black.

However, after it played c4,Nc6 Nf3,h5 i figured out that it got a cutoff.
WHY does c4,nc6 nf3,h5 give a cutoff, and
why does c4,nc6 nf3,nf6 or e5 NOT give a cutoff.

Problem is of course that this cutoff move h5 is stored in hashtable.
Say that is uses after Nf3,Nc6 c4 the hashtable, and sees h5 to be
the best move. This can't be too well for my branching factor; probably
it is gonna think for ages before seeing that h5 isn't giving a cutoff.

How can it store such bad moves in hashtable?
Why do these bad moves give a cutoff?

I don't know.

For 1 case i did found it out: i gave isolated pawns
a bonus when being further at the board. So a good move

>Any experiences, comments, or suggestions?
>
>

>Jan Eric Larsson

>Knowledge Systems Laboratory, Stanford University, Room 248, M/C 9020

>Gates Computer Science Building 2A, Stanford, CA 94305, USA

>Phone: +1 415 725 3859, Fax: +1 415 725 5850, E-mail: Lar...@KSL.Stanford.Edu
>

Vincent Diepeveen