
Questions about Endgame tablebases


Alexander Nitschke

Mar 19, 1998

I am looking for Ken Thompson's endgame CDs. Does anyone know where to
get them?

I am also interested in some links to the theory of endgame tablebases.
For example: the five-piece endgame KQR against KQ has
64*64*64*64*64 = 1,073,741,824 possible positions, if you count all
positions where more than one piece stands on the same square and all
illegal positions too. You have to multiply this by 2 to account for
which side is to move. The number of positions in a database of course
need not be that large, because positions can be mirrored on the x-axis,
the y-axis, a diagonal, or combinations of these. The number of
positions can be reduced this way to 10*64*64*64*64 = 167,772,160
positions, times 2. It is reasonable to assume one byte per position in
the database to indicate how many moves are needed to mate, or whether
the position is a draw. So you would need about that many bytes to
cover all positions in a database. And the tablebases of Steven J.
Edwards, which are available at ftp://ftp.cis.uab.edu/pub/hyatt/TB/,
have exactly this size.
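
To make the counting concrete, here is a minimal C sketch of one
possible raw indexing scheme: the white king folded by symmetry into
the 10 squares of the a1-d1-d4 triangle, the other four pieces keeping
full 0..63 square numbers, one byte per entry. This is only an
illustration of the size calculation, not the layout Edwards or
Thompson actually use.

#include <stdio.h>

/* Hypothetical raw index for KQR vs. KQ (one side to move): white king
   reduced by symmetry to the a1-d1-d4 triangle (0..9), the remaining
   four pieces on full squares 0..63, one byte per entry. */
static unsigned long tb_index(int wk_tri, int wq, int wr, int bk, int bq)
{
    return ((((unsigned long) wk_tri * 64 + wq) * 64 + wr) * 64 + bk) * 64 + bq;
}

int main(void)
{
    /* highest index + 1 = 10 * 64 * 64 * 64 * 64 = 167,772,160 entries */
    printf("%lu bytes per side to move\n", tb_index(9, 63, 63, 63, 63) + 1);
    return 0;
}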

Now the endgame database on the Fritz 5 CD is much smaller: it is only
29,105,205 bytes. I would like to know what possibilities exist to
further reduce the size of these databases. Can someone help me there?

--
Alexander

bruce moreland

Mar 19, 1998

On Thu, 19 Mar 1998 11:58:30 +0100, Alexander Nitschke
<alexander...@ww.tu-berlin.de> wrote:

>Now the endgame database on the Fritz 5 CD is much smaller: it is only
>29,105,205 bytes. I would like to know what possibilities exist to
>further reduce the size of these databases. Can someone help me there?

Thompson does a few extra things, as far as I know:

1) He Huffman encodes the heck out of everything, which produces huge
space savings.

2) Since there are always two kings on the board, and they can't be
adjacent, he only considers positions where they are not adjacent. This
reduces his space requirements a little.

3) He records distance to conversion rather than distance to mate.
Distance to conversion produces a less diverse range of values, which
makes his Huffman encoding work better.

4) He doesn't distinguish between winning and drawing results *for the
weaker side to move*. For the stronger side, he records the distance
to conversion, so he has a diverse range of values, but for the weaker
side, if the game is lost he records a zero, and if it is won or drawn
he records a one, so he only records two possible values. This also
makes his Huffman encoding work better (a rough sketch of the idea
follows below).
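
Purely as an illustration of that last point, and not Thompson's actual
file format, the symbol that ends up being Huffman coded might be
chosen along these lines:

#include <stdio.h>

/* Illustration only -- not Thompson's real encoding.  The stronger
   side to move keeps its distance to conversion; the weaker side to
   move is collapsed to two symbols, which skews the symbol statistics
   and helps the Huffman coder. */
unsigned char tb_symbol(int strong_side_to_move, int lost, int distance_to_conversion)
{
    if (strong_side_to_move)
        return (unsigned char) distance_to_conversion;   /* 1..N */
    return lost ? 0 : 1;     /* weaker side: lost = 0, drawn or won = 1 */
}

int main(void)
{
    printf("%d %d %d\n", tb_symbol(1, 0, 17), tb_symbol(0, 1, 0), tb_symbol(0, 0, 0));
    return 0;
}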

bruce


Komputer Korner

Mar 19, 1998

I don't know about Ken's files; perhaps Bob Hyatt can get them for you,
along with the program that runs them. His compression is done on the
fly, which is not the same as how the others have done it.
Steven Edwards certainly has done a lot. Talk to Bruce Moreland also.

Quoting Steven:

"Steve Edwards' endgame tablebase generator is now available for Win95.

site:chess.onenet.net
dir:pub\chess\uploads\tb\tbgen.zip

README: very brief documentation file for tbgen (tablebase generator)

Revised: 1997.06.04

Comments to the author: s...@mv.mv.com (Steven J. Edwards)

This file provides a brief description of the program tbgen (tablebase
generator). The program is used to generate exhaustive, full
information chess endgame databases that provide instantly located,
perfect evaluations. The evaluations are stored in the generated
files, one byte per position, with each file containing entries for
each possible (and impossible) position. The evaluations are of the
forms "mate in N", "lose (get mated) in N", "draw", and "illegal".
Values for the number N (measured in fullmoves, not ply) range from
mate in 1 up to mate in 126 for mates, and from lose in 0 (lose in 0
means checkmated) to lose in 125 for losses. Each file is for a given
class (e.g., KBNK) and for a given side to move (e.g., White). So,
the file KQKR.tbw is for White (with the queen) to move and the
file KRKN.tbb is for Black (with the knight) to move.
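
For example, one way a probing program might map such a byte back to an
evaluation (the particular numeric assignment below is just for
illustration, not the real tbgen layout):

#include <stdio.h>

/* Assumed layout for illustration: 0 = illegal, 1 = draw,
   2..127 = mate in (value - 1), 128..253 = lose in (value - 128). */
void describe_entry(unsigned char b)
{
    if (b == 0)
        printf("illegal\n");
    else if (b == 1)
        printf("draw\n");
    else if (b <= 127)
        printf("mate in %d\n", b - 1);
    else if (b <= 253)
        printf("lose in %d\n", b - 128);
    else
        printf("unused\n");
}

int main(void)
{
    describe_entry(2);      /* mate in 1 */
    describe_entry(128);    /* lose in 0: checkmated */
    return 0;
}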

Currently, endgame classes with at least one pawn per color (e.g.,
KPKP) are not supported. It is hoped that the release of the code
will stimulate work on this and other tablebase topics.

Each run of the program produces one tablebase file pair for an
endgame class. One of the files is for WTM (White to move, suffix
".tbw") and the other is for BTM (Black to move, suffix ".tbb").
Details of invocation can be found in the source file main.c.

Tablebases produced by this program can be used by a variety of chess
applications, both commercial and research. The adventurous
experimenter is directed to the publicly distributed chess playing
program Crafty written by Robert Hyatt; the source distribution
includes the files epd.h and epd.c, which implement the functions needed
to probe the tablebases. Hint: look for the function names starting
with "EPDTB". Also, the Internet newsgroup rec.games.chess.computer
often has discussions related to tablebase design, construction, and
implementation.

To generate the program, put all the source files in a directory and
compile them with your favorite ANSI C compiler. Example:

gcc -O2 -o tbgen *.c

To generate the first tablebase, enter:

tbgen KK . . . .

Then try:

tbgen KNK . . . .
tbgen KBK . . . .
tbgen KRK . . . .
tbgen KQK . . . .

Then try:

tbgen KPK . . . .

Then try:

tbgen KBNK . . . .

See the tbgen source files for further details on various topics
including the significance of the last four command line parameters.

Some miscellaneous files

depend: list of first order tablebase dependencies
rdepend: list of fully recursive tablebase dependencies
genlist: list of one possible order of tablebase generation
namegen.c: generates TB names and other items (see source)

Have fun!

Addendum 11/01/97:

Mike Byrne (ches...@voicenet.com) finally figured out how to compile it
for Win95/NT.

All rights are maintained by Steven J. Edwards. No warranty is
implied or given - use at your own risk. Go for that 32-man database
-- (only kidding... :) .
Remove the "nospam." and substitute "com" for "spam" in the domain.
The spamming is out of control, and even just the "nospam." is no
longer working."


--
- -
Komputer Korner

The inkompetent komputer

If you see a 1 in my email address, take it out before replying.
Please do not email both me and the r.g.c.c. at the same time. I read all
the postings on r.g.c.c.
Also every statement of mine should be taken with a grain of salt. Read at
your own risk and
assume that it is only this humble komputer's opinion.

Anders Thulin

Mar 20, 1998

In article <3510FA56...@ww.tu-berlin.de>,
Alexander Nitschke <alexander...@ww.tu-berlin.de> wrote:

>I am also interested in some links to the theory of endgame tablebases.
>For example: the five-piece endgame KQR against KQ has
>64*64*64*64*64 = 1,073,741,824 possible positions,

Not quite. 64*63*62*61*60 = 914,941,440 is a better approximation,
since no two pieces can stand on the same square, and even then there
are a number of illegal positions that can be weeded out. (The second
king can't be placed next to the first king, and depending on who is
to move, can't be placed on some other squares either. And if you have
pawns, they can't be placed on rank 1 or 8. And so on.)

And, as you observe, the first factor can be reduced to 10 fairly
easily.

>positions too. You have to multiply this by 2 to account for which
>side is to move.

You could do that, but you can also store white-to-move positions
only, and search one ply deeper for black-to-move positions to get the
same answer. (Thompson uses this method: it saves space at the cost of
some extra computation.)

>Now the endgame database on the Fritz 5 CD is much smaller: it is only
>29,105,205 bytes. I would like to know what possibilities exist to
>further reduce the size of these databases. Can someone help me there?

Look over the uncompressed databases, and you'll find that there are
patterns: certain patterns of 8, 16, 32, 64, ... bytes appear
comparatively often. Extract those patterns, and use them as the basis
for Huffman-encoding the database.

To make access reasonably fast, compress the data in pages. Then you
can go directly to the page that contains the wanted byte, and decode
only that page. This requires an extra table of <real-byte-offset :
compressed-page-offset> entries -- easy to create during compression.
For extra performance, cache the decoded pages. Very useful for
CD-based databases!
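
Below is a small self-contained C illustration of that access pattern.
A trivial run-length code stands in for the real Huffman code, and the
page size and data are invented, but the offset-table lookup and the
decode-only-one-page step are the point:

#include <stdio.h>

#define PAGE 64                  /* uncompressed bytes per page (made-up size) */

/* run-length encode src[0..n-1] into dst, return compressed length */
static size_t rle_encode(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t out = 0, i = 0;
    while (i < n) {
        size_t run = 1;
        while (i + run < n && src[i + run] == src[i] && run < 255)
            run++;
        dst[out++] = (unsigned char) run;
        dst[out++] = src[i];
        i += run;
    }
    return out;
}

/* decode exactly n bytes from src into dst */
static void rle_decode(const unsigned char *src, unsigned char *dst, size_t n)
{
    size_t out = 0;
    while (out < n) {
        size_t run = *src++;
        unsigned char v = *src++;
        while (run-- > 0 && out < n)
            dst[out++] = v;
    }
}

int main(void)
{
    unsigned char table[10 * PAGE];          /* stand-in for a raw tablebase slice */
    unsigned char packed[2 * 10 * PAGE];     /* worst-case compressed stream */
    size_t page_off[10 + 1];                 /* offset of each compressed page */
    size_t i, total = 0, want = 373;
    unsigned char page[PAGE];

    for (i = 0; i < sizeof table; i++)       /* mostly one value, a few others */
        table[i] = (i % 97 == 0) ? (unsigned char) (i % 50) : 1;

    for (i = 0; i < 10; i++) {               /* compress page by page */
        page_off[i] = total;
        total += rle_encode(table + i * PAGE, PAGE, packed + total);
    }
    page_off[10] = total;

    /* random access: fetch entry 373 by decoding only the page it lives in */
    rle_decode(packed + page_off[want / PAGE], page, PAGE);
    printf("entry %lu = %d (direct read %d), %lu packed bytes for %lu raw\n",
           (unsigned long) want, page[want % PAGE], table[want],
           (unsigned long) total, (unsigned long) sizeof table);
    return 0;
}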

The Thompson databases use a fixed Huffman code table (one for all
databases) covering 8-byte patterns, plus an extra encoding level on
top of that, the details of which I don't remember.

I've experimented with a similar encoding method that started with
patterns of 4096-byte length, encoded those that occurred often
enough, and left the remainder for shorter-length encodings (i.e. a
4096-byte section was encoded as two 2048-byte patterns, or one 2048
and two 1024-byte patterns, or ... well, you get the idea). Then it
continued with 2048-byte patterns, and so on all the way down to
single-byte patterns, which are always encoded (if you get that far).
This was done with both the Thompson and the Edwards databases, both
of which have patterns of length 8 and 8x8. Other database
representations may need other pattern lengths.

This method gives good compression. In some files (like KNNN - K,
unless I misremember) there are huge sections of illegal positions or
positions with the same 'score' that almost disappear in this way.
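
The effect on those uniform regions is easy to demonstrate. The sketch
below is a heavily simplified, self-contained C version of the
splitting idea: it only treats an all-equal block as an encodable
pattern (the real scheme uses frequency statistics over several
pattern lengths), but it already shows why the long constant sections
nearly disappear:

#include <stdio.h>
#include <string.h>

/* Simplified stand-in for the multi-length scheme: if a whole block can
   be covered by one pattern (here: all bytes equal), emit one symbol;
   otherwise split the block in half and recurse, down to single bytes.
   Returns the number of symbols that would have to be coded. */
static size_t count_symbols(const unsigned char *p, size_t len)
{
    size_t i;
    for (i = 1; i < len; i++)
        if (p[i] != p[0])
            break;
    if (i == len)
        return 1;                        /* whole block collapses to one symbol */
    return count_symbols(p, len / 2)
         + count_symbols(p + len / 2, len - len / 2);
}

int main(void)
{
    static unsigned char tb[1 << 16];    /* stand-in for an uncompressed file */
    memset(tb, 1, sizeof tb);            /* long uniform "same score" regions */
    tb[12345] = 7;                       /* a few scattered different scores */
    tb[40000] = 42;
    printf("%lu raw bytes -> %lu symbols to code\n",
           (unsigned long) sizeof tb, (unsigned long) count_symbols(tb, sizeof tb));
    return 0;
}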

The only problem is that you have to store some pretty big code tables
with each file. It's probably possible to select just 4 or 5 pattern
lengths (two of which are 4096 and 1), collect statistics over *all*
databases, and then build a static Huffman code table that still gives
reasonable compression. I haven't experimented further with this idea,
though, and probably won't.

--
Anders Thulin Anders...@lejonet.se 013 - 23 55 32
Telia ProSoft AB, Teknikringen 6, S-583 30 Linkoping, Sweden
