
Drastically reducing endgame TableBase size


Zachary Brown

Sep 18, 1997

As I understand it, endgame tablebases work essentially (I'm not trying to
give an exact description) like this: given a position, you hash to a
point in the tablebase file, and the byte at that location tells how many
moves to win, loss, or draw.

My question is, why bother hashing to unique locations all the time? If
there are twenty positions that have mate in 8, why don't they all hash
to the same location?

Another idea would be to train a backprop neural net on the entire
tablebase dataset, until it could unerringly return the number of moves to
mate in any position of the tablebase. The trained network, released as a
c-library, would allow programs to have endgame solutions compiled in,
taking up a *vastly* reduced space. (We could hope for as little as 20K or
so)

Has anyone looked into those two ideas: 1) finding better hash functions,
and 2) training neural nets? I don't see any reason for the current
bloated size of tablebases. At least not until we know why those
solutions can't work.

Zack

Robert Hyatt

Sep 21, 1997

Zachary Brown (zbr...@lynx.dac.neu.edu) wrote:
: As I understand it, endgame tablebases work essentially (I'm not trying to
: give an exact description) like this: given a position, you hash to a
: point in the tablebase file, and the byte at that location tells how many
: moves to win, loss, or draw.

: My question is, why bother hashing to unique locations all the time? If
: there are twenty positions that have mate in 8, why don't they all hash
: to the same location?

First, we don't "hash". For a 4-piece ending, we form a database address
from four 6-bit fields, one field holding the square number (0-63) the
corresponding piece stands on. (I am omitting the compression possible by
constraining one piece to a small fraction of the total board, to keep the
discussion from getting too complicated here.) This gives us a 24-bit
address. What kind of function could you produce that would map *all*
mates-in-30 to the same disk address? I can't think of any way, because I
don't know anything about a position with 4 pieces except the locations of
the four pieces.
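This addressing scheme can be sketched in a few lines of Python (an illustration, not code from any real engine): each of the four pieces contributes a 6-bit square number, and the fields are concatenated into a 24-bit index.

```python
# Build the 24-bit table index for a 4-piece ending by concatenating
# four 6-bit square numbers (0-63), one per piece, in a fixed order.
def table_index(squares):
    assert len(squares) == 4 and all(0 <= s < 64 for s in squares)
    index = 0
    for s in squares:
        index = (index << 6) | s   # append one 6-bit field
    return index                   # ranges over 0 .. 2**24 - 1

# Example: squares for the two kings and two other pieces (made-up values).
idx = table_index([4, 7, 60, 0])
```

The point of the scheme is that the index is a bijection onto the table, so no position information needs to be stored in the entries themselves.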

: Another idea would be to train a backprop neural net on the entire
: tablebase dataset, until it could unerringly return the number of moves to
: mate in any position of the tablebase. The trained network, released as a
: c-library, would allow programs to have endgame solutions compiled in,
: taking up a *vastly* reduced space. (We could hope for as little as 20K or
: so)

Neural nets don't really do this. They can be trained to recognize
certain winning characteristics or certain winning positions. But there
is no guarantee that they will recognize *all* winning positions, or even
*most* of them. If it gets the wrong answer, it will be disastrous.

: Has anyone looked into those two ideas: 1) finding better hash functions,

brucemo

Sep 21, 1997

Zachary Brown wrote:

> Has anyone looked into those two ideas: 1) finding better hash functions,
> and 2) training neural nets? I don't see any reason for the current
> bloated size of tablebases. At least not until we know why those
> solutions can't work.

If you can figure out a way to compress ten megabytes of pretty diverse
data into 20K, I suggest you patent it.

bruce

Steven J. Edwards

Sep 22, 1997

zbr...@lynx.dac.neu.edu (Zachary Brown) writes:

>As I understand it, endgame tablebases work essentially (I'm not trying to
>give an exact description) like this: given a position, you hash to a
>point in the tablebase file, and the byte at that location tells how many
>moves to win, loss, or draw.

>My question is, why bother hashing to unique locations all the time? If
>there are twenty positions that have mate in 8, why don't they all hash
>to the same location?

There is no hashing involved. It is a simple table index.

>Another idea would be to train a backprop neural net on the entire
>tablebase dataset, until it could unerringly return the number of moves to
>mate in any position of the tablebase. The trained network, released as a
>c-library, would allow programs to have endgame solutions compiled in,
>taking up a *vastly* reduced space. (We could hope for as little as 20K or
>so)

The data is available for you to try: pub/chess/TB at chess.onenet.net

>Has anyone looked into those two ideas: 1) finding better hash functions,
>and 2) training neural nets? I don't see any reason for the current
>bloated size of tablebases. At least not until we know why those
>solutions can't work.

Suppose you could build an NN which would be as good as a GM in
endgame classes like KQKR or KRBKR. It would take a whole lot of
bits, no? And maybe several minutes per move, too. Yet it would
still not play as well or be as fast as the equivalent tablebases.

-- Steven (s...@mv.mv.com)

Robert Hyatt

Sep 22, 1997

Peter Fendrich (pet...@homemail.com) wrote:
: Robert Hyatt wrote:
: >
: > Zachary Brown (zbr...@lynx.dac.neu.edu) wrote:

: -snip-
: > : Another idea would be to train a backprop neural net on the entire
: > : tablebase dataset, until it could unerringly return the number of moves to
: > : mate in any position of the tablebase. The trained network, released as a
: > : c-library, would allow programs to have endgame solutions compiled in,
: > : taking up a *vastly* reduced space. (We could hope for as little as 20K or
: > : so)

: >
: > Neural nets don't really do this. They can be trained to recognize
: > certain winning characteristics or certain winning positions. But there
: > is no guarantee that they will recognize *all* winning positions, or even
: > *most* of them. If it gets the wrong answer, it will be disastrous.

: On the contrary, I think this is a really interesting idea. It could be
: implemented as:
:
: 1) Generate the tablebase as usual.

: 2) Train a neural net specialised on the specific tablebase.
: The net should be trained on the whole tablebase.
: Very often it is possible to get about a 90% hit rate in problems
: like this.

: 3) Go through the tablebase with the trained net and mark which
: positions are wrong.

: 4) The marked positions are the only ones that need to be stored in the
: tablebase. The rest is handled by the net.

: 5) The chess program now first looks in the tablebase. If we have
: a hit we use that value. If not we apply the neural net.

: One problem here is to find good input nodes which are easy to
: compute during the game. We don't want a drop in performance
: compared to a tablebase search.
: What hit rate is really needed to make this approach usable? Even with
: a poor hit rate of 50% we would need to store only half the tablebase.
: The trade-off between hit rate and easily computed input nodes is an
: interesting discussion.

: This is really worth trying!

: Peter

One interesting problem with this is what to store. IE, a tablebase
for 4-piece endings, with a max mate of < 128, can use one byte per
entry. So the worst case is 2^24 entries of one byte each. If you reduce
this to 1/10th of its original size, you have to store the position *and*
the mate-in-N value for each position. Assuming you choose to store
the position as a 24-bit Godel number (as we use to probe the real
tablebase), you store 4 bytes per position rather than 1, so you get
a savings of barely 50%. And now, in addition to having the problem
of running the position thru the NN and getting an answer, you have to
first test whether it is in the ancillary file of known buggy
positions, and answering that requires some sort of search. For larger
databases like 5-piece endings the size of an entry goes up, of course...
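The arithmetic here can be checked with a quick back-of-the-envelope script (the 10% error rate is an assumed figure for illustration): storing four bytes per exception leaves roughly 40% of the original size, before any overhead for the lookup structure erodes the savings further.

```python
# Storage comparison: full 4-piece table vs. "net + exception file".
ENTRIES = 2 ** 24              # every 4-piece position, one byte each
full_table = ENTRIES           # bytes

error_rate = 0.10              # assume the net is wrong on 10% of positions
bytes_per_entry = 4            # 24-bit Godel number + 1-byte mate-in-N value

exception_file = int(ENTRIES * error_rate) * bytes_per_entry
fraction_saved = 1 - exception_file / full_table   # ~0.6 with these numbers
```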

Urban Koistinen

Sep 22, 1997

18 Sep 1997 07:18:19 -0400 Zachary Brown wrote:
! As I understand it, endgame tablebases work essentially (I'm not trying to
! give an exact description) like this: given a position, you hash to a
! point in the tablebase file, and the byte at that location tells how many
! moves to win, loss, or draw.

! My question is, why bother hashing to unique locations all the time? If
! there are twenty positions that have mate in 8, why don't they all hash
! to the same location?

! Another idea would be to train a backprop neural net on the entire
! tablebase dataset, until it could unerringly return the number of moves to
! mate in any position of the tablebase. The trained network, released as a
! c-library, would allow programs to have endgame solutions compiled in,
! taking up a *vastly* reduced space. (We could hope for as little as 20K or
! so)

! Has anyone looked into those two ideas: 1) finding better hash functions,
! and 2) training neural nets? I don't see any reason for the current
! bloated size of tablebases. At least not until we know why those
! solutions can't work.

The reason is that no one has bothered much with compressing tablebases.
The best tries so far are Huffman and gzip (separately).

About your ideas:
1) I don't think that idea is any good.

2) can be generalized as finding a function for guessing something
about the endgame. When you have found a good guesser, you can then
make it safe by testing it against an endgame database, noting all
cases where it is wrong. Then, to use it, first look in the
correction table to see if the function is wrong, and if not,
apply it.
(Or test for wrongness afterwards, whichever might be better.)
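This guess-then-correct scheme is easy to sketch (illustrative Python; the guess function and the dict-based tables are stand-ins, not anything from a real engine):

```python
# Sketch of the "guesser plus correction table" idea. guess() is any
# fast, imperfect win/draw/loss (or mate-in-N) predictor.
def build_corrections(tablebase, guess):
    """tablebase: dict mapping position index -> true value.
    Keep only the entries the guesser gets wrong."""
    return {idx: val for idx, val in tablebase.items() if guess(idx) != val}

def probe(idx, corrections, guess):
    # Stored corrections override the guesser; everywhere else, trust it.
    return corrections.get(idx, guess(idx))

# Toy demo: the "guesser" calls every position a draw (0), so only
# the non-draws need to be stored.
tb = {0: 0, 1: 5, 2: 0, 3: -3}        # index -> mate-in-N (0 = draw)
guess = lambda idx: 0
corr = build_corrections(tb, guess)    # only indices 1 and 3 survive
```

The better the guesser, the smaller the correction table; a perfect guesser would need no table at all.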

Urban Koistinen

Peter Fendrich

Sep 22, 1997

Robert Hyatt wrote:
>
> Zachary Brown (zbr...@lynx.dac.neu.edu) wrote:

-snip-
> : Another idea would be to train a backprop neural net on the entire
> : tablebase dataset, until it could unerringly return the number of moves to
> : mate in any position of the tablebase. The trained network, released as a
> : c-library, would allow programs to have endgame solutions compiled in,
> : taking up a *vastly* reduced space. (We could hope for as little as 20K or

Peter


>
> : Has anyone looked into those two ideas: 1) finding better hash functions,
> : and 2) training neural nets? I don't see any reason for the current
> : bloated size of tablebases. At least not until we know why those

Urban Koistinen

Sep 22, 1997

Mon, 22 Sep 1997 21:37:10 +0100 Peter Fendrich wrote:
! Robert Hyatt wrote:
! >
! > Zachary Brown (zbr...@lynx.dac.neu.edu) wrote:

! -snip-
! > : Another idea would be to train a backprop neural net on the entire
! > : tablebase dataset,
! > : until it could unerringly return the number of moves to
! > : mate in any position of the tablebase.
! > : The trained network, released as a
! > : c-library, would allow programs to have endgame solutions compiled in,
! > : taking up a *vastly* reduced space.
! > : (We could hope for as little as 20K or so)

! > Neural nets don't really do this. They can be trained to recognize
! > certain winning characteristics or certain winning positions. But there
! > is no guarantee that they will recognize *all* winning positions, or even
! > *most* of them. If it gets the wrong answer, it will be disastrous.

! On the contrary, I think this is a really interesting idea. It could be
! implemented as:
!
! 1) Generate the Tablebase as usual.

! 2) Train a neural net specialised on the specific tablebase.
! The net should be trained on the whole tablebase.
! Very often it is possible to get about a 90% hit rate in problems
! like this.

! 3) Go through the tablebase with the trained net and mark which
! positions are wrong.

! 4) The marked positions are the only ones that need to be stored in the
! tablebase. The rest is handled by the net.

There would be some overhead as you have to store
information about which positions are wrong.

! 5) The chess program now first looks in the tablebase. If we have
! a hit we use that value. If not we apply the neural net.

! One problem here is to find good input nodes which are easy to
! compute during the game. We don't want a drop in performance
! compared to a tablebase search.
! What hit rate is really needed to make this approach usable? Even with
! a poor hit rate of 50% we would need to store only half the tablebase.
! The trade-off between hit rate and easily computed input nodes is an
! interesting discussion.

! This is really worth trying!

! > : Has anyone looked into those two ideas:
! > : 1) finding better hash functions,
! > : and 2) training neural nets? I don't see any reason for the current
! > : bloated size of tablebases. At least not until we know why those
! > : solutions can't work.

Urban Koistinen

Ren Wu

Sep 22, 1997

On 18 Sep 1997 07:18:19 -0400, zbr...@lynx.dac.neu.edu (Zachary Brown)
wrote:

Snip

>Has anyone looked into those two ideas: 1) finding better hash functions,
>and 2) training neural nets? I don't see any reason for the current
>bloated size of tablebases. At least not until we know why those
>solutions can't work.
>
>Zack

Jonathan Schaeffer tried using a neural net to compress his checkers
endgame databases, but it didn't work well. He reports this in one of
his papers; sorry, I can't remember the paper's title.

I made an unsuccessful attempt using Atree (an Adaptive Logic Network).
I trained the ALN on a small Chinese chess endgame database, hoping it
could learn to classify the positions. It failed; I couldn't get the
error rate down to an acceptable level.

Ren.


- remove one loop if you reply by email

Dan Thies

Sep 23, 1997

On Sun, 21 Sep 1997 18:28:23 -0700, brucemo <bru...@seanet.com>
wrote:

>Zachary Brown wrote:
>
>> Has anyone looked into those two ideas: 1) finding better hash functions,
>> and 2) training neural nets? I don't see any reason for the current
>> bloated size of tablebases. At least not until we know why those
>> solutions can't work.
>
>If you can figure out a way to compress ten megabytes of pretty diverse
>data into 20K, I suggest you patent it.

Maybe we could translate the databases into bitmap images, and use
JPEG compression. ;-)

Has anyone looked into the idea of using JPEG to reduce the size of
bitboards?

Urban Koistinen

Sep 23, 1997

22 Sep 1997 20:15:45 GMT Robert Hyatt wrote:

! Peter Fendrich (pet...@homemail.com) wrote:
! : Robert Hyatt wrote:
! : >
! : > Zachary Brown (zbr...@lynx.dac.neu.edu) wrote:

! : -snip-
! : > : Another idea would be to train a backprop neural net on the entire
! : > : tablebase dataset, until it could unerringly return the number of moves to
! : > : mate in any position of the tablebase. The trained network, released as a
! : > : c-library, would allow programs to have endgame solutions compiled in,
! : > : taking up a *vastly* reduced space. (We could hope for as little as 20K or so)

! : >
! : > Neural nets don't really do this. They can be trained to recognize
! : > certain winning characteristics or certain winning positions. But there
! : > is no guarantee that they will recognize *all* winning positions, or even
! : > *most* of them. If it gets the wrong answer, it will be disastrous.

! : On the contrary, I think this is a really interesting idea. It could be
! : implemented as:
! :
! : 1) Generate the tablebase as usual.

! : 2) Train a neural net specialised on the specific tablebase.
! : The net should be trained on the whole tablebase.
! : Very often it is possible to get about a 90% hit rate in problems
! : like this.

! : 3) Go through the tablebase with the trained net and mark which
! : positions are wrong.

! : 4) The marked positions are the only ones that need to be stored in the
! : tablebase. The rest is handled by the net.

! : 5) The chess program now first looks in the tablebase. If we have
! : a hit we use that value. If not we apply the neural net.

! : One problem here is to find good input nodes which are easy to
! : compute during the game. We don't want a drop in performance
! : compared to a tablebase search.
! : What hit rate is really needed to make this approach usable? Even with
! : a poor hit rate of 50% we would need to store only half the tablebase.
! : The trade-off between hit rate and easily computed input nodes is an
! : interesting discussion.

! : This is really worth trying!

! one interesting problem with this is what to store? IE, a tablebase
! for 4 piece endings, with a max mate of < 128 can use one byte. So
! worst case is 2^24 entries of one byte each. If you reduce this to
! 1/10th of its original size, you have to store the position *and*
! the mate-in-N value for that position. Assuming you choose to store
! the position as a 24-bit Godel number (as we use to probe the real
! tablebase), you store 4 bytes per position rather than 1, so you get
! a savings of barely 50%.

Are you sure better ways are not obvious to you?

! And now, in addition to having the problem
! of running the position thru the NN and getting an answer, you have to
! first test to see if it is in the ancillary file of known buggy
! positions, and to answer that requires some sort of search. For larger
! databases like 5-piece endings the size of an entry goes up of course...

Heard about hash tables?

Just go find a good fast function for guessing on win/draw/loss,
and the rest is easy.

Urban Koistinen

Urban Koistinen

Sep 23, 1997

Mon, 22 Sep 1997 22:34:20 GMT Ren Wu wrote:
! On 18 Sep 1997 07:18:19 -0400, zbr...@lynx.dac.neu.edu (Zachary Brown)
! wrote:

! Snip

! >Has anyone looked into those two ideas: 1) finding better hash functions,
! >and 2) training neural nets? I don't see any reason for the current
! >bloated size of tablebases. At least not until we know why those
! >solutions can't work.
! >
! >Zack

! Jonathan Schaeffer tried using a neural net to compress his checkers
! endgame databases, but it didn't work well. He reports this in one of
! his papers; sorry, I can't remember the paper's title.

! I made an unsuccessful attempt using Atree (an Adaptive Logic Network).
! I trained the ALN on a small Chinese chess endgame database, hoping it
! could learn to classify the positions. It failed; I couldn't get the
! error rate down to an acceptable level.

What classification?
What acceptable level?

Urban Koistinen

Dan Thies

Sep 23, 1997

On 21 Sep 1997 14:05:09 GMT, hy...@crafty.cis.uab.edu (Robert Hyatt)
wrote:

>Zachary Brown (zbr...@lynx.dac.neu.edu) wrote:
>: Another idea would be to train a backprop neural net on the entire
>: tablebase dataset, until it could unerringly return the number of moves to
>: mate in any position of the tablebase. The trained network, released as a
>: c-library, would allow programs to have endgame solutions compiled in,
>: taking up a *vastly* reduced space. (We could hope for as little as 20K or
>: so)
>
>Neural nets don't really do this. They can be trained to recognize
>certain winning characteristics or certain winning positions. But there
>is no guarantee that they will recognize *all* winning positions, or even
>*most* of them. If it gets the wrong answer, it will be disastrous.

Right. Neural nets are not a 100% proposition. The idea of ANNs in
chess is useful - I use one to do king safety in Knowchess, and it's way
better than anything I could have cooked up "by hand." But if I could
have trained a network to solve every endgame, I would have done it by
now.

Storage is cheap. What's murder is downloading those tablebases at
33.6Kbps.....

Dan

Robert Hyatt

Sep 23, 1997

Urban Koistinen (m10...@abc.se) wrote:

: Are you sure better ways are not obvious to you?

I'm not sure of anything. But obviously the "position" *and* "result"
have to be stored, while in the traditional tablebase, only the result
is stored, since the position is used as a unique index into that file.

: ! And now, in addition to having the problem
: ! of running the position thru the NN and getting an answer, you have to
: ! first test to see if it is in the ancillary file of known buggy
: ! positions, and to answer that requires some sort of search. For larger
: ! databases like 5-piece endings the size of an entry goes up of course...

: Heard about hash tables?

Heard about collisions? Can't stand 'em here.

: Just go find a good fast function for guessing on win/draw/loss,

Robert Hyatt

Sep 23, 1997

Urban Koistinen (m10...@abc.se) wrote:
: 23 Sep 1997 13:41:26 GMT Robert Hyatt wrote:
: ! Urban Koistinen (m10...@abc.se) wrote:

: ! : Are you sure better ways are not obvious to you?

: ! I'm not sure of anything. But obviously the "position" *and* "result"
: ! have to be stored, while in the traditional tablebase, only the result
: ! is stored, since the position is used as a unique index into that file.

: ! : ! And now, in addition to having the problem
: ! : ! of running the position thru the NN and getting an answer, you have to
: ! : ! first test to see if it is in the ancillary file of known buggy
: ! : ! positions, and to answer that requires some sort of search. For larger
: ! : ! databases like 5-piece endings the size of an entry goes up of course...

: ! : Heard about hash tables?

: ! Heard about collisions? Can't stand 'em here.

: With algebra, collision free hash functions are a dime a dozen.

While I agree with you most of the time, the above is *wrong*, because
you are saying you can compress something 100% losslessly, which is *not*
possible. The definition of "hashing" is to apply a mathematical function
to a value "P" and get a resulting value "H" that takes substantially less
memory. *Anytime* you map N bits to M bits where M < N, this becomes a
"many to one" mapping, which means non-uniqueness is a problem. Otherwise,
we could take such a perfect mapping and reduce 16mb to 1 bit and get the
original value back. And make a large fortune in the process...

So we are not going to be able to "hash" things perfectly. Which leads to
your next statement:

: Say you have 20M positions with unique indexes.
: Hash them to get even distribution of positions that need to be
: corrected.

Somehow this does *not* sound easy, because the "value" I want to
hash is simply the locations of the 4 pieces I am trying to evaluate.
I don't know of any mathematical function that can tell me which of these
is going to have to be corrected, which makes it appear to me that I am
going to end up with a file about as big as the original tablebase. IE what
characteristic can you key on that says "this one needs correcting because
the NN is wrong, while this one is o.k."?

: Group the positions evenly so that each group on average contains
: a reasonable number of positions that need to be corrected.

The sense I get from this is that a theoretical mathematician is explaining
an idea, but using the traditional "hand waving" saying "this is possible,
but I leave the details to the reader." Remember, this is in a time-critical
place in the code. It does no good to reduce the file size by 50%, or even
by 75%, if the overall time is multiplied by a factor of 10... It's the
difference between theoretically possible and practical... My first impression
is that it is not practical... but I'm listening if you have some good ideas
as to how to implement these mapping/grouping functions...

: (the example was 90% correct predictions)
: For the example it would be reasonable to make each group contain
: about 25.6 such positions, that is, cover 256 positions.
: Now, for each group we already know which 256 positions are covered
: by it, so each correction can only be for one of 256 and so you
: only need one byte/position to say which.
: There would be some overhead for indexing groups, on the order of
: 2 bytes/group.
: The same idea would apply to larger tables.

: There are plenty other ways, what is best depends.

: ! : Just go find a good fast function for guessing on win/draw/loss,
: ! : and the rest is easy.

: Urban Koistinen

Urban Koistinen

Sep 23, 1997

23 Sep 1997 13:41:26 GMT Robert Hyatt wrote:
! Urban Koistinen (m10...@abc.se) wrote:

! : Are you sure better ways are not obvious to you?

! I'm not sure of anything. But obviously the "position" *and* "result"
! have to be stored, while in the traditional tablebase, only the result
! is stored, since the position is used as a unique index into that file.

! : ! And now, in addition to having the problem
! : ! of running the position thru the NN and getting an answer, you have to
! : ! first test to see if it is in the ancillary file of known buggy
! : ! positions, and to answer that requires some sort of search. For larger
! : ! databases like 5-piece endings the size of an entry goes up of course...

! : Heard about hash tables?

! Heard about collisions? Can't stand 'em here.

With algebra, collision free hash functions are a dime a dozen.

Say you have 20M positions with unique indexes.
Hash them to get even distribution of positions that need to be
corrected.

Group the positions evenly so that each group on average contains
a reasonable number of positions that need to be corrected.
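The grouping idea can be illustrated with toy Python (the group size of 256 and the dict layout are assumptions for the sketch, not a real tablebase format): because each group covers a known range of 256 indexes, a correction only needs a one-byte offset within its group, plus its value.

```python
# Cut the index space into groups of 256 positions; per group, store
# only the offsets (one byte each) of positions the guesser gets wrong,
# together with their correct values.
GROUP = 256

def build_groups(wrong):               # wrong: dict index -> correct value
    groups = {}
    for idx, val in wrong.items():
        g, off = divmod(idx, GROUP)    # off fits in one byte by construction
        groups.setdefault(g, []).append((off, val))
    return groups

def lookup(idx, groups, guess):
    g, off = divmod(idx, GROUP)
    for o, val in groups.get(g, []):
        if o == off:
            return val                 # a stored correction overrides
    return guess(idx)                  # otherwise trust the guesser

wrong = {5: 7, 300: 2}                 # toy exceptions
guess = lambda idx: 0                  # stand-in predictor: always "draw"
```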

Peter Fendrich

Sep 23, 1997

Robert Hyatt wrote:
>
> Peter Fendrich (pet...@homemail.com) wrote:
> : Robert Hyatt wrote:
> : >
> : > Zachary Brown (zbr...@lynx.dac.neu.edu) wrote:
>
> : -snip-
> : > : Another idea would be to train a backprop neural net on the entire
> : > : tablebase dataset, until it could unerringly return the number of moves to
> : > : mate in any position of the tablebase. The trained network, released as a
> : > : c-library, would allow programs to have endgame solutions compiled in,
> : > : taking up a *vastly* reduced space. (We could hope for as little as 20K or
> : > : so)
> : >
> : > Neural nets don't really do this. They can be trained to recognize
> : > certain winning characteristics or certain winning positions. But there
> : > is no guarantee that they will recognize *all* winning positions, or even
> : > *most* of them. If it gets the wrong answer, it will be disastrous.
>
> : On the contrary, I think this is a really interesting idea. It could be
> : implemented as:
> :
> : 1) Generate the tablebase as usual.
>
> : 2) Train a neural net specialised on the specific tablebase.
> : The net should be trained on the whole tablebase.
> : Very often it is possible to get about a 90% hit rate in problems
> : like this.
>
> : 3) Go through the tablebase with the trained net and mark which
> : positions are wrong.
>
> : 4) The marked positions are the only ones that need to be stored in the
> : tablebase. The rest is handled by the net.
>
> : 5) The chess program now first looks in the tablebase. If we have
> : a hit we use that value. If not we apply the neural net.
>
> : One problem here is to find good input nodes which are easy to
> : compute during the game. We don't want a drop in performance
> : compared to a tablebase search.
> : What hit rate is really needed to make this approach usable? Even with
> : a poor hit rate of 50% we would need to store only half the tablebase.
> : The trade-off between hit rate and easily computed input nodes is an
> : interesting discussion.
>
> : This is really worth trying!
>
> : Peter
>
> One interesting problem with this is what to store. IE, a tablebase
> for 4-piece endings, with a max mate of < 128, can use one byte per
> entry. So the worst case is 2^24 entries of one byte each. If you reduce
> this to 1/10th of its original size, you have to store the position *and*
> the mate-in-N value for each position. Assuming you choose to store
> the position as a 24-bit Godel number (as we use to probe the real
> tablebase), you store 4 bytes per position rather than 1, so you get
> a savings of barely 50%. And now, in addition to having the problem
> of running the position thru the NN and getting an answer, you have to
> first test whether it is in the ancillary file of known buggy
> positions, and answering that requires some sort of search. For larger
> databases like 5-piece endings the size of an entry goes up, of course...

It's a problem for sure, but why give up?
I haven't studied how the transformation to Godel numbers is done.
Is it possible to manipulate the formula to make the (say) 10% erroneous
positions be grouped within some interval? If you know that positions
above or below some number are handled by the neural net, both the search
time and the table space are reduced.
Another way would be to bias the net to have a higher hit rate within
a certain Godel number interval. I'm not sure what this would lead to,
but it's a start... :)

A combination of techniques like this and the "collision free" hash
table proposed by Urban might be a solution.

My feeling is that there are interesting possibilities, but I will not
work on this myself.
If anyone starts doing tests within this subject I think it would be
very interesting to discuss it further.

Peter

Peter Fendrich

Sep 23, 1997

Ren Wu wrote:
>
> On 18 Sep 1997 07:18:19 -0400, zbr...@lynx.dac.neu.edu (Zachary Brown)
> wrote:
>
> Snip

>
> >Has anyone looked into those two ideas: 1) finding better hash functions,
> >and 2) training neural nets? I don't see any reason for the current
> >bloated size of tablebases. At least not until we know why those
> >solutions can't work.
> >
> >Zack
>
> Jonathan Schaeffer tried using a neural net to compress his checkers
> endgame databases, but it didn't work well. He reports this in one of
> his papers; sorry, I can't remember the paper's title.
>
> I made an unsuccessful attempt using Atree (an Adaptive Logic Network).
> I trained the ALN on a small Chinese chess endgame database, hoping it
> could learn to classify the positions. It failed; I couldn't get the
> error rate down to an acceptable level.
>
> Ren.
>
> - remove one loop if you reply by email

I'm sorry to say that this is not an unusual situation when dealing
with neural nets. It's very hard to know in advance what the result
will be in a completely new application of neural nets.

My approach wouldn't be to try to figure out by logic what parameters
are most appropriate as input nodes.
- I would start with the database and collect statistical information
about correlations between parameters, or use one of the data mining
techniques available to collect similar information.
- I would use a general backpropagation network to start with and
configure the parameters in a way that guides the net in a direction
pointed out by the correlation analysis.
- Maybe then I would switch to another type of net.

The alpha/beta search used today is very specialised to chess and has
a lot of built-in knowledge about chess in general. In a similar way
I think that one has to guide the neural net.
So I'm quite convinced that the position expressed as piece x on
square y, and so on, is not a useful way of getting good input nodes.

The main theme here is to get knowledge about the population, build
it into the parameters themselves, and decrease the amount of
information the net has to adapt to.

Nothing new in fact... :)

Peter

Robert Hyatt

Sep 23, 1997

Peter Fendrich (pet...@homemail.com) wrote:

: It's a problem for sure but why give up?


: I haven't studied how the transformation to Godel numbers is done.
: Is it possible to manipulate the formula, to make the (say) 10%
: erroneous positions be grouped within some interval? If you know that positions
: above or below some number are handled by the neural net, both the search
: time and table space are reduced.

I don't see how. The Godel number is simply the concatenation of the
squares of the 5 pieces in the position. I.e., a square is 0-63, and 64 = 2^6,
so 6 bits of data per piece. So for five pieces, we simply take the squares of the
two kings, plus the squares of the other three pieces (in the same order
as the database was constructed with) and concatenate them into a 30-bit
address that we probe... So I don't see any way to modify this number
to somehow "group" bad cases together...
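In rough C, the indexing described above might look like this (a sketch with hypothetical names and piece order; a real generator also culls illegal positions and folds in board symmetries):

```c
/* Sketch of the 30-bit "Godel number": five pieces, each on a
 * square 0..63 (6 bits), concatenated in a fixed order into one
 * database address.  Illustrative only, not actual tablebase code. */
typedef unsigned int u32;

u32 godel_index(int wk, int bk, int p1, int p2, int p3)
{
    /* each square is 0..63, so each field is exactly 6 bits */
    return ((u32)wk << 24) | ((u32)bk << 18) |
           ((u32)p1 << 12) | ((u32)p2 << 6) | (u32)p3;
}
```

Probing is then just a seek to that many entries into the file, one byte per position.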

: Another way would be to bias the net to have a higher hit rate within


: a certain Godel number interval. I'm not sure what this would lead to
: but it's a start... :)

That might be much better, but I suspect *very* difficult. I.e., if you
are only varying the pawn location, that *wildly* changes the evals
for win/lose/draw.

: A combination of techniques like this and the "collision free" hash

Jim Caccamise

Sep 24, 1997

Dan Thies <rt...@wt.net> wrote in article
<34274022...@snews.zippo.com>...

> On Sun, 21 Sep 1997 18:28:23 -0700, brucemo <bru...@seanet.com>
> wrote:
>
> >Zachary Brown wrote:
> >
> >> Has anyone looked into those two ideas: 1) finding better hash
functions,
> >> and 2) training neural nets? I don't see any reason for the current
> >> bloated size of tablebases. At least not until we know why those
> >> solutions can't work.
> >
> >If you can figure out a way to compress ten megabytes of pretty
diverse
> >data into 20K, I suggest you patent it.
>
> Maybe we could translate the databases into bitmap images, and use
> JPEG compression. ;-)
>
> Has anyone looked into the idea of using JPEG to reduce the size of
> bitboards?
>

The data compression technique used needs to be lossless, and
standard JPEG compression is not lossless. Small data
losses are not a problem for image compression - if a few bits get
changed the image still looks the same. I'm not experienced with
tablebases, but I wonder what compression is achieved simply using ZIP
format. (Note: ZIP compression programs typically have parameters to
trade off compression ratio vs. performance.)

I use ZipMagic, a shareware program, which works in Win3.1, Win95, or
Win98. ZipMagic makes ZIP files act as normal Windows folders, so the
contents can be accessed without manually unzipping them. Thus, a Windows chess
program can access ZIP data without explicit decompression. I'm not sure
about the performance penalty, but experiments with chess Opening Books
and TableBases may be worthwhile. Users with limited disk space may
benefit - with some performance penalty. More users could benefit when
more Windows chess programs become available.

For information on ZipMagic visit the MIJENIX Corporation home page at
http://www.mijenix.com/

Also check out their newly released FreeSpace for Win95 and WinNT (NTFS
partitions only). I don't have enough info yet to decide if FreeSpace
is a better solution than ZipMagic.

Jim Caccamise

Robert Hyatt

Sep 24, 1997

Rolf Czedzak (r...@sensecom.de) wrote:
: Robert Hyatt wrote: <608gu6$9...@juniper.cis.uab.edu>

: > Urban Koistinen (m10...@abc.se) wrote:

: []

: > : ! positions, and to answer that requires some sort of search. For
: > larger : ! databases like 5-piece endings the size of an entry goes up
: > of course...
: >
: > : Heard about hash tables?
: >
: > Heard about collisions? Can't stand 'em here.

: Should be possible to generate _perfect_ hash function(s) for known
: (fixed&finite -nothing a 4/5men ending would vulnerate) input domain.
: There's a GNU tool called gperf(?).

This means we should also be able to compress such databases perfectly
in the first place? I know of *no* "perfect" hashing functions, because
they are, by definition, "lossy" because you are reducing the number of
bits used to represent the original value. And there is no way to do this
sort of "hashing" in a lossless manner. There are a few tricks such as
culling illegal positions, and limiting one of the pieces to a few squares
and using rotating/reflecting/etc to make things smaller.

The problem here is that we already have the "Godel" number represented as
30 bits for a 5-piece ending, which is a deep compression from the number
of bits required to store a full chess position. So we've already done a
lot of compressing to get to 30. Going below 30 seems difficult...


Robert Hyatt

Sep 24, 1997

Jim Caccamise (c...@magicnet.net) wrote:
: Dan Thies <rt...@wt.net> wrote in article

: <34274022...@snews.zippo.com>...
: > On Sun, 21 Sep 1997 18:28:23 -0700, brucemo <bru...@seanet.com>
: > wrote:
: >
: > >Zachary Brown wrote:
: > >
: > >> Has anyone looked into those two ideas: 1) finding better hash
: functions,
: > >> and 2) training neural nets? I don't see any reason for the current
: > >> bloated size of tablebases. At least not until we know why those
: > >> solutions can't work.
: > >
: > >If you can figure out a way to compress ten megabytes of pretty
: diverse
: > >data into 20K, I suggest you patent it.
: >
: > Maybe we could translate the databases into bitmap images, and use
: > JPEG compression. ;-)
: >
: > Has anyone looked into the idea of using JPEG to reduce the size of
: > bitboards?
: >

: The data compression technique used needs to be lossless. I'm not
: sure, but I believe that JPEG compression is not lossless. Small data
: losses are not a problem for image compression - if a few bits get
: changed the image still looks the same. I'm not experienced with
: tablebases, but I wonder what compression is achieved simply using ZIP
: format. (Note: ZIP compression programs typically have parameters to
: trade off compression ratio vs. performance.)

Tablebases (many) compress quite well, because those databases that are
mostly draws have nearly all 0 scores. For example, KN vs K compresses
*very* well. :)

For the ones with few draws, the compression ratio is miserable compared
to this. But 2:1 is generally doable for the majority, since there are
still lots of adjacent entries in the file where the result is a
draw or whatever...
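The effect described above — long runs of identical (mostly draw) scores compressing well — shows up with even the most naive run-length encoding. A purely illustrative sketch, not any real tablebase format:

```c
/* Naive run-length encoder: emits (count, value) byte pairs, count
 * capped at 255.  A draw-heavy slice (long runs of 0) shrinks to a
 * small fraction of its size; fully diverse data actually expands.
 * Caller must size `out` for the worst case (2 * n bytes). */
#include <stddef.h>

size_t rle_encode(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t o = 0, i = 0;
    while (i < n) {
        unsigned char v = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == v && run < 255)
            run++;
        out[o++] = (unsigned char)run;  /* run length */
        out[o++] = v;                   /* repeated score */
        i += run;
    }
    return o;  /* bytes written */
}
```

600 consecutive draw scores encode to just three pairs (6 bytes), which is the 2:1-or-better behaviour mentioned above taken to its extreme.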

:
: I use ZipMagic, a shareware program, which works in Win3.1, Win95, or


: Win98. ZipMagic makes ZIP files act as normal Windows folders, so the
: contents can be accessed without unzipping them. Thus, a Windows chess
: program can access ZIP data directly without decompression. I'm not sure
: about the performance penalty, but experiments with chess Opening Books
: and TableBases may be worthwhile. Users with limited disk space may
: benefit - with some performance penalty. More users could benefit when
: more Windows chess programs become available.

If a program is "lazy" and only accesses the files at the root of the
tree, this sort of compression can work perfectly fine. If the files
are accessed *during* the search, then any extra access time is going
to drive the search speed downward. That has to be carefully weighed
for advantages vs the speed loss...
:
: For information on ZipMagic visit the MIJENIX Corporation home page at


: http://www.mijenix.com/
:
: Also check out their newly released FreeSpace for Win95 and WinNT(NTFS
: partitions only). I don't have enough info. yet to decide if FreeSpace
: is a better solution than ZipMagic.

: Jim Caccamise
:

Rolf Czedzak

Sep 24, 1997

Robert Hyatt wrote: <608gu6$9...@juniper.cis.uab.edu>

> Urban Koistinen (m10...@abc.se) wrote:

[]

> : ! positions, and to answer that requires some sort of search. For
> larger : ! databases like 5-piece endings the size of an entry goes up
> of course...
>
> : Heard about hash tables?
>
> Heard about collisions? Can't stand 'em here.

Should be possible to generate _perfect_ hash function(s) for known
(fixed&finite -nothing a 4/5men ending would vulnerate) input domain.
There's a GNU tool called gperf(?).

> : Just go find a good fast function for guessing on win/draw/loss,


> : and the rest is easy.
>
> : Urban Koistinen

Rolf (C)

Urban Koistinen

Sep 24, 1997

24 Sep 1997 18:40:51 GMT Robert Hyatt wrote:
! Rolf Czedzak (r...@sensecom.de) wrote:
! : Robert Hyatt wrote: <608gu6$9...@juniper.cis.uab.edu>
! : > Urban Koistinen (m10...@abc.se) wrote:

! : > : ! positions, and to answer that requires some sort of search. For
! : > larger : ! databases like 5-piece endings the size of an entry goes up
! : > of course...
! : >

! : > : Heard about hash tables?

! : >
! : > Heard about collisions? Can't stand 'em here.

! : Should be possible to generate _perfect_ hash function(s) for known
! : (fixed&finite -nothing a 4/5men ending would vulnerate) input domain.
! : There's a GNU tool called gperf(?).

! This means we should also be able to compress such databases perfectly
! in the first place? I know of *no* "perfect" hashing functions, because
! they are, by definition, "lossy" because you are reducing the number of
! bits used to represent the original value. And there is no way to do this
! sort of "hashing" in a lossless manner. There are a few tricks such as
! culling illegal positions, and limiting one of the pieces to a few squares
! and using rotating/reflecting/etc to make things smaller.

! The problem here is that we already have the "Godel" number represented as
! 30 bits for a 5-piece ending, which is a deep compression from the number
! of bits required to store a full chess position. So we've already done a
! lot of compressing to get to 30. Going below 30 seems difficult...

Actually, both SJE and Thompson have gone below 30 :-)
More seriously, perfect hashing is easy if you don't need
to reduce the number of bits. Look at it as shuffling a
deck with 2^30 cards. If you do this without losing any
cards, or getting any glued together and you remember
exactly how you moved them, then there has been no loss
of information.
The reason to hash would be to get an even distribution
of the positions that need to be corrected. Maybe that
is not necessary, so forget about hashing.

a = g>>8; /* a is 22 bits */
b = g&0xff; /* b is 8 bits */

Use a to access a structure with corrections for positions
(a<<8) .. (a<<8)+0xff
Now when we find a correction in the structure we found
through a, we already know the top 22 bits and so, only
need to store the remaining 8 bits. With some run length
encoding (RLE) on average 4 bits per correction should
be enough for the case of 90% prediction rate.
With greater prediction, more bits are needed per instance
for fewer instances.
For example, with 95% prediction, 5 bits per correction
would be needed but only half as many of those so you
need 5/(4*2) => 62.5% of the space compared to 90% prediction.
When you get 97.5% prediction, you need 6/(4*4) => 37.5%
of the space compared to 90% or, 6/(5*2) => 60% of the
space compared to 95% prediction.
The number of bits for storing the value of the corrections
would remain constant.
It might be useful to calculate the false prediction so
that the fact that the position does not have that value
can be used to encode the value in less space.
In particular, if you only store (win, no win), excluding
one leaves only one, and a single possible code needs
zero bits to be encoded.
It should be much easier to predict whether a position
is a win than to predict the distance to the win, and
that prediction is all that a chess program should need
at its tip nodes.
At the root, distance to mate or conversion or game50
would be needed, but then, at the root, there should
be plenty of time for accessing the database, which
may even be kept at a different location and queried
over the net.
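A sketch of the correction idea above, in C (all names hypothetical): a fast predictor supplies a guess, and only the positions where it is wrong are stored, keyed by the low 8 bits within each 256-position bucket:

```c
/* Sketch of the bucketed correction scheme: g is the 30-bit
 * position index; table[g >> 8] covers 256 consecutive positions,
 * and inside a bucket we store only the low 8 bits of each
 * exceptional position plus its true value.  Hypothetical names;
 * a real store would also RLE-pack the low8 fields. */
#include <stddef.h>

struct correction { unsigned char low8; signed char value; };
struct bucket { size_t count; const struct correction *fixes; };

/* stand-in predictor: a real one would be heuristics or a net */
int predict(unsigned g) { (void)g; return 0; /* "draw" */ }

int probe(unsigned g, const struct bucket *table)
{
    const struct bucket *b = &table[g >> 8];   /* top 22 bits */
    unsigned char low = g & 0xff;              /* low 8 bits  */
    for (size_t i = 0; i < b->count; i++)
        if (b->fixes[i].low8 == low)
            return b->fixes[i].value;          /* stored exception */
    return predict(g);                         /* predictor was right */
}
```

With a 90% prediction rate, nine probes in ten never touch a stored correction at all, which is where the space savings come from.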

Good night! I hope I didn't get too unclear,
Urban Koistinen soon to lie asleep

Cameron Hayne

Sep 24, 1997

In article <342bb181...@nntp.a001.sprintmail.com>,
co...@sprintmail.com (Dave) wrote:

>To me the question is do we want to program a computer that's great at
>*calculating* the best move in an endgame, or do we want to program
>the computer to *look up* the best move in a massive database.

We want to do whichever makes the computer a stronger chess player.
I.e. whichever results in the computer winning more games.


>Do you expect the program to play perfectly in the endgame? Why?

It is not a question of expectations. It is a question of what is
possible. If it is possible to have databases that result in perfect
endgame play (for those cases falling within the scope of the database),
then of course we want the computer to use these databases.


> Endgame databases are
>a crutch that should be removed from tournament playing ASAP. If the
>chess program itself can't find the win in the endgame position, why
>should it just be able to look it up? This could wind up in the
>ludicrous position where, whichever program has the largest capacity
>disk array (to hold the largest database), will be the WMCCC champ!

Ummh, what about CPU speed? As in: "Whichever ... has the fastest CPU
(to compute the most variations), will be the WMCCC champ".
A disk drive is part of a computer just as a CPU and RAM are.
Programmers can shift resource usage back and forth to optimize
with respect to various criteria. And it seems silly to put
arbitrary limits on usage of one or another of these resources.

The real issue here is: what are the rules for computer chess?
What resources are computer programs allowed to make use of?
I would say that "the rules" permit anything that is available.
I.e no holds barred. If spending a million dollars on disk space
and filling it up with pre-computed information helps the
program play better chess, that's just fine.
But I guess the line should be drawn at the use of human capabilities
to help the program. Current technology only allows human input
via relatively obvious and externally discernable means:
keyboard, mouse & other pointing devices, voice recognition, eye tracking.
In the future it might be possible to have direct input from the brain -
grandmasters might be able to share their "vibes" about a position
with the program. At that point, it might start getting difficult
to distinguish where the human ends and where the machine begins
(cf. the Borg of Star Trek) - but for now and for the foreseeable future,
we can distinguish.

And so "the rules" are clear:
Make the best use of any and all computer hardware you can afford
to create a machine that plays chess. How it plays chess we don't care
- we just care how *well* it plays chess.

--
Cameron Hayne
Centre de recherche informatique de Montreal

Dave

Sep 25, 1997

To me the question is do we want to program a computer that's great at
*calculating* the best move in an endgame, or do we want to program
the computer to *look up* the best move in a massive database.

Having the computer calculate the moves (even if every move is not
optimal!) is far preferable to me. I want the program to be a *problem
solver* in the game, not a darn reference librarian whose big talent
is looking up the fruits of some other program's work!

Do you expect the program to play perfectly in the endgame? Why? It
doesn't play perfectly in the opening, or in the middlegame. (And if
it did play perfectly in these areas we probably wouldn't know it as
perfect, only as "very good.") Chess programs already play the endgame
well, and as we develop more skill in programming, and the hardware
continues to improve, so will the level of play. Endgame databases are
a crutch that should be removed from tournament play ASAP. If the
chess program itself can't find the win in the endgame position, why
should it just be able to look it up? This could wind up in the
ludicrous position where, whichever program has the largest capacity
disk array (to hold the largest database), will be the WMCCC champ!

*Now* we can talk about shrinking the endgame tablebases to the size
they really should be - zero!!

Endgame databases are a very ingenious work. All those who believe
they'll just willy-nilly shrink right on down - well,
think again! If the positions could be categorized by anything other than
the specific dynamics of each position, we would have solved the whole
game of chess decades ago.

These tablebases are 100% muscle IMHO, with no fat anywhere in sight.
(If you doubt it, read up on them.) They're great for solving puzzles,
analyzing others' endgame play, and of course, as tutors in proper
endgame play for yourself, or your program. I don't think they should
be allowed for tournament playing programs, however. The tournaments
are serious competition for the programs. No crutches allowed.

Regards,

Dave

Robert Hyatt

Sep 25, 1997

Urban Koistinen (m10...@abc.se) wrote:

Yes. I had mentioned the rotating/reflecting trick above, which trims this
a bit, but from 30 bits we might get rid of 2 by reducing the squares for a
king from 64 to 10...
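The 64-to-10 reduction comes from the eight symmetries of a pawnless board: mirroring left-right, top-bottom, and across the a1-h8 diagonal lets one king be normalized into the 10-square a1-d1-d4 triangle. A hypothetical sketch of just that normalization (a1 = square 0; a real generator applies the same flips to all the pieces):

```c
/* Normalize a square (0..63, a1 = 0) into the a1-d1-d4 triangle
 * using the three board mirrors.  The triangle holds 4+3+2+1 = 10
 * squares.  With pawns on the board only the left-right mirror is
 * legal, so the reduction there is 64 -> 32 instead. */
int normalize_king(int sq)
{
    int file = sq & 7, rank = sq >> 3;
    if (file > 3) file = 7 - file;            /* mirror left-right    */
    if (rank > 3) rank = 7 - rank;            /* mirror top-bottom    */
    if (rank > file) {                        /* mirror on a1-h8 diag */
        int t = file; file = rank; rank = t;
    }
    return rank * 8 + file;
}
```

Since 10 < 16 = 2^4, the normalized king needs only 4 bits instead of 6, which is where the "get rid of 2" bits comes from.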

: More seriously, perfect hashing is easy if you don't need


: to reduce the number of bits. Look at it as shuffling a
: deck with 2^30 cards. If you do this without losing any
: cards, or getting any glued together and you remember
: exactly how you moved them, then there has been no loss
: of information.
: The reason to hash would be to get an even distribution
: of the positions that needs to be corrected. Maybe that
: is not necessary so forget about hashing.

: a = g>>8; /* a is 22 bits */
: b = g&0xff; /* b is 8 bits */

: Use a to access a structure with corrections for positions
: (a<<8) .. (a<<8)+0xff
: Now when we find a correction in the structure we found
: through a, we already know the top 22 bits and so, only
: need to store the remaining 8 bits. With some run length
: encoding (RLE) on average 4 bits per correction should
: be enough for the case of 90% prediction rate.

I don't disagree with you at all, but I simply don't know how
to "distribute" the positions that cause problems, because if
you look at them it is probably going to happen that the ones
that cause problems are all somehow "odd" or "exceptional"... but
there won't be something that stands out to say "I'm an exception
because I am two files away from the king" or some such thing.
Possible for sure, but doing so certainly leaves me foggy. :)

: With greater prediction, more bits are needed per instance

nope.. makes sense. although I'm not sure I have any idea of where
to start. Of course, we'd first need the NN to recognize some of the
won positions to see what ended up as "exceptions"...


Urban Koistinen

Sep 25, 1997

25 Sep 1997 01:01:43 GMT Robert Hyatt wrote:
! Urban Koistinen (m10...@abc.se) wrote:
! : 24 Sep 1997 18:40:51 GMT Robert Hyatt wrote:
! : ! Rolf Czedzak (r...@sensecom.de) wrote:
! : ! : Robert Hyatt wrote: <608gu6$9...@juniper.cis.uab.edu>
! : ! : > Urban Koistinen (m10...@abc.se) wrote:

! : ! : > : ! positions, and to answer that requires
! : ! : > : ! some sort of search. For larger
! : ! : > : ! databases like 5-piece endings the size
! : ! : > : ! of an entry goes up of course...
! : ! : >

! : ! : > : Heard about hash tables?
! : ! : >
! : ! : > Heard about collisions? Can't stand 'em here.

! : ! : Should be possible to generate _perfect_ hash function(s) for known
! : ! : (fixed&finite -nothing a 4/5men ending would vulnerate) input domain.
! : ! : There's a GNU tool called gperf(?).

! : ! This means we should also be able to compress such databases perfectly
! : ! in the first place? I know of *no* "perfect" hashing functions, because
! : ! they are, by definition, "lossy" because you are reducing the number of
! : ! bits used to represent the original value.
! : ! And there is no way to do this
! : ! sort of "hashing" in a lossless manner. There are a few tricks such as
! : ! culling illegal positions,
! : ! and limiting one of the pieces to a few squares
! : ! and using rotating/reflecting/etc to make things smaller.

! : ! The problem here is that we already have
! : ! the "Godel" number represented as
! : ! 30 bits for a 5-piece ending, which is
! : ! a deep compression from the number
! : ! of bits required to store a full chess position.
! : ! So we've already done a
! : ! lot of compressing to get to 30.
! : ! Going below 30 seems difficult...

! : Actually, both SJE and Thompson have gone below 30 :-)

! yes. I had mentioned the rotating/reflecting trick above, which trims this
! a bit, but from 30 bits we might get rid of 2 by reducing the squares for a
! king from 64 to 10...

! : More seriously, perfect hashing is easy if you don't need
! : to reduce the number of bits. Look at it as shuffling a
! : deck with 2^30 cards. If you do this without losing any
! : cards, or getting any glued together and you remember
! : exactly how you moved them, then there has been no loss
! : of information.
! : The reason to hash would be to get an even distribution
! : of the positions that needs to be corrected. Maybe that
! : is not necessary so forget about hashing.

! : a = g>>8; /* a is 22 bits */
! : b = g&0xff; /* b is 8 bits */

! : Use a to access a structure with corrections for positions
! : (a<<8) .. (a<<8)+0xff
! : Now when we find a correction in the structure we found
! : through a, we already know the top 22 bits and so, only
! : need to store the remaining 8 bits. With some run length
! : encoding (RLE) on average 4 bits per correction should
! : be enough for the case of 90% prediction rate.

! I don't disagree with you at all, but I simply don't know how
! to "distribute" the positions that cause problems, because if
! you look at them it is probably going to happen that the ones
! that cause problems are all somehow "odd" or "exceptional"... but
! there won't be something that stands out to say "I'm an exception
! because I am two files away from the king" or some such thing.
! Possible for sure, but doing so certainly leaves me foggy. :)

For now, don't worry about distributing the exceptions.
When you test the prediction function against the database
you get a new binary database that, for each position, says
whether it needs to be corrected.
As few bits would be set, put them in a sparse matrix,
and use the standard techniques to weigh savings in space
against costs of decoding the matrix.

! : With greater prediction, more bits are needed per instance
! : for fewer instances.
! : For example, with 95% prediction, 5 bits per correction
! : would be needed but only half as many of those so you
! : need 5/(4*2) => 62.5% of the space compared to 90% prediction.
! : When you get 97.5% prediction, you need 6/(4*4) => 37.5%
! : of the space compared to 90% or, 6/(5*2) => 60% of the
! : space compared to 95% prediction.
! : The number of bits for storing the value of the corrections
! : would remain constant.
! : It might be useful to calculate the false prediction so
! : that the fact that the position does not have that value
! : can be used to encode the value in less space.
! : In particular, if you only store (win, no win), excluding
! : one leaves only one, and a single possible code needs
! : zero bits to be encoded.
! : It should be much easier to predict whether a position
! : is a win than to predict the distance to the win, and
! : that prediction is all that a chess program should need
! : at its tip nodes.
! : At the root, distance to mate or conversion or game50
! : would be needed, but then, at the root, there should
! : be plenty of time for accessing the database, which
! : may even be kept at a different location and queried
! : over the net.

! : Good night! I hope I didn't get too unclear,
! : Urban Koistinen soon to lie asleep

! nope.. makes sense. although I'm not sure I have any idea of where
! to start. Of course, we'd first need the NN to recognize some of the
! won positions to see what ended up as "exceptions"...

Right, that is where to start.
Using NN or something else.

Urban Koistinen

Marcel van Kervinck

Sep 25, 1997

Dave (co...@sprintmail.com) wrote:
> These tablebases are 100% muscle IMHO, with no fat anywhere in sight.
> (if you doubt it, read up on them). They're great for solving puzzles,
> analyzing other's endgame play, and of course, as tutors in proper
> endgame play for yourself, or your program. I don't think they should
> be allowed for tournament playing programs, however. The tournaments
> are serious competition for the programs. No crutches allowed.

Why not? If an author has computed them by himself, I see no
reason to exclude them. Using someone else's endgame databases
is another matter. One could successfully argue that in that
case he isn't the sole author of the chess program.

Not that it matters, though, because they are in the public domain.
Everyone has access to them, so there is no unfairness at all.

Marcel
-- _ _
_| |_|_|
|_ |_ Marcel van Kervinck
|_| mar...@stack.nl

Dr A. N. Walker

Sep 25, 1997

Robert Hyatt wrote:
> nope.. makes sense. although I'm not sure I have any idea of where
> to start. Of course, we'd first need the NN to recognize some of the
> won positions to see what ended up as "exceptions"...

People have been guessing at 10% "exceptions". In the case
of KQvKR, the work by Stephen Milner and myself reported to ACC8
shows that this ending can be solved by simple heuristics with only
a few hundred "exceptions", around 0.02%; even these are all cases
where simple loop-breaking strategies would help. Some other 4-piece
endings are certainly harder, and heuristics aren't neural nets, of
course.

Again in the specific case of KQvKR, if all you want to know
is whether the position is won, drawn or lost, this is an essentially
trivial decision: it is always a win for the Q unless there is a
more-or-less immediate draw/loss [easily spotted either by heuristics
or by neural nets]. Some other endings will be much harder; there
seems to be no easy way of knowing whether KRvKN is winning or not,
for example, and little progress [AFAIK] towards sets of heuristics
that would help significantly.

--
Andy Walker, Maths Dept., Nott'm Univ., UK.
a...@maths.nott.ac.uk

brucemo

Sep 25, 1997

Marcel van Kervinck wrote:

> Why not? If an author has computed them by himself, I see no
> reason to exclude them. Using someone else's endgame databases
> is another matter. One could successfully argue that in that
> case he isn't the sole author of the chess program.
>
> Not that it matters, though, because they are in the public domain.
> Everyone has access to them, so there is no unfairness at all.

Crafty is not quite in the public domain, but you could make the
argument that this is the same sort of thing. Everyone has access to
it. So why shouldn't anyone be allowed to compete with it?

bruce

brucemo

Sep 25, 1997

Dave wrote:
>
> To me the question is do we want to program a computer that's great at
> *calculating* the best move in an endgame, or do we want to program
> the computer to *look up* the best move in a massive database.

I can write a function that helps in KRP vs KR endings. The function
says: "Stay in front of the pawn," in computerese.

During the game, this function is executed, the program stays in front of
the pawn (or drives the opposing king away from the pawn) and draws (or
wins). Maybe.

The program of course didn't figure any of this out for itself. It may
have some vague idea to put the king in front of the pawn, without this
function, but it's a much better idea to tell it to do this in stronger
terms. Is it illegal that I gave it this hint?
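A hint like that could be as small as an evaluation bonus. A hypothetical sketch, not Bruce's actual code:

```c
/* "Stay in front of the pawn" as an eval term for KRP vs KR:
 * bonus for the defending king when it stands on the pawn's file,
 * ahead of the pawn.  Squares are 0..63 with a1 = 0 and the pawn
 * advancing toward higher ranks.  Hypothetical sketch only. */
int blockade_bonus(int def_king_sq, int pawn_sq)
{
    int same_file = (def_king_sq & 7) == (pawn_sq & 7);
    int in_front  = (def_king_sq >> 3) > (pawn_sq >> 3);
    return (same_file && in_front) ? 50 : 0;   /* centipawns */
}
```

Added to the defender's score, a term like this nudges the search toward the drawing setup without any lookup at all, which is exactly the heuristic-vs-tablebase trade being debated here.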

I also spent some time building endgame databases. The one for KRP vs KR
took 2 days to build, as well as a tremendous amount of development time,
and these made the above function obsolete.

I don't see what the difference between these two approaches is, really.
I created both of them. It's just that the second approach works better.

bruce

Urban Koistinen

Sep 25, 1997

25 Sep 1997 13:13:17 +0200 Marcel van Kervinck wrote:
! Dave (co...@sprintmail.com) wrote:
! > These tablebases are 100% muscle IMHO, with no fat anywhere in sight.
! > (if you doubt it, read up on them). They're great for solving puzzles,
! > analyzing other's endgame play, and of course, as tutors in proper
! > endgame play for yourself, or your program. I don't think they should
! > be allowed for tournament playing programs, however. The tournaments
! > are serious competition for the programs. No crutches allowed.

! Why not? If an author has computed them by himself, I see no
! reason to exclude them. Using someone else's endgame databases
! is another matter. One could successfully argue that in that
! case he isn't the sole author of the chess program.

The same argument would apply to using games played by others when
building the opening book.

! Not that it matters, though, because they are in the public domain.
! Everyone has access to them, so there is no unfairness at all.

I don't have access to the 5-man of SJE.
When I had arranged for getting them the way SJE first suggested,
he changed distribution policy and sent them to Bob who was
supposed to make them available over the net. They are still
unavailable. Now Recordable CDs have gotten cheap, but still
no 5-man. If I had them, I could likely make copies and send
them anywhere for $10/CD.

Urban Koistinen

Robert Hyatt

Sep 25, 1997

Urban Koistinen (m10...@abc.se) wrote:

Steven only did a couple. He was involved in building a new house, and
then with machine problems. He never got to the "interesting" cases
like KRP vs KR. I do have a couple of his, KRN vs KR, and (I think)
1/2 of the KRB vs KR pair. They are a problem to download, however,
although if you want what I have, I'll certainly make them available...

Bruce Moreland has done a nice tablebase constructor that is fast as
all hell, taking about 2 days for the worst case I have tried so far.
Most 5-man databases are a 24-hour run on a P6/200 with *at least*
64 megs of RAM...

I'm trying to find time to work on the probe code for them so they can
be added to Crafty as an option, then you can use either Steven's format
or Bruce's..

Unfortunately, "Paris calls..." :)


: Urban Koistinen

Dave

Sep 25, 1997

On Thu, 25 Sep 1997 09:27:06 -0700, brucemo <bru...@seanet.com>
wrote:

>Dave wrote:
>>
>> To me the question is do we want to program a computer that's great at
>> *calculating* the best move in an endgame, or do we want to program
>> the computer to *look up* the best move in a massive database.

<The paragraph below is highly edited and out of context. Poses the
<questions well, though. Please see Bruce's original post for actual
<text.

>I don't see what the difference between these two approaches is, really.
>I created both of them. It's just that the second approach works better.
>

> Is it illegal that I gave it this hint?

>
>bruce

Short answer:
In computer vs. computer - no.

In human vs. computer - yes.


Reason:
Computer has access to huge pre-computed database. Human has access
to no database over the board.

Anyone caught comparing human memory to an exact, huge, endgame
database will be reminded of the uproar said databases caused when
they showed how poorly GMs played compared to the database. Also, you
might be a significant source of derisive laughter. Don't go there,
please!


In chess competitions which have program vs. program, tablebases are
fine! The playing field is level, it's library vs. library (among
other things) so use it - it's a fair and sporting contest.

Where it's program vs *human*, however - that's a different matter.
Any heuristic you can program in is fine - any tablebase, database,
etc. for the contest - is a cheat, *pure and simple*!

It's a lot like the advent of the motor car. When they were brand new,
it was entertaining (and quite fair) to have them race against some of
the better horses in town, even the bicyclist! But what about now?
Would anyone want to see a race between a car and a horse? Would any
bicyclist agree to race against a car? Would anyone pay to see it?
Hell, no! It wouldn't be fair!

I think it's the same way with databases and chess tournaments against
human players. They'll be interesting for a while, but when the
databases become more comprehensive and available, it will lose its
sporting attraction. The simple truth is, it won't be fair.

Please note I'm not saying programs shouldn't use them in matches
against other programs! (and in the short term, against human
opponents, as well). Soon, however, that will change.

If anyone out there can't see any difference between a heuristic
which guides the search/eval *during* a match, and a completed
database done *before* the game begins, let me know.

Your enlightenment will begin with 30 days of intensive counseling by
R.T.!! :))

That should silence *all* my critics! (LOL)


Regards,

Dave


Dave

unread,
Sep 25, 1997, 3:00:00 AM9/25/97
to

On 25 Sep 1997 21:50:48 GMT, TUESCHEN.MEDIZ...@t-online.de
(Rolf Tueschen) wrote:


>You forgot about me.
>
>What was your exact wish? Something "intense"? Gimmi a call. :)
>
>The Pope Himself
>
Not at all, Rolf! You just do what you do best:

Drive them crazy by giving them "advice??"
and rambling discourse.

Your gift will finally be appreciated, Rolf!


Dave

Rolf Tueschen

unread,
Sep 25, 1997, 3:00:00 AM9/25/97
to

co...@sprintmail.com (Dave) wrote:

>If anyone out there can't see any difference between a heuristic
>which guides the search/eval *during* a match, and a completed
>database done *before* the game begins, let me know.

>Your enlightenment will begin with 30 days of intensive counseling by
>R.T.!! :))

>That should silence *all* my critics! (LOL)

You forgot about me.

What was your exact wish? Something "intense"? Gimmi a call. :)

The Pope Himself


>Regards,

>Dave


Steven J. Edwards

unread,
Sep 26, 1997, 3:00:00 AM9/26/97
to

hy...@crafty.cis.uab.edu (Robert Hyatt) writes:

>Urban Koistinen (m10...@abc.se) wrote:
>: 25 Sep 1997 13:13:17 +0200 Marcel van Kervinck wrote:
>: ! Dave (co...@sprintmail.com) wrote:
>: ! > These tablebases are 100% muscle IMHO, with no fat anywhere in sight.
>: ! > (if you doubt it, read up on them). They're great for solving puzzles,
>: ! > analyzing other's endgame play, and of course, as tutors in proper
>: ! > endgame play for yourself, or your program. I don't think they should
>: ! > be allowed for tournament playing programs, however. The tournaments
>: ! > are serious competition for the programs. No crutches allowed.

Ah, but what if the program itself constructed the tablebases in the
first place?

>: ! Why not? If an author has computed them by himself, I see no
>: ! reason to exclude them. Using someone else's endgame databases
>: ! is another matter. One could successfully argue that in that
>: ! case he isn't the sole author of the chess program.

No one since Turing (he was the first) can lay claim to being the sole
author of a chess-playing program. And computer-generated exhaustive
endgame databases have been around since KRKN was built in 1970.

>: Same argument would apply to using games played by others when
>: building the opening book.

>: ! Not that it matters, though, because they are in the public domain.
>: ! Everyone has access to them, so there is no unfairness at all.

>: I don't have access to the 5-man of SJE.

My poor little ol' modem had enough of a hard time uploading the three
and four man tablebases.

>: When I had arranged for getting them the way SJE first suggested,
>: he changed distribution policy and sent them to Bob who was
>: supposed to make them available over the net. They are still
>: unavailable. Now Recordable CDs have gotten cheap, but still
>: no 5-man. If I had them, I could likely make copies and send
>: them anywhere for $10/CD.

You are welcome to build them yourself if you have access to the
resources needed. See the uuencoded gzip compressed tar archive
pub/chess/uploads/TB/tbgen.tar.gz at the chess.onenet.net ftp site.
This is the ANSI C tablebase constructor program which is free for all
to use.

>Steven only did a couple. He was involved building a new house, and
>then with machine problems. He never got to the "interesting" cases
>like KRP vs KR. I do have a couple of his, KRN vs KR, and (I think)
>1/2 of the KRB vs KR pair. They are a problem to download however,
although if you want what I have, I'll certainly make them available...

While I've been living in the house for six months, I'm still
finishing it off. Coming in at some $60K over budget, it damn near
finished me off. So no new computers for another year or so. By then
they'll have affordable 750 MHz quad-Alpha Linux boxes -- which would
still be challenged by most of the six man tablebases.

The statistical reports (*.tbs text files) of all the five man classes
I generated can be found in the pub/chess/TB and pub/chess/uploads/TB
directories at the chess.onenet.net ftp site.

>Bruce Moreland has done a nice tablebase constructor that is fast as
>all hell, taking about 2 days for the worst case I have tried so far.
>Most 5 man databases are a 24 hour run on a P6/200 with *at least*
>64 megs of RAM...

>I'm trying to find time to work on the probe code for them so they can
>be added to Crafty as an option, then you can use either Steven's format
>or Bruce's..

After some further work, I've come to see that my indexing scheme,
while simple, is too limiting for doing some (most? all?) of the six
man classes. There are problems with indexing 32 or more bits, there
are problems with representing extremely long mates/losses (i.e.,
greater than 250 ply), and various resource problems. These are going
to take more than a little while to solve. I have some ideas on this,
but I would like to get the opinions of other programmers of
exhaustive endgame database generators as they will likely see things
I haven't.
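As a rough illustration of the 32-bit limit (a sketch of my own, using the six-bits-per-piece indexing described earlier in the thread; the field order here is an assumption for illustration, not Steven's actual file format):

```c
#include <assert.h>
#include <stdint.h>

/* Toy sketch of naive tablebase indexing: one 6-bit square number per
 * piece, concatenated.  With 6 pieces that is 36 bits, which no longer
 * fits in a 32-bit index -- one of the problems noted above.  The
 * field layout is purely an assumption for illustration. */
static uint64_t naive_index(const int *squares, int n_pieces)
{
    uint64_t index = 0;
    for (int i = 0; i < n_pieces; i++)
        index = (index << 6) | (uint64_t)(squares[i] & 63);
    return index;
}
```

Six pieces all on square 63 give an index of 2^36 - 1, well past what a 32-bit integer can address, so a six-man generator needs 64-bit indices (or a cleverer, compressed enumeration).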

>Unfortunately, "Paris calls..." :)

I hope the call wasn't made collect. }:-0

-- Steven (s...@mv.mv.com)

Cameron Hayne

unread,
Sep 26, 1997, 3:00:00 AM9/26/97
to

In article <342ac49c...@nntp.a001.sprintmail.com>,
co...@sprintmail.com (Dave) wrote:

>Where it's program vs *human*, however - that's a different matter.
>Any heuristic you can program in is fine - any tablebase, database,
>etc. for the contest - is a cheat, *pure and simple*!

Dave, you are (IMHO) making an artificial distinction between two
different ways of programming. You seem to prefer calculation to
lookup for some reason. But for me, it is as if you are saying that
children should calculate multiplications from first principles like:
2 * 7
= 7 + 7
= 7 + (6+1)
= (7+1) + 6
= 8 + 6
= 8 + (5+1)
= (8+1) + 5
= 9 + 5
= 9 + (4+1) = ... etc
instead of doing a memory lookup to get the answer.
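The contrast can be made concrete with a toy sketch (mine, purely illustrative): the "calculation" route really does repeated addition, while the "tablebase" route precomputes every answer once and then just indexes in.

```c
#include <assert.h>

/* "First principles": multiply by repeated addition. */
static int multiply_by_addition(int a, int b)
{
    int sum = 0;
    for (int i = 0; i < b; i++)
        sum += a;
    return sum;
}

/* "Tablebase": precompute every product once, answer by lookup. */
static int times_table[13][13];

static void build_times_table(void)
{
    for (int a = 0; a < 13; a++)
        for (int b = 0; b < 13; b++)
            times_table[a][b] = multiply_by_addition(a, b);
}
```

Both routes produce the same answer; the lookup simply trades memory spent before the game for time saved during it, which is exactly the trade at issue in this thread.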


>It's a lot like the advent of the motor car. When they were brand new,
>it was entertaining (and quite fair) to have them race against some of
>the better horses in town, even the bicyclist! But what about now?
>Would anyone want to see a race between a car and a horse? Would any
>bicyclist agree to race against a car? Would anyone pay to see it?
>Hell, no! It wouldn't be fair!

Exactly. And to me, computer chess is *not* about creating computer
opponents for "fair" matches with humans, any more than making cars
more powerful is aimed at running races against humans.
It is about creating the best chess-playing program. Who cares about
the human chess players? They were interesting to play against
during the early days of the technology, but considerations of
fairness have never entered into it.

It is another matter entirely - and not an uninteresting one
- to create computer programs that are deliberately crippled
or weakened so that human players have a chance. There are several
such attempts available for playing against on the Internet Chess Club.


>If anyone out there can't see any difference between a heuristic
>which guides the search/eval *during* a match, and a completed
>database done *before* the game begins, let me know.

Before you start that unmentionable punishment, let me assure you that
I do see the difference. I just don't *care* about the difference.
All's fair in love and (computer) chess.

Urban Koistinen

unread,
Sep 26, 1997, 3:00:00 AM9/26/97
to

25 Sep 1997 23:54:59 GMT Robert Hyatt wrote:
! Urban Koistinen (m10...@abc.se) wrote:

! : I don't have access to the 5-man of SJE.
! : When I had arranged for getting them the way SJE first suggested,
! : he changed distribution policy and sent them to Bob who was
! : supposed to make them available over the net. They are still
! : unavailable. Now Recordable CDs have gotten cheap, but still
! : no 5-man. If I had them, I could likely make copies and send
! : them anywhere for $10/CD.

! Steven only did a couple. He was involved building a new house, and
! then with machine problems. He never got to the "interesting" cases
! like KRP vs KR. I do have a couple of his, KRN vs KR, and (I think)
! 1/2 of the KRB vs KR pair. They are a problem to download however,
! although if you want what I have, I'll certainly make them available...

Will be interesting to see how fast the download goes.
I would be using a 500 Kbit/s connection.
How about trying this 1-2 weeks from now,
or would right after Paris be better?

! Bruce Moreland has done a nice tablebase constructor that is fast as
! all hell, taking about 2 days for the worst case I have tried so far.
! Most 5 man databases are a 24 hour run on a P6/200 with *at least*
! 64 megs of RAM...

I'd take those too, if they are available.
I think they should be made more generally
available, so that more people can tinker
with them without having to build them.
At least SJE's format is much simpler and
so better for tinkering with than KT's.

Urban Koistinen

Robert Hyatt

unread,
Sep 26, 1997, 3:00:00 AM9/26/97
to

Urban Koistinen (m10...@abc.se) wrote:

: 25 Sep 1997 23:54:59 GMT Robert Hyatt wrote:
: ! Urban Koistinen (m10...@abc.se) wrote:

: ! : I don't have access to the 5-man of SJE.
: ! : When I had arranged for getting them the way SJE first suggested,
: ! : he changed distribution policy and sent them to Bob who was
: ! : supposed to make them available over the net. They are still
: ! : unavailable. Now Recordable CDs have gotten cheap, but still
: ! : no 5-man. If I had them, I could likely make copies and send
: ! : them anywhere for $10/CD.

: ! Steven only did a couple. He was involved building a new house, and
: ! then with machine problems. He never got to the "interesting" cases
: ! like KRP vs KR. I do have a couple of his, KRN vs KR, and (I think)
: ! 1/2 of the KRB vs KR pair. They are a problem to download however,
: ! although if you want what I have, I'll certainly make them available...

: Will be interesting to see how fast the download goes.
: I would be using a 500 Kbit/s connection.
: How about trying this 1-2 weeks from now,
: or would right after Paris be better?

Tell me when. I have 'em compressed on a zip drive and can copy them
to our ftp machine. We have a T3 link out of here, so they should go
quickly for someone directly on the net. Notice that KRP vs KR is *not*
done. Nor KQR vs KR, and only 1/2 of KRB vs KR... so it is very
incomplete at present..

: ! Bruce Moreland has done a nice tablebase constructor that is fast as
: ! all hell, taking about 2 days for the worst case I have tried so far.
: ! Most 5 man databases are a 24 hour run on a P6/200 with *at least*
: ! 64 megs of RAM...

: I'd take those too, if they are available.
: I think they should be made more generally
: available, so that more people can tinker
: with them without having to build them.
: At least SJE's format is much simpler and
: so better for tinkering with than KT's.

Nice thing about Bruce's stuff is you can build 'em *way* faster than you
can download 'em... Because the downloading part is the pits...


: Urban Koistinen

brucemo

unread,
Sep 27, 1997, 3:00:00 AM9/27/97
to

Steven J. Edwards wrote:

> After some further work, I've come to see that my indexing scheme,
> while simple, is too limiting for doing some (most? all?) of the six
> man classes. There are problems with indexing 32 or more bits, there
> are problems with representing extremely long mates/losses (i.e.,
> greater than 250 ply), and various resource problems. These are going
> to take more than a little while to solve. I have some ideas on this,
> but I would like to get the opinions of other programmers of
> exhaustive endgame database generators as they will likely see things
> I haven't.

As near as I can guess, you make your temporary file, write your stuff
to it assuming one byte per entry, and when you are done, your temporary
file becomes your permanent file.

That is a guess though, I've never looked at your stuff, but I get the
idea that you would have no reason to have a temporary file that was
distinct from your final product.

My thing writes into a temporary file, and when it is done, it writes a
new file using as few bits as are possible to store the largest value,
then deletes the temporary file.

My final product is still way larger than Thompson's format, but I would
assume faster to access. It is a little smaller than yours (especially
in 4-man classes), but I would assume a little slower to access, because
of the bit-aligned accesses.

One thing I have done is that I can pass a switch in to the builder,
which makes the temporary file use 16-bit elements, which should be
enough forever. The permanent file still writes the minimum number of
bits, same as the old way. So for 4-man classes, and most 5-man
classes, you get exactly the same output, you just use twice as much
temporary space. For the ones where you don't get exactly the same
output, you probably crash or get invisible bugs or something using
8-bit entries :-)  I used 16-bit values to build KQP vs KQ; that one is
right at the edge (mate in 124, I believe); none of the rest that I have
tried have required more than 8 bits.

I experimented with the idea of allowing the number of bits to be even
more configurable in the temporary file, so for instance I could use 9
bits per element, which would be only slightly larger than 8 bits, but
which would work for most databases, until we start to do 6-man, which I
don't intend to do until computers get a lot cooler :-)

The problem with this was that those bit-aligned accesses ate up a lot
of time, so I went straight to 16, which is more efficiently accessed.
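My reading of the scheme described above, as a sketch (the exact file layout is Bruce's and not public, so this is only a guess at the idea): compute the fewest bits that can hold the largest value, then pack each entry at that width. The per-bit loops below are precisely why bit-aligned access is slower than plain byte or 16-bit reads.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Smallest bit width that can hold max_value (max_value >= 1). */
static unsigned bits_needed(unsigned max_value)
{
    unsigned n = 1;
    while ((max_value >> n) != 0)
        n++;
    return n;
}

/* Store entry i of a bit-packed array, 'width' bits per entry.
 * 'buf' must be zero-initialized. */
static void put_entry(uint8_t *buf, unsigned width, uint64_t i, unsigned v)
{
    uint64_t bit = i * width;
    for (unsigned k = 0; k < width; k++, bit++)
        if (v & (1u << k))
            buf[bit >> 3] |= (uint8_t)(1u << (bit & 7));
}

/* Fetch entry i back out again, one bit at a time. */
static unsigned get_entry(const uint8_t *buf, unsigned width, uint64_t i)
{
    uint64_t bit = i * width;
    unsigned v = 0;
    for (unsigned k = 0; k < width; k++, bit++)
        if (buf[bit >> 3] & (1u << (bit & 7)))
            v |= 1u << k;
    return v;
}
```

For the mate-in-124 KQP vs KQ case mentioned above, `bits_needed(124)` is 7, so the packed file spends 7 bits per position instead of 8 or 16 -- at the cost of these slower bit-aligned probes.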

bruce

brucemo

unread,
Sep 27, 1997, 3:00:00 AM9/27/97
to

Dave wrote:

> Where it's program vs *human*, however - that's a different matter.
> Any heuristic you can program in is fine - any tablebase, database,
> etc. for the contest - is a cheat, *pure and simple*!

I think this is extremely arbitrary, since humans and computers behave so
differently. Their various components do not analogize well.

Since it is possible to include any static data as part of an executable,
what you have said is that certain algorithms are legal and certain are
not.

In an effort to get Graham in a tizzy (hopefully), you've also made any
data-mining algorithms illegal, IMHO :-)

bruce

Komputer Korner

unread,
Oct 2, 1997, 3:00:00 AM10/2/97
to

Botvinnik tried this and for 20 years got nowhere, to the point that Hans
Berliner accused him of faking his computer chess research.
--
- -
Komputer Korner

The inkompetent komputer

If you see a 1 in my email address, take it out before replying.
Please do not email both me and the r.g.c.c. at the same time. I read all
the postings on r.g.c.c.

Graham Douglass wrote in article <60vpph$g...@drn.zippo.com>...
> Instead of recording every possible position, would it not be possible
> to record trajectory algorithms?
>
> For example, any chess player faced with the prospect of chasing a pawn
> to a queening position would look to see whether their king could "get
> into the box".
>
> For positions with small numbers of pieces, it would intuitively seem
> that there must be similar (but more complicated) trajectory
> calculations available.
>
> If, for a given combination of pieces, there is more than one possible
> trajectory calculation, then one could try categorising the positions
> into different trajectory calculations.
>
> This would obviously be messy and time consuming, but, if it were
> possible, could it result in a data package that is smaller than a
> tablebase?
>

David Brinegar

unread,
Oct 2, 1997, 3:00:00 AM10/2/97
to


Graham Douglass wrote:

> Instead of recording every possible position, would it not be possible to record
> trajectory algorithms?
>
> For example, any chess player faced with the prospect of chasing a pawn to a
> queening position would look to see whether their king could "get into the box".
>
> For positions with small numbers of pieces, it would intuitively seem that there
> must be similar (but more complicated) trajectory calculations available.
>
> If, for a given combination of pieces, there is more than one possible
> trajectory calculation, then one could try categorising the positions into
> different trajectory calculations.
>
> This would obviously be messy and time consuming, but, if it were possible,
> could it result in a data package that is smaller than a tablebase?

I doubt it. But let's assume that we can make lean and mean trajectory algorithms
that generate accurate play given anything from a defined set of positions which you
might dump into them.

One problem is that these sets of positions are mostly unknown. Look at Nunn's book
on rook endings for example: here's an amazing grandmaster who culled new ideas
about basic rook endings from a tablebase, turned tablebases into algorithms, and he
says the task is barely begun. So you'd only get partial categorization of accurate
play where you can definitely say "in such a position you play thusly." Once you
got there you could finish the game in perfect form.

Ok so what do you do with the exceptions and the uncategorized positions? We have
to have a solution, otherwise we've lost data. Use a tablebase? Then at best we
are comparing the size of the known trajectories versus the chunk of tablebase they
displace (ignoring the referencing problem caused by deleting parts of a
tablebase.) How much tablebase went into each of the trajectory algorithms in
Nunn's book? My intuition says this is size-inefficient data storage, because the
sets of positions that are solved by an identical method seem small to me.

Also, how do you identify a position and send it to the right algorithm? It seems
to me you must have a reference from position to trajectories. So, can you
reference trajectory positions efficiently? Since tablebases have referencing built
in (the reason for their great size) any such referencing data at all goes against
the algorithm method. My intuition is convinced.

What you propose seems to me to be better as an endgame assistant than a way of
playing endgames perfectly. And you could write a book on your new discoveries!

Admittedly there is a lot of space in tablebases that looks wasted, that looks like
it will never arise in practical play, but I can't think of a more efficient,
lossless structure to cover all possible positions. Then again I'm a sucker for
simplicity. If size really is a prohibitive factor, why not just compress them?
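On the compression point: tablebase files are full of long runs of identical bytes (whole drawn or trivially won regions), so even naive run-length encoding shrinks them considerably. A minimal sketch, purely illustrative and not any real distribution format:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Naive run-length encoding: each run becomes a (length, value) pair,
 * with runs capped at 255 so the length fits in one byte.  'out' must
 * be large enough (worst case 2*n bytes). */
static size_t rle_encode(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t w = 0;
    for (size_t i = 0; i < n; ) {
        size_t run = 1;
        while (run < 255 && i + run < n && in[i + run] == in[i])
            run++;
        out[w++] = (uint8_t)run;   /* run length */
        out[w++] = in[i];          /* repeated value */
        i += run;
    }
    return w;
}
```

A thousand-byte "all draws" stretch collapses to four (length, value) pairs -- eight bytes. The catch, of course, is that a compressed file can no longer be probed by direct seek, which is why generators keep the flat indexed form for play and compress only for distribution.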


David Brinegar

