Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Crafty's opening book - technical details

39 views
Skip to first unread message

Robert Hyatt

unread,
Oct 4, 1996, 3:00:00 AM10/4/96
to

Here's the "crafty" way to create a large book file that lets me get
away with doing only one I/O operation to read in the entire set of
legal moves for a position:

First, I parse the moves one by one and as I step down the book line,
I use MakeMove() to update the hash key after each move as you might
expect. However, here's the "twist" I use. I take the lower 48 bits
from this "new" hash signature, and the upper 16 bits from the previous
hash signature to form the hash signature that I write out to a file in
order as they are produced. The interesting thing to note is that all
positions with the same "parent" position will agree in the high-order
16 bits. Of course, other positions will also agree in the high-order
bits too, but this doesn't hurt, as explained later.

Next, I take this huge set of hash signatures (one per PGN move of
course) and sort, which collects duplicate positions together. I
then "squeeze" out the duplicates (you can count duplicates if you
want, as I do, and also if you carry a win/loss/draw flag with each
signature, derived from the PGN result field, you can count wins,
draws and losses as you eliminate these duplicates.

The format of the book file is a set of indices that take me to
specific locations in the book file (currently 32,768 of these
"pointers" although this is a #define option.) index '0' points
into this large book file to a "cluster". A "cluster" is a group
of positions which match in the upper 16 bits, with a count as the
first word telling how many positions are in this cluster. If you
followed the way I formed the hash signature for the book file,
you will notice that a cluster is simply all moves that have a
parent position with the same 16 upper bits. So, index '0'
then points to cluster 0, which is all book positions with the
upper 16 bits = 0, index '1' points to cluster 1, which is all
book positions with the upper 16 bits = 0000 0000 0000 0001, and
so forth, all the way to cluster 32767 which points to the cluster
with all of the upper 16 bits = 0111 1111 1111 1111.

Very simple and compact.

To access this quickly, I call Book() from the current position. Book
simply takes the upper 16 bits of the current hash key, and uses that
to locate and read in the cluster that matches these 16 bits. It then
generates and makes each legal move, and looks in the cluster to see if
it finds a match with the rightmost 48bits of the current hash signature,
and the rightmost 48bits of the cluster hash signature. If so, this is
a known book move.

The only minor difficulty here is that the cluster may have several hundred
book positions in it, because a cluster contains *all* known book positions
for any parent position with the same upper 16 bits. If you take a book
with 8,000,000 positions, then each cluster (optimally) will have
8,000,000 / 32,768 which is about 250 positions, assuming the positions
are evenly distributed across all "clusters." My current book has
3,739,564 unique positions after trimming all lines to 50 plies, max.
This produced a max cluster size of 166.

So, to find any move from the current position, I read in one cluster,
and search it sequentially for any hash signatures that match a hash
signature produced by making a legal move in this position. Many of
the moves in the cluster don't go with this position of course, because
they are from parents with the same upper 16 bits, but (possibly) different
lower 48 bits. However, looping over 168 doublewords, or reading in 168
doublewords + the win/loss/draw counts takes *no* time, particularly when
compared to the time required for doing a fseek(). Since I do exactly one
fseek() to find this, it is just about instantaneous.

My book size is then 32768 ints (one word per cluster for the cluster index
map), another 32768 ints because each cluster has as its first entry a count
of the number of actual positions in that cluster, plus N * the size of a
single position, where N is the number of unique positions. For crafty, a
position is 16 bytes, 8 for the signature (I know, the upper 16 is not needed
and if I find a use for these bits, I'll use 'em for something else.. :) )
2 for 16 boolean flags (don't ever play, always play, etc.) and 6 for 3 2-byte
counts for wins, draws and losses for each move to help crafty choose which
one to play.

If that's too complicated, let me know and I'll try again. The idea is really
trivial, the explanation is not.

The only flaw in this method is that "adding" is impossible in the current
format, because there's no space to add. You could use the old database
approach of specifying a "fill percent" so that each cluster would have an
extra 10% added to it to allow space to insert new moves, but I don't do
this because I don't add. On the P6/200, this entire process takes less than
10 minutes to parse the 200K PGN games, create and sort the keys, and build
the book file. Quick enough that I have not yet worried about working on an
add to book facility. BTW, this win/loss/draw stuff is new to crafty v10.x,
so everyone might as well get ready to re-create the book files when 10.x
becomes available. :) However, it gives some interesting options, such as
playing the move with the best winning percentage, the best drawing chances,
etc. Since it is now easy to compute these with the counts available.

There are some "flaws" with using these counts that I'll discuss when 10.x
becomes public in a week or so, and we might generate a lively debate about
how to use the info that Crafty now has, when trying to choose a reasonable
book line.

Discussion or questions always welcome of course.

Bob


Walter Ravenek

unread,
Oct 5, 1996, 3:00:00 AM10/5/96
to

In article <5336ks$2...@juniper.cis.uab.edu>, hy...@crafty.cis.uab.edu
(Robert Hyatt) wrote:

[snip]


> Very simple and compact.
>
> To access this quickly, I call Book() from the current position. Book
> simply takes the upper 16 bits of the current hash key, and uses that
> to locate and read in the cluster that matches these 16 bits. It then
> generates and makes each legal move, and looks in the cluster to see if
> it finds a match with the rightmost 48bits of the current hash signature,
> and the rightmost 48bits of the cluster hash signature. If so, this is
> a known book move.

[snip]

Your discussion is quite clear.

My question with this approach is: are you not bothered with the
transpositions you miss? Of course there are problems with not
storing the "history" of the moves: the book may cause the program
to play an inferior move because that way it transposes to a known
position.
( A nice example: suppose you have french in your book. Your opponent
has white and plays 1. d4 d5 2. e4. Now you the program will play
2 ... e6 instead of taking the pawn. )

Walter Ravenek

Robert Hyatt

unread,
Oct 5, 1996, 3:00:00 AM10/5/96
to

Walter Ravenek (rav...@chem.vu.nl) wrote:
: In article <5336ks$2...@juniper.cis.uab.edu>, hy...@crafty.cis.uab.edu
: (Robert Hyatt) wrote:
:
: [snip]
: > Very simple and compact.

: >
: > To access this quickly, I call Book() from the current position. Book
: > simply takes the upper 16 bits of the current hash key, and uses that
: > to locate and read in the cluster that matches these 16 bits. It then
: > generates and makes each legal move, and looks in the cluster to see if
: > it finds a match with the rightmost 48bits of the current hash signature,
: > and the rightmost 48bits of the cluster hash signature. If so, this is
: > a known book move.
:
: [snip]

:
: Your discussion is quite clear.
:
: My question with this approach is: are you not bothered with the
: transpositions you miss? Of course there are problems with not
: storing the "history" of the moves: the book may cause the program
: to play an inferior move because that way it transposes to a known
: position.
: ( A nice example: suppose you have french in your book. Your opponent
: has white and plays 1. d4 d5 2. e4. Now you the program will play
: 2 ... e6 instead of taking the pawn. )
:
: Walter Ravenek

I don't miss *any* at all, because I use the hash signature to look up
positions, and the database is just like a large hash table. The only
difficulty would be the following:

1. book line: e4 e5 nf3 nc6

2. game actually goes Nf3, there's no guarantee crafty will play Nc6
so that if opponent then plays e4 it would find e5 and go back into
book. However *if* crafty plays Nc6 since it has no book line, and *if*
the opponent plays e4, then crafty transposes right back into the book,
because the hash signature after 2. ... e5 will be found. I keep a window
of 5 moves so that once I leave book, I keep probing the database for the
next 5 moves to transpose back if posible. It happens all the time, too.
I've seen it go out, back in 4 moves later, back out, back in, repeating
for a while. Hartman once played a game, and at the point where he
resigned, crafty was back in book for the 8th time (or so) at move 45...


To solve your problem above, in tournament mode Crafty uses a short search
of all non-book positions to look for this. It happens lots of times, and
has been the subject of discussion for years. Example:

1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 Ba4

game goes:

1. e4 e5 2. Bb5 a6 4. Nf3 and black promptly plays Nc6 rather than
ripping the bishop, because Nc6 goes back into book. The short search
solves this completely because it would notice that there's a move that
seems to win material at a fairly quick depth, so crafty ignores the
transposition back into book and does a normal search. If the move wins
the piece after a long search, it rips it...


Walter Ravenek

unread,
Oct 7, 1996, 3:00:00 AM10/7/96
to

In article <535qme$3...@juniper.cis.uab.edu>, hy...@crafty.cis.uab.edu
(Robert Hyatt) wrote:

[snip]

> : My question with this approach is: are you not bothered with the


> : transpositions you miss? Of course there are problems with not
> : storing the "history" of the moves: the book may cause the program
> : to play an inferior move because that way it transposes to a known
> : position.
> : ( A nice example: suppose you have french in your book. Your opponent
> : has white and plays 1. d4 d5 2. e4. Now you the program will play
> : 2 ... e6 instead of taking the pawn. )
> :
> : Walter Ravenek
>
> I don't miss *any* at all, because I use the hash signature to look up
> positions, and the database is just like a large hash table.

Then I misunderstand. I thought you used 16 bits of the hash key from
the current position to force clustering of moves from a given position,
so that you need only one disk access. How do you find the move if the
position you search from is different than the one from which the book
move was made?

>> To solve your problem above, in tournament mode Crafty uses a short search
> of all non-book positions to look for this. It happens lots of times, and
> has been the subject of discussion for years. Example:
>
> 1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 Ba4
>
> game goes:
>
> 1. e4 e5 2. Bb5 a6 4. Nf3 and black promptly plays Nc6 rather than
> ripping the bishop, because Nc6 goes back into book. The short search
> solves this completely because it would notice that there's a move that
> seems to win material at a fairly quick depth, so crafty ignores the
> transposition back into book and does a normal search. If the move wins
> the piece after a long search, it rips it...

I also do that. My program gives book moves a bonus of 0.5 pawns. Then it
searches for a preset time. If the book move is still on top, it accepts
the book move. If another move is on top, it continues searching and after
each iteration it checks again whether the book move is on top. This way
I also save some time if the program "understands" (i.e., doesn't think it
fails terribly) the book move after a somewhat deeper search.
The problem I quoted happens if I switch this mechanism off.

Walter Ravenek

Martin Borriss

unread,
Oct 7, 1996, 3:00:00 AM10/7/96
to

In article <535qme$3...@juniper.cis.uab.edu>,

hy...@crafty.cis.uab.edu (Robert Hyatt) writes:
>Walter Ravenek (rav...@chem.vu.nl) wrote:
>: In article <5336ks$2...@juniper.cis.uab.edu>, hy...@crafty.cis.uab.edu
>: (Robert Hyatt) wrote:
>: >
>: > To access this quickly, I call Book() from the current position. Book

>: > simply takes the upper 16 bits of the current hash key, and uses that
>: > to locate and read in the cluster that matches these 16 bits. It then
>: > generates and makes each legal move, and looks in the cluster to see if
>: > it finds a match with the rightmost 48bits of the current hash signature,
>: > and the rightmost 48bits of the cluster hash signature. If so, this is
>: > a known book move.
>:
>: My question with this approach is: are you not bothered with the
>: transpositions you miss?

[...]


>
>I don't miss *any* at all, because I use the hash signature to look up

>positions, and the database is just like a large hash table. The only
>difficulty would be the following:
>
>1. book line: e4 e5 nf3 nc6
>
>2. game actually goes Nf3, there's no guarantee crafty will play Nc6
>so that if opponent then plays e4 it would find e5 and go back into
>book. However *if* crafty plays Nc6 since it has no book line, and *if*
>the opponent plays e4, then crafty transposes right back into the book,
>because the hash signature after 2. ... e5 will be found. I keep a window
>of 5 moves so that once I leave book, I keep probing the database for the
>next 5 moves to transpose back if posible. It happens all the time, too.
>I've seen it go out, back in 4 moves later, back out, back in, repeating
>for a while. Hartman once played a game, and at the point where he
>resigned, crafty was back in book for the 8th time (or so) at move 45...
>

Thank you for the explanation. Three questions:

1) I did not quite understand why your approach recognizes transpositions.
I thought of the following scenario:

step 1) "Bookbuilding":
You have the line 1.e4 e5 2.Nf3.
You have a 64-bit hashkey for the position above.

There are two moves here: 2...Nc6 and 2...d6. You execute those moves and
have corresponding hashkeys.

For both moves you take the high 16 bit from the parent and the lower 48 bit
of the children. Quite clear. They go into the same 'cluster' in your book
file.

step 2) Looking up a move, transposition:
Assume we have played 1.Nf3 Nc6 2.e4 so far. Now Crafty should be able to
recognize 2...e5 as transposition into the book. Can it be found?

You have the hash signature of "1.Nf3 Nc6 2.e4" position. You take the
higher 16 bit to find the right cluster to search through. But why are we
in the correct cluster? The cluster number we got from "1.Nf3 Nc6 2.e4" has
to match the cluster number of "1.e4 e5 2.Nf3" to find our transposition
back into the book?! I can't see this happen?!

What am I missing here?

2) Which way do you get the flags for the book moves? Is this based on
statistics you do during bookbuilding? It would be nice to read comments
and base the flags on those.

3) You said the book isn't modifiable since it is easy to rebuild. What
happens if you discover that Crafty dislikes a certain opening move which
has a flag saying 'playable'? Can you easily 'patch' the book then?

Martin

--
Martin....@inf.tu-dresden.de

Robert Hyatt

unread,
Oct 7, 1996, 3:00:00 AM10/7/96
to

Martin Borriss (bor...@os.inf.tu-dresden.de) wrote:
: In article <535qme$3...@juniper.cis.uab.edu>,

: hy...@crafty.cis.uab.edu (Robert Hyatt) writes:
: >Walter Ravenek (rav...@chem.vu.nl) wrote:
: >: In article <5336ks$2...@juniper.cis.uab.edu>, hy...@crafty.cis.uab.edu
: >: (Robert Hyatt) wrote:
: >: >
: >: > To access this quickly, I call Book() from the current position. Book

: >: > simply takes the upper 16 bits of the current hash key, and uses that
: >: > to locate and read in the cluster that matches these 16 bits. It then
: >: > generates and makes each legal move, and looks in the cluster to see if
: >: > it finds a match with the rightmost 48bits of the current hash signature,
: >: > and the rightmost 48bits of the cluster hash signature. If so, this is
: >: > a known book move.
: >:

Nothing. When I developed this quick approach, I intentionally left this
"hole", as it happened before I started playing with a short search to
verify book moves (in tournament mode.) This solves the problem that was
the subject of a lot of discussion a while back (1. e4 e5 2. nf3 a6
3. Bb5 Nc6??). I was skeptical of allowing these.

I did fix this early by simply picking up transpositions directly in the
create program (for this type of case) and then adding the transpositions
to a "cluster" even though it didn't really belong. I later took this out
as it caused a couple of bad moves that humans found and started playing
around with on ICC. :) I should probably put it back since Crafty can
now use a search to discover that Nc6 is a blunder above and axb6 is
better...


:
: 2) Which way do you get the flags for the book moves? Is this based on


: statistics you do during bookbuilding? It would be nice to read comments
: and base the flags on those.

I only put flags in the small "books.bin" file. This uses a very tiny
pgn file which simply makes crafty play certain moves (!) or avoid
certain moves (?) and which can be edited and re-created in < 1 second
should it need to be changed...

:
: 3) You said the book isn't modifiable since it is easy to rebuild. What


: happens if you discover that Crafty dislikes a certain opening move which
: has a flag saying 'playable'? Can you easily 'patch' the book then?

I modify the start.pgn file, put a "?" after the offending move, and rebuild
this books.bin file. When crafty reads from book.bin, it also reads from
books.bin and "merges" the flags so that books.bin is used primarily to
make it play or avoid particular moves.
:
: Martin
:

Robert Hyatt

unread,
Oct 9, 1996, 3:00:00 AM10/9/96
to

Vincent Diepeveen (vdie...@cs.ruu.nl) wrote:
: In <ravenek-0710...@news.vu.nl> rav...@chem.vu.nl (Walter Ravenek) writes:
:
: >(Robert Hyatt) wrote:
: >
: >[snip]

: >
: >> : My question with this approach is: are you not bothered with the
: >> : transpositions you miss? Of course there are problems with not

: >> : storing the "history" of the moves: the book may cause the program
: >> : to play an inferior move because that way it transposes to a known
: >> : position.
: >> : ( A nice example: suppose you have french in your book. Your opponent
: >> : has white and plays 1. d4 d5 2. e4. Now you the program will play
: >> : 2 ... e6 instead of taking the pawn. )
: >> :
: >> : Walter Ravenek
: >>
: >> I don't miss *any* at all, because I use the hash signature to look up
: >> positions, and the database is just like a large hash table.
: >
: >Then I misunderstand. I thought you used 16 bits of the hash key from

: >the current position to force clustering of moves from a given position,
: >so that you need only one disk access. How do you find the move if the
: >position you search from is different than the one from which the book
: >move was made?
: >
: >>> To solve your problem above, in tournament mode Crafty uses a short search
: >> of all non-book positions to look for this. It happens lots of times, and
: >> has been the subject of discussion for years. Example:
: >>
: >> 1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 Ba4
: >>
: >> game goes:
: >>
: >> 1. e4 e5 2. Bb5 a6 4. Nf3 and black promptly plays Nc6 rather than
: >> ripping the bishop, because Nc6 goes back into book. The short search
: >> solves this completely because it would notice that there's a move that
: >> seems to win material at a fairly quick depth, so crafty ignores the
: >> transposition back into book and does a normal search. If the move wins
: >> the piece after a long search, it rips it...
: >
: >I also do that. My program gives book moves a bonus of 0.5 pawns. Then it
: >searches for a preset time. If the book move is still on top, it accepts
: >the book move. If another move is on top, it continues searching and after
: >each iteration it checks again whether the book move is on top. This way
: >I also save some time if the program "understands" (i.e., doesn't think it
: >fails terribly) the book move after a somewhat deeper search.
: >The problem I quoted happens if I switch this mechanism off.
:
: Diep rips off the bishop:
:
: I do not only store positions: i store positions with moves.
: This also decreases the number of diskaccess. I only need an (log n)
: diskacces, where storing only positions needs (n log n) diskaccesses.

Or, if you use crafty's scheme, it uses "1" disk access. :)

:
: So Diep does detect the main line, but not if you do it the wrong order.
: For this reason transpositions into the mainline must be putted in by
: hand in Diep, but it definitely rips off bishops.
:
: I have seen it just too often that humans transpose to the mainline in the
: wrong way. Even in my own games i have allowed opponents to win in this
: way.

I agree. If you transpose by finding *the* move to re-enter the book, you
can make ugly mistakes... which is why I elected to use the method I use
now...

:
: In the case that a tranposition into a mainline is not in the book,
: then Diep usually gets back in book a move further.
:
: Another advantage of this way of storing is that i have a better oversight;
:
: suppose:
: a 1. e4,e5 2. d4,Nf6
: b 1. e4,e5 2. Nf3,Nf6
:
: Now storing positions will probably force a program to play
: using game a: Nf3
: game b: d4
:
: Leading to the same position. Anyway it is not clear how big the chance
: is that it chooses a certain move.
:
: I want however that after game a my program has a
: 99% chance to play dxe5
: 1% Nc3
:
: after game b:
: i want Diep to have a 30% chance to play d4 and a 70% chance to play Nxe5
:
: A disadvantage of my system is that it needs few % more diskspace.
: If you store a position, you only need 8 bytes (63 bits hashing, 1 bit color),
: or 2 hashtables (1 for white and one for black): also 8 bytes.
:
: I need however: 8 bytes hashing, 4 bytes index (to the moves),
: 1 byte number of moves, 2 bytes per move
: (move number and % chance that it may be played).
:
: So when i store 2 moves for every position, then my system is cheaper,
: when i store on average little more than 1 move a position then my
: system takes more bytes. It appears that openingsbook is usually little
: more than 1 move a position leading to more diskspace needed, than the 8
: bytes a position approach. considering that todays harddisks are megabytes
: and an openingsbook of around 50,000 moves (and about 45,000 positions)
: eats about (45,000 x (8+4+1)) + (50,000 x 2) = 685kb this disadvantage
: is not a real disadvantage, it's a feature!
:
: Vincent Diepeveen
: vdie...@cs.ruu.nl
:
: >Walter Ravenek
: --
: +--------------------------------------+
: || email : vdie...@cs.ruu.nl ||
: || Vincent Diepeveen ||
: +======================================+

Vincent Diepeveen

unread,
Oct 9, 1996, 3:00:00 AM10/9/96
to

So Diep does detect the main line, but not if you do it the wrong order.


For this reason transpositions into the mainline must be putted in by
hand in Diep, but it definitely rips off bishops.

I have seen it just too often that humans transpose to the mainline in the
wrong way. Even in my own games i have allowed opponents to win in this
way.

In the case that a tranposition into a mainline is not in the book,

0 new messages