In my chess program, I've always used 32 bits to hold the hashed board
representation. Although most programmers I've talked to seem to use
more, Belle, back in the 1980's used 32, and I copied them.
Clearly, the less bits you use to represent a board, the more likely it
is a "false match" will occur. Today, I think I saw this happen, but I
can't be sure, because I can't reproduce the fault (so it might be
another bug).
How many bits are others using for a hashed board representation? Am I
living dangerously using only 32?
Cheers!
--
Tom King
Cheers,
Tom
Tom King <t...@hatbulb.demon.co.uk> wrote:
: All,
CDB uses two hashing schemes: a primary hash code of 16 bits that
is used to index the hash table of the database, and a secondary
hash code of 8 bits that is used to avoid needless comparisons
of positions. Both hash codes are generated from CDB's 192-bit
compact position representation.
I use 32. So does Ed Schroder, I think. At least he did so until recently.
The following trick, mentioned by Ed a couple of years ago, reduces the
risk of false matches:
Store the last move (moving piece and destination square) in the hash record.
When the program tries to find the board position in the hash table, check
that the last move from the hash table position is consistent with the board
position. If the last move played in the hash position was Be2 and there is
no bishop at e2 in the board position, we know that we have a false match.
Tord
--
"The true Christian should be aware of mathematicians and those who make
empty prophecies. Chances are already there that mathematicians have
made a covenant with the Devil to confine man in the bounds of Hell"
- St. Augustin
: I use 32. So does Ed Schroder, I think. At least he did so until recently.
: The following trick, mentioned by Ed a couple of years ago, reduces the
: risk of false matches:
: Store the last move (moving piece and destination square) in the hash record.
: When the program tries to find the board position in the hash table, check
: that the last move from the hash table position is consistent with the board
: position. If the last move played in the hash position was Be2 and there is
: no bishop at e2 in the board position, we know that we have a false match.
: Tord
That works. But you really aren't storing 32 bits any longer, you are
also storing the move, which is probably going to take 16 more? In any
case, anything to enhance 32 is a win. But with today's speeds, 64 is not
a huge cost, particularly with incremental updating...
: --
: "The true Christian should be aware of mathematicians and those who make
: empty prophecies. Chances are already there that mathematicians have
: made a covenant with the Devil to confine man in the bounds of Hell"
: - St. Augustin
--
Robert Hyatt Computer and Information Sciences
hy...@cis.uab.edu University of Alabama at Birmingham
(205) 934-2213 115A Campbell Hall, UAB Station
(205) 934-5473 FAX Birmingham, AL 35294-1170
: In my chess program, I've always used 32 bits to hold the hashed board
: representation. Although most programmers I've talked to seem to use
: more, Belle, back in the 1980's used 32, and I copied them.
: Clearly, the less bits you use to represent a board, the more likely it
: is a "false match" will occur. Today, I think I saw this happen, but I
: can't be sure, because I can't reproduce the fault (so it might be
: another bug).
: How many bits are others using for a hashed board representation? Am I
: living dangerously using only 32?
yes. I don't recall the numbers now, but about 3 years ago I ran a test
comparing 32 and 64. The numbers for 32 were absolutely frightening. The
best way to test is to store your 32 bit hash, and then store the real board
at the same time. When you get a hash match, compare the real positions. I
suspect you will get a false match every few seconds. With 64 bits, I ran
dozens of games and averaged only one or two false matches per game. That
few likely won't affect the PV, but it is always possible...
There was some real data posted by myself and John Stanback back when
we were testing this...
: Cheers!
: --
: Tom King
In Zarkov I maintain 2 32 bit hash values for the board position. The
first is used to index the hash table. The second position is stored in
the hash table and is used to verify whether the position in the table
matches the current board position. So, if for example 20 bits of index
is used along with the 32 bit verification value the total number of
bits is 52. This seems to work ok. A while back I experimented and
found that using 20 bits of index and 12 bits of verification was very
bad with several collisions per second. Increasing from 12 to 20 bits
of verification reduced collisions to about 1 per million nodes as I
recall. So I guess that with 32 bits of verification collisions would
be on the order of 1 per 100 millions nodes which is probably ok.
John
MacChess uses a 64-bit hash.
> Am I living dangerously using only 32?
Well, at the cost of a lot of overhead, you could implement some check
function (say, run two 32-bit and a 64-bit hash in parallel, and see
if they ever disagree. You could have this experimental version run
autoplay while you are away from the computer for a while.
As processors get faster, and hash tables get larger, the frequency
of hash collisions goes up, yes? I imagine that people running
300,000 nps programs with 64Mb hash (like MacChess on a 300 MHz G3)
are at higher risk than, say, 40,000 nps and 256K hash (like HIARCS 6.0
Mac on a 300 MHz G3).
fow...@netcom.com (Richard A. Fowell)
> CDB uses two hashing schemes: a primary hash code of 16 bits that
> is used to index the hash table of the database, and a secondary
> hash code of 8 bits that is used to avoid needless comparisons
> of positions. Both hash codes are generated from CDB's 192-bit
> compact position representation.
I'm trying to learn more about how these databases function, could you
(or anybody really) perhaps elaborate more on exactly how you use
these hash codes. It isn't obvious to me, for example, how you would
use the 8 bit hash code to, as you say, "avoid needless comparisons".
Pointers to a web page that discusses this would also be very useful.
Thanks
--
Michael Tiller
Take a look at http://reality.sgi.com/pmk_craypark, where you
can access a document that I wrote in which I describe the
internal workings of CDB in great detail.
CDB maintains a hash table with 65536 entries, each of which is
a linked list of known positions with the same primary 16-bit
hash value. Each position carries a secondary 8-bit hash code,
too. When searching the database to see whether it contains
a given position, CDB follows these steps:
1) Compute the primary and secondary hash values for the position
2) Use the primary hash value to index the hash table and find
the right linked list of positions
3) Step through this list, looking for positions with a matching
secondary hash value
4) For those positions that have the right secondary hash value,
compare the position to the position for which we're searching
In situations in which CDB is searching the database for the
location of a position that is known a priori to be present,
the secondary hash codes allow an important optimization:
if there's only one position in the linked list with a matching
secondary hash value, that must be the one for which we're
searching, and no comparison of positions is required. This
optimization is important because it's expensive to extract
a position from the database for comparison. CDB stores an
actual chessboard arrangement in the database for a tiny
minority of positions, and has to reconstruct such an
arrangement from the moves for all of the others. It's
a classic space/time tradeoff that I've made in favor of
smaller databases, and since CDB is still pretty fast
I think it's worked out well.
An important aspect of hashing positions is to evenly distribute
them across the space of hash values. In CDB, you can use
the "File->Database Statistics" command to get the mean
number of positions per hash bucket, minimum and maximum
counts, and standard deviation. CDB's primary hash function
yields a pretty low standard deviation (I've spent a lot of
time tuning it), but the primary hash bucket with the
largest number of positions still has way too many entries
when compared to the mean.
Hope this helps.
-Peter
No. I don't really store the move, only the piece type (3 bits) and the
destination square (6 bits). That is, I need only 9 bits more.
>In any
>case, anything to enhance 32 is a win. But with today's speeds, 64 is not
>a huge cost, particularly with incremental updating...
64 bits is not a huge cost, but 32 seems to work OK for me and I am too lazy
to change the code.
Nice idea, maybe I'll try that. Either that, or bite the bullet and go
for 64 bits.
--
Tom King
i'll have a hunt on deja news. thanks!
>
>
>: Cheers!
>: --
>: Tom King
>
--
Tom King