According to David Levy and Monty Newborn in "How Computers Play Chess",
hash tables are used to prevent repetitions of search... That's fine, but I am not clear
on one point they make. I am aware that when using a hash code of 32 bits, for example,
different positions may be assigned the same hash code. In this case they say that
"Programs have various strategies for deciding what to do when a clash occurs. Some
programs attempt to store the position at the next location. If that location is
found to be occupied also, the program can give up or it can replace the older of the
two entries just examined. Most programs try more than once to find an unoccupied
location; Cray Blitz tries eight times."
(maybe Robert Hyatt can comment on this)
I understand that even if a clash occurs, storing is a pretty easy task: just find
the next empty slot... But how about reading the position when it crops up
again? What happens if I arrive at a position that had a clash before? How can
I read the scores and the various other data stored? Aren't I going to read the wrong
position's scores and data??
Can somebody explain this to me?
Most don't "double hash" due to the extra time required. However, for
Cray Blitz, since we run on a vector machine that effectively makes loading
eight words just as fast as loading one word, we do use "double hashing" to
choose the best place to store a table entry.
The idea:
given a 64-bit key (or whatever word size you want), take the low order "n"
bits, where "n" is log2(tablesize), and use this as the "first" entry (x)
to probe. Take the next "m" bits (for Cray Blitz, m=9) and use this
as a "vector stride" (y), and then load eight words: table(x), table(x+y),
table(x+2y), ..., table(x+7y). One of these will be the one chosen to
be replaced by the current store, or else one of these will be the "match"
for the current lookup. The only minor difficulty to handle is that you
don't dimension the table for just tablesize entries [table(n)] but, rather, we have
to dimension the table larger [table(n+7*512)] to handle the positions
where we hash to near the end of the table, where the vector loads would then
step beyond the end... no, there isn't a "modulus" mechanism for vector
loads to make this load "wrap around" to the beginning.
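A minimal sketch of this probe scheme in C (the names and sizes here are my own
illustration, not Cray Blitz's actual code): the low n bits of the key give the
base slot, the next m bits give the stride, and eight candidate slots are
generated.

```c
#include <stdint.h>

#define N_BITS 20                         /* table has 2^20 primary entries */
#define M_BITS 9                          /* stride field width, as in Cray Blitz */
#define TABLE_SIZE ((1u << N_BITS) + 7u * (1u << M_BITS)) /* pad for overrun */

/* Return the i-th probe slot (i = 0..7) for a 64-bit hash key.
   Note: the stride may come out 0, collapsing all eight probes onto one
   slot; a real implementation might force it to be at least 1. */
static uint32_t probe_slot(uint64_t key, int i)
{
    uint32_t x = (uint32_t)(key & ((1u << N_BITS) - 1u));           /* low n bits */
    uint32_t y = (uint32_t)((key >> N_BITS) & ((1u << M_BITS) - 1u)); /* stride */
    return x + (uint32_t)i * y;           /* x, x+y, ..., x+7y */
}
```

The padding of 7*512 entries guarantees that even the worst case (base slot at
the very end of the primary table, maximum stride) stays inside the array.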
Performance measurements seem to suggest that we have one of the best-performing
hashing algorithms around, based on the number of positions we typically find
in the table during a search (our average is around 30% in the middlegame
and much higher in endings). I am including (right below here) some hashing
statistics from one of our games from last year. The stat lines are formatted
with the following data: "time:" is the time for this move, followed by the "real"
chess clock time (which started at 4:00); "nn.np" is the processor utilization on a 16
processor C90 (16.0 is perfect); "nodes:" is the total nodes searched for this move;
the percentage after "h" is hash table positions successfully found. The
remaining two percentages are pawn hashing (which means we hardly ever have
to go through our complex pawn structure scoring code, because we have already
seen this pawn structure before and scored it once) and king safety hashing
(same idea as pawn hashing). The number at the end is nodes per second (on
a dedicated C90.)
time: 4:01 4:16 15.7p nodes:110752032 h 25% 99% 98% 458581 nps
time: 3:20 4:19 16.0p nodes: 95190816 h 23% 99% 98% 475906 nps
time: 3:20 4:23 16.0p nodes:101335537 h 26% 99% 99% 506627 nps
time: 3:20 4:26 16.0p nodes: 98571846 h 27% 99% 99% 492809 nps
time: 3:20 4:29 16.0p nodes: 99253399 h 28% 99% 99% 495895 nps
time: 3:20 4:33 16.0p nodes: 98689436 h 27% 99% 99% 493249 nps
time: 3:34 4:51 16.0p nodes:130879200 h 36% 99% 98% 610557 nps
time: 4:02 4:52 16.0p nodes:136365966 h 33% 99% 99% 563472 nps
time: 5:54 4:52 16.0p nodes:205142439 h 35% 99% 99% 578273 nps
time: 4:44 4:57 16.0p nodes:157608239 h 34% 99% 99% 554939 nps
As you can see, these are from very early in the game: since the clock time is 4:16,
only 16 minutes have elapsed since the beginning of the game (off of Cray Blitz's
clock) in the game Cray Blitz vs Startech (HiTech on a CM-5 machine). For the
endgame, the following values are produced (these are run using an 8-processor
Cray YMP):
time: 0:06 7:58 7.9p nodes: 1746702 h 58%100% 99% 279472 nps
time: 0:27 7:58 8.0p nodes: 9703616 h 62% 99% 99% 349932 nps
time: 0:50 7:58 8.0p nodes: 14689498 h 53%100% 99% 290593 nps
time: 0:36 7:59 8.0p nodes: 10495706 h 63%100% 99% 289536 nps
time: 0:24 7:59 8.0p nodes: 7699969 h 61%100% 99% 309235 nps
time: 1:06 8:00 8.0p nodes: 19833741 h 54%100% 99% 299512 nps
time: 0:57 8:01 8.0p nodes: 17203658 h 63%100% 99% 297795 nps
time: 0:07 8:01 8.0p nodes: 1911929 h 63%100%100% 254245 nps
Hope this helps some.
--
!Robert Hyatt Computer and Information Sciences !
!hy...@cis.uab.edu University of Alabama at Birmingham !
I previously answered the first part of your question. For the last part,
remember that a table entry contains a "key", score, score type, best move,
etc. This key must EXACTLY match the hash key before we consider this a
"hit". This means that the 64 bit hash key must match completely, even though
we only used the low order "n" bits of this key to probe into the table.
In fact, since we used the low order "n" bits to probe into the table, we don't
even bother storing them since we "know" what they were already (which saves
table space.)
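As an illustration (my own sketch of the idea, not the actual Cray Blitz table
layout), an entry might store only the upper bits of the key, and a probe counts
as a hit only when those stored bits match the probe key exactly:

```c
#include <stdint.h>

#define N_BITS 20   /* low bits are used as the table index, so not stored */

typedef struct {
    uint64_t key_hi;    /* upper 64-n bits of the hash key */
    int16_t  score;
    uint8_t  type;      /* exact score, lower bound, or upper bound */
    uint16_t best_move;
} HashEntry;

static void store(HashEntry *e, uint64_t key, int16_t score,
                  uint8_t type, uint16_t best_move)
{
    e->key_hi = key >> N_BITS;  /* the index bits are "known", so drop them */
    e->score = score;
    e->type = type;
    e->best_move = best_move;
}

/* A "hit" requires the stored upper key bits to match completely. */
static int is_hit(const HashEntry *e, uint64_t key)
{
    return e->key_hi == (key >> N_BITS);
}
```

Two positions that collide in the index bits but differ in the upper bits are
thus told apart at lookup time, which answers the original question about
reading the wrong position's data.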
Bob
>If Rob Hyatt's test had happened to include such values, he might have
>observed a radically different result. What is the probability of
>choosing a "bad" set of Zobrist values for Chess? What criteria
>guarantee that the values you use are "good" for Chess? As far as my
>limited knowledge goes, these are open questions.
>In conclusion, the very good result observed in a 32-bit version of
>Cray Blitz comes partially from the type of hashing done (Zobrist with
>values tested for uniqueness), and partly from sheer good luck in
>choosing the initial values. I wouldn't place too much faith in the
>result holding for a different chess program, or even for Cray Blitz
>with a different initial random seed.
Good point. I can speak from experience here. In the early versions of
my chess program Phoenix, I generated my Zobrist hash numbers using my
student id number as a seed, naively thinking the random numbers
generated by this seed would be good enough. A few years later I put
code in to detect when my 32-bit hash key matched the wrong position.
To my surprise, there were *lots* of errors. I changed my seed to
another number and the error rate dropped dramatically. With this
better seed, it became very, very rare to see a hash error.
Not all randomly generated numbers are the same!
I really don't understand this. My program Rookie until recently used
32-bit hash keys. 16 bits were used as an index, the other 16 were stored
in the table. The keys are generated in exactly the same way as yours,
i.e. by XORing some random value for each piece-square combination,
and zeros for empty squares. Some special tests are run to ensure that
every legal move actually changes the hashkey in a unique way.
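For reference, the XOR scheme described above can be sketched like this (a
generic Zobrist hash; the sizes, generator, and names are my own illustration,
not Rookie's actual code):

```c
#include <stdint.h>

#define PIECES  12   /* 6 piece types x 2 colors */
#define SQUARES 64

/* One random value per piece-square combination; empty squares contribute 0. */
static uint32_t zobrist[PIECES][SQUARES];

/* A simple 32-bit xorshift generator to fill the table (illustrative only;
   the seed must be nonzero). */
static uint32_t next_rand(uint32_t *s)
{
    *s ^= *s << 13; *s ^= *s >> 17; *s ^= *s << 5;
    return *s;
}

static void init_zobrist(uint32_t seed)
{
    uint32_t s = seed;
    for (int p = 0; p < PIECES; p++)
        for (int sq = 0; sq < SQUARES; sq++)
            zobrist[p][sq] = next_rand(&s);
}

/* Moving a piece updates the key incrementally: XOR it out of the old
   square and into the new one. Since XOR is its own inverse, unmaking
   the move restores the original key. */
static uint32_t make_move(uint32_t key, int piece, int from, int to)
{
    return key ^ zobrist[piece][from] ^ zobrist[piece][to];
}
```

The self-inverse property of XOR is what makes the "every legal move changes
the key in a unique way" test above feasible to run.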
Whenever a position is _not_ found in the tables, I compute the 'closeness'
of the current hash key and the key stored in the table. 'closeness' is
defined as the number of high order bits that match, or in other words:
the length of the largest common prefix. E.g. The closeness of 100110 and
100011 is 3. During the search I keep track of the largest closeness value
encountered. This maximum is a measurement for the frequency of near-clashes.
In Rookie, using 32-bit hash keys, this value can range from 16 to 31.
A 31 means that two positions were searched that have keys that differ only
in the last bit.
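The 'closeness' measure can be computed like this (my own sketch of the idea
described above, for 32-bit keys):

```c
#include <stdint.h>

/* Length of the longest common high-order prefix of two 32-bit keys.
   Equal keys give 32; keys differing in the top bit give 0. */
static int closeness(uint32_t a, uint32_t b)
{
    uint32_t diff = a ^ b;
    int n = 0;
    while (n < 32 && !(diff & 0x80000000u)) {  /* count leading zero bits */
        diff <<= 1;
        n++;
    }
    return n;
}
```

With the six-bit example above placed in the top bits of a 32-bit word
(100110... vs 100011...), this returns 3, matching the definition.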
I frequently see hash keys with closeness 31 within <10^5 searches, indicating a
near-clash. For me this indicates that it is very likely that real hash
clashes also occurred (but could not be detected, since positions with
closeness 32 are considered to be the same). This seems to contradict your
observation.
Marcel
-- _ _
_| |_|_|
|_ |_ Marcel van Kervinck
|_| mar...@stack.urc.tue.nl
O.K., let's go back to the "origin" of this technique. "How" did you generate
your random numbers? I.e., did you use the usual "random()" routine (such
things as drand48() or whatever)? If so, what did you do about the exponent
field? If you left it in, you were not generating 32-bit random values at
all, but, rather, something significantly less (no more than 24 bits and maybe less
than that.)
First, let's see how "random" your random numbers are, to make sure we are
comparing apples to apples here (see my previous post to see what I do in
Cray Blitz to strip the exponent off.)
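To make the warning concrete, here is a sketch (my own illustration, not Cray
Blitz's actual exponent-stripping code) of splicing several narrow draws into
one full-width value, rather than trusting a single library call, or a float's
raw bit pattern, to fill the whole word:

```c
#include <stdint.h>

/* Combine three draws from a 15-bit generator (e.g. rand() on systems
   where RAND_MAX is only 32767) into one full 32-bit value: 15+15+2 bits. */
static uint32_t splice32(uint32_t r1, uint32_t r2, uint32_t r3)
{
    return ((r1 & 0x7FFFu) << 17) | ((r2 & 0x7FFFu) << 2) | (r3 & 0x3u);
}
```

With the classic C library, where only 15 random bits per call are guaranteed,
a Zobrist value would then be built as splice32(rand(), rand(), rand()).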
Looks like this supports my claim that the more distance between the positions,
the more likely they are to hash to the same key. These positions are wildly
different, and it would take many moves to make them match (yes, I know, you
really can't, since you can't "make" pawns appear where there aren't any.
Hopefully, you see my point.)
mar...@stack.urc.tue.nl (Marcel van Kervinck) writes:
> I frequently see hash keys with closeness 31 within <10^5 searches, indicating a
> near-clash. For me this indicates that it is very likely that real hash
> clashes also occurred (but could not be detected, since positions with
> closeness 32 are considered to be the same). This seems to contradict your
> observation.
Robert Hyatt (hy...@cis.uab.edu) wrote:
> First, let's see how "random" your random numbers are, to make sure we are
> comparing apples to apples here (see my previous post to see what I do in
> Cray Blitz to strip the exponent off.)
I implemented my own random number generator, guided by Knuth's book.
It seems to work perfectly. I don't depend on routines by others.
So my numbers are truly '32-bit' random.
I realized that some confusion might arise: I am referring to <10^5
searches that start with an already filled table (like in a game).
Starting with an empty table, one needs almost 10^5 nodes to fill
most of the empty slots. Whenever I do start with an empty table, I usually
find the 1st 31bit match within 200,000 nodes, and have several (about
6? I didn't do this often enough to get an accurate estimate) pairs of
31bit matches before reaching 10^6 nodes.
Maybe I misunderstand what it is you are testing. Are you generating
positions to fill the table and then searching as the game proceeds
and logging the "hits"?? Or are you simply playing a game, and watching
for hits during normal play?
: Startech played at the Indianapolis tournament and ran on (I believe) a
: 512 node CM-5. Caused quite a bit of controversy as HiTech was also
: present at the tournament.
: I somehow don't have my tournament booklet from the event, and thus can't
: tell you where they finished. I believe that the final standing was that
: Socrates Won, we finished 2nd, and I don't remember beyond that. Bert,
: got your ears on? (or anyone else.)
: In any case, measuring their performance was difficult. They searched
: from 40K nodes per second up to 300K. Unfortunately, it seemed that when
: they were in trouble, they searched "at their worst" and slowed down
: quite a bit, which, of course, is exactly the opposite of what you want
: to see happen.
: Brad xxx (don't remember his last name) seemed to think that StarTech
: was better than HiTech, but I do seem to recall that they drew in this
: event. They were supposedly the same code, same book, same everything,
: except "smashed onto the CM-5 architecture." Actually played well,
: but I "believe" they finished at 2.5 out of 5.0 points (both them and
: HiTech, but don't quote me without a reliable source to back this up.)
: Bob
The tournament program lists Bradley Kuszmaul, Charles Leiserson, and
Ryan Rifkin c/o BK,MIT Laboratory for Computer Science as the authors
of Startech. It was running on a Connection Machine CM-5, 128 processors.
I do not have a complete cross-table but what I do have shows them with
3 points, tied with Hitech and Zarkov.
Bert
Albert Gower
University of Southern Mississippi