Why Random Number Needed In HashFunction[piece][position]

Cheok Yan Cheng

unread,

Jun 12, 2001, 6:22:42 AM6/12/01

to

I am new in hashing table function that used in transpositioning. I
try to learn it by reading the source code if crafty. Since I cannot
find an good article that describe on it, there is a question that I
would like to face out :

1. Why we need to use the Random Number in
HashFunction[piece][position]. Cann't we just construct by using the
number 1,2,3,4 ... Erghh... Since there will be a chance that same
random number will be construct and it seem not so good .

thanks

regards
yccheok

Russell Reagan

unread,

Jun 12, 2001, 6:49:59 AM6/12/01

to

> I am new in hashing table function that used in transpositioning. I
> try to learn it by reading the source code if crafty. Since I cannot
> find an good article that describe on it, there is a question that I
> would like to face out :

I have never understood how the transpositionable table is implemented. I
understand the reasoning and "how" it works, but I don't know if I could
implement one. Well, I _could_ implement one, but I doubt it would work
efficiently. If someone wouldn't mind taking the time to explain this I
would apprecate it also. Here is the only explaination that I've ever read
on it, and I have no idea how to go about implementing something like this.

"Generate 12x64 N-bit random numbers (where the transposition table has 2^N
entries) and store them in a table. Each random number is associated with a
given piece on a given square (i.e., black rook on H4, etc.) An empty
square is represented by a null word.
Start with a null hash key.
Scan the board; when you encounter a piece, XOR its random number to the
current hash key. Repeat until the entire board has been examined."

I guess I understand it a little better now that I've re-read it a few
times. But what is the best way to do this? An array would seem a little
overkill, to allocate an array of a million bytes, and not use them all, or
allocate an array that is too small and then run out of room. It seems like
you would need some kind of dynamic data structure like a binary search tree
or something. If you could implement it as a binary search tree, you could
also avoid positions that might generate repeat hash keys, because you can
get rid of hash keys altogether. Access would be a little slower than a
pre-allocated array, but a binary search tree seems like it would be much
more stable. Maybe I'm underestimating the performance advantages because I
don't really understand the correct way to implement this data structure in
a chess program yet.

Robert Hyatt

unread,

Jun 12, 2001, 8:32:18 AM6/12/01

to

consecutive integers won't do well. what happens when you exclusive-or
2 and 3? you get 1. That isn't a very big change in the hash signature
and will certainly cause hash collisions later. You want numbers with a
reasonable hamming distance between them. Random is certainly better
than consecutive ints.

> thanks

> regards
> yccheok

--
Robert Hyatt Computer and Information Sciences
hy...@cis.uab.edu University of Alabama at Birmingham
(205) 934-2213 115A Campbell Hall, UAB Station
(205) 934-5473 FAX Birmingham, AL 35294-1170

Cheok Yan Cheng

unread,

Jun 13, 2001, 3:00:22 AM6/13/01

to

Robert Hyatt <hy...@crafty.cis.uab.edu> wrote in message news:<9g524i$ha$1...@juniper.cis.uab.edu>...

> Cheok Yan Cheng <ycc...@yahoo.com> wrote:
> > I am new in hashing table function that used in transpositioning. I
> > try to learn it by reading the source code if crafty. Since I cannot
> > find an good article that describe on it, there is a question that I
> > would like to face out :
>
> > 1. Why we need to use the Random Number in
> > HashFunction[piece][position]. Cann't we just construct by using the
> > number 1,2,3,4 ... Erghh... Since there will be a chance that same
> > random number will be construct and it seem not so good .
>
> consecutive integers won't do well. what happens when you exclusive-or
> 2 and 3? you get 1. That isn't a very big change in the hash signature
> and will certainly cause hash collisions later. You want numbers with a
> reasonable hamming distance between them. Random is certainly better
> than consecutive ints.
>
>
> > thanks
>
> > regards
> > yccheok

Hyatt :

Thanks. May I have some questions on detecting the collision?

It is quite easy to detect the collision if I am using bit board configfuration.
However, what am I doing now is chinese chess (size 10x9)which is insufficient
to use bit board. (I use array instead)

For array, am I going to use a looping to compare the board position with
hashtable's or .... is there any better method ?

Also, back to your message archive in 1996, I saw a message that discuss on
tranposition table :
" If you only get to 3 plies, it's difficult to transpose."

what is the meaning of "piles"? and
what is the meaning of "hamming distance" ?

thanks.

regards
yccheok

Russell Reagan

unread,

Jun 13, 2001, 4:51:24 AM6/13/01

to

> what is the meaning of "piles"? and

Well you wrote "piles", but I will assume you meant "plies" since that is
what the original quote from Dr. Hyatt said. A ply is basically a
"half-move". For example, if I make one move, and you make one move, that
is 2 plies. It's just a word used when talking about computer chess
programs to avoid confusion. For example, if I told you that my chess
program can look 10 moves ahead of the current position on the board, does
that mean 10 moves for me and 10 moves for my opponent? Or does it mean 10
moves total (5 moves for me and 5 for my opponent)? If I tell you that my
program can look 10 ply ahead, that means that my program can look 10
half-moves ahead (5 moves for me, 5 moves for my opponent).

Robert Hyatt

unread,

Jun 13, 2001, 9:37:27 AM6/13/01

to

Cheok Yan Cheng <ycc...@yahoo.com> wrote:
> Hyatt :

> Thanks. May I have some questions on detecting the collision?

> It is quite easy to detect the collision if I am using bit board configfuration.
> However, what am I doing now is chinese chess (size 10x9)which is insufficient
> to use bit board. (I use array instead)

> For array, am I going to use a looping to compare the board position with
> hashtable's or .... is there any better method ?

You should "incrementally update" the hash signature. That is the
purpose of the Zobrist hash algorithm we all use. Means you don't have to
loop over the entire board to computer the hash signature, you simply update
the signature as you make or unmake a move.

> Also, back to your message archive in 1996, I saw a message that discuss on
> tranposition table :
> " If you only get to 3 plies, it's difficult to transpose."

> what is the meaning of "piles"? and
> what is the meaning of "hamming distance" ?

Plies is search depth in half-moves. IE e4 e5 Nf3 is a search of 3 plies.

hamming distance is the number of bits that are different in two distinct
binary values. IE for 000111 and 000001 the hamming distance is 2, as there
are two bits that are different in the two values.

> thanks.

Bruce Moreland

unread,

Jun 13, 2001, 12:26:53 PM6/13/01

to

Russell Reagan <rre...@mail.utexas.edu> wrote in message
news:9g4s18$h8g$1...@oprah.cc.utexas.edu...

Every piece of every color on every square has an associated hash key. A
white king on a1 has a particular key. A white king on a2 has a particular
key. Etc. The reason you use random numbers is that you want the keys to
be different. You don't want patterns. You want entropy. You want similar
positions to have wildly different hash keys, so that you avoid collisions.

They array of keys is something like this:

KEY key_array[max_pieces][max_colors][max_squares].

You can create the key for a position by XOR'ing the keys for the pieces on
their squares together into one position key. You end up with some
gibberish number that fairly uniquely describes the position.

And the cool thing about XOR is that once you have a piece XOR'd into the
key, you can take it out the same way you put it in -- by XORing its key
into the position key. This takes advantage of the fact that:

A ^ B ^ B = A.

This is because:

B ^ B = 0

and

A ^ 0 = A

You can also update a piece's position by XOR'ing out the key for the piece
on its current square, and XORing in the key for the piece on its new
square.

bruce

Russell Reagan

unread,

Jun 13, 2001, 7:23:35 PM6/13/01

to

Thanks Bruce! Helpful as always.

So what kind of data structure holds all of these hash keys? And what value
are we trying to hold on to about the hash keys? For example, are we
wanting to know the score of the position in our table? Or are we wanting
to know the best move that was returned by a previous search? Or are we
just wanting to know if the position has occurred before?

Is an generic array typically the way that this is implemented? If so, how
large should the array be to ensure that you don't use it all up? Or should
you use something dynamic like a vector, which is basically an array with
the ability to change it's size, or lists, or binary search trees, etc.?

Thanks again,

Russell

Cheok Yan Cheng

unread,

Jun 13, 2001, 10:56:31 PM6/13/01

to

> > It is quite easy to detect the collision if I am using bit board configfuration.
> > However, what am I doing now is chinese chess (size 10x9)which is insufficient
> > to use bit board. (I use array instead)
>
> > For array, am I going to use a looping to compare the board position with
> > hashtable's or .... is there any better method ?
>
> You should "incrementally update" the hash signature. That is the
> purpose of the Zobrist hash algorithm we all use. Means you don't have to
> loop over the entire board to computer the hash signature, you simply update
> the signature as you make or unmake a move.
>

Hayatt :
I think you had misunderstood my questions : My question is ...

For example :
HashFunction[W_ROOK][A5] equal 1001b
HashFunction[W_KNIGHT][C4] equal l101b

HashFunction[W_PAWN][G5] equal 0100b

HashFunction[W_KING][D4] equal 1111b

For a board situation where (ROOK, KINGHT, KING), hash key is 1011
For a board situation where (PAWN, KING) , hash key is 1011

Then, this may make a different chess position have a same hash key
value. How can I re-solve this collision problem if I am having array
data structure ?

Robert Hyatt

unread,

Jun 13, 2001, 11:42:00 PM6/13/01

to

Cheok Yan Cheng <ycc...@yahoo.com> wrote:

With 64 bit hash signatures, this is so unlikely as to be considered
"impossible". You get so few collisions you can safely ignore them,
but _only_ if you use a 64 bit signature.

Russell Reagan

unread,

Jun 14, 2001, 12:56:46 AM6/14/01

to

> With 64 bit hash signatures, this is so unlikely as to be considered
> "impossible". You get so few collisions you can safely ignore them,
> but _only_ if you use a 64 bit signature.

By 64-bit signature, do you mean use a 64-bit number as your hash key? Like
a 64-bit integer for example?

Russell Reagan

unread,

Jun 14, 2001, 12:56:46 AM6/14/01

to

> With 64 bit hash signatures, this is so unlikely as to be considered
> "impossible". You get so few collisions you can safely ignore them,
> but _only_ if you use a 64 bit signature.

By 64-bit signature, do you mean use a 64-bit number as your hash key? Like

David Rasmussen

unread,

Jun 14, 2001, 2:23:30 AM6/14/01

to

"Russell Reagan" <rre...@mail.utexas.edu> wrote in message

news:9g8si9$n2u$1...@oprah.cc.utexas.edu...

> Thanks Bruce! Helpful as always.
>
> So what kind of data structure holds all of these hash keys? And what
value

Typically just an array of hashentries.

> are we trying to hold on to about the hash keys? For example, are we
> wanting to know the score of the position in our table? Or are we wanting
> to know the best move that was returned by a previous search? Or are we
> just wanting to know if the position has occurred before?
>

Hashentries will typically hold the hashkeys, the best move in this
position, and a value, that is either a upper or lower bound on the score,
or an exact score.

> Is an generic array typically the way that this is implemented? If so,
how
> large should the array be to ensure that you don't use it all up? Or
should

The size of the array is typically defined by the user and the array is
allocated dynamically, when changed. All of the array is used, if the search
uses enough nodes to fill it up. Typically this is just a one or two level
hashtable, so a replacement scheme is used. For example, of two searches,
the deeper one is stored/kept, the newer one is stored/kept etc.

> you use something dynamic like a vector, which is basically an array with
> the ability to change it's size, or lists, or binary search trees, etc.?
>

An ordinary array will do. That's exactly the idea of a hash table in
general. A hash table is an associative container, not a sequential one.
This means that it isn't traversed as such, contrary to, say, a list or a
tree. Lookups are O(1), because you calculate the index into the hash table
with you hashing function (in this case, the zobrist method), in constant
time, and then you get an index, which you can access directly.

In a general hashtable (almost never used in chess programs), there will be
a sequential container (such as a list) in each hash slot, to avoid
collisions. So if two elements hash to the same index, they will be
sequentially in a list there. If the hashing function is good, elements will
be spaced equally over the entire table, making the lists of roughly equal
length, length = number of elements in table / hash slots. When the number
of elements in the table grows above some number, the hashtable is typically
expanded to, say, twice it's size, and elements are rehashed, so that list
lengths are divided by 2 on average. This ensures constant access times.

David Rasmussen

unread,

Jun 14, 2001, 2:24:10 AM6/14/01

to

"Russell Reagan" <rre...@mail.utexas.edu> wrote in message

news:9g9g2r$o5o$1...@oprah.cc.utexas.edu...

exactly.

David Rasmussen

unread,

Jun 14, 2001, 2:24:56 AM6/14/01

to

"Robert Hyatt" <hy...@crafty.cis.uab.edu> wrote in message

news:9g9bq8$dc4$1...@juniper.cis.uab.edu...

> With 64 bit hash signatures, this is so unlikely as to be considered
> "impossible". You get so few collisions you can safely ignore them,
> but _only_ if you use a 64 bit signature.
>

That is true for chess, but is it true for chinese chess?

Tord Kallqvist Romstad

unread,

Jun 14, 2001, 3:29:04 AM6/14/01

to

"David Rasmussen" <ho...@oek.dk> writes:

I am fairly certain it is --- the game is not _that_ different from
western chess. By the way, I have always used 32 bit hash keys (for
western chess) and have not experienced any sort of problem yet ...

--
Tord Romstad

Tony Werten

unread,

Jun 14, 2001, 3:54:30 AM6/14/01

to

Cheok Yan Cheng heeft geschreven in bericht
<4d2111c5.01061...@posting.google.com>...

No it isn't. Just make a new datatype that consist of an int64 and a int32.
(maybe you have some use for the remaining 6 bits.)

Tony

David Rasmussen

unread,

Jun 14, 2001, 4:09:36 AM6/14/01

to

"Tord Kallqvist Romstad" <rom...@janus.uio.no> wrote in message
news:gqkg0d3...@janus.uio.no...

What?! I find it hard to believe you don't have a bug then. Many people have
reported many collisions using 32-bit keys.

Tord Kallqvist Romstad

unread,

Jun 14, 2001, 5:02:40 AM6/14/01

to

"David Rasmussen" <ho...@oek.dk> writes:

> "Tord Kallqvist Romstad" <rom...@janus.uio.no> wrote in message
> news:gqkg0d3...@janus.uio.no...

> > By the way, I have always used 32 bit hash keys (for western
> > chess) and have not experienced any sort of problem yet ...
>
> What?! I find it hard to believe you don't have a bug then.

I have plenty of bugs, but no serious bugs related to the
transposition table at the moment, it seems.

> Many people have reported many collisions using 32-bit keys.

I actually think a lot of people are still using 32-bit keys. Just
comparing the key of the current position against a key in the
transposition table is probably too risky, you will have many
collisions. However, if you check that the best move stored in the
transposition table is actually legal, things improve a lot. In
addition to this, I also use a trick suggested by Ed Schroder (who
also uses 32 bit keys, I think): I store the last move played (or
rather the moving piece (3 bits) and the destination square (6 bits))
in the transposition table. If the move stored in the hash table was
Be2, I then check that e2 is indeed occupied by a bishop of the right
colour.

My experience is that 32 bit keys with these extra tests are
completely safe.

--
Tord Romstad

Tony Werten

unread,

Jun 14, 2001, 4:59:03 AM6/14/01

to

Tord Kallqvist Romstad heeft geschreven in bericht ...

This is asking for trouble. Even my slow chessprogram reaches 1Mn/s in
endgame positions. 32 bits can't handle that.

Tony

>
>--
>Tord Romstad

Cheok Yan Cheng

unread,

Jun 14, 2001, 7:38:31 AM6/14/01

to

Robert Hyatt <hy...@crafty.cis.uab.edu> wrote in message news:<9g9bq8$dc4$1...@juniper.cis.uab.edu>...

Thanks. erghh.... But this sound like we are just taking the chance by
hoping that collision will not happen. If we are lucky, that's good.
But if we are not lucky ......

Hyatt, could you please give me a ROUGH PROOF that the chance for
collision is so so so small that we can ignore it.

Also, you mentioned that "you can safely ignore them". But how about
if I have a very good position in hash table. Its hash key is same as
a very bad position. Then, when the computer encounter the very bad
position, it conclude that the position as "very good position". It
will most porbably choose that variation of move.

In this case, can I "safely ignore them" ?

thanks a lot for the help.

regards
yccheok

Robert Hyatt

unread,

Jun 14, 2001, 8:30:48 AM6/14/01

to

Yes...

Dan Andersson

unread,

Jun 14, 2001, 8:26:33 AM6/14/01

to

> Also, you mentioned that "you can safely ignore them". But how about
> if I have a very good position in hash table. Its hash key is same as
> a very bad position. Then, when the computer encounter the very bad
> position, it conclude that the position as "very good position". It
> will most porbably choose that variation of move.

In the majority of the cases the value will never back propagate far enough
in the search tree to do any substantial damage. It might cause the quality
of move ordering of a branch to drop a bit, though. On a side note: It can
be hard to find errors in a tree search because of this behaviour, as
alpha-beta in effect 'hides' many of the errors we commit.

Robert Hyatt

unread,

Jun 14, 2001, 8:31:51 AM6/14/01

to

David Rasmussen <ho...@oek.dk> wrote:

definitely. The larger the branching factor, the better this approach
does. narrow trees run a higher chance of producing collisions than
bushy trees...

Robert Hyatt

unread,

Jun 14, 2001, 8:34:00 AM6/14/01

to

> --
> Tord Romstad

You aren't using a 32 bit key. You are using something quite a bit
longer since you are factoring in two moves (before and after). I
don't like the "trick" myself... straight 64 bits is easier/faster.

Robert Hyatt

unread,

Jun 14, 2001, 8:32:53 AM6/14/01

to

Tord Kallqvist Romstad <rom...@janus.uio.no> wrote:

> "David Rasmussen" <ho...@oek.dk> writes:

The problem is there. Several of us ran extensive tests a few years ago
in a discussion here. with a 32 bit key, at 100K nodes per second, you
have collisions _every_ second.

with 64 bit keys, the same tests produced maybe one collision per hour.

> --
> Tord Romstad

Robert Hyatt

unread,

Jun 14, 2001, 8:40:09 AM6/14/01

to

Cheok Yan Cheng <ycc...@yahoo.com> wrote:

> Thanks. erghh.... But this sound like we are just taking the chance by
> hoping that collision will not happen. If we are lucky, that's good.
> But if we are not lucky ......

Several of us ran tests a few years ago. 32 bits at 100K nodes per
second would produce a collision every couple of seconds or less. With
64 bits, we reduced that to a collision every hour or two. One collision
per hour is not a real problem. Because first you consider that probability
is pretty low, and second, when you do get a collision, what is the probability
that it happens on a node that will affect anything at all?

> Hyatt, could you please give me a ROUGH PROOF that the chance for
> collision is so so so small that we can ignore it.

The math is pretty easy. For 32 bits, you have only four billion unique
hash signatures. For any position P, there is a probability of 1/4000000000
of a collision. if you search on todays machines, you can certainly hit
one million nodes per second. Which means the probability of a collision is
1/4000000000 * 1000000 which is 1/4000. Not good odds.

For 64 bits... you have roughly 2e20 unique signatures. At 1M nodes per
second, the probability of a collision is only 1/2e14, which is incredibly
small.

> Also, you mentioned that "you can safely ignore them". But how about
> if I have a very good position in hash table. Its hash key is same as
> a very bad position. Then, when the computer encounter the very bad
> position, it conclude that the position as "very good position". It
> will most porbably choose that variation of move.

Two issues: 1. probability of a collision; 2. probability it happens at
a node that will influence the root move/score. 1. is very low. 2. makes
it impossibly small.

you can ignore it.

> In this case, can I "safely ignore them" ?

> thanks a lot for the help.

> regards
> yccheok

--

Tord Kallqvist Romstad

unread,

Jun 14, 2001, 9:41:38 AM6/14/01

to

Robert Hyatt <hy...@crafty.cis.uab.edu> writes:

> Tord Kallqvist Romstad <rom...@janus.uio.no> wrote:
> > I actually think a lot of people are still using 32-bit keys. Just
> > comparing the key of the current position against a key in the
> > transposition table is probably too risky, you will have many
> > collisions. However, if you check that the best move stored in the
> > transposition table is actually legal, things improve a lot. In
> > addition to this, I also use a trick suggested by Ed Schroder (who
> > also uses 32 bit keys, I think): I store the last move played (or
> > rather the moving piece (3 bits) and the destination square (6 bits))
> > in the transposition table. If the move stored in the hash table was
> > Be2, I then check that e2 is indeed occupied by a bishop of the right
> > colour.
>
> > My experience is that 32 bit keys with these extra tests are
> > completely safe.
>

> You aren't using a 32 bit key. You are using something quite a bit
> longer since you are factoring in two moves (before and after).

To a certain extent you are right, but everybody stores the "after"
move anyway. You could possibly say that I am using 41 bits, since
the "before" move is stored in 9 bits.

> I don't like the "trick" myself... straight 64 bits is
> easier/faster.

Possibly. I started programming in x86 assembly language, thus 32 bit
hash keys were a bit easier to code. Besides, RAM was expensive those
days, and it was an advantage that the transposition table entries
were small. I have never bothered to make a major change to the
transposition table code since then.

--
Tord Romstad

Bruce Moreland

unread,

Jun 14, 2001, 8:21:10 PM6/14/01

to

Others are responding to your post, but I'll say one thing, which is that
when a chess engine is doing its business, it is best if *nothing* is
dynamic. Everything should be allocated before the thing starts thinking.
If you are allocating and freeing things in an inner loop, you are going to
go too slow.

bruce

Russell Reagan <rre...@mail.utexas.edu> wrote in message

news:9g8si9$n2u$1...@oprah.cc.utexas.edu...

Cheok Yan Cheng

unread,

Jun 15, 2001, 10:34:17 PM6/15/01

to

> With
> 64 bits, we reduced that to a collision every hour or two. One collision
> per hour is not a real problem. Because first you consider that probability
> is pretty low, and second, when you do get a collision, what is the probability
> that it happens on a node that will affect anything at all?

What is the meaning means by a collision every hour or two.\?
1. Does it means the longer the time goes on, the more collision will
occur ?

2. For "time" pharase, it means computer's average thinking time per
move OR the total time that computer taken for the whole game.

Robert Hyatt

unread,

Jun 15, 2001, 11:12:11 PM6/15/01

to

Cheok Yan Cheng <ycc...@yahoo.com> wrote:

>> With
>> 64 bits, we reduced that to a collision every hour or two. One collision
>> per hour is not a real problem. Because first you consider that probability
>> is pretty low, and second, when you do get a collision, what is the probability
>> that it happens on a node that will affect anything at all?

> What is the meaning means by a collision every hour or two.\?
> 1. Does it means the longer the time goes on, the more collision will
> occur ?

It means that for any position searched, there is a probability of P that
you will get a collision. After searching _enough_ nodes, the probability
is N*P that you will get a collision. There must be a value N, such that
N*P is close to 1.0.

IE you _will_ get collisions. They are unavoidable. With a 32 bit
signature, they happen frequently. With 64 bits, they are infrequent
enough to safely ignore.

> 2. For "time" pharase, it means computer's average thinking time per
> move OR the total time that computer taken for the whole game.

Either one. The more you search, the greater the chance of running into
a collision. Whether for a single search, for for a span of several moves
during a single game.