Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Collision probability

142 views
Skip to first unread message

Dennis Breuker

unread,
Apr 15, 1996, 3:00:00 AM4/15/96
to
Since there seems to be much talk about the number of bits
in a hash key needed to ensure that no errors occur during
a search, I will post something I wrote a little while ago.
It is a preliminary version of text which will appear in my
Ph.D. thesis.

People who do not like math should skip about everything, and
only look at the two formulas 2.1 and 2.2, and at the example
at the end.

Enjoy!

Dennis Breuker (bre...@cs.rulimburg.nl | http://www.cs.rulimburg.nl)

________________________________________________________________________


Errors from transposition tables
================================

Implementing a transposition table as a hash table introduces two types of
error, identified as early as 1970 by Zobrist. The first type of error
(called *type 1* by Zobrist (1970)) is the more important. Since the
number of available hash values is much less than the total number of
positions in chess, it must happen that two different positions yield the same
hash value. This is a serious error, because when a type-1 error occurs, the
information in this entry will be used in the wrong position and, if so,
will introduce search errors. Often it is possible to detect this error by
testing the move suggested by that transposition-table entry for legality in
the position, effectively lowering the error rate. If the move is illegal,
then the table entry must concern another position than the one being
investigated. Note that if the move *is* legal, the positions still may
differ. The probability of the occurrence of type-1 errors can be lowered
by increasing the number of bits in the hash value.

The second type of error (called *type 2*, or *clash* by Zobrist (1970)),
occurs when two different board positions map onto the same entry in the
transposition table, i.e., the same hash index. This is commonly known as
a *collision* (Knuth, 1973). The probability of the occurrence of collisions
can be lowered by increasing the number of bits in the hash index (i.e.,
the number of entries in the transposition table).


Probability of errors
---------------------

The probability of a type-1 error and the probability of a collision are both
calculated in the same way. The only difference is the number of
distinguishable positions (for a type-1 error this is the number of
possible hash values (i.e., 2^k, where k is the number of bits of the hash
value) and for a collision this is the number of possible hash indices,
i.e., table entries (i.e., 2^n, where n is the number of bits of the hash
index).

Let N be the number of distinguishable positions, and M be the
number of different positions which have to be stored in the
transposition table (this number is equal to the number of non-empty
entries in the transposition table after the search has been completed,
augmented with the number of collisions during the search).
The probability that all M positions will have different hash numbers
(i.e., the probability that no errors occur) is given by

N-0 N-1 N-(M-1) N!
P(no errors) = --- * --- * ... * ------- = ------------
N N N (N-M)! * N^M

When N is sufficiently large (which is the case with a transposition
table), this equation can be approximated by

M * (M-1)
P(no errors) = e^( - --------- )
2N

If M is sufficiently large as well, the formula yields

M^2
P(no errors) = e^( - --- ) (2.1)
2N

This result equals the formula given by Gillogly (1989) (The article
contains a typing error. The probability given here is right (Gillogly, 1994))
Note that the problem of calculating the probability that at least one error
occurs (being 1-P(no errors)), is analogous to the problem widely known as
the *birthday paradox* (Feller, 1950), where the probability of at least
two persons having the same birthday in a group of M persons (N being 365)
has to be calculated.

The expected number of errors can be calculated as well. Feldmann (1993)
derives the following formula for the expected number of errors (E):

N-1
E = M - N * ( 1 - (---)^M )
N

When N is sufficiently large (which is the case with a transposition
table), this formula can be approximated by

M
E = M - N * ( 1 - e^(- ---) ) (2.2)
N

As an example we consider a program which searches 5,000 nodes per second.
If it plays a game using a total of two hours thinking time, the number of
nodes searched is 36 million. Suppose that for about 30% of the nodes,
an attempt is made to store them in the transposition table. In this example,
this is some 11 million nodes. If the hash value consists of 32 bits,
the probability of at least one type-1 error is

11,000,000^2
1 - e^( - ------------ )
2 * 2^32

which is very close to 1. So a hash value of 32 bits clearly is too small.
If we want to reduce the probability of at least one error to less than
1 percent, Equation 2.1 says that at least 53 bits are required.
When using a 64-bit hash value, the probability is reduced to about
3 * 10^(-6). In this case, the number of expected type-1 errors is about 0.09.

References
==========
Feldmann R. (1993).
Game Tree Search on Massively Parallel Systems.
Ph.D. thesis, University of Paderborn.

Feller W. (1950).
An Introduction to Probability Theory.
Wiley, New York.

Gillogly J.J. (1989).
Transposition Table Collisions.
Workshop on New Directions on Game-Tree Search (pre-prints)
(ed. T.A. Marsland), p. 12. University of Alberta, Edmonton, Canada.

Gillogly J.J. (1994).
Personal communication.

Knuth D.E. (1973).
The Art of ComputerProgramming. Volume 3: Sorting and Searching.
Addison-Wesley Publishing Company, Reading, Massachussetts.

Zobrist A.L. (1970).
A New Hashing Method with Application for Game Playing.
Technical Report #88, Computer Science Department,
The University of Wisconsin, Madison.
Reprinted (1990) in ICCA Journal, Vol. 13, No. 2, pp. 69-73.

Kevin Barnes

unread,
Apr 16, 1996, 3:00:00 AM4/16/96
to
Dennis Breuker <bre...@cs.rulimburg.nl> wrote:

>Enjoy!

>________________________________________________________________________


>Probability of errors
>---------------------


Assuming the hashing function produces an even distribution of hash
values across all postions to be stored in the table, a probability
which seems very low :)


kba...@t1sys.cfer.com
(303) 633-3689


0 new messages