
Hashing function for board positions


J Scalo

May 12, 1997

Well, the GK-DB match is over, so back to work...


So here's a question for the gurus -- how the heck do you come up with a
good hashing function for a board position? There are volumes written on
hashing functions, but they're all directed at small elements like words,
numbers, etc. Every hashing function I've come up with for a board
position leads to very unevenly distributed keys, which of course is not
good.

I'm working with a straight-ahead 64-byte board, with each square
represented by a signed byte -- negative for black, positive for white.
123456 = pbnrqk. 0 = empty. So the starting position would look like --

040100000000FFFC
030100000000FFFD
020100000000FFFE
050100000000FFFB
060100000000FFFA
020100000000FFFE
030100000000FFFD
040100000000FFFC

with carriage returns added for effect.
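
For concreteness, the dump above can be reproduced with a short sketch (the function name and layout choices here are illustrative, not from my actual program):

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Build the starting position in the signed-byte representation:
// 1..6 = p,b,n,r,q,k for white, negated for black, 0 = empty.
// Each output line walks one file from rank 1 to rank 8, so white's
// back rank is the first byte and black's back rank is the last.
std::string StartingPositionDump() {
    const int8_t backRank[8] = {4, 3, 2, 5, 6, 2, 3, 4}; // r n b q k b n r
    int8_t board[8][8] = {};  // board[file][rank], all squares empty
    for (int f = 0; f < 8; ++f) {
        board[f][0] = backRank[f];   // white pieces on rank 1
        board[f][1] = 1;             // white pawns on rank 2
        board[f][6] = -1;            // black pawns on rank 7
        board[f][7] = -backRank[f];  // black pieces on rank 8
    }
    std::string out;
    for (int f = 0; f < 8; ++f) {
        for (int r = 0; r < 8; ++r) {
            char buf[3];
            std::snprintf(buf, sizeof buf, "%02X",
                          (unsigned)(uint8_t)board[f][r]);
            out += buf;
        }
        out += '\n';
    }
    return out;
}
```

The first line comes out as 040100000000FFFC: rook, pawn, four empty squares, a black pawn (FF = -1), and a black rook (FC = -4).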

The dilemma is to come up with a hash function that distributes keys
evenly over tens of thousands of table entries, while being efficient
enough to handle hundreds of thousands of hits.

Any ideas?

Thanks.

-j
---
j.s...@ix.netcom.com

Ptolemy

May 12, 1997

(I've never programmed chess, so this is just an idle thought.) You
could cut the size of the hash space in half by encoding all the
positions as positive, whether the pieces are black or white. Then
store the information from both perspectives (white and black) at the
terminal node and make a decision outside the hash about which one you
should use.

-Ptolemy the (much) Younger

p.s. If there's an egregious error here, I'll realize it 2 seconds after
I hit the send button.

Kirt Undercoffer

May 12, 1997

You'll probably have to change your representation in order to
hash efficiently, or be able to map to another representation
which is more suitable for hashing. I'm not a hash
expert, but I have used hash functions on several projects. Generally,
just hashing numbers will tend to create an uneven distribution
(which is what you report). However, I'm currently using a hash
function tuned to a nearly numeric key (it has a character
between two 3-digit numbers, if I remember correctly) for
a work project, and it produces a very good distribution across
thousands of keys.

Kirt Undercoffer
kund...@osf1.gmu.edu

J Scalo (j.s...@ix.nospam.com) wrote:
: Well, the GK-DB match is over, so back to work...

Jim Gillogly

May 12, 1997

Kirt Undercoffer wrote:
>
> : The dilemma is to come up with a hash function that distributes keys
> : evenly over tens of thousands of table entries, while being efficient
> : enough to handle hundreds of thousands of hits.

You could use Zobrist hashing like most other chess programs.
There's a quick introduction in the "community" section of the
IBM GK vs DB site:

http://www.minds.com/cgi-bin/maslink.cgi/splash?chessmatch

After registering (free) and logging in, go to topic 30 (Computer
Engineering/Chess Engineering) and message 56 (of 63).

Zobrist hashing has the advantage of giving excellent position
distribution while being very inexpensive to compute, since it
can be done incrementally rather than doing something to every
piece on the board.

Jim Gillogly
j...@acm.org

Remi Coulom

May 13, 1997

Kirt Undercoffer wrote:
[...]

> J Scalo (j.s...@ix.nospam.com) wrote:
> : So here's a question for the gurus -- How the heck do you come up with a
> : good hashing function for a board position? [...]
> : The dilemma is to come up with a hash function that distributes keys
> : evenly over tens of thousands of table entries, while being efficient
> : enough to handle hundreds of thousands of hits.

Since Bob Hyatt decided to stop answering questions in this group, I will
try to do his job.
The usual technique for computing hash codes consists of XORing together
random 64-bit values, one associated with each piece on each square of the
position. That description is terse, but it should become clearer if you
take a look at the code used in the chess library I wrote:
(If you want to take a look at the missing include files, you can download
my chess library from :
http://www-ensimag.imag.fr/eleves/Remi.Coulom/tcb.htm
)

----------------------- Beginning of code -------------------------------
////////////////////////////////////////////////////////////////////////////
// Default constructor
// Initialization of codes
////////////////////////////////////////////////////////////////////////////
CHashCodeComputer::CHashCodeComputer(void)
{
    CRandom rnd;

    //
    // Codes for pieces
    //
    {
        for (int sq = 65; --sq >= 0;)
            for (int piece = 25; --piece >= 0;)
                thPiece[sq][piece] = RandomHash(rnd);
    }

    //
    // Codes for en passant target square
    //
    {
        for (int sq = 65; --sq >= 0;)
            thEnPassant[sq] = RandomHash(rnd);
    }

    //
    // Codes for other flags
    //
    hWhiteToMove = RandomHash(rnd);
    hWhiteOO = RandomHash(rnd);
    hWhiteOOO = RandomHash(rnd);
    hBlackOO = RandomHash(rnd);
    hBlackOOO = RandomHash(rnd);
}

////////////////////////////////////////////////////////////////////////////
// Compute hash code for a given position
////////////////////////////////////////////////////////////////////////////
void CHashCodeComputer::Compute(CHashCode &h, const CPosition &pos) const
{
    h.Set1(0);
    h.Set2(0);

    //
    // Codes for pieces
    //
    {
        for (int sq = CSquare::h8 + 1; --sq >= 0;)
            if (pos.GetPiece(sq) != CPiece::Empty)
                h ^= thPiece[sq][pos.GetPiece(sq)];
    }

    //
    // Codes for en passant target square
    //
    h ^= thEnPassant[pos.GetEnPassant()];

    //
    // Codes for other flags
    //
    if (pos.GetPlayer() == CPlayer::White) h ^= hWhiteToMove;
    if (pos.GetWhiteOO()) h ^= hWhiteOO;
    if (pos.GetWhiteOOO()) h ^= hWhiteOOO;
    if (pos.GetBlackOO()) h ^= hBlackOO;
    if (pos.GetBlackOOO()) h ^= hBlackOOO;
}

---------------------------- End of code ----------------------------------

One of the interesting advantages of this hashing scheme is that when you
have the hash code for a given position and you make a move in that
position, you can compute the hash code for the resulting position with
just a few XORs. For instance, when a pawn moves from e2 to e4, you XOR in
the code for a pawn on e4, the code for a pawn on e2, and the code for
white to move, and you get the new hash code.
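
The incremental idea can be sketched on its own, outside my library; everything here (table sizes, names) is illustrative:

```cpp
#include <cstdint>
#include <random>

// Illustrative Zobrist tables: one random 64-bit code per (square, piece)
// pair, plus one code for the side to move. 12 = 6 piece types x 2 colors.
constexpr int NSquares = 64;
constexpr int NPieces = 12;

static uint64_t PieceCode[NSquares][NPieces];
static uint64_t WhiteToMoveCode;

// Fill the tables with pseudo-random 64-bit values.
void InitZobrist(uint64_t seed) {
    std::mt19937_64 rng(seed);
    for (int sq = 0; sq < NSquares; ++sq)
        for (int p = 0; p < NPieces; ++p)
            PieceCode[sq][p] = rng();
    WhiteToMoveCode = rng();
}

// Full hash of a position given as an array of piece indices (-1 = empty).
uint64_t FullHash(const int board[NSquares], bool whiteToMove) {
    uint64_t h = 0;
    for (int sq = 0; sq < NSquares; ++sq)
        if (board[sq] >= 0)
            h ^= PieceCode[sq][board[sq]];
    if (whiteToMove)
        h ^= WhiteToMoveCode;
    return h;
}

// Incremental update for a quiet move: XOR the moving piece off its old
// square, onto its new square, and flip the side to move -- three XORs,
// no matter how many pieces are on the board.
uint64_t UpdateHash(uint64_t h, int piece, int from, int to) {
    return h ^ PieceCode[from][piece] ^ PieceCode[to][piece] ^ WhiteToMoveCode;
}
```

A capture would also XOR out the captured piece's code, and castling or en passant moves fold in the corresponding flag codes.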

There are variations on this technique. You can try to find better-than-
random hash codes, use more or fewer than 64 bits for the hash codes, etc.

I hope this makes things clearer for you.

Remi

bob_j...@compuserve.com

May 26, 1997

In article <j.scalo-1205...@sjx-ca25-07.ix.netcom.com>,
j.s...@ix.nospam.com (J Scalo) wrote:
...

> So here's a question for the guru's -- How the heck do you come up with a
> good hashing function for a board position?
...

> The dilemma is to come up with a hash function that distributes keys
> evenly over tens of thousands of table entries, while being efficient
> enough to handle hundreds of thousands of hits.
...
> j.s...@ix.netcom.com

I've seen that any position can be represented in 256 bits,
that is 64*4, using 4 bits for each square to record if a
black pawn, white pawn, ..., white king, nothing is there
(13 possible values, a value can be represented in 4 bits).

I've seen that Zobrist hashing lets you find the hash
value of a new position quickly given the previous
position, and that 64-bit hash values are usually used.

I didn't see anyone mention that if you use an n-bit hash,
you should expect to see your first collision after about
2^(n/2) keys. (After 2^32 positions for a 64-bit hash.)
I doubt anyone's exploring more than 2^64 positions, so a
128-bit hash is unlikely to ever give a false hit.
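
The 2^(n/2) birthday bound is easy to check empirically with a hash small enough to collide quickly. This sketch (names mine) averages the draws-to-first-collision for a 16-bit hash, which should come out near 2^8 -- a few hundred:

```cpp
#include <cstdint>
#include <random>
#include <unordered_set>

// Draw random values truncated to `bits` bits until two collide;
// return how many draws it took.
int DrawsUntilCollision(int bits, std::mt19937_64 &rng) {
    const uint64_t mask = (bits == 64) ? ~0ULL : ((1ULL << bits) - 1);
    std::unordered_set<uint64_t> seen;
    int draws = 0;
    for (;;) {
        ++draws;
        if (!seen.insert(rng() & mask).second)
            return draws;  // this value was already drawn
    }
}

// Average over several trials; the birthday bound predicts roughly
// sqrt(pi/2) * 2^(bits/2) draws, about 320 for a 16-bit hash.
double MeanDrawsUntilCollision(int bits, int trials, uint64_t seed) {
    std::mt19937_64 rng(seed);
    long long total = 0;
    for (int t = 0; t < trials; ++t)
        total += DrawsUntilCollision(bits, rng);
    return double(total) / trials;
}
```

For a 64-bit hash the same curve predicts a first collision somewhere around 2^32, i.e. about 4 billion, draws.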

I didn't see anyone mention that if you use a 256-bit hash
value, you can use Zobrist hashing to get a good
distribution plus guarantee that you never get a collision.
The trick is to choose the table values so that the Zobrist
hash simulates a universal hash, and to check that the random
values chosen for the universal hash are linearly independent.
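
The linear-independence check is ordinary Gaussian elimination over GF(2). Here is a sketch for 64-bit values (the 256-bit case is the same with wider words; the function name is mine):

```cpp
#include <cstdint>
#include <vector>

// True if no nonempty subset of `values` XORs to zero, i.e. the values
// are linearly independent over GF(2). Incremental Gaussian elimination:
// basis[b] holds a reduced value whose highest set bit is b.
bool LinearlyIndependent(const std::vector<uint64_t> &values) {
    uint64_t basis[64] = {};
    for (uint64_t v : values) {
        for (int b = 63; b >= 0; --b) {
            if (!((v >> b) & 1))
                continue;
            if (basis[b] == 0) {
                basis[b] = v;  // new pivot: v is independent of the rest
                break;
            }
            v ^= basis[b];  // clear bit b and keep reducing
        }
        if (v == 0)
            return false;  // v reduced to a XOR of earlier values
    }
    return true;
}
```

With a 256-bit representation (4 bits per square) there are 256 basis values, one per bit; if they are independent, the Zobrist hash is an invertible linear map, so distinct positions cannot collide.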

- Bob Jenkins
http://ourworld.compuserve.com/homepages/bob_jenkins


Robert Hyatt

May 27, 1997

bob_j...@compuserve.com wrote:
: In article <j.scalo-1205...@sjx-ca25-07.ix.netcom.com>,

: j.s...@ix.nospam.com (J Scalo) wrote:
: ...
: > So here's a question for the guru's -- How the heck do you come up with a
: > good hashing function for a board position?
: ...
: > The dilemma is to come up with a hash function that distributes keys
: > evenly over tens of thousands of table entries, while being efficient
: > enough to handle hundreds of thousands of hits.
: ...
: > j.s...@ix.netcom.com

: I've seen that any position can be represented in 256 bits,
: that is 64*4, using 4 bits for each square to record if a
: black pawn, white pawn, ..., white king, nothing is there
: (13 possible values, a value can be represented in 4 bits).

: I've seen that Zobrist hashing lets you find the hash
: value of a new position quickly given the previous
: position, and that 64-bit hash values are usually used.

correct...

: I didn't see anyone mention that if you use an n-bit hash,
: you should expect to see your first collision after about
: 2^(n/2) keys. (After 2^32 positions for a 64-bit hash.)
: I doubt anyone's exploring more than 2^64 positions, so a
: 128-bit hash is unlikely to ever give a false hit.

When you do the math, one error every 4 billion nodes is not going to cause any
measurable difficulty. Bruce Moreland and I discussed this a long while back.
There are two kinds of errors: (1) benign collisions, those that don't affect
the root move or PV (if you look at alpha/beta, the PV is pretty concise, so
most nodes would probably result in benign collisions); and (2) error-causing
collisions that will change the move or score. We are both convinced that (2)
is so rare as to be ignorable... And since we are not yet searching 4 billion
nodes (that would take over an hour of searching on a fast machine for Crafty),
I think this can be safely overlooked.

: I didn't see anyone mention that if you use a 256-bit hash
: value, you can use Zobrist hashing to get a good
: distribution plus guarantee that you never get a collision.
: The trick is to choose the table values so that the Zobrist
: hash simulates a universal hash, and to check that the random
: values chosen for the universal hash are linearly independent.

More is better, of course. But also slower. I'd do 256 if I thought 64 was
causing problems... I used 64 on the Cray, searching 5M nodes per second, with
no problems that I detected either. We (myself, John Stanback and others) ran
some tests a couple of years ago, and pretty well proved that 64 is safe for
the present...

: - Bob Jenkins
