move generators in computer chess

Deniz Yuret

unread,

Oct 11, 1994, 10:01:34 AM10/11/94

to

Are there any "optimally efficient" move generation algorithms that use more
memory and/or different representations to minimize the number of cycles per
generated move? I have read about the move generator in Chess 4.5 (Slate and
Atkin) that uses 64 bit bitmaps. I know how standard programs like Gnuchess
generate moves by going around the board and applying a specific routine for
each piece. I would like to know if there are new ideas. I am not a regular
of this group, please reply to my address (and rec.games.chess if you'd like).

thanks,
deniz

Hans-Henrik Grand

unread,

Oct 14, 1994, 10:19:27 AM10/14/94

to

Thus spake de...@great-grain.ai.mit.edu (Deniz Yuret):

>thanks,
>deniz

This is an interesting question!

Wich move-generator is the most effective?
I think that all depends on wich hardware that
the chessprogram is running, eg. wich commands
are available on the processor, how long are the
registers and are more than one processor available ?

If you know of any effective movegenerators please
post me too.
(My hardware is the intel family: 486 Dx2 66 Mhz.)

Yours Sincerely

Hans-Henrik Grand. (h...@daimi.aau.dk)
--
Sometimes You wonder if chess is better then soccer, and
sometimes You wonder if chess is better then sex..
Ics-login: TheViking

Paul Colley

unread,

Oct 14, 1994, 1:28:37 PM10/14/94

to

In article <37m41f$f...@belfort.daimi.aau.dk>,
Hans-Henrik Grand <h...@daimi.aau.dk> wrote:

>Wich move-generator is the most effective?
>I think that all depends on wich hardware that
>the chessprogram is running, eg. wich commands
>are available on the processor, how long are the
>registers and are more than one processor available ?

Definitely depends on the hardware. In one extreme, there do exist
custom computers in which "generate a pseudo-legal chess move for this
position" is an instruction. Bit boards are effective on computers
with 64-bit words and fast bit operations. And so on.

An interesting perspective (definitely biased towards custom hardware)
is found in:
------------------------------------------------------------------------------
Author: Ebeling, Carl.
Title: All the right moves : a VLSI architecture for chess / Carl
Ebeling.
Published: Cambridge, MA : MIT Press, 1987.
Description: 145 p. ; 24 cm.
Subjects: Chess--Computer programs.
Computer architecture.
Integrated circuits--Very large scale integration.
Notes: Originally presented as the author's thesis (Ph. D.--Carnegie-
Mellon University, 1986).
Bibliography: p.
ISBN: 0262050358
------------------------------------------------------------------------------

A more general reference is:
------------------------------------------------------------------------------
Author: Frey, Peter W. (Peter William), 1942-
Title: Chess skill in man and machine.
Edition: 2nd ed.
Published: New York : Springer-Verlag, c1983.
Description: xiv, 329 p. : ill. ; 25 cm.
Series: Texts and monographs in computer science
Subjects: Chess--Data processing--Addresses, essays, lectures.
Notes: Edited by Peter W. Frey. Bibliography: p. 315-323.
ISBN: 0387907904
------------------------------------------------------------------------------

Neither of these references is going to give the state of the art, but
they are a starting point.

- Paul Colley
University: col...@qucis.queensu.ca
Home: paco...@ember.uucp watmath!ember!pacolley +1 613 545 3807
"Anyone who attempts to generate random numbers by deterministic means is,
of course, living in a state of sin." - John von Neumann

Robert Hyatt

unread,

Oct 14, 1994, 6:06:37 PM10/14/94

to

In article <37m41f$f...@belfort.daimi.aau.dk>,
Hans-Henrik Grand <h...@daimi.aau.dk> wrote:

From experience:

the bit-board approach from Slate-Atkin's program is probably impossible to
beat for several reasons:

1. when you make a move, you update the attack tables so you in essence
re-generate just the part of the move list affected by the move...

2. you can generate captures easily without generating the rest of
the moves. If you want to generate the rest later, you can do so with
no loss of time except for (possibly) a procedure call.

3. you can even be more specific and generate moves for pieces that are
hung, pawns that are passed, etc. Again, the only potential overhead is
the extra procedure calls needed.

Bob Hyatt

--
Robert Hyatt Computer and Information Sciences
hy...@cis.uab.edu University of Alabama at Birmingham
(205) 934-2213 115A Campbell Hall, UAB Station
(205) 934-5473 FAX Birmingham, AL 35294-1170

Steven J. Edwards

unread,

Oct 14, 1994, 9:44:09 PM10/14/94

to

hy...@willis.cis.uab.edu (Robert Hyatt) writes:

>In article <37m41f$f...@belfort.daimi.aau.dk>,
>Hans-Henrik Grand <h...@daimi.aau.dk> wrote:
>>Thus spake de...@great-grain.ai.mit.edu (Deniz Yuret):
>>
>>>Are there any "optimally efficient" move generation algorithms that use more
>>>memory and/or different representations to minimize the number of cycles per
>>>generated move? I have read about the move generator in Chess 4.5 (Slate and
>>>Atkin) that uses 64 bit bitmaps. I know how standard programs like Gnuchess
>>>generate moves by going around the board and applying a specific routine for
>>

>>Wich move-generator is the most effective?
>>I think that all depends on wich hardware that

>the bit-board approach from Slate-Atkin's program is probably impossible to
>beat for several reasons:

>1. when you make a move, you update the attack tables so you in essence
>re-generate just the part of the move list affected by the move...

True. Another bonus is that these same tables can also be used in the
positional scoring code so there's a win there as well. Same thing
for the move ordering routines.

>2. you can generate captures easily without generating the rest of
>the moves. If you want to generate the rest later, you can do so with
>no loss of time except for (possibly) a procedure call.

Also, a little extra coding will help make an efficient check-evasion
generator based on examining attacks by the passive color.

>3. you can even be more specific and generate moves for pieces that are
>hung, pawns that are passed, etc. Again, the only potential overhead is
>the extra procedure calls needed.

Also, the target square set can be specified as well, so it is easy to
generate moves for, say, all hung pieces to unattacked squares.

Oh, and once 64 bit CPUs like the PPC620 become common in the desktop
market, a bitboard approach will immediately double in efficiency from
an earlier implementation on 32 bit machines. The same cannot be said
of the older offset generation mechanism.

Spector uses bitboards for just about everything. The only two places
that the offset method is used are:

1) The one time initialization of various bitboard constants.

2) The tablebase retroanalysis move generation and check detection
subsystem.

-- Steven (s...@mv.mv.com)

Peter Gillgasch

unread,

Oct 16, 1994, 1:18:10 PM10/16/94

to

In article <Cxoy5...@mv.mv.com>, s...@mv.mv.com (Steven J. Edwards) writes:
|> hy...@willis.cis.uab.edu (Robert Hyatt) writes:
|>

Hmpf. Obviously everybody is happy with the bitboard approach and I am
the only programmer who still uses a table driven approach...

I have a few Q's for the bitboard users:

1. Ok, generation and maintaining of the attack bitboards is fast.
But how do you convert these bitboards into a move format
efficiently?

2. Can anybody run the following benchmark and post the result and
a hardware spec:

Clear the board, put kings on e1 and e8, place a white queen on
d4. Then run a loop (say 1 million times or more) over:
Create all pseudo legal moves of the piece located on d4 *and*
create a pseudo legal move list for that piece.

When I run that benchmark on a SPARC my program generates
2.7 million pseudo legal moves per second. On a 25 MHz 68030
machine the program generates 0,38 million pseudo legal moves per
second.

Using a profiler I found out that when running the WAC.ci benchmark at 10
sec per problem my program spents about 15% on move generation (total).
Generating capture moves needs about 9%. So I don't know if it would
make sense to optimize the move generation code, but 64 bit cpu's are
lurking behind the corner and probably I could do other interesting
things with the bitboards...

C'mon guys, convince me. Please!

-- Peter

+-------------------------------//-------------------//---------\\
| Peter W. Gillgasch //the way is clear // Gothic \\
| Klosterweg 28/I-113 //The road is closed // computer \\
| 76131 Karlsruhe/Germany //the damage done // award for the \\
| Phone ++49/(0)721/6904255 //and the course // Macintosh SE: \\
| email gil...@ira.uka.de //imposed you //Dead and monochrome\\
+-------------------------//-------------------//---------------------\\

Steven J. Edwards

unread,

Oct 16, 1994, 2:01:50 PM10/16/94

to

Concerning offset vs. bitboard representation:

Because the selection of data representation permeates an entire
chessplaying program, one cannot limit benchmarking to only the move
generation portion. With offset generation, one can get moves fairly
fast, but that's all one gets. With bitboards, one has all these nice
attack and occupation structures that are useful for other parts of
the program as well.

Perhaps the best testimony is the historical evidence. I would guess
that every chess programmer started with the offset method (as it is
much simpler to write). But for those with access to machines with
decent CPUs and good memory bandwidth, a switch to bitboard
representation was made.

-- Steven (s...@mv.mv.com)

Robert Hyatt

unread,

Oct 16, 1994, 11:38:40 PM10/16/94

to

In article <37oa3v$i...@news.kth.se>,
Urban Koistinen <md85...@hemul.nada.kth.se> wrote:
>In <37mvk2$k...@pelham.cis.uab.edu> hy...@willis.cis.uab.edu (Robert Hyatt) writes:
>
>
>:I forgot to mention: I am using this technique in the new Cray Blitz which
>:does run on a machine with 64 bit words and can do things like count leading
>:zeros in a word, count the number of 1 bits in a word, etc.
>
>A fast count leading zeroes is essential for a bitboard move generator.
>
>:Surprisingly, it is also fast on a non-64-bit machine as well.. these
>:operations can be handled by simple table lookups... using pre-computed
>:values. On a PC, the 64-bit leading zero "instruction" turns into 4 array
>:accesses (worst case where the first 1 bit is in the rightmost 16 bits)
>:which is extremely quick...
>
>Early on I dismissed normal bitboard move generators as I didn't have a fast
>count leading zeroes instruction. Instead I use a [64][32] bitboard so that
>I only need to examine 16 bits when generating moves to a square.
>(The attack table is incrementally updated and forward pawn moves are handled
>one row at a time.)
>
>Maybe I should try [N][64] bitboards some time?
>What N do you use?
>--

On the Cray, as you suspect, I use a 64-bit word and use the leadz instruction
which counts leading zeros *fast* (a couple of clock cycles). On a 32-bit
machine, using C, I UNION the 64-bit bit-board with 4 16-bit integers and then
use these integers as indices into a 65536 element array where each element
was initialized to the number of leading zeros in its binary address, starting
at "0". the leadz function then becomes something like this:

long leadz(BITBOARD arg1)
{
union doub {
unsigned short i[4];
BITBOARD d;
};
union doub x;
extern long leading_zeroes[65536];
x.d=arg1;
if (x.i[0])
return (leading_zeroes[x.i[0]]);
if (x.i[1])
return (leading_zeroes[x.i[1]]+16);
if (x.i[2])
return (leading_zeroes[x.i[2]]+32);
if (x.i[3])
return (leading_zeroes[x.i[3]]+48);
return(64);
}

Which is o.k. for suns, but has to be further hacked to run on a PC since
it is a "little endian" machine that stores the bytes in reverse order. in
any case, this turns into nothing more than a load (or two or three or four)
and a return with an add, which should meet your requirement for speed.

One caveat: it might actually be better to do it as 8 8-bit tests as the
entire 256 elements of the array would then fit into cache, as opposed to this
large 64k word array that will swamp most caches, assume you do enough of
the leadz's to make it important.

Bob

Robert Hyatt

unread,

Oct 16, 1994, 11:53:40 PM10/16/94

to

In article <37rn8i$q...@nz12.rz.uni-karlsruhe.de>,

Peter Gillgasch <gil...@i41s19.ira.uka.de> wrote:
>In article <Cxoy5...@mv.mv.com>, s...@mv.mv.com (Steven J. Edwards) writes:
>|> hy...@willis.cis.uab.edu (Robert Hyatt) writes:
>|>
>
><much stuff praising the bitboard approach deleted, sorry Bob>
>
>|> Spector uses bitboards for just about everything. The only two places
>|> that the offset method is used are:
>|>
>|> 1) The one time initialization of various bitboard constants.
>|>
>|> 2) The tablebase retroanalysis move generation and check detection
>|> subsystem.
>|>
>|> -- Steven (s...@mv.mv.com)
>
>Hmpf. Obviously everybody is happy with the bitboard approach and I am
>the only programmer who still uses a table driven approach...
>
>I have a few Q's for the bitboard users:
>
>1. Ok, generation and maintaining of the attack bitboards is fast.
> But how do you convert these bitboards into a move format
> efficiently?

Trivially:

let's generate white queen moves: assume a 64-bit word with 1-bits
on for the squares attacked by this mystical queen.

then, set moving_piece=queen, from=location of queen, and to becomes
leading_zeros(queen_attack_board); then we clear this bit with one
and operation and loop until there are no more to squares.

>
>2. Can anybody run the following benchmark and post the result and
> a hardware spec:
>
> Clear the board, put kings on e1 and e8, place a white queen on
> d4. Then run a loop (say 1 million times or more) over:
> Create all pseudo legal moves of the piece located on d4 *and*
> create a pseudo legal move list for that piece.
>
> When I run that benchmark on a SPARC my program generates
> 2.7 million pseudo legal moves per second. On a 25 MHz 68030
> machine the program generates 0,38 million pseudo legal moves per
> second.

You are grossly missing the point. The advantage of bit-boards is
two-fold for move generation: assuming I already have the attack
squares list, I am going to loop *exactly* as many iterations as the
queen has legal moves, no need to test anything, no need to test for
the edge of the board, etc. creating and incrementally updating this
attacked_squares is quite easy, just requires a lot of code. It is
at least an order of magnitude faster than having to recompute everything
from scratch like the old Cray-blitz. After moving a single piece, the
set of legal moves is not changed too much. This bit-board approach
saves a *bunch* of time, but just generating the same set over and over
"hides" the payoff.

Current simple "tests" have shown that move generation alone, for normal
middle-game positions, is now about 10x faster in the new Cray Blitz,
even when running it on a sparc which has to "cobble" the 64-bit operations
required from multiple 32 bit operations.

>
>Using a profiler I found out that when running the WAC.ci benchmark at 10
>sec per problem my program spents about 15% on move generation (total).
>Generating capture moves needs about 9%. So I don't know if it would
>make sense to optimize the move generation code, but 64 bit cpu's are
>lurking behind the corner and probably I could do other interesting
>things with the bitboards...
>
>C'mon guys, convince me. Please!

How about some of the following: want to find out if your bishop is "bad"
then population_count(and(bishop_attacck,!my_pawns)) and you have the number
of squares attacked by this bishop, not counting the squares occupied by my
own pawns. 4 clock cycles on a Cray.

want to know if your rooks are connected? locate the first rook (leadz again)
and then see if this square is attacked by the second rook (from its attack
board) two leadz operations, one and and you got it...

how about "is my pawn on d4 passed?" create a set of masks for every possible
pawn location on the board. for the d4 mask, you put 1's on c5,d5,e5,c6,d6,
and so forth. then, with one and operation (this mask for d4 and your pawn
bit-board) i get an answer... one and operation = 2 clocks on the Cray.

need I go on? its a huge win *everywhere*...

Bob

Robert Hyatt

unread,

Oct 14, 1994, 6:10:10 PM10/14/94

to

I forgot to mention: I am using this technique in the new Cray Blitz which
does run on a machine with 64 bit words and can do things like count leading
zeros in a word, count the number of 1 bits in a word, etc.

Surprisingly, it is also fast on a non-64-bit machine as well.. these

operations can be handled by simple table lookups... using pre-computed
values. On a PC, the 64-bit leading zero "instruction" turns into 4 array
accesses (worst case where the first 1 bit is in the rightmost 16 bits)
which is extremely quick...

Bob

Robert Hyatt

unread,

Oct 17, 1994, 2:11:59 PM10/17/94

to

In article <37u3o7$e...@nz12.rz.uni-karlsruhe.de>,
Peter Gillgasch <gil...@i41s19.ira.uka.de> wrote:
>Note: This is a (more or less) scientific discussion. I don't want to
> imply that a certain method is superior to another. No sweat! Ok?

>
>
>In article <37ssg4$1...@pelham.cis.uab.edu>, hy...@willis.cis.uab.edu (Robert Hyatt) writes:
>|> In article <37rn8i$q...@nz12.rz.uni-karlsruhe.de>,
>|> Peter Gillgasch <gil...@i41s19.ira.uka.de> wrote:
>|> >Hmpf. Obviously everybody is happy with the bitboard approach and I am
>|> >the only programmer who still uses a table driven approach...
>|> >
>|> >I have a few Q's for the bitboard users:
>|> >
>|> >1. Ok, generation and maintaining of the attack bitboards is fast.
>|> > But how do you convert these bitboards into a move format
>|> > efficiently?
>|>
>|> Trivially:
>|>
>|> let's generate white queen moves: assume a 64-bit word with 1-bits
>|> on for the squares attacked by this mystical queen.
>|>
>|> then, set moving_piece=queen, from=location of queen, and to becomes
>|> leading_zeros(queen_attack_board); then we clear this bit with one
>|> and operation and loop until there are no more to squares.
>

>Ok, so you have to loop over your '1' bits plus you have the cost of
>leading_zeroes(), clearing bits and you have to maintain the attack
>boards during DoMove() and you have to restore them during UndoMove()
>and in the quiesence search you are running the risk that eval() kills
>the subtree (no further moves will be generated) and the score is far
>out from your alpha/beta window, so eval() doesn't need the maintained
>bitboards, too, so there was some wasted computation.

1. don't restore anything, just "revert" to the previous data... ie,
make_move() can return a "new" set of attack boards. when done, discard
'em and use what you fed to make_move() to update... I don't even have
an "unmake_move()" procedure...

2. leadz is free on the Cray and nearly free using the simple table-lookup
idea.

3. yes, some computation can be wasted, but it is *not* very much as the
code to update the attack boards is both quick and clean. such wasted work
is still less than what we used to do. By the way, for your move generation
test, change it to the following: Put two queens on the board (same side)
with each queen attacking one enemy pawn. NOW, try to generate captures.
The bit-board move generator will execute a loop exactly one time for each
queen, directly spitting the capture out. Our old move generator had to
step through all eight directions, and all squares in each direction looking
for an enemy piece. Note that we take some "hits" by updating the attack
boards in the quiescence even though we will never "generate" these moves.
However, we get it back with interest since we can generate captures so much
more efficiently. Since this usually dominates a full-width Chess 4.x clone,
its a win any way you cut it.

4. Don't forget that even if you don't use the attack boards to generate
non-captures, you can still use 'em for evaluation (mobility, multiple hung
pieces, etc.)

>
>|> >
>|> >2. Can anybody run the following benchmark and post the result and
>|> > a hardware spec:
>

> <stupid benchmark & machissimo claims by myself deleted ;-) >
>
> *** which doesn't mean that I am not interested in results! ***

>
>|>
>|>
>|> You are grossly missing the point.
>

Didn't mean to sound condescending... my apology if it sounded that
way. Late at night, debugging a really stupid error, etc...

>Hmpf. Maybe? Ok, the benchmark wasn't "real world" but more later...

>
>|> The advantage of bit-boards is
>|> two-fold for move generation: assuming I already have the attack
>|> squares list, I am going to loop *exactly* as many iterations as the
>|> queen has legal moves, no need to test anything, no need to test for
>|> the edge of the board, etc. creating and incrementally updating this
>|> attacked_squares is quite easy, just requires a lot of code. It is
>|> at least an order of magnitude faster than having to recompute everything
>|> from scratch like the old Cray-blitz. After moving a single piece, the
>|> set of legal moves is not changed too much. This bit-board approach
>|> saves a *bunch* of time, but just generating the same set over and over
>|> "hides" the payoff.
>

>I simply loop over all *ever possible* moves of the piece located on the
>"from" square. This is not the "offset" approach, it is table driven. The
>table is constructed at program startup *once* and the "edge effects"
>are already taken into account. The only test I do in that
>move-generation loop is wether we hit a blocker (or hit the "from"
>square, which means that we are done) and if we hit a blocker, whether
>it is a friendly piece. The whole loop *including* the
>creation and storing of the move descriptors fits into *12* machine
>instructions of the MC68030.
>So I loop over *exactly* the squares that are pseudolegal "to" squares
>for that piece located on "from" *plus* sometimes a square that is
>occupied by a friendly piece.

>
>|>
>|> Current simple "tests" have shown that move generation alone, for normal
>|> middle-game positions, is now about 10x faster in the new Cray Blitz,
>|> even when running it on a sparc which has to "cobble" the 64-bit operations
>|> required from multiple 32 bit operations.
>|>
>

>I really don't want to argue with you Bob, but as far as I know you had
>an offset approach in version 49 and you had to handle the edge effects
>during runtime (that means during move generation). Now you can use the
>"leadz" machine instruction of the Cray and your code makes better use
>of the vector instructions of the Cray, I suppose. Plus you don't have
>to handle the edge effects anymore. That should explain the 10x
>improvement. I wouldn't be surprised if you would get a similar
>improvement if you decided to use a table driven approach.

Probably correct. However, remember that the reason I chose bit-boards was
*not* for the move generation efficiency... the fact that it works well is
great, but I was willing to accept it even if it was no better than what we
did before! I wanted piece mobility for nothing and would have been happy
if that was all I got... it turns out that we got much more, and I haven't
finished yet so I'm still not sure how much better things will be.

for example, we had a routine that used to take roughly 20% of the total search
time. It's sole purpose was, given a target square, can the piece on that square
be captured profitably (ignoring pins, overloaded pieces, etc., ie, just
enumerate all the pieces attacking that square for both sides, minimax the
captures and return an expected gain (or zero or loss of course). Now this
code is significantly faster... I have a single bit mask that gives every
square that attacks this square. by stepping through the piece boards for
both sides, I can quickly enumerate the exchange sequence, without having to
step through empty squares and the like. *another* gain (I was debugging
this last nite after writing it...) The really horrendous part of all of this
was the make_move() routine which has to update the to.attack bit boards, the
from.attack bit boards, the hash key, the bit-board representation of the
chess board, and probably some other stuff as I get on with this.

For fun, here is my "current" bit-board structure to represent the chess
board:

typedef struct {
BITBOARD w_pawn; /* locations of white pawns */
BITBOARD w_knight; /* locations of white knights */
BITBOARD w_bishop; /* locations of white bishops */
BITBOARD w_rook; /* locations of white rooks */
BITBOARD w_queen; /* locations of white queens */
BITBOARD w_king; /* locations of white kings */
BITBOARD w_occupied; /* locations of white anything */

BITBOARD b_pawn; /* locations of black pawns */
BITBOARD b_knight; /* locations of black knights */
BITBOARD b_bishop; /* locations of black bishops */
BITBOARD b_rook; /* locations of black rooks */
BITBOARD b_queen; /* locations of black queens */
BITBOARD b_king; /* locations of black kings */
BITBOARD b_occupied; /* locations of black anything */

BITBOARD hash_key; /* hash key used to access trans table */
BITBOARD pawn_hash_key; /* hash key used to lookup pawn scores */

BITBOARD rooks_queens; /* has a 1 for any square with a rook or
queen for either side. used to update
the attack boards. if you move a pawn
to a new square, you look vertically and
horizontally from the new square (and the
old square, of course) to see if you see a
rook or queen (using this board.) If so,
you have some attacks to update.
BITBOARD bishops_queens; /* ditto for bishops and queens */

BITBOARD enpassant_target; /* one bit set whenever a pawn advances two
ranks */

int w_castle; /* two bits, one for each rook. 0=that rook
moved, whole thing=0 means king moved */
int b_castle; /* ditto for black */

} CHESS_BOARD;

>
>|>
>|> >
>|> >Using a profiler I found out that when running the WAC.ci benchmark at 10
>|> >sec per problem my program spents about 15% on move generation (total).
>|> >Generating capture moves needs about 9%. So I don't know if it would
>|> >make sense to optimize the move generation code, but 64 bit cpu's are
>|> >lurking behind the corner and probably I could do other interesting
>|> >things with the bitboards...
>|> >
>|> >C'mon guys, convince me. Please!
>|>
>|> How about some of the following: want to find out if your bishop is "bad"
>|> then population_count(and(bishop_attacck,!my_pawns)) and you have the number
>|> of squares attacked by this bishop, not counting the squares occupied by my
>|> own pawns. 4 clock cycles on a Cray.
>|>
>|> want to know if your rooks are connected? locate the first rook (leadz again)
>|> and then see if this square is attacked by the second rook (from its attack
>|> board) two leadz operations, one and and you got it...
>|>
>|> how about "is my pawn on d4 passed?" create a set of masks for every possible
>|> pawn location on the board. for the d4 mask, you put 1's on c5,d5,e5,c6,d6,
>|> and so forth. then, with one and operation (this mask for d4 and your pawn
>|> bit-board) i get an answer... one and operation = 2 clocks on the Cray.
>|>
>|> need I go on? its a huge win *everywhere*...
>

>Hmmm. Ok, I get the point. Some of the nifty features of the bitboard
>approach you have explained in the last paragraph are in my program
>actually handled by - surprise! - bitboards or precomputed "rank masks".
>Maybe I throw the "rank masks" out of my program and use precomputed
>bitboards as well. Will save some loops. But since I hash nearly every
>term of the eval() function the room for improvement is very limited.
>
>I furthermore admit that I have *2* bitboards that I actually maintain
>between moves (but this is only very rarely done), but I don't use them
>for move-generation, only for eval().
>
>I don't know if I will give it a try, obviously the bitboard approach
>has something to offer, but obviously it is a "nothing works until
>everything works" change to my program. Plus I am not really convinced
>that it will speed up my beast... Although it could help me to add some
>cool features to eval()... I will think about it. We have some 64 bit
>machines...
>
>Thanks for the input!
>

It *is* a real pain to write and debug. I have intentionally wrote the
make_move() routine in a straightforward manner for ease of repair later
as I add more things to be incrementally updated (something that is
probably inevitable...) However, we *demanded* mobility as I can show
you two games that we lost last year due to having (keeping) the bishop
pair, but having the bishops so "locked in" that they were nothing more
than "tall pawns." we wanted (demanded) the improved knowledge to help
the program either trade 'em when they are not so hot, or else open the
pawns up to make 'em better. right now, the old program will do neither.
We tried a direct mobility computation, but the cost was *way* too high
and slowed the program down enough that it played smarter, but lost due
to less depth. Now we hope to "have our cake *and* eat it too"

One final point: It has always been obvious that incrementally updating
something is beneficial if you use it often. If you modify it for several
plies in succession before using it, you can take a hit on deep endings.
For the bit boards, we basically are maintaining a set of legal moves for
the current board, and when the current board changes, we only change
exactly those things that should be changed... and I mean EXACTLY
what needs to be changed and nothing more. That's what makes make_move()
so complex, making sure that we modify only what needs to be modified without
wasting time computing anything else. For example, when you move a piece
to a new square that blocks a queen that was attacking through that
square, we have to modify that queen's attack board. However, we only
modify the one and only one diagonal (or rank or file) that we just
blocked. As a result, the incremental code does much less work than
the traditional move generator, leaving our generate_moves() with nothing
to do but enumerate squares, and you can have a great deal of leeway in
source and target squares. Try to generate moves that capture a queen.
For us, it's easy now. Before, it was difficult.

Urban Koistinen

unread,

Oct 15, 1994, 6:15:27 AM10/15/94

to

In <37mvk2$k...@pelham.cis.uab.edu> hy...@willis.cis.uab.edu (Robert Hyatt) writes:

:I forgot to mention: I am using this technique in the new Cray Blitz which

:does run on a machine with 64 bit words and can do things like count leading
:zeros in a word, count the number of 1 bits in a word, etc.

A fast count leading zeroes is essential for a bitboard move generator.

:Surprisingly, it is also fast on a non-64-bit machine as well.. these

:operations can be handled by simple table lookups... using pre-computed
:values. On a PC, the 64-bit leading zero "instruction" turns into 4 array
:accesses (worst case where the first 1 bit is in the rightmost 16 bits)
:which is extremely quick...

Early on I dismissed normal bitboard move generators as I didn't have a fast

count leading zeroes instruction. Instead I use a [64][32] bitboard so that
I only need to examine 16 bits when generating moves to a square.
(The attack table is incrementally updated and forward pawn moves are handled
one row at a time.)

Maybe I should try [N][64] bitboards some time?
What N do you use?
--

Urban Koistinen - md85...@nada.kth.se
Stop software patents, interface copyrights: contact l...@uunet.uu.net

David Hanley

unread,

Oct 18, 1994, 1:32:47 PM10/18/94

to

: Which is o.k. for suns, but has to be further hacked to run on a PC since

: it is a "little endian" machine that stores the bytes in reverse order. in
: any case, this turns into nothing more than a load (or two or three or four)
: and a return with an add, which should meet your requirement for speed.

One thing I don't understand: Once we have a bitboard of the
queen's moves, and an an intersection with the other peices on the
board( friend and foe ) whe have movement vectors with holes in them.
How do we efficently trim these vectors to end in the right place(
after captures or interruptions? )

------------------------------------------------------------------------------
| David James Hanley -- dha...@lac.eecs.uic.edu -- C++, OOD, martial arts |
| Laboratory for advanced computing | My employer barely KNOWS me. |
------------------------------------------------------------------------------
James Joyce had escaped from the normal constrictions of ego by
pondering deeply what it feels like to be a woman. Albert Einstein
had escaped from the restrictions of ego by pondering deeply what it
feels like to be a photon. Joyce approached art with the methodology
of a scientist; Einstein approached science with the intuition of an
artist.

Peter Gillgasch

unread,

Oct 17, 1994, 11:03:35 AM10/17/94

to

Note: This is a (more or less) scientific discussion. I don't want to
imply that a certain method is superior to another. No sweat! Ok?

In article <37ssg4$1...@pelham.cis.uab.edu>, hy...@willis.cis.uab.edu (Robert Hyatt) writes:

Ok, so you have to loop over your '1' bits plus you have the cost of

leading_zeroes(), clearing bits and you have to maintain the attack
boards during DoMove() and you have to restore them during UndoMove()
and in the quiesence search you are running the risk that eval() kills
the subtree (no further moves will be generated) and the score is far
out from your alpha/beta window, so eval() doesn't need the maintained
bitboards, too, so there was some wasted computation.

|> >

|> >2. Can anybody run the following benchmark and post the result and
|> > a hardware spec:

*** which doesn't mean that I am not interested in results! ***

|>
|>

|> You are grossly missing the point.

Hmpf. Maybe? Ok, the benchmark wasn't "real world" but more later...

|> The advantage of bit-boards is

|> two-fold for move generation: assuming I already have the attack
|> squares list, I am going to loop *exactly* as many iterations as the
|> queen has legal moves, no need to test anything, no need to test for
|> the edge of the board, etc. creating and incrementally updating this
|> attacked_squares is quite easy, just requires a lot of code. It is
|> at least an order of magnitude faster than having to recompute everything
|> from scratch like the old Cray-blitz. After moving a single piece, the
|> set of legal moves is not changed too much. This bit-board approach
|> saves a *bunch* of time, but just generating the same set over and over
|> "hides" the payoff.

I simply loop over all *ever possible* moves of the piece located on the

"from" square. This is not the "offset" approach, it is table driven. The
table is constructed at program startup *once* and the "edge effects"
are already taken into account. The only test I do in that
move-generation loop is wether we hit a blocker (or hit the "from"
square, which means that we are done) and if we hit a blocker, whether
it is a friendly piece. The whole loop *including* the
creation and storing of the move descriptors fits into *12* machine
instructions of the MC68030.
So I loop over *exactly* the squares that are pseudolegal "to" squares
for that piece located on "from" *plus* sometimes a square that is
occupied by a friendly piece.

|>

|> Current simple "tests" have shown that move generation alone, for normal
|> middle-game positions, is now about 10x faster in the new Cray Blitz,
|> even when running it on a sparc which has to "cobble" the 64-bit operations
|> required from multiple 32 bit operations.
|>

I really don't want to argue with you Bob, but as far as I know you had

an offset approach in version 49 and you had to handle the edge effects
during runtime (that means during move generation). Now you can use the
"leadz" machine instruction of the Cray and your code makes better use
of the vector instructions of the Cray, I suppose. Plus you don't have
to handle the edge effects anymore. That should explain the 10x
improvement. I wouldn't be surprised if you would get a similar
improvement if you decided to use a table driven approach.

|>
|> >

|> >Using a profiler I found out that when running the WAC.ci benchmark at 10
|> >sec per problem my program spents about 15% on move generation (total).
|> >Generating capture moves needs about 9%. So I don't know if it would
|> >make sense to optimize the move generation code, but 64 bit cpu's are
|> >lurking behind the corner and probably I could do other interesting
|> >things with the bitboards...
|> >
|> >C'mon guys, convince me. Please!
|>
|> How about some of the following: want to find out if your bishop is "bad"
|> then population_count(and(bishop_attacck,!my_pawns)) and you have the number
|> of squares attacked by this bishop, not counting the squares occupied by my
|> own pawns. 4 clock cycles on a Cray.
|>
|> want to know if your rooks are connected? locate the first rook (leadz again)
|> and then see if this square is attacked by the second rook (from its attack
|> board) two leadz operations, one and and you got it...
|>
|> how about "is my pawn on d4 passed?" create a set of masks for every possible
|> pawn location on the board. for the d4 mask, you put 1's on c5,d5,e5,c6,d6,
|> and so forth. then, with one and operation (this mask for d4 and your pawn
|> bit-board) i get an answer... one and operation = 2 clocks on the Cray.
|>
|> need I go on? its a huge win *everywhere*...

Hmmm. Ok, I get the point. Some of the nifty features of the bitboard

approach you have explained in the last paragraph are in my program
actually handled by - surprise! - bitboards or precomputed "rank masks".
Maybe I throw the "rank masks" out of my program and use precomputed
bitboards as well. Will save some loops. But since I hash nearly every
term of the eval() function the room for improvement is very limited.

I furthermore admit that I have *2* bitboards that I actually maintain
between moves (but this is only very rarely done), but I don't use them
for move-generation, only for eval().

I don't know if I will give it a try, obviously the bitboard approach
has something to offer, but obviously it is a "nothing works until
everything works" change to my program. Plus I am not really convinced
that it will speed up my beast... Although it could help me to add some
cool features to eval()... I will think about it. We have some 64 bit
machines...

Thanks for the input!

-- Peter

Robert Hyatt

unread,

Oct 19, 1994, 9:54:26 AM10/19/94

to

In article <19941018.213513...@UICVM.UIC.EDU>,

David Hanley <dha...@matisse.eecs.uic.edu> wrote:
>: Which is o.k. for suns, but has to be further hacked to run on a PC since
>: it is a "little endian" machine that stores the bytes in reverse order. in
>: any case, this turns into nothing more than a load (or two or three or four)
>: and a return with an add, which should meet your requirement for speed.
>
> One thing I don't understand: Once we have a bitboard of the
>queen's moves, and an an intersection with the other peices on the
>board( friend and foe ) whe have movement vectors with holes in them.
>How do we efficently trim these vectors to end in the right place(
>after captures or interruptions? )
>

oops! If you could do as you say, it would be "too" easy and everyone would
already be using bit boards. The real "trick" is that the attack boards
(to.attacks[0-63]) are already "trimmed"!!

Whenever you make a move, you have to do the following things to the attack
board set:

[I'm going to ignore the from.attacks stuff and just talk about to.attacks
which is what we are using to generate moves...]

1. clear to.attacks[from] since the square is not empty and there are no
attacks to other squares from an empty square.

2. set to.attacks[to] to the "default" set of attacks for a piece of whatever
type is being moved to this square. Then, with a lot of smoke, mirrors and
prayers, you trim the bit vectors (for sliding pieces only) at any point where
they intersect an occupied square [and beyond that point as well...].

3. for the from square, you look up each diagonal to see if you "see" a
queen or bishop. If so, you can extend that diagonals attack from the
square where you find the bishop/queen since this square used to block that
piece from moving further, but now doesn't. Then look in both directions on
the rank/file to see if you "see" a rook/queen. If so, do the same thing for
that piece's attack vector in this direction only.

4. for the to square, you do the same thing, except that now you are going
to "trim" attack vectors since this square is now occupied. note that if this
is a capture, you omit this step since the square was occupied before we did
the capture, and is still occupied afterwards, which means other pieces attacks
through this square haven't changed.

For anyone interested, I'm willing to "share the wealth" here and ship you my
"make_move()" code which updates the to.attack[], from.attack[], and the hash
key. Give me a week or so to get the new program playing to the point where I
can let it run for a while and make sure that there are no bugs in the code.
I have had make_move() play through several old games, just making and updating
the data structures, and then using a different approach to validate them. So
far, I have a version that seems to work with the games I have. I doubt that
this is enough testing to validate the routine, however... It would need to
include all kinds of promotions, captures, enpassants, castling, etc... where
these games don't cover all cases...

Bob

t

Steven J. Edwards

unread,

Oct 17, 1994, 6:05:11 PM10/17/94

to

I have a long ditto on what Bob posted with a few additions:

1) In addition to the two occupancy by color bitboards that Cray Blitz
maintains, Spector also has the apmbb (all pieces merged bitboard),
which is the inclusive OR of the two color occupancy boards. This is
handy for determining blocks.

2) Spector has a constant value 64 by 64 element bitboard array
indexed by from-square and to-square, called the interpath array.
Each element is a bitboard with its active squares (if any) set to be
the squares between the two indexing points. With this, it only takes
a simple comparison to determine if a block exists (see #1).

3) Spector has a constant value 8 (eight sweep directions) by 64
(squares) element array of bitboards called the path extent array of
bitboards. For any given ortho/diagonal direction and a square, the
active bits (if any) define those square that line in the indexing
direction away from the indexing square. This is very useful for
helping with pin and skewer detection.

4) There is another constant value table indexed by a pair of squares
that gives the direction, if any, that connects from the first to the
second square. This works well in conjunction with the first three
points. (This also handles the eight knight directions.)

There are others as well. The idea is that everywhere in an offset
program that a for loop of 64 squares appears, such a loop is a
natural candidate for conversion to a bitboard operation, or at least
a loop controlled by a bitboard scan.

Spector even has bitboards for white and black squares used to help
format the board display and to count pawns' square colors for bishop
utility scoring.

-- Steven (s...@mv.mv.com)

Robert Hyatt

unread,

Oct 17, 1994, 10:51:36 PM10/17/94

to

Steven mentioned that he keeps a bit-board with all occupied squares set. I
use this info, but found it cheaper to or the white_occupied and black_occupied
bit-boards together, as they are required once each node, and would have to
be dynamically updated once for each node, which really saves nothing nor
costs nothing. However, I do need the white_occupied to generate moves for
white (which obviously can't move to squares it attacks, but which are
occupied by white pieces.) In any case, this looks better and better, but
until I get it working, I can't peg exactly how much better or worse it is
than our old code...

Bob

Robert Hyatt

unread,

Oct 17, 1994, 10:43:27 PM10/17/94

to

In article <limunltdC...@netcom.com>,
Lloyd Lim <limu...@netcom.com> wrote:

>In article <37srk0$1...@pelham.cis.uab.edu> hy...@willis.cis.uab.edu (Robert Hyatt) writes:
>>In article <37oa3v$i...@news.kth.se>,
>>Urban Koistinen <md85...@hemul.nada.kth.se> wrote:
>>>In <37mvk2$k...@pelham.cis.uab.edu> hy...@willis.cis.uab.edu (Robert Hyatt) writes:
>>>
>>>:I forgot to mention: I am using this technique in the new Cray Blitz which
>>>:does run on a machine with 64 bit words and can do things like count leading
>>>:zeros in a word, count the number of 1 bits in a word, etc.
>>>
>>>A fast count leading zeroes is essential for a bitboard move generator.
>>

>>On the Cray, as you suspect, I use a 64-bit word and use the leadz instruction
>>which counts leading zeros *fast* (a couple of clock cycles). On a 32-bit
>>machine, using C, I UNION the 64-bit bit-board with 4 16-bit integers and then
>>use these integers as indices into a 65536 element array where each element
>>was initialized to the number of leading zeros in its binary address, starting
>>at "0". the leadz function then becomes something like this:
>>

>>[deleted]
>
>How often do you need to know the number of leading zeroes? I'm curious,
>since our research program is offset-based.
>
>If you just want to manipulate bits, then there are some tricks you
>can use that should eliminate the need for any tables. Suppose b is
>the bit-board and it contains 0x00101100.
>
> b & (b - 1) clears the trailing bit (i.e. 0x00101000)
> b ^ (b & (b - 1) gives you the trailing bit (i.e. 0x00000100)
>
>If you iterate from the trailing bit instead of the leading bit,
>then it should be quite fast.
>
>I haven't seen this mentioned in any of the chess literature. Maybe this
>is one of those tricks chess hackers keep to themselves. Have I missed
>something here?
>

That's a pretty well-known hack to cray programmers that don't like to load
masks to do things. However, we aren't so interested in clearing the bit
as we are in finding it's identity. This is what the leading zeros function
does (of course, if we had a trailing zero function (instruction) it would
be just as good. With leading zero you are able to obtain the "square" number
for the piece represented by the "1"... some machines have the direct leading
zero function (mainly vector machines that use this for vector compares
among other things) while others have odd shift instructions like the Vaxen...
that will shift until the sign bit changes and return the shift count...
which can be a leading zero counter quite easily. BTW, getting the trailing
bit still leaves us with the problem of it's binary ID, ie, in your case, I
would want either "2" or "3" depending on your "origin".

Lloyd Lim

unread,

Oct 17, 1994, 5:17:08 PM10/17/94

to

In article <37srk0$1...@pelham.cis.uab.edu> hy...@willis.cis.uab.edu (Robert Hyatt) writes:

>In article <37oa3v$i...@news.kth.se>,
>Urban Koistinen <md85...@hemul.nada.kth.se> wrote:
>>In <37mvk2$k...@pelham.cis.uab.edu> hy...@willis.cis.uab.edu (Robert Hyatt) writes:
>>
>>:I forgot to mention: I am using this technique in the new Cray Blitz which
>>:does run on a machine with 64 bit words and can do things like count leading
>>:zeros in a word, count the number of 1 bits in a word, etc.
>>
>>A fast count leading zeroes is essential for a bitboard move generator.
>

>On the Cray, as you suspect, I use a 64-bit word and use the leadz instruction
>which counts leading zeros *fast* (a couple of clock cycles). On a 32-bit
>machine, using C, I UNION the 64-bit bit-board with 4 16-bit integers and then
>use these integers as indices into a 65536 element array where each element
>was initialized to the number of leading zeros in its binary address, starting
>at "0". the leadz function then becomes something like this:
>

>[deleted]

How often do you need to know the number of leading zeroes? I'm curious,
since our research program is offset-based.

If you just want to manipulate bits, then there are some tricks you
can use that should eliminate the need for any tables. Suppose b is
the bit-board and it contains 0x00101100.

b & (b - 1) clears the trailing bit (i.e. 0x00101000)
b ^ (b & (b - 1) gives you the trailing bit (i.e. 0x00000100)

If you iterate from the trailing bit instead of the leading bit,
then it should be quite fast.

I haven't seen this mentioned in any of the chess literature. Maybe this
is one of those tricks chess hackers keep to themselves. Have I missed
something here?

+++
Lloyd Lim Internet: LimU...@netcom.com
Lim Unlimited America Online: LimUnltd
330 W. Iris Ave. AppleLink: LimUnltd
Stockton, CA 95210-3738 CompuServe: 72647,660

Marcel van Kervinck

unread,

Oct 20, 1994, 6:30:20 AM10/20/94

to

Lloyd Lim (limu...@netcom.com) wrote:

> If you just want to manipulate bits, then there are some tricks you
> can use that should eliminate the need for any tables. Suppose b is
> the bit-board and it contains 0x00101100.

> b & (b - 1) clears the trailing bit (i.e. 0x00101000)
> b ^ (b & (b - 1) gives you the trailing bit (i.e. 0x00000100)

The latter may be rewritten as b & -b

> I haven't seen this mentioned in any of the chess literature. Maybe this
> is one of those tricks chess hackers keep to themselves. Have I missed
> something here?

Read K&R (ANSI C), the b&-b trick is mentioned in one of the exercises.

Another mindboggling routine I recently came up with efficiently
generates all subsets of the set represented by v:

void subsets(int v)
{
int u=0;
do printf("%d\n",u); while(u=u-v&v);
}

Does anybody have simular tricks?

Marcel
-- _ _
_| |_|_|
|_ |_ Marcel van Kervinck
|_| mar...@stack.urc.tue.nl

Vincent Diepeveen

unread,

Oct 20, 1994, 10:33:59 AM10/20/94

to

In <384kd9...@life.ai.mit.edu> de...@great-grain.ai.mit.edu (Deniz Yuret) writes:

>Recently, there has been some discussion on my question about optimal move
>generation for computer chess.

>correctly mention that this is a hardware dependent problem. However it is

>Bob Hyatt says:
> >> 1. when you make a move, you update the attack tables so you in essence
> >> re-generate just the part of the move list affected by the move...

>However, when the program needs to actually read these moves into a list, it
>has to get them one by one using leadz() instructions. So this still uses an

First you must update attacktables, then at least 64*n instructions to
get these attacktables to a movelist.

I don't think this useful: by just generating it the way GNU-chess does
you only need cheap compares (now i am talking about the PC), and avoiding
the overhead of the incremental update + 64*n instructions needed by bitboards
to convert bits into real moves.
In the GNU-chess-way you do not scan more than there are possible moves,
which if far less then 64*n.

The overhead is very little, and the straight way of calculating moves is
really fast, needing less than about 10% in my own chessprogramm.

The real problem in a chessprogramm (if you don't have a supercomputer) is
of course to evaluate very fast and well.

Now suppose i evaluate very well: how to speed up things?
My current evaluator(the evaluator in my brute force programm i mean)
only evaluates about a 1000 times a second on
a 486 dx 66, which is really slow(only reaching 5 ply in the middlegame).

How do i quickly and cheap read exchangetables, and how must they
look like?

What about situations that are not in the tables, what to do with it?

Evaluating pawnstructure:
That is no problem, but how to make it fast?

What if the strength of your pawnstructure depends on nonpawn-pieces
you have on the board?

Is it even possible to put this in tables?

Does someone has got ideas/plans?

Vincent Diepeveen.
vdie...@cs.ruu.nl
--
+--------------------------------------+
|| email : vdie...@cs.ruu.nl ||
|| fidonet: 2:280/206.23 ||
+======================================+

John Tromp

unread,

Oct 23, 1994, 1:24:46 PM10/23/94

to

In article <385grs$k...@tuegate.tue.nl>, bue...@asterix.urc.tue.nl (Marcel van Kervinck) writes:
> Another mindboggling routine I recently came up with efficiently
> generates all subsets of the set represented by v:
>
> void subsets(int v)
> {
> int u=0;
> do printf("%d\n",u); while(u=u-v&v);
> }
>
> Does anybody have simular tricks?

Well, the following code shows a very simple iterative solution to the
Towers of Hanoi problem, using bit fiddling:

for (x = 1; x < (1 << npegs); x++)
printf("move a disc from %d to %d\n", (x&x-1)%3, ((x|x-1)+1)%3);

regards,

%!PS % -John Tromp (tr...@math.uwaterloo.ca)
42 42 scale 7 9 translate .07 setlinewidth .5 setgray/c{arc clip fill
setgray}def 1 0 0 42 1 0 c 0 1 1{0 3 3 90 270 arc 0 0 6 0 -3 3 90 270
arcn 270 90 c -2 2 4{-6 moveto 0 12 rlineto}for -5 2 5{-3 exch moveto
9 0 rlineto}for stroke 0 0 3 1 1 0 c 180 rotate initclip}for showpage

DR. ROY SCHMIDT

unread,

Nov 2, 1994, 12:47:01 AM11/2/94

to

John Tromp (tr...@math.uwaterloo.ca) wrote:

: bue...@asterix.urc.tue.nl (Marcel van Kervinck) writes:
: > Another mindboggling routine I recently came up with efficiently
: > generates all subsets of the set represented by v:

[mindboggling set aside here]

: Well, the following code shows a very simple iterative solution to the

: Towers of Hanoi problem, using bit fiddling:

[bit fiddling meets an "or" mask here]

Chess!! I repeat, CHESS!! There really are groups available for this sort
======= ^^^^^^^
of discussion, but r.g.c. ain't the place....

Roy

--
Roy Schmidt sch...@uxmail.ust.hk
Information & Systems Management Dept, School of Business and Management
The University of Science and Technology
Clearwater Bay, Sai Kung, HONG KONG