bitboard move generation

Robert Hyatt

unread,

Oct 25, 1994, 10:32:50 AM10/25/94

to

After re-reading my own post last nite, I noticed that I gave a less than
optimal algorithm for the "trailz()" function. Also, due to many questions,
I thought that I would give a "complete" description of updating the attack
bit-boards.

Why is leadz()/trailz() needed? remember that to compute the attack bit-board
for a bishop, there are 4 diagonals that radiate away from the bishop. For
starters, lets number the bits from left to right to match with a chess
board as follows:

bitboard: 56 57 58 59 60 61 62 63
48 49 50 51 52 53 54 55
40 41 42 43 44 45 46 47
32 33 34 35 36 37 38 39
24 25 26 27 28 29 30 31
16 17 18 19 20 21 22 23
8 9 10 11 12 13 14 15
0 1 2 3 4 5 6 7

chessboard: a8 .. .. .. .. .. .. h8
.. .. .. .. .. .. .. ..
.. .. .. .. .. .. .. ..
.. .. .. .. .. .. .. ..
.. .. .. .. .. .. .. ..
.. .. .. .. .. .. .. ..
.. .. .. .. .. .. .. ..
a1 .. .. .. .. .. .. h1

note: you can orient the chess board and bit board anyway you want, I like
the idea of square 00 matching with a1 for ease of remembering...

now, lets put a bishop on the chess board on square d4 (27). If the board
was set up so that there are no "blockers" (pieces or pawns of either side)
on the diagonals radiating from this square, the attack vector would look
like this:

bitboard: 0 0 0 0 0 0 0 1
1 0 0 0 0 0 1 0
0 1 0 0 0 1 0 0
0 0 1 0 1 0 0 0
0 0 0 0 0 0 0 0
0 0 1 0 1 0 0 0
0 1 0 0 0 1 0 0
1 0 0 0 0 0 1 0

A couple of notes first. Notice that the bit for the square where the
bishop is located is "0" since the bishop doesn't attack that square.
Note that the 4 diagonals away from the bishop's square are all 1's
while everything else is zero. This is the "default" bishop attack
board -- there are 64 of 'em initialized before the game starts since
they never change.

That was easy. Now lets add three "blockers, one on square 54 (g7) and
the other two on the same diagonal at squares 13 (f2) and 20 (e3). Before
we adjust the bitboard, notice that the diagonal radiating up to the right
should now have zeroes at bit 63 since the bishop can't attack beyond the
"blocker" at 54. Also, the diagonal radiating down to the right should
have zeroes at squares 13 (f2) and 6 (g1) since the bishop doesn't attack
past the blocker at 20 (e3).

Now to fix this attack board without doing any looping:

assume 4 sets of masks: mask_plus7dir[0..63] mask_plus9_dir[0-63]
mask_minus7dir[0..63] mask_minus9dir[0..63]. These masks are also
initialized as the start of execution since they never change. The
mask_plus9dir[0..63] is a mask that has ones only on the squares that
radiate up to the right from [square] (add 9 to a square number and you get
the next square in that direction.)

to fix the "plus9dir" diagonal (up to the right) we take the
mask_plus9_dir[27] which gives us the following:

bitboard: 0 0 0 0 0 0 0 1
0 0 0 0 0 0 1 0
0 0 0 0 0 1 0 0
0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

the occupied_squares bit board looks like this: (recall that we have a
bishop on d4 (27) and blockers on g7 (54), e3 (20) and f2 (13)

occupied_squares
bitboard: 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0
0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0

next, complement (bit-wise "not" operation) the occupied squares bit
board, then "and" that with the plus9dir diagonal above and you get the
following:

blocking_squares
bitboard: 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0
0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

Notice that the diagonal now has a zero where it is blocked. Now comes
the fun part. set every other bit in the bit-board to "1" except for the
bits on this diagonal (leave 'em alone) by or'ing this bit-board with the
complement of the mask_plus9dir[27] bitboard. that gives us the following:

blocking_squares
bitboard: 1 1 1 1 1 1 1 1
1 1 1 1 1 1 0 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1

Complementing this gives us:

blocking_squares
bitboard: 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

And, lo' and behold, we have a "1" on the square(s) that block this
diagonal, and zeroes everywhere else. using the leadz() function (which
I'll describe below) returns a 54 (which, of course, we already knew was
the last square a bishop can attack in this direction due to the blocker
on this square...) We use this, with one final pre-computed set of
masks clear_plus9dir[0..63][0..63] In this case, we use clear_plus9dir[27][54]
which, as you might guess, is a mask containing ones on all squares between 27
and 54 (only on the diagonal -- plus9dir of course). this mask looks like this:

mask_plus9dir[27]54]:
bitboard: 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0
0 0 0 0 0 1 0 0
0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

This now gives us the up-right diagonal of *real* bishop attacks. repeating
this for the other three and combining them gives us the complete set of *real*
bishop attacks.

now, on to the leadz() and trailz() routines. note that we want to find the
bit on the diagonal that is "closest" to the bishop since it is the *real*
blocker. For the up-right diagonal, leadz() will do this since it starts at
the left-most bit (square 00) and counts until it finds the first 1 bit, which
*must* be on this diagonal since we masked the rest off. Note that it will also
be the "first" blocker in that direction. However, when we do the right-down
diagonal, using a similar technique, leadz() would find the last blocker since
it would first find the blocker at (13). Here, we use trailz() which finds the
correct blocker at (20).

leadz() uses the hardware leading_zero instruction on the Cray, although
several of the newer micros have it as well. My implementation of trailz()
uses leadz() combined with a binary subtraction trick:

take the following (real short) binary number:

0000 1101 0000

where we want to find the bit position of the last one instead of the
first one (which leadz() would find as "4" since bits 0,1,2 and 3 are
zero.

if you subtract "1" from the above number, you will get the following:

0000 1100 1111

The reason lies in binary subtraction and "borrowing". In short, subtracting
1 from any binary number will turn the right-most "1" bit into a zero (due to
borrowing from it) and will turn all bits to the right of this bit into "1".

If you exclusive or these two values together:

0000 1101 0000
0000 1100 1111

you get:
0000 0001 1111

applying leadz() to this returns "7" which, as you notice is the bit
position of the last "1". In "C-macro-ese (C macro)" you get the
following on a Cray:

#define first_one(a) _leadz(a)
#define last_one(a) _leadz((a)^(a-1))

Note that I use functions "first_one()" to return the bit position of the
first "1" bit, and last_one() to return the bit position of the last "1".
Using a name like "trailz()" is confusing since it would not return the
number of trailing zeroes at all.

Hope this (long) description of dynamically computing a pieces attack
bitboard helps to clear things up. Note that after making a move, you
do the following:

1. clear attack[from] since the square is now empty.

2. do the just-described algorithm for the piece on [to] and set
attack[to] to the result.

3. from the [from] square, grab the "blockers" for all 8 directions. For
diagonals, if the blocker is a bishop/queen, it now attacks more squares since
the piece on square [from] was blocking it. update the correct diagonal only,
since only one diagonal of the piece is affected. For rooks/queens, do the
same but on ranks/files only.

4. for the [to] square, do the same again, but note that we are now reducing
the attacks for the blockers since we just blocked this square. Of course, if
we are capturing on [to] we omit this step since the square is still blocked
and nothing has changed.

Sounds like a lot of work. The major thing is that there are no loops at all,
and the work represents a lot less than that needed to recompute all of this
from scratch. Most of the attack vectors don't change from move to move, so
the overall cost is, at worst, at least as fast as the old way of generating
moves and, with some work is significantly faster. Of course, you gain in
a lot of other places as well, making the payoff get larger and larger as
you find more ways to use this attack information.

Vincent Diepeveen

unread,

Oct 26, 1994, 7:14:25 AM10/26/94

to

In <38j4ui$9...@pelham.cis.uab.edu> hy...@willis.cis.uab.edu (Robert Hyatt) writes:

>After re-reading my own post last nite, I noticed that I gave a less than
>optimal algorithm for the "trailz()" function. Also, due to many questions,
>I thought that I would give a "complete" description of updating the attack
>bit-boards.

>

> bitboard: 0 0 0 0 0 0 0 1
> 1 0 0 0 0 0 1 0
> 0 1 0 0 0 1 0 0
> 0 0 1 0 1 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 1 0 1 0 0 0
> 0 1 0 0 0 1 0 0
> 1 0 0 0 0 0 1 0

>That was easy. Now lets add three "blockers, one on square 54 (g7) and

>the other two on the same diagonal at squares 13 (f2) and 20 (e3). Before
>we adjust the bitboard, notice that the diagonal radiating up to the right
>should now have zeroes at bit 63 since the bishop can't attack beyond the
>"blocker" at 54. Also, the diagonal radiating down to the right should
>have zeroes at squares 13 (f2) and 6 (g1) since the bishop doesn't attack
>past the blocker at 20 (e3).

>Now to fix this attack board without doing any looping:

> occupied_squares

> bitboard: 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 1 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 1 0 0 0 0
> 0 0 0 0 1 0 0 0
> 0 0 0 0 0 1 0 0
> 0 0 0 0 0 0 0 0

>next, complement (bit-wise "not" operation) the occupied squares bit
>board, then "and" that with the plus9dir diagonal above and you get the
>following:

> blocking_squares

> bitboard: 1 1 1 1 1 1 1 1
> 1 1 1 1 1 1 0 1
> 1 1 1 1 1 1 1 1
> 1 1 1 1 1 1 1 1
> 1 1 1 1 1 1 1 1
> 1 1 1 1 1 1 1 1
> 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1

I don't see the use of the array above;
is a simple AND of the occupied_squares with the
bishopbitboard not enough to get the bitboard below?

> bitboard: 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 1 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0

Very fast, no doubt about that!
Of course, you need a 64 bit machine...

Suppose a their is a white bishop on d4 and a white queen on f6,
threatening mate on g7 and h8.

In my attacktable i want not only to see this bishop attacking e5
as you algorithm does,
but i also want to see it attack g7 (and h8), so that the programm
knows (in quiescencesearch) that it is possible to move the queen
to g7 and h8 (because this field is attacked twice and only covered 1 time
by the black king).

How do you do that?

>More if you have questions.

Is my question well described (knowing my English is very poor)?

Vincent.

Vincent Diepeveen
vdie...@cs.ruu.nl
--
+--------------------------------------+
|| email : vdie...@cs.ruu.nl ||
|| fidonet: 2:280/206.23 ||
+======================================+

David Hanley

unread,

Oct 26, 1994, 1:38:02 PM10/26/94

to

Kudos to My Hyatt for sharing this information! I know many
researchers jealously guard their innovations; it's really great to
see my Hyatt go out of his way to explain something that many of us
might onday use against his program. :) It's good to see the spirit
of pure science shining through!

------------------------------------------------------------------------------
| David James Hanley -- dha...@lac.eecs.uic.edu -- C++, OOD, martial arts |
| Laboratory for advanced computing | My employer barely KNOWS me. |
------------------------------------------------------------------------------
To the prettiest one.

Pascal BOURQUIN

unread,

Oct 28, 1994, 5:18:21 AM10/28/94

to

Robert Hyatt (hy...@willis.cis.uab.edu) wrote:

: #define first_one(a) _leadz(a)
: #define last_one(a) _leadz((a)^(a-1))

Is the following not faster ? :
#define last_one(a) _leadz((a)&(-a))
^^^^^
Thank you very much, Robert, for showing us your techniques.

Another question to the only Cray chess programmer :
Why are you so afraid of loops ?
Is it because Cray microprocessor(s ?) doesn't have good
"branch-prediction" algorithms ?

Robert Hyatt

unread,

Oct 28, 1994, 3:04:23 PM10/28/94

to

In article <38qfkt$o...@vishnu.jussieu.fr>,

Pascal BOURQUIN <p...@cao-vlsi.ibp.fr> wrote:
>Robert Hyatt (hy...@willis.cis.uab.edu) wrote:
>
>: #define first_one(a) _leadz(a)
>: #define last_one(a) _leadz((a)^(a-1))
>
>Is the following not faster ? :
> #define last_one(a) _leadz((a)&(-a))
> ^^^^^

I'll have to think about this. my first thought is that I don't see how
it can work: a=1 (0001 on a 4-bit hypothetical machine)
-a=-1 (1111 on the same machine)

If you and 'em you get the original a once more.

a=3 (0011)
-a=-3 (1101)
and and you get 1, leadz on that gets the right answer.

In any case, computing a-1 and -a take the same number of
operations on a Cray, so it does't matter. Did I overlook
something silly? (I'm having one hell of a time anding and
oring in my head now after all this bitboard stuff.) Last nite
the program played a wonderful move "a6" followed by the a-pawn
capturing on the "h" file... yeah, I couldn't shift, and and count
so good and fouled it up. easy to fix, hard to find...

>Thank you very much, Robert, for showing us your techniques.
>
>Another question to the only Cray chess programmer :
> Why are you so afraid of loops ?
> Is it because Cray microprocessor(s ?) doesn't have good
> "branch-prediction" algorithms ?

Not afraid, I just want to do things without them if possible. IE, a
couple of ands/ors rather than plodding down a diagonal. I am going to
go back and insert loops into the bitboard stuff so that each diagonal
update is one iteration of a loop... that will then get it to vectorize.

I have avoided all "searching" loops and, instead, use the famous leadz()
instruction at present. My intent is to accomplish with an and or two,
what used to take me a loop to do. When this happens (and it's happened
often) its a big win.

Current Cray Blitz (on a Sparc) is roughly 4-5 times faster than old
code, with *lots* of unnecessary nodes being searched at present. will
get better....

>
>###-------------------------------------------------------------------###
>| |
>| +---+ Pascal BOURQUIN |
>| / \ / \ |
>| / / \ Laboratoire MASI |
>| +---+ \ \ equipe CAO-VLSI |
>| \ \ +---+ |
>| \ / / |
>| \ / \ / Couloir 55-65, 2eme etage, Bureau 211 |
>| +---+ Universite Pierre et Marie Curie (P6) |
>| 4 place Jussieu, 75252 Paris Cedex 05 |
>| |
>| Tel : + 33 1 44 27 30 43 Fax : + 33 1 44 27 62 86 |
>| Telex: UPMCSIX200145F e-mail: Pascal....@masi.ibp.fr |
>| |
>###-------------------------------------------------------------------###