# Bit Board Bonkers??

438 views

### Dave

Jul 28, 1997, 3:00:00 AM7/28/97
to

I'm re-writing a chess program and would like to add bit boards for
move generation, etc.

Does this sound right:
12 boards for the six piece types x two colors. (for current
placement)
64 boards for (raw) (move to) from each possible square on the
board, for each piece. 64 x 12 = 768 total.
---
780 bit boards, absolute minimum?

I *like* swimming, but doing it in bit boards will be something new!
Does this sound right, or have I gone bit board bonkers? (and what a
nice alliteration that is!)

Thanks a lot, everyone!
Dave

P.S. Dave Ewart, if you read this, please email me re: 64-bit bit
boards on data types less than 64 bits. Yes, aspirin, please!

### Robert Hyatt

Jul 28, 1997, 3:00:00 AM7/28/97
to

Dave (co...@sprintmail.com) wrote:
: I'm re-writing a chess program and would like to add bit boards for
: move generation, etc.

: Does this sound right:
: 12 boards for the six piece types x two colors. (for current
: placement)

yes, plus WhitePieces (all white pieces, to avoid having to OR the
6 white bitmaps together all the time), BlackPieces (same idea for
black), BishopsQueens and RooksQueens (locations of *all* bishops/queens
[diagonal movers] and rooks/queens [rank/file movers]) which you will need
once you start making this go fast.

And, finally, a Occupied bitmap which is the OR of the 12 piece bitmaps.
Then, to make this go fast enough to be useful, you will want to use rotated
bitmaps to make move generation go very fast, and you need three more, one
rotating the Occupied bitmap 90 degrees to get file into adjacent bits, and
then one rotating right 45 degrees and left 45 degrees to get each diagonal

: 64 boards for (raw) (move to) from each possible square on the

: board, for each piece. 64 x 12 = 768 total.

Not quite sure what this is, but it looks like a bitmap with 1's set for
each type of piece for every valid move from that square. If you go the
rotated bitmap approach, these aren't all you need...
: ---

: 780 bit boards, absolute minimum?

: I *like* swimming, but doing it in bit boards will be something new!
: Does this sound right, or have I gone bit board bonkers? (and what a
: nice alliteration that is!)

it's interesting, and will take some time to get used to. I don't ever
plan on "going back" however. :)

: Thanks a lot, everyone!

### John Scalo

Jul 28, 1997, 3:00:00 AM7/28/97
to

I have considered taking this full out bit board approach (I already use
them in selected situations like tracking passed pawns), but I still can't
quite figure out - after you maintain all the piece locations in bit
boards, how does move generation get any faster?

Right now, I simply maintain the square locations of all pieces in a
sctructure called pieceLocations that's passed to just about every where.
FindLegalMoves() starts at bishops, checks diagonally up/right, up/left,
etc and continues to add those moves until it hits a piece or exceeds the
board borders. Of course, there's a little more to it but that's about it.
Then it continues with the other pieces and their associated move rules.

Now what about bit boards makes move generation faster? How much faster?

Thanks!

-j

In article <5ri7gq\$g...@juniper.cis.uab.edu>, hy...@crafty.cis.uab.edu
(Robert Hyatt) wrote:

> Dave (co...@sprintmail.com) wrote:
> : I'm re-writing a chess program and would like to add bit boards for
> : move generation, etc.
>
> : Does this sound right:
> : 12 boards for the six piece types x two colors. (for current
> : placement)
>
> yes, plus WhitePieces (all white pieces, to avoid having to OR the
> 6 white bitmaps together all the time), BlackPieces (same idea for
> black), BishopsQueens and RooksQueens (locations of *all* bishops/queens
> [diagonal movers] and rooks/queens [rank/file movers]) which you will need
> once you start making this go fast.
>
> And, finally, a Occupied bitmap which is the OR of the 12 piece bitmaps.
> Then, to make this go fast enough to be useful, you will want to use rotated
> bitmaps to make move generation go very fast, and you need three more, one
> rotating the Occupied bitmap 90 degrees to get file into adjacent bits, and
> then one rotating right 45 degrees and left 45 degrees to get each diagonal
>

--
john scalo
sca...@apple.com.removethis

### Dave

Jul 28, 1997, 3:00:00 AM7/28/97
to

On 28 Jul 1997 23:12:47 GMT, sca...@apple.com.removethis (John Scalo)
wrote:

>Now what about bit boards makes move generation faster? How much faster?

John,

Here's the scoop (as far as I know) about bit boards improving your
chess program. The improvements are:

1) The bit board approach allows the computer to use logical
operations to perform move generation. Because of the way CPU's are
made, these operations can run much faster than explicit code which
you would write. How much I don't know, I'm just getting started
implementing this approach myself.<g>

2) Fewer operations in total to generate the move. To quote from page
58 of "Chess Skill in Man and Machine":

"Consider generating all legal moves for a White Knight on KB3. First,
the CPU "fetches" a bit map from memory that has a bit set for each
position representing the squares to which a knight on KB3 could move.
In this schema, central memory would contain 64 bit maps representing
potential moves for a knight from each of the 64 squares on the board.
Secondly, the CPU would fetch a bit map representing the positions of
all White pieces presently on the board. The CPU would then negate
(change 1's to 0's and 0's to 1's) the White piece map and then "AND"
this new map (board=map), with the knight move map. The resulting bit
map would represent all pseudo legal moves for the knight. Notice
that move generation in this case involved two "fetch" instructions
and two "Boolean" (logical) instructions. The Shannon "mailbox"
procedure requires many more computer operations to determine the same
result."

Bit boards or bit maps were an improvement found by three top
programming teams, independently: (CHESS,(Northwestern Univ.), KAISSA,
(Russia), and Hans Berliner (Carnegie-Melon Univ.). All three at one
time, won the world chess computer championship. Enough said?

I'd be totally amazed if there was a commercial program, or even a
seriously competitive non-commercial program, which did *not* use bit
boards or maps. Besides basic move generation, bit boards can
represent more complex piece relationships, also.

In the only two comparisons I've seen, bit maps used one-half to
one-fifth (not one-fifth less, ONE-FIFTH!) the number of operations as
the old "mailbox" routines, to generate the same set of moves.

>I have considered taking this full out bit board approach (I already use
>them in selected situations like tracking passed pawns), but I still can't
>quite figure out - after you maintain all the piece locations in bit
>boards, how does move generation get any faster?
>
>Right now, I simply maintain the square locations of all pieces in a
>sctructure called pieceLocations that's passed to just about every where.
>FindLegalMoves() starts at bishops, checks diagonally up/right, up/left,
>etc and continues to add those moves until it hits a piece or exceeds the
>board borders. Of course, there's a little more to it but that's about it.
>Then it continues with the other pieces and their associated move rules.
>

You're still adding and subtracting offsets into the bishop's square
right? Bit maps have to be *much* faster!

Good luck, John! Love to see your finished product (not the code, just
the exe file.) Hope this helps.

Regards,
Dave

### Robert Hyatt

Jul 29, 1997, 3:00:00 AM7/29/97
to

John Scalo (sca...@apple.com.removethis) wrote:
: I have considered taking this full out bit board approach (I already use

: them in selected situations like tracking passed pawns), but I still can't
: quite figure out - after you maintain all the piece locations in bit
: boards, how does move generation get any faster?

: Right now, I simply maintain the square locations of all pieces in a
: sctructure called pieceLocations that's passed to just about every where.
: FindLegalMoves() starts at bishops, checks diagonally up/right, up/left,
: etc and continues to add those moves until it hits a piece or exceeds the
: board borders. Of course, there's a little more to it but that's about it.
: Then it continues with the other pieces and their associated move rules.

: Now what about bit boards makes move generation faster? How much faster?

Here's a clue:

think about generating rook moves along a rank only. If you know the square
of the rook, then there are only 128 different combinations of the remaining
pieces on the 7 other squares of that rank. If you take the entire 8 bits
for that rank, and index into a pre-computed set of attack entries (one
entry for each possible combination of occupied/unoccupied squares) your
move generation for this rank (both directions at once) becomes a simple
table lookup. Now if you could create a bitboard with squares on the
files in adjacent bits, you could take the 8 bits for a file, and do the
same sort of lookup, OR the two values together, and you've just generated
rook moves with *zero* loops of any kind. Do the same "rotation" trick for
the diagonals and these, too, become table lookups and nothing else...

: Thanks!

: -j

: In article <5ri7gq\$g...@juniper.cis.uab.edu>, hy...@crafty.cis.uab.edu

### Robert Hyatt

Jul 29, 1997, 3:00:00 AM7/29/97
to

Dave (co...@sprintmail.com) wrote:

: I'd be totally amazed if there was a commercial program, or even a

: seriously competitive non-commercial program, which did *not* use bit
: boards or maps. Besides basic move generation, bit boards can
: represent more complex piece relationships, also.

It turns out that there are only a very few "full-blown" bitmap
programs. chess 4.x, Crafty, Spector (Steven Edwards), ChessGuru (Joel Rivat)
and a couple of others that are experimenting. On 32bit architectures, it is
not always a "win". On 64bit machines it gets better. When we get the needed
boolean operators like population count and find first one (already on the
Cray, rumored to be coming on the alpha before long) it will be a win,
finally...

: In the only two comparisons I've seen, bit maps used one-half to

: one-fifth (not one-fifth less, ONE-FIFTH!) the number of operations as
: the old "mailbox" routines, to generate the same set of moves.

yes, but the problem is that there are two words/cycle to handle. New
super-scalar machines handle this very well, but it still doubles the number
of instructions needed, when a 32 bit architecture is used... We should start
seeing 64 bit PC's in 1999, when the P7 arrives. you can already buy an
alpha PC for under \$5,000...

### Dusan Dobes

Jul 29, 1997, 3:00:00 AM7/29/97
to

Dave (co...@sprintmail.com) wrote:

: 2) Fewer operations in total to generate the move. To quote from page

: 58 of "Chess Skill in Man and Machine":

: "Consider generating all legal moves for a White Knight on KB3. First,
: the CPU "fetches" a bit map from memory that has a bit set for each
: position representing the squares to which a knight on KB3 could move.
: In this schema, central memory would contain 64 bit maps representing
: potential moves for a knight from each of the 64 squares on the board.
: Secondly, the CPU would fetch a bit map representing the positions of
: all White pieces presently on the board. The CPU would then negate
: (change 1's to 0's and 0's to 1's) the White piece map and then "AND"
: this new map (board=map), with the knight move map. The resulting bit
: map would represent all pseudo legal moves for the knight. Notice
: that move generation in this case involved two "fetch" instructions
: and two "Boolean" (logical) instructions. The Shannon "mailbox"
: procedure requires many more computer operations to determine the same
: result."

I dont understand. I think move generation should generate data
for making/unmaking moves. How to make a single move from this?

: I'd be totally amazed if there was a commercial program, or even a

: seriously competitive non-commercial program, which did *not* use bit
: boards or maps.

Dusan

### David Eppstein

Jul 29, 1997, 3:00:00 AM7/29/97
to

hy...@crafty.cis.uab.edu (Robert Hyatt) writes:
> It turns out that there are only a very few "full-blown" bitmap
> programs. chess 4.x, Crafty, Spector (Steven Edwards), ChessGuru
> (Joel Rivat) and a couple of others that are experimenting.

My Fanorona program (http://www.ics.uci.edu/~eppstein/180a/projects/fanorona/)
is also completely bitboard based. I think bitboards are even more of a
win for games other than chess since you don't have to worry so much about
all the different kinds of pieces.

My program is in Java, which slows it way down compared to C/C++.
Also, it's written in a heavily object oriented style
(each node searched is two new objects, one for the node itself and one
for a move generator), which again makes it unnecessarily slow.
And yet, it still hits peaks of 32K nodes/second on my 200MHz PowerPC 603e.
I think bitboards should take the credit for that speed.

Java does have the advantage of standard built-in 64-bit long arithmetic.
Maybe it also has the advantage even for C/C++ programmers of providing
more pressure on processor designers to add 64-bit features.

> When we get the needed boolean operators like population count and
> find first one (already on the Cray, rumored to be coming on the alpha
> before long) it will be a win, finally...

I don't ever need find first one. I do use the standard x&~(x-1) trick
to find the last one...(that is, to find a bit representing the last one,
I don't ever need to know the number of that bit).

A one-instruction popcount would help, but it's not too hard to do without:

// count number of set bits in a word
static final long ONES = 0x5555555555555555L;
static final long TWOS = 0x3333333333333333L;
static final int FOURS = 0x0f0f0f0f;
public static final int count(long set) {
set = (set & ONES) + ((set >>> 1) & ONES);
set = (set & TWOS) + ((set >>> 2) & TWOS);
int result = (int) (set & 0xffffffff) + (int) (set >>> 32);
result = (result & FOURS) + ((result >>> 4) & FOURS);
return (result * 0x01010101) >>> 24;
}

(Further ideas for speeding up this subroutine would be gratefully accepted...)
--
David Eppstein UC Irvine Dept. of Information & Computer Science
epps...@ics.uci.edu http://www.ics.uci.edu/~eppstein/

### chrisw

Jul 29, 1997, 3:00:00 AM7/29/97
to

Robert Hyatt <hy...@crafty.cis.uab.edu> wrote in article
<5rkok8\$h...@juniper.cis.uab.edu>...
> Dave (co...@sprintmail.com) wrote:
>
> : I'd be totally amazed if there was a commercial program, or even a

> : seriously competitive non-commercial program, which did *not* use bit
> : boards or maps. Besides basic move generation, bit boards can
> : represent more complex piece relationships, also.
>

> It turns out that there are only a very few "full-blown" bitmap
> programs. chess 4.x, Crafty, Spector (Steven Edwards), ChessGuru (Joel
Rivat)

> and a couple of others that are experimenting. On 32bit architectures,
it is

> not always a "win". On 64bit machines it gets better. When we get the

needed
> boolean operators like population count and find first one (already on
the
> Cray, rumored to be coming on the alpha before long) it will be a win,
> finally...
>

> : In the only two comparisons I've seen, bit maps used one-half to

> : one-fifth (not one-fifth less, ONE-FIFTH!) the number of operations as
> : the old "mailbox" routines, to generate the same set of moves.
>

> yes, but the problem is that there are two words/cycle to handle. New
> super-scalar machines handle this very well, but it still doubles the
number
> of instructions needed, when a 32 bit architecture is used... We should
start
> seeing 64 bit PC's in 1999, when the P7 arrives.

And the PC chip already has find first one, if you count bsf and bsr, or if
that is what you meant.

What concerns me about bit boards, is suppose you're trying to do a static
exchange eval on a square.
Then you know from the all_attacks bit mask that a square is attacked, but
to get each attacker and value it requires 16 test operations ..........
? And another 16 for the defenders ...... And each one with a memory read
operation ... ?

Chris Whittington

### chrisw

Jul 29, 1997, 3:00:00 AM7/29/97
to

David Eppstein <epps...@euclid.ics.uci.edu> wrote in article
<5rle07\$h...@euclid.ics.uci.edu>...

> hy...@crafty.cis.uab.edu (Robert Hyatt) writes:
> > It turns out that there are only a very few "full-blown" bitmap
> > programs. chess 4.x, Crafty, Spector (Steven Edwards), ChessGuru
> > (Joel Rivat) and a couple of others that are experimenting.
>

> My Fanorona program
(http://www.ics.uci.edu/~eppstein/180a/projects/fanorona/)
> is also completely bitboard based. I think bitboards are even more of a
> win for games other than chess since you don't have to worry so much
> all the different kinds of pieces.
>
> My program is in Java, which slows it way down compared to C/C++.
> Also, it's written in a heavily object oriented style
> (each node searched is two new objects, one for the node itself and one
> for a move generator), which again makes it unnecessarily slow.
> And yet, it still hits peaks of 32K nodes/second on my 200MHz PowerPC
603e.
> I think bitboards should take the credit for that speed.
>
> Java does have the advantage of standard built-in 64-bit long arithmetic.
> Maybe it also has the advantage even for C/C++ programmers of providing
> more pressure on processor designers to add 64-bit features.
>

> > When we get the needed boolean operators like population count and
> > find first one (already on the Cray, rumored to be coming on the alpha
> > before long) it will be a win, finally...
>

> I don't ever need find first one. I do use the standard x&~(x-1) trick
> to find the last one...(that is, to find a bit representing the last one,
> I don't ever need to know the number of that bit).
>
> A one-instruction popcount would help, but it's not too hard to do
without:
>
> // count number of set bits in a word
> static final long ONES = 0x5555555555555555L;
> static final long TWOS = 0x3333333333333333L;
> static final int FOURS = 0x0f0f0f0f;
> public static final int count(long set) {
> set = (set & ONES) + ((set >>> 1) & ONES);
> set = (set & TWOS) + ((set >>> 2) & TWOS);
> int result = (int) (set & 0xffffffff) + (int) (set >>> 32);
> result = (result & FOURS) + ((result >>> 4) & FOURS);
> return (result * 0x01010101) >>> 24;
> }
>
> (Further ideas for speeding up this subroutine would be gratefully
accepted...)

Try:

mov edx,-1 ; pre-init the bit counter
loop:
inc edx ; inc the bit counter
mov eax,ecx ; ecx is the bit mask
neg eax
and eax,ecx ; eax is now the ls bit
xor ecx,eax ; ecx is the remainder
jnz loop ; loop until there's no remainder
;; exx is now the bit count

Convert to C 64 bit, et voila

{
count=-1;
do
{
count++;
return count;
}

Chris Whittington

### Robert Hyatt

Jul 29, 1997, 3:00:00 AM7/29/97
to

Dusan Dobes (do...@bart1.math.muni.cz) wrote:
: Dave (co...@sprintmail.com) wrote:

: : 2) Fewer operations in total to generate the move. To quote from page

: : 58 of "Chess Skill in Man and Machine":

: : "Consider generating all legal moves for a White Knight on KB3. First,
: : the CPU "fetches" a bit map from memory that has a bit set for each
: : position representing the squares to which a knight on KB3 could move.
: : In this schema, central memory would contain 64 bit maps representing
: : potential moves for a knight from each of the 64 squares on the board.
: : Secondly, the CPU would fetch a bit map representing the positions of
: : all White pieces presently on the board. The CPU would then negate
: : (change 1's to 0's and 0's to 1's) the White piece map and then "AND"
: : this new map (board=map), with the knight move map. The resulting bit
: : map would represent all pseudo legal moves for the knight. Notice
: : that move generation in this case involved two "fetch" instructions
: : and two "Boolean" (logical) instructions. The Shannon "mailbox"
: : procedure requires many more computer operations to determine the same
: : result."

: I dont understand. I think move generation should generate data

: for making/unmaking moves. How to make a single move from this?

the bitmap produced above is a set of "to" squares. You know the from
square. You use the FirstOne() function to get the number of a "to"
square and you have a complete from/to move... you clear this bit and
then FirstOne() returns the next "to" square...

: : I'd be totally amazed if there was a commercial program, or even a

: : seriously competitive non-commercial program, which did *not* use bit
: : boards or maps.

No. It uses a move table. Not a bitmapper at all.

: Dusan

### David Eppstein

Jul 29, 1997, 3:00:00 AM7/29/97
to

I wrote in rgcc:

> // count number of set bits in a word
> static final long ONES = 0x5555555555555555L;
> static final long TWOS = 0x3333333333333333L;
> static final int FOURS = 0x0f0f0f0f;
> public static final int count(long set) {
> set = (set & ONES) + ((set >>> 1) & ONES);
> set = (set & TWOS) + ((set >>> 2) & TWOS);
> int result = (int) (set & 0xffffffff) + (int) (set >>> 32);
> result = (result & FOURS) + ((result >>> 4) & FOURS);
> return (result * 0x01010101) >>> 24;
> }

and asked for further improvements, to which Robert Hyatt and Chris
Whittington both suggested replacing it with a simple loop removing one
bit at a time -- e.g. Hyatt's code

int PopCnt(register BITBOARD a)
{
register int c=0;

while(a) {
c++;
a &= a - 1;
}
return(c);
}

Being too stubborn to accept the word of even these eminent authorities,
I decided to try putting it to a test. Over the course of a typical
game, I am getting an average of four nonzero bits per call to count(),
of which maybe 20% are zeros and the average of the rest is five
bits/call. Trying inputs with these statistics in the version of Java
I'm using (Apple MRJ 1.5B1, a reasonably fast JIT, on a 200MHz PowerPC
603e) gives the following rough results (averaged over 100000 calls to
each subroutine):

0.7 microseconds/call for my code
(with or without replacing the final multiply by shifts
and adds, and with or without an initial test for zero).
1.1 microseconds/call for Hyatt's code
(translated to Java with BITBOARD=64 bit long)
1.0 microseconds for code similar to Hyatt's,
but with two separate loops for the two separate
32-bit halves of the word.

I didn't try Hyatt's other suggestion, of pulling a byte at a time into
a lookup table, because the number of bits/call seemed too small to make
this worthwhile.

Conclusion: I'll stick with my version until someone else comes up with
something better (or until my Java environment changes enough to make
the times above significantly different).

### Robert Hyatt

Jul 30, 1997, 3:00:00 AM7/30/97
to

chrisw (chr...@cpsoft.demon.co.uk) wrote:

: Robert Hyatt <hy...@crafty.cis.uab.edu> wrote in article
: <5rkok8\$h...@juniper.cis.uab.edu>...
: > Dave (co...@sprintmail.com) wrote:
: >

: > : I'd be totally amazed if there was a commercial program, or even a

: > : seriously competitive non-commercial program, which did *not* use bit
: > : boards or maps. Besides basic move generation, bit boards can
: > : represent more complex piece relationships, also.

: >
: > It turns out that there are only a very few "full-blown" bitmap

: > programs. chess 4.x, Crafty, Spector (Steven Edwards), ChessGuru (Joel
: Rivat)

: > and a couple of others that are experimenting. On 32bit architectures,
: it is
: > not always a "win". On 64bit machines it gets better. When we get the

: needed
: > boolean operators like population count and find first one (already on
: the
: > Cray, rumored to be coming on the alpha before long) it will be a win,
: > finally...

: >
: > : In the only two comparisons I've seen, bit maps used one-half to

: > : one-fifth (not one-fifth less, ONE-FIFTH!) the number of operations as
: > : the old "mailbox" routines, to generate the same set of moves.

: >
: > yes, but the problem is that there are two words/cycle to handle. New

: > super-scalar machines handle this very well, but it still doubles the
: number
: > of instructions needed, when a 32 bit architecture is used... We should
: start
: > seeing 64 bit PC's in 1999, when the P7 arrives.

: And the PC chip already has find first one, if you count bsf and bsr, or if
: that is what you meant.

: What concerns me about bit boards, is suppose you're trying to do a static
: exchange eval on a square.
: Then you know from the all_attacks bit mask that a square is attacked, but
: to get each attacker and value it requires 16 test operations ..........
: ? And another 16 for the defenders ...... And each one with a memory read
: operation ... ?

It's not that many... only 6 per side. Take the bitmap of all pieces
attacking square X, then to find the next white attacker, and this with
white pawns, if non-zero, assume a pawn capture and break; if not, and
with white knights, if non-zero, break. etc... for the 6 white piece
bitmaps. If you get thru the king bitmap, there are no more white
attackers and you can minimax what you have. Once you get a white
attacker, do the same for the black pieces. This all falls right into
cache, and each pass runs like heck. Remember that I use SEE for the
qsearch sorting, as well as for the capture moves in the regular search,
and the Swap() function is generally less than 10% of the total execution
time, even though I use it in places in the eval, as well as in some of
the extension tests...

: Chris Whittington

: >

### Robert Hyatt

Jul 30, 1997, 3:00:00 AM7/30/97
to

chrisw (chr...@cpsoft.demon.co.uk) wrote:

: David Eppstein <epps...@euclid.ics.uci.edu> wrote in article

: <5rle07\$h...@euclid.ics.uci.edu>...
: > hy...@crafty.cis.uab.edu (Robert Hyatt) writes:

: > > It turns out that there are only a very few "full-blown" bitmap
: > > programs. chess 4.x, Crafty, Spector (Steven Edwards), ChessGuru
: > > (Joel Rivat) and a couple of others that are experimenting.

: >
: > My Fanorona program

: (http://www.ics.uci.edu/~eppstein/180a/projects/fanorona/)
: > is also completely bitboard based. I think bitboards are even more of a
: > win for games other than chess since you don't have to worry so much
: > all the different kinds of pieces.
: >
: > My program is in Java, which slows it way down compared to C/C++.
: > Also, it's written in a heavily object oriented style
: > (each node searched is two new objects, one for the node itself and one
: > for a move generator), which again makes it unnecessarily slow.
: > And yet, it still hits peaks of 32K nodes/second on my 200MHz PowerPC
: 603e.
: > I think bitboards should take the credit for that speed.
: >
: > Java does have the advantage of standard built-in 64-bit long arithmetic.
: > Maybe it also has the advantage even for C/C++ programmers of providing
: > more pressure on processor designers to add 64-bit features.

: >
: > > When we get the needed boolean operators like population count and

: > > find first one (already on the Cray, rumored to be coming on the alpha
: > > before long) it will be a win, finally...
: >

: > I don't ever need find first one. I do use the standard x&~(x-1) trick

: > to find the last one...(that is, to find a bit representing the last one,
: > I don't ever need to know the number of that bit).
: >
: > A one-instruction popcount would help, but it's not too hard to do
: without:

: >
: > // count number of set bits in a word

: > static final long ONES = 0x5555555555555555L;
: > static final long TWOS = 0x3333333333333333L;
: > static final int FOURS = 0x0f0f0f0f;
: > public static final int count(long set) {
: > set = (set & ONES) + ((set >>> 1) & ONES);
: > set = (set & TWOS) + ((set >>> 2) & TWOS);
: > int result = (int) (set & 0xffffffff) + (int) (set >>> 32);
: > result = (result & FOURS) + ((result >>> 4) & FOURS);
: > return (result * 0x01010101) >>> 24;

: > }
: >
: > (Further ideas for speeding up this subroutine would be gratefully
: accepted...)

: Try:

: mov edx,-1 ; pre-init the bit counter
: loop:
: inc edx ; inc the bit counter
: mov eax,ecx ; ecx is the bit mask
: neg eax
: and eax,ecx ; eax is now the ls bit
: xor ecx,eax ; ecx is the remainder
: jnz loop ; loop until there's no remainder
: ;; exx is now the bit count

: Convert to C 64 bit, et voila

: {
: count=-1;
: do
: {
: count++;
: return count;
: }

: Chris Whittington

: > --

: > David Eppstein UC Irvine Dept. of Information & Computer Science
: > epps...@ics.uci.edu http://www.ics.uci.edu/~eppstein/

: >

That works and I use the following in Crafty, which seems to be even
faster when you count the operations: The a&=a-1 seems to be a little
faster than a&=a^(a&-a)...

int PopCnt(register BITBOARD a)
{
register int c=0;

while(a) {
c++;
a &= a - 1;
}
return(c);
}

One caveat is that both are efficient for small numbers of 1-bits set,
while for larger numbers, a 4x table lookup would be faster. In Crafty,
I don't count bits very often. For mobility, I the same rotated business
I do for generating moves, but rather than retrieving a 64bit attacked
squares mask, I retrieve a pre-computed byte with the number of squares
attacked, which saves the PopCnt() when it would likely have to count a
large number of bits...

### brucemo

Jul 30, 1997, 3:00:00 AM7/30/97
to

Dave wrote:

> 1) The bit board approach allows the computer to use logical
> operations to perform move generation. Because of the way CPU's are
> made, these operations can run much faster than explicit code which
> you would write. How much I don't know, I'm just getting started
> implementing this approach myself.<g>

Bitboards are fine, as evidenced by Crafty. But they aren't a panacea.

> 2) Fewer operations in total to generate the move. To quote from page
> 58 of "Chess Skill in Man and Machine":
>
> "Consider generating all legal moves for a White Knight on KB3. First,
> the CPU "fetches" a bit map from memory that has a bit set for each
> position representing the squares to which a knight on KB3 could move.
> In this schema, central memory would contain 64 bit maps representing
> potential moves for a knight from each of the 64 squares on the board.
> Secondly, the CPU would fetch a bit map representing the positions of
> all White pieces presently on the board. The CPU would then negate
> (change 1's to 0's and 0's to 1's) the White piece map and then "AND"
> this new map (board=map), with the knight move map. The resulting bit
> map would represent all pseudo legal moves for the knight. Notice
> that move generation in this case involved two "fetch" instructions
> and two "Boolean" (logical) instructions. The Shannon "mailbox"
> procedure requires many more computer operations to determine the same
> result."

That book is extremely old. It is worth reading, but what worked best
then doesn't *necessarily* work best now.

Note also that the example involves a knight, which is the absolute best
case for this technique, it's not a sliding piece. If you discuss bishops
there is obviously a hidden cost because the bishop's vectors have to be
up to date or you will jump pieces and play 1. e4 c5 2. Bb5#

I like Crafty's future because a bitboard is 64-bits, and 64-bit will be
everywhere in a few years. Being able to operate on a bitboard with one
instruction has got to be an advantage.

> Bit boards or bit maps were an improvement found by three top
> programming teams, independently: (CHESS,(Northwestern Univ.), KAISSA,
> (Russia), and Hans Berliner (Carnegie-Melon Univ.). All three at one
> time, won the world chess computer championship. Enough said?

Fine programs all.

> I'd be totally amazed if there was a commercial program, or even a
> seriously competitive non-commercial program, which did *not* use bit
> boards or maps. Besides basic move generation, bit boards can
> represent more complex piece relationships, also.

Mine is a damned seriously competitive non-commercial program. It
finished well in the last two WMCCC's. It does have a data structure in
it which is an array of bits, and is used to store piece locations and
other info, but it is not used in any of the ways that Crafty uses it.

I doubt that "all" of the professional micros generate moves using
bitboards. I have no confirmation that *any* of them do this.

> In the only two comparisons I've seen, bit maps used one-half to
> one-fifth (not one-fifth less, ONE-FIFTH!) the number of operations as
> the old "mailbox" routines, to generate the same set of moves.

You lose some, if not all, if not more than all, of the time back when you
are doing makemoves.

I'm not arguing against bitboards. I don't have any religious bias
against them, and I have great respect for Crafty in particular.

But there is more than one good way to do this.

bruce

### Randy Baker

Jul 30, 1997, 3:00:00 AM7/30/97
to

One tidbit in bitboards' favor is that they do make certain types of
knowledge easier to implement.

I recall a recent post (by Bob Hyatt I believe) in which he indicated that
bitboards could be used to determine whether the enemy King could catch a
speeding passed pawn in virtually a single instruction.

Of course, there are probably counterexamples of knowledge which is harder
to implement in bitboards... 8-)

--
Randy Baker (remove Z from address in email replies)

### brucemo

Jul 30, 1997, 3:00:00 AM7/30/97
to

chrisw wrote:

> David Eppstein <epps...@euclid.ics.uci.edu> wrote in article

> > // count number of set bits in a word

> > static final long ONES = 0x5555555555555555L;
> > static final long TWOS = 0x3333333333333333L;
> > static final int FOURS = 0x0f0f0f0f;
> > public static final int count(long set) {
> > set = (set & ONES) + ((set >>> 1) & ONES);
> > set = (set & TWOS) + ((set >>> 2) & TWOS);
> > int result = (int) (set & 0xffffffff) + (int) (set >>> 32);
> > result = (result & FOURS) + ((result >>> 4) & FOURS);
> > return (result * 0x01010101) >>> 24;
> > }
> >
> > (Further ideas for speeding up this subroutine would be gratefully
> accepted...)

I don't know what "final" means, and I don't know what ">>>" is. Are these
any sort of standard C/C++? And do you really mean to use a multiply in your
return statement? Ouch.

Something else that might speed this up is to use constant values for your
ONES and TWOS and FOURS. Some compilers will put those into static memory
anyway, but you might as well let the compiler decide, rather than forcing it

There is a lot of code going on here, and even though it isn't iterating it
still might be slower than an iterative solution, on a specific architecture.

If you test this, please let us know how you tested it and what your result
was.

> mov edx,-1 ; pre-init the bit counter
> loop:
> inc edx ; inc the bit counter
> mov eax,ecx ; ecx is the bit mask
> neg eax
> and eax,ecx ; eax is now the ls bit
> xor ecx,eax ; ecx is the remainder
> jnz loop ; loop until there's no remainder
> ;; exx is now the bit count
>
> Convert to C 64 bit, et voila
>
> {
> count=-1;
> do
> {
> count++;
> return count;
> }

This seems like more code than other iterative solutions that I have seen.

{
int count;

for (count = 0; bit_mask; count++)
return count;
}

Doesn't this do the job more efficiently?

bruce

### brucemo

Jul 30, 1997, 3:00:00 AM7/30/97
to

Randy Baker wrote:
>
> One tidbit in bitboards' favor is that they do make certain types of
> knowledge easier to implement.

This is right. You can do a couple of operations and know that you have a
rook on the 7th, and whether you are in the favorable cases where the enemy
king is stuck on the 8th with no pawns on the 7th (absolute 7th), or the
other favorable case wherein there are enemy pawns on the 7th (pawn-eating
case). You could also check to see if you had two rooks on the 7th and
award an absolutely huge bonus for this.

When applied like this, I am not sure it is possible to make this bonus too
large, more often than not it shifts the result of the game one column in

> I recall a recent post (by Bob Hyatt I believe) in which he indicated that
> bitboards could be used to determine whether the enemy King could catch a
> speeding passed pawn in virtually a single instruction.

This isn't a case that he gets completely for free. If he has a passed pawn
bitmap he can determine if there are any pawns the king can't reach if it is
on a given square with the given side to move, but this passed pawn bitmap
has to come from somewhere, it either needs to be calculated (oof) or hashed
and retrieved (less oof). Also, it isn't completely accurate, so this could
almost be termed a speculative bonus.

bruce

### Robert Hyatt

Jul 31, 1997, 3:00:00 AM7/31/97
to

brucemo (bru...@seanet.com) wrote:

: bruce

What I do is compute these on start-up. The only mistake it makes is
blocked squares... IE the king is on a diagonal that goes to the queening
square "just in time". Since the diagonal is a "critical path" (there are
no alternative squares to reach the pawn in time) if any square is occupied,
or is attacked by the opponent, I get the wrong answer. I've seen it
happen a couple of times. Fortunately, it is not hard to correct, although
I have not done so yet...

I have about as many pre-computed masks as Carter has little pills now,
and the only drawback to them is memory bandwidth, once again. I have to
fetch these masks to do the and operations for the tests, and the fetch is
about 10x slower than the tests. And there are enough of them that many
don't hang around in cache too long either, except for the move generation
masks which are hit on pretty frequently...

### Dave

Jul 31, 1997, 3:00:00 AM7/31/97
to

On Wed, 30 Jul 1997 10:16:40 -0700, brucemo <bru...@seanet.com>
wrote:

>Dave wrote:
>
>>. . . .

>> I'd be totally amazed if there was a commercial program, or even a
>> seriously competitive non-commercial program, which did *not* use bit
>> boards or maps. Besides basic move generation, bit boards can
>> represent more complex piece relationships, also.
>

>Bruce replied (in part)...

>Mine is a damned seriously competitive non-commercial program. It
>finished well in the last two WMCCC's. It does have a data structure in
>it which is an array of bits, and is used to store piece locations and
>other info, but it is not used in any of the ways that Crafty uses it.
>

Congratulations, Bruce. I'm not familiar with your program, but it
sounds like a *very* strong program. What is the name of your program?

When you state that it "does have a data structure in it which is an
array of bits used to store piece locations and other info", that is
*precisely* what I call a bit board.

I said absolutely *nothing* about using them to the extent or in the
same way as Crafty. I have no concept of how Crafty uses bit boards.

>I doubt that "all" of the professional micros generate moves using
>bitboards. I have no confirmation that *any* of them do this.

If you weren't so scruffy, Bruce, they might *tell* you.<g>

>> In the only two comparisons I've seen, bit maps used one-half to
>> one-fifth (not one-fifth less, ONE-FIFTH!) the number of operations as
>> the old "mailbox" routines, to generate the same set of moves.
>

>You lose some, if not all, if not more than all, of the time back when you
>are doing makemoves.

Absolutely right. Updating the boards, etc., takes time.

>I'm not arguing against bitboards. I don't have any religious bias
>against them, and I have great respect for Crafty in particular.
>

That's good, since you use them in your program!

>But there is more than one good way to do this.

Always is, I just don't know of any strong chess program that doesn't
use bit boards to some extent. I have received email about a program
that uses a differential "table" to generate moves, but it only plays
on machines with Linux, so I have no idea what the real strength of
that program would be.

If anyone knows of such a program (that uses no bit boards), I would
certainly like to see it's move generation described in the NG in
enough detail to understand it's concept, and also post up it's
rating, if possible.

Regards,
Dave

### Dave Gomboc

Jul 31, 1997, 3:00:00 AM7/31/97
to

Bruce, the code is Java, not C++. Boot up the Visual J++ 1.1. :-)

In article <33DFA6...@seanet.com>, brucemo <bru...@seanet.com> wrote:
#chrisw wrote:
#
#> David Eppstein <epps...@euclid.ics.uci.edu> wrote in article
#
#> > // count number of set bits in a word
#> > static final long ONES = 0x5555555555555555L;
#> > static final long TWOS = 0x3333333333333333L;
#> > static final int FOURS = 0x0f0f0f0f;
#> > public static final int count(long set) {
#> > set = (set & ONES) + ((set >>> 1) & ONES);
#> > set = (set & TWOS) + ((set >>> 2) & TWOS);
#> > int result = (int) (set & 0xffffffff) + (int) (set >>> 32);
#> > result = (result & FOURS) + ((result >>> 4) & FOURS);
#> > return (result * 0x01010101) >>> 24;
#> > }
#> >
#> > (Further ideas for speeding up this subroutine would be gratefully
#> accepted...)
#
#I don't know what "final" means, and I don't know what ">>>" is. Are these
#any sort of standard C/C++? And do you really mean to use a multiply in your
#return statement? Ouch.
#
#Something else that might speed this up is to use constant values for your
#ONES and TWOS and FOURS. Some compilers will put those into static memory
#anyway, but you might as well let the compiler decide, rather than forcing it
#
#There is a lot of code going on here, and even though it isn't iterating it
#still might be slower than an iterative solution, on a specific architecture.
#
#If you test this, please let us know how you tested it and what your result
#was.
#
#> mov edx,-1 ; pre-init the bit counter
#> loop:
#> inc edx ; inc the bit counter
#> mov eax,ecx ; ecx is the bit mask
#> neg eax
#> and eax,ecx ; eax is now the ls bit
#> xor ecx,eax ; ecx is the remainder
#> jnz loop ; loop until there's no remainder
#> ;; exx is now the bit count
#>
#> Convert to C 64 bit, et voila
#>
#> {
#> count=-1;
#> do
#> {
#> count++;
#> return count;
#> }
#
#This seems like more code than other iterative solutions that I have seen.
#
#{
# int count;
#
# for (count = 0; bit_mask; count++)
# return count;
#}
#
#Doesn't this do the job more efficiently?
#
#bruce

### chrisw

Jul 31, 1997, 3:00:00 AM7/31/97
to

Robert Hyatt <hy...@crafty.cis.uab.edu> wrote in article

<5rop95\$4...@juniper.cis.uab.edu>...

I implemented this test a few months ago. It is worth doing.

Chris Whittington

### Steven J. Edwards

Aug 1, 1997, 3:00:00 AM8/1/97
to

hy...@crafty.cis.uab.edu (Robert Hyatt) writes:

>Dave (co...@sprintmail.com) wrote:

>: I'd be totally amazed if there was a commercial program, or even a

>: seriously competitive non-commercial program, which did *not* use bit
>: boards or maps. Besides basic move generation, bit boards can
>: represent more complex piece relationships, also.

>It turns out that there are only a very few "full-blown" bitmap

>programs. chess 4.x, Crafty, Spector (Steven Edwards), ChessGuru (Joel Rivat)

>and a couple of others that are experimenting. On 32bit architectures, it is

>not always a "win". On 64bit machines it gets better. When we get the needed

>boolean operators like population count and find first one (already on the
>Cray, rumored to be coming on the alpha before long) it will be a win,
>finally...

For those with access to a good Common Lisp environment (I use
Macintosh Common Lisp 4.1), the Chess In Lisp project has a full bit
board implementation. The gzip tar archive file is CILv00.tar.gz and
can be found in the pub/chess/uploads/projects directory at the
chess.onenet.net ftp site.

-- Steven (s...@mv.mv.com)

### brucemo

Aug 2, 1997, 3:00:00 AM8/2/97
to

Dave Gomboc wrote:
>
> Bruce, the code is Java, not C++. Boot up the Visual J++ 1.1. :-)

My mistake.

Yuk :-)

bruce

### brucemo

Aug 2, 1997, 3:00:00 AM8/2/97
to

Dave wrote:
>
> On Wed, 30 Jul 1997 10:16:40 -0700, brucemo <bru...@seanet.com>
> wrote:

> >Bruce replied (in part)...
> >Mine is a damned seriously competitive non-commercial program. It
> >finished well in the last two WMCCC's. It does have a data structure in
> >it which is an array of bits, and is used to store piece locations and
> >other info, but it is not used in any of the ways that Crafty uses it.
> >
> Congratulations, Bruce. I'm not familiar with your program, but it
> sounds like a *very* strong program. What is the name of your program?

Ferret.

> When you state that it "does have a data structure in it which is an
> array of bits used to store piece locations and other info", that is
> *precisely* what I call a bit board.
>
> I said absolutely *nothing* about using them to the extent or in the
> same way as Crafty. I have no concept of how Crafty uses bit boards.

An array of 64-bits as related to a chessboard is called a bitboard or a
bitmap, I guess. Some programs use these very extensively in time-critical
areas, in order to perform functions that can, however, be performed in other
ways.

When we discuss bitboards here, we are usually speaking about those
functions:

move generation
board management
static exchange evaluation
in-check detection
detection of positional features involving pieces (not necessarily pawns
only)

My program doesn't use bitboards to do any of this.

> >But there is more than one good way to do this.
>
> Always is, I just don't know of any strong chess program that doesn't
> use bit boards to some extent. I have received email about a program
> that uses a differential "table" to generate moves, but it only plays
> on machines with Linux, so I have no idea what the real strength of
> that program would be.

The one that did well in the WMCCC two years ago used them a lot less, and
may not have used them at all. I do not remember.

> If anyone knows of such a program (that uses no bit boards), I would
> certainly like to see it's move generation described in the NG in
> enough detail to understand it's concept, and also post up it's
> rating, if possible.

A move generation process that uses no bitboards is found in GnuChess. You
have an initialization pass at the start of the program that sets up a big
table. There is a list of elements for each type of piece on each possible
square. When you want to generate moves for a piece, you go to the
appropriate location in the table. At that location you find a pointer to
the square you are trying to move to, a pointer to the next table element
assuming that that square was empty, and a pointer to the next table element
assuming that the square had something on it. You just rip through the table
checking squares, adding moves to your candidate move list as required, and
following the appropriate pointer until you hit a null.

You can also use the technique that has been referred to in here as 0x88.
Professional programs have use this in the past, I am certain of this.

I'm not saying that any of these techniques is the best way to do it, but
they are viable ways to do it, the bitboard approach doesn't
obviously out-strip all others.

bruce

### Dave

Aug 3, 1997, 3:00:00 AM8/3/97
to

On Sat, 02 Aug 1997 10:39:34 -0700, brucemo <bru...@seanet.com>
wrote:

>
>> >Bruce replied (in part)...
> . . . <*All Kinds of Good Stuff Snipped from Here***>

>I'm not saying that any of these techniques is the best way to do it, but
>they are viable ways to do it, the bitboard approach doesn't
>obviously out-strip all others.
>
>bruce

Thanks for the great post, Bruce. Appreciated the information very
much.

Didn't mean to try your patience *quite* so much.

Dave

### David Eppstein

Aug 8, 1997, 3:00:00 AM8/8/97
to

>: Both routines give correct results! However, even with the
>: multiplication, Davids strange Routine is 3 to 5 times faster!!!
>: Probably we all underestimated the costs of the loop.
>: Elisabeth

to which Bob Hyatt replied
> You have to do the test "right" to compare them. IE, do you have
> lots of 1 bits, or only a couple? If only a couple, my loop is
> probably faster. The odd "add/shift" approach in the other algorithm
> is well-known and has been used on the alpha where 64 bits make this
> even harder.

You should try collecting statistics on how many bits really are set and
then use them to do your benchmarks. In my program I found an average
of four bits/test which was enough for my loopless code to be faster by
maybe 50% (rather smaller win than Elisabeth's results). It may even be
worth it to have two versions for different parts of your program.

Also, you should take note that some compilers/optimizers will only
inline loopless functions.

There is one small further improvement I've found: replace

>: >> set = (set & ONES) + ((set >>> 1) & ONES);

with
set -= (set >>> 1) & ONES;

(For you C/C++ types, replace the >>> with unsigned >>. And note that in
Java "long" is 64 bits, so my code is already 64-bit not 32.)

### Robert Hyatt

Aug 8, 1997, 3:00:00 AM8/8/97
to

David Eppstein (epps...@euclid.ics.uci.edu) wrote:
: >: Both routines give correct results! However, even with the
: >: multiplication, Davids strange Routine is 3 to 5 times faster!!!
: >: Probably we all underestimated the costs of the loop.
: >: Elisabeth

: to which Bob Hyatt replied
: > You have to do the test "right" to compare them. IE, do you have
: > lots of 1 bits, or only a couple? If only a couple, my loop is
: > probably faster. The odd "add/shift" approach in the other algorithm
: > is well-known and has been used on the alpha where 64 bits make this
: > even harder.

: You should try collecting statistics on how many bits really are set and
: then use them to do your benchmarks. In my program I found an average
: of four bits/test which was enough for my loopless code to be faster by
: maybe 50% (rather smaller win than Elisabeth's results). It may even be
: worth it to have two versions for different parts of your program.

I did this check, in fact, a while back. For me (Crafty) the most common
answers are 0, 1 or 2. For my mobility, I don't count bits, I precompute
the "counts" and do a single 1-byte table lookup for a sliding piece...

I used to have two PopCnt() functions, one to use the loop, when I knew
that only a couple of bits were set, and one using a table lookup. In
doing *lots* of testing, the loop program was better, but note *in crafty*.
Depending on how you use the code, it might be better to use the other
variationsof population count. By the way, there are some truly remarkable
algorithms to do this, apparently developed by the crypto guys. Joel Rivat
(ChessGuru) and I tried at least a half-dozen different ways to do this way
back when we started using bitmaps...

### Andrew Tridgell

Aug 9, 1997, 3:00:00 AM8/9/97
to

Dave wrote:
> ...

> Always is, I just don't know of any strong chess program that doesn't
> use bit boards to some extent. I have received email about a program
> that uses a differential "table" to generate moves, but it only plays
> on machines with Linux, so I have no idea what the real strength of
> that program would be.

I wonder if you are referring to KightCap? KnightCap was developed
under Linux but runs on any unix system. I don't do windows :-)

> If anyone knows of such a program (that uses no bit boards), I would
> certainly like to see it's move generation described in the NG in
> enough detail to understand it's concept, and also post up it's
> rating, if possible.

Personally I don't like bitboards much. In KnightCap I use piece
bitlists
instead. The name isn't as catchy as bitboards but I personally think

The basic idea of bitboards is to manipulate 64 bit numbers that
represent squares that can be operated on for a particular piece. The
basic
idea of piece bitlists is to manipulate 32 bit numbers that represent
pieces that can be operated on for particular squares.

Now for some more detail.

The way bit piece lists work is that each square on the board has
associated with it a 32 bit number. Each bit represents a piece.
Which piece corresponds to which bit is fixed at startup. For example
I use the first bit to represent the white king, the 2nd bit for the
white queen etc. The first 16 bits are whites pieces and the 2nd 16
bits are blacks. Its handy that there are 32 pieces in chess :-)

If the bit for a piece is set on a particular square then it means
that the piece is attacking that square. It doesn't matter what is on
the square, pieces are allowed to "attack" pieces of the same colour.

Bits are not set for castling or pawn pushes. Those are handled
separately.

Given the above representation the move generator is trivial and
_very_ fast. You just loop over all squares and on each square
you immediately have available to you the list (in bitlist format)
of pieces that can get to that square. Putting this into a normal
"list of moves" format is trivial if you want that format. You also
have to add pawn pushes and castling, but that is certainly very
easy and doesn't cost much.

The only other overhead you have with this method is the routine that
updates the bitlists after each move. This is done incrementally and
is very fast, although you do have to do some careful thinking to get
the code right. You will know you have it right when your update routine
only explicitly recalculates the available moves for one piece after
each move. Essentially this update routine is doing all the work of
your move generator, giving you an incremental move generator.
The speedup comes from the fact that you only have to generate one
pieces moves per move, instead of generating all pieces moves.

So why do I bother? Once you have bitlists you will start to realise
how useful they are. For example, say you wish to know if there
if a white rook is connected to another rook of the same color.

/* the rook is connected to another rook */

}

or say you want to test to see if the white king is in check:

/* we are in check */

}

or if this pawn is supported by another pawn:

/* pawn is supported */
}

The masks in the above examples are:

there are hundreds of similar things you can do with this. Some of the
things I do in KnightCap using these structures include:

- work out what pieces are pinned, and what they are pinned to
- work out what pieces are hung
- work out what pieces are trapped (ie. they have no safe movements)

These things all involve more than a simple & operation, but without
the bitlists (+ some static tables) it would be incredibly slow.

Another really big use for bitlists is in the quiesce search. You can
list all captures very fast by looping over the piece lists
and looking for pieces that are threatened by enemy pieces (a &
operation).

So, in asnwer to your question there are definately other ways
of doing this stuff than bitboards. There are probably dozens
of good ways. Bitboards take advantage of the fact that there
are 64 squares on the board. Bitlists take advantage of the fact that
there are at most 32 pieces. I'm sure other systems take advantage
of other nice aspects of chess that suit modern computers.

I doubt you will find any strong correlation between the rating
of a program and its internal representation.

Cheers, Andrew

PS: I think Bob may use a topieces array in crafty, but I don't think he
bases his move generator on it. I also don't know if he updates it
incrementally.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Andrew Tridgell Dept. of Computer Science
email: Andrew....@anu.edu.au Australian National
University
Phone: +61 6 254 8209 Fax: +61 6 249 0010
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

### Robert Hyatt

Aug 9, 1997, 3:00:00 AM8/9/97
to

Andrew Tridgell (Andrew....@anu.edu.au) wrote:

: PS: I think Bob may use a topieces array in crafty, but I don't think he

: bases his move generator on it. I also don't know if he updates it
: incrementally.

I don't update anything but the occupied square bitmaps incrementally,
which is really just the board state. I get the attacks from a square
by a simple table lookup (2 for bishops/rooks, 4 for queens, 1 for
kings and knights).

It can be *very* fast too..

### brucemo

Aug 16, 1997, 3:00:00 AM8/16/97
to

Andrew Tridgell wrote:

> The way bit piece lists work is that each square on the board has
> associated with it a 32 bit number. Each bit represents a piece.
> Which piece corresponds to which bit is fixed at startup. For example
> I use the first bit to represent the white king, the 2nd bit for the
> white queen etc. The first 16 bits are whites pieces and the 2nd 16
> bits are blacks. Its handy that there are 32 pieces in chess :-)

> Given the above representation the move generator is trivial and
> _very_ fast.

How many NPS does the program do on the P6 you use on ICC, and is your program
spending a high percentage of its time doing move generation?

> The only other overhead you have with this method is the routine that
> updates the bitlists after each move. This is done incrementally and
> is very fast, although you do have to do some careful thinking to get
> the code right. You will know you have it right when your update routine
> only explicitly recalculates the available moves for one piece after
> each move. Essentially this update routine is doing all the work of
> your move generator, giving you an incremental move generator.
> The speedup comes from the fact that you only have to generate one
> pieces moves per move, instead of generating all pieces moves.

One bit of overhead that you have, if I am reading your description right, is that you
loop over every square looking for set bits. The number of squares is large (64).
Perhaps this doesn't matter much since you're burning through a 64-element array,
which is probably a pretty optimal thing to do.

Another is that once you have determined that there are pieces that can move to this
square, you have to figure out what they are.

Do you get significant speedup in the endgame?

> So why do I bother? Once you have bitlists you will start to realise
> how useful they are. For example, say you wish to know if there
> if a white rook is connected to another rook of the same color.
>
> if (topieces[square] & WROOK_MASK) {
> /* the rook is connected to another rook */

How do you deal with promoted pieces? You don't often see positions on the board with
two queens or three knights/bishops/rooks, but I bet they happen all the time in the
tree.

bruce

### Andrew Tridgell

Aug 17, 1997, 3:00:00 AM8/17/97
to

brucemo wrote:
> How many NPS does the program do on the P6 you use on ICC, and is your program
> spending a high percentage of its time doing move generation?

The NPS rate is very low (about 8-15k) but thats because the
evaluation function is very expensive. The move generator code
is cheap. KnightCap spends almost all of its time in the evaluation
code or in the mini-tactics estimator.

> One bit of overhead that you have, if I am reading your description right, is that you
> loop over every square looking for set bits. The number of squares is large (64).
> Perhaps this doesn't matter much since you're burning through a 64-element array,
> which is probably a pretty optimal thing to do.

yes, for the main search the move generator loops over all squares,
but for most squares the topieces[] element is 0 so it loops pretty
fast.

In the quiesce() search it loops over the piecelist instead, and gets
the possible captures by looking at topieces[pieces[pi].pos]

> Another is that once you have determined that there are pieces that can move to this
> square, you have to figure out what they are.

yep, I use a ff_one() macro (find first 1 bit) to loop over
the bits set in the topieces array.

>
> Do you get significant speedup in the endgame?

yes, but mostly because the evaluation function is much cheaper in
the endgame.

> How do you deal with promoted pieces? You don't often see positions on the board with
> two queens or three knights/bishops/rooks, but I bet they happen all the time in the
> tree.

Currently I always leave pieces in the same location in the bitlists.
This is OK for move generation but as you point out it can be wrong
for things like the connected-rooks test.

This isn't actually a problem at the moment as KnightCap only generates
queen promotions (it will accept any promotion, it just won't generate
them itself). When an opponent promotes to other than queen a full
regenerate of the topieces array is called, with piece sorting. If they
end up with 3 rooks on the board then the evaluation function will be
off slightly as things like the connected rook test won't work correctly
but move generation still works correctly, so its only a minor problem.

Cheers, Andrew