bitboard move generator


Joel Rivat

Nov 13, 1994, 6:10:37 AM
In a previous posting Bob Hyatt gives his own implementation
of the chessboard using bitboards:

typedef struct {
BITBOARD w_pawn; /* locations of white pawns */
BITBOARD w_knight; /* locations of white knights */
BITBOARD w_bishop; /* locations of white bishops */
BITBOARD w_rook; /* locations of white rooks */
BITBOARD w_queen; /* locations of white queens */
BITBOARD w_king; /* locations of white kings */
BITBOARD w_occupied; /* locations of white anything */

BITBOARD b_pawn; /* locations of black pawns */
BITBOARD b_knight; /* locations of black knights */
BITBOARD b_bishop; /* locations of black bishops */
BITBOARD b_rook; /* locations of black rooks */
BITBOARD b_queen; /* locations of black queens */
BITBOARD b_king; /* locations of black kings */
BITBOARD b_occupied; /* locations of black anything */

BITBOARD hash_key; /* hash key used to access trans table */
BITBOARD pawn_hash_key; /* hash key used to lookup pawn scores */

BITBOARD rooks_queens; /* has a 1 for any square with a rook or
queen for either side. used to update
the attack boards. if you move a pawn
to a new square, you look vertically and
horizontally from the new square (and the
old square, of course) to see if you see a
rook or queen (using this board.) If so,
you have some attacks to update. */
BITBOARD bishops_queens; /* ditto for bishops and queens */

BITBOARD enpassant_target; /* one bit set whenever a pawn advances two
ranks */

int w_castle; /* two bits, one for each rook. 0=that rook
moved, whole thing=0 means king moved */
int b_castle; /* ditto for black */

} CHESS_BOARD;


The last two declarations (about castles) do not seem to me
to be in the "bitboard spirit".

I suggest instead (no speedup expected!):
BITBOARD never_moved; /* pieces that are on their original square */

The castling info can be retrieved very easily, for example:
king_side_castle := ((never_moved and bitboard(e1,h1)) == bitboard(e1,h1))

But we can also use this never_moved info for other purposes,
for instance to stress development in the opening, etc...

I hope I will not be considered a specialist for giving
improvements that improve nothing...
I think this discussion about bitboards is important with the arrival
of 64-bit processors...
At least on the CRAY it is probably the best approach, due to the
cost of any memory access (as Bob explained!...)
For other 64-bit processors (like the DEC Alpha...) I am not convinced...
(But I would like to be...) as GNUCHESS-4.0-pl73 seems to be faster...
(Bob says 6K n/s on a Sparc 20 while gnu is about 15K n/s;
I would like to see the results on a DEC Alpha...)

Joel Rivat

Peter Gillgasch

Nov 14, 1994, 3:03:55 PM
In article <3a4s7d$h...@cismsun.univ-lyon1.fr>, ri...@caissa.univ-lyon1.fr (Joel Rivat) writes:

|> I think this discussion about bitboards is important with the arrival
|> of 64-bit processors...
|> At least on the CRAY it is probably the best approach, due to the
|> cost of any memory access (as Bob explained!...)
|> For other 64-bit processors (like the DEC Alpha...) I am not convinced...
|> (But I would like to be...) as GNUCHESS-4.0-pl73 seems to be faster...
|> (Bob says 6K n/s on a Sparc 20 while gnu is about 15K n/s;
|> I would like to see the results on a DEC Alpha...)

You cannot compare nps if you don't know how the programs count nodes,
that is, if you don't know how the programmers "define" a node.

I have seen various definitions of a node:

(a) moves that are included in the search tree
(b) legal moves that are included in the search tree
(c) moves that are included in the search tree plus moves that
are forward pruned by quiescence rules or futility cutoffs

As far as I know, Bob uses definition (a). I don't know about GNUchess,
but I wouldn't be surprised if they use definition (c). I use definition
(b) and I can easily increase nps by *at least* 150% by switching to
definition (c). Program doesn't get faster, though )-:
Everybody can increase nps by removing move ordering heuristics etc.

Bottom line: Never trust any numbers like nps. If you really want to
compare the speed of two different programs, run a standard benchmark
and compare how long they need to do an n-ply search. If you use that
method then you will realize that GNUchess is *slow*, although the
principle behind its move generation is quite clever...

-- Peter

+-------------------------------//-------------------//---------\\
| Peter W. Gillgasch //the way is clear // Gothic \\
| Klosterweg 28/I-113 //The road is closed // computer \\
| 76131 Karlsruhe/Germany //the damage done // award for the \\
| Phone ++49/(0)721/6904255 //and the course // Macintosh SE: \\
| email gil...@ira.uka.de //imposed you //Dead and monochrome\\
+-------------------------//-------------------//---------------------\\

Robert Hyatt

Nov 15, 1994, 10:23:45 AM
In article <3a8frb$6...@nz12.rz.uni-karlsruhe.de>,

Peter Gillgasch <gil...@i41s19.ira.uka.de> wrote:
>In article <3a4s7d$h...@cismsun.univ-lyon1.fr>, ri...@caissa.univ-lyon1.fr (Joel Rivat) writes:
>
>|> I think this discussion about bitboards is important with the arrival
>|> of 64-bit processors...
>|> At least on the CRAY it is probably the best approach, due to the
>|> cost of any memory access (as Bob explained!...)
>|> For other 64-bit processors (like the DEC Alpha...) I am not convinced...
>|> (But I would like to be...) as GNUCHESS-4.0-pl73 seems to be faster...
>|> (Bob says 6K n/s on a Sparc 20 while gnu is about 15K n/s;
>|> I would like to see the results on a DEC Alpha...)
>
>You cannot compare nps if you don't know how the programs count nodes,
>that is, if you don't know how the programmers "define" a node.
>
>I have seen various definitions of a node:
>
>(a) moves that are included in the search tree
>(b) legal moves that are included in the search tree
>(c) moves that are included in the search tree plus moves that
> are forward pruned by quiescence rules or futility cutoffs
>
>As far as I know, Bob uses definition (a). I don't know about GNUchess,
>but I wouldn't be surprised if they use definition (c). I use definition
>(b) and I can easily increase nps by *at least* 150% by switching to
>definition (c). Program doesn't get faster, though )-:
>Everybody can increase nps by removing move ordering heuristics etc.

One important thing: If you want to make your program search a large number
of nodes per second, remove alpha/beta. Minimax searches much faster, since
every move that you generate must be searched, eliminating much wasted
effort. A good example of this idea was presented by Tony Marsland years
ago when he parallelized an old version of Belle (a *really* old version
that had poor move ordering...) He got much better parallel performance
(improvements) with that program than he did on a much better-informed
chess program (Phoenix). The conclusion: you can't simply compare nodes
per second. If you do, we *crush* Kasparov... :^)


>
>Bottom line: Never trust any numbers like nps. If you really want to
>compare the speed of two different programs, run a standard benchmark
>and compare how long they need to do an n-ply search. If you use that
>method then you will realize that GNUchess is *slow*, although the
>principle behind its move generation is quite clever...
>

I agree. It would be really nice if we had some type of "benchmark"
that we could all run against. Would be interesting to see how each
program stacks up. Of course, it is nearly impossible. Genius 2.0
is a good example, how would you compare *your* two ply search against
genius's? Hard without knowing his rules. Maybe a set of problems
from something like Win at Chess or 1001 Brilliant Ways to Checkmate,
where the only requirement is that the problems be solvable by *any*
reasonable program. Then we could compare time-to-solution to see
how things compare.

That addresses tactical (search) speed. Of course, we still have to have
some way to address strategical (evaluation) quality as well. We have been
giving up speed for knowledge for several years. As a result, we are
probably significantly slower than we were 5 years ago, and would probably
solve most problems somewhat slower, even though we are now playing *better*
chess. That *really* makes it hard to compare speed, when speed is not the
only consideration... Tech used to be fast as hell, and would probably smoke
on a new Cray C90, yet I doubt it would win many games against today's better
programs.

I am still experimenting with bitboards, trying to find the best balance
between what we *can* do and what we *must* do... It remains to be seen
if the old Slate/Atkin ideas (to attacks and from attacks) are the best.
I already have a better [to attacks] algorithm, and am now trying to decide
if [from attacks] are really needed since they are expensive to maintain.
More as I get testing done...

Bob

--
Robert Hyatt Computer and Information Sciences
hy...@cis.uab.edu University of Alabama at Birmingham
(205) 934-2213 115A Campbell Hall, UAB Station
(205) 934-5473 FAX Birmingham, AL 35294-1170

Joel Rivat

Nov 15, 1994, 11:51:14 AM
I 100% agree with you that in general we cannot deduce
anything from the nodes per second info.
This is however less true when considering a very simple
chess program (with very poor evaluation)
which we were speaking about...But forget this question...

I just wanted to know the speedup when using a 64-bit processor
(like the Alpha) in comparison with the (otherwise comparable) 32-bit Sparc
(as I am using an Alpha all day...)
In other words, what % of the time do you spend in 64-bit operations?
Are the 6K n/s with or without the attack-from info?

The main point of my previous posting was:
what about replacing
int w_castle,b_castle;
by
BITBOARD never_moved;
?
Any comments ? Or better idea ?

Another question: How do you find which piece is on which square
using your approach?
Do you have a trick for this or just:
* test if square is not empty,
* test queen or rook
* etc...
Or do you avoid this (almost) completely?

Many thanks to Bob Hyatt for all his answers
and his excellent spirit about chess programming...
Many people hide information, and that is why their programs
make so little progress (IMHO) ...

Joel Rivat

Robert Hyatt

Nov 15, 1994, 1:50:35 PM
In article <3aaou2$2...@cismsun.univ-lyon1.fr>,

Joel Rivat <ri...@caissa.univ-lyon1.fr> wrote:
> I 100% agree with you that in general we cannot deduce
>anything from the nodes per second info.
> This is however less true when considering a very simple
>chess program (with very poor evaluation)
>which we were speaking about...But forget this question...
>
> I just wanted to know the speedup when using a 64-bit processor
>(like the Alpha) in comparison with the (otherwise comparable) 32-bit Sparc
>(as I am using an Alpha all day...)
>In other words, what % of the time do you spend in 64-bit operations?
>Are the 6K n/s with or without the attack-from info?

The 6K nodes per second (SparcStation 20) includes *everything* that I'm doing,
to.attacks, from.attacks, evaluation, hashing, the whole works. Here is the top
of the current "profile" output from the SS20 with an explanation of what each
routine is/does:


%Time Seconds Cumsecs #Calls msec/call Name
36.3 31.10 31.10 make_move just what you'd expect
this to do, update all
the attack bitboards.
10.6 9.05 40.15 13384202 0.0007 first_one finds first one bit which
is one instruction on Cray.
7.4 6.34 46.49 _mcount stupid profiler routine
7.2 6.15 52.64 generate_moves_to move generator
6.4 5.48 58.12 next_move generates moves in "sets"
by calling generator with
various targets. includes
capture sort, killer moves,
move ordering, etc.
4.9 4.20 62.32 5278758 0.0008 popcnt population count (# of 1's)
4.4 3.76 66.08 134792 0.0279 evaluate fairly simple evaluator
(working on this right now)
3.3 2.84 68.92 lookup transposition table lookup
3.1 2.63 71.55 3568036 0.0007 last_one opposite of first_one()
2.9 2.50 74.05 4597866 0.0005 mask hardware instruction on the
cray used to produce masks
with consecutive one bits
starting from the left or
right depending on call.
2.0 1.70 75.75 exchange static exchange evaluator
uses from attack stuff to
"loosely" order capture moves.
1.9 1.64 77.39 search obvious.
1.7 1.45 78.84 quiesce what follows search..
1.4 1.19 80.03 755838 0.0016 check answers the question is
"side" in check.


>
> The main point of my previous posting was:
>what about replacing
>int w_castle,b_castle;
>by
>BITBOARD never_moved;
> ?
>Any comments ? Or better idea ?

I apparently missed your post. We are expiring news frequently due to filling up
a 600MB partition. I somehow didn't read news frequently enough, I suppose...
In any case, the reason I didn't do this is that the program is smart enough
not to move the king very often until the endgame; therefore I only have to alter
the above values on rook/king moves, not on *all* moves. Secondly, I really don't
see how to use never-moved for other pieces. For development it is not quite good
enough: what do you do after a knight moves to f3, gets forced back to g1, and you
are now not so anxious to get it out? In any case, not seeing how to use information
such as you propose, the four possible values [can't castle], [can castle queen-side
only], [can castle king-side only], and [can castle either side] turn out to be
useful, as I have to include this in the transposition table hash key, since
identical positions (based only on piece locations) are not identical if castling
status is different. Now, I simply fold in 1 of 4 random 64-bit values to "adjust"
the hash_key to include castling status. Your "never_moved" would require more
work, which is done at every node in the tree, since the first thing I try to do
is look a position up.

>
>Another question: How do you find which piece is on which square
>using your approach?

I have a board[64] array which contains the "real" piece on each square;
1=white pawn, -1=black pawn, 0=empty, etc. This is updated as the bit boards
are updated. I couldn't think of a better solution. I use the famous 21-bit
move format (6 bits=from, 6 bits=to, 3 bits=moving piece, 3 bits=captured piece,
3 bits=promotion-to piece) and needed the captured piece. Rather than searching
(as you mentioned) through (potentially) six bitboards, I just go to board[square]
and there it is. The cost of updating it is really nothing.

>Do you have a trick for this or just:
> * test if square is not empty,
> * test queen or rook
> * etc...
>Or do you avoid (almost) completely this ?
>
>Many thanks to Bob Hyatt for all his answers
>and his excellent spirit about chess programming...
>Many people hide information, and that is why their programs
>make so little progress (IMHO) ...
>
> Joel Rivat
>


Hope this helps. One more point: We have discussed whether or not the from attacks
are worth the computational costs. I have spent a day working on this and found the
following:

Removing the from-attacks code from make_move() saved about 10% or so. However, I
then had more work to do to see if either side is in check, so that routine added
the 10% back. More importantly, the from-attacks are useful for move ordering deep
in the tree: move pieces from attacked squares to unattacked squares, then from
unattacked squares to unattacked squares, and finally from unattacked squares to
attacked squares. I had to give this up without from-attacks, and the cost was
enough to make this version slower in all cases I tried.

As I mentioned, I like the from-attacks because of evaluation tricks. If I detect
a weak pawn, I can trivially count the number of attacks on it for evaluation
purposes. For passed pawns, I can count the number of attacks in front of them
to see how hard they are to "mobilize". Just to name a few, of course. More as I
write the code.

I have found a much better way of generating the attack bit-vector for a piece
after it has been moved (which eliminates the first_one() call and some other stuff,
and also finds all attacks along a row/col/diagonal instead of treating each
direction separately). I will explain more once I get it all in place and again
feel confident that it isn't losing something along the way.

In any case, I now feel (for my application, anyway) that the from.attack stuff
is useful, necessary, and actually saves work (when I use it...) Now to get a
better evaluation so I can start to play some games against the old program.

Hope the above cleared up where I am, where I've been, and where I'm going with
this. By the way, I still have not tried to "optimize" this code, so I suspect that
the 6K nodes per second is a lower limit on the Sparc. I noticed in the profiler
that this stupid compiler (Sun ANSI C) doesn't know that my Sparc can now multiply
and divide, and still uses the software emulation. As a result, I am fixing to
find out what's going on and what options to use to make it run as efficiently as
it can.

Keep the questions coming, every now and again a suggestion rings true and saves me
time! Thanks for the suggestions. May I one day repay you by having Cray Blitz crush
you over-the-board!!

Steven J. Edwards

Nov 15, 1994, 12:59:37 PM
hy...@willis.cis.uab.edu (Robert Hyatt) writes:

>In article <3a8frb$6...@nz12.rz.uni-karlsruhe.de>,
>Peter Gillgasch <gil...@i41s19.ira.uka.de> wrote:
>>In article <3a4s7d$h...@cismsun.univ-lyon1.fr>, ri...@caissa.univ-lyon1.fr (Joel Rivat) writes:

>>|> For other 64-bit processors (like the DEC Alpha...) I am not convinced...
>>|> (But I would like to be...) as GNUCHESS-4.0-pl73 seems to be faster...
>>|> (Bob says 6K n/s on a Sparc 20 while gnu is about 15K n/s;
>>|> I would like to see the results on a DEC Alpha...)
>>
>>You cannot compare nps if you don't know how the programs count nodes,
>>that is if you don't know how the programmers "define" a node.

I think the most common definition is "legal move entered into search
tree". Each such node is visited by a separate routine call. A node
visited more than once may be counted more than once.

>>Bottom line: Never trust any numbers like nps. If you really want to
>>compare the speed of two different programs, run a standard benchmark
>>and compare how long they need to do an n-ply search. If you use that
>>method then you will realize that GNUchess is *slow*, although the
>>principle behind its move generation is quite clever...

Wilkins' PARADISE program did some remarkable searches within the
domain of tactically active advantageous positions. It solved a
mate-in-10 from one of Reinfeld's books looking at only about 100
nodes in some 2400 seconds, a node frequency of about 42 mHz.

>I agree. It would be really nice if we had some type of "benchmark"
>that we could all run against. Would be interesting to see how each
>program stacks up. Of course, it is nearly impossible. Genius 2.0
>is a good example, how would you compare *your* two ply search against
>genius's? Hard without knowing his rules. Maybe a set of problems
>from something like Win at Chess or 1001 Brilliant Ways to Checkmate,
>where the only requirement is that the problems be solvable by *any*
>reasonable program. Then we could compare time-to-solution to see
>how things compare.

Such tests already exist. Based largely on the work of Richard Lang
and Warren Smith, there are a dozen or so of these test suites
available via ftp. The EPD versions can be found in the
pub/chess/Tests directory at the ics.onenet.net ftp site. (Read the
Manifest file first.) Various commercial programs are already EPD
compatible; reports of these include the latest versions of Zarkov,
Hiarcs, Chess Genius, and M-Chess. There are some research programs
that use EPD as well.

Both _Win at Chess_ and _1001 Brilliant Ways to Checkmate_ are there.
There are only 300 positions in the former, while the 1001 positions
of the latter are perhaps too easy for most programs. Spector running
on a 33 MHz 486DX at 180 seconds per position gets about 255/300 of
WAC and about 950/1001 of BWTC. On longer time limits, Spector
solves all but ten of the BWTC problems. I think that the position
set from _1001 Winning Chess Sacrifices and Combinations_ is better
than either WAC or BWTC for program metrics; Spector gets about
790/1001 of these. Spector does somewhat worse percentage-wise on the
879 positions from the _Encyclopedia of Chess Middlegames_ test; these
are tougher problems with greater search depth requirements.

So, instead of time-to-solution or time-to-depth, I suggest using
percentage scores based on per-position time limits, and that 180
seconds is a good time limit as it approximates master-level
tournament time controls. I strongly agree with Lang and Smith
concerning their thought that "the more positions, the better".

-- Steven (s...@mv.mv.com)

Dave Giunti #136 @11554

Nov 17, 1994, 1:58:20 PM
Response To: RIVAT#CAISSA.UNIV-LYON1.F

R> FROM: rivat#caissa.univ-lyon1.fr (Joel Rivat) #345 @11551
R>
R> Could you post the speed (nodes/sec.) of your chess program
R> based on bitboard move generator, using a simple evaluator
R> (like material only) and no killer, no hash table,
R> for a few positions (you choose...)
R> for a specific computer (give firm model clock speed) ?
R> suggested:
R> DEC Alpha (64 bits! the more interesting...)
R> SUN sparc 10
R> HP 7xx
R> IBM RS/6000
R> How does it compare with gnuchess-4.0.pl73 (without any change)
R> on the same computer ?
R>
R> Will you reach 20000 nodes/s on the DEC Alpha 3000 model 300 (150 Mhz) ?

I think that 20k n/s is a bit conservative. My GnuChess 30f (186)
version is pulling 13k on a 66 MHz 486DX2 and should easily break the
20k barrier on a P90.
Dave

---
JABBER v1.2 "Nyuk, Nyuk, Nyuk" - Curly Howard

WWIVMail/QWK 4.54 [REGISTERED]: Berkeley WWIV BBS - @11554 ON WWIVnet
