Move generator design choices

43 views
Skip to first unread message

Martin Borriss

unread,
Apr 26, 1996, 3:00:00 AM4/26/96
to

I have a question regarding move generators.

I'm planning to re-design my move generator. Right now I still have the
simplest possible generator, looping over board, looking for pieces,
checking for edges of the board etc. This also applies to the in_check()
function (most frequently called function in a chess program, at least for me).

On purpose I leave bitmaps approachs out of this, since I don't know much
about them. But I'm interested to hear how they compare against other
techniques (assuming a current hardware architecture, in my case Intel).

Ok. People seem to agree that a design involving some form of piece lists
(e.g., data structures keeping the current positions of pieces) are a must
for a good move generator. True?

There is a little choice to make as well. One can either
1) update the piece list on each make_move() and undo it in unmake_move()
2) copy the piece list from the previous ply, update it in make_move.
Saves the cost for undoing it, but needs copying.

I can see some more advantages to piece lists:
1) in_check() function will be much more efficient
2) some speedup in the eval function

For the actual move generation I can see two choices:
1) Use the famous 0x88h method. I underestimated the speedup possible by this,
I think. Comparing modulo operations vs. an 'and 0x88' I first believed that
the difference is not that dramatic. However, on Intel a modulo requires
an 'IDIV' operation, taking almost 30 cycles vs. 2 cycles for the and '0x88'.
2) Using move a move table, where possible moves for each piece on every
square are precomputed. I had concerns on the size of such a table, but an
estimate I did gave something in the order of 20KB.

I would much appreciate any comments on this, especially your experiences on
the possible choice. I am most unclear on the last choice.

--
Martin....@inf.tu-dresden.de

Robert Hyatt

unread,
Apr 26, 1996, 3:00:00 AM4/26/96
to

In article <4lq1qm$g...@irz301.inf.tu-dresden.de>,
Martin Borriss <mb...@irz.inf.tu-dresden.de> wrote:
-->
-->I have a question regarding move generators.
-->
-->I'm planning to re-design my move generator. Right now I still have the
-->simplest possible generator, looping over board, looking for pieces,
-->checking for edges of the board etc. This also applies to the in_check()
-->function (most frequently called function in a chess program, at least for me).
-->
-->On purpose I leave bitmaps approachs out of this, since I don't know much
-->about them. But I'm interested to hear how they compare against other
-->techniques (assuming a current hardware architecture, in my case Intel).
-->
-->Ok. People seem to agree that a design involving some form of piece lists
-->(e.g., data structures keeping the current positions of pieces) are a must
-->for a good move generator. True?

Yes, because instead of looping over 64 squares to find pieces, you loop
over a piece-list of 16 elements. If you want to take the time to pack
it after a capture (maybe only at the root of the tree as we did in Cray
Blitz) then this loop shrinks after each piece is really removed from the
board.

-->
-->There is a little choice to make as well. One can either
-->1) update the piece list on each make_move() and undo it in unmake_move()

On a PC, don't copy. Insufficient memory bandwidth and you'll thrash
the heck out of cache. Make/UnMake is faster by a significant amount.
I've don't both in Crafty, currently doing the latter which is best on
the PC platform.

-->2) copy the piece list from the previous ply, update it in make_move.
--> Saves the cost for undoing it, but needs copying.
-->
-->I can see some more advantages to piece lists:
-->1) in_check() function will be much more efficient
-->2) some speedup in the eval function
-->
-->
-->
-->For the actual move generation I can see two choices:
-->1) Use the famous 0x88h method. I underestimated the speedup possible by this,
--> I think. Comparing modulo operations vs. an 'and 0x88' I first believed that
--> the difference is not that dramatic. However, on Intel a modulo requires
--> an 'IDIV' operation, taking almost 30 cycles vs. 2 cycles for the and '0x88'.

Correct. 0x88 is about as simple a thing as you can do for edge detection. On
the PC, it's the best of the offset move generator approaches I think.

-->2) Using move a move table, where possible moves for each piece on every
--> square are precomputed. I had concerns on the size of such a table, but an
--> estimate I did gave something in the order of 20KB.

This works, but you *still* have the edge detection problem, as well as where
to stop sliding a piece along a diagonal, so 0x88 works with pre-computed move
lists as well as with directly-computed move generators.

This is what HiTech hardware uses, special purpose hardware with "all the
right moves" pre-computed (that's the title of Carl's thesis on the topic.

-->
-->I would much appreciate any comments on this, especially your experiences on
-->the possible choice. I am most unclear on the last choice.
-->
-->--
-->Martin....@inf.tu-dresden.de


--
Robert Hyatt Computer and Information Sciences
hy...@cis.uab.edu University of Alabama at Birmingham
(205) 934-2213 115A Campbell Hall, UAB Station
(205) 934-5473 FAX Birmingham, AL 35294-1170

MANOHARARAJAH VALAVAN

unread,
Apr 27, 1996, 3:00:00 AM4/27/96
to

In article <4lq1qm$g...@irz301.inf.tu-dresden.de>,
Martin Borriss <mb...@irz.inf.tu-dresden.de> wrote:
>
>I have a question regarding move generators.
>
>I'm planning to re-design my move generator. Right now I still have the
>simplest possible generator, looping over board, looking for pieces,
>checking for edges of the board etc. This also applies to the in_check()
>function (most frequently called function in a chess program, at least for me).
>
>On purpose I leave bitmaps approachs out of this, since I don't know much
>about them. But I'm interested to hear how they compare against other
>techniques (assuming a current hardware architecture, in my case Intel).
>
>Ok. People seem to agree that a design involving some form of piece lists
>(e.g., data structures keeping the current positions of pieces) are a must
>for a good move generator. True?
>
>There is a little choice to make as well. One can either
>1) update the piece list on each make_move() and undo it in unmake_move()
>2) copy the piece list from the previous ply, update it in make_move.
> Saves the cost for undoing it, but needs copying.
>
>I can see some more advantages to piece lists:
>1) in_check() function will be much more efficient
>2) some speedup in the eval function
>
>
>
>For the actual move generation I can see two choices:
>1) Use the famous 0x88h method. I underestimated the speedup possible by this,
> I think. Comparing modulo operations vs. an 'and 0x88' I first believed that
> the difference is not that dramatic. However, on Intel a modulo requires
> an 'IDIV' operation, taking almost 30 cycles vs. 2 cycles for the and '0x88'.

if nice short code "turns you on", then 0x88 can produce some sleek looking
code. bit-board approaches can be hard to debug (especially those incremental
stuff).

>2) Using move a move table, where possible moves for each piece on every

> square are precomputed. I had concerns on the size of such a table, but an

> estimate I did gave something in the order of 20KB.

for a nice table implementation, take a look at gnuchess. there is a text
file describing their move generation algorithm.

>
>I would much appreciate any comments on this, especially your experiences on

>the possible choice. I am most unclear on the last choice.
>
>--

>Martin....@inf.tu-dresden.de


--
-------------------------------------------------------------------------------
man...@ecf.utoronto.ca | 3rd year Comp Eng., University of Toronto
Valavan Manohararajah |
-------------------------------------------------------------------------------

Joel Rivat

unread,
Apr 29, 1996, 3:00:00 AM4/29/96
to

Remember that you need a fast captures generator, which is called
maybe 10 or 20 times more often than the all-moves-generator. That's
why people have to deal with attacks in one way or another.

Concerning the offset move generator, I don't think the 0x88 method
is that much exiting. It is at most slightly faster than other offset
methods and will introduce the need to map and unmap squares from the
[0...63] to the [0...255] representation and will therefore make the
program difficult to understand and to maintain.

In any case you will use this move generator very few, and need
something else for captures (or you will loop a lot!) so why not
write first a capture generator, introducing the necessary data,
and then use this data (if useful) for the all-move-generator ?

I think the bitboard method is very good and very simple, and will
be widely used when 64 bits computers become the standard.

Joel Rivat

Martin Borriss

unread,
May 1, 1996, 3:00:00 AM5/1/96
to

In article <4lqe70$s...@pelham.cis.uab.edu>, hy...@cis.uab.edu (Robert Hyatt) writes:
> In article <4lq1qm$g...@irz301.inf.tu-dresden.de>,
> Martin Borriss <mb...@irz.inf.tu-dresden.de> wrote:

[...]

> -->Ok. People seem to agree that a design involving some form of piece lists
> -->(e.g., data structures keeping the current positions of pieces) are a must
> -->for a good move generator. True?
>
> Yes, because instead of looping over 64 squares to find pieces, you loop
> over a piece-list of 16 elements. If you want to take the time to pack
> it after a capture (maybe only at the root of the tree as we did in Cray
> Blitz) then this loop shrinks after each piece is really removed from the
> board.
>

I was planning to have distinct arrays for each possible kind of piece/pawn,
e.g., for white pawns, black queens etc. Since I still have my imaginary board
I can easily find out what kind of piece moved. There is no need to loop over
16 elements (or (slightly) less, as you point out). Those arrays are always
packed after a capture, yes.


> -->
> -->There is a little choice to make as well. One can either
> -->1) update the piece list on each make_move() and undo it in unmake_move()
>
> On a PC, don't copy. Insufficient memory bandwidth and you'll thrash
> the heck out of cache. Make/UnMake is faster by a significant amount.
> I've don't both in Crafty, currently doing the latter which is best on
> the PC platform.

Thanks! I suspected this, but now there is no need to try ;)

[...]

> -->2) Using move a move table, where possible moves for each piece on every
> --> square are precomputed. I had concerns on the size of such a table, but an
> --> estimate I did gave something in the order of 20KB.
>
> This works, but you *still* have the edge detection problem, as well as where
> to stop sliding a piece along a diagonal, so 0x88 works with pre-computed move
> lists as well as with directly-computed move generators.
>

Why do I still have the edge detection problem? The precomputed list contains
only legal moves,e.g. for an Bishop an f1 it should contain an entry to a list
containing g2,h3 ; an entry to a list containing e2..a6. If I make it to h3 or
a6 there are special values telling me to go on the next diagonal or to stop.
While sliding the piece I have to check whether the square is occupied, if so,
I also go on to next diagonal.


> This is what HiTech hardware uses, special purpose hardware with "all the
> right moves" pre-computed (that's the title of Carl's thesis on the topic.
>

Martin

--
Martin....@inf.tu-dresden.de

Martin Borriss

unread,
May 1, 1996, 3:00:00 AM5/1/96
to

In article <4m20rq$7...@tempo.univ-lyon1.fr>, ri...@caissa.univ-lyon1.fr (Joel Rivat) writes:
> Remember that you need a fast captures generator, which is called
> maybe 10 or 20 times more often than the all-moves-generator. That's
> why people have to deal with attacks in one way or another.
>

Yes. My current 'captures only' generator uses the same 'technology' as
the 'all-moves' generator, so I didn't mention it separately.

> Concerning the offset move generator, I don't think the 0x88 method
> is that much exiting. It is at most slightly faster than other offset
> methods and will introduce the need to map and unmap squares from the
> [0...63] to the [0...255] representation and will therefore make the
> program difficult to understand and to maintain.

Although not exciting, it's the best I can think of ;) Good and *simple*.
I hope that it will be one order faster than my very simple generator
( I wrote the generator the day I decided writing a program ;)) which uses
slow modulo operations for edge detection.

BTW, why do I need to map between different representations? Also, doesn't
the 0x88 method require a 128 square (8*16) board (not 256) ?



> In any case you will use this move generator very few, and need
> something else for captures (or you will loop a lot!) so why not
> write first a capture generator, introducing the necessary data,
> and then use this data (if useful) for the all-move-generator ?
>

Sounds logical. I wanted to do it simultaneously, since I probably have
to change some global data structures (such as fitting the board representation
to the 0x88 method).


> I think the bitboard method is very good and very simple, and will
> be widely used when 64 bits computers become the standard.
>
> Joel Rivat

As for now, I rather stick to what I have some understanding of ;) I am
reluctant to completely rewrite a working program ('Gullydeckel' on GICS).

Martin

--
Martin....@inf.tu-dresden.de

Luis A. Dissett

unread,
May 1, 1996, 3:00:00 AM5/1/96
to

In article <4lqe70$s...@pelham.cis.uab.edu> hy...@cis.uab.edu (Robert Hyatt) writes:

> -->2) Using move a move table, where possible moves for each piece on every
> --> square are precomputed. I had concerns on the size of such a table,

> --> but an


> --> estimate I did gave something in the order of 20KB.
>
> This works, but you *still* have the edge detection problem, as well as where
> to stop sliding a piece along a diagonal, so 0x88 works with pre-computed
> move

How about the following: for each sliding piece on every square, store
the number of directions in which that piece can move, a pointer to an
array that contains (in consecutive locations) the lengths of the
``rays'' from that square to the edges, and a pointer to the location
where the squares on that ray are (from the initial square to the
edge). Loop over the directions, getting for each direction all the
squares in that ray until a non-empty square appears. Each empty
square is a possible move. At the end of the search for a direction,
you have to test for a possible capture.

A seudo-C implementation is like this:

Let p be the [sliding] piece you want to move, s the square it is on.

nDirs = nDirections[p][s];
LP = lengthPointer[p][s];
SP = startPointer[p][s];

for (d = 0; d < nDirs; d++) {
nsq = *(LP ++);
moves = SQUARES[*(SP ++)]

for (m = 0; m < nsq; m++)
if (board[newSq = *(moves++)] == EMPTY_SQUARE)
addMove(p,s,newSq);
else
break; /* Stop looking in this direction */

if (board[newSq] is an enemy piece)
addCapture(p,s,newSq);
}

Thus, for example, if you have a Bishop in c1, nDirs[B][c1] = 2,
lengthPointer[B][c1] points to a part of an array containing the
values 2 and 5 (the lengths of the two rays for a Bishop in c1), and
startPointer[B][c1] points to a part of an array that contains two
numbers, say 1000 and 2000. Then SQUARES[1000] = b2, SQUARES[1001] =
a3, SQUARES[2000] = d2, SQUARES[2001] = e3, SQUARES[2002] = f4,
SQUARES[2003] = g5, SQUARES[2004] = h6.

Of course, this just gives the `pseudo-legal' moves. You have to check
for pinnings and the like.

Regards,

Luis


Robert Hyatt

unread,
May 1, 1996, 3:00:00 AM5/1/96
to

In article <LDISSETT.9...@qew.cs.toronto.edu>,
Luis A. Dissett <ldis...@qew.cs.toronto.edu> wrote:

-->In article <4lqe70$s...@pelham.cis.uab.edu> hy...@cis.uab.edu (Robert Hyatt) writes:
-->
-->> -->2) Using move a move table, where possible moves for each piece on every
-->> --> square are precomputed. I had concerns on the size of such a table,
-->> --> but an
-->> --> estimate I did gave something in the order of 20KB.
-->>
-->> This works, but you *still* have the edge detection problem, as well as where
-->> to stop sliding a piece along a diagonal, so 0x88 works with pre-computed
-->> move
-->
-->How about the following: for each sliding piece on every square, store
-->the number of directions in which that piece can move, a pointer to an
-->array that contains (in consecutive locations) the lengths of the
-->``rays'' from that square to the edges, and a pointer to the location
-->where the squares on that ray are (from the initial square to the
-->edge). Loop over the directions, getting for each direction all the
-->squares in that ray until a non-empty square appears. Each empty
-->square is a possible move. At the end of the search for a direction,
-->you have to test for a possible capture.
-->
-->A seudo-C implementation is like this:
-->
-->Let p be the [sliding] piece you want to move, s the square it is on.
-->
--> nDirs = nDirections[p][s];
--> LP = lengthPointer[p][s];
--> SP = startPointer[p][s];
-->
--> for (d = 0; d < nDirs; d++) {
--> nsq = *(LP ++);
--> moves = SQUARES[*(SP ++)]
-->
--> for (m = 0; m < nsq; m++)
--> if (board[newSq = *(moves++)] == EMPTY_SQUARE)
--> addMove(p,s,newSq);
--> else
--> break; /* Stop looking in this direction */
-->
--> if (board[newSq] is an enemy piece)
--> addCapture(p,s,newSq);
--> }
-->
-->Thus, for example, if you have a Bishop in c1, nDirs[B][c1] = 2,
-->lengthPointer[B][c1] points to a part of an array containing the
-->values 2 and 5 (the lengths of the two rays for a Bishop in c1), and
-->startPointer[B][c1] points to a part of an array that contains two
-->numbers, say 1000 and 2000. Then SQUARES[1000] = b2, SQUARES[1001] =
-->a3, SQUARES[2000] = d2, SQUARES[2001] = e3, SQUARES[2002] = f4,
-->SQUARES[2003] = g5, SQUARES[2004] = h6.
-->
-->Of course, this just gives the `pseudo-legal' moves. You have to check
-->for pinnings and the like.
-->
-->Regards,
-->
-->Luis
-->

When you implement and study this, you'll find all those array references
are killers. Instead of testing for a boundary condition as before, you
are loading the boundary condition up front (length of a ray) and then
still comparing each type thru the loop. Again, the x88 saves this initial
load only, so it's minor, but measurable...

Robert Hyatt

unread,
May 1, 1996, 3:00:00 AM5/1/96
to

In article <4m7jgg$4...@irz301.inf.tu-dresden.de>,
Martin Borriss <mb...@irz.inf.tu-dresden.de> wrote:
-->
-->In article <4lqe70$s...@pelham.cis.uab.edu>, hy...@cis.uab.edu (Robert Hyatt) writes:
-->> In article <4lq1qm$g...@irz301.inf.tu-dresden.de>,

-->> Martin Borriss <mb...@irz.inf.tu-dresden.de> wrote:
-->
-->[...]
-->
-->> -->Ok. People seem to agree that a design involving some form of piece lists
-->> -->(e.g., data structures keeping the current positions of pieces) are a must
-->> -->for a good move generator. True?
-->>
-->> Yes, because instead of looping over 64 squares to find pieces, you loop
-->> over a piece-list of 16 elements. If you want to take the time to pack
-->> it after a capture (maybe only at the root of the tree as we did in Cray
-->> Blitz) then this loop shrinks after each piece is really removed from the
-->> board.
-->>
-->
-->I was planning to have distinct arrays for each possible kind of piece/pawn,
-->e.g., for white pawns, black queens etc. Since I still have my imaginary board
-->I can easily find out what kind of piece moved. There is no need to loop over
-->16 elements (or (slightly) less, as you point out). Those arrays are always
-->packed after a capture, yes.
-->

As you are going to find out over the next N years, this issue is going to come
back over and over, because speed is important, and so is flexibility. You will
find better and better data structures as you get everything written, and then
re-written, and ....

However, one minor point for your idea is that you are going to have to loop
over all the pieces at some point in time. In crafty, because of the bitmap
approach, I basically have what you are doing, each type of piece is in its
own bitmap, one for white knights, one for black rooks, etc. This tends to
make code a lot longer, because you have to iterate over multiple lists in
successive blocks of code, rather than having one long loop over the entire
set of pieces. Which is best is likely a subject for much debate...

-->
-->> -->


-->> -->There is a little choice to make as well. One can either

-->> -->1) update the piece list on each make_move() and undo it in unmake_move()
-->>
-->> On a PC, don't copy. Insufficient memory bandwidth and you'll thrash
-->> the heck out of cache. Make/UnMake is faster by a significant amount.
-->> I've don't both in Crafty, currently doing the latter which is best on
-->> the PC platform.
-->
-->Thanks! I suspected this, but now there is no need to try ;)
-->
--> [...]


-->
-->> -->2) Using move a move table, where possible moves for each piece on every

-->> --> square are precomputed. I had concerns on the size of such a table, but an


-->> --> estimate I did gave something in the order of 20KB.
-->>
-->> This works, but you *still* have the edge detection problem, as well as where

-->> to stop sliding a piece along a diagonal, so 0x88 works with pre-computed move
-->> lists as well as with directly-computed move generators.
-->>
-->
-->Why do I still have the edge detection problem? The precomputed list contains
-->only legal moves,e.g. for an Bishop an f1 it should contain an entry to a list
-->containing g2,h3 ; an entry to a list containing e2..a6. If I make it to h3 or
-->a6 there are special values telling me to go on the next diagonal or to stop.
-->While sliding the piece I have to check whether the square is occupied, if so,
-->I also go on to next diagonal.

In other places. For example, does a bishop attack h7? you can fetch your pre-
computed move list, and hunt for h7 in the destination squares, or you can simply
repeatedly add some constant "n" that takes you in that direction for a bishop,
and if you reach h7 you say "yes" if you run off the board, you say "no".

In the eval, "is this rook connected with another rook?" To do this, you need to
look along the rank or file (depending on which you are interested in) looking for
another rook of the same color *or* the edge of the board in that direction. In
bitmaps this is a whole lot easier and faster, as I don't have *any* edge detection
problems anywhere, but that's a different story.

Robert Hyatt

unread,
May 1, 1996, 3:00:00 AM5/1/96
to

In article <4m7km2$4...@irz301.inf.tu-dresden.de>,

Martin Borriss <mb...@irz.inf.tu-dresden.de> wrote:
-->
-->In article <4m20rq$7...@tempo.univ-lyon1.fr>, ri...@caissa.univ-lyon1.fr (Joel Rivat) writes:
-->> Remember that you need a fast captures generator, which is called
-->> maybe 10 or 20 times more often than the all-moves-generator. That's
-->> why people have to deal with attacks in one way or another.
-->>
-->
-->Yes. My current 'captures only' generator uses the same 'technology' as
-->the 'all-moves' generator, so I didn't mention it separately.
-->
-->> Concerning the offset move generator, I don't think the 0x88 method
-->> is that much exiting. It is at most slightly faster than other offset
-->> methods and will introduce the need to map and unmap squares from the
-->> [0...63] to the [0...255] representation and will therefore make the
-->> program difficult to understand and to maintain.

I didn't do this in Cray Blitz. I just kept from and to as 0-127 and wasted
the two extra bits needed, rather than map them back into the range 0-63 to
save a bit here and there. On the Cray, memory was *never* an issue, and with
64bit words, neither was saving a bit or two. :)

The main advantage is to avoid loading a piece value from an illegal square,
since that's a memory cycle and is horribly slow on todays computers. Instead,
you can test the new "to" square itself, which is already in a register since
you just added some offset to it to produce the new value. The difference in
the test is 1 clock vs 28 clocks on a Cray... maybe 1 vs 19 on a fast PC.

-->
-->Although not exciting, it's the best I can think of ;) Good and *simple*.
-->I hope that it will be one order faster than my very simple generator
-->( I wrote the generator the day I decided writing a program ;)) which uses
-->slow modulo operations for edge detection.
-->
-->BTW, why do I need to map between different representations? Also, doesn't
-->the 0x88 method require a 128 square (8*16) board (not 256) ?
-->

yes. there actually is another 128 "virtual" squares that don't need to be
allocated since their offsets are illegal (high order bit = 1).

-->> In any case you will use this move generator very few, and need
-->> something else for captures (or you will loop a lot!) so why not
-->> write first a capture generator, introducing the necessary data,
-->> and then use this data (if useful) for the all-move-generator ?
-->>
-->
-->Sounds logical. I wanted to do it simultaneously, since I probably have
-->to change some global data structures (such as fitting the board representation
-->to the 0x88 method).
-->
-->> I think the bitboard method is very good and very simple, and will
-->> be widely used when 64 bits computers become the standard.
-->>
-->> Joel Rivat
-->
-->As for now, I rather stick to what I have some understanding of ;) I am
-->reluctant to completely rewrite a working program ('Gullydeckel' on GICS).

Always a good choice, too. :) In Cray Blitz, we had a "gencap" flag that would
make the move generator discard any potential move if the target square was empty,
so that we only got captures out of it. This helped in the q-search move ordering
since we didn't have to move a lot of non-capture moves around to get the captures
to the top of the list, or have to skip over a lot of non-captures while searching
for the next capture move to try.

In any case, there's something to be said for "comprehensible code". After a year,
bitmaps are simple to understand. For the first 6 months, they are a nightmare.

Tom Kerrigan

unread,
May 2, 1996, 3:00:00 AM5/2/96
to

You need a 128 square board for the 0x88 trick to work, not 256.

There is no law stating that a chess program must use a 64 square board. I never
use one in Stobor, so I never have to convert between board sizes.

Saving a piece list for every type of piece has a few drawbacks. First, every list
will need 10 elements, because of promotions. Second, when looping through your
piece list you need several loops, instead of just one. Third, you need another
array just to remember how many elements you have in each list. The upshot is that
this may not be such a hot idea.

I've been thinking about GNU-style piece lists. I think two arrays of 48 integers
may work well. One is the list of squares the pieces occupy. White pieces start at
element 0, white pawns are at 16, black pieces are at 24, and black pawns are at
40. The second array is to mark captured pieces. I suppose if a piece is captured,
you could set a bit in the integer with the square, but setting and testing this
bit sounds quite a bit slower than just keeping another array.

Cheers,
Tom

_______________________________________________________________________________
Tom Kerrigan kerr...@frii.com O-

All Finagle Laws may be bypassed by learning the simple art of doing
without thinking.

Joel Rivat

unread,
May 2, 1996, 3:00:00 AM5/2/96
to

Sure there is no law stating that one should use a [0...63] square
representation. Howewer this choice simplifies the understanding of
the program (at least for me ! :) ).

The main idea in my previous post was that you need first a capture
generator. If the x88 method really helps for that then use it.
Otherwise forget it!

Joel Rivat


Vincent Diepeveen

unread,
May 2, 1996, 3:00:00 AM5/2/96
to

In <LDISSETT.9...@qew.cs.toronto.edu> ldis...@qew.cs.toronto.edu (Luis A. Dissett) writes:

>In article <4lqe70$s...@pelham.cis.uab.edu> hy...@cis.uab.edu (Robert Hyatt) writes:
>
>> -->2) Using move a move table, where possible moves for each piece on every
>> --> square are precomputed. I had concerns on the size of such a table,

>> --> but an


>> --> estimate I did gave something in the order of 20KB.

For PC:

int movearray[2][6][64], *a;
a = movearray[side][piece];
...

is slower than or equal:

int piecelist[2][8][64],*a;
a = movearray[side][piece];
...

that 8 <-> 6 difference can mean that it uses

'imul' for 6, which is a slow instruction.

For this reason my arrays for move generation are larger than
they should be, and so are a lot of other arrays.

Move generation takes about 6% of the total systemtime in Diep.
5% is generating the whole list, 1% is generating about only the capture
list.

>> This works, but you *still* have the edge detection problem, as well as where

>> to stop sliding a piece along a diagonal, so 0x88 works with pre-computed
>> move
>

>How about the following: for each sliding piece on every square, store

>the number of directions in which that piece can move, a pointer to an

>array that contains (in consecutive locations) the lengths of the

>``rays'' from that square to the edges, and a pointer to the location

>where the squares on that ray are (from the initial square to the

>edge). Loop over the directions, getting for each direction all the

>squares in that ray until a non-empty square appears. Each empty

>square is a possible move. At the end of the search for a direction,

>you have to test for a possible capture.
>

>A seudo-C implementation is like this:
>

>Let p be the [sliding] piece you want to move, s the square it is on.
>

> nDirs = nDirections[p][s];

> LP = lengthPointer[p][s];

> SP = startPointer[p][s];
>

> for (d = 0; d < nDirs; d++) {

> nsq = *(LP ++);


> moves = SQUARES[*(SP ++)]
>

> for (m = 0; m < nsq; m++)

> if (board[newSq = *(moves++)] == EMPTY_SQUARE)

> addMove(p,s,newSq);
> else


> break; /* Stop looking in this direction */
>

> if (board[newSq] is an enemy piece)

> addCapture(p,s,newSq);


> }
>
>Thus, for example, if you have a Bishop in c1, nDirs[B][c1] = 2,

>lengthPointer[B][c1] points to a part of an array containing the

>values 2 and 5 (the lengths of the two rays for a Bishop in c1), and

>startPointer[B][c1] points to a part of an array that contains two

>numbers, say 1000 and 2000. Then SQUARES[1000] = b2, SQUARES[1001] =

>a3, SQUARES[2000] = d2, SQUARES[2001] = e3, SQUARES[2002] = f4,

>SQUARES[2003] = g5, SQUARES[2004] = h6.
>

>Of course, this just gives the `pseudo-legal' moves. You have to check

>for pinnings and the like.
>

>Regards,
>
>Luis
>
--
+--------------------------------------+
|| email : vdie...@cs.ruu.nl ||
|| Vincent Diepeveen ||
+======================================+

Robert Hyatt

unread,
May 2, 1996, 3:00:00 AM5/2/96
to

In article <4m94a5$e...@europa.frii.com>,
Tom Kerrigan <kerr...@frii.com> wrote:
-->You need a 128 square board for the 0x88 trick to work, not 256.
-->
-->There is no law stating that a chess program must use a 64 square board. I never
-->use one in Stobor, so I never have to convert between board sizes.
-->
-->Saving a piece list for every type of piece has a few drawbacks. First, every list
-->will need 10 elements, because of promotions. Second, when looping through your
-->piece list you need several loops, instead of just one. Third, you need another
-->array just to remember how many elements you have in each list. The upshot is that
-->this may not be such a hot idea.
-->
-->I've been thinking about GNU-style piece lists. I think two arrays of 48 integers
-->may work well. One is the list of squares the pieces occupy. White pieces start at
-->element 0, white pawns are at 16, black pieces are at 24, and black pawns are at
-->40. The second array is to mark captured pieces. I suppose if a piece is captured,
-->you could set a bit in the integer with the square, but setting and testing this
-->bit sounds quite a bit slower than just keeping another array.
-->

This is exactly how Cray Blitz worked. Saved looping over the board, and
saved keeping "last" pointers for each type of piece.

We did have a last for pawns and last for pieces, which were set at the
root to eliminate testing unused entries after a piece or pawn was *really*
captured...

Martin Borriss

unread,
May 3, 1996, 3:00:00 AM5/3/96
to

In article <4m94a5$e...@europa.frii.com>, kerr...@frii.com (Tom Kerrigan) writes:

> Saving a piece list for every type of piece has a few drawbacks. First, every list

> will need 10 elements, because of promotions. Second, when looping through your

> piece list you need several loops, instead of just one. Third, you need another

> array just to remember how many elements you have in each list. The upshot is that

> this may not be such a hot idea.
>

Mercilessly pointing out the drawbacks of the idea, Tom ;)
The question is whether some advantages make it worth it. Since I recently wrote
the Make_Move_On_PieceList function:
1) Updating the piece list should be easier (faster as you like) since I know the
type of piece moved; and deal only with its sub-list.
2) How is capturing handled? I don't see anything better than looping through
the list of enemy pieces, looking for the captured piece and killing it off.

The 'number of elements' array is needed for captures ('packing' the list, as
Bob Hyatt named it) and for promotions. Otherwise, each array is terminated by
a special element.
To be a little picky - not every list needs 10 elements (only for rooks, bishops
and knights). I am willing to spend the additional 20-30 Bytes, no problem
even for the DOS version :).

> I've been thinking about GNU-style piece lists. I think two arrays of 48 integers

> may work well. One is the list of squares the pieces occupy. White pieces start at

> element 0, white pawns are at 16, black pieces are at 24, and black pawns are at

> 40. The second array is to mark captured pieces. I suppose if a piece is captured,

> you could set a bit in the integer with the square, but setting and testing this

> bit sounds quite a bit slower than just keeping another array.
>

The conclusion is that I definitely have to look at the GNU sources before
making further guesses :) Not so easy to get the sources over here (at least in
finite time). AFAIK, there are not in ftp.math.uni-hamburg.de (which is a
mirror of some chess-site).

Martin

--
Martin....@inf.tu-dresden.de

Martin Borriss

unread,
May 3, 1996, 3:00:00 AM5/3/96
to

In article <LDISSETT.9...@qew.cs.toronto.edu>, ldis...@qew.cs.toronto.edu (Luis A. Dissett) writes:
> In article <4lqe70$s...@pelham.cis.uab.edu> hy...@cis.uab.edu (Robert Hyatt) writes:
>
> > -->2) Using move a move table, where possible moves for each piece on every
> > --> square are precomputed. I had concerns on the size of such a table,
> > --> but an
> > --> estimate I did gave something in the order of 20KB.
> >

This matches my understanding of the move table exactly. No edge detection is
required here. I am not sure if the trouble generating this precomputed data
is worth the switch from the simpler 0x88 implementation. Currently I think
it is not worth it, but I love to hear arguments if the opposite is true...


Martin

--
Martin....@inf.tu-dresden.de

Martin Borriss

unread,
May 3, 1996, 3:00:00 AM5/3/96
to

In article <4m84d3$4...@pelham.cis.uab.edu>, hy...@cis.uab.edu (Robert Hyatt) writes:
> In article <4m7jgg$4...@irz301.inf.tu-dresden.de>,
> Martin Borriss <mb...@irz.inf.tu-dresden.de> wrote:

> As you are going to find out over the next N years, this issue is going to come
> back over and over, because speed is important, and so is flexibility. You will
> find better and better data structures as you get everything written, and then
> re-written, and ....
>

I am sure that you are right. All I want is a reasonable program. I can imagine
that I stop the day the program wins a slow blitz match against me...

[...]

> -->Why do I still have the edge detection problem? The precomputed list contains
> -->only legal moves,e.g. for an Bishop an f1 it should contain an entry to a list
> -->containing g2,h3 ; an entry to a list containing e2..a6. If I make it to h3 or
> -->a6 there are special values telling me to go on the next diagonal or to stop.
> -->While sliding the piece I have to check whether the square is occupied, if so,
> -->I also go on to next diagonal.
>
> In other places. For example, does a bishop attack h7? you can fetch your pre-
> computed move list, and hunt for h7 in the destination squares, or you can simply
> repeatedly add some constant "n" that takes you in that direction for a bishop,
> and if you reach h7 you say "yes" if you run off the board, you say "no".
>
> In the eval, "is this rook connected with another rook?" To do this, you need to
> look along the rank or file (depending on which you are interested in) looking for
> another rook of the same color *or* the edge of the board in that direction. In
> bitmaps this is a whole lot easier and faster, as I don't have *any* edge detection
> problems anywhere, but that's a different story.
>

Understood. Thank you for explaining this.

Martin


--
Martin....@inf.tu-dresden.de

Robert Hyatt

unread,
May 3, 1996, 3:00:00 AM5/3/96
to

In article <4mcd08$l...@irz301.inf.tu-dresden.de>,
Martin Borriss <mb...@irz.inf.tu-dresden.de> wrote:
-->
-->In article <LDISSETT.9...@qew.cs.toronto.edu>, ldis...@qew.cs.toronto.edu (Luis A. Dissett) writes:

-->> In article <4lqe70$s...@pelham.cis.uab.edu> hy...@cis.uab.edu (Robert Hyatt) writes:
-->>
-->
-->This matches my understanding of the move table exactly. No edge detection is
-->required here. I am not sure if the trouble generating this precomputed data
-->is worth the switch from the simpler 0x88 implementation. Currently I think
-->it is not worth it, but I love to hear arguments if the opposite is true...
-->
-->
-->Martin


But you are doing edge detection. You simply load the "edges" as counter limits
to control the loops. However, loading that very limit is just as bad performance
wise as anything else. The 0x88 idea tests something that's already in a processor
register, which is a fairly small savings but which adds up...

And don't forget, when you hit evaluations, you have to look up and down ranks,
files, diagonals, etc. You also have to test for endpoints there as well, and
looking it up in the move table is bad.

Also, don't forget that arrays of arrays (2d arrays) have extra computations
to calculate an offset from the start of an array, which is also costly.

Robert Hyatt

unread,
May 3, 1996, 3:00:00 AM5/3/96
to

In article <4mceen$m...@irz301.inf.tu-dresden.de>,

Martin Borriss <mb...@irz.inf.tu-dresden.de> wrote:
-->
-->In article <4m94a5$e...@europa.frii.com>, kerr...@frii.com (Tom Kerrigan) writes:
-->
-->> Saving a piece list for every type of piece has a few drawbacks. First, every list
-->> will need 10 elements, because of promotions. Second, when looping through your
-->> piece list you need several loops, instead of just one. Third, you need another
-->> array just to remember how many elements you have in each list. The upshot is that
-->> this may not be such a hot idea.
-->>
-->
-->Mercilessly pointing out the drawbacks of the idea, Tom ;)
-->The question is whether some advantages make it worth it. Since I recently wrote
-->the Make_Move_On_PieceList function:
-->1) Updating the piece list should be easier (faster as you like) since I know the
-->type of piece moved; and deal only with its sub-list.
-->2) How is capturing handled? I don't see anything better than looping through
-->the list of enemy pieces, looking for the captured piece and killing it off.

There's an easy way that we used in Cray Blitz. Another board, but each square
points to the corresponding piece list entry (if applicable) or is zero if there's
no piece on that square.

To capture a piece on board[n], you simply get piece_list_pointer[n] which
points to the piece list entry which represents the piece on board[n]. Of
course, this is yet another slow memory reference, but it's better than
scanning the whole piece list for a captured piece...

Steven J. Edwards

unread,
May 4, 1996, 3:00:00 AM5/4/96
to

Here's something about move generators you should think about before
you write the first line of code:

Why do you think that only *one* generator is allowable?

Also,

Why do you think that only *one* move representation is needed?

Spector has three move generators specialized by task.

The main generator is bitboard based and has a fairly complex move
representation; complex enough that pointers to moves are used more
often than passing the move structure itself.

The high speed tactical search used for move ordering and fast mate
detection at class one nodes thoughout the regular search. It uses
offset generation and has a simple move representation. It runs about
seven times faster than the regular search as it knows nothing about
position evaluation or other niceties.

The tablebase generator has a move generator similar but slightly
simpler than that in the tactical search. It also has a matched
retrogenerator that generates all possible moves that could have
reached the current position.

-- Steven (s...@mv.mv.com)


Tom Kerrigan

unread,
May 5, 1996, 3:00:00 AM5/5/96
to

Hum, I would have

int list_sub[64]; /* or 128, w/ 0x88 */

such that list_sub[x] is the subscript of the piece list element for the piece on
square x. Not sure this would be faster, but I do think it would cause fewer
headaches.

Cheers,
Tom

_______________________________________________________________________________
Tom Kerrigan kerr...@frii.com O-

As soon as we started programming, we found to our surprise that it
wasn't as easy to get programs right as we had thought. Debugging had
to be discovered. I can remember the exact instant when I realized
that a large part of my life from then on was going to be spent in
finding mistakes in my own programs.
-- Maurice Wilkes discovers debugging, 1949

Urban Koistinen

unread,
May 5, 1996, 3:00:00 AM5/5/96
to

Vincent Diepeveen (vdie...@cs.ruu.nl) wrote:

: For PC:


:
: int movearray[2][6][64], *a;
: a = movearray[side][piece];
: ...

: is slower than or equal:
:
: int piecelist[2][8][64],*a;
: a = movearray[side][piece];
: ...

: that 8 <-> 6 difference can mean that it uses

: 'imul' for 6, which is a slow instruction.

: For this reason my arrays for move generation are larger than
: they should be, and so are a lot of other arrays.

int piecelist[6][2][64];
might win both speed and space.

Vincent Diepeveen

unread,
May 8, 1996, 3:00:00 AM5/8/96
to

Smart!

However it fails: you can store:

xgen = &piecelist[color][0][0];

in a pointer xgen, and save instructions.

In all cases: optimizing sucks!
In fact it is a waste of time, but necessary i'm afraid.

How far is the fast endgamedatabase program project currently?

Vincent Diepeveen
vdie...@cs.ruu.nl

Tom Kerrigan

unread,
May 9, 1996, 3:00:00 AM5/9/96
to

A necessary waste of time, Vincent? :)

It seems to me that half the fun of writing a chess program is getting it to run
faster than the next guy's.

Cheers,
Tom

_______________________________________________________________________________
Tom Kerrigan kerr...@frii.com O-

Deck Us All With Boston Charlie

Deck us all with Boston Charlie,
Walla Walla, Wash., an' Kalamazoo!
Nora's freezin' on the trolley,
Swaller dollar cauliflower, alleygaroo!

Don't we know archaic barrel,
Lullaby Lilla Boy, Louisville Lou.
Trolley Molly don't love Harold,
Boola boola Pensacoola hullabaloo!
-- Walt Kelly

Urban Koistinen

unread,
May 9, 1996, 3:00:00 AM5/9/96
to

Vincent Diepeveen (vdie...@cs.ruu.nl) wrote:

: In <4minh7$g...@oden.abc.se> m10...@abc.se (Urban Koistinen) writes:

: >Vincent Diepeveen (vdie...@cs.ruu.nl) wrote:
: >
: >: For PC:
: >:
: >: int movearray[2][6][64], *a;
: >: a = movearray[side][piece];
: >: ...
: >
: >: is slower than or equal:
: >:
: >: int piecelist[2][8][64],*a;
: >: a = movearray[side][piece];
: >: ...
: >
: >: that 8 <-> 6 difference can mean that it uses
: >
: >: 'imul' for 6, which is a slow instruction.
: >
: >: For this reason my arrays for move generation are larger than
: >: they should be, and so are a lot of other arrays.
: >
: >int piecelist[6][2][64];
: >might win both speed and space.

: Smart!

: However it fails: you can store:

: xgen = &piecelist[color][0][0];

: in a pointer xgen, and save instructions.

Why not:
xgen = &piecelist[0][color][0];
?

: In all cases: optimizing sucks!

It can be fun too.

: In fact it is a waste of time, but necessary i'm afraid.

Like chess then.

: How far is the fast endgamedatabase program project currently?

My plan is now to do:
KK, KXK, KXXK, KPK, KPPK, KXPK, ...
where X is any one of "QRBNqrbn" and P is any one of "Pp".
I have done KK and part of KXK.
Also, I have wasted time optimizing 6-man endgames further.

Reply all
Reply to author
Forward
0 new messages