Does Unmake Move Really Save Time?

Adrian Jackson

未讀,

2001年3月7日晚上7:55:122001/3/7

收件者：

I've noticed that many web tutorials on minimax-type search
algorithms, and most chess source code I've seen, feature an "unmake"
move. Now, I've looked at some of that code (although I don't really
understand it right now). My question: does going through all the
contortions to unmake a move really save time versus, say, just
copying over the new move with the old one? Or is there some
benefit(s) to unmake that I'm missing?

Robert Hyatt

未讀,

2001年3月7日晚上10:07:082001/3/7

收件者：

The problem is the PC architecture. The copy-Make approach was the first
thing I tried in Crafty, because it worked perfectly on the Cray. I never
paid any attention to the "PeeCee" architecture back then, and only discovered
the memory bottleneck after I wrote the first version. The copy-Make was
way over twice as slow as the Make-UnMake approach I now use...

Hope that helps...

--
Robert Hyatt Computer and Information Sciences
hy...@cis.uab.edu University of Alabama at Birmingham
(205) 934-2213 115A Campbell Hall, UAB Station
(205) 934-5473 FAX Birmingham, AL 35294-1170

Adrian Jackson

未讀,

2001年3月8日凌晨1:03:562001/3/8

收件者：

On 8 Mar 2001 03:07:08 GMT, Robert Hyatt <hy...@crafty.cis.uab.edu>
wrote:

>Adrian Jackson <n...@spam.com> wrote:
Re: subject line....

>
>The problem is the PC architecture. The copy-Make approach was the first
>thing I tried in Crafty, because it worked perfectly on the Cray. I never
>paid any attention to the "PeeCee" architecture back then, and only discovered
>the memory bottleneck after I wrote the first version. The copy-Make was
>way over twice as slow as the Make-UnMake approach I now use...
>
>Hope that helps...

Well, it helps me decide to invest the time figuring unmake out. :)

Could you explain the "memory bottleneck" you're referring to? Is it
that, for Crafy, unmake has to write a lot less memory than a full
copy-over? Enough to cover the overhead of the logic involved in
unmaking a move?

ssalmine

未讀,

2001年3月8日上午8:27:012001/3/8

收件者：

> The problem is the PC architecture. The copy-Make approach was the first
> thing I tried in Crafty, because it worked perfectly on the Cray. I never
> paid any attention to the "PeeCee" architecture back then, and only discovered
> the memory bottleneck after I wrote the first version. The copy-Make was
> way over twice as slow as the Make-UnMake approach I now use...

Do you happen to remember how big your position structure was back then?
I'm using the simple copy approach now and I use about 15 bitboards in
my structure. It would be interesting to know where the line goes: how
bigh should the structure be to just unmake. 15 bitboards mean about 120
bytes to copy and VC seems to copy using rep stosd. That is about 30
repetitions.

Severi

Jeffrey A. Young

未讀,

2001年3月8日上午9:56:102001/3/8

收件者：

In article <qimdatcroce8e4qfn...@4ax.com>,

"Copying over the new move with the old one" will have to do all the
contortions of unmake anyway, right? In addition to remaking the
"old" move. And the unmake functionality is going to be the same
no matter what the "old" move is, so it makes sense from a software
design perspective to just factor that functionality out into its
own function/procedure/subroutine/module.

I'm not sure what you're thinking, but the only alternative to the
above that I see is to save the entire current board data structure
before making either move, so that you can just "go back" to that
data structure rather than doing an "unmove" to reproduce it. And
that will be prohibitively expensive in memory space very quickly
as you go down the tree of moves.

The only "contortions" needed in an implementation of unmove
are: having to save en-passant status, having to save what piece
was captured if any, and having to save any change in castling
rights if any (in addition to the to and from squares of course).

Jeff

Robert Hyatt

未讀,

2001年3月8日上午11:09:402001/3/8

收件者：

The issue is how much memory do you have to "copy" before you make a move,
so that instead of having to "unmake" you simply use the copy you had before
the make was done. A PC can sustain maybe 25M words per second through
memory. Maybe more for faster bus speeds, but that is not important.

A 1ghz processor wants to execute 2 instructions every nanosecond. That
means 8 bytes per nanosecond for the instruction fetches, plus the data
needed by those instructions. If you can hig around 500K nodes per second
on that machine, that is 500K copy operations per second. Depending on
the size of your chess board/data structures supporting it, this might be
a 100 byte copy or a 500 byte copy. Either is going to get in the way
of the CPU-to-memory instruction stream.

As a result, you have to spend a world of time trying to control your
memory traffic to avoid letting this bottleneck get out of hand...

Robert Hyatt

未讀,

2001年3月8日上午11:12:082001/3/8

收件者：

> Severi

My first guess would be 15 bitboards (6 for white, 6 for black, 1 for all
bishops/queens, 1 for all rooks/queens, and one for occupied squares. I
think I actually had 16 (occupied white and occupied black). That is
16*8 bytes. I also had a 64 byte array for the board itself to make it
easier to find which piece is on which square. So say 200 bytes and you
would be pretty close after adding in the 50-move counter, EnPassant and
castling status, etc.

Adrian Jackson

未讀,

2001年3月8日下午5:02:472001/3/8

收件者：

On 8 Mar 2001 09:56:10 -0500, jyo...@steptools.com (Jeffrey A. Young)
wrote:

>In article <qimdatcroce8e4qfn...@4ax.com>,
>Adrian Jackson <n...@spam.com> wrote:
>> I've noticed that many web tutorials on minimax-type search
>>algorithms, and most chess source code I've seen, feature an "unmake"
>>move. Now, I've looked at some of that code (although I don't really
>>understand it right now). My question: does going through all the
>>contortions to unmake a move really save time versus, say, just
>>copying over the new move with the old one? Or is there some
>>benefit(s) to unmake that I'm missing?
>
>"Copying over the new move with the old one" will have to do all the
>contortions of unmake anyway, right? In addition to remaking the
>"old" move. And the unmake functionality is going to be the same
>no matter what the "old" move is, so it makes sense from a software
>design perspective to just factor that functionality out into its
>own function/procedure/subroutine/module.

By contortions I meant the logic and move/capture histories required
by the program to figure out how to take back the move. A copy-over
seems much simpler to program and maintain in comparison.

>I'm not sure what you're thinking, but the only alternative to the
>above that I see is to save the entire current board data structure
>before making either move, so that you can just "go back" to that
>data structure rather than doing an "unmove" to reproduce it.

Just copy the parent node into a temp node, make the temp node a
successor by applying the next move to it, and send that successor
node to the recursive search function. Next time around, we do the
same thing--we just overwrite the temp node with the parent node, make
the next move on the temp node, send the temp node to the search.

>And
>that will be prohibitively expensive in memory space very quickly
>as you go down the tree of moves.

It's just one extra state structure per ply, isn't it?

>The only "contortions" needed in an implementation of unmove
>are: having to save en-passant status, having to save what piece
>was captured if any, and having to save any change in castling
>rights if any (in addition to the to and from squares of course).

That sounds reasonable.

Adrian Jackson

未讀,

2001年3月8日下午5:16:062001/3/8

收件者：

On 8 Mar 2001 16:09:40 GMT, Robert Hyatt <hy...@crafty.cis.uab.edu>
wrote:

Okay, that makes sense, at least for a game like chess. Thanks!

My goal right now is to write a general-purpose two-player, zero-sum
game search agent into which I can plug any such game I want without
having to change a line of the search agent's code. I've written a
simple (negamax, alpha-beta, evaluation-based move-ordering)
preliminary of this in Java, although I'm considering moving it to C++
(for speed). Before I do that, though, I want to work out some of the
kinks. One of those is whether to use make/unmake or make/copy (my
current method if vastly less efficient than either).

That's more than you wanted to know, I'm sure. :)

Robert Hyatt

未讀,

2001年3月9日凌晨12:59:122001/3/9

收件者：

I am not sure a "general" search is feasible. It can be done, but it
is not going to do well in most games. As it would seem you want to
make it very generic. Yet for chess, you want to implement specific
search extensions that are based on characteristics of the game. IE
extend if in check, for certain passed pawn cases, certain threat
conditions, etc. For checkers those would be completely meaningless
and you would do something different, particularly like extending on
captures since they are forced. For other games, you will probably
find far different things to extend on...

food for thought if nothing else.. :)

Robert Hyatt

未讀,

2001年3月9日凌晨1:00:442001/3/9

收件者：

This works. But slowly. The PC can't stand that kind of memory
traffic in addition to the normal instruction fetches..

>>And
>>that will be prohibitively expensive in memory space very quickly
>>as you go down the tree of moves.

> It's just one extra state structure per ply, isn't it?

Yes. Memory is cheap. You probably can't search beyond 60 plies, and
60 times anything reasonable is still reasonable. But the cost of
copying over and over is really tough to handle on a PC.

>>The only "contortions" needed in an implementation of unmove
>>are: having to save en-passant status, having to save what piece
>>was captured if any, and having to save any change in castling
>>rights if any (in addition to the to and from squares of course).

> That sounds reasonable.

Adrian Jackson

未讀,

2001年3月9日凌晨2:55:222001/3/9

收件者：

On 9 Mar 2001 06:00:44 GMT, Robert Hyatt <hy...@crafty.cis.uab.edu>
wrote:

>This works. But slowly. The PC can't stand that kind of memory

>traffic in addition to the normal instruction fetches..

Well, I went ahead and figured out/implemented unmake in my program,
and much to my surprise, it led to a seven-fold (!) increase in speed.

>>>And
>>>that will be prohibitively expensive in memory space very quickly
>>>as you go down the tree of moves.
>
>> It's just one extra state structure per ply, isn't it?
>
>Yes. Memory is cheap. You probably can't search beyond 60 plies, and
>60 times anything reasonable is still reasonable. But the cost of
>copying over and over is really tough to handle on a PC.

Apparently! Anyway, thanks for the insight, Robert! :)

Adrian Jackson

未讀,

2001年3月9日凌晨3:15:342001/3/9

收件者：

On 9 Mar 2001 05:59:12 GMT, Robert Hyatt <hy...@crafty.cis.uab.edu>
wrote:

>I am not sure a "general" search is feasible. It can be done, but it

>is not going to do well in most games. As it would seem you want to
>make it very generic. Yet for chess, you want to implement specific
>search extensions that are based on characteristics of the game. IE
>extend if in check, for certain passed pawn cases, certain threat
>conditions, etc. For checkers those would be completely meaningless
>and you would do something different, particularly like extending on
>captures since they are forced. For other games, you will probably
>find far different things to extend on...
>
>food for thought if nothing else.. :)

Yeah, there are some problems I haven't decided how to deal with yet.
I'd like to implement as much general-purpose stuff as I can, though.
Alpha-beta minimax with a zero-window search, quiescence serach, a
transposition table, and various move-ordering heuristics. That'd be
a good foundation to have without having to write (and debug) any of
the code for it. Perhaps there could even be provisions for opening
books and EGTBs.

Surely it's worth a shot? :)

Urban Koistinen

未讀,

2001年3月9日清晨5:03:362001/3/9

收件者：

RH> Adrian Jackson <n...@spam.com> wrote:
RH> > Could you explain the "memory bottleneck" you're referring
RH> to? Is it > that, for Crafy, unmake has to write a lot less
RH> memory than a full > copy-over? Enough to cover the overhead of
RH> the logic involved in > unmaking a move?

RH> The issue is how much memory do you have to "copy" before you
RH> make a move, so that instead of having to "unmake" you simply
RH> use the copy you had before the make was done. A PC can sustain
RH> maybe 25M words per second through memory. Maybe more for
RH> faster bus speeds, but that is not important.

Don't copy before, make makemove write in the new place.
Don't use main memory, use cache memory instead. (much faster)

RH> A 1ghz processor wants to execute 2 instructions every
RH> nanosecond. That means 8 bytes per nanosecond for the
RH> instruction fetches, plus the data needed by those
RH> instructions.

You have changed units from words to bytes.
That is confusing. Why did you do that?

RH> If you can hig around 500K nodes per second on
RH> that machine, that is 500K copy operations per second.
RH> Depending on the size of your chess board/data structures
RH> supporting it, this might be a 100 byte copy or a 500 byte
RH> copy. Either is going to get in the way
RH> of the CPU-to-memory instruction stream.

Your lower limit is to large.
With bitboards it might be:
4*occupied, white, kings, queens, rooks, bishops, knights, pawns,
and say 16 bytes for metainfo like hash, game50, ep status etc.

With make/unmake memory cost is:
Make:
8 bitmaps read and written
Unmake:
8 bitmaps read and written
Total: 16

With a copying make_move memory cost is:
13 bitmaps read and written

That is: 16 vs 13

Without the queen bitmap (using B&R for that) it would be 16+ vs 12.
With a bitmap/piece type&colour it would be 14 vs 18.

So, depending on your data structures,
a copying make_move can be faster.

Urban Koistinen

Robert Hyatt

未讀,

2001年3月9日上午11:36:522001/3/9

收件者：

Sure.. I'm just thinking of the problem issues. IE in Chess a q-search
might follow captures only. Or it might include checks and check-evasion
moves. In checkers, you _must_ make a capture if it is possible, while in
chess it is optional. That will change the rules of the q-search significantly.

Would certainly be interesting to try, perhaps finding a way to bury all the
'specific' sorts of rules in a separate module..

Robert Hyatt

未讀,

2001年3月9日上午11:47:292001/3/9

收件者：

Urban Koistinen <I...@abc.se> wrote:
> RH> Adrian Jackson <n...@spam.com> wrote:
> RH> > Could you explain the "memory bottleneck" you're referring
> RH> to? Is it > that, for Crafy, unmake has to write a lot less
> RH> memory than a full > copy-over? Enough to cover the overhead of
> RH> the logic involved in > unmaking a move?

> RH> The issue is how much memory do you have to "copy" before you
> RH> make a move, so that instead of having to "unmake" you simply
> RH> use the copy you had before the make was done. A PC can sustain
> RH> maybe 25M words per second through memory. Maybe more for
> RH> faster bus speeds, but that is not important.

> Don't copy before, make makemove write in the new place.
> Don't use main memory, use cache memory instead. (much faster)

I don't know how to control that from within a C program. Or even within
an asm program. Cache is transparent from the programmer's point of view.

> RH> A 1ghz processor wants to execute 2 instructions every
> RH> nanosecond. That means 8 bytes per nanosecond for the
> RH> instruction fetches, plus the data needed by those
> RH> instructions.

> You have changed units from words to bytes.
> That is confusing. Why did you do that?

because most people think in terms of words. multiply by 4 and everything
stays the same. 25M words per second, 100M bytes per second... I use the
two interchangably without thinking about it...

> RH> If you can hig around 500K nodes per second on
> RH> that machine, that is 500K copy operations per second.
> RH> Depending on the size of your chess board/data structures
> RH> supporting it, this might be a 100 byte copy or a 500 byte
> RH> copy. Either is going to get in the way
> RH> of the CPU-to-memory instruction stream.

> Your lower limit is to large.
> With bitboards it might be:
> 4*occupied, white, kings, queens, rooks, bishops, knights, pawns,
> and say 16 bytes for metainfo like hash, game50, ep status etc.

each of those is 8 bytes. (occupied, etc.) 12 bitboards for the individual
pieces, 4 more for occupied, total=16. 16*8=128 bytes absolute minimum. In
my case (and in the case of many others that have done this) I choose to
also keep a 64 byte array indicating the value of each piece on each specific
square, to avoid having to search thru the bitmaps to see what I am actually
capturing. we are now at 200 bytes total and still growing...

> With make/unmake memory cost is:
> Make:
> 8 bitmaps read and written
> Unmake:
> 8 bitmaps read and written
> Total: 16

You have to update the bitboard for the piece being moved. If it is a
capture, you have to update the bitboard for the piece being removed.
Then you have to update 1 occupied square bitboard for sure. You might
have to update the others worst case, but not every time. The best case
updates only 2, which happens for king, knight and pawn moves that are
not captures. Updating all the occupied squares bitmaps only happens for
the case of a queen moving and capturing another queen, which is so rare as
to be non-bothersome effectively.

> With a copying make_move memory cost is:
> 13 bitmaps read and written

> That is: 16 vs 13

No. With Copy you copy them _all_. Every time. With a make/unmake you
copy less than half worst case, and less than 1/4 best case. That is a
significant difference.

> Without the queen bitmap (using B&R for that) it would be 16+ vs 12.
> With a bitmap/piece type&colour it would be 14 vs 18.

> So, depending on your data structures,
> a copying make_move can be faster.

> Urban Koistinen

I don't believe it can _ever_ be faster. In a few rare cases, they might be
close. Butyour update count for make/unmake was way too pessimistic. For
copy/make you have the worst case _every_ time. For make/unmake you do
1/4 of that memory traffic to 1/2 of that memory traffic, worst case.

Severi Salminen

未讀,

2001年3月9日下午1:34:422001/3/9

收件者：

> > Do you happen to remember how big your position structure was back then?
> > I'm using the simple copy approach now and I use about 15 bitboards in
> > my structure. It would be interesting to know where the line goes: how
> > bigh should the structure be to just unmake. 15 bitboards mean about 120
> > bytes to copy and VC seems to copy using rep stosd. That is about 30
> > repetitions.
>
> > Severi
>
> My first guess would be 15 bitboards (6 for white, 6 for black, 1 for all
> bishops/queens, 1 for all rooks/queens, and one for occupied squares. I
> think I actually had 16 (occupied white and occupied black). That is
> 16*8 bytes. I also had a 64 byte array for the board itself to make it
> easier to find which piece is on which square. So say 200 bytes and you
> would be pretty close after adding in the 50-move counter, EnPassant and
> castling status, etc.

I rechecked and I'm using 16 bitboards (including 3 rotated). And in addition to
those 8*16=128 bytes I also have to copy the current board in the beginning of
searching the current node. So for every move to be searched I have to copy
128+n*128 bytes of information. That begins to sound quite a lot. If I wrote a
separate unmake() I could increase the number of bitboards to further increase
the efficiency of search. I'll probably write it tomorrow and report the results
at CCC and here.

Severi

Urban Koistinen

未讀,

2001年3月10日上午9:53:362001/3/10

收件者：

RH> You have to update the bitboard for the piece being moved. If
RH> it is a capture, you have to update the bitboard for the piece
RH> being removed. Then you have to update 1 occupied square
RH> bitboard for sure. You might have to update the others worst
RH> case, but not every time. The best case updates only 2, which
RH> happens for king, knight and pawn moves that are not captures.
RH> Updating all the occupied squares bitmaps only happens for the
RH> case of a queen moving and capturing another queen, which is so
RH> rare as to be non-bothersome effectively.

I would have thought that every move makes a difference to the
occupied bitmap and so to all permutations of it.
Do you mean that the rotated bitmaps are not the bits of
of the occupied bitmap in a different order?
Perhaps you mean something else by occupied bitboard than
one bit per square set to 1 for squares with something on it
and 0 for empty squares?

Strange.

Urban Koistinen

Robert Hyatt

未讀,

2001年3月10日上午11:00:002001/3/10

收件者：

Urban Koistinen <I...@abc.se> wrote:
> RH> You have to update the bitboard for the piece being moved. If
> RH> it is a capture, you have to update the bitboard for the piece
> RH> being removed. Then you have to update 1 occupied square
> RH> bitboard for sure. You might have to update the others worst
> RH> case, but not every time. The best case updates only 2, which
> RH> happens for king, knight and pawn moves that are not captures.
> RH> Updating all the occupied squares bitmaps only happens for the
> RH> case of a queen moving and capturing another queen, which is so
> RH> rare as to be non-bothersome effectively.

> I would have thought that every move makes a difference to the
> occupied bitmap and so to all permutations of it.

I have 4 occupied square bitmaps. one for all black pieces, one for all
white pieces, one for all bishops/queens, and one for all rooks/queens.
non-captures only change one or two of them. captures change two or
more.

> Do you mean that the rotated bitmaps are not the bits of
> of the occupied bitmap in a different order?

I wasn't even thinking of the rotatated bitmaps since rotated was not mentioned
and not everybody is using them. But if you do, then yes, there are three more
for the three rotations and these are updated for every move.

> Perhaps you mean something else by occupied bitboard than
> one bit per square set to 1 for squares with something on it
> and 0 for empty squares?

No... your idea is right. Just the original discussion didn't mention
"rotated" so I assumed they were not being used (there are several non-rotated
bitmap programs in development by different people).

> Strange.

> Urban Koistinen