I wonder if anyone can tell me about the performance of software move
generators. I've found it pretty hard to benchmark them (from the ones I
found source code for) separate from the rest of the engines.
Pointers to papers on the subject are welcome too!
Thanks,
Ties.
Regards Dan Andersson
"Ties Bos" <tb...@huygens.org> skrev i meddelandet
news:8p5930$46s$1...@ares.cs.utwente.nl...
>> I've spent some time coding a move generator for chess on a bunch of
> FPGAs.
>> I hope some of the chess engines can benefit from its speed (couple
> of
>> million psuedo-legal moves per second).
To give you a quick comparison, from the opening position, crafty can
generate about 5M moves per second on a PII/400 notebook (66mhz FSB).
In a typical middlegame position, it can generate about 7M moves per
second on that same machine.
> What are FPGAs? CPUs? Did you publish the code? I would be very
> interested, if it is written in C. Could you email it or post it here?
Field Programmable Gate Arrays. They are a sort-of special-purpose piece
of hardware that can be configured to do lots of things... they aren't
based on the CPU concept, nor do they execute instructions as you would
think of them.
>> I wonder if anyone can tell me about the performance of software
> move
>> generators. I've found it pretty hard to benchmark them (from the
> ones I
>> found source code for) separate from the rest of the engines.
For crafty, run it on the platform of your choice and type "perf"
to have it tell you how fast it can generate, and then generate/make/unmake
moves.
> Even moves per second are difficult to compare, that's true.
>> Pointers to papers on the subject are welcome too!
> See the code for the bitmap move generator in the chess program
> CRAFTY.
> --
> Alexa
--
Robert Hyatt Computer and Information Sciences
hy...@cis.uab.edu University of Alabama at Birmingham
(205) 934-2213 115A Campbell Hall, UAB Station
(205) 934-5473 FAX Birmingham, AL 35294-1170
FPGA=Field Programmable Gate Array. It's a device usually programmed in VHDL,
not C. It's a collection of gates and other low level logic that is very
popular for many types of signal processing applications (amoung other things).
Bob Pawlak
Chess Program Reviews
http://members.aol.com/rjpawlak/ch_widow.html
http://www.chessopolis.com
Ok, I tried 'perf' with crafty on my PIII 500 MHz. It says it can
generate/make/unmake about 2 Million moves/second.
From my FPGA I can do the same (except for castling and enpassent captures,
although generated, these need a little more CPU to actually make/unmake
since more than 2 squares are involved) at a rate of about 5 Million
moves/second. Besides generating the moves, the FPGA also in parallel keeps
track of material counts and generates 64-bit hashes for the current
position of all pieces and also of just the pawns.
Does the 'perf' test in crafty include some of the material counts or hashes
too?
The FPGA doesn't really generate all moves at once of course, it generates
one at a time such that high-value captures come first and non-captures come
last as I hope is useful for most a-b based search algorithms. It is also
possible to check whether a particular move is valid from the current
position such that a move coming from the transposition table can be
validated.
Currently I am awaiting a new PCB that closely connects the FPGA and a CPU.
Indeed the PCI interface does not allow me to run the FPGA at full speed. It
is idling most of the time since the interface is too slow.
I am in the process of writing down what I built, and will put things
including the VHDL code onto the web once I am done.
Thanks for your input,
Ties.
"Dan Andersson" <rhq...@tninet.se> wrote in message
news:8p5qft$fji$1...@cubacola.tninet.se...
> Ok, I tried 'perf' with crafty on my PIII 500 MHz. It says it can
> generate/make/unmake about 2 Million moves/second.
> From my FPGA I can do the same (except for castling and enpassent captures,
> although generated, these need a little more CPU to actually make/unmake
> since more than 2 squares are involved) at a rate of about 5 Million
> moves/second. Besides generating the moves, the FPGA also in parallel keeps
> track of material counts and generates 64-bit hashes for the current
> position of all pieces and also of just the pawns.
> Does the 'perf' test in crafty include some of the material counts or hashes
> too?
The Make/UnMake part of perf does _all_ of that. Make/UnMake updates all
the material counts, the hash keys, chess board, 50 move rule counter,
etc.
Actually you can make it generate all moves when you're in evaluation. These
moves can be put to good use. (specially if you get them for free )
Tony
If I can write to memory independent of the main CPU on my next board, I
could actually do so. Right now I only have a PCI slave interface which
really is a bottleneck. With DMA, these moves could indeed be almost 'free'
as you would only need a memory-read per move to get them as the FPGA could
put them in memory while the main CPU is evaluating, as you suggested.
I'll definitely keep that in mind.
Ties.
What is the cost of your FPGA board?
Urban Koistinen - now into electronics
> From my FPGA I can do the same (except for castling and enpassent captures,
> although generated, these need a little more CPU to actually make/unmake
> since more than 2 squares are involved) at a rate of about 5 Million
> moves/second.
Rather disappointing -- it's only on the order of 2 times faster than
PIII 500 MHz, which I suspect is a standard PC today.
What is it that takes so much time?
> Does the 'perf' test in crafty include some of the material counts or hashes
> too?
Yes, everything that's done for an ordinary move.
--
Anders Thulin Anders....@telia.se 040-10 50 63
Telia Prosoft AB, Box 85, S-201 20 Malmö, Sweden
The Move Generator is primarily based on Belle's hardware from 1981. About
equal time goes into finding a victim and an aggressor.
All victims are found at once and about 1/3 of the time goes into the
compare tree that selects the highest-value victim.
From timing analyses we found that the design is mostly slowed down by
high-fanout circuits and lots of routing delays.
I can note here that my 5M moves/second are full moves, including both a
move and an unmove. On current Xilinx Virtex chips pre-routing timing
analyses says we can expect the speed to go up to about 12M moves/second.
I expect the Pentium figures also include full moves and that moves are
prioritized such that ordering is failry good for Alpha-Beta searching.
Right? Maybe we did this the wrong way, but I can't really see how to speed
up the circuit another order of magnitude or so.
Please let me know what you think,
Ties.
> From timing analyses we found that the design is mostly slowed down by
> high-fanout circuits and lots of routing delays.
I don't recall the Belle algorithm, so I can't comment well on this.
But it suggests some kind of structure problem, I think.
> On current Xilinx Virtex chips pre-routing timing
> analyses says we can expect the speed to go up to about 12M moves/second.
A factor 6 ... that's more like it. Se below.
> I expect the Pentium figures also include full moves and that moves are
> prioritized such that ordering is failry good for Alpha-Beta searching.
> Right?
Full moves certainly -- but that's more a question of representation
than of any fundmental algorithm problem. Don't know about ordering -- I've
never studied Crafty that closely.
> Maybe we did this the wrong way, but I can't really see how to speed
> up the circuit another order of magnitude or so.
Wrong way ... don't know about that. Perhaps it's just that I expected
a hardware implementation to be faster on at least an order of magnitude,
preferrably two. It used to be, but perhaps it's not that simple anymore.
Perhaps speed is not an important factor to you.
I assume you know about the bitboard-based move generation that Crafty
uses (essentially a series of 64-bit binary operations, followed by
a conversion from bitboard to 'move' format), and I assume that you can
compare that approach with both the Belle algorithm and the limitations
of FPGA.
> I don't recall the Belle algorithm, so I can't comment well on this.
> But it suggests some kind of structure problem, I think.
It uses a 64-square state... and the find victim / find attacker operations
have to look through squares for sliding pieces. That is one delay. The
other is that you have to (using combinatorics) select the most valuable
victim and least valuable attacker, which turns into a tree.
> Full moves certainly -- but that's more a question of representation
> than of any fundmental algorithm problem. Don't know about ordering -- I've
> never studied Crafty that closely.
One thing that can't be done in the "belle" approach is things like
"killer moves", "hash table move", "history moves" and so forth. That
hurts. For the same ply depth, (comparing Belle with Cray Blitz, where
Ken and I used very similar search extensions, etc) his tree was about 8-10x
bigger than mine. But he was 8-10x faster too, so it was a "wash" in that
regard.
> Wrong way ... don't know about that. Perhaps it's just that I expected
> a hardware implementation to be faster on at least an order of magnitude,
> preferrably two. It used to be, but perhaps it's not that simple anymore.
> Perhaps speed is not an important factor to you.
FPGAs aren't great devices for doing this. Belle was based on something
similar (PLAs). DT/Deep Blue used the same approach but used an ASIC
device which gives way more flexibility in hiding some of the delays.
> I assume you know about the bitboard-based move generation that Crafty
> uses (essentially a series of 64-bit binary operations, followed by
> a conversion from bitboard to 'move' format), and I assume that you can
> compare that approach with both the Belle algorithm and the limitations
> of FPGA.
Belle didn't use bitmaps. It had a 64-word board, where each square on
the board was one complete circuit that could enumerate the squares
attacked by the piece on that square, and it could find all the pieces
that were attacking that square.
Ties.