Here is an update of recent work on Arasan, my Windows chess
program. Version 2.2 was released last December and is publically
available (on caissa.onenet.net, among other places).
I have spent the last few months working on the search engine, trying to
improve speed and performance. I tried some experiments with a bitboard
move representation, but I have backed off on that for now.
With bitboards, I got a nice reduction in tree size (bitboards allow a
more accurate swap-down analysis than I had before, and this improves
move ordering and allows better "culling" of losing captures in the
quiescence search).
However, currently the program relies pretty heavily on incrementally
generated attack information, and I didn't have that implemented to work
with bitboards, so I had to compute attacks as needed. Unfortunately,
the current program "needs" to do this much more often than Crafty does,
so the loss from computing attacks more than balanced out the gains from
bitboards. This is fixable, I'm sure, but I've put off further work on it
for now.
Then I tried some cleanup on the search module. Version 2.2 does a
limited quiescence search (generally 4 plies). It uses a "threat score"
function to do two things: help determine when to extend the search
further, and penalize apparently hung, trapped or pinned pieces at
terminal nodes. I took out the threat function and now use an
unlimited capture search (actually, the limit is 40 plies, but it doesn't
ever get there). Of course, when I first did this, the tree size blew
up, but I have gradually fixed things to the point where it's well-
controlled.
Making this change has improved nodes/sec. by about a factor of two, and
the tree sizes are also generally smaller. Version 2.2 sometimes finds
winning combinations one ply before Version 3.0 does, but overall 3.0
is faster and better. E.g. 2.2 solves about 237 of the WAC problems at
1 minute/move (Pentium 60), 3.0 solves about 250.
I have also put in two killer moves per ply (2.2 had one) and
implemented the history heuristic. The order of moves is now: principal
variation, winning captures and promotions (in order of estimated gain),
killers, and history moves (sorted on history scores). This hasn't
produced enormous benefits over the old move ordering scheme, but it is
measurably better.
I tried a few things that didn't work out. The quiescence search currently
doesn't use the hash table. I have found that if I store all the quiescence
nodes in the table, the table fills up fast with nodes that aren't really
much use (the regular search can't use them). If I just have the qsearch
query the hash table but not insert nodes, or if I insert only the first
couple of plies of quiescence nodes into the table, the tree size goes
down by 5% or so but the overhead of searching the table eliminates any
benefit. Your mileage may vary, but I haven't had any luck here.
Also I have tried recapture extensions (a la Crafty) with mixed results.
Performance on the WAC test suite goes up a little bit but it does worse
on the Bratko-Kopec tests. Not clearly a win, so far.
Check extensions are interesting, too. Generally these do seem to help,
but there are some positions where they hurt. For example, Fine's
Basic Chess Endings position #20:
8/8/8/3P1k/7P/3K1p/8/8 b - -
Arasan 3.0 finds the solution (Kf4) in 8 ply. With check extensions on it
takes 205868 nodes and 27 seconds; without check extensions it takes
89355 nodes and 11 seconds.
I think the biggest challenge in doing all this has been getting the bugs
out. I had a nasty one where the move sorting routine would drop a move
out of the list and put a duplicate move from the list in its place. Nothing
obvious broke, but performance went to pieces and nothing I did seemed to
help. I finally found it by looking through a 4 Meg trace file to find out
why the qsearch was so big on one problem, and noticed the duplicate
moves.
There were some pretty nasty bugs in the evaluation function, too - again,
nothing obvious: it doesn't crash or make illegal moves, it just plays
badly. Those are the hardest to find.
I also had an off-by-one test in the hash retrieval code (> sign instead of
>=) which I found only because I noticed it was failing WAC 38, which is
one of the easier problems.
The new version should be released some time this summer.
--Jon
--
Jon Dart
jd...@netcom.com
--
Eugene V. Debs for President!
You had the same experience I did. Bitboards were initially worse, and took
two significant things to make them fast: (1) working with them until it
became second nature to find cute bit-twiddling tricks and masks that make
some test absolutely, incredibly efficient (the square of the pawn is Crafty
is one single AND operation... as opposed to a procedure in Cray Blitz.)
(2) working out and implementing the rotated bitmap stuff that makes the
pre-computed attack stuff unnecessary, because it's actually faster to compute
them as needed since they are simply table lookups now.
Don't despair. Crafty started at 5K nodes per sec on the Sparc 20, and ended
up at 30K. And there's still more to go. I still think it can be made 2x
faster over time. We'll see if I can deliver.
-->
-->Then I tried some cleanup on the search module. Version 2.2 does a
-->limited quiescence search (generally 4 plies). It uses a "threat score"
-->function to do two things: help determine when to extend the search
-->further, and penalize apparently hung, trapped or pinned pieces at
-->terminal nodes. I took out the threat function and now use an
-->unlimited capture search (actually, the limit is 40 plies, but it doesn't
-->ever get there). Of course, when I first did this, the tree size blew
-->up, but I have gradually fixed things to the point where it's well-
-->controlled.
-->
-->Making this change has improved nodes/sec. by about a factor of two, and
-->the tree sizes are also generally smaller. Version 2.2 sometimes finds
-->winning combinations one ply before Version 3.0 does, but overall 3.0
-->is faster and better. E.g. 2.2 solves about 237 of the WAC problems at
-->1 minute/move (Pentium 60), 3.0 solves about 250.
-->
-->I have also put in two killer moves per ply (2.2 had one) and
-->implemented the history heuristic. The order of moves is now: principal
-->variation, winning captures and promotions (in order of estimated gain),
-->killers, and history moves (sorted on history scores). This hasn't
-->produced enormous benefits over the old move ordering scheme, but it is
-->measurably better.
-->
-->I tried a few things that didn't work out. The quiescence search currently
-->doesn't use the hash table. I have found that if I store all the quiescence
-->nodes in the table, the table fills up fast with nodes that aren't really
-->much use (the regular search can't use them). If I just have the qsearch
-->query the hash table but not insert nodes, or if I insert only the first
-->couple of plies of quiescence nodes into the table, the tree size goes
-->down by 5% or so but the overhead of searching the table eliminates any
-->benefit. Your mileage may vary, but I haven't had any luck here.
-->
I'll do some more testing here. I've been somewhat spoiled, in that Cray
Blitz typically runs on a machine with 16gb of main memory. (yes, that's
a "GB" and not a "MB" you see there.. :) ) As a result, that's 1.5 billion
entries for Cray Blitz, which is much more than it can search in any reasonable
game (at 1M nodes per sec that's 1500 seconds.)
I've run some tests and generally found it slightly better to hash the entire
q-search. The advantage is that you eliminate evaluate calls if nothing else
by transposing the order of captures. The tree size doesn't change by much,
but it's a little faster for me. I do things a little differently in Crafty
than most, which might affect this as well. For example, I try the trans/ref
move before generating anything, and I try the killer moves before generating
the non-capture moves. This, too, saves a few move generations. I took the
hash move first stuff out of the quiescence search and it costs about 3% in
crafty to not have that move to try before the move generation. The tree
doesn't shrink or expand much, but quite a few q-nodes only need one move to
fail high, and Crafty does it without generating anything.
-->Also I have tried recapture extensions (a la Crafty) with mixed results.
-->Performance on the WAC test suite goes up a little bit but it does worse
-->on the Bratko-Kopec tests. Not clearly a win, so far.
This is a difficult one. Ken Thompson tried it in 1983 and didnt like
it. I've been using it for a long time and, in general, found it good,
so long as you get enough depth that the penalty doesn't drop you into
searches that are too shallow. My advice is to always test each of these,
as two very similar programs can produce different results easily.
-->
-->Check extensions are interesting, too. Generally these do seem to help,
-->but there are some positions where they hurt. For example, Fine's
-->Basic Chess Endings position #20:
-->
-->8/8/8/3P1k/7P/3K1p/8/8 b - -
-->
-->Arasan 3.0 finds the solution (Kf4) in 8 ply. With check extensions on it
-->takes 205868 nodes and 27 seconds; without check extensions it takes
-->89355 nodes and 11 seconds.
There are two issues here. There are things that make a program solve a
problem better, and there are things that make a program play games better.
The best way to test this is to play on a chess server, so you can see the
effects of your changes in real games. Often the performance difference
between how it does on problems and how it does in games is dramatic. Then,
you have to decide which is most important.
-->
-->I think the biggest challenge in doing all this has been getting the bugs
-->out. I had a nasty one where the move sorting routine would drop a move
-->out of the list and put a duplicate move from the list in its place. Nothing
-->obvious broke, but performance went to pieces and nothing I did seemed to
-->help. I finally found it by looking through a 4 Meg trace file to find out
-->why the qsearch was so big on one problem, and noticed the duplicate
-->moves.
-->
-->There were some pretty nasty bugs in the evaluation function, too - again,
-->nothing obvious: it doesn't crash or make illegal moves, it just plays
-->badly. Those are the hardest to find.
-->
-->I also had an off-by-one test in the hash retrieval code (> sign instead of
-->>=) which I found only because I noticed it was failing WAC 38, which is
-->one of the easier problems.
-->
-->The new version should be released some time this summer.
-->
-->--Jon
-->--
-->Jon Dart
Good luck working on this. It's often fun, often frustrating, and
always difficult to make progress at times.
Bob
--
Robert Hyatt Computer and Information Sciences
hy...@cis.uab.edu University of Alabama at Birmingham
(205) 934-2213 115A Campbell Hall, UAB Station
(205) 934-5473 FAX Birmingham, AL 35294-1170
Cheers,
Tom
_______________________________________________________________________________
Tom Kerrigan kerr...@frii.com O-
WARNING TO ALL PERSONNEL:
Firings will continue until morale improves.
Last I counted, there was a maximum of 32 pieces.
Maybe there's a connection here somewhere? :)
Sorry, I couldn't resist. But still, unless you are
including something other than captures in your capture
search...
dan
No, no, it makes sense !
If you count the pawn promotion as a 'capture' material change,
then 8 pawns to promote and 32 pieces to capture = 40 ply.
But, more accurately: 8 pawns promoting for either side = 16 (actually
not possible since they have to get past each other + the 32 (30, without
kings), gives 48 ply max depth for these kind of extensions.
Chris Whittington
Jon Dart has been doing this longer than you have, Tom.
bruce
of course, Richard Lang also limits the search depth for the quiescence
captures in Genius. Guess he's "out to lunch, too" ??? Seems to be
pretty damn effective for him. :) I just wish he'd cut it completely
out.
Indeed. It seems everybody has. :) I'm not quite sure this is a refutation to my
comment.
Hum, after the "12 ply" Genius quiescence search, doesn't Lang use a
"conventional" capture search? I think this is the only way Mark Lefler could
explain a few Genius PVs when he was really examining this issue.
Cheers,
Tom
_______________________________________________________________________________
Tom Kerrigan kerr...@frii.com O-
If you think the problem is bad now, just wait until we've solved it.
-- Arthur Kasspe
No Idea. He also has singular extensions which can greatly extend some of
the PV's that I see. I've always assumed 4*12 means just that... 4 plies
full-width, 12 more plies to spend on captures, checks, threats and whatever
else he does in there. However, limiting the qsearch is not necessarily a
bad idea. There are bad ways to do it, but it'd take quite an "authority"
to convince me that it can't be done. I'd rather not dismiss anything
casually, but, instead, either test the ideas myself or let someone else
report the results they obtain. If everything's bad, just because no one
is currently doing it, then there's not going to be much progress made in
this domain.
Cheers,
Tom
_______________________________________________________________________________
Tom Kerrigan kerr...@frii.com O-
Living on Earth may be expensive, but it includes an annual free trip
around the Sun.
> Hum, after the "12 ply" Genius quiescence search, doesn't Lang use a
> "conventional" capture search? I think this is the only way Mark Lefler could
> explain a few Genius PVs when he was really examining this issue.
I asked this question to Richard some years ago and his answer was
"What is a capture/quiescence search ?".
Personnally I think that he is using a very well tuned Swap-Off
algorithm. But only Richard can say it! And he does not say too much...
--
From: Jean-Christophe Weill
NEC Research Institute (Room 2f07) 4 Independence Way Princeton NJ 08540
USA
Phone: (609) 951-2792 Email: j...@research.nj.nec.com
Fax: (609) 951-2482 ("for J.C. Weill")
I think noone can explain the PVs from Genius but Richard Lang.
I tried to find some rules or something while watching Genius play and,
believe me, I tried really hard, but I don't have a clue what he is
doing.
Well, I have a small clue, but that doesn't help me a lot :-(
Did you ever play Genius at level 0*12 and watch his PVs?
Amazing what he sees instantly.
Still a long way to go.
Stefan
I can think of several that do this, Crafty included (not limiting the
capture search depth at all.) However, that doesn't make it "right", just
means it's "popular". It seems that Lang doesn't even have a "capture
search at all." If true, that's about as harsh a limit as you could come
up with... :)