SEE ordering: this is Static Exchange Evaluation capture ordering
throughout the tree (normal and quiescence search). In the normal
search, winning and even captures are considered early, but all
captures are considered. In the quiescence search, all captures
that SEE computes as winning material are considered. All captures
that SEE computes as losing material are pruned with *no* consideration
at all. Even exchanges are only considered during the first few plies
of the quiescence search (either 2 or 4, I'm not exactly sure.)
MVV/LVA ordering: this is the Most Valuable Victim, Least Valuable
Attacker (or Aggressor, depending on your taste). All captures are
searched in the following order:
PxQ NxQ BxQ RxQ QxQ KxQ
PxR NxR BxR RxR QxR KxR
PxB NxB BxB RxB QxB KxB
PxN NxN BxN RxN QxN KxN
PxP NxP BxP RxP QxP KxP
In order to avoid modifying Crafty extensively, I simply replaced the
call to Swap() with the following:
This produced sort values of (say) PxQ=9600, RxQ=9300, etc. As a result,
after sorting the captures, we get the above order. No captures were
excluded in the quiescence search, and, in fact, *all* captures are
searched in the regular search before any non-captures, killers, etc.
total positions searched.......... 300
number right...................... 276
number wrong...................... 24
percentage right.................. 92
percentage wrong.................. 8
total nodes searched.........135695898
average search depth.............. 7.5
total positions searched.......... 300
number right...................... 265
number wrong...................... 35
percentage right.................. 88
percentage wrong.................. 11
total nodes searched.........146342459
average search depth.............. 7.1
The results: The SEE approach exhibited the following: First, SEE
solved 276 in 1 minute or less on the sparc 20, which searches roughly
10K nodes per second at present. MVV/LVA solved only 265, or 11 less.
Due to (a) better ordering and (b) forward pruning in the q-search,
SEE searched an average of 7.5 plies, while MVV/LVA only reached 7.1
plies. This number is interesting, but not significant for comparison
to other programs, because many of the WAC problems are mates that are
seen in 1 ply searches. This average depth is simply an average of the
max iteration number for each problem, divided by 300. So, SEE gives,
for this problem set, almost 1/2 ply. Further analysis will probably
show that it's a little better than this, since many of the problems
are quick for both approaches (mates).
For performance measuring, notice that SEE searched about 10 million
fewer nodes in the same amount of time (1 min per move except for mates
that terminate the search quickly.) This means that MVV/LVA searched
roughly 10% faster than SEE. This point supports my reminder that as
move ordering gets worse, program performance (nodes per second) gets
"better." A good reason why simply counting nodes per second is not a
particularly informative comparison.
This is a first cut at analysis, and seems to support SEE ordering and
SEE forward pruning in the q-search. However, the missing information,
which I'll produce shortly, is an analysis of the difference between the
two algorithms. Clearly SEE solved more, but it will be interesting to
see which ones it solved that MVV/LVA didn't, and why this happened. I
suspect that the extra depth is the reason. Even more interesting will
be to analyze any positions (if they exist, and they probably do) where
MVV/LVA either (a) solved problems that SEE did not, or (b) solved
problems at a shallower depth than SEE. This is going to be relatively
time-consuming, since the log files are around 15,000 lines each, and
will have to be analyzed position by position to figure out which ones
If you get the current version of Crafty from willis.cis.uab.edu, you
can play with this yourself. mvv=0 will force SEE ordering (as above)
while mvv=1 will switch everything to MVV/LVA. I will eventually cull
the switch, as it is not quite free, but for now it provides a useful
test mechanism if you find a position that seems to favor MVV/LVA.
Note that this test is somewhat different than the test that produced
the results I posted last week, based on the kopec positions. Rather
than searching to a fixed depth to measure the difference in the size
of the trees, I searched to a fixed time, to see if one technique would
accrue some sort of depth advantage or solution advantage over the other
in a mode that more closely resembles playing chess games. The final
chapter hasn't been written, of course, but, as I've said before, SEE
"appears" to work for me (and others.)
For a caveat, the major reason for running this test is that the new
version of crafty is now computing the attack bitmaps "on the fly" by
using the rotated bitboards described previously. This means that I can
now attribute cpu cycles to the "user" of those cycles directly, rather
than make_move() looking like a "hog." Swap is now in the 5%-10% range
(of total search time) making it "interesting" to consider throwing it
away. Obviously, throwing away 10% (max) is not going to pay for itself
if it costs 1/2 a ply, so I'm going to try tweaking Swap() some to see
if it can be made any faster, but it looks like it is going to have to
stay. For the record, notice that solving 11 problems more than MVV/LVA
is only a 3.6% improvement, so it's not like it's overwhelming or anything,
it's only somewhat better. the 1/2 ply is more significant, however, as
that translates to searching deeper which we know helps to play better
For these tests, Crafty was running in "normal" mode, with a tiny
transposition table (maybe 64K entries total), and it's normal move
ordering and evaluation code and search extensions enabled, including
null-move with R=2. The only variable was the above move ordering
Any comments or other things to try?
Your test results duplicate results that I obtained with an
experimental program on the same test database. My overall
conclusion was that implementing SEE made the program as
good as using MVV/LVA with twice the time limit. Thus, I
credited SEE as being a "factor of two" improvement.
I have recently revised my opinion to "the methods are equal."
The reason for my change of mind is that my test program was
not implemented as an MVV/LVA program "from the ground up."
It was a SEE program that had been modified (debilitated, that is)
to produce SEE ordering. Bob is doing the same with Crafty and
getting the same results.
What makes a program well-suited for SEE? It amounts to having
the attackto and attackfrom bitboards readily available, either
because of incremental updating or through the rotated-bitboard
method. If you compute attacks anyway, then a static-exchange
evaluator is not that hard (and very inexpensive) to implement
and you get a good gain.
What makes a program well-suited for MVV/LVA? You have to
design the program to consume each move as it
is generated, thereby saving a ton of memory bandwidth.
Compared to SEE, an MVV/LVA program will
1) Not pack the from and two squares into a structure, and
2) Not write that structure out to memory, and
3) Not read that structure back in to prioritize it, and
4) Not write that structure out in priority order, and
5) Not read that structure back in during make_move(), and
6) Not decode that structure to recover the squares and flags, and
7) Not write back out information needed to undo that move, and
8) Not read that information back in when undoing the move, and
9) Not decode that information into from and two squares.
This saves a lot of round trips to memory. If you are on a typical modern
CPU (even a Pentium qualifies under my liberal definition of "modern")
then your memory bandwidth is the crucial limitation in your processing
speed. Saving 9 trips to memory can make your program a lot faster.
A later program (also a test program) was designed to use
MVV/LVA from the start. And do you know what happened?
The program was twice as fast! So having SEE gets you the
equivalent of a factor of two from pruning, but MVV/LVA
can get you a factor of two in search speed if you design
your program to use that approach from scratch.
Bottom line: I believe both methods make sense. The lesson I draw from
my experience is to choose your method and then push it to the
limit using every available resource. In particular:
1) If you have cheaply available attack bitboards, then you should
use SEE, since the knowledge gained is worth a factor of two.
2) If you build your program for pure speed, then stick to MVV/LVA
and do a full-width capture search (pruned only using futility
of course), since that method minimizes your make-move expense.
One additional tantalizing prospect: if you really want to make a
breakthrough, just think up a way to do q-search pruning
without needing attack tables. If you can do it, you might get to
keep both factors of two.
Interesting, but I have one important question: Crafty is currently spending
less than 10% of the total search time in generating and making moves, which
is the very part that "on the fly MVV/LVA" is going to speed up. Where did
your factor of two come from in your program? IE, what "modules" were made
faster, and how do they compare "before" and "after"? If I could generate/make
moves instantly, Crafty would be 10% faster. If you include sorting moves,
picking them out of the list in some order, etc. that's *far* less than 20%
of the total search time (at present, anyway.) Note that Crafty's SEE "bonus"
comes from (a) better move ordering [seems to be 10% for kopec problems, I'll
try to compute it for the complete WAC set as well] and from tree size
reduction produced by forward-pruning losing captures [according to SEE] in
the quiescence search. For the WAC problems, it looks like a factor of 3
from my analysis so far, taking search time for SEE and dividing into the
search time for MVV/LVA for specific problems typically produces this number.
>What makes a program well-suited for MVV/LVA? You have to
>design the program to consume each move as it
>is generated, thereby saving a ton of memory bandwidth.
>Compared to SEE, an MVV/LVA program will
> 1) Not pack the from and two squares into a structure, and
> 2) Not write that structure out to memory, and
> 3) Not read that structure back in to prioritize it, and
> 4) Not write that structure out in priority order, and
> 5) Not read that structure back in during make_move(), and
> 6) Not decode that structure to recover the squares and flags, and
> 7) Not write back out information needed to undo that move, and
> 8) Not read that information back in when undoing the move, and
> 9) Not decode that information into from and two squares.
>This saves a lot of round trips to memory. If you are on a typical modern
>CPU (even a Pentium qualifies under my liberal definition of "modern")
>then your memory bandwidth is the crucial limitation in your processing
>speed. Saving 9 trips to memory can make your program a lot faster.
comments: (1) may not be avoidable... trans/ref table, killer table, etc.
all require the move in a compact format for storage. (7-9) shouldn't be
done for any program. Since the board data structure is so small, it's
been far more efficient for me to simply let make_move() create a new board
structure, and then "toss it" rather than unmaking. I do this now in Crafty,
I did the make/unmake in Cray Blitz. With *my* code, Crafty's more efficient
(note the emphasis on *my* here, there might be better implementations that
solve this.) (3-6) might well be offset by things you have to do to produce
one move at a time, such as somehow remember the last move so you can produce
the next one. No matter what you do, you are going to have to store and load
something at least once, and probably twice to figure out which move to
generate next. I suspect that it's still a wash, when you consider that I
don't have to unmake() which is roughly the same amount of work (memory
accesses) as make() (at least in Cray Blitz, the two were roughly equivalent
in computation time).
>A later program (also a test program) was designed to use
>MVV/LVA from the start. And do you know what happened?
>The program was twice as fast! So having SEE gets you the
>equivalent of a factor of two from pruning, but MVV/LVA
>can get you a factor of two in search speed if you design
>your program to use that approach from scratch.
An interesting aside: you could still use SEE for capture ordering, and
switch to "on the fly" generation for the rest of the moves, and take
advantage of a large part of the speed-up since the capture search no
longer dominates the non-captures as significantly.
Another question: I assume that you produced moves "on the fly" for all
types of moves, not just captures? If so, how do you implement things
like the history heuristic or whatever? I can understand killer moves,
since I try them before generating non-capturing moves anyway, but for
things like a history move, or checking moves (in the q-search) this
appears to incur a fairly high cost. Early versions of Crafty used this
"generate on the fly" approach, but I discarded it when it began to get
overly complicated as I worked on move ordering and special move inclusion
in the q-search. Note that this is *not* a criticism of your post. If
I overlooked something, I'm interested as on several occasions, I have canned
an idea, then later found out that it could be made to work well if done in
a different way. I'm interested in making sure that I haven't looked under
a rock somewhere that I didn't notice.
>Bottom line: I believe both methods make sense. The lesson I draw from
>my experience is to choose your method and then push it to the
>limit using every available resource. In particular:
> 1) If you have cheaply available attack bitboards, then you should
> use SEE, since the knowledge gained is worth a factor of two.
> 2) If you build your program for pure speed, then stick to MVV/LVA
> and do a full-width capture search (pruned only using futility
> of course), since that method minimizes your make-move expense.
>One additional tantalizing prospect: if you really want to make a
>breakthrough, just think up a way to do q-search pruning
>without needing attack tables. If you can do it, you might get to
>keep both factors of two.
The rotated bitboard stuff is a step in this direction. Producing an
attack vector is now very inexpensive, then "anding" it with the opponent's
occupied squares produces the captures very cheaply. One idea that's been
mentioned on occasion is to take this bitmap, tuck it away, and avoid producing
a stream of <from><to> moves as you mentioned above. Of course, it doesn't
look so hot when talking about SEE since it's necessary to step through all
the captures and evaluate each one in turn before they are ever made in the
search. However, it would eliminate lots of work in Generate_Moves() since
the loops would be dumped.
This looks like the classic optimization trade-off. The incremental attack
boards were great because they were always there, but they had a fairly high
cost (a lot lower after rotated bitboards) but still noticable. Going to the
dynamically computed attack bitmaps took Make_Move() out of the gprof listing
pretty much, but a lot of Make_Move()'s "cycles" were added elsewhere as the
attack bitmaps are now computed "where used". The first cut produced a minor
speedup. Further gains are coming as I evaluate what I was doing before and
make conscious decisions to eliminate things that might or might not help the
evaluation, but which now have a measurable impact on performance. Before,
*any* attack tests were completely free, so I used them in places where they
worked, but where other approaches would be just as good. Making the above
change to the way moves are generated will simply move computation from the
move generator to somewhere else. The gain occurs when you move this
computation out, and then can avoid doing it in cases due to alpha/beta
pruning. It pays off big at type-2 nodes, but not at the type-1 and type-3
nodes. I plan on continuing to look at such performance issues for a good
while as it is not clear to me (yet) exactly which approach is best.
Thanks for the input, as I said the question is not closed yet.
More as I analyze the data from the WAC test set.
Interesting, but all it said is that SEE pruning cannot be ruled out as
a bad proposition in Crafty's case. Two problems with the test: (1) is
the speed of mvv/lva optimized? I don't think so. In my case, the
choice is cleat cut--SEE pruning is not even faster, and has some nasty
types of errors... (2) does the results from the test benchmark really
have a bearing on the performance? The WAC positions were chosen based
on whether they are interesting to a human player. They don't necessarily
point out the program's flaw. We have seen in test games that even though
some of our search extensions have a strictly negative impact on the solving
speed of test set, having them proved to be beneficial in overall
performance. What happened was that without the extra extensions, the
program was very good at seeking out positions where the regular search
extensions find illusionary gain. My hunch is that Cray Blitz is/was
very good at finding positions where SEE pruning is royally screwed up.
No, it's not optimal, but I don't think it matters, the current make_move
and supporting code is optimized, that's why Swap() takes practically no
time now. The optimizing that I could do affects way less than 20% of the
total search time, so whether or not it's optimized has little effect on
these results. I agree, that in your case, SEE isn't feasible, because of
the hardware constraints you have to live with. As far as the nasty types
of errors, They simply aren't showing up yet, and I specifically tried a
tactical problem set which should exacerbate the problem since there are
plenty of tactical variations. However, I'd still bet that SEE can speed
any search. I'd just take your MVV/LVA code, use a SEE routine, and simply
toss any captures that appear to lose and go back and get another. Then the
only cost is the cost of Swap() which is currently under 5%, and it would
cut the size of the tree by a factor of 2x. In short, SEE could reject
moves instantly, otherwise we just try em as they are produced.
> (2) does the results from the test benchmark really
>have a bearing on the performance? The WAC positions were chosen based
>on whether they are interesting to a human player. They don't necessarily
>point out the program's flaw. We have seen in test games that even though
>some of our search extensions have a strictly negative impact on the solving
>speed of test set, having them proved to be beneficial in overall
>performance. What happened was that without the extra extensions, the
>program was very good at seeking out positions where the regular search
>extensions find illusionary gain. My hunch is that Cray Blitz is/was
>very good at finding positions where SEE pruning is royally screwed up.
I've already mentioned that I've been testing ideas in two distinct ways,
one is problem sets like WAC, the other is live on ICC. One example: the
null-move threat extension. It helps WAC quite a bit, but when run on ICC,
it consistently lowered Crafty's rating. Turn it on, rating goes down, turn
it off, rating goes up. I did this for two weeks and the results followed
right along. It's been quite a while now, since I switched from MVV/LVA to
SEE, but I did the same sort of test, and got the same results. Anything
that produces a 2x speed increase, and doesn't seem to offer any severe
tactical penalties, generally produces a rating improvement. I ran a short
test over the weekend, while I was doing the WAC stuff, and when I turned
MVV/LVA on, rating dropped, off, rating climbed. I didn't run it long enough
to be absolutely certain that this wasn't normal rating fluctuation, but if
you watch carefully enough, you pick up on search depth differences.
As far as Cray Blitz is concerned, it has played it's fair share of really
good games, and some really bad ones. I've never found a position where
simply turning off the forward q-search pruning affected a decision it
made in a game. I've found far more positions where the simple null-move
search blows things up, than any other single cause, other than the obvious
lack of depth issue that we see. I'm still searching for something to
prove that SEE is bad, but here's one wager I'd make with anyone out there:
If you are using the typical null-move search (R=2 or R=1, doesn't matter),
I'll bet that you get more errors from that than from using the SEE forward
pruning. null-move errors occur much further up the tree than the mistakes
made by SEE in the q-search. By far the best test of this is to set up a
position with black castled, black pawns at f7,g6,h5, white bishop at f6,
and white queen at g5. Do *anything* else you want, and watch how the
null-move algorithm will cleverly hide the mate threat Qh6 and Qg7 mate.
I see this relatively frequently on ICC, and turning null-move off fixes it
instantly. Of course, we still use it because of the extra depth it provides
in other circumstances. I've also tried ICC with and without null-move, and
this is a lot less clear than I expected. R=2 and R=1 seem to produce a
similar rating. R=0 seems to not affect rating much. I played with this
a few months back and put it away feeling confused and needing more data.
I already do this. At present, it only accepts 3/0 4/0 and 5/0 blitz
games, and 10/10 standard games. I have, on occasion, restricted it to
something like 60/60. I try to be careful drawing conclusions from Blitz
only, but rest assured that I'm *not* removing things to make it search
faster. It is just another data point, in addition to my own testing by
watching it play genius/etc on my machine, plus the ever-present problem
positions. It takes the big picture to evaluate changes, and it is *not*
particularly easy. I hope to get feedback from users as well, as there's
a *lot* of expertise out there that can be applied to this problem.
Something to be careful about when using ICC/FICS to evaluate changes
to a chess program: Fast blitz games played may have different results
than slower games. They _probably_ don't, but certain types of modifications
would be expected to improve blitz play vs. humans and problem-solving but
hurt performance against a human in slower games. Example: removing some
positional evaluation code (or not doing it in parts of the tree you
were doing it in) may well increase ply noticably in short-time searches,
and thus nail humans more often with tactical zaps, increasing it's blitz
rating. The same modification may cause rating in slow games to go down,
as (GM) humans make far fewer tactical mistakes in slow games, and position
This sort of difference may be the cause for why Fritz is considered
better at blitz and Genius better at slower play (against humans).
Robert: you may wish to make crafty have times when it only accepts
"slow" games, to ensure a sufficient number of slow games to make the rating
change for a modification be reasonably accurate.
Randell Jesup, Scala US R&D, Ex-Commodore-Amiga Engineer class of '94
Exon food: <offensive words censored by order of the Senate>
> Interesting, but I have one important question: Crafty is currently
> less than 10% of the total search time in generating and making moves,
> is the very part that "on the fly MVV/LVA" is going to speed up. Where
> your factor of two come from in your program?
I am not sure. It is very difficult to compare the SEE and MVV/LVA test
"on a level playing field." One key is certainly that my programs spend a
lower fraction of their time in their static evaluators than Crafty does.
if a program spends so much time in evaluating, then MVV/LVA might not
make sense from any perspective.