# Quiescence search problems

187 views

Aug 1, 1995, 3:00:00 AM8/1/95
to
My chess program Morgoth (a complete rewrite, started 7-28, of the
earlier Neptune program I wrote a few years ago) is having trouble with
its quiescence search. In particular, it is greatly increasing the
numbers of nodes that the program searches, such that often the program
searches >100000 nodes in a move, yet cannot complete a 3-ply search.
This can be a big problem when on my 486-25 it can only manage
1500-2500 NPS. In fact it loses many test games because it can only
manage short searches in crucial positions.

Move ordering puts transposition table moves first, followed by the
refutation table moves, followed by captures (highest value captured
piece first), followed by everything else sorted by the history
heuristic. With this ordering it seems that the numbers of positions
should be far lower. In fact, looking at articles claiming similar
ordering heuristics to mine, values of nodes for 5-ply searches often
can equal my 3 ply search numbers.

The following pseudo-code describes the routine. Much has been changed
here to make what is going on obvious, and a lot of subtleties have
been removed.

int quiesce(int side, int alpha, int beta, int q_depth)
{
int best, score, number_moves, i;
move_t move_list[MAX_WIDTH];

if (q_depth >= 4) return evaluate_position();
if (in_check(side)) {
generate_all_moves(move_list, &number_moves);
best = -INFINITY;
}
else {
generate_capture_moves(move_list, &number_moves);
best = evaluate_position();
}
if (best >= beta) return (best);
for (i = 0; i < number_moves; i++) {
make_move(move_list[i]);
score = -quiesce(!side, -beta, -MAX(alpha, best), q_depth+1);
unmake_move(move_list[i]);
if (score > best) {
best = score;
if (best >= beta) return (best);
}
}
}

This should be obvious except for the q_depth. I added this in an
attempt to cut back on the search time. It restricts the quiescence
search to 4 plies.

The routine is called from the search routine as follows:

if (depth <= 0) return (quiesce(side, alpha, beta, 0));

in place of:

if (depth <= 0) return (evaluate_position());

### Jon Dart

Aug 1, 1995, 3:00:00 AM8/1/95
to
In article <3vkpik\$q...@ixnews4.ix.netcom.com> plug...@ix.netcom.com (Larry Craighead) writes:
>My chess program Morgoth (a complete rewrite, started 7-28, of the
>earlier Neptune program I wrote a few years ago) is having trouble with
>its quiescence search. In particular, it is greatly increasing the
>numbers of nodes that the program searches, such that often the program
>searches >100000 nodes in a move, yet cannot complete a 3-ply search.

It can be pretty hard to get the quiescence part of the eearch working right.
Slate and Atkin mentioned in one of their papers (ca. 1974, I think) that they
could easily get the quiescence search to consume 90% of the search time.

I supsect that if you print out the actual moves being searched and look at the
output, you will find that it is searching a lot of unlikely captures (e.g.
Queen takes pawn defended by another pawn). To cut this down, you can
implement a simple static exchange evaluator that estimates the gain from a
capture move. If it's negative, prune it out right away. This is somewhat
risky, but it will cut down the search tree.

You also need to consider that at any point, a player can usually "stand pat"
and not capture at all. So in some cases if the score so far causes beta
cutoff, you can skip the rest of the capture search. However, my program
disallows this if the side to move has an (apparently) trapped or hung piece.

Implementing the null move heuristic also will reduce the size of the search
tree, because some nodes will be pruned after a shallower than normal search.

Try out a few of these ideas and see if they work. You will know you have it
close to right when you are completing 3-4 ply searches quickly but also
performing reasonably well on tactical problems (like the "Win at Chess"
series). Although, as Bob Hyatt has rightly pointed out, the real test is
playing games, preferably against a variety of opponents.

--Jon

--
-----------------------------------
Jon Dart jd...@netcom.com

### Joe Stella

Aug 1, 1995, 3:00:00 AM8/1/95
to
In article <3vkpik\$q...@ixnews4.ix.netcom.com>

Try the following:

>int quiesce(int side, int alpha, int beta, int q_depth)
>{
> int best, score, number_moves, i;
> move_t move_list[MAX_WIDTH];

> if (q_depth >= 4) return evaluate_position();
> if (in_check(side)) {
> generate_all_moves(move_list, &number_moves);
> best = -INFINITY;
> }
> else {
> generate_capture_moves(move_list, &number_moves);
> best = evaluate_position();
> }
> if (best >= beta) return (best);

if (best > alpha) alpha = best; <--------------------------

> for (i = 0; i < number_moves; i++) {
> make_move(move_list[i]);

if (material_score > alpha) { <------------

> score = -quiesce(!side, -beta, -MAX(alpha, best), q_depth+1);
}

else if (material_score > best) best = material_score; <------------

> unmake_move(move_list[i]);
> if (score > best) {
> best = score;
> if (best >= beta) return (best);
> }
> }
>}

Material_score comes from make_move. Make_move just adds the value of the
captured piece to the current score of the position. This is often good
enough to get a cutoff, without having to evaluate the position completely.

These ideas come from a very good paper on quiescence searching which I
found in the ICCA Journal, Vol. 12 No.1, March 1989. My program is able
to follow all capturing sequences to the end using this technique (I don't
need the q_depth >= 4 test).

Joe S.

### Peter Gillgasch

Aug 2, 1995, 3:00:00 AM8/2/95
to
In article <3vkpik\$q...@ixnews4.ix.netcom.com>, plug...@ix.netcom.com (Larry Craighead) writes:
|> My chess program Morgoth (a complete rewrite, started 7-28, of the
|> earlier Neptune program I wrote a few years ago) is having trouble with
|> its quiescence search. In particular, it is greatly increasing the
|> numbers of nodes that the program searches, such that often the program
|> searches >100000 nodes in a move, yet cannot complete a 3-ply search.
|> This can be a big problem when on my 486-25 it can only manage
|> 1500-2500 NPS. In fact it loses many test games because it can only
|> manage short searches in crucial positions.

Wait a minute. You do 1500 - 2500 nps and your 3 ply search blows up
to 100,000 nodes. That is a runtime of 40 - 66 seconds for 3 ply.
Assuming a branching factor of (say) 5 you will never pass 4 ply
in 3 minutes. Your program will often be crushed because of this.
The speed of the program is not too bad, but your trees are way
too large.

You should try the modified code I included. Then try to work to
make the full width search smaller.

|> Move ordering puts transposition table moves first, followed by the
|> refutation table moves, followed by captures (highest value captured
|> piece first), followed by everything else sorted by the history
|> heuristic. With this ordering it seems that the numbers of positions
|> should be far lower. In fact, looking at articles claiming similar
|> ordering heuristics to mine, values of nodes for 5-ply searches often
|> can equal my 3 ply search numbers.

The move ordering heuristics are ok. I have similar heuristics, but I don't
use refutation table moves. Maybe you should think about move ordering if
the history heuristic failed as well. Very little savings, but better than
none in my case.

Don't follow suggestions to implement static capture sequence evaluators.
Big fuzz to implement, slow, unsecure, hard to predict. And inefficient
in terms of move ordering (hard to believe but true).

I suggest that you try the program without refutation table moves to make
sure that captures are tried first. Do some serious benchmarking.

Next try to include the refutation table moves directly after the captures
(before history) and benchmark again. If it helps keep it, kick it otherwise.

To do the benchmarks I suggest that you do fixed n ply searches and compare
tree size only. If the code gets slower while the trees get smaller you
have a problem with capture move generation.

Rule of thumb: Tree size is *important*. Everything else is implementation
dependent and can be tweaked until midnight. And longer. Sunrise in my case (^8

|> The following pseudo-code describes the routine. Much has been changed
|> here to make what is going on obvious, and a lot of subtleties have
|> been removed.

I add/rewrite some lines of code you may want to try. I am interested in
the "subtleties" and I hope that I don't have them re-invented (in your case).

int took_static[2]; /* this is set to 0 at the last full width node in the variation */

int quiesce(int side, int alpha, int beta, int q_depth) {

int best, score, number_moves, i;

move_t move_list[MAX_WIDTH]; /* cool. I called that type move_t as well. */
move_t best_move;

/* you should use the trans/ref during the capture search as well!
this implies that you number the level of the search in such a
way that you can use the usual trans/ref stuff:

n n-1 ... 2 1 0 -1 -2 -3 ...

^ border to selective searching

this will save you a lot of GenCaptures() calls and will shrink
the size of the search. Search the hashed move first if it is
a capture and the hash value is not enough. Note that you don't need
to check the draft of the hash entry - your code is not depending
on q_depth, so it doesn't matter.
*/

if ((!took_static[side]) && in_check(side)) {

/*
makes no sense to verify checkmates if you will "stand pat"
anyway.

If you drop into this you are *forced* to get out or die.
*/

generate_all_moves(move_list, &number_moves);
best = -INFINITY;
} else {

best = evaluate_position();
if (best > alpha) {
/* note that since you don't have a move you don't store into the trans/ref */

if (best >= beta) return (best);

alpha = best;
};
took_static[side] = 1;
generate_capture_moves(move_list, &number_moves);
};
best_move = 0;

for (i = 0; i < number_moves; i++) {
make_move(move_list[i]);

score = -quiesce(!side, -beta, -alpha, q_depth-1); /* save a lot of MAX stuff */

unmake_move(move_list[i]);
if (score > best) {

-MAX(alpha, best)
best_move = move_list[i];
best = score;
if (best >= beta) goto store;
alpha = MAX(alpha, best)
}
}

store:

if (best_move) {
/* store move in trans/ref if you have a move */
};
return best;
}

|> This should be obvious except for the q_depth. I added this in an
|> attempt to cut back on the search time. It restricts the quiescence
|> search to 4 plies.

Not needed. Probably you have used that to "cure" the problem.
You should be able to search all capture sequences to the end.

|> The routine is called from the search routine as follows:

/*
we cannot "stand pat" and have to prove that we survive
in case of check.
*/

took_static[0] = took_static[1] = 0;

|> if (depth <= 0) return (quiesce(side, alpha, beta, 0));
|>
|> in place of:
|>
|> if (depth <= 0) return (evaluate_position());

And make sure that you don't drop into quiesce() when your
material score is way too bad (futility pruning). Since you
said you search in most valuable victim order that means
that you often can restrict the search to some capturing
moves and skip the others as soon as the victim is not valuable
enough. That should solve the problem completely. In most cases
you can start to be selective at search level 1 (only interesting
captures and checks if you need to capture heavy material to make
the node interesting). The latter technique will reduce
your tree to 40% in most cases.

DarkThought uses these techniques and some others. It does
a full width search with extensions, a selective search and
an unlimited capture search and hits 9-10 ply on the alpha.
When using some more selectivity we pass 10 ply easily.

With the speed of your machine you should be able to do 6
ply searches in nearly all positions. On a 25 mhz 68030 it
did 1800 - 2500 nps and often hit 7 ply in the middle game.

-- Peter

------------------------------------------------------------
Peter W. Gillgasch
Klosterweg 28/I-113 email gil...@ira.uka.de
76131 Karlsruhe/Germany Phone ++49/(0)721/6904255

### Peter Gillgasch

Aug 2, 1995, 3:00:00 AM8/2/95
to
In article <jdartDC...@netcom.com>, jd...@netcom.com (Jon Dart) writes:
|> In article <3vkpik\$q...@ixnews4.ix.netcom.com> plug...@ix.netcom.com (Larry Craighead) writes:
|> It can be pretty hard to get the quiescence part of the eearch working right.
|> Slate and Atkin mentioned in one of their papers (ca. 1974, I think) that they
|> could easily get the quiescence search to consume 90% of the search time.

I t is quite hard to keep the qsearch tight. But when it consumes
90% of the cpu time I'd say that the quality of the search suffers because
the full width part of the tree search suffers from being to shallow.

|> I supsect that if you print out the actual moves being searched and look at the
|> output, you will find that it is searching a lot of unlikely captures (e.g.
|> Queen takes pawn defended by another pawn).

So what? This will be caught and cut off 2 plies deeper (queen captures,
pawn captures, side with a queen down won't find a capture that is good enough
in most cases).

|> To cut this down, you can
|> implement a simple static exchange evaluator that estimates the gain from a
|> capture move. If it's negative, prune it out right away. This is somewhat
|> risky, but it will cut down the search tree.

It is risky indeed. A static exchange evaluator is not really cheap compared
to a material only evaluation function (that will suffice when the score is
far out of the window). I can see no benefit from that.

|> You also need to consider that at any point, a player can usually "stand pat"
|> and not capture at all. So in some cases if the score so far causes beta
|> cutoff, you can skip the rest of the capture search. However, my program
|> disallows this if the side to move has an (apparently) trapped or hung piece.

Maybe a good idea at the first selective layer, although I never tried that.
Threat extensions usually catch these.

|> Implementing the null move heuristic also will reduce the size of the search
|> tree, because some nodes will be pruned after a shallower than normal search.

Correct. Null moves speed up my program by a factor of 2 when used very
conservatively. When used more aggressive (bigger search depth reduction,
applied recursively) it starts to get ridiculous fast...

### Peter Gillgasch

Aug 2, 1995, 3:00:00 AM8/2/95
to
In article <joes.406...@ultranet.com>, jo...@ultranet.com (Joe Stella) writes:
|> In article <3vkpik\$q...@ixnews4.ix.netcom.com>
|>
|> Material_score comes from make_move. Make_move just adds the value of the
|> captured piece to the current score of the position. This is often good
|> enough to get a cutoff, without having to evaluate the position completely.

This high ply pruning technique is pretty unsafe when you don't consider
that the evaluation does (probably) add a positional score to the material
balance. The correct way to do that is to use this technique only when you
are significantly behind at the beginning of the node *and* to check alpha
against an optimistic bound that takes the positional part of the evaluation
into account as well:

if (material_score + MAX_POS > alpha) { ....

Correct?

### Jon Dart

Aug 2, 1995, 3:00:00 AM8/2/95
to
In article <3vn1eo\$g...@nz12.rz.uni-karlsruhe.de> gil...@ira.uka.de (Peter Gillgasch) writes:
>In article <jdartDC...@netcom.com>, jd...@netcom.com (Jon Dart) writes:
>|> I supsect that if you print out the actual moves being searched and look at the
>|> output, you will find that it is searching a lot of unlikely captures (e.g.
>|> Queen takes pawn defended by another pawn).
>
>So what? This will be caught and cut off 2 plies deeper (queen captures,
>pawn captures, side with a queen down won't find a capture that is good enough
>in most cases).

So, that's a 2-ply search you don't need to do if you can cut the node right
away.

>
>|> To cut this down, you can
>|> implement a simple static exchange evaluator that estimates the gain from a
>|> capture move. If it's negative, prune it out right away. This is somewhat
>|> risky, but it will cut down the search tree.
>
>It is risky indeed. A static exchange evaluator is not really cheap compared
>to a material only evaluation function (that will suffice when the score is
>far out of the window). I can see no benefit from that.

As noted above, you may need to search before you can get a cutoff. So you
need to compare the cost of the evaluator to the cost of a shallow search.
Actually, with extensions, it may not be so shallow ..

In dedicated chess machines like Hitech and Deep Blue, searching is
very fast (not just absolutely, but relative to other program functions)
so they can be more aggressive about using extensions. In software-only
programs, though, it may be necessary to avoid even a shallow search where
possible. The Sargon series and Cray Blitz (among others) use exchange
evaluators, presumably for this reason.

The risk is that the exchange evaluator is imperfect. I guess there's also a
risk that it is really slower than searching, but you can measure that. What
you gain is a reduction in the combinatorial explosion of capture moves, which
(as you note) can prevent you from completing even a shallow search if you're
not careful.

### Joe Stella

Aug 2, 1995, 3:00:00 AM8/2/95
to
In article <3vn20i\$g...@nz12.rz.uni-karlsruhe.de>
gil...@ira.uka.de (Peter Gillgasch) writes:

>In article <joes.406...@ultranet.com>,
>jo...@ultranet.com (Joe Stella) writes:
>|>
>|> Material_score comes from make_move. Make_move just adds the value of the
>|> captured piece to the current score of the position. This is often good
>|> enough to get a cutoff, without having to evaluate the position completely.

>This high ply pruning technique is pretty unsafe when you don't consider
>that the evaluation does (probably) add a positional score to the material
>balance. The correct way to do that is to use this technique only when you
>are significantly behind at the beginning of the node *and* to check alpha
>against an optimistic bound that takes the positional part of the evaluation
>into account as well:

>if (material_score + MAX_POS > alpha) { ....

>Correct?

I do my positional evaluation _before_ the quiescence search, and then
never do it again. The entire q search is done using the "incremental"
score update given by make_move. The score increment adjusts for anything
that affects material balance (captures, promoted pawns...).

My program did not have any real speed at all until I began using this
technique (remember my "Chess Program Too Slow" posts of nearly a year
ago?). :-)

Joe S.

Aug 3, 1995, 3:00:00 AM8/3/95
to
In <3vn08b\$g...@nz12.rz.uni-karlsruhe.de> gil...@ira.uka.de (Peter Gillgasch)
writes:
>
>In article <3vkpik\$q...@ixnews4.ix.netcom.com>, plug...@ix.netcom.com (Larry
>|> My chess program Morgoth (a complete rewrite, started 7-28, of the
>|> earlier Neptune program I wrote a few years ago) is having trouble with
>|> its quiescence search. In particular, it is greatly increasing the
>|> numbers of nodes that the program searches, such that often the program
>|> searches >100000 nodes in a move, yet cannot complete a 3-ply search.
>|> This can be a big problem when on my 486-25 it can only manage
>|> 1500-2500 NPS. In fact it loses many test games because it can only
>|> manage short searches in crucial positions.
>
>Wait a minute. You do 1500 - 2500 nps and your 3 ply search blows up
>to 100,000 nodes. That is a runtime of 40 - 66 seconds for 3 ply.
>Assuming a branching factor of (say) 5 you will never pass 4 ply
>in 3 minutes. Your program will often be crushed because of this.
>The speed of the program is not too bad, but your trees are way
>too large.

Yes, I have seen this in test games. For example, this game was played with
Morgoth at a constant 60 sec. a move (timing not done yet), and its opponent
Novag Emerald at an average of 2 sec. a move. Morgoth played the first 19 moves
from book.

1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Ba4 Nf6 5. O-O Nxe4 6. d4 b5 7. Bb3 d5 8. dxe5
Be6 9. c3 Bc5 10. Nbd2 O-O 11. Bc2 Nxf2 12. Rxf2 f6 13. exf6 Bxf2 14. Kxf2 Qxf6
15. Nf1 Ne5 16. Be3 Rae8 17. Bd4 Qh4+ 18. Kg1 Nxf3 19. gxf3 c6 20. Ne3? (2 ply
search only) Qh3 wins a pawn.

>You should try the modified code I included. Then try to work to
>make the full width search smaller.
>
>|> Move ordering puts transposition table moves first, followed by the
>|> refutation table moves, followed by captures (highest value captured
>|> piece first), followed by everything else sorted by the history
>|> heuristic. With this ordering it seems that the numbers of positions
>|> should be far lower. In fact, looking at articles claiming similar
>|> ordering heuristics to mine, values of nodes for 5-ply searches often
>|> can equal my 3 ply search numbers.
>
>The move ordering heuristics are ok. I have similar heuristics, but I don't
>use refutation table moves. Maybe you should think about move ordering if
>the history heuristic failed as well. Very little savings, but better than
>none in my case.

The HH cut off about 1/3 of the tree in my case. Very good results.

>Don't follow suggestions to implement static capture sequence evaluators.
>Big fuzz to implement, slow, unsecure, hard to predict. And inefficient
>in terms of move ordering (hard to believe but true).

I don't use any of them.

>I suggest that you try the program without refutation table moves to make
>sure that captures are tried first. Do some serious benchmarking.

I will try this.

>Next try to include the refutation table moves directly after the captures
>(before history) and benchmark again. If it helps keep it, kick it otherwise.
>
>To do the benchmarks I suggest that you do fixed n ply searches and compare
>tree size only. If the code gets slower while the trees get smaller you
>have a problem with capture move generation.

Hmmm... My current approach is to generate all moves in either case (the
routines are only slightly different) and cut out non-promotion, non-capture
moves. Suppose I should include checks as well but those could be expensive to
look for. Anyway many important checks are also captures.

>Rule of thumb: Tree size is *important*. Everything else is implementation
>dependent and can be tweaked until midnight. And longer. Sunrise in my case (^8

I like to stay up until 3 AM except on schooldays. My parents hate it. :)

>|> The following pseudo-code describes the routine. Much has been changed
>|> here to make what is going on obvious, and a lot of subtleties have
>|> been removed.
>
>I add/rewrite some lines of code you may want to try. I am interested in
>the "subtleties" and I hope that I don't have them re-invented (in your case).

I could send you the full code of the program... 99% comment deficient (I have a
near-photographic memory) but more readable than Gnu for sure. Ask me if you
want a uuencoded copy. (say whether .zip or .tar.gz)

[code snipped]

> /* you should use the trans/ref during the capture search as well!
> this implies that you number the level of the search in such a
> way that you can use the usual trans/ref stuff:
>
> n n-1 ... 2 1 0 -1 -2 -3 ...
>
> ^ border to selective searching

I got the t-table to work completely at last and now it's in here. Earlier I had
a lot of trouble.

>
> this will save you a lot of GenCaptures() calls and will shrink
> the size of the search. Search the hashed move first if it is
> a capture and the hash value is not enough. Note that you don't need
> to check the draft of the hash entry - your code is not depending
> on q_depth, so it doesn't matter.
> */

[code snipped]

>|> This should be obvious except for the q_depth. I added this in an
>|> attempt to cut back on the search time. It restricts the quiescence
>|> search to 4 plies.
>
>Not needed. Probably you have used that to "cure" the problem.
>You should be able to search all capture sequences to the end.

What is the real problem? I didn't see any huge changes in the code.

[code snipped]

>And make sure that you don't drop into quiesce() when your
>material score is way too bad (futility pruning). Since you
>said you search in most valuable victim order that means
>that you often can restrict the search to some capturing
>moves and skip the others as soon as the victim is not valuable
>enough. That should solve the problem completely. In most cases
>you can start to be selective at search level 1 (only interesting
>captures and checks if you need to capture heavy material to make
>the node interesting). The latter technique will reduce
>your tree to 40% in most cases.

Interesting, but the futility pruning could cause trouble if queens
were exchanged on the last move of the full-width search.

>DarkThought uses these techniques and some others. It does
>a full width search with extensions, a selective search and
>an unlimited capture search and hits 9-10 ply on the alpha.
>When using some more selectivity we pass 10 ply easily.
>
>With the speed of your machine you should be able to do 6
>ply searches in nearly all positions. On a 25 mhz 68030 it
>did 1800 - 2500 nps and often hit 7 ply in the middle game.

I can't manage 6 plies without quiesce() in many middlegame positions.

>-- Peter
>
>------------------------------------------------------------
>Peter W. Gillgasch
>Klosterweg 28/I-113 email gil...@ira.uka.de
>76131 Karlsruhe/Germany Phone ++49/(0)721/6904255

### David Blackman

Aug 3, 1995, 3:00:00 AM8/3/95
to plug...@ix.netcom.com
Get rid of the depth>4 stuff, it is as likely to make the search slower
as faster. Perhaps you could apply it to the ischeck() only?

The other thing is to make sure the alpha and beta values are always odd
and the return values and evaluations are always even. This removes the
potential for a lot of obscure bugs that could make your search too
slow, or make it miss important tactics or both. (The problem is a
return value which is actually just an artifact of the alpha-beta cutoff
then causing a cutoff a couple of plies back when it shouldn't, or
alternatively, a genuine value that gets shifted back to the alpha-beta
boundary and then fails to cause further cutoffs at lower plies.) Of
course there are other ways to avoid these bugs but this seems to do the
trick without needing me to actually think about it.

The basic pseudocode looks OK, very much like mine except i don't do
check evasion in the quiescence search.
Best of luck.

### Robert Hyatt

Aug 3, 1995, 3:00:00 AM8/3/95
to
In article <jdartDC...@netcom.com>, Jon Dart <jd...@netcom.com> wrote:

>In article <3vkpik\$q...@ixnews4.ix.netcom.com> plug...@ix.netcom.com (Larry Craighead) writes:
>>My chess program Morgoth (a complete rewrite, started 7-28, of the
>>earlier Neptune program I wrote a few years ago) is having trouble with
>>its quiescence search. In particular, it is greatly increasing the
>>numbers of nodes that the program searches, such that often the program
>>searches >100000 nodes in a move, yet cannot complete a 3-ply search.
>
>It can be pretty hard to get the quiescence part of the eearch working right.
>Slate and Atkin mentioned in one of their papers (ca. 1974, I think) that they
>could easily get the quiescence search to consume 90% of the search time.
>
>I supsect that if you print out the actual moves being searched and look at the
>output, you will find that it is searching a lot of unlikely captures (e.g.
>Queen takes pawn defended by another pawn). To cut this down, you can
>implement a simple static exchange evaluator that estimates the gain from a
>capture move. If it's negative, prune it out right away. This is somewhat
>risky, but it will cut down the search tree.
>
>You also need to consider that at any point, a player can usually "stand pat"
>and not capture at all. So in some cases if the score so far causes beta
>cutoff, you can skip the rest of the capture search. However, my program
>disallows this if the side to move has an (apparently) trapped or hung piece.
>
>Implementing the null move heuristic also will reduce the size of the search
>tree, because some nodes will be pruned after a shallower than normal search.
>
>Try out a few of these ideas and see if they work. You will know you have it
>close to right when you are completing 3-4 ply searches quickly but also
>performing reasonably well on tactical problems (like the "Win at Chess"
>series). Although, as Bob Hyatt has rightly pointed out, the real test is
>playing games, preferably against a variety of opponents.
>
>--Jon
>
>--
>-----------------------------------
>Jon Dart jd...@netcom.com

The "unlikely" captures really don't add a lot to the quiescence
search. The reason is, that a poor capture (Qxp in your above
example) is refuted by pxq and the tree is cut off at that point
since the score is so bad. In fact, if you look at *all* captures,
it's not horrible because it is self-throttling... every capture
reduces the size of the potential capture tree since you just
removed a piece that might capture something. You *can* save some
minimal amount by "screening" captures (did it in Cray Blitz, doing
it in Crafty) with a routine that analyzes swaps on a given square,
but it's not major most of the time.

The most likely explanation for a quiescence search that's gone
"ballistic" is including non-capture moves, either intentionally
(checking moves or check-evasion moves) or unintentionally (like
trying a transposition/refutation table move, not realizing that
you might well pull a non-capture out of there. Other explanations
include screwed up alpha/beta code, broken evaluation code that is
returning really *large* scores erroneously, etc. The tree dump
is the best place to start debugging.

--
Robert Hyatt Computer and Information Sciences
hy...@cis.uab.edu University of Alabama at Birmingham
(205) 934-2213 115A Campbell Hall, UAB Station
(205) 934-5473 FAX Birmingham, AL 35294-1170

### Robert Hyatt

Aug 3, 1995, 3:00:00 AM8/3/95
to
In article <3vr7t9\$8...@ixnews4.ix.netcom.com>,

This overlooks one *important* problem. If you don't have a static
exchange evaluator, you don't have a way to order the captures with
the best capture first. Putting the biggest "piece" first is no
good. QxR is worse than QxP if the rook is defended and the pawn
is not. This will make quite a difference in the size of the tree,
from experience. Early Crafty had no such code, adding it cut the
size of the tree remarkably. Once you can statically evaluate swaps,
you will wonder how you did without it.

BTW, the code does *not* have to be obtuse, nor inaccurate. It's
easy to play out a sequence of captures on a given square, to see
if the piece sitting there is safe or not.

### Joe Stella

Aug 3, 1995, 3:00:00 AM8/3/95
to
In article <3vr7t9\$8...@ixnews4.ix.netcom.com> plug...@ix.netcom.com

>Hmmm... My current approach is to generate all moves in either case (the
>routines are only slightly different) and cut out non-promotion, non-capture
>moves. Suppose I should include checks as well but those could be expensive to
>look for. Anyway many important checks are also captures.

Generating all moves is definitely a waste of time. My program was speeded
up by almost a factor of two when I began generating only capture/promotions
in the q_search.

Another very important factor is the use of incremental score updating
that I have pointed out before. Your code looked like it was calling
evaluate_position() at each node in the q_search, and if this routine
does anything complicated at all then this really slows down the search.

An incremental update of the score, based upon material captured, is really
all you can afford to do in a q_search. "Positional" factors must be
evaluated _before_ the q_search begins.

Joe S.

### Robert Hyatt

Aug 3, 1995, 3:00:00 AM8/3/95
to
In article <jdartDC...@netcom.com>, Jon Dart <jd...@netcom.com> wrote:
>In article <3vn1eo\$g...@nz12.rz.uni-karlsruhe.de> gil...@ira.uka.de (Peter Gillgasch) writes:
>>In article <jdartDC...@netcom.com>, jd...@netcom.com (Jon Dart) writes:
>>|> I supsect that if you print out the actual moves being searched and look at the
>>|> output, you will find that it is searching a lot of unlikely captures (e.g.
>>|> Queen takes pawn defended by another pawn).
>>
>>So what? This will be caught and cut off 2 plies deeper (queen captures,
>>pawn captures, side with a queen down won't find a capture that is good enough
>>in most cases).
>
>So, that's a 2-ply search you don't need to do if you can cut the node right
>away.
>
>>
>>|> To cut this down, you can
>>|> implement a simple static exchange evaluator that estimates the gain from a
>>|> capture move. If it's negative, prune it out right away. This is somewhat
>>|> risky, but it will cut down the search tree.
>>
>>It is risky indeed. A static exchange evaluator is not really cheap compared
>>to a material only evaluation function (that will suffice when the score is
>>far out of the window). I can see no benefit from that.
>
>As noted above, you may need to search before you can get a cutoff. So you
>need to compare the cost of the evaluator to the cost of a shallow search.
>Actually, with extensions, it may not be so shallow ..
>
>In dedicated chess machines like Hitech and Deep Blue, searching is
>very fast (not just absolutely, but relative to other program functions)
>so they can be more aggressive about using extensions. In software-only
>programs, though, it may be necessary to avoid even a shallow search where
>possible. The Sargon series and Cray Blitz (among others) use exchange
>evaluators, presumably for this reason.
>
>The risk is that the exchange evaluator is imperfect. I guess there's also a
>risk that it is really slower than searching, but you can measure that. What
>you gain is a reduction in the combinatorial explosion of capture moves, which
>(as you note) can prevent you from completing even a shallow search if you're
>not careful.
>
>--Jon

While this is true, experiments have shown that the difference between
looking at *all* captures and what appear to be *safe* captures is not
very much. It's pretty common in new programs, for the quiescence to
"run amok" at times until it's gotten under control. I sort of
suspect that was the problem that started this thread. Of course,
in both Cray Blitz and Crafty, I try to eliminate *every* unneeded
node, but we're not talking about a factor of 2 or 3, even, much
less the really large explosion discussed here before...

Bob

### Jim Gillogly

Aug 4, 1995, 3:00:00 AM8/4/95
to
In article <3vrcdv\$p...@pelham.cis.uab.edu>,

Robert Hyatt <hy...@willis.cis.uab.edu> wrote:
>The most likely explanation for a quiescence search that's gone
>"ballistic" is including non-capture moves, either intentionally
>(checking moves or check-evasion moves) or unintentionally (like
>trying a transposition/refutation table move, not realizing that
>you might well pull a non-capture out of there. Other explanations
>include screwed up alpha/beta code, broken evaluation code that is
>returning really *large* scores erroneously, etc. The tree dump
>is the best place to start debugging.

Two other specific places for quiescence to go ballistic:

(1) Not including a "null move" in each move generation -- that is, you
must allow the other guy to bail out of the capture sequence as soon
as capturing does him no good. From the description this looks to me
like the most likely first-order problem.

(2) Using a relatively continuous evaluation function. The more different
values you have in your evaluation, the fewer prunes you get. Obviously
if your program really <does> use bishops better than knights or a
knight better than 3 pawns, you need to keep that distinction. I'm
assuming you're not doing a lot of positional evaluation at the
quiescence leaf nodes, since it doesn't make nearly as much sense there.
Keep the evaluation as granular as possible without losing playing
strength... requires lots of tuning, of course.
--
Jim Gillogly
Sterday, 12 Wedmath S.R. 1995, 08:21

### Robert Hyatt

Aug 4, 1995, 3:00:00 AM8/4/95
to
In article <3vtk4e\$q...@pelham.cis.uab.edu>,
Robert Hyatt <hy...@willis.cis.uab.edu> wrote:
>In article <3vn20i\$g...@nz12.rz.uni-karlsruhe.de>,

>Peter Gillgasch <gil...@ira.uka.de> wrote:
>>In article <joes.406...@ultranet.com>, jo...@ultranet.com (Joe Stella) writes:
>>|> In article <3vkpik\$q...@ixnews4.ix.netcom.com>
>>|>
>>|> Material_score comes from make_move. Make_move just adds the value of the
>>|> captured piece to the current score of the position. This is often good
>>|> enough to get a cutoff, without having to evaluate the position completely.
>>
>>This high ply pruning technique is pretty unsafe when you don't consider
>>that the evaluation does (probably) add a positional score to the material
>>balance. The correct way to do that is to use this technique only when you
>>are significantly behind at the beginning of the node *and* to check alpha
>>against an optimistic bound that takes the positional part of the evaluation
>>into account as well:
>>
>>if (material_score + MAX_POS > alpha) { ....
>>
>>Correct?
>>
>>-- Peter
>>
>
>correct.
>
>--

One other point: While Peter and I agree on many things, I differ
in my opinion of the static exchange evaluator. While I can't (now)
guarantee that it's not beneficial, exhaustive testing 10+ years
ago proved its necessity in Cray Blitz. At that point in time,
this static exchange evaluator (Ripoff() if you've read technical
articles about CB) was consuming over 35% of the search time. I
tried "cheapo" tricks like if a piece is attacked by a lesser
piece, or attacked by anything and not defended, and used these to
order the capture search both within the full-width part as well as
in the quiescence. After months of work trying different things,
we chose to rewrite this code in assembly for the Cray (to get the
speed up) since it reduced the size of the tree significantly.

BEWARE of basing your judgement on nodes per second. Rather, base
them on the amount of time to complete an n-ply search. Such a
function is definitely going to slow the code down some (less than
10% in Crafty) but it's also going to reduce the size of the search
tree as well, particularly if you are doing something "really cheap"
like simply taking largest captures first. I'll run such a test
again for fun, but can't imagine it not being needed. As I developed
Crafty, I postponed writing Swap() and, instead, used a cheap
substitute, but when I added Swap() it paid for itself. It also
lets you do other things. For example, in the quiescence, you may
want to consider pushing a passed pawn to the 6th or 7th. You can
use Swap() to discover whether or not the pawn is safe after it
moves. If it can be immediately ripped, there's very little point
in considering it. Ditto for checks.

A final warning to *everyone*. NPS (nodes per second) is *not* a
good way to compare programs, or algorithms, or anything. If you
want to write the fastest program possible, do a minimax search with
*no* alpha/beta pruning. In that case, you never generate a move and
then throw it away, which wastes time. Or, you can write an alpha/
beta search, but do a sloppy job on move ordering. You get the same
effect (because you get no cutoffs).

An interesting discussion we ought to have here is to pick a *few*
positions that everyone has (non-mates preferably) and then search
these positions to a fixed depth, and let everyone post the number
of nodes they search for 2 ply, 3 ply, 4 ply, etc. Then, with a
brief explanation of a search engine (specific extensions, etc.)
we ought to be able to compare ideas to see what's working and what's
not. Everyone here has access to Crafty's source, so you can see
what I'm doing with no secrets. The only program I have any firm
info on is Ferret (Bruce Moreland at Microsoft) and for most
positions, Crafty seems to search almost 1/2 the nodes that Ferret
does. Am I doing particularly good, or is Bruce doing particularly
bad? No real way to know right now. Anyone interested in taking
this "challenge"?

We could also do some interesting things with turning off certain
types of ordering strategies (killers, history, trans/ref, etc.) and
compare notes to see how each of these affects our particular
implementation. If anyone's interested, I'm certainly game. If
you have "secret extensions" and don't want to discuss them, that's
not a problem, just "turn 'em off" for the tests. In fact, maybe we
could *all* come up with a standard set of things that we all do,
and turn off everything else (maybe keep "out of check" and
"recapture" as they are pretty common) which would make comparisons
even more interesting.

### Peter Gillgasch

Aug 4, 1995, 3:00:00 AM8/4/95
to
Finally an interesting thread on computer chess again...

In article <3vrcq7\$p...@pelham.cis.uab.edu>, hy...@willis.cis.uab.edu (Robert Hyatt) writes:
|> In article <3vr7t9\$8...@ixnews4.ix.netcom.com>,
|> >In <3vn08b\$g...@nz12.rz.uni-karlsruhe.de> gil...@ira.uka.de (Peter Gillgasch)
|> >writes:

|> >>Don't follow suggestions to implement static capture sequence evaluators.
|> >>Big fuzz to implement, slow, unsecure, hard to predict. And inefficient
|> >>in terms of move ordering (hard to believe but true).

[bobbit]

I just knew that this will call Bob's attention and here he is:

|> This overlooks one *important* problem.

I know that the problem is an important one. But if you look closely at the
quote above you will see that I didn't overlook it (^8

|> If you don't have a static
|> exchange evaluator, you don't have a way to order the captures with
|> the best capture first.

Clearly not not true, sorry. A simple way to order the captures is
mvv-lva. Actually "order" it the wrong word. "Generate", yes.

|> Putting the biggest "piece" first is no
|> good. QxR is worse than QxP if the rook is defended and the pawn
|> is not.

Of course it is worse in the general case from the chess player's point
of view. But... In about 30 - 40 % of the cases the first
move comes from the trans/ref and is pretty reliable. When doing the
next iteration the bulk of the nodes is close to the horizon and most
positions there are loopsided - nearly every capture is "good enough".
If you don't restrict the capture search by static rules then you will
find even there (close behind the horizon of the last iteration) good
captures that were checked by a simple yet much more sophisticated
search.

I am convinced that mvv/lva is superior to anything else. AFAIK both
the ChipTest and the Phoenix teams have checked square safety ordering
against mvv/lva and came to the same conclusion... At least that is
written by FHH in his thesis.

After doing a lot of test runs with statistic gathering code I decided
to redesign DarkThought for mvv/lva. Before I generated all captures
and did a maximum selection scan (mvv/lva) and before that I used a
swapoff function - but I couldn't see any benefits from that because
the trees were *not* much smaller and the runtime was of course worse
than the simple mvv/lva scheme.

|> This will make quite a difference in the size of the tree,
|> from experience. Early Crafty had no such code, adding it cut the
|> size of the tree remarkably. Once you can statically evaluate swaps,
|> you will wonder how you did without it.

We need to split the discussion into 2 subproblems:

1. Effect of static capture evaluation on move ordering

2. Effect of static capture evaluation on tree size because of forward
pruning in the selective portion of the tree.

|> BTW, the code does *not* have to be obtuse, nor inaccurate. It's
|> easy to play out a sequence of captures on a given square, to see
|> if the piece sitting there is safe or not.

Of course it is relatively easy to code... But if you order your captures
that way you have to generate them first, then evaluate them, then
select the most promising one, then you can start to search and hopefully
you get a cutoff, otherwise you have to select the next one.

When using mvv/lva you can generate captures one by one and search
directly after having generated a single move. If you get a cutoff
right away you have saved a lot of work...

### Feng-Hsiung Hsu

Aug 4, 1995, 3:00:00 AM8/4/95
to
A few observations:

1) Simple capture quiescence roughly doubles the number of nodes searched.
That is, half of the node searched are in the quiescence search tree.
A "perfect" static exchange evaluator, even it could be built and takes
zero time, gives at best a factor of two in speedup.

2) "Safe"-capture-first move ordering probably doesn't pay. Ebeling's
thesis claimed 1.5 times speedup, but did not take into account futility
cutoff. Our measurements indicate that the number of nodes searched
decreased by no more than 5%, while the node generation time usually
increases significantly more than that. Surprisingly, different square
ordering could have a far more significant effect. In particular, giving
higher priority to forward moves was measured to reduce the node count by

3) Doing positional evaluation ONLY once before quiescence search probably is
not a good idea when the search depth exceeds 7 plies. I don't have the
statistics with me. But I seem to recall that at 7 plies, the percentage
of quiescence nodes that are close to the cutoff window and thus require
positional evaluation is around 20%, and the ratio drops to around 10%
at 10 plies and beyond. Certain exchanges produce very large shift in
positional values, and unless you can fold the effect into the incremental
evaluation, you could play very bad moves.

### Joe Stella

Aug 4, 1995, 3:00:00 AM8/4/95
to
In article <3vtdra\$16...@watnews1.watson.ibm.com>
f...@sawmill.watson.ibm.com (Feng-Hsiung Hsu) writes:

>3) Doing positional evaluation ONLY once before quiescence search probably is
> not a good idea when the search depth exceeds 7 plies. I don't have the
> statistics with me. But I seem to recall that at 7 plies, the percentage
> of quiescence nodes that are close to the cutoff window and thus require
> positional evaluation is around 20%, and the ratio drops to around 10%
> at 10 plies and beyond. Certain exchanges produce very large shift in
> positional values, and unless you can fold the effect into the incremental
> evaluation, you could play very bad moves.

Well OK, but if this is the case then you (or maybe I should say "those of
us who can't search 'billions and billions and billions' of nodes") must
find a fast and efficient way to take positional considerations into
account, because an evaluator that takes any significant time at all will
greatly slow down the q_search if it is called from each node in the q_search
tree.

My program does use piece-square tables, and the incremental "positional"
values resulting from captures are looked up in the piece-square table
and added in, along with the material difference. This mitigates the problem
only somewhat, but I don't know of any better solution.

Does someone out there want to supply one? (Or at least give a hint?)

Joe S.

### Robert Hyatt

Aug 4, 1995, 3:00:00 AM8/4/95
to
In article <3vn20i\$g...@nz12.rz.uni-karlsruhe.de>,
Peter Gillgasch <gil...@ira.uka.de> wrote:
>In article <joes.406...@ultranet.com>, jo...@ultranet.com (Joe Stella) writes:
>|> In article <3vkpik\$q...@ixnews4.ix.netcom.com>
>|>
>|> Material_score comes from make_move. Make_move just adds the value of the
>|> captured piece to the current score of the position. This is often good
>|> enough to get a cutoff, without having to evaluate the position completely.
>
>This high ply pruning technique is pretty unsafe when you don't consider
>that the evaluation does (probably) add a positional score to the material
>balance. The correct way to do that is to use this technique only when you
>are significantly behind at the beginning of the node *and* to check alpha
>against an optimistic bound that takes the positional part of the evaluation
>into account as well:
>
>if (material_score + MAX_POS > alpha) { ....
>
>Correct?
>
>-- Peter
>

correct.

--

### Jon Dart

Aug 4, 1995, 3:00:00 AM8/4/95
to
In article <joes.415...@ultranet.com> jo...@ultranet.com (Joe Stella) writes:
>
>An incremental update of the score, based upon material captured, is really
>all you can afford to do in a q_search. "Positional" factors must be
>evaluated _before_ the q_search begins.
>
> Joe S.
>

I've tried this and similar techniques, but there is the basic problem that
in alpha-beta, scores from terminal nodes eventually get backed up to the
root of the tree ( a lot of them don't, due to cutoffs, but some of them do).
So if you return values from the capture search that are positionally
inaccurate, you may wind up pruning away a positionally "good" tree in favor
of one that's really positionally "bad" (e.g. due to doubled pawns).

At least, that's my experience.

### Peter Gillgasch

Aug 4, 1995, 3:00:00 AM8/4/95
to
In article <jdartDC...@netcom.com>, jd...@netcom.com (Jon Dart) writes:
|> In article <3vn1eo\$g...@nz12.rz.uni-karlsruhe.de> gil...@ira.uka.de (Peter Gillgasch) writes:
|> >In article <jdartDC...@netcom.com>, jd...@netcom.com (Jon Dart) writes:
|> >|> I supsect that if you print out the actual moves being searched and look at the
|> >|> output, you will find that it is searching a lot of unlikely captures (e.g.
|> >|> Queen takes pawn defended by another pawn).
|> >
|> >So what? This will be caught and cut off 2 plies deeper (queen captures,
|> >pawn captures, side with a queen down won't find a capture that is good enough
|> >in most cases).
|>
|> So, that's a 2-ply search you don't need to do if you can cut the node right
|> away.

Ok, let's think this to the end:

(1) wtm: takes static evaluation ("null move")
Generates a bunch of captures and finally hits this one:
DoMove: QxP
(2) btm: takes static evaluation
DoMove: PxQ
(3) wtm: takes static evaluation (material balance is sufficient)
tries to find a capture that captures *real* big wood...
finds none and returns.

So the cost is close to nil since *2* nodes have been added
and one of them didn't need any meaningful evaluation. If you
generate captures the right way then *1* move has been generated
at (2) and it was enough for a cutoff. If you implement it correctly
move generation was not even tried at (3)...

Compare this to generating all captures and applying a swapoff
function to all of them...

### Robert Hyatt

Aug 5, 1995, 3:00:00 AM8/5/95
to
In article <joes.415...@ultranet.com>,

Joe Stella <jo...@ultranet.com> wrote:
>In article <3vr7t9\$8...@ixnews4.ix.netcom.com> plug...@ix.netcom.com
>
>>Hmmm... My current approach is to generate all moves in either case (the
>>routines are only slightly different) and cut out non-promotion, non-capture
>>moves. Suppose I should include checks as well but those could be expensive to
>>look for. Anyway many important checks are also captures.
>
>Generating all moves is definitely a waste of time. My program was speeded
>up by almost a factor of two when I began generating only capture/promotions
>in the q_search.
>
>Another very important factor is the use of incremental score updating
>that I have pointed out before. Your code looked like it was calling
>evaluate_position() at each node in the q_search, and if this routine
>does anything complicated at all then this really slows down the search.
>
>An incremental update of the score, based upon material captured, is really
>all you can afford to do in a q_search. "Positional" factors must be
>evaluated _before_ the q_search begins.
>
> Joe S.
>

however, you will find that nearly all programs (Fritz is the lone
excepting I'm aware of) do call the evaluation procedure at every
endpoint in the tree, not excluding those in the quiescence. If
you don't do this, you will overlook things (in the quiescence) like
Bxf3 gxf3, shattering your king-side. In fact, captures so alter the
positional evaluation that not doing a positional evaluation after
a capture is going to cost you dearly in games. It will make you
miss a way to get an opponent's rook off of your 2nd (his 7th) rank,
make you overlook winning and removing passed pawns, or creating
passed pawns... The list goes on. Remember, that speed is not nearly
so important (nodes per second) as is accuracy.

Note that if you don't do positional analysis in the quiescence
search, you can save even more time by only doing the positional
analysis for each move at the root of the tree. Of course, this
would only increase the error in the evaluation.

Next, I can't imagine how not generating non-captures in the q-search
would speed your program up by a factor of two. I don't generate
non-captures unless I need to include checking moves, but I don't
spent anywhere *near* 50% of my time (really, it would have to be
*much* more) in the move generator. In crafty it's under 5% of the
total search time. If you could make it so efficient it took zero
cycles to generate the list of moves, I would only see the 5%
improvement.

### Joe Stella

Aug 5, 1995, 3:00:00 AM8/5/95
to
In article <3vvslc\$r...@pelham.cis.uab.edu>
hy...@willis.cis.uab.edu (Robert Hyatt) writes:

>however, you will find that nearly all programs (Fritz is the lone
>excepting I'm aware of) do call the evaluation procedure at every
>endpoint in the tree, not excluding those in the quiescence. If
>you don't do this, you will overlook things (in the quiescence) like

>[...]

Well ok, I can accept that, but the last time I tried this my program
became dog-slow. That was a while ago, and I have made enhancements
to trim down the evaluator while not loosing too much accuracy. I guess
I should go back now and try calling the evaluator in the q_search again,
and see what happens...

I think the trick must be to call the positional evaluator only when it is
needed, i.e. don't bother calling it if one side is way ahead in material.

>Next, I can't imagine how not generating non-captures in the q-search
>would speed your program up by a factor of two. I don't generate
>non-captures unless I need to include checking moves, but I don't
>spent anywhere *near* 50% of my time (really, it would have to be
>*much* more) in the move generator. In crafty it's under 5% of the
>total search time. If you could make it so efficient it took zero
>cycles to generate the list of moves, I would only see the 5%
>improvement.

I guess I should have mentioned that my program makes/unmakes _all_ generated
moves, even in the q_search. In the q_search, it doesn't _expand_ any
non-capture moves. So when I took out the non-captures, I eliminated
lots of make/unmakes. I suppose this may not hold for other programs.

Joe S.

### Robert Hyatt

Aug 5, 1995, 3:00:00 AM8/5/95
to
In article <joes.421...@ultranet.com>,

Joe Stella <jo...@ultranet.com> wrote:
>In article <3vvslc\$r...@pelham.cis.uab.edu>
>hy...@willis.cis.uab.edu (Robert Hyatt) writes:
>
>
>>however, you will find that nearly all programs (Fritz is the lone
>>excepting I'm aware of) do call the evaluation procedure at every
>>endpoint in the tree, not excluding those in the quiescence. If
>>you don't do this, you will overlook things (in the quiescence) like
>>[...]
>
>Well ok, I can accept that, but the last time I tried this my program
>became dog-slow. That was a while ago, and I have made enhancements
>to trim down the evaluator while not loosing too much accuracy. I guess
>I should go back now and try calling the evaluator in the q_search again,
>and see what happens...
>
>I think the trick must be to call the positional evaluator only when it is
>needed, i.e. don't bother calling it if one side is way ahead in material.
>

Absolutely. you should do something like this:

quick_eval=material_evaluation-largest_positional_eval_seen_so_far;
if (quick_eval > beta) return(beta);
eval=evaluate();
if (eval > beta) return(beta);

This lets you get away with a "quick evaluation" based on material
to see if there's any chance that the largest positional evaluation
seen so far can bring the material score back down to inside the
alpha/beta window. This can eliminate at least 50% of the calls
to the real evaluate() function.

>

>>Next, I can't imagine how not generating non-captures in the q-search
>>would speed your program up by a factor of two. I don't generate
>>non-captures unless I need to include checking moves, but I don't
>>spent anywhere *near* 50% of my time (really, it would have to be
>>*much* more) in the move generator. In crafty it's under 5% of the
>>total search time. If you could make it so efficient it took zero
>>cycles to generate the list of moves, I would only see the 5%
>>improvement.
>
>I guess I should have mentioned that my program makes/unmakes _all_ generated
>moves, even in the q_search. In the q_search, it doesn't _expand_ any
>non-capture moves. So when I took out the non-captures, I eliminated
>lots of make/unmakes. I suppose this may not hold for other programs.

Why waste the time making a move that doesn't belong in the q-search
at all? In Crafty, I spend a lot of time making sure that I don't
make *any* move that absolutely is not required, since make_move()
using bitboards is pretty expensive (although the new rotated bit
board code is *much* better in this respect.)

To go really fast, and, at the same time, play really well, you
have to do *both* of the following: (1) do *everything* you need
to do (evaluations, etc.), but, (2) do *nothing* you can avoid
as long as this doesn't conflict with (1). Then you will be on
the road to high performance *and* high quality.

### Dan Thies

Aug 5, 1995, 3:00:00 AM8/5/95
to
Joe Stella (jo...@ultranet.com) wrote:
: I think the trick must be to call the positional evaluator only when it is

: needed, i.e. don't bother calling it if one side is way ahead in material.

Joe:

Perhaps your evaluator is structured significantly different than mine,
but it seems obvious that you wouldn't proceed past material evaluation
"positional" score.

We are using a 4-step evaluation, like this:
1. Material
2. Space & Development
3. Pawns
4. K-Safety etc.

After completing each step, we check to see if that step has generated
sufficient positional advantage to give a cutoff, EVEN IF all of the
following evaluations are totally unfavorable.

Each step has some MAXIMUM amount that it could possibly add to or
subtract from the score for that position, so we can always cut off if the
value already generated can not be modified enough to produce a different
result (cutoff).

Since it is unlikely that any step will return anything close to the
maximum, we can set the value we check against to something less than the
maximum and get more cutoffs with less evaluation. This does make the
search less accurate, but we have found that the risk is very minimal.

We have not yet experimented with trying to do the cutoffs in
mid-evaluation on any given step, although it should produce a small
increase in speed if coded correctly.

Perhaps we are so far behind the times that this method is not the best,
if so, hopefully someone will let us know.

Dan
--
Dan Thies = Off-The-Wall Solutions = rt...@cyberspace.com
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
"I flunked my Turing Test, but at least I passed the driving exam!"

### Feng-Hsiung Hsu

Aug 5, 1995, 3:00:00 AM8/5/95
to
In article <joes.418...@ultranet.com> jo...@ultranet.com (Joe Stella) writes:
>In article <3vtdra\$16...@watnews1.watson.ibm.com>
>f...@sawmill.watson.ibm.com (Feng-Hsiung Hsu) writes:
>
>>3) Doing positional evaluation ONLY once before quiescence search probably is
>> not a good idea when the search depth exceeds 7 plies. I don't have the
>> statistics with me. But I seem to recall that at 7 plies, the percentage
>> of quiescence nodes that are close to the cutoff window and thus require
>> positional evaluation is around 20%, and the ratio drops to around 10%
>> at 10 plies and beyond. Certain exchanges produce very large shift in
>> positional values, and unless you can fold the effect into the incremental
>> evaluation, you could play very bad moves.
>
>Well OK, but if this is the case then you (or maybe I should say "those of
>us who can't search 'billions and billions and billions' of nodes") must
>find a fast and efficient way to take positional considerations into
>account, because an evaluator that takes any significant time at all will
>greatly slow down the q_search if it is called from each node in the q_search
>tree.

What I said did not require very big trees. If you are careful and your
program is fast enough to search, say, 7 plies, then only 20% of the nodes
need to do full positional evaluation, which is actually less than the
number of full positional evaluation that you would do if done once right at
the horizon (which would be about 50% of the nodes searched). I suspect
that even for 5-ply searches, a lazy, on-demand evaluation would still be
better than always doing a full evaluation at the horizon. You could shift
the balance by moving the your full evaluation further up the tree, but
it would create very strange horizon effects.

Remember that you only need to do a full evaluation when the incremental
evaluation is not too high or too low wrt the window.

### Joe Stella

Aug 6, 1995, 3:00:00 AM8/6/95
to
In article <40168h\$s...@pelham.cis.uab.edu>
hy...@willis.cis.uab.edu (Robert Hyatt) writes:

>Absolutely. you should do something like this:

>quick_eval=material_evaluation-largest_positional_eval_seen_so_far;
>if (quick_eval > beta) return(beta);
>eval=evaluate();
>if (eval > beta) return(beta);

Thanks, I will try this as soon as I get some time to get back to chess
programming again (hopefully soon... :) )

>In article <joes.421...@ultranet.com>,
>Joe Stella <jo...@ultranet.com> wrote:

>>I guess I should have mentioned that my program makes/unmakes _all_ generated
>>moves, even in the q_search. In the q_search, it doesn't _expand_ any
>>non-capture moves. So when I took out the non-captures, I eliminated
>>lots of make/unmakes. I suppose this may not hold for other programs.

>Why waste the time making a move that doesn't belong in the q-search
>at all? In Crafty, I spend a lot of time making sure that I don't
>make *any* move that absolutely is not required, since make_move()
>using bitboards is pretty expensive (although the new rotated bit
>board code is *much* better in this respect.)

My q_search originally followed the algorithm given in G. Shrufer's paper
in the March 1989 issue of the ICCA journal. His idea was to do a type
of selective search by including moves in the q_search that had a high
positional score. So a move is made, a quick evaluate is performed that
takes positional as well as material scores into account, then the position
is searched further if the resulting score is >alpha.

I was only doing a material evaluation and no positional eval, since I
figured that if I could not get this working fast then there was no
sense in trying to expand more moves by including "positional" moves in
the q_search as well as captures.

When all this did not work out well, I just changed the move generator
to generate only captures.

Joe S.

### Robert Hyatt

Aug 6, 1995, 3:00:00 AM8/6/95
to
In article <joes.418...@ultranet.com>,

Joe Stella <jo...@ultranet.com> wrote:
>In article <3vtdra\$16...@watnews1.watson.ibm.com>
>f...@sawmill.watson.ibm.com (Feng-Hsiung Hsu) writes:
>
>>3) Doing positional evaluation ONLY once before quiescence search probably is
>> not a good idea when the search depth exceeds 7 plies. I don't have the
>> statistics with me. But I seem to recall that at 7 plies, the percentage
>> of quiescence nodes that are close to the cutoff window and thus require
>> positional evaluation is around 20%, and the ratio drops to around 10%
>> at 10 plies and beyond. Certain exchanges produce very large shift in
>> positional values, and unless you can fold the effect into the incremental
>> evaluation, you could play very bad moves.
>
>Well OK, but if this is the case then you (or maybe I should say "those of
>us who can't search 'billions and billions and billions' of nodes") must
>find a fast and efficient way to take positional considerations into
>account, because an evaluator that takes any significant time at all will
>greatly slow down the q_search if it is called from each node in the q_search
>tree.
>
>My program does use piece-square tables, and the incremental "positional"
>values resulting from captures are looked up in the piece-square table
>and added in, along with the material difference. This mitigates the problem
>only somewhat, but I don't know of any better solution.
>
>Does someone out there want to supply one? (Or at least give a hint?)
>
> Joe S.
>

Yes. Simply take the "standard" negamax alpha/beta search
implementation (which has been published dozens of times by
now) and do *exactly* what it says... call evaluate at every
endpoint where the material score is not so far out of the window
that the position is hopeless. I'm running Crafty on a PC, and
don't search billions of nodes, yet I do this as does every other
program (excepting Fritz) that I know of. It is *not* a problem
in terms of speed, in Crafty less than (typically) 1/3 of the
positions searched require full evaluations. This is probably
less than you are doing right now, but, in any case, it does
not overlook positional considerations of capturing moves.

Where you call eval in the q-search should look just like the
last post I wrote, explaining how to avoid unnecessary evaluations
without avoiding important ones.

### Robert Hyatt

Aug 6, 1995, 3:00:00 AM8/6/95
to
In article <3vu00r\$9...@nz12.rz.uni-karlsruhe.de>,

Peter Gillgasch <gil...@ira.uka.de> wrote:
>Finally an interesting thread on computer chess again...
>
>In article <3vrcq7\$p...@pelham.cis.uab.edu>, hy...@willis.cis.uab.edu (Robert Hyatt) writes:
>|> In article <3vr7t9\$8...@ixnews4.ix.netcom.com>,
>|> >In <3vn08b\$g...@nz12.rz.uni-karlsruhe.de> gil...@ira.uka.de (Peter Gillgasch)
>|> >writes:
>|> >>Don't follow suggestions to implement static capture sequence evaluators.
>|> >>Big fuzz to implement, slow, unsecure, hard to predict. And inefficient
>|> >>in terms of move ordering (hard to believe but true).
>
>[bobbit]
>
>I just knew that this will call Bob's attention and here he is:
>
>|> This overlooks one *important* problem.
>
>I know that the problem is an important one. But if you look closely at the
>quote above you will see that I didn't overlook it (^8
>
>|> If you don't have a static
>|> exchange evaluator, you don't have a way to order the captures with
>|> the best capture first.
>
>Clearly not not true, sorry. A simple way to order the captures is
>mvv-lva. Actually "order" it the wrong word. "Generate", yes.
>
>|> Putting the biggest "piece" first is no
>|> good. QxR is worse than QxP if the rook is defended and the pawn
>|> is not.
>
>Of course it is worse in the general case from the chess player's point
>of view. But... In about 30 - 40 % of the cases the first
>move comes from the trans/ref and is pretty reliable. When doing the
>next iteration the bulk of the nodes is close to the horizon and most
>positions there are loopsided - nearly every capture is "good enough".
>If you don't restrict the capture search by static rules then you will
>find even there (close behind the horizon of the last iteration) good
>captures that were checked by a simple yet much more sophisticated
>search.
>
>I am convinced that mvv/lva is superior to anything else. AFAIK both
>the ChipTest and the Phoenix teams have checked square safety ordering
>against mvv/lva and came to the same conclusion... At least that is
>written by FHH in his thesis.

Note completely relevant for us, however, as we don't have any hardware
constraints about what/how we do things. Hsu was trying to "force"
Belle onto a single chip (which he did after making a couple of cute
optimizations that Ken didn't catch). However, his hardware imposed
some constraints that we don't have to live with.

As for it being "superior" I can't see how that can be true. It
might be "nearly as good", but, for absolutely certain, the tree
is going to be larger. How much is unknown, but it simply can't
be any better than a good static exchange evaluator, unless the
SEE is so slow that it hinders performance.

>
>After doing a lot of test runs with statistic gathering code I decided
>to redesign DarkThought for mvv/lva. Before I generated all captures
>and did a maximum selection scan (mvv/lva) and before that I used a
>swapoff function - but I couldn't see any benefits from that because
>the trees were *not* much smaller and the runtime was of course worse
>than the simple mvv/lva scheme.
>

>|> This will make quite a difference in the size of the tree,
>|> from experience. Early Crafty had no such code, adding it cut the
>|> size of the tree remarkably. Once you can statically evaluate swaps,
>|> you will wonder how you did without it.
>

>We need to split the discussion into 2 subproblems:
>
>1. Effect of static capture evaluation on move ordering
>
>2. Effect of static capture evaluation on tree size because of forward
> pruning in the selective portion of the tree.
>

>|> BTW, the code does *not* have to be obtuse, nor inaccurate. It's
>|> easy to play out a sequence of captures on a given square, to see
>|> if the piece sitting there is safe or not.
>

>Of course it is relatively easy to code... But if you order your captures
>that way you have to generate them first, then evaluate them, then
>select the most promising one, then you can start to search and hopefully
>you get a cutoff, otherwise you have to select the next one.

I will once again test this, but my old test data from Cray Blitz
is still in my (paper) files, and move ordering (capturing) was
very important to overall tree size. I tried your mvv/lva code
as it's trivial to implement, and was but a fraction of the cost
of the old "ripoff" code in Cray Blitz. It made the tree grow
enough to more than offset the savings of not using ripoff.

I'll run some tests with Crafty to verify this, however, the cost
for "swap()" is not much in the current code; it's so low, in fact,
that it doesn't show up until the 2nd page in profile output. More
after I run some tests.

>
>When using mvv/lva you can generate captures one by one and search
>directly after having generated a single move. If you get a cutoff
>right away you have saved a lot of work...

Note, that generating captures "one by one" is *not* a good thing
to do everywhere in the tree. for type=3 nodes, you have to generate
*all* moves anyway, so you simply add lots of extra procedure calls
into the mix. For type=2 nodes, yes, one at a time is better, but
it's not "cut and dried" due to the type-3 nodes where move ordering
is completely insignificant..

Rest assured that I've beat on this for years, as parallel searching
algorithms have to be *very* careful where to split the tree, choosing
a type=2 node results in no usefull parallel work. A single "wrong"
capture first kills me, for this very reason. To you, an extra 10%
might be acceptable if that costs less than your swap() function, but
to me it's going to really harm parallel performance.

### Joe Stella

Aug 6, 1995, 3:00:00 AM8/6/95
to
In article <403b3i\$b...@pelham.cis.uab.edu>
hy...@willis.cis.uab.edu (Robert Hyatt) writes:

>In article <joes.418...@ultranet.com>,
>Joe Stella <jo...@ultranet.com> wrote:

>>My program does use piece-square tables, and the incremental "positional"
>>values resulting from captures are looked up in the piece-square table
>>and added in, along with the material difference. This mitigates the problem
>>only somewhat, but I don't know of any better solution.
>>
>>Does someone out there want to supply one? (Or at least give a hint?)
>>
>> Joe S.
>>

>Yes. Simply take the "standard" negamax alpha/beta search
>implementation (which has been published dozens of times by
>now) and do *exactly* what it says...

>[...]

>Where you call eval in the q-search should look just like the
>last post I wrote, explaining how to avoid unnecessary evaluations
>without avoiding important ones.

Yes, I intend to do exactly that, sometime soon. Please note that I
you may have seen one before the other due to the vagaries of Usenet,
which may have given you the impression that I was being obtuse.

By the way, what are "mvv-lva" and "type-2" and "type-3" positions?

Joe S.

### Robert Hyatt

Aug 6, 1995, 3:00:00 AM8/6/95
to
In article <3vu00r\$9...@nz12.rz.uni-karlsruhe.de>,
Peter Gillgasch <gil...@ira.uka.de> wrote:
>Finally an interesting thread on computer chess again...
>
>In article <3vrcq7\$p...@pelham.cis.uab.edu>, hy...@willis.cis.uab.edu (Robert Hyatt) writes:
>|> In article <3vr7t9\$8...@ixnews4.ix.netcom.com>,
>|> >In <3vn08b\$g...@nz12.rz.uni-karlsruhe.de> gil...@ira.uka.de (Peter Gillgasch)
>|> >writes:
>|> >>Don't follow suggestions to implement static capture sequence evaluators.
>|> >>Big fuzz to implement, slow, unsecure, hard to predict. And inefficient
>|> >>in terms of move ordering (hard to believe but true).
>
>[bobbit]
>
>I just knew that this will call Bob's attention and here he is:
>
>|> This overlooks one *important* problem.
>
>I know that the problem is an important one. But if you look closely at the
>quote above you will see that I didn't overlook it (^8
>
>|> If you don't have a static
>|> exchange evaluator, you don't have a way to order the captures with
>|> the best capture first.
>
>Clearly not not true, sorry. A simple way to order the captures is
>mvv-lva. Actually "order" it the wrong word. "Generate", yes.
>
>|> Putting the biggest "piece" first is no
>|> good. QxR is worse than QxP if the rook is defended and the pawn
>|> is not.
>
>Of course it is worse in the general case from the chess player's point
>of view. But... In about 30 - 40 % of the cases the first
>move comes from the trans/ref and is pretty reliable. When doing the
>next iteration the bulk of the nodes is close to the horizon and most
>positions there are loopsided - nearly every capture is "good enough".
>If you don't restrict the capture search by static rules then you will
>find even there (close behind the horizon of the last iteration) good
>captures that were checked by a simple yet much more sophisticated
>search.
>
>I am convinced that mvv/lva is superior to anything else. AFAIK both
>the ChipTest and the Phoenix teams have checked square safety ordering
>against mvv/lva and came to the same conclusion... At least that is
>written by FHH in his thesis.
>
>After doing a lot of test runs with statistic gathering code I decided
>to redesign DarkThought for mvv/lva. Before I generated all captures
>and did a maximum selection scan (mvv/lva) and before that I used a
>swapoff function - but I couldn't see any benefits from that because
>the trees were *not* much smaller and the runtime was of course worse
>than the simple mvv/lva scheme.
>
>|> This will make quite a difference in the size of the tree,
>|> from experience. Early Crafty had no such code, adding it cut the
>|> size of the tree remarkably. Once you can statically evaluate swaps,
>|> you will wonder how you did without it.
>
>We need to split the discussion into 2 subproblems:
>
>1. Effect of static capture evaluation on move ordering
>
>2. Effect of static capture evaluation on tree size because of forward
> pruning in the selective portion of the tree.
>
>|> BTW, the code does *not* have to be obtuse, nor inaccurate. It's
>|> easy to play out a sequence of captures on a given square, to see
>|> if the piece sitting there is safe or not.
>
>Of course it is relatively easy to code... But if you order your captures
>that way you have to generate them first, then evaluate them, then
>select the most promising one, then you can start to search and hopefully
>you get a cutoff, otherwise you have to select the next one.
>
>When using mvv/lva you can generate captures one by one and search
>directly after having generated a single move. If you get a cutoff
>right away you have saved a lot of work...
>
>-- Peter
>

First round of testing complete. First, let me describe the test
I ran, to compare mvv/lva ordering vs static exchange evaluation
ordering.

I have two search modules, Search() and Quiesce(). I *only*
modified Search() path for this test. Since I specifically
exclude captures that appear to "lose" in the q-search, I didn't
want to alter too much to run the test.

In my Next_Move() function, I was generating all capture moves
(no others just yet) and then using Swap() to order moves based
on expected gain. I replaced Swap() with a simple function "lva()"
that simply computed expected gain = value of attacked piece (MVV)
- value of attacking piece (LVA) and used this to sort the captures
as before. Note that this is a *better* approximation since a move
like QxP (pawn undefended) will now look better than QxR, while with
MVV/LVA, QxR would be done first (most valuable victim = rook, least
valuable attacker = Queen [the only attacker for this example]).

Running this test over several test positions resulted in search
trees that grew by a minimum of 25%. In no case was MVV/LVA anywhere
close to Swap(). Since Swap takes a minimal amount of time to
compute in Crafty, the result was slower searches. Nodes per second
went up somewhat, of course, because this approach is faster, but the
tree size grew enough to more than offset it.

Notice that I did *not* alter the Q-search yet. I then modified the
Next_Capture() code in crafty. Here, I had to make the previous
modification, plus not exclude any captures since I now don't have
a clue about which captures are good and which are not. Now the tree
really blew up, and ended up about double what I was searching before
I removed Swap(). My conclusion here is that Swap() is *more* than
worth the time spent using it to order moves. I agree that I could
speed the program up a little more by using MVV/LVA to generate
captures on the fly, but for this test, I ignored cpu time completely,
and simple measured the size of the tree for fixed-depth searches.

Now, it's time to test Dark Thought. Here's the problem that you
are going to have to solve: There are several captures that you
can make, including QxR, QxB, QxN, and QxP. The move at the previous
ply in the tree "hung the pawn" and taking it will refute that move
and cause a cutoff. However, your MVV/LVA algorithm is going to
try the other three captures first, even though they are all
capturing defended pieces, which loses material. I agree, that for
a "quick and dirty" approach, MVV/LVA works, but this (admittedly)
simple test seems to suggest that it is a long way from using a
good (fast) static exchange evaluator. This approach sped Crafty
up less than 5% overall, yet doubled the size of the tree. *NOT*
a good deal for me.

I'll try to run some more sophisticated tests, but, as I said before,
testing years ago led me to use a static exchange evaluator, and, so
far, nothing has changed my opinion. I can run most any type of
test or problem you want to use to compare notes with. Before we go
too far, however, we really need to run a couple of well-known
positions to a fixed depth to get a node count, so that we can compare
trees before we start fricking around with things. Kopec #22 is one
I use frequently. Crafty solves it in 8 plies, so a fixed-search to
depth=8 would give us a good tree size estimate, although only one
position is not enough. Even better would be to take Kopec 1-12
and run them to some fixed depth.

Bob

### Robert Hyatt

Aug 7, 1995, 3:00:00 AM8/7/95
to
In article <joes.423...@ultranet.com>,
Joe Stella <jo...@ultranet.com> wrote:
>In article <403b3i\$b...@pelham.cis.uab.edu>

MVV/LVA is a capture ordering idea developed by Ken Thompson when he
you probably remember the hardware discussion on "find victim" and
"find aggressor". victim is the most valuable piece that can be
captured, aggressor is the least valuable piece attacking this
victim. So, you simply find the most valuable piece being attacked,
and capture it with the least valuable available capturing piece.
Very simple, but error-prone. Of course, it was developed for the
special-purpose hardware in these chess machines.

the "type" nodes comes from the famous Knuth/Moore paper on alpha
beta years ago. The root node is a type-1 node. The first
successor of a type-1 node is a type-1 node. Other successors of
a type-1 node are type-2 nodes. For the rest of the tree, the
successor of a type-2 is type-3 and vice-versa.

The interesting part of this is that type-3 nodes need to have
every possible move examined, and move ordering is not important
at all. type-2 nodes need only have 1 branch evaluated, IF it's
a "good" branch. Type-1 nodes need to have all branches evaluated,
*AND* they have to be searched in good order.

### Jon Dart

Aug 7, 1995, 3:00:00 AM8/7/95
to

This has been an interesting discussion.

I tried one of the suggestions in this thread (Peter's, I think) about
using a hash table in the quiescence search. I put in a separate table
just for the quiescence search and after it filled, implemented a FIFO
replacement policy. I got about a 10% reduction in moves generated and
positions evaluated (the two statistics I keep), but the time to complete
a search actually went up! So the overhead of this technique seems to
cancel out its benefits, at least for me. Btw. I kept 2000 nodes in the
table - I also tried 4000 and 8000, with similar results.

### Peter Gillgasch

Aug 7, 1995, 3:00:00 AM8/7/95
to
In article <403bs9\$b...@pelham.cis.uab.edu>, hy...@willis.cis.uab.edu (Robert Hyatt) writes:
|> In article <3vu00r\$9...@nz12.rz.uni-karlsruhe.de>,
|> Peter Gillgasch <gil...@ira.uka.de> wrote:
|> >Finally an interesting thread on computer chess again...
|> >
|> >In article <3vrcq7\$p...@pelham.cis.uab.edu>, hy...@willis.cis.uab.edu (Robert Hyatt) writes:
|> >|> In article <3vr7t9\$8...@ixnews4.ix.netcom.com>,
|> >|> Larry Craighead <plug...@ix.netcom.com> wrote:
|> >|> >In <3vn08b\$g...@nz12.rz.uni-karlsruhe.de> gil...@ira.uka.de (Peter Gillgasch)
|> >|> >writes:
|> >|> >>Don't follow suggestions to implement static capture sequence evaluators.
|> >|> >>Big fuzz to implement, slow, unsecure, hard to predict. And inefficient
|> >|> >>in terms of move ordering (hard to believe but true).

[snip]

|> >I am convinced that mvv/lva is superior to anything else. AFAIK both
|> >the ChipTest and the Phoenix teams have checked square safety ordering
|> >against mvv/lva and came to the same conclusion... At least that is
|> >written by FHH in his thesis.
|>
|> Note completely relevant for us, however, as we don't have any hardware
|> constraints about what/how we do things. Hsu was trying to "force"
|> Belle onto a single chip (which he did after making a couple of cute
|> optimizations that Ken didn't catch). However, his hardware imposed
|> some constraints that we don't have to live with.

Correct. We have much more freedom since we are using software only.
But Hitech is using something like an SEE (although it is not that
sophisticated) and it's performance was a bit disappointing compared
to mvv-lva with killer heuristic when looking at the tree size.

|> As for it being "superior" I can't see how that can be true. It
|> might be "nearly as good", but, for absolutely certain, the tree
|> is going to be larger. How much is unknown, but it simply can't
|> be any better than a good static exchange evaluator, unless the
|> SEE is so slow that it hinders performance.

It is superior in the sense that it does not forward prune in the
quiesence (have you ever tried q-searches without static forward
pruning but with SEE ordering?) and still is blazingly fast. It is
superior in the sense that *no* time is wasted for move ordering in
the quiesence. It is superior in the sense that futility pruning is
a piece of cake. I am very interested in some data that supports your
claim that SEE is worth it. But again, we have to distinguish between
*forward pruning* savings and move ordering savings... I am not too
much interested in forward pruning savings...

|> I'll run some tests with Crafty to verify this, however, the cost
|> for "swap()" is not much in the current code; it's so low, in fact,
|> that it doesn't show up until the 2nd page in profile output. More
|> after I run some tests.

It is not only the cost of swap()... Think about how much code code be
removed if you don't have to worry about mover ordering when doing
captures in general and making sure that your futility pruning is
correct, and... The new bitboard search controller I wrote is about
15 k small (PowerPC) and this handles *everything* with the exception
of DoMove(), eval(), and generating non-capturing moves... In other
words it does repetition check, trans/ref handling, capture move
generation, move ordering (for non captures) and the alpha beta search
itself including keeping the size of the capture/selective search as
small as possible...

|> Note, that generating captures "one by one" is *not* a good thing
|> to do everywhere in the tree. for type=3 nodes, you have to generate
|> *all* moves anyway, so you simply add lots of extra procedure calls
|> into the mix. For type=2 nodes, yes, one at a time is better, but
|> it's not "cut and dried" due to the type-3 nodes where move ordering
|> is completely insignificant..

I haven't optimized anything for type-3 nodes... Probably I will look
at this as soon as the new engine is at least as functional as the old
one.

|> Rest assured that I've beat on this for years, as parallel searching
|> algorithms have to be *very* careful where to split the tree, choosing
|> a type=2 node results in no usefull parallel work. A single "wrong"
|> capture first kills me, for this very reason. To you, an extra 10%
|> might be acceptable if that costs less than your swap() function, but
|> to me it's going to really harm parallel performance.

No, 10% is *not* acceptable. Since DoMove() is pretty expensive
relative to the "cost" of it in the old code I try to avoid it
everywhere I can...

### Peter Gillgasch

Aug 7, 1995, 3:00:00 AM8/7/95
to
In article <jdartDC...@netcom.com>, jd...@netcom.com (Jon Dart) writes:
|>
|> This has been an interesting discussion.

It still is. Don't drop out early (^8

|> I tried one of the suggestions in this thread (Peter's, I think) about
|> using a hash table in the quiescence search. I put in a separate table
|> just for the quiescence search and after it filled, implemented a FIFO
|> replacement policy. I got about a 10% reduction in moves generated and
|> positions evaluated (the two statistics I keep), but the time to complete
|> a search actually went up! So the overhead of this technique seems to
|> cancel out its benefits, at least for me. Btw. I kept 2000 nodes in the
|> table - I also tried 4000 and 8000, with similar results.

Hm. Why did you use a seperate hash table? I am using the
"big one" and it is clearly a win. The only explanations
I have for your observations are the following:

1. Your eval function is so fast so that the gain is
cancelled since eval is faster than your hash table
lookup code. If this makes sense in your case then ensure
that you "stand pat" before the hash lookup (that could
give you a cutoff right away). I am doing it the other
way round because I have real *expensive* data in the hash
table and I *need* to use it before the evaluation function
is called - it has to do with disc seek time, so you get
the idea (^8

2. Your hash table lookup code is too slow. Possible
problem here: You implemented it in such a way that
you generate more than *one* move to verify a "hit".

When the node count drops and the move generation time drops
and the time to completion goes up this is clearly a hint
that something is really broken or poorly implemented.

I suggest to look at those explanations and to use a profiler
to get a better idea of the problem. Also, add more statistics,
you could use conditional compilation if you don't want them.

### Joe Stella

Aug 7, 1995, 3:00:00 AM8/7/95
to
In article <4062ic\$n...@nz12.rz.uni-karlsruhe.de>
gil...@ira.uka.de (Peter Gillgasch) writes:

>[...]

>It is superior in the sense that it does not forward prune in the
>quiesence (have you ever tried q-searches without static forward
>pruning but with SEE ordering?) and still is blazingly fast. It is
>superior in the sense that *no* time is wasted for move ordering in
>the quiesence. It is superior in the sense that futility pruning is
>a piece of cake.

>[...]

Incredibly Dumb Question #2:

What is SEE ordering?

Joe S.

### Peter Gillgasch

Aug 7, 1995, 3:00:00 AM8/7/95
to
In article <403gpp\$e...@pelham.cis.uab.edu>, hy...@willis.cis.uab.edu (Robert Hyatt) writes:
|> In article <3vu00r\$9...@nz12.rz.uni-karlsruhe.de>,
|> Peter Gillgasch <gil...@ira.uka.de> wrote:

[big snipping here and there]

|> >Finally an interesting thread on computer chess again...
|> >
|> >In article <3vrcq7\$p...@pelham.cis.uab.edu>, hy...@willis.cis.uab.edu (Robert Hyatt) writes:
|> >|> In article <3vr7t9\$8...@ixnews4.ix.netcom.com>,
|> >|> Larry Craighead <plug...@ix.netcom.com> wrote:
|> >|> >In <3vn08b\$g...@nz12.rz.uni-karlsruhe.de> gil...@ira.uka.de (Peter Gillgasch)
|> >|> >writes:
|> >|> >>Don't follow suggestions to implement static capture sequence evaluators.

|> First round of testing complete. First, let me describe the test

|> I ran, to compare mvv/lva ordering vs static exchange evaluation
|> ordering.

Interesting. I am pretty amazed that you are willing to look at
one of your major design decisions in Cray Blitz / Crafty.

|> I have two search modules, Search() and Quiesce(). I *only*
|> modified Search() path for this test. Since I specifically
|> exclude captures that appear to "lose" in the q-search, I didn't
|> want to alter too much to run the test.

While I can understand that you are not interested in altering too
much this has the unwanted effect that your search path ends often
in a position where you have a capture that works but that is turned
down by ripoff(). In the next iteration I get it from the trans/ref,
while you will (maybe) search another capture first.

|> In my Next_Move() function, I was generating all capture moves
|> (no others just yet) and then using Swap() to order moves based
|> on expected gain. I replaced Swap() with a simple function "lva()"
|> that simply computed expected gain = value of attacked piece (MVV)
|> - value of attacking piece (LVA) and used this to sort the captures
|> as before. Note that this is a *better* approximation since a move
|> like QxP (pawn undefended) will now look better than QxR, while with
|> MVV/LVA, QxR would be done first (most valuable victim = rook, least
|> valuable attacker = Queen [the only attacker for this example]).

Just to check that I understood this correctly:

What you did before was using a static exchange evaluator that enumerates
the capture sequence and then uses simple minimaxing to compute the estimated
value of the capture.

What you are doing now for the experiment is that you compute the material
difference between victim and aggressor.

Unfortunately if this is correct it is *not* mvv-lva. mvv-lva is picking
the most valuable piece as victim and the lowest valued piece as aggressor.
When writing down the ever possible captures in a priority list (highest
priority first) you get the following:

KxQ, PxQ, NxQ, BxQ, RxQ, QxQ,
KxR, PxR, NxR, BxR, RxR, QxR,
KxB, PxB, NxB, BxB, RxB, QxB,
KxN, PxN, NxN, BxN, RxN, QxN,
KxP, PxP, NxP, BxP, RxP, QxP

This is clearly something different (e.g. QxQ has the same priority as RxR in
your experiment) and there is a paper in an old ICCA Journal that compares
capture ordering schemes in the capture search - this material difference

Note that it is called DarkThought. AppleSpeak (^8

I aggree that you will often make a "saner" capture first. The question
is: does it matter? Where in the tree is it important? At CUT nodes and
at PV nodes. Most of the time the new ground the search is breaking is
close to the horizon (or beyond). Most positions there are so horrible
loopsided that it does not matter. When I pull a move from the hash table
it is most of the time "good enough". When I am far away from the horizon
I do internal iterative deepening (when some preconditions hold that are
not the topic of this discussion) so a "bad" capture won't result in a
full blown search that goes into the wrong direction.

|> I'll try to run some more sophisticated tests, but, as I said before,
|> testing years ago led me to use a static exchange evaluator, and, so
|> far, nothing has changed my opinion. I can run most any type of
|> test or problem you want to use to compare notes with. Before we go
|> too far, however, we really need to run a couple of well-known
|> positions to a fixed depth to get a node count, so that we can compare
|> trees before we start fricking around with things. Kopec #22 is one
|> I use frequently. Crafty solves it in 8 plies, so a fixed-search to
|> depth=8 would give us a good tree size estimate, although only one
|> position is not enough. Even better would be to take Kopec 1-12
|> and run them to some fixed depth.

Ok. To allow *everybody* to get a similar setting I ran this positions
*brute force* with check extensions only, no null moves, only captures
in the quiesence (and I talk about *all* captures that can change the
value of the root). Note that I am using an old (about 5 days.
That is pretty old in my world) "prerelease" version of
the new engine we will be using in the next tourney. This old code
is pretty slow on the 175 mhz Alpha compared to what I have now
at home on my PowerMacintosh... The evaluation is not very good
and incomplete right now (I integrate an enhanced version of the
evaluation we used in the Hong Kong code pretty soon). In short,
take it with a grain of salt and don't think that this is the end
of the road for me and my team mates... This engine still lacks some
of the cute stuff that reduced the tree size in the old code, but I
think that it is good enough to compare results.

BK #12

+---+---+---+---+---+---+---+---+
8 |*R |:::| |:::|*R |:::|*K |:::|
+---+---+---+---+---+---+---+---+
7 |*P:|*P |*Q:|*B |:::|*P |*P:|*P |
+---+---+---+---+---+---+---+---+
6 | |:::| |:::| |:::| |:::|
+---+---+---+---+---+---+---+---+
5 |:::| |:::| |*P:| |:N:| Q |
+---+---+---+---+---+---+---+---+
4 | |:::| |:::| |:::| |:::|
+---+---+---+---+---+---+---+---+
3 |:::| |:P:| |:::| |:::| |
+---+---+---+---+---+---+---+---+
2 | P |:P:| |:::| |:P:| P |:P:|
+---+---+---+---+---+---+---+---+
1 |:R:| |:::| |:R:| |:K:| |
+---+---+---+---+---+---+---+---+
A B C D E F G H
Black moves Bf5
* 38 legal moves.
* evaluation is -15 (-0.06 pawns).

00:00:00 1 ordering search bd7-f5 -> 3
infinite timing
00:00:00 2 ordering search bd7-f5 -> -21
00:00:00 3 [ -53 75] ( 1) bd7-f5 -> 2
00:00:00 4 [ -62 66] ( 1) bd7-f5 -> -11
00:00:00 5 [ -43 85] ( 1) bd7-f5 -> -15
00:00:01 6 [ -79 49] ( 1) bd7-f5 -> -8
00:00:08 7 [ -40 88] ( 1) bd7-f5 -> 7
00:00:28 8 [ -57 71] ( 1) bd7-f5 -> 2
3286806 n 21477.64 nps
runtime 153.03 seconds

To scare the <censored> out of the competition here is the
same with check extensions, recaptures and null moves. The
current version is much faster but I was too lazy to install
it here on our unix boxes... (^8

Black moves Bf5
* 38 legal moves.
* evaluation is -15 (-0.06 pawns).

00:00:00 1 ordering search bd7-f5 -> 3
00:00:00 2 ordering search bd7-f5 -> -21
00:00:00 3 [ -53 75] ( 1) bd7-f5 -> 2
00:00:00 4 [ -62 66] ( 1) bd7-f5 -> -11
00:00:00 5 [ -43 85] ( 1) bd7-f5 -> -15
00:00:00 6 [ -79 49] ( 1) bd7-f5 -> -8
00:00:02 7 [ -40 88] ( 1) bd7-f5 -> -2
00:00:06 8 [ -66 62] ( 1) bd7-f5 -> 0
572402 n 21355.19 nps
runtime 26.80 seconds

### Robert Hyatt

Aug 7, 1995, 3:00:00 AM8/7/95
to
In article <4062ic\$n...@nz12.rz.uni-karlsruhe.de>,
Peter Gillgasch <gil...@ira.uka.de> wrote:

<snip>

>
>|> >I am convinced that mvv/lva is superior to anything else. AFAIK both
>|> >the ChipTest and the Phoenix teams have checked square safety ordering
>|> >against mvv/lva and came to the same conclusion... At least that is
>|> >written by FHH in his thesis.
>|>

>|> Not completely relevant for us, however, as we don't have any hardware

>|> constraints about what/how we do things. Hsu was trying to "force"
>|> Belle onto a single chip (which he did after making a couple of cute
>|> optimizations that Ken didn't catch). However, his hardware imposed
>|> some constraints that we don't have to live with.
>
>Correct. We have much more freedom since we are using software only.
>But Hitech is using something like an SEE (although it is not that
>sophisticated) and it's performance was a bit disappointing compared
>to mvv-lva with killer heuristic when looking at the tree size.

Now I'm really confused. MVV/LVA has nothing to do with the "killer
Heuristic". SEE without killer vs MVV/LVA with killer doesn't tell
me anything. You need to change only one thing at a time. I'm
using both killers and history moves in crafty. Switching nothing
but the move ordering to MVV/LVA costs me a factor of 2x in nodes
while speeding up the nodes per second less than 5%.

>
>
>|> As for it being "superior" I can't see how that can be true. It
>|> might be "nearly as good", but, for absolutely certain, the tree
>|> is going to be larger. How much is unknown, but it simply can't
>|> be any better than a good static exchange evaluator, unless the
>|> SEE is so slow that it hinders performance.
>
>It is superior in the sense that it does not forward prune in the
>quiesence (have you ever tried q-searches without static forward
>pruning but with SEE ordering?) and still is blazingly fast. It is

Yes I have. I did this in Cray Blitz for years. I later started
culling "losing" (according to SEE) captures, and then, after several
thousand test positions, started culling "swaps" after the first four
plies of q-search, leaving only (SEE) winning captures plus the other
things like checks and whatever.

>superior in the sense that *no* time is wasted for move ordering in
>the quiesence. It is superior in the sense that futility pruning is
>a piece of cake. I am very interested in some data that supports your
>claim that SEE is worth it. But again, we have to distinguish between
>*forward pruning* savings and move ordering savings... I am not too
>much interested in forward pruning savings...

Note that the results I posted after the above were based *only*
on move ordering, with no changes to futility cutoffs or anything
else... I still used the Swap() function for everything *but*
move ordering. The point: MVV/LVA doubled the tree, with *no*
other changes. Whether or not I use this type of capture ordering,
I can still use SEE elsewhere, so that's not an issue. My claim was
simply that switching the capture move ordering (both in basic part
of the tree *and* q-search) to MVV/LVA doubled the size of the tree
without speeding up the program much at all. To produce usable data
here, you need to make only one change at a time, and then measure
what effect it had. Don't modify futility code, forward pruning,
move ordering (such as killers, etc.) or anything else. The question
at hand is "does MVV/LVA perform close to SEE ordering for any known
chess program?"

>
>|> I'll run some tests with Crafty to verify this, however, the cost
>|> for "swap()" is not much in the current code; it's so low, in fact,
>|> that it doesn't show up until the 2nd page in profile output. More
>|> after I run some tests.
>
>It is not only the cost of swap()... Think about how much code code be
>removed if you don't have to worry about mover ordering when doing
>captures in general and making sure that your futility pruning is
>correct, and... The new bitboard search controller I wrote is about
>15 k small (PowerPC) and this handles *everything* with the exception
>of DoMove(), eval(), and generating non-capturing moves... In other
>words it does repetition check, trans/ref handling, capture move
>generation, move ordering (for non captures) and the alpha beta search
>itself including keeping the size of the capture/selective search as
>small as possible...

Of course, you don't win prizes for small code, only for winning
games. :^)

However, MVV/LVA doesn't affect half of the things above by necessity.
For example, what does it have to do with futility cutoffs and what does
it simplify that SEE doesn't? They both give an estimate of the benefit
of a capture, with SEE being more accurate. However, this does not
affect the *algorithm* but only the *results.* I still have to worry
about futility cutoffs being right with either ordering strategy. It's
*not* much code in Crafty.

>
>|> Note, that generating captures "one by one" is *not* a good thing
>|> to do everywhere in the tree. for type=3 nodes, you have to generate
>|> *all* moves anyway, so you simply add lots of extra procedure calls
>|> into the mix. For type=2 nodes, yes, one at a time is better, but
>|> it's not "cut and dried" due to the type-3 nodes where move ordering
>|> is completely insignificant..
>
>I haven't optimized anything for type-3 nodes... Probably I will look
>at this as soon as the new engine is at least as functional as the old
>one.

In your case, it will probably crush your program, since you move
ordering is already not so good (for captures) thanks to the MVV/LVA
code. If you try to predict type-3 nodes with MVV/LVA it is going to
be wrong so many times you will choke the search, since you will not
be ordering moves, yet you need to because the type-3 node is going
to turn out to be a type-2, etc.

>
>|> Rest assured that I've beat on this for years, as parallel searching
>|> algorithms have to be *very* careful where to split the tree, choosing
>|> a type=2 node results in no usefull parallel work. A single "wrong"
>|> capture first kills me, for this very reason. To you, an extra 10%
>|> might be acceptable if that costs less than your swap() function, but
>|> to me it's going to really harm parallel performance.
>
>No, 10% is *not* acceptable. Since DoMove() is pretty expensive
>relative to the "cost" of it in the old code I try to avoid it
>everywhere I can...

If 10% is not acceptable, you need to run the same tests I did,
order with MVV/LVA, and then with a static exchange evaluator. My
trees doubled, which was *not* acceptable. I talked with Bruce
Moreland (Ferret). He also uses static exchange evaluation to
order the tree, and his tests also produced bigger trees. Something
is wrong somewhere. I can understand why hardware uses such an
algorithm, since static exchange would take a major part of the
chip's transistor count to do, but I can't understand how/why a
software program uses MVV/LVA; even more important, I can't see
how it can be as good as a static exchange evaluator. Before we
can go on, we need data from Dark Thought for both cases. Don't
worry about performance right now, just the number of nodes in
the tree. You should be able to grab Craftys SEE code and make
it work since you already have the bitboard stuff done. I'm
interested in what's going on inside your tree if SEE is not
roughly 1/2 the tree size of MVV/LVA.

### Robert Hyatt

Aug 8, 1995, 3:00:00 AM8/8/95
to
In article <405ven\$n...@nz12.rz.uni-karlsruhe.de>,

Note that what I did is actually better than MVV/LVA, in that it will
search nxn (score=0) *before* QxR (rook defended, but score=-4). I
also tried "pure" MVV/LVA yesterday, by taking the list of captures,
finding the highest-valued victim, then trying the lowest valued
capturing piece first. No noticable change. This was when I asked
Ferret (Bruce Moreland) and he had found the same thing.

You've said this before, but I have yet to see a program that finds
even 1/3 of the moves in the trans/ref table, for normal opening and
middlegame positions. Cray Blitz has generally set the standard
here, hashing about 1/3 of middlegame positions (20-30%). Most
programs hit around 10-15%.

The real problem is trying a bad capture at a "cut" (type=2) node
and the capture fails to cause a cutoff as it should. This turns
the following type=3 node into a type=2 node, and ifyou blow the
ordering there, more trouble, particularly for a parallel search.

Note that I'm more concerned about the tree size than anything, as
is necessary for any search that is going to be done in parallel.
If you blow type-2 nodes, you blow the search.

Here's crafty, normal search with null-moves, etc enable. Running on
a Sparc2 (about 1/3 the machine used on ICC/FICS):

Crafty v6.3

time surplus 0.0s time limit 166:39 [easy move]
depth time value variation (1)
3 0.3s -0.079 Bf5 Qf3 Qd7
3-> 0.6s -0.079 Bf5 Qf3 Qd7
4 0.9s -0.291 Bf5 Nf3 Bg6 Qg4
4-> 2.9s -0.291 Bf5 Nf3 Bg6 Qg4
5 4.0s -0.161 Bf5 Nf3 Bg6 Qg5 f6
5-> 7.0s -0.161 Bf5 Nf3 Bg6 Qg5 f6
6 10.1s -0.276 Bf5 Nf3 Bg6 Qg5 Qb6 b4
6-> 18.5s -0.276 Bf5 Nf3 Bg6 Qg5 Qb6 b4
7 43.7s -0.040 Bf5 Nf3 Bg6 Qg5 Qb6 Re2 Bd3
7-> 52.1s -0.040 Bf5 Nf3 Bg6 Qg5 Qb6 Re2 Bd3
8 1:41 -0.215 Bf5 Nf3 Bg6 Qg5 f6 Qg4 e4 Nd4
8-> 2:32 -0.215 Bf5 Nf3 Bg6 Qg5 f6 Qg4 e4 Nd4
time: 2:32 cpu:98% mat:0 n:458198/176807 maxd:23 nps:3008
ext-> checks:2757 recaps:208 pawns:0
hashing-> trans/ref:10% pawn:96% king:91% evals:1%
hashing-> value:0% bound:2% cutoff:4% used:w99% b99%
nulls used[68520/147279] wasted[13931/44978]

Here's Crafty, running on A Cray T90: (same problem)

Crafty v6.3

time surplus 0.0s time limit 166:39 [easy move]
depth time value variation (1)
3 0.0s -0.079 Bf5 Qf3 Qd7
3-> 0.0s -0.079 Bf5 Qf3 Qd7
4 0.0s -0.291 Bf5 Nf3 Bg6 Qg4
4-> 0.0s -0.291 Bf5 Nf3 Bg6 Qg4
5 0.0s -0.161 Bf5 Nf3 Bg6 Qg5 f6
5-> 0.0s -0.161 Bf5 Nf3 Bg6 Qg5 f6
6 0.0s -0.276 Bf5 Nf3 Bg6 Qg5 Qb6 b4
6-> 0.0s -0.276 Bf5 Nf3 Bg6 Qg5 Qb6 b4
7 0.0s -0.040 Bf5 Nf3 Bg6 Qg5 Qb6 Re2 Bd3
7-> 0.0s -0.040 Bf5 Nf3 Bg6 Qg5 Qb6 Re2 Bd3
8 0.0s -0.215 Bf5 Nf3 Bg6 Qg5 f6 Qg4 e4 Nd4
8-> 0.0s -0.215 Bf5 Nf3 Bg6 Qg5 f6 Qg4 e4 Nd4
time: 0.0s cpu:3198% mat:0 n:458198/176807 maxd:23 nps:3139008
ext-> checks:2757 recaps:208 pawns:0
hashing-> trans/ref:10% pawn:96% king:91% evals:1%
hashing-> value:0% bound:2% cutoff:4% used:w99% b99%
nulls used[68520/147279] wasted[13931/44978]

Note, we normally are searching around 5M nps here, but due to the
short search time, nps is calculated with a *very* small clock value, which
produces a figure that's too small. We have to search at least 10
plies to get a non-zero search time printed in this position.

Will run it without null moves when I reach the office.

>
>
>-- Peter
>
>------------------------------------------------------------
>Peter W. Gillgasch
>Klosterweg 28/I-113 email gil...@ira.uka.de
>76131 Karlsruhe/Germany Phone ++49/(0)721/6904255

### Robert Hyatt

Aug 8, 1995, 3:00:00 AM8/8/95
to
In article <jdartDC...@netcom.com>, Jon Dart <jd...@netcom.com> wrote:
>
>This has been an interesting discussion.
>
>I tried one of the suggestions in this thread (Peter's, I think) about
>using a hash table in the quiescence search. I put in a separate table
>just for the quiescence search and after it filled, implemented a FIFO
>replacement policy. I got about a 10% reduction in moves generated and
>positions evaluated (the two statistics I keep), but the time to complete
>a search actually went up! So the overhead of this technique seems to
>cancel out its benefits, at least for me. Btw. I kept 2000 nodes in the
>table - I also tried 4000 and 8000, with similar results.
>
>--Jon
>
>--
>-----------------------------------
>Jon Dart jd...@netcom.com

You shouldn't keep two separate transposition/refutation tables. Use the
same one, and you will find even more transpositions since captures near the
horizon can transpose with captures in the q-search. The only "bother" is
that you have to "screen" non-captures (or non-checks if you include
checking moves here) when in the q-search. I've done this from day one in
CB (and now in Crafty) and it helps.

### Robert Hyatt

Aug 8, 1995, 3:00:00 AM8/8/95
to
In article <407oui\$2...@pelham.cis.uab.edu>,

Robert Hyatt <hy...@willis.cis.uab.edu> wrote:
>In article <405ven\$n...@nz12.rz.uni-karlsruhe.de>,
>Peter Gillgasch <gil...@ira.uka.de> wrote:
>>In article <403gpp\$e...@pelham.cis.uab.edu>, hy...@willis.cis.uab.edu (Robert Hyatt) writes:
>>|> In article <3vu00r\$9...@nz12.rz.uni-karlsruhe.de>,
>>|> Peter Gillgasch <gil...@ira.uka.de> wrote:
>>
>>[big snipping here and there]
>>
>>|> >Finally an interesting thread on computer chess again...
>>|> >
>>|> >In article <3vrcq7\$p...@pelham.cis.uab.edu>, hy...@willis.cis.uab.edu (Robert Hyatt) writes:
>>|> >|> In article <3vr7t9\$8...@ixnews4.ix.netcom.com>,
>>|> >|> Larry Craighead <plug...@ix.netcom.com> wrote:
>>|> >|> >In <3vn08b\$g...@nz12.rz.uni-karlsruhe.de> gil...@ira.uka.de (Peter Gillgasch)
>>|> >|> >writes:
>>|> >|> >>Don't follow suggestions to implement static capture sequence evaluators.
>>
>>|> First round of testing complete. First, let me describe the test
>>|> I ran, to compare mvv/lva ordering vs static exchange evaluation
>>|> ordering.
>>
>>Interesting. I am pretty amazed that you are willing to look at
>>one of your major design decisions in Cray Blitz / Crafty.

Note: From "day one" Crafty has been designed with the Software Engineering
principle of "design for change" in mind. As a result, making such tests al
this is trivial. I can't count the number of different things I've tried in
Crafty and then either kept them or canned them depending on the results. this
is one of the reasons it's written as it is, so I can try things. CB was
nearly impossible to modify, making such tests extremely difficult and
costly if they didn't work out. BTW, this is *not* a "major" design decision,
it's actually a trivial issue of move ordering. I used MVV/LVA early in the
development of Crafty, but found that I could reduce the trees significantly
with the static exchange evaluator approach I had used in Cray Blitz.

>>
>>|> I have two search modules, Search() and Quiesce(). I *only*
>>|> modified Search() path for this test. Since I specifically
>>|> exclude captures that appear to "lose" in the q-search, I didn't
>>|> want to alter too much to run the test.
>>
>>While I can understand that you are not interested in altering too
>>much this has the unwanted effect that your search path ends often
>>in a position where you have a capture that works but that is turned
>>down by ripoff(). In the next iteration I get it from the trans/ref,
>>while you will (maybe) search another capture first.

However, this argument is faulty. What you are saying (in effect) is that
if you search an inferior move now, you might be able to look it up in the
trans/ref table later. But suppose you don't search beyond the current
iteration? Even worse, put this poor move in the exhaustive part of the
tree, and the branch grown can be significant, and it's all wasted effort.

>>
>>|> In my Next_Move() function, I was generating all capture moves
>>|> (no others just yet) and then using Swap() to order moves based
>>|> on expected gain. I replaced Swap() with a simple function "lva()"
>>|> that simply computed expected gain = value of attacked piece (MVV)
>>|> - value of attacking piece (LVA) and used this to sort the captures
>>|> as before. Note that this is a *better* approximation since a move
>>|> like QxP (pawn undefended) will now look better than QxR, while with
>>|> MVV/LVA, QxR would be done first (most valuable victim = rook, least
>>|> valuable attacker = Queen [the only attacker for this example]).
>>
>>Just to check that I understood this correctly:
>>
>>What you did before was using a static exchange evaluator that enumerates
>>the capture sequence and then uses simple minimaxing to compute the estimated
>>value of the capture.

yes.

>>
>>What you are doing now for the experiment is that you compute the material
>>difference between victim and aggressor.
>>
>>Unfortunately if this is correct it is *not* mvv-lva. mvv-lva is picking
>>the most valuable piece as victim and the lowest valued piece as aggressor.
>>When writing down the ever possible captures in a priority list (highest
>>priority first) you get the following:
>>
>>KxQ, PxQ, NxQ, BxQ, RxQ, QxQ,
>>KxR, PxR, NxR, BxR, RxR, QxR,
>>KxB, PxB, NxB, BxB, RxB, QxB,
>>KxN, PxN, NxN, BxN, RxN, QxN,
>>KxP, PxP, NxP, BxP, RxP, QxP

I hope you don't put Kx first... that is *not* the lowest-valued attacker.

>>
>>This is clearly something different (e.g. QxQ has the same priority as RxR in
>>your experiment) and there is a paper in an old ICCA Journal that compares
>>capture ordering schemes in the capture search - this material difference

I don't see how one is worse than the other.. it's going to depend on the
position. playing QxR before QxQ is worse if the rook is defended, better
if the rook is hanging. However, as I said, I went back and tried pure MVV/LVA
as well and saw no difference between the two.

programs hit around 10-15%. It's always been significantly better than
other programs since it uses the vector hardware on the Cray to do a much
better job of handling different keys that hash to the same memory address.
I can check 16 different table entries just as quickly as I can check only
one.

Note that this is not a great test position. 1 Queen, two rooks, and one
minor piece don't leave a lot of room for captures, particularly with mate
threats at h7/f7. Kopec22 is better, more captures, no mates, etc.

### Johanes Suhardjo

Aug 8, 1995, 3:00:00 AM8/8/95
to
On 8 Aug 1995, Robert Hyatt wrote:
> In article <405ven\$n...@nz12.rz.uni-karlsruhe.de>,
> Peter Gillgasch <gil...@ira.uka.de> wrote:
> >
> >BK #12
> >
> > +---+---+---+---+---+---+---+---+
> >8 |*R |:::| |:::|*R |:::|*K |:::|
> > +---+---+---+---+---+---+---+---+
> >7 |*P:|*P |*Q:|*B |:::|*P |*P:|*P |
> > +---+---+---+---+---+---+---+---+
> >6 | |:::| |:::| |:::| |:::|
> > +---+---+---+---+---+---+---+---+
> >5 |:::| |:::| |*P:| |:N:| Q |
> > +---+---+---+---+---+---+---+---+
> >4 | |:::| |:::| |:::| |:::|
> > +---+---+---+---+---+---+---+---+
> >3 |:::| |:P:| |:::| |:::| |
> > +---+---+---+---+---+---+---+---+
> >2 | P |:P:| |:::| |:P:| P |:P:|
> > +---+---+---+---+---+---+---+---+
> >1 |:R:| |:::| |:R:| |:K:| |
> > +---+---+---+---+---+---+---+---+
> > A B C D E F G H
> >Black moves Bf5

Can a newbie join the fun? Okay, my program's (a hacked up gnuchess 4.0 patch
level 17) result on a SPARC 10/40 is:
Move# 1 Target= 29996750 Clock: 59994000
1. -206 0 472 c7c4
2. -86 0 1128 d7f5 a1d1
3& -54 0 2554 d7f5 a1d1 h7h6
3. -54 0 3998 d7f5 a1d1 h7h6
4& -45 1 8701 d7f5 a1c1 h7h6 c1d1
4. -45 1 10488 d7f5 a1c1 h7h6 c1d1
5& -35 3 32055 d7f5 h5f3 f5g6 a1d1 g8h8
5. -35 3 39183 d7f5 h5f3 f5g6 a1d1 g8h8
6& -38 7 104103 d7f5 h5f3 f5g6 a1d1 g8h8 d1d5
6. -38 9 142406 d7f5 h5f3 f5g6 a1d1 g8h8 d1d5
7& -8 20 334677 d7f5 h5f3 f5g6 a1d1 h7h6 g5e4 c7c4
7. -8 30 473780 d7f5 h5f3 f5g6 a1d1 h7h6 g5e4 c7c4
8& 0 108 1719540 d7f5 h5f3 f5g6 a1d1 f7f6 g5e4 a8d8 d1d8
e8d8
8. 0 135 2211309 d7f5 h5f3 f5g6 a1d1 f7f6 g5e4 a8d8 d1d8
e8d8
Nodes 2211309 Tree 538 Eval 266472 Rate 16380 RS high 1 low 2
Hin/Hout/Coll/Fin/Fout = 101190/64924/0/0/0
My move is: d7f5

Pretty bad, huh? It is still better though than the current gnuchess (4.0
patch level 74): 4521467 nodes in 332 seconds. I'm sure when I put in some of
the "tricks" mention in this thread I'll be able to improve my program.

P.S. Bob, I want to try crafty on my machine, what are the commands to run
BK#12? I'm rather confused with the "setboard" command.

Johanes Suhardjo (joh...@farida.cc.nd.edu)
--
Frankfort, Kentucky, makes it against the law to shoot off a
policeman's tie.

### Feng-Hsiung Hsu

Aug 8, 1995, 3:00:00 AM8/8/95
to
In article <406n5q\$2...@pelham.cis.uab.edu> hy...@willis.cis.uab.edu (Robert Hyatt) writes:
>Now I'm really confused. MVV/LVA has nothing to do with the "killer
>Heuristic". SEE without killer vs MVV/LVA with killer doesn't tell
>me anything. You need to change only one thing at a time. I'm
>using both killers and history moves in crafty. Switching nothing
>but the move ordering to MVV/LVA costs me a factor of 2x in nodes
>while speeding up the nodes per second less than 5%.

Strange. For me, it would be the other way around:). Twice as slow, but
only 5% reduction in node count:). A few things come into play here.
Question one is whether you use internal interative deepening. (My guess
is you don't.) Question two is what kind of square ordering you are using.
(My guess is you are using a somewhat random square ordering). Question
three is what kind of pruning is active. The pruning we used reduces the
subtree below a bad capture by almost one ply. Question four is how deep
you are searching. The node count reduction gets monotonically smaller
as the search depth is increased. At shallow depths, we saw more than 10%
reduction, but at depths of interest, it was down to 5%.

Schaeffer was claiming tree sizes within 1.5 times optimal with MVV/LVA, and if
you saw a factor of two, then you had a move ordering that is 1.35 times
better than optimal:).

>Yes I have. I did this in Cray Blitz for years. I later started
>culling "losing" (according to SEE) captures, and then, after several
>thousand test positions, started culling "swaps" after the first four
>plies of q-search, leaving only (SEE) winning captures plus the other
>things like checks and whatever.

Hm, is that the reason Cray Blitz did not see why it was losing major material
for more than 10 moves in THAT game:)? Just kidding. :)

### Robert Hyatt

Aug 8, 1995, 3:00:00 AM8/8/95
to
In article <joes.426...@ultranet.com>,

Joe Stella <jo...@ultranet.com> wrote:
>In article <4062ic\$n...@nz12.rz.uni-karlsruhe.de>
>gil...@ira.uka.de (Peter Gillgasch) writes:
>
>>[...]
>>It is superior in the sense that it does not forward prune in the
>>quiesence (have you ever tried q-searches without static forward
>>pruning but with SEE ordering?) and still is blazingly fast. It is
>>superior in the sense that *no* time is wasted for move ordering in
>>the quiesence. It is superior in the sense that futility pruning is
>>a piece of cake.
>>[...]
>
>Incredibly Dumb Question #2:
>
>What is SEE ordering?
>
> Joe S.
>

YATLA. (Yet Another Three Letter Acronym :^) )

Seriously, it stands for Static Exchange Evaluation.

MVV/LVA is Most Valuable Victim (captured piece) Least Valuable
Attacker (capturing piece).

Aug 9, 1995, 3:00:00 AM8/9/95
to
In <409365\$3...@pelham.cis.uab.edu> hy...@willis.cis.uab.edu (Robert

Hyatt) writes:
>
>In article <joes.426...@ultranet.com>,
>Joe Stella <jo...@ultranet.com> wrote:
>>In article <4062ic\$n...@nz12.rz.uni-karlsruhe.de>
>>gil...@ira.uka.de (Peter Gillgasch) writes:
>>
>>>[...]
>>>It is superior in the sense that it does not forward prune in the
>>>quiesence (have you ever tried q-searches without static forward
>>>pruning but with SEE ordering?) and still is blazingly fast. It is
>>>superior in the sense that *no* time is wasted for move ordering in
>>>the quiesence. It is superior in the sense that futility pruning is
>>>a piece of cake.
>>>[...]
>>
>>Incredibly Dumb Question #2:
>>
>>What is SEE ordering?
>>
>> Joe S.
>>
>
>
>YATLA. (Yet Another Three Letter Acronym :^) )
>
>Seriously, it stands for Static Exchange Evaluation.
>
>MVV/LVA is Most Valuable Victim (captured piece) Least Valuable
>Attacker (capturing piece).

And what about those "type 2" and "type 3" nodes?

### Gijsb. Wiesenekker

Aug 9, 1995, 3:00:00 AM8/9/95
to
Peter Gillgasch (gil...@ira.uka.de) wrote:
(stuff deleted)
: +---+---+---+---+---+---+---+---+

Here is the ZZZZZZ output (null move, check extensions,
in the quiscence also check extensions, captures sorted by
material-value-captured-piece).

1/0 0.00 3 <= -900 c7c3
1/0 0.00 35 == -1000 c7c3
1/1 0.00 61 >= -999 a7a6
1/1 0.00 119 == -100 a7a6
1/11 0.00 269 >= -99 d7c6
1/11 0.00 324 == -98 d7c6
1/15 0.00 483 >= -97 d7f5
1/15 0.00 531 == 2 d7f5
2/0 0.00 764 == 0 d7f5
3/0 0.00 2789 == 0 d7f5
4/0 0.21 8930 == 0 d7f5
5/0 0.42 24850 == 0 d7f5
6/0 2.31 99867 == 0 d7f5
7/0 5.67 212725 == 0 d7f5
8/0 12.60 492415 == 0 d7f5
nodes per second=38024.62

(numbers mean ply/move time nodes == score best-move)

Gijsbert

### Thorsten Greiner

Aug 9, 1995, 3:00:00 AM8/9/95
to

My chess program AmyII uses what I would like to call a 'first order
approximation' to a static exchange evaluator: AmyII scans all 'capture
targets' in decreasing value: first queen, than rook, bishop,...

In the first sweep, all gaining or even captures are examined. These are all
captures of undefended pieces, or if the target is defended, captures which
will not loose material (i.e. RxQ or BxN will be searched, but not QxR if the
rook is defended). In the second sweep, the rest of the captures are searched.

This ordering applies to the ordinary full width search as well as to the
quiescence search.

I plan to implement a static exchange evaluator a la crafty's Swap() and see
wether this does better.

Bob Hyatt proposed using the test position #22 from the Bratko-Kopec test to
compare the nodes searched for equal depths. Unfortunately, not all programs
find the correct move at the same depth. As an example, I included the search
reports from AmyII and from crafty (v6.3). AmyII finds the correct move Bxe4
in depth 6 and needs about 600,000 nodes to complete depth 8, while crafty
finds it only in depth 8, and owing to the frequent researches needs twice as
much nodes as AmyII to complete depth 8.

A better test position might be the following, which I took from the ICCA
journal (don't have the exact source handy, if interested, I will look it up):

epd: r4rk1/1pqbbppp/p3pn2/4n3/3N1B2/2N3Q1/PPP1B1PP/R4RK1 b - - bm Bd6;

+---+---+---+---+---+---+---+---+
|*R*| | | | |*R*|*K*| |
+---+---+---+---+---+---+---+---+
| |*P*|*Q*|*B*|*B*|*P*|*P*|*P*|
+---+---+---+---+---+---+---+---+
|*P*| | | |*P*|*N*| | |
+---+---+---+---+---+---+---+---+
| | | | |*N*| | | |
+---+---+---+---+---+---+---+---+ black to move
| | | | N | | B | | |
+---+---+---+---+---+---+---+---+
| | | N | | | | Q | |
+---+---+---+---+---+---+---+---+
| P | P | P | | B | | P | P |
+---+---+---+---+---+---+---+---+
| R | | | | | R | K | |
+---+---+---+---+---+---+---+---+

The best and only move here is Bd6, while there are enough pieces on the board
to see how the capture search does. (In this position, AmyII needs about 3
million nodes to complete depth 8)

Just my 2 Pfennige...

-Thorsten

Results for Bratko-Kopec #22:

+---+---+---+---+---+---+---+---+
| | |*R*| | |*R*|*K*| |
+---+---+---+---+---+---+---+---+
| |*B*|*Q*|*N*|*B*|*P*|*P*| |
+---+---+---+---+---+---+---+---+
| |*P*| |*P*|*P*|*N*| |*P*|
+---+---+---+---+---+---+---+---+
|*P*| P | | | | | | |
+---+---+---+---+---+---+---+---+
| N | | P | | P | | | |
+---+---+---+---+---+---+---+---+
| P | | | B | | N | | P |
+---+---+---+---+---+---+---+---+
| | B | | | Q | P | P | |
+---+---+---+---+---+---+---+---+
| R | | | R | | | K | |
+---+---+---+---+---+---+---+---+

AmyII:

BF/SL Time Score Nodes principal Variation
1/ 1 00:00:00 0.149 490 1. ... Nc5 MaxTime = 00:05:00, MaxExt = 2
2/ 2 00:00:00 0.159 1138 1. ... Nc5 2. Nxc5 Qxc5
2/ 2 00:00:00 0.159 2044 1. ... Nc5 MaxTime = 00:05:00, MaxExt = 12
3/ 4 00:00:00 -0.177 2871 1. ... Nc5 2. Nxc5 Qxc5 3. Bd4
3/ 4+ 00:00:01 -0.158 10517 1. ... Rfe8 ( 5/33)
3/ 4 00:00:01 -0.014 16238 1. ... Rfe8 2. Qe3 Nc5
3/ 5 00:00:01 -0.014 19281 1. ... Rfe8 MaxTime = 00:05:00, MaxExt = 12
4/ 5 00:00:02 -0.014 22828 1. ... Rfe8 2. Qe3 Nc5 3. Nxc5 Qxc5
4/ 7 00:00:02 -0.014 26103 1. ... Rfe8 MaxTime = 00:05:00, MaxExt = 12
5/ 9 00:00:05 -0.058 67380 1. ... Rfe8 2. Qe3 Red8 3. Bc3 Nc5
5/12 00:00:10 -0.058 128422 1. ... Rfe8 MaxTime = 00:05:00, MaxExt = 12
6/ 9 00:00:13 -0.058 170207 1. ... Rfe8 2. Qe3 Red8 3. Bc3 Nc5 4. Nxc5 Qxc5
6/11+ 00:00:19 -0.057 254040 1. ... Bxe4 (23/33)
6/11 00:00:21 0.742 278484 1. ... Bxe4 2. Bxe4 Qxc4 3. Qxc4 Rxc4 4. Nxb6 Nxb6
6/11 00:00:21 0.742 280082 1. ... Bxe4 MaxTime = 00:05:00, MaxExt = 12
7/12 00:00:24 0.707 324258 1. ... Bxe4 2. Bxe4 Qxc4 3. Qxc4 Rxc4 4. Nxb6 Rxe4 5. Nxd7
7/12 00:00:26 0.707 360916 1. ... Bxe4 MaxTime = 00:05:00, MaxExt = 12
8/13 00:00:36 0.690 516941 1. ... Bxe4 2. Bxe4 Qxc4 3. Qxc4 Rxc4 4. Nxb6 Nxb6 5. Bd3 Rc5
8/13 00:00:42 0.690 605470 1. ... Bxe4 MaxTime = 00:05:00, MaxExt = 12

crafty:

depth time value variation (1)

3 0.0s -0.328 Nc5 Nxc5 Qxc5 a4
3 0.0s ++ Rfe8
3 0.0s -0.141 Rfe8 Bd4 Rf8
3-> 0.0s -0.141 Rfe8 Bd4 Rf8
4 0.0s -0.225 Rfe8 Bd4 d5 cd ed
4 0.0s ++ Nc5
4 0.0s -0.198 Nc5 Nxc5 Qxc5 Bd4 Qc7
4 0.0s ++ Nh5
4 0.0s -0.178 Nh5 Qd2 Nc5 Nxc5 Qxc5
4 0.0s ++ Qd8
4-> 1.0s -0.178 Qd8
5 1.0s -0.226 Qd8 Bd4 Qc7 Qe3 Bd8
5 1.0s ++ Ra8
5-> 3.0s -0.226 Ra8
6 4.0s -0.364 Ra8 Bd4 Rad8 Qe3 Nc5 Nxc5 dxc5
6 4.0s ++ Qd8
6 5.0s -0.321 Qd8 Bd4 d5 cd ed Nc3 Nxe4 Nxe4
dxe4 Bxe4
6 5.0s ++ Nh5
6 6.0s -0.163 Nh5 Qe3 Bf6 Nc3 Ne5 Nxe5 Bxe5
6-> 8.0s -0.163 Nh5 Qe3 Bf6 Nc3 Ne5 Nxe5 Bxe5
7 10.0s -0.163 Nh5 Qe3 Bf6 Nc3 Ne5 Nxe5 Bxe5 ...
7 15.0s ++ Rfe8
7 23.0s -0.127 Rfe8 Qe3 d5 e5 Nh5 Bd4 Nc5 Nxc5
Bxc5
7-> 26.0s -0.127 Rfe8 Qe3 d5 e5 Nh5 Bd4 Nc5 Nxc5
Bxc5
8 40.0s -0.200 Rfe8 Nc3 Qc5 Na4 Qc7
8 52.0s ++ Nh5
8 1:09 -0.148 Nh5 Qe3 Bf6 Bxf6 Nhxf6 Nc3 Rfe8 a4
Qc5 Qxc5 Nxc5
8 1:22 ++ Bxe4
8 1:27 ++ Bxe4
8 1:34 0.806 Bxe4 Bxe4 Qxc4 Qxc4 Rxc4 Bxf6 Nxf6 Bh7+
Nxh7 Nxb6
8-> 1:35 0.806 Bxe4 Bxe4 Qxc4 Qxc4 Rxc4 Bxf6 Nxf6 Bh7+
Nxh7 Nxb6
time: 1:35 cpu:99% mat:1 n:1293444/462496 maxd:23 nps:13615
ext-> checks:2434 recaps:2000 pawns:8
hashing-> trans/ref:7% pawn:98% king:96% evals:2%
hashing-> value:0% bound:1% cutoff:4% used:w99% b99%
nulls used[131064/388208] wasted[19946/78564]

### Robert Hyatt

Aug 9, 1995, 3:00:00 AM8/9/95
to
In article <408c4r\$o...@watnews1.watson.ibm.com>,

Feng-Hsiung Hsu <f...@sawmill.watson.ibm.com> wrote:
>In article <406n5q\$2...@pelham.cis.uab.edu> hy...@willis.cis.uab.edu (Robert Hyatt) writes:
>>Now I'm really confused. MVV/LVA has nothing to do with the "killer
>>Heuristic". SEE without killer vs MVV/LVA with killer doesn't tell
>>me anything. You need to change only one thing at a time. I'm
>>using both killers and history moves in crafty. Switching nothing
>>but the move ordering to MVV/LVA costs me a factor of 2x in nodes
>>while speeding up the nodes per second less than 5%.
>
>Strange. For me, it would be the other way around:). Twice as slow, but
>only 5% reduction in node count:). A few things come into play here.
>Question one is whether you use internal interative deepening. (My guess
>is you don't.) Question two is what kind of square ordering you are using.
>(My guess is you are using a somewhat random square ordering). Question
>three is what kind of pruning is active. The pruning we used reduces the
>subtree below a bad capture by almost one ply. Question four is how deep
>you are searching. The node count reduction gets monotonically smaller
>as the search depth is increased. At shallow depths, we saw more than 10%
>reduction, but at depths of interest, it was down to 5%.

1. Internal iterative deepening: No at present. 2. Square ordering:
I've played with several things here, including the Cray Blitz
approach of moving pieces toward the center, and toward the
opposing king. With killers, and even more importantly, with
the history ordering, this has not made any appreciable difference
in tree size. Note that with the bitboard info, I do try to move
attacked pieces to safe squares, then unattacked pieces to safe
squares, and then what's left. 3. typical null-move (R=2 at present,
but I've played with lots of different things here); Futility cutoffs
are also used. The depths I'm seeing at blitz chess on ICC is
typically 7 plies, although the depths I used for testing this
ordering was 8-9 on the Sparc, and 10-12 on the Cray, although
I didn't try many positions on the Cray.

>
>Schaeffer was claiming tree sizes within 1.5 times optimal with MVV/LVA, and if
>you saw a factor of two, then you had a move ordering that is 1.35 times
>better than optimal:).

This doesn't follow. My evidence was the opposite, that MVV/LVA was
much worse than 1.5x optimal. :^) Of course, the main point is that
I can't see any reason for claiming that SEE is any worse, and since
(my implementation, anyway) is not incurring any large computational
costs, it seems to be better. Note that this SEE code is very fast,
and is also very accurate, much better than the one we used in Cray
Blitz.

>
>>Yes I have. I did this in Cray Blitz for years. I later started
>>culling "losing" (according to SEE) captures, and then, after several
>>thousand test positions, started culling "swaps" after the first four
>>plies of q-search, leaving only (SEE) winning captures plus the other
>>things like checks and whatever.
>

>Hm, is that the reason Cray Blitz did not see why it was losing major material
>for more than 10 moves in THAT game:)? Just kidding. :)

No. You outsearched us so far in that game that it wasn't funny at
all. And I'm *not* kidding. :^)

I'm continuing to look at this, as obviously, tree ordering is critical
to the parallel search code. One interesting point to date is that
Crafty is searching significantly smaller trees than CB did. Even
when I turn all the cute things off in both, it seems that the move
ordering in Crafty is better by a significant amount. I haven't
taken a lot of time to figure out why, but the code for moving
pieces that are attacked to safe squares was one thing I found that
helped.

### Robert Hyatt

Aug 9, 1995, 3:00:00 AM8/9/95
to
In article <409fm0\$2...@ixnews6.ix.netcom.com>,

>In <409365\$3...@pelham.cis.uab.edu> hy...@willis.cis.uab.edu (Robert
>Hyatt) writes:
>>
>>In article <joes.426...@ultranet.com>,
>>Joe Stella <jo...@ultranet.com> wrote:
>>>In article <4062ic\$n...@nz12.rz.uni-karlsruhe.de>
>>>gil...@ira.uka.de (Peter Gillgasch) writes:
>>>
>>>>[...]
>>>>It is superior in the sense that it does not forward prune in the
>>>>quiesence (have you ever tried q-searches without static forward
>>>>pruning but with SEE ordering?) and still is blazingly fast. It is
>>>>superior in the sense that *no* time is wasted for move ordering in
>>>>the quiesence. It is superior in the sense that futility pruning is
>>>>a piece of cake.
>>>>[...]
>>>
>>>Incredibly Dumb Question #2:
>>>
>>>What is SEE ordering?
>>>
>>> Joe S.
>>>
>>
>>
>>YATLA. (Yet Another Three Letter Acronym :^) )
>>
>>Seriously, it stands for Static Exchange Evaluation.
>>
>>MVV/LVA is Most Valuable Victim (captured piece) Least Valuable
>>Attacker (capturing piece).
>
>And what about those "type 2" and "type 3" nodes?
>

From Knuth and Moore:

A type 1 node is also called a PV node. The root of the tree is a
type-1 node, and the *first* successor of a type-1 node is a type-1
node also. A type-1 node must have all branches examined, but it is
unique in that we don't know anything about alpha and beta yet,
because we haven't searched the first move to establish them.

A type 2 node is either (a) a successor of any type-3 node, or, (b) it's
any successor (other than the first) of a type-1 node. With perfect
move ordering, the first branch at a type-2 node will "refute" the
move made at the previous ply via the alpha/beta algorithm. This
node requires good move ordering, because you have to find a move
good enough that your opponent would not play the move he chose that
led to this position. If you try a poor move first, it won't produce
a cutoff, and you have to search another move (or more) until you find
the "good" move that would make him not play his move.

A type-3 node follows a type-2 node. Here, you have to try every
has played a "strong" move (supposedly the "best" move) you are going
to have to try every move you have in an effort to refute this move.
None will do so (unless your opponent tried some poor move first due
to incorrect move ordering). Here, move ordering is not worth the
trouble, since the ordering won't let you avoid searching some moves.
Of course, with the transposition/refutation table, ordering might help
you get more "hits" if your table is not large enough (we don't have
a problem here in Cray Blitz, we only use tables with 1 billion
entries on a C90... :^) )

As you can see, at type-1 nodes you have to do good move ordering
to choose that "1" move (or to choose that one out of a very few)
that is good enough to cause a cutoff, while avoiding choosing those
that are no good. At a type-1 node, the same thing applies. If you
don't pick the best move first (take the moves at the root for
example) you will search an inferior move, establish alpha or beta
incorrectly, and thereby increase the size of the total tree by
a *substantial* amount.

By the way, some authors call type-1 nodes "PV" nodes, type-2 nodes
"CUT" nodes, and type-3 nodes "ALL" nodes. These make it easier to
read, but, unfortunately, I "cut my teeth" on the Knuth/Moore paper
and think in terms of type1,2,3.

Hsu:

I broke one of my time-honored rules about an hour ago when I
responded to a couple of questions you asked. This rule is to
*never* post early in the morning, as the cobwebs are not quite
gone yet. :^)

and I responded without thinking, and told you what I'm doing in
Crafty (at present... note that this program is a "research" program
and is changed at least daily.)

What I overlooked, is that we are discussing MVV/LVA vs SEE in this
thread. As a result, while my answers were factual, they were also
not relevant. The MVV/etc stuff is a method of ordering captures.
How the rest of the moves are ordered, pruning algorithms, etc. don't
have any affect on the comparison between MVV/LVA vs SEE. In fact,
when I ran the tests I ran, I only changed SEE to MVV/LVA, leaving
killers, history moves, trans/ref moves, and all the other ordering
and pruning things alone. Note that when I ran the Kopec12 test on
Crafty, I left *everything* turned on, quiescence checks, out of
check extensions, recapture extensions, passed-pawn-to-7th extensions,
etc. all turned on. Yet Crafty's tree was smaller than Peters by
some small but significant amount (10-20% if memory serves.)

I'm interested in the topic, because if MVV/LVA can really come close
to SEE, I can use that to speed up Crafty's move generation, since it's
easy, with bitboards, to incrementally generate moves as you do in your
hardware. Fortunately, replacing SEE by MVV/LVA was trivial for me,
although the implementation was not efficient. I simply generated
all captures, then made one pass thru them to pick the most valuable
victim, then another pass to find the least valuable attacker for this
victim, then played that move. Doing that sloppy approach resulted
in no noticable performance improvement (granted it's a sorry way to
do it, but it was the *easy* way for testing purposes) in nodes per
second, yet made the tree double in size.

My next test will be the following: I'll "disable" the SEE-based
forward pruning in the q-search, so that current crafty looks at *all*
captures, rather than culling losers anywere in the q-search, and culling
swaps after either 2 or 4 plies (don't remember which, been working on
so many different things that memory fails me here.) This will then
give me a good comparison between SEE ordering and MVV/LVA ordering,
although the "speed" will be worse than what a good MVV/LVA algorithm
would produce. In any case, I will run the first 12 kopec problems
thru both codes. These are mostly positional pawn-level type
problems that don't get tactically wild. In fact, I'll run the entire
set of problems through both so we can see tactical and non-tactical
results.

Unfortunately, it's harder to turn off some things, like the ordering
where I move attacked pieces to safe squares and the like, but I can
turn off all search extensions and stuff if needed to compare to another
program. In any case, I don't see how SEE vs MVV/LVA will be affected
by anything else, or how the choise between the two algorithms will
affect anything else either. In my case, MVV/LVA will make it
impossible to eliminate losers in the q-search. Maybe I should also
run the 24 kopec problems and report the node counts for the following
different modifications:

1. standard Crafty (as on the ftp server and as distributed)
2. standard crafty, but including *all* captures in q-search.
3. (2) but with MVV/LVA ordering.

I'll run them on a relatively slow machine, but to a fixed depth
that is useful (maybe depth 8 if the machine can do that in a
reasonable amount of time). I'll then post the results. We can
go from there.

Bob

### Jon Dart

Aug 9, 1995, 3:00:00 AM8/9/95
to
In article <407t6i\$3...@pelham.cis.uab.edu> hy...@willis.cis.uab.edu (Robert Hyatt) writes:
>
>You shouldn't keep two separate transposition/refutation tables. Use the
>same one, and you will find even more transpositions since captures near the
>horizon can transpose with captures in the q-search. The only "bother" is
>that you have to "screen" non-captures (or non-checks if you include
>checking moves here) when in the q-search. I've done this from day one in
>CB (and now in Crafty) and it helps.
>--

You are right, this is better. I am still not gettting search time reductions
proportionate to the reduction in nodes, but I will investigate that (I have
a profiler).

B.t.w. I have found the following to be a good test position:

r4rk1/1b3ppp/ppq2b2/2p1p2N/3p1PQ1/1P1BP3/P1PP2PP/R4RK1 w - - 0 1

This is diagram #155A from Chernev and Reinfeld's "Win At Chess". The correct
move (Be4) is not at all hard to find, but I have found that search trees for
this position tend to be quite a bit larger than average.