
nodes per second - how to count?


Peter Kappler

Jan 20, 1997

I recently read a post by Vincent Diepeveen in which he claimed that
his program, Diep, could search 200,000 nps on a P100, not counting
quiescence nodes.

Vincent was subsequently flamed to a crisp, and hasn't been heard from
since. I want to avoid being sizzled in similar fashion, so my
question is this: What is the convention for counting nodes?

I would assume that all nodes visited in the tree are counted,
regardless of whether they occur before or during the quiescence
search. I also assume that nodes which are generated, but not visited
due to cutoffs, are not counted.

What confuses me is whether or not all the nodes visited across every
level of iterative deepening should be counted, or should only nodes
from the last iteration be counted? Also, in a fail low situation,
should nodes that are re-searched get counted again?


Thanks,
Peter Kappler

Vincent Diepeveen

Jan 20, 1997

In <32e2f265....@nntp.best.com> pe...@bitsource.com (Peter Kappler) writes:


>I recently read a post by Vincent Diepeveen in which he claimed that
>his program, Diep, could search 200,000 nps on a P100, not counting
>quiescence nodes.

I counted the number of alfabeta() calls.
The program was made together with Bart Weststrate
(author of the program Kallisto) to see how fast it is possible to
search on a given computer.

Of course the program didn't contain more than
incremental piece-square tables and material.

>Vincent was subsequently flamed to a crisp, and hasn't been heard from

I DON'T want to publish the source code. I'm not a fool, nor am I crazy.

Why don't they believe this? The program is really stupid:
it played 1.e3 as its first move... but it searched 14 ply in no
time... :)

Look at it this way: it is not very difficult to generate
2 million moves a second on a P100 (in C). It has been done and it will
be done, as long as you don't use bitboards.

Suppose you have a branching factor of 4.
Then you have 400,000 nodes a second.

Add some overhead, namely something like Makemove and Unmakemove,
make all functions inline so you don't spend time on things like
CALLs and RETs:

void Makemove(int move, int side) {
  int from, to, piece, cappiece;
  from = move>>8;   /* (or in assembler split AX into ah and al, much faster!) */
  to = move&63;
  piece = bord[from];

  *movestack++ = move;
  *scorestack++ = score;
  score += PieceSquaretable[side][piece][(from<<8)|to];
  if( move&capture ) {
    cappiece = bord[to];
    score += PieceValue[cappiece];
    score -= PieceSquaretable2[(side^1)][cappiece][to];
    *capturestack++ = cappiece;
  }
  bord[to] = bord[from];
  bord[from] = 0;
}

Of course the program was an experiment, and it again showed that you can
search very deeply without the program knowing a thing about chess.
Especially extensions are very hard (they take too much system time; not
the extensions themselves, but the functions detecting the extension :) ).
The q-search must be so terribly fast that you cannot look
at things like checks and other drastically score-changing moves.
Also you cannot use something like a recursive q-search.
Besides that, I didn't manage to place it inline; it just slowed down
the search. Just 1 recapture. More than enough :)

Can anyone explain to me why one cannot
write a 200,000 nps program on a P100?



>since. I want to avoid being sizzled in similar fashion, so my
>question is this: What is the convention for counting nodes?

There is no convention. In order to save system time I only counted
the number of alfabeta calls. The captures done in the q-search
are so limited that they don't give a good indication. They only give an
indication of how bad or limited your q-search is.

>I would assume that all nodes visited in the tree are counted,

Yep, this is what I did. Simply count all Makemoves.

Of course, in the q-search you DON'T do Makemoves. Simply add the piece
value as soon as you generate the capture, and cut off before you generate
the rest of the moves... :)
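
In outline the trick looks something like this (the names here are
invented for illustration; it's a sketch of the idea, not the real code):

extern int bord[64];          /* board: piece codes, 0 = empty             */
extern int PieceValue[16];    /* material values indexed by piece code     */
extern long qnodes;           /* separate counter for q-search "nodes"     */

int QuiesceNoMakemove(int stand_pat, int beta,
                      const int *capture_to, int ncaptures) {
    int best = stand_pat;
    if (best >= beta) return best;           /* stand pat already cuts off */
    for (int i = 0; i < ncaptures; i++) {
        int gain = stand_pat + PieceValue[bord[capture_to[i]]];
        qnodes++;                            /* counted at generation time */
        if (gain >= beta) return gain;       /* cutoff: skip the rest      */
        if (gain > best) best = gain;
    }
    return best;
}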

>regardless of whether they occur before or during the quiescence
>search. I also assume that nodes which are generated, but not visited
>due to cutoffs, are not counted.
>
>What confuses me is whether or not all the nodes visited across every
>level of iterative deepening should be counted, or should only nodes
>from the last iteration be counted? Also, in a fail low situation,
>should nodes that are re-searched get counted again?
>
>
>Thanks,
>Peter Kappler

Vincent
--
+----------------------------------------------------+
| Vincent Diepeveen email: vdie...@cs.ruu.nl |
| http://www.students.cs.ruu.nl/~vdiepeve/ |
+----------------------------------------------------+

Robert Hyatt

Jan 21, 1997

Peter Kappler (pe...@bitsource.com) wrote:

: I recently read a post by Vincent Diepeveen in which he claimed that


: his program, Diep, could search 200,000 nps on a P100, not counting
: quiescence nodes.

: Vincent was subsequently flamed to a crisp, and hasn't been heard from
: since. I want to avoid being sizzled in similar fashion, so my


: question is this: What is the convention for counting nodes?

: I would assume that all nodes visited in the tree are counted,
: regardless of whether they occur before or during the quiescence


: search. I also assume that nodes which are generated, but not visited
: due to cutoffs, are not counted.

: What confuses me is whether or not all the nodes visited across every
: level of iterative deepening should be counted, or should only nodes
: from the last iteration be counted? Also, in a fail low situation,
: should nodes that are re-searched get counted again?


: Thanks,
: Peter Kappler

Conventional practice is that a "position" is any new chess position created
by making a move. Most programs insert a nodes++; at the top of any and all
Search()-like functions, such as (in Crafty for example) SearchRoot(), Search()
and Quiesce(). What this means is that Crafty's "nodes" counter counts the
number of times the chessboard is updated by making a move and then changing
sides to see if we want to make a move there as well. Most programs count 'em
like this. That means every iteration counts the early nodes over and over and
adds more for the deeper ply. Fail highs also add to the node count. This count
is not *different positions searched*, rather it is *total positions searched* even
if many are duplicates. It does not count "potential" positions such as what
might be not examined because of cutoffs, it does not count positions that are
avoided by using a transposition table, etc...
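
In outline (this is illustrative only, not Crafty's actual source), the
convention looks like this:

extern long nodes;                 /* the single global node counter       */

int Quiesce(int alpha, int beta);

int Search(int alpha, int beta, int depth) {
    nodes++;                       /* one more position searched           */
    if (depth <= 0)
        return Quiesce(alpha, beta);
    /* ... generate moves, MakeMove(), recurse, UnmakeMove() ... */
    return alpha;
}

int Quiesce(int alpha, int beta) {
    nodes++;                       /* quiescence positions count too       */
    /* ... stand pat, then search the captures ... */
    return alpha;
}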

Bob

cma...@ix.netcom.com

Jan 21, 1997

>hy...@cis.uab.edu (Robert Hyatt) wrote:

>Conventional practice is that a "position" is any new chess position created
>by making a move. Most programs insert a nodes++; at the top of any and all
>Search()-like functions, such as (in Crafty for example) SearchRoot(), Search()
>and Quiesce(). What this means is that Crafty's "nodes" counter counts the
>number of times the chessboard is updated by making a move and then changing
>sides to see if we want to make a move there as well. Most programs count 'em
>like this. That means every iteration counts the early nodes over and over and
>adds more for the deeper ply. Fail highs also add to the node count. This count
>is not *different positions searched*, rather it is *total positions searched* even
>if many are duplicates. It does not count "potential" positions such as what
>might be not examined because of cutoffs, it does not count positions that are
>avoided by using a transposition table, etc...

>Bob

There are so many different flavors of 'nodes', optimizations of
'move/unmake move', and capture only plies, moves generated but
pruned, hash table hits, etc etc etc, that it is very confusing to try
to count these elusive things. Originally I just counted Evaluations,
because that seemed to do the right thing. Moves gen'ed but pruned
never result in a call to Evaluate(). Hash table hits never call
Evaluate(). Everybody has an Evaluate(). We all agree what it does.
I remember you explained this to me once before and I remember knowing
why, but could you please re-explain why this isn't a good idea?

Thanks,
Chris Mayer

Brian Sheppard

Jan 21, 1997

Peter Kappler <pe...@bitsource.com> wrote in article
<32e2f265....@nntp.best.com>...

>
> I recently read a post by Vincent Diepeveen in which he claimed that
> his program, Diep, could search 200,000 nps on a P100, not counting
> quiescence nodes.
>
> Vincent was subsequently flamed to a crisp, and hasn't been heard from
> since. I want to avoid being sizzled in similar fashion, so my
> question is this: What is the convention for counting nodes?

Simplest: use the same MakeMove() routine in your main search and
quiescence search, and just count calls to MakeMove(). It gets harder
to count calls to search because you might want to specialize your
search routine (for base-ply, full-width, get-out-of-check, terminal-
ply, quiescence, and I'm sure you could think of other reasons).
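
Roughly (the names are invented for illustration, not from any particular
program):

extern long nodes;

void MakeMove(int move, int side) {
    nodes++;     /* counted once per position, whichever search called us */
    /* ... update the board, the incremental scores, the stacks ... */
}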

The speed Vincent quoted is quite possible. I once wrote an experimental
program that achieved about 150K NPS on a p5/90. 200K NPS is achievable.

The qualification to this achievement is that the program is not very
strong. Many useful things have to be stripped out to achieve this speed.
But this is a real alpha/beta program, with a real quiescence search,
some evaluation, some move ordering, etc.

Programs like this should be looked at as ideals; I know that it is
possible to make a program that runs at 150K NPS on a p5/90. Maybe I will
someday make an ideal program that has an effective branching factor of 3
moves/node. And maybe I will make a program that spends just 20% of its
time in quiescence searching. Etc. Each of these ideal achievements bounds
the solution space for chess programs. Any new chess program I come up with
can be compared against my ideal results along a number of dimensions. This
comparison will tell me whether I am making progress.

Brian

Peter Kappler

Jan 21, 1997

On 22 Jan 1997 12:54:24 GMT, hy...@cis.uab.edu (Robert Hyatt) wrote:

>: >Conventional practice is that a "position" is any new chess position created
>: >by making a move. Most programs insert a nodes++; at the top of any and all
>: >Search()-like functions, such as (in Crafty for example) SearchRoot(), Search()
>: >and Quiesce(). What this means is that Crafty's "nodes" counter counts the
>: >number of times the chessboard is updated by making a move and then changing
>: >sides to see if we want to make a move there as well. Most programs count 'em
>: >like this. That means every iteration counts the early nodes over and over and
>: >adds more for the deeper ply. Fail highs also add to the node count. This count
>: >is not *different positions searched*, rather it is *total positions searched* even
>: >if many are duplicates. It does not count "potential" positions such as what
>: >might be not examined because of cutoffs, it does not count positions that are
>: >avoided by using a transposition table, etc...
>
>: >Bob

Thanks for the answer! Strangely, my news server did not get your
original response - it only appeared when Chris sent his reply.

After thinking about this for a bit, it strikes me that this
convention for measuring nodes/second is a really poor measure of
search efficiency. I'm now curious why it is used. It seems that
counting nodes which are skipped due to cutoffs would produce a much
more meaningful indicator, especially since cutoffs are the result of
good move ordering, a good transposition table, etc.

Peter

Peter W. Gillgasch

Jan 21, 1997

<cma...@ix.netcom.com> wrote:

> There are so many different flavors of 'nodes', optimizations of
> 'move/unmake move', and capture only plies, moves generated but
> pruned, hash table hits, etc etc etc, that it is very confusing to try
> to count these elusive things. Originally I just counted Evaluations,
> because that seemed to do the right thing. Moves gen'ed but pruned
> never result in a call to Evaluate(). Hash table hits never call
> Evaluate(). Everybody has an Evaluate().

I don't...

> We all agree what it does.

Wasting clock cycles and making the program overly optimistic...

Anyway the discussion was interesting. I was somehow under the
impression that Vincent claimed 700 k for a slightly modified version
of Diep. Maybe this was a communication problem or an oversight on my
part and I apologize to Vincent for flaming him a bit 8^) Now it looks
like that he is talking about an assembly speed monster by Bart doing
150 k on the P100. I believe that. I just wonder why it is so slow and
why it has no move ordering...

-- Peter

Robert Hyatt

Jan 22, 1997

cma...@ix.netcom.com wrote:
: >hy...@cis.uab.edu (Robert Hyatt) wrote:

: >Conventional practice is that a "position" is any new chess position created
: >by making a move. Most programs insert a nodes++; at the top of any and all
: >Search()-like functions, such as (in Crafty for example) SearchRoot(), Search()
: >and Quiesce(). What this means is that Crafty's "nodes" counter counts the
: >number of times the chessboard is updated by making a move and then changing
: >sides to see if we want to make a move there as well. Most programs count 'em
: >like this. That means every iteration counts the early nodes over and over and
: >adds more for the deeper ply. Fail highs also add to the node count. This count
: >is not *different positions searched*, rather it is *total positions searched* even
: >if many are duplicates. It does not count "potential" positions such as what
: >might be not examined because of cutoffs, it does not count positions that are
: >avoided by using a transposition table, etc...

: >Bob

: There are so many different flavors of 'nodes', optimizations of


: 'move/unmake move', and capture only plies, moves generated but
: pruned, hash table hits, etc etc etc, that it is very confusing to try
: to count these elusive things. Originally I just counted Evaluations,
: because that seemed to do the right thing. Moves gen'ed but pruned
: never result in a call to Evaluate(). Hash table hits never call

: Evaluate(). Everybody has an Evaluate(). We all agree what it does.
: I remember you explained this to me once before and I remember knowing


: why, but could you please re-explain why this isn't a good idea?

: Thanks,
: Chris Mayer


Because of another of those damned optimizations. :) I don't call eval at
every tip node. If the score is so bad (or so good) that eval can't possibly
bring the score back into the window, I don't call it. And the number would
be different for everyone since the test for whether to call it or not depends
on the range of scores it can produce.
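
Roughly like this (the margin and the names here are made up for
illustration; this is not what Crafty actually does):

#define LAZY_MARGIN 300          /* assumed bound on the positional terms  */

extern int Evaluate(void);       /* the full, slow evaluation              */

int EvaluateLazy(int material, int alpha, int beta) {
    if (material - LAZY_MARGIN >= beta)
        return material;         /* can't fall back below beta: skip eval  */
    if (material + LAZY_MARGIN <= alpha)
        return material;         /* can't climb back above alpha: skip eval */
    return Evaluate();
}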

Comparing nodes is difficult at best... And maybe impossible, until you get
into the millions of nodes range, then at least the numbers should be within
an order of magnitude. :)

Bob


Robert Hyatt

Jan 22, 1997

Peter Kappler (pe...@bitsource.com) wrote:

: On 22 Jan 1997 12:54:24 GMT, hy...@cis.uab.edu (Robert Hyatt) wrote:

: >: >Conventional practice is that a "position" is any new chess position created
: >: >by making a move. Most programs insert a nodes++; at the top of any and all
: >: >Search()-like functions, such as (in Crafty for example) SearchRoot(), Search()
: >: >and Quiesce(). What this means is that Crafty's "nodes" counter counts the
: >: >number of times the chessboard is updated by making a move and then changing
: >: >sides to see if we want to make a move there as well. Most programs count 'em
: >: >like this. That means every iteration counts the early nodes over and over and
: >: >adds more for the deeper ply. Fail highs also add to the node count. This count
: >: >is not *different positions searched*, rather it is *total positions searched* even
: >: >if many are duplicates. It does not count "potential" positions such as what
: >: >might be not examined because of cutoffs, it does not count positions that are
: >: >avoided by using a transposition table, etc...
: >
: >: >Bob

: Thanks for the answer! Strangely, my news server did not get your


: original response - it only appeared when Chris sent his reply.

: After thinking about this for a bit, it strikes me that this
: convention for measuring nodes/second is a really poor measure of
: search efficiency. I'm now curious why it is used. It seems that
: counting nodes which are skipped due to cutoffs would produce a much
: more meaningful indicator, especially since cutoffs are the result of
: good move ordering, a good transposition table, etc.

: Peter

I don't know how to count that which I don't search. For example, I get
a hash hit and can stop searching here. How many nodes did I "skip"?? Of
course I don't search any of the possible legal moves at this ply, but I
have not yet generated any either and don't know how many there are. And
I really have no clue about how many nodes *below* this ply I'd search, if
I were searching. (sounds like how much wood can a woodchuck chuck if a
woodchuck could chuck wood?) :)

If you turn the hash table completely off, you still have the issue of
alpha/beta tossing things out. All I can say is that when you see Crafty
announce 80,000 nps, or 3,512,090 nodes, you know what it means, even if
you don't really have a clue about what those nodes did, any more than I
do... Somewhat like computing the # of atoms of a gas in a certain volume,
at a certain pressure. If you are off by a few hundred, or even a few
million, I won't notice... The tree is a really vague and amorphous
structure that's hard to analyze under a microscope... it's too big, but the
pieces are too small...


Vincent Diepeveen

Jan 22, 1997

In <1997012121494817596@[194.121.104.134]> gil...@ilk.de (Peter W. Gillgasch) writes:

><cma...@ix.netcom.com> wrote:
>
>> There are so many different flavors of 'nodes', optimizations of
>> 'move/unmake move', and capture only plies, moves generated but
>> pruned, hash table hits, etc etc etc, that it is very confusing to try
>> to count these elusive things. Originally I just counted Evaluations,
>> because that seemed to do the right thing. Moves gen'ed but pruned
>> never result in a call to Evaluate(). Hash table hits never call
>> Evaluate(). Everybody has an Evaluate().
>

>I don't...


>
>> We all agree what it does.
>

>Wasting clock cycles and making the program overly optimistic...
>
>Anyway the discussion was interesting. I was somehow under the
>impression that Vincent claimed 700 k for a slightly modified version
>of Diep. Maybe this was a communication problem or an oversight on my
>part and I apologize to Vincent for flaming him a bit 8^) Now it looks
>like that he is talking about an assembly speed monster by Bart doing
>150 k on the P100. I believe that. I just wonder why it is so slow and
>why it has no move ordering...

I only took the move generators from Diep and started over again,
changing, for example, the way a move is encoded to a 32-bit word.

I wrote it in C, and handed the (assembly and C) listing over to Bart.
He concluded that the program
could be sped up at least 2 times by converting it to assembler.

First we used the dumb approach: plain minimax. I got 350,000 nodes a
second on a P5/100 in C, which would be at least 700k in assembler. So
700k was never actually reached, only 350,000, but from the assembler
code Bart concluded that it could easily be sped up that far.

This program sucked on all points, of course, except speed.

So I introduced killer moves, nullmove, hashtables and alfabeta.

Now the speed dropped considerably (in C), to about 115,000.

Vincent

>-- Peter

Brian Sheppard

Jan 22, 1997


Robert Hyatt <hy...@cis.uab.edu> wrote in article
<5c5rq7$q...@juniper.cis.uab.edu>...


> Peter Kappler (pe...@bitsource.com) wrote:
> : On 22 Jan 1997 12:54:24 GMT, hy...@cis.uab.edu (Robert Hyatt) wrote:
>
> : After thinking about this for a bit, it strikes me that this
> : convention for measuring nodes/second is a really poor measure of
> : search efficiency. I'm now curious why it is used. It seems that
> : counting nodes which are skipped due to cutoffs would produce a much
> : more meaningful indicator, especially since cutoffs are the result of
> : good move ordering, a good transposition table, etc.
>
> : Peter

It is a mistake to try to make a single statistic that measures the
totality of a chess program's behavior. A chess program has high
"dimensionality;" it is futile to try to capture its complexity in
a single number. Instead, you should craft a specific statistic for
each specific purpose.

NPS, as it is commonly defined, measures how quickly a program disposes
of an average node in the tree. That is very important, because many
changes are made to increase that number. Counts of nodes not searched
are not at all helpful towards this end.

If you want to improve your move ordering, the best statistic
is the average number of moves searched at nodes that are cut off.
This statistic, called B, was first described in Ebeling's thesis
about HiTech. If B is 1.0, then you have perfect move ordering. If
B is less than 1.1, then your move ordering is about as good as it can be.
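
A rough sketch of gathering B (the names are invented for illustration):

extern long cut_nodes;      /* nodes that produced a beta cutoff             */
extern long cut_moves;      /* moves searched at those nodes, cutoff included */

void RecordCutoff(int moves_searched) {
    cut_nodes++;
    cut_moves += moves_searched;     /* 1 when the first move cuts off       */
}

double MoveOrderingB(void) {
    return cut_nodes ? (double)cut_moves / (double)cut_nodes : 0.0;
}

Call RecordCutoff() at the point in the search where a move's score comes
back >= beta, passing how many moves have been searched so far; a B near
1.0 then means the ordering is close to perfect.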

If you want to maximize xref performance, then measure total tree size
for a fixed search. The reason is that the xref table affects several
dimensions (transpositions, alpha/beta bounds, and move ordering) which
are difficult to disentangle. So don't disentangle them! Just
agglomerate them by measuring tree size for fixed search depths.

Brian

Tom C. Kerrigan

Jan 22, 1997

Peter Kappler (pe...@bitsource.com) wrote:

> After thinking about this for a bit, it strikes me that this
> convention for measuring nodes/second is a really poor measure of
> search efficiency. I'm now curious why it is used. It seems that
> counting nodes which are skipped due to cutoffs would produce a much
> more meaningful indicator, especially since cutoffs are the result of
> good move ordering, a good transposition table, etc.

That misses the point of the word "node". If you're searching a game tree,
the nodes are the places you've visited. Semantics.

It's also quite difficult to tell how many nodes you save with a cutoff.
To do this correctly, you would have to search everything, which totally
misses the point of cutting off.

Cheers,
Tom

brucemo

Jan 22, 1997

Vincent Diepeveen wrote:

> First we used the dumb approach: plain minimax. I got 350,000 nodes a
> second on a P5/100 in C, which would be at least 700k in assembler. So
> 700k was never actually reached, only 350,000, but from the assembler
> code Bart concluded that it could easily be sped up that far.

I think the days of 100% speedup via assembly code are past.

I use MSVC with "fastcall".

I tried to speed up sections of my program, there were areas I could speed up,
but only a couple of percent.

And before you criticize my assembly coding skill, I've been doing i86
assembly programming for ten years, and have read and understand the books on
P5 assembler.

I think the deal is that the compilers are getting extremely good, at least at
compiling simple routines. I could not improve my "in-check" function by even
one instruction (it's a loop with some bitmap operations and a ray traversal).

bruce

Robert Hyatt

Jan 23, 1997

Brian Sheppard (bri...@mstone.com) wrote:


: Robert Hyatt <hy...@cis.uab.edu> wrote in article


: <5c5rq7$q...@juniper.cis.uab.edu>...
: > Peter Kappler (pe...@bitsource.com) wrote:

: > : On 22 Jan 1997 12:54:24 GMT, hy...@cis.uab.edu (Robert Hyatt) wrote:
: >
: > : After thinking about this for a bit, it strikes me that this
: > : convention for measuring nodes/second is a really poor measure of
: > : search efficiency. I'm now curious why it is used. It seems that
: > : counting nodes which are skipped due to cutoffs would produce a much
: > : more meaningful indicator, especially since cutoffs are the result of
: > : good move ordering, a good transposition table, etc.

: >
: > : Peter

: It is a mistake to try to make a single statistic that measures the
: totality of a chess program's behavior. A chess programs has high
: "dimensionality;" it is futile to try to capture its complexity in
: a single number. Instead, you should craft a specific statistic for
: each specific purpose.

: NPS, as it is commonly defined, measures how quickly a program disposes
: of an average node in the tree. That is very important, because many
: changes are made to increase that number. Counts of nodes not searched
: are not at all helpful towards this end.

: If you want to improve your move ordering, the best statistic
: is the average number of moves searched at nodes that are cut off.
: This statistic, called B, was first described in Ebeling's thesis
: about HiTech. If B is 1.0, then you have perfect move ordering. If
: B is less than 1.1, then your move ordering is about as good as it can be.

: If you want to maximize xref performance, then measure total tree size
: for a fixed search. The reason is that the xref table affects several
: dimensions (transpositions, alpha/beta bounds, and move ordering) which
: are difficult to disentangle. So don't disentangle them! Just
: agglomerate them by measuring tree size for fixed search depths.

: Brian

This last idea won't work. I can show you examples of positions that search
a smaller tree without hash tables. Because often the hash tables introduce
new (and otherwise unavailable) information into the search that can just as easily
prevent a node from cutting off (where it would without the hash hit) as it
will cause the node to cutoff. This is not too uncommon an event. I would
still claim positions searched with a trans/ref table are better, because even
if the tree is bigger, it's bigger because the table has highlighted some sort
of difficulty and is helping to work around it.

One example is Fine #70, the famous ending. Crafty actually solves that
position at depth=18. There are no search extensions used because everything
is locked up except for king moves. The solution takes 22 or 23 plies to
"see" if you count it out. The transposition table however, takes information
from one place in the tree, and grafts it to another and produces the right
move at an earlier ply. However, that earlier depth is going to have a larger
number of nodes, because when you can capture a pawn, alpha/beta has to be
widened, and things that would cut off before, now don't. But I'd rather have
the larger 18 ply search which produced the right answer, than the smaller 18
ply search that still wanted to play something else (perhaps Kb2 as most
programs like until they finally see that Kb1 wins.)

Vincent Diepeveen

Jan 23, 1997

>Vincent Diepeveen wrote:
>
>> First we used the dumb approach: plain minimax. I got 350,000 nodes a
>> second on a P5/100 in C, which would be at least 700k in assembler. So
>> 700k was never actually reached, only 350,000, but from the assembler
>> code Bart concluded that it could easily be sped up that far.
>
>I think the days of 100% speedup via assembly code are past.

The more registers, pipelines and read/write cache processors get,
the more a hand-optimized program could theoretically win in speed.

Practically, though, I guess that in the end it will be impossible to code
it by hand (although I'm probably not good enough at assembler to judge
that 100%; I can only read assembler code).

>I use MSVC with "fastcall".
>
>I tried to speed up sections of my program, there were areas I could speed up,
>but only a couple of percent.
>
>And before you criticize my assembly coding skill, I've been doing i86
>assembly programming for ten years, and have read and understand the books on
>P5 assembler.
>
>I think the deal is that the compilers are getting extremely good, at least at
>compiling simple routines. I could not improve my "in-check" function by even
>one instruction (it's a loop with some bitmap operations and a ray traversal).

It seems there are more programmers with this problem. On the other hand,
it seems that Rebel is one of the programs that doesn't have this problem.

It seems that programs with more loops (what you call simple code),
i.e. a limited amount of code that is used repeatedly, gain more from
the new processors than programs for which one can count the number of
cycles per move.

Vincent.

>bruce

Peter W. Gillgasch

Jan 23, 1997

brucemo <bru...@nwlink.com> wrote:

> I think the days of 100% speedup via assembly code are past.
>

> I use MSVC with "fastcall".
>
> I tried to speed up sections of my program, there were areas I could speed up,
> but only a couple of percent.
>
> And before you criticize my assembly coding skill, I've been doing i86
> assembly programming for ten years, and have read and understand the books on
> P5 assembler.
>
> I think the deal is that the compilers are getting extremely good, at least at
> compiling simple routines. I could not improve my "in-check" function by even
> one instruction (it's a loop with some bitmap operations and a ray traversal).

Depends. If you intertwine everything with anything and do away with
most subroutines and keep all your stuff in registers (we are talking
about programs that have about 4000 instructions in their time critical
kernel for a machine that has a lot of registers) then allocating the
right registers becomes paramount. If you know a little about the
application (and no compiler can know that a killer move has at least a
70% chance to produce a cutoff for example) then you win hands down
versus compiled code. In my program a wrong decision by the compiler
with regard to the right register allocation costs me 10% of the speed,
which is about one old DarkThought.

I think the days of 100% portability via 'C' code are past. For me that
is. I am tired of "making love to the compiler"...

-- Peter

May God grant me the serenity to accept the things I cannot change,
courage to choke the living shit out of those who piss me off,
and wisdom to know where I should hide the bodies...


Peter Kappler

Jan 23, 1997

On 22 Jan 1997 16:00:20 GMT, "Brian Sheppard" <bri...@mstone.com>
wrote:

I like this. Perhaps it could be combined with nps as follows, to try
and gauge search efficiency, which I'll call 'e'.

e = nps / B

Seems like this is a better indicator than either nps or B,
individually. Comments?
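
For example (numbers picked out of the air), a program doing 80,000 nps
with B = 1.1 would get e = 80,000 / 1.1, or about 72,700, while a 100,000
nps program with sloppier ordering, say B = 1.5, would get only about
66,700.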

Peter

Stefan Meyer-Kahlen

Jan 27, 1997

On Thu, 23 Jan 1997 15:09:51 +0100, gil...@ilk.de (Peter W. Gillgasch)
wrote:

>brucemo <bru...@nwlink.com> wrote:
>
>> I think the days of 100% speedup via assembly code are past.
>>
>> I use MSVC with "fastcall".
>>
>> I tried to speed up sections of my program, there were areas I could speed up,
>> but only a couple of percent.
>>
>> And before you criticize my assembly coding skill, I've been doing i86
>> assembly programming for ten years, and have read and understand the books on
>> P5 assembler.
>>
>> I think the deal is that the compilers are getting extremely good, at least at
>> compiling simple routines. I could not improve my "in-check" function by even
>> one instruction (it's a loop with some bitmap operations and a ray traversal).
>
>Depends. If you intertwine everything with anything and do away with
>most subroutines and keep all your stuff in registers (we are talking
>about programs that have about 4000 instructions in their time critical
>kernel for a machine that has a lot of registers) then allocating the
>right registers becomes paramount. If you know a little about the
>application (and no compiler can know that a killer move has at least a
>70% chance to produce a cutoff for example) then you win hands down
>versus compiled code. In my program a wrong decision by the compiler
>with regard to the right register allocation costs me 10% of the speed,
>which is about one old DarkThought.

Does that mean your chess demon is 10x faster than DarkThought?
This makes me scared....

Stefan

