M-Chess Pro 6.0 program description

john quill taylor

Oct 17, 1996, 3:00:00 AM

>The following is a brief description of what's new with M-Chess Pro 6.0.

When can we buy it? What's the cost for upgrading? Is it on CD-ROM?
Does it have the same copy-protection scheme? Does it run under
Windows NT?

>1. Substantial improvements have been achieved in the chess logic
>including tactics, positional understanding, endgame evaluations and
>search speed.

Bravo! We've come to expect no less.

>2. The Opening Book has been extensively updated to reflect the latest
>innovations in Master play. In addition the variety of Opening lines
>played, even in Tournament Book Mode, has been significantly increased.

>3. The maximum hash-table size is increased from 10 Mbytes to 60 Mbytes.

Does it sense the memory available and use it for hash automatically?

Thanks for the update, Marty!

john quill taylor
writer at large
Hewlett-Packard, Storage Systems Division
Boise, Idaho U.S.A.
e-mail: jqta...@hpdmd48.boi.hp.com
Telephone: (208) 396-2328 (MDT = GMT - 6)
Snail Mail: Hewlett-Packard
11413 Chinden Blvd
Boise, Idaho 83714
Mailstop 852
"When in doubt, do as doubters do." - jqt -

haiti, rwanda, cuba, bosnia, ... we have a list,
where is our schindler?

MCHESS PRO

Oct 17, 1996, 3:00:00 AM

The following is a brief description of what's new with M-Chess Pro 6.0.

1. Substantial improvements have been achieved in the chess logic
including tactics, positional understanding, endgame evaluations and
search speed.

2. The Opening Book has been extensively updated to reflect the latest
innovations in Master play. In addition the variety of Opening lines
played, even in Tournament Book Mode, has been significantly increased.

3. The maximum hash-table size is increased from 10 Mbytes to 60 Mbytes.

4. The maximum depth-of-search is increased from 26 to 33 half-moves.

5. The "look and feel" has been improved with a sharper screen design,
better organization of features and new mouse controls for easier
operation.

6. M-Chess Pro 6.0 comes with six preset color schemes, saves color
modifications automatically and allows you to create up to six custom
color schemes of your own.

7. M-Chess Pro 6.0 allows you to save and restore up to six games in the
clipboard.

8. M-Chess Pro 6.0 can maintain the names, wins, draws, losses and ratings
of up to six separate opponents.

9. The "Book Learning" feature of M-Chess Pro 6.0 can store up to ten
moves from the current game to be played instantly, at the same or any
lower level, in future games.

10. Position setup, subdirectory handling, game save and restore, handling
of epd files, opening randomization, mate announcements, auto-play and the
learning feature have all been improved in small ways.

The following features are new with M-Chess Pro 6.0.

1. Shuffle-Games: M-Chess Pro 6.0 can randomize the placement of pieces
on the first and eighth ranks, in a symmetrical manner, allowing you to
play "Shuffle-Chess". When you ask M-Chess Pro 6.0 for a "Shuffle-Game"
it selects at random one of 3060 unusual positions to start your game
from.

2. Game Archive--Add to Work Book: M-Chess Pro 6.0 can merge a PGN file
into a User-Book (an editable Opening Book). You can select "Active for
White" and/or "Active for Black" and you can specify the move limit. As
one example, if you start with a file WKASP.PGN containing all of Kasparov's
wins with White, and another file BKASP.PGN containing all of Kasparov's
wins and draws with Black, then with just two commands you can create the
file KASPAROV.UBK, an editable User-Book containing all of Kasparov's
successful Opening Repertoire which you can print out and train with!

3. Custom Playing Styles: M-Chess Pro 6.0 allows you to modify the
Program's material value for each type of piece, its level of score
randomization, and its concern for each of the following parameters:
Aggression, Mobility, Centrality, King Placement, Pawn Structure and
Passed Pawns. Customized styles can be named, saved to disk, modified,
and reloaded at will. You can select at any time between the default
playing style and the current custom style.

4. Tablebase Connection: M-Chess Pro 6.0, as shipped, plays perfect
endgames when the position becomes King and Pawn, King and Rook, or King
and Queen versus the bare King. Additional tablebase files can be
obtained which will extend the perfect play to situations of King and
Queen vs. King and Rook; King, Bishop and Knight vs. the bare King; King
and Rook vs. King and Pawn, and more. M-Chess Pro 6.0 uses the Steven
Edwards endgame tablebase files which are available free of charge from
the ftp site chess.onenet.net. Additional tablebase data will be
available at this location from time to time. In addition, a large set of
tablebase data will be made available on CD-ROM. M-Chess Pro 6.0 can read
this data from two subdirectories at the same time (and these can be on
different disk drives). Therefore, customers using endgame tablebase data
with Crafty, for example, can use their existing data with M-Chess Pro 6.0
without the need for additional disk space.

-Marty Hirsch

Steven Schwartz

Oct 18, 1996, 3:00:00 AM

jqta...@hpdmd48.boi.hp.com (john quill taylor) wrote:

>
>mche...@aol.com (MCHESS PRO) wrote:
>
>>The following is a brief description of what's new with M-Chess Pro 6.0.

>

>When can we buy it? What's the cost for upgrading? Is it on CD-ROM?
>Does it have the same copy-protection scheme? Does it run under
>Windows NT?

John,
We will have MPro 6 next week. The upgrade will be $79.95, and we are on
the verge of convincing Marty that the upgrade process is smoother and
faster when old disks do not have to be returned (à la Rebel 8). The copy
protection scheme is as you are used to. It probably will not run under
Windows NT. There should be three extensive MChess Pro 6.0 reviews on the
Web CCR within the next week or two from Dr. Enrique Irazoqui, IM Larry
Kaufman, and Komputer Korner, all of whom are receiving or have received
production copies of MPro 6. Regards, Steve (ICD/Your Move Chess & Games)

Shep

Oct 18, 1996, 3:00:00 AM

mche...@aol.com (MCHESS PRO) wrote:

>The following is a brief description of what's new with M-Chess Pro 6.0.

>3. The maximum hash-table size is increased from 10 Mbytes to 60 Mbytes.

>4. The maximum depth-of-search is increased from 26 to 33 half-moves.

>10. Position setup, subdirectory handling, game save and restore, handling
>of epd files, opening randomization, mate announcements, auto-play and the
>learning feature have all been improved in small ways.

>The following features are new with M-Chess Pro 6.0.

>2. Game Archive--Add to Work Book: M-Chess Pro 6.0 can merge a PGN file
>into a User-Book (an editable Opening Book). You can select "Active for
>White" and/or "Active for Black" and you can specify the move limit. As
>one example, if you start with a file WKASP.PGN containing all of Kasparov's
>wins with White, and another file BKASP.PGN containing all of Kasparov's
>wins and draws with Black, then with just two commands you can create the
>file KASPAROV.UBK, an editable User-Book containing all of Kasparov's
>successful Opening Repertoire which you can print out and train with!

>4. Tablebase Connection: M-Chess Pro 6.0, as shipped, plays perfect
>endgames when the position becomes King and Pawn, King and Rook, or King
>and Queen versus the bare King. Additional tablebase files can be
>obtained which will extend the perfect play to situations of King and
>Queen vs. King and Rook; King, Bishop and Knight vs. the bare King; King
>and Rook vs. King and Pawn, and more. M-Chess Pro 6.0 uses the Steven
>Edwards endgame tablebase files which are available free of charge from
>the ftp site chess.onenet.net. Additional tablebase data will be
>available at this location from time to time. In addition, a large set of
>tablebase data will be made available on CD-ROM. M-Chess Pro 6.0 can read
>this data from two subdirectories at the same time (and these can be on
>different disk drives). Therefore, customers using endgame tablebase data
>with Crafty, for example, can use their existing data with M-Chess Pro 6.0
>without the need for additional disk space.

>-Marty Hirsch

This is great! Marty really knows what we users want.
All the things I quoted above are suggestions that I sent Marty in
late August (though I would have wished for more than just 33 ply
search depth because it still can't solve some famous "Mate in 18"
problems now). Since I doubt that it was me who gave him these ideas,
he must have felt the same as me about improvements.

Now I have no excuses to favour Rebel 8.0 anymore. :-)
... and something else to look forward to besides a P6...

Shep

Ledoc

Oct 18, 1996, 3:00:00 AM

I am interested in obtaining M-Chess Pro 6. Unfortunately, I purchased
M-Chess Pro 5 from PBM Chess before the postings in
rec.games.chess.computers warned about their shady deals and
unauthorized sales. Am I still eligible for an upgrade version of the
program with you?
George

john quill taylor

Oct 18, 1996, 3:00:00 AM

klu...@mi.uni-koeln.de (Shep) wrote:

>...

>Now I have no excuses to favour Rebel 8.0 anymore. :-)
>... and something else to look forward to besides a P6...

I am a long-time M-Chess user, but I still have a few reservations
about it. I'll probably upgrade, but I will probably also acquire
Rebel, Fritz, or Genius. Somehow I ruined my M-Chess program disk
when trying to install it on my P6 (I booted with a DOS floppy, since
it's an NT machine). Marty promptly replaced my program disk with a
new one, but it's still a pain to re-boot every time I want to play
a game.

As far as improvements, I am curious if the "number of positions
searched" bug has been fixed: after MCP5 reaches approximately two
billion nodes, it begins to count down! It's just a minor bug, and
you'll probably only see it on a P6 (or faster machine).

I'm wondering about the timing of the release of M-Chess Pro 6.0:
Was it ready in time for Jakarta? It would be great to see some
kind of Internet playoff between M-Chess Pro 6.0, Rebel 8.0,
Shredder, Crafty, Fritz, Genius, etc. My guess is that such an event
would result in a tremendous amount of sales. (Bob, you'll have to
raise your price!)

Robert Hyatt

Oct 19, 1996, 3:00:00 AM

john quill taylor (jqta...@hpdmd48.boi.hp.com) wrote:

: klu...@mi.uni-koeln.de (Shep) wrote:
:
: >...
: >Now I have no excuses to favour Rebel 8.0 anymore. :-)
: >... and something else to look forward to besides a P6...
:
: I am a long-time M-Chess user, but I still have a few reservations
: about it. I'll probably upgrade, but I will probably also acquire
: Rebel, Fritz, or Genius. Somehow I ruined my M-Chess program disk
: when trying to install it on my P6 (I booted with a DOS floppy, since
: it's an NT machine). Marty promptly replaced my program disk with a
: new one, but it's still a pain to re-boot every time I want to play
: a game.
:
: As far as improvements, I am curious if the "number of positions
: searched" bug has been fixed: after MCP5 reaches approximately two
: billion nodes, it begins to count down! It's just a minor bug, and
: you'll probably only see it on a P6 (or faster machine).

Really nothing he can do. I use unsigned int's in Crafty for this
purpose, but that only gets you to 4 billion on a 32bit machine. Of
course on the Cray, the Alpha, the HP PA8000, the MIPS R10000, and the
Intel P7 this isn't a problem. :)
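A minimal sketch of the wraparound being discussed, written in Python with masks standing in for 32-bit C integers. The helper names are illustrative and are not taken from Crafty or M-Chess. One plausible reading of the "counts down" symptom is that a signed 32-bit counter goes negative just past two billion, so the displayed magnitude shrinks as more nodes are added; an unsigned counter postpones the wrap to about 4.3 billion.

```python
# Emulate 32-bit counters in Python; the helper names are illustrative only.

def as_int32(n):
    """Read the low 32 bits of n as a signed two's-complement integer."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n & 0x80000000 else n

def as_uint32(n):
    """Read the low 32 bits of n as an unsigned integer."""
    return n & 0xFFFFFFFF

limit = 2**31 - 1                 # largest signed 32-bit value, about 2.1 billion
print(as_int32(limit))            # 2147483647
print(as_int32(limit + 1))        # -2147483648: the signed view goes negative
print(as_int32(limit + 1001))     # -2147482648: magnitude shrinks as nodes are added
print(as_uint32(limit + 1001))    # 2147484648: the unsigned view keeps counting
print(as_uint32(2**32))           # 0: unsigned wraps at about 4.3 billion
```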

:
: I'm wondering about the timing of the release of M-Chess Pro 6.0:

: Was it ready in time for Jakarta? It would be great to see some
: kind of Internet playoff between M-Chess Pro 6.0, Rebel 8.0,
: Shredder, Crafty, Fritz, Genius, etc. My guess is that such an event
: would result in a tremendous amount of sales. (Bob, you'll have to
: raise your price!)

I'm going to try to do something like this before long. Still working on
details so there's no cry of cheating and so forth. Think it would be fun,
if taken in the right way.

BTW, Crafty's already overpriced. :)

Lonnie Cook

Oct 19, 1996, 3:00:00 AM

On Fri, 18 Oct 1996 08:56:27 GMT, klu...@mi.uni-koeln.de (Shep) wrote:

>mche...@aol.com (MCHESS PRO) wrote:
>
>>The following is a brief description of what's new with M-Chess Pro 6.0.
>
>
>

Will MCP6 be able to utilize hash tables through Windows? The max, I think,
under 5.0 was 128K.
Lonnie J. Cook
<lonni...@riconnect.com>
"Lonnie" on A-FICS,E-FICS,ICC & MMEICS
ICD/Your Move Chess & Games
E-MAIL:i...@icdchess.com
Toll-Free (U.S.): 1-800-645-4710
Phone (outside U.S.): 516-424-3300
Fax: 516-424-3405

Stephen B Streater

Oct 19, 1996, 3:00:00 AM

In article <549b8b$k...@juniper.cis.uab.edu>, Robert Hyatt
<URL:mailto:hy...@crafty.cis.uab.edu> wrote:
>
> john quill taylor (jqta...@hpdmd48.boi.hp.com) wrote:
> : klu...@mi.uni-koeln.de (Shep) wrote:
> :
> : >...
> : >Now I have no excuses to favour Rebel 8.0 anymore. :-)
> : >... and something else to look forward to besides a P6...
> :
> : I am a long-time M-Chess user, but I still have a few reservations
> : about it. I'll probably upgrade, but I will probably also acquire
> : Rebel, Fritz, or Genius. Somehow I ruined my M-Chess program disk
> : when trying to install it on my P6 (I booted with a DOS floppy, since
> : it's an NT machine). Marty promptly replaced my program disk with a
> : new one, but it's still a pain to re-boot every time I want to play
> : a game.
> :
> : As far as improvements, I am curious if the "number of positions
> : searched" bug has been fixed: after MCP5 reaches approximately two
> : billion nodes, it begins to count down! It's just a minor bug, and
> : you'll probably only see it on a P6 (or faster machine).
>
> Really nothing he can do. I use unsigned int's in Crafty for this
> purpose, but that only gets you to 4 billion on a 32bit machine. Of
> course on the Cray, the Alpha, the HP PA8000, the MIPS R10000, and the
> Intel P7 this isn't a problem. :)

I often get 30,000,000,000 nodes searched on my StrongARM. The solution
is to check the (unsigned) int eg every time a ply 6 node is searched,
and cream off the top 8 bits into a new 32 bit variable. As 6 ply < 2^32
nodes, this never overflows, and as it hardly ever happens compared with
deeper nodes, it doesn't slow the program. This gives me 7.2E16/9 nodes
before it overflows = 8E15. (The /9 is a feature of the node counter - it
counts positions generated by looking at the stack and seeing how
much it has grown.)
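A rough sketch of the scheme described above, in Python with masks emulating two 32-bit variables. The class and method names are hypothetical, and the fold is triggered by hand here rather than at a ply 6 node. Keeping 24 bits in the fast counter and 32 bits in the slow one gives 2^56 counts, which matches the 7.2E16 figure quoted.

```python
# Two emulated 32-bit variables: a fast low counter and a rarely touched high part.
LOW_BITS = 24                      # bits left in the fast counter after a fold
LOW_MASK = (1 << LOW_BITS) - 1
U32 = 0xFFFFFFFF

class SplitCounter:
    def __init__(self):
        self.low = 0               # incremented on every node
        self.high = 0              # receives the top 8 bits of low at each fold

    def add(self, n):
        """Count n nodes in the fast counter (wraps like a 32-bit unsigned int)."""
        self.low = (self.low + n) & U32

    def fold(self):
        """Move the top 8 bits of low into high; must run before low can wrap."""
        self.high = (self.high + (self.low >> LOW_BITS)) & U32
        self.low &= LOW_MASK

    def total(self):
        return (self.high << LOW_BITS) + self.low

counter = SplitCounter()
for _ in range(30):                # 30 batches of one billion nodes
    counter.add(1_000_000_000)
    counter.fold()                 # stands in for the occasional shallow-ply check
print(counter.total())             # 30000000000
print(2**56)                       # 72057594037927936, roughly 7.2E16
```

The scheme relies on the fold running at least once per 2^32 - 2^24 increments; that is the role of the "6 ply < 2^32 nodes" argument in the post.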

--
Stephen B Streater

Robert Hyatt

Oct 19, 1996, 3:00:00 AM

Stephen B Streater (ste...@surprise.demon.co.uk) wrote:
: In article <549b8b$k...@juniper.cis.uab.edu>, Robert Hyatt

:

Sounds fast as all hell... :)

Steven Schwartz

Oct 19, 1996, 3:00:00 AM

Hi George,
There is no problem with upgrading MPro even if you purchased it from PBM.
Please call (1-800-645-4710) during the week or email. Regards, Steve

Stephen B Streater

Oct 19, 1996, 3:00:00 AM

In article <54apas$6...@juniper.cis.uab.edu>, Robert Hyatt

> Sounds fast as all hell... :)

It was quite an achievement to write a program with even less positional
expertise than me :) but it always gets me on the tactics. It's very good
at wriggling out of duff situations as well, due to its exhaustive search.

I've got past 1,000,000 nps on some games now, though I have some more
improvements on the alpha-beta to do next.

PS You can tell what my computer does at night :-)

--
Stephen B Streater

Urban Koistinen

Oct 19, 1996, 3:00:00 AM

Robert Hyatt (hy...@crafty.cis.uab.edu) wrote:
RH: john quill taylor (jqta...@hpdmd48.boi.hp.com) wrote:
RH: : As far as improvements, I am curious if the "number of positions
RH: : searched" bug has been fixed: after MCP5 reaches approximately two
RH: : billion nodes, it begins to count down! It's just a minor bug, and
RH: : you'll probably only see it on a P6 (or faster machine).

RH: Really nothing he can do. I use unsigned int's in Crafty for this
RH: purpose, but that only gets you to 4 billion on a 32bit machine. Of
RH: course on the Cray, the Alpha, the HP PA8000, the MIPS R10000, and the
RH: Intel P7 this isn't a problem. :)

One thing that he could do is:
Each time the screen is updated, calculate the number of nodes
visited since last time and add this number to a 64 bit int.
It is all right for this to be slow as it is only done when
the screen is updated which is slow anyway.
You could do it in Crafty too.
I'd do it for you if you like, but my price is steep:
100 copies of Crafty, made and delivered by me.
I don't know if you can afford it.
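A minimal sketch of the suggestion above, assuming the engine keeps its fast 32-bit node counter and only the display code touches the wide total. The names are hypothetical. Taking the difference modulo 2^32 means a single wrap of the fast counter between two screen updates is still counted correctly.

```python
U32 = 0xFFFFFFFF
U64 = 0xFFFFFFFFFFFFFFFF

class WideNodeCount:
    """Accumulates a 32-bit node counter into a 64-bit total at display time."""

    def __init__(self):
        self.total = 0             # stands in for a 64-bit integer
        self.last_seen = 0         # fast counter value at the previous update

    def on_screen_update(self, counter32):
        # Nodes visited since the last update, correct across one wraparound.
        delta = (counter32 - self.last_seen) & U32
        self.total = (self.total + delta) & U64
        self.last_seen = counter32
        return self.total

wide = WideNodeCount()
true_nodes = 0
for _ in range(2):
    true_nodes += 3_000_000_000            # the fast counter wraps in the second step
    shown = wide.on_screen_update(true_nodes & U32)
print(shown)                                # 6000000000
```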

Urban.K...@abc.se - e...@algonet.se

Stuart Cracraft

Oct 20, 1996, 3:00:00 AM

How long does it take your StrongARM to accumulate
the 3x10^10 nodes?

--Stuart

Stephen B Streater <ste...@surprise.demon.co.uk> wrote:

>In article <54apas$6...@juniper.cis.uab.edu>, Robert Hyatt
><URL:mailto:hy...@crafty.cis.uab.edu> wrote:
>>
>> Stephen B Streater (ste...@surprise.demon.co.uk) wrote:
>> : In article <549b8b$k...@juniper.cis.uab.edu>, Robert Hyatt
>> : <URL:mailto:hy...@crafty.cis.uab.edu> wrote:
>> : > Really nothing he can do. I use unsigned int's in Crafty for this
>> : > purpose, but that only gets you to 4 billion on a 32bit machine. Of
>> : > course on the Cray, the Alpha, the HP PA8000, the MIPS R10000, and the
>> : > Intel P7 this isn't a problem. :)

Robert Hyatt

Oct 20, 1996, 3:00:00 AM

:
: >It was quite an achievement to write a program with even less positional
: >expertise than me :) but it always gets me on the tactics. It's very good
: >at wriggling out of duff situations as well, due to its exhaustive search.
:
: >I've got past 1,000,000 nps on some games now, though I have some more
: >improvements on the alpha-beta to do next.
:
: >PS You can tell what my computer does at night :-)
:
: >--
: >Stephen B Streater

Can you tell us more about your machine? 1M nodes per second typically
means you are doing around 2 billion instructions per second, since about
the least "instructions per node" I've seen is around 2K (Cray Blitz was
around 7,500 clock cycles [instructions] per node, although it did use
lots of cpus to multiply this speed significantly.). In any case, at that
kind of speed, you'd likely have the strongest microcomputer program that's
ever been seen. I see about 1/10th of that on a P6/200, maybe 300K on a
fast alpha, so that's an impressive number to say the least... Any plans
on playing on ICC or another server to show how it plays?

Bob

Stephen B Streater

Oct 20, 1996, 3:00:00 AM

In article <54c71n$o...@juniper.cis.uab.edu>, Robert Hyatt
<URL:mailto:hy...@crafty.cis.uab.edu> wrote:
>
> :
> : >It was quite an achievement to write a program with even less positional
> : >expertise than me :) but it always gets me on the tactics. It's very good
> : >at wriggling out of duff situations as well, due to its exhaustive search.
> :
> : >I've got past 1,000,000 nps on some games now, though I have some more
> : >improvements on the alpha-beta to do next.
> :
> : >PS You can tell what my computer does at night :-)

> Can you tell us more about your machine? 1M nodes per second typically
> means you are doing around 2 billion instructions per second, since about
> the least "instructions per node" I've seen is around 2K (Cray Blitz was
> around 7,500 clock cycles [instructions] per node, although it did use
> lots of cpus to multiply this speed significantly.)...

The machine is an Acorn Risc PC. They have just released a UKP 250 upgrade
from the 40MHz ARM710 to the new 202.4MHz StrongARM. This is a redesigned
ARM RISC processor - the redesign being by Digital Semiconductor to make use
of their Alpha production technology.

The StrongARM is built on the Alpha 0.35 micron production line, and because
of its small die size and simplicity allows Digital to sell it for USD 49 per
chip. Although the Acorn machine has relatively slow memory (my last 32MB simm
was UKP 100) and no L2 cache, my program runs almost exclusively in the on-chip
16kB instruction cache and 16kB write-back data cache.

Being a RISC chip, instructions take one clock cycle each, and you get a free
barrel shift on every instruction (ie 0 cycle extra time taken). The chip has
16 registers, 9 of which are used for the board: 4 bits per piece gives you
1 register per row, with an extra register containing misc other board
information such as whether KRRkrr have moved and ply to continue searching
before the end of exhaustive search (4 bits) and whether you are in the
quiescent search (1 bit).
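A small sketch of the nine-word layout described above, using Python integers in place of ARM registers. The piece codes, the helper names and the bit positions inside the misc word are assumptions made for illustration; only the widths (4 bits per square, 6 moved flags, 4 depth bits, 1 quiescence bit) come from the post. Putting column 1 in the top nibble follows the shift used in the assembly shown later in this post.

```python
def pack_row(squares):
    """Pack eight 4-bit square codes into one 32-bit word, column 1 in the top nibble."""
    assert len(squares) == 8
    word = 0
    for code in squares:
        word = (word << 4) | (code & 0xF)
    return word

def unpack_row(word):
    """Inverse of pack_row: return the eight 4-bit codes, column 1 first."""
    return [(word >> (28 - 4 * i)) & 0xF for i in range(8)]

def pack_misc(moved_flags, depth_left, in_quiescence):
    """Hypothetical layout: bits 0-5 KRRkrr moved, bits 6-9 depth, bit 10 quiescence."""
    return (moved_flags & 0x3F) | ((depth_left & 0xF) << 6) | (int(in_quiescence) << 10)

# Hypothetical codes for a white back rank: R N B Q K B N R = 4 2 3 5 6 3 2 4.
back_rank = pack_row([4, 2, 3, 5, 6, 3, 2, 4])
print(hex(back_rank))                      # 0x42356324
board = [back_rank] + [0] * 7 + [pack_misc(0, 12, False)]
print(len(board))                          # 9 words: eight rows plus the misc word
```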

The chip also has all instructions conditional, rather than just the usual
branches. This means that branches (and hence instruction pipeline flushes)
are rare.

The current peak performance for any move is 1.3M nodes/sec, though it still
averages under 1M.

So here's a typical instruction:

ANDS R0, mask_1, row_to, LSL #(column_to*4-4) ; mask_1 = 0xF<<28

This puts the contents of square (row_to,column_to) in R0.
The optional S flag also sets EQ if the square is empty, GT for white,
LT for black, LE for not white etc.

So to generate pawn moves, which need an empty square in front of them, I use:
ANDS R0, mask_1, row_to, LSL #(column_to*4-4) ;mask_1 = 0xF<<28
ORREQ row_to, row_to, #(pawn+8-side*8)<<32-column_to*4 ; Do move
STMEQIA (stack)!, {R1-R9} ; Save new position
BICEQ row_to, row_to, #&F<<32-column_to*4 ; Restore position

The store multiple takes 1 cycle if it isn't executed, and 9 if it is.
R0 isn't used here, but a

LDRNE R0, [R14, R0, LSR #26] ; R14 points to table of piece values

would give you the value of the piece just taken in R0 in 1 cycle where needed
(R0=0 => empty square so value of 0 is still correct even though LDR doesn't
happen).
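The instruction sequence above can be mimicked in Python to check the bit arithmetic. This is only an emulation under stated assumptions: the pawn code of 1, the value table and all function names are hypothetical, and the operand `#(pawn+8-side*8)<<32-column_to*4` is read as `(pawn+8-side*8) << (32-column_to*4)`. The store of the new position is represented simply by returning the new row.

```python
MASK_1 = 0xF << 28
U32 = 0xFFFFFFFF

def square(row, column):
    """ANDS R0, mask_1, row, LSL #(column*4-4): the square's nibble lands in bits 28-31."""
    return MASK_1 & (row << (column * 4 - 4)) & U32

def flags(r0):
    """Condition as described in the post: EQ empty, GT white, LT black."""
    if r0 == 0:
        return "EQ"
    return "LT" if r0 & 0x80000000 else "GT"

def push_pawn(row_to, column, side, pawn=1):
    """ORREQ: if the target square is empty, return the row with the pawn placed."""
    if square(row_to, column) != 0:
        return None                          # not EQ, so nothing is executed
    return row_to | ((pawn + 8 - side * 8) << (32 - column * 4))

def clear_square(row, column):
    """BIC: restore the row by clearing the target nibble."""
    return row & ~(0xF << (32 - column * 4)) & U32

def piece_value(r0, table):
    """LDRNE with LSR #26 yields a word offset; in Python that is index r0 >> 28."""
    return table[r0 >> 28]

VALUES = [0, 100] + [0] * 14                 # hypothetical table: code 1 is a pawn
row = push_pawn(0, 5, side=1)
print(hex(row))                              # 0x1000
print(flags(square(row, 5)))                 # GT
print(piece_value(square(row, 5), VALUES))   # 100
```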

I almost forgot to mention that the program is very written out, so every
possible piece on every possible square has a special bit of code to generate
legal moves from that place - ie no checking for the edge of the board is
needed, for example.
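A table-driven analogue of the same idea, sketched in Python. It is not the author's method, which uses separately written-out code per piece and square; here the per-square work is done once when the table is built, so the lookup at search time needs no edge-of-board test. The names are illustrative.

```python
KNIGHT_STEPS = [(1, 2), (2, 1), (2, -1), (1, -2),
                (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

def build_knight_table():
    """Precompute legal knight targets for each square (rows and columns 1 to 8)."""
    table = {}
    for row in range(1, 9):
        for col in range(1, 9):
            table[(row, col)] = [
                (row + dr, col + dc)
                for dr, dc in KNIGHT_STEPS
                if 1 <= row + dr <= 8 and 1 <= col + dc <= 8   # edge check done once, here
            ]
    return table

KNIGHT_TABLE = build_knight_table()
print(KNIGHT_TABLE[(1, 1)])                            # [(2, 3), (3, 2)]
print(len(KNIGHT_TABLE[(4, 4)]))                       # 8
print(sum(len(v) for v in KNIGHT_TABLE.values()))      # 336
```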

> In any case, at that kind of speed, you'd likely have the strongest microcomputer
> program that's ever been seen.

That's the long term plan :-) However, I've concentrated only on the tactical
side so far - the positional stuff is going to wait until I exceed 1M nps
reliably (before 5-processor upgrade that is).

> I see about 1/10th of that on a P6/200, maybe 300K on a fast alpha, so that's
> an impressive number to say the least... Any plans on playing on ICC or
> another server to show how it plays?

I could do - you'd see what I meant by positional weakness rather quickly then.
Also, I've got a new sort algorithm for the alpha-beta, which I'm hoping will
add a couple of ply to the search depth (!) I'll keep you posted on how this goes.

PS how do you get onto ICC? It sounds like a good place to experiment.

--
Stephen B Streater

Robert Hyatt

Oct 20, 1996, 3:00:00 AM

Stephen B Streater (ste...@surprise.demon.co.uk) wrote:
:

: The machine is an Acorn Risc PC. They have just released a UKP 250 upgrade
: from the 40MHz ARM710 to the new 202.4MHz StrongARM. This is a redesigned
: ARM RISC processor - the redesign being by Digital Semiconductor to make use
: of their Alpha production technology.
:
: The StrongARM is built on the Alpha 0.35 micron production line, and because
: of its small die size and simplicity allows Digital to sell it for USD 49 per
: chip. Although the Acorn machine has relatively slow memory (my last 32MB simm
: was UKP 100) and no L2 cache, my program runs almost exclusively in the on-chip
: 16kB instruction cache and 16kB write-back data cache.
:
: Being a RISC chip, instructions take one clock cycle each, and you get a free
: barrel shift on every instruction (ie 0 cycle extra time taken). The chip has
: 16 registers, 9 of which are used for the board: 4 bits per piece gives you
: 1 register per row, with an extra register containing misc other board
: information such as whether KRRkrr have moved and ply to continue searching
: before the end of exhaustive search (4 bits) and whether you are in the
: quiescent search (1 bit).
:
: The chip also has all instructions conditional, rather than just the usual
: branches. This means that branches (and hence instruction pipeline flushes)
: are rare.
:
: The current peak performance for any move is 1.3M nodes/sec, though it still
: averages under 1M.

:

You still have my curiosity way up there. :) 200M instructions per second
(you didn't say super-scalar so I assume single instruction issue?) gives
you 200 instructions per node? *very* fast, to say the least...
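The arithmetic behind these figures, as a short Python sketch. The function names are made up for illustration, and the last line assumes one instruction per clock at 202.4 MHz, as stated in the quoted post.

```python
def instructions_per_node(instructions_per_second, nodes_per_second):
    """Instruction budget available for each node searched."""
    return instructions_per_second / nodes_per_second

def required_ips(nodes_per_second, cost_per_node):
    """Instruction rate needed to sustain a node rate at a given cost per node."""
    return nodes_per_second * cost_per_node

print(instructions_per_node(200e6, 1e6))                  # 200.0
print(required_ips(1e6, 2000))                            # 2000000000.0
print(round(instructions_per_node(202.4e6, 1.3e6), 1))    # 155.7 at the 1.3M peak
```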

<snip>
:
: I almost forgot to mention that the program is very written out, so every
: possible piece on every possible square has a special bit of code to generate
: legal moves from that place - ie no checking for the edge of the board is
: needed, for example.

This sounds like "all the right moves", an idea used in HiTech's special-
purpose hardware. It does help since there are not really all that many
different moves. One question, however, would be "what about cache" since
you do have 64 squares with different code for each? Doesn't this blow out
a 16kb instruction cache pretty badly?

:
: > In any case, at that kind of speed, you'd likely have the strongest microcomputer
: > program that's ever been seen.
:
: That's the long term plan :-) However, I've concentrated only on the tactical
: side so far - the positional stuff is going to wait until I exceed 1M nps
: reliably (before 5-processor upgrade that is).
:
: > I see about 1/10th of that on a P6/200, maybe 300K on a fast alpha, so that's
: > an impressive number to say the least... Any plans on playing on ICC or
: > another server to show how it plays?
:
: I could do - you'd see what I meant by positional weakness rather quickly then.
: Also, I've got a new sort algorithm for the alpha-beta, which I'm hoping will
: add a couple of ply to the search depth (!) I'll keep you posted on how this goes.
:
: PS how do you get onto ICC? It sounds like a good place to experiment.
:
: --
: Stephen B Streater

:

telnet chessclub.com 5000
log in as guest (following the instructions). one place to visit is to
say +chan 64, which is where the computer chess types gather to have
discussions...

Bob

Stephen B Streater

Oct 20, 1996, 3:00:00 AM

In article <54db21$7...@juniper.cis.uab.edu>, Robert Hyatt
<URL:mailto:hy...@crafty.cis.uab.edu> wrote:
>
> Stephen B Streater (ste...@surprise.demon.co.uk) wrote:

> : The current peak performance for any move is 1.3M nodes/sec, though it still
> : averages under 1M.
>

> You still have my curiosity way up there. :) 200M instructions per second
> (you didn't say super-scalar so I assume single instruction issue?) gives
> you 200 instructions per node? *very* fast, to say the least...

Thanks - I specialise in optimisation. I entered the first version into
the "First International Computer Games Olympiad" or something like that
in London. I'm not any good at remembering names, but Chris rings a bell.

> : I almost forgot to mention that the program is very written out, so every
> : possible piece on every possible square has a special bit of code to generate
> : legal moves from that place - ie no checking for the edge of the board is
> : needed, for example.
>

> This sounds like "all the right moves", an idea used in HiTech's special-
> purpose hardware. It does help since there are not really all that many
> different moves. One question, however, would be "what about cache" since
> you do have 64 squares with different code for each? Doesn't this blow out
> a 16kb instruction cache pretty badly?

The original version, written for ARM2 @ 8MHz didn't have any cache, so this
didn't matter. The intermediate ARM3, ARM610 (4k cache) suffered quite badly,
and I reduced the number of instructions by about a factor of two to help these.
The ARM710 (8kB cache) is just big enough, though gets noticeably faster if
queens are exchanged :-)

Fortunately the 16kB+16kB caches on StrongARM are big enough. This is because
the pieces don't move very much in 12 ply, so most pieces are still cached
from last time. The depth first search has a habit of looking at similar
positions, so they cache "very" well. I don't know how well, but the
program is about 12 times faster than the ARM610 @30MHz, and this is with
no L2 cache and the same speed external RAM, so _that's_ how well. Also,
the caches are each 32-way set associative, so they get used very efficiently.

> : PS how do you get onto ICC? It sounds like a good place to experiment.

> telnet chessclub.com 5000
> log in as guest (following the instructions). one place to visit is to
> say +chan 64, which is where the computer chess types gather to have
> discussions...

--
Stephen B Streater

Chris Whittington

Oct 20, 1996, 3:00:00 AM

Stephen B Streater <ste...@surprise.demon.co.uk> wrote:
>

> In article <54db21$7...@juniper.cis.uab.edu>, Robert Hyatt

> <URL:mailto:hy...@crafty.cis.uab.edu> wrote:
> >
> > Stephen B Streater (ste...@surprise.demon.co.uk) wrote:
>

> > : The current peak performance for any move is 1.3M nodes/sec, though it still
> > : averages under 1M.
> >

> > You still have my curiosity way up there. :) 200M instructions per second
> > (you didn't say super-scalar so I assume single instruction issue?) gives
> > you 200 instructions per node? *very* fast, to say the least...
>
> Thanks - I specialise in optimisation. I entered the first version into
> the "First International Computer Games Olympiad" or something like that
> in London. I'm not any good at remembering names, but Chris rings a bell.
>

Yes, hi !

I remember you from way back then.

If I remember right you had an Archimedes and entered with a
very primitive chess program; talked to the other programmers
between rounds and added one search heuristic after another
between rounds !

I was deeply impressed by this ability to code at such a fantastic
rate.

Then never heard of you again, till now :)

Sounds fast this ARM chip.

But you'll need some positional ..... :)

Chris Whittington

> > : I almost forgot to mention that the program is very written out, so every

> > : possible piece on every possible square has a special bit of code to generate
> > : legal moves from that place - ie no checking for the edge of the board is
> > : needed, for example.
> >

Stephen B Streater

Oct 21, 1996, 3:00:00 AM

In article <84584439...@cpsoft.demon.co.uk>, Chris Whittington
<URL:mailto:chr...@cpsoft.demon.co.uk> wrote:
>
> Stephen B Streater <ste...@surprise.demon.co.uk> wrote:
> >

> > In article <54db21$7...@juniper.cis.uab.edu>, Robert Hyatt

> > <URL:mailto:hy...@crafty.cis.uab.edu> wrote:
> > >
> > > Stephen B Streater (ste...@surprise.demon.co.uk) wrote:
> >

> > > : The current peak performance for any move is 1.3M nodes/sec, though it still
> > > : averages under 1M.
> > >

> > > You still have my curiosity way up there. :) 200M instructions per second
> > > (you didn't say super-scalar so I assume single instruction issue?) gives
> > > you 200 instructions per node? *very* fast, to say the least...
> >
> > Thanks - I specialise in optimisation. I entered the first version into
> > the "First International Computer Games Olympiad" or something like that
> > in London. I'm not any good at remembering names, but Chris rings a bell.
> >
>
> Yes, hi !
>
> I remember you from way back then.
>
> If I remember right you had an Archimedes and entered with a
> very primitive chess program; you talked to the other programmers
> and added one search heuristic after another between rounds !
>
> I was deeply impressed by this ability to code at such a fantastic
> rate.
>
> Then never heard of you again, till now :)
>
> Sounds fast this ARM chip.
>
> But you'll need some positional ..... :)

Last time I got it all wrong, as I discovered :)

> Chris Whittington

Thanks for your help back then. Are there any other friendly
competitions to meet up these days?

--
Stephen B Streater

Ed Schröder

unread,

Oct 21, 1996, 3:00:00 AM10/21/96

to ste...@surprise.demon.co.uk

Stephen,

Is the strong ARM compatible with old ARM2 / ARM3 code?

What is the speed factor between the ARM2-8Mhz and the strong ARM?

Also what is the speed factor between the strong ARM and the modified
version of the ARM from Tasc for example the ARM2-32 Mhz with fast SRAM
also known as the ChessMachine?

How would you compare the strong ARM and the Pentium Pro 200 Mhz?

I spent years programming the ARM and I like the specific instructions the
ARM offers very much, especially:
ADDEQ or MOVNE
Multiple load / store register <> memory
In fact I still miss them all the time programming the PC.

- Ed Schroder

Marcel van Kervinck

unread,

Oct 21, 1996, 3:00:00 AM10/21/96

to

Robert Hyatt (hy...@crafty.cis.uab.edu) wrote:

: You still have my curiosity way up there. :) 200M instructions per second

: (you didn't say super-scalar so I assume single instruction issue?) gives
: you 200 instructions per node? *very* fast, to say the least...

It's pretty fast, but I don't think it's impossible on the right
architecture. My program typically needs 800 instructions/node on a plain
68000 in the middle-game. This includes dynamic piece/square tables
(that is, recomputed in the tree when needed) and some sort of SEE
used in move ordering. No bitboards are involved, of course, on this
architecture.

Still, Streater made me very curious about the ARM because I plan
to leave the 680x0 series this year and I still don't know where
to go. Alpha, PowerPC and Ultra Sparc are promising and offer good
perspectives for the future, and now ARM joins them.

Marcel
-- _ _
_| |_|_|
|_ |_ Marcel van Kervinck
|_| bue...@urc.tue.nl

Chris Whittington

unread,

Oct 22, 1996, 3:00:00 AM10/22/96

to

Stephen B Streater <ste...@surprise.demon.co.uk> wrote:
>
> In article <84584439...@cpsoft.demon.co.uk>, Chris Whittington
> <URL:mailto:chr...@cpsoft.demon.co.uk> wrote:
> >
> > Stephen B Streater <ste...@surprise.demon.co.uk> wrote:
> > >

> > > In article <54db21$7...@juniper.cis.uab.edu>, Robert Hyatt

> > > <URL:mailto:hy...@crafty.cis.uab.edu> wrote:
> > > >
> > > > Stephen B Streater (ste...@surprise.demon.co.uk) wrote:
> > >

> > > > : The current peak performance for any move is 1.3M nodes/sec, though it still
> > > > : averages under 1M.
> > > >

> > > > You still have my curiosity way up there. :) 200M instructions per second
> > > > (you didn't say super-scalar so I assume single instruction issue?) gives
> > > > you 200 instructions per node? *very* fast, to say the least...
> > >

> > > Thanks - I specialise in optimisation. I entered the first version into
> > > the "First International Computer Games Olympiad" or something like that
> > > in London. I'm not any good at remembering names, but Chris rings a bell.
> > >
> >
> > Yes, hi !
> >
> > I remember you from way back then.
> >
> > If I remember right you had an Archimedes and entered with a
> > very primitive chess program; you talked to the other programmers
> > and added one search heuristic after another between rounds !
> >
> > I was deeply impressed by this ability to code at such a fantastic
> > rate.
> >
> > Then never heard of you again, till now :)
> >
> > Sounds fast this ARM chip.
> >
> > But you'll need some positional ..... :)
>
> Last time I got it all wrong, as I discovered :)
>
> > Chris Whittington
>
> Thanks for your help back then. Are there any other friendly
> competitions to meet up these days?
>

There are competitions, but whether you'd call them friendly is
another matter :)

I think there is a Dutch competition in Leiden shortly. (email
to Vincent Diepeveen, he knows about it).

There's been nothing in England for some time, and nothing
planned as far as I know.

Chris Whittington

> --
> Stephen B Streater
>

Stephen B Streater

unread,

Oct 22, 1996, 3:00:00 AM10/22/96

to

In article <54gtne$9...@news.xs4all.nl>, Ed Schröder
<URL:mailto:rebc...@xs4all.nl> wrote:
>
> Stephen,
>

> Is the strong ARM compatible with old ARM2 / ARM3 code?

Yes - except for the following differences:

STRs don't update the instruction cache, so your self-modifying code
needs some care.

The pipeline is longer, and STR PC, [Rn] stores a PC value 4 out from before.

Also, STR/LDR take one cycle now, instead of 4-6 on original ARM 2,
and B takes 2 cycles instead of 3, multiply does 12 bits/cycle instead
of 2. Also there are now 16 bit LDR and STR instructions.

> What is the speed factor between the ARM2-8Mhz and the strong ARM?

This depends on what you do. Multiply is 202.4/8*12/2 times faster = 150x
faster than on 8 Mhz ARM2. LDR (cached) is 202.4/1/8*5 times faster = 126x.
On the other hand, ADD is 202.4/8 times faster = only 25x faster.

BTW, BBC basic is 1,500 times faster than running on a 1MHz 6502.

> Also what is the speed factor between the strong ARM and the modified
> version of the ARM from Tasc for example the ARM2-32 Mhz with fast SRAM

> also known as the ChessMachine?

With SRAM and faster clock rate, the cycle times get smaller -
MUL is now 202.4/32*12/2 times faster = 38x
LDR is now 202.4/1/32*3 times faster = 19x
ADD is now 202.4/32 times faster = 6.3x
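
The ratios in this post can be reproduced with a few lines of arithmetic. A minimal sketch, assuming the figures quoted above (LDR taking 5 cycles on the 8 MHz ARM2, 3 on the 32 MHz ChessMachine and 1 on StrongARM; multiply handling 12 bits per cycle instead of 2). The name `speed_factor` and the table layout are illustrative only, not from the original post:

```python
STRONGARM_MHZ = 202.4  # StrongARM clock rate used in the post


def speed_factor(old_mhz, work_ratio=1.0):
    """Clock ratio times the per-instruction work ratio (old cycles / new cycles)."""
    return STRONGARM_MHZ / old_mhz * work_ratio


# Per-instruction work ratios taken from the figures quoted in the post.
WORK_RATIOS = {
    8: {"MUL": 12 / 2, "LDR": 5 / 1, "ADD": 1.0},   # ARM2 at 8 MHz
    32: {"MUL": 12 / 2, "LDR": 3 / 1, "ADD": 1.0},  # ChessMachine ARM2 at 32 MHz
}

if __name__ == "__main__":
    for mhz, ratios in WORK_RATIOS.items():
        for name, ratio in ratios.items():
            print(f"{name} vs {mhz} MHz ARM2: {speed_factor(mhz, ratio):.1f}x")
```

Run as a script, this prints values close to the rounded figures given in the post.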

> How would you compare the strong ARM and the Pentium Pro 200 Mhz.

The StrongARM is much nicer. Also, at $49, it's much cheaper.
Somehow, the integer performance seems to be comparable. The PP200
has multiple pipelines and L2 cache, but for chess, I think the SA
is faster as it has conditional instructions and free barrel shifter
to get at the insides of registers, and a much better L1 cache.

The only thing the SA doesn't have is floating point.

> I spent years programming the ARM and I like the specific instructions the
> ARM offers very much, especially:
> ADDEQ or MOVNE
> Multiple load / store register <> memory
> In fact I still miss them all the time programming the PC.

I tried PC programming for a bit as well, and found it quite frustrating.
Perhaps you could port your programs to the new Newton, or the new Psion
palmtop - or even the new StrongARM Acorn machine :-)

--
Stephen B Streater

Tord Kallqvist Romstad

unread,

Oct 22, 1996, 3:00:00 AM10/22/96

to

Stephen B Streater (ste...@surprise.demon.co.uk) wrote:

: The StrongARM is much nicer. Also, at $49, it's much cheaper.

What is the price of a complete computer with this processor?
Which operating systems are available?

Tord

Stephen B Streater

unread,

Oct 22, 1996, 3:00:00 AM10/22/96

to

In article <54i5ul$8...@maud.ifi.uio.no>, Tord Kallqvist Romstad

<URL:mailto:tor...@ifi.uio.no> wrote:
>
> Stephen B Streater (ste...@surprise.demon.co.uk) wrote:
> : The StrongARM is much nicer. Also, at $49, it's much cheaper.
>
> What is the price of a complete computer with this processor?

I never know the prices because we buy them for resale with
our software video editing systems, so get a discount. They
must be around UKP 1,200 with an ARM610 - and if you buy
before 31.12.96 you get the StrongARM upgrade for UKP99.

> Which operating systems are available?

They come with RISC OS. Unix is available, but not widely
used, and if you buy a 486 or 586 upgrade(?) card, you
can run both the StrongARM and the 586 at the same time,
with RISC OS on the SA and Windows 95 on the 586. They
share the same RAM/screen (W95 runs in a Window), discs,
ethernet cards (though they have different ip addresses
so programs don't get confused). When you are only using
one, all the RAM is available for that.

The Acorn World show starts on November 1st in London -
you are welcome to come if you are interested - my company
has a stand there :-)

PS You may like to look at some of the following newsgroups
for more information:

comp.sys.acorn.advocacy Why Acorn computers and programs are better
comp.sys.acorn.announce Announcements for Acorn and ARM users (Moderated)
comp.sys.acorn.apps Acorn software applications.
comp.sys.acorn.extra-cpu Extra CPUs in Acorn computers.
comp.sys.acorn.games Discussion of games for Acorn machines
comp.sys.acorn.hardware Acorn hardware.
comp.sys.acorn.misc Acorn computing in general.
comp.sys.acorn.networking Networking of Acorn computers.
comp.sys.acorn.programmer Programming of Acorn computers.

Also comp.sys.arm about the ARM chips (including StrongARM).

--
Stephen B Streater

Komputer Korner

unread,

Oct 23, 1996, 3:00:00 AM10/23/96

to

Stephen B Streater wrote:
>snipped

Sounds like dedicated chess machines could make a comeback especially
since they don't use floating point. Do you think that your company
could produce a dedicated chess machine cheaper than one of the
multi purpose Strong Arm machines that you quoted?
--
Komputer Korner
The komputer that couldn't kompute the square root of
36^n.

Stephen B Streater

unread,

Oct 23, 1996, 3:00:00 AM10/23/96

to

In article <326D9E...@netcom.ca>, Komputer Korner

<URL:mailto:kor...@netcom.ca> wrote:
>
> Stephen B Streater wrote:
> >snipped
>
> Sounds like dedicated chess machines could make a comeback especially
> since they don't use floating point. Do you think that your company
> could produce a dedicated chess machine cheaper than one of the
> multi purpose Strong Arm machines that you quoted?

We could do - our main work is in video compression technology and
video games. We are planning to make some gadgets using the StrongARM
as an embedded processor, but we know how to make a dedicated chess
computer around SA.

As Chris knows - I need to put some more work into the positional
side before it is a viable product.

Here is my current list of priorities:

1) Increase quiescent search depth (now max 14 ply) up to 18 ply.

2) Optimise code a bit more: 1,000,000 nps should be minimum speed :-)

3) Make program multitask and run in a window - it still has 1988?
non-wimp interface in 320x256x2bpp mode.

4) Potentially controversial :)
Make it look through previous positions it has been in at ever deeper depths
to create its own (killer?) book when not actually playing. The idea is that
any position it has seen before will give an instant move with deeper result
than possible otherwise. Why wait around if it already has calculated the
answer? Also, each time you play it, it will be better, as it will have been
thinking about the positions for billions of nodes. If you go on holiday for
a couple of weeks, you'll come back with 1 trillion nodes under its belt :)

5) Increase max exhaustive search depth to 15 or 16 ply from 12 ply - 12 ply
doesn't take long enough to make (4) worthwhile.

6) Improve sort function for alpha-beta (currently sorts in order of piece
value on the board, which means all the first things to be checked are duff
captures). I expect this to halve the number of nodes for each extra move
eg [wishful thinking mode] 64 times faster to 12 ply [/wtm].

7) Put better positional function in - I'll ask for help from rgcc here :-)

8) Publish in an Acorn magazine - currently scheduled for Risc User.
At that stage program (currently up to C_898h) would become PD.

9) Port to five processor StrongARM upgrade. This should be 5 times faster
if my theories are correct. Then you'd only have to take a weekend off
to get 1 trillion nodes :-)

?) Play it on ICC. Unfortunately I can't run any of the graphical interfaces
on my machine, but more importantly have to get to stage 3 first so I can
run my internet link at the same time as the chess program. Following
comments on here, I may skimp a bit on stage 2 so I can get to stage 3
sooner.
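
Item 4 in the list above (a self-built book of previously seen positions, deepened while the machine is idle) could be sketched roughly as follows. This is only an illustration under assumptions: the class and method names are invented here, positions are identified by an opaque key, and `search` is a hypothetical callable taking `(key, depth)` and returning `(move, score)`. It is not the author's implementation.

```python
from dataclasses import dataclass


@dataclass
class BookEntry:
    move: str    # best move found for the position
    score: int   # evaluation from the side to move
    depth: int   # search depth (in ply) that produced it


class LearnedBook:
    """Remembers the deepest result seen so far for each position key."""

    def __init__(self):
        self.entries = {}

    def record(self, key, move, score, depth):
        # Keep only the deepest analysis for a position.
        old = self.entries.get(key)
        if old is None or depth > old.depth:
            self.entries[key] = BookEntry(move, score, depth)

    def lookup(self, key, min_depth):
        # Answer instantly only if the stored result is at least as deep
        # as the search that would otherwise be run.
        entry = self.entries.get(key)
        if entry is not None and entry.depth >= min_depth:
            return entry
        return None

    def deepen_idle(self, search, extra_ply=1):
        # Idle-time pass: re-search every stored position a little deeper.
        for key, entry in list(self.entries.items()):
            new_depth = entry.depth + extra_ply
            move, score = search(key, new_depth)
            self.record(key, move, score, new_depth)
```

During play the program would call `lookup` first and fall back to a normal search on a miss, recording the result afterwards.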

--
Stephen B Streater

Simon Read

unread,

Oct 23, 1996, 3:00:00 AM10/23/96

to

Come to think of it, wasn't Deep Blue getting 30,000,000,000
nodes examined for a single move? (give or take ten billion or so!)
That means you can check out one of Deep Blue's moves overnight.
You won't have the search fragmentation that Deep Blue had with
its many processors, which is an advantage for you, but then
you won't have the selective extensions that Deep Blue has.
Have you tried investigating the Bxh7 possibility in game 6 ?

Of course, Deep Blue will now be lots faster, since they're going to
plug in more chips and turn the mains voltage up from 110V at 60Hz to
1,100V at 600Hz. Then they're going to pour in lots of oil and
watch it fry. Add a little vinegar and Pouffe! Silicon flambe!

Simon

Robert Hyatt

unread,

Oct 23, 1996, 3:00:00 AM10/23/96

to

Simon Read (s.r...@cranfield.ac.uk) wrote:
: Come to think of it, wasn't Deep Blue getting 30,000,000,000

:

Don't forget, too, that DB's single chip is way over a million nodes per
second....

And they have 1024 of 'em now I hear...

Simon Read

unread,

Oct 23, 1996, 3:00:00 AM10/23/96

to

crac...@ix.netcom.com (Stuart Cracraft) wrote:
>How long does it take your StrongARM to accumulate
>the 3x10^10 nodes?
>
>--Stuart

He says he's almost getting 1,000,000 nodes per second. That makes
3 x 10^4 seconds, which is eight and a half hours.
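
The estimate is simple division and can be checked with a short sketch. The function name is made up for illustration, and linear scaling across processors is an assumption. With these inputs it gives about 8.3 hours, in the same range as the figure quoted here:

```python
def hours_to_search(nodes, nodes_per_second, processors=1):
    """Wall-clock hours to visit `nodes`, assuming perfectly linear scaling."""
    return nodes / (nodes_per_second * processors) / 3600.0


if __name__ == "__main__":
    print(f"{hours_to_search(3e10, 1e6):.1f} hours for 3 x 10^10 nodes")
    print(f"{hours_to_search(1e12, 1e6) / 24:.1f} days for 10^12 nodes")
```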

I note his comment:

> PS You can tell what my computer does at night :-)

which might be a bit of a hint as to when the 8 1/2 hours occur.

Stephen, confirmation?

Simon

Stephen B Streater

unread,

Oct 24, 1996, 3:00:00 AM10/24/96

to

In article <326e4...@news.cranfield.ac.uk>, Simon Read

Spot on.

--
Stephen B Streater

Stephen B Streater

unread,

Oct 24, 1996, 3:00:00 AM10/24/96

to

In article <326e7...@news.cranfield.ac.uk>, Simon Read

<URL:mailto:s.r...@cranfield.ac.uk> wrote:
>
> Come to think of it, wasn't Deep Blue getting 30,000,000,000
> nodes examined for a single move? (give or take ten billion or so!)
> That means you can check out one of Deep Blue's moves overnight.
> You won't have the search fragmentation that Deep Blue had with

> its many processors..

I don't believe you need to get this anyway in alpha-beta. One of the
"features" I have is alpha-beta windowing, so it calculates the score
at all moves from 3 to n incrementally. It has a good idea of what score
to look for at deeper ply. This means that you can search all ply 1
moves at the same time (for example), using the window based on the
previous ply actual score.
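
The windowing idea described here is commonly called an aspiration window. A minimal sketch, assuming a toy game tree of nested lists with integer leaf scores (scored for the side to move at the leaf). The names `alphabeta` and `aspiration_search` and the `delta` parameter are illustrative and not taken from the program being discussed:

```python
INF = 10**9


def alphabeta(node, alpha, beta):
    """Fail-soft negamax alpha-beta; ints are leaves, lists are inner nodes."""
    if isinstance(node, int):
        return node
    best = -INF
    for child in node:
        score = -alphabeta(child, -beta, -max(alpha, best))
        best = max(best, score)
        if best >= beta:
            break  # cutoff: this node is already too good for the opponent
    return best


def aspiration_search(root, guess, delta):
    """Search a narrow window around `guess`; widen only if the score falls outside."""
    alpha, beta = guess - delta, guess + delta
    score = alphabeta(root, alpha, beta)
    if score <= alpha or score >= beta:
        # The guess was wrong: only a bound is known, so search again fully.
        score = alphabeta(root, -INF, INF)
    return score
```

In an iterative-deepening loop, `guess` would be the score returned by the previous, shallower iteration.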

This gives you an approx linear speed up provided you have more possible
moves than processors - unfortunately the case for me, even though
StrongARM die is only 7mm by 7mm in size and they make lots of them on a
single wafer; but they insist on cutting up the wafers and selling the
SAs individually :(

--
Stephen B Streater

Robert Hyatt

unread,

Oct 24, 1996, 3:00:00 AM10/24/96

to

Stephen B Streater (ste...@surprise.demon.co.uk) wrote:

: In article <326e7...@news.cranfield.ac.uk>, Simon Read

:

No it doesn't. If you count nodes for each root move, you'll note that
the first move searched will include nearly all of the nodes you search.

I see searches all the time where the first move takes a minute, the remaining
35 moves take 10 seconds total. If you spread this over N processors, you
get almost no speedup. This has been tried many times. Cray Blitz version
1983 for example. With no null move, it got a speedup of about 2.5 with 16
processors when we tried it again in 1985 (we had gone to PVS parallel search
by then which does somewhat better.)
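
To put numbers on this, here is a rough sketch that hands whole root moves to processors and compares total work with the longest-running processor. It assumes each move takes the same time whether searched serially or in parallel, which is a simplification, and it uses the one minute and ten seconds from the example above. The function name and the greedy scheduling are illustrative only:

```python
import heapq


def root_split_speedup(move_times, processors):
    """Speedup when whole root moves are handed to the least-loaded processor."""
    loads = [0.0] * processors
    heapq.heapify(loads)
    for t in sorted(move_times, reverse=True):
        least = heapq.heappop(loads)      # processor that will be free first
        heapq.heappush(loads, least + t)
    return sum(move_times) / max(loads)   # serial time / parallel wall time


# First move: 60 s. Remaining 35 moves: 10 s in total.
EXAMPLE_TIMES = [60.0] + [10.0 / 35] * 35

if __name__ == "__main__":
    for n in (1, 2, 16, 36):
        print(n, "processors:", round(root_split_speedup(EXAMPLE_TIMES, n), 2))
```

Under these assumptions the processor holding the first move is always the bottleneck, so the speedup cannot exceed 70/60 however many processors are added.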

Sorry, but your math doesn't hold up, although I certainly wish it did. :)

Bob

Jesper Antonsson

unread,

Oct 24, 1996, 3:00:00 AM10/24/96

to

In article <54nlo2$g...@juniper.cis.uab.edu>, hy...@crafty.cis.uab.edu (Robert Hyatt) wrote:
>I see searches all the time where the first move takes a minute, the remaining
>35 moves take 10 seconds total. If you spread this over N processors, you
>get almost no speedup. This has been tried many times. Cray Blitz version
>1983 for example. With no null move, it got a speedup of about 2.5 with 16
>processors when we tried it again in 1985 (we had gone to PVS parallel search
>by then which does somewhat better.)
>
>Sorry, but your math doesn't hold up, although I certainly wish it did. :)
>
>Bob

Hi Bob!

How is PVS done? I had an idea about how to parallelize alpha/beta. Assume
that you have 36 processors, and 36 possible moves per position.

Now, you start by examining the first reply to each root move. This you have
to do anyway, and you can't alpha/beta-cutoff these moves, right? When the
searches have terminated, you start examining the other replies to move 1
simultaneously. (And you hopefully have a good value from reply 1 to do
cutoffs with.) When this is done, the first move is done, and you have
searched all first replies to the other root-moves. Now iterate over the
other root-moves. If you can do an instant cut-off, do it. If not, start
searching the other replies simultaneously. Then you are done with that
root-move, and either discard it, or choose this as new best-move. Then you
iterate in the same way over the rest of the moves.

Is this a possible algorithm? This should generalize pretty easily, and give a
good gain.

Regards,

Jesper

Simon Read

unread,

Oct 24, 1996, 3:00:00 AM10/24/96

to

ES: article <54gtne$9...@news.xs4all.nl>, Ed Schro"der
ES> How would you compare the strong ARM and the Pentium Pro 200 Mhz.
-->
SB: Stephen B Streater <ste...@surprise.demon.co.uk>
SB> The StrongARM is much nicer. Also, at $49, it's much cheaper.
-->
Also, it only dissipates half a Watt running at full speed.

I have to agree with the "much nicer." Some processors set flags
when you don't want them set, some processors don't set them when
you want them set. With the ARM (and StrongARM), on almost every
single instruction, there's a bit in the opcode field to TELL the
processor what you would like it to do: set the flags or not. Very
nice, very logical. Having used it I don't know how I ever lived
without it.

Some processors have address registers, data registers, Registers
Which Must Be Used For Loops, etc. The ARM just has registers.
Even the PC is number 15, a register like the rest. If you really
really really want to right-shift the PC a few bits, you can.
You'll crash, but that's your fault!!

Every single instruction is conditional. There are no exceptions
to this. This is very powerful for avoiding labels and jumps if
you only have one or two things to do. This really speeds up lots
of those little operations.

Now consider the conditional instruction and the "choose whether or
not to set flags" part with the compare instruction, and you have
a VERY powerful tool: conditional compare instructions. The CMP
instruction is only executed if your chosen condition is met. A
small combination of these can lead to useful combinations being
expressed in a single flag without any need for a sequence of
branches. Do you want to set a flag if a number is within a certain
range? That can be done in two instructions. How about "an odd number
within a certain range"? That can be done in three instructions:
a compare, a conditional compare, then a conditional shift-and-set-carry.
There are loads of useful things you can do with this without
having to create little branches.

Simon

Stephen B Streater

unread,

Oct 24, 1996, 3:00:00 AM10/24/96

to

In article <54nlo2$g...@juniper.cis.uab.edu>, Robert Hyatt

<URL:mailto:hy...@crafty.cis.uab.edu> wrote:
>
> Stephen B Streater (ste...@surprise.demon.co.uk) wrote:

> : I don't believe you need to get this anyway in alpha-beta. One of the
> : "features" I have is alpha-beta windowing, so it calculates the score
> : at all moves from 3 to n incrementally. It has a good idea of what score
> : to look for at deeper ply. This means that you can search all ply 1
> : moves at the same time (for example), using the window based on the
> : previous ply actual score.
> :
> : This gives you an approx linear speed up provided you have more possible
> : moves than processors - unfortunately the case for me, even though
> : StrongARM die is only 7mm by 7mm in size and they make lots of them on a
> : single wafer; but they insist on cutting up the wafers and selling the
> : SAs individually :(

> No it doesn't. If you count nodes for each root move, you'll note that

> the first move searched will include nearly all of the nodes you search.
>

> I see searches all the time where the first move takes a minute, the remaining
> 35 moves take 10 seconds total. If you spread this over N processors, you
> get almost no speedup. This has been tried many times. Cray Blitz version
> 1983 for example. With no null move, it got a speedup of about 2.5 with 16
> processors when we tried it again in 1985 (we had gone to PVS parallel search
> by then which does somewhat better.)
>
> Sorry, but your math doesn't hold up, although I certainly wish it did. :)

But I have a trick up my sleeve...

How big was your alpha-beta window on your first move? If the first move took
so long, it can only mean you are using it to calculate a narrow window for
alpha/beta - which I already know from the search to the previous depth. If
you start with the "correct" value with no tolerance, you get to find out if
the value is out of range very quickly on your first move, and whether it is
too small or too big, and all moves take about equal time.

I don't know if I have explained this well, but basically the difference in
time is a result of your alpha-beta range being bigger on your first move.
So start with a range of 0, and all moves will be very fast and the same time.

--
Stephen B Streater

Robert Hyatt

unread,

Oct 24, 1996, 3:00:00 AM10/24/96

to

Stephen B Streater (ste...@surprise.demon.co.uk) wrote:

: In article <54nlo2$g...@juniper.cis.uab.edu>, Robert Hyatt

:

my alpha/beta window is almost zero. Remember that I'm using (and used
in Cray Blitz) the negascout algorithm. As a result, hardly any nodes
are searched with anything other than a zero-width window (that is,
beta = alpha+1, so that *no* scores fit inside)...
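
For readers who have not met the zero-width window before, here is a small textbook-style sketch of negascout on a toy tree of nested lists with integer leaf scores. It is not Crafty or Cray Blitz code, and the function names and tree encoding are assumptions made for illustration. Only the first move at each node gets the full window; every later move is first tested with beta = alpha+1 and re-searched only if that test fails high.

```python
INF = 10**9


def negamax(node):
    """Plain exhaustive search, used here only as a reference value."""
    if isinstance(node, int):
        return node
    return max(-negamax(child) for child in node)


def negascout(node, alpha, beta):
    """Negascout with integer scores; ints are leaves, lists are inner nodes."""
    if isinstance(node, int):
        return node
    first = True
    for child in node:
        if first:
            score = -negascout(child, -beta, -alpha)       # full window
            first = False
        else:
            score = -negascout(child, -alpha - 1, -alpha)  # zero-width test
            if alpha < score < beta:
                # The test failed high: this move may be better, so re-search it.
                score = -negascout(child, -beta, -score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break
    return alpha
```

Called with the full window at the root, for example `negascout(tree, -INF, INF)`, it returns the same value as `negamax(tree)`.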

however, all moves can't take equal time. Best reference I can think of
is to dig up a subscription to Parallel Computing (at your local university
library if you are lucky) and find the paper I wrote in Jan of 1988 I
believe, called "Parallel Alpha/Beta Tree Search", authors were Hyatt,
Suter and Nelson. In it I explain exactly how many nodes are in the first
branch and the remaining branches for a perfectly-ordered tree. The first
branch is close to 50% of the total nodes searched, because down the left-
hand side of the tree, every node has to have *every* successor searched.

The range doesn't matter at all, because with a perfect move ordering, you
still will find that the first branch has 50% of the nodes, and with ordering
perfect you can't search more efficiently regardless of the window you use.
Certainly programs don't have perfect ordering, but you should simply count
nodes below each root node in your tree. You'll be amazed...
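
A rough way to check the "50%" figure without the paper is to count leaf positions in a uniform, perfectly ordered tree using the usual PV/CUT/ALL node classification. This is a simplified model and not the derivation from the paper; the branching factor of 36 and the function names are assumptions for illustration. In this model the first root move holds about half of the leaves at even depths and a smaller share at odd depths.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def minimal_leaves(kind, depth, width):
    """Leaves visited below a node of the given kind in a perfectly ordered tree."""
    if depth == 0:
        return 1
    if kind == "PV":    # first child is PV, the others are refuted (CUT)
        return (minimal_leaves("PV", depth - 1, width)
                + (width - 1) * minimal_leaves("CUT", depth - 1, width))
    if kind == "CUT":   # a single refuting move is enough
        return minimal_leaves("ALL", depth - 1, width)
    return width * minimal_leaves("CUT", depth - 1, width)  # ALL: every move


def first_move_share(depth, width=36):
    """Fraction of all leaves that sit under the first root move."""
    return minimal_leaves("PV", depth - 1, width) / minimal_leaves("PV", depth, width)


if __name__ == "__main__":
    for depth in (4, 5, 6):
        print(depth, "ply:", round(first_move_share(depth), 3))
```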

Jim G.

unread,

Oct 24, 1996, 3:00:00 AM10/24/96

to

I've just been trying to educate myself about this subject. Regarding
possible speed-up through parallel processing, is this data from the
Caltech Concurrent Computation Program relevant and timely?

Data at:
http://www.npac.syr.edu/copywrite/pcw/node354.html#SECTION001635000000000000000

(That's one long URL!)

-- Jim G.

===
'hy...@crafty.cis.uab.edu (Robert Hyatt)' wrote:
>Jesper Antonsson (d93j...@und.ida.liu.se) wrote:
>: In article <54nlo2$g...@juniper.cis.uab.edu>, hy...@crafty.cis.uab.edu
>: (Robert Hyatt) wrote:
>: >I see searches all the time where the first move takes a minute, the
>: >remaining 35 moves take 10 seconds total. If you spread this over N
>: >processors, you get almost no speedup. This has been tried many times.
>: >Cray Blitz version 1983 for example. With no null move, it got a speedup
>: >of about 2.5 with 16 processors when we tried it again in 1985 (we had
>: >gone to PVS parallel search by then which does somewhat better.)
>: >
>: >Sorry, but your math doesn't hold up, although I certainly wish it did. :)
>: >
>: >Bob
>:
>: Hi Bob!
>:
>: How is PVS done? I had an idea about how to parallellize alpha/beta.
>: Assume that you have 36 processors, and 36 possible moves per position.
>
>The basic idea ( assuming you are doing a search to depth=N ) is to
>follow the left-most path to ply=N, then split the work up there and
>search the branches in parallel. This will give you a score for the
>ply=N node. Then you back this up one ply to N-1, and search the
>remaining moves in parallel (the first has already been searched and
>you have the alpha value for it). repeat this until you back up to
>the root.
>
>This can give you maybe a factor of 4-5 with 8 processors, but not
>much better with 16 or even 32 processors. To go beyond the factor
>of 4-5, you have to drop PVS and go to something much more complicated
>(in Cray Blitz I used what I called dynamic tree splitting, which seems
>to be similar to "young brothers wait" or other such algorithms). Get
>ready to spend a lot of time tuning and debugging. :)
>
>:
>: Now, you start by examining the first reply to each root move. This you
>: have to do anyway, and you can't alpha/beta-cutoff these moves, right?
>: When the searches has terminated, you start examining the other replies
>: to move 1 simultaneously. (And you hopefully have a good value from reply
>: 1 to do

Robert Hyatt

unread,

Oct 24, 1996, 3:00:00 AM10/24/96

to

Jan Eric Larsson

unread,

Oct 24, 1996, 3:00:00 AM10/24/96

to

Robert Hyatt <hy...@crafty.cis.uab.edu> wrote:
>Don't forget, too, that DB's single chip is way over a million nodes per
>second....

What is the nps for a DB single chip? I have seen/heard 8M, 6M,
and other things too. Does anyone have a reliable answer?

--
Jan Eric Larsson
Institutionen för Informationsteknologi, Lunds Tekniska Högskola
Box 118, 221 00 Lund, E-mail: Jan...@IT.LTH.Se
Tel: 046-222 7523, Sek: 046-222 7520, Fax: 046-222 4717

Urban Koistinen

unread,

Oct 25, 1996, 3:00:00 AM10/25/96

to

Simon Read (s.r...@cranfield.ac.uk) wrote:
: ES: article <54gtne$9...@news.xs4all.nl>, Ed Schro"der

: ES> How would you compare the strong ARM and the Pentium Pro 200 Mhz.
: -->
: SB: Stephen B Streater <ste...@surprise.demon.co.uk>
: SB> The StrongARM is much nicer. Also, at $49, it's much cheaper.
: -->
: Also, it only dissipates half a Watt running at full speed.

Quite lovely.

How much support circuits does it need?
How much would a PCI (or any other common PC-bus) card with
the StrongARM and 1Mbyte of DRAM cost?
Or, how about a minimal computer with StrongARM, 1Mbyte of DRAM
and a serial line?
Would it be possible to get it down to USD 100?
(Isn't it so now that DRAM is almost as fast as SRAM when you
read or write a whole block? Are they not as fast when you can
schedule the read ahead of time?) Is that true for EDO RAM?
If you build a full PC based on the StrongARM I don't see how
it would get much cheaper. Disk drives, power supplies and
boards&memory make up the greater part of the hardware cost
of a common PC. Say 30% for CPU.

Robert Hyatt

unread,

Oct 27, 1996, 2:00:00 AM10/27/96

to

Jan Eric Larsson (jan...@dit.lth.se) wrote:

The original deep thought was 700K nodes per second per processor. Hsu
thought he could hit 5M with a re-design, but I've been seeing numbers
that seem to suggest that he didn't get as much of a performance boost
as he hoped, likely because he was trying to correct some design problems
such as the special chips couldn't detect draw by repetition. Recent
stories suggest around 1M nodes per second per processor, because they
are supposedly looking at 1024 for next year's match, while they were
using 256 for the last match which seemed to be producing 250M nodes per
second.

All speculation of course... except for the speed of the original chip.

Stephen B Streater

unread,

Oct 28, 1996, 3:00:00 AM10/28/96

to

In article <54rc01$n...@oden.abc.se>, Urban Koistinen

<URL:mailto:m10...@abc.se> wrote:
>
> Simon Read (s.r...@cranfield.ac.uk) wrote:
> : ES: article <54gtne$9...@news.xs4all.nl>, Ed Schro"der
> : ES> How would you compare the strong ARM and the Pentium Pro 200 Mhz.
> : -->
> : SB: Stephen B Streater <ste...@surprise.demon.co.uk>
> : SB> The StrongARM is much nicer. Also, at $49, it's much cheaper.
> : -->
> : Also, it only dissipates half a Watt running at full speed.
>
> Quite lovely.
>
> How much support circuits does it need?

Good question. The place to start looking is http://www.arm.com -
I know more about the software than the hardware.

> If you build a full PC based on the StrongARM I don't see how
> it would get much cheaper.

They may make network computers out of it - then you don't even
need the disc drive :-)

--
Stephen B Streater

Simon Read

unread,

Oct 31, 1996, 3:00:00 AM10/31/96

to

I think probably comp.sys.arm or comp.sys.acorn.misc would give you
a lot more information on the ARM and StrongARM than I could.

Simon

Urban Koistinen

unread,

Oct 31, 1996, 3:00:00 AM10/31/96

to

Stephen B Streater (ste...@surprise.demon.co.uk) wrote:

SB: In article <54rc01$n...@oden.abc.se>, Urban Koistinen
SB: <URL:mailto:m10...@abc.se> wrote:
SB: >
SB: > Simon Read (s.r...@cranfield.ac.uk) wrote:
SB: > : ES: article <54gtne$9...@news.xs4all.nl>, Ed Schro"der
SB: > : ES> How would you compare the strong ARM and the Pentium Pro 200 Mhz.
SB: > : -->
SB: > : SB: Stephen B Streater <ste...@surprise.demon.co.uk>
SB: > : SB> The StrongARM is much nicer. Also, at $49, it's much cheaper.
SB: > : -->
SB: > : Also, it only dissipates half a Watt running at full speed.
SB: >
SB: > Quite lovely.
SB: >
SB: > How much support circuits does it need?

SB: Good question. The place to start looking is http://www.arm.com -
SB: I know more about the software than the hardware.

I checked, seems like ARM chips come in all kinds of flavours,
including single chip PDA.

SB: > If you build a full PC based on the StrongARM I don't see how
SB: > it would get much cheaper.

SB: They may make network computers out of it - then you don't even
SB: need the disc drive :-)

Yes, or how about making network CPUs out of it.
A single chip with Firewire for all IO and power.

In volume that would not have to cost more than USD 100/CPU in
a box with connectors.

Urban.K...@abc.se - e...@algonet.se

Stephen B Streater

unread,

Nov 4, 1996, 3:00:00 AM11/4/96

to

In article <559nb5$m...@oden.abc.se>, Urban Koistinen

<URL:mailto:m10...@abc.se> wrote:
>
> Stephen B Streater (ste...@surprise.demon.co.uk) wrote:
> SB: In article <54rc01$n...@oden.abc.se>, Urban Koistinen
> SB: <URL:mailto:m10...@abc.se> wrote:
> SB: >
> SB: > Simon Read (s.r...@cranfield.ac.uk) wrote:
> SB: > : ES: article <54gtne$9...@news.xs4all.nl>, Ed Schro"der
> SB: > : ES> How would you compare the strong ARM and the Pentium Pro 200 Mhz.
> SB: > : -->
> SB: > : SB: Stephen B Streater <ste...@surprise.demon.co.uk>
> SB: > : SB> The StrongARM is much nicer. Also, at $49, it's much cheaper.
> SB: > : -->
> SB: > : Also, it only dissipates half a Watt running at full speed.
> SB: >
> SB: > Quite lovely.
> SB: >
> SB: > How much support circuits does it need?
>
> SB: Good question. The place to start looking is http://www.arm.com -
> SB: I know more about the software than the hardware.
>
> I checked, seems like ARM chips come in all kinds of flavours,
> including single chip PDA.

> Yes, or how about making network CPUs out of it.

> A single chip with Firewire for all IO and power.
>
> In volume that would not have to cost more than USD 100/CPU in
> a box with connectors.

I'd buy one :-)

--
Stephen B Streater